
Starting to Take Google Apart....



__nan__ 
Member

So, how does Goo.. work, after all?

Millions of people use Google every hour to find the information they need. And it always seems to get it right. How is that possible? We use Google, but how does it work? What is behind the best search engine in the world?

This article is the result of detailed research into the PageRank algorithm. We have avoided mathematics as much as possible so that everyone can follow it.

1. Academic papers
When you defend an idea, you need arguments. Otherwise, your idea will have little value or will hardly be accepted. Those arguments are, for the most part, other ideas that have already been proven.

In the same way, other people may use your creation as a foundation. That is how science works all the time. Papers contain citations of other works, used as arguments, and they are in turn cited by other researchers. It is like a web. And you can tell how popular your work is from the number of people who refer to it. The same thing happens with websites.

2. Web pages

Larry and Sergey, the founders of Google, realized that it is relatively easy to know where your page (or paper) points to. Just read it! The challenge is knowing who points to you.

In the real world of academic papers, this problem is practically impossible to solve. On the Internet, however, things are different, because sites are full of links. It would be enough to "read" a page automatically and follow its links, like a robot. Those links would lead to other pages, which would lead to others, and others... In a short time you could have the entire Internet at home and find out who has been citing that site of yours about sheep.
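The "robot" idea above can be sketched as a breadth-first crawl over a small link graph. This is a minimal illustration, not BackRub's actual code: the page names and the `WEB` dictionary are hypothetical stand-ins for downloading and parsing real pages. It shows why crawling solves the "who points to me?" problem: every link read in the forward direction is also recorded in reverse.

```python
from collections import deque

# Hypothetical miniature web: each page maps to the pages it links to.
# A real crawler would fetch and parse each page instead.
WEB = {
    "sheep.example": ["wool.example", "farm.example"],
    "wool.example": ["sheep.example"],
    "farm.example": ["sheep.example", "tractor.example"],
    "tractor.example": [],
}


def crawl(start, web):
    """Follow links breadth-first from `start` and build a reverse index
    mapping every page reached to the set of pages that link to it."""
    seen = {start}
    queue = deque([start])
    cited_by = {start: set()}
    while queue:
        page = queue.popleft()
        # Reading a page tells us where it points (the easy direction)...
        for target in web.get(page, []):
            # ...and recording the link in reverse answers "who points to me?"
            cited_by.setdefault(target, set()).add(page)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return cited_by


index = crawl("farm.example", WEB)
print(sorted(index["sheep.example"]))  # ['farm.example', 'wool.example']
```

Counting the entries in each set gives the citation-style popularity measure from the academic-paper analogy.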

Together, the two friends created a tool called BackRub. Its job was to scour the World Wide Web looking for links. There was no search yet.

Because of a lack of resources, BackRub had to run on all kinds of computers, and as a result it became extremely stable. The low-cost hardware ended up being an advantage: many "weak" computers joined in a cluster proved more efficient than the supercomputers of the "competitors". And cheaper.

3. Search
Satisfied with their "little monster" (which consumed almost all of the university's Internet bandwidth), the friends saw that it could go further. Larry Page created a small algorithm that searched for words only in page addresses. At the time, the search market was dominated by AltaVista, which searched for terms across the whole page. Surprisingly, BackRub's results were almost always more relevant!

The algorithm evolved and became what is known today as PageRank. A curiosity: contrary to what many people think, the PageRank patent belongs to Stanford, not to Google.

4. The spammers attack

As Google grew and gained popularity, competition for good positions in its search results increased. Link spam (the practice of acquiring links to gain prominence in search engines) grew as fake sites were created to exploit a crucial flaw in PageRank: it gave the same value to every link. In practice, this meant that having your site featured on www.google.com and having it on an unknown home page amounted to the same thing.

PageRank also had the weakness of needing a complete database of every page on the web in order to work. Every so often Google's servers had to be updated before the new score (the new PageRank) of each site could be calculated. A clear disadvantage on an Internet that changes every second.

5. Google strikes back

Little by little, this sabotage was destroying the quality of the results. The answer came in two names: Freshbot and Hilltop.

Freshbot was a new version of the robot responsible for crawling the web. With this innovation, sites began to be added to the index continuously. At first, the freshbots worked alongside the old robots, but later they replaced them. The speed at which pages were updated increased, and increased a lot. It became almost impossible to predict whether "that" link from www.qualquercoisa.pt had helped improve your page's ranking.

Hilltop was a radical change to PageRank, introduced by Krishna Bharat. Its main change was that each link came to have its own "value". This value depends basically on the similarity of content between the sites and on the importance (PageRank) of each site. Thus, a link from www.adobe.com became worth much more than a link from www.qualquercoisa.pt, because there are more links pointing to Adobe (PageRank 10) than to qualquercoisa. Likewise, a link on adobe.com is worth more when it points to a computing site than when it points to a music site. Producing manipulated results became almost impossible.
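The two factors named in that paragraph (the linking site's importance and the topical similarity between the two sites) can be combined in a toy weighting. This is a hypothetical illustration, not the published Hilltop algorithm, which is built around "expert" pages; the multiplication, the Jaccard keyword-overlap measure and all the topic sets below are assumptions. The "PageRank 10" figure is treated as a raw score purely for the arithmetic.

```python
def jaccard(a, b):
    """Topical similarity as the overlap of two keyword sets (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def link_value(source_pr, source_outlinks, source_topics, target_topics):
    """Toy link weight (an assumption, not Google's formula): the PageRank
    share passed by one link, scaled by how similar the two sites' topics are."""
    share = source_pr / source_outlinks
    return share * jaccard(source_topics, target_topics)


# Hypothetical keyword sets for illustration only.
adobe = {"software", "design", "graphics"}
computing_site = {"software", "graphics", "hardware"}
music_site = {"music", "software"}

print(link_value(10, 100, adobe, computing_site))  # 0.05
print(link_value(10, 100, adobe, music_site))      # 0.025
```

The same link from the same page is worth twice as much to the computing site as to the music site, which is the effect the paragraph describes.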

Since then, search quality has improved, but little is known about the current PageRank. The company keeps the algorithm secret, although many of the elements explained here are still present.

PageRank Explained
PageRank is a method invented by Google to measure the relative importance of web pages, often called popularity. It is based on the topology of the web, i.e. the link structure between pages.
The main idea is that if page A links to page B, then page A "thinks" that page B is important enough to deserve being cited and perhaps visited by page A's visitors. This link from A to B increases the PageRank of B.
There are two other essential ideas:
•   the higher the PageRank of page A, the bigger the boost to the PageRank of page B. In other words, a link from Google's home page is worth far more than one from a page on your cousin's site (unless he's a potential genius!).
•   the fewer out-links page A has, the more page B's PageRank increases. In other words, if page A "thinks" that only one page deserves a link, it is natural that page B's PageRank increases more than when page A links to lots of pages.
Now that you know the main principles of PageRank, let's look at the mathematical formula. Our explanation is based on a paper written by the two founders of Google (1). The algorithm has probably changed since then, but the basis remains the same.
Let A1, A2, ..., An be the n pages linking to a page B. Let PR(Ak) be the PageRank of page Ak, N(Ak) the number of outgoing links on page Ak, and d a damping factor between 0 and 1, usually set to 0.85.
Then the PageRank of page B is computed from the PageRank of all pages Ak in the following way:

PR(B) = (1-d) + d x ( PR(A1) / N(A1) + ... + PR(An) / N(An) )
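As a worked example with made-up numbers: suppose B is linked from only two pages, A1 with PR(A1) = 2 and N(A1) = 4, and A2 with PR(A2) = 1 and N(A2) = 1, and take d = 0.85. These values are hypothetical, chosen only to show the arithmetic:

$$
PR(B) = (1-0.85) + 0.85\left(\frac{2}{4} + \frac{1}{1}\right) = 0.15 + 0.85 \times 1.5 = 0.15 + 1.275 = 1.425
$$

A2 has half the PageRank of A1 yet contributes twice as much (1 versus 0.5), because it has a single outgoing link: the second principle from the list above.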

As you might expect, this formula is both simple and complex. Simple because it depends on only a few terms; complex because it is recursive: to compute the PageRank of a particular page, you must already have computed the PageRank of every page linking to it. So which page should you process first?
Actually it is very simple: you just initialize every page's PageRank to the same value (for example, 1). The choice of this value has no impact on the final result, as long as you give all pages the same initial value. The first iteration of the formula gives you a new PageRank for each page, closer to the real value than the one selected at the beginning.
Then you keep iterating the formula, computing the PageRank of every page in the system from the values obtained in the previous iteration. After a number of iterations, the system converges: the PageRank of each page practically stops changing from one iteration to the next.
In practice, convergence is reached after a few tens of iterations (depending on the total number of pages!).
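The iterative procedure just described can be sketched in Python. It applies PR(B) = (1-d) + d x (PR(A1)/N(A1) + ... + PR(An)/N(An)) exactly as written, starting every page at the same value and stopping once no score changes by more than a tolerance. The tolerance, iteration cap and three-page example graph are assumptions chosen for illustration. Pages with no outgoing links simply pass nothing on; this sketch does not treat such "dangling" pages specially.

```python
def pagerank(links, d=0.85, initial=1.0, tol=1e-10, max_iter=1000):
    """Iterate PR(B) = (1 - d) + d * sum(PR(A) / N(A)) over the pages A linking to B.

    `links` maps each page to the list of pages it links to (each link listed once).
    Returns the scores and the number of iterations performed.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    inbound = {p: [] for p in pages}  # reverse index: who links to each page
    for source, targets in links.items():
        for target in targets:
            inbound[target].append(source)

    pr = {p: initial for p in pages}  # every page starts with the same value
    for iteration in range(1, max_iter + 1):
        # Each new value uses only the previous iteration's values.
        new_pr = {
            p: (1 - d) + d * sum(pr[a] / len(links[a]) for a in inbound[p])
            for p in pages
        }
        change = max(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if change < tol:  # converged: values no longer change noticeably
            break
    return pr, iteration


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores, n_iter = pagerank(web)
for page in sorted(scores):
    print(page, round(scores[page], 3))
# A 1.163
# B 0.644
# C 1.192
```

Because this is the non-normalized form of the formula, the scores average 1 (they sum to the number of pages) when every page has at least one outgoing link, rather than summing to 1. Calling `pagerank(web, initial=5.0)` takes more iterations but converges to the same scores.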
Google PageRank Secrets & Tips
By Steve Shubitz
PageRank is a secret algorithm which Google defines as: "...PageRank continues to provide the basis for all of our web search tools."

Google also states: "Important, high-quality sites receive a higher PageRank, which Google remembers each time it conducts a search. Of course, important pages mean nothing to you if they don't match your query. So, Google combines PageRank with sophisticated text-matching techniques to find pages that are both important and relevant to your search. Google goes far beyond the number of times a term appears on a page and examines all aspects of the page's content (and the content of the pages linking to it) to determine if it's a good match for your query."
The purpose of this article is to document the pervasive myths, correct and clarify Google's definition, correct the inaccurate PageRank information found on many forums, and address the unwarranted obsession many folks have with PageRank. In essence, PageRank does not work the way you might think after reading the Google page.

Using PR (PageRank) as the deciding factor, or as any factor, when linking to other sites

This is based on a false assumption. Don't waste your time with the Google Toolbar. PR has nothing to do with where your site is listed in a given search. Want some proof? Drill down into some of this site's Webmaster Directory pages, extract a few keywords and search. In some cases this site is in the top 5 listings while my page is PR2. Still not convinced? Fasten your hard disk. Run a search for "free webmaster tools" and you will see that this site is currently listed in the top 6, and the PR of my home page is only 6.

Reciprocal linking and linking guidelines

You should link to another site because it provides quality content complementary to your own, enhances and helps your visitors, propagates your "brand", and has the potential to send both you and your link partner targeted traffic. I've been using these criteria for a number of months on this site with absolutely no regard to PR. The net result is that my Google traffic is up over 100%, and so are my PageViews and unique visitors. Our own Barter/Trade Links Forum has provided thousands of Webmasters with link partners, which increases targeted traffic for both parties.

There are no shortcuts or tricks to this procedure. You need to examine the other site and reject "Link Farms", "Off-Site Link Services", "TopSites", and any site whose "links" page does not branch from the site's home page. What about "bad neighborhood" sites which have a gray PR bar? If you take my advice and spend the time to visit the other site, you will spot some of these "Spam" kings and reject them without using the Toolbar. If a few slip through, don't worry. I have no evidence, and neither do you, that Google will ban your site for linking to an unrelated "Spam" site, and I am not concerned, because I spend the time to examine each site using the criteria I mentioned.

What is the value of PageRank?
Google uses PR in its Directory listings. The default listings in a given Category are initially sorted by PR. This Directory uses data from the ODP (Open Directory Project), also known as DMOZ. You can't determine the actual PR number, but you can view the PR bar and ballpark it. I consider this PR sort to be of limited value, given that most surfers don't drill through the Directory to reach your site. On this site, only about 15% of Google referrals originate from the Directory. Does Google update these sorted PR listings? The evidence is inconclusive.

Want some proof to support the claim that PR may in fact never change in the Directory? My longtime friend Ryan Adams, of ClickQuick fame, used to have a fantastic site, but because of time constraints the site has not been updated, nor has any page changed, in over 1.5 years. ClickQuick is still listed as number 3 in the Google Directory with a PR6, yet as one of his link partners I can tell you that traffic from my old link on his home page has dropped fourfold. Perhaps this is an isolated incident, as I have other examples where new sites move up in PR and eventually reach the sorted list, since not all sites are sorted by PR. My point is that a higher listing in the Google Directory because of your site's PR is still of little concern or value, and you can and should improve your site for Google with new content rather than obsessing over PR.

Have you ever wondered why the PR number is not available in the Directory (even though the data is fetched), and why you must use the Google "Advanced Features" Toolbar, which tracks your surfing habits, to obtain the PR number? I don't read minds, but this procedure is a mystery to me. Is it a marketing ploy to collect your surfing habits and enhance Google's advertising sales? I don't think so, since so few surfers actually use the Advanced Features Toolbar. Frankly, the small percentage of folks who use this version of the Toolbar are SEO geeks and very techie Webmasters. My own surveys of regular surfers indicate that 90% of them never use the Advanced Features Toolbar and never use a PR value as a criterion for bookmarking a site and making it one of their favorite stops.
PageRank conclusions
It's important to remember that everything Google does is secret and in constant flux.
NO WEBMASTER (INCLUDING ME) OR ANY SEO GEEK CAN TELL YOU WITH 100% CERTAINTY EXACTLY THE WAY PR OR A SERP (SEARCH ENGINE RESULTS PAGE) WORK.
I say this after spending over 100 hours of research on this article. PageRank is but one of perhaps 100 different factors which Google may use to determine exactly where your listing will appear in a given search. That's all that matters: top 10-20 placement in a search. It's best to view PR as a "very raw" metric and, most importantly, as an automatic result of proper site construction, maintenance, and SE optimization.

What I can tell you with absolute certainty is to use common sense from a "user experience" perspective and that sites should be built using three factors in the exact order stated:

1. Appropriate Design including usability and advertising placement
2. Compelling content updated on a regular basis
3. Optimized for Search Engines and Directories

Forget PR, build your sites in this order and Google will reward you with targeted traffic.

The Optimal Link Format
As we explained in the description of a quality link, the link context is very important. In the case of PageRank, for example, a link between two pages will pass more PageRank if both contexts are identical.
In fact, the context can be based only on:
•   the title of the page A linking to page B,
•   the anchor text (text used to establish the link within the <A> HTML tag).
It is therefore better for page A to cover the same subject as page B, and for the anchor text to contain the best keywords selected for page B. The optimal format for a text link from site A to site B is the following:
... <a href="http://www.siteB.com/pageB.htm">put here the optimal keywords for site B</a> ...
When you ask another webmaster for a link exchange, give him (or her) code like this, stuffed with all your keywords (in general, the text should be the title of your site).
Do not forget that, to maximize what a link contributes in terms of PageRank, the page containing the link to your page must have a good PageRank (say, more than 4) and as few links as possible. Check, for example, that the links page is not an orphan (a page that no other page on the site links to), and that it does not contain a META tag preventing crawlers from indexing it...
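Two of those checks (how many links the page carries, and whether a robots META tag forbids indexing) can be automated with Python's standard `html.parser`. This is a minimal sketch, assuming you already have the page's HTML as a string; `LinkPageAudit` and `audit` are hypothetical names. It cannot check the page's PageRank or whether it is an orphan, since both depend on the rest of the web or site.

```python
from html.parser import HTMLParser


class LinkPageAudit(HTMLParser):
    """Counts <a href> links and detects a robots META tag containing 'noindex'."""

    def __init__(self):
        super().__init__()
        self.link_count = 0
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "a" and attrs.get("href"):
            self.link_count += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            if "noindex" in (attrs.get("content") or "").lower():
                self.noindex = True


def audit(html):
    """Return (number of links, True if crawlers are told not to index)."""
    parser = LinkPageAudit()
    parser.feed(html)
    parser.close()
    return parser.link_count, parser.noindex


sample = """<html><head><meta name="robots" content="noindex, follow"></head>
<body><a href="http://www.siteB.com/pageB.htm">keywords for site B</a>
<a href="http://www.siteC.com/">site C</a></body></html>"""
print(audit(sample))  # (2, True)
```

A links page that reports `noindex` is not worth a link exchange under the advice above, however few links it has.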
If you prefer a banner exchange, the image must have an ALT attribute containing your keywords:
... <a href="http://www.siteB.com/pageB.htm"><img src="thumbnail.png" alt="put here the optimal keywords for site B"></a> ...
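When handing these snippets to other webmasters, generating them rather than typing them by hand avoids broken markup when a site title contains characters such as `&` or quotes. A minimal sketch using the standard `html.escape`; the helper names `text_link` and `banner_link` and the sample title are hypothetical.

```python
from html import escape


def text_link(url, keywords):
    """Text link whose anchor text carries the target site's keywords."""
    return f'<a href="{escape(url)}">{escape(keywords)}</a>'


def banner_link(url, image_src, keywords):
    """Image link whose ALT attribute carries the target site's keywords."""
    return (
        f'<a href="{escape(url)}">'
        f'<img src="{escape(image_src)}" alt="{escape(keywords)}"></a>'
    )


print(text_link("http://www.siteB.com/pageB.htm", "Site B: widgets & gadgets"))
# <a href="http://www.siteB.com/pageB.htm">Site B: widgets &amp; gadgets</a>
```

`escape` also converts double quotes to `&quot;` by default, so a keyword string with quotes cannot break out of the `alt` attribute.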

... my last contribution to Mais-tráfego.
Take care, Fpware, farewell... I'm going very far away..

..a big hug to everyone, farewell..

MarKo 
Administrator

__nan__, excellent post, we'll be here waiting for you to come back!

Algemas 
Member

Good luck __nan__!  :wink:

swing 
Member

Very good article :D

morpheus 
Member

good luck __nan__, you will always be welcome if you want to come back! ;)

and an excellent article, as always! ;)

Casteloes 
Member

Congratulations, it's an excellent article.

asturmas 
Administrator

__nan__, congratulations and see you!

morzaban 
Member

Good luck __nan__!  :D

surfrib 
Member

Excellent article!!!

See you around...

poir0t 
Member

Bon voyage  8)

tbk22 
Member

Excellent article, and see you around!!

fpware 
Founder

Excellent post __nan__!

But where are you going?  :(

karrico 
Member

Very good _nan_ :) Come back soon

morzaban 
Member

A bit off-topic, but it doesn't deserve a new topic either.

Today is Portugal Day, and I was glad to see that the day didn't go unnoticed by Google  :D

http://www.google.pt/

tbk22 
Member

I was very happy to see that!!