How Google works
When you search on Google, it doesn't analyze websites or work out ranking factors in real time. Instead, Google crawls websites in advance and then decides whether they should be indexed.
It does this for two main reasons. The first is speed: if Google visited websites in real time whenever you searched, fetching the information from those websites and then processing it to see what ranks higher would take quite some time. Your searches would literally take minutes to return results. Think how fast Google gives you results, normally fractions of a second. There is no way it could return results so quickly if it crawled sites in real time.
The second reason Google doesn't decipher websites in real time is resources. Google's resources are limited; it is not an endless pit of processing power, and crawling websites with bots takes a lot of computing power. If it crawled the web afresh every time someone searched, its infrastructure could well need 100x what it currently uses (a guess). That would raise the cost per search so much that the whole search business wouldn't be viable. No, Google crawls all websites in advance, before they are indexed and ranked. It does not do it in real time.
Crawling
Google uses bots to crawl websites. Googlebot goes through websites very quickly, much faster than you or I do! It finds all the links on a page and stores them on Google's servers, then goes back and works through each of those links in turn. Along the way it parses the HTML and keeps the parts it finds useful - content like headings, the title tag, schema, meta tags, and the words on the page.
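To make that concrete, here is a toy sketch of the link-extraction step using only Python's standard library. The HTML and URLs are made-up examples, and a real crawler adds much more (politeness delays, deduplication, robots.txt checks), but the fetch-extract-queue loop looks roughly like this:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects href targets from <a> tags, resolved against a base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Turn relative links like /blog into absolute URLs
                    self.links.append(urljoin(self.base_url, value))

# A crawler fetches a page, extracts its links, queues them, and repeats.
html = '<a href="/blog">Blog</a> <a href="https://example.org/">Elsewhere</a>'
parser = LinkExtractor("https://example.com/")
parser.feed(html)
print(parser.links)  # ['https://example.com/blog', 'https://example.org/']
```

Everything the crawler discovers this way feeds back into the queue of pages to visit next.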
This is where web performance with Craft CMS comes in to help Google crawl pages. If a website performs well, meaning Googlebot can move quickly from page to page, Google's job becomes much easier. The better your web performance and the quicker your HTML loads, the better for Google - and Google rewards that website by visiting more often.
Google also uses sitemaps to help it find pages on websites. Craft's popular SEOmatic plugin produces sitemaps and does a good job of it. Google also uses its index of previously visited pages to decide which pages to visit. So those are the 3 main ways Google finds pages: 1. its own index of pages, 2. Googlebot following links, and 3. sitemaps.
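For reference, a sitemap is just an XML file listing URLs, one entry per page, which plugins like SEOmatic generate for you. A minimal example (the URLs and dates here are placeholders) looks like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/blog</loc>
    <lastmod>2024-01-10</lastmod>
  </url>
</urlset>
```

It normally lives at the root of the site, e.g. /sitemap.xml, and can be submitted in Google Search Console.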
Indexing
Once Google has crawled a website, it stores the information it needs in a huge database called the search index. As you can expect, this isn't just a single server; it's whole server farms located throughout the world. Each one is probably the size of an actual farm. A huge, huge resource!
It must be stated that Google doesn't index every web page it finds. Whether a page gets in comes down to quality in a number of areas, much as with ranking - the factors are similar. Common reasons a page is left out:
- Low-quality content
- Poorly linked content
- Not properly accessible by the crawler
- 404 pages - not found
- Canonical misconfiguration - the website doesn't tell Google which version of a page it wants indexed
Any one of the above can keep a page out of the index.
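Two of these issues (a noindex robots meta tag, and the canonical URL a page declares) can be spotted straight from a page's HTML. A rough sketch using Python's standard library, run here against a made-up example page:

```python
from html.parser import HTMLParser

class IndexabilityChecker(HTMLParser):
    """Pulls the canonical URL and robots directives from a page's HTML."""
    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "canonical":
            self.canonical = a.get("href")
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.robots = a.get("content")

# Hypothetical page head: declares a canonical URL but asks not to be indexed
html = """<head>
<link rel="canonical" href="https://example.com/page">
<meta name="robots" content="noindex, follow">
</head>"""

checker = IndexabilityChecker()
checker.feed(html)
print(checker.canonical)            # https://example.com/page
print("noindex" in checker.robots)  # True - Google won't index this page
```

Google Search Console's URL Inspection tool reports the same kind of information without any code.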
Ranking
A page must be crawled and indexed before it is even considered for ranking on Google. There is no way around that. A page not crawled? No chance of ranking. A page not indexed? No chance at all. Only once a page has been crawled and indexed by Google (or an equivalent search engine) can it show on the search results page and therefore be ranked.
How does Google rank pages?
It’s said that Google uses over 200 factors to rank pages. At an overview level, there are 3 main pillars.
The first is content: factors such as the quality of the writing, the number of pages on the topic, and the page structure. I've mentioned on this blog before that if your pages have low-quality content and writing, they'll struggle to get the most out of Google.
The second pillar is technical SEO: things like web performance and the indexing and crawling issues covered above.
The third is backlinks, or off-page SEO. The quality (over quantity) of backlinks pointing to a website is a big factor. How many unlinked mentions of your brand Google sees across the web is another.
There are other factors, such as User Experience, which I’ve talked a good bit about on this blog.
Great tool
Google's own free tool, Google Search Console, provides a wealth of data on the indexing, crawling, and ranking of your pages. Once you verify your site, you get immediate access to very handy insights.
FAQ
How often does Google visit my website?
I’m going to give the adult answer: it depends. It really does. If Google sees your website as high-quality and you publish new content hourly, Googlebot may visit your site multiple times a day. If you are a relatively new site with little new content, it may be every few weeks. The higher quality and more popular your website, the more regularly Googlebot visits.
What can I do to stop Google from crawling my website?
You can stop Googlebot and other search engine bots from crawling your content by using a robots.txt file. The wildcard below blocks all well-behaved crawlers, not just Google.
User-agent: *
Disallow: /
What can I do to stop AI crawlers from visiting my website?
You can also stop AI crawlers by adding the following to your robots.txt file, found in the root directory of your website. These user agents cover Google's AI training (Google-Extended), Common Crawl (CCBot), and OpenAI (GPTBot).
User-agent: Google-Extended
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: GPTBot
Disallow: /
