Implementing crawl quota policies
Posted: Sun Feb 09, 2025 8:46 am
A few years ago, Google publicly said, "We didn't realize this ourselves, but it's actually over time that we crawl this page less and less, and we stop seeing the link, and then it doesn't count." So they implied that noindex is no longer a reliable way to keep passing PageRank: eventually the page is treated as noindex, nofollow. So again, we get a slightly compromised solution.
Canonical tag
A page with a canonical tag pointing elsewhere will still be crawled, just a little less over time. It will still not be indexed, and it will still pass PageRank to the canonical URL.
This looks great, and in most cases it works perfectly. But it only works if the pages are close enough to being duplicates that Google is willing to treat them as duplicates and respect the canonical. If Google is not willing to treat them as duplicates, you may have to go back to using noindex.
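As a rough illustration, here is a minimal sketch of generating a canonical tag for near-duplicate URL variants. It assumes the duplicates differ only by query parameters that do not change the page content; the parameter names in `IGNORED_PARAMS` are hypothetical examples, not a list Google publishes.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of query parameters assumed NOT to change page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sort", "sessionid"}


def canonical_url(url: str) -> str:
    """Map a URL variant to one canonical form.

    Drops ignored query parameters, sorts the remaining ones, lowercases the
    host and discards any #fragment, so all variants collapse to one URL.
    """
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path or "/", urlencode(kept), "")
    )


def canonical_link_tag(url: str) -> str:
    """Render the <link rel="canonical"> element to place in the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


if __name__ == "__main__":
    print(canonical_link_tag("https://Example.com/shoes?utm_source=mail&color=red&sort=price"))
```

The canonical is only a hint: if the variants really differ in content (for example, `color=red` versus `color=blue` showing different products), Google may ignore it, which is the case where noindex becomes the fallback.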
301 redirect
A 301 redirect is even better than canonical or noindex for saving crawl budget, because in some cases Google doesn't even have to look at the page: it just follows the 301. It also solves our indexing problem and will pass PageRank.
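To make the mechanism concrete, here is a minimal sketch of a server that answers retired URLs with a 301 and an empty body, using only Python's standard-library WSGI tools. The `REDIRECTS` mapping and the paths in it are hypothetical; in practice these rules usually live in the web server or CDN configuration rather than in application code.

```python
from typing import Callable, Dict, Iterable

# Hypothetical mapping from retired paths to their permanent replacements.
REDIRECTS: Dict[str, str] = {
    "/old-shoes": "/shoes",
    "/summer-sale-2023": "/sale",
}


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """WSGI app: 301 for mapped paths, a normal 200 page otherwise."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # Permanent redirect with no body: the crawler only needs the
        # status line and the Location header to move on to the target.
        start_response(
            "301 Moved Permanently",
            [("Location", target), ("Content-Length", "0")],
        )
        return [b""]
    body = b"<html><body>Regular page</body></html>"
    start_response(
        "200 OK",
        [("Content-Type", "text/html; charset=utf-8"), ("Content-Length", str(len(body)))],
    )
    return [body]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Serve locally; try: curl -I http://127.0.0.1:8000/old-shoes
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

The `Location` values here are relative paths, which HTTP permits; an absolute URL works equally well.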
So, how would we actually use these strategies?
One factor that is not so intuitive is speed. As I said before, Google allocates a limited amount of time or resources to crawling a given website. So if your website is fast, with low server response times and lightweight HTML, Google will crawl more pages in the same amount of time.
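A toy model shows why speed matters. This sketch assumes the crawler spends a fixed time budget per session and fetches pages one after another (or with a fixed concurrency); Google does not publish such a budget, so `budget_ms`, `concurrency` and the timings below are illustrative assumptions, not real figures.

```python
def estimated_pages(budget_ms: int, server_ms: int, transfer_ms: int, concurrency: int = 1) -> int:
    """Pages fetchable within a fixed crawl time budget (toy model).

    server_ms:   time until the server starts responding
    transfer_ms: time to download the HTML (smaller pages -> less time)
    concurrency: hypothetical number of parallel fetches
    """
    per_page_ms = server_ms + transfer_ms
    if per_page_ms <= 0 or concurrency < 1:
        raise ValueError("per-page time must be positive and concurrency >= 1")
    return (budget_ms * concurrency) // per_page_ms


if __name__ == "__main__":
    budget = 60_000  # assumed one-minute budget
    print("slow site:", estimated_pages(budget, server_ms=400, transfer_ms=200))
    print("fast site:", estimated_pages(budget, server_ms=150, transfer_ms=50))
```

Under these assumed numbers the slow site gets 100 pages fetched and the fast site 300: cutting per-page time by a factor of three triples the pages crawled within the same budget.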