The Ultimate Guide to SEO Crawlers

25 Jun 2019

SEO crawlers are tools that crawl pages of a website much like search engine crawlers do in order to gain valuable SEO information. A good SEO crawler is an indispensable tool and will inevitably make SEO work much easier and less time-consuming.

These are the SEO crawlers reviewed in this article:

  • Screaming Frog
  • Sitebulb
  • WebSite Auditor
  • Netpeak Spider
  • Botify
  • DeepCrawl
  • OnCrawl
  • Ryte
  • Audisto
  • JetOctopus
  • FandangoSEO
  • ContentKing
  • Ahrefs
  • Moz
  • SEMrush

Here are additional SEO crawlers worth checking out which I did not review:

  • SEOCrawler.io
  • Raven Tools
  • Searchmetrics Crawler
  • IIS Site Analysis Web Crawler (a free tool)
  • Xenu’s Link Sleuth (a free tool)
  • BeamUsUp (a free tool)
  • SEOSpyder by Mobilio Development
  • SEOMator
  • CocoScan

Types of SEO Crawlers

There are two types of crawlers: desktop crawlers and cloud-based crawlers.

Desktop crawlers 

These are the crawlers that you install on your computer.

Examples include Screaming Frog, Sitebulb, Link Assistant’s WebSite Auditor, and NetPeak Spider. Usually, desktop crawlers are much cheaper than cloud crawlers but they have some drawbacks, such as:

  • Crawls consume your memory and CPU. However, the situation is much better than it used to be, as crawlers keep improving their memory and CPU management.
  • Collaboration is limited. You can’t just share a report with a client/colleague. You can, however, work around this by sending them a file with a crawl project.
  • Unfortunately, desktop crawlers generally struggle with crawl comparison (Sitebulb is an exception) and scheduling.
  • In general, desktop crawlers offer fewer features than cloud crawlers.

Many professional SEOs admit that even if they work with powerful and expensive cloud-based tools, they still regularly use desktop crawlers as well. Same here. There are some areas where desktop crawlers are more useful and convenient:

  • When I need to see a screenshot of the rendered page, I use Screaming Frog (currently, it’s the only tool that supports this feature).
  • If I want to start a quick crawl with real-time preview, I use Screaming Frog.
  • When I am running out of credits in the cloud tool, I simply use a desktop crawler like Screaming Frog, WebSite Auditor, or SiteBulb.
  • For now, Screaming Frog and Sitebulb are better at spotting redirect chains than most of the premium tools.

Cloud crawlers

At Onely, we run desktop crawls using a server with 8 cores and 32 GB RAM. Even with a configuration like that, it’s common for us to have to stop crawls because we’re running out of memory. That’s one of the reasons why we use cloud crawlers too.

Cloud crawlers use cloud computing to offer more scalability and flexibility.

  • Most cloud-based crawlers allow for online collaboration. Usually, you can grant access to the crawl results to a colleague/client. Some of the cloud crawlers even allow sharing individual reports.
  • It’s common to get dedicated, live support.
  • For the most part, you can easily notice changes between various crawls.
  • Generally, cloud-based crawlers are more powerful than desktop ones.
  • Many of them have basic data visualization features.
  • Of course, this comes at a cost. Cloud crawlers are much more expensive than desktop ones!

Methodology

Basic SEO reports

List of indexable/non-indexable pages

This helps you make sure that your indexing strategy is properly implemented.

Missing title tags

A crawler should show you a list of pages that have missing title tags.

Filtering URLs by HTTP status code

How many URLs are not found (404)? How many URLs are redirected (301)?

List of Hx tags

“Google looks at the Hx headers to understand the structure of the text on a page better.” – John Mueller (Google)

View internal nofollow links

Seeing an internal nofollow list allows you to make sure there aren’t any mistakes in your internal linking.

External links list (outbound external)

A crawler should allow you to analyze both the internal and external outbound links.

Link rel=”next” (to indicate a pagination series)

When you perform an SEO audit, you should analyze if the pagination series are implemented properly.

Hreflang tags

Hreflang tags are the foundation of international SEO, so a crawler should recognize them.

Canonical tags

Every SEO crawler should inform you about the canonical tags to let you spot potential indexing issues.

Crawl depth – number of clicks from a homepage

Additional information about the crawl depth can give you an overview of the structure of your website. If an important page isn’t accessible within a few clicks from a homepage, it may indicate poor website structure.

Content analysis

List of empty/thin pages

A large number of thin pages can negatively affect your SEO efforts. A crawler should report them.

Duplicate content reports

A crawler should give you at least basic information on duplicates across your website.

Convenience

A detailed report for a given URL

You may want to see internal links pointing to a particular URL or to see its headers, canonical tags, etc.

Advanced URL filtering

It’s common that I want to see only the URLs that end with “.html”, or those which contain a product ID. A crawler must allow for this kind of filtering.
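
If your crawler’s built-in filtering falls short, you can always approximate it on an exported URL list. Here’s a minimal Python sketch (the URLs are placeholders) of the two filters mentioned above:

```python
import re

# A sample of crawled URLs (placeholders).
urls = [
    "https://example.com/blog/seo-crawlers.html",
    "https://example.com/product/12345",
    "https://example.com/about",
]

# URLs ending with ".html"
html_pages = [u for u in urls if u.endswith(".html")]

# URLs containing a numeric product ID, e.g. /product/12345
product_pages = [u for u in urls if re.search(r"/product/\d+", u)]

print(html_pages)
print(product_pages)
```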

Page categorizing

Some crawlers offer the possibility to categorize crawled pages (blog, product pages, etc.) and generate reports dedicated to specific categories of pages.

Adding additional columns to a report

When I view a single report, I may want to add additional columns to get the most out of the data.

Filtering URLs by type (HTML, CSS, JS, PDF, etc.)

Crawlers visit resources of various types (HTML, PDF, JPG). A crawler should support filtering by type.

Overview

Having all the detected issues on a single dashboard will not do the job for you, but it can make SEO audits more streamlined.

Comparing crawls

It’s important to compare the crawls that were done before and after any changes implemented on the website.

Crawl settings 

List mode – crawl just the listed URLs

This feature can help you if you want to perform a quick crawl of a small set of URLs.

Changing the user agent

Some websites block crawlers, so it’s sometimes necessary to change the user agent to be able to crawl them.
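
If you want to check how a site responds to a different user agent before launching a full crawl, a quick manual test is enough. Below is a minimal Python sketch using the requests library; the URL and user-agent string are placeholders:

```python
import requests

# Fetch a page while presenting a Googlebot-like user agent – the same idea a
# crawler applies to every request when you change its user agent setting.
headers = {
    "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
}
response = requests.get("https://example.com/", headers=headers, timeout=10)
print(response.status_code)
```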

Crawl speed adjusting

Much like Googlebot, you should be able to adjust your crawl speed according to the server’s response.
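
Under the hood, adaptive crawl speed usually comes down to adjusting the delay between requests based on how quickly the server responds. Here’s a simplified Python sketch of that idea; the URL list and thresholds are made up:

```python
import time
import requests

urls = ["https://example.com/", "https://example.com/blog/"]  # placeholder URLs
delay = 0.5  # seconds between requests

for url in urls:
    start = time.monotonic()
    requests.get(url, timeout=10)
    elapsed = time.monotonic() - start
    # Back off when the server responds slowly, speed up slightly when it's fast.
    delay = min(5.0, delay * 2) if elapsed > 1.0 else max(0.25, delay * 0.9)
    time.sleep(delay)
```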

Setting crawl limits

When crawling a very large website, you may want to set a limit to the number of crawled URLs or the crawl depth.

Analyzing a domain protected by an htaccess login

This is a helpful feature if you want to crawl the staging environment of a website.
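
Staging environments protected by an .htaccess/.htpasswd login typically use HTTP Basic Auth, so a crawler only has to send the credentials with every request. A minimal Python sketch (the URL and credentials are placeholders):

```python
import requests

# Fetch a page behind an .htaccess (HTTP Basic Auth) login.
response = requests.get(
    "https://staging.example.com/",
    auth=("username", "password"),  # placeholder credentials
    timeout=10,
)
print(response.status_code)
```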

Directory/subdomain exclusion

It’s helpful if you can disallow the crawler from crawling a particular directory or a subdomain. 

Maintenance 

Crawl scheduling

It’s handy to be able to schedule a crawl and set monthly/weekly crawls.

Indicating the crawling progress

If you deal with large websites, you should be able to see the current status of the crawl.

Robots.txt monitoring

Accidental changes in robots.txt can lead to an SEO disaster. It’s beneficial if a crawler detects changes in robots.txt and informs you.

Crawl data retention

It’s helpful if a crawler can store crawl data for a long period of time.

Notifications

A crawler should inform you when the crawl is done (desktop notification/email).

Advanced SEO reports 

List of pages with fewer than X incoming links

If there are no internal links pointing to a page, Google may think it’s irrelevant.

Comparison of URLs found in sitemaps and in a crawl.

Sitemaps should contain all your valuable URLs. If some pages are not included in the sitemap, Google may struggle to find them. If a URL is included in the sitemap but can’t be found by the crawler, Google may think that page isn’t relevant.
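
If your crawler doesn’t provide this comparison, you can approximate it by diffing the sitemap against the crawled URL list. A rough Python sketch, assuming a standard XML sitemap (the sitemap URL and the crawled set are placeholders):

```python
import requests
from xml.etree import ElementTree

# Extract all <loc> entries from an XML sitemap.
sitemap_xml = requests.get("https://example.com/sitemap.xml", timeout=10).text
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
sitemap_urls = {
    loc.text.strip()
    for loc in ElementTree.fromstring(sitemap_xml).findall(".//sm:loc", ns)
}

# URLs discovered during a crawl (placeholder set).
crawled_urls = {"https://example.com/", "https://example.com/blog/"}

only_in_sitemap = sitemap_urls - crawled_urls  # potential orphan pages
only_in_crawl = crawled_urls - sitemap_urls    # pages missing from the sitemap
print(only_in_sitemap, only_in_crawl)
```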

Internal PageRank value

Although a crawler’s internal PageRank calculation can’t fully reflect Google’s link graph, it’s still an important feature. PageRank remains one of Google’s ranking factors.
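
For illustration, here’s a minimal sketch of how an internal PageRank can be computed from exported crawl data using the networkx library; the link list is made up:

```python
import networkx as nx

# Internal links exported from a crawl, as (source, target) pairs (placeholder data).
edges = [
    ("/", "/blog/"),
    ("/", "/products/"),
    ("/blog/", "/products/"),
    ("/products/", "/"),
]

graph = nx.DiGraph(edges)
internal_pagerank = nx.pagerank(graph, alpha=0.85)

for url, score in sorted(internal_pagerank.items(), key=lambda item: -item[1]):
    print(f"{url}: {score:.3f}")
```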

Mobile audit

With mobile-first indexing, it’s necessary to perform a content parity audit and compare the mobile and desktop versions of your website.

Additional SEO reports 

List of malformed URLs

Sometimes, websites use improper links, such as http://http://www.example.com. Users and search engine bots can’t visit those links. 
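
A simple check like the one below (Python, with made-up links) illustrates the kind of pattern a crawler looks for when flagging malformed URLs:

```python
import re

links = [
    "http://http://www.example.com",   # duplicated protocol
    "https://www.example.com/page",    # fine
    "http:/www.example.com",           # broken scheme separator
]

malformed = [
    link for link in links
    if re.match(r"^https?://https?://", link)   # doubled protocol
    or not re.match(r"^https?://[^/]", link)    # missing or broken "://"
]
print(malformed)
```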

List of URLs with parameters

URLs with parameters commonly create duplicate content, so it’s beneficial to analyze what kind of parameters a website is using.
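
If all you have is a raw URL export, a few lines of Python are enough to get an overview of which parameters appear and how often; the URL list is a placeholder:

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

urls = [
    "https://example.com/list?page=2&sort=price",
    "https://example.com/list?page=3",
    "https://example.com/item?id=42&utm_source=newsletter",
]

# Count how often each query parameter name occurs across the URL list.
param_counts = Counter(
    key for url in urls for key in parse_qs(urlparse(url).query)
)
print(param_counts)
```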

Redirect chains report

Nobody likes redirect chains. A crawler should find redirect chains for you so that you can fix them.
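
To see what a redirect chain report boils down to, here’s a small Python sketch that follows one URL and prints every hop; the URL is a placeholder:

```python
import requests

response = requests.get("http://example.com/old-page", allow_redirects=True, timeout=10)

# response.history holds every intermediate redirect; the final response comes last.
chain = [(r.status_code, r.url) for r in response.history]
chain.append((response.status_code, response.url))

for status, url in chain:
    print(status, url)

if len(response.history) > 1:
    print("Redirect chain detected – point the first URL straight at the final destination.")
```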

Website speed reports

Performance is increasingly important both for users and search engines. Crawlers should analyze your web performance.

List of URLs blocked by robots.txt

You should review the list of URLs blocked by robots.txt to make sure it adheres to your indexing strategy.

Schema.org detection

Properly implemented structured data markup can be crucial to your search visibility, so a crawler should be able to detect it.

Export, sharing 

Exporting to Excel/CSV

Being able to export the crawl data to various formats will save you plenty of time.

Creating custom reports/dashboards

When working with a client or a colleague, you may want to create a dashboard showcasing a particular set of issues. 

Exporting individual reports

Let’s say that you want to share a report which shows 404 URLs with your developers. Does the crawler support it?

Granting access to a crawl to another person

It’s pretty common that two or more people work on the same SEO audit. Thanks to report sharing, you can work simultaneously.

Miscellaneous 

Why and how to address the issues

If you are new to SEO, you will appreciate the explanation of the issues that many crawlers provide.

Custom extraction

A crawler should let you perform a custom extraction to enrich your crawl. For instance, while auditing an e-commerce website, you should be able to scrape information about product availability and price.
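
As a rough illustration of custom extraction, here’s a Python sketch that scrapes price and availability from a product page. The URL and CSS selectors are hypothetical and would need to match the site’s actual markup:

```python
import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com/product/12345", timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# Hypothetical selectors – a crawler's custom extraction works the same way,
# using the CSS selectors or XPath expressions you configure.
price = soup.select_one(".product-price")
availability = soup.select_one(".stock-status")

print(price.get_text(strip=True) if price else "price not found")
print(availability.get_text(strip=True) if availability else "availability not found")
```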

Can a crawler detect a unique part that is not a part of the template?

Some crawlers let you only analyze the unique parts of a page (omitting the navigation, footer, header etc.)

Integration with other tools

It’s useful if you can easily combine your crawl data with data from other sources, like Google Analytics, Google Search Console, backlinks tools (Ahrefs, Majestic SEO), or server logs.

JavaScript rendering

If your website depends heavily on JavaScript, you need a crawler that is able to render it.

Why you should use the particular crawler

I reached out to the crawlers’ representatives to hear why they think their tool is the best choice.

Desktop Crawlers

SCREAMING FROG

Pricing

£149.00 per year per license. The cost is reduced if you purchase multiple licenses. For instance, if you purchase 20+ licenses, the cost per license goes down to £119.00.

Screaming Frog is the most popular desktop crawler. It checks for virtually every necessary aspect of SEO: canonicals, status codes, titles, headers, etc. It’s a very customizable tool – there are tons of options you can configure. 

Screaming Frog is also up to date with the most recent trends. It allows for JavaScript crawling, and you can integrate the crawl data with Google Analytics and Google Search Console. 

There’s one aspect where Screaming Frog could use some improvement: data visualization. In this category, Sitebulb is simply superior. 

Notable Screaming Frog features: 

  1. Structured data validation.
  2. JavaScript crawling.
  3. Website structure visualizations.
  4. Full command-line interface to manage crawls.
  5. Reporting canonical chains.
  6. Near duplicate content detection.
  7. Information on link position – content/footer/sidebar.
  8. AMP crawling & validation.
  9. Scheduling. You can schedule crawls (daily/weekly/monthly) and set up auto exporting. It’s a big step forward, but I still miss the ability to easily compare the data between crawls.
  10. Web performance reports (Lighthouse + Chrome User Experience Report).
  11. Auto-saving & the ability to resume previously lost crawls.
  12. When you manage huge crawls, you can exclude storing particular elements (e.g., meta keywords) to save disk space.

Tip: when you do a crawl, don’t forget to enable post-crawl analysis, which will allow you to get the most out of the data.

Screaming Frog now offers visualization of links. You can choose one of two types of visualizations – crawl tree or directory tree. Both are valuable for SEO audits. The former can show you groups of pages and how they are connected. The latter can help you understand the structure of a website.

Checklist for Screaming Frog.

Sitebulb

Pricing

£25 + VAT per month per user. Every additional license costs £5 + VAT – a mere 20% of the price. Sitebulb also offers a Lite plan for £10 + VAT per month – this plan is ideal for freelancers or website owners.

By visiting https://sitebulb.com/onely you can get an exclusive offer, a 60-day free trial.

Sitebulb is a relatively new tool on the market, but it has been warmly received by the SEO community. Personally, I really like Sitebulb’s visualizations:

Sitebulb's data visualizations

Because Sitebulb is a desktop crawler, you can’t just share a report with your colleagues while doing an SEO audit. You can partially work around this by exporting a report to PDF. Once you click the “Export” button, you will get a 40-page document, full of charts, presenting the most important insights. You can also copy your crawls and work on them with your team across several instances.

The PDF reports are highly customizable. You can select the aspects of the crawl data that you want to highlight in a report that you export.

Sitebulb's feature of selecting which data to export

Crawl maps

Sitebulb’s crawl maps are a uniquely useful feature. These maps can help you understand your website’s structure, discover internal link flow, and spot groups of orphan pages.

A link graph generated by Sitebulb using the crawl data

Notable Sitebulb features:

  • Performance statistics like First Meaningful Paint (helpful for website speed optimization).
  • List mode (like in Screaming Frog).
  • Schema + Rich Results validation.
  • Code coverage report (unused CSS, JS – helpful for website speed optimization).
  • Multi-level filtering, like in Ryte, Botify, OnCrawl, and DeepCrawl.
  • AMP validation.
  • Integration with Google Sheets, Google Analytics and Google Search Console.
  • Link Explorer.
  • Crawling JavaScript websites (Sitebulb uses Chrome Evergreen).
  • Sitebulb is the only desktop crawler that has the crawl comparison feature.
  • Advanced content extractor.

Sitebulb’s main drawbacks:

  • Sitebulb doesn’t inform you about H2 tags.
  • As a Big Data fan, I’d like to be able to export all internal links to a CSV/Excel file. Screaming Frog offers that feature. However, Sitebulb’s summaries and visualizations are probably more than enough for most SEOs.
  • If Sitebulb encounters an error while retrieving a page, the page will not be recrawled.
  • I can do only one crawl at a time; other crawls are added to the queue.

I believe that in the case of Sitebulb, the pros outweigh the cons. By the way, you can suggest your own ideas directly to the Sitebulb team by submitting them through https://features.sitebulb.com/. It seems many interesting features, like crawl scheduling and data scraping, are going to be implemented in the near future. I’m keeping my fingers crossed for the project.

Checklist for Sitebulb.

WebSite Auditor

Pricing

WebSite Auditor is available for free and in two paid editions. The paid editions (Pro for $124/year and Enterprise for $299/year) not only offer convenient maintenance and report sharing features, but also allow you to crawl more than 500 URLs and store multiple projects in the cloud. The 500-URL limit of the free version makes it a good choice for freelancers and website owners.

If you use our referral links at WebSite Auditor Enterprise or WebSite Auditor Professional, you will get 10% off at checkout.

WebSite Auditor gives you information about status codes, click depth, incoming/outgoing links, redirects, 404 pages, word count, canonicals, and pages restricted from indexing. You can easily integrate the crawl data with Google Search Console and Google Analytics. As with Screaming Frog, for every URL you can see a list of inbound links (including their anchors and sources). Also, you can easily export this data in bulk.

A preview of WebSite Auditor's link reports

Website structure visualization

Just like Sitebulb, WebSite Auditor lets you visualize the internal structure of your website by click depth, internal PageRank, and pageviews (available through integration with Google Analytics).

WebSite Auditor's crawl maps

Sitebulb, FandangoSEO, and WebSite Auditor are the only crawlers on the market that offer this feature.

Content analysis

WebSite Auditor provides a module dedicated to basic content analysis. It checks if the targeted keywords are used in the title, body, and headers. In addition, this module calculates the TF-IDF score for a page.

WebSite Auditor's content analysis reports

WebSite Auditor’s unique function is the ability to look into Google’s index to find orphan pages.

WebSite Auditor's unique feature of looking up orphan pages

To do this, you have to tick the “Search for orphan pages” option while setting up a crawl.

Setting up Website Auditor's Search for orphan pages feature

WebSite Auditor’s main drawbacks:

  • You can’t limit the number of URLs to be crawled; however, you can specify a maximum crawl depth.
  • You can’t compare the data between different crawls.
  • Although WebSite Auditor supports advanced filtering for reports, it doesn’t support regular expressions.

Checklist for WebSite Auditor.

Netpeak Spider

Pricing

Netpeak Spider is available in three pricing options – a Standard plan for $19/month, a Pro plan for $39/month, and a Premium plan for $99/month. You get a 20% discount if you buy a license for a year. The Standard plan doesn’t offer the multi-domain crawling feature and doesn’t provide the extensive customer support that the Pro and Premium plans include.

Go to our affiliate link and use the promo code: ca480e7f to get a 10% discount for one year on purchasing Netpeak Spider and Netpeak Checker!  

Netpeak Spider was not analyzed in the initial release of the Ultimate Guide to SEO Crawlers, however, the list of improvements introduced in the recently released versions is quite impressive, so I just had to test it.

Speed improvements

First of all, according to Netpeak’s representatives, Netpeak Spider 3.0 consumes roughly four times less memory than version 2.1. I don’t have the statistics for the most recent version, though.

Netpeak Spider's improvement in RAM and disc space usage

Notable Netpeak Spider features:

  • Custom segmentation.
  • JavaScript rendering.
  • You can pause a crawl and resume it later or run it on another computer. For instance, if you see a crawl consumes too much RAM, you can pause it and move the files to a machine with a bigger capacity.
  • Integration with Google Analytics and Google Search Console.
  • You can rescan a list of URLs to check if the issues were fixed correctly.
  • A dashboard that summarizes the most important insights.
  • Netpeak Spider shows the list of the most popular URL segments.
  • Integration with Google Drive for better report sharing.

Custom Segmentation

My favorite feature of Netpeak Spider is data segmentation. To my knowledge, Netpeak Spider is the only desktop crawler that has implemented it.

With data segmentation, you can quickly define segments (clusters of pages) and see reports related to these segments only.

Netpeak Spider's data segmentation feature

Custom segmentation is definitely a great feature, however, I miss the ability to see a segment overview report like those offered by cloud crawlers like Botify, FandangoSEO, and OnCrawl. In the screenshot from FandangoSEO below, you can see the page type breakdown when viewing the dashboard, which provides a great overview of segments.  

FandangoSEO's feature of analyzing inlinks by page type segments

Netpeak Spider’s main drawbacks:

  • Although Netpeak introduced a visual dashboard (which is fine), it still lacks the data visualization features of some other tools.
  • NetPeak Spider works only on Windows. If you are a Mac or Linux user, you can’t use the tool.

Cloud Crawlers

Let’s move on to the cloud crawlers: DeepCrawl, OnCrawl, Ryte, and Botify.  

Disclaimer: at Onely, we primarily use DeepCrawl and Ryte. We did our best to remain unbiased. The crawlers are presented alphabetically.

Botify

Pricing

Botify doesn’t disclose their pricing on their website. Three plans are mentioned and it’s stated that pricing is flexible and adjusted according to your needs.

The main page of Botify

Botify is an enterprise-level crawler. Its client list is impressive: Airbnb, Zalando, Gumtree, Dailymotion.

Botify offers many interesting features. I think it’s the most comprehensive, but also the most expensive, of all the crawlers listed.

I noticed one disadvantage of Botify – it doesn’t offer a list of SEO issues on a single dashboard. In contrast, Ryte, Sitebulb, or DeepCrawl show you all the detected SEO issues listed on one dashboard. For instance, this is how Ryte does it:

Ryte's dashboard listing all SEO issues that need addressing

I suspect that Botify’s developers will introduce this feature shortly.

Botify has the ability to filter reports and dashboards by segments:

Botify's feature of filtering a report by segments

Let’s imagine you have three sections on your website: /blog, /products, and /news. Using Botify, you can easily filter reports to see the data related only to product pages.

Various reports by Botify which split the URLs into segments

There is another useful feature on Botify that other crawlers simply miss. For every filter, you can see a dedicated chart (there are 35 charts in the library across several categories). This is pretty impressive.

Also, you can install the Botify addon for Chrome and see insights directly from the browser. Just navigate to a particular subpage of a crawled website and you will see:

  • Basic crawl stats,
  • A sample of internal links,
  • URLs with duplicated metadata (description, H1 tags),
  • URLs with duplicated content.

Botify stores HTML code for every crawled page. It allows for checking content changes across crawls.

Botify allows for server log analysis and JavaScript crawling; however, like in the case of OnCrawl, it’s not included in the basic subscription plan.

Botify offers a helpful knowledge base and webinars showing how to use their features.

Checklist for Botify.

DeepCrawl

Pricing

DeepCrawl doesn’t have a fixed price plan – you should get in touch with their sales team to start using the crawler.

DeepCrawl is a popular, cloud-based crawler. At Onely, we use it on a regular basis (along with Ryte and Screaming Frog).

We really like DeepCrawl, but one of its biggest drawbacks is that you can’t add additional columns to a report. Let’s say I am viewing a report dedicated to status codes and I would like to see some additional data: canonical tags. I simply can’t do it in DeepCrawl. If I want to see the canonicals, I have to switch to the canonical report. For me, that’s an important missing feature. However, I am pretty sure they will catch up shortly, and I do believe that in the case of DeepCrawl, the pros outweigh the cons.

Notable DeepCrawl features:

  • JavaScript rendering.
  • Logfile integration.
  • Integration with Majestic SEO.
  • Integration with Zapier.
  • Stealth mode (the user agent and the IP address are randomized within a crawl; helpful for crawling websites with restrictive crawling policies).
  • Integration with Google Search Console and Google Analytics.
  • Crawl scheduling.

Checklist for DeepCrawl.

OnCrawl

Pricing

OnCrawl offers a free 14-day trial that lets you see if it’s the right crawler for your needs. 

The paid plans include Explorer at €49/month, Enterprise at €199/month, and Ultimate at €399/month. The plans differ in the number of domains you can monitor, the number of URLs you can crawl per month, and the number of simultaneously running crawls. There’s also the Infinite&Beyond custom plan for large agencies and enterprise clients.

A lot of SEOs appreciate OnCrawl because of its unique near-duplicate detection feature – you can filter a list of URLs by similarity ratio.

Another great feature of OnCrawl that most other crawlers miss is that you can integrate OnCrawl’s crawl data with any other CSV data. Just upload a CSV file with any data you want and make sure that your CSV contains the common field: “URL”.
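
Under the hood, this is just a join on a shared “URL” column. If you wanted to replicate it offline, a pandas sketch would look like this; the file names are placeholders:

```python
import pandas as pd

crawl = pd.read_csv("crawl_export.csv")        # must contain a "URL" column
external = pd.read_csv("conversion_data.csv")  # must contain a "URL" column

# Enrich the crawl data with the external dataset, keyed on the URL.
enriched = crawl.merge(external, on="URL", how="left")
enriched.to_csv("enriched_crawl.csv", index=False)
```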

Note: Botify offers a similar feature for some of their clients. FandangoSEO recently added this feature as well. I like OnCrawl for its URL segmentation. Let’s say you view a list of non-indexed URLs. Then, you can quickly switch URL segmentation to see only the blog or product pages.  

The recent version of OnCrawl brings some hreflang improvements:

OnCrawl's hreflang improvements

OnCrawl gives you interesting reports regarding your page groups:

It also provides an overview of the link flow between page groups:

OnCrawl integrates with Google Analytics and Google Search Console. As with every cloud-based crawler, it allows for crawl scheduling. OnCrawl provides some pre-defined SEO reports, but its power is in its flexibility. You can create your own dashboards. Go to Tools -> Dashboard builder and click on the category you are interested in. As of 2nd May, there are 24 categories to choose from. Examples are Status codes, Indexability, Inlinks, Orphan pages, etc.

You can easily add or remove charts on a custom dashboard. OnCrawl provides a library of charts to choose from, and you can drag and drop them onto a specific custom dashboard.

If you ask me about OnCrawl’s drawbacks, it lacks the ability to filter crawled URLs by regular expressions. Also, OnCrawl doesn’t provide the list of detected SEO issues by default. You can work around this by going to Dashboard builder -> Onsite Issues.

While using OnCrawl, I had UX issues with finding particular reports/dashboards, but they are there. OnCrawl is quite a powerful crawler, but it can be difficult to digest. Note that OnCrawl’s price also depends on whether you want to use the log file analysis feature.

OnCrawl has created a unique coupon for Onely readers: “Onely-OnCrawlTR2019”. This coupon will give users a 15% discount on any subscription and is valid until December 31, 2019.

Checklist for OnCrawl.

Ryte

Main competitors: DeepCrawl, OnCrawl, Botify, Audisto, JetOctopus, FandangoSEO, ContentKing

Ryte is another popular web-based crawler. We use it during our everyday routine (along with DeepCrawl and Screaming Frog).

Good to see that we are listed on their partners’ list! I really like the reports generated by Ryte. On a single dashboard, I can see a list of all the detected SEO issues. Then I can click through to the detailed view and decide if it’s a real issue or if Ryte just wants to draw my attention to something. Of course, this report can’t replace human judgment, but it’s great to have such a feature available. As with its main competitors, Ryte integrates with Google Search Console and Google Analytics.

Ryte’s unique function is uptime monitoring (it pings your server from time to time to make sure it’s working properly). Another interesting function is robots.txt monitoring: Ryte detects when you change robots.txt and lets you review the history.

What is more, Ryte has a convenient credits policy – if you want to re-run an active crawl, they will not charge you for it.

OK, let’s move to the drawbacks. I commonly deal with big crawls (500K+/1M+ URLs), and sometimes I need to export particular reports to CSV. Until recently, CSV export was limited to 30K rows. Fortunately, they recently expanded it, and it’s now possible to export 100K rows. And if you use their API, the sky is the limit. To get you onboarded, Ryte provides webinars.

Update: Ryte now supports JavaScript crawling. Also, there is an add-on (BotLogs) allowing you to analyze server logs.

Ryte offers different pricing plans depending on your needs:

Ryte's pricing plans

Checklist for Ryte.

Audisto

Audisto is a crawler popular mainly in German-speaking countries.

Audisto's crawl report

Using Audisto, you can split lists of hints by category, like Quality, Canonical, Hreflang, or Ranking.

I really like Audisto’s segmentation. You can create URL clusters based on filters and see reports and charts related only to those clusters.

Many crawlers offering this feature require you to know regular expressions. Audisto is a bit different: you can define patterns in the same way you define “traditional” filters. Additionally, you can even add comments when adding a cluster, which may be helpful for future reviews or when many people work on the same crawl.

However, you can’t apply segment filtering to all reports. For instance, you can’t do it for the Duplicate Content or Hreflang reports. With Audisto, you can easily compare two different crawls.

Bot vs User Experience

Audisto has a nice approach to comparing the bot and user experience. It detects whether users get a similar experience to Googlebot and even provides a chart to visualize the comparison.

Monitoring Issues

For every issue listed in the Hint section (Current monitoring -> Onpage -> Hints) you can see the trendline, which is helpful for tracking SEO issues:

Recently, Audisto improved their PageRank and CheiRank calculation.

You can now see how much PageRank is distributed to pages with different statuses (200 vs 301 vs 404 and more).

Audisto's PageRank report

Now, it’s time to point out some disadvantages of Audisto:

  • You can’t add additional columns to a report (however, reports contain a lot of KPIs and this should be improved in the next iteration of their software).
  • The URL filtering is rather basic. However, you can partially work around this by using custom segmentation.
  • Audisto doesn’t offer custom extraction.
  • It doesn’t integrate with Google Analytics, Google Search Console, or server logs. But, of course, you can do custom analyses using their API.

Pricing

A package for 5M URLs costs 320 EUR (~364 USD) per month; 1 million URLs per month costs 150 EUR.

JetOctopus

JetOctopus is a relatively new tool on the market of cloud crawlers. It divides issues into six categories:

  1. Indexation
  2. Technical
  3. HTML
  4. Content
  5. Links
  6. Sitemap

It offers nice visualizations. Below are some screenshots from the tool.

Custom Segmentation

JetOctopus makes it very easy to define a new segment. You just set the proper filter and click “Save segment” – you don’t need to be familiar with regular expressions.

Then, you can filter reports to predefined segments.

For now, JetOctopus doesn’t offer server log analysis, but they are in the process of building a dashboard for it.

Linking Explorer – Discover Anchors and Source of Links

I like their linking explorer (a feature added very recently). I can easily see the most popular anchors of links pointing to a page or group of pages.

Also, it shows the most popular directories linking to a page.

Here’s where page segments come in handy. You can quickly switch segments to see only the stats related to links coming from particular segments (e.g., from blog or product pages).

Now, some of JetOctopus’s drawbacks:  

  • No custom extraction.
  • No JavaScript crawling.

For now, JetOctopus offers backend server log analysis and Google Search Console integration; however, they are in the process of building a dashboard for these features.

Please remember, it’s a relatively new tool on the market, and cheaper than other cloud tools. I hope they will continue to improve. You can register for a trial and crawl up to 10K URLs, with an unlimited number of projects. If you have a few small websites, you can go for the basic package (up to 100K URLs, an unlimited number of projects). It costs 20 euros (~23 USD) per month.

JetOctopus's pricing plans

Using the “Onely” promo code, you can get a 10% discount on JetOctopus.

FandangoSEO

FandangoSEO is a Spanish crawler, and the name comes from the lively Spanish dance.

Like many other cloud tools, FandangoSEO offers good visualizations. Some screenshots are presented below:

 

Integration with Server Logs at no Cost

These days, server log analysis has become an integral part of many SEO analyses. FandangoSEO integrates with server logs (and like DeepCrawl, you don’t need to pay extra for it). You can upload logs once or periodically (using their interface or FTP).

Defining Custom Segments

Similarly to Botify, OnCrawl, and JetOctopus, FandangoSEO lets you define custom segments.

Because of this, you can see some reports related to segments.

FandangoSEO requires you to know regular expressions to define new segments. If you want to learn regular expressions, you can read my article on the subject.
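
To give you a feel for it, here’s a small Python sketch of regex-based segments similar to what FandangoSEO expects; the patterns are examples for a typical blog/shop URL structure:

```python
import re

# Example segment definitions – adjust the patterns to your own URL structure.
segments = {
    "blog": re.compile(r"^/blog/"),
    "products": re.compile(r"^/products/\d+"),
    "categories": re.compile(r"^/category/[^/]+/?$"),
}

def classify(path: str) -> str:
    for name, pattern in segments.items():
        if pattern.search(path):
            return name
    return "other"

print(classify("/blog/seo-crawlers"))  # blog
print(classify("/products/12345"))     # products
```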

FandangoSEO Detects Schema.org

FandangoSEO is one of few crawlers that detects Schema.org, so you can easily see URLs with Schema.org implemented.

Crawling Competitor’s Websites

You can compare data between various projects with this software, which makes it possible to crawl your competitor’s website.

Architecture Maps

Similarly to Sitebulb and Website Auditor, you can see the architecture map with FandangoSEO.

Integrate Crawls with any Data

When I initially published this article, I wrote that OnCrawl was the only crawler that is able to enrich your crawls with any data (by importing a CSV file with a common field: URL). And voila! In June, Fandango introduced a similar feature.

I’m glad to see crawlers are improving. Good job, FandangoSEO! It’s time to point out some disadvantages of FandangoSEO:

  • One of the biggest is that reports can’t be filtered.
  • Additional columns can’t be added to a report.
    • For instance, when viewing a report related to canonicals, you can’t see the number of internal links pointing to a canonicalized page.
    • If there are thousands of canonicalized pages, all you can do is export reports to Excel and do the filtering there.
  • It doesn’t integrate with Google Analytics or Google Search Console.

FandangoSEO’s pricing starts at 59 USD per month (150k crawled pages, 10 projects); the Medium package (600k crawled URLs) costs 177 USD per month.

ContentKing

Real-time monitoring, change tracking and alerting

ContentKing is a unique crawler on the market since it is a real-time monitoring tool: it informs you in detail about things like on-page SEO changes, robots.txt changes, indexability issues, and pages that started to redirect, 404, or return server errors.

Alerts are sent out if there are big changes or serious issues. According to ContentKing’s representatives, the internal algorithm takes into account the impact of the changes/issues and the importance of the pages involved and then decides whether or not to send out alerts. That sounds interesting, but I need more time to test it thoroughly.

ContentKing also checks for OpenGraph, TwitterCards, and the presence of tag managers and analytics software such as Google Analytics, Adobe Analytics, and Mouseflow.

Below, you can see some screencasts made by ContentKing.

ContentKing's change tracking screencast

ContentKing's events screencast

How does it work? By investigating the Elephate.com/blog I was able to see that:

  • On Monday, July 16th, we switched to a CDN.
  • On July 4th, we had some pages broken and received an alert (this has since been resolved).
  • We publish a lot of content.

 

What’s most important, I can compare any crawled page between any dates. That’s really impressive and unique on the market.

Now it’s time to point out ContentKing’s disadvantages:

  • Their filtering needs improvement. I need to be able to combine rules when filtering: “URL starts with X, contains Y, but doesn’t contain Z.”  However, ContentKing is already implementing such a feature and it should be ready in September.
  • When viewing a list of issues, I can’t add additional columns (that feature is available only when viewing a full list of crawled pages). So, for instance, while viewing pages with an incorrect page title length, I can’t see information about the title length. (planned for October).
  • Lack of custom extraction (planned for Q4).
  • ContentKing doesn’t execute JavaScript.

Update: since ContentKing first appeared in our test, it added a couple of cool features:

  • Advanced filter operators
  • Slack integration
  • More advanced alerting.

 

 

A package for end users for 50k pages costs 64 USD per month. There are also packages for SEO agencies and enterprises; for agencies, a package for 1 million pages costs 355 USD per month. ContentKing doesn’t charge for recrawls, and it charges only for 2xx pages (not for redirects, pages not found, server errors, or timeouts).

You can use our affiliate link by clicking here.


Cloud-based tools at no additional cost?

Do you use SEMrush for competition analysis? Did you know that this tool offers a crawler?

What about Ahrefs? If you use it, you can use their crawler at no additional cost.

Do you recognize Moz Pro?

If you subscribe to it, you can crawl your website for free.

If you use Searchmetrics, you have a crawler for free!

Truth be told, these tools are not as advanced as dedicated cloud-based crawlers like DeepCrawl, Ryte, OnCrawl, or Botify, but if you need to do a basic SEO audit, they should be fine – especially since you don’t have to pay anything extra.

Ahrefs

Main competitors: Moz, SEMrush

Ahrefs’ crawler (Site Audit) is an integral part of the Ahrefs suite, a popular toolset for SEOs. It’s a similar situation to Moz – if you subscribe to Ahrefs (which offers tools like Site Explorer, Content Explorer, Keywords Explorer, and Rank Tracker), you get their crawler for free.

To let you stay focused, Ahrefs lets you easily filter issues by importance (Errors, Warnings, Notices).

For every issue, you can see if it’s new or occurred in the previous crawl too.

Ahrefs’ advantage over other crawlers in its segment (Ahrefs/Moz/SEMrush) is that you can add additional columns to an existing report. Also, in Ahrefs you can see which URLs appear in a sitemap and which do not. It does have some limitations, though. It doesn’t integrate with GSC and GA. Similarly to Moz and SEMrush, you can’t share the crawl results with your colleagues, so only one person can work on the crawl at a time.

Lifehack: you can get around this limitation. If you use a single Ahrefs account within your agency, you can work concurrently on a crawl. The risk that you will be logged out is minimal.

If you’re new to SEO, you will find the explanations of issues provided by Ahrefs helpful.

Update: Ahrefs recently released a new feature in their crawler. It’s called “Site Structure” and shows the distribution of HTTP codes, depths, content types, etc. across the website’s subfolders and subdomains.

Ahrefs' Site Structure report

Depending on the Ahrefs plan you have, you can crawl between 10,000 and 2.5 million URLs.

Checklist for Ahrefs. 

Moz

Main competitors: Ahrefs, SEMrush

Let’s start with the Moz crawler. It’s an integral part of Moz Pro (Keyword Explorer, Rank Tracker, Crawler, Open Site Explorer).

I really had to consider how to introduce this crawler. From one point of view, it lacks many functions and features that other crawlers support. But from another, it’s a part of Moz Pro. So if you subscribe to Moz Pro, then you have the crawler for free. In addition to that, Moz crawler provides a few unique features like marking an issue as fixed.

Moz crawler integrates with Google Analytics, but it lacks integration with Google Search Console. However, I need to defend Moz a little bit as its main competitors SEMRush and Ahrefs don’t offer this integration either.

I do, however, appreciate that Moz provides a decent explanation for these issues (written by Moz specialists).

It’s useful that the Moz crawler is integrated with other Moz tools and that you can see parameters like Page Authority and Domain Authority directly from the crawl.  

Other interesting features offered by the Moz crawler are the “Mark as fixed” and “Ignore” features. I think the Moz documentation explains it pretty well (emphasis mine):

The tool is designed to flag all these issues so you can decide whether there’s an opportunity to improve your content. Sometimes you just know that you’ve fixed an issue, or you’ve checked that you’re happy with that page and it’s not something you’re going to fix. You can mark these issues as Fixed or Ignore them from your future crawls.

Unfortunately, there are no reports related to hreflang tags and the URL filtering is rather basic. If you want to perform some analysis related to orphan pages, it’s very limited – you can’t see the list of pages with less than x links incoming. Also, you can’t see which URLs are found in sitemaps but were not crawled.

The very good news is that, since August 2018, Moz offers an On-Demand Crawl, so you can crawl projects outside of your Moz Pro campaigns.

My opinion: Moz crawler may be enough for basic SEO reports, however, I wouldn’t use it for advanced SEO audits. Its main competitors, Ahrefs and SEMRush, are much more advanced.

Checklist for Moz.

SEMrush

SEMrush is a well-known tool for competitor research. Did you know they offer an SEO crawler? If you subscribe to SEMrush, then you have SEMrush crawler at no cost!

SEMRush is quite good at spotting basic SEO issues. When you go to the Issues tab, you will see all the detected SEO issues listed on a single dashboard. SEMRush divides issues by importance (Errors/Warnings/Notices) and for every issue, you can see the trend so that you can immediately spot if an issue is new. Like Moz crawler, SEMRush integrates with Google Analytics. The main drawback of SEMrush is poor filtering. That’s an area where SEMRush simply must catch up.

Let’s say you want to see no-indexed pages. You go to Site Audit -> Issues -> Blocked from crawling. Unfortunately, this report shows you not only no-indexed pages but also pages disallowed by robots.txt, and you can’t filter the results.

I really miss the ability to add a column with additional data.  

If you need to create a basic SEO audit for a small website, SEMrush would be fine, but you can’t use it for large websites. The SEMrush crawler only allows for crawling up to 20k URLs per crawl.

Checklist for SEMrush.

As I mentioned before, you have access to a free crawler if you have an active account with Searchmetrics, Ahrefs, Moz, or SEMrush. Check if these tools are enough for your SEO audits. If they are, you can use them and save a lot of money.

I have noticed an emerging trend that many SEO tools are adding an SEO crawler feature to their toolkit. For instance, Clusteric, primarily made for link auditing and competitor analysis, now offers an SEO crawler feature.

Which Crawlers Support JavaScript?

Nowadays, an increasing number of websites use JavaScript. Crawlers try to adapt so they have started supporting JavaScript. The obvious question is: which of the crawlers support JavaScript crawling?

Crawler – Support for JavaScript crawling

  • DeepCrawl – Yes (included in Corporate plans; for the smaller Starter and Consultant packages, price upon request)
  • Screaming Frog – Yes
  • Sitebulb – Yes
  • Ryte – Yes (available in the Business Suite)
  • Moz – No
  • Ahrefs – Yes (for Advanced and Agency plans)
  • Botify – Yes (not included in basic plans)
  • OnCrawl – Yes (it costs 10x more credits)
  • Searchmetrics – Yes (it costs 2x more credits)
  • Website Auditor – Yes
  • Netpeak Spider – Yes

So, Which Crawler is the Best?

I am glad you have survived up to this point!

Before I answer the question, let’s start with a short analogy. Imagine you want to buy a new car. You may ask: which car is the best? I hate to disappoint you. There is no best car in the world. Except, obviously, for the 1967 Ford Mustang.

  • Do you like to feel the wind in your hair? Then buy a cabrio.
  • Do you want to buy a car that your wife will love? Then buy a red one.
  • Do you have a family? Then buy a station wagon.

How much money do you have? Which car company do you trust?

All difficult choices and everything depends on your preferences.

The same thing with crawlers. There is no single best crawler. Everything depends on your needs, expectations, and budget.

My job was to introduce you to the most popular crawlers, and list the features that might be helpful for you.

The Perfect Crawler

  • Can store the crawl data forever (to let you review a crawl after half a year)
  • Has a reasonable price (Screaming Frog, Sitebulb, WebSite Auditor, Netpeak Spider)
  • Should allow for crawling websites that have more than 1 million URLs, if you need it (many SEO agencies deal with websites of this size)
  • Has website structure visualization (Sitebulb, WebSite Auditor, OnCrawl, FandangoSEO)
  • Provides integration with any data (OnCrawl)
  • Can integrate with server logs, Google Analytics, and Google Search Console
  • Can easily share the crawl with your clients and colleagues (cloud crawlers are typically much better with this)
  • Can show you a list of near-duplicates
  • Groups your pages by categories (Botify, OnCrawl, Jet Octopus, Audisto, FandangoSEO)
  • Can crawl JavaScript websites
  • Allows for exporting data to CSV/Excel even if there are millions of rows to export
  • Provides a list of all detected issues on a single dashboard (Ryte)
  • Can let you see which URLs are orphans (not found in the crawl, but present in sitemaps)
  • Can let you compare two crawls to see if things are going in the right direction
  • Can let you easily add columns with additional data to existing reports
  • Is the one that satisfies your needs!

Disclaimer: Things I was not Able to Test

Although I did my best, I was not able to test everything. Some examples include:

  • Does a crawler have stability problems? Does it crash all the time, not letting you finish a crawl? Maybe. Even if I noticed something like that, I couldn’t be sure whether it was a constant problem or just temporary, so I did not mention it.
  • Are the reports provided by a crawler enough for most use cases? Are reports thorough and in-depth?

Now it’s your turn!

My job is over. Now, it’s up to you!

Choose a crawler, see some screenshots on the internet, call for a trial, and test it. Investigate if it fits your needs. Push it to its limits, integrate it with any data you have, and test it. Caution: it’s common that some advanced features are only available with Pro subscriptions. Before purchasing, make sure the plan you’re buying offers you all the features you need.

Have fun and good luck! If you feel I helped you, leave me a note on Twitter.

Preparing content like this consumes a lot of time. If you feel the article was helpful, please use one of these referral links:

Updates, Updates Everywhere!

I will do my best to keep this article up to date. However, all of the SEO crawlers are constantly improving.

If you’re a crawler representative and you have updated your crawler, let me know which cell of the fact table I should change to reflect the new features, and I will be happy to update it. Send me a short screencast as proof.

I would like to say thank you to all the crawler representatives that helped me with creating this article.
