Summary in a nutshell
What is an Orphan Page?
Orphan pages are webpages that aren’t internally linked from other pages or sections on the same website, so there are no paths leading to them. In other words, without a direct link, visitors and crawlers won’t find them by navigating the website. Orphan pages can still be reached through other sources, such as links from other websites, but search engines rarely index them.
Are they bad for SEO?
Such pages are hard for search engine crawlers and users to discover, which hinders indexing and ranking and can waste the effort spent on creating them. They also eat into the site’s crawl budget if low-quality orphan pages are crawled instead of more valuable pages.
Common causes of orphan pages:
They arise due to site migrations, unoptimized site architecture, a CMS creating unnoticed URLs, outdated pages, or pages deliberately left unlinked for specific purposes like promotional campaigns.
How to Find Them?
Use SEO crawlers, sitemaps, link databases, web analytics services, search analytics, and server log files to identify orphan pages.
How to Fix Them?
Depending on the page’s purpose, link to it from other internal pages, redirect it, remove it, or leave it as-is.
How do orphan pages impact SEO?
Orphan pages have no internal links pointing to them, making them problematic for search engine crawlers and users. They won’t be found by browsing your website; if they are found in other ways, it may be difficult to understand how they relate to the rest of your domain. This is particularly true for search engines.
How do orphan pages affect search engines?
The lack of internal links negatively influences how search engine crawlers discover content on your website.
Search engines find new pages either by:
- Following internal or external links to URLs on your website, or
- Examining your XML sitemap files.
Orphan pages might be included in the sitemap or have links from other domains. They will still be considered orphan pages, but their chances of getting crawled and indexed increase — which isn’t necessarily good.
While Google can index a URL found in a sitemap without any inbound links, it will struggle to place such a URL in the site’s hierarchy and may not view it as valuable enough due to the lack of links. Whether orphan pages in sitemaps are indexed depends on many factors, such as the website size (with larger websites, Google typically leaves many pages uncrawled and unindexed, and orphan pages are likely given very little priority).
As a result, orphan pages usually don’t get indexed and don’t rank on Google, driving no organic traffic to your website.
If an orphan page gets indexed due to other factors, the complications don’t end there. Without internal links, PageRank won’t be able to flow to the orphan page. This means that any link authority that other pages within the domain gain from having high-quality, relevant backlinks won’t get transferred to the orphan pages.
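To make the PageRank point concrete, here is a minimal sketch of the textbook PageRank iteration on a made-up four-page site. It is a simplified model, not how Google actually computes rankings; the page paths, the damping factor of 0.85, and the assumption that every page has at least one outgoing link are all illustrative.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to. This sketch
    assumes every page has at least one outgoing link and that every link
    target is itself a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets the same small baseline share in each round.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        # The rest of a page's score is split among the pages it links to.
        for page, targets in links.items():
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical site: nothing links to /orphan, although it links out.
site = {
    "/": ["/blog", "/products"],
    "/blog": ["/", "/products"],
    "/products": ["/"],
    "/orphan": ["/"],
}

scores = pagerank(site)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page:10s} {score:.4f}")
```

In this model, the orphan page never receives more than the baseline share, no matter how much score the linked pages accumulate.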
Also, with no internal links, search engines have no semantic or structural context for evaluating the page. Search engines can struggle to determine for which queries the page is relevant without knowing where it fits in your overall site structure.
If you have low-quality orphan pages and their crawling isn’t restricted in robots.txt files or their indexing isn’t blocked via a noindex tag, search engines can waste crawl budget on crawling them. This is especially detrimental if you have a large website that may suffer from crawl budget issues.
Need to optimize your crawl budget?
Contact us for crawl budget optimization services.
In rare instances, the low-quality orphan pages can also lead to index bloat, which occurs when a search engine indexes pages on a domain in an uncontrolled way, indexing any content it can find, including thin or duplicate content.
To make matters worse, if you leave a page indexable that search engines consider not valuable enough to index, it can keep them from indexing other pages on your site. That’s because such low-quality pages may negatively influence how search engines perceive the overall quality of your website.
Remember that if a page has even one internal link, it is no longer considered an orphan page. But, if a page only has one link and it’s essential for your website, consider building more links to strengthen its position within the site hierarchy. This way, you can also prevent the page from accidentally being orphaned if the only link gets removed.
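If you have your internal links as data (for example, from a crawl export), you can count how many pages link to each URL and flag both orphans and pages that depend on a single link. The sketch below assumes a simple dictionary of page-to-links; the site structure and the function name are hypothetical.

```python
from collections import Counter


def inbound_link_counts(links):
    """Count how many distinct pages link to each page.

    `links` maps a page to the list of pages it links to.
    """
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):   # count each linking page only once
            if target != source:      # a page linking to itself doesn't count
                counts[target] += 1
    return counts


# Hypothetical internal link data.
site = {
    "/": ["/blog", "/pricing"],
    "/blog": ["/", "/pricing", "/faq"],
    "/pricing": ["/"],
    "/faq": ["/"],
    "/guide": ["/"],
}

counts = inbound_link_counts(site)
orphans = sorted(page for page, n in counts.items() if n == 0)
single_link = sorted(page for page, n in counts.items() if n == 1)

print("No internal links:", orphans)
print("Only one internal link:", single_link)
```

Pages in the second list aren't orphans yet, but they would become orphans if their only link were removed.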
How do orphan pages affect users?
Orphan pages are also problematic for users.
If your orphan pages contain high-quality content that ought to drive significant traffic to your website and result in conversions, users will have difficulty finding them if they aren’t included in your site’s structure. This also leads to a waste of time and resources dedicated to creating the content on such pages.
The situation is different if you deliberately left pages unlinked but users can still reach them. Visitors landing on these pages may come across outdated or irrelevant content, which leads to a poor user experience.
Types of orphan pages
Common causes of orphan pages include:
- A site migration — such as when some of the old pages aren’t included in the new main navigation and aren’t redirected to the new target page,
- Unoptimized site architecture, where some pages go unlinked because there is no site architecture strategy. There could also be mechanisms on the site that don’t automatically include the new types of pages in the navigation,
- A CMS creating additional URLs that you are unaware of,
- Pages becoming outdated or irrelevant, where links to them are removed but the pages remain published — it could occur with out-of-stock products,
- Adding no links to certain pages on purpose – for example, landing pages for promotional or paid campaigns.
Many of these occur because of a lack of coherent, universal processes for conducting site migrations, moving sites from a staging environment to production, making significant changes to the site, etc.
If you’re struggling to conduct a site migration, consider accessing our website migration services.
Because there can be so many different reasons for the existence of orphan pages, addressing them isn’t only about adding links to these pages.
Not all pages should have links pointing to them. Adding links means you actively want search engines and users to view these pages.
Keeping them out of your site structure is one of the signals indicating to search engines that they aren’t valuable to you. This, combined with other aspects, such as restricting their crawling in robots.txt or making them unindexable with a noindex tag, will keep them out of Google’s index.
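To verify that pages you left unlinked on purpose are really kept away from crawlers and the index, you can test them against your rules. The sketch below uses only Python's standard library. The robots.txt rules, URLs, and HTML are made-up examples, and the check only looks at the robots meta tag in the HTML, not at directives sent in HTTP headers.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class _RobotsMetaFinder(HTMLParser):
    """Records whether a <meta name="robots"> tag contains 'noindex'."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            if "noindex" in (attributes.get("content") or "").lower():
                self.noindex = True


def has_noindex(html):
    """Return True if the HTML carries a robots meta tag with noindex."""
    finder = _RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex


# Hypothetical rules and page content.
ROBOTS_TXT = "User-agent: *\nDisallow: /campaigns/\n"
PAGE_HTML = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    "<body>Spring sale</body></html>"
)

print(is_crawl_allowed(ROBOTS_TXT, "https://example.com/campaigns/spring-sale"))
print(is_crawl_allowed(ROBOTS_TXT, "https://example.com/blog/orphan-pages"))
print(has_noindex(PAGE_HTML))
```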
How to find orphan pages
The first step before fixing anything is finding your orphan pages. Usually, an SEO crawler is an excellent way to find all pages on your website, but in this case a crawler alone likely won’t be enough. That’s the core problem with orphan pages: crawlers can’t find them by following links on your site.
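The following sketch shows why a link-following crawl misses orphan pages. It simulates a crawler as a breadth-first traversal over an in-memory link map instead of fetching real pages; the site structure and page paths are hypothetical.

```python
from collections import deque


def crawl_from(start, links):
    """Return every page reachable from `start` by following links.

    `links` maps a page to the list of pages it links to.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: /landing/spring-sale exists but nothing links to it.
site = {
    "/": ["/blog", "/pricing"],
    "/blog": ["/", "/blog/orphan-pages"],
    "/blog/orphan-pages": ["/blog"],
    "/pricing": ["/"],
    "/landing/spring-sale": ["/pricing"],
}

reachable = crawl_from("/", site)
missed = sorted(set(site) - reachable)
print("Pages the crawl never reaches:", missed)
```

The orphan page links out to the rest of the site, but since no page links to it, the traversal never arrives there.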
The data sources you can use to find orphan pages on your site are:
- Your sitemaps or other lists of URLs you may have.
- Link databases (like Ahrefs.com) that find links to your pages on other websites.
- Web analytics services, such as Google Analytics.
- Search analytics like Google Search Console.
- Your server log files.
Some tools combine these data sources. For instance, Ahrefs’ Site Audit shows you a section in Page Explorer with orphan pages found through backlinks and sitemaps. The limitation is that Ahrefs won’t show orphan pages that aren’t in the sitemaps or have no backlinks.
Similarly, you can find orphan pages using several data sources with SEMrush’s Site Audit. It gives you two options:
- View pages found in your sitemaps without any internal links.
- View pages with recent hits in Google Analytics that have no internal links.
Screaming Frog has a neat guide on discovering orphan pages using its SEO Spider. Their process revolves around analyzing your XML sitemaps for crawlable pages and using the integrations with Google Analytics and Google Search Console to supply the data for the crawl.
You will be able to view orphan URLs for each of the three data sources – sitemaps, Google Analytics, and Google Search Console. You can then use the Orphan Pages report to export a list of all found orphan pages.
You can also look at Sitebulb, which, similarly, offers an option to connect multiple data sources, including Google Analytics and Google Search Console – check out Sitebulb’s guide to finding orphan pages.
To access more comprehensive data about your site, you need to dig deeper into its structure. The most common solution would be to cross-reference datasets on your own.
Get a list of crawlable pages
You can retrieve a list of pages from your XML sitemap file, since it should contain only your crawlable and indexable URLs. Still, the best approach is to use a crawler, because it shows which pages are actually reachable through your internal links.
Whichever crawler you use, set it to crawl only indexable pages and to skip any that aren’t.
Remember to crawl only the canonical URLs, including the correct protocol (HTTP or HTTPS) and subdomain (www or non-www).
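If you want to pull the URL list out of a sitemap yourself, a few lines of Python are enough. This is a minimal sketch that assumes a standard `urlset` sitemap using the sitemaps.org namespace; it doesn't handle sitemap index files, and the sample XML is made up.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the <loc> values of all <url> entries in a sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap content.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/</loc></url>
  <url><loc>https://example.com/landing/spring-sale</loc></url>
</urlset>
"""

sitemap_urls = urls_from_sitemap(SAMPLE_SITEMAP)
print(sitemap_urls)
```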
Discover which pages are getting accessed
Once you have a list of your crawlable pages, you need to find pages that get visited by users or crawlers.
Get data from Google Analytics
Google Analytics can help you find pages that users or crawlers access by following external links (including social media) or directly typing in the address.
In Universal Analytics, navigate to Behavior > Site Content > All Pages. In Google Analytics 4, the comparable report is under Reports > Engagement > Pages and screens.
You will then view all URLs that have been visited before. Adjust the dates to go as far back as possible. Then, export the received list.
Get data from Google Search Console
You can also find useful data in Google Search Console, and it’s good to combine it with the data found in Google Analytics. Google Search Console may contain data about URLs that Google’s crawler accessed by means other than your internal links.
In GSC, select Performance > Pages.
Make sure that Impressions are included in the presented data. Change the date range to go as far back in time as possible, which will show you all URLs that received impressions in the selected timeframe.
Use server log files
For the most comprehensive data, you can analyze your server log files instead of, or in addition to, Google Analytics and Google Search Console. Log files record who has visited your site, including both search engine crawlers and users, and which pages they requested. To use them, you will need access to the server, so consult your developers to learn if it’s possible.
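As an illustration of what log analysis involves, the sketch below extracts successfully requested paths from access log lines. It assumes the common "combined" log format used by servers like Apache and nginx, so check your server's actual format first. The sample lines are made up, and the crawler check is a plain user-agent match, which can be spoofed.

```python
import re
from urllib.parse import urlsplit

# Pattern for the "combined" access log format (an assumption about your logs).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def accessed_paths(log_lines):
    """Map each path served with status 200 to who requested it."""
    hits = {}
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match or match["status"] != "200":
            continue
        path = urlsplit(match["path"]).path  # drop query strings
        kind = "crawler" if "googlebot" in match["agent"].lower() else "user"
        hits.setdefault(path, set()).add(kind)
    return hits


# Hypothetical log lines.
SAMPLE_LOG = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /old-landing HTTP/1.1" '
    '200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2023:14:01:12 +0000] '
    '"GET /blog/post?utm_source=newsletter HTTP/1.1" '
    '200 5120 "https://example.org/" "Mozilla/5.0 (Windows NT 10.0)"',
    '203.0.113.7 - - [10/Oct/2023:14:02:40 +0000] "GET /missing HTTP/1.1" '
    '404 512 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

hits = accessed_paths(SAMPLE_LOG)
for path, kinds in sorted(hits.items()):
    print(path, sorted(kinds))
```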
Cross-reference the data
Look for pages that appear in the Google Analytics and Google Search Console datasets, or in the log files, but are missing from your exported list of crawlable pages. These are your orphan pages.
You can compare the datasets in Google Sheets, Excel, or any other tool.
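If you prefer a script over a spreadsheet, the comparison boils down to a set difference. The sketch below is one possible approach, not a definitive one: the `normalize` rules (ignoring protocol, a leading "www.", query strings, and trailing slashes) are assumptions made to avoid false positives, and you should adapt them to how your site treats those variants. All URLs are made up.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to host + path so trivial variants compare as equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    return host + path


def find_orphan_candidates(crawled_urls, *accessed_sources):
    """Return URLs seen in any accessed source but missing from the crawl."""
    known = {normalize(url) for url in crawled_urls}
    candidates = {}
    for source in accessed_sources:
        for url in source:
            key = normalize(url)
            if key not in known:
                candidates.setdefault(key, url)  # keep the first variant seen
    return sorted(candidates.values())


# Hypothetical exports.
crawled = [
    "https://example.com/",
    "https://example.com/blog/",
    "https://example.com/pricing",
]
analytics = [
    "https://www.example.com/blog",
    "https://example.com/landing/spring-sale",
]
search_console = [
    "https://example.com/pricing/",
    "https://example.com/old-guide",
    "http://example.com/landing/spring-sale/",
]

orphan_candidates = find_orphan_candidates(crawled, analytics, search_console)
print(orphan_candidates)
```

Treat the result as a list of candidates to review, since redirected or removed URLs can also show up in historical analytics data.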
Once you pinpoint all of your orphan pages, export them to a separate file or spreadsheet for the next part of the optimization.
How to analyze orphan pages
Once you have a list of orphan pages on your website, you need to look at the discovered pages and ask yourself some questions that will help you determine what to do with them:
- Is this page valuable for your site? Does it have an important goal connected to driving traffic or conversions?
- Is this page ranking for any keywords, despite being an orphan page?
- Where should the page exist within your website’s taxonomy?
- Is this page a duplicate or near-duplicate? Can you move the content to another related page that hasn’t been orphaned?
- Is this page optimized? Should you improve it in any way?
- Does the page have many quality backlinks?
Aside from that, it’s good to consider why the pages became orphans in the first place. This will help you be aware of such issues in the future and possibly avoid them.
Optimize orphan pages
Once you understand what purpose the orphan page serves and how it aids in driving your website and marketing goals, you can determine what step, if any, to take with the page.
Link to the page from other internal pages
When you want an orphan page to be found and visited because it’s important for site visitors, add internal links to it from other pages on your website. This gives search crawlers and users a path to the page.
You need to think about the most suitable place to link to it from – you may want to consider the following:
- Should you add links to it from other thematically related articles?
- Do you need to restructure your site architecture to make room for this page?
- Should you rewrite any of your content to make the links fit better?
- Should there be a link to it in the main navigation or footer?
- What anchor text should you choose to give context to search engines and users who visit it?
If you’re unsure how to approach these, we’ve got you covered with our article on internal linking. You can also contact Onely for internal linking optimization.
Redirect the page
Another method is setting up a URL redirect to a new location — ideally, a relevant equivalent page that will still be helpful to visitors and complement their user journey without interruptions.
If you permanently redirect the page, use a 301 redirect to retain as much PageRank as possible and correctly indicate the move to search engines.
Remove the page
If you found an orphan page that isn’t valuable or needed for your site, and there is no suitable page to redirect it to, you can remove it.
The most typical approach is to have its URL return a 404 (Not Found) status code.
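How you set up redirects and removals depends on your server or CMS. As a generic illustration of the redirect and removal options above, here is a minimal WSGI sketch in Python. The paths in `REDIRECTS` and `REMOVED` are hypothetical, and a real application would serve its regular pages instead of the placeholder response.

```python
# Hypothetical mapping of orphan URLs to their new locations.
REDIRECTS = {"/old-landing": "/products/new-landing"}
# Hypothetical orphan URLs that were removed without a replacement.
REMOVED = {"/outdated-promo"}


def app(environ, start_response):
    """Tiny WSGI app that answers with 301, 404, or a placeholder page."""
    path = environ.get("PATH_INFO", "/")
    if path in REDIRECTS:
        # Permanent redirect: tells browsers and crawlers the page has moved.
        start_response("301 Moved Permanently", [("Location", REDIRECTS[path])])
        return [b""]
    if path in REMOVED:
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Not Found"]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Placeholder for the site's regular pages"]


def response_for(path):
    """Call the app directly and capture the status line and headers."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    captured["body"] = b"".join(app({"PATH_INFO": path}, start_response))
    return captured


for path in ("/old-landing", "/outdated-promo", "/"):
    print(path, "->", response_for(path)["status"])
```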
Leave the page as-is
Leave the page unlinked if it serves a business need that doesn’t require internal links.
This could be the case if, for example, you have a landing page for a campaign that you only want to show users at certain times.
Regularly look for new orphan pages
Depending on the size of your site, you should set up a monitoring process to catch any future orphan pages before they get a chance to impact your SEO.
For example, you could set up a recurring crawl to find orphan pages in the future.
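One simple way to turn recurring crawls into monitoring is to compare each run's list of orphan pages with the previous one. The sketch below assumes you already have both lists, for example from saved exports; the function name and URLs are hypothetical.

```python
def compare_runs(previous_orphans, current_orphans):
    """Return orphan pages that appeared or disappeared since the last run."""
    previous = set(previous_orphans)
    current = set(current_orphans)
    return {
        "new": sorted(current - previous),
        "resolved": sorted(previous - current),
    }


# Hypothetical results of two consecutive checks.
last_month = ["/old-guide", "/landing/spring-sale"]
this_month = ["/landing/spring-sale", "/products/discontinued-item"]

changes = compare_runs(last_month, this_month)
print("New orphan pages:", changes["new"])
print("No longer orphaned:", changes["resolved"])
```

Reviewing only the new entries after each run keeps the process manageable, even on large sites.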
The best way to prevent orphan pages from appearing in the future is to identify what causes them and address the problem at the core. For example, if you pin down a mechanism on your site that generates unnecessary URLs without links, fix it now to prevent more orphan pages from appearing as time goes on.
Whenever you publish a new page, make sure links are pointing to it unless you consciously don’t want the page to be linked to. If possible, implement solutions that automatically generate internal links, such as category pages and related items.
Optimizing orphan pages on your website can help you:
- Add context to them and other pages in your site structure,
- Make the pages crawlable and indexable, giving them a higher chance of ranking for appropriate keywords,
- Transfer PageRank between more pages within your website.
Keep in mind that a small number of orphan pages is normal for any site and shouldn’t be treated as a big issue.
The problem becomes more severe as the number of orphan pages grows, because you can miss out on potential rankings, traffic, and conversions, which hinders your revenue and business success.
Prioritize having a regular process to catch any unwanted orphan pages and immediately address them.