Analyzing Your Backlinks: Staying Safe in the Post-Penguin Era
A few years ago, SEOs did not have to concern themselves with the quality of the backlinks that led to their website. It was a numbers game: quantity mattered more than quality. What was important was to collect as many links as possible from as many different websites as possible, and to make sure that each hyperlink used a clickable money anchor text containing the exact phrase for which they wanted their website to appear in Google results. Links were placed mostly on websites with “low-entry barriers” (the link was easy to get) – such as low quality directories, forums, bookmarking pages, and blog networks.
With Google’s rollout of the Penguin Update on April 24, 2012, the rules of SEO completely changed. Penguin – developed to take aim at websites that failed to comply with Google’s guidelines and distributed spammy, unnatural links – was initially updated every few months.
On September 23, 2016, it started operating in real time, as a part of the core algorithm, meaning that websites are evaluated, and their rankings impacted, in real time.
Even though today’s SEO is all about gaining natural links from high quality websites and staying away from any shady link-building techniques, in order to make pages primarily for users, not for search engines, sometimes I still spot link profiles infiltrated with spam. I will cover the most common spam examples further in the article and show how to add them to your disavow file.
Back to Basics: What are Backlinks?
A backlink is a link that connects one website to another. This kind of link is important because it sends signals to Google about your website and has a huge effect on how the website scores in Google rankings.
Backlinks can be displayed as a normal URL, e.g. https://yourwebsite.com, or as clickable text in a hyperlink with a brand or money phrase, e.g. Your Website.
In HTML, the URL with the anchor text looks like this:
<a href="http://www.example.com">Your anchor text goes here</a>
The main purpose of having backlinks is to inform readers about your website or to pass the link juice and improve your Google rankings. When analyzing your links, I would recommend checking your do-follow backlinks first, since they pass the link juice and are more important in Google’s eyes. It should also be pointed out that even though no-follow links don’t pass link juice, they can still weaken your link profile – especially if they are unnatural and come from a large number of low quality websites. To make sure that your profile is safe, I would advise checking them right after taking care of your do-follow links.
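To make the do-follow/no-follow distinction concrete, here is a minimal Python sketch that reads the links out of a piece of HTML and marks which ones carry rel="nofollow" (the standard HTML way of marking a no-follow link). The class and function names are my own, chosen only for illustration, and the sketch assumes you already have the HTML of the linking page as a string.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href, anchor text and a nofollow flag for every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # the link being read, or None outside an <a> tag

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # rel can hold several space-separated tokens, e.g. "nofollow noopener"
        rel_tokens = (attrs.get("rel") or "").lower().split()
        self._current = {
            "href": attrs.get("href"),
            "text": "",
            "nofollow": "nofollow" in rel_tokens,
        }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.links.append(self._current)
            self._current = None


def extract_links(html):
    """Return a list of dicts describing each link found in the HTML."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Links whose "nofollow" value is False are the do-follow ones to review first; the collected anchor text also makes money anchors easy to spot.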
If you suspect that malicious/spammy sites are linking to your site, or you would just like to have a clear view of your backlink profile, the next recommended step is to collect your links from every available resource.
Collecting your backlinks
The most important resource for your backlinks is your Google Search Console account, since it’s the most trusted data source – backlinks that can be found in GSC are crawled directly by Google robots.
To download them (to pull them out), start by logging into your Google Search Console account and going to Search Traffic -> Links to Your Site.
Then, click on More under the Who links the most section.
When the new page loads, click on Download more sample links and Download latest links.
To make sure you have gathered a sufficient number of backlinks and are ready to conduct a thorough backlink check, it is recommended to use more than one backlink source. There are plenty of tools that allow you to download your backlinks, such as Ahrefs, Majestic, and Semrush.
The best way is to export your links (whether from a single backlink source or multiple) into an Excel spreadsheet and eliminate duplicate links. If you are planning to check your backlinks for the first time, using data from the past few years is vital, since gathering a sufficient number of data points will provide the most comprehensive link profile.
After gathering all the backlinks into one Excel file and removing duplicates, you can start assessing their quality right in your Excel spreadsheet.
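If you prefer a script to a spreadsheet for the merge-and-deduplicate step, the following Python sketch shows one way to do it. It assumes each export is a CSV file with a column named "url"; that column name is hypothetical, so adjust it to whatever your export actually uses. The normalization rule (ignore letter case in the host, a trailing slash and the #fragment) is my own simplification, not a rule from any of the tools.

```python
import csv
from urllib.parse import urlsplit


def normalize(url):
    """Build a comparison key: lowercase host, no trailing slash, no fragment."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{host}{path}{query}"


def read_urls(lines, url_column="url"):
    """Yield the URL column from CSV lines (an open file or a list of strings)."""
    for row in csv.DictReader(lines):
        value = (row.get(url_column) or "").strip()
        if value:
            yield value


def dedupe(urls):
    """Keep the first spelling of each link and drop later duplicates."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

To use it on real files, open each export with open(path, newline="", encoding="utf-8"), pass the file object to read_urls, and hand the combined list to dedupe.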
It’s crucial to mention that many spammy websites block popular web crawlers (Ahrefs, Majestic, Semrush crawlers etc.) from accessing their site in the robots.txt file. This is why getting your backlink data from Google Search Console is so important.
What kind of backlinks can cause you trouble?
Even though Penguin rolled out a few years ago and changed the way SEOs gain backlinks, we still discover spammy backlink patterns that can cause a drop in Google rankings when we look at and analyze the link profiles.
Here are a few examples of the most common spam sites we were able to find in random link profiles:
Profiles created just for the sake of adding backlinks
Spammy forum profiles are created just for the sake of gaining backlinks that point back to a specific website. A backlink (a URL or a money keyword phrase) can be added to the profile and placed in the signature or under a “your website/www” section. When you look at profiles created just to place backlinks, you will quickly notice that they usually don’t have any valuable posts (or if they do, these are just spammy, irrelevant comments), or they have no posts at all, and the date of the last activity is usually the same as the join date.
Spammy forum posts are added along with a backlink with spammy, heavily promotional content.
Usually, these kinds of backlinks use a money anchor tag, e.g., “cheap hotel florida”, “hairdresser chicago” etc. and heavily promote the website connected to the backlink. When a forum is not moderated, many forum posts can be created just to gain a backlink.
Spammy comments on blogs/websites (sometimes trackbacks and pingbacks)
Such comments are added on blogs in the comment section under the article. They usually don’t have any value for readers and are added just for the sake of gaining a backlink. Furthermore, in the section dedicated to names, we would usually see a backlink stuffed with money keywords (as on the screen below).
We can divide comments into three categories:
- comments added by humans or robots
- trackbacks – info sent manually that someone wrote a blog post in response to your article/blog post, including an excerpt of the content
- pingbacks – info sent automatically, does not include any content
Low quality directories
It goes without saying that spammy directories should be disavowed from your backlink profile. The good news is that there are still some directories that are either of high quality or are very niche specific, and so having a link placed in them can actually help your website in some way. But the bad news is that you should be concerned when you see a directory with an especially high in/out linking ratio and no moderation or “PR stats”, which means that anyone can submit links and can do so for free. I would also suggest avoiding directories into which you can place your listing with money anchor tags instead of brand name anchor keywords, and where every link is accepted (no specific niche of the directory).
Spammy blogs stuffed with money anchor keywords or blog networks
Spammy blogs are created just for the purpose of linking (distributing links) to other websites and manipulating Google rankings (SERPs).
Usually called Splogs, they distribute duplicated, spun or keyword-oriented content (keyword stuffed) with a few outgoing links, usually anchored to money keywords (over-optimized anchors). These kinds of blogs are often created using platforms such as Blogspot, Tumblr, WordPress, Soup.io, and Wix, and they are easy to spot as spammy, since they usually post about a plethora of topics, sometimes include links to illegal or unethical websites, and have little to no info about the author (usually, that info is fake anyway).
Low-Quality Press Release Websites and Syndication
Most of the press release websites should be disavowed from your profile right away. Not only is the content distributed on most PR websites duplicated, but the links also have a do-follow tag, which means that they pass the link juice (domain authority and page authority) to other websites.
As Matt Cutts stated, PR websites are not likely to benefit your rankings so you should definitely add them to your disavow file.
Link listings are websites that automatically generate tons of backlinks (usually do-follow) on different subdomains. Sometimes they are also generated on blog platforms, such as blogspot.com. They don’t carry any value and are usually coming from the same IP address. Google is aware of the spam that these kinds of websites distribute, but it’s always a good idea to add them to your disavow file.
Social Bookmarks & Sharing sites
Even though there are still many social bookmarking sites that are worth keeping in your link profile, such as StumbleUpon, Diigo or Digg, the web is teeming with low quality social bookmarking and sharing sites that should absolutely be added to your disavow file. What should definitely catch your eye is the fact that these types of spammy bookmarking websites very often have very similar templates (look and feel of the website), which can mean that they are a part of a bigger linking scheme.
A good example of a spammy bookmarking site: http://www.blogbookmark.com/
When assessing the quality of your backlinks (more takeaways):
- Verify that the website that is linking to you is indexed; if it is not, the site may have been removed from the search results by Google (use the “site:” search operator to check that).
- If the linking website attempts to infect its visitors with malware – you should always disavow this kind of domain on the domain level.
- If you’re uncertain whether the website’s origin is legitimate, look up the IP address and DNS of the website – sometimes spammy websites can be a part of a bigger link scheme. I highly recommend the ViewDNS tool.
- Always be aware of the in/out linking ratio. If the ratio is high (there are a lot of hyperlinks in the content on the website), it’s highly likely that the website is a link farm.
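The IP check from the list above can also be run in bulk. The Python sketch below resolves each linking domain and groups the domains that land on the same address. The function names are my own, and the resolver is passed in as a parameter so the grouping logic can be tried without network access. Keep in mind that a shared IP is only a hint: plenty of unrelated, legitimate sites sit on the same shared hosting server.

```python
import socket
from collections import defaultdict


def group_domains_by_ip(domains, resolve=socket.gethostbyname):
    """Map each IP address to the list of domains that resolve to it."""
    groups = defaultdict(list)
    for domain in domains:
        try:
            ip = resolve(domain)
        except OSError:
            # DNS failures (socket.gaierror) are a subclass of OSError
            ip = "unresolved"
        groups[ip].append(domain)
    return dict(groups)


def shared_hosts(groups, minimum=2):
    """Keep only the IPs that serve at least `minimum` linking domains."""
    return {
        ip: names
        for ip, names in groups.items()
        if ip != "unresolved" and len(names) >= minimum
    }
```

Calling shared_hosts(group_domains_by_ip(your_domains)) gives a short list of clusters worth opening by hand and comparing for the similar templates mentioned earlier.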
To have your bad backlinks ignored by Google, you need to add them to your disavow file.
Creating your disavow file
Once you have looked over your backlinks pointing to your website, it’s time to eliminate the unnatural backlinks. Before adding your backlinks to the disavow file, it is imperative to try and contact the website’s webmasters where the link is placed and ask for its removal (unfortunately, most of the time contacting the webmaster is going to be unsuccessful).
If you can’t remove the links, just create your disavow file. You need to make a .txt file in UTF-8 or 7-bit ASCII format and copy and paste in every domain or URL that you want Google to ignore, one domain/URL per line.
I recommend disavowing your backlinks on the domain level (for example, domain:spammysiteexample.com) since it gives you a guarantee that you are not going to be hit with new unnatural links coming from the domain or with any that you missed. If, for any reason, you would like to keep the domain (sometimes the domain can send both good and bad links), disavow just a specific URL. In this case, just copy and paste the whole URL into your .txt disavow file in a separate line.
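As a rough illustration of the file layout described above, this Python sketch builds the text of a disavow file from a list of domains and a list of single URLs. The function name and the sample entries are hypothetical; only the "domain:" prefix, the one-entry-per-line rule and the "#" comment marker come from the format itself.

```python
def build_disavow(domains=(), urls=(), note=None):
    """Return disavow file text: optional comment, domain entries, then URLs."""
    lines = []
    if note:
        lines.append(f"# {note}")
    # Domains are case-insensitive, so lowercase them and drop duplicates
    for domain in sorted({d.strip().lower() for d in domains if d.strip()}):
        lines.append(f"domain:{domain}")
    seen = set()
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            lines.append(url)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = build_disavow(
        domains=["spammysiteexample.com"],
        urls=["http://mixed.example/bad-page.html"],
        note="links the webmaster did not remove",
    )
    # Write plain text in UTF-8, as the disavow format requires
    with open("disavow.txt", "w", encoding="utf-8") as handle:
        handle.write(text)
```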
Once you have prepared your disavow file, you are ready to upload it to the Google disavow tool.
Just select your website, click on the “Disavow links” red button and then choose the .txt file you have recently created. Voila!
Remember, if you decide to add new domains or URLs to your disavow file, you need to make sure that the previously disavowed ones are still in your disavow .txt file, since a newly uploaded file replaces the previous one. Re-use the .txt disavow file you once uploaded to Google and place the new URLs/domains on new lines.
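One way to avoid losing earlier entries is to always append to the text of the file you last uploaded. The Python sketch below does that; the function name is my own, and it assumes you have read the previous file into a string.

```python
def merge_disavow(existing_text, new_entries):
    """Append entries that are not yet in the file, keeping every old line."""
    lines = existing_text.splitlines()
    # Comments and blank lines are kept in the output but never count as entries
    present = {
        line.strip()
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    }
    for entry in new_entries:
        entry = entry.strip()
        if entry and entry not in present:
            present.add(entry)
            lines.append(entry)
    return "\n".join(lines) + "\n"
```

The result still contains every line of the old file, comments included, so it can be uploaded in place of the previous version.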
Typical mistakes to avoid when creating your disavow file
Creating and uploading your disavow file is a really crucial task, hence it’s important to avoid these typical mistakes:
Adding good links to your disavow file
A very common mistake is including good links in the disavow file. If this happens to you, you can remove them by simply deleting them from your disavow file and then re-uploading the file.
It’s worth mentioning that it usually takes some time before Google recrawls the disavowed domains and URLs.
Uploading .csv or .doc files instead of .txt file
The disavow file should be created as a plain text (.txt file) in UTF-8 or 7-bit ASCII format.
Commenting in a .txt file without using “#” before the comment = syntax error
Comments about the disavowed backlinks shouldn’t be read as entries by Google robots, so it’s important to use “#” at the beginning of every line where a comment is placed.
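A quick pre-upload check can catch a missing “#” and similar slips. The Python sketch below flags every line that is not a comment, a “domain:” entry or a full URL. The function name and the exact checks are my own assumptions about what a sensible line looks like; this is not Google’s validator, so treat a clean result as a sanity check only.

```python
from urllib.parse import urlsplit


def check_disavow(text):
    """Return (line_number, problem) pairs for lines that look malformed."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # blank lines and comments are fine
        if entry.startswith("domain:"):
            host = entry[len("domain:"):]
            # A domain entry should be a bare host name, not a URL
            if not host or "/" in host or " " in host or "." not in host:
                problems.append((number, "bad domain entry"))
            continue
        try:
            parts = urlsplit(entry)
            ok = (
                parts.scheme in ("http", "https")
                and bool(parts.netloc)
                and " " not in entry
            )
        except ValueError:
            ok = False
        if not ok:
            problems.append((number, "not a comment, domain: entry, or URL"))
    return problems
```

A comment written without “#”, such as a line reading "forum spam found in May", is reported with its line number so it can be fixed before the upload.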
For more information, listen to Matt Cutts explaining the typical mistakes when preparing your Disavow file here.
Having a clean backlink profile (looking over your backlinks, getting rid of the bad ones and gaining natural links) is crucial in the post-Penguin era, since it can help improve your rankings.
When looking over backlinks to your website, remember to gather data from as many data sources as possible, as this gives you a good overall picture of your link profile and ensures that a large part of your backlinks is analyzed and assessed. Keep in mind that after checking your backlinks, bad links should be disavowed in the Google disavow tool. This gives you good assurance that your off-page SEO activities are not likely to cause drops in Google rankings.