doPDF - A Google Panda recovery case study

Softland
Posts: 1443
Joined: Thu May 23, 2013 7:19 am

Postby Softland » Wed Jul 27, 2011 4:52 pm

Short history


doPDF is a freeware PDF converter and we know users like it, based on the regular feedback we receive. Since its first release it has been downloaded more than 15 million times, averaging 12,000-14,000 downloads daily (these are not unique downloads). With this popularity, we also started getting more and more traffic from search engines. Google in particular accounts for most of the search engine traffic, way ahead of Bing or Yahoo.



Starting in February 2011, Google rolled out a major algorithmic change, called Panda. This is the single most important algorithmic change they have made in the past few years, and it impacted a lot of websites. A short changelog of Google Panda releases/updates:




  • Panda 1.0. Launched on February 24, 2011, this algorithmic change targeted the so-called content farms (websites that create shallow content en masse). It impacted about 12% of queries in the United States.

  • Panda 2.0. Launched on April 11, 2011, this change was rolled out globally, impacting all English queries.

  • Panda 2.1. At the beginning of May 2011 the algorithm was updated again, this time impacting fewer queries.

  • Panda 2.2. Another update to the algorithm, rolled out at the end of June 2011.

  • Panda 2.3. Another Panda update, released around July 23, 2011.


Impact on doPDF


We never worried that doPDF's website would be affected negatively by these changes, because we created all the content ourselves, and it is original by nature: we simply explain how the program works and what it does. When Panda 1.0 was released we even noticed a small increase in the traffic Google sent our way, and there was no signal that something was wrong. However, shortly after the second update, Panda 2.0, we saw a huge impact on English queries. The doPDF website is translated into over 37 languages, so we receive a lot of traffic for non-English queries too, not to mention referral and direct traffic. This is why we initially thought the decrease in overall traffic from Google was just seasonal, but after creating some country-specific advanced segments we noticed how hard the traffic had been hit.



In order to assess the damage we created an Advanced Segment for each of the top 20 countries that sent us traffic from Google search. These segments exclude brand searches, i.e. searches for the name of our product (doPDF and do pdf). Most of the lost traffic was for six countries: United States, United Kingdom, India, Australia, Canada and Malaysia. Below are the graphs showing how much traffic from these countries went down:



US - Traffic went down, received only 30% compared to pre-Panda levels.





UK - Same pattern, traffic down to 45% compared to pre-Panda





India - One of the hardest hit, traffic went down to only 23% of the previous traffic





Australia - Traffic down, received 45% compared to pre-Panda





Canada - Traffic down, received 55% compared to pre-Panda





Malaysia - Traffic down, received 30% compared to pre-Panda





What we changed


At the same time, traffic went down on 3 other websites of ours, so we were hit from all directions. We "counted our casualties" and made a plan for what needed to change. We went back to the guidelines provided by Google, keeping user experience improvements in sight, and started implementing changes to the website in order to recover from this penalty, focusing mostly on removing content that might be considered low quality. Here is the list of changes we made, not all at once but over a period of almost 3 months:




  • Blocked content. We have a forum for doPDF, and most of the content we added to robots.txt was from the forum: tag pages, visitor profile pages, search result pages and other forum-related pages (login, register, templates). We also removed some redundant pages from the website (the screenshots and languages pages) and RSS files.

  • Checked website. We checked the website for 404 errors and broken redirects and re-created the sitemap (we used Xenu's Link Sleuth for this purpose, an excellent tool).

  • Nofollow external links. We added rel="nofollow" to external links, and removed those that were no longer needed.

  • Modified content. We rewrote the content of the English pages.

  • Reduced load time. We introduced a caching system and reduced page load time by optimizing images.

  • Reduced bounce rate. We were routing all downloads through a third-party site that we own (www.software112.com), so we constantly had a bounce rate close to 75%. We changed that so the download starts from our own website, reducing the bounce rate to an average of less than 30%.

  • +1 button. We implemented the +1 button as soon as it was released and created a page where we specifically encourage users who use and like doPDF to recommend it to others: http://www.dopdf.com/forum/topic/we-need-your-help-in-rating-dopdf

  • Validated HTML. We validated the HTML code, but ironically broke it again when we added the +1 button, so I doubt this had any impact on rankings.

  • Webmaster Tools errors. We checked and fixed the errors shown in Webmaster Tools (crawl errors, duplicate titles).

  • Blocked parameters. Webmaster Tools has a section where you can specify which URL parameters to ignore. Since we were not satisfied with that alone, we also blocked all pages with parameters in their URL directly in the robots.txt file.
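
To illustrate the robots.txt changes described above, a file along these lines blocks the forum utility pages and any parameterized URL (the paths here are hypothetical examples, not our exact file; the `*` and `$`-style wildcards are supported by Googlebot but not by every crawler):

```text
User-agent: *
# Forum pages with no search value: tags, profiles, search, login/register
Disallow: /forum/search
Disallow: /forum/login
Disallow: /forum/register
Disallow: /forum/profile
Disallow: /forum/tags
# Block any URL that contains query parameters
Disallow: /*?
```

One caveat with the `Disallow: /*?` rule is that it is broad: it also blocks legitimate pages that happen to use parameters, so it is worth checking the crawl errors section of Webmaster Tools after deploying it.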




Recovering after Panda


Several things are frustrating about the Panda update:


  • you don't know exactly why you were hit (Webmaster Tools could add a section to help webmasters with that, at least the ones considered honest)

  • any change you make takes a lot of time to show results

  • if a section of your site is considered low quality, your entire site might suffer from the penalty (even though Google has recommended for ages to leave duplicate content alone, as the search engine will sort the good from the bad: http://www.youtube.com/watch?v=CJMFYpYQZ0c&t=30s)


However, after almost 3 months, doPDF started recovering its Google traffic for the countries that were most affected. The recovery started shortly after Panda 2.2, and for several weeks now the trend has been ascending. The graphs below clearly show the increase in traffic:



US - Traffic up, receiving 70-75% compared to pre-Panda





UK - Traffic up, receiving 75% compared to pre-Panda





India - Traffic up, receiving 69% compared to pre-Panda





Australia - Traffic up, receiving 90% compared to pre-Panda





Canada - Traffic up, receiving 75% compared to pre-Panda





Malaysia - Traffic up, receiving 75% compared to pre-Panda





The traffic we now receive for these particular segments is not yet 100% of what we received before being hit by Panda, but it is over 75%. One reason might be the summer holidays; July and August have always been slower than usual in terms of traffic, so we'll know in September whether traffic gets back to pre-Panda levels. One thing is for sure: it's good to finally have some positive feedback.



As mentioned, we had other sites hit by Panda. Two of them are product sites too, just like doPDF's. We applied most of the changes we made for doPDF to the other websites as well, yet there is no sign of improvement there. Only a few things are different in doPDF's case, and we suspect these might be exactly the changes that brought the website back on track:


  • Reducing the bounce rate. Because we were previously linking outside the website for downloading doPDF, the bounce rate was very high. After we changed that, the bounce rate went down to under 30%.

  • The +1 button. Since we added the button, doPDF has received several times more +1 clicks than our other websites. Maybe it crossed a threshold where it is taken into account in rankings.

  • Scrapers. Everyone who posts something about doPDF copies a phrase, or even the entire content of our homepage, from our site. We try to discourage that, but it's impossible to ask them all to change the content. Google mentioned that one improvement in the latest update was scraper detection, i.e. attributing unique content to the right source. However, I strongly doubt that this is what did it, and here is why. Matt Cutts is a Google employee and one of the most popular names when it comes to Google search engine updates. His blog is very popular, yet I discovered by accident that even his blog gets outranked by scrapers. The image below shows exactly what I mean:


    Basically, if you search for "identity of GoogleGuy", his original blog post shows up several places below another website that reused the entire content of that post. This shows that Google still has problems attributing original content.




We'll continue making updates to the website. One thing we want to do in the near future is separate the different types of content into subdomains. As mentioned initially, our website is translated into many other languages, but those versions are not separate subdomains; they are subfolders (i.e. /es/ is the Spanish version of the site). We intend to split those into subdomains, and to move the forum to a subdomain too. As a conclusion to this post, I am inclined to believe that user experience and signals from users are much more important now and might help websites recover to pre-Panda levels.
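
If the move from subfolders to subdomains goes ahead, the old URLs should be 301-redirected so existing rankings and links carry over. A minimal sketch, assuming an Apache server with mod_rewrite enabled (the subdomain name and rule are hypothetical, not a description of our current setup):

```text
# .htaccess sketch: permanently redirect the /es/ subfolder
# to a dedicated Spanish subdomain, preserving the rest of the path
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.dopdf\.com$ [NC]
RewriteRule ^es/(.*)$ http://es.dopdf.com/$1 [R=301,L]
```

The `R=301` flag matters here: a permanent redirect tells search engines to transfer the old page's standing to the new URL, whereas a temporary (302) redirect would not.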
