SEO for PDF files: Advanced Tricks
PDF files, just like web pages, can be optimized to rank highly on Google. Many SEOs recommend steering away from PDF files as much as possible, but PDFs rank all over Google’s results, so I wouldn’t go out of my way to avoid them. In fact, if you’ve gone to the effort of making a professionally formatted PDF file, one might argue it’s likely to be higher “quality” content than the average run-of-the-mill web page, and I would not rule out Google slightly favoring PDF files for this reason. I’m not aware of anyone in the industry having run tests or correlation studies on this topic; if you know of any, please mention them in the comments below.
Here’s a list of best practices you can use to optimize your PDF files:
1. Tools to create your PDFs
Ideally you should use Adobe Acrobat, but if you’d like to do some of the things I’m suggesting here on the cheap, you can download a few free tools that can do the job. CutePDF is a printer driver for Windows that converts anything you’re printing into a PDF, and Quick PDF Tools allows you to edit the PDF’s properties after the fact. If you have Microsoft Word 2007, Microsoft has a free add-in download that enables you to save documents as PDFs as well.
2. Keyword Density
Just as with web pages, using the target keyword the right number of times, and peppering in some related keywords, is important in telling Google what your page is about. Many in the industry have tried to debunk keyword density as a ranking factor, but in my experience, it works. You can use a tool like Bruce Clay’s keyword density tool or GoRank’s tool to find the right density by analyzing the top pages ranking for the keyword you’re targeting (I usually use the top four). Yes, it can be hard to say “lawyer in miami” 70 times, but if the pages you’re competing with do that on average, you really must. I wouldn’t get as hung up on document length, since one very long page in the SERP often skews the average, but you should at least try to make your document longer than one of the top four.
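If you’d rather compute density yourself than rely on an online tool, here’s a minimal Python sketch. It assumes the common definition (phrase occurrences divided by total words, times 100); the tools mentioned above may tokenize or count differently (some multiply by the number of words in the phrase), so only compare numbers produced the same way. The function names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_density(text: str, phrase: str) -> float:
    """Occurrences of `phrase` per 100 words of `text`.

    Assumed definition: occurrences / total words * 100. Every position
    where the phrase starts is counted.
    """
    words = tokenize(text)
    target = tokenize(phrase)
    if not words or not target:
        return 0.0
    n = len(target)
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return 100.0 * hits / len(words)


def average_density(competitor_texts: list[str], phrase: str) -> float:
    """Mean density across competitor pages, e.g. the text of the top four results."""
    if not competitor_texts:
        return 0.0
    return sum(keyword_density(t, phrase) for t in competitor_texts) / len(competitor_texts)
```

With the visible text of the top four results in a list called `top_four`, `average_density(top_four, "lawyer in miami")` gives a rough target density for your own PDF.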
3. Avoid Duplicate Content
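Of course, if your PDF is simply an alternate, printable version of an existing web page of yours and you don’t really want it to rank, keep it out of Google’s index; otherwise Google may rank it rather than the web page you’d prefer to rank. Don’t try to “noindex” it in your robots.txt file, though: robots.txt has no supported noindex directive, and a Disallow rule only stops crawling, so a blocked URL can still end up indexed if other pages link to it. Because a PDF has no HTML head for a robots meta tag, send an “X-Robots-Tag: noindex” HTTP header with the PDF instead, or a rel="canonical" Link header pointing to the HTML version.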
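Because a PDF can’t carry a robots meta tag, the noindex instruction has to travel in the HTTP response. As a minimal sketch, assuming your site is served by a Python WSGI application (on Apache or nginx you would set the same header in the server configuration instead), this hypothetical middleware adds `X-Robots-Tag: noindex` to every response whose path ends in .pdf.

```python
from typing import Callable, Iterable


def noindex_pdfs(app: Callable) -> Callable:
    """Wrap a WSGI app so responses for *.pdf paths carry "X-Robots-Tag: noindex"."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        is_pdf = environ.get("PATH_INFO", "").lower().endswith(".pdf")

        def start_with_header(status, headers, exc_info=None):
            if is_pdf:
                # Crawlers must still be allowed to fetch the PDF to see this header.
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return wrapped
```

Wrap your existing application once (`app = noindex_pdfs(app)`); HTML pages pass through untouched.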
4. Make it a text-based PDF rather than an image-based one
If you’re printing from MS-Word or using CutePDF and so on, this won’t be a problem. If you’re using image editing or some sort of page layout program, you may need to check this. If you can view the file with Acrobat Reader and can select and copy text from it, then you’ve gotten this one correct.
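If you have many PDFs to check, a rough automated screen can flag likely image-only files before you open each one. This Python sketch rests on an assumption, not a guarantee: real, selectable text needs font resources, while a scan saved as images usually declares none. It only looks for a `/Font` entry (decompressing Flate streams, where fonts may be packed), so treat a `False` as “open this one in Acrobat Reader and try selecting text.” The function name is hypothetical.

```python
import re
import zlib

# Stream bodies; the lookbehind skips the "stream" inside "endstream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)


def pdf_probably_has_text(data: bytes) -> bool:
    """Heuristic: True if the PDF declares any font resources."""
    if b"/Font" in data:
        return True
    for match in STREAM_RE.finditer(data):
        try:
            # decompressobj tolerates the end-of-line bytes left before "endstream".
            decompressed = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-encoded (e.g. a JPEG image) or damaged; skip it
        if b"/Font" in decompressed:
            return True
    return False
```

Usage: `pdf_probably_has_text(open("squeaky-floor.pdf", "rb").read())`.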
5. Put your keyword in the file name
This is often ignored but is likely used by Google in its ranking algorithms. Use dashes to separate words, e.g. “squeaky-floor.pdf”.
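If you generate PDFs in bulk, you can derive the file name from the target keyword automatically. A minimal Python sketch (the function name is hypothetical) that lowercases the phrase, folds accented letters to ASCII, and joins the words with dashes:

```python
import re
import unicodedata


def pdf_filename(keyword: str) -> str:
    """Build a lowercase, dash-separated .pdf file name from a keyword phrase."""
    # Fold accented letters to plain ASCII (e.g. "é" -> "e"); drop what can't be folded.
    folded = unicodedata.normalize("NFKD", keyword).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of characters other than a-z and 0-9 into one dash.
    slug = re.sub(r"[^a-z0-9]+", "-", folded.lower()).strip("-")
    if not slug:
        raise ValueError(f"no usable characters in {keyword!r}")
    return f"{slug}.pdf"
```

For example, `pdf_filename("Squeaky Floor")` returns `"squeaky-floor.pdf"`.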
6. Set the Title property
Obviously you want to optimize the title just as you would for a web page. Put your keyword as far to the left as possible, and if you can get the keyword (or pieces, stems, and so on of it) in there twice, more power to you. For instance, if you want to rank for “grow tomatoes”, you might try “Grow tomatoes – tips for tomato growing”, and so on. Whatever you set the title property to is what Google will likely display as the title in the SERP. Also in the document properties place the title into the “description” field.
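To sanity-check candidate titles against the two rules above (keyword as far left as possible, keyword pieces repeated), here is a rough Python sketch. The suffix-stripping “stemmer” is a crude assumption for illustration, not how Google matches word forms, and the function names are hypothetical.

```python
import re

# Crude, hypothetical stemming rule for illustration only; not Google's.
SUFFIXES = ("ing", "es", "s")


def stem(word: str) -> str:
    """Strip one common English suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stems(text: str) -> list[str]:
    """Lowercase, split into words, and stem each word."""
    return [stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]


def title_report(title: str, keyword: str) -> dict:
    """Where the keyword phrase first starts (word index) and how often its stems recur."""
    words = stems(title)
    target = stems(keyword)
    if not target:
        raise ValueError("keyword has no word characters")
    n = len(target)
    starts = [i for i in range(len(words) - n + 1) if words[i:i + n] == target]
    target_set = set(target)
    return {
        "first_word_index": starts[0] if starts else None,  # 0 = leftmost position
        "phrase_count": len(starts),
        "keyword_stem_hits": sum(1 for w in words if w in target_set),
    }
```

For “Grow tomatoes - tips for tomato growing” and the keyword “grow tomatoes”, the report shows the phrase at word 0 and four stem hits; when comparing candidate titles, prefer a low `first_word_index` and more `keyword_stem_hits`.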
7. Subject Property (i.e. the Meta-Description)
You should put your meta-description into the “Subject” property of your PDF file. I have found a lot of bad advice on the web directing people to use other properties, such as the “Keywords” field, but here is absolute proof that the “Subject” property is the correct one for your meta-description: a screenshot of a SERP result (figure 1), and the properties of the source document (figure 2).
The prosecution rests!
8. Keywords Property
Although Google is not believed to use Meta-Keywords tags from HTML pages, a slight correlation was observed in a study done by academics who worked to reverse engineer Google’s ranking algorithm. It may not help much, but throwing in your keyword and a few variations, separated by commas, certainly won’t hurt and is probably called for.
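The Title, Subject, and Keywords properties from the last three sections all live in one place: the PDF’s document information dictionary, which the file trailer points to through its /Info entry. To show exactly where they end up, here is a minimal sketch that writes a one-page PDF from scratch with only the Python standard library. It illustrates the structure and is not a replacement for Acrobat or Quick PDF Tools; it assumes ASCII-only text (non-ASCII metadata would need UTF-16BE strings with a byte-order mark) and a single line of Helvetica, and the function names are hypothetical.

```python
def pdf_string(text: str) -> bytes:
    """Encode ASCII text as a PDF literal string, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return b"(" + escaped.encode("ascii") + b")"  # raises UnicodeEncodeError on non-ASCII


def build_pdf(body_text: str, title: str, subject: str, keywords: str) -> bytes:
    """Return a one-page PDF whose Info dictionary holds Title, Subject and Keywords."""
    # Page content: draw body_text in 12pt Helvetica near the top-left corner.
    content = b"BT /F1 12 Tf 72 720 Td " + pdf_string(body_text) + b" Tj ET"
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        # Object 6 is the document information dictionary (the metadata).
        b"<< /Title " + pdf_string(title)
        + b" /Subject " + pdf_string(subject)
        + b" /Keywords " + pdf_string(keywords) + b" >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", needed for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R /Info 6 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

If you save the result to a file and open it in Adobe Reader, the three values should appear in the Document Properties dialog, which is the same place you would edit them by hand.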
9. H1 and H2 tags
I would not obsess about adding these, as they only contribute to ranking slightly, but if you want to, you have two options. The MS-Word plug-in mentioned above allows you to save headings as bookmarks (make your H1 tag by applying the “Heading 1” style in the document; then, when you save as PDF, click the “Options” button to select this option). The other way would be to purchase Acrobat Professional. I do not know of any free tools that will allow you to create H1 and H2 tags, but if anyone out there does, please leave a comment below.
10. Other fields
The Author, Comments, and advanced fields such as Copyright Information and so on can generally be ignored for SEO purposes.
11. Linking to individual pages in your PDF
Here’s a neat trick: you can link to a specific page of a PDF (regardless of whether it has any special tags in it) simply by appending #page= and a page number to the URL. For example, a link to www.example.com/squeaky-floor.pdf#page=3 (a placeholder address) would open that file at page 3.
This won’t necessarily help you from an SEO standpoint, but from a navigation standpoint within your site it can be extremely convenient.
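The #page= fragment comes from Adobe’s PDF Open Parameters; Adobe Reader and many browser PDF viewers honor it, though not necessarily every viewer. Because the fragment is never sent to the server, crawlers see the same PDF URL either way, which is why this is a navigation aid rather than an SEO one. If you build these links in code, a small helper (hypothetical name) keeps any query string and replaces an existing fragment:

```python
from urllib.parse import urlsplit, urlunsplit


def pdf_page_link(pdf_url: str, page: int) -> str:
    """Return pdf_url with a "#page=N" fragment, replacing any existing fragment."""
    if page < 1:
        raise ValueError("PDF page numbers start at 1")
    parts = urlsplit(pdf_url)
    return urlunsplit(parts._replace(fragment=f"page={page}"))
```

For example, `pdf_page_link("https://www.example.com/guides/squeaky-floor.pdf", 3)` returns the same URL with `#page=3` appended.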
12. Use PDF files in your internal linking strategy because they PROBABLY pass PageRank
In an interview with Stone Temple Consulting, Matt Cutts implies that links in PDF files do indeed pass PageRank. If PDFs didn’t pass PageRank, Google would lose nothing by saying so; but if they do, saying so outright would give people an incentive to proliferate PDFs (and Google is well known to favor open web formats such as HTML; they probably don’t want to encourage people to embed links in QuickTime videos either 😉). You could argue that since PDF files do not show up in Google Webmaster Tools as sources of links they must not count, but what GWT displays or doesn’t display is a conscious choice on Google’s part (in fact, Google often points out that the backlinks you can find in GWT are not all of your backlinks).
I would not be surprised if a link from a PDF does indeed pass PageRank and is even weighted *more* heavily than the typical link, but I am unaware of anyone in the industry testing this.
Just as with HTML files, however, it is reasonable to assume that the anchor text of links in PDF files counts toward the ranking of the document being linked to, so PDF files should be part of your website’s internal cross-linking strategy.
PDF files are fine to use for SEO purposes; if you only have a few, don’t sweat the details, but if you have a lot of them, putting a standard process in place to optimize these as you create them will be well worth the effort.