24

Jan 11

The Different Types of Formula-based Content

Goddard

There have been comparatively few articles in the mainstream SEO blogs about creating content based on formulas. The only people who seem to cover this topic tend to come from the seedy underside of affiliate marketing, under the term “article spinning”. David Leonhardt’s recent article (from a more reputable source, I think) gives some good examples of what the practice entails.

There are various tools available for “spinning” content, and the methods carry different tradeoffs depending on business goals and approach. Pursuing a formula-based content strategy to attract traffic is often inadvisable because of the probability of detection (and potential ranking penalties), and should be approached only with a high degree of caution.

Google has a near-duplicate detection technology called “Simhash”, which has been written about extensively in the literature (Google has even presented on it openly in the academic community). When Google presents on something and casually talks about applying it to trillions of documents, that usually means it is key to their search capability.

The Simhash algorithm essentially breaks documents into tiny pieces, then compares how many of the pieces are identical, in a way that is highly resistant to fooling: even if you move pieces of sentences around so the documents are in different orders, it can still detect that the two documents are similar. Simhash isn’t perfect, and Google doubtless has additional and more sophisticated algorithms now, but it is instructive. You can’t just change one thing about a document and think “OK, that’s not a duplicate now”; in fact, it’s a near-duplicate and will likely be detected.
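To make the idea concrete, here is a minimal sketch of a Simhash-style fingerprint. It is an illustration only, not Google's implementation: the word-level tokenizer, the use of md5 as a stable hash, and the function names `tokens`, `simhash`, and `hamming` are all assumptions made for this example. Each token "votes" on every bit of the fingerprint, so two documents built from mostly the same pieces tend to end up with fingerprints that differ in only a few bits.

```python
import hashlib
import re


def tokens(text):
    """Lowercase word tokens. Real systems likely use richer, weighted features."""
    return re.findall(r"[a-z0-9']+", text.lower())


def simhash(text, bits=64):
    """Return a `bits`-wide fingerprint built from per-token bit votes."""
    counts = [0] * bits
    for tok in tokens(text):
        # Stable 64-bit hash of the token (md5 is used only so runs are repeatable).
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when more tokens voted 1 than 0.
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


original = "Salsa dancing is fun. It is also great exercise."
reordered = "It is also great exercise. Salsa dancing is fun."
tweaked = "Salsa dancing is fun. It is also good exercise."

# Reordering sentences leaves the set of pieces unchanged, so the distance is 0.
print(hamming(simhash(original), simhash(reordered)))
# Changing one word usually moves only a few bits (the exact count depends on the hashes).
print(hamming(simhash(original), simhash(tweaked)))
```

In a sketch like this, two pages would be flagged as near-duplicates when the Hamming distance falls below some small threshold; the right threshold is a tuning decision and is not specified here.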

In this light, we’ll outline the various approaches to formula-based content generation, and the benefits and shortcomings of each.

Page-level

Definition – Rewriting an entire article (on “salsa dancing”, say) to create, for example, three different versions of it, for use on multiple websites or to refresh a version on one website. This technique is also known as “article rewriting”, and is probably most used by folks submitting articles to “article directories”, usually to obtain a backlink embedded in the article.

Benefits

  • Can survive even a “human review”.
  • Content retains all the value-add for users of the original version.
  • Easy to implement: articles can be rewritten by unsophisticated writers and then corrected by a more expensive editor.

Shortcomings

  • Expensive to rewrite entire articles multiple times.

Paragraph-level

Definition – Creating content with multiple versions of paragraphs that can be swapped in and out or combined to make multiple versions of pages.

Benefits

  • Can mix and match multiple versions of pages to make even more versions.
  • Content retains all the value-add for users of the original version.

Shortcomings

  • If generating many versions, the probability of several versions being very similar grows, still posing some detection problems.  It’s important to note that uniqueness is not enough – pages must be sufficiently dissimilar.
  • Two pages that are 90% the same are probably as likely to be detected as two pages that are 100% the same.
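The point about similarity can be checked with a little counting. The sketch below assumes a hypothetical page made of four paragraph slots with three interchangeable versions of each; the slot names and the helper functions `page_variants` and `shared_fraction` are invented for this illustration.

```python
from itertools import combinations, product

# Hypothetical example: 4 paragraph slots, 3 interchangeable versions of each.
SLOTS = [
    ["intro v1", "intro v2", "intro v3"],
    ["history v1", "history v2", "history v3"],
    ["steps v1", "steps v2", "steps v3"],
    ["closing v1", "closing v2", "closing v3"],
]


def page_variants(slots):
    """Every page that can be assembled by picking one version per slot."""
    return list(product(*slots))


def shared_fraction(page_a, page_b):
    """Fraction of slots in which two pages use the identical paragraph."""
    same = sum(a == b for a, b in zip(page_a, page_b))
    return same / len(page_a)


pages = page_variants(SLOTS)
pairs = list(combinations(pages, 2))
close_pairs = [pair for pair in pairs if shared_fraction(*pair) >= 0.75]

print(len(pages), len(pairs), len(close_pairs))
```

With these assumed numbers there are 81 unique pages and 3,240 pairs, and 324 of those pairs share three of their four paragraphs. Every page is unique, yet each one has eight siblings that are 75% identical to it.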

Sentence-level

Definition – Similar to paragraph-level, but mixing and matching individual sentences to make each paragraph.

Benefits

  • Can create a much larger number of pages with this technique.

Shortcomings

  • Paragraphs are fairly easy to rewrite, and they often need not say exactly the same thing; at the sentence level, it becomes much harder to say the same thing in many different ways. As a result, the writing is more difficult and tends to read much more blandly.
  • If you generate a large number of pages, remember that statistically, the more you make, the more situations will arise where two pages, although not perfect duplicates, are very close near-duplicates.

Sub-sentence-level

Definition – Constructing individual sentences from snippets.  For example:

[Are you interested in|Were you thinking about]
[purchasing a pink elephant using a credit card|making a red bracelet with your own hands]
[for that special someone on their birthday?|for your dear old Auntie for Christmas?].

The various combinations in this case add up to 8 versions of this sentence (2 × 2 × 2).
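As a rough sketch of how a template like the one above can be expanded, the following parses `[option|option]` groups and enumerates every combination. The function name `expand` and the regular expression are assumptions for this illustration rather than the behaviour of any particular spinning tool, and nested brackets are not handled.

```python
import re
from itertools import product

TEMPLATE = (
    "[Are you interested in|Were you thinking about] "
    "[purchasing a pink elephant using a credit card|making a red bracelet with your own hands] "
    "[for that special someone on their birthday?|for your dear old Auntie for Christmas?]"
)


def expand(template):
    """Return every sentence the template can produce."""
    # With one capture group, re.split alternates literal text (even indexes)
    # and the contents of each [..] group (odd indexes).
    parts = re.split(r"\[([^\[\]]*)\]", template)
    choices = [part.split("|") if i % 2 else [part] for i, part in enumerate(parts)]
    return ["".join(combo) for combo in product(*choices)]


versions = expand(TEMPLATE)
print(len(versions))
for sentence in versions:
    print(sentence)
```

The count is simply the product of the number of options in each group, which is why these templates scale so quickly as groups are added.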

Benefits

  • Can create even more versions of content than sentence level.
  • Theoretically more resistant to being classified by a search engine as “duplicate” content.

Shortcomings

  • Value-add for end-users starts to drop rapidly; it’s hard to mix subsentences and still actually say something interesting and useful.
  • No consensus in industry on what threshold of similarity is acceptable.
  • Easy to do the math on how many combinations you are making, but difficult to take into account how many will be what percent similar and so on.
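One way to get a feel for the "what percent similar" question is to score every pair of generated sentences. The sketch below uses Jaccard similarity over word sets, which is only one possible measure and is an assumption here, as are the names `SNIPPETS`, `word_set`, and `jaccard`; nothing in this article says a search engine scores similarity this way.

```python
import re
from itertools import combinations, product

# The three snippet groups from the pink-elephant example.
SNIPPETS = [
    ["Are you interested in", "Were you thinking about"],
    ["purchasing a pink elephant using a credit card",
     "making a red bracelet with your own hands"],
    ["for that special someone on their birthday?",
     "for your dear old Auntie for Christmas?"],
]


def word_set(sentence):
    """Distinct lowercase words in a sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def jaccard(a, b):
    """Shared words divided by all words used by either sentence."""
    words_a, words_b = word_set(a), word_set(b)
    return len(words_a & words_b) / len(words_a | words_b)


sentences = [" ".join(combo) for combo in product(*SNIPPETS)]
scores = sorted(
    (jaccard(a, b) for a, b in combinations(sentences, 2)), reverse=True
)

print(len(sentences), "sentences,", len(scores), "pairs")
print("pairs at or above 50% similar:", sum(score >= 0.5 for score in scores))
```

Pairs that differ in only one snippet score far higher than pairs that differ in all three, so the headline count of combinations overstates how many meaningfully distinct versions exist.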

Word-level

Definition – Going all the way down to having different versions of every word.  Also referred to as “article spinning”; there are a number of programs commercially available that apply synonyms to each word to generate multiple versions of an article, then allow you to edit to make the versions more readable.

Benefits

  • Can create amazing amounts of content from the tiniest bit of actual material.

Shortcomings

  • Extremely difficult to actually say something useful to users; typically results in high bounce rate and lower SERP rankings.
  • Requires extremely high level of expertise in the English language and extreme creativity to even attempt this.
  • Simply substituting synonyms does not work in many cases and requires extensive investment in editing after the content is generated.

This table summarizes the various difficulty levels and other attributes as I see them:

Type                 Difficulty   Value-add to end-user   Scalability   Likelihood of Detection
Page-level           1            10                      2             1
Paragraph-level      2            8                       16            8
Sentence-level       4            6                       256           4
Sub-sentence-level   16           4                       4096          2
Word-level           32           2                       65536         1

Conclusion

The best advice is to keep it simple: stick to page-level, which is also known as “article rewriting”. If you are hell-bent on generating a lot of content with formulas, think about subsentence-level, or even consider a hybrid model that swaps some paragraphs, some sentences, and some subsentence-level content around. If your goal is to keep your content fresh by refreshing titles, meta-descriptions, and so on using formulas, more power to you… but please don’t use formulas to generate tons of useless garbage. Remember, as Peter Parker’s uncle said, with great power comes great responsibility!

5 Comments

  1. Marcus M says:

    Hi. For lot’s of people writing articles can be a stressful job, so I discover this great software to create original content in just few minutes and to spin my articles. This is pretty cool. You can check here:

    http://bit.ly/hIWrLG

  2. Mark F. says:

    Marcus M… could you be a spammer?

  3. Ted Ives says:

    Hmm…I’m going to step out there and say, just maybe!

    Akismet caught Marcus’s comment but I went ahead and approved it anyway, because it’s actually related to the topic, and also for his pure unadulterated moxie in spamming a fellow SEO person’s blog so blatantly!

    Come get your PageRank, all you article spinner software people! 😉

Pingbacks & Trackbacks

  1. Tweets that mention The Different Types of Formula-Based Content | Coconut Headphones -- Topsy.com - Pingback on 2011/01/24