Reteep Asked:
“How much of a problem is the duplicate content stuff for the bottom layer [of] autogenerated sites? Does it matter?”
Duplicate content is one of the great bogeymen of SEO. So many people are scared of it; so many people are worried that if they have duplicate content, their site will face a ranking penalty. Is it true? Does Google penalize you for having duplicate content? If so, how much? How can duplicate content hurt you? How can it help you? Well, sit back, because although I’m sure it’s been answered before, I’m gonna give you the straight dope on duplicate content and Google.
First off: what do we mean by “duplicate content”? Duplicate content means that the text of one web page matches another page’s. The match does not need to be 100% for the content to be considered duplicate. It can be less than 50% and still count as duplicate, especially if various chunks of the content can be found on other pages.
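To make "partial match" concrete, here is a minimal sketch of one common way to measure text overlap: word shingles compared with Jaccard similarity and containment. This is only an illustration, not a description of how Google detects duplicates; the function names and the shingle size `k` are my own assumptions.

```python
import re

def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Text shorter than one window: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b, k=5):
    """Shared shingles divided by all distinct shingles across both texts."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def containment(a, b, k=5):
    """Fraction of a's shingles that also appear in b."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa:
        return 0.0
    return len(sa & sb) / len(sa)
```

Containment is the interesting number for the "chunks found on other pages" case: a short excerpt lifted from a long article has a low Jaccard score against that article, but a containment close to 1.0.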
For example: any site that runs AP stories will have heaps of duplicate content. Google doesn’t penalize the news site for running the stories, but unlike last year, all (or almost all) of the AP stories are now hosted on Google. Examples:
http://www.google.com/hostednews/ap/article/ALeqM5g8-DEMtAE9q4i4ySQ0eV_qZefmRQD99D0RV80
http://www.google.com/hostednews/ap/article/ALeqM5iwJhPuY4ndVAdfJgwbiS3uh7uIGgD99CJOBO0
Interestingly, when I google “AP Interview: Hayden denies Congress not informed”, the top 10 results include Yahoo at number 1 and Google’s own story. Long term, expect Google to put itself at number 1 for all these types of queries.
The benefit to running an AP story is that if you can rank for the query (like Yahoo did), you can get search traffic. Plus there’s a chance that when those stories run on XYZ newspaper site (or your site), the story will get picked up by Slashdot, or Digg, or Fark, or whatever, and receive a few hundred links. The only negative for the newspaper sites is that linking internally to these stories dilutes their internal link juice.
Clearly, Google does not penalize trusted sites for having duplicate content. In almost all cases, your penalty in Google is NOT because you have duplicate content. 99.9% of the time, the problem is somewhere else.
The way that duplicate content can hurt you is if you have multiple copies of the same story on your own website. You will not get a “penalty” from Google, but you will dilute your internal link juice, and you could potentially split any natural inbound links (and therefore ranking power) among all the copies of the page. This type of dilution could take an item that would have ranked and banish it to obscurity.
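If you want to check your own site for this, a rough sketch could look like the following. It only catches exact copies (after normalizing case and whitespace), and the page data, URLs, and link counts you feed it are hypothetical inputs you would have to collect yourself. Treating inbound links as simply additive is also a simplification.

```python
import hashlib
from collections import defaultdict

def fingerprint(text):
    """Hash the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def group_copies(pages):
    """pages maps URL -> page text. Return lists of URLs whose text is identical."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    # Only groups with more than one URL are duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]

def link_split(copies, inbound_links):
    """Return (links on the best single copy, links if all copies were one page)."""
    counts = [inbound_links.get(url, 0) for url in copies]
    return max(counts), sum(counts)

# Hypothetical example: the same story published under two URLs.
pages = {
    "/news/story": "Same story text",
    "/archive/story": "same  story TEXT",
    "/about": "Something else entirely",
}
inbound = {"/news/story": 6, "/archive/story": 4}
for copies in group_copies(pages):
    best, combined = link_split(copies, inbound)
    print(copies, "best copy:", best, "combined:", combined)
```

The gap between the best single copy and the combined figure is the ranking power being left on the table by splitting links across copies.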
The same applies to spam sites. Unless your site is screaming “I AM A SPAM SITE”, the duplicate content penalty is not gonna hit. And since you would have gotten hit anyway by that human reviewer who just marked your site (and network, if you weren’t careful) as spam, we can safely say that there is no official duplicate content penalty at Google.
Fire away with more questions; don’t worry, I intend to tackle some of the other ones that were asked this weekend later this week.