Last Thursday, the boys at the ’plex announced that they would be releasing 10 gazillion keywords for statistical analysis and other research. That perked my ears up right away. We love large data sets because they are the cornerstone of building massive spam sites, er, targeted niche aggregators.
The fine print is that you have to jump through some hoops to get the data - details are yet to be released, but you will likely have to be a member of the L.B.C.

“So tell me wuts up wit dis LBC thang?”
Wait . . . make that the LDC, the Linguistic Data Consortium. Their annual membership is $20k and they sometimes make you pay more for certain data sets.
The almost invisible print was pointed out by graywolf and confirmed by Matt Cutts in this Threadwatch discussion:
When people sell a mailing list it’s extremely common for sellers to seed the list with some names that only exist for the purpose of catching people who are misusing it. I would have to assume the boys and girls at the plex would do the same. - graywolf
graywolf, you have a devious, devious mind. How many other people would consider seeding the terms with some nonsense phrases? I ask you–how many other people would come up with an idea like that?
Well, I guess I can think of a couple people.. - Matt Cutts
graywolf, yes you should take it as a compliment. Not to worry, I’m familiar with the practice. My favorite is Lye Close, the fake street in London: http://wiki.openstreetmap.org/index.php/Copyright_Easter_Eggs
billhartzer, sshhh. I was just watching boogybonbon find out about “google monitor query or googletestad” today. Don’t ruin the fun. - Matt Cutts
(Matt is referring to boogybonbon’s post on keyword research.)

That’s right, it’s a trap.
We know about poisoning, er, seasoning keyword lists - in fact, sometimes we’ll do it ourselves. However, this exchange confirms what a few of us have been thinking all along - that the search engines are on to this tactic and use it as well.
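For anyone who hasn't seeded a list before, here's a rough sketch of how the trick works. This is just one way to do it, assuming you make up your own nonsense "canary" phrases. The function names and the syllable scheme are hypothetical, not anything Google or the LDC has said they use. You mix the canaries into the list you hand out, then later check whether any of them turn up on someone else's pages.

```python
import secrets

# Hypothetical syllables used to build phrases nobody would ever search for.
SYLLABLES = ["zor", "bly", "quan", "tesh", "vop", "mrik", "dax", "ulf"]


def make_canaries(n, words_per_phrase=2):
    """Generate n distinct nonsense phrases to plant in a keyword list."""
    canaries = set()
    while len(canaries) < n:
        phrase = " ".join(
            "".join(secrets.choice(SYLLABLES) for _ in range(3))
            for _ in range(words_per_phrase)
        )
        canaries.add(phrase)
    return sorted(canaries)


def seed_list(keywords, canaries):
    """Mix the canaries into the real keywords and shuffle so they don't stand out."""
    seeded = list(keywords) + list(canaries)
    secrets.SystemRandom().shuffle(seeded)
    return seeded


def caught(published_text, canaries):
    """Return every canary that appears in someone else's published text."""
    text = published_text.lower()
    return [c for c in canaries if c in text]


if __name__ == "__main__":
    canaries = make_canaries(3)
    seeded = seed_list(["cheap flights", "blue widgets"], canaries)
    print(seeded)
    # Pretend a scraped page contains one of our planted phrases.
    suspect_page = "best blue widgets and " + canaries[0] + " deals"
    print(caught(suspect_page, canaries))
```

If a phrase that only exists in your list shows up on a scraper's site, you know exactly where they got their data. The engines can do the same with their keyword dumps.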
Are you using wordcatcher, Overture, the Google keyword suggestor, or any data directly from the search engines? There’s a good chance that it could be a trap. If you’re using poisoned data, that could explain why your sites are only lasting 6-9 weeks in the SERPs.
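One way to hedge your bets is to cross-check engine-supplied keywords against your own data. Below is a minimal sketch, assuming you have keywords pulled from your own referrer logs. The suspect list only contains the two phrases Matt name-dropped in the thread. There's no guarantee those are real seeds or the only ones, so treat the list as a placeholder you'd grow yourself. Everything else, including the function names, is hypothetical.

```python
# Placeholder blocklist: the two phrases mentioned in the Threadwatch thread.
SUSPECT_TERMS = {"google monitor query", "googletestad"}


def normalize(kw):
    """Lowercase and collapse whitespace so comparisons are consistent."""
    return " ".join(kw.lower().split())


def clean_keywords(engine_keywords, own_log_keywords, suspects=SUSPECT_TERMS):
    """Split engine keywords into (confirmed, unverified) lists.

    Anything containing a suspect term is dropped entirely. A keyword is
    'confirmed' only if it also shows up in your own referrer logs.
    """
    seen = {normalize(k) for k in own_log_keywords}
    bad = {normalize(s) for s in suspects}
    confirmed, unverified = [], []
    for kw in engine_keywords:
        k = normalize(kw)
        if any(b in k for b in bad):
            continue  # possible planted term: don't build pages on it
        (confirmed if k in seen else unverified).append(k)
    return confirmed, unverified


if __name__ == "__main__":
    engine = ["Blue Widgets", "googletestad", "red  widgets"]
    mine = ["blue widgets"]
    print(clean_keywords(engine, mine))
```

The "unverified" bucket isn't necessarily poisoned. It's just data you're taking on the engine's word, so you might build on it more cautiously.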
Understanding this kinda puts a damper on the 400+ meg file (update: mirror with data) that contains all the AOL searches of 500k users for the last 3 months.
“Jacta alea est!” (“The die is cast!”) - Julius Caesar
It’s a war. Develop your own supply lines so you don’t have to get food from the enemy.