Archive for the ‘Search Engine Spam’ Category

Want Google’s Trust? Here’s How to Buy It.

Greg (AKA WebGuerrilla) notes something that we have noticed in our own experiments:

What is in question is whether or not the age of the domain determines the degree to which the filtering is applied. IMO, it clearly does. If you move any new site that won’t rank to an older address, and then 301 all the links established to the original site, it will show up in Google in less than 10 days.

What does this mean for today’s search engine spammer? It means the days of buying new domains and ranking well in Google quickly are already over. IMHO, MSN and Yahoo will follow suit within the next 18 months.

So I guess it’s time to stop search engine spamming and throw in the towel, right?

Not at all.

If Google trusts older domains with aged links, then that’s where you have to host your “Spam Site 2.0.” If you don’t own any older websites, you have to go out and buy them.

Most people don’t yet understand the value of their 5-year-old website. If someone is making $150 a month with a 6-year-old, 300-page site, think what you could do with that same domain and a 300,000-page site. Just be careful not to add that many pages over too short a period of time, or it will raise a red flag.

Sidenote: As I write this, I’m thinking “why the hell am I telling you this?” Maybe it’s altitude sickness! I took a cable car ride today in Merida: 12.5 km (the longest cable car in the world), reaching an altitude of 4765 m (the highest cable car in the world). This altitude is higher than any point in Europe or in the USA excluding Alaska.

So what should you look at when buying an aged domain to gain Google’s trust?

First, you can use the Alexa toolbar to quickly check the “online since” date.

Then you use some of the free backlink and keyword analysis tools to check (among other things) how many IP addresses link to the domain and who exactly is behind those links.

You can then use the Wayback Machine to see what the website you are buying looked like years ago. If you are really ambitious, you can guesstimate the age of some of the backlinks by checking the archives of some of the seemingly more important inbound links.

Don’t do too much research on any one domain until you’re sure you have a serious seller.

I don’t think there’s a term yet for buying an aged domain and throwing up a spam site in the background. Want to coin the term? Here’s your chance. (Where is Mark Cuban when you need him?)

Predictive SEO

Graywolf published a short essay on an interesting SEO practice he calls Predictive SEO. Basically, try to anticipate what the future of a topic relevant to your activity might hold, and prepare the search engines for it.

For instance, if baseball is your game and you want to be ready for web traffic in 2007:

1. Build pages with titles such as “New York Yankees 2007 World Series Champions”, “New York Mets 2007 World Series Champions” and so on for every team name.

2. Put them on a mini site map page on a well-established domain to make sure they get spidered, and eventually give them some relevant content.

As the season goes by and the potential winners become easier to foresee, you will already have spidered pages ready for real content and real traffic.

Predictive SEO can apply to any subject, as long as you have a manageable number of events likely to happen. Graywolf offers a few suggestions, among them hurricane names, which I spent a few minutes on as a live example.

-Ozh

If you would like to publish a story on SEO Black Hat and grab a free link to your site, as Ozh just did, then register and send in an article.

Brand New “Newsmaster Site” Gets Slashdotted

I am so jealous.

This story on Brainblog got slashdotted.

Create a “Newsmaster site” on Nov 18th and get slashdotted on the 23rd. That does not suck!

This newsmaster site was created with grouper, which I bought about a month ago but haven’t gotten around to testing yet.

Wikimedia Dump Service

The following dumps are available:

special | wikibooks | wikinews | wikipedia | wikiquote | wiktionary | images

At the root of each Wikimedia project dump, you will find a listing of languages and the md5 sum for all files available in the tree. Each language directory contains these XML dumps:

pages_current.xml.gz – Link to the latest current pages dump
all_titles_in_ns0.gz – All titles in the main namespace
pages_full.xml.gz – Link to the latest full pages dump
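Since the dump root lists an md5 sum for every file, it is worth checking a download before feeding it to anything. Here is a minimal sketch of how that check might look. The function names are my own, and how you read the published sum out of the listing is left to you, since the listing format isn't described here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that a multi-gigabyte dump never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_sum(path, published_md5):
    """Compare a local file's MD5 with the sum published at the dump root.
    The published value is normalized (whitespace, case) before comparing."""
    return md5_of_file(path) == published_md5.strip().lower()
```

You would call `matches_published_sum("pages_current.xml.gz", published)` with whatever sum the listing gives for that file; a `False` means the download is truncated or corrupted.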

Now what could a search engine spammer possibly do with these dumps and a Markov chain?

Well, with the language feed and Markov, you can now spam the search engines in tongues you don’t even speak. With Markov, keep in mind that you need an input-to-output ratio of about 300% or more for optimal articles.

Link Spam Detection Based on Mass Estimation

Aaron Wall takes a scrupulous look at TrustRank and link spam detection by analyzing the recent Stanford (Google alumni connections?) paper: Link Spam Detection Based on Mass Estimation.

I could rewrite his analysis, but why bother when he nailed it?

Borrowing Link Bait

Today Boing Boing illustrated that one can generate thousands of uniques by submitting “borrowed” link bait.

topbit: Just spotted this on BoingBoing, and while I love this piece, I think it only fair to link to the original writer and his page:
http://www.ernestcline.com/spokenword/npa.htm

Clearly, if you give a link to the original source, you’re not going to get a link from Boing Boing. Obviously, posting someone else’s work without adding ANY comment or value is spam. Kudos Ghanimachan on correctly “borrowing” some link bait!

So my fellow SEO black hats, what’s the coolest or best link bait you ever stole “borrowed”?

Create Inbound Links from Authority Sites with Exploit

This threadwatch discussion talks about a more advanced way of making authority sites link to you than simply trying to get the Rojo or Google results for your site indexed.

1. A series of pages is created on a domain, say www.mylittlewebsite.com, and the links point to a search request on one of these sites . .
2. Notice the formatting using hex code. When surrounded by a standard HREF tag, this translates the link properly when the request is made to the authority website’s POST for search, and the result is properly translated into basic HTML. This is a clever coding exploit: the format ensures the request is properly formatted in basic HTML.
3. Obviously, the request is a negative search result on the authority website; however, site searches in particular will cache all results of local searches, successful or otherwise.
4. If these search results are spiderable content, then a robot such as Googlebot will view the cached results and see inbound links from a high-profile authority site pointing to the domain in question.

Sometimes hex is not required. You just enter the tags the same as if you were coding HTML, but into the search field of a site with the vulnerability. Other times, a hex converter can come in handy.

I have seen instances that include JavaScript and other elements. The Red Cross search results (long URL) page is a PR 0, but I’ve found up to a PR 6 (someone on TW said they had a 7). I picked the Red Cross as an example to hopefully encourage donations.

All I had to do was dig around for a bit to come up with a healthy list. If any registered SEO Black Hat readers would like a few more examples, just drop a comment or e-mail me.

Update: Sites with HTML injection vulnerabilities are now available only to members of the SEO Black Hat Forum.

The Long Tail of a Black Hat

Chris Anderson’s Long Tail theory:

our culture and economy is increasingly shifting away from a focus on a relatively small number of “hits” (mainstream products and markets) at the head of the demand curve and toward a huge number of niches in the tail.

Kevin Marks looks further down the tail and notes:

A true long tail business is one that copes with the ultimate niches – where there are just one, or even zero customers. You need to be sure that your submission model can cope with these limiting cases and not choke, especially as you do not know a priori which ones are going to garner customers. So, what businesses fit this model?

Answer: Black Hat Search Engine Optimization (AKA Search Engine Spamming).

Many SEO Black Hats create millions of pages over hundreds of domains using some form of automated website creation software. Each page is optimized for an obscure keyword phrase – perhaps one that is only searched for 50 times per month.

Just one of a Search Engine Spammer’s spam sites could have 30,000 pages. This site might receive just one unique visitor per day on up to 10% of its pages. Of those 3,000 visitors, if only 2% (60) click on a 10-cent AdSense ad, the site would generate $6.00 per day, or $180 per month.
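To make the arithmetic easy to replay with your own numbers, here is a small sketch. The function and parameter names are mine, and the inputs are simply the figures from the paragraph above (30,000 pages, 10% of pages visited, 2% click rate, 10 cents per click, a 30-day month). It works in whole percentages and cents to keep the math exact.

```python
def estimate_revenue(pages, visited_percent, click_percent, cents_per_click, days=30):
    """Back-of-the-envelope revenue for one site.

    Assumes one visitor per day on `visited_percent` of the pages,
    `click_percent` of those visitors clicking an ad, and a flat
    payout of `cents_per_click` per click.
    """
    visitors_per_day = pages * visited_percent // 100
    clicks_per_day = visitors_per_day * click_percent // 100
    cents_per_day = clicks_per_day * cents_per_click
    return {
        "visitors_per_day": visitors_per_day,
        "clicks_per_day": clicks_per_day,
        "dollars_per_day": cents_per_day / 100,
        "dollars_per_month": cents_per_day * days / 100,
    }


site = estimate_revenue(pages=30_000, visited_percent=10, click_percent=2, cents_per_click=10)
print(site)  # 3000 visitors, 60 clicks, 6.0 dollars a day, 180.0 dollars a month
```

Multiply `dollars_per_month` by the number of sites to see why nobody stops at one.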

So does the smart search engine spammer stop at one or five sites? Of course not. A true SEO Black Hat has hundreds (and in many cases thousands) of these sites operating at once.

As a black hat automates more steps in the site creation process, the time investment decreases dramatically. Recently, I spoke with a Search Engine Spammer who told me that he has automated EVERY step in the process from domain registration to Keyword selection, to new CSS design, to splog indexing. He claimed he can create and index a 50,000+ page website with less than a minute’s worth of incremental keystrokes.

As the cost of page creation, hosting and advertising (production and distribution) approaches zero, the number of customers required for a profitable search engine spamming business drops to much less than one per page (or micro-niche).

Search engine spamming is a business that “copes with the ultimate niches – where there are just one, or even zero customers.” It will therefore be interesting to see if Anderson devotes a few pages to Search Engine Spammers in his upcoming Long Tail Book.

Damn The Spies . . . Full Speed Ahead!

Last weekend, I floated the idea of a Black Hat SEO wiki and received several interesting responses.

First of all, it was strange having some of you ask me to charge for the wiki. Users chanting “Please take our money” is something most webmasters would kill for. So, thank you. But I don’t want your money – not yet anyway.

Some of you are concerned that discussing these ideas will “tip off the search engines” to what we are doing.

Not to burst your bubble, but

1) There is no Easter Bunny.

2) The Search Engines already know.

They hire the best and the brightest: every PhD out there, Ivy Leaguers, former NSA agents, and even Al Gore, that guy who invented the Internet.

And I happen to know for a fact that some of the best search engine spammers currently work for Google. So trust me, the search engines are not coming here to learn anything new.

Fortunately, we don’t need to outsmart the Google engineers; we just need to outmaneuver the best algorithm the Google engineers can come up with.

I do agree that it should be by invitation only, with the ip addresses logged for tracking purposes to keep potential spies out. -joker

Somehow I think that the 50 or so ex-NSA spooks working at Google just might be able to figure out a way to infiltrate our community. I’m assuming there are spies, but that’s just part of the game.

Others of you are concerned that making black hat SEO methods openly available will dilute their effectiveness. Yes, it will dilute the effectiveness of sloppy spam. But, considering that page jacking exploits were used prior to 1999, and widely known by 2003 before being actually used on Google AdSense this year, I don’t think we have too much to worry about.

Plus, an informed, collaborative black hat community will be able to foresee, adapt to, and possibly influence search engine algorithm evolutions.

For that, we need a common base of knowledge. We need to agree on terminology and concepts.

Enter the SEO Black Hat Wiki.

First things first. We need to decide which wiki engine to use. MediaWiki is an obvious choice, as it’s behind Wikipedia and well supported. But TikiWiki seems simpler and a bit hotter. Any expert advice on which engine to use?

Let me know if you would like to help steer the wiki, be editor/moderator, or simply contribute.

Please leave me comments/trackbacks on this. The level of interest, or lack of it, will determine if the black hat wiki ever gets off the ground.

Avoid Common Splogging Mistakes

Jean Véronis’ excellent post, “Google, Blogger and splogs,” explores some of the ways splogs can be algorithmically detected.

He shows how many splogs are so blatant that they are easily spotted:

[a poor splog] repeats the same words over and over again, so its vocabulary is much poorer than you would expect to find on a normal blog.

and

the distribution of outgoing links needs to be taken into account. If most of them point to the same site, something’s probably up. The number of incoming links is also an indicator: if there are a whole lot of them, and they come from very diverse sites, it is undoubtedly not a blog.
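The two signals quoted above can be sketched roughly as follows. This is my own stand-in, not Véronis’ actual method: vocabulary poverty is approximated with a plain type-token ratio (distinct words divided by total words), and link concentration with the share of outbound links going to the single most-linked host. The function names are made up for the example, and no thresholds are implied.

```python
import re
from collections import Counter
from urllib.parse import urlparse


def vocabulary_richness(text):
    """Type-token ratio: distinct words / total words.
    Low values mean the same words repeat over and over.
    Note the ratio drifts down as texts get longer, so only
    compare texts of similar length."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return len(set(words)) / len(words)


def top_outbound_share(urls):
    """Fraction of outbound links that point to the most-linked host.
    A value near 1.0 means nearly every link goes to the same site."""
    hosts = [urlparse(url).netloc.lower() for url in urls]
    hosts = [host for host in hosts if host]  # skip relative or malformed links
    if not hosts:
        return 0.0
    _, count = Counter(hosts).most_common(1)[0]
    return count / len(hosts)
```

Run both over a page and compare the numbers with those of an ordinary blog of similar length; a big gap on either measure is exactly the kind of anomaly the quotes describe.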

However, his conclusion:

It seems to me to be difficult to draw the line between sites which are worthless, useless or commercial (but nonetheless legitimate) on the one hand, and splogs on the other.

echoes the conclusion that SEO black hats need to make search engine spam appear organic. Don’t fall into the common pitfalls: vary your outbound links, link to authorities, use intelligent computer-generated content.

Your splogs should not be easily spotted by a computer because of glaring statistical anomalies.