Phrase Based Indexing and Retrieval Spam Detection

Via the Mad Hat, here’s the interesting part from a PaIR system article:

The process takes place both at indexing and retrieval. In essence the document gets its spam score at indexation and then upon retrieval, should that page be included in the results, weighting is then removed and the page is devalued during the ranking process for previously calculated Spam threshold scoring/weighting.

According to the folks that drafted it, a normal related, topical phrase occurrence (or related phrases) is in the order of 8-20 whereas the typical Spam document would contain between 100-1000 related phrases. So by looking for statistical deviations in related phrase occurrences the system can flag an item as Spam. Once again it is mostly for the high end, but a low deviation count can also be used as a flag for low occurrences (which could be compared to the link profile for link spam).

Two things to digest there.

1. The system calculates a spam score at indexing and applies it at retrieval, when the page is devalued during ranking, and
2. Statistical deviation at either the high end or the low end could count as a spam flag.
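
Here's a rough Python sketch of how that two-step process could look. This is my own illustration, not anything taken from the PaIR article: the 8-20 "normal" range comes from the quote above, but the scoring formula, the `IndexedDoc` class and the `index_document` and `rank` functions are all made-up assumptions.

```python
from dataclasses import dataclass

# Assumed bounds, taken from the 8-20 "normal" range quoted above.
NORMAL_LOW = 8
NORMAL_HIGH = 20


@dataclass
class IndexedDoc:
    url: str
    related_phrase_count: int
    spam_score: float = 0.0  # stored at indexing time


def spam_score(related_phrase_count: int) -> float:
    """Return 0.0 inside the normal range, growing with distance outside it.

    The formula is a hypothetical stand-in for whatever deviation
    measure a real system would use.
    """
    if related_phrase_count > NORMAL_HIGH:
        # High-end deviation: multiples of the normal ceiling, minus one.
        return related_phrase_count / NORMAL_HIGH - 1.0
    if related_phrase_count < NORMAL_LOW:
        # Low-end deviation: fraction of the normal floor that is missing.
        return (NORMAL_LOW - related_phrase_count) / NORMAL_LOW
    return 0.0


def index_document(url: str, related_phrase_count: int) -> IndexedDoc:
    """Indexing step: compute the spam score once and store it."""
    return IndexedDoc(url, related_phrase_count, spam_score(related_phrase_count))


def rank(docs: list[IndexedDoc], relevance: dict[str, float]) -> list[str]:
    """Retrieval step: devalue each doc's relevance by its stored score."""
    adjusted = {d.url: relevance[d.url] / (1.0 + d.spam_score) for d in docs}
    return sorted(adjusted, key=adjusted.get, reverse=True)
```

With these made-up numbers, a page with 400 related phrases has its relevance divided by 20 at retrieval, so it can drop below a clean page even if its raw relevance was five times higher.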

Of course, the only reason spammy docs sometimes have up to 100 times the related phrase density of a non-spammy page is because this behavior continues to be rewarded in the SERPs. Even if the spam flag is raised and the site eventually banned, classic keyword and related phrase stuffing continues to rank in the SERPs.

2007 Key to Success: You are the CEO

Here’s my 2007 prediction:

You will make more money in 2007 on the Net than you have ever made in the past.


How? By changing your perspective. This year, you will see yourself and act like the CEO of your own Virtual Real Estate Company.

Too often, you tackle problems as a web designer, an SEO, a programmer, a geek, a blogger, a content creator, a designer or a hacker. This is where you go wrong.

If you want to make the big bucks in 2007, you will have to see yourself as, and become, the CEO / President of your Internet Company. You are the Boss, the manager, the deal maker, and the head honcho.

As the President, your time is extremely valuable. As such, you need to automate every task that can be automated. The particular tasks will vary from company to company, but repetitive operations must be automated to free up more time for you to run your company. Throughout the day, you will be asking yourself “How can this task be automated?”

Every task that can be performed cheaply by someone else should be. How much time do you waste per day on things that could be performed by someone for $500 per month? Stop and think about it: If you are performing a task that can be done by someone who normally makes $20 per day, how much money can you expect to make doing that same task? Create jobs, get employees, get contractors and free up your time for doing what needs to be done: running your virtual real estate company. You need low cost lackeys to do your low brainwork tasks. With every task, you will ask yourself “Can I outsource this task to someone else for less money than my time is worth?”

Unless you are heavy into PPC, you should not be checking your stats more than once per week. It’s a waste of your time. Let’s face it, if you got 46,000 uniques on one of your sites one day, 45,500 the next day and 47,000 the next, that doesn’t really matter. As the CEO, you don’t have time to waste looking at stats that simply don’t matter. Unless you are going to change your basic strategy, you are throwing away hours per week that could be spent on outsourcing, automation, and advertising.

This year you will focus your time on things that make you money. Throughout the day, you will be asking yourself “Is the task I’m doing right now going to make me more money?” If it’s not, you will think about how to eliminate that task from your routine to increase your productivity.

Finally, this year you will make a greater effort to surround yourself with winners. You will develop a network of like-minded individuals around you that know how to make money on the Web. You will have less time for people who waste your time and will make a greater effort to network with people who “get it.” Unlike in years past, this year you will actually make the effort to stay in contact with, do business with, and associate yourself with winners.

That’s how I know that 2007 will be your most profitable year ever.


The holidays are coming up, along with the end of the fiscal tax year. That’s two great reasons to give money to charity rolled into one.

Here’s what I propose:

1. We need to pick a charity. Please suggest something good in the comment section. It should be some credible and reasonably well known charity that helps poor people. I really don’t know who should be the beneficiary, but it should be some people that are poor: Really poor, like even more impoverished than white hats 😉

2. To make sure the funds actually get to the charity (and don’t get pocketed by, say, me), I would ask that an ambassador from Google (does nofollow even work?) set up an adsense account and send the funds directly to the charity we pick. Then just let us know what the adsense code is.

3. I invite all Bloggers and SEOs to put this adsense code on their blogs or on a few of their sites for anywhere from a day to a week between December 24th and 31st.

4. No click fraud.

Let’s face it, most of us are very fortunate to be in the line of work we are in. If a few lines of code on our blogs during the holidays can help out, we should do it. What do you say?

Google Funding Terrorists

Do you recall when the “funding terrorist” campaign was targeting Marijuana, online gambling and Porn? Almost everyone was happy to stand by and let that bullshit be spread around without questioning it – even many of those who understood that this was nothing but political scare tactics.

Who cares what the ignorant masses believe? It’s not your business – right? No need to stand up for intellectual honesty. That whole “slippery slope” argument has no merit and those same tactics couldn’t be used against a real company – Right?

Well, it now appears that those same sheeple are looking at a new target: the “Google Funding Terrorists” connection.

So for those of you that haven’t figured it out, the whole “funding terrorist” thing is total bullshit. If you buy into it, you’re a fucktard. When I wrote the Poker Ban Would Fund Terrorists post, I did so because Frist is just the kind of asshole that would play the “terrorist funding” card for anything; just like when Gonzales said that illicit businesses (copyright infringement) are used, “quite frankly, to fund terrorism.”

You know how much it cost to take down the World Trade Center, put a hole in the Pentagon, and take down 4 commercial airliners? About $50,000. You know how much Bin Laden had in his personal fortune at the time? About $600,000,000. To put this in perspective, with the personal funding of just that one guy, they have enough funding for 12,000 similar attacks.

The Terrorists don’t Need Funding – They’ve already got it.

Is Google adsense being used to fund terrorists? Yes. All someone has to do is set up a damn adsense account like anyone else and send traffic to it. The argument is just as valid as saying that buying Marijuana or playing poker funds terrorism; it’s not valid at all.

So the next time someone says that pot, poker, Google, or Porn is being used to fund terrorism, knee them in the groin, sweep their legs out and start kicking them in the head until they stop acting like such a fucktard.

Do you Brandverb?

For a long time, we have not had a term for when a company’s brand becomes synonymous with an action. Today we do: brandverb.

The brandverb you will be most familiar with is Google. When someone says “Google it” or “I got the answer by Googling it” they are using Google as a brandverb. As you can see, the brand “Google” is being used to replace the verb “search,” becoming a brandverb.

Yesterday, I used Jetblue as a brandverb. While contemplating going out to Los Angeles for New Years, I said “Maybe I’ll Jetblue out there.” Once again, the brand replaced a verb (fly) and became a brandverb. Here is how the term brandverb was born:

[23:49] QuadsZilla: my flight to RIO is on the 4th from JFK
[23:50] QuadsZilla: maybe i’ll jetblue out there
[23:50] QuadsZilla: for like 3 days or something
[23:50] QuadsZilla: if that’s cool
[23:50] Jeff Random: I just made up a word based on your word
[23:50] Jeff Random: brandverb
[23:50] Jeff Random: example jetblue out there
[23:50] Jeff Random: aka to google
[23:50] QuadsZilla: nice
[23:50] QuadsZilla: i like it
[23:50] QuadsZilla: i’ll blog it

If you are going to photocopy something and say “I’ll Xerox it” you have used a brandverb. If you want your maid to vacuum your bachelor pad and ask her to “Hoover the place” you have used a brandverb. When you advise your grandmother to get rid of some junk and say “Ebay that crap, some idiot will buy it” – you have brandverbed ebay.

Want an important signed document delivered to finalize a deal? Fedex it over – you sexy brandverber!

By understanding the power of brandverbing you will become a better marketer. If you are thinking in terms of “How can I brandverb my company?”, your marketing will have a longer term impact, especially if your brand becomes ingrained in the culture.

Do you brandverb?

Google and “Spammy Requests”

This morning I did a search on Google for: photoshop class

The first results included tons of sites that did not have .edu in the URL. Strange. So I figured maybe I needed to troubleshoot a little, and after changing the language preferences from Portuguese to English and the number of results to 100, I got this crap:

Spammy Request from Google

We’re sorry…

… but your query looks similar to automated requests from a computer virus or spyware application. To protect our users, we can’t process your request right now.

We’ll restore your access as quickly as possible, so try again soon. In the meantime, if you suspect that your computer or network has been infected, you might want to run a virus checker or spyware remover to make sure that your systems are free of viruses and other spurious software.

We apologize for the inconvenience, and hope we’ll see you again on Google.

The thing is, this wasn’t an automated query! It was just me typing on my keyboard and clicking with my mouse. I know, I know – Only an evil black hat spammer could want photoshop classes ONLY from edu sites. Clearly there can be no legitimate use for this query.

Better sound the alarm and tell me that I probably have a virus or spyware!

We know that in the past Google decided to cripple its backlink checker to reduce “spammy requests.” Now they are going to cripple their inurl command?

I have this feeling that some day I’ll be sitting with my (still-to-be-conceived) son and he’ll turn to me and say:

“Dad, Tell me about back in the day when Google didn’t suck.”

What’s your Opinion on the “Open Source Making Money List?”

There have been quite a few great entries in the “Best SEO Article of 2006” list.

It’s nice to be exposed to great articles from bloggers you might not otherwise have known about: a bunch of them are new additions to my RSS reader.

What is your opinion on the articles that have been submitted? And what do you think of the idea of an Open Source “List” Article?

Please post your opinions and comments in the Digg Thread and submit your best article from 2006 in the earlier thread.

The Best SEO Articles of 2006

Do you blog about SEO or Making Money Online?

If so, here is your chance to get more links and more readers. Submit the best article you wrote this year in the comment section in the Format of:

[Hyperlinked title] [0-200 character description].

Comments not in this format imply that you have poor reading comprehension skills and/or cannot follow instructions.

Remember, this is your BEST post for 2006 about Search Engine Optimization or Making Money Online.

People will read what you submit here and judge your worth as a blogger. If it’s a great article, you will get links and more readers.

Limit one submission per person.

Think of it as an Open Source “Best of” List that you’re all invited to help create.

Disruption Causes Opportunity in the Marketplace

It’s not just adult. This past weekend represented a huge shift in Google. This Disruption has created opportunity in the marketplace.

Learn what has changed, figure out what is happening and adapt. These disruptions may seem bad at first, but if you’re ahead of the curve you should be able to ride these changes all the way to the bank.

I haven’t seen the outside world in days . . . this is like an ever shifting video game on a global scale.

I’m sure you’ve heard the Robert F. Kennedy quote:

“There is a Chinese curse which says, “May he live in interesting times.” Like it or not, we live in interesting times…”

Funny thing is, that was just more hoax marketing. The Chinese never said that.

Change is good.

Free Porn on a Porn Free Google

If you Google for porn today, you may not find it.

That’s because this past weekend saw a huge shake up in the online porn world. A change in the Google algorithm caused almost all TGPs to fall out of the top results for single-word porn search phrases.

If you Google for any of the following words (as of this writing), you won’t find much free porn in the top ten:

Last Monday:
[Image: Pussy Last Monday]

The last 10 days:
[Image: Last 10 Days]

Yesterday (after the weekend update):

The lesson for everyone here is this: Don’t focus on just a few competitive keywords for your long term sites. If you have plenty of long tail combinations, Google twitches that look like they could kill you may actually increase your traffic.

Update: The “Porn Free” filters in Google seem to have reverted partially to where they were.