May 17, 2007

LSI is total nonsense! Hooray!

Hi folks,

This is in response to a well written anti-LSI article written by Michael Duz:

LSI is total nonsense! Hooray!

I would like to thank Michael Duz on his artful dismantling of Latent Semantic Indexing and all of its related theories.

“So what’s the bottom line for the LSI myth? If you hear or read an SEO talking about the importance of LSI in search engine optimization then you can be sure they haven’t a clue what they are talking about”.

I agree with Michael on some points. However, I believe his article cannot see the forest for the trees, which often happens when someone sets out to disprove something for the sake of disproving it.

Ok, so the “LSI” buzz that everyone has been talking about is NOT technical LSI or LSA. (We have discussed this already on many occasions).  

But what he writes does not change the fact that Google DOES have five patents pending on co-occurrence and phrase-based retrieval, which is an even STRONGER way to attain the results that LSI was supposed to deliver, with less processing power. (Processing cost is the whole problem with LSI, as we have also discussed before.)

His comments on the use of CIRCA and Applied Semantics by Google need a "citation" similar to the one required in a Wikipedia entry. I would love to see his sources on this. Does he really think that CIRCA is the only thing Google pulled out of the Applied Semantics vault when they slapped down the cash to buy the company? That would be interesting to know. I think he is only seeing two fingers of the hand. We will discuss this more when we get to our experiments on PPC versus natural indexing below.

What is CIRCA (now probably advanced to the point of including co-occurrence technology)?

From Google:

“Applied Semantics' products are based on its patented CIRCA technology, which understands, organizes, and extracts knowledge from websites and information repositories in a way that mimics human thought and enables more effective information retrieval. A key application of the CIRCA technology is Applied Semantics' AdSense product that enables web publishers to understand the key themes on web pages to deliver highly relevant and targeted advertisements.”

The Google Tilde function:

The Google Tilde key is NOT a function of technical LSI, nor did WE claim that it was. (We have never claimed this at Theme Zoom, but we have used the tilde key for demonstration purposes, knowing that the co-occurrence matrix was in development.)

The Tilde key is a simple man's METAPHOR for broadening markets based on link citation (which is how Google derives that synonym data). Michael’s own words actually HELP the cause for co-occurrence:

“As Marissa Mayer, Vice President, Search Products at Google put it when the operator was launched “We think this is a powerful and useful way to broaden results. It’s the opposite of disambiguation, which narrows a search”. Anyone who has used it will see immediately that it uses a small and very poor set of real synonyms (sorry Marissa!).”

Thank you!

He doesn’t seem to take into account what the tilde results actually mean in relationship to MARKETING, although he seems to know what they do NOT mean in relationship to LSI. (Sorry Michael).

This blog entry describes the polysemous terms that are returned with the Google tilde key. These polysemous terms appear because Google is looking at inbound LINKS, not the visible text on the page, to determine market-driven “synonyms”.

You can put any old acronym, like “BSA” into the tilde function of Google and get back all kinds of different organizations from Boy Scouts of America to Boston Society of Architects. The Boy Scouts come up first because they have more inbound links than anyone else. Period.

So his point about the tilde key not being directly related to LSI has always been clear. However, it is a very fast way to find out what MARKETS drive these acronyms, see?

These tilde keywords are not created by taking a “snapshot” of visible text on a page or in a massive volume of documents, which is how Michael understands technical LSI. But these keywords may very WELL indicate general market segments that relate to each other semantically, based on what GOOGLE decides a (market-driven) synonym is. These are not true synonyms, but Google synonyms. Let us be clear. Also see my article on SIPs, or statistically improbable phrases.

He calls them pseudo-synonyms. If only he knew! ; - ) 

There are many ways to determine market segments and “Google synonyms” besides the tilde function, and we only use this function as a demo to give the layman an easy way to wrap their head around casting a wide synonymic net inside a vertical market sector.

For example, if I wanted to own all terms for the acronym “BSA” in the whole of Google, I could build a website that covered ALL of the BSA acronyms and receive inbound links for each silo “Boy Scouts of America”, “Boston Society of Architects” and “Business Software Alliance”. I could silo each subject and create my visible text and articles to cover these topics.

With the right sort of silo structure and site linking structure, both offsite and onsite, I can effectively own everything that contains that acronym as well as its MEANING. Why anyone would want to do this is their own problem. The fact that you CAN do this pretty effectively with a fairly simple plan . . . well . . . that is interesting if you are a MARKETER. I don’t care what you call it. I call it money. It is up to YOU to decide which “BSA” acronym term is polysemous to the vertical market products and services you are dominating.
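To make the silo idea a little more concrete, here is a minimal Python sketch of an onsite linking plan. It is only an illustration under my own assumptions: the page names under each silo are hypothetical, and the `silo_links` helper is made up for this example rather than taken from any Theme Zoom tool. The home page links to each silo hub, and pages link only to and from the hub of their own silo.

```python
def silo_links(silos, home="/"):
    """Return (source, target) internal links for a simple silo layout.

    `silos` maps a silo slug to a list of page slugs (all hypothetical).
    The home page links to each silo hub; each hub and its pages link
    to each other; nothing links across silos.
    """
    links = []
    for silo, pages in silos.items():
        hub = f"/{silo}/"
        links.append((home, hub))       # home page -> silo hub
        for page in pages:
            url = f"{hub}{page}/"
            links.append((hub, url))    # hub -> page in the same silo
            links.append((url, hub))    # page -> back to its own hub
    return links


# Hypothetical pages for the three "BSA" meanings discussed above.
bsa_silos = {
    "boy-scouts-of-america": ["merit-badges", "summer-camp"],
    "boston-society-of-architects": ["design-awards"],
    "business-software-alliance": ["license-audits"],
}

links = silo_links(bsa_silos)
for source, target in links:
    print(source, "->", target)
```

Keeping the links inside each silo is the design choice that matters here: each meaning of the acronym gets its own tightly themed cluster of pages. Offsite (inbound) links are a separate matter that this sketch does not model.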

Yes, LSI and LSA are very tricky definitions and I played my part in popularizing them. I acknowledge that Michael has spent a lot of research time on the topic of technical LSI. (I can relate). I agree that LSI is not really the best term to use. We have addressed this issue many times already within the Theme Zoom and University 2020 forums.  

Be careful how you use the term LSI!

Even if the term LSI is replaced with “co-occurrence matrix”, the co-occurrence citation matrix IS alive and well. This is based on working with these concepts every day in a live-fire environment.
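For readers who want to see what “co-occurrence” means mechanically, here is a minimal Python sketch that counts how many documents each pair of terms shares. It is a toy under simple assumptions (whitespace tokenization, made-up sample documents, a hypothetical `cooccurrence_counts` helper) and makes no claim about how Google's patents or any search engine actually compute this.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each unordered pair of terms, how many documents contain both."""
    counts = Counter()
    for text in documents:
        # Naive tokenization: lowercase, split on whitespace, one vote per document.
        terms = sorted(set(text.lower().split()))
        for pair in combinations(terms, 2):
            counts[pair] += 1
    return counts


# Made-up sample documents, purely for illustration.
docs = [
    "boy scouts camping badges",
    "scouts camping gear",
    "software licensing alliance",
]

counts = cooccurrence_counts(docs)
print(counts[("camping", "scouts")])    # pair appears together in two documents
print(counts[("camping", "software")])  # pair never appears together
```

Pairs are stored in alphabetical order, so look them up that way. Terms that keep showing up together across many documents are the ones I would treat as belonging to the same theme.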

I addressed the issue of “LSI versus co-occurrence” on the call with Howie Schwartz.

http://www.namespy.com/traffictactics/traffictactics-heflin-wright.mp3  

Leslie Rhoades says that even the co-occurrence matrix (with 5 patents pending by Anna Patterson) is rubbish because Google invented it in order to not use it. (Like the oil industry squashing the electric car). 

Michael argues the same thing with his comment:

“The third fallacious argument involves a belief that a raft of recent Google patents ‘proves’ that Google is using LSI. The patents in question are; Multiple index based information retrieval system, Phrase-based searching in an information retrieval system, Phrase-based indexing in an information retrieval system, Phrase-based generation of document descriptions, Phrase identification in an information retrieval system and Detecting spam documents in a phrase based information retrieval system. These patents contain some very interesting concepts and are required reading for the professional SEO. They are however only filed patents and this does not mean that all or any of the ideas in them have been implemented.”

This argument is not true. If it were, I would shut down Theme Zoom and go home! (What a relief that might be). ; - )

Aside from the fact that many of the link citation algorithms already used by Google are unrefined versions of concepts contained within these new patents, I question the value of taking a “they-might-not-use-it” approach in my website planning. There is most definitely a theme effect that works well when you build it into a website blueprint based on co-occurrence keywords and top-level themes.

When we build websites with co-occurrence in mind, it allows us to capture a wider semantic net of keyword searches every time. My experience, in the trenches, is that they ARE using this technology, or a hybrid of it. And it seems to be getting more refined with every update. This means that more Google resources will be allocated to visible text phrases rather than just hyperlink themes and citations within a vertical market sector.

People who have built website blueprints based on well-researched themes and co-occurrences (with a few statistically improbable phrases thrown in) are going to be very happy that they did so.

There are several ways to implement “pieces” or “fragments” of code based on many of the programming elements presented in these new patents by Anna Patterson. “They-might-not-use-it” arguments are based on the principles of “following the money” as a recent anti-LSI video explained. In this hypothesis, it is said that Google will not implement anything they don’t have to, because they are big, huge and rich. So why change? (I’m rich and you are stupid: The rich Jerk). ; - )  

Yet, Google has demonstrated many behaviors to the contrary. Search results are getting more refined. They are laying out categories that seem like semantic silo structures at the top of vertical market keywords.

This seems to be in preparation for “the left hand knowing what the right hand is doing”.

Many of our recent experiments have shown that Google AdWords and Google natural search are shaking hands. Here is a note I received from one of my Pay Per Click mentors about the Google PPC and natural SEO integration:

Russell,

I have noticed a lot of my OLD campaigns that got slapped last year are now doing awesome after I turned them back on.  Why?  Because now the site has a page rank and it has backlinks and it is naturally ranked for some terms. Now this site is doing awesome, my bid price dropped, and the campaign is live again.

Someone on our master mind call was saying that they experienced the same exact thing as I mentioned above. It is because of this that we are testing sites that have been indexed FIRST before we run a PPC campaign. I just wanted to let you know what is definitely happening NOW!

Amish

This is relevant because so many Google patents are pending right now, including Anna’s “Co-occurrence Matrix” (which they are not going to use, of course), along with the recent purchase of DoubleClick for over a billion dollars (which I PRAY they are not going to use), and all roads point to where the money REALLY is.

The left hand (natural search) has learned what the right hand (paid search) is doing!

And the whole integration process started when Google used the technology of Applied Semantics to start refining the Google AdSense themes.

Follow the money is EXACTLY right!

 

CO-N + CO-P = HRPA

Co-occurrence on the natural engines plus co-occurrence on the paid engines results in high natural rankings for highest paying and most relevant paid advertisers!

Think I am just blowing smoke?

The co-occurrence algorithms in natural search combined with the co-occurrence algorithms in Pay Per Click will result in tons of ultra-relevant content and high ranking authority sites on the natural search engines for Google’s highest paying Pay Per Click clients!

Google wins AGAIN on both sides!

They get to keep the highest paying advertisers happy, and they get super relevant results on the natural engines without any spam AT THE SAME TIME!

Beginning to get the picture?  

So you can argue that there is no such thing as technical LSI if you want.  

I will be too busy building websites and making money to argue back.

This is why my friend Amish and I are building a system that links Pay Per Click to co-occurrence and natural search engine results. (See Video)

 

Silo Architecture:

Furthermore, LSI or co-occurrence is only ONE of the factors used to silo a website in order to dominate all keywords within a vertical market or within your area of subject matter expertise. Equally important is the vertical market research on a given term or theme. This includes the cost and traffic of a keyword as they relate to overall market value. Also worth considering are low-cost statistically improbable phrases used to attract a warm prospect who is further along in the buying cycle.

 

-        Russell Wright and the Theme Zoom Staff

www.themezoom.com
