17 Aug 2010
SES San Francisco, Live Blogging, Mike Grehan: Introduction to Information Retrieval on the Web
By: Ben Wills

Looking forward to hearing this session!

---

KEY POINTS

  • Connectedness is the key driver for relevance, not text.
  • The technology and protocols of the web, as they exist right now, are insufficient to determine relevance now and in the future. This will significantly drive the evolution of search/information retrieval.

---

Information retrieval isn't about just gathering data...it's about interpreting the data for its relevance.

We look at Google as if they're a black box, trying to understand them, while they're looking back at us like we're the black box.

---

"The most visionary article I've ever read." (On Vannevar Bush's "As We May Think.")

---

"Are we all clear that the Internet and the World Wide Web are two different things?"

---

Tim Berners-Lee invented the hypertext idea on a napkin at lunch. It wasn't something he had been working on for a long time. He synthesized two ideas going on at the time. Amazing.

---

As someone who's built search engines, classification engines, ranking algorithms, etc, seeing Grehan's image about how search engines work makes the concept very accessible. (Anyone have an image?)

---

Can't buy his book now, but when you can.... search engine book

---

Going through basics of information architecture and tokens in the indexing process.

---

Grehan's got hemoglobin on the mind. ;)

---

Basic ranking score with a hyperlinked document set = term frequency × inverse document frequency (TF-IDF)....
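
A minimal sketch of that TF × IDF score, not Grehan's own formulation: the function name `tf_idf_scores`, the whitespace tokenizer, and the three sample documents are all assumptions made for illustration, and real engines use more refined weighting.

```python
import math
from collections import Counter

def tf_idf_scores(query, docs):
    """Score each document against the query with a basic TF x IDF sum."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if df[term] == 0:
                continue  # term appears nowhere, contributes nothing
            tf = counts[term] / len(tokens)      # term frequency
            idf = math.log(n_docs / df[term])    # inverse document frequency
            score += tf * idf
        scores.append(score)
    return scores

# Hypothetical three-document collection.
docs = [
    "search engines rank pages",
    "links connect pages on the web",
    "search search search spam page",
]
scores = tf_idf_scores("search", docs)
```

In this toy collection the keyword-stuffed third document outscores the first, which hints at why a keyword-only score is easy to game.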

---

Imagine a search engine reading your page as a blind person would read braille. Top to bottom, sensing each part of the document.

---

It's just too easy to confuse a search engine if they only assess relevance based on keywords. Grehan once owned the top 30 results on Alta Vista for a client.

---

"There is so much more you can understand when you look at connectivity."

---

Explaining the HITS algorithm and why it was so revolutionary for search result quality.

Analyzes links and connectedness of top X results based on keyword relevance, then re-sorts based on connectedness.
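
A rough sketch of that idea, under stated assumptions: the link graph, the page names, and the `keyword_ranked` list are hypothetical, and this simplified version skips the step where HITS expands the keyword results into a larger neighborhood of linked pages. It iterates hub and authority scores, then re-sorts the keyword results by authority.

```python
import math

def hits(links, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hubs = {p: 1.0 for p in pages}
    auths = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        new_auths = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auths[dst] += hubs[src]

        # Hub score: sum of authority scores of the pages linked to.
        new_hubs = {
            p: sum(new_auths[d] for d in links.get(p, [])) for p in pages
        }

        # Normalize so the scores stay bounded across iterations.
        a_norm = math.sqrt(sum(v * v for v in new_auths.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hubs.values())) or 1.0
        auths = {p: v / a_norm for p, v in new_auths.items()}
        hubs = {p: v / h_norm for p, v in new_hubs.items()}

    return hubs, auths

# Hypothetical link graph: a, d, e link to b and c; f links only to b.
links = {"a": ["b", "c"], "d": ["b", "c"], "e": ["b", "c"], "f": ["b"]}
hubs, auths = hits(links)

# Hypothetical keyword-relevance order, re-sorted by authority score.
keyword_ranked = ["f", "c", "b"]
reranked = sorted(keyword_ranked, key=lambda p: auths[p], reverse=True)
```

Here `b` moves to the top because the most pages point to it, even though it was last in the keyword ordering.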

---

Get links from Authorities. These links are the biggest influencers for your search rankings.

---

PageRank does *not* demonstrate industry-authority. Ignore it.

---

Bibliometrics.

---

If a links to b and c, and d links to b and c, and e links to b and c, you can begin to see that b and c are related, without b and c linking to each other...
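
The a/d/e example above is co-citation counting, and it can be sketched in a few lines. The function name `cocitation_counts` and the dictionary layout are assumptions made for this illustration.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(links):
    """Count how often each pair of pages is linked to by the same source."""
    counts = Counter()
    for targets in links.values():
        # Every pair of distinct targets on one page is co-cited once.
        for pair in combinations(sorted(set(targets)), 2):
            counts[pair] += 1
    return counts

# The example from the talk: a, d, and e each link to b and c.
links = {"a": ["b", "c"], "d": ["b", "c"], "e": ["b", "c"]}
counts = cocitation_counts(links)
```

`counts[("b", "c")]` comes out as 3, one for each page that links to both, with no link between b and c required.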

---

Side thought: How do you overlay backlink profiles to find the patterns that trend toward authorities in a specified market?

---

...find (and get links) from the authorities in your networks and tangential networks.

---

Side thought: What defines an authority in a market? How is the market defined? How do keywords define the market?

---

On images in search results. (rough quote) "If you see an image or an image that leads to a video, the user understands what's on the other side and is, therefore, more likely to click on that."

---

Understanding User Intent Behind Search Queries:
Informational, navigational, transactional.

Side Note: Use this framework in categorizing your market's SEO keywords as well as your own information architecture. Lay over your sales funnel.

---

Query chains: search engines record your search history and look for patterns. When you search, then refine and refine and refine, you're helping shape suggestions for future searches that start on the same path. Hence, recommendations that aren't based on your keywords but, rather, on patterns of query chains.

---

Relevance factors:
- text, linkage data, social media data, end user behavior/browsing patterns (as logged by Google Toolbar, etc)

---

User-generated content is being created at a rate of 5:1 vs. mediated content.

Makes HTTP obsolete just for keeping up with indexing content...

---

Results are becoming more relevant with social search (e.g., Q&A sites, social network questions, etc.).

---

"So how many unique pages does the web really contain? We don't know; we don't have time to look at them all!"
- http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html

---

"Many years ago your grandparents sat in front of a brown wooden box listening to Franklin D. Roosevelt. Who knows, maybe last week you bought a new HD-ready TV. Guess what? Same airwaves, different protocol." - Mike Grehan
