Search Engine Comparison, 2009-07-09: I’m Switching to Yahoo Search

http://canonical.org/~kragen/search-comparison-2009.html

Some guy from ask.com just made the totally implausible claim that their search results are “just as good if not better” than Google’s, and their search engine also had another advantage: they were willing to put paid advertising someplace Google wouldn’t (specifically, on searches about abortion).

So I thought I would do a comparison.

Here are the last ten Google queries from my browser history:

  1. [morning-after pill]
  2. [len tower lawnmower]
  3. [melting point of solder]
  4. [melting point of silicon]
  5. [david phillip oster]
  6. [1998 blogs], with a drill-down to [1998 weblogs] and [history of weblogs]
  7. [emacs tags file syntax], with drill-down to [emacs tags table syntax] and [site:www.gnu.org emacs tags table syntax]
  8. [Eric Stoltz]
  9. [cytocomputer]
  10. [zHosting Ltd]

I evaluated them on Google, Ask.com, Yahoo Search, and Bing. I more or less have ads turned off with AdBlock Plus and NoScript, and I’m viewing everything in Firefox 3.0 with Gnash for my Flash player. So there may be annoyances that affect other people but not me.

Summary

So here are the grades for the different queries:

[morning-after pill] Grades: Google B, Ask.com B-, Yahoo Search D, Bing F.
[len tower lawnmower] Grades: Google A, Ask.com A, Yahoo Search A+, Bing A.
[melting point of solder] Grades: Google A, Ask.com B, Yahoo Search A+, Bing C.
[melting point of silicon] Grades: Google A+, Ask.com A, Yahoo Search D, Bing C.
[david phillip oster] Grades: Google C, Ask.com B, Yahoo Search A+, Bing B.
[1998 blogs] Grades: Google D, Ask.com D, Yahoo Search B, Bing C.
[emacs tags file syntax] Grades: Google F, Ask.com F, Yahoo Search F, Bing F.
[Eric Stoltz] Grades: Google A, Ask.com C, Yahoo Search B, Bing A+.
[cytocomputer] Grades: Google B, Ask.com D, Yahoo Search D, Bing F.
[zHosting Ltd.] Grades: Google A, Ask.com A+, Yahoo Search A, Bing B.

Google’s median grade is A- or B+, the best of the four. It only failed on a query where all four search engines failed. However, it was only the best search engine of the four 30% of the time. It was clearly better than the others on dealing with a controversial topic and providing search results from beyond the Web: books and academic papers.

Ask.com’s median grade is B. It, too, only failed on the query where all four search engines failed. Its results were worse than Google’s 50% of the time, equally good 30% of the time, and better than Google’s 20% of the time. So the claim by the guy from Ask.com isn’t as implausible as it appeared at first, but it still isn’t true for my query mix. It was only the best search engine of the four 10% of the time.

I’m really surprised at how well Ask.com did, because I always thought of their search engine as a joke.

Yahoo Search’s median grade is B. It, too, only failed on a query where all four search engines failed. It was the best search engine of the four 40% of the time, more than any other search engine, so I am going to switch to it as my default search engine. It fared slightly worse against Ask.com than Google did, though: it was better 50% of the time, equally good 20% of the time, and worse 30% of the time.

Bing’s median grade is C, the worst of any engine, and unlike any other engine, it failed badly on two of the nine queries the other search engines were able to answer: in one case by privileging misinformation and scaremongering over reliable information, and in a second case by simply failing to find anything relevant. It was the best search engine of the four only 10% of the time, like Ask; that was on a celebrity query. I’m sad to say this because my friend Barney Pell has been working really hard on it for years, but Bing’s performance is pathetic.

(The percentages of “best of the four” 30% + 10% + 40% + 10% add up to only 90%; that’s because one of the ten queries was failed by all four search engines, and in that case none was “the best”.)
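The tallies above can be rechecked mechanically from the grade table. What follows is only a sketch: the numeric scale (F=0 through A=4, with 0.3 for a plus or minus) is an assumption made for illustration, and the function names are made up for this example.

```python
from statistics import median

ENGINES = ["Google", "Ask.com", "Yahoo Search", "Bing"]

# Grades copied from the summary table, in ENGINES order.
GRADES = {
    "morning-after pill":       ["B",  "B-", "D",  "F"],
    "len tower lawnmower":      ["A",  "A",  "A+", "A"],
    "melting point of solder":  ["A",  "B",  "A+", "C"],
    "melting point of silicon": ["A+", "A",  "D",  "C"],
    "david phillip oster":      ["C",  "B",  "A+", "B"],
    "1998 blogs":               ["D",  "D",  "B",  "C"],
    "emacs tags file syntax":   ["F",  "F",  "F",  "F"],
    "Eric Stoltz":              ["A",  "C",  "B",  "A+"],
    "cytocomputer":             ["B",  "D",  "D",  "F"],
    "zHosting Ltd.":            ["A",  "A+", "A",  "B"],
}


def score(grade):
    """Map a letter grade to a number (assumed scale, not from the text)."""
    base = {"F": 0, "D": 1, "C": 2, "B": 3, "A": 4}[grade[0]]
    return base + {"+": 0.3, "-": -0.3}.get(grade[1:], 0)


def best_counts():
    """Count the queries each engine won outright.

    A query that every engine failed (all F) has no winner.
    """
    wins = dict.fromkeys(ENGINES, 0)
    for grades in GRADES.values():
        scores = [score(g) for g in grades]
        top = max(scores)
        if top > 0 and scores.count(top) == 1:
            wins[ENGINES[scores.index(top)]] += 1
    return wins


def head_to_head(a, b):
    """Return (a better, tied, a worse) query counts for engine a vs. b."""
    i, j = ENGINES.index(a), ENGINES.index(b)
    better = sum(score(g[i]) > score(g[j]) for g in GRADES.values())
    worse = sum(score(g[i]) < score(g[j]) for g in GRADES.values())
    return better, len(GRADES) - better - worse, worse


def median_score(engine):
    """Median numeric grade for one engine across all queries."""
    i = ENGINES.index(engine)
    return median(score(g[i]) for g in GRADES.values())
```

With ten queries, each win counted by best_counts() is worth 10%, and head_to_head("Google", "Ask.com") gives the better/tied/worse split behind the Ask.com comparison. The medians come out as numbers rather than letters, so 3.5 corresponds to the "A- or B+" above.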

So there isn’t really a clear winner; Yahoo Search, Google, and Ask.com are pretty even overall, even though some did much better than others on particular queries. There is a clear loser, though: Bing. Maybe I should have included Cuil to make Bing look better. I mean, I feel kind of bad.

(Actually, I did try [morning-after pill] and [david phillip oster] on Cuil. It did better than Bing.)

The rest of this document (4000 words) is taken up with explanations of the particular queries.

[morning-after pill]

Here I wanted to see if I could find accurate information about emergency contraception without having to cope with abortion-scare sites providing misinformation.

Google:

Ask.com:

Yahoo Search:

Bing:

Grades on this query: Google B, Ask.com B-, Yahoo Search D, Bing F.

[len tower lawnmower]

I wanted to find a photo of Len Tower on a human-powered riding mower that I had seen a few days ago.

Google: hit #1 is a page with the photo and background information, instantly recognizable as such.

Ask.com: same.

Yahoo Search: same, but hits #2 and #3 are also about it, with more information.

Bing: same as Google.

Grades on this query: Google A, Ask.com A, Yahoo Search A+, Bing A.

[melting point of solder]

I wanted to find out the melting point of traditional eutectic lead-tin solder as well as the melting point of common modern RoHS-compliant solders.

Google:

Ask.com:

Yahoo Search says, “Did you mean: melting point of soldier?”

So I followed the Wikipedia link, and it has the answer for eutectic lead-tin solder above the fold and a section on “lead-free solders” with a whole big discussion of which ones are most common and what their melting points are. So I probably should have followed that link from Google instead of hit #3.

Bing:

Grades on this query: Google A, Ask.com B, Yahoo Search A+, Bing C — would be an F except for hit 4.

[melting point of silicon]

Google has the answer in big letters above the search results: 1687 K. Wikipedia article is hit #2, and the correct answer in °C is in hit #4.

Ask.com has the answer in the snippets for hits 1 and 2, slightly wrong answers in the snippets for hits 4 and 5, and hit #3 presumably has it if I click through.

Yahoo Search hit 1 is Wikipedia. Hits 2 and 3 are the wrong answer. Snippets for hits 4 and 5 have the right answer.

Bing:

Grades: Google A+, Ask.com A, Yahoo Search D, Bing C.

[david phillip oster]

I wanted to find his home page, thence to find his current email address, to email him.

Google: no home page, but hits 5-7 look vaguely promising. Hit 5 leads to a blog post that links to http://groups.google.com/groups/search?q=%22david+phillip+oster%22&start=0&scoring=d, which does actually link to http://groups.google.com/group/iphonesdkdevelopment/browse_thread/thread/5c9cd5561d7b0d64/da37b38ede21148d?q=%22david+phillip+oster%22#da37b38ede21148d which links to http://groups.google.com/groups/profile?enc_user=szRVXBsAAABguGT__oukXrijYyXRsYeu3jKajrjPH-s4VDv7fhNHSg, which says “davidphillipos...@gmail.com”, which is close enough. Hit 7, his Amazon reviewer page, actually has “oster@ieee.org” on the page.

Hit 9 links to a RISKS page that gives the email address he had in 1988.

In practice I gave up when I saw the page of snippets; instead I searched my email.

Ask.com: hit #2 is Google’s hit #7.

Yahoo Search: turbozen.com is hits #1 and #2, with “oster@ieee.org” in both snippets. Hit #4 is mosaiccodes.com, which links to turbozen.com.

Bing: hit #3 is Yahoo hit #2 (without the email address in the snippet, but clear that it’s his software company), and hit #5 is Google hit #7.

Grades: Google C, Ask.com B, Yahoo Search A+, Bing B.

[1998 blogs]

I was trying to remember the state of the blogosphere in 1998 when I started kragen-tol in order to justify my claim that it wasn’t very surprising that I didn’t start it as a blog.

Google: top ten hits are all trash — things that happen to be a blog or mention blogs and mention 1998. Hit #11 looks more promising but is also trash. Somewhere around hit #20 there’s Psychology of Blogs (Weblogs), from 1998, which is a pretty good snapshot of how things were in 1998 — except a little bit polluted by a 2001 update.

Ask.com: same trash as Google, except only ten hits of it. (I have Google set to display 100.)

Yahoo Search: mostly the same trash, but Psychology of Blogs is hit #4. Yahoo Search used to display 20 hits by default, but now it seems it’s down to 10, just like Google.

Bing: hit #1 talks about what the web was like in 1998, in Spanish, but doesn’t shed any light on my actual question, which is what the blogosphere was like in 1998. Hit #2 is the Spanish Wikipedia page for “blog”, which has a pretty good “Historia” section. Hit #7 is somebody’s presentation on SlideShare, which loses pretty badly (not accessible without Flash and fails freakishly in Gnash) but there’s some good information in the title.

None of these really gave me what I was looking for, which was Rebecca Blood’s “History of Weblogs” from 2000, which I couldn’t remember the title of. So when I was doing this search “for real”, the first time, instead of looking at hit #20 or trying multiple search engines, I glanced at the page full of trash and reformulated my search. The word “blog” wouldn’t be coined until 1999 (by The Brand Peter Me.) and at the time they were called “weblog”, a term Jorn Barger had invented in 1997 for what are now called “linklogs” or sometimes “microblogs” or “tumblelogs”.

So I searched for [1998 weblogs].

On Google, “Psychology of Weblogs” is hit #1, and Jason Kottke’s blog archives for 1998 are hit #3. The snippet for hit #6, from a blog I’d never heard of that ended in 2005, says, “I started this weblog in August 1998, when it was one of the first 25 or so weblogs in existence,” which is a piece of the information I was looking for but not the comprehensive overview of Wikipedia or Blood’s piece.

Ask.com is essentially identical to Google, with the same hits #1 and #3, and Google’s hit #6 moved up to #4. However, it also has a sidebar of “Related Searches”, which includes a suggestion for “history of weblogs”.

Yahoo Search has “Psychology of Weblogs” as hit #1, but also has Blood’s essay as hit #8! Also, hit #4 is “Computer History for 1998”, with some minimal information. Hit #9 mentions that Scripting News’s comments section started in October 1998, and hit #10 is “Jorn Barger, the NewsPage Network, and the Emergence of the Weblog Community”, which offers a somewhat deeper history even than Blood’s essay.

Bing gives essentially exactly the same results as for [1998 blogs].

So, since I was using Google instead of Yahoo Search, I searched a third time for [history of weblogs].

On Google, below the Google Scholar hits, which don’t have enough information on the page to tell me if they’re the right thing, Blood’s article is #1. English Wikipedia articles are the next couple of hits, followed by more articles about the early history of weblogs (1997-2000). Pure gold.

Ask.com gives basically the same results.

Yahoo Search puts Blood’s article at the top, followed by a self-promotional post short on detail by Dave Winer, the Wikipedia article, etc.

Bing gives Blood’s essay at the top, followed by a Spanish Wikipedia article, some random irrelevant stuff, a German page (which I don’t understand), some more irrelevant stuff, and what appears to be an SEO spam page (“Interested in history? At weblogs.hu you find posts and information relevant to history. www.weblogs.hu/posts/tags/history”.)

So, grades: Google D, Ask.com D, Yahoo Search B, Bing C. On my earlier queries Yahoo Search does dramatically better than the others, well enough that I wouldn’t have proceeded to the third query and maybe not past the first.

[emacs tags file syntax]

I wanted to look up the syntax of Emacs TAGS files so I could write a program to generate one (introspectively from the state of a Python program, rather than by parsing a bunch of source code). This search originally was completely unsuccessful, although I’m not totally stymied; there is one free-software consumer of TAGS and two free-software generators of TAGS already on my machine, so I can just look at the source. If I’m lucky, it will reference a file format spec.
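For what it's worth, here is a minimal sketch of the kind of generator described above. It assumes the TAGS layout as it is commonly described for etags output: a form-feed line, a "filename,size" header, then one entry per tag consisting of the definition text, a DEL byte, the tag name, an SOH byte, and "line,offset". That layout is an assumption here, not something any of the searches below turned up, and tags_section is a hypothetical helper name.

```python
def tags_section(filename, entries):
    """Build one source file's section of a TAGS file.

    entries is an iterable of (definition_text, tag_name, line_number,
    byte_offset) tuples.  The layout is assumed, not taken from a spec:
      \\x0c                      form feed starts each file section
      filename,size             size = byte length of the entries below
      text\\x7fname\\x01line,offset  one line per tag
    """
    body = "".join(
        f"{text}\x7f{name}\x01{line},{offset}\n"
        for text, name, line, offset in entries
    )
    size = len(body.encode("utf-8"))
    return f"\x0c\n{filename},{size}\n{body}"
```

The entries themselves could come from introspection, for example from inspect.getsourcelines() on each function in a running program, with the concatenated sections written to a file named TAGS. Whether Emacs accepts the result would still need checking against etags output.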

Google: all of the hits relate to how to invoke etags, which generates TAGS files, or how to use them in Emacs. The “syntax” being referenced is invariably the syntax of the source files, not of TAGS itself (which is called a “tags table”, apparently.) Most of them are a zillion copies of the Emacs manual and the man pages for etags and Exuberant Ctags.

Ask.com: identically useless results, except for a bunch of irrelevant “Related Searches” at the top.

Yahoo Search: same.

Bing: same.

My next attempt was to be more specific in my query: I’m looking for information about the tags table. In retrospect, I should have looked for information about the “file format”, not “syntax”, but my next search was [emacs tags table syntax].

All four search engines give basically the same results as before.

So my next attempt was to click on “more results from www.gnu.org »”, with the thought that this would give me each section of the Emacs manual only once, and many more of them. It did, on Google, but the Emacs manual does not contain the answer. I am not trying the query on the other search engines.

Searching for [emacs tags table format] does not seem to help.

I thought I would try using natural-language search on Ask.com and Bing. [how do i generate an emacs tags table?] on Ask.com yields mostly etags man pages, but also a link to http://www.emacswiki.org/cgi-bin/wiki/EmacsTags, which doesn’t help but is usually a better resource than the Emacs manual. Bing has it at the top.

Grades: Google F, Ask.com F, Yahoo Search F, Bing F.

[Eric Stoltz]

I had read that Eric Stoltz had been originally cast in Back To The Future, and I wondered who he was.

Google gave me four photos of him at the top, which was sufficient for me to know I didn’t recognize him. Hit #1 was his IMDB page and hit #3 was the Wikipedia page, which outlined his acting career in sufficient detail to satisfy me.

Ask.com has a bunch of irrelevant “related searches” at the top, followed by product images from Amazon which are too small to see the guy’s face. Then there’s the IMDB page, some TV listings for ZIP code 10010 in the US (utterly pathetic; I’m in Argentina), and then a Wikipedia page with a too-small image.

Yahoo Search has only three photos, of smaller size than Google’s, but they’re recognizable. Top few hits are from IMDB and Wikipedia.

Bing has six photos, including a closeup shot, which are highly recognizable. Then the top hit is some other guy Eric Stoltz who’s a web designer, followed by Wikipedia entries from English and Spanish, an IMDB page, and then a French Wikipedia article.

Grades: Google A, Ask.com C, Yahoo Search B, Bing A+.

[cytocomputer]

I wanted to know what had been written recently about Bob Lougheed et al.’s image processing device.

Google:

Later Google hits include crap from linkinghub.elsevier.com, expired US patents describing the Cytocomputer in some detail, and so on. So even though 60% of the top 10 Google hits are basically spam (duplicate teasers from ACM and IEEE, and ask.com SEO spam pages) there’s some good stuff in there.

Also, Google offers “Cited by 57” on the original Cytocomputer paper. Among other things, that links me to the Cheops paper from 1995 and the 400-page Image Algebra book from 1986. These only mention the Cytocomputer in passing, but they look pretty interesting.

Ask.com:

So Ask’s first ten results are almost indistinguishable from Google’s, except:

  1. They’re 90% garbage instead of 60%;
  2. They omit the spam pages produced by Ask.com properties like reference.com and thesaurus.com;
  3. They don’t have Google Books hits (naturally);
  4. As a result of lacking Google Books and spam from Ask.com, hit #9 (the jackpot) moves up to hit #5.

Yahoo Search:

So Yahoo Search found a lot of interesting stuff, but it’s marginally related to the Cytocomputer. I guess I should be flattered that two things I wrote are in the top 10, but I’m more frustrated than flattered. The most relevant items — the US patent and the 2001 Cytocomputer emulation in an FPGA — are missing entirely.

Bing:

So Bing basically gave me none of what I want.

Grades: Google B, Ask.com D, Yahoo Search D, Bing F.

I wish I could give Ask.com an F for spamming Google’s search results, but that wouldn’t accurately represent the quality of their own search results, which is at issue here. If they get successful enough at it, I guess I’ll have to stop using Google, after all.

[zHosting Ltd.]

Charlie Stross wrote about his attempt to start up a virtual Linux hosting company on an IBM mainframe in 2000. Before I got to the part where the company folded before even getting angel funding, I searched to see what the company was up to now. So “success” in this search would be a clear statement that the company had folded without customers or revenue.

On Google, hit 4 is Charlie’s story of the company. None of the other top 10 or 20 hits suggest that zHosting Ltd. of the UK has ever existed. This is somewhat confused by some guy who uses “zHosting” as his screen name when posting on webmaster-oriented forums, including some that are related to virtualization.

Ask.com has Charlie’s story as hit 2.

Yahoo Search doesn’t have Charlie’s story, but its hit #1 is from checksure.biz, which lists a zHosting Ltd. at 54 Easter Road, Edinburgh, Midlothian EH7 5RQ. I’m pretty sure that’s Charlie’s company. It offers to sell me a “report” on the company for £9.95. I’m not sure whether I should treat this as a spectacular success (I got the incorporation address of a company that folded in 2000 and never had a customer!) or a failure to filter spam (somebody tried to charge me US$15 for a “report” on a company that folded in 2000 and never had a customer!)

Bing doesn’t have Charlie’s story or anything interesting, just the guy who posts on web forums.

Grades: Google A, Ask.com A+, Yahoo Search A, Bing B.