Dan Russell on The Joy of Search

Dan Russell spent 17 years working at Google, with a significant part of that tenure as a Search Anthropologist: “someone who tries to understand how people search, what kinds of things they seek, and how their tools influence their search process.” Dan is the author of The Joy of Search, which is the focus of our conversation today.

Show notes

Dan Russell
The Joy of Search: A Google Insider’s Guide to Going Beyond the Basics by Dan Russell
The Library of Congress
Felis silvestris catus - Wikipedia, la enciclopedia libre (Spanish)
Sagrada Família - Wikipedia (Catalan)
Howard Rheingold on Tools for Thought – The Informed Life
ChatGPT
Google Bard
Bing
scite.ai

Show notes include Amazon affiliate links. We get a small commission for purchases made through these links.

If you're enjoying the show, please rate or review us in Apple's podcast directory:
https://podcasts.apple.com/us/podcast/the-informed-life/id1450117117?itsct=podcast_box&itscg=30200

This episode's transcript was produced by an AI. If you notice any errors, please get in touch.

Transcript

Jorge: Dan, welcome to the show.

Dan: Hi. Thank you. It’s nice to be here, Jorge — even at this early hour.

Jorge: It is very early. We’re both on the West Coast and we’re recording very early, but it’s a treat to be here with you. I just finished reading your book, The Joy of Search. And you and I have not met before. This is our first time talking.

Dan: That’s right.

Jorge: And I think that for a lot of folks listening in, it might be their first time as well hearing about you and your work. How do you go about introducing yourself?

About Dan

Dan: So that’s an interesting question because I’m a computer scientist and I’ve been working in the intersection of artificial intelligence and the field called human-computer interaction, sometimes referred to as HCI. And what that means is: how do you talk to your computers or how do you communicate with them? We live on our phones and we have this little tiny surface area and we poke around on it and we do stuff with it. And if you’re an advanced user, maybe you talk to it. So what I study is how people interact with these systems in general. And specifically in the last 17 years I’ve been at Google. And I’ve been working on how people search for information, how they don’t search for information, what they think about when they’re searching for information and generally what their relationship is with a search engine: Google, Bing, Yahoo, Yandex, Baidu, whatever. How do people think about that stuff? What do they do? That’s what I’m curious about. That’s how I would introduce myself.

Jorge: Well, that’s fantastic. This sort of raises the subject of your book, The Joy of Search, which focuses specifically on search. And the subtitle is “A Google Insider’s Guide to Going Beyond the Basics.” Could you give us an overview of the topic of the book? What do you mean by “going beyond the basics”? I think all of us use Google, right? Or some search engine.

How people search

Dan: Yeah, that’s right. So, the book started, actually, probably my first or second week at Google. I was walking down the hallway at Google. You have to imagine this is not yet 2005. And I run into Larry Page and he says, “So, Dan, what are you doing here?” I said, “Well, I’m here to study how people search.” And he says, “Oh, no, no, no. We know how people search. We have log information. We know exactly what they’re doing.” I said, “Well, that’s interesting. Let me do a little research and I’ll get back to you on that.” And so, 17 years later, we still don’t know the answer.

Because what happened is that people actually are much more interesting than what the log files might show: the clicks and the drags and the queries that they type. And what I found, the key thing I found in my research, is that people think they’re good at search. And when I watch them as an expert searcher, I discover they’re not very good. They think they’re great, but they don’t know a lot. So, it’s like we created this great F1 very fast road speeder car, and nobody knows how to get it out of first gear. So, I wrote the book basically as a way to show people there’s a little more to these search engines than you might have thought, so here are some tricks.

And the way I wrote the book is I wrote a bunch of stories because I have a blog. And as I write the blog, I ask a challenge every week and I say, “Here! How would you do this?” And I wrap it usually in some story. “I was walking down the beach and I saw this thing…” That’s the kind of stuff that happens to everybody. And when I watch most people searching, their approach is terrible. And so, I wrote the book as basically a set of stories that you can read sort of in little chunks and get an idea about how to be a better searcher. And it’s so much fun that I thought, well, The Joy of Search!

Jorge: I can relate to what you’re describing there in the sense that I’ve always thought of myself as a pretty good searcher. Well, I shouldn’t say always — I think that I’ve learned some skills at searching. But in reading your book, I felt a little intimidated! It’s like, oh my gosh, I thought I knew how to do this. But this is like the advanced level, right? And what’s intriguing is that I don’t think that there were many technical things in the book that I didn’t know about. Things like using quotes to make a more precise search; I’ve been doing that for a long time. But I think that what was valuable — and this goes to this idea of the stories — is you’re describing not just the technical aspects of this but… I don’t know how to describe it, but like the kinds of questions you should be asking. And it’s almost like you’re having a conversation with the system.

Conversing with a search engine

Dan: Right. That’s actually a very good way to think about it. When you talk to somebody else, you don’t just say something like, “wildflower poisonous?” You actually have a little bit of a conversation, right? And so, one of the points of the book is to say, here’s a way to think about it. Here’s an approach. That’s what I was trying to get across in a lot of the stories. Here’s a way to think about this, and to be a really good searcher, you need to know more than how to put two or three words together into a Google query.

What you need to know is how to evaluate that information that you get back, and to know a little bit about the structure of the universe. There’s this sort of conceit, I think, among a lot of computer scientists — a lot of tech people — that information is infinite and flat. You can access it and Google just straightens it all out for you. It’s not really true, right? It turns out the information universe is lumpy and it’s got little clumps of information.

One of the stories in the book is how I ended up in the Library of Congress archives. It’s kind of an involved story. It goes from A to B to C to D to Z, then back to Q and I ended up in the Library of Congress’s archives, looking at these 19th century letters. There’s no way to actually do that just with regular Google searches. Part of the point of the book is, look: the world is infinitely interesting and structured. If you know a little bit about that, you have an approach to doing your search and you’re going to get better results.

Jorge: Yeah. One of the things I wanted to ask you about is related to this in that I came away from the book thinking, “My goodness! Dan is one of the most curious people I know!” Not in that you’re curious, but that you approach life with this kind of lens of curiosity.

Dan: I’m glad you clarified that.

Jorge: Well, we don’t know each other very well, so you know, I would be jumping the gun to make that assessment! No, but I got the sense that you go through the world just looking out for things that stand out as being curious. And moreover, it’s like you pull on those strings; you follow the threads. I think there was one story where you were on a hike next to the ocean and you saw a wreck out in the sea; the remains of a wreck. And then that leads you to this whole investigation about what happened. What ship is that? And the question is, you spoke earlier of the things that make someone a great searcher; do you think that curiosity is a prerequisite to that?

The role of curiosity

Dan: Absolutely! You put your finger on a really good point. I am curious and I drive my kids crazy because kids typically or traditionally have always been the source of questions. You know, why is the sky blue? Why doesn’t the moon fall out of the sky? So on and so on. And often with kids, they will ask follow up questions. Well, why, why, why? And you know how that goes, right? Every chain of questions ends up either with a theological question or a cosmological question. Why is anything, anything? So, I think I never lost that sort of innate curiosity.

And the example you cite of trying to find the shipwreck. Actually it’s in the Carquinez Strait, just east of San Francisco, just north of Oakland. And it’s true: I saw that wreck in the water and I thought, “What was that?” And the thing that surprised me the most about it was I could actually find the answer. And I think that’s really in many ways the joy of the book. The joy of search is about being able now to relatively straightforwardly ask these insanely crazy hard questions that my parents could have never answered, but I can answer them. And so, curiosity is the force that drives me onward. I’m curious about almost everything. And as you can tell by the book, right? It sort of covers the world from shipwrecks to poisonous wildflowers to all kinds of things.

Jorge: As I was reading the book, I occasionally turned to my wife and would bring up one of the curious bits of trivia that I learned there. There was a section where you talked about parrotfish and some of the behaviors of those animals. I certainly did not expect to learn about parrotfish when reading a book about search, but it goes to this point, right? I came away thinking, a) I need to lean into my curiosity…

Dan: Right!

Jorge: … as a way to improve my ability to do…

Dan: This is great, Jorge! I have totally succeeded if your curiosity has improved a little bit, I’m happy. I’m a happy author. That’s great!

Jorge: Well, that certainly came across. And like you said, the book can be read on several levels. And one level is these are ideas for how to improve your ability to do search. But another level is all of these very interesting stories about history and about natural science. What makes a great online researcher besides being curious? Because curiosity might be something that folks might feel is more innate. It’s not necessarily something that you can work on, even though I said I want to lean into my curiosity. But there are also probably things that people can do to just become better searchers that don’t entail like changing themselves.

Dan: Right.

Jorge: Are there any ideas that you can suggest for folks to become better researchers?

Do one more search

Dan: At the risk of beating a dead horse, just be curious. And the tail end of that: if you’re a curious person, one of the things that I think good searchers do is one more search. Because one of the failure modes I see a lot is people will start searching on something. They’ll get an answer. Done. Then they’ll stop. Often — and I think the book points this out — if you go one level deeper, all of a sudden an entire universe opens up below that.

I remember reading once upon a time about a lake in Africa that exploded. And this is one of the chapters in the book. And I’m thinking, “A lake explodes? What does that mean?” And you do the search and you find, “Oh yeah, there’s this lake in Africa that actually exploded.” Okay, fine. You could stop there. But one more search will get you a whole backstory about volcanoes and carbon dioxide and lakes that overturn on a little breeze and why it ended up killing thousands of people. If you stopped with the first query and you go, “Yeah, lakes explode. I’m done.” You lose a lot of backstory! I think a more important point is that really good searchers also go one level deeper. They also tend to remember things about where to find stuff in the world.

We all know about Wikipedia, but in one of the chapters in the book I talk about why you would ever want to search in the Italian Wikipedia. So, a) did you know Wikipedia comes in many, many other languages? You probably know Wikipedia comes in Spanish as well. It’s in Italian, it’s in German, it’s in Russian, etc. It basically comes in all the world’s languages. And one of the things I learned — and one of the things I think that makes a good searcher a good searcher — is realizing, “Oh! There are other perspectives on this!” What would the Spanish perspective on, say, cats, be? If you’ve ever looked at the Spanish Wikipedia entry on cats, it has almost no points of contact with the English language Wikipedia entry on cats. It’s remarkably different. And you would think a cat is a cat, right? Oh, no. The Spanish language Wikipedia entry is vastly different.

I think more importantly, as I say in the book, the Italian Wikipedia article goes into enormous depth about Leonardo da Vinci. And it contains, I don’t know, five times more information than the English language entry? And so, with the assistance of something like Google Translate, all of a sudden now you have access to that. So, a good searcher realizes that there are multiple, multiple resources, including resources in other languages that they may or may not have access to. So, I think there are a couple of characteristics of what makes a really good searcher a good searcher.
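As a rough illustration of the point about language editions, here is a minimal Python sketch that builds the address of an article in a chosen edition of Wikipedia. The helper name `wikipedia_url` is made up for this example, and the sketch assumes you already know the article's title in the target language (titles differ between editions, so the English title will often not work).

```python
from urllib.parse import quote


def wikipedia_url(lang_code: str, title: str) -> str:
    """Build the URL of an article in one language edition of Wikipedia.

    lang_code is the edition's subdomain, e.g. "es" for Spanish or "it"
    for Italian. title must be the article title as used in that edition
    (an assumption of this sketch; it does no translation or lookup).
    """
    # Wikipedia article paths use underscores where the title has spaces.
    path = quote(title.replace(" ", "_"))
    return f"https://{lang_code}.wikipedia.org/wiki/{path}"


# The Spanish entry on cats (title taken from the show notes) and the
# Italian entry on Leonardo da Vinci.
print(wikipedia_url("es", "Felis silvestris catus"))
print(wikipedia_url("it", "Leonardo da Vinci"))
```

Opening the resulting page in a browser with a translation tool turned on is usually enough to get the gist of the other-language article.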

Don’t limit yourself to one language

Jorge: Well, and you point out Google Translate… I was trying to place myself in the position of someone listening in and hearing this advice of consulting Wikipedia in languages not their own, and thinking, “Well, I don’t speak Spanish!” Or… Well, I do, but you know what I mean. But the answer there is something like Google Translate, which might not be perfect, but it might be good enough to give you the gist of it.

Dan: You know, to continue that thought: I don’t read Chinese at all. I bet you don’t either. But you know, it’s one of those things where the Chinese to English translation function is not perfect. It doesn’t have to be perfect. It can give me the gist and more to the point, it can point me in ways that I can search in the English language repertoire; English language resources. So all of a sudden I get a Chinese perspective that I can then use to go search in English or in Spanish, right? I don’t remember if it’s in this book, but I do give an example from La Sagrada Familia, the famous cathedral in Spain where you actually have to go search in Catalan. Which is almost, but not quite, the same as Spanish, right? And so, it’s interesting that Google Translate actually supports all these different interesting languages. And if you want to look up something about La Sagrada Familia, look, you want to go look in Spanish or Catalan because the English language content is not the same. So one of the interesting points is that the world is structured and is very rich. Don’t limit yourself to just your first language.

Jorge: Yeah, there’s a couple of takeaways for me there. One is, the idea that people in other cultures are very likely going to be more knowledgeable about subjects pertaining to their culture, so go to the source. And the second idea is this notion that when you’re talking about searching, you’re not just talking about the search engines; there is this ecosystem of web-based and otherwise… you also use a lot of physical resources: libraries, physical places, right?

Search beyond online

Dan: Yeah, that’s a good point. And to complete that thought, not all of those resources are online. So, in one of my stories I talk about going to the archives in Martinez. So, Martinez is east of San Francisco. And it actually has a remarkably great archive that is totally not online. And you have to go there. And when I’ve spoken with them before, they are not particularly interested in putting any of their materials online. There’s a whole pile of information there that’s fascinating. Like for example, they have all the insane asylum admission records going back to like 1850.

And you might wonder why an archive would want that. Well, if you’re a medical historian, that’s really interesting, useful information because you can look at, for example, the history of different medical problems. Medical and mental illnesses. You can look at the history of that. Has, for example, the rate of depression gone up or down over time? Let’s take that record back to 1850. The problem is: it’s not digitized, and it will probably never be digitized.

So you’re right. Having these external resources that you can know about, you can discover… I found it because I was searching Google for a particular kind of information. I said, “Oh, there’s an archive in Martinez. I’ll just drive up there.” And once I did, the world opened up in new and interesting ways. And the thing that motivates me is discovering these fascinating backstories. One last thing about the insane asylum admission records: they’re organized by school district. School district! Why? Well, before California was a state, before it kind of had counties and all that stuff, there were schools and there were school districts. And so, that was the obvious organizational principle. It’s crazy now, but back then it was an interesting way to index information. And so, a lot of land records are actually — back in that era — indexed by school system rather than by county or by state.

Jorge: So a lot of the focus of this podcast is on information architecture; how information is organized. And what you’re bringing up here with this idea of the insane asylum records being organized based on school district points to the fact that the different information sources that you discover during your research are organized in different ways.

Understand organizational schema

Jorge: And it sounds like… well, not in this story per se, but just in general, that having a sense for the organization schema behind information might be an aid in helping you find information more easily. Is that true? Am I leading the witness here?

Dan: No!

Jorge: … in saying that?

Dan: Actually you got, I think, one of the deeper points of the book. So you’re absolutely right! So that particular information architecture is not one you would ever expect. And I think another thing to realize is that information architectures, or the ways people index things, change over time. And so the index structure of, say, 1960s news is very different from what it is today.

Or another piece of this is that terminology shifts a lot over time. So for example, hay fever. What we would think of as hay fever or allergies was once upon a time called rose cold. Meaning that they thought that the congestion and all that was caused by the pollen of roses and it would give you this sort of pseudo cold. And so, the terminology shifts a lot. So, if you’re looking for example in medical records and you’re searching, for example, for the history of stroke, you have to know that it used to be called ‘apoplexy.’ Or that influenza goes by both — in English now — influenza and flu, but it used to be called catarrh. So, the actual language and terms themselves shift over time and you have to recognize that.
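One way to act on the point about shifting terminology is to search for the modern term and its older names at the same time. The Python sketch below is only an illustration: the `HISTORICAL_TERMS` table and the `expand_query` helper are hypothetical names, and the table holds just the examples mentioned in this conversation. It relies on the `OR` operator and double-quoted phrases that search engines such as Google accept.

```python
# Older names for modern terms, limited to the examples from the
# conversation. A real list would have to come from your own research.
HISTORICAL_TERMS = {
    "hay fever": ["rose cold"],
    "stroke": ["apoplexy"],
    "influenza": ["flu", "catarrh"],
}


def expand_query(term: str) -> str:
    """Return a query string matching the term or any known older name.

    Each variant is wrapped in double quotes so multi-word names are
    searched as exact phrases, and variants are joined with OR.
    """
    variants = [term] + HISTORICAL_TERMS.get(term, [])
    return " OR ".join(f'"{variant}"' for variant in variants)


print(expand_query("hay fever"))
print(expand_query("influenza"))
```

A term with no entry in the table simply comes back as a single quoted phrase, so the helper is safe to use on any query.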

Jorge: All of these things that you’re pointing to — including things like the awareness that terminology shifts, that categorization schemes shift — they point to something which feels to me like the kind of ur-motive for this book, which is imparting a sort of digital literacy on people. Maybe digital is too specific. It’s like information literacy in that you need a base set of skills to successfully make use of this amazing resource that we have in the world wide web.

Dan: That’s right. So, you just used a really interesting term. You said “ur” literacy. And I’m curious how many people listening to the podcast know what that means. Because ‘ur,’ the prefix ur — U R dash whatever — refers to like the original or the base, right? And it’s, I think, originally a German term and it’s been ported over into scholastic speak in the US. But it’s one of these things where the ability to look that stuff up changes fundamentally the way we think about our interaction with the knowledge world. I have, you know, kids who at the dinner table, a fact will be said over a conversation over dinner and they will whip out their phone and search for it. Population of Japan is 2 million people; search for it! No, no, no. That can’t possibly be right.

The easy access to searchable information and the understanding of what it means to be information literate fundamentally changes all kinds of things. So, you felt comfortable using the word ‘ur’ as a prefix for literacy, knowing full well that your listeners can look it up, right? And as an author, it’s one of those decisions you have to think about all the time. Can I use a word like catarrh in a sentence and people will understand it, or not? Knowing that information literacy is something that we should all have in our fingernails, in our blood, I think is changing the world in a really positive way. So information literacy, I think, is really core, and I hope that’s part of what this book gets across.

Jorge: Absolutely. It definitely came across and as I read it, I was thinking back to one of the conversations I had on the show last year, which was with Howard Rheingold.

Dan: Huh!

Jorge: And Howard spoke about this. He spoke about digital literacy and particularly its importance given the amount of competing claims for our attention. You know, the amount of disinformation that one can find online. We have to hone our ability to be able to discern whether a source of information is trustworthy, for example.

Dan: Mm-hmm. Absolutely.

Jorge: Which is something that you touch on in the book as well. I did want to ask you, just recognizing that we’re starting to run low on time here, and I did not want to leave the conversation without asking you about large language models and generative AI. You said at the beginning that your focus or your background is in AI and HCI. And I get the sense over the last — especially over the last few months — that systems like ChatGPT and the changes that Microsoft has made to Bing transform the way that we interact with these… let’s call them indexes of information. What are your thoughts on where things might be going with regards to how we search for stuff online?

Large language models and search

Dan: This is a fascinating time to be alive if you are in this field at all. I have to admit, even though my background is in AI and I’ve done natural language processing for years, I did not see this coming. I mean, I saw language models coming a couple years ago, but I did not anticipate the breadth and depth to which these things would work. So, it’s been interesting as a person interested in information quality and the depth to which people understand these things, to see how it works.

At the moment large language models like Google’s Bard or Microsoft’s Bing, which uses GPT-4, are changing rapidly. The first thing to recognize is that if we have this conversation in a year, everything’s going to be different. Right now, large language models have a real problem with what’s commonly called hallucination or fabrication. They’re just making stuff up. The best version of this I’ve heard is that it’s like a cybernetic mansplaining system where it’s just basically making stuff up to fill the gap.

At the same time, it also provides a kind of ability to search out information in very, very different ways. As an example, I wrote a post recently about searching for words that end in ‘-core.’ So earlier you used the prefix ‘ur.’ In one week I heard multiple people say something like, “synth-core” — synth-dash-core. Or “night-core,” or “mumble-core.” And I thought, wait. Have I missed something? What does this core thing mean? And I don’t know of any way to find that on Google using traditional search methods. So, I turned to Google’s Bard and I said, “Hey, tell me about these words that end in ‘-core.’” And I gave some examples like “mumble-core” and “synth-core,” and so on. And it gave me this lovely little essay about “core” meaning a design aesthetic or perspective on the world. And then I said, “Show me 10 more examples of that.” And it gave me 10 more examples of words that I had never, ever heard about. Like “cottage-core.” I didn’t know what cottage-core was. So, I went and looked that up and it turns out to be a design aesthetic about very comfortable… imagine west of England cottages with moss and wooden shingled houses, et cetera, et cetera.

That’s an interesting way to access information that wasn’t there before. Now, the problem with hallucination, I think is a serious one. I’ve also learned from these large language models that I died in 1993. I’m happy to report that that’s not true; rumors of my death are greatly exaggerated. But I think an important point right now is that they’re fabulous for doing some kinds of things, but you have to check absolutely everything. I saw one little essay that was written by ChatGPT-3 the other day, where it was twelve sentences long and one sentence was exactly the opposite of the other eleven sentences. It was remarkable! It completely inverted the sense of what it was saying. So at this point, you have to actually check everything.

I am optimistic, however, that this problem will be solved. I don’t know if it’s going to be in six months or two years, but I know of ways to sort of make this a whole lot better and make the results actually much more factual. There are a couple of systems out now that actually give citations for all their assertions. There’s one called “scite” (scite.ai) that, if you’re a scholar, is a really nice large language model that’s trained on the scholarly literature and will give citations for things you ask. So if you ask for example about, “What are the metabolic processes involved in ATP, in say, lizards?” It’ll give you this nice little essay with citations for everything, which is really remarkable.

So I’m optimistic about this. I don’t think it’s going to undo all the necessity of having some literacy about information and information resources, but it’s going to give us a whole new set of tools to look at and craft and understand all the stuff that’s out there.

Jorge: Yeah. And earlier we were saying that the sort of interactions with these search engines, at least when I was reading the book, it felt to me very conversational. And this framework for interacting with computers seems to take that to another level, particularly in its chat-based implementation, right? So yeah, I agree. It is fascinating to look at. Alas, we are short on time today. Where can folks find you, Dan?

Closing

Dan: Well, I would be silly if I didn’t say, “Here’s a search strategy to find out more about me!” So, there are two things to know about. If you want to find out more information about me, there are lots of webpages. I have a home website and the way you find me is to search for my name, “Daniel Russell Google” or “Daniel Russell homepage” or “Daniel Russell research scientist” — something like that. Some phrase like that and my homepage will appear in the top two or three results. So that’s one thing. If you want to find my blog: “Daniel Russell blog” — boom! There you go.

One of the things I tell my students a lot is do not remember URLs. You’re wasting your brain to remember a URL. I mean, they’re often long and they’re complicated and who knows exactly. But if you remember the search strategy, you can get there in like one tenth of the time. That’s a much more efficient use of your mental resources. So, “Daniel Russell blog” or “Daniel Russell homepage”; that’ll get you to two of my resources. And from those two places, everything is available to you.

Jorge: That’s fantastic. And I will also add there, The Joy of Search, the name of your book. And I would suggest that — as I read in your book — if you’re going to search for that, wrap it in quotes so that you find the exact phrase, right?

Dan: That’s right. Exactly right! Yes, double quotes are a great trick to know. It’s like the thing… actually there are two things. That’s thing one. Double quotes basically say, “Here! Search for this two- or three- or four-word phrase.” The longer you make that phrase — say, if you make it five words long — the chance that you got one of the words wrong is pretty high. So, keep it short, keep it simple. The second key thing I want to teach people is how to find text on a page. Because I discovered this. And it sort of floored me when I found out that 90% of the US population does not know how to use “find in page.” What? Think Control-F or Command-F and you find the text on the page.

This is amazing. It’s a fundamental skill. And yet 90% of people don’t know this. And you might find that hard to believe, but it’s true. And I’ve measured it many, many, many different ways. And so, that’s an aspect of information literacy: how to get to your information effectively and efficiently, like with double quotes or like with Control-F.
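The two tricks above can be sketched in a few lines of Python. This is a loose illustration, not how any browser or search engine implements them: `exact_phrase_search_url` and `find_in_page` are names invented for this example. The first wraps a phrase in double quotes and builds a Google search address for it; the second mimics what Control-F does by reporting where a piece of text occurs in a page, ignoring upper and lower case.

```python
from urllib.parse import urlencode


def exact_phrase_search_url(phrase: str) -> str:
    """Build a Google search URL that looks for the phrase exactly."""
    # Double quotes ask the search engine to match the words as a phrase.
    query = f'"{phrase}"'
    return "https://www.google.com/search?" + urlencode({"q": query})


def find_in_page(page_text: str, needle: str) -> list[int]:
    """Return the character offsets where needle occurs in page_text.

    The match is case-insensitive, like a browser's default find-in-page.
    """
    haystack = page_text.lower()
    target = needle.lower()
    positions = []
    start = haystack.find(target)
    while start != -1:
        positions.append(start)
        start = haystack.find(target, start + 1)
    return positions


print(exact_phrase_search_url("The Joy of Search"))
print(find_in_page("Search, search, and search again", "search"))
```

Keeping the quoted phrase short, as suggested above, lowers the odds that a single misremembered word makes the exact-phrase search miss.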

Jorge: Well, and again, I think that your book will help them. Again, it is not a technical volume. It is very much like sitting, watching you do these things, you know? Observing over your shoulder, which is often one of the best ways to learn about something.

Dan: One of the things that people need to know is that the way you learn best is by looking over the shoulder of somebody. So, if I want to learn podcasting, I’m going to look over your shoulder, Jorge.

Jorge: Thank you. Thank you for letting us look over your shoulder, Dan, as you search the web. And thank you also for being on the show. It was a real treat!

Dan: It was great. Thank you. This is marvelous fun.
