Marcia Bates is Professor Emerita in UCLA’s Department of Information Studies. Over the course of a long career in both consulting and academia, Dr. Bates produced seminal work in user-centered information system design. Her paper on “berrypicking” as a user search strategy has been widely cited and is considered a foundational text in the field. In this conversation, we discuss search systems.
- Marcia J. Bates Home Page
- Marcia J. Bates - Wikipedia
- UCLA Department of Information Studies
- The Design of Browsing and Berrypicking Techniques by Marcia J. Bates
- Selected Works Of Marcia J. Bates, Volume I
- Selected Works Of Marcia J. Bates, Volume II
- Selected Works Of Marcia J. Bates, Volume III
- The Contemporary Thesaurus of Search Terms and Synonyms: A Guide for Natural Language Computer Searching, Second Edition, by Sara Knapp
- Boolean algebra - Wikipedia
- Carol Kuhlthau - Wikipedia
- U.S. Energy Information Administration (EIA)
Show notes include Amazon affiliate links. We get a small commission for purchases made through these links.
Jorge: Welcome to The Informed Life. In each episode of this show, we find out how people organize information to get things done. I am your host, Jorge Arango.
My guest today is Dr. Marcia J. Bates. Dr. Bates is Professor Emerita in UCLA’s Department of Information Studies. Over the course of a long career in both consulting and academia, Dr. Bates produced seminal work in user-centered information system design. Her paper on berrypicking as a user search strategy has been widely cited; it’s considered a foundational text in the field, and one that I use often in my courses. So, it was a great privilege to sit down with her to discuss search systems, which is the focus of our conversation today. And now, Dr. Marcia J. Bates.
Jorge: Marcia, welcome to the show.
Marcia: Thank you. It’s a pleasure to be here.
Jorge: Well, the pleasure is all mine. As I was telling you before we started recording, I have been influenced by your work. Some folks listening in might not be familiar with your work. How do you introduce yourself?
Marcia: You mean, what is my elevator talk?
Marcia: Well, I’m interested in information-seeking behavior and the design of information systems to make it easier for people to search for information. And that has been my main focus.
Jorge: And I would say that you’ve had a long career. Is it fair to say that it’s primarily been in academia?
Marcia: Yes, although I did a ton of consulting along the way, and that was very useful because it anchored me in reality, you know? And it also meant that I had real things to work with. Because some of the people in my field were working with little toy databases of twenty thousand items or something, and I got to work on million-item databases to test out ideas and so on. I consulted for governments, private companies, and startups.
Jorge: And what did that work entail? Because I’m only familiar with your work as published in your papers.
Marcia: Well, I worked on the things that I’m interested in as a researcher, which is the design of information system interfaces to support search. And I advised several companies and organizations on that kind of thing.
One of the things that I’ve argued for my whole career is that you need to design a user-centered vocabulary, not the indexing vocabulary. The things that people come up with when they’re approaching an information system are distinctive and connected with each other in different ways from the way that formal indexing is done. You need to design a thesaurus for the searcher, a searcher thesaurus, and then you need to design the interface so that people can see clusters of terms around that topic.
For instance, if you look up in Google under migration, you’ll find that they’ve done a beautiful job. Every item on the page has the word “migration” featured in it. But somebody who’s interested in… you know, that’s the word that comes to mind. And they go in and they input “migration” when they’re really interested in asylum seekers, but they didn’t think of that word to use. But when you show them a cluster of related terms, they can see immediately, “Oh yeah, that’s what I meant!” And click on that instead and get things much more relevant to their interests.
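As a rough illustration of the searcher thesaurus idea described above, here is a minimal Python sketch. The cluster contents and the `related_terms` function are hypothetical, made up for this example; they are not taken from any thesaurus or system mentioned in the conversation.

```python
# Hypothetical searcher thesaurus: each cluster groups terms that
# searchers might use for one topic area. Contents are illustrative only.
CLUSTERS = [
    {"migration", "immigration", "emigration", "asylum seekers",
     "refugees", "displaced persons"},
    {"migration", "bird migration", "animal migration", "seasonal movement"},
]


def related_terms(query: str) -> list[str]:
    """Return every term that shares a cluster with the query term."""
    q = query.strip().lower()
    related: set[str] = set()
    for cluster in CLUSTERS:
        if q in cluster:
            # Offer the rest of the cluster, not the term already typed.
            related |= cluster - {q}
    return sorted(related)


# A searcher who types "migration" is shown neighbouring terms to click on.
suggestions = related_terms("Migration")
```

An interface built on something like this could show `suggestions` next to the results, so a person who typed "migration" but meant "asylum seekers" can recognize the better term instead of having to recall it.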
Search is not done
Jorge: I get the sense that search… certainly after the popularization of the web, and especially Google I would say, search has become kind of part of the infrastructure of daily life. And…
Marcia: Well, excuse me, but I remember hearing not too long ago, somebody say, “Oh, search is done. We’ve got that.” You know? “There’s nothing more to be done with that.” And, as I’ve said, I’ve been arguing for forty years to give people support while they’re searching. They don’t have to take it. They can ignore it, but give them the chance to see…
You know, one of the things that I think was really funny was with Yahoo. Remember, when Yahoo first came out and it was a directory and they created a whole classification scheme of categories of topics and it was very elaborate. But they didn’t let the user see it. They saw it as their proprietary product, so they were not going to let users see that.
And so, it defeated the whole point of it being a directory. But people like to see where each topic is embedded in the conceptual landscape. And if you let them see how it relates to other things, they can often immediately realize, “Oh, that’s not what I meant. I meant this other thing.” But it’s very hard to generate vocabulary in your mind. Once you think of a word for something… and buried somewhere in the psych literature this can be found. It’s actually very hard to think of another word for it. It blocks out — it inhibits — other word generation.
So, give them that instead of demanding, with this empty box, that they generate all the possible things that they can search on. Show them the world that they’re searching in. Show them the landscape of information and relatedness and let them follow the relatedness up one direction or another. And we’ve never created interfaces like that for users. They don’t have to use them, so why not make it available if they can use them and if they are interested in it.
Jorge: And specifically what you’re calling out here is this notion of the clusters of related ideas, yeah?
Marcia: Well, one of the things that folks don’t realize… and this goes back… well, actually, I discovered this unwittingly in my own dissertation; it wasn’t something I was looking for. But topics have multiple terms for them: not just two or three, but twenty, thirty, forty. And so, when you search on any one of those, you’re going to get some stuff back.
But you’re missing three-quarters of it. So, there are a lot of closely related terms, and the little cross-references in library card catalogs were never adequate because there were only one or two cross-references per word. That’s why they really need to be in clusters or clouds of terms, each of the terms being itself searchable.
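The point about missing most of the material when searching on a single term can be sketched with a toy example. Everything below (the five documents, the cluster, and the `search` function) is invented for illustration, and the matching is a simple substring test rather than how any real search engine works.

```python
# Toy collection: four documents on one topic, each using a different term,
# plus one unrelated document. All data here is made up for illustration.
DOCUMENTS = {
    1: "New policy on asylum seekers announced",
    2: "Refugees cross the border in record numbers",
    3: "Migration trends over the last decade",
    4: "Support services for displaced persons",
    5: "Quarterly earnings report",
}

# Hypothetical cluster of terms for the same topic.
CLUSTER = ["migration", "asylum seekers", "refugees", "displaced persons"]


def search(terms: list[str]) -> list[int]:
    """Return ids of documents containing at least one term (case-insensitive)."""
    wanted = [t.lower() for t in terms]
    return sorted(
        doc_id
        for doc_id, text in DOCUMENTS.items()
        if any(t in text.lower() for t in wanted)
    )


single = search(["migration"])  # only the document that uses that exact word
expanded = search(CLUSTER)      # every document on the topic
```

In this toy collection, searching on "migration" alone returns one of the four relevant documents, while searching on the whole cluster returns all four.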
Actually, one time I got to the point of having completed a thesaurus like this with a thesaurus designer working for me, and we got it all done and within days the company just shut down everything connected with that. We were never able to test it. They lost funding in the defense industry and so everything shut down in California around that time. But in the early nineties, there was a recession in the rest of the country, but there was a depression in California.
Better search and searching
Jorge: I want to circle back to this idea that search is not done. And I’m thinking, in my own work, I see a distinction between when we think of a generalized search, something like Google or Bing, these global search engines that let you search through the entire content of the web. And then there are, I don’t know if to call them proprietary, but the search functionalities that you see in specific websites.
Like, for example, a company might have a knowledge base where their customers might find support articles, right? And that might have a search functionality. And I’m wondering if there’s a difference between those? Between the global kind of web — big time web searches where you can enter any concept, any word — versus something more localized, more specific.
Marcia: Well, in my experience, most of those within-website search systems are abominable. They’re throwbacks to the seventies. You know, they’re just simple left-to-right search match things, and often it’s not made clear what you’re searching.
For instance, it turns out that you’re just searching their press release database and you actually put the term in because you were trying to find some feature or element of the actual website. And you put that in and it says, “Oh! Zero results.” So, I think these things are very un-legible to the typical searcher and often are designed in ways that are not very helpful for people.
See, years ago, I wanted to propose that there be a website of vocabularies. And back in the paper days, there actually was a library school — and I can’t think of where it was right now, I think it was a Canadian school — that tried to collect all the different thesauri that they could find worldwide. Which was a great idea, you know? But that was all paper.
I think there should be a website that you can go to and collect, look at all of these digitized vocabularies, classification schemes, thesauri, index languages — all the different things that they call them and maybe borrow some of these to use part or all for your own local situation. It’s just absurd that we redo the labor. And of course, because it’s so complex they don’t redo very much of it because it’s too much to do!
If you took advantage of what people have already done that could aid this tremendously and I’ve long felt that there should be the vocabulary support for searchers at websites or at Google that are varied, that have… you know, you might be able to pick out different ones. “Well, this is where you’ll find all the details of cooking vocabulary.” And, “this one is animals, but from a wildlife rather than a biological — straight biological — perspective.” So you have a little database of things about wildlife. There’s so many different ways these can be constructed and so many of them that are available out there, but often are proprietary, because the creators of them don’t want to share it with anybody.
The bottom line is that effective searching is not intuitive. People, the way that they tend to search for things isn’t the way that the literature’s actually organized and isn’t the best way to follow leads. And we’ve done some things in the information world to help people with that, but there’s a lot of things that could still be done, but there isn’t a big Google funding it. So, even the things that we’ve proposed or have done in the field have been largely ignored. The big powerful guys come up with something, but they don’t realize that searching has its distinctive qualities that require distinctive understanding and distinctive support modes.
Jorge: This notion that searching effectively is not something that we do intuitively to me points to two separate but related directions. One is that the people who design these systems perhaps should do so more thoughtfully to accommodate some of these… I’m going to use the word “affordances,” but perhaps more broadly, some of these structures. Things like the research that you’ve pointed to. So that’s kind of the design track of things, right? Like, the people who are producing the systems that we use to find information.
But the other direction that this suggests to me is that perhaps those of us who aren’t designing these systems — I shouldn’t say “us” because I am one of the people who is designing systems — but people, like everyday people, who are just using these things, perhaps need to embark on a course of… I don’t know if to call it like “self-education” to become better searchers. Is that a thing? Can we become more literate searchers?
Marcia: Oh, I certainly think we can. You know, there are things, straightforward things, that people could do right now, but at the same time it’s frustrating in this field because the HCI people are mostly psychologists. And of course they think they know everything about language and searching. What does anybody else need to know? They can design systems!
And the linguists… who knows more about language than linguists? They’re the ones who we should turn to for anything about language. But what they don’t realize is that there’s an expertise about information search. There are things that librarians learned in this, especially with the early online systems in the seventies, eighties, and nineties, that we still haven’t applied in the web world because nobody listens to librarians.
Now, the particular thing that really galls me about this: I’ve been arguing through this whole session about the need for user search vocabularies, and one librarian actually wrote a book about this. And she developed all of these clusters of vocabulary terms that she found worked when she did online searches for her customers.
And she collected this from a lot of other people. It wasn’t just her own work, and she published it. I bring it up every once in a while. It’s just been totally ignored. Her name was Sara Knapp and I think it’s called A Contemporary Thesaurus: Social Science Terms or something like that. And, it exactly personifies what I’ve been arguing for forty years and what I haven’t been able to get people to do.
Jorge: I’m going to link that up in the show notes. I was not familiar with this book, which proves your point, because I’m in the industry and I hadn’t heard of it.
Marcia: A colleague and I got the idea once of trying to make it available online in electronic form, and the publisher refused because they didn’t want to give away their stuff. So, of course it meant that nobody ever… it’s probably not in print anymore because it was some years ago. So, the librarians had a unique experience because in the seventies, long before the web, there was online searching on telephone lines that was done to databases in different academic areas.
And there were companies like Dialog and Orbit and so on, who… STN. There were a whole slew of companies that would buy these databases, mount them, develop their own search features or capabilities, which you had to learn sometimes with considerable laboriousness. And if you learned those search capabilities, you could then use their database. And at that time they were very expensive. You know, you paid a quarter a reference. Twenty-five cents a reference and $300 an hour to be online to the database.
So, the librarians not only learned to search well, they learned to prepare well so that when they went on, they were using the database for only a few minutes. And because of the sort of boolean logical structure of these search systems that the companies had, it required real skill. It required really understanding boolean logic and understanding vocabularies and how to put them together. And that whole world — it went on for about twenty years — led ultimately to Sara Knapp’s cluster vocabulary. And that was based on the experiences of hundreds of librarians.
Jorge: Some listeners might not be familiar with the term boolean searches or boolean operators. What you’re referring to there are things like, term A and term B, versus term A or term B, correct?
Marcia: Right. So, I want every document that has both of these terms somewhere in the document, or I want either this term or that term somewhere in the document. But these could become quite elaborate searches because there was a lot of specificity in what these users wanted. You know, “I want it only between this and that date. I want only ones from France and Germany,” and you know, there were all of these different little sub-indexes of features that you could combine. And so the people doing this… I taught courses in it. That’s one of the things I taught in library school was how to do that kind of searching.
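The kind of boolean search described above (terms combined with AND and OR, then narrowed by date and country) can be sketched in a few lines of Python. The record fields, sample data, and helper names such as `all_of` and `any_of` are assumptions made for this example; they do not reproduce the command language of any of the online services mentioned.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    title: str
    country: str
    published: date


# Made-up sample records for illustration.
RECORDS = [
    Record("Solar energy policy in rural areas", "France", date(1985, 3, 1)),
    Record("Wind energy and solar subsidies", "Germany", date(1991, 6, 15)),
    Record("Coal policy review", "France", date(1988, 9, 30)),
    Record("Solar energy markets", "Spain", date(1987, 1, 10)),
]


def has(term):
    """Match records whose title contains the term (case-insensitive)."""
    return lambda r: term.lower() in r.title.lower()


def all_of(*preds):
    """Boolean AND: every condition must hold."""
    return lambda r: all(p(r) for p in preds)


def any_of(*preds):
    """Boolean OR: at least one condition must hold."""
    return lambda r: any(p(r) for p in preds)


def country_in(*countries):
    return lambda r: r.country in countries


def published_between(start, end):
    return lambda r: start <= r.published <= end


# "solar AND (policy OR subsidies), only France or Germany, only the 1980s"
query = all_of(
    has("solar"),
    any_of(has("policy"), has("subsidies")),
    country_in("France", "Germany"),
    published_between(date(1980, 1, 1), date(1989, 12, 31)),
)

results = [r.title for r in RECORDS if query(r)]
```

With this sample data, only the first record satisfies every condition: the second falls outside the date range, the third does not mention solar, and the fourth is from a country that was not requested.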
Information-seeking vs. problem-solving
Jorge: Right. And I think that this speaks to this idea of information-seeking literacy, right? Because if you know how to use these boolean operators, all of a sudden you can issue much more precise queries. And search engines like Google support them, right? Like, if you wrap two terms in quotes in Google that treats it as an ‘and’ operator, right? So I’m wondering if there are any other recommendations that you might have for our listeners so that they can become better searchers themselves?
Marcia: Well, one big point I would make is that I think thinking about search strategy (you know, how you’re going about finding information) is not something that most people ever do. It’s all kind of unconscious. Occasionally, you want to know when the Beatles’ first album came out, and so then you look it up in Google and you find a date and you’re conscious of searching for information.
But most of the time, people, when they’re searching for information, are actually solving a problem in life. They have something they have to do and they’re trying to figure out. They need to get the information in order to do it, but they don’t think of themselves as seeking information. So it’s quite unconscious.
They’re problem-solving, but they’re not information-seeking. And if you want to improve your information seeking, I suggest two things. Number one: notice when you’re solving the problem, going about doing something. Notice when what you’re actually seeking is information. You’re not solving your problem by driving to the doctor or something. You’re solving your problem by looking for information about the doctor. So notice those moments because then it makes you conscious that you are engaging in this activity of searching for information.
And once you make it conscious, point number two is you can ask yourself, “What is my strategy? Might there be a better way to find this? A better source to find this? Yeah, I went to the web because that’s your automatic first thing.” But maybe there’s something… you know, academic libraries! All the information is available to you on the web. Go to the academic library. Maybe find a whole database on the subject. You know, and instead of having to search all over the web through seven different things. So the point is notice when you are actually searching for information in your problem solving. And then ask yourself, “Is this the best way to find it?”
Jorge: Before we started recording, we were talking about how many of us currently do this. And I get the sense just from that conversation and also from my own personal experiences that many of us have learned these searching strategies kind of ad hoc. We’ve picked it up from our friends, our colleagues, our parents. How can somebody develop a more keen sense of what strategies to apply where? Are there sources to learn this stuff?
Marcia: Yes, but the ones that I’m aware of were still very library-oriented, although libraries are great sources of information, you know? But it’s kind of difficult to… you can’t get publishers and so on to take this sort of thing very seriously. If you look for books about how to find information, they tend to be of the sort… they’re either compendia of lots and lots of different sources, or if it’s about writing a research paper or something, it’ll give you a list of seven things to do. First you pick your topic. Second, you go to the library. Third, you take notes. You know? And they’re not very helpful at all in the real-world problem of coming up with and shaping a topic for your project.
Carol Kuhlthau has done wonderful work on this. She shows how people go about this and what is stressful for them and what is wrong with the way that they’re commonly taught to do these sorts of projects.
Better searchers = less gullibility
Jorge: When you were saying that publishers don’t take it seriously, what came to my mind is that maybe they don’t because it’s not clear where the money is made here, you know? But I think that the flipside to that is I think that we would all benefit from being better searchers. Society in general would be improved, no?
Marcia: Oh, tremendously. Yes. I mean the way that so many… the gullibility of so many people for these different theories that are running around the web and so on, that they don’t seem to have learned to evaluate their information source and to try to ask, “Is this a source that I could likely trust or not? Or did they have a biased interest?” They don’t even seem to ask the most basic questions about this.
I come back to the idea that for most of our history as human beings, back when we were foraging for food, most of the things that we needed to know, we found by running into them or somebody in our little clan telling us about them. But nowadays we live in a much more complex world with information that can be high quality or poor quality. I noticed, when I was thinking about talking with you about this the other day, I was looking to find the percentage of fuel in the United States that comes from fossil fuels as opposed to renewables.
And you can find things right away when you search on Google for this thing. But long story short, I found my way to a site that was eia.gov. Well, that’s the Energy Information Administration. Guess what? There’s a whole administration that’s just for energy information. Now, when they tell me that 79% of the fuel comes from fossil fuels, of electricity and so on, I’m inclined to believe them. And so, just that minimal thing of noticing where you’re reading this from makes a huge difference.
Jorge: Yeah, I mean, just the very fact that it has a .gov top-level domain immediately raises the level of trustworthiness. It’s certainly more trustworthy than a random forum somewhere. I’m sure that we could keep talking a lot about this, but I want to be respectful of your time. It’s been brilliant talking with you. Where can folks follow up with you?
Marcia: I have a website. Just look up Marcia Bates. And it’s fortunately the first name that usually comes up. There aren’t forty-seven other researcher Marcia Bateses. So…
Jorge: Well, that’s an area where a cluster of related terms might not help, right? If you’re looking for a proper name.
Marcia: Proper names. Yeah, that’s it. Well, actually it’s needed with proper names sometimes too, particularly in the arts and humanities. But anyway, that’s my website. I have many publications there that are open source. You can read the whole thing yourself right there at the website. And I have an email address there, and so that’s how you could contact me.
And I should mention that in 2016 I collected about forty articles of mine that I had published over the years in three volumes called Selected Works Of Marcia J. Bates. Each book only costs… There you go, you’ve got it! Each book only costs $18 apiece or less now if you get them remaindered. And the fellow who did the typesetting and so on was just fabulous and they’re beautifully done.
Jorge: I vouch for that. The books are gorgeous and the content is even better. So thank you so much for sharing that with us and for being on the show.
Marcia: Thank you.
Jorge: And thank you for listening. As always, you can find notes and a transcript for this episode at theinformed.life. If you’d like to be notified when new episodes come out, please subscribe to my newsletter at theinformed.life/newsletter. And if you’re enjoying the show, please rate or review it in Apple’s podcast directory. This helps other folks find it. Thanks.