Heather Hedden is an information management consultant specialized in taxonomies, controlled vocabularies, metadata, and indexing. She’s the author of two books, including The Accidental Taxonomist, a guide to the discipline of taxonomy creation and management. In this conversation, we explore taxonomies and why they’re important for organizations.
- Hedden Information Management
- Heather’s blog
- Heather’s online taxonomy course
- The Accidental Taxonomist, 2nd Ed by Heather Hedden
- Roget’s Thesaurus
- American National Standards Institute (ANSI)
- International Organization for Standardization (ISO)
- World Wide Web Consortium (W3C)
- Web Ontology Language (OWL)
- Resource Description Framework (RDF)
- Simple Knowledge Organization System (SKOS)
- Microsoft SharePoint
Show notes include Amazon affiliate links. We get a small commission for purchases made through these links.
If you're enjoying the show, please rate or review us in Apple's podcast directory:
Jorge: Heather, welcome to the show.
Heather: Thank you for inviting me, Jorge.
Jorge: I'm very glad to have you here. For folks who might not know you, how do you introduce yourself?
Heather: Well, very simply in one word, I'm a taxonomist. But then maybe I have to explain what that is. So, it depends on whom I'm introducing myself to. If they know what taxonomy is, it's a short introduction. The long introduction is explaining about organizing information, making it easier to find. Then maybe I explain tagging and indexing, and that I develop the terms that are used for tagging or indexing online digital content, whether it's internal or external on websites.
Jorge: You're also the author of the book called The Accidental Taxonomist, which I think is a great introduction to the subject. And you talk a little bit in the book about how you came to taxonomy work. What do you mean by "accidental taxonomist?"
Heather: Well, actually, I came up with the idea not just about myself. Many people who do taxonomy work don't think that it's something they're going to do when they're students or starting out their careers, or even when they're in a position... Suddenly they're in an organization where there's a need for a taxonomy, and maybe they're the ones who are closest to it. When I wrote about it, I was thinking of the case of special librarians: special librarians who might normally do research, and who have had some training in cataloging, are suddenly asked to make a taxonomy. It's a little bit different, and they might not know how. But people can come from many backgrounds. They could be subject matter experts. They could be information architects, or they could come from IT. And I'm even seeing people come from another direction, maybe from ontologies, with more of a computer science background. But there are very few courses, even in library or information schools, that are just on taxonomies, and taxonomy is something that's more practical than theoretical.
Jorge: How did you come to taxonomy work?
Heather: Okay. So, my background was in content. I had a journalism job, and then I got a job with... well, you could say a library vendor, but the company was indexing and abstracting lots of different magazine and trade journal articles, and I was hired as an abstractor/indexer because I knew how to write abstracts. Then they trained me in indexing with their controlled vocabulary, and then I moved into the group that managed the controlled vocabularies. We didn't call them taxonomies, but, you know, they fit into that. That company, Information Access Company, doesn't exist anymore because it was acquired by Thomson and then merged into Gale. Gale still exists as a division of Cengage.
So, that's how I got started, and I worked on various controlled vocabularies there. Then I didn't survive a round of layoffs, so I had to go off on my own, and I found out there were other applications and learned other things about taxonomy. I also learned about information architecture then. I did some contracting and then got different jobs as a taxonomist here and there. And I've worked for multiple employers, not just one. Each organization looks at taxonomy a little differently: they may have different kinds of content, different kinds of users, or a slightly different purpose. That broad experience working in multiple organizations made me feel confident about writing about it and teaching it. I do teach an online course, which I actually started before the book, and that was the basis of it.
Jorge: What was it about taxonomy work that attracted you? You said that your background was in content, and the sense I got from hearing you talk about it is that you almost kind of fell into it, but you've been involved with it for a long time. You've taught it. You've written a book. What was it about taxonomy work that drew you to it?
Heather: Well, it's analytical, and it's a little bit creative too. How are you going to describe a concept? What words will you use? What synonyms should you use? What else will you need to relate that concept to? Should we include it or not? And at the same time, we learn about all different kinds of subject areas. That was a nice thing about working at Information Access Company, Gale, because we covered all kinds of news, research, and periodicals, since it went to libraries. So, I could learn a little bit about many different subject areas rather than becoming a specialist in anything. But, yeah, I like that it's analytical, it's creative, and there's a mix of working independently but also collaborating with other people. You have to talk with the different stakeholders who are involved.
What is a taxonomy?
Jorge: We've been talking about terms that some of our listeners might not be clear on, starting with the term taxonomy itself. How do you describe what a taxonomy is to someone who might not be familiar with it?
Heather: Yeah, as I said, it's a set of terms, words or phrases describing concepts, that are used to tag or index documents or content. But they are also organized in some kind of structure, which traditionally is hierarchical, but could be grouped by different aspects that we call facets, or some combination. The funny thing about taxonomy is that it's part of a bigger world of what we might call, as I mentioned, "controlled vocabularies" or "knowledge organization systems," and some of the other ones are more formal.
There aren't actually any formal standards for taxonomies. The most closely related is what's called a thesaurus: not Roget's Thesaurus, which you use to look up synonyms, but an information thesaurus, because it has terms with relationships that can be hierarchical, that is, broader or narrower, or associative, which is "related." There is also what they call an equivalence relationship for synonyms or near-synonyms, terms that could be used instead. And there are published standards for thesauri from the American National Standards Institute and from ISO, the International Organization for Standardization.
So, taxonomies can borrow from those standards as they like. Maybe some people have also heard of ontologies, which are related to taxonomies in a slightly different way. Ontologies are a way of modeling and describing a domain of knowledge with not just concepts, but more broadly with classes, the relations between them, and certain types of attributes. There are standards for ontologies as well, from a different organization, the World Wide Web Consortium. They have something called the Web Ontology Language, which goes by the acronym "OWL," with the letters in a slightly different order. Another standard is RDF, the Resource Description Framework.
So, I said there are standards. There's also a standard for vocabularies, or knowledge organization systems, from the World Wide Web Consortium called the "Simple Knowledge Organization System," S-K-O-S. Skos, skoos, I've heard it both ways. It's a little different because it just has to do with exchange and interoperability. It doesn't provide guidelines for making a good taxonomy, but I'm seeing more and more taxonomies follow the SKOS model so they can be shared and exchanged.
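To make the three thesaurus relationships and the SKOS exchange model concrete, here is a minimal Python sketch that writes a tiny vocabulary out as SKOS in Turtle syntax. The SKOS properties it uses (`skos:prefLabel`, `skos:altLabel` for equivalence, `skos:broader`/`skos:narrower` for hierarchy, `skos:related` for association) are real; the `Concept` class, the `to_turtle` function, the sample terms, and the `http://example.org/vocab/` namespace are hypothetical, chosen only for illustration. It is a sketch, not a validated exporter: labels are assumed to contain no double quotes, and `skos:related` is only written on the side where it was declared.

```python
from dataclasses import dataclass, field

# Hypothetical namespace, used only for this illustration.
EX = "http://example.org/vocab/"


@dataclass
class Concept:
    ident: str                                                # local id used in the URI
    pref_label: str                                           # the one preferred name
    alt_labels: list[str] = field(default_factory=list)      # equivalence: non-preferred synonyms
    broader: list[str] = field(default_factory=list)         # hierarchical: ids of broader concepts
    related: list[str] = field(default_factory=list)         # associative: ids of related concepts


def to_turtle(concepts: list[Concept]) -> str:
    """Serialize concepts as SKOS Turtle.

    skos:narrower is derived from skos:broader so the hierarchy is stated both ways.
    Labels are assumed not to contain double quotes (no escaping is done).
    """
    narrower: dict[str, list[str]] = {c.ident: [] for c in concepts}
    for c in concepts:
        for parent in c.broader:
            narrower.setdefault(parent, []).append(c.ident)

    out = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix ex: <{EX}> .",
        "",
    ]
    for c in concepts:
        props = ["a skos:Concept", f'skos:prefLabel "{c.pref_label}"@en']
        props += [f'skos:altLabel "{alt}"@en' for alt in c.alt_labels]
        props += [f"skos:broader ex:{b}" for b in c.broader]
        props += [f"skos:narrower ex:{n}" for n in narrower[c.ident]]
        props += [f"skos:related ex:{r}" for r in c.related]
        out.append(f"ex:{c.ident} " + " ;\n    ".join(props) + " .")
    return "\n".join(out)


if __name__ == "__main__":
    vocab = [
        Concept("electronics", "Electronics"),
        Concept("e-readers", "E-readers", alt_labels=["Ebook readers"],
                broader=["electronics"], related=["ebooks"]),
        Concept("ebooks", "Ebooks", alt_labels=["Electronic books"]),
    ]
    print(to_turtle(vocab))
```

Because the output is plain SKOS, any SKOS-aware tool could in principle read it back, which is the interoperability point Heather makes: SKOS says how to exchange a vocabulary, not how to design a good one.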
In defining a taxonomy, there are different approaches too. Some people think of the hierarchy: a tree of categories going from broader to narrower, and narrower, and narrower... Other people, who realize that it's something that supports search, keywords, and tags, might think of it as just a controlled list of tags.
In a sense, a taxonomy brings both of those together: the hierarchical structure, and the controlled terms or vocabulary, by having each unambiguous concept with one name and the other names just being synonyms pointing to it. Both of those aspects even show up in displays and in systems, because you can have a content management system that lets you have categories and tags at the same time.
Whether that's one taxonomy or more can be a question, but it's all very fluid, and that's part of what makes consulting very interesting: it depends so much on each situation. It's not like, "This is a taxonomy, just do it this way."
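The idea of one preferred name per concept, with synonyms pointing to it and the same terms arranged in a hierarchy, can be sketched in a few lines of Python. This is a minimal illustration assuming a hypothetical three-term vocabulary; the dictionary names `BROADER` and `USE` loosely echo the thesaurus conventions of "broader term" and "use" references, and the functions `preferred_term` and `breadcrumb` are invented for the example, not any particular system's API.

```python
from typing import Optional

# Hypothetical mini-taxonomy: each preferred term maps to its broader term (None = top level).
BROADER = {
    "Electronics": None,
    "E-readers": "Electronics",
    "Tablets": "Electronics",
}

# Non-preferred labels (lowercased), each pointing to exactly one preferred term.
USE = {
    "ebook readers": "E-readers",
    "electronic readers": "E-readers",
    "tablet computers": "Tablets",
}


def preferred_term(label: str) -> Optional[str]:
    """Resolve any label, preferred or not, to the single preferred term for its concept."""
    key = label.casefold()
    for term in BROADER:
        if term.casefold() == key:
            return term
    return USE.get(key)


def breadcrumb(term: str) -> list[str]:
    """Follow broader terms up to the top level, returning the path from the top down."""
    path = [term]
    while BROADER[path[-1]] is not None:
        path.append(BROADER[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    term = preferred_term("Ebook Readers")
    print(term, breadcrumb(term))
```

The same structure serves both views Heather describes: `USE` behaves like a controlled tag list (many labels, one concept), while `BROADER` drives category browsing.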
Applications of taxonomy
Jorge: One way that I often describe to folks how they might experience a taxonomy is through something like search results. Let's say they go on Amazon and search for the term "e-reader," and the first result that comes back is a Kindle, right? The word "e-reader" might not appear in the Kindle's title, but somewhere behind the scenes, there's a mapping between those two terms. Is that what you're referring to when you talk about something like a thesaurus?
Heather: Okay. The word "thesaurus" can mean two different things. When I was talking about the information thesaurus, I meant one with hierarchical, associative, and equivalence relationships. A simpler thesaurus that supports search could be called a search thesaurus, where we really just have synonyms for each concept. A search system, a search engine, will often have that kind of simpler thesaurus. We taxonomists also call that a "synonym ring," because it's a bunch of synonyms that all point to each other, with none of them preferred or displayed. So, that's within the realm of what taxonomists do, although we may or may not call it a taxonomy. As I said, since there are no formal standards for what exactly a taxonomy is, the word "taxonomy" can be used in a narrow sense or in a broader sense. Sometimes it's used for any controlled vocabulary, including a simple search thesaurus, a more complex information thesaurus, or an authority file of proper nouns and names. Or there's the narrower definition of taxonomy, where it's either hierarchical or arranged into facets, with each facet being sort of like a hierarchy.
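A synonym ring can be sketched in Python as a set of equivalent terms, none of them preferred, that a search system expands a query into before matching. This hypothetical sketch uses Jorge's e-reader example; it does not describe how Amazon or any real search engine works, the ring contents are invented, and the matching is a deliberately naive substring test.

```python
# Hypothetical synonym rings: each set holds equivalent search terms, none of them preferred.
SYNONYM_RINGS = [
    {"e-reader", "ebook reader", "kindle"},
    {"laptop", "notebook computer"},
]


def expand_query(query: str) -> set[str]:
    """Return the query plus every term that shares a synonym ring with it."""
    q = query.casefold()
    expanded = {q}
    for ring in SYNONYM_RINGS:
        if q in ring:
            expanded |= ring
    return expanded


def search(query: str, documents: dict[str, str]) -> list[str]:
    """Naive substring search: ids of documents whose text contains any expanded term."""
    terms = expand_query(query)
    return sorted(
        doc_id for doc_id, text in documents.items()
        if any(term in text.casefold() for term in terms)
    )


if __name__ == "__main__":
    docs = {
        "p1": "Kindle Paperwhite, 6-inch glare-free display",
        "p2": "USB-C charging cable",
    }
    # The Kindle page matches even though "e-reader" never appears in its text.
    print(search("e-reader", docs))
```

Unlike a preferred-term lookup, nothing here is displayed to the user: the ring only widens what the query matches, which is why Heather describes it as the simpler kind of thesaurus.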
Jorge: When you're describing that, particularly these relationships between terms that go beyond the simple synonym ring, it strikes me that these things could become quite complex and perhaps hard to manage. And I'm wondering what are good examples of the sort of business problems, or organizational problems, that having such a structure can help solve?
Heather: Yeah. Well, you brought up search before and I actually meant to continue with that because sometimes search isn't good enough. That can be a problem. People are complaining that search doesn't work. They're not finding what they expect to find, or they get results that are irrelevant.
So, if there is a search thesaurus, or even a fuller displayed taxonomy integrated with the search, that can help a lot. As I said, the search thesaurus doesn't display to the user, but if the terms do display to the user, then they can use them to limit or filter results, and the terms can be grouped in a certain way. So, combining what is entered in a search box with topics or categories from a taxonomy to limit or filter results makes for a better experience, and you get better results.
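Combining the search box with taxonomy-based filtering might look like the following Python sketch. It assumes a hypothetical hierarchy and hypothetical items already tagged with preferred terms; `Item`, `is_under`, and `search_and_filter` are invented names. One design choice worth noting: choosing a broad category also returns items tagged only with its narrower terms, which is what makes a hierarchy useful as a filter.

```python
from dataclasses import dataclass

# Hypothetical hierarchy: term -> broader term (None at the top level).
BROADER = {"Electronics": None, "E-readers": "Electronics", "Tablets": "Electronics", "Books": None}


@dataclass
class Item:
    title: str
    tags: set[str]  # preferred taxonomy terms assigned when the item was indexed


def is_under(term, category):
    """True if term equals category or sits anywhere beneath it in the hierarchy."""
    while term is not None:
        if term == category:
            return True
        term = BROADER.get(term)
    return False


def search_and_filter(items, text, category=None):
    """Match the search-box text against titles, then optionally limit results to one category."""
    hits = [i for i in items if text.casefold() in i.title.casefold()]
    if category is not None:
        hits = [i for i in hits if any(is_under(t, category) for t in i.tags)]
    return [i.title for i in hits]


if __name__ == "__main__":
    items = [
        Item("Paperwhite reader", {"E-readers"}),
        Item("Reader's guide to novels", {"Books"}),
        Item("Fire tablet with reader app", {"Tablets"}),
    ]
    print(search_and_filter(items, "reader"))                 # all three titles match the text
    print(search_and_filter(items, "reader", "Electronics"))  # the novel guide is filtered out
```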
The core of taxonomy
Jorge: In the book you talk about the heart of being a taxonomist and you say that there are three things. So, the first is dealing with concepts. The second is figuring out what the best words are to describe those concepts. And then the third is determining how to arrange the concepts so people can find the information they're looking for. And that strikes me as a skill that has wide applicability in all sorts of situations. You've been describing search results but connecting people with the right information seems like a superpower somehow.
Heather: Yeah! Well, another thing is that sometimes people know what to search for and sometimes they don't. That's why having something displayed as a taxonomy or facets provides some guidance for those who are not too sure where to start, or how broad or narrow to go. So, I find it significant to be able to... I've used this phrase... connect the users to content.
You know, search is often dependent on the user: some people are good at searching and some are not so experienced. But having a taxonomy kind of levels it. It's good for experienced people and for novices, not just in searching but also in the subject area, because you can go further with it if you know the subject area, or you can just explore, with a hierarchy of broader categories as a guiding starting point. So, I like that aspect of serving users, and that's why I've also been interested in information architecture, because that is also very much user-centered.
Jorge: I remember in one part of the book you talk about the difference in configuring a taxonomy for expert users versus the general public. And they are very different, right? In the case of expert users, the process requires really understanding the mental model that these folks bring to whatever subject domain you're dealing with. Whereas with a more general audience, I think it might be harder to predict the framing they might bring to the situation. Is that a fair take?
Heather: Yeah. I was just thinking about how there are different interfaces: they might offer an advanced search, versus just starting with a search box and then, after getting results, letting you limit or filter by other topics, which is simpler. So, that's one way of adapting to users of different abilities or levels. But what remains a challenge is what to name the actual taxonomy terms. Let's say in a health or medical field: medical professionals would use different terms than patients or the general population. That can be a challenge if the taxonomy is going to serve both kinds of users at the same time, for the same content. That doesn't happen as often; usually content and its user interface are for a somewhat more limited audience. But in my past work, there were times when we tried to address both at the same time.
Jorge: This hearkens back to the idea we were talking about earlier, about the core of being a taxonomist, where the first step is dealing with concepts and the second step is figuring out what to call them, right? I think that many people, and I myself before I started getting into this field, conflate the two, right? I used to think that the name is what it is. But that's not necessarily the case. I'm wondering if you can speak more to that, to the difference between a concept and its labels.
Heather: Yeah, well, the concept is an idea, and first you have to agree on it. You can give it a temporary name and decide, "Yeah, we need this in the taxonomy. There's content about it, and people want to look it up." Then, once you've done that, you go a little further with it and suddenly realize, "Oh, there are two different names," or, "We could call it this, or we could call it that..." Especially since we're talking about terms that are usually not one word, but a noun and an adjective, or maybe two adjectives, there's more that can be rearranged. Sometimes it takes a little bit of time to look into that. I've even just gone searching on the web to see, by usage counts, which is more common. Then, of course, if you have access to the users or stakeholders, you can talk with those involved and see what they think. Or you can look at the content itself, the content that will be tagged or indexed, to see what's more prevalent. I would say those are the three methods I most often use to decide how something is going to be worded. And then there's what makes sense for consistency in style with the rest of the taxonomy.
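Heather's third method, checking which wording is more prevalent in the content that will be tagged, amounts to a frequency count. A minimal Python sketch, assuming the content is available as plain strings and the candidate labels are whole phrases; the function name `label_usage` and the sample documents are hypothetical. Counts like these are one input to the decision, alongside stakeholder input and style consistency, not a rule.

```python
import re
from collections import Counter


def label_usage(candidates: list[str], documents: list[str]) -> Counter:
    """Count case-insensitive, whole-phrase occurrences of each candidate label.

    Each label is counted independently, so overlapping candidates
    (e.g. "mask" and "face mask") can both match the same text.
    """
    counts = Counter()
    for label in candidates:
        pattern = re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
        counts[label] = sum(len(pattern.findall(doc)) for doc in documents)
    return counts


if __name__ == "__main__":
    docs = [
        "Our face masks ship today.",
        "Face masks and cloth masks in stock.",
        "New face coverings for spring.",
    ]
    print(label_usage(["face masks", "face coverings", "cloth masks"], docs).most_common())
```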
Jorge: It strikes me that the primary material that you're working with is language, and this is a line of work that requires mastery of language. Is that a fair take?
Heather: Yeah. But not quite as much as you might think, because it deals with language, but it also deals with identifying what's significant, what's important: identifying the concepts. And I talk about talking with users; I do conduct interviews with stakeholders. So, there's also that skill of interviewing, getting input, and understanding the user interface. There's a lot. And I'm impressed: I've had students take my course for whom English is not their first language, and some have even worked on taxonomies in a language that's not their first language. They obviously have to be very, very fluent in the language of the taxonomy, but you don't have to have studied linguistics or have that kind of language expertise.
Jorge: At the very least, though, you do have to have knowledge of the different grammatical parts, right?
Heather: Yeah. Yeah! Basic grammar understanding.
Jorge: There's a difference, and this came across as well in reading the book, between creating a taxonomy and managing a taxonomy. These language structures evolve over time. Language evolves over time. And I'm wondering... you are a consultant now, right?
So am I, which means that, at least for me, I often come into projects and help organizations get the ball rolling, maybe give them pointers as to what the shape of the thing should be. But then there's the work that comes afterwards: not only helping the thing come to life and be applied, but also having it evolve over time. I'm wondering if you could speak to the difference between the project of making a taxonomy and the governance aspect of taxonomies.
Heather: Yeah, I'm glad you brought that up. That is important, especially in my role as a consultant, because I'm coming in for the short term to develop and design the taxonomy, and then I leave, and those in the organization have to maintain it. So, part of what I always do is develop governance: a plan, documentation, and guidance on how the taxonomy should be maintained. That means: what are the criteria for adding a new term to the taxonomy? What is the guidance for style and format, where we get more into the "word" part of it? Then there's guidance on how the taxonomy should be applied in indexing or tagging. For example, there could be guidance on how many taxonomy terms should be tagged to a content item or document, and, if there are different kinds of terms or different facets, how they should be used. So, that is very important.
When I worked as a staff taxonomist or controlled vocabulary editor, I was doing a mix of both: sometimes developing something new in a special project, while also maintaining the larger subject thesaurus and controlled vocabularies. And the governance and maintenance documents kept needing to evolve. It's surprising. You'd think it would all have been documented already, but then we'd come up with something else, such as relationships between two different vocabularies that hadn't been documented well enough. I was looking into that recently.
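Tagging guidance of the kind Heather describes (how many terms per item, and how the different facets should be used) can also be written down as checks that a tagging tool or a reviewer runs. A minimal Python sketch, assuming a hypothetical policy of at most five terms per item and two required facets; the limits, facet names, approved terms, and the `check_tagging` function are all invented for illustration.

```python
# Hypothetical governance policy; the limits and facet names are invented for illustration.
MAX_TERMS = 5
REQUIRED_FACETS = {"Topic", "Content type"}

# Hypothetical faceted vocabulary: facet -> approved preferred terms.
FACETS = {
    "Topic": {"Face masks", "Masks", "Vaccines"},
    "Content type": {"Image", "Article"},
    "Region": {"Europe", "North America"},
}


def check_tagging(tags: dict[str, set[str]]) -> list[str]:
    """List the policy problems for one content item's tags; an empty list means it passes."""
    problems = []
    total = sum(len(terms) for terms in tags.values())
    if total > MAX_TERMS:
        problems.append(f"too many terms: {total} > {MAX_TERMS}")
    used = {facet for facet, terms in tags.items() if terms}
    for facet in sorted(REQUIRED_FACETS - used):
        problems.append(f"missing required facet: {facet}")
    for facet, terms in tags.items():
        for term in sorted(terms - FACETS.get(facet, set())):
            problems.append(f"'{term}' is not an approved term in facet {facet}")
    return problems


if __name__ == "__main__":
    print(check_tagging({"Topic": {"Face masks"}, "Content type": {"Image"}}))  # passes
    print(check_tagging({"Topic": {"Facemask"}}))  # missing facet plus an unapproved term
```

The rules themselves still come from the governance documentation; code like this only makes them easier to apply consistently once the consultant has left.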
Jorge: Yeah, I remember reading in the book an example of a scenario that is not uncommon in the business world, where companies merge, right? Or there's an acquisition. All of a sudden you have a situation that strikes me as more of a project-based intervention, but it's also one where the team in charge must deal with planned, organized, and smooth change coming to their vocabularies.
Heather: Yeah. Well, I was thinking of the situation where two companies already have controlled vocabularies for a similar purpose, the companies merge, and then they have to merge their controlled vocabularies. But there are also things related to expanding into a different market or business, needing to reach a different audience, or getting new content. And then, of course, there are things that come up with current events. So, we taxonomists are saying, "Oh, we all had to add terms for COVID-19 and coronavirus and all that stuff."
Jorge: Right! And I'm sure that a situation like the pandemic that we're still in the midst of brings new terms like you're mentioning. But it also changes the weighting of existing terms, right? I'm thinking of the word "mask," which has acquired a whole new layer of meanings, some of them even political, surprisingly. That's what I had in mind when I was talking about the constantly evolving nature of language.
Heather: Yeah. In some work I'm doing for a client now, they have a lot of images, and they had masks before; they were like Halloween masks. Now they've added images of face masks, and I thought, okay, I'm going to call this "face mask," and the other one is just "mask."
The value of taxonomy
Jorge: How do organizations know that they need a taxonomy?
Heather: Oh, that's a good question. Of course, for some organizations that publish content, it's more a part of what they do. But if it's just any organization with its own internal content, they may have an enterprise content management system, they may have an intranet, they may use SharePoint, which has search built into it. I'd say it usually happens when people realize that they're having difficulty finding what they want. These tools have features built in for taxonomies, and maybe the IT department sets them up and puts in some categories and some terms. They're not experts in taxonomies, but they might have some idea of what they want to do. But if it's not working well enough, and people have seen something better, and it's part of the user experience, part of getting work done, then they decide they want to take this more seriously and take it to the next level. Depending on how big it is, they could hire somebody in a new position, make it part of an existing position, or hire a contractor or a consultant to start it and then continue and maintain it.
Jorge: I remember reading in the book on the subject of human-created taxonomies versus automated... I'm not going to call it taxonomies, but like indexing software. I'm wondering, how does someone in a business who is facing this sort of challenge know when to call someone like yourself, who can help them with a taxonomy, versus thinking, "I can go and get a piece of software that does this"?
Heather: Well, software can do the tagging, the indexing, but the taxonomy still has to be created, and that's almost always manual. There are systems that will suggest terms, but those still have to be reviewed, and I don't think they're that good. You put in a little bit of effort to create the taxonomy, and then it goes a long way, because the indexing and tagging are done with it over time.
So, the question is whether that indexing or tagging is going to be done manually or automatically. That depends on where the content comes from, how much there is, and how much gets added. If you have people internally authoring content, you can assign them the additional task of tagging it. If the content comes from outside, especially if there's a lot of data from emails or social media that they want to analyze for voice of the customer, and there's just lots and lots of content, that's when automated tagging makes more sense. The same goes for a publisher: if it's news and it's daily, that's a lot of content, and it would be automated, versus something like a scholarly journal article. The journal comes out four times a year, and it needs expertise to index, so that's done manually. And there can be a combination too.
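At its simplest, automated tagging means matching a taxonomy's preferred and non-preferred labels against incoming text, such as customer emails. The Python sketch below is a hypothetical rule-based illustration with an invented three-term vocabulary; `LABELS`, `PATTERNS`, and `auto_tag` are made-up names, and real auto-tagging systems are usually considerably more sophisticated than exact phrase matching.

```python
import re

# Hypothetical vocabulary: preferred term -> labels (preferred and non-preferred) that trigger it.
LABELS = {
    "Face masks": ["face mask", "face masks", "face covering", "face coverings"],
    "COVID-19": ["covid-19", "covid", "coronavirus"],
    "Customer complaints": ["complaint", "complaints"],
}

# Compile one case-insensitive, whole-phrase pattern per preferred term.
PATTERNS = {
    term: re.compile(
        r"\b(?:" + "|".join(re.escape(label) for label in labels) + r")\b",
        re.IGNORECASE,
    )
    for term, labels in LABELS.items()
}


def auto_tag(text: str) -> list[str]:
    """Return, in vocabulary order, every preferred term with at least one label found in the text."""
    return [term for term, pattern in PATTERNS.items() if pattern.search(text)]


if __name__ == "__main__":
    print(auto_tag("Two complaints about face coverings bought during the coronavirus outbreak."))
```

Note that `auto_tag("Customers complained about delivery times.")` finds nothing: exact label matching misses inflected forms like "complained," which is one reason output like this benefits from human review or richer matching rules.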
Jorge: For folks who might be interested in following up with you, either for this type of work or for learning about it, what's the best place to reach you?
Heather: Well, my business is called Hedden Information Management, and I have a website, www.hedden-information.com. But I think people probably remember me most by the name "Accidental Taxonomist," so I have a blog with that name, and a subsite of my website with that name. I can be reached that way too. To get more information, you can visit my blog and my website. I have a lot of links to past published articles and presentations, so there's a lot of good information there.
As I mentioned, if you want to learn more, I do teach an online course. It was originally offered through the continuing education program of a library school, which discontinued its continuing education programs. So now I do it on my own, and anybody can sign up anytime as an individual. I also offer discounts for groups who want to take it together. And of course, as for my book, I recommend The Accidental Taxonomist. When it's safe to do so, I will give onsite corporate workshops too. My most recent one was in December 2019. I hope to take that up again when it's possible.
Jorge: Hear, hear! Thank you so much for your time Heather, and for being here on the show.
Heather: Yeah, thanks again for inviting me.