Andy Fitzgerald on Structured Content

Andy Fitzgerald is an information architecture and content strategy consultant. He works with mission-driven organizations to produce systems that communicate clearly, align business and user goals, and scale effectively. Our conversation today focuses on moving beyond the page as a metaphor for how information is delivered toward more flexible content structures.

Show notes

Andy Fitzgerald Consulting
Andy Fitzgerald, PhD - LinkedIn
Language + Meaning + User Experience Architecture
Dan Klyn
Richard Saul Wurman - Wikipedia
Headless content management system - Wikipedia
Taxonomy Boot Camp
The Intellectual Foundation of Information Organization by Elaine Svenonius
Mozilla Developer Network Web Docs
Schema.org
SKOS Simple Knowledge Organization System
Groucho Marx
End of Web Design by Jakob Nielsen (Jakob’s Law)
Hooked: How to Build Habit-Forming Products by Nir Eyal

Some show notes may include Amazon affiliate links. We get a small commission for purchases made through these links.

If you're enjoying the show, please rate or review us in Apple's podcast directory.

This episode's transcript was produced by an AI. If you notice any errors, please get in touch.

Transcript

Jorge: Andy, welcome to the show.

Andy: Thank you. Good to be here.

Jorge: I’m very happy to have you here. I’ve known you for a long time. You are a colleague and a friend and someone whose work I respect a lot. You not only do work in the field, but also share a lot of what you’re learning and what you’re doing and have added tremendous value to the discipline of information architecture over the years. But I’m assuming that some folks listening in might not be familiar with your work. So how do you go about introducing yourself?

About Andy

Andy: Well, first of all, thank you for that very gracious introduction to my introduction. And I appreciate that, especially coming from you. I have a lot of respect for your work as well. I normally introduce myself as an information architect, and then I follow that up immediately with: “I help organizations make complex information systems easy to understand and pleasant to use.”

And if someone hasn’t walked away by then, then I talk a little bit about the details of information architecture and making things understandable to people and helping them gain access to all this massive information that we’re creating. That classic information architecture is what I’ve been doing now for upwards of 10 years, and recently that has shifted to thinking about what information architecture means in a world that is increasingly a world of data and of content as data: the shift, articulated by the creator of the web, Tim Berners-Lee, in the early part of this century, from a web of pages to a web of data.

The part of the field that interests me most now is helping people, individuals, and organizations make that shift, because we are still so, so enamored with pages. We absolutely love pages! And as a metaphor, it’s been useful for getting us comfortable with this insane amount of information that we now have available to us. But it’s just a metaphor, and it’s one that I find, as our information is increasingly connected, just hobbles individuals and organizations in their efforts to communicate. So that quickly went from an introduction to the thing I’m interested in! But that’s how it goes.

Jorge: Well, I want to dig into that distinction between pages and data, but I’m curious about your background. Did you study this stuff? How did you get into it?

Andy: So the funny thing is that I inadvertently did study this stuff, and it took a while for that to matter, or for it even to be applicable to my work, my profession, and my vocation. I have humanities degrees. I studied English and French: I have a Bachelor’s in English, a Bachelor’s in French, and a Master’s in English.

And I enjoyed school, so I kept doing it. So I also have a PhD in English, and my PhD is in language and literature. I realized quickly, though, that while I enjoyed the academy and I enjoy teaching, I didn’t want an academic career as a long-term career. So I found information architecture, which connected to the way that I like to think about hard problems and was a field that seemed to me at the time to be more enriching to work in.

And it has turned out to be true. In the first ten years or so of doing that work, I learned user experience design, how to do research and interaction design, how to work with clients, and all the nuts and bolts of being an effective consultant. And then in the last few years, thinking about structured content, I found that what people are really trying to do when they’re creating these websites is to communicate ideas.

And if you are an organization like the ones that I like to work with (hospitals, higher education, mission-driven organizations), you often have complex ideas to communicate and different ways you have to communicate them. And those problems have come very much back to the kinds of things that I was studying as a PhD student in English, which is: how do we make sense of the world and communicate that with this very limited and idiosyncratic tool that we have called language? Tracing client problems and organizational problems back to those fundamentals has been a place where I feel like I’ve been able to make an impact for my clients, and that has really brought me back to the ways that I was thinking through similar problems as a humanities scholar.

Jorge: It’s interesting. I did not know that your original area of study was language, but it all makes sense. I was just saying before we started recording that I was revisiting my notes in preparation for this call, and I have highlights from a blog post you wrote back in 2013, and I’m going to read it back to you. I’m leaving a bunch of things out here, but this is the part I wanted to focus on: “The root of a key set of challenges we face as information architects designing for a multi-device and multi context infosphere: we traffic in language and ideas, but we still only bring a rudimentary understanding of the relationship between them to our discipline.” That might be taken out of context, but I think it’s coherent with what you’re saying there: that this idiosyncratic tool, as you called it, is basically how we make sense of the world, right? And in some ways, I think that what you’re describing here is helping organizations do that better.

Making sense through language

Andy: Yeah, that’s definitely a fair characterization. And since I wrote that piece, one of the key axioms that I keep coming back to (and I have our mutual friend, Dan Klyn, to thank for this) is the quote from Richard Saul Wurman, that “We only understand something new in terms of something we already know.” This idea of trafficking in complex ideas, but not always being willing to track them back to the simple thing that we understand, remains compelling. I really like that Wurman quote because it boils down the fundamental kinds of challenges that we’re trying to tackle. I’ve been thinking increasingly about shorthand; there are lots of shorthands that we employ in communication.

I’m working with a client right now, helping them set up a headless content management system and shift to a structured content approach for their complex, interconnected content set. And one of the things (I actually gave a talk about this recently at Taxonomy Boot Camp in DC), one of the things that I’m seeing with people using headless content management systems is that the shorthand of the page and…

I guess to back up a little bit: a headless content management system is an approach to managing content that doesn’t think about content in terms of the page on which it’ll be published. With something like WordPress, you fill in a form, that content gets published to a page, and there’s a tight connection between those things. In a headless content management system, or in a decoupled approach to content, you create content as almost an abstract entity that may be published to a page, or it may be read by a voice assistant, or it may be integrated into a help bot. That’s the idea anyway.
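To make the idea of decoupling concrete, here is a minimal Python sketch under stated assumptions: a hypothetical content item stored as plain data, with illustrative field names (`title`, `body`) and two hypothetical renderer functions. It is not modeled on any particular headless CMS; it only shows one entry being delivered to a page and to a voice channel without the author ever choosing a layout.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class ContentItem:
    """Hypothetical channel-agnostic entry, roughly as a headless CMS might store it."""
    title: str
    body: str


def render_html(item: ContentItem) -> str:
    # Page channel: the front end, not the author, chooses the heading level.
    return f"<h1>{escape(item.title)}</h1>\n<p>{escape(item.body)}</p>"


def render_voice(item: ContentItem) -> str:
    # Voice channel: no markup at all, only speakable sentences.
    return f"{item.title}. {item.body}"


item = ContentItem(title="Visiting hours", body="Visitors are welcome from 9 a.m. to 8 p.m.")
print(render_html(item))
print(render_voice(item))
```

The content is written once; each channel owns its own presentation.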

But the challenge with that, and one that I think a lot of headless content management system providers are still running up against, is that when we create content, when it’s written content, we use the shortcuts of the page. We use headings, indentation, bullets, all of those things to communicate relationships between pieces of content. The classic recipe example: if I show you a recipe in a recipe book, you know which set of content is a list of ingredients and which set of content is a list of steps by the way that they’re formatted and put together. The heuristics of the page, the shortcuts that headings, indentation, and typographic conventions give us, allow us to create complex relationships between pieces of content.

And we do that without knowing that we’re doing it, because it’s common to everyone who reads, particularly everyone who reads in a similar culture, say, Western culture. But for most organizations, the goal is really to communicate ideas, facts, and concepts: it’s not to publish pages, it’s to communicate information. And consider the huge amount of content that’s now available: although we can publish a lot more, our capacity to absorb it and make sense of it hasn’t grown at all, let alone kept pace with the amount of content and information that’s being published.

We need algorithms. We need the robots to help us figure out what to look at and how to understand it. And the heuristic shortcuts by which the way something sits on a page visually communicates an idea are things that robots are really bad at picking out. An algorithm knows that I’ve created an ordered list, but it can’t look at that content (not without some advanced processing) and know that it is a list of ingredients instead of a list of steps.
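To see what an algorithm actually gets from the page, here is a small Python sketch using the standard library’s `html.parser`. The `ListCollector` class and the sample markup are hypothetical illustrations. The parser can report two ordered lists and their items, but the words “Ingredients” and “Steps” are just nearby text; nothing in the markup tells a machine which list plays which role.

```python
from html.parser import HTMLParser


class ListCollector(HTMLParser):
    """Collects each <ul>/<ol> as a (tag, items) pair. Assumes no nested lists."""

    def __init__(self) -> None:
        super().__init__()
        self.lists: list[tuple[str, list[str]]] = []
        self._in_item = False

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.lists.append((tag, []))
        elif tag == "li" and self.lists:
            self._in_item = True
            self.lists[-1][1].append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_item = False

    def handle_data(self, data):
        if self._in_item:
            self.lists[-1][1][-1] += data.strip()


RECIPE_HTML = """
<h2>Ingredients</h2>
<ol><li>2 cups flour</li><li>1 cup water</li></ol>
<h2>Steps</h2>
<ol><li>Mix the flour and water.</li><li>Knead for ten minutes.</li></ol>
"""

collector = ListCollector()
collector.feed(RECIPE_HTML)
collector.close()
# Two structurally identical ordered lists; nothing marks which is which.
print(collector.lists)
```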

Jorge: Solely by their layout on the page, right?

Andy: Yeah, exactly. Exactly. And we do this all the time.

Jorge: I wanted to circle back to the distinction between pages and data, but I think that we’re doing it here, right?

Andy: Mm.

Jorge: The way that I’m grokking this is that when you say “pages,” what you mean is the content and the presentation of the content are somehow conflated or fused into one thing, right?

Andy: Yeah.

Jorge: Whereas with data… what I hear is that you used the word uncoupled or decoupled, right? Like…

Andy: Mm-hmm.

Jorge: …the idea that somehow the text is detached from the presentation of the text. Which, based on what you were just saying with the recipe example, would require an additional layer to bring back the meaning that is lost when the text is decoupled from the page.

Bringing meaning to data

Andy: Yeah, exactly. I think there’s some comparison, at least, to the distinction that Elaine Svenonius makes in The Intellectual Foundation of Information Organization between the work and the document. I believe that book is [from] 2001; it predates a lot of the ways that we think about finding information now, with the prevalence of Google and some of the other tools that we have. But it’s like Wurman: the thinking that she presents there is fundamental and foundational to how we make sense of information.

For her, the document is the manifestation of a work, but the work is something that exists as an abstract quality outside of the documents that embody it. She uses the example of Hamlet. Hamlet, the work, is instantiated in many documents, but each of those documents sits in a set that can be commonly defined as “the work” Hamlet. And with that example, it’s interesting that the way individual lines, scenes, and acts in a play work is also encoded using heuristic shortcuts, right? So our reliance on layout for structuring our thinking runs very, very deep.

And because it’s so deep, and because it’s so deeply embedded in how we make sense of the world, I think it’s hard for us to step back and see that that’s what we’re doing. So, a more practical example: working with the same client I mentioned a moment ago, we’re also redoing the front end of the application that consumes the structured content we’re creating. I was in a collaboration session with the visual designer, and we were talking about alerts. An alert, say, in a set of instructions (this is for a medical context), is a way to say, “Hey! Be careful and make sure that this end of the tube is pointed up instead of this end, because if not, you can kill a patient.”

A made-up example, but that’s the kind of thing that, visually, in a set of instructions, we call out or put a box around, or want to draw attention to. And the visual designer that I was working with said, “Oh! We need to make sure to accommodate in the content management system a way to put something higher on the page and give it a red background so that it’s an alert.” I’m not quoting exactly, but it was still using a shorthand. What I suggested was that instead of doing that, we create a way in the system to semantically mark something as an alert, so that when we consume it on a front end, whether it’s a website or a voice interface, it can be presented in a contextually appropriate way.
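A minimal Python sketch of that suggestion, assuming a hypothetical content model where each block carries a semantic `kind` (here just `"step"` or `"alert"`) instead of a color or position. The block types, the `alert` CSS class, and the “Warning:” spoken prefix are illustrative choices, not details of the project described; the point is that each front end decides how an alert is presented.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Block:
    kind: str  # hypothetical semantic type: "step" or "alert"
    text: str


def to_html(blocks: list[Block]) -> str:
    parts = []
    for block in blocks:
        if block.kind == "alert":
            # A stylesheet can give .alert a red background; the author never picks a color.
            parts.append(f'<div class="alert">{escape(block.text)}</div>')
        else:
            parts.append(f"<p>{escape(block.text)}</p>")
    return "\n".join(parts)


def to_speech(blocks: list[Block]) -> str:
    # A voice interface has no boxes or colors, so it announces the alert instead.
    spoken = [f"Warning: {b.text}" if b.kind == "alert" else b.text for b in blocks]
    return " ".join(spoken)


instructions = [
    Block("step", "Attach the tube to the port."),
    Block("alert", "Check that the marked end points up."),
]
print(to_html(instructions))
print(to_speech(instructions))
```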

So that’s what structured content gives us. It forces us to think about the way that we encode semantics that are opaque to machines and to choose the places where we need to clarify that opacity. We need to make those relationships explicit. And it’s not every relationship. I’m not designing a system where every time you want to use a header, you’ve got to encode the rich semantics behind it, because that’s not usable for the content producers. But there are many cases where it’s easy to describe what something looks like, because those are the metaphors that we have for describing relationships between things. Whereas if (and this is where working in the content management space is so interesting and rich, I think) we can just give our authors tools for describing those relationships, and make them as easy to use as bold-facing something or making a font bigger to communicate hierarchy, then we can help them create content that is intelligible to humans reading something that’s been published to the page, and that is also intelligible to the algorithms and machines that, given how much content we’re creating, we increasingly have to rely on to find things. And it’s not about… I don’t really care whether the machines can read things; I care whether they can help us find things, to be informed and to lead better lives with the information that’s being created by humanity right now.

Jorge: I think I’m going to reveal a lot about my age by what I’m about to say now, but I remember starting to learn HTML back in the nineties. And if you wanted to make something bold, you would use the ‘B’ tag, which stood for bold, right?

Andy: Yeah.

Jorge: Or you used the ‘I’ tag to mean italic. To italicize text. And then we had to train ourselves out of that, and we were taught to use the ‘strong’ tag instead of ‘B’ and the ‘emphasis’ tag instead of ‘I’,

Andy: Mm-hmm.

Jorge: For this very reason, right? The concept of ‘emphasis’ doesn’t specify how you’re going to render the emphasis on the screen. It’s just saying, “this part of the text is emphasized.” You can mark things up for presentation or you can mark them up like you’re saying, semantically, for meaning, right?

Andy: Mm-hmm. Yeah, it’s exactly that.

Jorge: Well, and the thing is (and I’m spinning this up into a question, I promise!) I remember that part of the challenge back then was that even though those of us who knew how to code HTML by hand were aware of those distinctions and were doing the right thing, there were an awful lot of web-based HTML editing forms that generated God-awful HTML in the backend that was not semantically structured.

Andy: Right!

Jorge: And it was because a lot of users were accustomed to using word processors. Things like Microsoft Word, right?

Andy: Right, right.

Jorge: And this concept of making things semantically meaningful just proved to be a point of friction. And I’m wondering if there has been any progress in that regard.

Andy: Sadly, I don’t really think so. I still see it occasionally. I’ve done evaluations of websites in the past, where people, organizations, want a fresh set of eyes to deliver an unvarnished truth about what their content is doing, what the information architecture is, and how it’s structured. And I still see this: headings that are marked up as a font size instead of an H1 or an H2. It’s usually subheadings, things that are a little lower down in the document than, say, the H1. So I think there’s still not a lot of progress on that. And I think it’s at least in part because of how natural the page is for us.
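Spotting the pattern described here can be partly automated. Below is a rough Python sketch, assuming pages are available as HTML strings, that uses the standard library’s `html.parser` to flag `<b>`, `<i>`, and `<font>` tags and any element sized with an inline `font-size` style. The class name and messages are hypothetical, and the check is a heuristic for human review: `<b>` and `<i>` remain valid in current HTML, so a flag is a prompt to look, not proof of a problem.

```python
from html.parser import HTMLParser

# Tags that usually signal appearance rather than meaning (flags for review only).
PRESENTATIONAL_TAGS = {"b", "i", "font"}


class PresentationalMarkupAudit(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.findings: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in PRESENTATIONAL_TAGS:
            self.findings.append(f"<{tag}>: presentational tag")
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        style = dict(attrs).get("style") or ""
        if "font-size" in style.lower():
            self.findings.append(f"<{tag}>: inline font-size (a heading in disguise?)")


page = (
    '<p style="font-size: 24px"><b>Visiting hours</b></p>'
    "<h2>Parking</h2><p>Use <em>Lot C</em> after 6 p.m.</p>"
)
audit = PresentationalMarkupAudit()
audit.feed(page)
audit.close()
for finding in audit.findings:
    print(finding)
```

The real `<h2>` and the semantic `<em>` pass untouched; only the styled paragraph and the `<b>` inside it are flagged.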

The other interesting thing about semantic markup (and this is fresh in my mind because it was part of the talk that I gave at Taxonomy Boot Camp) is that H1, H2, H3, H4 are hierarchical elements. They are structuring elements. But if you actually read the section on semantics in the Mozilla Developer Network docs, which I think is certainly an authoritative source on this, they’re still page section headings. So that hierarchy is meant to communicate the hierarchy of a page. And when we think about the idea of communicating facts, ideas, and concepts across contexts, sometimes a page works.

But think, for example, of a recipe, whether it’s printed on a page or on a website that you might use. It would be appropriate to have an H1 as the title of the recipe, an introduction, an H2 that says “Ingredients” and another H2 that says “Steps,” and then a list of ingredients and a list of steps. That would be an appropriately encoded page from an HTML point of view, with those semantic headings. But if you were then to ask a voice interface, “How many steps are there in creating a bouillabaisse?” it wouldn’t know, because all you’ve communicated are the semantic sectioning elements of a page. You haven’t communicated the ideas, facts, and concepts behind that.

Now, there are tools, and in the case of recipes, schema.org markup has a Recipe class. Or has a recipe… what do they call it? We’ll call it a class! And it has values for ingredients and steps within that. So there are ways to do this, but in order to apply them, we have to have either already enriched the document or extracted from our heads that extra meaning and those extra connections. So even in creating semantically correct web pages, we’re still creating pages. And we’re still implying to other human readers a lot of the relationships between things; there’s still a code, and there’s still a shortcut.
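As a sketch of what that enrichment can look like, here is schema.org `Recipe` markup built as JSON-LD in Python. `Recipe`, `recipeIngredient`, `recipeInstructions`, and `HowToStep` are real schema.org vocabulary; the ingredient and step text is placeholder content, not a real bouillabaisse recipe, and `count_steps` is a hypothetical helper. Once the steps are explicit data, the question of how many steps there are becomes a simple lookup instead of an inference from layout.

```python
import json

recipe = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Bouillabaisse",
    # Placeholder ingredients for illustration only.
    "recipeIngredient": [
        "2 onions, chopped",
        "4 tomatoes, peeled",
        "1 kg mixed fish",
        "1 pinch saffron",
    ],
    # Placeholder steps; each one is an explicit HowToStep rather than a list item.
    "recipeInstructions": [
        {"@type": "HowToStep", "text": "Soften the onions in olive oil."},
        {"@type": "HowToStep", "text": "Add the tomatoes and saffron and simmer."},
        {"@type": "HowToStep", "text": "Add the fish and cook until just done."},
    ],
}


def count_steps(doc: dict) -> int:
    """Hypothetical helper. Assumes recipeInstructions is a list of HowToStep objects."""
    return sum(
        1 for step in doc.get("recipeInstructions", []) if step.get("@type") == "HowToStep"
    )


# This JSON is what would be embedded in <script type="application/ld+json"> on the page.
print(json.dumps(recipe, indent=2))
print(count_steps(recipe))
```

schema.org also allows `recipeInstructions` to be plain text or an `ItemList`; the helper above deliberately handles only the list-of-steps form.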

Jorge: Would this be an ontology that you’re building?

Building a controlled vocabulary

Andy: Well, schema.org I would more properly call a taxonomy of terms. It’s a controlled vocabulary of terms that have specific relationships, and the kinds of relationships that it embodies are the ontology behind it. SKOS, the Simple Knowledge Organization System, which is a recommendation for taxonomies and thesauri, is a way that you might encode a hierarchical list of content types, for instance. So maybe I have recipes, and maybe I have recipes by country of origin and recipes by meal type. That is all something that you could encode in SKOS, and you’d have a taxonomy. SKOS itself is a way of saying that a term has a broader-than/narrower-than relationship to another term, or that it has a relationship to a literal value that is a preferred term or a non-preferred term. And that’s the ontology behind it.
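A minimal Python sketch of the recipe taxonomy described above, written out as SKOS in Turtle. The concepts, their keys, and the `http://example.org/recipes/` base URI are hypothetical; `skos:Concept`, `skos:prefLabel` (preferred term), `skos:altLabel` (non-preferred term), and `skos:broader` are real SKOS vocabulary. Because SKOS defines `skos:narrower` as the inverse of `skos:broader`, the sketch stores only the parent link and derives children from it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    key: str
    pref_label: str  # skos:prefLabel, the preferred term
    alt_labels: list[str] = field(default_factory=list)  # skos:altLabel, non-preferred terms
    broader: Optional[str] = None  # skos:broader, the key of the parent concept


# Hypothetical taxonomy following the example in the conversation.
concepts = [
    Concept("recipes", "Recipes"),
    Concept("by-origin", "Recipes by country of origin", broader="recipes"),
    Concept("by-meal", "Recipes by meal type", broader="recipes"),
    Concept("french", "French", broader="by-origin"),
    Concept("dinner", "Dinner", alt_labels=["Supper"], broader="by-meal"),
]


def narrower(concepts: list[Concept], parent_key: str) -> list[str]:
    # skos:narrower is the inverse of skos:broader, so it can be derived.
    return [c.key for c in concepts if c.broader == parent_key]


def to_turtle(concepts: list[Concept], base: str = "http://example.org/recipes/") -> str:
    """Serialize as SKOS in Turtle. Assumes labels contain no quote characters."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix ex: <{base}> .",
        "",
    ]
    for c in concepts:
        props = [f'skos:prefLabel "{c.pref_label}"@en']
        props += [f'skos:altLabel "{alt}"@en' for alt in c.alt_labels]
        if c.broader:
            props.append(f"skos:broader ex:{c.broader}")
        lines.append(f"ex:{c.key} a skos:Concept ;\n    " + " ;\n    ".join(props) + " .")
    return "\n".join(lines)


print(narrower(concepts, "recipes"))
print(to_turtle(concepts))
```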

So both of these come into play, and again, at Taxonomy Boot Camp, this question came up: “What’s the difference between these things?” Because the terms are increasingly getting thrown around now, and people are hearing them more, and I don’t think it’s always very clear what they refer to or what they do. But although they’re new and novel and a little bit befuddling sometimes, they are a useful set of language for peeling away the shortcuts and the heuristics and all the stuff that’s understood in the way that we communicate.

And again, one of the reasons that’s necessary is to afford us access to the content that we’re creating. And as I say that, having repeated it a few times now, another reason it’s important is to afford others access to it, whether something is in translation, or rendered in different formats and contexts for accessibility, say, for screen readers. I mean, the state of accessibility of most of the stuff that’s published on the web now is abysmal. Completely abysmal, even by organizations that you would think would be doing a lot better. Just watch some of the videos: there are lots of people who use these technologies who have posted videos showing what it’s like for them to try to consume information. The technology is there; the ability to encode semantics is there. It’s just the will to do it that isn’t.

And that’s the sad state of affairs. And to be fair, some of that will is probably lacking because there isn’t a lot of visibility or understanding about the degree to which we use implicit understandings and shortcuts to communicate.

Jorge: I was thinking as you were saying that, that the lack of will manifests as, like, the budgets not being there, because it might be unclear why this matters.

Andy: Yeah.

Jorge: And I’m wondering, as I’m hearing you describe it, if this might not be an issue that machine learning models — language models — might solve for us.

Identifying structure through machine learning

Andy: Yeah. So, that’s a great question, and it’s one that I think comes up a lot. There are some cases where, when there is a knowledge model that can be leveraged, machine learning or natural language processing can extract some of that information. But I think there are many more cases where an intelligible level of expressiveness has not been captured in a document (and here I mean the difference between the work and the document), and someone, or an organization, wishes to extract the expressiveness that’s understood but not explicit, or that can’t be derived from that document. And in those cases, we just get abject failure. I think the job of structuring and communicating content is going to be around for a long time. I don’t think it’s going to be automated away, in part because if the information isn’t communicated in a document in some way, it can’t be extracted, and if it can’t be extracted, it can’t be structured.

Let’s go back to our recipe lists, for instance. If I have two lists, ingredients and steps, I know by looking at them which one is which. It might be possible to train a machine learning model to do recognition on the ingredients list and identify things that are simply food types and quantities, and, on the steps list, identify language elements that are imperatives, language elements that describe a series of things going on over time.
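To show why this works only in constrained cases, here is a deliberately naive Python sketch. It is not machine learning: it substitutes two hand-written rules for what a trained model might learn, counting items that start with a numeric quantity (ingredient-like) and items that start with a verb from a small, hypothetical list of cooking imperatives (step-like). The verb list and the `classify_list` function are illustrative assumptions.

```python
import re

# Hypothetical, deliberately tiny vocabulary of imperative cooking verbs.
IMPERATIVE_VERBS = {"add", "bake", "boil", "chop", "cook", "knead", "mix", "simmer", "stir", "whisk"}
# A leading number such as "2", "1.5", or "1/2" suggests a quantity.
QUANTITY = re.compile(r"^\s*\d+(?:[./]\d+)?\b")


def first_word(item: str) -> str:
    words = item.split()
    return words[0].lower().strip(",.") if words else ""


def classify_list(items: list[str]) -> str:
    """Guess whether a list holds ingredients or steps; "unknown" on a tie."""
    quantity_hits = sum(1 for item in items if QUANTITY.match(item))
    verb_hits = sum(1 for item in items if first_word(item) in IMPERATIVE_VERBS)
    if quantity_hits > verb_hits:
        return "ingredients"
    if verb_hits > quantity_hits:
        return "steps"
    return "unknown"


print(classify_list(["2 cups flour", "1 cup water", "1/2 tsp salt"]))
print(classify_list(["Mix the flour and water.", "Knead for ten minutes.", "Bake until golden."]))
# A perfectly ordinary step that the rules have never seen:
print(classify_list(["Season to taste."]))
```

The last call comes back “unknown” because “season” is not in the verb list, which is the brittleness in miniature: the rules only cover what someone thought to write down.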

It might be possible to do that. But here we’re looking at a really constrained and really predictable set of information. Recipes are fish in a barrel. Information architects all love talking about them because they’re the easy examples. You’re only going to do so many things with a recipe, and they’re going to have certain types. But when you look at language as a whole (and this is where, coming back to where our conversation started, enjoying language problems invites me to keep exploring this), when you look at the complexity of language…

I have some little language games or jokes, I guess. Little linguistic sort of puzzles. And one of my favorites is, “Time flies like an arrow; fruit flies like a banana.” What’s the machine going to do with that? It is a valid sentence. “Time flies like an arrow; fruit flies like a banana.” But it tickles me. I like to pick it apart.

Jorge: My favorite is one from Groucho Marx. He said, “One morning I shot an elephant in my pajamas. What he was doing in my pajamas, I’ll never know.”

Andy: Well, in order to round out the Groucho Marx… let’s see, “Outside of a dog, a book is a man’s best friend. Inside of a dog, it’s too dark to read.”

Jorge: Right.

Andy: Yeah. Yeah. So, these are extreme. Thank you, Groucho Marx! That’s probably where the banana comes from too, actually, now that I think about it. These are the other extreme. Recipes are the end of content that we like to point to and say, “Oh yeah, you can automate that.” And yeah, in very specific cases: opening and closing hours of an organization, if they’re formatted correctly, you might be able to extract and automate. Recipe steps, maybe. Although even in that example, there’s some complexity. I mean, how do you automate or extract making a roux or making a white sauce, right? Or how do you extract how to know when a dough has been properly kneaded? I don’t know. I don’t think a recipe does that either.

So that’s one extreme. And then the Groucho Marx is the other extreme. And in between is everything that we want to communicate. For instance, I enjoy and seek out working with healthcare and higher education because they are complex domains that have a lot of information to communicate, and the people consuming that information are often novices to it. Not many people outside of those who work in a hospital want to be experts at using their hospital’s website. Students looking at courses of study at a university shouldn’t need to be experts on using the website.

But it’s a new space in both of those instances that needs to be made clear and that requires structure and thinking about the content. So that’s, I guess, a little bit of a tangent, but I think that the dream of being able to put natural language processors or machine learning on something is… it would be nice? But I don’t think language is going to let it happen. I think it’s too slippery and too tricky. And I think part of the temptation to think — kind of sadly, like we’re seeing with crypto now — part of the temptation to think that, “oh yeah, this is going to solve our problems,” is not really understanding the depth of, or rather, the degree of complexity that we’re asking those kinds of algorithms to come up against. And language in the way that it structures thought, and in the way that it gives us shortcuts for communicating complex ideas, is prickly! There’s a lot there. It’s a big, big tangle.

Jorge: While you were talking, I drew a two by two matrix. It has, in one dimension… well, you talked about working in healthcare and higher education and you mentioned that one of the things that you like about those industries is that they’re complex subject domains that have to be accessible by a novice audience, right?

All organizations are dealing with complex information of some sort. Some of them need to make that information available to novice users. So I would expect that the civic space would also be like that, right? Like there are these laws, but they need to be understandable by the citizenry or residents of a place. And then the other end of that axis would be organizations that have complex information, but also have a fairly sophisticated audience. So, expert users. And then, the other dimension of this matrix would be: some digital systems are more like publications and some digital systems are more like applications. And I think that what you’re talking about is more on the publication side of the spectrum, right?

Andy: Yeah.

Jorge: So, this is not software design; this is design for systems that are very content-heavy.

Content-heavy systems

Andy: Yes. Yeah. And that is where I like to look most; I tend to gravitate towards projects that are more content-heavy and more about communicating than interacting, say. Not because I don’t think the other is important, but because I think that’s where I tend to have the most engagement with, and impact on, organizations.

I completely agree with that matrix. I think there’s another dimension there as well. I was thinking about this on a run the other day, and my mind was just wandering on it. And that is the degree to which we’re asking people to understand. So back to the Richard Saul Wurman quote, “we only understand something new in terms of something we already understand.” Sometimes organizations, including some I work with, look for novelty as a way to stand out, or as a way to do something different from their peers or their competitors. And I wonder if that comes, at times, at the expense of communicating what actually is unique about their organization.

So, the idea that… well, I guess you would have the example of, like, mystery meat navigation, right? Because I don’t want to have just an ‘about’ section, or I don’t want to have an ‘article’ section, or I don’t want to have a… you know, people that find clever words for ‘homepage.’ You’re now asking your users to try and figure out what your navigation does or what it is trying to communicate, when in fact, that’s not the thing that makes you stand out. Whereas government organizations or civic organizations, I think, have some of this: there is a pattern and a predictable set of norms. It’s commonly known as Jakob’s Law: people expect the website they’re on to work like the other websites that they’ve been on.

And in those cases, I think the design and the information architecture just disappear. People don’t have to think about where they’re going. And instead of spending their ability to make sense of something new on the things that don’t matter for them, they save it until they get to… the thing that is different about your organization.

Jorge: Yeah, I usually talk about this distinction between informing and persuading.

Andy: Mm-hmm.

Jorge: And the Steve Krug “don’t make me think” thing presumes that you are trying to inform someone, right? And I think that this is what you’re talking about: making things understandable. Whereas, you know, if you want to persuade people, a little bit of friction, a little bit of thinking, a little bit of the non-sequitur, can be a powerful tool, right?

Andy: Yeah.

Jorge: Which is why I personally have a hard time with that stuff because I am very much on the understanding side of the equation rather than the persuading side.

Andy: Right, right. Yeah, and I know there are countless articles and points of view on dark patterns and using cognitive science to persuade. I think the book… is it Nir Eyal? Hooked, which came out, I don’t know, 10-plus years ago, that everybody was reading at the time, and then we realized that we’ve been hooked on things that aren’t very good for us. And now I think that whole line of thinking and design is coming under a different, closer lens, as it should. But yeah, those are powerful and dangerous tools.

Closing

Jorge: Fantastic, Andy. I wish that we had more time; I have so many things that we could keep talking about. But where can folks follow up with you?

Andy: People can find me on LinkedIn, which is Andy Fitzgerald. It’s my LinkedIn address. And you can also find my writing and my talks and I publish all my articles at my website, which is andyfitzgeraldconsulting.com.

Jorge: I follow you both on LinkedIn, where we’re connected, and also via RSS on your website, and I’m always dazzled by your posts. So I do recommend that folks follow you there.

Andy: Thanks so much, Jorge.

Jorge: Thank you so much for sharing with us today, Andy.

Andy: It’s been my pleasure and thanks for chatting.

The Informed Life
