This episode is a compilation of conversations from 2023. It’s not meant as a best-of collection, but an opportunity to highlight some themes that emerged during our conversations throughout the year. The episode is also an experiment, with the themes being curated partly by an AI.
- Episode 107: Michael Becker on Knowledge Work
- Episode 108: Carrie Hane on Content Models
- Episode 110: Nicole van der Hoeven on Obsidian
- Episode 111: Andy Fitzgerald on Structured Content
- Episode 112: Jerry Michalski on Jerry’s Brain
- Episode 114: Dan Russell on The Joy of Search
- Episode 115: Are Halland on The Core Model
- Episode 116: Bob Kasenchak on Music, part 1
- Episode 117: Bob Kasenchak on Music, part 2
- Episode 118: Maggie Appleton on Digital Gardening
- Episode 119: Aidan Helfant on PKM for Learning
- Episode 120: Alex Wright on Informatica
- Episode 121: Chiara Ogan on Personal Libraries
- Episode 122: Sönke Ahrens on Smart Notes
- Episode 125: Karl Voit on Org Mode
- Episode 126: Nate Davis on IA Sub-disciplines
Show notes include Amazon affiliate links. We get a small commission for purchases made through these links.
If you're enjoying the show, please rate or review us in Apple's podcast directory:
This episode's transcript was produced by an AI. If you notice any errors, please get in touch.
Jorge: Today’s show is a bit different. Instead of a guest, I’m sharing a compilation of episodes from 2023. This is not meant to be a best-of collection but an opportunity to highlight some themes that emerged during our conversations throughout the year.
This isn’t the first time I’ve done this. I published a similar episode at the end of 2021. What’s different about this one is that I used AI to help me pick the themes and what parts of the conversations should be featured in each. I must say up front that the clips we chose to highlight aren’t meant to be exhaustive. We had other discussions about these subjects throughout the year. These are just some that stood out.
I’m planning to write about that process in my newsletter. If you’re curious, for now, I’ll just say that while it didn’t go as smoothly as I hoped, the AI helped me arrive at a first draft that I could build on. In any case, if you want to learn more about how I used AI to curate this episode, subscribe to my newsletter at jarango.com/newsletter. And now, onto the themes for the year.
I’ve selected four to highlight. The first theme is about connecting concepts, which is a somewhat abstract take on a central idea in organizing information: relating concepts to other concepts to create sets of ideas. The second theme is about information architecture strategies and practices, which has to do with organizing information for others to use, whereas the third theme is about personal knowledge management strategies and practices, which has to do with organizing information for ourselves. The fourth and final theme is about the impact of artificial intelligence on organizing information.
These themes aren’t surprising. Although my background is in information architecture, much of my attention this year has been on the subject of my new book, Duly Noted, which is about personal knowledge management. And, of course, artificial intelligence broke into the mainstream in the last quarter of 2022, so it’s no surprise that many of our conversations this year focused on the impact of AI on the organization of information. But I’m getting ahead of myself. Let’s get started with the first theme, which is about connecting concepts.
Theme 1: Connecting Concepts
Jorge: By connecting concepts, I mean something that is fundamental to both information architecture and personal knowledge management: establishing explicit relationships between pieces of information. Because it’s so fundamental, this idea came up in several conversations. In episode 112, Jerry Michalski told us about how he keeps track of what he’s learned using a software system called TheBrain. Jerry explained how connecting concepts in TheBrain entails a different way of thinking.
Jerry: Anytime I add something to my Brain, which if you do the math, I’m adding 50 or 60 things to my Brain every day. I have never had the blogger’s or Twitter-person’s urge of, “Oh my God, I need fresh content.” That’s not what’s happening to me. What’s happening to me is that I have a really rich information flow because I curate my email and mailing lists. I curate my Twitter feed and who I follow. So, I see way too much interesting stuff every day, and I’m plucking the gems from that. I call them nuggets. And I’m putting a link to them in my Brain, and then I have to weave them into context.
So, the moment I decide something is worth remembering, which means putting it in my Brain, it shifts me into system two thinking. Here, I’m going to Daniel Kahneman’s Thinking, Fast and Slow. Mostly, we live in system one thinking, which is our responsive, reflexive reply to everything, where we don’t really engage the gears. One of the things I love about curating TheBrain is that daily, I’m thrust into system two thinking: is this worth remembering? If so, yes. What is it? Where do I put it? Because I don’t have any orphan thoughts in my Brain — at least not intentionally. Everything is hooked like a Christmas ornament onto some branch, right? And then it’s like, okay, so what do I call it? What is it connected to? What can I learn from it?
And then I’ll google the thing some more, and then I’ll weave a little bit. And so, I’m always doing this little bit of contextual weaving all over the place, a little at a time, with no particular order. It’s extremely random. It’s as life hits me, kind of, or as the task I set forth for the day or whatever. So, when you and I have this podcast, I had set up a node — a thought — for this podcast, and I went back to it, where I had connected it to the document you sent me for prep and to you; I’ve got you in context, and I put you in a long time ago. So, that just refreshes my wet Brain immediately, and I can step into the conversation like I’m stepping into a stream.
Jorge: In our next clip, from episode 119, Aidan Helfant tells us about how he adapted ideas from Niklas Luhmann’s Zettelkasten approach to knowledge management to suit his own needs.
Aidan: As beautiful as this system is — it allowed him to create so much, such a crazy amount of stuff — there are two main problems that I have with it, which I had to address in my conceptual note-making system. Firstly, permanent notes only having one idea, I found, made it very difficult to expand upon ideas if you wanted to because you only have one tiny little index card. It’s very hard to fit a lot in that. So, I changed permanent notes to concept notes, which don’t really refer to a singular concept; I just liked the name. The difference is they focus on one idea, but they can have multiple ideas in them. So, that allows you to have a bit more nuance inside of the note itself.
Secondly, the other problem with the Zettelkasten system is its entirely bottom-up mode of thinking, which is a type of thinking where you don’t come into a learning endeavor with an understanding of how the individual parts fit together. That understanding is a more top-down learning approach. It’s what we do in school most of the time: you get the curriculum, you see everything that’s going to be given, and you know when it’s going to be given. The Zettelkasten system is entirely bottom-up. You just read whatever you want, create individual notes, and connect them together. And you don’t have that top-down structure off the bat. You have to add it back in yourself.
So, what I started doing in my conceptual note-making process was giving it that top-down structure that was necessary to see the bigger view, which is why I started creating notes called Maps of Content. It’s a term from Nick Milo that refers to a note with a bunch of other individual notes inside of it. That’s a map of content, and creating that map of content lets you see how all the individual notes that you’re taking bottom-up fit together into a greater whole. So, that’s my conceptual note-making process.
Jorge: This notion of hierarchical relationships in information versus more emergent structures also came up in episode 120, where Alex Wright mentioned it as one of the themes in his book Informatica.
Alex: This really emerged as one of the kind of organizing themes of the book. As I said, I didn’t really go in with a big hypothesis or thesis or argument I was trying to make; I was really just kind of exploring this terrain. But what emerged over the course of my research was that there did seem to be this kind of interesting pattern that you see over and over again when these new disruptive technologies emerge; they often seem to happen at this kind of point of conflict between networked systems, which tend to be flat, associative ways of organizing information, and more hierarchical ones.
For example, you could think of oral cultures as being very networked. Like, people tend to know people who know people. And information flows in a very kind of loose way through networks of association. And a lot of people think the web is like that. Or at least the early version of the web was very much just hyperlinks, and everything is just kind of a big bowl of spaghetti that just keeps going, as opposed to more hierarchical systems where things are organized in a kind of top-down way, where there’s kind of a top-level category and then a subcategory and a subcategory and subcategories under that all the way down.
And this became a theme that sort of emerged over time as there is this kind of interesting dance between these kinds of archetypes. And you can see it over and over again in the kind of tension between… you know, for example, if you look at the Gutenberg era, you could say that there was a very entrenched, hierarchical knowledge management regime that was basically administered by the Roman Catholic Church, where knowledge was tightly controlled, and the way it was controlled was very closely interrelated with the organizational hierarchy of the Roman church and sort of government hierarchy. Knowledge was handed down and organized into very tightly constrained categories.
And then along comes Gutenberg, and hot on his heels comes Luther with the Protestant Reformation. And there’s a great book by Elizabeth Eisenstein that talks about this, about how the Lutheran Revolution was really powered by the printing press. And suddenly, there was this technology that enabled people to publish information outside of the auspices or the oversight of the Catholic Church. Luther came along with his 95 theses, and then the revolution started, and suddenly, it’s much more of a peer-to-peer flow of information happening. And it became a kind of paradigmatic challenge to the power of the Catholic Church and led to lots of bloodshed and revolution and all kinds of things. All kinds of challenging things happened, but it was a really intense period of societal disruption that was also like a disruption in the flow of information in the world.
In the early days of the web, people made similar claims that, you know, the web — this new powerful network technology — is going to disrupt the existing gatekeepers and the kind of knowledge bureaus, which I think has turned out to be true. It created a ton of disruption for existing kinds of knowledge brokers like publishers, record companies, and broadcast networks. Everything is much more fluid now. But at the same time, what you also see in this kind of back-and-forth dynamic is that new hierarchies often emerge out of these network systems.
Some people would say that we’re seeing that now where there was one point in the web when everything was flat and loose, and then a new kind of structure emerges around that. Some people would say that maybe that has something to do with the rise of big tech companies. There is this kind of tendency for new hierarchical systems to emerge, and then they, in turn, sometimes get disrupted over time by new networks.
So, I have started to explore that kind of dynamic a bit. And it’s interesting. Since then, I think there’s a really good book by Niall Ferguson, The Square and the Tower, that actually talks about this exact topic of networks and hierarchies, which I would definitely recommend. I wish he’d written that book first because it would’ve been a great reference point for my book. But you know, I think I’m not the only one to make this argument, but it definitely plays out, you know, through the narrative in my book as well.
Jorge: In episode 122, Sönke Ahrens observed that there is a difference between concepts and the words we use to describe them. Being mindful about language nudges us to think about what we’re really saying.
Sönke: The difference between concepts and words is extremely important because you have to understand what you’re linking to. You’re not just linking to something that happens to have the same word, but you link to something that is actually on a content level relevant to the idea you’re developing. And that note you’re linking to might not even have the same word in it. It might be discussed with another term, another concept, or another word. So it’s a system that nudges you constantly to reflect on what you’re actually talking/writing about, and any connection that makes sense has to be a connection that is well understood.
I myself try to link in a way that the link is embedded in some kind of sentence that describes the connection, so I don’t collect random links that quickly become overwhelming after a while. But I give myself an account of why I link to that particular note: how is that helpful for the idea I’m trying to express, the thought I’m trying to develop? And that gets you quicker into deep work instead of shallow work where you more or less just organize the ideas.
Theme 2: Information Architecture Strategies
Jorge: Our second theme can be described as being about information architecture strategies. The key distinction here is that these were conversations that were primarily about organizing information to make it easier for other people to find and understand stuff.
In episode 108, Carrie Hane discussed the practice of content modeling as presented in her and Mike Atherton’s book, Designing Connected Content.
Carrie: The book provides a framework for publishing digital content across any channel. It’s a five-step process, starting with domain modeling — modeling the truth of the domain and the subject area that you’re working in — and then the content model, which is the content that the organization wants to represent in various ways, then designing the content based on the structure that you’ve defined in the models, and building the content management system — the content repository — to enable that, and then, designing the interfaces and the navigation to showcase the content and the reason for being: to serve the user needs.
We try to break things down so people can understand why and how chunking instead of blobs makes things more sustainable and future-friendly, so things don’t have to be disposable. Websites don’t have to be disposable anymore because you don’t have to redo the content every time; it’s more semantic and intentional than just serving one website need.
Jorge: A related conversation happened a few weeks later, in episode 111, where Andy Fitzgerald told us about structured content. In this clip, Andy explains the use of headless content management systems.
Andy: A headless content management system is an approach to managing content that thinks about content, not in terms of a page on which it’ll be published. So, with something like WordPress, you have a form that you enter, and that content gets published to a page, and there’s a tight connection between those things. In a headless content management system or in a de-coupled approach to content, you create content as almost an abstract entity that may be published to a page, or it may be read by a voice assistant, or it may be integrated into a help bot. That’s the idea, anyway.
But the challenge with that, and the one I think a lot of headless content management system providers are still running up against, is that when we create content, when it’s written content, we use the shortcuts of the page. So we use headings, indentations, bullets, all of those things to communicate relationships between pieces of content. The classic recipe example: if I show you a recipe in a recipe book, you know which set of content is a list of ingredients, and you know which set of content is a list of steps by the way that they’re formatted and the way that they’re put together. And the heuristics of the page, or the shortcuts that things like headings and indentations and typographic conventions give us, allow us to create complex relationships between pieces of content.
And we do that without knowing that we’re doing it because it’s common to everyone who reads, particularly those who read in a similar culture, in Western culture, say. But our real goal is to communicate ideas, facts, and concepts, which, for most organizations, is really their goal: not to publish pages, but to communicate information. And even though we can now publish a lot more, our capacity to absorb it and make sense of it hasn’t grown at all, let alone kept pace with the amount of content and information being published.
We need algorithms. We need the robots to help us figure out what to look at and how to understand it. And those heuristic shortcuts of the way that something sits on a page visually communicates an idea are things that robots are really bad at picking out. So an algorithm knows that I’ve created an ordered list, but it can’t look at that content — not without some advanced processing — and know that that is a list of ingredients instead of a list of steps.
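As an editor’s illustration of the idea Andy describes, here is a minimal sketch contrasting page-oriented text with structured content. The `recipe` structure and its field names (`ingredients`, `steps`, and so on) are hypothetical, not the schema of any particular headless CMS.

```python
# Editor's sketch: a recipe as page text versus structured content.
# The type and field names below are hypothetical illustrations,
# not any specific CMS's schema.

# On a page, ingredients and steps are distinguished only visually
# (headings, bullets, numbering); a program sees undifferentiated text.
page_text = """
Pancakes

- 2 eggs
- 1 cup flour

1. Beat the eggs.
2. Stir in the flour.
"""

# As structured content, the relationships are explicit, so any consumer
# can tell the ingredient list from the step list without guessing
# from layout.
recipe = {
    "type": "recipe",
    "title": "Pancakes",
    "ingredients": [
        {"quantity": "2", "item": "eggs"},
        {"quantity": "1 cup", "item": "flour"},
    ],
    "steps": [
        "Beat the eggs.",
        "Stir in the flour.",
    ],
}

def read_aloud(r):
    """A 'headless' consumer: renders the same content for a voice UI."""
    lines = [f"To make {r['title']} you need:"]
    lines += [f"{i['quantity']} {i['item']}" for i in r["ingredients"]]
    lines += [f"Step {n}: {s}" for n, s in enumerate(r["steps"], 1)]
    return " ".join(lines)

print(read_aloud(recipe))
```

The same `recipe` structure could feed a web template, a voice assistant, or a help bot: the ingredient/step distinction travels with the content rather than living only in its visual formatting.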
Jorge: In episode 115, Are Halland told us about his core model, a framework for creating more effective digital products.
Are: As an information architect, I’ve been working with abstractions. Structures, navigation systems, hierarchies. And actually, none of that matters if the user can’t find the answer. So, the answer is the core. But if you flip the perspective, if you start out with the answer… or not just the answer. It’s a hypothesis of what answer the user is after, and you can use that as a starting point. So you flip the perspective. Don’t start with structures and abstract strategies and all of that. But just, “Okay! This is an answer we know the user will want. Let’s try to figure out the context around that answer. What are the user tasks? What are the KPIs?”
And so that’s the use of tasks and the goals, and then look at the customer journey before and after this contact point. So that’s the inward paths and the forward paths. That’s what kind of came to me, and so that’s the basics of the core model. It’s just this simple canvas centered around some core. Doesn’t have to be a webpage; it could be actually just about anything physical, digital, whatever. But typically, with digital products and services, this is about some kind of piece of content or functionality that answers a user task. It fulfills my intention as a user at the same time that we reach a business objective. It’s at the intersection between user experience and strategy.
So, that was kind of an epiphany or whatever. Since 2006, I’ve been working with this model to try to understand its implications. And the implications are actually quite huge because this is just a very simple… you can say it’s kind of the atomic parts of the user experience. So, yeah! It’s the smallest parts and the universal parts that everyone can understand.
The simple terms in the core model are: you have a user and a target group. And the user has a user task. And the business has a business objective of some kind. So that’s kind of the top floor. And then you have a main floor, which is then some kind of an inward path… the kind of the previous part of the customer journey. And then it has a forward path. And that’s kind of the rhetoric part. How can we steer the user in the right direction? You can apply behavior design and all other elements.
But that’s just the structure around it with the target group, user tasks, business goals, inward paths, and forward paths. And when you have defined those, then you can come back to the answer. So this is really a diversion! You fill out the rest of the canvas, and then you come back to the answer. So now you have a solution space where you can have full freedom within this little structure to find out what kind of answer this is. What format? What channel? And it can turn out to be something different than you thought at the beginning.
Jorge: Episode 116 featured the first of two conversations with Bob Kasenchak about the structure of music. This was a broad-ranging conversation about deep subjects in IA, which can be applied to more than digital products. One of the examples we used centered on a meal that Bob and I shared.
Bob: The restaurant, by the way, for listeners, was called N7, and it’s a French place, kind of slightly off the beaten path in New Orleans, and it was fabulous. And maybe the best wine list I’ve ever seen, although my tastes are biased. But that’s interesting. Let’s talk about that for a minute because the wine list was very extensive, but presented with very small pages and many, many pages instead of a big book with fewer pages, which sort of encouraged browsing and flipping through and experiencing its depth instead of going directly to what you were looking for, which I think is interesting.
And even if we think of something very simple… so we have a restaurant, Jorge, and we have a menu, and we have appetizers, and we have whatever, and we have entrees. So, we have our list of entrees. How are entrees usually organized at a restaurant? Are they alphabetical? No, they’re not alphabetical. Are they the cheapest to the most expensive? Well, not usually that either. Usually, at the top of the entrees list are your featured or signature dishes, or the ones that you think define either your restaurant or the sort of… I don’t want to use the word ‘genre,’ but I’ll use it: genre or ‘mode’ of cooking that you’re using… the most characteristic or the most indicative dishes are the ones at the top, right?
That’s a very interesting way to organize information that we don’t see in… that you could not get away with in a larger environment. No restaurant has 700,000 entrees, so you’re not searching JSTOR for your entree. There’s a list that you can probably grasp, one or maybe two pages. And so the question of how to organize that. So, menu design must be a field. Is it a thing?
Jorge: In episode 118, Maggie Appleton shared insights into her digital garden. While this was primarily a conversation about knowledge management, I was curious about how Maggie structures her public website, where she shares what she learns.
Maggie: Navigation and structure are some of these design problems in digital gardens I would say we haven’t solved yet. I think there are a lot of unsolved problems in the design of digital gardens. Like, we can get into even infrastructure problems later that there aren’t… it’s not easy to build one of these. Mine is built by hand with a bunch of janky code and hodgepodge over the years; it is not something I can just hand to someone else and have them spin up their own version of, which makes me really sad. I wish I was a better developer to be able to enable more people to build these. At the moment, we just don’t have great frameworks, systems, or principles for doing it.
Navigation — yeah, it’s definitely one of these problems, in that the problem you described, where you land on a single page… usually what people do is they put backlinks somewhere, either on a sidebar or below the post. These are pages that link to the page you’re currently on. And that’s really great. That gives you something else to click on next, right? You can click through and explore. But sometimes, you get stuck on a dead-end page: there aren’t any links leading out of it.
And that’s when you need some way to jump back out to some sort of global navigation structure, right? Like, you need to go to some index page where you can see everything. And if someone has, like, a thousand notes in a digital garden, you can’t really browse an index page that well. So, that’s where you have to rely on things like filters and searching, you know, our kind of typical design patterns.
We have search and browsing for information on the web. I wish there were better information architecture patterns. Again, I’m going to say this is an unsolved problem at the moment because the chronological stream gives you a very natural order of where to go next, right? You’re just going to the next post. The next post.
The dream of digital gardens is that you should actually see the most relevant next content every single time you’re on a page because everything is connected by content type, relationships, themes, and tags. But I think that there maybe aren’t good examples of this being done on many digital gardens because we’re missing that sort of infrastructure piece, and we’re missing the kind of best design practices.
Jorge: In episode 126, Nate Davis encouraged us to think more deeply about the value that information architecture adds to organizations.
Nate: As we know, over the years, it’s been very difficult for a lot of people to articulate information architecture in a way where it’s tangible to people, practical in some cases, and also situational. I wanted to get at that, which is to help people situate it, to say, “Okay, this is how I can use someone with information architecture chops.” Or, as a designer, “This is how I can leverage some of the other thoughts and areas of concern that information architecture is considering and maybe improve my UX/UI design game a little bit more.” Because it’s more than just these things. And that’s why I wanted to get the idea out there that information architecture is more than just these popular ideas, which are the more tactical ones that we see.
Theme 3: Knowledge Management Tools
Jorge: The third theme this year was about the tools and strategies we use to manage information for ourselves. This is the theme closest to my new book, Duly Noted, which is why I hosted so many conversations about this subject throughout the year.
In episode 107, Michael Becker told us about how he manages his personal knowledge, and in this clip, he focuses on writing as a way of thinking.
Michael: I spent the first thirty years of my career, or… you know, frankly, I’d say I’ve spent the first fifty years of my life working in output and focusing on output and working in output tools. And basically being told, “This is the output I want, so then, therefore, use this tool to produce that output.” And in some ways, that makes a tremendous amount of logical sense. But for me, those tools always led to abject failure. Those tools always led to me never really feeling comfortable or never really getting there because my brain simply didn’t work that way.
For example, when you’re working in what I’ll call a “vertical” format like Word or Numbers or even Scrivener, for that matter… and it might sound so trite and simple when I say it this way, but for fifty years, I struggled until I finally realized it was okay for the last thing I wrote to become my introduction.
Or for the middle thing that I wrote to become my conclusion. You know? I had such a terrible time with writing and other challenges in linear software. Frankly, and again, I’m embarrassed to say, I spent the vast majority of my career thinking the first thing I needed to write had to be the first thing that people read, right? Because, you know, no one ever really taught me, or I never really understood, that writing actually is thinking. And I think this is another really important tool.
And I got this during my doctoral process when I was cramming for one of my early tests. I was reading a book, and I’ll remember the author one day. I will try to find them and thank them for this. One of those pivotal moments in my life was when I read this line that said, “Writing is thinking.” When you realize that writing is thinking, it unleashes this ability: your initial thought doesn’t necessarily have to be the one you end with. The first thing you write doesn’t necessarily have to be where you end up; it’s allowed to move around.
From the point that I read that article — writing is thinking — it took me fourteen years to then find the tools that allow me to write in a way that gave me the capacity to think in a multi-dimensional way. Bernardo in the Tinderbox community taught me to call it “metacognition,” to actually learn to start seeing the world through metadata. And once you finally get there and you get to that level of abstraction — because thinking is also a tool of abstraction — then the world’s your oyster.
Jorge: Speaking of tools, Obsidian has become central to how I manage my personal information, and it also features prominently in Duly Noted. So, I was very excited to discuss Obsidian with Nicole van der Hoeven in episode 110.
Nicole: Obsidian is an app that is kind of like an extensible knowledge base. And one of the things that people say about it is that it’s a note-taking tool. And that would not be incorrect. It is a note-taking tool. But I think that our traditional way of thinking about taking notes really can get in the way of using Obsidian at its fullest or in the most effective way that we can.
For me, I kind of look at it as my own personal Google. It’s like a search engine that only searches my interests, what I’ve done, and what I’ve thought. And in the same way, it also has the biases that I have. So you have to be able to interject some ways to break out of that bias as well.
And the reason that I love it so much… I mean, there are a few reasons, but one is that it is very well suited to a developer’s mindset because I work for an open-source company. Obsidian itself is not open source, but it still has a lot of that open-source feel. Because it’s an Electron app, you can really inspect a lot of the code. So there’s transparency. Then, its entire ecosystem of plugins is open-source.
So I love the idea that if I don’t like how something looks, then either there’s a plugin for that, or I can write one for it. And I also love the idea of modularity in tech. We have the word ‘composable,’ which means that instead of having this large monolith that is one application that does everything that you could ever want it to do, maybe it might be better to have composable apps, where you choose the ones that are best for each purpose and then string them together in a stack.
And Obsidian is like that. It works well with other applications and in different use cases. It’s not very opinionated. I know there are some tools that lend themselves more to academic thinking, like Scrintal or Roam. Obsidian is so freeform; you can use it for whatever you want it to be.
Jorge: We’ve already featured a clip from Jerry Michalski in this compilation. But I couldn’t talk about knowledge management without sharing another highlight from episode 112. Here, Jerry emphasizes the importance of finding tools and practices that work for you.
Jerry: I said earlier that people like different kinds of tools, and I don’t know what the canonical reduced set of tools is. I don’t know if there are six or ten. I don’t really know. And that space — that path of inquiry — is very interesting. But I’m not on that path of inquiry. I’m not trying to catalog them and reduce them. Rather, I have this idea that people just need to figure out how to find their way to the tool that just resonates for them. And some features will make it work for them. And then they’ll be off and running with a process or a set of rhythms and routines that work for them.
That’s why I think note-taking is the simplest starting point for these kinds of conversations. I ask people, “What do you take notes in?” and a whole bunch of people have no systematic way. A few people use Apple Notes on the Apple operating system, and when they tell me that, I grimace and I’m like, “So how’s that working out for you?” And it’s like, there’s just no power tools around that, you know? You get little post-its everywhere, and you’re lost very quickly, right? Other people were fans of Evernote or DEVONthink, or there are all these older tools, such as Tinderbox from Mark Bernstein.
And so people have to find their way to the tool that works for them. And there’s a whole variety of tools out there. But then the question is, how do you create a practice that really works for you? And I’m afraid not a lot of us have something like that. I feel really fortunate because the moment I had the briefing with TheBrain’s founder, Harlan Hugh, and his then-CEO, Don Block, I was a tech industry trends analyst. That was my day job. And I needed to track who competes with whom, who invests in whom, what company has what products, what PR company represents which company, all of that kind of stuff. And guess what tool was absolutely perfect for that? It was TheBrain. And I can give you a tour through my brain around the venture community and the startup community in a way that will make your head spin.
And that, I can’t emulate in any database I’ve ever tried, and I tried a bunch of other thinking tools and databases and whatever. Can’t do it in another tool. I don’t know; it just doesn’t represent as cleanly. So, then what happens is I’ve internalized how to use TheBrain so that I’m really pretty quick with it. The way anybody using any amplification of human capacity, who gets good at it… Like the 10,000 hours to mastery thing. You just internalize it. So, I’m no longer thinking about the tool when I’m busy note-taking, weaving, and curating. It just passes through me and becomes a thing. And when I put something new in my brain, and I know that it’s kind of in the right place, and I just added a little wisdom to a little shiny nugget corner of my brain, I have a little hit of oxytocin or dopamine in my head. That’s like, “Ahhhh!” And that’s the addictive formula right there. So off and running.
Jorge: Episode 121 featured a fascinating conversation with former librarian and information architect Chiara Ogan. I wanted to know how Chiara organizes her personal book collection. Wisely, she recommended starting by considering your motivations and needs.
Chiara: I think one of the first questions to ask is why. So, figuring out that motivation. Why is it that you have the books? Is it something that… really, no, it’s just a design aesthetic, or you just want it for status or whatever? There’s no judgment on any of the reasons, right? They’re all perfectly valid.
But depending on what you want, your ‘why’ for having these book collections determines what you’re going to do with them. If you have a rare book collection and you’re buying them because it’s an investment, you’re going to treat them and organize them and take care of them in a very different way than if it’s the shelves of your children’s board books that they’re going to chew on and spit on and rip to pieces, right? Like, it’s a whole different purpose.
And so you might put the board books in a basket on the floor, and you’re not going to do that with your first edition of Jane Austen. So once you figure out your why, that’s going to help you figure out those use cases that we were talking about. Does it make sense to have the cookbooks in the office downstairs when the kitchen is upstairs? You know? Like, how do you want to live with this? Is this a working library? Is it something that you’re going to be referencing? And if it’s a working library, you know, it’s design books that you look things up in and be like, how do I do a classification again? You know, what’s the best user research technique for this?
Whatever it is, then you need to make sure that your organization and however you have them is going to facilitate you quickly putting your fingers on that information, right? Because books are competing with the internet. It’s very easy to go to Google and just type what’s the best research method for evaluating classification systems instead of reaching for the Polar Bear book, Information Architecture for the World Wide Web. And so, which is going to be faster? For me, it was always reaching for the Polar Bear book because it’s right above my head. That’s where I put it.
Jorge: In episode 125, Karl Voit shared how he uses Emacs, a venerable text editor, to manage his personal information. But as Karl explained, Emacs is actually much more than a text editor.
Karl: Okay, from a probably hundred-thousand-foot height, I’d say Emacs is an endless, large box of LEGO bricks where you can take out a handful of bricks that you think are handy for your personal situation, and you combine them in the way you need them. And for everybody else, the same box of LEGO bricks accomplishes different things. They build different structures with it.
So, basically, Emacs is an Elisp interpreter, which is a platform that executes code in a computer programming language called Elisp. It’s a very old language, and many people think that it’s the most beautiful computer programming language. I have to admit I’m not that good at programming Lisp myself. There are other programming languages I would prefer. But anyway, it’s very flexible, and probably the most interesting thing about Emacs is that you can look up any functionality this thing delivers, learn how it works, and modify it to your needs if you want, and your modification is active instantly. So you don’t have to restart Emacs to make changes work. You just fiddle with the setup as it is. It comes with full documentation of all its functions.
Out of the box, Emacs is most often referred to as a text editor, but that’s probably not the whole story because Emacs is also being used for other things, like gaming. I’ve seen people cutting videos with Emacs. Of course, these are strange types of applications, but they underscore the fact that Emacs is very flexible.
And for me, personally, it’s knowledge management. It replaced spreadsheets; it replaced word processing. So, I’m generating documents, PDF files, and presentations. I use it for to-dos and project management. And yes, those structures built from those LEGO bricks help me with my daily life, and it doesn’t necessarily need to be like that.
So, other people, for example, just use it for programming, or other people just use it for simple to-do lists. So it scales very well. The good thing about Emacs is that the vast number of functions that come along with this platform are not in your way when you don’t use them. So, it’s not the case that when you start Emacs, you’re confronted with a hundred thousand buttons, and you only click on two of them while the others distract you from your work. That’s not the case with Emacs. That’s the good part. So you can start very lean; you can use only the most basic functions and be happy with them and never feel the desire to extend your personal setup.
Or you can think of, okay, now I’ve got this thing here, which is helping me with my grocery list or dealing with my to-do items in business. And now, it would probably be nice when I am able to draw graphics with it, for example, network structures or something like that. Most likely, there is already a good solution that integrates with Emacs, and most of the time, it’s Org Mode, which is an extension of Emacs itself.
So whatever you think you need a solution for, there’s most likely a great solution out there, or probably ten or twenty. You are able to look at them, for example in YouTube videos online, to see if they look like they would help you and whether they align with your personal requirements, which is very important. And then you can decide whether to try them out or try something else when one doesn’t work for you.
Theme 4: The Impact of AI
Jorge: This brings us to the fourth and final theme: the impact of artificial intelligence on how people organize information. The popularization of large language models, or LLMs, is the biggest tech story of 2023. So, it’s no coincidence that the topic loomed large in our conversations this year.
In episode 111, I asked Andy Fitzgerald if AIs might help us compensate for a deficit of structured data in systems with lots of content.
Andy: Yeah. So, that’s a great question, and it’s one that I think comes up a lot. There are some examples of when there is a knowledge model that can be leveraged. Machine learning or natural language processing can extract some of that information. I think there are many more cases where an intelligible level of expressiveness has not been shared in a document. And here, I mean the difference between the work and the document: someone wishes — or an organization wishes — to extract from a particular document the expressiveness that’s understood but not explicit, or that can’t be derived from that document. And in those cases, we just get abject failure. I think that the job of structuring and communicating content is going to be around for a long time. I don’t think it’s going to be automated away, in part because if the information isn’t communicated in a document in some way, it can’t be extracted and structured.
Let’s go back to our recipe list, for instance. If I have two lists of ingredients and steps, I know which one is which by looking at them. It might be possible to train a machine model to recognize the ingredient list, identify things that are simply food types and quantities, and, on the steps list, identify language elements that are imperatives: language elements that describe a series of things going on over time.
It might be possible to do that. But here we’re looking at a really constrained and really predictable set of information. So, recipes are fish in a barrel. Information architects all love talking about it because they’re the easy examples. You’re only going to do so many things with a recipe, and they’re going to have certain types. But when you look at language as a whole — and this is where, coming back to where our conversation started, I think by virtue of enjoying thinking about language problems, it invites me to continue exploring this — when you look at the complexity of language…
I have some little language games or jokes, I guess. Little linguistic sort of puzzles. And one of my favorites is, “Time flies like an arrow; fruit flies like a banana.” What’s the machine going to do with that? It is a valid sentence. “Time flies like an arrow; fruit flies like a banana.” But it tickles me. I like to pick it apart.
Jorge: In a similar vein, Bob Kasenchak and I discussed the impact of large language models on the organization of information. Bob emphasized the difference in scale that current models represent. This was in the second part of our conversation, which we published as episode 117.
Bob: We could probably talk for hours about large language models, but one of the things that sort of occurs to me is that things like Grammarly were already working on this same principle. What was different was the scale. Like Grammarly, the thing about the LLMs is that they have billions and billions of inputs, and we now have the processing power.
This is a Moore’s Law thing, right? We now have the processing power to do ad hoc statistical analysis on the fly of a huge data set to do predictive text. Whereas that same technology of inferring backward over a corpus has existed in auto-classification systems and things like Grammarly for a long time.
But the scale is… I mean, if you drew a picture of the scale, you wouldn’t even be able to see the little one next to the big one. They’re so disparate in what they’re able to do. I don’t even want to go into it! The different things that people are trying to get it to do are just so interesting. You know, write me a poem about this in the style of Carl Sandburg. Like, that’s not really that interesting. What’s interesting is that you can get it to do executable Python code. You can get it to build you a taxonomy and express it in SKOS that’s valid and loadable into a system.
I think what we’re going to see, obviously, as we get to not the generalized model but a model that someone can bring in-house behind a firewall and train with their own content, is fewer hallucinations and more specific things that someone at an enterprise is going to train it to do. What it is not good at is writing prose. That’s not replacing writers anytime soon.
And you know, I’m sure — and I have been reading about it — that academics are struggling with assigning essays to their students. I think there are a lot of tells when something’s been ChatGPT’d, you know? I’m sure that’s a massive struggle, but it’s just been so interesting the past three, four, five, six months watching, every week, people scrambling to make sense of this! What to do with it, how to use it, how not to use it, what is it, what isn’t it? And like, there’s just… you can’t keep up with the amount of content that’s coming out.
Jorge: In episode 114, Dan Russell talked about how AI might change the experience of searching for information.
Dan: This is a fascinating time to be alive if you are in this field at all. I have to admit, even though my background is in AI and I’ve done natural language processing for years, I did not see this coming. I mean, I saw language models coming a couple of years ago, but I did not anticipate the breadth and depth to which these things would work. So, it’s been interesting as a person interested in information quality and the depth to which people understand these things, to see how it works.
At the moment, large language models like Google’s Bard or Microsoft’s Bing, which uses GPT-4, are changing rapidly. The first thing to recognize is that if we have this conversation in a year, everything’s going to be different. Right now, large language models have a real problem with what’s commonly called hallucination or fabrication. They’re just making stuff up. The best version of this I’ve heard is that it’s like a cybernetic mansplaining system where it’s just basically making stuff up to fill the gap.
At the same time, it also provides a kind of ability to search out information in very, very different ways. As an example, I wrote a post recently about searching for words that end in ‘-core.’ So earlier, you used the prefix ‘ur-.’ In one week, I heard multiple people say something like “synth-core” — synth-dash-core. Or “night-core” or “mumble-core.” And I thought, wait. Have I missed something? What does this core thing mean? And I don’t know of any way to find that on Google using traditional search methods. So, I turned to Google’s Bard, and I said, “Hey, tell me about these words that end in ‘-core.’” And I gave some examples like “mumble-core” and “synth-core,” and so on. And it gave me this lovely little essay about “core” meaning a design aesthetic or perspective on the world. And then I said, “Show me ten more examples of that.” And it gave me ten more examples of words that I had never, ever heard about. Like “cottage-core.” I don’t know what cottage-core is. So, I went and looked that up, and it turned out to be a design aesthetic that is very comfortable. Imagine West of England cottages with moss and wooden shingled houses, etc.
That’s an interesting way to access information that wasn’t there before. Now, the problem with hallucination, I think, is a serious one. I’ve also learned from these large language models that I died in 1993. I’m happy to report that that’s not true; rumors of my death are greatly exaggerated. But I think an important point right now is that they’re fabulous for doing some kinds of things, but you have to check absolutely everything. I saw one little essay that was written by ChatGPT-3 the other day. It was twelve sentences long, and one sentence was exactly the opposite of the other eleven sentences. It was remarkable! It completely inverted the sense of what it was saying. So, at this point, you have to actually check everything.
I am optimistic, however, that this problem will be solved. I don’t know if it’s going to be in six months or two years, but I know of ways to make this a whole lot better and make the results actually much more factual. There are a couple of systems out now that actually give citations for all their assertions. There’s one called “scite” — scite.ai. If you’re a scholar, it’s a really nice large language model that’s trained on the scholarly literature and will give citations for things you ask. So if you ask, for example, “What are the metabolic processes involved in ATP in, say, lizards?” it’ll give you this nice little essay with citations for everything, which is really remarkable.
So I’m optimistic about this. I don’t think it’s going to undo all the necessity of having some literacy about information and information resources, but it’s going to give us a whole new set of tools to look at and craft and understand all the stuff that’s out there.
Jorge: In episode 118, Maggie Appleton struck a cautious note about what large language models might do to the credibility of the content we find on the web.
Maggie: I stepped into the AI world about ten months ago, and it’s been a bit of a jarring experience; I mean, for everyone, right? The last six months have been a bit shocking. My position before language models appeared on the scene was that we should all publish everything all the time.
Publishing your knowledge to the web opens you up to having relationships with other people, right? I think I’ve had so many wonderful friendships and collaborations and amazing jobs all come through writing on my website and writing on Twitter. And I would just say that’s, you know, so invaluable.
There’s nothing I could trade it for. It’s just been the best people. Because it’s like putting out a bat signal for everyone who’s into the same things as you, and they come running, and you’re like, “Oh yeah! These are my people!” I’ve absolutely loved that, and I couldn’t have done that without publishing on the open web and just kind of inviting anyone who wants to talk to me to come chat with me.
And now, we’re facing this moment where language models are scraping the web for all this text and training on it. And we are not quite sure of the repercussions of that yet. But in the essay I’d written, I was mostly worried about how this will affect human relationships on the web. So, the thing that I really valued from publishing so much. And then trust and truth, I think, are kind of up for debate.
What happens is that it has just become incredibly cheap to generate content that is being published on the web. So, you can get any of the large language models like ChatGPT or Claude or any of these ones to just generate millions of words in a couple of minutes, and it’ll cost you like pennies. You can generate keyword-stuffed articles on anything you want under the sun and publish those on the web.
And I think it’s still an open question of what happens to Google search in this world. Because we don’t quite know how Google’s going to respond to this outpouring of generated content, which is already happening. We have plenty of evidence that people are already doing this. But it means that if you search for a topic on Google that otherwise would’ve led you to someone’s personal website with their personal opinion on it… an opinion that is grounded in like a very embodied reality — their experience of the world, who they’ve read, who they know — you are instead going to… all the top results will just be generated content. It’s just going to be, you know, rehashed stuff out of language models. And that doesn’t mean that the content isn’t true or it isn’t accurate, right?
We have trained these models to actually be quite accurate, but there isn’t a human behind it. So, you can’t have a relationship with whoever’s writing these words. And while it’s more likely to be accurate and true, it still isn’t grounded in reality. When it comes down to it, those words could be false, but we have no way to validate that. And you have no way to check it, because you can’t contact the person who wrote it because no one wrote it. It’s like it was just generated text.
So, I think I’m very worried about our ability to connect with one another and form relationships when everything you read on the web no longer has a human behind it and how we stay grounded in empirical scientific reality if there’s just this explosion of generated stuff, which includes lots of hallucinations. But we don’t know which content has been hallucinated and which one hasn’t.
Jorge: In episode 126, Nate Davis emphasized the importance of accountability when using AI. While Nate sees AI as an opportunity, he also believes it’s important for these systems to be aligned with the needs of organizations.
Nate: I think there are certain jobs that will be displaced because of the efficiency brought on by automation that comes from using large language models and methods for artificial intelligence. Particularly as it relates to the work that I’m interested in and the work that a lot of information architects are trying to get at, what you will find is that one of the challenges for artificial intelligence, or for large language models more specifically, is that they are now augmenting or becoming an alternative source for information, right? And because of the way they are technically architected, they are not able to check themselves. They don’t know what they don’t know. And as a result, in some cases — in many cases — they don’t necessarily understand the meaning and the nuances of semantics or inferencing where it matters.
I know that there are efforts in trying to close a lot of these gaps, but… I guess what I’m getting at is that there is a level of accountability that will still need to be created, right? And I think a lot of that accountability doesn’t exist today. Hopefully, organizations will realize that, in order to be accountable, a statement that an agent is now making on behalf of an organization has to trace back to clear intent and understanding, making sure that it connects back to what is understood in the organization and what is aligned with the organization.
And that’s a lot of work because you still need people to speak on behalf of the business. But then you also need now to make sure that there are people who are helping, who are translating what the business understands into smaller bits or chunks of information, content, and concepts that can be transformed into data and content so that large language models can use that internally in an organization to stay aligned.
So, there is this alignment issue of accountability that systems will always have, forever, until we decide to allow technology to act on our behalf. And it’s over then, if that ever happens. So, I do think that especially now there’s going to be — there should be — more effort in making sure that you have individuals who are thinking about, “Well, how do we make sure that what is said and done by these artificial agents is conceptually aligned with the organization?” And that’s where we play. It’s an opportunity.
Jorge: We’re going to close this section with a clip from my conversation with Alex Wright in episode 120. Alex understands current developments like AI through a broader lens than most of us, and this gives him an interesting perspective on what might lie ahead.
Alex: I’m always very reluctant to try to predict any future that might happen. I feel like I’m on much safer ground talking about the past. But I will say that when I wrote the first edition of the book, I felt like we were still in a period of relative optimism about the internet. I think there was still a lot of excitement and kind of a utopian zeal around what was happening. That, “Oh! This is going to be revolutionary!” You know, “information wants to be free!” We’re going to upend all the old hierarchies, and it’s going to be this brave new world of all these new businesses. Out with the old, in with the new, and let’s see what happens!
And I think, in the fifteen years since then, the conversations have shifted. I think people have started to acknowledge the more complex and sometimes problematic effects of this technology and that it has created some fairly painful disruptions. If you look at what’s happened in the media landscape and a lot of legacy industries, the changes in the… a simple example would be the recording industry, or you could certainly talk about how things have evolved in the news industry. And beyond the media landscape, certainly massive changes in supply chains and the global networking of manufacturing and commerce and… you know, it’s a complex picture.
I think it’s over-simplistic to say, “It’s good” or “It’s bad.” It’s certainly disruptive, and I think now people have a much more sober view of what’s going on: that there are some problematic things that we need to think about. And now, with the rise of AI, everyone’s like, “Uh oh. What is this going to be all about? This could be amazing. It could be the end of mankind as we know it.” Like, nobody really knows. But certainly, I think, all kinds of cautionary tales. But also, if you take the example of Gutenberg, was that a net good for society? Or was it… again, at the time, a hundred years after Gutenberg, I would say it was a very mixed bag. Like, a lot of really difficult things had happened, you know? There had been a lot of societal disruption, warfare, and bloodshed.
And then, I think today, most people would say, “Oh, that was a good thing for humanity.” Probably? I don’t know. But it’s hard to say. I mean, I’m a bit of an optimist by nature, but I think it’s way too early. We should keep in mind that even though a lot of folks who maybe listen to this podcast have more or less grown up with the internet, it’s still relatively in its infancy. I mean, it’s astonishing how quickly it spread. But we’re really only, what, 25-odd years into the really commercial, popular version of the internet? And I think we’re just at the cusp of the next wave of things that’s going to be really interesting.
But historians tend to — and I don’t call myself a historian — but professional historians tend to be very leery of talking in historical terms about things that have happened in the last 20 years. Usually, you want to get a good half-century between you and the events before you start drawing too many conclusive statements about what just happened. So, I think we’re still in the thick of it. But it’s certainly interesting to see it up close.
So there you have it, some highlights from the Informed Life podcast in 2023. Again, this episode was partly curated by an AI. I plan to write about that process in my newsletter. Sign up at jarango.com/newsletter to find out more.
And as always, thank you for listening. I hope the podcast has brought you value in 2023. If so, please consider rating us or leaving a review in Apple’s podcast directory. The link is in the show description. I look forward to sharing more of these conversations with you in 2024.