Bob Kasenchak is a taxonomist and information architect at Factor. This is the second of two episodes with Bob that focus on what information architects can learn from music. The first conversation, which you can find in episode 116, focused on the structure of music itself. Today’s conversation focuses on how we can make music more findable — but there are insights here for anyone working with any type of information, not just music.

Show notes

Show notes include Amazon affiliate links. We get a small commission for purchases made through these links.

If you're enjoying the show, please rate or review us in Apple's podcast directory:
https://podcasts.apple.com/us/podcast/the-informed-life/id1450117117?itsct=podcast_box&itscg=30200

This episode's transcript was produced by an AI. If you notice any errors, please get in touch.

Transcript

Jorge: Bob. Welcome back to the show.

Bob: Thanks Jorge. I’m excited to continue our conversation.

Jorge: I always know that a conversation is fun because at the end of it, I feel like, “man! I wish we had had more time!” And our conversation last time went in an unexpected direction and we both said, “you know what? Maybe we should do a second part to this.” And here we are. And…

Bob: It also… It left me smiling.

Jorge: Yeah, same here. Same here. It was uplifting. And just now, before we started recording, as we were trying to revisit what we talked about in that conversation, you said something that I think captures it perfectly. Last time we talked about music as information, and what we would like to focus on today is more the information about music. So I’m going to try to articulate it, perhaps for folks who might not have listened to the other episode. So, music conveys some kind of emotional tone, impact, content — I don’t know; you feel something when you’re listening to music, right?

Information is contextually dependent

Bob: Right, but it’s very culturally encoded and it’s not objective. So that information that’s encoded in music — intellectual and emotional — is context-dependent. In the same way that we don’t understand the emotional content of the music of a culture we’re not familiar with, someone who’s not familiar with our culture might be confused about why we all get the same emotion out of Beethoven.

Jorge: As with all information, it’s contextually dependent, right?

Bob: Right.

Jorge: So that’s like the information that music provides, or, music as information. But then there’s a question of, “well, how do you know what to listen to? How do you find music?“ Right? And that is perhaps the other side of this coin, this idea of information about music.

Bob: Right. This has been brought to the forefront in the past, I don’t know, couple of decades, as music consumption has very much changed from physical media to non-physical media — digital, cloud-based services and things. Old people like us probably still like our physical music media, but a lot of people, especially younger people now — like an entire generation — access their music as content through streaming platforms and YouTube and other things. So that brings us to a new environment. Searching for music has stopped being a… and you and I probably remember going to the record store and flipping through the bins we’ve already flipped through a hundred times, because this week there might be something new from our favorite artist or in a genre or something.

Now, in a record store, there’s a certain amount of very broad genre slicing. It’s like classical, jazz, rock or pop, and then maybe 25, 30, 40 years ago, you started seeing maybe a hip hop or rap section. But these were very broad swaths, and within them, things are just alphabetical by artist. And if you get into an artist and someone’s actually alphabetized the albums within it, like, you’re not in a record store that I was in. But now, just like any other sort of searching through large bodies of content — which we’re used to doing for text-based things, documents and whatever — now this is the way that music is accessed.

And so, it requires — and has required, but still requires, because I don’t think anyone has it perfect — a rethinking of how these things are organized. And as we know, one of the entire reasons that you and I exist in the industry that we exist in is because free text search is inadequate, right? Free text search just isn’t going to do it. You have to have clean, good, standardized, consistent metadata attached to things. Which means we have to agree on what the canonical versions of artists’ names are, and have alternate versions encoded. We have to agree on what genres are — which is a whole conversation we could probably have — or at least pick a lane and stick to it as to what genres are, so that we can try and find what we’re looking for. And then there’s the phenomenon of the single search box: before you get to an advanced search page with filters — artist, date, album title, year, whatever — you just have the Google-style single search box. And you need to be able to type text strings into that search box, and it has to parse out whether you’re asking for a genre, an artist, a song, an album or whatever.

Like I said, once you get to an advanced search function, it’s a little… it’s faceted and filtered, right? It’s easier to do. But parsing stuff in that single search box… that’s very difficult when you’re searching what is effectively half a dozen or more metadata fields at the same time.
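To make the single-search-box problem concrete, here is a minimal sketch: one free-text query checked against several metadata fields at once. The catalog, field names, and matching logic are invented for illustration; a real system would also rank, normalize, and disambiguate.

```python
# A toy illustration of the single-search-box problem: one text string,
# checked against several metadata fields at once. The catalog, field
# names, and matching logic are all invented for illustration.

CATALOG = [
    {"artist": "Herbie Hancock", "album": "Maiden Voyage",
     "track": "Dolphin Dance", "genre": "jazz", "year": 1965},
    {"artist": "Peter Gabriel", "album": "Peter Gabriel",
     "track": "Games Without Frontiers", "genre": "rock", "year": 1980},
]

def interpret(query: str):
    """Return every (field, record) pair the query could plausibly refer to."""
    q = query.strip().casefold()
    hits = []
    for record in CATALOG:
        for field, value in record.items():
            if q and q in str(value).casefold():
                hits.append((field, record))
    return hits

for field, record in interpret("dolphin"):
    print(f"matched on {field!r}: {record['artist']} - {record['track']}")
```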

Jorge: Well, and there’s a prerequisite there, which is you are trying to map nonverbal information… the music itself, to text, right? To some kind of verbal description. And I’m thinking of: Google has this feature where if you have an image and you don’t know what it’s called or what it is, you can search by the image, right? Like, it’s not something that you can type into your computer, because what would you type in…

Bob: Yeah, it’s really cool, actually.

Jorge: Yeah, it is cool. And I guess in music, the equivalent would be something like Shazam, that app that Apple acquired a while back, right? Where you hold up your phone and it listens. And it goes, “oh, this might be… you know, this song by, I don’t know, Dua Lipa or whatever,” right?

Bob: Right. But I think there are actually two things at work there. I am given to understand — I’m not an expert in this area — but I’m not sure whether Shazam is listening to the music or whether it’s listening to a silent track of information that is actually encoded in the music and gives some of the information about it.

Information encoded in music

Jorge: Oh, okay. Can we unpack that? That’s…

Bob: Yes. So, I would need to research this, and I hope I’m not just making stuff up, because I don’t think I am. But I believe that one of the ways — and, again, as usual, this was invented for a commercial purpose, but then repurposed for something like Shazam — the way they track how many times your song gets played on the radio, and therefore how much you’re owed for your rights and stuff, is that there’s like a silent bit of data underneath, as it were. That’s a metaphor, obviously, in this case.

But underneath the track there’s something that is silent to humans but machine-readable, so that they can actually, literally track how many times things were played. And it’s not clear to me if things like Shazam are listening to the snippet. Because you’re in a noisy bar and you hold your phone up and Shazam is like, that’s Dua Lipa or whatever. Like, did it really parse that out? Or is it actually accessing the… I don’t know the answer to that. But I know that that sort of thing does exist. There’s silent, machine-readable metadata encoded in the audio tracks.

Jorge: There’s a signal there that is in a spectrum that you are not attuned to, but the phone is, somehow.

Bob: That’s my understanding. Again, we should fact check ourselves on this before we publish it, but…

Jorge: Well, I mean… recording it on the podcast might be a way of fact checking it. So if anyone listening in knows what is…

Bob: We’ll crowdsource our fact checking!

Jorge: Yeah, just let us know. But what this points to is this issue of how do you find stuff that is not text-based to begin with, right? Because it’s one thing to hold up your phone and have it recognize something about what it’s picking up on its microphones. But it’s another for you to initiate that search yourself. Like, I could imagine that if something like Shazam works the way that you’re describing it, it wouldn’t work for you to hold up your phone and start humming into it, right?

Bob: that’s right.

Jorge: Right. So you do have to make this conversion — this mapping — between your memory of what you might be wanting to listen to and what it might be called, or who it might be by, or what genre it might be, or these many other criteria that we use to describe that type of information.

Encoding information in music

Bob: Right. There used to be… I wish I could remember what it was called. There used to be this little printed pamphlet — this is mid-century America and a little bit later — that was an index of famous classical themes, all sort of normalized to one key and described in text so that you could try and find the… you didn’t know what this tune you heard at the orchestra concert was, but you were able to access and reference it. Gah! I wish I could remember what it was called. And I think they normalized them all to one key. So if you could sort of hum it in solfege, you could look it up and be like, “oh! That was the third movement of Mendelssohn’s 4th Symphony.“

Jorge: Oh, but that’s fascinating. It’s like a kind of search engine for tunes, right?

Bob: Like, in a pamphlet; in a little staple-bound book, yeah.

Jorge: Yeah. That’s amazing.

Bob: And I think that the way something like Shazam works is that it doesn’t encode a melody. So you couldn’t hum into it. It takes like an entire spectrographic sonic fingerprint of the thing and matches those sound waves. But when we — outside of things like Shazam — are searching for music in a digital environment, we are reduced to text searches about non-text information. That’s the sort of weird inflection point where information science enters the arena, and this is now classic information science stuff. Like, we have to have canonical names of artists — an authority file, like the Library of Congress keeps for authors and artists and stuff like that.
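Neither host is certain how Shazam actually works, but the “spectrographic fingerprint” idea is commonly described as hashing pairs of spectrogram peaks. Purely as a sketch of that general technique, with invented parameters and a synthetic signal standing in for real audio:

```python
# A rough sketch of peak-pair audio fingerprinting, the general technique
# often described for Shazam-style matching. The parameters, toy signal,
# and matching logic are invented; a real system is far more sophisticated.
import numpy as np

def spectrogram(signal, frame=1024, hop=512):
    """Magnitude spectrogram from overlapping, windowed FFT frames."""
    frames = [signal[i:i + frame] * np.hanning(frame)
              for i in range(0, len(signal) - frame, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def fingerprint(signal, fan_out=3):
    """Hash pairs of nearby per-frame peak frequencies into landmark tuples."""
    peaks = spectrogram(signal).argmax(axis=1)   # loudest frequency bin per frame
    landmarks = set()
    for t in range(len(peaks) - fan_out):
        for dt in range(1, fan_out + 1):
            landmarks.add((int(peaks[t]), int(peaks[t + dt]), dt))
    return landmarks

# Toy check: a clip cut from the middle of a "track" (four sine-wave notes)
# shares many landmarks with the full track and few with unrelated noise.
rng = np.random.default_rng(0)
sr = 44100
notes = [np.sin(2 * np.pi * f * np.arange(sr // 2) / sr) for f in (262, 330, 392, 523)]
track = np.concatenate(notes) + 0.05 * rng.normal(size=2 * sr)
clip = track[sr // 2 : 3 * sr // 2]
noise = rng.normal(size=len(track))

print("overlap with source track:", len(fingerprint(clip) & fingerprint(track)))
print("overlap with random noise:", len(fingerprint(noise[:len(clip)]) & fingerprint(track)))
```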

And then, if we’re going to go further, we have to accommodate people’s failings in spelling, and variants of things, and people with ambiguous or identical names. Like, if I type “salt and pepper” into a search engine, will I get Salt-N-Pepa tracks back? Like, these are all things that are possible and that help, and then there must be some way to encode… so we’re going to get into all this.

Recently, I was studying some jazz with a student, and it was time to go listen to “Dolphin Dance” off of Herbie Hancock’s very important 1965 recording, Maiden Voyage. Now… it’s a very famous and very ambiguous tune, which is a whole other discussion. So, if you go into YouTube and you type “Dolphin Dance,” there are obviously thousands and thousands of versions by people who have recorded this. How does it know that this 1965 Herbie Hancock is the canonical one? Does it know? Is it just the one that’s been listened to the most? Like, why does it float up to the top of the search results instead of some guy’s basement solo-guitar cover of it? Is it views? Is it ratings? Or is there a metadata encoding that says “original version” or “first version” or something like that? Like how does it… because it did; it got it right. The first thing that it served me was the original track. How does it know that? And I don’t know the answer to that question, but… yeah. Because it could be via metadata or it could be via behavior.
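The ranking question is left open above: it could be metadata, behavior, or both. Purely as speculation, a toy score that blends an “original version” flag with play counts might look like this; the fields, weights, and numbers are all invented.

```python
# Speculative illustration only: a toy ranking score that combines an explicit
# "original version" metadata flag (editorial signal) with play counts
# (behavioral signal). Fields, weights, and numbers are all made up.
import math

def rank_score(item: dict) -> float:
    metadata_boost = 5.0 if item.get("original_version") else 0.0
    behavior = math.log10(item.get("plays", 0) + 1)   # dampen raw popularity
    return metadata_boost + behavior

versions = [
    {"title": "Dolphin Dance (Maiden Voyage, 1965)", "original_version": True, "plays": 2_000_000},
    {"title": "Dolphin Dance (solo guitar cover)", "original_version": False, "plays": 3_000},
]

for v in sorted(versions, key=rank_score, reverse=True):
    print(f"{rank_score(v):5.2f}  {v['title']}")
```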

Edge cases define the model

Jorge: I wanted to circle back to the example because… Just to mention what I would consider like the canonical one in the music space would be probably that period where Prince changed his name to that symbol.

Bob: Oh yeah!

Jorge: Right? And then at that point, like, our keyboards do not have that symbol, right? So…

Bob: No!

Jorge: You cannot, well, not most of us. I know people who probably might commission one because they love Prince so much, but … and you also can’t dictate it, right? Like you can’t tell your phone, you can’t speak that symbol. It literally is a glyph that has no phonemes associated with it, right?

Bob: Yeah, yeah.

Jorge: So, in such a case, I would imagine that you would have to have some kind of metadata in the backend, or some kind of thesaurus or something that maps Prince to that thing, right?

Bob: Right! Because what choice are you left with? If you need to assign an artist to a track or an album, a piece of audio content, I don’t know what else you would do. Like, you still have to equate it to Prince, so that’s probably the text string that people are going to type. Yeah, that’s a fascinating, fascinating example.
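One way to read this is as an authority file: one canonical label per artist, plus the alternate labels people actually type. A minimal sketch, with invented entries:

```python
# A minimal authority-file sketch: canonical artist names plus alternate
# labels (misspellings, nicknames, descriptions of unpronounceable glyphs)
# that should resolve to them. All entries are illustrative.

AUTHORITY = {
    "Salt-N-Pepa": {"salt and pepper", "salt n pepa", "salt 'n' pepa"},
    "Prince": {"prince", "the artist formerly known as prince", "love symbol"},
}

def canonical_artist(query: str) -> str | None:
    """Resolve whatever text the user typed to a canonical artist name."""
    q = query.strip().casefold()
    for canonical, variants in AUTHORITY.items():
        if q == canonical.casefold() or q in variants:
            return canonical
    return None

print(canonical_artist("Salt and Pepper"))                      # Salt-N-Pepa
print(canonical_artist("The Artist Formerly Known as Prince"))  # Prince
```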

Jorge: So, well, one thing that this makes me think of — and part of the reason why I wanted to have this conversation with you about music, and this is only becoming evident as we’re having the conversation — is that music might be a type of information that a lot of people are familiar with, a lot of us interact with; it’s a big part of our lives. And when you start thinking about it, it’s pretty complex, right? And we’re talking about some edge cases like, you know, Salt-N-Pepa and the artist formerly known as Prince. Those might be edge cases, but even within “normal” situations, you might run into weird conditions, right?

So, I’m thinking, for example, there are artists who… well, taking a step back. The way that a lot of us consume popular music is through this construct of: there’s a taxonomy of genres, there are artists who record music within a genre, maybe within a couple of genres; those artists release works that might be packaged either as singles or tracks or compiled into things like albums. I think that we have a mental model in our minds about how this taxonomy works. And what’s becoming evident in this conversation is that while that might appear to be a fairly simple construct, there’s actually all sorts of weird situations.

Bob: Hmm.

Jorge: And I’m thinking here of an example: artists who release albums that don’t have unique names. I think Peter Gabriel’s first four albums were called Peter Gabriel, right?

Bob: All four of them?

Jorge: I believe so. And they have a nickname. Like, fans call them by their… Fans have assigned names to them based on their covers. Like, one of them is called Melt because it’s like a photograph of Peter Gabriel’s face kind of melting. But I think that formally they’re called Peter Gabriel. And then what do you do with that, right? Like, because if you have an artist who has four albums that have the same name, then you know, that creates a challenge when presenting that information, right? It places a bigger onus on things like the album art or whatever, right?

Bob: Well, and on the user. So the first thing that strikes me is like… and I guess I always thought that album was called Melt. So I didn’t know that. I think the first thing I would do would be to try and differentiate them by year of release, but that really puts an onus of a deep catalog knowledge on the searcher.

Jorge: Right. I’m looking … now I have to fact check me live, right? So I’m looking at Peter Gabriel’s discography on Apple Music and the covers indeed just say Peter Gabriel, for those first four albums. And Apple seems to have added meta… Changed the names of them. So the actual name of the album says Peter Gabriel 1, Car. Peter Gabriel 2, Scratch. Peter Gabriel 3, Melt, right? And I would… but I don’t think that the covers say that. The actual album art.

Bob: Yeah, so Wikipedia has them as Peter Gabriel, aka Peter Gabriel 1: Car, Peter Gabriel, aka, aka…. So it seems like they literally were all just titled Peter Gabriel.

Jorge: Right. Which… and again, it might be that these are edge cases, but the thing is like, these are not obscure artists we’re talking about, right? Prince and Salt-N-Pepa and Peter Gabriel are fairly mainstream. So these are exceptions, but exceptions that are highly visible, right?

Bob: Sure. And like in a lot of cases, as I think we would both argue, the edge cases define the model in some ways. Because it has to be able to accommodate them. Even though you might not use certain elements or features of a model — an information model — in most cases, you have to be able to accommodate edge cases. Which comes back to the point we were going to jump off of, which is why does Apple Music need a separate app for classical? And I understand they’re developing one for jazz as well. Is that true?

Jorge: I don’t know about that, but I definitely had that on my list of things that I wanted to talk with you about.

Apple Music Classical

Bob: Well, it just segues into it from what we were talking about. Because, as we discussed last time a little bit — and I won’t rehash it — I’ve been in a situation where people were trying to stuff classical album information into a pop album information format, and they’re just fundamentally different, both in how you contain the information and what people are looking for, and in the amount and complexity of the information that people are interested in. Pop music is a little simpler most of the time. And again, most of the time, people aren’t looking for the composer of a pop work.

No one probably knows who wrote “Oops, I Did It Again.” It wasn’t Britney Spears. She’s the artist of record, and there are actually several interesting covers out there of “Oops, I Did It Again,” which is a pretty interesting tune underneath the pop veneer and production. But, like, no one’s searching by composer for that work. They’re searching for the artist. And even if they’re searching for a cover, they’re searching for the song and the artist.

Whereas in a classical recording, the artist and the composer — most of the time, especially for legacy stuff, you know, pre-1900 — are completely different. I mean, you can of course pick up early recordings of Rachmaninoff Plays Rachmaninoff from, you know, early 78s or whatever. But for the most part, the artist and the composer are different, and they’re both pieces of information that an experienced, you know, educated classical listener is going to be interested in. I might be interested in all the works by a composer… or, especially once you get into the warhorses, right? Your Mozarts and your Beethovens and even your Bachs… It’s: who performed it? Who performed the Goldberg Variations? How many recordings of the Goldberg Variations do we have?

And then you complicate the matter even further when you get to like a multi-movement work, and especially a multi-movement work for orchestra with conductor and soloists. So I’m not necessarily looking for the Berlin Philharmonic, Mahler 2. I want the Berlin Philharmonic Mahler 2 where whoever… Claudio Abbado is conducting, and these three people were doing the solo parts. Like, so you might be searching by soloist, because “artist“ has several levels. There’s sort of the soloist level, the conductor or ensemble leader level, the name of the ensemble…

And then, works are multi-movement in a way that, again, in general, we could pull out lots of edge cases in pop music. I’m thinking about Tales From Topographic Oceans or something. But like most things are not multi-movement works. And even, say, in a stage work — in an opera — you might have an opera. So that’s the top level… or even in some cases, you have a cycle of operas, and that’s the top level. The Ring is four operas — the four operas in The Ring. Each opera has acts. Those acts are divided into movements.

So, whereas a pop song is sort of a unit — and we can talk about the concept of the album going by the wayside and disappearing from younger people’s mental models — a multi-movement classical work isn’t a unit, but you need to be able to tag at that level as well as all the way down to the specific movement. Like, “I want to find all the performances of ‘Là ci darem la mano’ from Don Giovanni and then sort them by the two soloists who are singing them,” or whatever. So, what people are interested in isn’t just artist and track. It’s a whole slew of levels of track and a whole slew of levels of artist, which it seems to me calls for… or — and Apple has decided!

Calls for a different search interface, because it’s very hard to shoehorn that data into an interface for pop. And at the same time, if you did that, you would make it so unwieldy that most of the people looking for pop songs would have 70 fields they weren’t filling out. And it would make an advanced search interface of facets and filters and dropdowns super cumbersome and mostly useless to people looking for pop music.
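To make the difference in shape concrete, here is a rough sketch of the nesting and artist roles Bob lists. The field names are invented for illustration, not any streaming service’s actual schema.

```python
# A sketch of why classical metadata resists a flat artist/album/track model:
# works nest (cycle -> opera -> act -> number), and "artist" splits into roles
# (composer, conductor, ensemble, soloists). Field names are invented.
from dataclasses import dataclass, field

@dataclass
class Piece:
    """A node in the work hierarchy: a cycle, opera, act, movement, or number."""
    title: str
    composer: str | None = None          # usually set at the top level
    parts: list["Piece"] = field(default_factory=list)

@dataclass
class Recording:
    piece: Piece                          # may point at any level of the hierarchy
    ensemble: str | None = None
    conductor: str | None = None
    soloists: list[str] = field(default_factory=list)
    year: int | None = None

don_giovanni = Piece(
    title="Don Giovanni",
    composer="Wolfgang Amadeus Mozart",
    parts=[Piece(title="Act I", parts=[Piece(title="Là ci darem la mano")])],
)

# A pop track, by contrast, mostly collapses to a single level with one artist.
pop_track = Piece(title="Oops, I Did It Again")

print(don_giovanni.parts[0].parts[0].title)   # the number nested two levels down
```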

Jorge: Yeah, I remember when the Apple Music Classical app came out — I tweeted something like, “this app exists because information architecture,” basically.

Bob: Absolutely!

Jorge: Because you’re dealing with a set of information that has metadata describing it. That requires a different set of structures, a different organization, a different… the people searching for it have different criteria than the people searching for pop music, right? Just the degree of complexity when dealing with classical music. You mentioned the Goldberg Variations and where my mind went was: there’s at least one example of a performer, performing that particular work. And I’m thinking of Glenn Gould, right? Where there are two recordings in different years and they are very different from each other.

Bob: Mm-hmm.

Jorge: There are some people who like one set of Goldberg Variations by Glenn Gould and people who like another set. So even… even at that level of the taxonomy, it’s like: a work by a composer, performed by an artist… even the artist can have multiple versions, right?

Bob: Absolutely. And you have to be able to take into account lots of things, like… so a compilation album of pop music is still pretty uncomplicated. Not all the tracks are by the same artist — well, okay, we can accommodate that in a metadata model. But if you have… and this is very, very, very common: let’s say you have a soprano making her debut album. She’s not going to record an opera. She’s going to record her favorite arias and songs from a bunch of periods and composers to show off her range and so forth. And so, you’re not indexing the entire opera, Don Giovanni. You’ve just got “Là ci darem la mano,” and she’s got a partner who’s singing the duet, who’s not the featured artist on the album.

And you have to be able to… but your point about the Goldberg Variations and Glenn Gould stands. And obviously — especially with warhorse pieces — the same artists will record different versions. People who are very into conductors and their versions of symphonies will go to the same level of depth: “I want the 1947 Wilhelm Furtwängler Beethoven Seventh and not the 1931,” or whatever. I made those up; I don’t know if those are real. But, yes! And so it’s very important that you accommodate that level of information search. So, on one level, Jorge, I want to be like, “well, music’s music! Can’t we have one thing that…” and, like, ehhh. It’s just that we have to meet the users at their mental model, right?

Jorge: Yeah. And that’s where I wanted to take the conversation next, maybe to start bringing it back home to people is: how does this inform our day-to-day work, right? Because I’m assuming that most people listening in are not designing systems for people to find music, right? But I think that there might be things that we can learn from these situations. How does this affect our day-to-day work? Like, what can we learn from this?

Matching user mental models

Bob: I mean, I think that’s a great question, and I think that it has to do with user research and meeting people at their mental models. We have a very… I’m trying to figure out how to say this. One size fits all is not always a good approach. And no one wants to build 17 different interfaces to search the content in, let’s just say, a content management system or a DAM or something. But maybe you need more than one, depending on who the user is and what they’re searching for. Or maybe we can build a single one that has different states that can accommodate them.

I mean, the basic principle that we all need to have hammered into us, usually from some kind of profound failure on our own part, at least in my case, is that you can’t assume… you can’t just build an interface that has all the buttons you need and say, “well, it has everything they need.“ That’s not understanding the user. It’s not meeting the humans halfway to the information. You know you can’t just… we have to do research to understand the users.

We have to understand what their mental models are and what their behaviors are. What actions do they take in the interface? What are they looking for? How do they pattern a search? And because it might not be the way that you envisioned it when you came up with it. And in my case, it almost certainly isn’t the way I envisioned it! And of course there are design best practices and patterns and other things like that.

But mostly what I think is that, yes! So, in the same way that if you were to sort of shoehorn it — as it was before this app came around, right? And I’ll bet you that there was a lot of complaining about how hard it was to find classical music on Apple Music before they launched this classical app. And so, there’s a disconnect there about… all kinds of metaphors are springing to mind, but they’re about trying to force an apple through your cheese grater. I don’t know! But like, there’s a disconnect there that really fails to understand how fans of classical music understand, appreciate, and search for it.

And my guess is that it came to critical mass. Or someone saw a marketing opportunity, or both! And that prompted this. Because I’m sure this was not a cheap thing to test, design, build, implement, and deploy.

Jorge: Well, and to be fair, I don’t think that it was the first of its kind. There’s another company called IDAGIO, which I’ve been aware of for a while that does streaming of classical music. I suspect for many of the same reasons, and I might have this wrong, but I believe that Apple’s app was an acquisition.

Bob: Oh, really? I didn’t

Jorge: Uh, but that might be wrong. Again, we’re doing our research in public!

Bob: On the fly!

Jorge: And having people write in and tell us we’re wrong. But then extrapolating from that, the other lesson that we might take from this it seems to me, might be that as you do the research and find out how people think about the domain, you might find yourself having to redefine the boundaries of the domain you thought you were working within. So you thought you were designing a music app, and in fact, you’re designing a classical music app, right? Because it turns out that music is too broad a construct. And if you try to design a thing that meets all models, you’re going to end up with a pretty complex interface that doesn’t do a good job for all of them all the time.

Designing complex search interfaces

Bob: Yes. And my suspicion is that it gets more weirdly and differently complex if you were to expand it to something like what is generically called “world music,” which unfortunately just means everything that isn’t Western art music or Western popular music. That music doesn’t fit our paradigms very well either. That’s my suspicion, but it brings me back to thinking about our more day-to-day work, which is that we assume that you can tag and surface content in something like a content management system, a corporate intranet or whatever… a webpage.

And much like we can badly flatten music out to be one kind of thing, we flatten content out to be one kind of thing. So does it not matter if I’m looking for a PDF or a PowerPoint presentation or a Word document? Now, there are obviously filters for content type, but I’m saying that maybe each content type calls for different kinds of metadata. I don’t know.

I mean, for a long time — as we were talking about when we were setting up for this talk — the conundrum was like, well, how do I index my video? How do I index my video? And the answer was like, well, you get a transcription. I mean, that’s how you index the video! That’s where the text is. And now transcription services have gotten more and more accurate, to where that’s…

There was a time when you would have had to go out and hire an expensive service that was an expert in this to transcribe all your video content and then deliver text files to you. And now you can plug it into an AI widget and it does a pretty decent job of doing transcription on the fly. So that part’s getting cheaper. But is there something else we don’t understand?

So if I’m searching for presentations, I want to know: what it’s about, who presented it, when, and in what context. Was it at a conference? What year of the conference was it? I mean, there’s all this other sort of data — metadata — that isn’t applicable to a Word document with a workflow documentation on it or something like that.

But because they’re all coming from the same system, we sort of have one single interface to index and retrieve them. And this is making me wonder, like, what we can do about that.

Jorge: Well to begin with, let’s acknowledge that we’re back to where our conversation today started, which is this notion that for people to be able to connect with the information that satisfies their needs or wants, they have to go through language. Language is the API, right? So in some ways…

Bob: Oh, I like that a lot. That’s good!

Jorge: And some information is encoded in language as part of its nature, you know? Like a document, you know, a Google Doc or something like that. That’s all language, right? Like, that’s all it contains. And then there’s some information that’s more like music, where it’s not easy to express what it is in language, and you have to wrap it in this metadata.

I think one of the powerful things about large language models is that they amplify… like if we buy into this notion that language is our API for this kind of stuff, large language models give us these tools to connect with information in different ways. In ways that we might not have been able to before because we just couldn’t express ourselves accurately, right? So they do the translation for us. It’s like a translation layer on the API somehow.

Bob: So, I mean, I think that digital asset managers struggle with this all the time, and this is what their training is about. Because if you have image content… so you’re a large, whatever, athletic wear corporation. So, you have pictures of athletes, you have pictures of athletes wearing your products, they’re doing things, they’re in places. So you have to have metadata about what’s… what “wear” is in the images? Who’s in the images? What product is in the image? Sometimes, what colors are in the image? Who else is in the image? What’s in the background of the image? And the really tricky bit is to sort of say, okay, what’s this image about? Because that seems very contextual! But obviously, you pay a lot of money to hire athletes and promote and sponsor them to get them in your images, and so you need to be able to find them very quickly.

And I’m just going to make something up. You know, I need a picture of LeBron James wearing this shoe, but not in a basketball court. He has to be outside with someone else or without someone else. And I don’t know, the dominant color has to be green or whatever. You know, you have all these facets and filters by which you need to find what is completely non-textual data. It has recognizable objects in it that you can pick out, but that’s a lot of categorization to have to do. But it’s extremely important that whoever needs to use that image in a marketing campaign, be able to locate it quickly and grab it. That’s why you spent all that money to generate it in the first place.

Jorge: And that’s a great example of what I’m talking about, right? Because the robots are getting increasingly good at recognizing that that might be Kobe Bryant in that image, right? And I don’t know that they can do it to the same degree with music. But I could imagine that it could be done, right? Where it picks up certain patterns, certain instruments, and makes you know… and starts categorizing that information and tagging it in various ways. Languaging it, so that we can then find our way to it.

Genre: an insufficient human construct

Bob: Right, so you would have to have a very, very large training set that you were feeding it as exemplars — exemplars that you were pretty sure were good? I mean, yes. I worry about our edge cases in a scenario like this, Jorge, because genre is very much a human construct that we overlay on music to sort of differentiate and categorize it. And a lot of music doesn’t fit neatly into genres, or is an edge case — or a border case, maybe, is better than edge case — where it crosses over or is ambiguous. But you could certainly get a long way towards it with some kind of really robust and excellently tagged training set. I wonder how many mistakes it would make.

I wonder how much overlap there is in sonic signatures between genres that we think of as unrelated. That would be very interesting. And then, if my memory serves — again, we’re doing research on the fly! — but if my memory serves, didn’t Guns N’ Roses put out an album of country covers back in the mid-nineties? I’m trying to remember. So anyway, there’s a lot of cross-genre stuff. Genre is very, very problematic. It’s very personal and it’s very subjective and it’s very sort of… what do I want to say? Subculture-dominated. Like, someone who’s not into metal might be able to name three or four genres of metal. People who are into metal? There are hundreds of subgenres and sub-subgenres, some of which are represented by just one album or one artist, right? This particular branch of their metal mental model.

Jorge: Yeah. Again, genre seems like a very contextually dependent construct, right? But maybe there are other criteria, other facets. And I’m thinking here of something like whether a piece of music comes across as energetic or calm.

Bob: Hmm.

Jorge: That’s something that might be tagable by a robot, right? Or whether it’s loud or soft? Anyway, there are other criteria, right? That are less contextually dependent that you could start working on. But again, stepping back from music, like, I think that there are broader implications here because I expect that the same sort of processes or mechanisms applied to other types of information, even textual information, right? With with things like large language models, you can start detecting again, things like tone maybe?

Or whether… you know, I’ve used Grammarly. And Grammarly tells you whether the thing that you are writing is friendly or professional or, you know, academic or whatever, right? And that’s an assessment that it’s making based on the information that it picks up in the corpus of the thing. Now, in that case, it’s easy because it’s language to begin with, right? So there’s no… there isn’t this translation between modalities of information. But I think that this is an important area for development, particularly for people who do the kind of work that you and I do for a living — namely, organizing and structuring and tagging information in various ways so that people can find it and use it.

LLMs for music?

Bob: The whole problem and opportunity of large language models is certainly fascinating and multifarious. It seems like every day there’s half a dozen new, interesting writings that come out on what it does, what it doesn’t do, what it should do, what it shouldn’t do. And we have to understand it as, you know, statistical — as stochastic. But there are certainly things that it can do.

So, I wonder about this thing about tone. Is it picking out specific words, or, like, multisyllabic words? Like, how is it detecting tone? Does it have a training set? And is it literally just using what we call a bag-of-words approach? You know… I’m not an expert text analyst, but basically you feed it a bunch of content and it’s tagged: “This has a casual tone. This has a business tone. This has a professional tone. This has an academic tone.” And it makes inferences based on those tags, based on the total glob of words contained in that document. You throw out stop words, like “the” and stuff like that. And so, is that how it works? Or is it doing something more nuanced, on a sentence-by-sentence basis?
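Neither host knows how Grammarly actually detects tone. Purely to illustrate the bag-of-words approach Bob describes, here is a tiny exemplar-based tone guesser; the labels, exemplar texts, and stop words are invented, and real tools are far more nuanced.

```python
# A toy bag-of-words tone guesser: tagged exemplar texts, stop words thrown
# out, and a new document scored by word overlap with each exemplar.
# Exemplars, labels, and stop words are invented; real tools are far subtler.
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "it", "we", "our"}

EXEMPLARS = {
    "casual": "hey thanks so much this looks great totally works for me",
    "academic": "the results suggest a statistically significant correlation between the variables",
    "business": "per our discussion please find attached the quarterly revenue forecast",
}

def bag(text: str) -> Counter:
    """Count the non-stop-words in a text."""
    return Counter(w for w in text.lower().split() if w not in STOP_WORDS)

def guess_tone(text: str) -> str:
    """Pick the exemplar tone whose bag of words overlaps most with the input."""
    doc = bag(text)
    scores = {tone: sum((bag(sample) & doc).values())
              for tone, sample in EXEMPLARS.items()}
    return max(scores, key=scores.get)

print(guess_tone("please find attached the revenue forecast we discussed"))  # business
```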

Jorge: I’m not sure, but I’m looking at it now. I’ve opened Grammarly on my computer. I’m looking at it now, and there’s a feature here… Well, first of all, it’s judging my writing based on four criteria: correctness, clarity, engagement, and delivery. And then I can set goals for it. And the goals cover domain, which can be academic, business, general, email, casual, etc. Intent, so it’s like, are you trying to tell a story? Are you trying to convince someone? Audience, and the audience or thinks like, am I trying to come across as knowledgeable? Or is the audience expected to be very knowledgeable? Are they expected to be experts? Or is it like a general audience.

And then formality, right? So whether it’s very informal or very formal. And I would expect that it’s a combination of all those things you were describing, right? But I don’t know; it’d be worth digging into, because it does seem relevant to this kind of work. I could imagine that… so, this is running an assessment on my writing. But it might be valuable to do the opposite: to have something like this run on a corpus of information and have it tag it based on these criteria. And then I could search for it, right? Like, I’m looking for… I don’t know. I’m looking for a fun, light story to read before going to bed or whatever. And…

You know, the fun, light story part of it is an assessment that the system has layered on top of data that it has. I don’t know; I’m speculating here. It does seem germane.

Bob: Again, we could probably talk for hours about large language models, but one of the things that sort of occurs to me is that things like Grammarly were already working on this same principle. What was different is the scale. Like Grammarly… the thing about the LLMs is that they have billions and billions of inputs, and we now have the processing power.

This is a Moore’s law thing, right? We now have the processing power to do ad hoc on the fly statistical analysis of a huge, huge data set to do predictive text. Whereas that same inferential backwards on a corpus technology has existed in auto classification systems and things like Grammarly and other things for a long time.

But the scale is… I mean, if you drew a picture of the scale, you wouldn’t even be able to see the little one next to the big one. I mean, they’re so disparate in what they’re able to do. And it’s interesting that… I don’t even want to go into it! It’s just so interesting, the different things that people are trying to get it to do. You know, write me a poem about this in the style of Carl Sandburg. Like, that’s not really that interesting. What’s interesting is that you can get it to write executable Python code. You can get it to build you a taxonomy and express it in SKOS that’s valid and loadable into a system.

You can… and then I think what we’re going to see, obviously, as we get to not the generalized model, but a model that someone can bring in-house, behind a firewall, and train with their own content… you’re going to see fewer hallucinations and more specific things that someone at an enterprise is going to train it to do. What it is not good at is writing prose. Like, that’s not… and, like, it’s not replacing writers anytime soon.

And you know, I’m sure — and I have been reading about — that academics are struggling with this assigning essays to their students. I think there’s a lot of tells that you could tell when something’s been ChatGPT’d, you know? I’m sure that’s a massive struggle, but it’s just been so interesting the past 3, 4, 5, 6 months watching every week, people scrambling to make sense of this! What to do with it, how to use it, how not to use it, what is it, what isn’t it? And like, there’s just… you can’t keep up with the amount of content that’s coming out.

Jorge: Absolutely. No, it’s fascinating, impactful, and we’re going to have to put a pin in it because, unfortunately, our time is running short here. Maybe that’s the prompt for a subsequent conversation. Listeners are going to be like, “What?! They’re going to do a third segment?” Maybe! But it’s been such a pleasure talking with you, Bob, about this, and I hope we can get to do it again. Where can folks follow up with you?

Closing

Bob: I’m fairly easy to find. I’m still on the Twitter platform @taxobob. That’s an easy place to find me and DM me. You can find Bob Kasenchak on LinkedIn. I’m the only one there! The only other person in the world with my name is my father and he is not on LinkedIn. So if you can spell it, you can reach out and locate me. And thank you again, Jorge. This is absolutely a pleasure and we’ll find a chance to do it again.

Jorge: Always fun. Thank you Bob.

Bob: You bet.