Podnews Extra

Additional, long-form and ad-hoc podcasting content from Podnews and the Podnews Weekly Review. Expect speeches, interviews, and other pieces of content.

Andrew Mason, the CEO of Descript

August 02, 2024 • Podnews LLC

Sam Sethi interviews Andrew Mason, the CEO of Descript, a popular audio and video editing platform. They discuss Descript's history, innovative features like the "Overdub" voice cloning tool, and future plans for the platform, including improvements to the editing workflow and the integration of AI-powered automation tools. The conversation provides an in-depth look at the evolution of podcast production tools and the challenge of balancing powerful editing capabilities with an authentic, human-sounding final product.

We're sponsored by Buzzsprout. Start Podcasting. Keep Podcasting.

0:12

Hello and welcome back to Podnews Weekly. I'm joined today by the CEO of Descript. His name's Andrew Mason. I'm a big fan, if I'm very honest. And I was just saying that I've been using it since 2019. Andrew, hello, how are you?

Hi Sam, it's great to meet you. Thank you.

Now, for those who live under a rock, very quickly, what is Descript?

So Descript is an audio and video editor, but instead of being a timeline editor, like you might be used to, it reinvents the interaction model to where it feels like you're just editing a doc. So everything is transcribed. And then when you select text and delete it, you're not just deleting it from the transcript, you're deleting the audio or video as well. So you can cut, copy, paste, you can even write and it will generate audio in your own voice. That's the basic idea. We started the company as an internal tool inside this other startup called Detour, which was an audio tour company, but really just like a glorified podcasting studio, and got a real appreciation for how tedious and time consuming it was to work with narrative audio. I actually started out, went to school for music technology, worked in a recording studio, had a lot of experience working with music, which is what most of these traditional tools are designed for. We thought it would be cool to reimagine what the software looks like, using the AI that was quickly developing, for an experience that's driven by narrative media. Yeah, we've been plugging away at it for quite a while now.

Yeah, and I came across Descript in 2019, 2020. I was blown away by the simplicity of being able to see the transcript, highlight what I wanted to edit and then, boom, publish it. And straight away I could use it much quicker than anything else that was out there. You'd acquired, though, a very interesting company around that sort of time out of Canada called Lyrebird, which was the precursor to what you now call Overdub. Tell me more. Why did you go down that road?
What was the thinking behind it?

Yeah, so like I said, the vision was to reimagine the audio and video editor as a doc, and we'd figured out how you could delete text. We'd figured out how you could cut, copy and paste. But the big thing we were missing was writing, and we really wanted to figure that out. And Lyrebird came along with state-of-the-art AI voice cloning technology, text to speech, and it was just this hand-in-glove fit for what we were looking for, where it felt like they were these really smart AI researchers. We were pretty good on the general product side, and they fit right in and have since been responsible for building a number of other incredible AI technologies that only Descript has. We have a kind of noise reduction, but more than that, audio enhancement effect called Studio Sound. We are in the process of launching a video version of Overdub and a bunch of other effects like eye contact and green screen and stuff like that that they've built.

Yeah. I mean, you have been pushing the boundaries quite hard. You started out there getting your seed round from Andreessen Horowitz. Marc being my old boss at Netscape, how did you approach Andreessen Horowitz? Because again, clearly within the podcasting space, this was a totally new idea, totally new platform. Did Marc see it straight away and go, "Yep, there you go, there's the money"? Or did you have to persuade them?

Yeah, they saw it right away and they deserve a lot of credit for that. First, I should say I knew Marc and Alex, our board member, from a prior life. I founded Groupon, and Marc was an investor in Groupon, so we had that relationship. But when I came in, I pitched the firm, and while I was outside waiting for my taxi, they called and offered to take the round. And I think Andreessen Horowitz was pretty early, for their industry, to podcasting and seeing the impact that it can have. So they've been believers in it, I should say, from the beginning.
Our vision from that initial seed deck, our vision has not just been to be a podcasting tool, but to be a tool for narrative media, both audio and video. So that was the size of the vision. But I think what they deserve a lot of credit for is seeing how that could come together and the potential in this space. I think it had been hard at that time; you weren't seeing a lot of podcasting tools or companies getting venture funding. And I think with a mix of having some kind of relationship there along with vision on their side, we were able to get something done.

And they followed up with a Series A round of 15 million, and obviously that allowed you to grow quite well. You then got a massive injection from OpenAI, $50 million as has been reported, and you switched out Rev recently for ChatGPT, or the OpenAI platform, for transcription. So I assume that's what's behind the engine now?

Partially.

Partially. Okay. Partially means what and how?

Got it. I love talking about this level of things in areas like this, I can just say. So at this point we use several different automatic transcription engines and we have a pretty robust back end for testing different versions of them, whether it's Rev or a number of other providers. There's a bunch of them that are competing for higher accuracy rates. There are some that are better at some things than others. Like when we're recording, we're doing live transcription and we use a different engine for that. So we still use Rev somewhat, we use Whisper somewhat, and by the time we go to air we very well might be using something different than we're using today.

The accuracy is the thing that worries most people about transcriptions. I mean, Apple have come out with their own standard called the human evaluation word error metric, or HEWER, because they wanted to get to, or at least close to, 100% accuracy. I mean, they've allowed other people to include transcriptions.
But if they fail their level of qualification, well, they don't do it. Do you see transcriptions from what Descript does improving quickly? How far before we get closer to that 100% barrier?

Yeah, it's something we look at and we've had to come up with our own evaluation criteria. A few years ago we got to the point where actually the metric... I'm kind of like the dated, disconnected CEO of a 150-person company now, so I'm clearly not in the details. But back in my day, when I was paying more attention to this stuff, the metric that everybody used was word error rate. And it's mostly looking at inaccuracy of transcribed words, but a couple of years ago, the bigger problem became punctuation and capitalization errors, even more than getting proper nouns wrong. And sometimes those traditional metrics don't capture that. And it's tricky because some punctuation is subjective and some is just objectively wrong. So a lot of our evaluation criteria was trying to capture that, and we've moved to just using behavioral analytics in the app when we're testing different models to determine what's better or worse than that. Even using metrics like that that we use to try to rate, if that makes sense.

Now, fast forwarding towards today, again, using more AI like a lot of platforms are, you've come up with Underlord. Tell me a little bit more about Underlord.

Yeah, so Underlord is what we have branded our AI assistant editor, and it's a play on the concept of the AI overlord that nobody wants. But we thought, would it do any harm to have a nice AI underlord that'll do your bidding, at least for now? Exactly. It's kind of a joke but kind of serious in a bunch of ways. And one of the things we wanted to get across is the importance of having the creator in the driver's seat of this stuff. So there's a lot of really magical new generative AI tools that are kind of like, great, as long as you want what they give you.
And I think what we wanted to do was implement it in a way where it was always secondary to the creator. We're still focused on enabling the craft of the creator, because what AI is going to do is reduce the marginal cost for creating this kind of 80th percentile level of content. And it's still going to be, I think, for quite some time, the human being's ability to edit and craft and go in with some precision to get stuff done. So what Underlord does is a lot of the tedious jobs. If you do an unscripted recording of something, it'll go through and remove filler words or rambling digressions. If you do a scripted recording, it'll remove multiple takes that you did. So we're close to launching, on the video side, some Underlord powers that will automatically do what we call window dressing your video. It'll storyboard it out and suggest scenes that go through. But then it's all just doing this in the context of the editor, where you can go in and change whatever you want. So we're just quite simply asking ourselves, for podcasters and content creators, what are the most tedious parts of the job? Let's figure out how to automate those and free up the creator to focus on the fun stuff.

When I first got hold of Descript, I was a kid in a candy shop. I was tweaking every button, I was playing with everything, and I was deleting every pause and every gap and every other word. And obviously that's not what you want to do. Experience tells me now that actually a couple of ums and ahs is okay, a couple of gaps as well, right? Because that's the natural human inclination sometimes. But how do you prevent the beginner in Descript from tweaking everything? Studio Sound, right, "I'm going to go 80%" and not know what they're doing, or "I can just delete everything" and just make it stale. How does Underlord help those users when they're in the learning curve of editing?

That's such a great question.
I think the answer is I don't think we're all the way there yet. Let's take the example that you used, the way we designed Descript. So when we were building Detour, half the company were like nerds building the app, and then the other half of the company were former public radio audio producers, and we were all working arm in arm and very much designed the tool to accommodate the workflow of these audio producers, which is, I think, why a lot of our early adoption was from Radiolab or The Daily or these kind of really cool, well-produced audio podcasts and shows. And one of the things we learned from them is that those producers just learned very early in their career that you don't make paper edits, meaning you don't print out a transcript or load it into Word and move stuff around and then hand it off to an audio engineer to cut it up. You have to edit in Pro Tools or whatever so that you can hear how stuff sounds, because the paper edits are going to sound terrible. I think there was reluctance from those producers at the idea of something like Descript, and what you learn when you use it is the audio is right there at your fingertips. So it's not quite a paper edit, right? So you can hear stuff immediately. But I will definitely sometimes hear, you'll listen to a podcast or something where someone has removed all the filler words or something and it just doesn't sound quite right. If you're a novice, you might not notice that; it takes a little bit more of a well-trained ear to pick up on some of the craft there. So I think the answer is we don't handle that super well right now, and I would love to do that.

Okay. Let's just go back probably six months. It was quite an interesting acquisition: SquadCast. Why did you go into SquadCast? SquadCast, really, that wasn't about editing. That was not about creating the production of the podcast itself. What was your thinking?

You know, since very early on, we've wondered what to do about remote recording.
It's a critical part of the workflow for most podcasters. That was true before the pandemic and it just skyrocketed during the pandemic. But we were hesitant to take on the task of building it ourselves because we had enough humility to realize how you can't really do something like that as a side project. You just can't. They can't go down, right? Like, yeah, reliability is paramount. Yeah. And we had so much to do with just trying to get the core Descript editor really strong. At the same time, we knew it was important to own that part of the offering, to have an offering for our podcasters; they asked for it all the time. It just felt like it would be incredibly cool to have an experience where it was really deeply integrated, and we tried a bunch of things. We tried like, "Oh, you can just record computer audio and hook up your Zoom in a certain way." I'm trying to remember everything we tried, but we tried a couple of half-baked ways to allow people to get a decent experience and nothing quite felt right. And we got to know the SquadCast guys over the years, really came to like and respect them as human beings and as entrepreneurs, and they're local here. And at some point it just felt right to team up and see if we could do this thing together. So we did that acquisition and we've been hard at work building a new and improved integrated experience that we're going to be shipping next month.

Brilliant, because that was one of the big questions in the last live video that you did when you launched Season 6. Everyone was asking, "When is the integration coming? When is the integration coming?" So good to hear. Now, moving forward, one of the things that most people are worried about today is companies using their transcripts for training or selling their transcripts, right? AI training. Adobe famously was changing their terms of service. How are you going about protecting users' transcripts from any of that, or are you using that internally? Where are you with that?
We definitely don't sell that information. I have no interest in doing that. People should either opt in or opt out of allowing us to use their materials for training. So it's something that we put really prominently in the onboarding flow, and you just make a decision about whether you opt in or out.

Okay. And one of the things you've done recently is you've launched Descript for the web. It's now come out of beta officially and you've also got a new version of the desktop app. I'm sort of slightly confused as a user: should I be web-based? Should I be app-based? Are you just saying we have both and choose which one you want? Or do you have a preferred route forward?

Yeah, it's confusing and it gets kind of technical. So really what we did was we ended up rebuilding the underlying media playback engine that powers Descript in a way that can run in a web browser. Before, it had to run in these desktop applications. There were parts of Descript that were always web-based, but the actual media playback engine doing a lot of the heavy lifting you had to have as a desktop app. And that sounds scary, like, oh no, we're going to lose all that wonderful performance. But it actually is more performant. It makes it so that you can stream your projects with instant access and you don't need to have these massive multi-gigabyte downloads on your desktop. It has other benefits, like we can generate effects in real time, like eye contact or Studio Sound. You can just hit the button and they're on, like other real-time effects you might have used. We can do that now with these generative effects. And so we built this really powerful new media engine, and that is powering the web version of Descript. It's not in the current desktop. There's like a 1.0 version of the desktop; that is what you'll get if you just go to the website. We have a preview build of the future desktop app that is powered by this web media engine.
So you can still have a standalone app that lives in your dock and all of those things, but it's missing a few features that we need to finish first, like dealing with ProRes video, and there's a handful of other things that we're chipping away at that most people don't need, frankly. So I think if you check it out, it'll tell you what's missing, and if you don't care about any of that stuff, then I'd recommend using that. That's what we're going to be migrating to in the coming months.

Now, you've also got in beta the translation engine, which again is interesting because you're taking the raw audio and the transcript in, let's say, UK English or American English, and offering it out in multiple languages. What's the feature that you think most editors and creators are going to use it for?

Just reaching a broader audience of people in many countries. So there's two versions of it. The one that we have shipped now will obviously translate captions, and if you have burned-in captions on your video, it'll translate those. And then the one that we're getting ready to ship is actually translating your audio. So it will either do that with a voice clone of you, where it recreates you speaking in French or Italian or Spanish or whatever, or you can just use one of many different stock voices. So the demand here has been a little bit more from video creators, I think, than purely audio creators, though that may exist too. It's something that businesses really like, that want the ability to create a video and have it work in a dozen different languages. So we're pretty excited about that too.

You also do the lip syncing? I mean, clearly if we were just doing pure audio and then we take this into a French or German version, no problem. It's just what people hear. But instantly, if you change the language to French or German, my lip syncing will change. Do you have that capability to make the lips move in the same way?

We do. We haven't shipped it.
We have two versions of it. We have what we call video regenerate, which is like Overdub, but for video, and it'll fix the lips. And that's the one that we're rolling out kind of imminently in beta, where we just get signal on what's working and what's not working. It's a tricky problem to get right because you're splicing this thing right in the middle of an existing video and it has to line up on both sides. You have to deal with changes in duration of the original and final words that you're using. And then the other one is just doing a straight translation of a video to another language with lip sync. And that's a little bit more straightforward. We have it working. We haven't shipped it yet. It's just been a little bit lower priority to get all the i's dotted and t's crossed, so to speak. But I don't know, did you want that? Do you think that would be useful? We're still trying to understand who's going to use that.

James and I don't use it particularly because we're an audio-only podcast, but there are companies out there like Wondercraft that are doing stuff similar to this where they've got audio translation. I think with your ability where you had the idea of moving the person's eyes away from the script back to the center, I think again, if I was looking at a video and the lip sync was out, I'd probably, after a little while, get a little bit frustrated with watching it, and you'd probably turn the video element off and just go back to audio. I think it's just tidying up the loose end. But as you say, yeah, maybe not the highest priority you have today. But again, it would be lovely to complete the whole.

It's so weird to understand what just the tastes and cultural norms are going to be around all this stuff, and whether that lip sync thing is going to just feel bizarre and inauthentic to people, or the voice. It's all out there, discovering the boundaries of what feels normal versus creepy in this brave new world. It's hard to predict.

It is.
Do you believe that we should label content that's been generated as such, as AI? How do you think we as humans should start to, I suppose, understand the differences, because it's very easy to fool people with AI-generated content. So should Descript... let's just keep it within the four walls of Descript. Should you label content that's been changed through Overdub?

I think there are some examples and use cases here where the answer is a little bit more black and white. And that would be like if you're making an AI avatar with a completely AI-generated voice, then sure, you could definitely make the case that that kind of thing should be labeled. So I remember being surprised the first time I saw an NPR producer that was making a kind of prestige, scripted podcast with a lot of interview footage, to see just how insanely cut up the audio is. It's a lot, right? If you went in and you looked there, there's often clips every 5 seconds or something as they're tightening stuff up. So this stuff is deeply manipulated. Now, it's okay because you trust the credibility of the creator, and we all have a kind of understanding that that person's reputation will be in tatters if they do anything to mislead people. But they easily could, right?

Correct.

So the way that we've been using AI, this Overdub feature is in spirit allowing you to do something similar. You can only clone your own voice. We have protections against that, unlike a lot of the other AI speech companies that have come around, and you're typically doing it to save yourself some time. Instead of needing to go back into the vocal booth and rerecord something, which you could do because you have possession of your voice, you're able to type a character in there. So, does that need to be labeled when you start looking at something like that?
It's just an example of a different flavor of an editing technique that feels spiritually very similar to something that people have been doing for a very long time.

Yeah, I think it's more in the case where you write a transcript and then you have a third-party AI voice actually speak the whole thing, and then it's not labeled in any way that that was an AI-generated voice of sorts, because you do have studio voices as well.

Yeah, I'd be supportive if that's the direction things go, that that stuff needs to be labeled. That feels fine to me.

Now, one of the things, Andrew, last question. One of the things I look at is, because I've got a very simple mind, I put things into what I call the five Ps bucket: pre-production, production, post-production, promotion and prioritization. Right? So you've done very well with pre-production. I can create my scripts, I can edit everything in readiness to do the interview. I can now, with SquadCast, record everything. Post-production is the core of what Descript does. You've recently got into a very strong area of creating clips out to YouTube and other platforms. One of the areas that I, as a user who has been using the platform for a while, find frustrating, I guess, is the publish element. I can push it out, but it's a Descript URL, it's not owned by me, it's not branded by me. I don't have any real say in how that page looks. Is that an area, the last part maybe, that you'll start to look at? Because I would love to be able to take the interview from here, the transcript, be able to publish that, have it as a URL, but a branded URL where I can integrate that into my business flow. But I don't have that capability today, so I tend to ignore that whole part of what Descript offers totally. Is there anything that you might be doing there?

I see. So what's your dream product there? So it would be something that would be on your own domain, with a page that would be a little bit more branded.
What would you do with a link? Like, isn't most of your distribution happening via podcasting apps and RSS?

Yes, I mean, look, there is the whole part of me exporting the completed audio and having that as my RSS, right? That's one side of it. And you offer that and you offer integration to hosts. There is another side, which is if I was a standalone podcaster who wanted to then put my podcast up and have all my episodes and basically a lovely video or audio with a clip out, and then the transcript being highlighted as somebody is listening to it. But at the moment that's a Descript page, so the URL is always going to be something, descript slash something, and it means that I can't include that within my brand at all. The second part of what you said was, could we embed that whole element within my own web page? So could I put it into a WordPress site? And again, that's just an area of me wanting to be able to reuse the content I've edited in another format.

Yeah. No, that's cool. I would love to build that for you. The problem is we just can't seem to clear our plates when it comes to just making the editor feel good. It's just like a bottomless pit of stuff to do. I'll give you one example: the way you have to edit multitrack voice content in Descript right now is inside these sequences, these kind of nested timelines, and they can be really tedious to work with. They're incredibly confusing to new customers. They're just slow if you're going in and out of there a lot, even if you know all the keyboard shortcuts, which can be helpful, and they put a ceiling on some kinds of production techniques that you can do when you're making stuff. I've been wanting to fix that for, I don't know, four years, three years or something like that. If you told me four years ago that we still wouldn't have figured out how to offer a version of multitrack script that didn't exist inside the sequence,
I think it'd be like telling me when I had a six-month-old that it's not going to get any better until they're like six years old. You're not actually going to be... never going to sleep again. But yeah. Yeah. No one tells you that. You always think it's going to be, "Oh yeah, it'll be so great when they're a year and a half or two." So we just have a lot to do there. And so I think one day we will be able to make that part of the stack stronger and it'll be a glorious day.

Good. I look forward to that. So last thing then. What's next for Descript? I mean, you've talked about translation, you've talked about the desktop, you've talked about Underlord, you've talked about AI. But is there a vision that is within your head that we don't all know about? Is there anything that we can look forward to coming down the track next?

Gosh, I mean, I feel like you can probably guess, just based on what we're building, what we want to do. There's nothing that crazy. Okay, so I'm going to be at Podcast Movement. Anyone who's a listener that wants to, and especially if you're a user of Descript, but also if you're not, come by the booth and say hi. I'd love to learn about how you use the product. Anyway, by then we'll have launched some new stuff. We'll have launched this new integrated remote recording experience, some really cool video podcasting stuff around making the layout experience much more powerful. I think we kind of have two main areas of investment going forward, and one is just continuing to make the tool simpler and more powerful at the same time. But in short, allowing creators to practice their craft with a really great precision editing tool, and then giving Underlord more powers and making it better at taking all the tedious stuff off your plate so that you can get your stuff done faster. And you'll just see us plugging away on those two vectors.

Andrew Mason, thank you so much. Thank you for Descript.
It saved me hours of my life, so thank you for that. Congratulations on the recent updates. Andrew, if anyone wants to know more about Descript, where's the best place to go?

Oh, descript.com. You can email me too. I'm andrew@descript.com.

Andrew, thank you.

Thank you, Sam.