Announcement: AI Narrations Available for All New LessWrong Posts

post by Solenoid_Entity, Ruby, Raemon, PeterH, TYPE III AUDIO · 2023-07-20T22:17:33.454Z · LW · GW · 28 comments

Contents

  How to Access
    On Post Pages
    Podcast Feeds
  Send us your feedback.
  Is this just text-to-speech on posts?

TYPE III AUDIO is running an experiment with the LessWrong team: for the next few weeks, every new LessWrong post will be available as an automatic AI narration.

You might have noticed the same feature recently on the EA Forum [EA · GW], where it is now an ongoing feature. Users there have provided excellent feedback and suggestions so far, and your feedback on this pilot will allow further improvements.

How to Access

On Post Pages

Click the speaker icon to listen to the AI narration. The icon is located beneath the title and author, next to the post's publication date.

Podcast Feeds

Perrin Walker (AKA Solenoid Entity) of TYPE III AUDIO will continue narrating most curated posts for now.

Send us your feedback.

Please send us your feedback! This is an experiment, and the software is improved and updated daily based on user feedback.

You could share what you find most useful; what's annoying, buggy, or difficult to understand; how this compares to human narration; and what additional features you'd like to see.

Is this just text-to-speech on posts?

It's an improvement on that.

We spoke with the Nonlinear Library team about their listeners' most-requested upgrades, and we hope our AI narrations will be clearer and more engaging than unimproved TTS, with specific improvements to how headings, bullet-point lists, specialist terminology, acronyms and idioms are handled.

We'd like to thank Kat Woods and the team at Nonlinear Library for their work, and for giving us helpful advice on this project.

28 comments

Comments sorted by top scores.

comment by Yoav Ravid · 2023-07-21T10:54:08.358Z · LW(p) · GW(p)

Awesome! My dyslexic friend may finally get to listen to my writing :)

A few suggestions for improvement:

  1. Auto-start the narration when the speaker icon is clicked.
  2. Let the user set a default speed in the user settings, or alternatively, remember the speed the user last used and apply it the next time they listen to a post.
  3. Add the speaker button to post previews in recent discussion.
  4. Have a hovering mini-player on the side so you can easily pause, play, rewind, fast-forward, and change the speed from anywhere on the page.
  5. Have a visual indicator on the audio timeline showing where section headings are, so you can jump to them like you can with the table of contents.
Replies from: PeterH
comment by PeterH · 2023-07-21T11:38:43.130Z · LW(p) · GW(p)

Thanks! We do have feature (2)—we remember whatever playback speed you last set. If you're not seeing this, please let me know what browser you're using.

Replies from: Yoav Ravid
comment by Yoav Ravid · 2023-07-21T11:47:23.792Z · LW(p) · GW(p)

Oh, great! I didn't check if it exists before writing it down (whoops), so it probably works :)

comment by MondSemmel · 2023-07-27T09:55:30.987Z · LW(p) · GW(p)

Feedback on this specific audio narration, and the feature in general: (I've also submitted this via the Feedback button.)

  • At 0:44, there's a line "That's the end of that list", which is not in the written text. Maybe there's some logic here which assumes that a colon is followed by a bunch of bullet points? In this case, there was no list (zero bullet points), and so the line "That's the end of that list" makes no sense. And besides, there never was a corresponding preceding line à la "Here's a list of bullet points".
  • At 0:52, there's a narration line "Here's a list of bullet points. Podcast feeds.". Here, there is a list of bullet points, but the narration line is inserted before the subheading of "Podcast feeds", rather than where it's supposed to be in the audio, namely afterwards.
  • At 1:19, the narration says "will continue narrating selected curated posts for now.", but the text says "will continue narrating most curated posts for now.". Presumably the text has been edited after the audio was generated. If the audio can get out of sync with the text, that's a conundrum that has to be solved somehow. Generating new audio for every edit is presumably prohibitively expensive. Although this would by no means be sufficient, the audio must at least indicate somehow that it's out of date with the post. But then we're still left with the problem where you might listen to the audio of an essay which has since been edited to say "I've changed my mind; everything I said here is wrong".
Replies from: Solenoid_Entity
comment by Solenoid_Entity · 2023-07-28T01:57:48.182Z · LW(p) · GW(p)

Thanks for the feedback!

Keeping the audio in sync with updates to the text is relatively easy to fix, and that feature is in the pipeline (though for now, user reports are helpful for this).

There's some fairly complex logic we use for lists — trying to prevent having too many repetitive audio notes, but also keeping those notes when they're helpful. We're still experimenting with it, so thanks for pointing out those formatting issues!

comment by Henry Prowbell · 2023-07-21T10:55:55.820Z · LW(p) · GW(p)

I really like the way it handles headlines and bullet point lists!

In an ideal world I'd like the voice to sound less robotic. Something like https://elevenlabs.io/ or https://www.descript.com/overdub.  How much I enjoy listening to text-to-speech content depends a lot on how grating I find the voice after long periods of listening.

Replies from: PeterH
comment by PeterH · 2023-07-21T11:43:22.871Z · LW(p) · GW(p)

Thanks! We're currently using Azure TTS. Our plan is to review every couple of months and switch to better voices as they become available on Azure or elsewhere. ElevenLabs is a good candidate, but unfortunately it's ~10x more expensive per hour of narration than Azure ($10 vs $1).

Replies from: Yoav Ravid
comment by Yoav Ravid · 2023-07-21T11:55:03.179Z · LW(p) · GW(p)

I think the cost per million words measure from the previous version of your comment was also useful to know. Did you replace it because it's incorrect?

Replies from: PeterH
comment by PeterH · 2023-07-21T12:13:14.686Z · LW(p) · GW(p)

I replaced it because it seemed like a less useful format.

  • Azure TTS cost per million characters = $16
  • ElevenLabs TTS cost per million characters = $180

1 million characters is roughly 200,000 words.

One hour of audio is roughly 9000 words.
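As a rough sanity check (assuming about 5 characters per word, which is only an approximation): one hour of audio ≈ 9,000 words ≈ 45,000 characters, so Azure works out to roughly 45,000 × $16 / 1,000,000 ≈ $0.70 per hour, and ElevenLabs to roughly 45,000 × $180 / 1,000,000 ≈ $8 per hour, in line with the ~$1 vs ~$10 per hour figures above.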

comment by Chipmonk · 2023-07-22T01:34:18.937Z · LW(p) · GW(p)

Does the narration re-do when posts get edited?

Replies from: Solenoid_Entity
comment by Solenoid_Entity · 2023-07-23T02:40:43.924Z · LW(p) · GW(p)

Currently we can trigger this if someone requests it, and we have a feature in the pipeline to detect significant changes automatically and re-narrate. 

comment by [deleted] · 2023-07-21T10:31:59.658Z · LW(p) · GW(p)

This is really great, ty c:
Will it eventually be expanded to earlier posts? 

Replies from: PeterH
comment by PeterH · 2023-07-21T11:37:35.500Z · LW(p) · GW(p)

Yep, if the pilot goes well then I imagine we'll do all the >100 karma posts, or something like that.

We'll add narrations for all >100 karma posts on the EA Forum later this month.

Replies from: Yoav Ravid, Yoav Ravid
comment by Yoav Ravid · 2023-07-21T11:51:46.316Z · LW(p) · GW(p)

How much would it cost to narrate all the posts on LessWrong? Or above various karma cutoffs? There are a lot of good posts under 100 karma (including many from the sequences), so I wonder what the tradeoff is.

Replies from: Solenoid_Entity
comment by Solenoid_Entity · 2023-07-23T02:47:41.363Z · LW(p) · GW(p)

It's unlikely we'll ever actually GENERATE narrations for every post on LessWrong (distribution of listening time would be extremely long-tailed), but it's plausible if the service continues that we'll be able to enable the player on all LW posts above a certain Karma threshold, as well as certain important sequences.
If you have specific sequences or posts in mind, feel free to send them to us to be added to our list!

comment by Yoav Ravid · 2023-09-07T12:29:08.353Z · LW(p) · GW(p)

Perhaps instead of, or in addition to, a karma cutoff, it could be request-based? You'd have the icon on all posts, and if someone clicks it on an old post that doesn't yet have a narration, it would ask whether they want it narrated.

comment by MondSemmel · 2023-07-28T13:57:49.573Z · LW(p) · GW(p)

I forgot to say this in my previous comment, but nowadays I prefer to listen to nonfiction articles (via TTS) rather than reading them. So I listen to a ton of TTS stuff and thus very much appreciate any work that makes the experience of listening to TTS easier or higher quality.

comment by Misaligned-Semi-intelligence (MisalignedIntelligence) · 2023-07-21T16:39:08.120Z · LW(p) · GW(p)

This is really great. As someone with pretty bad uncorrectable and constantly declining vision, a lot of my "reading" is listening. Lately I've often been thinking "Why can't I easily listen to everything I find on the internet yet?". When I tried to just use an existing service to convert things myself, I ran into a lot of the problems that the improvements listed here seem to solve.

Replies from: MondSemmel, Solenoid_Entity
comment by MondSemmel · 2023-07-27T09:53:56.200Z · LW(p) · GW(p)

I've also looked into TTS recently, and discovered that the Microsoft Edge browser has decent TTS built into both the web and mobile browsers. It's not perfect by any means, but I found it surprisingly good, especially for a free feature. I guess it's not surprising that Microsoft's offering here is good, given that tons of other TTS services use Microsoft Azure's TTS.

comment by Solenoid_Entity · 2023-07-23T02:42:56.616Z · LW(p) · GW(p)

This is great to hear, and please feel free to contact us with any other features or improvements you'd find helpful :)

comment by Yoav Ravid · 2023-07-21T14:26:05.491Z · LW(p) · GW(p)

It seems to act funny when there's a code block in the post. See GPT-2's positional embedding matrix is a helix [LW · GW] for example

Replies from: PeterH
comment by PeterH · 2023-07-21T17:46:32.702Z · LW(p) · GW(p)

Thanks for the heads up. Each of those code blocks is being treated separately, so the placeholder is repeated several times. We'll release a fix for this next week.

Usually the text inside code blocks isn't suitable for narration, but this is a case where ideally we would narrate them. We'll have a think about ways to detect this.

comment by Askwho · 2023-07-30T09:41:32.085Z · LW(p) · GW(p)

If you're looking for support producing more naturalistic versions of the posts that make it to the podcast feed, I've been experimenting heavily, producing an ElevenLabs AI podcast of Yudkowsky's latest fiction. I have an API pipeline already built and could assist in producing a test episode or two.

comment by Waldvogel · 2023-07-22T18:37:36.996Z · LW(p) · GW(p)

Great new feature. Thank you! I will probably make use of this over the next few weeks.

But I did get a laugh out of "Specialist terminology, acronyms and idioms are handled gracefully" immediately being followed by a mispronunciation of "latex."

Replies from: Solenoid_Entity
comment by Solenoid_Entity · 2023-07-23T02:41:57.293Z · LW(p) · GW(p)

Ha, oops! Yeah, there's a lot of specialist terminology; feedback like this is really helpful, as we're often able to fix these quickly.

comment by PotteryBarn · 2023-07-21T16:51:39.593Z · LW(p) · GW(p)

Maybe somewhat unrelated, but does anyone know if there's been an effort to narrate HP:MoR using AI? I have several friends that I think could really stand to enjoy it, but who can't get past the current audiobook narration. I mostly agree with them, although it's better on 1.5x.

Replies from: Raphaël
comment by Raphaël · 2023-07-22T04:13:06.167Z · LW(p) · GW(p)

HPMOR is ~4.4 million characters, which would cost around $800–$1,000 to narrate with ElevenLabs, being conservative.
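(Quick check, using the ~$180 per million characters figure quoted above: 4.4 million characters × $180 per million ≈ $790, so $800–$1,000 leaves some margin for corrections and retries.)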

Replies from: Solenoid_Entity
comment by Solenoid_Entity · 2023-07-25T05:35:04.385Z · LW(p) · GW(p)

You'd probably want to factor in some time for making basic corrections to pronunciation, too.
ElevenLabs is pretty awesome but in my experience can be a little unpredictable with specialist terminology, of which HPMOR has... a lot.
It wouldn't be crazy to do an ElevenLabs version of it with multiple voices etc., but you're looking at significant human time to get that all right.

Replies from: Askwho
comment by Askwho · 2023-07-30T09:24:09.586Z · LW(p) · GW(p)