Web Audio Echo?

post by jefftk (jkaufman) · 2020-04-19T02:30:02.144Z · LW · GW · 7 comments

Update 2020-04-19: This actually can be done fully in AudioWorklet: echo-demo-v2. Thanks stellartux!

The hello world of audio processing is an echo:

def processAudio(samples):
  return samples

You take a buffer of samples and then return them unmodified, echoing the audio back out. For example, when I built my bass whistle, that's where I started.

Today David and I were trying to see if we could build my bucket-brigade singing idea in the browser, and we couldn't find a fully worked echo example. I now have something that works, but it's kind of silly and it seems like there must be a better way.

The way I would expect this to work with the (new, experimental, not supported everywhere yet) Web Audio API is that you write an AudioWorklet. This is a way to set up JS to run off the main thread, where number crunching won't block the UI and make the page unresponsive. I'd like to attach the worklet's input to the microphone, output to the speaker, and that would be it. Except, as far as I can tell, there's no way to pipe the microphone directly to an AudioWorklet.

To read audio samples from the user, the recommendation is to make a ScriptProcessorNode and give it an audioprocess callback. This runs on the main thread, though, which is why MDN marks it as deprecated. It says to use AudioWorklet instead, and I wish I could!
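For reference, here is a minimal sketch of the ScriptProcessorNode approach as an echo. The helper name `startScriptProcessorEcho`, the 4096-sample buffer size, and the mono channel counts are assumptions for illustration, not taken from the demo.

```javascript
// Echo handler for a ScriptProcessorNode's "audioprocess" event:
// copy every input channel straight into the matching output channel.
function echoOnAudioProcess(event) {
  const input = event.inputBuffer;
  const output = event.outputBuffer;
  for (let ch = 0; ch < output.numberOfChannels; ch++) {
    output.getChannelData(ch).set(input.getChannelData(ch));
  }
}

// Browser-only wiring (hypothetical helper name). Call it from a user
// gesture such as a button click so the AudioContext is allowed to start.
async function startScriptProcessorEcho() {
  const audioCtx = new AudioContext();
  const micStream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const micNode = audioCtx.createMediaStreamSource(micStream);
  // Arguments: buffer size, input channel count, output channel count.
  const processor = audioCtx.createScriptProcessor(4096, 1, 1);
  processor.onaudioprocess = echoOnAudioProcess;
  micNode.connect(processor);
  processor.connect(audioCtx.destination);
  return audioCtx;
}
```

The callback runs on the main thread, so anything slow elsewhere on the page can cause audible glitches.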

What I ended up doing was recording the audio on the main thread, messaging that over to the worklet, and then having the worklet play it. Here's a demo, which relies on experimental Web Audio features that so far are only implemented in Blink-based browsers: echo-demo.
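A rough sketch of the worklet side of that workaround, under my own assumptions rather than copied from the demo: the main thread posts recorded `Float32Array` chunks over the node's message port, and the worklet queues them and plays them back. `SampleQueue`, `QueuePlayer`, and the `'queue-player'` name are hypothetical.

```javascript
// FIFO of sample chunks: the main thread pushes recorded chunks,
// and the worklet pulls exactly as many samples as each render needs.
class SampleQueue {
  constructor() {
    this.chunks = []; // pending Float32Array chunks
    this.offset = 0;  // read position inside chunks[0]
  }

  push(chunk) {
    this.chunks.push(chunk);
  }

  // Fill `out` from the queue; pad with silence if the queue runs dry.
  // Returns the number of real samples written.
  readInto(out) {
    let written = 0;
    while (written < out.length && this.chunks.length > 0) {
      const head = this.chunks[0];
      const n = Math.min(out.length - written, head.length - this.offset);
      out.set(head.subarray(this.offset, this.offset + n), written);
      written += n;
      this.offset += n;
      if (this.offset === head.length) {
        this.chunks.shift();
        this.offset = 0;
      }
    }
    out.fill(0, written);
    return written;
  }
}

// Worklet side; only defined where the AudioWorklet globals exist.
if (typeof AudioWorkletProcessor !== 'undefined') {
  class QueuePlayer extends AudioWorkletProcessor {
    constructor() {
      super();
      this.queue = new SampleQueue();
      // The main thread sends chunks with node.port.postMessage(chunk).
      this.port.onmessage = (e) => this.queue.push(e.data);
    }

    process(inputs, outputs) {
      this.queue.readInto(outputs[0][0]); // first output, first channel
      return true;
    }
  }
  registerProcessor('queue-player', QueuePlayer);
}
```

On the main thread, the audioprocess callback would post a copy of each recorded buffer to the worklet node's `port`. The extra hop through the message port is what makes this feel silly compared to wiring the microphone straight in.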

7 comments

comment by stellartux · 2020-04-19T10:14:58.562Z · LW(p) · GW(p)

You can pipe a microphone directly to AudioWorklet, using MediaStreamAudioSourceNode. Do the following in the main scope, then you can access the mic input as the input parameter of process() in the worklet scope.

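// Run inside an async function or a module, since this uses await.
const audioCtx = new (window.AudioContext || window.webkitAudioContext)()
// Ask for microphone access; the promise resolves to a MediaStream.
const micStream = await navigator.mediaDevices.getUserMedia({ audio: true })
const micNode = new MediaStreamAudioSourceNode(audioCtx, { mediaStream: micStream })
// mic -> worklet -> speakers
micNode.connect(yourAudioWorkletNode)
yourAudioWorkletNode.connect(audioCtx.destination)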
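
The worklet-scope half might look roughly like the sketch below. The names `copyInputToOutput`, `EchoProcessor`, and `'echo-processor'` are placeholders, and this is not necessarily how echo-demo-v2 does it.

```javascript
// Copy each input channel to the matching output channel.
// inputs[0] is the first connected input (here, the microphone node).
function copyInputToOutput(inputs, outputs) {
  const input = inputs[0] || [];
  const output = outputs[0] || [];
  for (let ch = 0; ch < output.length; ch++) {
    // A missing input channel leaves the output channel silent.
    if (input[ch]) output[ch].set(input[ch]);
  }
  return true; // keep the processor alive
}

// Only register where the AudioWorklet globals exist.
if (typeof AudioWorkletProcessor !== 'undefined') {
  class EchoProcessor extends AudioWorkletProcessor {
    process(inputs, outputs) {
      return copyInputToOutput(inputs, outputs);
    }
  }
  registerProcessor('echo-processor', EchoProcessor);
}
```

Assuming that code lives in a file such as echo-processor.js (a hypothetical name), the main scope would load it with `await audioCtx.audioWorklet.addModule('echo-processor.js')` and create the node with `new AudioWorkletNode(audioCtx, 'echo-processor')` before making the connections above.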
Replies from: jkaufman
comment by jefftk (jkaufman) · 2020-04-19T15:24:50.103Z · LW(p) · GW(p)

Awesome! This works: https://www.jefftk.com/echo-demo-v2 Thanks so much!

comment by [deleted] · 2020-04-21T00:04:10.866Z · LW(p) · GW(p)

Reading this (and discussing more ideas with my own chorus) made me think of another possible variation that might actually allow some form of online rehearsal: having one person's stream fan out to everyone else to sing against, and then gathering all of those feeds back into one collected stream (synchronized based on the "master" stream timestamps) for somebody else to listen to. This splits the usual role of director into two: conductor and feedback-giver, and you still can't hear the rest of the group you're singing with, but at least it allows some form of live feedback necessary for a practical rehearsal.

Replies from: jkaufman
comment by jefftk (jkaufman) · 2020-04-21T00:22:24.492Z · LW(p) · GW(p)

If you do it as a bucket brigade instead of a fanout then person N can listen to N-1 earlier people as they play: https://www.jefftk.com/p/series-singing

Replies from: None
comment by [deleted] · 2020-04-21T12:19:11.009Z · LW(p) · GW(p)

Yes, though the total latency of the system becomes the sum across all N members. For large groups (my chorus is 60+ people) that may end up being prohibitive, for example when the feedback giver wants to pause the singing and give some feedback. I’m honestly not sure; this is all quite speculative. Thank you for sparking some thoughts, though!

Replies from: jkaufman
comment by jefftk (jkaufman) · 2020-04-21T12:33:46.937Z · LW(p) · GW(p)

I wonder if conductor -> (strongest singer in every section) -> (everyone else) -> (feedback giver) would be a good compromise? Latency isn't too high, but people are also hearing their leaders.
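
To put rough numbers on the trade-off in this thread, here is a small sketch comparing a pure chain with the tiered layout. The 200 ms per-hop delay and the 60-singer count are made-up illustrative values, and `chainLatencyMs` is a hypothetical helper.

```javascript
// Latency heard by the last listener in a chain: each hop adds its delay.
function chainLatencyMs(hopDelaysMs) {
  return hopDelaysMs.reduce((total, d) => total + d, 0);
}

// Hypothetical numbers: 60 singers, 200 ms added per hop.
const perHopMs = 200;
const singers = 60;

// Pure bucket brigade: the feedback giver sits after all 60 singers.
const brigadeMs = chainLatencyMs(new Array(singers).fill(perHopMs));

// Tiered: conductor -> section leaders -> everyone else -> feedback giver.
// Three hops, regardless of chorus size.
const tieredMs = chainLatencyMs([perHopMs, perHopMs, perHopMs]);

console.log(brigadeMs, tieredMs); // prints: 12000 600
```

With these assumed values the chain puts the feedback giver 12 seconds behind the first singer, while the tiered layout stays at 0.6 seconds however large the chorus gets.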

Replies from: None
comment by [deleted] · 2020-04-21T16:16:50.838Z · LW(p) · GW(p)

Yes that does seem like it would be a good compromise.