Removing background noise in Audacity by differencing stereo channels
While at NDC13 this June, K Scott and I (joined by Rob Conery as honorary guest host) recorded a lot of Herding Code podcasts in a cool podcast booth. It was an interesting audio challenge. The conference provided us some great audio equipment - nice microphones and a good mixing board. However, the booth was open on top and in the middle of the conference room floor at an indoor stadium, so there was a lot of background noise. On the one hand, that's just part of the fun of live podcasts at a conference, but I wanted to minimize it.
Normally we record our podcasts via Skype and the worst background noise we get is usually a consistent hum - air conditioning, for instance. Audacity's Noise Removal effect does an amazing job with that - you train it with a short section of the audio that's just background noise, then run it on the whole track (or a selection) and it's all set.
That doesn't work quite as well when you've got a lot of variable background noise. There's no consistent profile, and the sounds you'd want to remove are often also human voices. Noise reduction still works, but you end up with some noticeable artifacts. I still like it, but it needs to be combined with some other tricks.
So with that in mind, here's the "best effort" system I decided on: subtract one stereo channel from the other, then mix the result with a noise-reduced copy of itself. Here's how I did that.
Recording
First, I tried to get the guest on one stereo channel and the hosts on another. It wasn't exact since we had several people having a conversation around a table, but I did what I could without disrupting things.
Both channels have roughly the same background noise, but one has most of the speaker's voice. The other channel has some of the speaker's voice too - remember, mics recording around a table - but the speaker's voice is much louder in one channel than in the other. To say that another way:
L: 80% speaker + background noise
R: 20% speaker + background noise
So if we subtract R from L, we get 60% speaker with very little background noise. Theoretically, at least.
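To make the arithmetic concrete, here's a minimal Python sketch of the idealized case. It assumes both channels pick up exactly the same background noise and that the speaker lands at 80% and 20%. The sine-wave "speaker", the random "noise", and the function name are all illustrative assumptions, not anything taken from the actual recording.

```python
import math
import random

def difference_channels(left, right):
    """Subtract the right channel from the left, sample by sample."""
    return [l - r for l, r in zip(left, right)]

random.seed(0)
n = 1000
# Hypothetical sources: a 440 Hz tone standing in for the speaker
# (sampled at 44.1 kHz) and uniform random values standing in for room noise.
speaker = [math.sin(2 * math.pi * 440 * i / 44100) for i in range(n)]
noise = [random.uniform(-0.3, 0.3) for _ in range(n)]

# L: 80% speaker + background noise, R: 20% speaker + background noise
left = [0.8 * s + b for s, b in zip(speaker, noise)]
right = [0.2 * s + b for s, b in zip(speaker, noise)]

diff = difference_channels(left, right)

# In this idealized case the shared noise cancels and 60% of the speaker remains.
max_error = max(abs(d - 0.6 * s) for d, s in zip(diff, speaker))
```

In a real room the noise reaching each microphone is only partly identical, so the cancellation is partial, which is where the "theoretically" comes in.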
Differencing stereo signals
While trying out an effect, I like to work with a duplicate so I can quickly undo if it's not working. I duplicated (Edit / Duplicate or just Ctrl+D) the track, then split stereo to two mono tracks.
Now, the trick for differencing the two sides is to invert one and sum them together. I select the Left track and use the Invert effect. (Inverting either channel works; the results differ only in polarity, which you won't hear.)
Then I select the Left and Right tracks and sum them using Tracks / Mix and Render. Important note: both channels were quiet enough that I could mix them without turning them down first, but if yours are louder, reduce their gain before mixing so the sum doesn't clip.
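For anyone who wants to see what invert-and-mix amounts to outside of Audacity, here's a rough Python sketch. It assumes samples are floats in the range -1.0 to 1.0, and the helper names (`invert`, `peak`, `safe_gain`, `invert_and_mix`) are made up for illustration. This is not how Audacity implements its effects, just the same idea in a few lines.

```python
def invert(samples):
    """Flip the polarity of every sample."""
    return [-s for s in samples]

def peak(samples):
    """Largest absolute sample value (0.0 for an empty track)."""
    return max((abs(s) for s in samples), default=0.0)

def safe_gain(a, b, limit=1.0):
    """Gain to apply to both tracks so their sum cannot exceed `limit`.

    Uses the worst case (both peaks lining up), so it is conservative.
    """
    worst = peak(a) + peak(b)
    return 1.0 if worst <= limit else limit / worst

def invert_and_mix(left, right, limit=1.0):
    """Invert the left track, turn both down if needed, then sum them."""
    inverted = invert(left)
    gain = safe_gain(inverted, right, limit)
    return [gain * l + gain * r for l, r in zip(inverted, right)]

# Hypothetical loud tracks: the peaks add up to 1.5, so both get turned down.
left = [0.5, -0.9, 0.3]
right = [0.4, -0.6, 0.2]
mixed = invert_and_mix(left, right)
```

The worst-case gain is deliberately cautious. Since the two channels share a lot of content, the actual difference is usually much quieter than the sum of the peaks, which is why I got away without turning anything down.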
The result still has some background noise, but it's a lot tighter. The voices are clearly distinct from the background. Much better, but there's still work to do.
Mixing with standard noise reduction
We want to take out some of that background noise, but I still can't use aggressive noise reduction because it creates weird artifacts: the background rumble kicks in when people talk and cuts off when they stop. While that lowers the overall noise level, the noise is more noticeable when it starts and stops. I could wing it and pick a dry/wet effect mix when I apply noise reduction, but that's hit and miss. Since Audacity's effects aren't realtime, the simplest way to get this right is to duplicate the new track, apply the effect to the copy, and mix the two to taste.
Once I've duplicated the track, I train it as shown above and run the noise reduction. Like I said, it's wicked smart - notice how the left (should be silent) part of the lower track is so much cleaner.
But that track by itself does sound a little weird - while people are talking, bits of other people talking in the background pop out. So now I need to bring the unprocessed upper track back in until that's not so obvious - in this case the unprocessed track ends up at -20 dB and the processed one at +0 dB. The unprocessed track's contribution is subtle, but its absence is very obvious if that track is completely muted. Normally I'd use a Dynamic Compressor at this point, but with a lot of background noise it can pull the noise right back up, so we'll let that go.
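If the dB numbers feel abstract, here's a small Python sketch of what that mix works out to, assuming the usual amplitude convention that gain = 10^(dB/20). The function names and the sample values are hypothetical.

```python
def db_to_gain(db):
    """Convert a level in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)

def mix_tracks(original, processed, original_db=-20.0, processed_db=0.0):
    """Sum two tracks sample by sample after applying a gain to each."""
    g_orig = db_to_gain(original_db)
    g_proc = db_to_gain(processed_db)
    return [g_orig * o + g_proc * p for o, p in zip(original, processed)]

# -20 dB is one tenth of the original amplitude; 0 dB leaves it unchanged.
original_gain = db_to_gain(-20.0)
processed_gain = db_to_gain(0.0)

# Hypothetical samples from the unprocessed and noise-reduced tracks.
blended = mix_tracks([1.0, -0.5], [0.5, 0.25])
```

So the unprocessed track is contributing only about a tenth of its amplitude, just enough to smooth over the spots where the noise reduction cuts in and out.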
To be clear, it's still obvious these recordings were done in a crowded room, and that's okay - they're just understandable now.
From here on out, it's standard editing - cutting out the pauses, etc. That's the time-consuming part. While I spelled the above out in gory detail, it takes less than two minutes in real life.