Audio Basics - Quiet audio recording

Hello all, OK so it is not VFX but as we all know sound will make or break a video/film. I am no expert, in fact I am 100% novice at audio, learning as I go. I wanted to share a tip that I learnt this weekend.

I recorded some speech using a mic at my desk. The end result was a quality recording (in my opinion) that lacked volume; when I transferred it to my phone and played it back it was barely audible. So I did a bit of googling to work out why, when you switch on the radio or the telly, most stuff sounds about the same volume across the channels. How this works I am not very clear on, but what I did find out was the following.

When you view your recording in, say, Audacity you see a waveform. The peaks represent the audio signal: if your waveform is small your audio will be quiet, and if it is large it will be loud. Too large and you get popping or bits blown out, basically not a good recording, and the audio is lost. So how did I fix it? I used two tools in Audacity. First I ‘Normalised’ it. With default settings this increased the volume of the sample a fair bit and on its own would probably do. However, I wanted it a bit louder, so I then used ‘Amplify’, where I experimented with the +dB value. Basically, too much and you lose the top end peaks (blown out), so I started at +5.0, which was too much, and settled for +2.5, which was acceptable. Once I saved it out and transferred it to my phone, it played back at an acceptable level.

If someone knows what this process is called (Normalisation?) then please share.





  • @Andy001z "Normalization" is an audio transformation where gain (increased volume) is applied to the waveform in order to raise either the AVERAGE (RMS) or PEAK volume (depending on your settings) to a predefined level.

    Normalizing a PEAK value usually doesn't have much of an effect on the overall loudness of a waveform and would be used near the end of the mastering process to bring your loudest peaks up to the maximum allowed level.

    Normalizing the RMS value will obviously increase the amplitude of the waveform, but RMS normalization can bring your peak audio over max, resulting in clipping of the waveform. However, most audio editors do provide the option to add compression to prevent clipping.
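    To make the peak vs. RMS difference concrete, here is a minimal Python sketch. It is not how Audacity implements its Normalize effect; the function names and the target levels are my own assumptions, and samples are assumed to be floats between -1.0 and 1.0 (so 1.0 is 0dB) with at least one non-zero sample.

```python
import math

def peak_db(samples):
    """Level of the single loudest sample, in dB relative to full scale."""
    return 20 * math.log10(max(abs(s) for s in samples))

def rms_db(samples):
    """Average (RMS) level of the whole clip, in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

def normalize_peak(samples, target_db=-1.0):
    """Apply one fixed gain so the loudest peak lands on target_db."""
    gain = 10 ** ((target_db - peak_db(samples)) / 20)
    return [s * gain for s in samples]

def normalize_rms(samples, target_db=-20.0):
    """Apply one fixed gain so the RMS level lands on target_db.
    Nothing here stops the peaks from going over 0dB."""
    gain = 10 ** ((target_db - rms_db(samples)) / 20)
    return [s * gain for s in samples]

# A quiet one-second 440 Hz test tone peaking at about 0.05 (roughly -26dB).
quiet = [0.05 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]

peak_normalized = normalize_peak(quiet, target_db=-1.0)
rms_normalized = normalize_rms(quiet, target_db=-2.0)

print(round(peak_db(peak_normalized), 2))  # peak sits at the target
print(peak_db(rms_normalized) > 0)         # RMS target pushed the peak over 0dB
```

    The second print is the clipping risk mentioned above: asking for a hot RMS level forces the peaks past 0dB unless something (a compressor or limiter) holds them down.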

    Speaking of "Compression," this is actually the tool I would recommend you use to boost quiet audio before normalization.

    Let's back up a minute. To slightly oversimplify here, audio is represented by a waveform ranging from -infinity dB (no audio) to 0dB (the maximum level a DAW allows). In the real world the dynamic range (maximum amplitude range) is limited by several factors, including word-length of sample (8/16/24-bit). For CD quality audio (44.1kHz, 16-bit) this range is about 96dB (strictly dBFS, decibels relative to full scale. dB SPL, dBA and dBV are different measures, and we're not talking about them. I did say I was simplifying). Therefore the basic values of CD quality audio are -96dB to 0dB.

    Waveforms above 0dB get clipped to 0. If a wave extends far above 0, this clipping can create a nasty square edge that sounds terrible. Think of what happens when you overexpose levels in a photo or video and you end up with massive amounts of pure white with no highlight detail. That's kind of what happens to audio.

    So, the decibel scale is basically logarithmic. Every 6dB of change is a doubling or halving of amplitude, and a change of about 10dB sounds roughly twice or half as loud.
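    If you want to check the dB arithmetic yourself, this small Python sketch uses the standard 20*log10 convention for amplitude. The function names are just ones I made up for illustration.

```python
import math

def db_to_amplitude_ratio(db):
    """How many times bigger (or smaller) the waveform gets for a dB change."""
    return 10 ** (db / 20)

def amplitude_ratio_to_db(ratio):
    """The dB change that corresponds to scaling the waveform by ratio."""
    return 20 * math.log10(ratio)

print(round(db_to_amplitude_ratio(6.0), 2))      # close to 2: +6dB doubles the amplitude
print(round(db_to_amplitude_ratio(2.5), 2))      # the +2.5 Amplify setting, about 1.33x
print(round(amplitude_ratio_to_db(2 ** 16), 1))  # 16-bit range, about 96dB
```

    The last line is where the CD-quality dynamic range figure comes from: 16 bits gives 65536 steps, and each bit is worth about 6dB.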

    With that out of the way, back to compression.

    A compressor has the following basic controls:

    Threshold: the level at which compression kicks in. This will be a negative number, of course, like -30, -24, -12, etc.

    Ratio: Here's the important one... This ratio defines how much the audio above the Threshold is crunched down. Let's use a low ratio of 2:1 with a Threshold of -24 as an example. With these settings audio below -24dB is left alone, while audio that previously hit 0dB would now only go up to -12, as 24/2=12.

    Attack/Release: These controls define how quickly the compression kicks in once audio exceeds the threshold and how quickly compression releases once audio falls below the threshold. Setting high compression ratios with fast attacks and releases can cause "breathing" and "pumping" sounds on fast attack transients--this is more an issue for mixing, say, drums than speech, but you should know this.

    Finally there's Gain, which is applied after the compression to boost up the audio you just crunched. Sticking with our -24, 2:1 ratio, a gain of about 10dB would be good, leaving a couple of dB headroom for a loud transient that sneaks through before the compression kicks in (assuming an attack of greater than 0).
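    The Threshold/Ratio/Gain maths above can be sketched in a few lines of Python. This only models the level-in vs. level-out curve; it ignores Attack/Release entirely, and the function and parameter names are my own, not taken from any particular compressor plugin.

```python
def compress_level(level_db, threshold_db=-24.0, ratio=2.0, makeup_gain_db=0.0):
    """Output level in dB for a given input level in dB.

    Below the threshold the level is untouched. Above it, every `ratio` dB
    of input only produces 1 dB of output. Makeup gain is added at the end.
    """
    if level_db > threshold_db:
        level_db = threshold_db + (level_db - threshold_db) / ratio
    return level_db + makeup_gain_db

print(compress_level(-30.0))                      # -30.0, below threshold, left alone
print(compress_level(0.0))                        # -12.0, as 24/2=12
print(compress_level(0.0, makeup_gain_db=10.0))   # -2.0, a couple of dB of headroom
print(compress_level(-30.0, makeup_gain_db=10.0)) # -20.0, the quiet stuff came up 10dB
```

    The last two lines are the whole point for quiet speech: the loud peaks end up just under 0dB while everything quiet is lifted by the full gain amount.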

    This describes a basic single-channel dynamic range compressor. Multi-band compressors do the same thing except you can assign different compression ratios to different frequency bands. Again, this is more important for music, not so much for speech.

    For more or less properly recorded speech, my starting compression settings are usually a threshold of about -24 and a ratio of 2:1 or 3:1, with 10 or 13dB of gain.

    Finally there's the "Limiter." A limiter is used to keep attack transients from clipping over 0dB and is basically a compressor set to a high Threshold (-3 or -6) with a very high compression ratio (10:1 or higher).

    Audacity has a compressor which you can read about here:


  • Thanks @Triem23, my Monday morning brain hurts now.

  • edited November 2015

    @Andy001z  I am guessing you did not get a good signal in the first place. Without seeing your audio clip I can't make a good judgement, but based on your claim to be a novice I'd say it was not a quality recording. When you import to Audacity or any other audio software a quality recording should give you peaks just under 0 dBFS. Normal speech does not contain a wide dynamic range - usually 12-20 dB. Whispers and yelling will increase that of course, but assuming you didn't do that you should have the vast majority of the tops of your waveforms between 0 and -25 dB on your scale. If not, increase your preamp gain (during recording) till your recorder/software VU meter shows you at about -12 dB on average. The balancing act is to apply the maximum amount of gain to your raw recording that does not clip the signal. You can get better dynamic range with better equipment, but good recording technique is still essential.

    Remember that good mics generally have around 70-84 dB signal to noise (cheap ones much less). It doesn't matter how much your recorder has because the mic is the weak link. If you have your peaks at something like -40 dB you need 40 dB of gain to normalize. Now your noise (from the mic) will be at -30 or -40 dB for a good mic. Maybe even at -20 dBFS with a cheap mic. All clearly audible. If you normalize without having enough gain in your raw signal you will increase your noise floor and get a lot of hiss in your recording. It is better to fade your raw signal in post than to add gain.
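    The noise floor arithmetic above can be sketched like this in Python. It rests on a big simplifying assumption (flagged in the code too): that the mic noise sits its signal-to-noise figure below full scale before any gain is added. Real mic specs are quoted against a reference sound level, so treat this as a rough illustration only. The function name is made up.

```python
def noise_floor_after_normalizing(peak_db, mic_snr_db, target_peak_db=0.0):
    """Return (gain needed in dB, noise floor in dB after that gain).

    ASSUMPTION: before gain, the noise sits mic_snr_db below full scale.
    Gain lifts the signal and the noise by exactly the same amount.
    """
    gain_db = target_peak_db - peak_db
    noise_before_db = -mic_snr_db
    return gain_db, noise_before_db + gain_db

# Peaks recorded at -40 dB with a 70 dB signal-to-noise mic
print(noise_floor_after_normalizing(-40.0, 70.0))  # (40.0, -30.0)

# Same mic, but recorded properly with peaks at -6 dB
print(noise_floor_after_normalizing(-6.0, 70.0))   # (6.0, -64.0)
```

    Same mic, same room, but the well-recorded take ends up with its hiss more than 30 dB lower after normalizing.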

    The video analogy is exposure level.  You want your raw video signal to sit within the dynamic range of the camera.  Significantly under exposed and overexposed areas have loss in fidelity forever.  While these can be boosted or reduced in post, the detail information is lost.  Adding significant normalizing gain is the audio equivalent of "fixing it in post".  Compression can help with speech that has a lot of dynamics but too much can create artifacts like pumping, etc.  Normalizing and compression should generally be a nuance.  The best approach is always to get good levels in your basic recording.

    I don't know, perhaps you did get a great basic recording......

  • @dancerchris Hi, thanks for the advice. Well, without looking back at the dB levels I can't comment right now on whether it fits your info above. However, I can tell you a bit about the recording method and result. I recorded my speech (spoken tone - I was telling a story) in a quiet room with only the sounds from, say, the computer. My mic is a USB karaoke-style mic with no pop filter or DeadCat on it (hope that's the term); it sat on my desk, resting at a 45-degree angle towards my face. I recorded the piece not speaking directly into the mic to avoid (err) popping, but I guess this resulted in a low-level recording. After my normalisation and amplification it's not a bad sounding recording, no obvious noise (but then I'm no sound expert hehe).

    Re your comment about getting a good source, totally with you on that, I'm always telling people that about video source and lighting.

  • "doing so in a quiet room with only the sounds from say computer" = Ambient Noise Floor

    From this, we also know that if the sound reaching the audience during peaks was attenuated by air and distance by 10 dB from 130 dB SPL to 120 dB SPL, the 40 dB SPL generated by the speakers during quiet passages will also be attenuated. When the 40 dB drops to 30 dB, it will be below the ambient noise level in the audience. This means that the audience may not hear the very quietest parts of the show. This illustrates why some electronic manipulation of dynamic range is often called for. In this case, compression of the loudest peaks would allow the level to be turned up so the quiet passages are louder. Such processing is covered in Section 4.3.  --Yamaha Sound Reinforcement Handbook 666th Edition, Section 4.1.4

    It may be that we do not want any compression because, after all, it can have side effects, such as making quiet breath sounds louder, creating a pumping effect in some cases, and increasing the distortion of low frequency signals. Still, distorting on peaks is not acceptable either, so we may use another approach: apply compression only above a given threshold. Below a given signal level, no compression at all occurs. If the threshold level is chosen to approximate the nominal program level, this ensures that most of the program sounds completely natural. Above that threshold we use whatever amount of compression is necessary to prevent clipping. This is illustrated in Figure 4-3 where 1.43:1 compression is selected above a +4 dBu threshold. This approach squeezes the headroom requirement, but doesn't help with the quieter portions of the program, as can be seen from the fact that the dynamic range is reduced to only 84 dB; 10 dB of the program will still be lost in the noise. If the threshold were set lower, or the compression ratio to a higher value, then more dynamic range would be conserved, and the overall signal level at the compressor output could be increased to stay above the console's noise floor.  --Yamaha Sound Reinforcement Handbook 666th Edition, Section 4.3.3

    What? Could you repeat that in English please? Seriously did anybody understand any of that? --Thomas the Unheard

  • My explanation was better for the uninitiated. ;-) 

  • At some point on the path to Audio Enlightenment it is likely the acolyte will come to experience frustration with hired labor minions. If this frustration should reach a level requiring physical combat the Handbook has a defensive bonus of +2 against pierce attacks and can deal 1d6 impact damage. In addition if the acolyte has a high enough level in brawling or unskilled melee a special Spine to Nose attack can be attempted. If successful this attack deals 1d20 damage plus an additional 1d4 of damage for the next 10 rounds due to blood loss.

  • Triem23 Moderator
    edited November 2015

    Well, true, but using an SM58 with 12 feet of XLR ("Whip of Jagger") can deal 1d4 of damage, as well as entangle for 1d3 melee rounds. Failing that, the crafty Audio Mage can use that same mic to generate feedback equivalent to the Mage's level of "Banshee's Howl."

    Of course these Feats aren't available to level 0 acolytes, but, respectively to Level 3 Cable Monkeys and Level 6 Knob Twiddlers. 

  • What are you two on? I am very lost.

  • Sorry, Andy, we started making "Dungeons & Dragons" jokes. Seems Aladdin and I are nerds. 

  • A KSM9 gypsy tied to a floor stand ("Aero Hammer") causes even advanced Guitarists to retreat.

  • Nothing wrong with being a nerd. Have you two seen the recent Nerdist D&D session with Vin Diesel? I thought it was pretty cool.
