Acoustic Analysis of Voice

This tutorial walks you through the steps needed to complete an acoustic assessment of voice, an integral component of a comprehensive, gold-standard voice assessment. To begin, read the free, open-access article by ASHA (Patel et al., 2018; see References). Many of the analysis settings and protocols used in this tutorial were drawn from the ASHA-recommended guidelines in that article, which also describes the minimum requirements for assessment equipment, testing environment, and acoustic outcomes. The equipment I use in this tutorial, which you could consider for your own acoustic analyses, is: a head-mounted AKG omnidirectional condenser microphone ($165 USD); a pre-amp (~$120 USD); a connecting cable for the microphone and pre-amp ($75 USD); and a personal laptop computer. I have no financial conflicts of interest related to any of this equipment.


This tutorial uses Praat as the acoustic analysis software, so download and install Praat (it is free) before beginning.

Part 1: Preparing Praat's Default Settings

Step-by-step instructions are outlined below, with an extended (15-minute) and an abbreviated (1-minute) video tutorial available for viewing. If you are new to Praat, I recommend viewing the extended video tutorial; otherwise, the abbreviated video covers the information below.

  1. Open (or create) an audio file in the Praat Objects window, then open it in the sound window by selecting ‘View & Edit’.

  2. In the sound window, click ‘View’ and ‘Show analyses…’. In the ‘Show analyses’ window, ensure ‘Show spectrogram’, ‘Show pitch’, ‘Show intensity’, and ‘Show pulses’ are selected, and change ‘Longest analysis (s):’ from 10 to 200 or 300. Then select ‘Apply’ and ‘OK’.

  3. In the same sound window, go to ‘Pitch’ and ‘Pitch settings…’ and in the section ‘Pitch range (Hz):’ enter the numbers 60 and 600 as the minimum and maximum F0. Then select ‘Apply’ and ‘OK’.

  4. In the same sound window, go to ‘Intensity’ > ‘Intensity settings…’ and in the section ‘View range (dB):’ enter the numbers 50 and 100 for the minimum and maximum intensities. Then select ‘Apply’ and ‘OK’.

Part 2: Pre-Recording Sound Check and Sound Level Calibration

A pre-recording sound check and sound level calibration are required immediately before each and every acoustic assessment. Therefore, complete the steps below immediately before beginning a voice sample recording with an examinee. Step-by-step instructions are outlined below, with a 6-minute video tutorial available for viewing below the written instructions.

Microphone Positioning

  • Ensure the microphone is 4-10 centimeters from the sound source (i.e., the person vocalizing).

  • If the microphone being used is omnidirectional, position it at a 45º angle from the center of the mouth. If it is unidirectional, position it directly in front of the mouth. Note: the ASHA-recommended guidelines call for an omnidirectional microphone.

Pre-Recording Sound Check of Soft and Loud Vocalizations

  • In the Praat Objects window, click ‘New’ > ‘Record mono Sound…’ > ‘Record’

  • Sustain /a/ as soft as possible without it being a whisper. The soft voicing should be detected on the Meter in the SoundRecorder window. If it is not detected, then adjust the gain level (if using a pre-amp) and/or move the microphone closer or move to a quieter environment.

  • Sustain an /a/ that is ~2x the loudness of normal speaking. The loud voicing should not go in the red zone. If it goes into the red zone, adjust the gain level (if using a pre-amp) and/or move the microphone further away. Repeat until the loud voicing is in the yellow zone (but not red zone) and verify the soft voice is still being detected.

  • Make note of the mic-to-mouth distance for future evaluations. This should ideally be 4-10 cm from the mouth, but it will likely be greater if you are using a computer's built-in microphone (e.g., during a class assignment) rather than a head-mounted or handheld microphone (e.g., in clinical practice).


Sound Level Calibration

  • Set up a sound-level meter ~30 cm from the front of the speaker’s mouth. If you do not have a Class 1 or Class 2 sound level meter, you can use the NIOSH Sound Level Meter app on your mobile phone.

  • Sustain an /a/ at a habitual loudness.

  • Record the average dB measured on the sound-level meter.

  • ‘Stop’ the recording > ‘Save to list & Close’ > ‘View & Edit’ the sound file > then measure the dB of the habitual /a/.

  • Determine the difference in dB between the sound-level meter and Praat. Add this difference to all the dB outcomes measured with Praat. For example, if the sound-level meter measured 70 dB but Praat measured 61 dB, then you should add 9 dB (i.e., 70 dB – 61 dB) to all the dB outcomes.

  • Exit out of the pre-recording sound check. You are now ready to begin the acoustic voice analysis.
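
The calibration arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not part of Praat: the function names are hypothetical, and the 58.5 dB reading is a made-up example value. The 70 dB and 61 dB figures are the ones from the worked example in the calibration steps.

```python
def calibration_offset(meter_db: float, praat_db: float) -> float:
    """Difference (dB) between the sound-level meter reading and Praat's reading."""
    return meter_db - praat_db


def calibrated_spl(praat_db: float, offset_db: float) -> float:
    """Add the calibration offset to a dB value measured in Praat."""
    return praat_db + offset_db


# Worked example from the calibration steps: meter = 70 dB, Praat = 61 dB.
offset = calibration_offset(meter_db=70.0, praat_db=61.0)
print(offset)  # 9.0

# Hypothetical later Praat measurement of 58.5 dB.
print(calibrated_spl(58.5, offset))  # 67.5
```

The offset applies only to the session in which it was measured, since the sound check and calibration are repeated before every recording.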

Part 3: Voice Sample Recording

Now that you have completed the pre-recording sound check and the sound level calibration, you are ready to obtain the voice sample recording. ASHA recommends obtaining the following voice samples:

  • Sustained vowels: Sustain the vowel /a:/ at a habitual level (habitual pitch and loudness) holding pitch and loudness as constant as possible for 3-5 s on one comfortable breath. Repeat this task three times.

  • Standard reading passage: Read a typed passage (adults: first paragraph of the Rainbow Passage [Fairbanks, 1960]; children who can read: “The Trip to the Zoo” [Fletcher, 1972]) at comfortable pitch and loudness.

  • Loudness range: (a) Sustain the vowel /a:/ as quietly as possible for at least 2 s without whispering. Repeat this task three times. (b) Sustain the vowel /a:/ as loudly as possible for at least 2 s. It is recommended that this task be repeated three times.

  • Pitch range: (a) Sustain the vowel /a:/ as high in pitch as possible (including falsetto/loft) for at least 2 s. Repeat this task three times. (b) Sustain the vowel /a:/ as low in pitch as possible (in the modal register without the inclusion of fry/pulse register) for at least 2 s. Repeat this task three times. Note that the highest and lowest pitches also may be obtained either by using a pitch glide or in a stepwise fashion (Zraick, Nelson, Montague, & Monoson, 2000).


Record all of the above in a single audio clip for Praat analysis. In addition to the above, I also recommend recording the following voice tasks in the same audio file:

  • CAPE-V (vowels, sentences, and spontaneous speech)

  • Laryngeal diadochokinesis (DDK) 'uh' and 'huh'

  • Maximum phonation time and s/z ratio - as low-cost alternatives for aerodynamic assessment of voice (the validity of these measures is controversial, so interpret them with a grain of salt)

Recording the Voice Samples

  • Select ‘Record’.

  • Record voice samples into a single audio clip.

  • Select ‘Stop’.

  • Rename the sound file.

  • Select ‘File’ > ‘Save as WAV file…’ > ‘Save’.

  • Then select ‘Save to list & close’.

Part 4: Analysis

Now that you have obtained the voice recordings, you are ready to begin the formal acoustic analysis to obtain a variety of speech and non-speech acoustic voice measures. Step-by-step instructions are outlined below, with an 18-minute video tutorial available for viewing.


Non-Speech Acoustic Voice Measures

Mean Habitual F0, jitter (local; %), shimmer (local; %), HNR (dB)

  • Select the middle 2-3 seconds of the sustained /a/ from the CAPE-V - avoid the onset and offset of the /a/.

  • Select ‘Pulses’ > ‘Voice report’.

  • In the voice report window, locate Median pitch (used for Mean Habitual F0), Jitter (local; %), shimmer (local; %), and Mean harmonics-to-noise ratio (dB).

  • Input these values into your evaluation report.

  • Exit the voice report window.
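
If you prefer to type selection times rather than drag the cursor, the "middle 2-3 seconds" of the vowel can be worked out from its onset and offset times. The helper below is a minimal sketch under my own assumptions: the function name and the example times are hypothetical, and you would still read the onset and offset from the sound window yourself.

```python
def middle_window(onset_s: float, offset_s: float, window_s: float = 3.0):
    """Return (start, end) of a window centered in the vowel.

    If the vowel is shorter than the requested window, the whole vowel is returned.
    """
    duration = offset_s - onset_s
    if duration <= 0:
        raise ValueError("offset_s must be later than onset_s")
    window = min(window_s, duration)
    midpoint = onset_s + duration / 2
    return (midpoint - window / 2, midpoint + window / 2)


# Hypothetical vowel running from 10.0 s to 15.0 s in the recording.
print(middle_window(10.0, 15.0, window_s=3.0))  # (11.0, 14.0)
```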


Mean Habitual SPL (dB)

  • The mean dB of the sustained /a/ will be displayed in green (make sure it is highlighted). Alternatively, go to ‘Intensity’ > ‘Get intensity’.

  • Input the mean habitual intensity into the evaluation report.


Smoothed Cepstral Peak Prominence (CPPS)

  • Using the same 2-3 second selection of the sustained /a/, select ‘File’ > ‘Extract selected sound (time from 0)’.

  • Close the waveform/spectrogram window so that you are back at the Praat Objects window.

  • A new untitled object will appear in the Praat Objects window which contains the portion of the /a/ you had previously selected. Rename this object to ‘sv’.

  • Select the ‘sv’ object, then select ‘Analyse periodicity’ and ‘To PowerCepstrogram’. Select ‘OK’.

  • A new object named ‘PowerCepstrogram sv’ will appear in the Praat Objects window. Highlight this object, then select ‘Query – ’ and then ‘Get CPPS…’. In the window that pops up, change the settings (see video), then select ‘OK’.

  • The value shown in the window that appears is the CPPS. Enter this into the evaluation report.


Minimum F0, maximum F0, and F0 range

  • 'View & Edit' the audio file containing all the voicing tasks.

  • Click on ‘Pitch’ and ‘Pitch settings…’, and temporarily change the maximum pitch to 1300 Hz. Click ‘Apply’ and ‘OK’.

  • Zoom in on the approximate location of the maximum and minimum pitch glide tasks.

  • Highlight the middle 0.5-1.0 second of the highest and, separately, the lowest F0. If F0 was not sustained for 2 seconds, or if there is noise artifact, then manually identify the highest and lowest F0 by clicking the cursor on the part of the blue contour that represents the highest/lowest F0. Input values into the evaluation report.

  • Change the F0 settings back so that the maximum F0 is the value set in Part 1 (600 Hz). Then click ‘Apply’ and ‘OK’.

  • Use an online conversion tool (e.g., http://www.homepages.ucl.ac.uk/~sslyjjt/speech/semitone.html) to determine the number of semitones between the minimum and maximum F0 to get the pitch range. Input this into the evaluation report.
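
As an alternative to an online tool, the semitone distance between two frequencies can be computed directly with the standard formula 12 × log2(maximum F0 / minimum F0), since an octave is a doubling of frequency and contains 12 semitones. The sketch below uses a hypothetical function name and made-up F0 values.

```python
import math


def semitone_range(f0_min_hz: float, f0_max_hz: float) -> float:
    """Number of semitones between the minimum and maximum F0."""
    if f0_min_hz <= 0 or f0_max_hz <= 0:
        raise ValueError("F0 values must be positive")
    return 12.0 * math.log2(f0_max_hz / f0_min_hz)


# Hypothetical values: 110 Hz to 440 Hz spans two octaves.
print(round(semitone_range(110.0, 440.0), 2))  # 24.0
```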

Minimum SPL (dB), maximum SPL (dB), and SPL range (dB)

  • Zoom in on the location of the minimum and maximum /a/ loudness tasks.

  • Repeat the steps used for the minimum and maximum pitch, this time measuring the intensity (dB) of the minimum and maximum loudness tasks.

      • As mentioned previously, you should add the difference in dB obtained during Part 2’s sound level calibration to all the dB measures obtained in Praat when inputting them into the evaluation report.

  • To determine SPL range, subtract minimum loudness from maximum loudness, then input into the evaluation report.
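
The SPL steps above can be sketched as follows. The function name and all numbers are hypothetical; the offset is whatever difference you obtained during the sound level calibration. Note that because the same offset is added to both values, it changes the reported minimum and maximum but not the range.

```python
def spl_summary(praat_min_db: float, praat_max_db: float, offset_db: float) -> dict:
    """Apply the calibration offset, then compute the SPL range."""
    min_spl = praat_min_db + offset_db
    max_spl = praat_max_db + offset_db
    return {
        "min_spl_db": min_spl,
        "max_spl_db": max_spl,
        "range_db": max_spl - min_spl,
    }


# Hypothetical Praat readings of 48 dB and 92 dB with a 9 dB offset.
print(spl_summary(48.0, 92.0, 9.0))
# {'min_spl_db': 57.0, 'max_spl_db': 101.0, 'range_db': 44.0}
```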


Acoustic Analysis: Continuous Speech Voice Measures

Mean SPL (dB), Mean F0, F0 SD, CPPS

  • Highlight and zoom in on the connected speech sample of the main audio file (e.g., The Rainbow Passage)

  • Record the mean dB, seen on the right in neon green (or by selecting ‘Intensity’ > ‘Get Intensity’)

  • Select ‘Pulses’ > ‘Voice report’. In the Pitch section of the voice report, report the Median pitch for “Mean F0” and the Standard deviation for the F0 SD.

  • Select ‘File’ > ‘Extract selected sound (time from 0)’. Then close the waveform/spectrogram window. A new untitled object containing the connected speech you selected will appear in the Praat Objects window. Rename this object to ‘cs’.

  • Obtain the connected speech CPPS by repeating the same steps used for the sustained vowel CPPS, this time starting from the ‘cs’ object.

Acoustic Analysis: Acoustic Voice Quality Index (AVQI)

  • Download the AVQI script: https://drive.google.com/file/d/1G0xU0MN7vfjfLIFhPO_77bCPQhzeIkB_/view

  • Ensure the ‘sv’ and ‘cs’ files are still listed in the Praat Objects window.

  • Click ‘Praat’ and then ‘New Praat script’.

  • Open the AVQI script you just downloaded, then copy its text and paste it into the ‘New Praat script’ window.

  • Then click ‘Run’ > ‘Run’ > ‘OK’. Wait 1-2 minutes and the AVQI will be generated.

Maximum Phonation Time, S/Z ratio, Laryngeal Diadochokinesis

  • If you completed maximum phonation time and S/Z ratio, then consider using Praat to measure the total duration of each task and calculating the measures accordingly. Just highlight the duration of each task and look at the duration indicators at the bottom of the spectrogram window.

  • If you completed laryngeal diadochokinesis, then consider using the sound waveform and the spectrogram with the pitch/intensity contours to quickly count the number of repetitions and divide by the selected duration (e.g., middle 3 seconds).
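
Once you have the durations and counts from Praat, the calculations are simple enough to sketch in Python. The conventions below are assumptions on my part rather than something prescribed in this tutorial: maximum phonation time is taken as the longest trial, and the s/z ratio as the longest /s/ divided by the longest /z/. The function names and numbers are hypothetical.

```python
def maximum_phonation_time(durations_s: list) -> float:
    """Longest sustained phonation across trials (assumed convention)."""
    return max(durations_s)


def sz_ratio(s_durations_s: list, z_durations_s: list) -> float:
    """Longest /s/ divided by longest /z/ (assumed convention)."""
    return max(s_durations_s) / max(z_durations_s)


def ddk_rate(repetitions: int, window_s: float) -> float:
    """Repetitions per second within the selected window."""
    return repetitions / window_s


# Hypothetical measurements read from the Praat sound window.
print(maximum_phonation_time([12.5, 14.0, 13.2]))  # 14.0
print(sz_ratio([18.0, 20.0], [16.0, 15.0]))        # 1.25
print(ddk_rate(15, 3.0))                           # 5.0
```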

References:

  • Patel, R. R., Awan, S. N., Barkmeier-Kraemer, J., Courey, M., Deliyski, D., Eadie, T., ... & Hillman, R. (2018). Recommended protocols for instrumental assessment of voice: American Speech-Language-Hearing Association expert panel to develop a protocol for instrumental assessment of vocal function. American Journal of Speech-Language Pathology, 27(3), 887-905.

Acknowledgments: I'd like to thank Adrián Castillo-Allendes for supplying me with the AVQI script and his assistance in verifying the techniques described above.