Acoustic Analysis of Voice

This tutorial will take you through the steps necessary to complete manual (Part 4) and semi-automated (Part 5) acoustic assessments of voice, an integral component of a comprehensive, gold-standard voice assessment. To begin, you should read the free, open-access article by ASHA. Many of the analysis settings and protocols used in this tutorial were pulled from the ASHA-recommended guidelines in that article. The article also describes the minimum requirements for assessment equipment, testing environment, and acoustic outcomes. The equipment I will be using in this tutorial, which you could consider using for your own acoustic analyses, is: a head-mounted, omnidirectional condenser microphone ($175 USD); a pre-amp (~$109 USD); optionally, a connecting cable for the microphone and pre-amp ($75 USD; might not be needed depending on the microphone you get); and a personal laptop computer. I have no financial conflicts of interest for any of these pieces of equipment.

This tutorial uses Praat as the acoustic analysis software. Therefore, you should download and install Praat (free!) prior to beginning this tutorial.

Part 1: Preparing Praat's Default Settings

Step-by-step instructions are outlined below, with an extended (15-minute) and an abbreviated (1-minute) video tutorial available for viewing. If you are new to Praat, I recommend viewing the extended video tutorial; otherwise, the abbreviated video covers the information below.

Open (or create) an audio file in the Praat Objects window, then open it by selecting ‘View & Edit’.
In the sound window, click ‘View’ and ‘Show analyses…’. In the ‘Show Analyses’ window, ensure ‘Show spectrogram’, ‘Show pitch’, ‘Show intensity’, and ‘Show Pulses’ are selected and change the 'longest analysis (s):' from 10 to 200 or 300. Then select ‘Apply’ and ‘OK’.
In the same sound window, go to ‘Pitch’ and ‘Pitch settings…’ and in the section ‘Pitch range (Hz):’ enter the numbers 60 and 600 as the minimum and maximum F0. Then select ‘Apply’ and ‘OK’.
In the same sound window, go to ‘Intensity’ > ‘Intensity settings…’ and in the section ‘View range (dB):’ enter the numbers 50 and 100 for the minimum and maximum intensities. Then select ‘Apply’ and ‘OK’.

Part 2: Pre-Recording Sound Check and Sound Level Calibration

A pre-recording sound check and sound level calibration are required immediately before each and every acoustic analysis assessment. Therefore, complete the prompts below immediately before beginning a voice sample recording with an examinee. Step-by-step instructions are outlined below, with a 6-minute video tutorial available for viewing below the written instructions.

Recording Environment

The environment must be quiet enough that the system does not track sound during non-voicing segments and clearly tracks voicing during voicing tasks. The room/recording environment should be quiet enough to record an acoustic signal that can be reliably analyzed for quiet/minimal voice production. In addition, any transient noise sources should be identified and avoided during data acquisition. Ideally, the signal-to-noise ratio (SNR; the level of the voice signal relative to the background noise) should be > 10 dB.
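As a rough illustration of the SNR guideline above, the sketch below computes SNR two ways: as the difference between two level readings in dB, and from the RMS amplitudes of a voiced segment and a silent (room noise) segment. The function names and example numbers are hypothetical and not part of Praat or the ASHA protocol.

```python
import math

def snr_from_levels(voice_db, noise_db):
    """SNR in dB when both levels are already expressed in dB."""
    return voice_db - noise_db

def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_from_samples(voice_samples, noise_samples):
    """SNR in dB from raw amplitudes of a voiced and a silent segment."""
    return 20 * math.log10(rms(voice_samples) / rms(noise_samples))

def meets_guideline(snr_db, minimum_db=10.0):
    """True when the SNR exceeds the > 10 dB target stated above."""
    return snr_db > minimum_db

# Hypothetical readings: soft /a/ at 58 dB, empty-room noise at 41 dB.
snr = snr_from_levels(58.0, 41.0)
print(snr, meets_guideline(snr))
```

If the check fails for the softest voicing, the options are the ones already listed in the sound check: adjust the gain, move the microphone closer, or move to a quieter environment.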

Microphone Positioning

Ensure the microphone is 4-10 centimeters from the sound source (i.e., the person vocalizing).
If the microphone being used is omnidirectional, then be sure to position the microphone at a 45º angle from the center of the mouth. If it is unidirectional, then ensure it is directly in front of the mouth. Note: the ASHA-recommended guidelines are to use an omnidirectional microphone.

Pre-Recording Sound Check of Soft and Loud Vocalizations

In the Praat Objects window, click ‘New’ > ‘Record mono Sound…’ > ‘Record’
Sustain /a/ as soft as possible without it being a whisper. The soft voicing should be detected on the Meter in the SoundRecorder window. If it is not detected, then adjust the gain level (if using a pre-amp) and/or move the microphone closer or move to a quieter environment.
Sustain an /a/ that is ~2x the loudness of normal speaking. The loud voicing should not go in the red zone. If it goes into the red zone, adjust the gain level (if using a pre-amp) and/or move the microphone further away. Repeat until the loud voicing is in the yellow zone (but not red zone) and verify the soft voice is still being detected.
Make note of the mic-to-mouth distance for future evaluations. This should ideally be 4-10 cm from the mouth, but will likely be further if you are using a computer's built-in microphone (e.g., during a class assignment) rather than a head-mounted or handheld microphone (e.g., clinical practice).

Sound Level Calibration

Set up a sound-level meter ~30 cm from the front of the speaker’s mouth. If you do not have a Class 1 or Class 2 sound level meter, you can use the NIOSH Sound Level Meter app on your mobile phone. The sound level meter I currently use in clinical practice is by Extech Instruments (Model #: 407730) (Note: I have no financial relationships or conflicts of interest - just sharing the model as a point of reference).
Sustain an /a/ at a habitual loudness.
Record the average dB measured on the decibel reader.
‘Stop’ the recording > ‘Save to list & Close’ > ‘View & Edit’ the sound file > then measure the dB of the habitual /a/.
Determine the difference in dB between the sound-level meter and Praat. Add this difference to all the dB outcomes measured with Praat. For example, if the sound-level meter measured 70 dB but Praat measured 61 dB, then you should add 9 dB (i.e., 70 dB – 61 dB) to all the dB outcomes.
Exit out of the pre-recording sound check. You are now ready to begin the acoustic voice analysis.
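The calibration arithmetic described above can be sketched in a few lines. This is only an illustration using the 70 dB and 61 dB values from the example; the function names are hypothetical.

```python
def calibration_offset(meter_db, praat_db):
    """Difference between the sound-level meter and Praat for the same habitual /a/."""
    return meter_db - praat_db

def calibrate(praat_db, offset_db):
    """Apply the calibration offset to any dB value measured in Praat."""
    return praat_db + offset_db

# Values from the example above: the meter read 70 dB, Praat read 61 dB.
offset = calibration_offset(70.0, 61.0)

# A later Praat reading (55 dB here is a made-up value) is corrected the same way.
print(offset, calibrate(55.0, offset))
```

The offset only holds for the mic-to-mouth distance and gain used during the sound check, which is why the calibration is repeated before each assessment.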

Part 3: Voice Sample Recording

Now that you have completed the pre-recording sound check and the sound level calibration, you are ready to obtain the voice sample recording. ASHA recommends obtaining the following voice samples:

Sustained vowels: Sustain the vowel /a:/ at a habitual level (habitual pitch and loudness) holding pitch and loudness as constant as possible for 3-5 s on one comfortable breath. Repeat this task three times.
Standard reading passage: Read a typed passage (adults: first paragraph of the Rainbow Passage [Fairbanks, 1960]; children who can read: “The Trip to the Zoo” [Fletcher, 1972]) at comfortable pitch and loudness.
Loudness range: (a) Sustain the vowel /a:/ as quietly as possible for at least 2 s without whispering. Repeat this task three times. (b) Sustain the vowel /a:/ as loudly as possible for at least 2 s. It is recommended that this task be repeated three times.
Pitch range: (a) Sustain the vowel /a:/ as high in pitch as possible (including falsetto/loft) for at least 2 s. Repeat this task three times. (b) Sustain the vowel /a:/ as low in pitch as possible (in the modal register without the inclusion of fry/pulse register) for at least 2 s. Repeat this task three times. Note that the highest and lowest pitches also may be obtained either by using a pitch glide or in a stepwise fashion (Zraick, Nelson, Montague, & Monoson, 2000).

Record all of the above in a single audio clip for Praat analysis. In addition to the above, I also recommend recording the following voice tasks in the same audio file:

CAPE-V (vowels, sentences, and spontaneous speech)
Laryngeal diadochokinesis (DDK) 'uh' and 'huh'
Maximum phonation time and s/z ratio - as low-cost alternatives for aerodynamic assessment of voice (the validity of these measures is controversial, so interpret with a grain of salt)

My current preferred protocol (updated as of November 2025) is outlined below. A printable sheet that patients can use as a guide when you elicit the tasks can be downloaded here.

Sustained /a/ in typical speaking voice for 5 seconds (1x)
Sustained /i/ in typical speaking voice for 5 seconds (1x)
CAPE-Vr sentences
The Rainbow Passage (2nd and 3rd sentences)
Updated Cookie Theft Picture Description (15 seconds)
Maximum Phonation Time (1x)

Recording the Voice Samples

Select ‘Record’.
Record voice samples into a single audio clip.
Select ‘Stop’.
Rename the sound file.
Select ‘File’ > ‘Save as WAV file…’ > ‘Save’.
Then select ‘Save to list & close’.

(The video below represents the ASHA recommended protocol, not my current protocol)

Part 4: Manual Analysis

Now that you obtained the voice recordings, you are ready to begin the formal acoustic analysis to obtain a variety of speech and non-speech acoustic voice measures. Step-by-step instructions are outlined below, with an 18-minute video tutorial available for viewing.

Non-Speech Acoustic Voice Measures

Mean Habitual F0, jitter (local; %), shimmer (local; %), HNR (dB)

Select the middle 2-3 seconds of the sustained /a/ from the CAPE-V - avoid the onset and offset of the /a/.
Select ‘Pulses’ > ‘Voice report’.
In the voice report window, locate Median pitch (used for Mean Habitual F0), Jitter (local; %), shimmer (local; %), and Mean harmonics-to-noise ratio (dB).
Input these values into your evaluation report.
Exit the voice report window.
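For readers curious about what the "local" jitter and shimmer values in the voice report represent, here is a minimal sketch of the standard definition: the mean absolute difference between consecutive cycles divided by the mean value, expressed as a percentage. Jitter applies this to cycle periods and shimmer to cycle peak amplitudes. This is a simplification that leaves out the period and amplitude limits Praat applies, and the example values are made up.

```python
def local_perturbation_percent(values):
    """Mean absolute difference between consecutive values, divided by
    the mean value, as a percentage.

    Pass glottal cycle periods (s) for jitter (local) or cycle peak
    amplitudes for shimmer (local).
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value

# Hypothetical cycle-to-cycle measurements from a sustained /a/.
periods_s = [0.0100, 0.0101, 0.0099, 0.0100, 0.0101]
amplitudes = [0.50, 0.52, 0.49, 0.51, 0.50]

print("jitter (local, %):", local_perturbation_percent(periods_s))
print("shimmer (local, %):", local_perturbation_percent(amplitudes))
```

A perfectly periodic signal gives 0% for both, which is one way to see why selecting a steady mid-vowel portion matters.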

Mean Habitual SPL (dB)

The mean dB of the sustained /a/ will be displayed in green (make sure it is highlighted). Alternatively, go to ‘Intensity’ > ‘Get intensity’.
Input the mean habitual intensity into the evaluation report.

Smoothed Cepstral Peak Prominence (CPPS)

Using the same 2-3 second selection of the sustained /a/, select ‘File’ or ‘Sound’ (depending on the version of Praat you have) > ‘Extract selected sound (time from 0)’.
Exit out the waveform/spectrogram window so that you are at the Praat Objects window.
A new untitled object will appear in the Praat Objects window which contains the portion of the /a/ you had previously selected. Rename this object to ‘sv’.
Select the ‘sv’ object, then select ‘Analyse periodicity’ and ‘To PowerCepstrogram’. Ensure settings are as follows: Pitch floor (Hz): 60; Time step (s): 0.002; Maximum frequency (Hz): 5000.0; Pre-emphasis (Hz): 50. Then select ‘OK’.
A new object will appear in the Praat Objects window named ‘PowerCepstrogram sv’. Highlight this object, then select ‘Query’ and then select ‘Get CPPS…’. In the window that pops up, ensure settings are as follows: Time averaging window (s): 0.01; Quefrency averaging window (s): 0.001; Peak search pitch range (Hz): 60.0 and 330.0; Tolerance (0-1): 0.05; Interpolation: parabolic; Trend line quefrency range (s): 0.001 and 0.05; Trend type: Straight; and Fit method: Robust. Then select ‘OK’.
The value shown in the window that gets generated is the CPPS. Enter this into the evaluation report.

Minimum F0, maximum F0, and F0 range

'View & Edit' the audio file containing all the voicing tasks.
Click on ‘Pitch’ and ‘Pitch settings’, and temporarily change the maximum pitch to 1300. Click ‘Apply’ and ‘OK’
Zoom in on the approximate location of the maximum and minimum pitch glide tasks.
Highlight the middle 0.5-1.0 second of the highest and, separately, the lowest F0. If F0 was not sustained for 2 seconds, or if there is noise artifact, then manually identify the highest and lowest F0 by clicking the cursor on the part of the blue contour that represents the highest/lowest F0. Input values into the evaluation report.
Change the F0 settings back so that the maximum F0 matches the value you set in Part 1 (600). Then click ‘Apply’ and ‘OK’.
Use an online conversion tool (e.g., http://www.homepages.ucl.ac.uk/~sslyjjt/speech/semitone.html) to determine the number of semitones between the minimum and maximum F0 to get the pitch range. Input this into the evaluation report.
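If you would rather not rely on an online tool, the conversion is the standard equal-tempered formula: semitones = 12 × log2(maximum F0 / minimum F0). A minimal sketch, with a hypothetical function name and example frequencies:

```python
import math

def semitone_range(f0_min_hz, f0_max_hz):
    """Number of semitones between the minimum and maximum F0."""
    if f0_min_hz <= 0 or f0_max_hz <= 0:
        raise ValueError("F0 values must be positive")
    return 12 * math.log2(f0_max_hz / f0_min_hz)

# Hypothetical example: 110 Hz up to 440 Hz spans two octaves.
print(semitone_range(110.0, 440.0))
```

Each doubling of F0 is one octave (12 semitones), which gives a quick sanity check on whatever value the tool or the code returns.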

Minimum SPL (dB), maximum SPL (dB), and SPL range (dB)

Zoom in on the location of the minimum and maximum /a/ loudness tasks.
Repeat the steps you used for the minimum and maximum pitch to obtain the minimum and maximum loudness.
  - As mentioned previously, you should add the difference in dB obtained during Part 2’s sound level calibration to all the dB measures obtained in Praat when inputting them into the evaluation report.
To determine SPL range, subtract minimum loudness from maximum loudness, then input into the evaluation report.
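The loudness steps above combine two pieces of arithmetic: adding the calibration difference to each Praat reading, then subtracting the minimum from the maximum. A small sketch, with hypothetical names and made-up readings:

```python
def calibrated_spl_summary(min_praat_db, max_praat_db, offset_db):
    """Return (minimum SPL, maximum SPL, SPL range) after calibration.

    offset_db is the sound-level-meter reading minus the Praat reading
    from the pre-recording sound level calibration.
    """
    min_spl = min_praat_db + offset_db
    max_spl = max_praat_db + offset_db
    return min_spl, max_spl, max_spl - min_spl

# Hypothetical Praat readings of 52 dB (softest) and 91 dB (loudest),
# with a 9 dB calibration offset.
print(calibrated_spl_summary(52.0, 91.0, 9.0))
```

Because the same offset is added to both readings, it cancels out of the range; only the reported minimum and maximum SPL change with calibration.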

Acoustic Analysis: Continuous Speech Voice Measures

Mean SPL (dB), Mean F0, F0 SD, CPPS

Highlight and zoom in on the connected speech sample of the main audio file (e.g., The Rainbow Passage)
Record the mean dB, seen on the right in neon green (or by selecting ‘Intensity’ > ‘Get Intensity’)
Select ‘Pulses’ > ‘Voice Report’. In the Pitch section of voice report, report the Median pitch for “Mean F0” and the Standard Deviation for the F0SD.
Select ‘File’ or ‘Sound’ (depending on the version of Praat you have) > ‘Extract selected sound (time from 0)’. Then exit out of the waveform/spectrogram window. A new untitled object will appear in the Praat Objects window which contains the connected speech you had previously selected. Rename this object to ‘cs’.
Obtain the connected speech CPPS by repeating the same steps used for the sustained vowel CPPS, this time selecting the ‘cs’ object.

Acoustic Analysis: Acoustic Voice Quality Index (AVQI)

Download the AVQI script: https://drive.google.com/file/d/1G0xU0MN7vfjfLIFhPO_77bCPQhzeIkB_/view
Ensure the ‘sv’ and ‘cs’ files are still listed in the Praat Objects window.
Click ‘Praat’ and then ‘New Praat script’.
Open the AVQI script you just downloaded, then copy and paste its text into the ‘New Praat Script’ window.
Then click ‘Run’ > ‘Run’ > ‘OK’. Wait 1-2 minutes and the AVQI will be generated.

Maximum Phonation Time and Laryngeal Diadochokinesis

If you completed maximum phonation time, then consider using Praat to measure the total duration of each task and calculating the measures accordingly. Just highlight the duration of each task and look at the duration indicators at the bottom of the spectrogram window.
If you completed laryngeal diadochokinesis, then consider using the sound waveform and the spectrogram with the pitch/intensity contours to quickly count the number of repetitions and divide by the selected duration (e.g., middle 3 seconds).
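Both measures above reduce to simple arithmetic on the start and end times of the selection shown in Praat. The sketch below is illustrative only; the function names and example times are hypothetical.

```python
def selection_duration(start_s, end_s):
    """Duration of a selection in seconds (e.g., maximum phonation time)."""
    if end_s <= start_s:
        raise ValueError("end time must be after start time")
    return end_s - start_s

def ddk_rate(repetitions, start_s, end_s):
    """Laryngeal DDK rate in repetitions per second over the selection."""
    return repetitions / selection_duration(start_s, end_s)

# Hypothetical example: 15 repetitions of 'uh' counted in the middle
# 3 seconds of the task (selection from 2.0 s to 5.0 s).
print(ddk_rate(15, 2.0, 5.0))

# Hypothetical example: a sustained /a/ selected from 1.25 s to 23.37 s.
print(selection_duration(1.25, 23.37))
```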

Part 5: Semi-Automated Analysis

If you would like a quick way to analyze many of the measures outlined above, in addition to some low-cost aerodynamic measures, then consider downloading this custom-written Praat script and this accompanying documentation sheet, which auto-calculates norms (it looks like you use it online, but don't; just select 'File' and 'Download'). As part of my open-science efforts (free education and science for all!), I have developed this script to be free for use. This script was originally developed as part of a 1-hour seminar for ASHA's 2025 Annual Convention. Check back regularly for updates, as this is my first time writing a Praat script (so there may be bugs) and I expect to update this as new measures get incorporated and developed. Refer to the Aerodynamic Assessments of Voice tutorial section of my website for free information on how to measure vital capacity, Phonation Quotient, and Estimated Mean Flow Rate.

How to use the Praat script:

Save the script to an easily findable folder on your computer.
Ensure you have an object in your Praat objects window labeled as sv (lowercase s and lowercase v). This file should be a sustained vowel task (ideally the middle 3 seconds of a sustained /a/).
Ensure you have an object in your Praat objects window labeled as cs (lowercase c and lowercase s). This file should be a connected speech task (e.g., "The rainbow is a division of white light into many beautiful colors. These take the shape of a long round arch, with its path high above, and its two ends apparently beyond the horizon").
Select the main 'Praat menu', then select 'Open Praat script...'
Select the custom-written Praat script that I have developed, then select Open
Select 'Run' and 'Run'
In the window that populates, enter all requested information, including:
  - patientID: Enter a patient identifier such as patient initials, MRN, etc.
  - dateService: Enter the date of service (it is best practice to use the format of YYYY-MM-DD).
  - meanLoudness sustained a dB: The average dB you recorded using a sound level meter during the sustained /a/ task (see the 'Sound Level Calibration' subsection of Part 2 above). If you did not use a sound level meter to record the average dB during the sustained /a/ task, then leave this field blank.
  - mpt seconds: Enter the duration (in seconds) of the maximum phonation time, if completed. MPT, if entered, will be used to calculate low-cost aerodynamic measures (and in and of itself is considered to be an aerodynamic measure). Example: 22.12
  - vc liters: Enter the patient's vital capacity (in liters). Example: 4.531.
  - outputFileName: This code will generate a .CSV file with all of the analyzed data. In this field, enter the name you want the CSV file to have.
Once you've entered this information, select 'OK'.
The code will then automatically analyze all measures and generate a CSV file with the acoustic and aerodynamic measures. The CSV file will (likely) be located in the same location as where the custom-written Praat script is being stored. So consider creating a folder on your computer with the Praat script and all CSV files that you generate.
See below video for tutorial.
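To give a sense of how the mpt seconds and vc liters fields can feed a low-cost aerodynamic measure, here is a sketch of the Phonation Quotient as it is commonly defined: vital capacity in mL divided by maximum phonation time in seconds. This is an assumption about the general definition, not a copy of what the Praat script does internally, and the function name is hypothetical.

```python
def phonation_quotient(vc_liters, mpt_seconds):
    """Phonation Quotient in mL/s, assuming PQ = vital capacity / MPT.

    Returns None when either value was left blank, so a missing field
    does not produce a misleading number.
    """
    if vc_liters is None or mpt_seconds is None:
        return None
    if mpt_seconds <= 0:
        raise ValueError("MPT must be greater than zero")
    return (vc_liters * 1000.0) / mpt_seconds

# Example values taken from the field descriptions above.
print(phonation_quotient(4.531, 22.12))

# A blank vital capacity field yields no Phonation Quotient.
print(phonation_quotient(None, 22.12))
```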

References:

Patel, R. R., Awan, S. N., Barkmeier-Kraemer, J., Courey, M., Deliyski, D., Eadie, T., ... & Hillman, R. (2018). Recommended protocols for instrumental assessment of voice: American Speech-Language-Hearing Association expert panel to develop a protocol for instrumental assessment of vocal function. American Journal of Speech-Language Pathology, 27(3), 887-905.

Acknowledgments: I'd like to thank Adrián Castillo-Allendes for supplying me with the AVQI script and his assistance in verifying the techniques described above.
