Voice Analysis Using PRAAT Software and Classification of User Emotional State (International Journal of Interactive Multimedia and Artificial Intelligence, December 2019)
Advanced speech analysis tools II: Praat and more
Judging from mentions spotted on the Internet, Praat (Dutch for 'talk'), created by Paul Boersma and David Weenink of the Institute of Phonetic Sciences, University of Amsterdam, is currently among the most popular free, downloadable speech analysis programs.
VoiceSauce is an application, implemented in Matlab, which provides automated voice measurements over time from audio recordings. Inputs are standard wave (*.wav) files and the measures currently computed are:
- F0
- Formants F1-F4
- H1(*)
- H2(*)
- H4(*)
- A1(*)
- A2(*)
- A3(*)
- 2K(*)
- 5K
- H1(*)-H2(*)
- H2(*)-H4(*)
- H1(*)-A1(*)
- H1(*)-A2(*)
- H1(*)-A3(*)
- H4(*)-2K(*)
- 2K(*)-5K
- Energy
- Cepstral Peak Prominence
- Harmonic to Noise Ratios
- Subharmonic to Harmonic Ratio
- Strength of Excitation
where (*) indicates that the harmonic/spectral amplitudes are reported both with and without corrections for formant frequencies and bandwidths. More parameters are to be added soon.
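To make the uncorrected harmonic measures concrete, here is a minimal sketch of H1-H2 for a single frame with a known F0. It is not VoiceSauce's implementation: it assumes F0 is already given, uses a Hann window, searches ±10% around each harmonic on a 1 Hz grid (all assumptions chosen for illustration), and applies no formant correction, so it corresponds to the plain H1-H2, not H1*-H2*.

```python
import cmath
import math


def harmonic_amplitude_db(frame, fs, freq, search=0.1, step_hz=1.0):
    """Peak Hann-windowed DFT magnitude (dB) within +/- search*freq of freq."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    xw = [x * w for x, w in zip(frame, window)]
    best = 0.0
    f = freq * (1 - search)
    while f <= freq * (1 + search):
        # Direct DFT evaluation at an arbitrary frequency f (Hz).
        acc = sum(x * cmath.exp(-2j * math.pi * f * i / fs) for i, x in enumerate(xw))
        best = max(best, abs(acc))
        f += step_hz
    if best == 0.0:
        return float("-inf")  # silent frame
    return 20 * math.log10(best)


def h1_minus_h2(frame, fs, f0):
    """Uncorrected H1-H2 in dB: amplitude of harmonic 1 minus harmonic 2."""
    return harmonic_amplitude_db(frame, fs, f0) - harmonic_amplitude_db(frame, fs, 2 * f0)
```

For a frame whose second harmonic has half the amplitude of the first, the result should be close to 20·log10(2) ≈ 6 dB.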
Requirements:
VoiceSauce requires Matlab 2015 or later. It has been run successfully under Windows (7/10) and Mac. Other operating systems may also work but have not been tested. If you are attempting to run VoiceSauce on a system other than Windows or Mac, you may need to install Tcl/Tk first; this can be obtained from ActiveState's website.
Limitations:
Since many of the parameters estimated by VoiceSauce depend on F0, results are only meaningful for voiced speech. Noise in the recording may reduce the accuracy of the F0 estimates and hence of the voice measurements.
The correction formula for the effects of the formant frequencies on harmonic amplitudes works best when there are accurate estimates of the formants. For example, speech produced by a high-pitched voice saying high vowels, with similar F0 and F1 values, may give a poor estimate of F1 and so return inaccurate results for H1*. It is recommended to inspect the formant frequency estimates to verify their validity. Not only the formant frequencies, but also their bandwidths, can cause errors in the corrections; see the documentation for more information.
It has been reported that wav files in folders whose names contain non-English characters may cause the formant estimator to fail. Similarly, TextGrid files from Praat encoded as 'UCS-2 Big Endian' cannot be read by Matlab and will cause it to crash. Such TextGrid files must be re-saved as ANSI or UTF-8 before they can be used with VoiceSauce, which can be done in e.g. Notepad (Open -> Save As, then select ANSI under Encoding).
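When many TextGrids need converting, the Notepad step can be scripted. A minimal sketch, assuming the files are UTF-16/UCS-2 with or without a byte-order mark (big-endian when no mark is present); the function name `textgrid_to_utf8` is hypothetical:

```python
from pathlib import Path


def textgrid_to_utf8(src, dst):
    """Re-save a UTF-16 / UCS-2 TextGrid as UTF-8 without a byte-order mark."""
    raw = Path(src).read_bytes()
    if raw.startswith(b"\xfe\xff") or raw.startswith(b"\xff\xfe"):
        # A byte-order mark is present: the 'utf-16' codec reads and strips it.
        text = raw.decode("utf-16")
    else:
        # No mark: assume big-endian, as in the 'UCS-2 Big Endian' case.
        text = raw.decode("utf-16-be")
    # Write bytes directly so line endings are kept exactly as they were.
    Path(dst).write_bytes(text.encode("utf-8"))
```

Writing to a separate `dst` path keeps the original file until the converted copy has been checked in VoiceSauce.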
Computer memory can be an issue. Very long files for which all parameters are estimated may cause VoiceSauce to hang or to report an Insufficient Memory error. Computing fewer parameters at once, or splitting the files into shorter ones, should help. The April 2015 version addresses one cause of such problems: the resources needed by SHR and shrF0.
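Splitting a long recording into shorter files can be done with Python's standard `wave` module. This is a sketch under simple assumptions: uncompressed PCM input and fixed-length chunks. It ignores TextGrid alignment, so chunk boundaries may cut through segments of interest; the name `split_wav` and the output naming scheme are hypothetical.

```python
import wave


def split_wav(src, chunk_seconds, prefix):
    """Split a PCM WAV file into consecutive chunks of at most chunk_seconds.

    Returns the list of written file paths: prefix_000.wav, prefix_001.wav, ...
    """
    out_paths = []
    with wave.open(src, "rb") as w:
        params = w.getparams()
        frames_per_chunk = int(chunk_seconds * w.getframerate())
        index = 0
        while True:
            frames = w.readframes(frames_per_chunk)
            if not frames:
                break
            path = f"{prefix}_{index:03d}.wav"
            with wave.open(path, "wb") as out:
                out.setparams(params)  # the frame count in the header is patched on write
                out.writeframes(frames)
            out_paths.append(path)
            index += 1
    return out_paths
```

The last chunk is simply shorter than the others; no padding is added.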
Download:
Distribution is currently in two forms: (1) m-code for systems with Matlab, and (2) compiled executables for systems without Matlab. Note that the compiled executables require the installation of the Matlab Component Runtime (only needs to be installed once).
Currently compiled executables are only available for Windows systems. We welcome assistance from anyone who would like to provide a legal compiled executable for Macs.
Version changelog is available here. Please let us know about any problems.
The p-code file format changed with Matlab 2015. For this reason, support for Matlab versions before 2015 has been deprecated. The p-code only affects the Straight F0 estimator.
Current active development version:
Note 1: Due to a licensing issue, Praat has been removed from the package. To install Praat, go to Settings and, under Praat, press 'Install'. To install it manually instead, follow the instructions in /Praat/README.txt.
Note 2: Snack is working again on OSX, thanks to Sam Gregory for providing a compatible binary version.
Matlab m-code | Compiled Matlab executables - Windows 7/10 |
VoiceSauce.zip (1.7MB) Instructions: Unzip and run VoiceSauce.m from Matlab. | Matlab Component Runtime (32-bit) - MCR_R2015b_win32_installer.exe Instructions: Run MCRInstaller (only needs to be done once). Unzip VoiceSauce_bin.zip and run VoiceSauce.exe. Note: Running VoiceSauce.exe for the first time may take a few minutes to load. |
Legacy version (v1.27):
Matlab m-code | Compiled Matlab executables - Windows XP/Vista/7 |
VoiceSauce.zip (9.9MB) Instructions: Unzip and run VoiceSauce.m from Matlab. | Matlab Component Runtime - MCRInstaller.exe (179MB) Instructions: Run MCRInstaller.exe (only needs to be done once). Unzip VoiceSauce_bin.zip and run VoiceSauce.exe. Note: Running VoiceSauce.exe for the first time may take a few minutes to load. |
Documentation:
Documentation is available here. Originally written by Chad Vicenik and later expanded by Spencer Lin, this manual is now maintained by Pat Keating, with expert input from Yen Shue. Requests for additions are always welcome. To cite this manual: Chad Vicenik, Spencer Lin, Patricia Keating, and Yen-Liang Shue (current year). Online documentation for VoiceSauce. Available at http://www.phonetics.ucla.edu/voicesauce/documentation/index.html.
Note about running VoiceSauce:
VoiceSauce's Matlab console provides various run-time messages about its dealings with individual input files. These are not necessarily error messages! Unless VoiceSauce actually crashes or hangs while running, you should find .mat output files in the folder you specified, and you should be able to produce a .txt output file from them. Most notably, 'Multicue failed: switching to exstraightsource' is not an error message and does not mean that VoiceSauce has crashed. See the documentation for more information about this message.
Companion software:
EggWorks: A free program by Henry Tehrani, created for the NSF Voice project to analyze EGG signals (closing quotients, peak increase in contact) in batch mode; also includes utilities for splitting .pmf files into separate .wav files, for inverting .wav files, and for converting .wav files from 32- to 16-bit.
EggWorks can be found here (download link is at the bottom of the page).
Acknowledgements:
This work was supported in part by grants from the NSF to UCLA.
How to cite:
The original reference for VoiceSauce is Yen Shue's dissertation: Y.-L. Shue (2010), The voice source in speech production: Data, analysis and models. UCLA dissertation.
VoiceSauce is described in this paper: Shue, Y.-L., P. Keating, C. Vicenik, K. Yu (2011) VoiceSauce: A program for voice analysis, Proceedings of the ICPhS XVII, 1846-1849.
DO NOT BE FOOLED BY the bogus citation that Google Scholar has somehow concocted (a supposed 2010 paper in the supposed journal 'Energy', with pages H1-A1!).
Questions, bug reports, and comments to yshue@ucla.edu.
New functionality was introduced in Praat 5.3.14 and is still under development. Future versions will allow the direct creation of Vocal Tract Tiers from LPC objects and LPC filtering of (source) sounds with sample-frequencies and durations that differ from those of the LPC analysis. Be aware that this code has not been tested extensively, so there will be bugs.
Links to Real Time MRI and articulatory synthesis
- Seeing Speech, Collected Works on Real-Time Imaging of Speech Production by Erik Bresch, Department of Electrical Engineering, University of Southern California
- Example 4: Nice video of articulator movements while speaking the Rainbow text, with colored region markings
- VocalTractLab: Towards high-quality articulatory speech synthesis. Contains very nice demonstration videos
- Dona Nobis Pacem: singing articulators in 2D (2007)
- Salvete (based on Canon in D by Pachelbel): singing articulators in 3D (2010)
- MULTIMODAL SPEECH SYNTHESIS, KTH Stockholm
Examples of manipulations using Vocal Tract Area functions in Praat
In Praat it is possible to calculate a vocal tract area function that is equivalent to a certain (vowel) sound. The sound can then be resynthesized using the calculated vocal tract area function as a filter. The vocal tract area functions can be manipulated and modified before resynthesis. The lists below give example vocal tract area functions of sustained /a/, /i/, and /y/, together with the resynthesized sounds.
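The classical way to get from a sound to an equivalent tube model goes via LPC: the Levinson-Durbin recursion on the frame's autocorrelation yields reflection coefficients, and each reflection coefficient fixes the area ratio between two neighbouring tube segments. The sketch below shows that textbook route in plain Python. It is not Praat's implementation: pre-emphasis and windowing are omitted, and the sign convention and the end of the tube (glottis or lips) that comes first are assumptions, since both vary between textbooks. It only yields relative areas; absolute areas need an assumed reference area.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def reflection_coefficients(r, order):
    """Levinson-Durbin recursion: return reflection coefficients k_1..k_order.

    Predictor polynomial convention: A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    ks = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        ks.append(k)
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # prediction error shrinks at every order
    return ks


def relative_areas(ks, first_area=1.0):
    """Chain A[m+1] = A[m] * (1 - k_m) / (1 + k_m) into a list of areas.

    Assumed convention; other texts flip the sign of k or the tube direction.
    """
    areas = [first_area]
    for k in ks:
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas
```

For a valid autocorrelation every |k| is below 1, so all areas stay positive; the number of segments equals the prediction order plus one reference area.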
Female speaker
- /a/ speech female voice (original)
- /a/ vocal tract of female voice (acoustic, 42 segments)
- /i/ speech female voice (original)
- /i/ vocal tract of female voice (acoustic, 42 segments)
- /y/ speech female voice (original)
- /y/ vocal tract of female voice (acoustic, 44 segments)
Blend two vocal tracts: paste the lips of a /y/ onto the vocal tract of an /i/. That is, append the last four segments of the /y/ vocal tract function to the /i/ vocal tract function and adapt the length, etc.
Male speaker
- /a/ speech male voice (original)
- /a/ vocal tract of male voice (acoustic, 42 segments)
- /i/ speech male voice (original)
- /i/ vocal tract of male voice (acoustic, 42 segments)
- /y/ speech male voice (original)
- /y/ vocal tract of male voice (acoustic, 44 segments)
Blend two vocal tracts: paste the lips of a /y/ onto the vocal tract of an /i/. That is, append the last two segments of the /y/ vocal tract function to the /i/ vocal tract function and adapt the length, etc.
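The blend for both speakers (the last four /y/ segments for the female voice, the last two for the male voice) amounts to list splicing on the area values. A minimal sketch, assuming the areas are ordered from glottis to lips as in the VocalTract file format and that all segments share one length `dx`; the name `paste_lips` is hypothetical, and the Praat-side steps of reading and saving the VocalTract are not shown.

```python
def paste_lips(base_areas, lip_source_areas, n_lip_segments, dx):
    """Append the last n_lip_segments of lip_source_areas to base_areas.

    Areas are assumed to run from glottis (first) to lips (last).
    Returns the blended area list and the new tract length (segments * dx).
    """
    if not 0 < n_lip_segments <= len(lip_source_areas):
        raise ValueError("n_lip_segments must be between 1 and the source length")
    blended = list(base_areas) + list(lip_source_areas[-n_lip_segments:])
    return blended, len(blended) * dx
```

Returning the new length alongside the areas makes the "adapt length" step explicit: appending segments lengthens the tract unless `dx` is rescaled.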
Attaching measured areas to a Vocal Tract Area function in Praat
Take measured areas from MRI slices of the lips and attach them to an existing Vocal Tract Area function. Start with the recordings of /i/ and /y/ of the female speaker above. The areas of her lips were determined from an MRI image, starting at the teeth (X=0) and going outward. Only every third slice was used. The slice thickness was 1.4064 mm, and each area value is positioned at the slice midpoint. All values are converted to meters (positions in m, areas in m2).
X (m) | /i/ (m2) | /y/ (m2) | /a/ (m2) |
0.0007032 | 0.00024051 | 0.00017821 | 0.00062801 |
0.0049224 | 0.000366 | 0.00012811 | 0.00043362 |
0.0091416 | 0.00035899 | 0.00008623 | 0.00037098 |
0.0133608 | - | 0.00001303 | 0.00039874 |
0.01758 | - | - | 0.00037381 |
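The X column follows directly from the slice geometry: with every third slice of thickness 1.4064 mm and values at slice midpoints, row k sits at (3k + 0.5) × 1.4064 mm. A small sketch that reproduces the column (the function name is illustrative):

```python
SLICE_THICKNESS_M = 1.4064e-3  # slice thickness from the text, in meters


def slice_midpoints(n_rows, step=3, thickness=SLICE_THICKNESS_M):
    """Midpoint positions (m) of every `step`-th slice, measured from the teeth."""
    return [(k * step + 0.5) * thickness for k in range(n_rows)]
```

`slice_midpoints(5)` gives 0.0007032, 0.0049224, 0.0091416, 0.0133608 and 0.01758 m (up to floating-point rounding), matching the table.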
Starting with the original recorded /i/
- /i/ VocalTract (order 30, length 0.17)
- /i/ resynthesis (sound)
- /i/ with lips of /i/ VocalTract (order 30, length 0.17)
- /i/ with lips of /i/ resynthesis (sound)
- /i/ with lips of /y/ VocalTract (order 31, length 0.1756)
- /i/ with lips of /y/ resynthesis (sound)
Starting with the original recorded /y/
- /y/ VocalTract (order 32, length 0.1756)
- /y/ resynthesis (sound)
- /y/ with lips of /i/ VocalTract (order 31, length 0.17)
- /y/ with lips of /i/ resynthesis (sound)
- /y/ with lips of /y/ VocalTract (order 32, length 0.1756)
- /y/ with lips of /y/ resynthesis (sound)
From Vocal Tract area functions to speech
The following table explains how to get from a Vocal Tract to a synthetic sound. For synthesis, a 'Source' sound is needed that drives the Vocal Tract filter. In normal speech, the source sound is produced by the vocal folds in the larynx (voice box). You can generate a source as specified below. Note that the sample frequency of the source sound, in kHz, has to equal the number of segments in the Vocal Tract. For instance, if you have 40 segments (tubes), you need a source sampled at 40 kHz. Use the Praat Resample... command to perform the resampling. The duration of the Vocal Tract Tier must be exactly the same as that of the Source sound. The examples below use a duration of 3 seconds; the audio example is 5 seconds long.
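The "segments in kHz" rule is consistent with the classical lossless-tube relation, in which the round-trip travel time through one segment equals one sampling period, so fs = N·c / (2L) for N equal segments over a tract of length L. With an assumed L = 0.17 m and speed of sound c = 340 m/s, c/(2L) is 1000 Hz per segment. The sketch only evaluates that relation; this page does not state whether Praat uses it internally, and the defaults are assumptions.

```python
SPEED_OF_SOUND_M_S = 340.0  # assumed speed of sound in the vocal tract


def tube_sampling_rate(n_segments, tract_length_m, c=SPEED_OF_SOUND_M_S):
    """Sampling rate (Hz) at which one sample = round trip through one segment."""
    return n_segments * c / (2.0 * tract_length_m)
```

For example, `tube_sampling_rate(40, 0.17)` is (up to rounding) 40000 Hz; the required rate scales with 1/L for tracts of other lengths.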
Here is an example generated by determining the vocal tract area function at a point in a recorded /a/ and one at a corresponding point in a recorded /i/ from the same speaker. The voice source signal is entirely synthetic.
To test the synthesis, you can use the standard vocal tracts in Praat or create a Vocal Tract from recorded speech. The standard phone Vocal Tracts can be created in Praat from New->Articulatory synthesis->Create Vocal Tract from phone... . To create a Vocal Tract from recorded speech, simply read in the recording and convert it to LPC with the Formants & LPC -> LPC (autocorrelation)... options. Enter the number of segments you want in your Vocal Tract as the prediction order. Then use To VocalTract (slice)... to generate the Vocal Tract object. Save it with Save->Save as short text file... . Note that there is a rather convoluted relationship between the LPC prediction order, the sample frequency, the recorded sound and the quality of the resulting LPC model.
You can download Praat from www.praat.org
Action | Praat | Script |
Synthesize sound | ||
Read VocalTract file | Open->Read from file... | Read from file... a.VocalTract |
Convert to Vocal Tract Tier | To VocalTractTier... | To VocalTractTier... 0 3 0.5 |
Convert Tier to LPC | To LPC... | To LPC... 0.005 |
Select both LPC and Source audio file | Option/Control select source audio Sound | plus Sound Source |
Filter Source with LPC | Filter... | Filter... no |
Resample to 10kHz | Convert->Resample... | Resample... 10000 50 |
Generate Source sound | ||
Create an empty PitchTier object | New->Tiers->Create PitchTier... | Create PitchTier... Source 0 3 |
Add a high starting point at 120Hz | Modify->Add point... | Add point... 0 120 |
Add a low end point at 100Hz | Modify->Add point... | Add point... duration 100 |
Convert it into a phonation sound | Synthesize->To Sound (phonation)... | To Sound (phonation)... 40000 1 0.05 0.7 0.03 3 4 no |
Scale to a nice intensity | Modify->Scale intensity... | Scale intensity... 70 |
Create Vocal Tract | ||
Read audio file | Open->Read from file... | Read from file... a.wav |
Convert to LPC with prediction order 40 for 40 tube segments | Formants & LPC -> LPC (autocorrelation)... | To LPC (autocorrelation)... 40 0.025 0.005 50 |
Convert LPC to Vocal tract, use slice at 2 seconds and a total vocal tract length of 20 cm | To VocalTract (slice)... | To VocalTract (slice)... 2 0.20 |
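The "Generate Source sound" rows define a 3 s PitchTier falling linearly from 120 Hz to 100 Hz and turn it into a phonation sound at 40000 Hz. As a rough stand-in outside Praat, the sketch below builds a unit impulse train that follows the same linear F0 contour by accumulating phase. It is explicitly not a reproduction of Praat's To Sound (phonation)... glottal model; it only shows how a time-varying F0 turns into pulse positions. The function name is hypothetical.

```python
def pulse_train(duration_s, fs, f0_start, f0_end):
    """Unit impulses at glottal-period spacing for a linear F0 contour.

    F0 is interpolated linearly between f0_start (t=0) and f0_end (t=duration),
    like a PitchTier with two points. Phase accumulation places one pulse
    each time a full period has elapsed.
    """
    n = int(round(duration_s * fs))
    out = [0.0] * n
    phase = 0.0  # fraction of the current glottal period already elapsed
    for i in range(n):
        t = i / fs
        f0 = f0_start + (f0_end - f0_start) * t / duration_s
        phase += f0 / fs
        if phase >= 1.0:
            phase -= 1.0
            out[i] = 1.0
    return out
```

With `pulse_train(3.0, 40000, 120.0, 100.0)` the number of pulses is about the mean F0 times the duration, roughly 110 Hz × 3 s = 330. Accumulating phase instead of recomputing 1/F0 per pulse keeps the spacing smooth as F0 changes.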
Example files /i/ /a/
- VocalTractExample.praat: Synthesizer script
- CreateVocalTracts.praat Script to create example VocalTracts from example audio
- a.wav: Example sustained /a/ recording (natural speech)
- a.VocalTract: Example /a/ VocalTract
- i.wav: Example sustained /i/ recording (natural speech)
- i.VocalTract: Example /i/ VocalTract
- a_i_synthesis.wav: Example /a/-/i/ synthesis
Example files /i/ /y/ (LPC order 44)
- i2_y_i2_synthesis.praat: Example /i/-/y/-/i/ praat script
- i2.wav: Example sustained /i/ recording (natural speech)
- i2.VocalTract: Example /i/ VocalTract (4.0 s)
- y.wav: Example sustained /y/ recording (natural speech)
- y.VocalTract: Example /y/ VocalTract (3.5 s)
- i2_y_i2.VocalTractTier: Example /i/-/y/-/i/ Vocal tract tier
- i2_y_i2_synthesis.wav: Example /i/-/y/-/i/ synthesis
Vocal Tract tube models
The Vocal Tract area functions model the human vocal tract as a set of connected tubes of variable width. Determining the tube (segment) areas with LPC is not very reliable. Below are the tube models as determined with LPC (prediction order 40) and the 'theoretical' models as given by Praat New->Articulatory synthesis->Create Vocal Tract from phone... .
Vocal Tract tube model of /a/ (example) | Vocal Tract tube model of /i/ (example) |
Standard Vocal Tract tube model of /a/ | Standard Vocal Tract tube model of /i/ |
Vocal Tract tube model of /y/ (example) | Standard Vocal Tract tube model of /y/ |
VocalTract file format
The example VocalTract file below is created with Save as short text file... . There is a more descriptive (longer) format that is obtained with Save as text file... .
File type = "ooTextFile" | The line by which Praat can recognize your file |
Object class = "VocalTract 2" | The line that tells Praat about the contents |
Empty line | |
0 | xmin: First segment (Glottis, meter) |
0.2 | xmax: Last segment (Lips, meter) |
40 | nx: Number of segments |
0.005 | dx: Segment length (m) |
0.0025 | x1: Position of first segment |
1 | ymin: NA |
1 | ymax: NA |
1 | ny: NA |
1 | dy: NA |
1 | y1: NA |
0.00010813061971705616 | Area in m2 |
0.00010390570341053334 | Area in m2 |
8.903563828398031e-05 | Area in m2 |
0.00010876151465927323 | Area in m2 |
.... | Many more values |
0.008175693406171154 | Area in m2 |
0.0013459947563683344 | Area in m2 |
0.04293933951717365 | Area in m2 |
0.000489118171677886 | Area in m2 |
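A short text file with the layout in the table above can be read back outside Praat with a few lines of Python. This is a minimal sketch that handles only the short text layout shown here (not the longer Save as text file... format with labels); the function name and the returned dictionary keys are hypothetical.

```python
def read_vocaltract_short(text):
    """Parse a VocalTract written with Save as short text file...

    Returns a dict with xmin, xmax, nx, dx, x1 and the list of areas (m2).
    """
    lines = [line.strip() for line in text.splitlines()]
    if "ooTextFile" not in lines[0] or "VocalTract" not in lines[1]:
        raise ValueError("not a short-text Praat VocalTract file")
    values = [line for line in lines[2:] if line]  # drop the empty line(s)
    xmin, xmax = float(values[0]), float(values[1])
    nx, dx, x1 = int(values[2]), float(values[3]), float(values[4])
    # values[5:10] hold ymin, ymax, ny, dy and y1, which are all 1 here.
    areas = [float(v) for v in values[10:10 + nx]]
    if len(areas) != nx:
        raise ValueError(f"expected {nx} areas, found {len(areas)}")
    return {"xmin": xmin, "xmax": xmax, "nx": nx, "dx": dx, "x1": x1, "areas": areas}
```

A quick sanity check on the parsed header is xmax = xmin + nx·dx and x1 = xmin + dx/2; the example above satisfies both (0 + 40 × 0.005 = 0.2 and 0.005 / 2 = 0.0025).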