Praat Voice



Voice Analysis Using PRAAT Software and Classification of User Emotional State. December 2019, International Journal of Interactive Multimedia and Artificial Intelligence IP(IP):1.

Advanced speech analysis tools II: Praat and more. Judging from mentions spotted on the Internet, Praat (Dutch for 'talk'), created by Paul Boersma and David Weenink of the Institute of Phonetic Sciences, University of Amsterdam, is currently among the most popular of free, downloadable speech analysis software p.


VoiceSauce is an application, implemented in Matlab, which provides automated voice measurements over time from audio recordings. Inputs are standard wave (*.wav) files and the measures currently computed are:

  • F0
  • Formants F1-F4
  • H1(*)
  • H2(*)
  • H4(*)
  • A1(*)
  • A2(*)
  • A3(*)
  • 2K(*)
  • 5K
  • H1(*)-H2(*)
  • H2(*)-H4(*)
  • H1(*)-A1(*)
  • H1(*)-A2(*)
  • H1(*)-A3(*)
  • H4(*)-2K(*)
  • 2K(*)-5K
  • Energy
  • Cepstral Peak Prominence
  • Harmonic to Noise Ratios
  • Subharmonic to Harmonic Ratio
  • Strength of Excitation

where (*) indicates that the harmonic/spectral amplitudes are reported with and without corrections for formant frequencies and bandwidths. More parameters to be added soon.

Requirements:

VoiceSauce requires Matlab versions 2015 and up. VoiceSauce has been successfully run under Windows (7/10) and Mac. Other operating systems may also work but have not been tested. If you are attempting to run VoiceSauce on a system other than Windows or Mac, you may need to install Tcl/Tk first; this can be obtained on ActiveState's website.

Limitations:

Since many of the parameters estimated by VoiceSauce depend on F0, results are only meaningful for voiced speech. Noisy speech may reduce the accuracy of the F0 estimates and hence of the voice measurements.

The correction formula for the effects of the formant frequencies on harmonic amplitudes works best when there are accurate estimates of the formants. For example, speech produced by a high-pitched voice saying high vowels, with similar F0 and F1 values, may give a poor estimate of F1 and so return inaccurate results for H1*. It is recommended to inspect the formant frequency estimates to verify their validity. Not only the formant frequencies, but also their bandwidths, can cause errors in the corrections; see the documentation for more information.

It has been reported that wav files stored in folders whose names contain non-English characters may cause the formant estimator to fail. Likewise, textgrid files from Praat encoded as 'UCS-2 Big Endian' cannot be read by Matlab and will cause it to crash. Such textgrid files need to be re-saved as ANSI or UTF-8 before they can be used with VoiceSauce; this can be done in e.g. Notepad (Open -> Save As, under encoding select ANSI).
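
For many files, the re-saving step can be scripted instead of done by hand. The following is a minimal sketch in Python, not part of VoiceSauce; the function name `resave_textgrid_as_utf8` is hypothetical, and the sketch assumes that a UCS-2 textgrid starts with a byte-order mark and that a file without one is already UTF-8.

```python
from pathlib import Path


def resave_textgrid_as_utf8(src, dst):
    """Re-encode a UTF-16/UCS-2 textgrid file as UTF-8 and return its text."""
    raw = Path(src).read_bytes()
    if raw[:2] in (b"\xfe\xff", b"\xff\xfe"):
        # The "utf-16" codec uses the byte-order mark to pick big or
        # little endian and drops the mark from the decoded text.
        text = raw.decode("utf-16")
    else:
        # Assumption: no byte-order mark means the file is already UTF-8.
        text = raw.decode("utf-8")
    # Write bytes directly so that line endings are left untouched.
    Path(dst).write_bytes(text.encode("utf-8"))
    return text
```

Writing to a new file name rather than overwriting the original keeps the Praat version available in case the conversion needs to be repeated.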

Computer memory can be an issue. Very long files for which all parameters are to be estimated may cause VoiceSauce to hang up, or to give an Insufficient Memory message. Computing fewer parameters at once, or dividing the files into smaller files, should help. The April 2015 version addresses one cause of such problems - the resources needed by SHR and shrF0.
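
Dividing a long recording into smaller files can be done with any audio tool. As an illustration only, here is a sketch using the Python standard library `wave` module; the function name `split_wav` and the output naming scheme are hypothetical, and the sketch assumes uncompressed PCM wav input.

```python
import wave


def split_wav(path, chunk_seconds, out_prefix):
    """Split a PCM wav file into consecutive chunks; return the new file names."""
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(round(chunk_seconds * src.getframerate()))
        if frames_per_chunk <= 0:
            raise ValueError("chunk_seconds is too short for this sample rate")
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of the input file
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Same channels, sample width and rate as the input; the
                # frame count in the header is corrected when the file closes.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Note that this cuts at fixed time intervals, so a cut may fall inside a voiced stretch; any matching textgrid would also need to be split by other means.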


Download:

Distribution is currently in two forms: (1) m-code for systems with Matlab, and (2) compiled executables for systems without Matlab. Note that the compiled executables require the installation of the Matlab Component Runtime (only needs to be installed once).

Currently compiled executables are only available for Windows systems. We welcome assistance from anyone who would like to provide a legal compiled executable for Macs.

Version changelog is available here. Please let us know about any problems.

The p-code file format was changed from Matlab 2015 onwards. For this reason, support for pre-Matlab 2015 versions has been deprecated. The p-code only affects the Straight F0 estimator.

Current active development version:

Note 1: Due to a licensing issue, Praat has been removed from the package. To install Praat, go to Settings, and under Praat, press 'Install'. Or to install manually, follow the instructions in /Praat/README.txt

Note 2: Snack is working again on OSX - thanks to Sam Gregory for providing a compatible binary version.

Matlab m-code
(v1.37 - Jun 2, 2020)

Compiled Matlab executables - Windows 7/10
(v1.37 - Jun 2, 2020)

VoiceSauce.zip (1.7MB)

Instructions: Unzip and run VoiceSauce.m from Matlab.

Matlab Component Runtime (32-bit) - MCR_R2015b_win32_installer.exe
Matlab Component Runtime (64-bit) - MCR_R2015b_win64_installer.exe
VoiceSauce_bin.zip (6.8MB)

Instructions: Run MCRInstaller (only needs to be done once). Unzip VoiceSauce_bin.zip and run VoiceSauce.exe.

Note: Running VoiceSauce.exe for the first time may take a few minutes to load.

Legacy version (v1.27):

Matlab m-code
(v1.27 - August 15, 2016)

Compiled Matlab executables - Windows XP/Vista/7
(v1.27 - August 15, 2016)

VoiceSauce.zip (9.9MB)

Instructions: Unzip and run VoiceSauce.m from Matlab.

Matlab Component Runtime - MCRInstaller.exe (179MB)
VoiceSauce_bin.zip (15.4MB)

Instructions: Run MCRInstaller.exe (only needs to be done once). Unzip VoiceSauce_bin.zip and run VoiceSauce.exe.

Note: Running VoiceSauce.exe for the first time may take a few minutes to load.


Documentation:

Documentation is available here. Originally written by Chad Vicenik and later expanded by Spencer Lin, this manual is now maintained by Pat Keating, with expert input from Yen Shue. Requests for additions are always welcome. To cite this manual: Chad Vicenik, Spencer Lin, Patricia Keating, and Yen-Liang Shue (current year). Online documentation for VoiceSauce. Available at http://www.phonetics.ucla.edu/voicesauce/documentation/index.html.

Note about running VoiceSauce:

VoiceSauce's Matlab console provides various run-time messages about its dealings with individual input files. These are not necessarily error messages! Unless VoiceSauce actually crashes, or hangs up, while running, you should be able to find .mat output files in the folder you specified, and you should be able to produce a .txt output file from these. Most notably, 'Multicue failed: switching to exstraightsource' is not an error message and does not mean that VoiceSauce has crashed. See the documentation for more information about this message.

Companion software:

EggWorks: A free program by Henry Tehrani, created for the NSF Voice project to analyze EGG signals (closing quotients, peak increase in contact) in batch mode; also includes utilities for splitting .pmf files into separate .wav files, for inverting .wav files, and for converting .wav files from 32- to 16-bit.
EggWorks can be found here (download link is at the bottom of the page).

Acknowledgements:

This work was supported in part by grants from the NSF to UCLA.

How to cite:

The original reference for VoiceSauce is Yen Shue's dissertation: Y.-L. Shue (2010), The voice source in speech production: Data, analysis and models. UCLA dissertation.
VoiceSauce is described in this paper: Shue, Y.-L., P. Keating, C. Vicenik, K. Yu (2011) VoiceSauce: A program for voice analysis, Proceedings of the ICPhS XVII, 1846-1849.
DO NOT BE FOOLED BY the bogus citation that Google Scholar has somehow concocted (a supposed 2010 paper in the supposed journal 'Energy', with pages H1-A1!).

Questions, bug reports, and comments to yshue@ucla.edu.

New functionality was introduced in Praat 5.3.14 and is still under development. Future versions will allow the direct creation of Vocal Tract Tiers from LPC objects and LPC filtering of (source) sounds with sample-frequencies and durations that differ from those of the LPC analysis. Be aware that this code has not been tested extensively, so there will be bugs.

Links to Real Time MRI and articulatory synthesis

  • Seeing Speech, Collected Works on Real-Time Imaging of Speech Production by Erik Bresch, Department of Electrical Engineering, University of Southern California
    • Example 4 Nice video of articulator movements while speaking the Rainbow text, with colored region markings
  • VocalTractLab: Towards high-quality articulatory speech synthesis. Contains very nice demonstration videos
    • Dona Nobis Pacem singing articulators in 2D (2007)
    • Salvete based on Canon in D by Pachelbel singing articulators in 3D (2010)
  • MULTIMODAL SPEECH SYNTHESIS, KTH Stockholm

Examples of manipulations using Vocal Tract Area functions in Praat

In Praat it is possible to calculate a vocal tract area function that is equivalent to a certain (vowel) sound. The sound can then be resynthesized using the calculated vocal tract area function as a filter. The vocal tract area functions can be manipulated and modified before resynthesis. In the list below, you find some example vocal tract area functions of sustained /a/, /i/, and /y/, and the resynthesized sounds.

Female speaker

  • /a/ speech female voice (original)
  • /a/ vocal tract of female voice (acoustic, 42 segments)
  • /i/ speech female voice (original)
  • /i/ vocal tract of female voice (acoustic, 42 segments)
  • /y/ speech female voice (original)
  • /y/ vocal tract of female voice (acoustic, 44 segments)

Blend two vocal tracts: Paste the lips of an /y/ onto the vocal tract of an /i/. That is, append the last four segments of the /y/ to the /i/ vocal tract function, adapt the length, etc.:

Male speaker

  • /a/ speech male voice (original)
  • /a/ vocal tract of male voice (acoustic, 42 segments)
  • /i/ speech male voice (original)
  • /i/ vocal tract of male voice (acoustic, 42 segments)
  • /y/ speech male voice (original)
  • /y/ vocal tract of male voice (acoustic, 44 segments)

Blend two vocal tracts: Paste the lips of an /y/ onto the vocal tract of an /i/. That is, append the last two segments of the /y/ to the /i/ vocal tract function, adapt the length, etc.:

Attaching measured areas to Vocal Tract Area functions in Praat

Take measured areas from MRI slices of the lips, and attach them to an existing Vocal Tract Area function. Start with the recordings of /i/ and /y/ of the female speaker above. Areas for her lips were determined using an MRI image. Starting from the teeth (X=0) go outward. Only every third slice was used. Slice thickness was 1.4064 mm and the area value is positioned at slice midpoint. All values are recalculated to meters.

X (m)        /i/ (m2)      /y/ (m2)      /a/ (m2)
0.0007032    0.00024051    0.00017821    0.00062801
0.0049224    0.000366      0.00012811    0.00043362
0.0091416    0.00035899    0.00008623    0.00037098
0.0133608    -             0.00001303    0.00039874
0.01758      -             -             0.00037381
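
The X positions in the table can be reproduced from the slice thickness. The sketch below is one reading of the description, assuming that X is the midpoint of every third slice counted from the teeth; the function name `slice_midpoints_m` is hypothetical.

```python
SLICE_THICKNESS_MM = 1.4064  # slice thickness given in the text
SLICE_STEP = 3               # only every third slice was used


def slice_midpoints_m(n_values):
    """Midpoints (in meters) of every third slice, starting at the teeth (X=0)."""
    return [
        round((0.5 + SLICE_STEP * k) * SLICE_THICKNESS_MM / 1000.0, 7)
        for k in range(n_values)
    ]


for x in slice_midpoints_m(5):
    print(x)
```

The first midpoint lies half a slice (0.7032 mm) from the teeth, and each following value is three slices (4.2192 mm) further out.
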
Start with the original recorded vowels /i/ and /y/ from the female voice. Convert them to LPC -> VocalTract with order 30 and length 0.17 (/i/ VocalTract) and order 32 and length 0.1756 (/y/ VocalTract). Replace the last three sections in the original /i/ VocalTract with the values from the table for /i/ and for /y/. For the /y/ table values, adapt the number of sections of the resulting VocalTract to 31 and the length to 0.1756. The same is done for the last four sections of the original /y/ VocalTract, but now the number of sections for the /i/ table values is reduced to 31 and the length to 0.17. The two original and four new VocalTracts can then be resynthesized as was done above.

Starting with the original recorded /i/

  • /i/ VocalTract (order 30, length 0.17)
  • /i/ resynthesis (sound)
  • /i/ with lips of /i/ VocalTract (order 30, length 0.17)
  • /i/ with lips of /i/ resynthesis (sound)
  • /i/ with lips of /y/ VocalTract (order 31, length 0.1756)
  • /i/ with lips of /y/ resynthesis (sound)

Starting with the original recorded /y/

  • /y/ VocalTract (order 32, length 0.1756)
  • /y/ resynthesis (sound)
  • /y/ with lips of /i/ VocalTract (order 31, length 0.17)
  • /y/ with lips of /i/ resynthesis (sound)
  • /y/ with lips of /y/ VocalTract (order 32, length 0.1756)
  • /y/ with lips of /y/ resynthesis (sound)

From Vocal Tract area functions to speech

The following table explains how to get from a Vocal Tract to a synthetic sound. For synthesis, a 'Source' sound is needed that supplies the driver of the Vocal Tract filter. In normal speech, the source sound is produced by the glottal folds, or voice box. You can generate a source as specified below. Note that the sample frequency of the source sound, in kHz, has to be equal to the number of segments in the Vocal Tract. For instance, if you have 40 segments (tubes), you need a source sampled at 40 kHz. Use the Praat Resample... function to perform the resampling. The length of the Vocal Tract Tier must be exactly the same as the length of the Source sound. Below, we take a duration of 3 seconds in the presented examples. The audio example is 5 seconds long.
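
The two constraints in this paragraph can be checked before synthesis. The following Python sketch only restates the rules given above; the function names `source_sample_rate_hz` and `check_source` are hypothetical and are not Praat commands.

```python
def source_sample_rate_hz(n_segments):
    """Sample rate the source needs: the number of segments, read as kHz."""
    return n_segments * 1000


def check_source(n_segments, source_rate_hz, source_duration_s, tier_duration_s):
    """Return a list of problems; an empty list means both rules are met."""
    problems = []
    needed = source_sample_rate_hz(n_segments)
    if source_rate_hz != needed:
        problems.append(
            f"resample the source from {source_rate_hz} Hz to {needed} Hz"
        )
    if abs(source_duration_s - tier_duration_s) > 1e-9:
        problems.append(
            f"source is {source_duration_s} s but the tier is {tier_duration_s} s"
        )
    return problems


print(source_sample_rate_hz(40))          # 40 segments need a 40 kHz source
print(check_source(40, 40000, 3.0, 3.0))  # both rules met
```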

Here is an example generated by determining the vocal tract area function at a point in a recorded /a/ and one at a corresponding point in a recorded /i/ from the same speaker. The voice source signal is entirely synthetic.

To test the synthesis, you can use the standard vocal tracts in Praat or create a Vocal Tract from recorded speech. The standard phone Vocal Tracts can be created in Praat from New->Articulatory synthesis->Create Vocal Tract from phone... . To create a Vocal Tract from recorded speech, simply read in the recording and convert it to LPC with the Formants & LPC ->LPC (autocorrelation)... options. Enter the number of segments you want in your Vocal Tract as the prediction order. Then use To VocalTract (slice)... to generate the Vocal Tract object. Save it with Save->Save as short text file... . Note that there is a rather convoluted relationship between the LPC prediction order, the sample frequency, the recorded sound and the quality of the resulting LPC model.

You can download Praat from www.praat.org

Action | Praat | Script

Synthesize sound
Read VocalTract file | Open->Read from file... | Read from file... a.VocalTract
Convert to Vocal Tract Tier | To VocalTractTier... | To VocalTractTier... 0 3 0.5
Convert Tier to LPC | To LPC... | To LPC... 0.005
Select both LPC and Source audio file | Option/Control select source audio Sound | plus Sound Source
Filter Source with LPC | Filter... | Filter... no
Resample to 10kHz | Convert->Resample... | Resample... 10000 50

Generate Source sound
Create an empty PitchTier object | New->Tiers->Create PitchTier... | Create PitchTier... Source 0 3
Add a high starting point at 120Hz | Modify->Add point... | Add point... 0 120
Add a low end point at 100Hz | Modify->Add point... | Add point... duration 100
Convert it into a phonation sound | Synthesize->To Sound (phonation)... | To Sound (phonation)... 40000 1 0.05 0.7 0.03 3 4 no
Scale to a nice intensity | Modify->Scale intensity... | Scale intensity... 70

Create Vocal Tract
Read audio file | Open->Read from file... | Read from file... a.wav
Convert to LPC with prediction order 40 for 40 tube segments | Formants & LPC -> LPC (autocorrelation)... | To LPC (autocorrelation)... 40 0.025 0.005 50
Convert LPC to Vocal tract, use slice at 2 seconds and a total vocal tract length of 20 cm | To VocalTract (slice)... | To VocalTract (slice)... 2 0.20
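
The source generated above has a pitch contour defined by just two points, 120 Hz at the start and 100 Hz at the end. As an illustration of what that contour looks like, here is a small Python sketch, assuming linear interpolation between the points and a constant value outside them; the function name `pitch_at` is hypothetical and the code does not call Praat.

```python
def pitch_at(points, t):
    """Pitch in Hz at time t for a time-sorted list of (time, hz) points."""
    if t <= points[0][0]:
        return points[0][1]   # before the first point: hold its value
    if t >= points[-1][0]:
        return points[-1][1]  # after the last point: hold its value
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            # Straight line between the two neighbouring points.
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)


duration = 3.0  # seconds, as in the examples above
contour = [(0.0, 120.0), (duration, 100.0)]
print(pitch_at(contour, 1.5))  # halfway through: 110.0
```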

Example files /i/ /a/

  • VocalTractExample.praat: Synthesizer script
  • CreateVocalTracts.praat: Script to create example VocalTracts from example audio
  • a.wav: Example sustained /a/ recording (natural speech)
  • a.VocalTract: Example /a/ VocalTract
  • i.wav: Example sustained /i/ recording (natural speech)
  • i.VocalTract: Example /i/ VocalTract
  • a_i_synthesis.wav: Example /a/-/i/ synthesis

Example files /i/ /y/ (LPC order 44)

  • i2_y_i2_synthesis.praat: Example /i/-/y/-/i/ praat script
  • i2.wav: Example sustained /i/ recording (natural speech)
  • i2.VocalTract: Example /i/ VocalTract (4.0 s)
  • y.wav: Example sustained /y/ recording (natural speech)
  • y.VocalTract: Example /y/ VocalTract (3.5 s)
  • i2_y_i2.VocalTractTier: Example /i/-/y/-/i/ Vocal tract tier
  • i2_y_i2_synthesis.wav: Example /i/-/y/-/i/ synthesis
These files are all Copyright © 2012 NKI-AVL and R.J.J.H. van Son, and are licensed under the GNU GPL v3 or later (see below).

Vocal Tract tube models

The Vocal Tract area functions model the human vocal tract as a set of connected tubes with variable width. Determining the tube, or segment, areas with LPC is not very reliable. Below are presented tube models as determined with LPC (prediction order 40) and the 'theoretical' models as given by Praat New->Articulatory synthesis->Create Vocal Tract from phone... .

  • Vocal Tract tube model of /a/ (example)
  • Vocal Tract tube model of /i/ (example)
  • Standard Vocal Tract tube model of /a/
  • Standard Vocal Tract tube model of /i/
  • Vocal Tract tube model of /y/ (example)
  • Standard Vocal Tract tube model of /y/


VocalTract file format

The example VocalTract file below is created with Save as short text file... . There is a more descriptive (longer) format that is obtained with Save as text file... .


File content | Meaning

File type = "ooTextFile" | The line by which Praat can recognize your file
Object class = "VocalTract 2" | The line that tells Praat about the contents
(blank) | Empty line
0 | xmin: First segment (Glottis, meter)
0.2 | xmax: Last segment (Lips, meter)
40 | nx: Number of segments
0.005 | dx: Segment length (m)
0.0025 | x1: Position of first segment
1 | ymin: NA
1 | ymax: NA
1 | ny: NA
1 | dy: NA
1 | y1: NA
0.00010813061971705616 | Area in m2
0.00010390570341053334 | Area in m2
8.903563828398031e-05 | Area in m2
0.00010876151465927323 | Area in m2
.... | Many more values
0.008175693406171154 | Area in m2
0.0013459947563683344 | Area in m2
0.04293933951717365 | Area in m2
0.000489118171677886 | Area in m2
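
The layout above is simple enough to read outside Praat. The following Python sketch parses the short text format exactly as laid out in the table: two header lines, ten numeric fields, then nx area values. The function name `parse_vocaltract_short` is hypothetical; it accepts either quote style in the header lines as a precaution and computes each segment position as x1 + i * dx.

```python
FIELDS = ("xmin", "xmax", "nx", "dx", "x1", "ymin", "ymax", "ny", "dy", "y1")


def parse_vocaltract_short(text):
    """Return (header dict, list of (position_m, area_m2)) from a short text file."""
    # Drop blank lines, including the empty line after the two header lines.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    file_type = lines[0].replace("'", '"')
    object_class = lines[1].replace("'", '"')
    if file_type != 'File type = "ooTextFile"':
        raise ValueError("not a Praat text file")
    if not object_class.startswith('Object class = "VocalTract'):
        raise ValueError("not a VocalTract object")
    values = [float(v) for v in lines[2:]]
    header = dict(zip(FIELDS, values[: len(FIELDS)]))
    areas = values[len(FIELDS):]
    nx = int(header["nx"])
    if len(areas) != nx:
        raise ValueError(f"expected {nx} areas, found {len(areas)}")
    positions = [header["x1"] + i * header["dx"] for i in range(nx)]
    return header, list(zip(positions, areas))
```

Applied to the example file, this would give 40 segments with positions from 0.0025 m (glottis end) towards 0.2 m (lips).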

VocalTractExample.zip: all files

License