Kashu-do (歌手道): Fundamentals of Vocal Acoustics: What we need to know as singers

After writing this blog post and viewing the videos, it became clear that the resolution of the screen recordings is too low to read the exact frequency numbers.  The grid of the spectrum view has vertical line markers every 500 Hz.  This should give the viewer an idea of where the frequency peaks lie.  I will edit later to add the information that cannot be seen immediately.

The amount of information available on vocal acoustics is exhaustive and exhausting.  To understand the really profound material, one should have at least a strong physics background.  However, some basic mathematics is enough to understand the parts that make a difference to how we singers think.

The first thing to understand about a tone is that it is much more than what it is named.  Take the tone A4 (the A above middle C, in the fourth octave of scientific pitch notation): 440 Hz, the pitch orchestras tune to, in the soprano’s middle range and the tenor’s high range.  (Hz = Hertz, or oscillations per second, the unit of frequency.)  When we sing or play A4, we are sounding that tone (called the fundamental [F0] or the first harmonic [H1]) plus all its overtones, also called natural harmonics (H2 for the second harmonic, H3 for the third, etc.).  The harmonics are whole-number multiples of the fundamental.  If A4 (440 Hz) is the fundamental (first harmonic, H1), then H2 is 880 Hz, twice the frequency, and H3 is 1320 Hz, three times the frequency of the fundamental (H1).
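The arithmetic is easy to check for any pitch.  Here is a minimal sketch in Python; the function name harmonic_series is only an illustrative choice and does not come from the videos or from any acoustics library.

```python
def harmonic_series(f0_hz, count):
    """Return the first `count` harmonics (H1, H2, ...) of a fundamental in Hz."""
    # Each harmonic Hn is simply n times the fundamental frequency.
    return [f0_hz * n for n in range(1, count + 1)]


# A4 = 440 Hz as the fundamental (H1)
for n, freq in enumerate(harmonic_series(440, 4), start=1):
    print(f"H{n} = {freq} Hz")
```

The same function applied to a fundamental of 1000 Hz gives 1000, 2000, 3000 and 4000 Hz, the approximate values read off the frozen flute display.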

At the end of the video, I freeze the spectrum view and point the cursor to each harmonic:  The fundamental (F0 also called H1 for first harmonic) is about 1000 Hz, H2 is about 2000 Hz, H3 is 3000 Hz and H4 is 4000 Hz.  It is clear that the frozen display shows the harmonics decrease in intensity as they get higher in frequency.  This is standard for a tube like the Irish flute but not necessarily so for a piano.

The lower half of the acoustic piano has keys that activate a felt-covered hammer to strike three strings of equal length and thickness.  Even though the instrument is highly calibrated, the strings are not always struck with equal intensity.  Furthermore, the shape of the piano’s soundboard (its resonator) favors certain frequencies over others.  It is a complex instrument in that way.  Different pianos playing the same note may produce slightly different acoustic patterns.  The rate of decay of the piano’s sound also affects how the acoustic pattern appears, depending on when the display is frozen.  An early freezing of the display (i.e. right after the note is struck) gives us a pattern closer to the acoustic nature of the piano.  The higher notes of a piano, however, are very thin and are produced with two strings instead of three.  The pattern there resembles the Irish flute a bit.  That is because the soundboard (resonator) of a piano favors the lower notes.  Like a violin (most instruments, in fact), the piano has a fixed resonator that acts on different notes with different intensity.

All pitched instruments (instruments that produce a regular oscillation of sound), regardless of makeup and including pitched percussion, will produce the same harmonics.  What gives each instrument its unique color (timbre) can be observed in the relative strength of the harmonics.  Each instrument shows a different pattern of strength across the harmonics of a given note.  In fact, different notes on the same instrument may produce different relative strengths in the harmonics.  In a straight tube, like the Irish flute in the video above, the harmonics tend to decrease in strength the higher they are.  However, depending on the shape of the instrument’s resonator, certain acoustic regions will be stronger than others, and so certain higher harmonics will be stronger, reflecting the specific nature of the instrument’s makeup.  Based on its acoustic makeup, an instrument is expected to produce very predictable resonance.
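To make the idea of timbre as “relative strength of the harmonics” concrete, here is a small sketch in Python using only the standard library.  It builds two tones on the same fundamental, with the same harmonics but different strengths, and then measures each harmonic back out of the waveform.  The amplitude lists are invented for illustration and are not measurements of any real instrument; the function names and the sample rate are likewise my own assumptions.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed; high enough for H4 of 440 Hz)


def synthesize(f0_hz, weights, duration_s=1.0):
    """Add sine waves at H1, H2, ... using the given relative amplitudes."""
    n_samples = int(SAMPLE_RATE * duration_s)
    return [
        sum(w * math.sin(2 * math.pi * f0_hz * (k + 1) * t / SAMPLE_RATE)
            for k, w in enumerate(weights))
        for t in range(n_samples)
    ]


def harmonic_strength(samples, freq_hz):
    """Estimate the amplitude of one frequency in the signal (single-bin DFT)."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * t / SAMPLE_RATE)
             for t, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
             for t, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n


# Two made-up "instruments" playing the same note (A4 = 440 Hz).
flute_like = [1.0, 0.5, 0.25, 0.125]   # harmonics weaken steadily
other_like = [0.4, 1.0, 0.3, 0.8]      # some higher harmonics are stronger

for name, weights in [("flute-like", flute_like), ("other", other_like)]:
    tone = synthesize(440, weights)
    strengths = [round(harmonic_strength(tone, 440 * h), 3) for h in range(1, 5)]
    print(name, strengths)
```

Both tones contain energy at exactly 440, 880, 1320 and 1760 Hz.  Only the pattern of strengths differs, and that pattern is what a spectrum view displays.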

The human vocal tract is flexible and changeable, so it can change its acoustics from one moment to the next.  Those changes have a direct impact on the note being produced at any given moment.  That is the virtue that makes the human voice unique.  It is also what makes it difficult to keep consistent.

We can hear the difference between the three sopranos above singing the same note.  (The three sopranos were recorded in different rooms.  The piano was slightly lower for Soprano 2, who has an exceptional ear.)  The three spectrographs show us differences that are significant.

Soprano 1 is an amateur beginning to coordinate her voice.  The level of breath support is lacking and the spectrograph shows almost equal strength in the harmonics.  This is not desired.  It reveals a certain amount of inefficiency (to be discussed later).  Still a good beginning!

Soprano 2 is more advanced and shows an excellent selection of a relatively narrower range in the lower harmonics than Soprano 1, and a narrow range in the upper harmonics as well.  Selecting areas of strong resonance and leaving others weaker signifies that this singer is already focusing the acoustic energy and not spreading it across the entire spectrum, as Soprano 1 does.

Soprano 3 is similar to Soprano 2, except that her pattern shows a cluster effect in the upper harmonics, suggesting a strong “ring” in the voice.  This is the effect of the Singer’s Formant, which will be discussed later.  It is important to note that all three singers are lyric coloraturas of similar range and vocal size.  The difference is not only acoustic.  These singers are at different levels of experience and underlying muscular development, which affects the strength of the source tone (the laryngeal vibration).

On a given fundamental frequency (F0) the harmonics are predictable.  However, there are two fundamental variables: 1) how the source tone is produced (there are many components to the laryngeal tone) and 2) how the vocal tract is shaped.

For the time being, let us assume the source tone is optimally produced.  We are then left with the possible variations in the vocal tract.  We have to consider which components can vary and how they affect the sound.

The optimal volume of the vocal tract is dependent upon five variables: A) laryngeal depth B) opening of the jaw C) variations on tongue position D) the shape of the lips and E) the position of the soft palate (closing or opening of the velar port: nasal or not nasal).

There are many theories about the soft palate and how high it needs to be.  It basically responds to a desire not to be nasal.  Its function is related to laryngeal functions, including the depth of the larynx (which itself depends on phonation), as well as tongue migration.  The pieces are interconnected.  A weakness in the source tone can affect the ability of the velum to close the passage to the nasal cavity.  A nasal tone has proven to have an adverse effect on the resonance of the vocal tract, producing a weaker and less balanced tone (balance here meaning the balance of low and high harmonics, often called chiaroscuro, or the balance between bright and dark).

The concept of a “low larynx” is commonly accepted as beneficial to resonance.  A high larynx contributes to many vocal faults AND is caused by vocal faults.  The question is rather how to achieve a low larynx without losing other fundamental functions, such as a flexible tongue and a raised palate.  Each function must be achievable without disturbing the others.

The optimum spatial nature of the vocal tract could appear to depend on taste.  Some teachers insist that the jaw has to be relatively closed and that releasing the jaw even slightly will cause a loss of high harmonics.  This is obviously false.  Other teachers insist on the jaw being opened to a height of three fingers.  Others insist that the jaw must be pushed back.  The jaw should open to what I call its natural maximum.


The singers in the photo were told to push the jaw back, as in the pictures on the left.  This was beginning to cause them both discomfort, as is obvious from their expressions.  When they were advised to allow the jaw to release according to its natural contour, the result was the photos on the right.  There the alignment of the lower jaw seems appropriate to each structure, whereas in the left photos, when they attempted to push the jaw toward the back, one can see that the lower jaw is crooked toward the right.  An inappropriate opening or forced closure of the jaw during singing does not make for a high-quality tone.  There are those who have great source tones and can get away with inappropriate resonance adjustments.  These types of singers make a conversation about efficiency difficult to sustain.  I posted these pictures because I encounter many singers with TMD (temporomandibular joint disorder) who acquired it after they began singing lessons.  Many teachers, afraid of a protruding jaw, suggest that the jaw should be pushed back.

A jaw released to its natural maximum (different for each physiognomy), regardless of vowel and through the articulation of most consonants, contributes to a resonance environment of regularity and constancy.  A fully open vocal tract creates the conditions for optimal resonance of lower harmonics, which leaves the tongue as the principal element to partition the vocal tract, creating conditions for a balance between lower and higher harmonics.  When the jaw is released to its natural maximum and the larynx is released low, the tongue must migrate further to create the [i] to [E] spectrum of vowels.  In speech we combine a subtler tongue migration with a closing of the jaw to achieve an [i], so singers assume this is natural.  Yet they usually open the jaw when they produce the same vowel on higher fundamental frequencies (pitches).  An [i] vowel is better balanced when the jaw is released and the larynx is low, creating conditions for optimum resonance of the [i]’s very low first formant (F1; more on this later).  The coordination of released jaw, low larynx and high tongue position is not easy to achieve.  Those who seek immediate gratification and quick results usually take the easy route and close the jaw and/or allow the larynx to rise for [i].  The [a] vowel that comes after such an [i] would usually be weak, because it would require adjustments that do not occur very quickly.

The lips are refiners, and rounding them should only be used for vowels that require rounding, such as [o] and [u] and mixed vowels like [ø] or [y].  There are some specific situations where a slight rounding makes for a better resonance adjustment, but overuse of lip rounding often replaces a low larynx as the means of producing a warmer sound.  Rounding the lips does not produce the same results as a larynx relaxed to its lower position.  Lip rounding tends to dampen high harmonics, rendering the tone warmer but at the expense of the high resonance that is needed for the voice to be heard over loud accompaniments.  A low larynx enriches low partials, giving the voice warmth without eliminating high partials, as long as the tongue is able to migrate naturally and does not muffle the resonance by pushing down on the epiglottis.

This brings us to the tongue.  It is the most agile, multi-faceted and complex muscle we deal with as singers.  If it is not handled with specific expectations and intent, it tends to do what it wants to compensate for weaknesses elsewhere.  When the rest of the vocal tract is optimized (i.e. low larynx, closed velum, released jaw and relaxed lips ready to be shaped “as needed” and not rounded when not needed), the tongue becomes the most important agent of resonance change.  The tongue repartitions the vocal tract to create the fundamental vowel spectrum from [i] through [e], [E], [ae] to [a].  The lips then round to continue from [a] through [O], [o], [U] to [u].  Combining lips and tongue creates mixed vowels such as [y], [Y], [ø], [oe].  Through all these changes the jaw remains at its natural maximum, the larynx floats low and the velar port remains shut.  Here is a very clear, concise and thorough discussion of the tongue’s intrinsic and extrinsic muscles and how they interact.

The vocal tract, like any space, has resonant frequency bands called formants.  Depending on the shape of the vocal tract (what we recognize as vowels), these formant areas move around.  Looking at a spectrograph, vowel formants may be identified by where the strongest harmonics are.  For our purposes, the voice displays five formant areas.  The lowest two have the strongest impact on vowel recognition.  The upper three combine to produce strong higher harmonics that make the voice seem more present.  Formant bandwidths vary with frequency.  The lowest vowel formant (the first formant of the vowel [i]), around 250 Hz, has a bandwidth of around 50 Hz, whereas the highest formant value, around 4000 Hz, has a bandwidth of 200 Hz.

The exact formant frequencies for a given vowel are similar for all singers; however, they do vary subtly between voice types, and probably to a certain degree for each individual, since we do not all have the same size and shape of vocal tract.  A simple way to find formant frequencies is by producing a gentle vocal fry (also called pulse tone).  A vocal fry requires little air pressure, a fact that reduces the strength of the harmonics so much that only the formants are seen:

In this video, I freeze the spectrum view (bottom of the screen) to allow the viewer to see the formant peaks for each of the cardinal vowels ([a, e, i, o, u]).  The peaks also give the exact frequency numbers.  In the video that follows, I sing all five vowels on the pitch f3=267Hz (so named because it is the third f from the bottom of the standard keyboard).  You will see that the peaks in the spectrum are pretty close to those observed in the fry tones for the respective vowels.

What we should take from this is the following:

The sung pitch (the fundamental and its harmonics) is given.  The overtones cannot change.  They are exact multiples of the fundamental.  However, the formants can change their location (frequency).  Assuming the source tone is of good quality, if the sound output is not good, the vocal tract must be adjusted.  This is called vowel modification or formant tracking.  That said, I must say that I observe greater fault in singers’ source tones in general, which then leads to over-manipulation of the movable parts of the vocal tract (i.e. jaw, tongue, lips, soft palate and laryngeal depth).
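One hedged way to picture the relationship between fixed harmonics and a movable formant is to ask how far a formant sits from the nearest harmonic of the sung pitch.  The sketch below is in Python.  The 700 Hz formant value is a made-up number for illustration, not a measurement from the videos, and the function name is hypothetical.

```python
def nearest_harmonic(f0_hz, formant_hz):
    """Return (harmonic number, harmonic frequency, distance from the formant)."""
    # Harmonics are fixed at n * f0, so pick the n whose multiple lies closest.
    n = max(1, round(formant_hz / f0_hz))
    harmonic_hz = n * f0_hz
    return n, harmonic_hz, abs(harmonic_hz - formant_hz)


# Hypothetical example: A4 (440 Hz) sung with a first formant near 700 Hz.
n, harmonic_hz, gap_hz = nearest_harmonic(440, 700)
print(f"Nearest harmonic: H{n} at {harmonic_hz} Hz, {gap_hz} Hz from the formant")
```

The harmonic frequencies in this picture never move while the pitch is held.  Only the formant value can shift, through a change in the shape of the vocal tract.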

The strategy for the classical singer is to use the formants in ways that concentrate the acoustic energy of the vocal tract in specific areas to achieve a balance between low and high harmonics. As previously explained the lower two formants have an effect on vowel recognition, while the upper three can combine to make the voice more present to the human ear.  The human ear is most sensitive in the area between 2000 Hz (2 kHz) and 3000 Hz (3 kHz).  The upper three formants can concentrate acoustic energy in that area making the voice very present to the listener’s ear. 

The fourth formant (F4), called the Singer’s Formant (SF for short), is a special formant that is believed to result from a large ratio between the size of the pharynx and that of the aryepiglottic fold (also called the collar of the larynx).  It is calculated that a ratio of at least 6:1 (it can be 7:1 or 8:1, but not 5:1) creates the conditions for a special resonance at the opening of the aryepiglottic fold into the pharynx.  This resonance (formant) is not vowel dependent.  It depends only on the size relationship between the two structures.  However, faulty production of a vowel can reduce the size of the pharynx, thereby eliminating the conditions for the SF.  Although I believe that any singer can train to produce the SF (I did not have it for a long time and now I do), some singers’ throats are predisposed to the necessary conditions.  The efficiency of the source tone is often influenced by speaking habits, including social and linguistic influences.  The pharynx can be widened with technical exercises.  It is conceivable that someone may have a pharynx that is genetically too small to produce the fourth formant, the SF.  This would not eliminate a singer’s operatic viability.  The third and fifth formants lie between 2 kHz and 3 kHz.  They can give a singer’s voice the presence needed for operatic singing.  However, the presence of the crucial fourth formant can draw in the energies of the third and fifth formants, creating a cluster effect that concentrates the acoustic energy in that region and makes an impressive impact against orchestral accompaniment.  Singers who have this ability have an acoustic advantage and often sound more impressive than their colleagues on stage.

Female singers sing approximately an octave above their male counterparts (that is, alto to bass or soprano to tenor).  If a soprano sings G5, a fourth below her high C, the fundamental frequency is about 800 Hz (784 Hz in equal temperament with A4 = 440 Hz).  Taking the rounded figure, the harmonics would be as follows: H2 = 1600 Hz, H3 = 2400 Hz, H4 = 3200 Hz, etc.  The SF for a soprano is thought to be between 2900 Hz and 3200 Hz, depending on the specific singer.  Even if the bandwidth of the SF were around 200 Hz, its frequency would have to be at least 3000 Hz in order to catch H4 (the fourth harmonic).  Because the harmonics are so far apart, the Singer’s Formant does not always have an effect on the soprano or mezzo voice.  However, there is no reason other than pharyngeal size that would prevent a woman from having the SF in the middle range quite consistently.  There are, however, problems in modern training.  The discovery of the acoustic passaggio (where the first formant loses dominance to the second) in the female lower voice has caused teachers to think of the middle voice as a separate register from a source tone perspective.  Today’s female singers often do not develop the source tone enough in the middle range to achieve harmonics strong enough to carry the influence of the SF resonance.
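The effect of widely spaced harmonics can be sketched with a few lines of Python.  The band of 2900 to 3200 Hz is the soprano SF range quoted in this post, and G5 is rounded to 800 Hz as in the text.  Treating the SF as a hard-edged band is my own simplification, since a real formant boosts nearby harmonics gradually, and the function name is hypothetical.

```python
def harmonics_in_band(f0_hz, low_hz, high_hz):
    """List (harmonic number, frequency) pairs that fall inside a frequency band."""
    found = []
    n = 1
    # Walk up the harmonic series until we pass the top of the band.
    while n * f0_hz <= high_hz:
        if n * f0_hz >= low_hz:
            found.append((n, n * f0_hz))
        n += 1
    return found


SF_BAND = (2900, 3200)  # soprano SF range quoted in the text, in Hz

print("G5:", harmonics_in_band(800, *SF_BAND))  # G5, rounded to 800 Hz
print("A5:", harmonics_in_band(880, *SF_BAND))  # A5 = 880 Hz, one tone higher
```

With these rounded numbers, G5 places H4 at the very top edge of the band, while A5 places no harmonic inside it at all (H3 is 2640 Hz and H4 is 3520 Hz).  That is one way to see why the SF does not always have an effect at these pitches.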

© 08/01/2015

2 thoughts on “Kashu-do (歌手道): Fundamentals of Vocal Acoustics: What we need to know as singers”


  1. Thank you!
    This is a wonderful article full of amazing information that I have been researching on the net.
    I also read and enjoy some of Titze's work, and study voice and Tai Chi!

    I am looking to understand what the female acoustic passaggio truly is. I am talking about the high passaggio.

    I never really understood, or felt, what it means that the passaggio is the place where “the first formant loses dominance to the second”. How can we differentiate the two formants in daily practice? How can we allow one formant to dominate the other in the passaggio?
    I often have a hard time in this passaggio, probably because I have not yet mastered the three-dimensional vocal fold closure as well…

    Please, could you enlighten me?

    And thank you so much for this blog, which is pure gold :-)!


  2. You ask a very interesting set of questions @ Math Flair! The female voice has two major acoustic events (of course we could look at others beyond the traditional operatic range). In the lower passaggio (1st acoustic passaggio around Bb3-G4), the vowels reach first formant limits and the second formant takes over. Yet the first formant does not disappear. In a balanced tone F1 and F2 are never extremely distant from each other in strength. That balance is indeed partly dependent on the balance of the source tone. In my experience, we must develop the source tone appropriately before the acoustics make any sense. The second female acoustic event happens between C5 and G5 approximately where the first formant becomes viable again at lower harmonics–that is, as fundamental frequency rises, the first formant falls between harmonics a bit (except for the [i] and [u] vowels that have very low 1st formants anyway) and loses influence. Eventually, the first formant falls to the next lower harmonic and becomes influential again.

    We can use extreme vowel modification to influence formant frequencies, but this is not the way to use this information. In classical singing, as with all other forms, text is important (si canta come si parla). Vowel modification occurs (almost passively) if laryngeal depth is maintained (this depends upon a balanced source tone, that is, on the folds coordinating with the breath). From one half tone to the next there should be subtle vocalic modification, such that no extreme adjustment becomes necessary.

    I hope this addresses some of your questions. It was a real pleasure to experience your beautiful voice in person.

