Kashu-do (歌手道): STRUCTURE OF THE VOCAL FOLDS: A three-dimensional view

Fig. 1. This two-dimensional view shows the “depth” of the vocal folds on the Y-axis (vertical) and the “width” on the X-axis (horizontal). We are not seeing the “length” of the folds, which would be parallel to the ground.
I chose to begin with this view so that we are aware of “what we are after” as singers. The curlicue blue arrow shows the airway (the path of the breath stream).
Our first issue is the interaction of the breath stream with the vocal folds. The epithelium and the superficial lamina propria (Reinke’s Space) together are referred to as the “Fold Cover”; the other layers, which get progressively stiffer (less flexible), are called the “Body”. We would like the vibration of the vocal folds during singing to be isolated on the fold cover (yellow and blue). Isolating the fold cover gives the voice a sensation of flexibility: the type of sensation we identify as heady, fluid and tension-free! I call this the Flag and Flagpole Effect. For a flag to flutter freely in the wind, it needs the structure of the firm flagpole to steady it.
Fig. 2A. This animated gif simulates the Flag and Flagpole Effect. The body of the fold (the muscular red portion and the medial yellow portion) is still, while the outside layer, the cover, oscillates with the movement of the breath (not seen).
Fig. 2B. This second animation shows a flexible body that allows the entire mass of the folds to participate in the vibration. If the entire mass of the folds is active in the vibration, the sensation is one of tension and inflexibility. Greater breath pressure is needed to maintain the vibration. It is far greater work and it tires the voice. The question remains: how do we get the body of the folds to be firm enough that the breath stream activates the cover only?
Dr. Zhaoyang Zhang at UCLA, in a 2008 article, shows that a stiffer fold body will isolate the vibration of the vocal folds along the mucosal edge. These two images from his article illustrate the different modes of vibration:
Fig. 3A. This first picture represents a model of a loose fold body and cover. When there is not enough antagonism between the Thyro-arytenoid group and the Crico-thyroid, the body (represented by the leftmost blue structure) vibrates with the cover (rightmost blue structure). The red structures represent the same two structures at rest in order to show relative movement. The vibration will tend to be more difficult in such a case. More sub-glottal pressure will be necessary to start and maintain the vibration. (This animated gif and the following one represent the left fold.)

Fig. 3B. This simulation, on the other hand, represents a stiff fold body (leftmost blue structure is still) rendered by contractions of both main muscle groups. This antagonism makes the fold body less mobile and isolates the vibration of the folds along the mucosal edge (cover, represented by the mobile rightmost blue structure). The antagonism between the Thyro-arytenoid and the Crico-thyroid also increases the contact area along the mucosal edge.

Fig. 4. This is a similar two-dimensional view of the folds but more anatomically complete. On the right side of the picture, I drew a red line from the top of the epiglottis around the vestibular fold (false vocal fold), around the true fold and down to the trachea. This layer of tissue is one fold that covers the entire structure. That is why we refer to the vocal cords as folds. Where it reaches the true vocal folds, that tissue forms the outer layer (epithelium). That layer covers the rest of the components of the entire vocal fold structure.

Fig. 4B. Let us concentrate our attention on the two lateral muscles on each side: the Thyroarytenoid (also called External or Thyromuscularis) and the Vocalis muscle (sometimes called Internal or Thyrovocalis). When these two muscles contract, they contract in opposite directions. When it contracts, the Vocalis thickens the vocal folds vertically (it gives them more depth and therefore greater contact area; see the first pictures for perspective, and more on contact area later). The Muscularis contracts the folds in the opposite direction, also helping create greater vertical mass. (The Muscularis also contracts slightly inward, which appears to have a secondary closure function.) When the two muscles are active together in opposition to the Cricothyroid (see below), they create a dynamic that renders them stiffer and less flexible. When the muscles are stiff (and, by proximity, the medial layers as well), the outer layers (the cover) alone respond to the movement of the breath stream. This allows the tone to feel more fluid and less resistant to the airstream: what we identify as the head voice sensation. The action of the Muscularis is at least equivocal. Although its action shortens and thickens the vocal folds, the fact that the CT is pulling on the Thyroid cartilage along the same vector as the Muscularis makes the exact nature of its contraction difficult to gauge. Even though the Muscularis is a thickening partner to the Vocalis, most articles and books give the task of vocal fold thickening to the Vocalis. More interesting is the slightly inward angle of the contraction, which contributes as a secondary adductor.
Given an appropriate dynamic between the two intrinsic muscles of the vocal folds, pitch is pretty much controlled by the contraction or relaxation of the Cricothyroid muscles. They are included in the above picture but not labeled.
Fig. 4C. This picture, very similar to the one above it, has pointers to the cricothyroid muscle.  Let us have an outside view:
Fig. 4D. Looking from the outside, we can see that when the cricothyroid contracts (toward its point of origin, the cricoid cartilage), it pulls the thyroid cartilage downward. Looking at the picture directly above (inside view), it is clear that the vocal folds are attached to the Thyroid cartilage on the inside. When the thyroid cartilage is tilted downward, the vocal folds stretch and are set up for faster oscillation (vibration) and therefore higher frequency (pitch). We will discuss the mechanics of vocal fold oscillation later and how “shallower” folds (less deep) make for faster oscillations and higher pitch. For now let us consider the dynamic antagonism between the three muscle pairs (one set for each fold): Thyro-muscularis, Thyro-vocalis and Crico-thyroid.

The Thyro-arytenoid (TA) pairs (vocalis and muscularis) must relax to allow the stretching/thinning of the vocal folds that makes for faster oscillations and higher frequencies. However, the lengthening of the folds helps maintain the stiffness of the fold body as long as the TA group continues to be active and does not give in completely to the contraction of the Crico-thyroid (CT). Ideally, the thickness (depth) of the folds changes with pitch, but the stiffness of the body is maintained as long as there is enough antagonism (opposing action) between the three muscle groups.
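A rough way to picture why stretching the folds raises pitch is the ideal-string approximation often used in voice science: F0 = sqrt(stress / density) / (2 × length). The sketch below is only an illustration under that assumption. Real folds are layered tissue rather than a uniform string, and the function name, the tissue density and the length and stress values are hypothetical numbers chosen for the example, not measurements.

```python
import math

# Assumed tissue density in kg/m^3 (close to water); an illustrative value only.
TISSUE_DENSITY = 1040.0


def string_f0(length_m, stress_pa, density=TISSUE_DENSITY):
    """Fundamental frequency (Hz) of an ideal string.

    F0 = sqrt(stress / density) / (2 * length)
    length_m  : vibrating length in metres
    stress_pa : longitudinal stress (tension per unit area) in pascals
    """
    if length_m <= 0 or density <= 0 or stress_pa < 0:
        raise ValueError("length and density must be positive, stress non-negative")
    return math.sqrt(stress_pa / density) / (2.0 * length_m)


if __name__ == "__main__":
    # Hypothetical "relaxed" and "stretched" settings of the same fold.
    relaxed = string_f0(length_m=0.016, stress_pa=20_000.0)
    stretched = string_f0(length_m=0.018, stress_pa=80_000.0)
    print(f"relaxed:   {relaxed:.1f} Hz")
    print(f"stretched: {stretched:.1f} Hz")
```

In this simplified picture, lengthening alone would lower the frequency; the pitch rises because the stress grows faster than the length when the folds are stretched. The stiffness of the fold body and the isolation of the cover discussed above are not represented here.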
The question in the singer’s mind will always be:
The singer’s experience is basically sensory. Indeed, we can only activate these muscles by having experienced certain sensations associated with their function. Two basic sensations are Stretch and Substance. When I sing a relaxed high note softly (a sensation akin to falsetto or flute-voice), I have a feeling that I can keep going up without problem, as if I could continually stretch. In fact there is a sensation of lengthening. A sensation of substance, meatiness, full-bodiedness is experienced when I sing a relaxed low note. In both situations, the experience is one-sided. If I increase pressure, the stretch-dominant note tends to go toward falsetto (an over-stretch that cannot endure the increased breath pressure, causing the back of the folds to open; more on fold closure later). Increasing volume on the substance-dominant note is easier, because the vertical fold mass is enough to endure the increased pressure. However, it becomes progressively difficult to rise in frequency because the set-up is one-sided in favor of the Vocalis muscle that governs thickness.
In order to effectuate a gradual crescendo without loss of balanced coordination, both sensations must be engaged before increasing the tone thus (I chose to begin on Db4, right on the muscular balancing point of the tenor voice):

The next two videos show a soprano dealing with balance on both sides of the issue:  the tendency to lose stretch on the way down and to lose substance on the way up.  (Of special note is that the soprano has an excellent F2 dominance in her middle range–This will come up when we discuss resonance).

The state of the tone before the crescendo (we will soon bring breath pressure and fold closure into consideration) is interesting. Is it falsetto or a soft head-tone? By our definition above, head-voice is essentially proper muscular coordination, including appropriate balance between substance and stretch. If the TA group is appropriately balanced throughout the changes in CT contraction and relaxation (pitch), the vibration will be isolated on the cover and the tone will feel heady and released. However, we take for granted that fold closure and breath pressure play appropriate roles.


What happens if the two folds close too hard against each other? The fold cover would be pressed against the body (muscular layer) and would not be free to oscillate.  In such a case the vibration would have to include the entire fold structure (including the body).  The amount of breath pressure would have to be very high to maintain the vibration of greater horizontal mass, including a relatively still body.  This is why pressed phonation does not work and why it is not a remedy for breathy phonation.  Breathy phonation often occurs when the vertical mass (induced by the contraction of the Vocalis) is inadequate or the muscles responsible for bringing the folds to midline are not working adequately.
This is where I theorize (only because this has not been observed with scientific protocol yet): I believe that when the vertical phase is too shallow, it takes the shape of a higher frequency (pitch) than is desired. Therefore, the tendency is to sing sharp (higher frequency). To compensate, the folds press together to slow down the opening phase of the vibration. In this way the intended frequency is achieved; however, the tone is pressed and the vibration includes the body. A singer who does not like the tension that comes with this pressed mode of singing might reduce the breath pressure by allowing the arytenoidal juncture (the back of the folds) to open, creating a gap that allows the air to pass through. There are many who use this pressed mode of phonation with leakage through the arytenoidal gap without knowing they are doing it. It can be done subtly or not so subtly. When it is a minor compensation it does not sound bad, and it can be difficult to convince a successful singer to change. I have observed this strategy in many high voices, particularly coloraturas and Rossini tenors. This approach is also common among singers of early music.
While we are on the subject of fold depth and closure, it is worthwhile to mention that a recent paper by Harry Hollien (University of Florida), “Vocal Fold Dynamics for Frequency Change” (Journal of Voice, July 2014), confirmed that fold mass is basically the same for a given fundamental frequency (pitch) regardless of who the singer is. This means that fold thickness for a coloratura or a bass is basically the same when they are singing the same pitch. What is different is the relative longitudinal tension (tautness) of the vocal folds on the given pitch. A coloratura singing C4 (middle C, or C1 in the European system) is in her lower range and thus has relatively relaxed folds, whereas a bass singing the same note is in his upper range and has much more longitudinal tension:

What was not expected was the relatively high correlation between vocal fold thickness and absolute fundamental frequency of phonation…As can be seen, the thickness of the folds appears to be reasonably similar at each fundamental frequency no matter if the subject was male or female or had a high-pitched or low-pitched voice.  Thus, it appears that the per-unit mass of the folds relates to the frequency produced no matter how massive (or not) these structures are naturally. (Hollien p. 400)

Hollien later explains the correlation between thickness of the folds, variations in length and overall mass.  Although the bass folds must lengthen considerably to achieve C4, in the end, the vibrating mass is the same as with the coloratura who does not have to lengthen very much to sing the same pitch.  This very complex experiment at least tells us there is an optimal fold depth and length index for a given pitch produced by a given voice.  If that depth/length relationship (Stretch and Substance) is not achieved, there must be compensatory measures (usually pressing and raised breath pressure).

At this juncture, we can pedagogically conclude that during phonation of a given pitch, there must be a balance between fold thickness (depth) and lengthening that adheres to a gentle closure of the folds such that the fold cover is not trapped.   The next area of concern is therefore how the muscles that govern fold closure (Lateral Crico-Arytenoids and Inter-arytenoids) respond to increased breath pressure.

Fig. 5A. The right Posterior Cricoarytenoid (PCA) muscle is removed in this picture to feature a clear view of the Lateral Cricoarytenoid (LCA).

Fig. 5B. The rendering above takes all obstructive tissue away so we can see how the LCA attaches to the muscular process of the arytenoid. The black dot represents a swivel point. Muscles contract in the direction of their point of origin, and they are named by point of origin and then point of insertion: the CA is so named because it originates at the Cricoid and inserts into the Arytenoid (thus Crico-Arytenoid). When the muscle contracts in the direction of the Cricoid (unseen here), the arytenoid swivels, bringing the vocal processes (where the vocal folds insert) inward and closing the glottis. It should be noticed that the swiveling of the arytenoids inward also creates a gap in the back. The arytenoids also have the ability to rock inward where the gap is. This is controlled by the Inter-Arytenoids (IA).
Fig. 6A. The picture above shows both sets of Interarytenoids (IA): transverse and oblique.  The transverse go across parallel between the arytenoids.  When they contract they bring the arytenoids closer together and close the gap.  The obliques do the same but draw the arytenoids in diagonally. Both actions are necessary to completely close the posterior gap.

Fig. 6B. This picture gives a clearer view of the arytenoids and shows more clearly the layers of muscles.

Are these muscles strong enough to maintain gentle closure even when breath pressure increases for volume? In other words, loudness has the potential to disrupt balance if one of the muscles cannot maintain its proper function when pressure is applied. There must be a means of strengthening these muscles in balance (we shall discuss the logic behind occlusives later).

Just to be thorough, I must mention the Posterior Crico-arytenoid (PCA). It is responsible for abducting (drawing apart) the folds. Muscular activity has been observed in the PCA during phonation, which would be unexpected. I have a couple of theories on that. Since all muscles are paired, it is possible that when the adductors (Lateral CA) are dominant (as expected during phonation), the abductors (Posterior CA) provide counterbalance. It is also possible, given the vector of its contraction, that the PCA counters the vector of the Crico-thyroid, which stretches the folds for pitch.

Finally, I must address the secondary adductive function of the Thyromuscularis (external TA). I mentioned above that this muscle contracts slightly inward and, since its vector is more or less the same as that of the CT, when the folds are elongated they tend to come together a little more. This secondary adduction must be taken into account. Sometimes inefficiency occurs not because the IAs or the LCAs are functioning inadequately but rather because the folds are not lengthened enough for the desired pitch. There can be many variations on how a given fundamental frequency is obtained. It is theoretically possible that the vibratory cycle occurs without the top of the folds closing. This mode of vibration would be possible for folds that are too deep (TA-hyperfunction). This is the second sound I demonstrated on the first clip.

I will stop here for now.  We will continue soon with breath, resonance, etc…

© 07/08/2015

7 thoughts on “Kashu-do (歌手道): STRUCTURE OF THE VOCAL FOLDS: A three-dimensional view”

  1. (My comment from yesterday didn't seem to go through, so I'll make a second try!)

    Thanks for a thorough treatment! I will need to go through this slowly, and also read the interesting article you pointed to (Just downloaded it). Luckily, I have a full day of travel tomorrow to go through the stuff!

    I am quite surprised at the quoted conclusion that the vibrating mass should be the same for the same absolute pitch regardless of whether a bass or a soprano is singing the note. Can this really be true? I have always thought that the difference should be similar to the difference between a double bass and a violin playing the same pitch. That is, the vibrating mass of the bass string is higher, and to compensate for that, the tension is higher. (In general, the simplest “lumped” model of vibration is that the square of the frequency is proportional to the tension divided by the mass.)

    Well, I will read the article and will return with thoughts about this fascinating issue!


  2. This is an argument I made quite a while ago, and it always seemed logical to me. Although the mass would be close to the same, the longitudinal tension would be different. The folds are more relaxed for a soprano singing F4 than a bass singing the same pitch. The soprano is in her low range while the bass is at his top. How is the cello string different? Even if the overall mass is considerably larger than the violin, because the strings are stopped, isn't the vibrating portion more or less the same? The mechanism of fold vibration is slightly different in the way the air is propagated. Could this play a part?


  3. Isn't the difference therefore more a question of the percentage of the overall mass that is isolated to vibrate, and of what acoustic function is played by the non-vibrating portion of the instrument in question, such that a cello playing the same note has a greater overall mass and a smaller percentage of the string is actively vibrating?


  4. So, now I've gone through the article by Hollien. The article, although published in 2014, does not really report any new findings, but reviews and compiles results, where most of the primary measurements were done already in the beginning of the 1960s.

    It should be stressed that trained singers are explicitly excluded from Hollien's studies!

    The main question I have about Hollien’s article is: what quantities are really measured? A main conclusion (Tables 4 and 5) is that what the author calls total vocal fold mass (TVFM) is more or less constant for each individual across all pitches! Moreover, the TVFM is strongly correlated to voice type; its mean values, from which it differs very little for different pitches, are 558, 342, 221, 137 (unit unclear) for low male, high male, low female, and high female, respectively.

    This sounded at first very strange to me, until I realized that TVFM likely doesn't correspond to effective vibrating mass! What the authors of the original articles have done is to look at ordinary and X-ray pictures of the vocal folds and measure the horizontal length of the fold and the area of a vertical cross section of the folds, that is, roughly the red, yellow, and orange regions of your animated gifs above. From this area, which also includes the underlying muscle, a vocal fold depth is calculated (Hollien calls it “thickness”, but it really seems to be the depth). This depth seems to correlate strongly with absolute pitch across voice types; it decreases rapidly with increasing pitch. The TVFM is then defined as the length multiplied by the depth. (To me this seems like area rather than mass, but OK…)

    With this in mind, the conclusion that the TVFM is constant for each person but varies with voice type simply tells me that the total volume of the vocal folds doesn’t change, which seems absolutely reasonable. The shape changes, but not the volume; it simply has nowhere to go! What happens is that the depth decreases and the length increases with increasing pitch, which also seems reasonable because of CT action.

    However, a large portion of TVFM is likely not active at all in the vibration, and I don’t think the results give much indication about the effective vibrating mass at different pitches and for different voice types.


  5. I understand, Martin. What is significant here, however, is that the “area” X x Y is relatively constant for all voice types on a given F0. It is understood that the Z axis, which represents the actual A-P length, is not considered in its entirety. What he calls mass is X times Y times a given Z value (I believe). It is understood that the different voice types have different total A-P lengths (Z). Since the vibration cycle for the voice is based primarily on the value of the X-axis, it is significant that the value of X is the same irrespective of voice type. Whether or not someone produces an efficient tone that isolates the fold cover is a question of refinement. That the X value is the same means that we can expect certain norms relative to F0 production. That is significant. More focused studies need to be done on the finer variations of this “cross-sectional mass” to determine the parameters of efficiency among professionals and between pros and non-pros.

    Longitudinal tension is obviously going to have an effect on what part of the horizontal thickness will be active in vibration, and we should not expect the actual vibratory mass to be the same among singers. Two tenors of different weight will have different thickness of fold cover. That is not going to have an effect on the speed of the vibration cycle. What does is whether the X value is constant for both, and I think the findings of this study imply that the X values will be closely similar across voice types.


  6. OK, you talk about X, Y, and Z, but I have to say that I don’t understand your coordinate system 😦 And as far as I can understand, there is no coordinate system defined in the article either… Anyway, what Hollien calls total vocal fold mass (TVFM) is, in his terminology, L x T, length times “thickness”. (You may have misunderstood this since you talk about the length as the “Z value”.) And what Hollien calls “thickness” is really what you call “depth”. (If I don’t misread him, that is. The article is not crystal clear everywhere.) So “mass” for him is length times depth.

    Moreover, the conclusion from the article is that it is the depth of the vocal fold that is relatively constant for each absolute pitch across voice types, and the depth shrinks very strongly with increasing pitch. This is certainly a very significant conclusion! In contrast, TVFM differs a lot between voice types, but it is more or less constant over the pitch range. This makes me suspect that TVFM is really a measure of the total amount of tissue, which of course does not change. The shape of the folds changes under muscle action, but the total volume does not.

    I think what Hollien really is studying is the general, gross posture of the folds, where the most significant conclusion, from what I understand, is the lengthening of the folds with pitch and the strong thinning of depth with pitch. However, which parts of the folds are active in the vibration, and how much, is not really studied, except perhaps in the stroboscopic pictures of figure 8, which seem to indicate that it is mostly the top part of the folds that vibrates at that pitch. I suspect that this type of phonation hurts a little; it may have been me before starting studying with you 🙂
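The arithmetic in this exchange can be sketched in a few lines. The sketch assumes, following the reading in the comments above, that TVFM is simply length times “thickness” (depth) and stays constant for one individual across pitch. The mean TVFM values are the ones quoted above (unit unclear); the function names, the unit base length and the 25 % stretch are hypothetical and only there to illustrate the relationship.

```python
# Mean TVFM per voice type as quoted in the comments (unit unclear in the source).
MEAN_TVFM = {
    "low male": 558.0,
    "high male": 342.0,
    "low female": 221.0,
    "high female": 137.0,
}


def thickness_from_tvfm(tvfm, length):
    """Thickness (depth) implied by TVFM = length * thickness."""
    if length <= 0:
        raise ValueError("length must be positive")
    return tvfm / length


def relative_thickness_after_stretch(stretch_ratio):
    """Thickness after stretching, as a fraction of the original,
    when the length is multiplied by stretch_ratio and TVFM is held constant."""
    if stretch_ratio <= 0:
        raise ValueError("stretch_ratio must be positive")
    return 1.0 / stretch_ratio


if __name__ == "__main__":
    stretch = 1.25      # hypothetical: folds lengthened by 25 %
    base_length = 1.0   # arbitrary unit, since the source unit is unclear
    for voice, tvfm in MEAN_TVFM.items():
        before = thickness_from_tvfm(tvfm, base_length)
        after = thickness_from_tvfm(tvfm, base_length * stretch)
        print(f"{voice}: thickness {before:.1f} -> {after:.1f}")
```

Under this assumption the relative thinning depends only on the stretch ratio, not on the voice type, while the absolute values differ with TVFM. It says nothing about how much of that tissue actually takes part in the vibration, which is the open question raised in the comments.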

