Choir Note Distributions

A while ago, I wanted to empirically determine which voice group to choose, soprano or alto, for a new piece. I decided to download a bunch of choral MIDI files:

  • Bach's Hohe Messe (HM), Johannespassion, Matthäuspassion, Weihnachtsoratorium (WO) and the motet Jesu, Meine Freude (JMF). I squished soprano I and II down into one part, which may affect the scores for JMF and HM.
  • Verdi's Requiem
  • Brahms' Requiem
  • Mozart's Requiem
  • Puccini's Gloria Mass

... and plotted the note heights of each choir voice group. The results may shock you!

Check out this plot. It shows the distribution of note heights for altos and sopranos, sorted by median height:

The main thing I noticed is that there seems to be much less variance in the alto parts across all of these pieces. In particular, the upper quartile always ends at about the same note for the altos, but not for the sopranos.
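For readers who want to check the quartiles themselves, here is a minimal sketch of the numbers behind such a box plot. It assumes the pitches of each part have already been extracted as MIDI note numbers; the `parts` data below is made up for illustration and is not taken from any of the pieces, and `box_summary` is a hypothetical helper name.

```python
from statistics import quantiles


def box_summary(pitches):
    """Return (lower quartile, median, upper quartile) of MIDI note numbers."""
    q1, median, q3 = quantiles(pitches, n=4, method="inclusive")
    return q1, median, q3


# Made-up pitches, purely illustrative (60 = C4, one step = one semitone).
parts = {
    "alto": [62, 64, 65, 67, 69, 71, 72],
    "soprano": [67, 69, 72, 74, 76, 79, 81],
}

summaries = {name: box_summary(p) for name, p in parts.items()}
for name, (q1, median, q3) in summaries.items():
    print(f"{name}: Q1={q1} median={median} Q3={q3}")
```

Comparing the third value across pieces is the "where does the upper quartile end" check described above.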

Composers seem to obey a sort of rule for the altos: "if it's higher than D#5, just don't", whereas for sopranos, everything is fair game, even above a G#5 (which is the note at which MuseScore starts warning me it's a high note even for a soprano).
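The "higher than D#5" rule can be turned into a number by counting how many notes of a part sit above a named pitch. The sketch below assumes the common convention that MIDI note 60 is C4 (so D#5 is 75 and G#5 is 80), handles only sharps and single-digit octaves, and uses hypothetical helper names and made-up pitches.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_to_midi(name):
    """Convert a name like 'D#5' to a MIDI note number, assuming C4 = 60."""
    pitch_class, octave = name[:-1], int(name[-1])
    return NOTE_NAMES.index(pitch_class) + 12 * (octave + 1)


def share_above(pitches, limit_name):
    """Fraction of notes strictly higher than the named pitch."""
    limit = note_to_midi(limit_name)
    return sum(p > limit for p in pitches) / len(pitches)


# Made-up alto line, purely illustrative.
alto = [70, 75, 76, 78]
print(share_above(alto, "D#5"))
```

A part that really follows the rule should give a share close to zero for "D#5".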

(I immediately recognized the high outlier in the alto part for Verdi: 'Movendi suuuunt' in the last movement of the Requiem. As for the other outlier, there is a single F#5 at the end of the second 'Lass ihn kreuzigen' in the St. Matthew Passion, which is often transposed down to an F#4.)

A similar pattern can be seen for tenors versus basses, although there are no high outliers there:

If you are curious to see all the parts on one plot for all the Bach pieces, here it is (with soprano I/II separated!):

Again, the upper quartile of the alto and bass data (almost) always goes up to the exact same note, while sopranos and tenors have to deal with a different range in each piece. Conclusion: it's empirically more "annoying" to sing soprano/tenor :)

Of course, boxplots do not give the whole picture. High notes can be especially annoying when they are sudden jumps from a low note, pianissimo, or a whole measure long. These things are harder to automatically extract and visualize.
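As a rough starting point for two of these cases (sudden jumps and long notes), here is a sketch that flags high notes in a single melodic line. Everything in it is an assumption: the `(pitch, duration in beats)` representation, the function name, and the thresholds (E5 = 76 as "high", a fifth as a "jump", four beats as "a whole measure"). It ignores rests and dynamics, so pianissimo passages are not covered.

```python
def awkward_high_notes(notes, high=76, min_leap=7, long_beats=4.0):
    """Flag high notes that are reached by a big upward leap or held long.

    notes: list of (midi_pitch, duration_in_beats) in singing order.
    Returns a list of (index, reasons) for every flagged note.
    """
    flagged = []
    for i, (pitch, beats) in enumerate(notes):
        if pitch < high:
            continue
        reasons = []
        # Upward distance in semitones from the previous note, if there is one.
        if i > 0 and pitch - notes[i - 1][0] >= min_leap:
            reasons.append("leap")
        if beats >= long_beats:
            reasons.append("long")
        if reasons:
            flagged.append((i, reasons))
    return flagged


# Made-up melody, purely illustrative.
melody = [(64, 1.0), (77, 1.0), (76, 4.0), (72, 1.0)]
print(awkward_high_notes(melody))
```

Counting the flagged notes per part would give one more number to compare between voice groups, with the caveat that the thresholds are arbitrary.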

If you have any specific plots you want to see, let me know.