Reflexions, the University of Liège website that makes knowledge accessible


The layman, a competent judge of singing voice

12/3/15

How do listeners perceive whether a singer is in tune or not? Defining this objectively is challenging. In spite of the difficulties involved, Pauline Larrouy-Maestri, a researcher at the Max Planck Institute and a scientific collaborator with the Department of Psychology of the University of Liège, has succeeded in doing so. By quantifying objective criteria for judging singing accuracy with the help of computer programs, and by comparing the subjective judgements of music professionals and laymen, she was able to evaluate the perception of accuracy in the two groups. This research greatly alters the commonly held idea that music professionals are better equipped to judge vocal accuracy.

Whether we are listening to the radio, watching one of the many reality TV shows aiming to discover the next star, or listening to a relative singing their heart out in the shower, we are constantly evaluating the singing accuracy of our peers. Consciously or not, objectively or from informed knowledge, we all have our own criteria for placing singers on a scale ranging from those who cannot hold a note to professional singers. For the layman, making a crude distinction between the two extremes seems to be child's play. The question becomes much more difficult, however, when the distinction becomes the subject of scientific research. What does “singing in tune” actually mean? And what does “knowing whether a song is in tune” mean? The answer is multi-faceted, and different phenomena come into play: questions of a physical or acoustic nature, questions related to cultural heritage, very objective criteria, and other criteria that depend on individual appreciation. It is therefore very difficult to know what judging singing actually refers to.

These are the complex questions that interest Pauline Larrouy-Maestri, a researcher at the Neuroscience Department of the Max Planck Institute in Germany and scientific collaborator at the Psychology Department of the University of Liège. She published an article(1) in PLOS ONE comparing the judgement of layman listeners and music professionals. It is a study based on rigorous quantitative criteria, and its conclusions have raised more than a few eyebrows: in many respects, it would appear that “non-musicians” have no reason whatever to be embarrassed about their ability to determine what is in tune and what is not.

Identification of objective criteria

Between music studies at the Royal Conservatory of Mons and a Master’s in logopedics (speech-language pathology) obtained at ULB, the path was almost laid out for Pauline Larrouy-Maestri. Attracted by scientific research, she met Dominique Morsomme, head of the Voice Therapy Unit at ULg, and began a thesis on the perception of singing accuracy. “I wondered what led us to decide whether a song was in tune or not. Many television programmes expose us to this kind of vocal evaluation, but nobody really knows how to identify the factors involved in deciding if singing is accurate or not”.

For the young researcher, it was first necessary to find an objective, quantified means of measuring singing accuracy. To do this, she used computer programs for acoustics and music, optimised for the purposes of her research. The voice, like all other sounds, is a measurable acoustic signal. In the case of untrained voices, it is simply a question of extracting the fundamental frequency of each note sung and measuring the relationships between the different frequencies to verify whether the singing is in tune. “These tools enabled me to quantify three musical criteria: melodic contour errors, when the frequency of the note varies in an unexpected direction; incorrect intervals, gaps that are too large or too small between two notes; and finally changes of tonality during the piece. These changes occur when an interval error has not been compensated for afterwards. The singer has not corrected it and continues in another tonality”.
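The three criteria described here can be sketched in a few lines of code. The sketch below is purely illustrative and not the researcher's actual program: the reference melody, the half-semitone tolerance and the function names are all assumptions, and real fundamental-frequency extraction from audio is a separate problem not addressed here.

```python
import math

# Hypothetical reference melody (opening of "Happy Birthday") as MIDI note numbers.
REFERENCE = [60, 60, 62, 60, 65, 64]

def hz_to_midi(f_hz):
    """Convert a fundamental frequency in Hz to a (fractional) MIDI pitch."""
    return 69 + 12 * math.log2(f_hz / 440.0)

def accuracy_errors(f0s, reference=REFERENCE, tol=0.5):
    """Count the three error types on a sequence of sung note frequencies.

    tol: tolerance in semitones (a half-semitone here, an assumption).
    Returns (contour_errors, interval_errors, tonality_drift_in_semitones).
    """
    sung = [hz_to_midi(f) for f in f0s]
    contour = interval = 0
    for i in range(1, len(sung)):
        ref_step = reference[i] - reference[i - 1]
        sung_step = sung[i] - sung[i - 1]
        # Contour error: the pitch moves in an unexpected direction.
        if ref_step != 0 and (sung_step > 0) != (ref_step > 0):
            contour += 1
        # Interval error: the gap between two notes is too large or too small.
        elif abs(sung_step - ref_step) > tol:
            interval += 1
    # Tonality change: overall drift between start and end, i.e. an
    # interval error that was never compensated for afterwards.
    drift = (sung[-1] - sung[0]) - (reference[-1] - reference[0])
    return contour, interval, drift
```

Note that a singer who performs the whole song a semitone flat scores no errors at all: as the article says, only the relationships between the frequencies matter, not their absolute values.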

The computer against the experts

Once the computer program was developed, a database needed to be created. The speech-language pathologist took her study to the streets and, with the help of students, recorded 166 volunteers who agreed to sing “Happy Birthday”, a simple piece that everybody knows. The sample included men, women, young people, elderly people, shy people and others who love to sing… “The only condition,” says the researcher, “was that they did not have a trained voice. I wanted the sample to be exclusively made up of people who had not formally learned to sing”.

These sequences were then quantified according to the three criteria developed. For each of the 166 performances, Pauline Larrouy-Maestri thus had an objective analysis of the three possible types of error, and therefore three values. Next, the researcher selected 18 experts who had studied classical music for several years or who had particular expertise relating to the voice. They evaluated the 166 songs on a scale of 1 to 9, ranging from “not in tune at all” to “absolutely in tune”. “We therefore had three objective criteria for judging accuracy and 18 subjective evaluations from expert judges. We were able to compare objective and subjective data, applying statistical analyses to observe the relationship between the opinions of the different judges and the values for the three criteria measured”. The result showed a strong correlation between the objective criteria and the judgement of the different experts. For example, if the computer identified a large number of interval or tonality errors, the judges gave that same performance a low mark (for more details see below: “Layman listeners take part in the study”). These results, published in 2013 in the Journal of Voice, were also used for the present study. While the researcher had found a model capable of quantifying and explaining the perception of accuracy by the human ear, her experiment had only just begun.

A large number of studies show the benefits of learning music. Musicians have better attention spans, better concentration and sound perception, are better at learning languages, and develop faculties which they can transfer to other activities. “I am not saying that these studies are unfounded”, says the logopedics expert. “But taken together, they paint a picture of the musician as a person with particular cognitive faculties. And yet I have friends who are not musicians and sing better than me, or who have a keener sense of hearing and can recognise and reproduce sounds, or find a theme on a piano without ever having learned to play the instrument. All our lives we sing or listen to music, and assimilate the musical system in which we live. At school we make music together, we experiment with writing songs… So to some extent, we are all musicians. In the same way, we are all inclined to evaluate or judge the performances of others”.

Layman listeners take part in the study

“We are all musicians”. This is the intuition that the speech-language expert set out to verify scientifically: that non-musicians have implicit musical training. To complete her experiment, she had the 166 versions of “Happy Birthday” evaluated by 18 new judges, laymen this time. The judges were required to have had no musical training but needed to hear correctly (normal audiometry and no low score on the Montreal Battery of Evaluation of Amusia). They were also chosen so as to present profiles similar to those of the experts: the same proportion of men and women, of young and elderly people, from the same socio-cultural background… The only major difference was the absence of formal musical training. In contrast to the experts, the non-experts took the test twice, at an interval of eight to fifteen days. This made it possible to check that they did not change their judgement criteria in the interim, which would have suggested, for instance, that the experiment was itself a learning process that had influenced the participants.

(Figure: correlation between the judges’ evaluations and the three objective criteria, for the expert and non-expert groups.)
The main thrust of the research can be summarised in the figure, which shows the relationship between the opinions of the different judges and the results for the three objective criteria. The first column represents the data for the group of experts (reported in the 2013 article); the second and third columns represent the data gathered for the two non-expert sessions. On the Y axis, the gradation from 0 to 1 represents the correlation coefficient between the objective values and the opinions of the judges. The closer the coefficient is to 1, the more the objective criteria analysed by the author of the experiment correlate with the ratings given by the judges. For example, the first box on the top left represents the opinions of the experts and the criterion “deviation of intervals”. Before being submitted to the panel, the 166 versions of “Happy Birthday” were objectively analysed according to this criterion; in other words, the approximations in the intervals between two notes were quantified. The correlation coefficient for this box is around 0.8. “This result signifies that the two measures are connected”, explains the researcher. “The more precise the intervals recorded by the computer, the higher the score given by the judges. This is reassuring, because it was to be expected”.

The red line present on each of the boxes corresponds to the level of significance of the results. Everything below this line is considered statistically significant. “The relationship between the judgments and the objective criteria does not occur by chance. If we repeated the same experiment, we would have a 95% chance or greater of obtaining the same results; put another way, the results observed have a less than 5% chance of being obtained by chance”. Another general point to be understood from this figure is the gradation of the X axis, ranging from 1 to 18. “The black box above the number 18 represents the average opinion of the 18 judges. To the left of that box is the average of 17 of the 18 judges taken at random, and so on. On the far left, the opinion of only one judge, also taken at random, is represented. In each of these cases, we observed the correlation between these groups of 1 to 18 judges and the objective analyses obtained by means of our computer programs. Note, for example, that in the first box, only 3 judges are needed to obtain a strong correlation between the objective measurement and the evaluation”.

One last small detail must be taken into account before fully understanding what this figure reveals: the different-sized “whiskers” that appear on each side of the small black boxes. The smaller these whiskers are, the weaker the variability between the opinions of the judges; if they are bigger, the judges have not answered in the same way. This is an important piece of information, because the more the judges agree with each other, the more their opinions are linked to clear and objective evaluation criteria, or in any case conditioned by the same kinds of learning. As the boxes represent averages, it is logical that there are no whiskers for the group of 18 judges, given that there is only one such average. Conversely, the whiskers are biggest on the far left of the X axis, where they combine the separate opinions of single judges.

Very competent non-experts

A lot of information can be obtained from this figure. The correlation coefficients of the experts and non-experts are similar overall, which signifies that the non-experts are also “objective” when they judge singing accuracy. The results are also well below the red line, which means that they are significant. “Admittedly,” the researcher points out, “the whiskers for the non-experts are bigger than those for the experts. This means that the experts supply more uniform answers than the non-experts, who are more variable in their evaluations. But a common sensitivity appears here: if the non-experts did not agree with each other at all, the whiskers would cover the entire plot. Beyond a certain number of judges, the non-experts show performances that are almost equivalent to those of the experts”.

A second observation concerns the two test sessions completed by the non-experts. The figure shows that they reacted in a similar way both times. “This is encouraging. When the same experiment is carried out twice in a relatively short space of time, we want to find out if there has been a change of strategy or a learning effect. That is not the case here”. This result shows that the judgement of the non-experts is stable from one session to the next, and that their level of inherent knowledge is high enough to rule out a significant learning effect during the experiment. The figure also shows slightly smaller whiskers in the third column, which signifies less variability between the evaluations of the different judges.

This figure illustrates the strong relationship between the accuracy criteria measured and the evaluations by the groups of listeners, but it does not make it possible to say which criteria predict or explain the evaluations. In other words, which objective criteria of accuracy are our ears sensitive to? “The statistics show that neither the experts nor the non-experts attach much importance to melodic contour when judging the accuracy of a song”. Conversely, errors linked to intervals were taken into account by both panels. On the other hand, only the experts seemed to take account of changes of tonality within the same sung piece. “This fact may appear very surprising. Then again, this notion appears later in development, both in terms of perception and of musical production. This does not mean that the non-experts do not hear this type of error, but rather that it is not a sufficiently important criterion for them and does not appear as a predictive variable in our statistical model”.

Statistics to transform the subjective into objective

The study shows that the ability to distinguish accurate from inaccurate singing results, in part, from implicit musical learning, and that this prior learning applies to a majority of the population. Years of musical practice produce competence but do not specifically make for better judges. More broadly, what interests Pauline Larrouy-Maestri is to do justice to laypeople and go beyond popular belief in order to understand where differences come from. “In many studies, the differences between the two groups are overestimated simply because one of the two groups does not understand what is expected of it. In the present case, I ask a simple question: is a particular song in tune or not? If I had asked for a list of inaccuracies related to intervals, or asked whether the piece was sung in the correct tonality, I would have had much better results among the experts than among the non-experts. Those results would probably have been linked to the fact that the non-experts had not understood the question; they would have demonstrated nothing about their real perceptual abilities. The method developed here allows us to ask a simple question, accessible to all, and then, by means of the computer programs and statistical tools, to examine the objective reasons that can influence the answers. We then outline the relationship between the clear criteria we have identified and the evaluations of the judges, and so seek to understand the reasons for their judgements. This type of method seems to me particularly interesting and promising for the study of sensitivities or judgements which initially seem very subjective, whether in music or in other fields of research”.


© Université de Liège - https://www.reflexions.uliege.be/cms/c_404627/en/the-layman-a-competent-judge-of-singing-voice?printView=true - April 19, 2024