Current technology has laid a strong foundation and made remarkable strides in audio Large Language Model research, yet these models still struggle considerably with abstract musical reasoning tasks. In The Muse Benchmark: Probing Music Perception and Auditory Relational Reasoning in Audio LLMs (arXiv preprint arXiv:2510.19055), Carone, Roman, and Ripollés (2025) demonstrate that even state-of-the-art models such as Qwen and Audio Flamingo 3 fail at pitch-invariant melody recognition, harmonic function understanding, and rhythm synchronisation. Classic prompting techniques such as Chain-of-Thought (CoT) and few-shot in-context learning often produce inconsistent or even counterproductive results, underlining a substantial gap between machine and human performance. These findings point to the need for a thorough rethinking of model architectures and training approaches, so that models can perform deep auditory reasoning rather than surface-level audio processing.
No current system combines abstract relational reasoning with audio-LLM perception of complex music. The proposed study will not only evaluate new model architectures but also develop novel training techniques aimed at narrowing the gap between human and machine auditory perception.
Carone, B. J., Roman, I. R., & Ripollés, P. (2025). The Muse benchmark: Probing music perception and auditory relational reasoning in audio LLMs. arXiv preprint arXiv:2510.19055.
Current audio language models for music have serious shortcomings in acquiring invariant representations, which are the basis for a deep comprehension of music. In The Muse Benchmark: Probing Music Perception and Auditory Relational Reasoning in Audio LLMs (arXiv preprint arXiv:2510.19055), Carone, Roman, and Ripollés (2025) report that models such as Qwen and Audio Flamingo 3 perform close to random guessing on tasks like chord-sequence matching, key-modulation detection, and rhythm comparison. Prompting techniques such as Chain-of-Thought and few-shot learning narrow the gap to some extent, but not to the level of human performance, making the perceptual deficits clear. These results indicate the need for further work on invariant representation learning and architectural changes that would equip AI with genuine musical reasoning.
A universal method for integrating invariant musical representation learning into audio LLMs is still not available. The proposed study will investigate techniques for encoding invariances in pitch, rhythm, and harmony to bring AI closer to human-like musical reasoning.
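The idea of a pitch-invariant representation can be illustrated with a minimal sketch (toy melodies, not the benchmark's own data): a melody recoded as its sequence of successive intervals stays the same under transposition, which is exactly the kind of invariance the models above fail to exploit.

```python
# Minimal sketch: transposition-invariant melody matching.
# A melody is given as MIDI pitch numbers; two melodies count as
# "the same tune" when their successive-interval sequences match,
# regardless of the key they are played in.

def intervals(pitches):
    """Reduce a pitch sequence to its successive intervals (in semitones)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def same_melody(m1, m2):
    """True if m2 is a transposition of m1 (identical interval pattern)."""
    return intervals(m1) == intervals(m2)

# "Twinkle Twinkle" opening in C major vs. transposed up a whole tone
c_version = [60, 60, 67, 67, 69, 69, 67]
d_version = [62, 62, 69, 69, 71, 71, 69]
print(same_melody(c_version, d_version))                    # True
print(same_melody(c_version, [60, 62, 64, 65, 67, 69, 71]))  # False
```

A model with an internal representation analogous to `intervals` would recognise the tune in any key; the benchmark results suggest current audio LLMs lack such a representation.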
Carone, B. J., Roman, I. R., & Ripollés, P. (2025). The Muse benchmark: Probing music perception and auditory relational reasoning in audio LLMs. arXiv preprint arXiv:2510.19055.
Music Emotion Recognition (MER) has taken unprecedented steps in recent years and has a promising future. Still, genre-specific datasets, differing annotation standards, and inconsistent representations of emotion remain major obstacles. In A Study on the Data Distribution Gap in Music Emotion Recognition (arXiv preprint arXiv:2510.04688), Ching and Widmer (2025) show that currently available MER models do not generalise well when tested on datasets spanning different musical styles. The problem is compounded by valence-arousal labels that are processed separately per dataset, and by the dominance of certain genres (Pop, Jazz, Electronic), which distorts emotional patterns and makes generalisation difficult. Moreover, timbre strongly shapes listeners' emotional responses but is not captured in symbolic representations, and reliance on outdated feature-analysis tools limits reproducibility. These limitations indicate the need for a unified, genre-inclusive approach to music emotion recognition.
Currently, there is no comprehensive method for addressing the distribution divergence between different MER datasets, and no method enables generalised emotion prediction across musical styles. Designing a model free of genre bias and annotation inconsistency is a suitable task for a PhD project.
Ching, J., & Widmer, G. (2025). A study on the data distribution gap in music emotion recognition. Institute of Computational Perception, Johannes Kepler University Linz; LIT AI Lab, Linz Institute of Technology. https://arxiv.org/abs/2510.04688
Music emotion annotation frameworks differ markedly across datasets and are shaped by the culture, style, and perception of the annotators. Ching and Widmer (2025) explain that inconsistent valence–arousal scales, normalised independently per dataset, produce emotion distributions that appear similar although their content differs substantially. The predominance of certain genres and the subjectivity of annotations add further confusion about the emotions represented in MER datasets, making universal or genre-invariant emotion recognition unattainable. Audio-based models exacerbate the problem because timbre carries a strong emotional impact, whereas symbolic data conveys only a limited range of expressive features. The field lacks standardised, reliable annotation systems that work across different musical traditions.
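The normalisation pitfall described above can be shown with a minimal sketch (illustrative numbers, not real MER data): two datasets annotate valence on different scales and with opposite emotional skews, yet after independent z-normalisation both distributions are centred at zero with unit spread, hiding the underlying disagreement.

```python
# Minimal sketch: independent per-dataset normalisation masks a real
# distribution gap between two emotion-annotation corpora.

def z_normalise(xs):
    """Standardise a list of scores to mean 0 and (population) std 1."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / var ** 0.5 for x in xs]

# Dataset A: valence on a 1-9 scale, skewed toward upbeat pop tracks
a = [7.5, 8.0, 6.5, 7.0, 8.5]
# Dataset B: valence on a -1..1 scale, skewed toward sombre material
b = [-0.6, -0.4, -0.7, -0.3, -0.5]

print([round(v, 2) for v in z_normalise(a)])
print([round(v, 2) for v in z_normalise(b)])
# Both normalised lists are centred at 0 with unit spread, even though
# one raw corpus is overwhelmingly positive and the other negative.
```

A genre- and scale-aware normalisation strategy, applied jointly rather than per dataset, is one direction such a PhD project could explore.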
PhD-Level Verification: A PhD project would require a systematic appraisal of the impact of behavioural and environmental interventions, an evaluation of technological tools, and an integrated model for protecting mental health in noise-exposed populations. This calls for collaboration across disciplines such as psychology, environmental health, neurophysiology, and acoustic technology.
Ching, J., & Widmer, G. (2025). A study on the data distribution gap in music emotion recognition. Institute of Computational Perception, Johannes Kepler University Linz; LIT AI Lab, Linz Institute of Technology. https://arxiv.org/abs/2510.04688
AI-based music generation has progressed considerably within a short time, yet its evaluation methods still vary widely and their reliability is questionable. Kader and Karmaker (2025) note that no universally accepted framework covers the most important musical qualities, such as structure, creativity, emotional expression, and coherence. Existing objective metrics often correlate poorly with human perception and carry Western-centric biases that make it difficult to evaluate genres from non-Western and low-resource traditions. Symbolic music evaluation also lags behind the more advanced audio-based approaches, relying on basic measures that disregard higher-level musical aspects. Filling these gaps requires the development of a perception-aligned, culturally inclusive evaluation framework.
The proposed research will create and validate a wide-ranging evaluation framework incorporating computational metrics, human listener experiments, and cultural standards for both symbolic and audio music generation. The doctoral contribution will focus on normalisation, perceptual validity, and cultural diversity.
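One building block of such a framework can be sketched in a few lines (the scores below are hypothetical, purely for illustration): before an objective metric is trusted, its per-piece scores should be checked for correlation with human listener ratings, here via a plain Pearson coefficient.

```python
# Minimal sketch: validating an objective music-generation metric against
# human listener ratings. A perception-aligned framework would require
# strong correlation across genres before relying on the metric alone.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores for ten generated pieces
metric_scores = [0.82, 0.75, 0.91, 0.60, 0.70, 0.88, 0.65, 0.77, 0.55, 0.93]
human_ratings = [4.1, 2.9, 4.5, 3.8, 3.0, 4.4, 3.9, 3.1, 3.6, 4.6]

r = pearson(metric_scores, human_ratings)
print(f"metric-human correlation: r = {r:.2f}")
```

A low coefficient would flag the metric as perceptually misaligned; repeating the check per genre would expose the cultural biases discussed above.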
Kader, F. B., & Karmaker, S. (2025). A survey on evaluation metrics for music generation. https://arxiv.org/abs/2509.00051
Need assistance finalising your dissertation topic? Selecting a strong, researchable topic can be challenging — but you don’t have to do it alone.
Our research consultants can help refine your ideas, identify literature gaps, and guide you toward a topic that aligns with current academic trends and your programme requirements.
Contact us to begin one-on-one topic development and refinement with PhdAssistance.com Research Lab.