




ISMIR 2002
3rd International Conference on
Music Information Retrieval

IRCAM – Centre Pompidou
Paris, France
13-17 October 2002


Contributions in this category will be presented by their authors in the plenary conference room, in 30-minute talks (with computer projection and sound playback).

Similarity, recognition

1. Yongmoo E. Kim (MIT Media Lab) and Brian Whitman:
Singer Identification in Popular Music Recordings Using Voice Coding Features [Abstract1]

2. Daniel P.W. Ellis (Columbia University), Brian Whitman (MIT Media Lab), Adam Berenzweig (Columbia University) and Steve Lawrence (NEC Research Institute):
The Quest for Ground Truth in Musical Artist Similarity [Abstract2]

3. Jouni Paulus and Anssi Klapuri (Tampere University of Technology):
Measuring the Similarity of Rhythmic Patterns [Abstract3]

4. Jean-Julien Aucouturier and François Pachet (Sony Computer Science Lab., Paris):
Music Similarity Measures: What’s the use? [Abstract4]

Summary generation

5. Keiji Hirata (NTT Communications Science Laboratories) and Shu Matsuda (Digital Art Creation):
Interactive Music Summarization based on GTTM [Abstract5]

6. Geoffroy Peeters, Amaury La Burthe and Xavier Rodet (IRCAM):
Toward Automatic Music Audio Summary Generation from Signal Analysis [Abstract6]

7. Matthew Cooper and Jonathan Foote (FX Palo Alto Laboratory):
Automatic Music Summarization via Similarity Analysis [Abstract7]

Indexing, classification, analysis

8. Roger B. Dannenberg and Ning Hu (Carnegie Mellon University):
Pattern Discovery Techniques for Music Audio [Abstract8]

9. Cheng Yang (Stanford University):
MACSIS: A Scalable Acoustic Index for Content-Based Music Retrieval [Abstract9]

10. Andreas Rauber (Vienna University of Technology), Elias Pampalk (Austrian Research Institute for Artificial Intelligence) and Dieter Merkl (Vienna University of Technology):
Using Psycho-Acoustic Models and Self-Organizing Maps to Create a Hierarchical Structuring of Music by Musical Styles [Abstract10]

11. Brian Whitman and Paris Smaragdis (MIT Media Lab):
Combining Musical and Cultural Features for Intelligent Style Detection [Abstract11]


12. Alexandra Uitdenbogerd and Ron van Schyndel (RMIT University):
A Review of Factors Affecting Music Recommender Success [Abstract12]

13. Steffen Pauws (Philips Research Eindhoven) and Berry Eggen (Philips Research Eindhoven and Technische Universiteit Eindhoven):
PATS: Realization and user evaluation of an automatic playlist generator [Abstract13]

14. Joe Futrelle and J. Stephen Downie (University of Illinois):
Interdisciplinary Communities and Research Issues in Music Information Retrieval [Abstract14]

15. Ann Blandford and Hanna Stelmaszewska (University College London):
Usability of Musical Digital Libraries: a Multimodal Analysis [Abstract15]

16. Ja-Young Kim and Nicholas J. Belkin (Rutgers University):
Categories of Music Description and Search Terms and Phrases Used by Non-Music Experts [Abstract16]

Query by example

17. Jaap Haitsma and Ton Kalker (Philips Research Eindhoven):
A Highly Robust Audio Fingerprinting System [Abstract17]

18. Jeremy Pickens (University of Massachusetts Amherst), Juan Pablo Bello (University of London), Tim Crawford (King’s College), Matthew Dovey (Oxford University), Giuliano Monti (University of London) and Mark Sandler (University of London):
Polyphonic Score Retrieval Using Polyphonic Audio Queries: A Harmonic Modeling Approach [Abstract18]

19. Jungmin Song, So-Young Bae and Kyoungro Yoon (LG Electronics):
Mid-Level Music Melody Representation of Polyphonic Audio for Query-by-Humming System [Abstract19]

20. Shyamala Doraisamy and Stefan M. Rüger (Imperial College London):
A Comparative and Fault-tolerance Study of the Use of N-grams with Polyphonic Music [Abstract20]

21. Colin Meek and William Birmingham (University of Michigan):
Johnny Can’t Sing: A Comprehensive Error Model for Sung Music Queries [Abstract21]

22. L. P. Clarisse, J. P. Martens, M. Lesaffre, B. De Baets, H. De Meyer and M. Leman (Ghent University):
An Auditory Model Based Transcriber of Singing Sequences [Abstract22]

Preprocessing: coding, segmentation…

23. Christopher Raphael (University of Massachusetts Amherst):
Automatic Transcription of Piano Music [Abstract23]

24. Jürgen Kilian (Darmstadt University of Technology) and Holger H. Hoos (University of British Columbia):
Voice Separation – A Local Optimization Approach [Abstract24]

25. Anna Pienimäki (University of Helsinki):
Indexing Music Databases Using Automatic Extraction of Frequent Phrases [Abstract25]

26. George Tzanetakis, Andrey Ermolinskiy and Perry Cook (Princeton University):
Pitch Histograms in Audio and Symbolic Music Information Retrieval [Abstract26]

27. Massimo Melucci and Nicola Orio (University of Padova):
A Comparison of Manual and Automatic Melody Segmentation [Abstract27]

28. Hui Jin and H. V. Jagadish (University of Michigan):
Indexing Hidden Markov Models for Music Retrieval [Abstract28]


29. Hugues Vinet (IRCAM), Perfecto Herrera (IA-UPF) and François Pachet (Sony CSL):
The CUIDADO Project [Abstract29]

30. Steffen Pauws (Philips Research Eindhoven):
CubyHum: a fully operational “query by humming” system [Abstract30]

31. Chaokun Wang, Jianzhong Li and Shengfei Shi (Harbin Institute of Technology):
A Kind of Content-Based Music Information Retrieval Method in a Peer-to-Peer Environment [Abstract31]


[Abstract21] We propose a model for errors in sung queries, a variant of the hidden Markov model (HMM). This is related to the problem of identifying the degree of similarity between a query and a potential target in a database of musical works, in the music retrieval framework. The model comprehensively expresses the types of error or variation between target and query: cumulative and non-cumulative local errors, transposition, tempo and tempo changes, insertions, deletions and modulation. Results of experiments demonstrating the robustness of such a model are presented.

[Abstract22] In this paper, a new system for the automatic transcription of singing sequences into a sequence of pitch and duration pairs is presented. Although such a system may have a wider range of applications, it was mainly developed to become the acoustic module of a query-by-humming (QBH) system for retrieving pieces of music from a digitized musical library. The first part of the paper is devoted to the systematic evaluation of a variety of state-of-the-art transcription systems. The main result of this evaluation is that there is clearly a need for more accurate systems. Segmentation in particular proved too error-prone (≈ 20 % segmentation errors). In the second part of the paper, a new auditory-model-based transcription system is proposed and evaluated. The results of that evaluation are very promising: segmentation errors vary between 0 and 7 %, depending on the amount of lyrics used by the singer. In any case, an error rate below 10 % is expected to be acceptable for QBH. The paper ends with the description of an experimental study carried out to demonstrate that the accuracy of the newly proposed transcription system is not very sensitive to the choice of its free parameters, at least as long as they remain in the vicinity of the values one could predict on the basis of their meaning.

[Abstract23] A hidden Markov model approach to piano music transcription is presented. The main difficulty in applying traditional HMM techniques is the large number of chord hypotheses that must be considered. We address this problem by using a trained likelihood model to generate reasonable hypotheses for each frame and construct the search graph out of these hypotheses. Results are presented using a recording of a movement from Mozart's Sonata 18, K. 570.

[Abstract24] Voice separation, along with tempo detection and quantization, is one of the basic problems of computer-based transcription of music. An adequate separation of notes into different voices is crucial for obtaining readable and usable scores from performances of polyphonic music recorded on keyboard (or other polyphonic) instruments, for improving quantization results within a transcription system, and in the context of music retrieval systems that primarily support monophonic queries. In this paper we propose a new voice separation algorithm based on a stochastic local search method. Unlike many previous approaches, our algorithm allows chords in the individual voices; its behaviour is controlled by a small number of intuitive and musically motivated parameters; and it is fast enough to allow interactive optimisation of the result by adjusting the parameters in real time. We demonstrate that, compared to existing approaches, our new algorithm generates better solutions for a number of typical voice separation problems. We also show how, by changing its parameters, it is possible to create score output for different needs (e.g. piano-style or orchestral scores).

[Abstract25] Music information retrieval methods can be classified into online and offline methods. The main drawback of most offline algorithms is the space the indexing structure requires. The amount of data stored in the structure can, however, be reduced by storing only suitable index terms or phrases instead of the whole contents of the database. Repetition is generally agreed to be one of the most important factors of musical meaningfulness; repetitive phrases are therefore suitable for indexing purposes. Such phrases can be extracted by applying an existing text mining method to musical data. Because of the differences between text and musical data, this application requires some technical modification of the method. This paper introduces a text-mining-based music database indexing method that extracts maximal frequent phrases from musical data and sorts them by their length, frequency and personality. Applied to a test corpus of Irish folk music tunes, the method found three different types of phrases. The two suitable types (out of the three) are easily recognized and separated from the set of all phrases to form the index data for the database.
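To make the idea of "maximal frequent phrases" concrete, here is a minimal Python sketch. It is not the text mining method the paper adapts, which the abstract does not name. It assumes each tune is already encoded as a sequence of symbols, for instance pitch intervals. It counts every contiguous phrase by brute force and keeps the phrases that occur at least `min_support` times and are not contained in a longer frequent phrase. The function names and parameters are hypothetical, and the ranking by "personality" is not shown.

```python
from collections import Counter

def frequent_phrases(seq, min_support=2, max_len=8):
    """Count every contiguous phrase of length 2..max_len in `seq` and keep
    those occurring at least `min_support` times (overlapping hits count)."""
    counts = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

def _contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))

def maximal_phrases(freq):
    """Keep only frequent phrases that are not part of a longer frequent phrase."""
    return {p: c for p, c in freq.items()
            if not any(len(q) > len(p) and _contains(q, p) for q in freq)}

def rank_phrases(phrases):
    """Sort (phrase, count) pairs by length, then by frequency, both descending."""
    return sorted(phrases.items(), key=lambda pc: (-len(pc[0]), -pc[1]))
```

For example, `maximal_phrases(frequent_phrases([1, 2, 3, 9, 1, 2, 3, 8]))` keeps only `(1, 2, 3)` with count 2. Its sub-phrases `(1, 2)` and `(2, 3)` are also frequent, but they are discarded as non-maximal. Brute-force counting is quadratic in the tune length, which is acceptable for short folk tunes but is exactly the kind of cost a dedicated text mining algorithm is meant to avoid.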

[Abstract26] In order to represent musical content, pitch and timing information is utilized in the majority of existing work in Symbolic Music Information Retrieval (MIR). Symbolic representations such as MIDI allow the easy calculation of such information and its manipulation. In contrast, most of the existing work in Audio MIR uses timbral and beat information, which can be calculated using automatic computer audition techniques. In this paper, Pitch Histograms are defined and proposed as a way to represent the pitch content of music signals both in symbolic and audio form. This representation is evaluated in the context of automatic musical genre classification. A multiple-pitch detection algorithm for polyphonic signals is used to calculate Pitch Histograms for audio signals. In order to evaluate the extent and significance of errors resulting from the automatic multiple-pitch detection, automatic musical genre classification results from symbolic and audio data are compared. The comparison indicates that Pitch Histograms provide valuable information for musical genre classification. The results obtained for both symbolic and audio cases indicate that although pitch errors degrade classification performance for the audio case, Pitch Histograms can be effectively used for classification in both cases.
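As a rough illustration of a pitch histogram, the Python sketch below builds an "unfolded" histogram with one bin per MIDI note number and a "folded" histogram with one bin per pitch class. It assumes the pitches are already available as MIDI note numbers. In the audio case described above they would come from a multiple-pitch detector, which is not shown. The bin layout and the summary features are illustrative assumptions; the paper's exact definitions may differ. All names are hypothetical.

```python
from collections import Counter

def pitch_histograms(midi_pitches):
    """Return (unfolded, folded) pitch histograms normalized to sum to 1.

    unfolded: 128 bins, one per MIDI note number, so octave information is kept.
    folded:   12 bins, one per pitch class (0 = C, 1 = C#, ..., 11 = B),
              obtained by collapsing all octaves together.
    """
    if not midi_pitches:
        raise ValueError("at least one pitch is required")
    if any(not 0 <= p <= 127 for p in midi_pitches):
        raise ValueError("MIDI note numbers must lie in 0..127")
    total = len(midi_pitches)
    counts = Counter(midi_pitches)
    unfolded = [counts.get(p, 0) / total for p in range(128)]
    folded = [0.0] * 12
    for pitch, count in counts.items():
        folded[pitch % 12] += count / total
    return unfolded, folded

def histogram_features(unfolded, folded):
    """A few hypothetical summary features that a genre classifier could use."""
    return {
        "unfolded_peak": max(range(128), key=lambda p: unfolded[p]),  # most common note
        "folded_peak": max(range(12), key=lambda c: folded[c]),       # most common pitch class
        "folded_peak_amplitude": max(folded),
    }
```

For a C major chord with a doubled root, `pitch_histograms([60, 64, 67, 72])`, the folded histogram puts 0.5 in bin 0 (C) and 0.25 each in bins 4 (E) and 7 (G). The unfolded histogram keeps the two Cs apart, so it has four bins of 0.25 each.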

[Abstract27] The main contribution of this paper is an investigation of the effects of exploiting melodic features for automatic melody segmentation aimed at content-based music retrieval. We argue that segmentation based on melodic features is more effective than random or n-gram-based segmentation, which ignores any context. We carried out an experiment with experienced subjects. The manual segmentation results were processed with a probabilistic decision function to detect the most probable boundaries in the melodic surface. These boundaries were then compared with the boundaries detected by an automatic procedure implementing a melody segmentation algorithm, as well as by a random segmenter and by an n-gram-based segmenter. Results showed that automatic segmentation based on melodic features is closer to manual segmentation than algorithms that do not use such information.
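The comparison of segmenters can be sketched in Python as follows. The sketch shows a context-free, fixed-length n-gram segmenter and a boundary agreement score against reference boundaries. Two choices are assumptions and are not stated in the abstract: the segments are non-overlapping, and agreement is measured with an F-measure under a tolerance of a few notes. The paper's actual comparison procedure may differ, and the function names are hypothetical.

```python
def ngram_boundaries(length, n):
    """Boundaries of a non-overlapping n-gram segmentation of a melody with
    `length` notes; boundary b means a new segment starts at note index b.
    Context is ignored entirely: boundaries fall every n notes."""
    return set(range(n, length, n))

def boundary_f1(detected, reference, tolerance=1):
    """F-measure between detected and reference boundary sets.

    A detected boundary is a hit if it lies within `tolerance` notes of a
    reference boundary that has not been matched yet (one-to-one matching).
    """
    unmatched = set(reference)
    hits = 0
    for b in sorted(detected):
        match = next((r for r in sorted(unmatched) if abs(r - b) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)
            hits += 1
    if hits == 0:
        return 0.0
    precision = hits / len(detected)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)
```

For a 10-note melody, `ngram_boundaries(10, 3)` gives `{3, 6, 9}`. Against hypothetical manual boundaries `{4, 8}` with a tolerance of one note, two of the three detected boundaries are hits, so precision is 2/3, recall is 1 and the F-measure is 0.8. A segmenter that uses melodic features would be scored the same way.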

[Abstract28] Hidden Markov models (HMMs) have been suggested as an effective technique to represent music. Given a collection of musical pieces, each represented by its HMM, and a query, the retrieval task reduces to finding the HMM most likely to have generated the query. The musical piece represented by this HMM is frequently the one rendered, possibly imperfectly, by the user. This method may be inefficient for a very large music database, since each HMM to be tested requires the evaluation of a dynamic-programming algorithm. In this paper, we propose an indexing mechanism that can aggressively prune the set of candidate HMMs to be evaluated in response to a query. Our experiments on a music database showed an average seven-fold speed-up with no false dismissals.
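The exhaustive baseline that the abstract calls potentially inefficient runs one dynamic-programming pass per HMM and then picks the best score. A minimal Python sketch of it, using the standard scaled forward algorithm, is shown below. It assumes discrete-symbol HMMs whose observation symbols are, for instance, quantized pitch intervals. The paper's indexing and pruning mechanism is not shown, and the function names are hypothetical.

```python
import math

def forward_log_likelihood(pi, A, B, obs):
    """log P(obs | HMM) computed with the scaled forward algorithm.

    pi:  initial state probabilities, length S
    A:   transition matrix, A[i][j] = P(next state j | state i)
    B:   emission matrix, B[i][k] = P(symbol k | state i)
    obs: non-empty sequence of symbol indices
    """
    if not obs:
        raise ValueError("query must contain at least one symbol")
    S = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(S)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then emit obs[t].
            alpha = [sum(alpha[i] * A[i][j] for i in range(S)) * B[j][obs[t]]
                     for j in range(S)]
        c = sum(alpha)
        if c == 0.0:
            return float("-inf")      # query impossible under this model
        log_lik += math.log(c)
        alpha = [a / c for a in alpha]  # rescale to avoid numeric underflow
    return log_lik

def rank_pieces(models, query):
    """Return piece ids ordered from most to least likely to have produced `query`.
    `models` maps a piece id to a (pi, A, B) tuple; every model is evaluated."""
    scores = {pid: forward_log_likelihood(*m, query) for pid, m in models.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Because `rank_pieces` evaluates every model, its cost grows linearly with the size of the database. An index that discards most candidate HMMs before this loop runs is what produces the reported speed-up.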

[Abstract29] The CUIDADO Project (Content-based Unified Interfaces and Descriptors for Audio/music Databases available Online) aims at developing a new chain of applications through the use of audio/music content descriptors, in the spirit of the MPEG-7 standard. The project includes the design of appropriate description structures, the development of extractors for deriving high-level information from audio signals, and the design and implementation of two applications: the Sound Palette and the Music Browser. These applications include new features which systematically exploit high-level descriptors and provide users with content-based access to large catalogues of audio/music material. The Sound Palette is focused on audio samples and targets professional users, whereas the Music Browser addresses a broader user target through the management of music titles. After a presentation of the project objectives and methodology, we describe the original features of the two applications made possible by the use of descriptors and the technical architecture framework on which they rely.

[Abstract30] "Query by humming" is an interaction concept in which the identity of a song has to be revealed quickly and in an orderly fashion from a sung input, using a large database of known melodies. In short, it tries to detect the pitches in a sung melody and compares these pitches with symbolic representations of the known melodies. Melodies that are similar to the sung pitches are retrieved. During melody comparison, approximate pattern matching based on classical dynamic programming compensates for the errors in the sung melody. A filtering method is used to save computation in the dynamic programming framework. This paper presents the algorithms for pitch detection, note onset detection, quantization, melody encoding and approximate pattern matching as they have been implemented in the CubyHum software system. Since human reproduction of melodies is imperfect, findings from an experimental singing study were a crucial input to the development of the algorithms. Future research should pay special attention to the reliable detection of note onsets in any preferred singing style. In addition, research on indexing methods and fast bit-parallel algorithms for approximate pattern matching needs to be pursued further to reduce computational requirements when dealing with large melody databases.
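The "approximate pattern matching by classical dynamic programming" step can be illustrated with the Python sketch below. It computes the smallest edit distance between a sung query and any stretch of a stored melody, so a match may start and end anywhere in the melody. Several details are assumptions, because the abstract does not give CubyHum's melody encoding or cost functions: melodies are encoded as semitone intervals, which makes the match insensitive to transposition, and insertions, deletions and substitutions all cost 1. The filtering step is not shown, and the function names are hypothetical.

```python
def intervals(pitches):
    """Encode a pitch sequence (MIDI note numbers) as successive semitone
    intervals, so a transposed rendition yields the same sequence."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def best_match_cost(query, melody):
    """Minimum edit distance between `query` and any substring of `melody`.

    Dynamic programming with unit costs. The first row is all zeros, so a
    match may begin anywhere in the melody. Taking the minimum of the last
    row lets it end anywhere.
    """
    n = len(melody)
    prev = [0] * (n + 1)                 # row for the empty query prefix
    for i in range(1, len(query) + 1):
        curr = [i] + [0] * n             # i query symbols against nothing: i deletions
        for j in range(1, n + 1):
            sub = 0 if query[i - 1] == melody[j - 1] else 1
            curr[j] = min(prev[j - 1] + sub,   # match or substitution
                          prev[j] + 1,         # query symbol absent from melody
                          curr[j - 1] + 1)     # extra melody symbol
        prev = curr
    return min(prev)
```

As an example, take a stored melody `[60, 62, 64, 65, 67, 65, 64, 62, 60]`. A query sung a fifth higher, `[67, 69, 71, 72]`, matches its opening with cost 0. A query with one wrong note, `[67, 69, 72, 73]`, costs 1. Ranking a database amounts to sorting melodies by this cost, which is why the abstract points to filtering, indexing and bit-parallel methods for large collections.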

[Abstract31] In this paper, we propose four peer-to-peer models for content-based music information retrieval (CBMIR) and carefully evaluate them, qualitatively and quantitatively, in terms of load, time, refreshment and robustness. We also put forward an algorithm to accelerate retrieval in CBP2PMIR and a simple but effective method to filter out replicas from the final results. Finally, we present the architecture of QUIND, a content-based peer-to-peer music information retrieval system that implements CBMIR. QUIND combines content-based music information retrieval technologies with a peer-to-peer environment, and has good robustness and extensibility. The music stored and shared on each PC makes up the whole available music resource. When a user submits a music query, e.g. a song or a melody, QUIND can retrieve many similar pieces of music quickly and accurately according to the content of the query. After selecting their favorite pieces, the user can download and enjoy them.