ISMIR 2002: Abstracts of Accepted Papers
[Abstract1]This paper describes the design policy and
specifications of the RWC Music Database, a music database that researchers are free to use in common for research purposes. Various commonly available databases
have been built in other research fields and have made a significant
contribution to the research in those fields. The field of musical information
processing, however, has lacked a commonly available music database. We
therefore built the RWC Music Database containing four original databases:
Popular Music Database (100 pieces), Royalty-Free Music Database (15 pieces),
Classical Music Database (50 pieces), and Jazz Music Database (50 pieces).
These databases enable researchers to compare and evaluate various methods by
using them as a common benchmark. We also expect that they will accelerate the
progress of various forms of research that use statistical methods. In addition,
researchers can use the databases for research publication and presentation
without copyright restrictions. The music compact discs of these databases are
now available in Japan at a cost equal to only duplication, shipping, and
handling charges (virtually for free), and we plan to make them available
outside Japan in 2003. We hope that our database will encourage further
advances in musical information processing research.
[Abstract2]The success of information retrieval depends heavily on the quality of the data input into such systems. Musical scores, as a complex visual
format with small details, are particularly difficult to digitally capture and
deliver well. Virtually all capture decisions should be made with a clear idea
of the purpose of the resulting digital images. Master images must be flexible
enough to fulfill unanticipated future uses as well. In order to provide a
framework for decision-making in musical score capture projects, best practices
for detail and color capture are presented for creating an archival image
containing all relevant data from the print source, based on commonly defined
purposes of digital capture. Options and recommendations for file formats for
archival storage, web delivery and printing of musical materials are presented.
[Abstract3]Opuscope is an initiative targeted at sharing
musical corpora and their analyses between researchers. The Opuscope repository
will contain high-quality musical corpora that can be annotated with hand-made or algorithmic musical analyses, so that analytical results obtained by others can be used as a starting point for one's own investigations. Experiments performed on Opuscope corpora can easily be compared with other approaches, since an unequivocal mechanism for describing a given corpus will be provided.
[Abstract4]We study the use of
content-based techniques to form playlists from a given seed song. Our
techniques use as a basis our previously presented audio similarity measure.
This measure compares songs according to the novelty of their frequency
spectrum and has been shown to have good performance on a non-trivial database.
In this paper we investigate extensions to this basic technique. Specifically,
we study playlists formed by trajectories through the distance space and
playlists formed using automatic relevance feedback. We report results on a
database of over 8000 songs. We find that when information about the songs’
genre is added, improvements over the basic distance measure are obtained,
suggesting both approaches are suitable for incorporating user input or
labeling information if available.
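The abstract does not spell out the similarity measure or the trajectory construction, but the playlist idea can be illustrated with a minimal sketch: given any precomputed song-to-song distance matrix, a greedy walk through the distance space yields a playlist from a seed song. All names and parameters here are illustrative, not the paper's implementation.

    import numpy as np

    def trajectory_playlist(dist, seed, length):
        """Greedy nearest-neighbour walk through the distance space: each
        step appends the unused song closest to the current one."""
        playlist, used, current = [seed], {seed}, seed
        for _ in range(length - 1):
            order = np.argsort(dist[current])          # closest first
            current = next(i for i in order if i not in used)
            playlist.append(current)
            used.add(current)
        return playlist

    # Toy example: 5 songs with random symmetric distances.
    rng = np.random.default_rng(0)
    d = rng.random((5, 5)); d = (d + d.T) / 2; np.fill_diagonal(d, 0.0)
    print(trajectory_playlist(d, seed=0, length=4))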
[Abstract5]This paper analyzes a set of 161 music-related
information requests posted to the rec.music.country.old-time newsgroup. These
postings are categorized by the types of detail used to characterize the
poster's information need, the type of music information requested, the
intended use for the information, and additional social and contextual elements
present in the postings. The results of this analysis suggest that similar
studies of 'native' music information requests can be used to inform the design
of effective, usable music information retrieval interfaces.
[Abstract6]This article describes MIR research in Carnatic
music (from southern India), which is characterised by aural transmission and
improvisation. These features have profound implications for the relative
importance and accessibility of different forms of music information available
and for the indigenous attitude towards dissemination of Carnatic music
information. Following Smiraglia's [2001] methodology, the author identifies
the crucial MIR problems in designing an information resource in this music for
Western users as (i) understanding the indigenous view of the music and (ii)
embedding this understanding in the organisation of the information resource.
The indigenous representation of raga is summarised and illustrated by sample
WAV files and their more detailed analysis, which are downloadable from the
author's home Web page. The relationship of raga to compositions and
consequently the relationship of improvised performance to cultural and social
meanings are also explained. The author then details the issues arising in the
embedding of this representation in the organisation of an information
resource. Colleagues' views (e.g. on auditory quality and technical
feasibility) and participation (e.g. in tool sharing and experimental digital
audio editing) are sought.
[Abstract7]We claim that the core mechanism of a
sufficiently general MIR system should be expressed in symbolic terms. We
defend the idea that music databases should be pre-analyzed before being scanned
for MIR queries. We suggest a new vision of automated pattern analysis that
generalizes the multiple viewpoint approach by adding a new paradigm based on
analogy and a temporal approach to musical scores. Through a chronological
scanning of the score, analogies are inferred between local relationships --
namely, notes and intervals -- and global structures -- namely, patterns --
whose paradigms are stored inside an abstract pattern tree (APT). Basic
mechanisms for inference of new patterns are described and illustrated. The
same pattern-matching algorithm used for pattern discovery during pre-analysis
of musical works is reused during MIR applications. Such an elastic vision of
music enables a generalized understanding of its plastic expression. This
project, still at an early stage, introduces a broader paradigm of automated music
analysis.
[Abstract8]We present methods for characterizing both
the rhythm and tempo of music. We also present ways to quantitatively measure
the rhythmic similarity between two or more works of music. This allows
rhythmically similar works to be retrieved from a large collection. A related
application is to sequence music by rhythmic similarity, thus providing an
automatic "disc jockey" function for musical libraries. Besides
specific analysis and retrieval methods, we present small-scale experiments
that demonstrate ranking and retrieving musical audio by rhythmic similarity.
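The rhythm characterization itself is not specified in the abstract; one common way to realize the idea is to autocorrelate an onset-strength envelope and compare the resulting signatures with a cosine measure. The sketch below illustrates that reading, with all function names and parameters assumed rather than taken from the paper.

    import numpy as np

    def rhythm_signature(onset_env, max_lag):
        """Autocorrelation of an onset-strength envelope; peaks at lags
        corresponding to the beat and its multiples characterise the rhythm."""
        x = onset_env - onset_env.mean()
        ac = np.correlate(x, x, mode="full")[len(x) - 1:len(x) - 1 + max_lag]
        return ac / (ac[0] + 1e-12)                    # normalise so lag 0 == 1

    def rhythm_similarity(env_a, env_b, max_lag=200):
        a = rhythm_signature(env_a, max_lag)
        b = rhythm_signature(env_b, max_lag)
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Two envelopes with the same pulse score as highly similar even when
    # their phases differ, since autocorrelation discards phase.
    t = np.arange(1000)
    env1 = (t % 50 == 0).astype(float)
    env2 = np.roll(env1, 7)
    print(rhythm_similarity(env1, env2))               # close to 1.0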
[Abstract9]A system is described which segments musical
signals according to the presence or absence of drum instruments. Two different
yet approximately equally accurate approaches were taken to solve the problem.
The first is based on periodicity detection in the amplitude envelopes of the
signal at subbands. The band-wise periodicity estimates are aggregated into a
summary autocorrelation function, the characteristics of which reveal the
drums. The other mechanism applies a straightforward acoustic pattern recognition
approach with mel-frequency cepstrum coefficients as features and a Gaussian
mixture model classifier. The integrated system achieves 88% correct
segmentation over a database of 28 hours of music from different musical
genres. For both methods, errors occur in borderline cases with soft
percussive-like drum accompaniment, or transient-like instrumentation without
drums.
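A minimal sketch of the first mechanism as described, with band-wise envelope periodicities summed into a summary autocorrelation function; the envelope estimator, the window sizes, and the synthetic test are assumptions, not the authors' settings.

    import numpy as np

    def envelope(x, win):
        """Crude amplitude envelope: RMS over non-overlapping windows."""
        n = len(x) // win
        return np.sqrt((x[:n * win].reshape(n, win) ** 2).mean(axis=1))

    def summary_acf(subband_signals, win=256, max_lag=128):
        """Sum of band-wise envelope autocorrelations; strong peaks at
        non-zero lags indicate periodic, drum-like energy."""
        total = np.zeros(max_lag)
        for band in subband_signals:
            e = envelope(band, win)
            e = e - e.mean()
            ac = np.correlate(e, e, mode="full")[len(e) - 1:len(e) - 1 + max_lag]
            total += ac / (abs(ac[0]) + 1e-12)         # each band weighted equally
        return total

    # One band carrying a pulse every 4096 samples, one carrying noise.
    rng = np.random.default_rng(1)
    pulse = np.tile(np.r_[1.0, np.zeros(4095)], 50) + 0.01 * rng.standard_normal(204800)
    noise = rng.standard_normal(204800)
    s = summary_acf([pulse, noise])
    print(int(np.argmax(s[8:]) + 8))                   # 16 envelope frames = the pulse period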
[Abstract10]Optical Music Recognition is the
process of converting a graphical representation of music (such as sheet music)
into a symbolic format of use to music software. Music notation is rich in
structural information, and the relative positions of objects can often help to
identify them. When objects are unidentified or mis-identified, many current
systems "coerce" the set of objects into some semantic
representation, for example by modifying the detected durations. This could cause
correctly identified symbols to be modified. The knowledge that the current set
of identified symbols cannot be semantically parsed could instead be used to
re-examine some of the symbols before deciding whether or not the
classification is correct. This paper describes work in progress involving the
use of feedback between the various phases of the optical music recognition
process to automatically correct mistakes, such as symbolic classification
errors or mis-detected staff systems.
[Abstract11]The M-MIMOR approach presented here
makes productive use of the multidimensionality of music retrieval. It
integrates heterogeneous poly-representation into a self-adapting system. The different perspectives of users can be expressed through relevance feedback and direct a learning process that ultimately leads to an optimal
solution for a user within a certain context. The paper explores the diversity
within music retrieval stemming from an abundance of approaches for
representing musical objects and searching for similarity. As a result, the
system designer is usually confronted with a large number of arbitrary
decisions. These challenges are discussed within the M-MIMOR framework, which provides an appropriate solution. Fusion by linear combination guarantees that every perspective is integrated: the strength of a perspective is reflected by the weight of its representation scheme or matching algorithm in the fusion. These weights are adapted according to their
success in previous retrieval tasks.
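The abstract fixes the fusion scheme (a linear combination with success-adapted weights) but not the update rule; the sketch below shows one plausible instantiation, where the learning rate and the normalization are assumptions.

    import numpy as np

    def fuse(scores, weights):
        """Linear combination of the score vectors produced by several
        representation/matching schemes; every perspective contributes."""
        return sum(w * s for w, s in zip(weights, scores))

    def update_weights(weights, successes, lr=0.1):
        """Shift weight towards the schemes that succeeded in the previous
        retrieval task (successes in [0, 1], one per scheme)."""
        w = np.asarray(weights) + lr * np.asarray(successes)
        return w / w.sum()                             # keep a convex combination

    # Two schemes scoring four documents; scheme 1 did better last round.
    scores = [np.array([0.9, 0.1, 0.4, 0.2]), np.array([0.2, 0.8, 0.3, 0.1])]
    w = update_weights([0.5, 0.5], successes=[1.0, 0.3])
    print(w, fuse(scores, w))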
[Abstract12]This paper compares the relative
ease of creating a useful quantization of time from linear and log2
representations. The quantization is created by mapping these timing representations
onto alphabets of different sizes and studying the ability of a simple
string-matcher to differentiate between themes in a melodic corpus when
different representations are used. The results indicate that time is better
represented by a logarithmic scale than a linear one. We also compare the
merits of representing the timing between events as Inter-Onset Intervals (IOIs) versus as ratios of adjacent IOI values, looking at the kind of information preserved by each and the kinds of variation each minimizes.
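The log2 quantization under comparison can be made concrete: binning IOIs on a logarithmic axis means that doubling a duration always shifts a symbol by a fixed number of bins, whereas linear bins smear short durations together. The alphabet size and IOI range below are illustrative choices, not the paper's.

    import math

    def quantize_log2(iois, alphabet_size=6, ioi_min=0.05, ioi_max=3.2):
        """Map inter-onset intervals (seconds) to letter symbols on a log2
        scale; with these bounds each bin spans exactly one octave of time."""
        lo = math.log2(ioi_min)
        width = (math.log2(ioi_max) - lo) / alphabet_size
        out = []
        for ioi in iois:
            k = int((math.log2(min(max(ioi, ioi_min), ioi_max)) - lo) / width)
            out.append(chr(ord("a") + min(k, alphabet_size - 1)))
        return "".join(out)

    rhythm = [0.25, 0.25, 0.5, 1.0]
    print(quantize_log2(rhythm))                        # 'ccde'
    print(quantize_log2([2 * x for x in rhythm]))       # 'ddef': each symbol one bin up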
[Abstract13]In this paper, we study
transposition-invariant content-based music retrieval (TI-CBMR) in polyphonic
music. The aim is to find transposition-invariant occurrences of a given query pattern, called a template, in a database of polyphonic music, called a dataset.
Between the musical events (represented by points) in the dataset that have
been found to match points in the template, there may be any finite number of
other intervening musical events. For this task, we introduce an algorithm,
called SIA(M)ESE, which is based on the SIA pattern induction algorithm. The
algorithm is first introduced in abstract mathematical form; we then show how
we have implemented it using sophisticated techniques and equipped it with appropriate
heuristics. The resulting efficient algorithm has a worst case running time of O(mn
log(mn)), where m and n are the sizes of the template
and the dataset, respectively. Moreover, the algorithm generalizes to any multidimensional translation-invariant pattern-matching problem,
where the events considered can be represented by points in a multidimensional
dataset.
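The point-set formulation can be illustrated with a brute-force baseline (quadratic in the number of point pairs, unlike the authors' O(mn log(mn)) algorithm): every template/dataset point pair votes for a translation vector, and a vector receiving one vote per template point marks a transposition-invariant occurrence, regardless of intervening events.

    from collections import Counter

    def translation_matches(template, dataset):
        """Translation vectors mapping the whole template into the dataset;
        points are (onset time, pitch) pairs."""
        counts = Counter()
        for (ta, tb) in template:
            for (da, db) in dataset:
                counts[(da - ta, db - tb)] += 1
        return [v for v, c in counts.items() if c == len(template)]

    # The template occurs shifted 8 time units and transposed up 3 semitones,
    # with one extra intervening note in the dataset.
    template = [(0, 60), (1, 62), (2, 64)]
    dataset = [(8, 63), (9, 65), (9, 70), (10, 67)]
    print(translation_matches(template, dataset))       # [(8, 3)]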
[Abstract14]The audio processing and post-processing of singing play a fundamental role in query-by-humming applications. Analyzing a sung query requires some form of meta-information extraction, and this topic is the focus of the present paper. Some considerations are presented that aim to give a systematic view of a number of issues related to the transcription of singing into music. A critical review of previous approaches and findings is followed by novel experimental results. Starting from the similarities between speech
sounds and sung notes, the peculiar facets of singing voices are introduced and
analyzed along three different directions: extraction of a
microintonation contour (or pitch contour at frame level), note estimation and
study of singing accuracy. A segmentation algorithm has been developed
combining the Spectral Flatness Measure with pitch and envelope information. A
practical implementation for smoothing raw output from pitch tracking and a
rule-based schema for reducing the pitch contour to a sequence of note-duration
pairs are illustrated. Finally, we report an experiment on the deviations from
pure tone intonation in performances of untrained singers.
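The SFM-based segmentation cannot be reconstructed from the abstract alone, but the smoothing and note-estimation steps it mentions can be sketched generically: median-filter the frame-level pitch contour, then map frequencies to equal-tempered note numbers. The window width and the handling of unvoiced frames are assumptions.

    import math
    import statistics

    def smooth_pitch(f0_track, width=5):
        """Median-filter a frame-level f0 track (Hz, 0 = unvoiced) to remove
        octave errors and spurious frames before note segmentation."""
        half = width // 2
        out = []
        for i, v in enumerate(f0_track):
            if v <= 0:
                out.append(0.0)                        # keep unvoiced frames unvoiced
                continue
            voiced = [w for w in f0_track[max(0, i - half):i + half + 1] if w > 0]
            out.append(statistics.median(voiced))
        return out

    def hz_to_midi(f0):
        """Nearest equal-tempered MIDI note number, or None when unvoiced."""
        return round(69 + 12 * math.log2(f0 / 440.0)) if f0 else None

    track = [220, 440, 220, 221, 219, 0, 330, 331]      # one octave error, one dropout
    print([hz_to_midi(f) for f in smooth_pitch(track)]) # [57, 57, 57, 57, 57, None, 64, 64]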
[Abstract15]The use of audio queries for
searching multimedia content has increased rapidly with the rise of music
information retrieval; there are now many Internet-accessible systems that take
audio queries as input. However, testing the robustness of such a system can be
a large part of the development process. A corpus of audio queries would aid
researchers in the development of both audio signal processing techniques and
audio query systems. Such a corpus would also be of use for making empirical
comparisons between different systems and methods. We propose the creation of a
set of audio queries taken from attendees of the ISMIR 2002 Conference that
would be made readily available to MIR researchers.
[Abstract16]One of the problems encountered in
music transcription is to decide whether a note should be repeated when a new onset is found during its duration; in other words, whether two or more shorter notes should be produced instead of a single longer note. The paper describes our approach to solving this problem,
implemented within our system for transcription of piano music. The approach is
based on a multilayer perceptron neural network, trained to recognize repeated
notes. We compare this method to a more naive method that tracks the amplitude
of the first partial of each note and also present performance statistics of
our system on transcriptions of several real piano recordings.
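The naive baseline, tracking the amplitude of each note's first partial, might look like the following sketch: a repeated note shows a fresh amplitude rise after the initial decay, while a single held note decays monotonically. The threshold and the synthetic test are illustrative, not the authors' values.

    import numpy as np

    def partial_amplitude(frames, sr, f0):
        """Per-frame magnitude of the DFT bin nearest the note's fundamental
        (frames: 2-D array, one windowed frame per row)."""
        n = frames.shape[1]
        spectra = np.abs(np.fft.rfft(frames, axis=1))
        return spectra[:, int(round(f0 * n / sr))]

    def is_repeated(amps, rise=1.5):
        """Flag a repetition when the partial's amplitude rises sharply again
        after decaying; a running minimum tracks the decay so far."""
        running_min = np.minimum.accumulate(amps)
        return bool(np.any(amps[1:] > rise * running_min[:-1]))

    # A decaying partial with a second attack versus a plain decay.
    twice = np.concatenate([np.linspace(1.0, 0.3, 20), np.linspace(1.0, 0.3, 20)])
    print(is_repeated(twice), is_repeated(np.linspace(1.0, 0.3, 40)))  # True False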
[Abstract17]Electronic music distribution, the Internet success of MP3, and current activities concerning a semantic web of music call for convenient music information retrieval and question-answering systems. In this paper we give an overview of the concepts behind our "super-convenience" approach to MIR. By using natural language as input for human-oriented queries to large-scale music collections, we were able to address the needs of non-musicians. The entire system is applicable to future semantic web services, existing music websites, and future electronic devices such as CD changers for cars or PDAs. It is a full-fledged architecture combining state-of-the-art approaches from different research disciplines. In a cross-discipline approach we customized techniques from natural language understanding, phonetic matching, automatic analysis of audio for meta-tag construction, content-based classification, and music ontologies as a backbone for the representation of musical knowledge. Besides the basic framework, we present a novel idea for incorporating the processing of lyrics based on standard information retrieval methods, i.e., the vector space model. This work has been performed at the German Research Center for AI and the authors' spin-off company, sonicson, which specializes in music web services.
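The vector space model named for the lyrics component is standard tf-idf weighting with cosine matching; a self-contained sketch follows, in which the tokenization and weighting details are generic assumptions and none of the system's other components (phonetic matching, ontologies) are reproduced.

    import math
    from collections import Counter

    def tfidf_vectors(docs):
        """Vector space model: tf-idf weight per term, per document."""
        n = len(docs)
        tokenized = [doc.lower().split() for doc in docs]
        df = Counter(t for toks in tokenized for t in set(toks))
        return [{t: tf * math.log(n / df[t]) for t, tf in Counter(toks).items()}
                for toks in tokenized]

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    lyrics = ["love me tender love me sweet",
              "yellow submarine yellow submarine",
              "all you need is love"]
    vecs = tfidf_vectors(lyrics + ["i need love"])
    query, docs = vecs[-1], vecs[:-1]
    print(max(range(len(docs)), key=lambda i: cosine(query, docs[i])))  # 2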
[Abstract18]In this article, a heuristic version of Multidimensional Scaling (MDS) named FastMap is used for audio retrieval and browsing. FastMap, like MDS, maps objects into a Euclidean space such that similarities are preserved. In addition to being more efficient than MDS, it allows query-by-example queries, which makes it suitable for content-based retrieval purposes.
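FastMap's core step is well documented: pick two far-apart pivot objects and project every object onto the line through them using the cosine law, then recurse on the residual distances for further axes. A one-axis sketch, with the pivot-selection heuristic simplified:

    import numpy as np

    def fastmap_axis(dist):
        """One FastMap coordinate from a distance matrix: choose a rough
        far pair (a, b) as pivots and project via the cosine law."""
        b = int(np.argmax(dist[0]))                     # far from object 0
        a = int(np.argmax(dist[b]))                     # far from b
        dab = dist[a, b]
        return (dist[a] ** 2 + dab ** 2 - dist[b] ** 2) / (2 * dab)

    def residual(dist, x):
        """Distances in the hyperplane orthogonal to the new axis, used to
        compute the next coordinate recursively."""
        diff = x[:, None] - x[None, :]
        return np.sqrt(np.maximum(dist ** 2 - diff ** 2, 0.0))

    # Objects on a line: FastMap recovers their ordering from distances alone.
    p = np.array([0.0, 1.0, 4.0, 9.0])
    dist = np.abs(p[:, None] - p[None, :])
    print(np.argsort(fastmap_axis(dist)))               # [0 1 2 3] (or reversed)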
[Abstract19]The increasing availability of
digital music has created a greater need for methods to organize large
collections of music. The eXtensible PlayList (XPL) representation allows users
to express playlists with varying degrees of specificity. XPL handles
references to exact files or URLs as well as rules for selecting content based
on metadata constraints. XPL also allows the transitions between tracks in a
playlist to be specified. This paper describes the features of XPL, a system for rendering XPL specifications, and the use of an advanced XPL renderer in an existing application.
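The abstract names XPL's capabilities (exact references, metadata rules, transitions between tracks) but not its element vocabulary, so the snippet below builds a playlist in that spirit with hypothetical tag and attribute names rather than the real XPL schema.

    import xml.etree.ElementTree as ET

    # Hypothetical XPL-style playlist: one exact track reference, a crossfade
    # transition, and a rule selecting content by metadata constraints.
    playlist = ET.Element("playlist")
    ET.SubElement(playlist, "track", ref="file:///music/intro.mp3")
    ET.SubElement(playlist, "transition", type="crossfade", seconds="3")
    rule = ET.SubElement(playlist, "rule")
    ET.SubElement(rule, "constraint", field="genre", value="jazz")
    ET.SubElement(rule, "constraint", field="year", op="ge", value="1950")
    print(ET.tostring(playlist, encoding="unicode"))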
[Abstract20]Current XML encoding systems for
music focus almost exclusively on western music from the 17th century onwards,
and on the western notation system. In order to ensure that music information
retrieval (MIR) systems have full theoretical generality, and wide practical
application, we have begun a project to explore the representation, in XML, of
a genre of traditional Korean music which has a distinctive notation system
(Chôngganbo). Our project takes seriously the specific notational expression of
musical intention and intends to ultimately contribute to the analysis of
theoretical issues in music representation, as well as to the improvement of
methods for representing Korean music specifically. The present paper is an
introduction to the music and its notation, and to our exploratory XML
representation system.
[Abstract21]Singing is the characteristic vocal part of popular music, and retrieval by singing with lyrics is a natural way to query it. Unlike some music retrieval systems, which use melody contours to represent music and string matching to retrieve it, we match acoustic singing input directly against the vocal part of popular music. Since exact matching between the two is very difficult, both are represented by self-similarity sequences to eliminate error propagation. Our approach deals with raw audio in WAV format: Independent Component Analysis (ICA) is employed to separate the singing from the accompaniment, AbstractCCs are used as the features for computing the self-similarity sequence, the weights of a recurrent neural network serve as indices into the music database, and the retrieval list is generated by degree of correlation.
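The self-similarity representation can be sketched independently of the ICA separation and the recurrent-network indexing (neither is reproduced here): comparing each feature frame with its predecessor gives a one-dimensional curve that is invariant to rescaling of the features, so a sung query and the corresponding vocal track can be aligned by correlating their curves.

    import numpy as np

    def self_similarity_sequence(features, lag=1):
        """Cosine similarity of each feature frame to the frame `lag`
        frames earlier; reflects internal structure, not absolute values."""
        f = np.asarray(features, dtype=float)
        a, b = f[lag:], f[:-lag]
        num = (a * b).sum(axis=1)
        den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12
        return num / den

    # A song containing a rescaled copy of the query plus unrelated frames.
    rng = np.random.default_rng(2)
    query = rng.standard_normal((50, 13))
    song = np.vstack([query * 1.7, rng.standard_normal((30, 13))])
    q, s = self_similarity_sequence(query), self_similarity_sequence(song)
    corr = np.correlate(s - s.mean(), q - q.mean(), mode="valid")
    print(int(np.argmax(corr)))                         # 0: the copy sits at the start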
[Abstract22]A melody recognition system with a
voice-only user interface is presented in this paper. By integrating speech
recognition and music recognition technology we have built an end-to-end melody
recognition system that allows voice-controlled melodic queries and melody generation using a dial-in service with a mobile phone. In this paper we present the system behind the service, report user evaluation results, and consider the strengths and weaknesses of such a service.