ISMIR 2002: Accepted papers
[Abstract1]In most
popular music, the vocals sung by the lead singer are the focal point of the
song. The unique qualities of a singer’s voice make it relatively easy for us
to identify a song as belonging to that particular artist. With little
training, if one is familiar with a particular singer’s voice one can usually
recognize that voice in other pieces, even when hearing a song for the first
time. The research presented in this paper attempts to automatically establish
the identity of a singer using acoustic features extracted from songs in a
database of popular music. As a first step, an untrained algorithm for
automatically extracting vocal segments from within songs is presented. Once
these vocal segments are identified, they are presented to a singer
identification system that has been trained on data taken from other songs by
the same artists in the database.
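As an informal illustration of the identification step, the sketch below (Python, assuming numpy) assigns a set of vocal frames to the nearest artist centroid in feature space. The paper does not specify this particular classifier or these feature dimensions; they are placeholders chosen only to show the train-then-identify flow.

    # Hypothetical sketch: a nearest-centroid singer classifier over frame-level
    # acoustic features (e.g. cepstral vectors). The paper does not specify the
    # classifier; this only illustrates the train-then-identify flow.
    import numpy as np

    def train_singer_models(training_frames):
        """training_frames: dict mapping artist name -> (n_frames, n_dims) array."""
        return {artist: frames.mean(axis=0) for artist, frames in training_frames.items()}

    def identify_singer(models, vocal_frames):
        """Assign the extracted vocal segment to the closest artist centroid."""
        query = vocal_frames.mean(axis=0)
        return min(models, key=lambda a: np.linalg.norm(models[a] - query))

    # Toy usage with random "features" standing in for real acoustic frames.
    rng = np.random.default_rng(0)
    models = train_singer_models({
        "artist_a": rng.normal(0.0, 1.0, (200, 13)),
        "artist_b": rng.normal(2.0, 1.0, (200, 13)),
    })
    print(identify_singer(models, rng.normal(2.0, 1.0, (50, 13))))  # -> "artist_b"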
[Abstract2]It would be
interesting and valuable to devise an automatic measure of the similarity between two musicians
based only on an analysis of their recordings. To develop such a measure,
however, presupposes some ‘ground truth’ training data describing the actual
similarity between certain pairs of artists that constitute the desired output
of the measure. Since artist similarity is wholly subjective, such data is not
easily obtained. In this paper, we describe several attempts to construct a
full matrix of similarity measures between a set of some 400 popular artists by
regularizing limited subjective judgment data. We also detail our attempts to
evaluate these measures by comparison with some direct subjective similarity
judgments collected via a web-based survey in April 2002. Overall, we find that
subjective artist similarities are not consistent between users, undermining
the concept of a single ‘ground truth’, but we offer our best
common-denominator measures anyway.
[Abstract3]A system is
described which measures the similarity of two arbitrary rhythmic patterns. The
patterns are represented as acoustic signals, and are not assumed to have been
performed with similar sound sets. Two novel methods are presented that
constitute the algorithmic core of the system. First, a probabilistic musical
meter estimation process is described, which segments a continuous musical
signal into patterns. As a side-product, the method outputs tatum, tactus
(beat), and measure lengths. A subsequent process performs the actual
similarity measurements. Acoustic features are extracted which model the
fluctuation of loudness and brightness within the pattern, and dynamic time
warping is then applied to align the patterns to be compared. In simulations,
the system behaved consistently by assigning high similarity measures to
similar musical rhythms, even when performed using different sound sets.
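A minimal sketch of the alignment step follows, assuming each pattern has already been reduced to a sequence of per-frame feature vectors (e.g. loudness and brightness); the meter estimation and feature extraction described above are not reproduced.

    # Minimal dynamic-time-warping sketch over two feature sequences.
    import numpy as np

    def dtw_distance(a, b):
        """a, b: (n, d) and (m, d) feature sequences; returns alignment cost."""
        n, m = len(a), len(b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(a[i - 1] - b[j - 1])
                cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
        return cost[n, m] / (n + m)  # normalise by an upper bound on path length

    # Two similar rhythmic patterns should yield a lower cost than dissimilar ones.
    p1 = np.array([[1.0, 0.2], [0.1, 0.1], [0.8, 0.3], [0.1, 0.1]])
    p2 = np.array([[0.9, 0.25], [0.15, 0.1], [0.85, 0.3], [0.05, 0.1]])
    print(dtw_distance(p1, p2))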
[Abstract4]Electronic Music
Distribution (EMD) is
in need of
robust, automatically extracted music descriptors. We introduce a timbral similarity measure
for comparing music
titles. This measure
is based on a Gaussian model of cepstrum coefficients. We describe
the timbre extractor and the
corresponding timbral similarity relation. We describe
experiments in assessing
the quality of
the similarity relation, and
show that the
measure is able
to yield interesting similarity
relations, in particular
when used in conjunction with other similarity
relations. We illustrate the use of the
descriptor in several
EMD applications developed
in the context of the Cuidado European project.
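The sketch below illustrates one plausible reading of such a measure: fit a single Gaussian to the cepstral frames of each title and compare titles with a symmetrised Kullback-Leibler divergence. The exact model and comparison used in the paper may differ; the code only shows the shape of the computation.

    # Sketch of a timbral distance, assuming a single full-covariance Gaussian
    # per title and a symmetrised Kullback-Leibler divergence between titles.
    import numpy as np

    def fit_gaussian(mfcc_frames):
        mu = mfcc_frames.mean(axis=0)
        sigma = np.cov(mfcc_frames, rowvar=False)
        return mu, sigma

    def kl_gaussians(m0, s0, m1, s1):
        k = len(m0)
        s1_inv = np.linalg.inv(s1)
        diff = m1 - m0
        return 0.5 * (np.trace(s1_inv @ s0) + diff @ s1_inv @ diff - k
                      + np.log(np.linalg.det(s1) / np.linalg.det(s0)))

    def timbral_distance(frames_a, frames_b):
        ga, gb = fit_gaussian(frames_a), fit_gaussian(frames_b)
        return kl_gaussians(*ga, *gb) + kl_gaussians(*gb, *ga)

    rng = np.random.default_rng(1)
    a = rng.normal(0.0, 1.0, (500, 13))   # stand-in for cepstral frames of title A
    b = rng.normal(0.1, 1.1, (500, 13))   # stand-in for cepstral frames of title B
    print(timbral_distance(a, b))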
[Abstract5]This paper
presents a music summarization system called “Papipuun” that we are developing.
Papipuun performs quick listening in a manner similar to a stylus skipping on a
scratched record, but the skipping occurs correctly at punctuations of musical
phrases, not arbitrarily. First, we developed a method for representing
polyphony based on time-span reduction in the generative theory of tonal music
(GTTM) and the deductive object-oriented database (DOOD). The operation, least
upper bound, plays an important role in similarity checking of polyphonies
represented in our method. Next, in a preprocessing phase, a user analyzes a
given piece by time-span reduction, using a dedicated tool called TS-Editor.
In the real-time phase, the user interacts with the main system, Summarizer, to
perform music summarization. Summarizer discovers a piece structure by
similarity checking. When the user identifies the fragments to be skipped,
Summarizer deletes them and concatenates the rest. Through interaction with the
user, Papipuun can produce a music summary of good quality that reflects the
atmosphere of the entire piece.
[Abstract6]This paper
deals with the automatic generation of music audio summaries from signal
analysis without the use of any other information. The strategy employed here
is to consider the audio signal as a succession of “states” (at various scales)
corresponding to the structure (at various scales) of a piece of music. This
is, of course, only applicable to certain kinds of musical genres based on
repetition. From the audio signal, we first derive features representing the
time evolution of the energy content in various frequency bands. These features
constitute our observations from which we derive a representation of the music
in terms of “states”. Since human segmentation and grouping performs better
upon subsequent hearings, this “natural” approach is followed here. The first
pass of the proposed algorithm uses segmentation in order to create “templates”
as “potential” states. The second pass uses these templates in order to
structure the music using unsupervised learning methods (k-means and hidden
Markov model). The audio summary is finally constructed by choosing a
representative example of each state. Further refinement of the summary audio
signal construction uses overlap-add and tempo detection/beat alignment in
order to improve the audio
quality of the created summary.
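To make the "states" idea concrete, the following sketch clusters band-energy feature frames with a small k-means and picks, for each state, the frame closest to its centroid as a representative. The template-creating first pass, the HMM variant, and the overlap-add refinements are omitted, and all names and sizes are illustrative.

    # Minimal k-means sketch for the "states" idea: frames of band-energy
    # features are clustered, each cluster standing for one structural state.
    import numpy as np

    def kmeans(features, k, iters=50, seed=0):
        rng = np.random.default_rng(seed)
        centroids = features[rng.choice(len(features), k, replace=False)]
        for _ in range(iters):
            labels = np.argmin(
                np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2), axis=1)
            for c in range(k):
                if np.any(labels == c):
                    centroids[c] = features[labels == c].mean(axis=0)
        return labels, centroids

    # Representative frame of each state = the frame closest to its centroid;
    # a summary could concatenate audio around those frames.
    rng = np.random.default_rng(2)
    frames = np.vstack([rng.normal(m, 0.3, (100, 8)) for m in (0.0, 1.0, 2.0)])
    labels, centroids = kmeans(frames, k=3)
    reps = [int(np.argmin(np.linalg.norm(frames - c, axis=1))) for c in centroids]
    print(labels[:10], reps)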
[Abstract7]We present
methods for automatically producing summary excerpts or thumbnails of music. To
find the most representative excerpt, we maximize the average segment
similarity to the entire work. After window-based audio parameterization, a
quantitative similarity measure is calculated between every pair of windows,
and the results are embedded in a 2-D similarity matrix. Summing the similarity
matrix over the support of a segment results in a measure of how similar that segment is to the
whole. This can be maximized to find the segment that best represents the
entire work. We discuss variations on the method, and present experimental
results for orchestral music, popular songs, and jazz. These results
demonstrate that the method finds significantly representative excerpts,
using very few assumptions about the source audio.
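The core scoring step can be sketched as follows: build a cosine similarity matrix over windowed feature vectors, sum it over every candidate segment's support, and keep the segment with the highest average similarity to the whole work. The parameterization below is an arbitrary stand-in, not the one used in the experiments.

    # Sketch of the thumbnail selection step over frame-level feature vectors.
    import numpy as np

    def similarity_matrix(features):
        norms = np.linalg.norm(features, axis=1, keepdims=True)
        unit = features / np.maximum(norms, 1e-12)
        return unit @ unit.T            # S[i, j] = cosine similarity of frames i, j

    def best_segment(S, seg_len):
        """Return start index of the segment most similar, on average, to all frames."""
        col_sums = S.sum(axis=1)        # similarity of each frame to the whole work
        scores = np.convolve(col_sums, np.ones(seg_len), mode="valid")
        return int(np.argmax(scores))

    rng = np.random.default_rng(3)
    feats = rng.normal(size=(300, 12))          # stand-in for windowed audio features
    start = best_segment(similarity_matrix(feats), seg_len=30)
    print("thumbnail frames:", start, "to", start + 30)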
[Abstract8]Human
listeners are able to recognize structure in music through the perception of
repetition and other relationships within a piece of music. This work aims to
automate the task of music analysis. Music is “explained” in terms of embedded
relationships, especially repetition of segments or phrases. The steps in this
process are the transcription of audio into a representation with a similarity
or distance metric, the search for similar segments, forming clusters of
similar segments, and explaining music in terms of these clusters. Several
transcription methods are considered: monophonic pitch estimation, chroma
(spectral) representation, and polyphonic transcription followed by harmonic analysis.
Also, several algorithms that search for similar segments are described. These
techniques can be used to perform an analysis of musical structure, as
illustrated by examples.
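Of the transcription methods mentioned, the chroma representation is easy to sketch: fold an FFT magnitude spectrum into twelve pitch classes. The tuning reference and binning below follow common convention rather than details from the paper.

    # Minimal chroma sketch: fold the magnitude spectrum of one analysis frame
    # into 12 pitch classes.
    import numpy as np

    def chroma_vector(frame, sr=22050, fmin=55.0):
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
        chroma = np.zeros(12)
        for mag, f in zip(spectrum, freqs):
            if f < fmin:
                continue
            pitch_class = int(round(12 * np.log2(f / 440.0))) % 12   # A = class 0
            chroma[pitch_class] += mag
        return chroma / max(chroma.sum(), 1e-12)

    # A 440 Hz sine should concentrate energy in a single pitch class.
    t = np.arange(2048) / 22050.0
    print(np.round(chroma_vector(np.sin(2 * np.pi * 440.0 * t)), 3))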
[Abstract9]We present an efficient and scalable
system that indexes acoustic music data for content-based music retrieval. Both
the music database and input queries are given in raw audio formats without
metadata or other symbolic information; retrieval is targeted at music pieces
which are “similar” to the query sound clip. Our framework is designed as a
series of modular pipeline stages and phases. Each music file entering the
pipeline is transformed into spectrogram vectors and then into characteristic
sequences, representing small segments of audio features that can
tolerate some noise and tempo variations. These sequences are placed in a
high-dimensional indexing structure. Retrieval results from the index are
ranked based on alignment of short matching segments. Each module of the
framework can be independently changed or replaced, and the effects are
studied through experiments.
[Abstract10]With the advent of large musical
archives, the need to provide an organization of these archives becomes evident.
While artist-based organizations or title indexes may help in locating a
specific piece of music, a more intuitive, genre-based organization is required
to allow users to browse an archive and explore its contents. Yet such
style-based organizations currently have to be designed manually. In this paper we propose an
approach to automatically create a hierarchical organization of music archives
following their perceived sound similarity. More specifically, characteristics
of frequency spectra are extracted and transformed according to psycho-acoustic
models. Subsequently, the Growing Hierarchical Self-Organizing Map, a popular
unsupervised neural network, is used to create a hierarchical organization,
offering both an interface for interactive exploration as well as retrieval of
music according to sound similarity.
[Abstract11]In this paper we present a musical
style identification scheme based on simultaneous classification of auditory
and textual data. Style identification is a task which often involves cultural
aspects not present or easily extracted through auditory processing. The scheme
we propose complements any audio driven genre or style detection system with a
classifier based on web-extracted data we call "community metadata."
The addition of these cultural attributes in our feature space aids in proper
classification of acoustically dissimilar music within the same style, and
similar music belonging to different styles.
[Abstract12]Much research has been published on
musical taste; however, little of it has been studied by the builders of music recommenders.
Implicit and explicit collaborative filtering has been used for making
recommenders, in addition to the automatic classification of music into style
categories based on extracted audio features. This paper surveys research into
musical taste, reviews music recommender research, and outlines promising
directions. In particular, we learned that demographic and personality factors
have been shown to influence music preference. For mood, the main
factors are tempo, tonality, distinctiveness of rhythm, and pitch height.
[Abstract13]A means to ease selecting preferred
music referred to as Personalized Automatic Track Selection (PATS) has been
developed. PATS generates playlists that suit a particular context-of-use, that
is, the real-world environment in which the music is heard. To create
playlists, it uses a dynamic clustering method in which songs are grouped based
on their attribute similarity. The similarity measure selectively weighs
attribute values, since not all attribute values are equally relevant for a given
context-of-use; these weights are inferred from preference feedback of the
user. In a controlled user experiment, the quality
of PATS-compiled and randomly assembled playlists for jazz music was assessed
in two contexts-of-use. The quality of the randomly assembled playlists was
used as base-line. The two contexts-of-use were "listening to soft
music" and "listening to lively music". Playlist quality was
measured by precision (songs that suit the context-of-use), coverage (songs
that suit the context-of-use but that were not already contained in previous
playlists) and a rating score. Results showed that PATS playlists
contained increasingly more preferred music (increasingly higher precision),
covered more preferred music in the collection (higher coverage), and
were rated higher than randomly assembled playlists.
[Abstract14]Music Information Retrieval (MIR) is
an interdisciplinary research area that has grown out of the need to manage
burgeoning collections of music in digital form. Its diverse disciplinary
communities have yet to articulate a common research agenda or agree on methodological principles
and metrics of success. In order for MIR to succeed, researchers need to work
with real user communities and develop research resources such as reference
music collections, so that the wide variety of techniques being developed in
MIR can be meaningfully compared with one another. Out of these efforts, a
common MIR practice can emerge.
[Abstract15]There has been substantial research
on technical aspects of musical digital libraries, but comparatively little on
usability aspects. We have evaluated four web-accessible music libraries,
focusing particularly on features that are specific to music libraries, such
as music retrieval mechanisms. Although the original focus of the work was on
how modalities are combined within the interactions with such libraries, that
was not where the main difficulties were found. Libraries were generally well
designed for use of different modalities. The main challenges identified relate
to the details of melody matching and to simplifying the choices of file
format. These issues are discussed in detail.
[Abstract16]Previous research has demonstrated
that people listen to music for various reasons. The purpose of this study was
to investigate people’s perception of music, and thus their music information
needs. These ideas were examined by presenting 22 participants with 7 classical
musical pieces, asking one-half of them to write words descriptive of each
piece, and the other half words they would use if searching for each piece. All
the words used by all subjects in both tasks were classified into 7 categories.
The two most frequently appearing categories were emotions and occasions
or filmed events regardless of the task type. These subjects, none of whom
had formal training in music, almost never used words related to formal
features of music; rather, they almost always used words indicating other features,
most of which have not been considered in existing or proposed music IR
systems. These results suggest that music IR research should be extended to
consider needs other than finding known items, or items identified by formal
characteristics, and that understanding music information needs of users should
be prioritized to design more sophisticated music IR systems.
[Abstract17]Imagine the
following situation. You’re in your car, listening to the radio and suddenly
you hear a song that catches your attention. It’s the best new song you have
heard for a long time, but you missed the announcement and don’t recognize the
artist. Still, you would like to know more about this music. What should you
do? You could call the radio station, but that’s too cumbersome. Wouldn’t it be
nice if you could push a few buttons on your mobile phone and a few seconds
later the phone would respond with the name of the artist and the title of the
music you’re listening to? Perhaps it could even send an email to your default
email address with some supplemental information. In this paper we present an audio
fingerprinting system, which makes the above scenario possible. By using the
fingerprint of an unknown audio clip as a query on a fingerprint database,
which contains the fingerprints of a large library of songs, the audio clip can
be identified. At the core of the presented system are a highly robust
fingerprint extraction method and a very efficient fingerprint search strategy,
which enables searching a large fingerprint database with only limited
computing resources.
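Purely as an illustration of the lookup idea (not the specific extraction or search strategy of this system), the sketch below derives a compact sub-fingerprint per frame from band-energy differences and uses a hash table of such sub-fingerprints to vote for the matching song.

    # Illustrative sketch only: turn each frame's band energies into a bit
    # pattern and use those bits as keys into a hash table of known songs.
    import numpy as np
    from collections import defaultdict

    def sub_fingerprint(band_energies):
        """Encode the sign of energy differences between adjacent bands as bits."""
        diffs = np.diff(band_energies)
        return sum(1 << i for i, d in enumerate(diffs) if d > 0)

    def build_index(songs):
        """songs: dict name -> (n_frames, n_bands) band-energy array."""
        index = defaultdict(list)
        for name, frames in songs.items():
            for pos, frame in enumerate(frames):
                index[sub_fingerprint(frame)].append((name, pos))
        return index

    def identify(index, clip_frames):
        votes = defaultdict(int)
        for frame in clip_frames:
            for name, _ in index.get(sub_fingerprint(frame), []):
                votes[name] += 1
        return max(votes, key=votes.get) if votes else None

    rng = np.random.default_rng(4)
    library = {f"song_{i}": rng.normal(size=(400, 33)) for i in range(3)}
    index = build_index(library)
    print(identify(index, library["song_1"][100:150]))   # -> "song_1"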
[Abstract18]This paper extends the familiar
"query by humming" music retrieval framework into the polyphonic
realm. As humming in multiple voices is quite difficult, the task is more
accurately described as "query by audio example" against a collection
of scores. To our knowledge, we are the first to use polyphonic symbolic
collections. Furthermore, as our results will show, we will not only use an
audio query to retrieve a known-item symbolic piece, but we will use it to
retrieve an entire set of real-world composed variations on that piece, also in
the symbolic format. The harmonic modeling approach which forms the basis of
this work is a new and valuable technique which has both wide applicability and
long-range future potential.
[Abstract19]Recently, great attention has been paid
to content-based multimedia retrieval, which enables users to find and locate
audio-visual materials according to the intrinsic characteristics of the
target. Query by humming (QBH) is an application that performs retrieval
according to a major characteristic of music, namely the melody. There have
been several studies on QBH systems, but their major concern has been systems
that retrieve symbolic music data from a hummed query. When the usability of
the technology is taken into consideration, however, retrieval of music in the
form of polyphonic audio would be more useful, and it is needed in applications
such as internet music search or music jukeboxes, where the music data is
stored not in symbolic form but as raw audio signals, because such data is the
more natural format for consumption. Our focus is on realizing query-by-humming
technology in an easy-to-use application, which entails full automation of all
the processes of the system, including melody information extraction from
polyphonic audio. The melody features of music and humming are represented not
by distinct note information but by the possibilities of note occurrence.
Similarity is then measured between the melody features of humming and music
using a DP matching method. This paper presents the algorithms developed for
the key steps of the QBH system, including melody feature extraction from
polyphonic audio and humming, the representation of these features for
matching, and the matching method between the melody information extracted from
polyphonic audio and humming.
[Abstract20]Many of the large digital music
collections available today are in polyphonic form. However, because of the
complexities of music information retrieval (Music IR), much of the research in
this area has focused on monophonic data. In this paper we investigate the
retrieval performance of monophonic queries made on a polyphonic music database
using the n-gram approach for full-music indexing. The pitch and rhythm
dimensions of music are used and the ‘musical words’ generated enable text
retrieval methods to be used with music retrieval. An experimental framework is
outlined for a comparative and fault-tolerance study of various n-gramming strategies
and encoding precision using six experimental databases. For monophonic queries
we focus in particular on query-by-humming (QBH) systems. Error models
addressed in several QBH studies are surveyed for the fault-tolerance study. The
experiments show that different n-gramming strategies and encoding precision
differ widely in their effectiveness. We present the results of our comparative
and fault-tolerance study on a collection of 6365 polyphonic music pieces
encoded in the
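One way to picture the "musical words" idea: reduce a pitch sequence to interval n-grams so that standard text indexing and retrieval machinery applies. The encoding below (exact semitone intervals, n = 3) is only one of the many strategies whose effectiveness the paper compares.

    # A minimal sketch of n-gram "musical words" built from pitch intervals.
    def musical_words(midi_pitches, n=3):
        intervals = [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]
        return ["_".join(str(i) for i in intervals[k:k + n])
                for k in range(len(intervals) - n + 1)]

    # Query-by-humming then reduces to matching the query's words against the index.
    melody = [60, 62, 64, 65, 67, 65, 64, 62, 60]     # a short scale figure
    query = [62, 64, 65, 67]                           # hummed fragment
    print(musical_words(melody))
    print(set(musical_words(query)) & set(musical_words(melody)))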
[Abstract21]We propose a model for errors in
sung queries, a variant of the Hidden Markov Model (HMM). This is related to
the problem of identifying the degree of similarity between a query and
a potential target in a database of musical works, in the music
retrieval framework. The model comprehensively expresses the types of error or
variation between target and query: cumulative and non-cumulative local errors,
transposition, tempo and tempo changes, insertions, deletions and modulation.
Results of experiments demonstrating the robustness of such a model are
presented.
[Abstract22]In this paper, a new system for the
automatic transcription of singing sequences into a sequence of pitch and
duration pairs is presented. Although such a system may have a wider range of
applications, it was mainly developed to become the acoustic module of a
query-by-humming (QBH) system for retrieving pieces of music from a digitized
musical library. The first part of the paper is devoted to the systematic
evaluation of a variety of state-of-the-art transcription systems. The main
result of this evaluation is that there is clearly a need for more accurate
systems. In particular, the segmentation was found to be too error prone
(≈ 20 % segmentation errors). In the second part of the paper, a new auditory
model based transcription system is proposed and evaluated. The results of that
evaluation are very promising. Segmentation errors vary between 0 and 7 %,
depending on the amount of lyrics used by the singer. In any case, an error
of less than 10 % is anticipated to be acceptable for QBH. The paper ends with
the description of an experimental study that was carried out to demonstrate that
the accuracy of the newly proposed transcription system is not very sensitive to
the choice of the free parameters, at least as long as they remain in the
vicinity of the values one could forecast on the basis of their meaning.
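The output stage of such a transcriber can be sketched very simply: collapse a frame-level pitch track into (pitch, duration) pairs. The auditory-model front end that actually produces the pitch track is the hard part and is not shown; the frame rate and MIDI encoding below are assumptions.

    # A minimal sketch of the output stage: collapse a frame-level pitch track
    # (MIDI numbers, 0 = unvoiced) into (pitch, duration) pairs.
    def to_note_sequence(pitch_track, frame_dur=0.01):
        notes, current, count = [], None, 0
        for p in pitch_track + [None]:          # sentinel flushes the last note
            if p == current:
                count += 1
            else:
                if current not in (None, 0):
                    notes.append((current, round(count * frame_dur, 3)))
                current, count = p, 1
        return notes

    track = [0, 0, 60, 60, 60, 60, 62, 62, 62, 0, 64, 64, 64, 64, 64]
    print(to_note_sequence(track))   # -> [(60, 0.04), (62, 0.03), (64, 0.05)]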
[Abstract23]A hidden Markov model approach to
piano music transcription is presented. The main difficulty in applying
traditional HMM techniques is the large number of chord hypotheses that must be
considered. We address this problem by using a trained likelihood model to
generate reasonable hypotheses for each frame and construct the search graph
out of these hypotheses. Results are presented using a recording of a movement
from Mozart's Sonata 18, K. 570.
[Abstract24]Voice separation, along with
tempo-detection and quantization, is one of the basic problems of
computer-based transcription of music. An adequate separation of notes into
different voices is crucial for obtaining readable and usable scores from
performances of polyphonic music recorded on keyboard (or other polyphonic)
instruments; for improving quantisation results within a transcription system;
and in the context of music retrieval systems that primarily support monophonic
queries. In this paper we propose a new voice separation algorithm based on a
stochastic local search method. Unlike many previous approaches, our
algorithm allows chords in the individual voices; its behaviour is controlled by
a small number of intuitive and musically motivated parameters; and it is fast
enough to allow interactive optimisation of the result by adjusting the
parameters in real-time. We demonstrate that compared to existing approaches,
our new algorithm generates better solutions for a number of typical voice
separation problems. We also show how by changing its parameters it is possible
to create score output for different needs (i.e. piano-style or orchestral
scores).
[Abstract25]Music Information Retrieval
methods can be classified into online and offline methods. The main drawback in
most of the offline algorithms is the space the indexing structure requires.
The amount of data stored in the structure can, however, be reduced by
storing only suitable index terms or phrases instead of the whole contents
of the database. Repetition is agreed to be one of the most important factors
of musical meaningfulness. Therefore repetitive phrases are suitable for
indexing purposes. The extraction of such phrases can be done by applying an
existing text mining method to musical data. Because of the differences between
text and musical data, this application requires some technical modification of
the method. This paper introduces a text mining-based music database indexing
method that extracts maximal frequent phrases from musical data and sorts them
by their length, frequency and personality. The implementation of the method
found three different types of phrases from the test corpus consisting of Irish
folk music tunes. The two suitable types of phrases are easily
recognized and separated from the set of all phrases to form index data for
the database.
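A much-simplified sketch of the indexing idea: count repeated interval phrases and keep the "maximal" ones, i.e. those not contained in an equally frequent longer phrase, sorted by length and frequency. This is one reading of the method, for illustration only; the actual text-mining algorithm and its ranking criteria are more refined.

    # Rough sketch of frequent-phrase extraction for indexing.
    from collections import Counter

    def frequent_phrases(intervals, min_len=2, max_len=6, min_count=2):
        counts = Counter(
            tuple(intervals[i:i + n])
            for n in range(min_len, max_len + 1)
            for i in range(len(intervals) - n + 1))
        frequent = {p: c for p, c in counts.items() if c >= min_count}
        # Keep a phrase only if no longer phrase with the same count contains it.
        maximal = {
            p: c for p, c in frequent.items()
            if not any(len(q) == len(p) + 1 and c == frequent[q]
                       and any(q[j:j + len(p)] == p for j in range(2))
                       for q in frequent)}
        return sorted(maximal.items(), key=lambda pc: (-len(pc[0]), -pc[1]))

    tune = [2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 2]       # toy interval sequence
    for phrase, count in frequent_phrases(tune):
        print(phrase, count)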
[Abstract26]In order to represent musical
content, pitch and timing information is utilized in the majority of existing
work in Symbolic Music Information Retrieval (MIR). Symbolic representations
such as
[Abstract27]The main contribution of this paper
is an investigation of the effects of exploiting melodic features for automatic
melody segmentation aimed at content-based music retrieval. We argue that
segmentation based on melodic features is more effective than random or
n-grams-based segmentation, which ignore any context. We have carried out an
experiment employing experienced subjects. The manual segmentation result has
been processed to detect the most probable boundaries in the melodic surface,
using a probabilistic decision function. The detected boundaries have then been
compared with the boundaries detected by an automatic procedure implementing an
algorithm for melody segmentation, as well as by a random segmenter and by an
n-gram-based segmenter. Results showed that automatic segmentation based on
melodic features is closer to manual segmentation than algorithms that do not
use such information.
[Abstract28]Hidden Markov Models (HMMs) have
been suggested as an effective technique to represent music. Given a collection
of musical pieces, each represented by its HMM, and a query, the retrieval
task reduces to finding the HMM most likely to have generated the query. The
musical piece represented by this HMM is frequently the one rendered by the
user, possibly imperfectly. This method might be inefficient if there is a very
large music database, since each HMM to be tested requires the evaluation of a
dynamic-programming algorithm. In this paper, we propose an indexing mechanism
that can aggressively prune the set of candidate HMMs to be evaluated in response
to a query. Our experiments on a music database showed an average
seven-fold speed-up with no false dismissals.
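The basic retrieval step being pruned can be sketched as follows: each piece is a small discrete HMM, and a query is scored against candidate models with the scaled forward algorithm. The indexing mechanism itself is not reproduced; the toy models and symbol alphabet are arbitrary.

    # Sketch of HMM-based retrieval: score a query against candidate models.
    import numpy as np

    def forward_log_likelihood(pi, A, B, obs):
        """pi: (S,) initial probs, A: (S,S) transitions, B: (S,V) emissions, obs: symbol ids."""
        alpha = pi * B[:, obs[0]]
        scale = alpha.sum()
        log_lik = np.log(scale)
        alpha = alpha / scale
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]
            scale = alpha.sum()
            log_lik += np.log(scale)
            alpha = alpha / scale
        return log_lik

    # Two toy 2-state models over a 3-symbol alphabet; the query is ranked by
    # which model most likely generated it.
    models = {
        "piece_a": (np.array([0.6, 0.4]),
                    np.array([[0.7, 0.3], [0.2, 0.8]]),
                    np.array([[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]])),
        "piece_b": (np.array([0.5, 0.5]),
                    np.array([[0.5, 0.5], [0.5, 0.5]]),
                    np.array([[0.1, 0.8, 0.1], [0.1, 0.8, 0.1]])),
    }
    query = [0, 0, 2, 2, 0, 2]
    ranked = sorted(models, key=lambda m: -forward_log_likelihood(*models[m], query))
    print(ranked[0])   # -> "piece_a"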
[Abstract29]The CUIDADO Project (Content-based
Unified Interfaces and Descriptors for Audio/music Databases available Online)
aims at developing a new chain of applications through the use of audio/music
content descriptors, in the spirit of the MPEG-7 standard. The project includes
the design of appropriate description structures, the development of extractors
for deriving high-level information from audio signals, and the design and
implementation of two applications: the Sound Palette and the Music Browser.
These applications include new features which systematically exploit high-level
descriptors and provide users with content-based access to large catalogues of
audio/music material. The Sound Palette is focused on audio samples and targets
professional users, whereas the Music Browser addresses a broader user target
through the management of music titles. After a presentation of the project
objectives and methodology, we describe the original features of the two
applications made possible by the use of descriptors and the technical
architecture framework on which they rely.
[Abstract30]"Query by humming" is an
interaction concept in which the identity of a song has to be revealed quickly
and reliably from a given sung input, using a large database of known melodies. In
short, it tries to detect the pitches in a sung melody and compares these
pitches with symbolic representations of the known melodies. Melodies that are
similar to the sung pitches are retrieved. Approximate pattern matching in the
melody comparison process compensates for the errors in the sung melody by
using classical dynamic programming. A filtering method is used to save
computation in the dynamic programming framework. This paper presents the
algorithms for pitch detection, note onset detection, quantization, melody
encoding and approximate pattern matching as they have been implemented in the
CubyHum software system. Since human reproduction of melodies is imperfect,
findings from an experimental singing study were a crucial input to the
development of the algorithms. Future research should pay special attention to
the reliable detection of note onsets in any preferred singing style. In addition,
research on index methods and fast bit-parallelism algorithms for approximate
pattern matching needs to be further pursued to decrease computational
requirements when dealing with large melody databases.
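For the approximate-matching step, a classical dynamic-programming edit distance between the intervals of the sung query and a stored melody already captures the idea; the system's actual cost scheme, filtering method, and bit-parallel speed-ups are not reproduced here.

    # Minimal approximate-matching sketch: edit-distance DP with uniform costs.
    def edit_distance(query, melody):
        n, m = len(query), len(melody)
        d = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            d[i][0] = i
        for j in range(1, m + 1):
            d[0][j] = j
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                sub = 0 if query[i - 1] == melody[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,        # deletion
                              d[i][j - 1] + 1,        # insertion
                              d[i - 1][j - 1] + sub)  # (mis)match
        return d[n][m]

    melody = [2, 2, 1, 2, 2, 2, 1]           # stored melody, as semitone intervals
    sung = [2, 3, 1, 2, 2, 1]                # imperfectly sung query
    print(edit_distance(sung, melody))       # small distance despite the errors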
[Abstract31]In this paper, we propose four
peer-to-peer models for content-based music information retrieval (CBMIR) and
carefully evaluate them, qualitatively and quantitatively, with respect to load,
time, refreshment and robustness. We also present an algorithm to accelerate the
retrieval speed of CBP2PMIR and a simple but effective method to filter
duplicates from the final results. Furthermore, we present the architecture of
QUIND, a content-based peer-to-peer music information retrieval system that
implements CBMIR. QUIND combines content-based music information retrieval
technologies with a peer-to-peer environment, and has good robustness and
extensibility. The music stored and shared on each PC makes up the whole
available music resource. When a user submits a music query, e.g. a song or a
melody, QUIND can quickly and accurately retrieve similar music according to
the content of the query. After selecting favourites, the user can download and
enjoy them.