Assessing Musical Similarity for Computational Musical Creativity

Computationally creative systems require semantic information when reflecting or self-reasoning about their output. In this paper we outline the design of a computationally creative musical performance system aimed at producing virtuosic interpretations of musical pieces and provide an overview of its implementation. The case-based reasoning part of the system relies on a measure of musical similarity based on the FANTASTIC and SynPy toolkits, which provide melodic and syncopated rhythmic features, respectively. We conducted a listening test based on pair-wise comparison to assess to what extent the machine-based similarity models match human perception. We found the machine-based models to differ significantly from human responses, due to variation in participants' responses. The best performing model relied on features from the FANTASTIC toolkit, obtaining a rank match rate with human responses of 63%, while features from the SynPy toolkit obtained a rank match rate of only 46%. While more work is needed on a stronger model of similarity, we do not believe these results prevent FANTASTIC features from being used as a measure of similarity within creative systems.


INTRODUCTION
Automation of musically creative tasks, which the field of Musical Metacreation (MuMe) [1] seeks to investigate, generally requires semantic information related to the specific task being automated. Such information is rational, meaningful information related to both the task and its context. The work presented here concerns the creative task of musical performance by computer systems, and the creativity and creative behaviors that these systems may display.
From a Computational Creativity perspective, a system displaying creative behavior must be capable of reflection. This is the ability of an agent (or, in the context of this paper, a computational system) to evaluate or reason about its creative output and, in light of this evaluation, adapt or alter its behavior. This capability to reflect is crucial to creative systems [2,3], and semantic information can be used to aid the evaluation and reasoning that guide the system's reflection process. However, previous work on musical performance systems, such as Computer Systems for Expressive Musical Performance (CSEMPs), does not commonly employ a full reflection loop or self-reasoning. We therefore proposed in [4] to use the Creative Systems Framework (CSF) by Wiggins [5,6] as a design tool to frame and describe both new and existing CSEMPs as creative systems (should their authors wish to turn them into creative systems).
We relied on the CSF to design a new computationally creative music performance system that uses case-based reasoning to produce virtuosic musical performances with a physical model of a bass guitar (selected because of author expertise and experience with the instrument, and to allow nuanced control of the performance rendering) [4]. Here, we refine the system implementation and focus on the measure of musical similarity used within our case-based reasoning system. This measure of musical similarity may also be of interest in semantic audio applications, particularly in online music education. For example, it could be used to train recommendation systems that suggest new pieces of music for someone to learn (see, e.g., [7]) that are musically similar to what they already know but require more advanced playing ability, or, conversely, pieces that are musically dissimilar but require the same or similar playing ability as those an individual can already play. The musical similarity information could also be used within the vast online transcription resources available to guitar and bass guitar players, to aid navigation and, again, recommendation.

Case-Based Reasoning
All performing musicians use their own previous experience, knowledge, and ability to develop a musical performance. This is analogous to case-based reasoning (CBR), an approach that uses solutions to previous problems to solve new ones. These solutions (which initially need to be collected or created) are made available to the CBR system by being stored in a case database.
There are four steps in case-based reasoning: retrieve, reuse, revise, and retain [8]. Starting with a new problem, a solution to a similar problem is first retrieved. This solution is then reused as a solution to the new problem. Because the two problems may not be an exact match, the solution needs to be checked, which is done in the revise step; if it does not solve the problem in a satisfactory way, it is modified so that it does. Once the solution to the new problem is finalized, it is retained as a new case within the case database so that it may be used to solve future problems. The basic outline of a CBR system is shown in Fig. 1.
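The four-step cycle can be sketched in Python as follows. This is a minimal illustration, not our implementation: it assumes cases are (problem, solution) pairs, that problems are numeric feature vectors compared by Euclidean distance, and that `is_satisfactory` and `adapt` are hypothetical, problem-specific callables supplied by the caller. None of these names come from a particular CBR library.

```python
import math
from typing import Callable, List, Sequence, Tuple

Case = Tuple[Sequence[float], object]  # (problem features, solution)

def retrieve(case_db: List[Case], problem: Sequence[float]) -> Case:
    """Retrieve: return the stored case whose problem is closest (Euclidean)."""
    return min(case_db, key=lambda case: math.dist(case[0], problem))

def solve(case_db: List[Case], problem: Sequence[float],
          is_satisfactory: Callable[[Sequence[float], object], bool],
          adapt: Callable[[Sequence[float], object], object]) -> object:
    """Run one retrieve-reuse-revise-retain cycle for a new problem."""
    _, solution = retrieve(case_db, problem)      # retrieve
    candidate = solution                          # reuse
    if not is_satisfactory(problem, candidate):   # revise
        candidate = adapt(problem, candidate)
    case_db.append((problem, candidate))          # retain
    return candidate
```

Reflection, as described in the next paragraph, would extend the revise step with a richer evaluation than a single pass/fail check.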
If the problems that the system solves have many different solutions, reflection can be added to the revise step of a CBR system. This requires an additional method of evaluating the derived solution, beyond checking that the problem was solved, which would draw on additional heuristic information related to the problem. It also requires mechanisms that can further modify or change the derived solution in ways that improve its evaluation.

Case-Based Reasoning in Musical Performance Systems
As defined in [9], musical interpretation is "the act of performance with the implication that in this act the performer's judgment and personality necessarily have their share." When CBR is applied to generating musical performances, a performance of a (normally new, previously unseen) musical piece is produced by considering previous performances of similar musical pieces. CBR has been used to great effect in CSEMPs such as SaxEx [10,11] and DISTALL [12][13][14].
SaxEx produces expressive saxophone performances of jazz standards. Performances are produced by using Spectral Modeling Synthesis to transform inexpressive saxophone recordings into expressive ones. The similarity measure used is based upon Narmour's implication-realization model and Lerdahl and Jackendoff's generative theory of tonal music (GTTM) [10,11]. DISTALL learns and applies expressive rules for piano performance, using first-order logic and linked clauses to represent the musical piece [12][13][14]. Similarity is measured by the distances between the solutions to the maximal flow minimal weight problem [12][13][14]. Tempo and dynamics curves are transferred from similar musical pieces to produce expressive piano performances. CBR has also been used in MuzaCazUza [15] to generate melodies by matching suitable melodic excerpts determined by a formula based upon Schoenberg's chart of regions.

VIRTUOSITY AND MUSICAL PERFORMANCE
The related work highlighted in Sec. 2.2 focuses on expression. Our interest extends beyond expressive musical performances to performances that display virtuosity.

Defining Virtuosity in Musical Performance
Virtuosity is an ill-defined term that relates to musical performance and has both positive and negative connotations. Virtuosity demands both the highest levels of musicianship and the highest levels of technical proficiency. Performances recognized as demonstrating virtuosity exceed the normal expectations and standards for the performance. For virtuosity to be considered, the performance requires a performer who is capable of reflection and self-reasoning, the conveying of meaningful expression or symbolic significance, and an appreciative audience [16]. It is ultimately the audience members and listeners who decide whether a performance is virtuosic, and they make this judgment based upon their understanding of the domain of the music and performance and of the performer, as well as their own individual expertise, knowledge, and sensibilities. Howard [16, p. 47] summarizes this judgment as "a judgment of merit over results achieved by the combination of exceptional musicianship and technical proficiency," which acts like a seal of approval given by the critical community. Judgments can, and likely do, differ between receivers, and anyone, including naive audience members, can potentially make a judgment on the performance. However, experts, academics, and music critics are likely to possess a better understanding of each of the factors, and thus their judgments can be given more significance.

Musical Performance
We formalize the production of a musical performance as the result of a process in which a set of musical instrument techniques, $\{t_1, t_2, \ldots\}$, is applied to a sequence of musical notes, $n_1, n_2, \ldots$, by the player of the instrument. We represent the actions of the Player of the musical instrument as a function that applies a set of techniques to a sequence of notes. This is summarized by Eq. (1):

$$\mathit{Performance} = \mathit{Player}\big(\{t_1, t_2, \ldots\},\ (n_1, n_2, \ldots)\big) \qquad (1)$$
The selection of techniques, and how they are applied to each note, is left implicit within the Player function, as it can be achieved through many different methods: for example, using rules such as the KTH rules [17], more sophisticated machine learning methods, e.g., [18], or case-based reasoning, as in the work outlined in Sec. 2.2.

Creative Systems Framework symbols (Table 1):
L: A language, which contains an alphabet that is used to express concepts ($c_x$), and the rule sets R, T, and E, where R ∈ L, T ∈ L, E ∈ L. L is required to be sufficiently expressive to allow for meta-level modification of R, T, and E.
[[.]]: A function generator that maps a subset of L to a function that associates concepts in U with real numbers in [0, 1].
⟨⟨., ., .⟩⟩: A further function generator that maps three subsets of L to a function that generates a new sequence of concepts from an existing one.
Please refer to Goddard et al. [4] for further explanation of the CSF symbols and the components they refer to.

THE CREATIVE SYSTEMS FRAMEWORK
We take Computational Creativity to be: "The philosophy, science and engineering of computational systems which, by taking on particular responsibilities, exhibit behaviours that unbiased observers would deem to be creative" [19, p. 21].
Creative behavior can be differentiated into exploratory creativity and transformational creativity. The Creative Systems Framework (CSF) [5,6] is a formalized abstract representation of exploratory creative systems, based on the philosophy of Boden [20]. Its purpose is to help analyze, describe, and design creative systems.
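The CSF is based upon the following septuple, whose symbols are as shown in Table 1:

$$\langle U, L, [\![\cdot]\!], \langle\!\langle \cdot, \cdot, \cdot \rangle\!\rangle, R, T, E \rangle \qquad (2)$$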
Eq. (3) is a formalization of Boden's notion of conceptual space by Wiggins and Forth [21] that allows the conceptual space for a given R to be constructed. This conceptual space contains all artifacts that are defined by the ruleset R:

$$C = \{\, c \in U \mid [\![R]\!](c) \geq 0.5 \,\} \qquad (3)$$

A concept c is thus part of the conceptual space if the function obtained by interpreting R with the function generator [[.]], applied to c, yields a value at or above a real-valued threshold; in Eq. (3) this threshold is 0.5.
Exploratory creativity is achieved through the exploration of a conceptual space. Within the CSF, the ruleset T determines how the space is traversed. Traversal of the conceptual space is summarized by Eq. (4), where T is interpreted by the function generator ⟨⟨., ., .⟩⟩, and the resulting function acts upon a sequence of known concepts/artifacts, $c_{in}$, to produce a sequence of new concepts, $c_{out}$:

$$c_{out} = \langle\!\langle R, T, E \rangle\!\rangle (c_{in}) \qquad (4)$$

R and E are included in the interpretation to allow for reasoning over the type and value of the artifacts being traversed by T. However, they are not a requirement of ⟨⟨., ., .⟩⟩: removing R from the interpretation makes it possible to generate artifacts not bound by the rules of R, and removing E allows for the generation of artifacts that is not guided by any evaluation.
The value of an artifact/concept is defined by ruleset E and determined from a set of functions generated by interpreting E with [[.]]. We take value to be "a relation between an artifact, its creator and its observers and the context in which creation and observation takes place" [22, p. 2]. For further explanation of the CSF and its components, please see [4].

THE SYSTEM
Previously in [4] we produced a formalized system design in terms of the CSF for a performance system that uses case-based reasoning to produce virtuosic interpretations of a piece of music.We also briefly discussed one potential implementation.We provide a summary of the formal system design here, before expanding upon a refined system implementation that we are currently building.

Interpreting a Musical Piece
Referring back to Eq. (1), where we formalized a performance as the application of a set of performance techniques to a sequence of notes (the musical piece), and in line with the definition in [9], we consider the interpretation of a piece to be the result of a decision-making process about which instrument or musical performance techniques should be applied to each note in the piece. We consider the application of expressive, contextual, situational, and historic intentions and conventions to fall under the category of performance techniques.
We call the process of assigning a performance technique to a musical note adorning the note. Notes may be adorned with multiple different techniques, provided they do not contradict each other in the way they should be performed. All adornments (performance techniques) have an explicit, singular, fixed interpretation. We draw a distinction between the interpretation of a sequence of musical notes and the interpretation of the adornments, with the latter not being considered within our system design. Instead, the design decision was taken that all adornments used within our system be capable of describing expressive performance intentions and have precise definitions of how any expressive intention is to be performed. This ensures that our system design does not exclude expressive or other interpretations that are traditionally made through a performer's realization of intention and personal interpretation of how adornments are performed.
This design choice allows for a decoupling between the interpretation of a musical piece and its technical execution.Having this distinction allows for any rendering or synthesis method to be used to produce a performance of the interpretation, assuming a suitable mapping or encoding is available from adornments to rendering/synthesis parameters.

Formal Description of the System
We produced a formal definition of our system using the CSF, outlined in Backus-Naur Form [4]. The specification of the language L, the functions that are generated by [[.]] and ⟨⟨., ., .⟩⟩, and an outline of the syntax for R, T, and E are all presented in [4].
The U in which our system operates is the universe of all musical pieces. We restricted the operation of our system to a subset of U that contains musical note sequences of length greater than zero that are not adorned with contradictory playing techniques. These restrictions are formally expressed by R, and this subset of musical pieces forms the conceptual space C in which the system operates. T formalizes the functions needed for CBR, and E the evaluation rules for determining virtuosity.
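The restrictions expressed by R can be read as a membership function in the sense of Eq. (3). The Python sketch below assumes a toy representation in which a piece is a list of notes and each note carries a set of adornment names; the technique names and the table of contradictory pairs are hypothetical examples, not taken from the actual language L of our system.

```python
from typing import FrozenSet, List, Set

Note = Set[str]          # the adornments (techniques) applied to one note
Piece = List[Note]

# Hypothetical pairs of techniques that cannot be performed on the same note.
CONTRADICTIONS: Set[FrozenSet[str]] = {
    frozenset({"staccato", "let_ring"}),
    frozenset({"slap", "tap"}),
}

def consistent(note: Note) -> bool:
    """True if no two adornments on the note form a contradictory pair."""
    return not any(pair <= note for pair in CONTRADICTIONS)

def r_membership(piece: Piece) -> float:
    """Toy [[R]]: 1.0 for a non-empty piece whose notes are all consistent."""
    return 1.0 if piece and all(consistent(n) for n in piece) else 0.0

def conceptual_space(universe: List[Piece], threshold: float = 0.5) -> List[Piece]:
    """Eq. (3)-style filter: keep concepts whose membership reaches the threshold."""
    return [c for c in universe if r_membership(c) >= threshold]
```

Because this toy [[R]] only returns 0 or 1, the exact threshold matters little here; a graded membership function would make the 0.5 threshold meaningful.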

Implementation
Figs. 2(a) and 2(b) show a code block diagram and a data and process flow diagram of the current implementation of the system, as outlined in Sec. 5.2. This implementation takes as input Guitar Pro files, the format of Guitar Pro, a specialized digital notation program for electric guitar and bass (see: http://www.guitar-pro.com). The musical information contained in the file is converted into a concept representation. The musical notes within the concept are then adorned to produce a virtuosic interpretation before being converted back into a readable Guitar Pro file. Guitar Pro can then be used to render a performance of the interpretation. PyGuitarPro (http://pyguitarpro.readthedocs.io) is used to read, edit, and convert Guitar Pro files into the concept representation, which is defined by R. This concept can then be processed by the CBR module, which implements T and ⟨⟨., ., .⟩⟩. First, the concept with the most similar melodic and rhythmic features is retrieved from the case database. The adornments from this concept are then compared and applied to a sequence of notes within the input concept, forming a new interpretation of the input piece. Once the adornments have been applied, they are checked to ensure that the concept is performable according to R. The ruleset E is to be implemented within the evaluation module and used to allow the system to reflect and reason upon the interpretations being produced during the revise stage of the CBR. This evaluation is to use a playing complexity rate calculation (yet to be determined) and a perceptual model of bass guitar performance to evaluate the interpretation. At this stage, the interpretation can be converted back into a Guitar Pro file ready for performance, and its concept stored in the case database for future use.
We have chosen to break down musical pieces into single-bar segments and apply the CBR process to each bar individually. Once all bars of a musical piece have been processed, they are re-combined to form a complete interpretation of the musical piece. A continuity check between bars will be performed to ensure that R is satisfied, and an evaluation of the whole piece will be carried out in addition to the individual evaluation of each bar contained within the piece.
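The bar-wise process can be outlined as below. This is a simplified sketch under several assumptions: each bar is represented by a precomputed feature vector plus its notes, the case database holds (feature vector, per-note adornments) pairs, and adornments are transferred note-by-note by position, which is only one of many possible transfer strategies. `Bar`, `interpret_bar`, and `interpret_piece` are hypothetical names; the revise step, the evaluation, and the continuity check between bars are deliberately omitted.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Set, Tuple

@dataclass
class Bar:
    features: Sequence[float]                  # e.g. standardized melodic + rhythmic features
    notes: List[str]                           # note names, for illustration only
    adornments: List[Set[str]] = field(default_factory=list)

Case = Tuple[Sequence[float], List[Set[str]]]  # (features, per-note adornments)

def interpret_bar(bar: Bar, case_db: List[Case]) -> Bar:
    """Retrieve the most similar stored bar and reuse its adornments by position."""
    _, case_adornments = min(case_db, key=lambda c: math.dist(c[0], bar.features))
    adorned = [set(case_adornments[i]) if i < len(case_adornments) else set()
               for i in range(len(bar.notes))]
    return Bar(bar.features, bar.notes, adorned)

def interpret_piece(bars: List[Bar], case_db: List[Case]) -> List[Bar]:
    """Process each bar independently, recombine in order, then retain the results."""
    result = [interpret_bar(bar, case_db) for bar in bars]
    case_db.extend((b.features, b.adornments) for b in result)
    return result
```

Retaining only after the whole piece is processed keeps bars of the same piece from being retrieved for one another; retaining bar by bar is an equally plausible choice.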

MUSICAL SIMILARITY
Much, if not all, of the evaluation and reasoning performed within and in relation to our system depends on perceptual information. Fundamental to the production of an interpretation of a musical piece is the ability to identify similar pieces of music. We have chosen to use computable melodic and rhythmic features as a basis for assessing musical similarity between concepts. For this we use melodic features from the FANTASTIC toolkit [23] and syncopation features from the SynPy toolkit [24].

FANTASTIC
FANTASTIC stands for Feature ANalysis Technology Assessing STatistics (In a Corpus) [23]. It is a program written in R (see http://www.r-project.org) that analyzes symbolic representations of monophonic melodies by computing features that characterize a melody or melodic phrase with a set of numerical or categorical values. These values represent different aspects of musical structure, making use of concepts from descriptive statistics, music theory, and music cognition [23]. FANTASTIC can also compute corpus-level features with respect to a corpus of melodies; however, we do not currently use this functionality of the toolkit.
FANTASTIC provides its own similarity function, which can compute the similarity between two or more melodies based upon one or more computed features. When only numeric features are used, the Euclidean distance, shown in Eq. (5), where d is the distance between two n-dimensional points p and q, is used to compute the similarity between melodies:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \qquad (5)$$

Euclidean distance has been proposed as a suitable similarity measure by Gärdenfors [25].
FANTASTIC's similarity function also applies a so-called z-standardization [23] to the feature values before the Euclidean distance is calculated: the feature mean (μ) is subtracted and the result is divided by the feature standard deviation (σ), i.e., $z = (x - \mu)/\sigma$. This standardization ensures that all features are weighted equally when calculating the similarity.
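Eq. (5) combined with the z-standardization described above can be sketched in plain Python as follows. This illustrates the computation only; it is not FANTASTIC's R code, and FANTASTIC's own similarity function may additionally transform the distance into a similarity score. We assume here that μ and σ are estimated over the set of melodies being compared, with σ taken as the population standard deviation; the function names are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

def z_standardize(rows: List[Sequence[float]]) -> List[List[float]]:
    """Subtract each feature's mean and divide by its standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.pstdev(col) for col in columns]
    # A constant feature (sd == 0) cannot discriminate; map it to 0.
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
            for row in rows]

def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    """Eq. (5): d(p, q) = sqrt(sum_i (p_i - q_i)^2)."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def pairwise_distances(features: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Distances between every pair of named melodies after z-standardization."""
    names = sorted(features)
    z = dict(zip(names, z_standardize([features[n] for n in names])))
    return {(a, b): euclidean(z[a], z[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
```

Without standardization, a feature whose raw values are in the hundreds would dominate one whose values are near 1; after it, both contribute on the same scale.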
To be able to use Euclidean distance as a measure of similarity, only the numeric features from FANTASTIC could be used. This excluded the use of the two contour and melodic mode categorical features. In testing with our dataset, there were also problems computing features that depend on polynomial contour calculations, so these features were also excluded. The remaining features relating to absolute pitch or notes, pitch intervals, note durations, global lengths, melodic step, and interpolation contours were all selected. This gave a total of 26 features from the FANTASTIC toolkit.
FANTASTIC accepts Music-CSV (MCSV) files [26] as input. These MCSV files contain the symbolic representations of monophonic melodies. We use the MELCONV software by Klaus Frieler to convert monophonic MIDI files to the MCSV format, hence the MELCONV code block in Fig. 2.

SynPy
FANTASTIC does not handle rests within music: all note durations are represented as inter-onset intervals (IOIs), and the features computed relate only to the IOI duration values. As we are primarily working with bass lines, the melodic content can be quite static, and rhythmic variation is instead required to differentiate between pieces. We have chosen to use features computed with SynPy, a Python toolkit for syncopation modeling [24], as a way to include additional rhythmic features within our measure of similarity.
SynPy implements seven different models of syncopation. Each model computes a numerical measure of how syncopated the analyzed rhythm is. We combine the mean syncopation measure from each model with the melodic features of FANTASTIC and use both to compute the musical similarity between concepts in our case database.
SynPy can compute syncopation measures from MIDI or from its own rhythm (.RHY) file format. In testing, analysis of .RHY files was more stable, so we have implemented a converter from our concept representation to the .RHY format.

MUSICAL SIMILARITY STUDY
Musical similarity is a very subjective judgment.To ensure that musically suitable concepts are being retrieved, we conducted a study to compare the values of musical similarity computed using FANTASTIC and SynPy features, with judgments of musical similarity by musicians and bass guitar players.

Study Design
Our study design is based on the method outlined by Allan et al. [27], in which participants are presented with a triadic comparison of audio tracks and asked to specify which two audio tracks are most musically similar. Allan et al. [27] advocate presenting full permutations of every triadic comparison to account for presentation-order bias. They propose a Balanced Complete Block Partitioning (BCBP) setup, which allows full permutations to be presented while controlling the combinatorial explosion that occurs when testing more than a few audio tracks. This approach partitions every possible permutation of the stimuli into smaller groups and presents each of these groups to a different subject.
We chose to follow the Balanced Complete Block Partitioning (BCBP) setup for five tracks partitioned between six participants [27]. This setup splits all permutations into six groups (blocks) of 10. While no permutation is repeated across the blocks, triads are repeated both between and within participants. This allows within-subject and between-subject response consistency to be assessed.
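The size of the stimulus space behind this setup can be checked directly: five tracks give C(5,3) = 10 unordered triads, and each triad has 3! = 6 presentation orders, i.e., 60 ordered triads, which fill six blocks of 10. The Python sketch below enumerates these permutations and shows one simple partition into six blocks. That partition is our illustration only and is not necessarily the one used by Allan et al. [27]: in it, each block contains every triad exactly once, whereas their blocks repeat some triads within a participant.

```python
from itertools import combinations, permutations

TRACKS = "ABCDE"

triads = list(combinations(TRACKS, 3))              # 10 unordered triads
orders = [list(permutations(t)) for t in triads]    # 6 presentation orders each
all_perms = [p for group in orders for p in group]  # 60 ordered triads

# One simple partition: block k receives the k-th ordering of every triad,
# so each block contains all 10 triads once, each in a different order.
blocks = [[group[k] for group in orders] for k in range(6)]
```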
We randomly selected five bars of monophonic music from an initial bass guitar transcription dataset containing 30 bars of notated music spanning pop, rock, funk, and jazz genres. These were labeled A, B, C, D, and E, and a separate file was made for each bar's notated transcription. The tempi of the selected bars were 125, 104, 92, 121, and 95 beats per minute for tracks A, B, C, D, and E, respectively. To allow a fair comparison with the computed similarity values, which only account for musical content, all playing techniques, dynamics, and expressive notations were removed from the notated transcription files. The audio for each bar was then rendered from its notated transcription file using Guitar Pro 6. This resulted in five separate audio files with durations between two and three seconds (mean 2.2 s, variance 0.2). No other segmentation or processing preceded the final formulation of the audio content.

Demographics
There were 12 participants (11 male, 1 female). Eleven identified themselves as musicians; five participants played bass guitar, and three were music teachers. Ages ranged from 22 to 54 years, with an average of 34. Musical instrument playing experience, excluding the non-musician, ranged from 10 to 50 years, with an average of 24 years.

Results
To allow for 12 participants, each of the six permutation blocks was presented twice, to different participants. An analysis of the complete set of participants' responses was conducted, followed by a partitioned analysis in which participants were clustered based upon the similarity of their responses.

Complete Set of Participants' Analysis
The complete set of participants' responses was compared with three computed measures of similarity obtained using FANTASTIC's similarity function. The first measure aggregated all the FANTASTIC and SynPy features outlined in Sec. 6 (33 features in total); the second used only the FANTASTIC features (26 features); and the third used only the SynPy features (7 features).
Table 2 shows the voted similarity of tracks for each combination, along with the computed measures of similarity using features from FANTASTIC + SynPy, FANTASTIC only, and SynPy only. Table 3 shows the number of matches between the computed similarity and the similarity reported by participants, as well as the rank matches and rank match rate. The highest rank match rate, 63%, was achieved using only FANTASTIC features; the lowest, 46%, using only SynPy features; both combined yielded 60%.
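One way to compare a computed similarity measure with triadic responses is to predict, for each triad, the pair with the smallest computed distance and check whether it equals the pair that participants chose most often. The sketch below assumes exactly this majority-vote comparison; it illustrates the idea and is not necessarily how the matches and rank match rates in Table 3 were defined. The function names and all input values are hypothetical toy data.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Pair = FrozenSet[str]
Triad = Tuple[str, str, str]

def predicted_pair(triad: Triad, dist: Dict[Pair, float]) -> Pair:
    """The pair within the triad that has the smallest computed distance."""
    return min((frozenset(p) for p in combinations(triad, 2)), key=lambda p: dist[p])

def match_rate(votes: Dict[Triad, List[Pair]], dist: Dict[Pair, float]) -> float:
    """Fraction of triads where the computed pair equals the most-voted pair."""
    hits = 0
    for triad, responses in votes.items():
        modal, _ = Counter(responses).most_common(1)[0]
        hits += modal == predicted_pair(triad, dist)
    return hits / len(votes)
```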
A Kruskal-Wallis rank sum test was performed between all participants' answers for every presented permutation and each of the three computed similarity values. We wished to see whether the participants' responses and the computed similarity values could have been drawn from the same population (the null hypothesis). A p-value greater than 0.05 would mean that the null hypothesis cannot be rejected, whereas a p-value smaller than 0.05 would indicate that the similarity judgments come from significantly different populations. The results are shown in Table 4: all p-values are smaller than 0.05, indicating a significant difference in similarity rating between participants and all computed similarity values.
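For two groups, as in each comparison here (participant responses versus one computed measure), the Kruskal-Wallis H statistic can be computed from mean ranks (sharing ranks among ties) and compared against a χ² distribution with one degree of freedom. The pure-Python sketch below is for illustration; in practice a library routine such as SciPy's `scipy.stats.kruskal` or R's `kruskal.test` would normally be used. How responses and similarity values were encoded numerically for the test is not specified above, so the inputs in the usage are toy numbers, and `kruskal_two_groups` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence, Tuple

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_two_groups(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H and its chi-square (df = 1) p-value."""
    data = list(x) + list(y)
    n = len(data)
    ranks = average_ranks(data)
    r_x, r_y = sum(ranks[:len(x)]), sum(ranks[len(x):])
    h = 12.0 / (n * (n + 1)) * (r_x ** 2 / len(x) + r_y ** 2 / len(y)) - 3 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n) over tie groups.
    counts: Dict[float, int] = {}
    for v in data:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:  # all values identical: no evidence of any difference
        return 0.0, 1.0
    h = max(h / correction, 0.0)
    # Chi-square survival function with 1 df: P(X > h) = erfc(sqrt(h / 2)).
    return h, math.erfc(math.sqrt(h / 2))
```

For example, `kruskal_two_groups([1, 2, 3], [4, 5, 6])` gives H = 27/7 ≈ 3.86 and p ≈ 0.05.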
The consistency of each participant's individual ratings and the consistency between each pair of participants were calculated using Fleiss' κ (kappa). Individual consistency was calculated by treating each rating instance from a participant as a separate rater for each repeated combination of tracks; these κ values are shown in Table 5. The κ values calculated between pairs of participants presented with the same block partition are shown in Table 6. Fleiss' κ was also calculated for all responses to each of the 10 possible combinations of tracks; the results are shown in Table 7. κ values below 0 indicate poor agreement, values from 0.21 to 0.40 fair agreement, and values from 0.61 to 0.80 strong agreement. p-values less than 0.05 indicate that the agreement between participants' responses is unlikely to be due to chance.
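Fleiss' κ can be computed from a table of counts with one row per rated item and one column per response category, where every row sums to the number of raters n. For this study, a natural encoding (our assumption) would treat each presented triad as an item and its three possible pairs as the categories. The sketch below returns NaN when the expected agreement P_e equals 1, that is, when every rating falls into a single category; this is one way the NaN values reported for fully consistent individuals can arise. `fleiss_kappa` is a hypothetical helper name.

```python
import math
from typing import List, Sequence

def fleiss_kappa(counts: List[Sequence[int]]) -> float:
    """Fleiss' kappa for an items x categories table of rating counts."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    # Observed agreement per item, then averaged over items (P-bar).
    p_items = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
               for row in counts]
    p_bar = sum(p_items) / n_items
    # Expected agreement (P_e) from the overall category proportions.
    totals = [sum(col) for col in zip(*counts)]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    if p_e == 1.0:
        return math.nan  # all ratings in one category: kappa is undefined
    return (p_bar - p_e) / (1 - p_e)
```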

Partitioned Sets of Participant Analysis
Participants were clustered based upon a dissimilarity matrix formed from the Fleiss' κ between each pair of participants' responses, not just pairs that were presented with the same block. It should be noted that the p-value for every κ between participant pairs was less than 0.05, except where only one response could be compared. This indicates that the agreement in responses between each pair of participants for whom two or more responses could be compared is unlikely to be due to chance. However, as some pairs of participants could only be compared on the basis of one response, we chose k-medoids clustering, as it is more robust to outliers.
The R function pamk, which performs Partitioning Around Medoids (PAM) [28] with estimation of the number of clusters, was used to partition participants. The function identified the optimal number of clusters as four. Participants who identified as bass players were in clusters two (2), three (2), and four (1). The one participant who identified as a non-musician was placed in cluster one. One participant in each of clusters one, two, and three identified as a music teacher. The mean musical experience of participants in each cluster was 26.67, 21.5, 21.33, and 15 years for clusters one, two, three, and four, respectively. The number of matches and rank match rates of the three computed measures of similarity were calculated for each cluster; the results are shown in Table 8.
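The clustering step can be approximated in Python as below. Several assumptions apply: the dissimilarity between two participants is taken as 1 − κ (the exact transformation is not stated above), PAM's build-and-swap heuristic is replaced by an exhaustive search over medoid sets, which is feasible for 12 participants, and the number of clusters is chosen by maximizing the average silhouette width. This is a sketch of the idea, not a reimplementation of pamk; all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]

def dissimilarity_from_kappa(kappa: Sequence[Sequence[float]]) -> Matrix:
    """Assumed transformation: d = 1 - kappa off the diagonal, 0 on it."""
    n = len(kappa)
    return [[0.0 if i == j else 1.0 - kappa[i][j] for j in range(n)] for i in range(n)]

def k_medoids(dist: Matrix, k: int) -> List[int]:
    """Exhaustive k-medoids: label each item with the index of its nearest medoid."""
    best_labels, best_cost = [], float("inf")
    for medoids in combinations(range(len(dist)), k):
        labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(len(dist))]
        cost = sum(dist[i][labels[i]] for i in range(len(dist)))
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels

def mean_silhouette(dist: Matrix, labels: List[int]) -> float:
    """Average silhouette width; members of singleton clusters contribute 0."""
    clusters: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(sum(dist[i][j] for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)

def choose_clusters(dist: Matrix, k_values: Sequence[int]) -> Tuple[int, List[int]]:
    """Pick the k (each >= 2) whose clustering has the largest mean silhouette."""
    _, k = max((mean_silhouette(dist, k_medoids(dist, k)), k) for k in k_values)
    return k, k_medoids(dist, k)
```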

Discussion
The significant differences in the Kruskal-Wallis rank sum test results indicate that the computed values of similarity come from a different population to that of the participants' responses. Participants' responses vary within each combination of tracks, whereas the computed values of similarity do not vary at all. This may explain why none of the computed measures of similarity achieved a high match rate with the full set of participant responses.
Given the subjective nature of musical similarity, this variation in responses could be due to differences between participants' own subjective interpretations of musical similarity. The formation of four clusters from the partitioning process indicates that different groups of participants may share similar interpretations of musical similarity. Clusters with a higher level of musical experience (cluster one) and a higher number of bass players (cluster two) saw improved match rates, while the cluster with the least musically experienced participants (cluster four) saw worse match rates, compared to the full set of participants' responses. This would appear to indicate that the computed measures of similarity are a closer match to more expert listeners (those with more musical experience and specialization on the instrument). However, the cluster sizes are small, and some κ measures between participants could only be calculated from one response each; thus, random chance may still have affected this partitioning. A comparison of our results with a larger study involving more expert listeners would be interesting and could help confirm what is indicated here.
Across our rank match rate analyses, SynPy features appear overall to be a less effective indicator of musical similarity than FANTASTIC features. This difference in feature performance also suggests that participants relied on some features more heavily than was accounted for in the computed measures. This is in line with the work of Allan et al. [27], which states that when aggregated measures of similarity are used, the weighting of the contributing features needs to be adjusted to better match people's own understanding of musical similarity. The relative weighting of features used within our computed similarity measures could be adjusted to better align with the clustered participants' responses using Eq. (6):

$$S = \sum_{i} w_i D_i + \varepsilon \qquad (6)$$

This could be done by adjusting the weight factors $w_i$ for the features $D_i$ so that the error ε is reduced and the similarity calculation S matches the study responses. However, such optimizations would be best done with results from a larger study.
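To make the weighting idea concrete, the sketch below uses a weighted Euclidean distance in which each feature's squared difference is scaled by a non-negative weight, and shows how changing the weights can change which pair of tracks is judged most similar. This is one possible way of introducing weights; it does not fit the weights w of Eq. (6) to any responses, and the feature values and function names are hypothetical toy examples (feature 0 standing in for a melodic feature, feature 1 for a rhythmic one).

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple

def weighted_distance(p: Sequence[float], q: Sequence[float], w: Sequence[float]) -> float:
    """Euclidean distance with each squared feature difference scaled by w_i >= 0."""
    return math.sqrt(sum(wi * (pi - qi) ** 2 for pi, qi, wi in zip(p, q, w)))

def most_similar_pair(tracks: Dict[str, Sequence[float]], w: Sequence[float]) -> Tuple[str, str]:
    """Return the pair of track names with the smallest weighted distance."""
    return min(combinations(sorted(tracks), 2),
               key=lambda pair: weighted_distance(tracks[pair[0]], tracks[pair[1]], w))
```

With toy tracks `{"A": (0.0, 0.0), "B": (1.0, 3.0), "C": (3.0, 0.5)}`, equal weights `(1, 1)` select A and C as most similar, whereas weights `(1, 0)`, which ignore the rhythmic feature, select A and B.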
More work is needed on a stronger similarity model, but in the meantime we do not believe these results prevent FANTASTIC features from being used as a measure of similarity within a creative system. The significant disparity between the observers' and the system's "understanding" of musical similarity may affect how the output of a creative system is valued. For our own system, which is to produce virtuosic performances, this mismatch might produce serendipitous performances and also alter people's expectations of how the music can be performed. However, without any weighting optimization these measures of similarity are likely to be a poor metric for use within a recommender system.

SUMMARY
We have described a creative system that utilizes semantic audio information to produce virtuosic interpretations of musical pieces.The tools and framework used to design such a system have been summarized and a refined implementation outlined.A method for matching musically similar bars of music has been described and a study conducted to evaluate how perceptually valid the computed similarity values are.While the study was small in scope, it highlights the subjective nature of musical similarity and the careful considerations required when using computable features as an indication of measures of similarity within creative systems.

Fig. 1. The basic outline of a general case-based reasoning system.

Fig. 2. (a) Code block diagram; modules are indicated with dashed lines. (b) Data and process flow diagram.

Table 1. Creative Systems Framework Symbols

Table 2. Full Results
A NaN value indicates complete consistency in an individual's responses. A value of -8.33e-17 indicates that one response differed from the other two. A value of 0 indicates that all responses differ.

Table 6. Fleiss' κ rating for pairs of participants presented with the same block partition.

Table 7. Fleiss' κ rating for combined participant responses for each possible track combination.

Table 8. Percentage match rates between computed musical similarity and clustered participant responses.