Feature extraction and machine learning techniques for musical genre determination

Davis, Rosalind

Masters Thesis

Since 2015, the music industry has experienced a resurgence driven by online music sales and streaming, which has in turn been facilitated by very large archives of musical data. These large musical archives, however, remain challenging to search and index effectively, due to the scale of the data involved and the subjective, perceptual nature of how humans relate to music. Contemporary research in music information retrieval seeks to bridge this gap by applying algorithmic analysis to features extracted from the underlying audio in order to automatically classify and identify perceptual features in music. This project applied three machine learning techniques (support vector classification, traditional neural networks, and convolutional neural networks) to two sets of audio features (Mel-frequency cepstral coefficients and the discrete wavelet transform) for genre classification. Because convolutional neural networks have been used on images to great effect, the discrete wavelet transform data was used to map audio into the image domain, to leverage publicly available, pre-trained weight sets for four large, sophisticated image recognition networks. For all tasks, two subsets of a large, publicly available musical dataset were used, along with multiple training and optimization techniques. While all models met or exceeded some pre-existing benchmarks for the genre classification task, support vector classification yielded the best results on an eight-genre multi-class classification task, with a best overall test set accuracy of 61%, compared with 51.4% for traditional neural networks and 40.5% for convolutional neural networks. Applying the pre-trained image recognition networks to audio wavelet data decreased training time, but did not yield accuracies comparable to those the networks achieved on image data.
The small size of the dataset relative to datasets in other domains, the reuse of data augmentation techniques intended for use on images, and sub-optimal feature extraction techniques are suggested as factors in the inability of the machine-learning models evaluated in this project to achieve the quality of results observed in the image domain. Audio-native augmentation techniques and the use of ensemble models present worthwhile avenues for future investigation.

Date

2/8/2018

Resource Type

Masters Thesis

Creator

Davis, Rosalind

Advisor

Hang, Xiyi

Campus

Northridge

Department

Electrical and Computer Engineering

Publisher

California State University, Northridge

Degree Level

Masters

Degree Name

M.S.

Date Copyright

2018

Date Accessioned

2018-02-08T18:14:14Z

Handle

http://hdl.handle.net/10211.3/199917

Submitted by Graduate Studies (gradstudies@csun.edu) on 2018-02-08T18:14:14Z. Made available in DSpace on 2018-02-08T18:14:14Z (GMT). Previous issue date: 2018-02-08. No. of bitstreams: 9

File	Size (bytes)	Checksum (MD5)
Davis-Rosalind-thesis-2018.pdf	24268644	cf099541d1ed6023bfd3b518e37f8fc3
move_images.py	4035	da3dd7cc18dca83c686992e32dd3ef50
generate_wavelets.py	21017	95aa2fee917898c989975b5316963a49
NeuralNetworkModels.ipynb	650285	d7448e8a52facf57c298fe2879ebada2
utilities.py	2297	6e6af73c76b29d4d32afb4be4335f4d6
SVCModels.ipynb	23777	fd6b03d2766bedacff8b09f9fd7c6cf5
requirements.txt	3123	3fc0702f0cede2cba5fad1078767a381
custom_keras_utils.py	26502	e3bdef38e2a45240b955c4f736049b2d
code_timing.py	1880	cefc5c974bb55882b9a3b210a7e81f11

Language

English

Statement of Responsibility

by Rosalind M. Davis

Notes

California State University, Northridge. Department of Electrical and Computer Engineering.
Includes bibliographical references (pages 82-88)

Title	Date Uploaded	Visibility
Davis-Rosalind-thesis-2018.pdf	2020-10-18	Public
move_images.py	2020-10-18	Public
generate_wavelets.py	2020-10-18	Public
NeuralNetworkModels.ipynb	2020-10-18	Public
utilities.py	2020-10-18	Public
SVCModels.ipynb	2020-10-18	Public
requirements.txt	2020-10-18	Public
custom_keras_utils.py	2020-10-18	Public
code_timing.py	2020-10-18	Public
