{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "IPython.OutputArea.prototype._should_scroll = function(lines) {\n", " return false;\n", "}" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%javascript\n", "IPython.OutputArea.prototype._should_scroll = function(lines) {\n", " return false;\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Feature Extraction and Machine Learning Techniques for Musical Genre Determination
by [Rosy Davis](mailto:rosydavis@ieee.org), CSUN MSEE 2017\n", "\n", "## Introduction\n", "\n", "This notebook runs the neural network models for my masters project, \"Feature Extraction\n", "and Machine Learning Techniques for Musical Genre Determination,\" for which I will be\n", "receiving a Masters of Science in \n", "[Electrical Engineering](https://www.csun.edu/engineering-computer-science/electrical-computer-engineering/) \n", "from [California State University, Northridge](https://www.csun.edu/) in December 2017. \n", "My advisor at CSUN is \n", "[Dr. Xiyi Hang](http://www.csun.edu/faculty/profiles/xiyi.hang.14). The accompanying paper is not currently public, but when it is released, a link will be added to this page. Only a partial list of most-relevant references appears in this notebook; the full list appears in the accompanying paper.\n", "\n", "In this project, two approaches to musical genre classification were investigated: the use of support vector classification on Mel-frequency cepstral coefficient (MFCC) features (Experiment 1, this notebook), and the use of neural networks on image data generated via the discrete wavelet transform (DWT) (Experiments 2-5, the \"[NeuralNetworkModels.ipynb](NeuralNetworkModels.ipynb)\" notebook).\n", "\n", "### Contents\n", "\n", "* [Setup](#Setup)\n", "* [Experiment 1: MFCC Benchmarking](#Experiment-1:-MFCC-Benchmarking)\n", " * [Small Dataset](#Small-Dataset)\n", " * [Extended Dataset](#Extended-Dataset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Seed the random number generator\n", "seed = 42 # set to None to auto-generate\n", "\n", "# For nice tables:\n", "from IPython.display import display\n", "\n", "# This block is adapted from FMA: A Dataset For Music Analysis\n", "# Michaƫl Defferrard, Kirell Benzi, Pierre Vandergheynst, Xavier Bresson, EPFL LTS2\n", "# from their provided usage code\n", "import os\n", "\n", "import numpy as np # For math and analysis\n", "import pandas as pd # For data structures\n", "import sklearn as skl # (scikit-learn) for various machine learning tasks\n", "import sklearn.utils, sklearn.preprocessing, sklearn.decomposition, sklearn.svm\n", "import fma.utils as fma_utils # Utilities provided for loading and manipulating the\n", " # Free Music Archive dataset.\n", "\n", "# My code for utilities and long-running timers:\n", "import code_timing as timer \n", "import utilities as ut " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Adapted from fma usage code:\n", "AUDIO_DIR = os.path.join(os.getcwd(), \"data/fma_small/\")\n", "\n", "# Load the metadata files\n", "tracks = fma_utils.load(AUDIO_DIR + 'tracks.csv')\n", "features = fma_utils.load(AUDIO_DIR + 'features.csv')\n", "\n", "# Make sure everything in features is in tracks and vice versa\n", "np.testing.assert_array_equal(features.index, tracks.index)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def fit_svm(kernel, idx_tuple):\n", " # Tuple is (size_name, kernel_name)\n", " (size_name, kernel_name) = idx_tuple\n", " print(\"{} SVM training SVM begun at {}.\".format(kernel_name, timer.datetimestamp()))\n", " clf = skl.svm.SVC(kernel=kernel)\n", " clf.fit(X_train_mfcc, y_train)\n", " print(\"\\tTraining {} SVM finished at {}. Calculating\\n\\t\\tscores...\".format(\n", " kernel_name,\n", " timer.datetimestamp()))\n", " mfcc_benchmarks.loc[idx_tuple][\"Training Accuracy\"] = clf.score(X_train_mfcc, y_train)\n", " mfcc_benchmarks.loc[idx_tuple][\"Validation Accuracy\"] = clf.score(X_val_mfcc, y_val)\n", " mfcc_benchmarks.loc[idx_tuple][\"Test Accuracy\"] = clf.score(X_test_mfcc, y_test)\n", " print('\\tTrain accuracy: {:.2%}; validation accuracy: {:.2%}; test accuracy: {:.2%}\\n'.format(\n", " mfcc_benchmarks.loc[idx_tuple][\"Training Accuracy\"], \n", " mfcc_benchmarks.loc[idx_tuple][\"Validation Accuracy\"], \n", " mfcc_benchmarks.loc[idx_tuple][\"Test Accuracy\"]))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "iterables = [[\"Small\",\"Extended\"], [\"Linear\",\"Polynomial\",\"RBF\",\"Sigmoid\"]]\n", "idx = pd.MultiIndex.from_product(iterables, names=[\"Dataset Size\", \"SVC Kernel\"])\n", "mfcc_benchmarks = pd.DataFrame(index = idx, \n", " columns = [\"Training Accuracy\",\n", " \"Validation Accuracy\",\n", " \"Test Accuracy\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Experiment 1: MFCC Benchmarking" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Small Dataset" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Code adapted from FMA (commenting mine):\n", "\n", "# Use the small dataset (which is balanced re: genre and <8GB):\n", "small = tracks['set', 'subset'] <= 'small'\n", "\n", "# Get the pre-split sets from the FMA dataset\n", "train = tracks['set', 'split'] == 'training'\n", "val = tracks['set', 'split'] == 'validation'\n", "test = tracks['set', 'split'] == 'test'\n", "\n", "# Pull the main genre information on the examples in each of the pre-split sets:\n", "y_train = tracks.loc[small & train, ('track', 'genre_top')]\n", "y_val = tracks.loc[small & val, ('track', 'genre_top')]\n", "y_test = tracks.loc[small & test, ('track', 'genre_top')]" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Data split into 6400 training examples, 800 validation examples, and 800 testing examples.\n", "Small dataset contains 8000 tracks:\n", "\tHip-Hop: 800 training, 100 validation, and 100 test.\n", "\tPop: 800 training, 100 validation, and 100 test.\n", "\tFolk: 800 training, 100 validation, and 100 test.\n", "\tRock: 800 training, 100 validation, and 100 test.\n", "\tExperimental: 800 training, 100 validation, and 100 test.\n", "\tInternational: 800 training, 100 validation, and 100 test.\n", "\tElectronic: 800 training, 100 validation, and 100 test.\n", "\tInstrumental: 800 training, 100 validation, and 100 test.\n", "\n" ] } ], "source": [ "# Check out the data:\n", "print(f\"Data split into {y_train.size} training examples, \"\n", " f\"{y_val.size} validation examples, and \"\n", " f\"{y_test.size} testing examples.\")\n", "\n", "unique_genres = tracks.loc[small & train, ('track', 'genre_top')].unique().categories\n", "print(\"Small dataset contains {} tracks:\".format((tracks.loc[small & (train | \n", " test | \n", " val)]).shape[0]))\n", "for item in unique_genres:\n", " num_train = y_train[y_train == item].shape[0]\n", " num_val = y_val[y_val == item].shape[0]\n", " num_test = y_test[y_test == item].shape[0]\n", " print(f\"\\t{item}: {num_train} training, {num_val} validation, and {num_test} test.\")\n", "print()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "* MFCC data loaded: 6400 training examples, 800 validation examples, and 800 test\n", "examples.\n", "\n", "* Shuffle complete.\n", "* Preprocessing for 0 mean/unit variance complete.\n", "\n", "The MFCC training data contains 140 features per example and 8 classes.\n", "Linear SVM training SVM begun at Tuesday, 2017 November 28, 7:47 AM.\n", "\tTraining Linear SVM finished at Tuesday, 2017 November 28, 7:48 AM. Calculating\n", "\t\tscores...\n", "\tTrain accuracy: 60.53%; validation accuracy: 46.12%; test accuracy: 41.62%\n", "\n", "Polynomial SVM training SVM begun at Tuesday, 2017 November 28, 7:48 AM.\n", "\tTraining Polynomial SVM finished at Tuesday, 2017 November 28, 7:48 AM. Calculating\n", "\t\tscores...\n", "\tTrain accuracy: 70.16%; validation accuracy: 45.00%; test accuracy: 38.88%\n", "\n", "RBF SVM training SVM begun at Tuesday, 2017 November 28, 7:48 AM.\n", "\tTraining RBF SVM finished at Tuesday, 2017 November 28, 7:49 AM. Calculating\n", "\t\tscores...\n", "\tTrain accuracy: 75.81%; validation accuracy: 53.12%; test accuracy: 46.38%\n", "\n", "Sigmoid SVM training SVM begun at Tuesday, 2017 November 28, 7:49 AM.\n", "\tTraining Sigmoid SVM finished at Tuesday, 2017 November 28, 7:49 AM. Calculating\n", "\t\tscores...\n", "\tTrain accuracy: 40.91%; validation accuracy: 36.00%; test accuracy: 34.75%\n", "\n" ] } ], "source": [ "# Pull the Mel-frequency cepstral coefficients (pre-calculated) for each of the pre-split \n", "# sets--the MFCCs are used as the baseline:\n", "X_train_mfcc = features.loc[y_train.index, 'mfcc']\n", "X_val_mfcc = features.loc[y_val.index, 'mfcc']\n", "X_test_mfcc = features.loc[y_test.index, 'mfcc']\n", "print((\"* MFCC data loaded: {} training examples, {} validation examples, and \"\n", " \"{} test\\nexamples.\").format(X_train_mfcc.shape[0], \n", " X_val_mfcc.shape[0], \n", " X_test_mfcc.shape[0]))\n", "print()\n", "\n", "# Shuffle training examples:\n", "X_train_mfcc, y_train = skl.utils.shuffle(X_train_mfcc, y_train, random_state=seed)\n", "print(\"* Shuffle complete.\")\n", "\n", "# Standardize MFCC features by removing the mean and scaling to unit variance.\n", "scaler = skl.preprocessing.StandardScaler(copy=False)\n", "X_train_mfcc = scaler.fit_transform(X_train_mfcc)\n", "X_test_mfcc = scaler.transform(X_test_mfcc)\n", "X_val_mfcc = scaler.transform(X_val_mfcc)\n", "print(\"* Preprocessing for 0 mean/unit variance complete.\")\n", "\n", "print((\"\\nThe MFCC training data contains {} features per example and \"\n", " \"{} classes.\").format(X_train_mfcc.shape[1],\n", " unique_genres.shape[0]))\n", "\n", "# Now perform the benchmarking:\n", "fit_svm(\"linear\", (\"Small\",\"Linear\"))\n", "fit_svm(\"poly\", (\"Small\",\"Polynomial\"))\n", "fit_svm(\"rbf\", (\"Small\",\"RBF\"))\n", "fit_svm(\"sigmoid\", (\"Small\",\"Sigmoid\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Extended Dataset" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Desired genres: ['Hip-Hop', 'Pop', 'Folk', 'Rock', 'Experimental', 'International', 'Electronic', 'Instrumental']\n", "Eight-genre dataset contains 46317 tracks:\n", "\tHip-Hop: 3552\n", "\tPop: 2332\n", "\tFolk: 2803\n", "\tRock: 14182\n", "\tExperimental: 10608\n", "\tInternational: 1389\n", "\tElectronic: 9372\n", "\tInstrumental: 2079\n", "\n", "Using the large dataset as the base for the extended dataset.\n", "Data split into 37316 training examples, 4350 validation examples, and 4651 testing examples.\n", "Extended dataset contains 46317 tracks:\n", "\tHip-Hop: 2910 training, 319 validation, and 323 test.\n", "\tPop: 1815 training, 313 validation, and 204 test.\n", "\tFolk: 2275 training, 229 validation, and 299 test.\n", "\tRock: 11394 training, 1324 validation, and 1464 test.\n", "\tExperimental: 8557 training, 966 validation, and 1085 test.\n", "\tInternational: 1124 training, 137 validation, and 128 test.\n", "\tElectronic: 7662 training, 871 validation, and 839 test.\n", "\tInstrumental: 1579 training, 191 validation, and 309 test.\n", "\n" ] } ], "source": [ "# Pull up the small dataset y values, just for its list of genres\n", "print(\"Desired genres: {}\".format(unique_genres.tolist()))\n", "\n", "# Now filter the tracks based on those genres:\n", "eight_genre = tracks[tracks[(\"track\",\"genre_top\")].isin(unique_genres)]\n", "print(f\"Eight-genre dataset contains {eight_genre.shape[0]} tracks:\")\n", "for item in unique_genres:\n", " num = eight_genre[eight_genre[(\"track\", \"genre_top\")] == item].shape[0]\n", " print(\"\\t{}: {}\".format(item, num))\n", "print()\n", "\n", "size_key = 'large'\n", "print (\"Using the {} dataset as the base for the extended dataset.\".format(size_key))\n", "\n", "size_selector = eight_genre['set', 'subset'] <= size_key\n", "\n", "# Get the pre-split sets from the FMA dataset\n", "train = eight_genre['set', 'split'] == 'training'\n", "val = eight_genre['set', 'split'] == 'validation'\n", "test = eight_genre['set', 'split'] == 'test'\n", "\n", "# Pull the main genre information on the examples in each of the pre-split sets:\n", "y_train = eight_genre.loc[size_selector & train, ('track', 'genre_top')]\n", "y_val = eight_genre.loc[size_selector & val, ('track', 'genre_top')]\n", "y_test = eight_genre.loc[size_selector & test, ('track', 'genre_top')]\n", "print(f\"Data split into {y_train.size} training examples, \"\n", " f\"{y_val.size} validation examples, and \"\n", " f\"{y_test.size} testing examples.\")\n", "\n", "print(f\"Extended dataset contains {eight_genre.shape[0]} tracks:\")\n", "for item in unique_genres:\n", " num_train = y_train[y_train == item].shape[0]\n", " num_val = y_val[y_val == item].shape[0]\n", " num_test = y_test[y_test == item].shape[0]\n", " print(f\"\\t{item}: {num_train} training, {num_val} validation, and {num_test} test.\")\n", "print()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "* MFCC data loaded: 37316 training examples, 4350 validation examples, and 4651 test\n", "examples.\n", "\n", "* Shuffle complete.\n", "* Preprocessing for 0 mean/unit variance complete.\n", "\n", "The MFCC training data contains 140 features per example and 8 classes.\n" ] } ], "source": [ "# Pull the Mel-frequency cepstral coefficients (pre-calculated) for each of the pre-split \n", "# sets--the MFCCs are used as the baseline:\n", "X_train_mfcc = features.loc[y_train.index, 'mfcc']\n", "X_val_mfcc = features.loc[y_val.index, 'mfcc']\n", "X_test_mfcc = features.loc[y_test.index, 'mfcc']\n", "print((\"* MFCC data loaded: {} training examples, {} validation examples, and \"\n", " \"{} test\\nexamples.\").format(X_train_mfcc.shape[0], \n", " X_val_mfcc.shape[0], \n", " X_test_mfcc.shape[0]))\n", "print()\n", "\n", "# Shuffle training examples:\n", "X_train_mfcc, y_train = skl.utils.shuffle(X_train_mfcc, y_train, random_state=seed)\n", "print(\"* Shuffle complete.\")\n", "\n", "# Standardize MFCC features by removing the mean and scaling to unit variance.\n", "scaler = skl.preprocessing.StandardScaler(copy=False)\n", "X_train_mfcc = scaler.fit_transform(X_train_mfcc)\n", "X_test_mfcc = scaler.transform(X_test_mfcc)\n", "X_val_mfcc = scaler.transform(X_val_mfcc)\n", "print(f\"* Preprocessing for 0 mean/unit variance complete.\")\n", "\n", "print((\"\\nThe MFCC training data contains {} features per example and \"\n", " \"{} classes.\").format(X_train_mfcc.shape[1],\n", " unique_genres.shape[0]))" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Linear SVM training SVM begun at Tuesday, 2017 November 28, 7:49 AM.\n", "\tTraining Linear SVM finished at Tuesday, 2017 November 28, 8:24 AM. Calculating\n", "\t\tscores...\n", "\tTrain accuracy: 61.27%; validation accuracy: 62.25%; test accuracy: 57.82%\n", "\n", "Polynomial SVM training SVM begun at Tuesday, 2017 November 28, 8:28 AM.\n", "\tTraining Polynomial SVM finished at Tuesday, 2017 November 28, 8:36 AM. Calculating\n", "\t\tscores...\n", "\tTrain accuracy: 72.37%; validation accuracy: 58.64%; test accuracy: 56.35%\n", "\n", "RBF SVM training SVM begun at Tuesday, 2017 November 28, 8:40 AM.\n", "\tTraining RBF SVM finished at Tuesday, 2017 November 28, 8:47 AM. Calculating\n", "\t\tscores...\n", "\tTrain accuracy: 74.54%; validation accuracy: 63.59%; test accuracy: 61.21%\n", "\n", "Sigmoid SVM training SVM begun at Tuesday, 2017 November 28, 8:52 AM.\n", "\tTraining Sigmoid SVM finished at Tuesday, 2017 November 28, 10:18 AM. Calculating\n", "\t\tscores...\n", "\tTrain accuracy: 45.32%; validation accuracy: 45.01%; test accuracy: 45.99%\n", "\n" ] } ], "source": [ "fit_svm(\"linear\", (\"Extended\",\"Linear\"))\n", "fit_svm(\"poly\", (\"Extended\",\"Polynomial\"))\n", "fit_svm(\"rbf\", (\"Extended\",\"RBF\"))\n", "fit_svm(\"sigmoid\", (\"Extended\",\"Sigmoid\"))" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Training AccuracyValidation AccuracyTest Accuracy
Dataset SizeSVC Kernel
SmallLinear0.6053130.461250.41625
Polynomial0.7015620.450.38875
RBF0.7581250.531250.46375
Sigmoid0.4090620.360.3475
ExtendedLinear0.6127130.6225290.578155
Polynomial0.7237110.5864370.563535
RBF0.7454440.6358620.612126
Sigmoid0.453210.4501150.459901
\n", "
" ], "text/plain": [ " Training Accuracy Validation Accuracy Test Accuracy\n", "Dataset Size SVC Kernel \n", "Small Linear 0.605313 0.46125 0.41625\n", " Polynomial 0.701562 0.45 0.38875\n", " RBF 0.758125 0.53125 0.46375\n", " Sigmoid 0.409062 0.36 0.3475\n", "Extended Linear 0.612713 0.622529 0.578155\n", " Polynomial 0.723711 0.586437 0.563535\n", " RBF 0.745444 0.635862 0.612126\n", " Sigmoid 0.45321 0.450115 0.459901" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(mfcc_benchmarks)\n", "ut.save_obj(mfcc_benchmarks, \"mfcc_benchmarks\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 2 }