Master's Thesis

Modeling in R and Weka for Course Enrollment Prediction

This thesis presents a tool for comparing R and Weka time series models that predict undergraduate Computer Science course enrollments at CSUN. Methodologies currently used at other universities, along with related work on course enrollment prediction, are examined to guide model selection and to help interpret the modeling test results. The models implemented in Weka are Gaussian Processes, Linear Regression, Multilayer Perceptron, and SMOreg. The models implemented using R's forecast package are ARIMA, ETS, and RWF. Predictions on holdout data are compared both in modified form, with numbers rounded up and negative values set to zero, and in unmodified form. When three semesters of modified and unmodified predictions were compared against three semesters of holdout data, the most accurate models were Gaussian Processes and SMOreg. All models were most accurate at predicting three semesters of holdout data when trained on the maximum available enrollment data, from Spring 2010 through Spring 2015. At best, the models predicted enrollment to within 25 students for 93.5% of courses; at worst, for 77.4% of courses. Details on project maintenance and possible future enhancements are also included.
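The modified form of the predictions and the "within 25 students" accuracy figure can be illustrated with a minimal sketch. This is not the tool's actual code: the function names, the sample numbers, and the choice to count a difference of exactly 25 as a hit are all assumptions made for illustration.

```python
import math


def modify_predictions(predictions):
    """Round each prediction up and set negative values to zero.

    Hypothetical helper mirroring the 'modified form' described in the abstract.
    """
    return [max(0, math.ceil(p)) for p in predictions]


def share_within_tolerance(predicted, actual, tolerance=25):
    """Return the fraction of courses predicted to within `tolerance` students.

    Assumption: a difference of exactly `tolerance` counts as a hit.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance)
    return hits / len(actual)


# Made-up enrollment figures for four courses (not data from the thesis).
raw_predictions = [31.2, -4.7, 118.0, 59.5]
actual_enrollments = [30, 0, 150, 62]

modified_predictions = modify_predictions(raw_predictions)
unmodified_share = share_within_tolerance(raw_predictions, actual_enrollments)
modified_share = share_within_tolerance(modified_predictions, actual_enrollments)
```

Computing the share for both the raw and the modified predictions mirrors the two comparisons described above; with real model output the two shares can differ whenever rounding or zeroing moves a prediction across the 25-student boundary.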
