Data Mining Dutch Football
Problem being addressed
Can a useful forecasting model for football games be built using a combination of machine learning methods and publicly available data?
The authors use publicly available data on the Dutch football league to develop a predictive model that forecasts the outcome of a given match. Typical publicly available data includes travel distance between matches, results over a rolling period, results from the previous season, and disruptions to the coaching and playing staff. The authors derive a set of 45 features from this data and use them to train and evaluate several machine learning methods, including Random Forest and Naive Bayes.
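To make the training step concrete, here is a minimal sketch of one of the named methods, Gaussian Naive Bayes, written in plain Python. It is not the authors' implementation. The two features (a rolling points difference and the away team's travel distance in km), the class `GaussianNaiveBayes`, and the training rows are all hypothetical stand-ins for the paper's 45 features. A Random Forest would normally come from a library such as scikit-learn's `RandomForestClassifier` rather than being written by hand.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: within each outcome class, every feature
    is modelled as an independent normal distribution."""

    def fit(self, X, y):
        grouped = defaultdict(list)
        for row, label in zip(X, y):
            grouped[label].append(row)
        self.priors = {}
        self.stats = {}
        for label, rows in grouped.items():
            self.priors[label] = len(rows) / len(X)
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # Small variance floor avoids division by zero for constant features.
            variances = [
                max(sum((v - m) ** 2 for v in c) / len(c), 1e-9)
                for c, m in zip(cols, means)
            ]
            self.stats[label] = (means, variances)
        return self

    def _log_posterior(self, row, label):
        # log prior + sum of per-feature Gaussian log-likelihoods
        means, variances = self.stats[label]
        log_p = math.log(self.priors[label])
        for x, m, v in zip(row, means, variances):
            log_p += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return log_p

    def predict(self, X):
        # Pick the class with the highest (unnormalised) log posterior.
        return [max(self.stats, key=lambda lab: self._log_posterior(row, lab)) for row in X]


# Hypothetical rows: (rolling points difference, away travel distance in km).
# Labels: "H" home win, "D" draw, "A" away win. Invented for illustration only.
X_train = [(6, 50), (4, 120), (5, 30),
           (0, 80), (1, 60), (-1, 100),
           (-5, 40), (-4, 150), (-6, 90)]
y_train = ["H"] * 3 + ["D"] * 3 + ["A"] * 3

model = GaussianNaiveBayes().fit(X_train, y_train)
print(model.predict([(5, 70), (0, 90), (-5, 100)]))  # ['H', 'D', 'A']
```

Naive Bayes is a useful baseline here because it trains in a single pass and copes with small samples, at the cost of assuming the features are independent within each outcome class.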
Advantages of this solution
The researchers conclude that there is no statistically significant difference between predictions from a model trained on publicly available data (covering a 13-year period) and predictions derived from betting odds. This is not a spectacular predictive result, but it suggests that if you want to genuinely beat the odds, you would need something beyond standard machine learning on publicly available data, such as more sophisticated modelling or richer data sources.
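The summary does not say how the odds baseline or the significance test were defined. The following is a minimal sketch under two assumptions: the odds-based prediction is simply the bookmaker's favourite (after normalising decimal odds into implied probabilities), and the comparison is an exact McNemar test on paired match predictions. The function names and example data are hypothetical.

```python
import math


def implied_probabilities(decimal_odds):
    """Convert decimal odds (home, draw, away) into probabilities that sum
    to one by normalising away the bookmaker's margin."""
    raw = [1.0 / o for o in decimal_odds]
    total = sum(raw)
    return [p / total for p in raw]


def odds_favourite(decimal_odds, outcomes=("H", "D", "A")):
    """Predict the outcome the bookmaker rates most likely."""
    probs = implied_probabilities(decimal_odds)
    return outcomes[probs.index(max(probs))]


def mcnemar_exact_p(model_preds, odds_preds, actual):
    """Two-sided exact McNemar test on paired predictions.

    b = matches the model got right and the odds got wrong; c = the reverse.
    Under the null hypothesis of equal accuracy, b ~ Binomial(b + c, 0.5)."""
    triples = list(zip(model_preds, odds_preds, actual))
    b = sum(m == a != o for m, o, a in triples)
    c = sum(o == a != m for m, o, a in triples)
    n = b + c
    if n == 0:
        return 1.0  # no discordant matches: no evidence of any difference
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical decimal odds, model predictions and results for four matches.
odds = [(1.8, 3.6, 4.5), (2.9, 3.2, 2.5), (1.5, 4.2, 6.0), (3.4, 3.3, 2.1)]
odds_preds = [odds_favourite(o) for o in odds]  # ['H', 'A', 'H', 'A']
model_preds = ["H", "D", "H", "H"]
actual = ["H", "D", "A", "A"]
print(mcnemar_exact_p(model_preds, odds_preds, actual))  # 1.0
```

Only the matches where the two predictors disagree on correctness carry information in this test, which is why a long sample such as 13 seasons is needed before a difference in accuracy can reach significance.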
Solution originally applied in these industries
Sports analytics: forecasting the outcomes of Dutch professional football matches.
Possible New Application of the Work
Automotive Industry
Secondary vehicle markets are interesting because they give an indication of a vehicle's durability: cars that don't break down are often resold. Listings data of this kind is publicly available on car sales websites. An interesting project would be to predict the reliability of a brand (in a given country) from sales in the secondary car market, using data points such as vehicle mileage.
Real Estate Industry
There is a great deal of publicly available data in the real estate industry. It would be interesting to build similar predictors for consumers who are looking to buy; such a predictor may help them decide where and when to enter the market, and it could even be a better guide than an estate agent.
Source DOI: #############