Traditional approaches to time series analysis and forecasting, like Linear Regression, Holt-Winters Exponential Smoothing, ARMA/ARIMA/SARIMA and ARCH/GARCH, have been well-established for decades and find applications in fields as varied as business and finance (e.g. predicting stock prices and analysing trends in financial markets), the energy sector (e.g. forecasting electricity consumption) and academia.

In more recent times, the popularisation and wider availability of open source frameworks like Keras, TensorFlow and scikit-learn have helped machine learning approaches like Random Forest, Extreme Gradient Boosting, Time Delay Neural Network and Recurrent Neural Network to gain momentum in time series applications. These techniques allow for historical information to be introduced as input to the model through a set of time delays.

The advantage of using machine learning models over more traditional methods is that they can have higher predictive power, especially when predictors have a clear causal link to the response. Moreover, they can handle complex calculations over larger numbers of inputs much faster.

However, they tend to have a wider array of tuning parameters, are generally more complex than “classic” models, and can be expensive to fit, both in terms of computing power and time. To top it off, their black box nature makes their output harder to interpret and has given birth to the ever-growing field of Machine Learning interpretability (I am not going to touch on this as it’s outside the scope of the project).

Project structure

In this project I am going to explain in detail the various steps needed to model time series data with machine learning models. The steps include a comparison of model performance and forecasting.

In particular, I use TSstudio to carry out a “traditional” time series exploratory analysis to describe the time series and its components, and show how to use the insight I gather to create features for a machine learning pipeline to ultimately generate a weekly revenue forecast.

For modelling and forecasting I’ve chosen the high performance, open source machine learning library H2O. I am fitting an assorted selection of machine learning models such as Generalised Linear Model, Gradient Boosting Machine and Random Forest, and also using AutoML for automatic machine learning, one of the most exciting features of the H2O library.

The dataset I’m using here accompanies a Redbooks publication and is available as a free download in the Additional Material section. The data covers three and a half years’ worth of sales orders for the Sample Outdoors Company, a fictitious B2B outdoor equipment retailer, and comes with details about the products they sell as well as their customers (which in their case are retailers). Due to its artificial nature, the series presents a few oddities and quirks, which I am going to point out throughout this project.

Here I’m simply loading up the compiled dataset, but if you want to follow along I’ve also written a number of posts where I show how I’ve assembled the various data feeds and sorted out variable names, new feature creation and some general housekeeping tasks. There is a version for those who prefer to handle data in Excel using R and one where I fetch the data by linking R to a PostgreSQL database.

The final month of the series is incomplete, so I filter it out. The data is loaded as a tibble with 89 observations.
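The idea mentioned earlier, that historical information can be fed to a machine learning model through a set of time delays, can be illustrated with a minimal sketch. It is written in plain Python purely for illustration (the project itself works in R with TSstudio and H2O). The function name `make_lag_features`, the choice of lags and the revenue figures are all hypothetical and are not taken from the dataset.

```python
from typing import Dict, List, Sequence


def make_lag_features(series: Sequence[float], lags: Sequence[int]) -> List[Dict[str, float]]:
    """Build one row per time step holding the target and its lagged values.

    Time steps that would need values from before the start of the
    series are dropped, so the output has len(series) - max(lags) rows.
    """
    if not lags or min(lags) < 1:
        raise ValueError("lags must be a non-empty sequence of positive integers")

    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        row = {"y": series[t]}  # value to predict at time t
        for lag in lags:
            row[f"lag_{lag}"] = series[t - lag]  # value observed `lag` steps earlier
        rows.append(row)
    return rows


# Hypothetical weekly revenue figures, made up for illustration only.
weekly_revenue = [100.0, 120.0, 90.0, 130.0, 150.0, 110.0]

rows = make_lag_features(weekly_revenue, lags=[1, 2])
for row in rows:
    print(row)
```

Each row pairs the current value with the values observed one and two weeks earlier, which is the kind of table a model such as a Random Forest or a Gradient Boosting Machine can be trained on. Dropping the first rows is one simple way to deal with the missing history at the start of the series.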