Book Description
In this project, you will perform analysis, clustering, and prediction on household electric power consumption with Python. The dataset used in this project contains 2075259 measurements gathered between December 2006 and November 2010 (47 months).

The dataset has the following attributes:
date: date in format dd/mm/yyyy;
time: time in format hh:mm:ss;
global_active_power: household global minute-averaged active power (in kilowatt);
global_reactive_power: household global minute-averaged reactive power (in kilowatt);
voltage: minute-averaged voltage (in volt);
global_intensity: household global minute-averaged current intensity (in ampere);
sub_metering_1: energy sub-metering No. 1 (in watt-hour of active energy), corresponding to the kitchen, which contains mainly a dishwasher, an oven, and a microwave (the hot plates are gas powered, not electric);
sub_metering_2: energy sub-metering No. 2 (in watt-hour of active energy), corresponding to the laundry room, which contains a washing machine, a tumble dryer, a refrigerator, and a light;
sub_metering_3: energy sub-metering No. 3 (in watt-hour of active energy), corresponding to an electric water heater and an air conditioner.

You will cluster the measurements into 5 clusters using KMeans. The machine learning models used in this project to perform regression and to predict the clusters as the target variable are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, LGBM, Gradient Boosting, XGB, and MLP.

Finally, you will plot decision boundaries, the distribution of features, feature importance, cross-validation scores, predicted values versus true values, the confusion matrix, the learning curve, the performance of the model, the scalability of the model, training loss, and training accuracy.
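
The workflow described above (cluster the measurements with KMeans, then train a model to predict the cluster label) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the book's code: the data are synthetic values generated in place of the real measurements, the column names follow the attribute list written with underscores, and the scaling step, the Random Forest settings, and the 80/20 split are choices made only for this example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real measurements (assumption: these
# distributions are invented for illustration, not taken from the dataset).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "global_active_power": rng.gamma(2.0, 0.5, n),
    "global_reactive_power": rng.gamma(2.0, 0.06, n),
    "voltage": rng.normal(240.0, 3.0, n),
    "global_intensity": rng.gamma(2.0, 2.0, n),
    "sub_metering_1": rng.poisson(1.0, n).astype(float),
    "sub_metering_2": rng.poisson(1.0, n).astype(float),
    "sub_metering_3": rng.poisson(6.0, n).astype(float),
})
feature_cols = list(df.columns)

# Standardize the features so no single unit dominates the distances,
# then group the rows into 5 clusters with KMeans.
X_scaled = StandardScaler().fit_transform(df[feature_cols])
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
df["cluster"] = kmeans.fit_predict(X_scaled)

# Use the cluster label as the target variable for a classifier.
X_train, X_test, y_train, y_test = train_test_split(
    df[feature_cols], df["cluster"], test_size=0.2, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Fraction of held-out rows whose cluster label is predicted correctly.
accuracy = clf.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.3f}")
```

Any of the other listed classifiers could be swapped in for the Random Forest here, since they all expose the same `fit` and `score` interface in scikit-learn or follow its conventions.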