Munging my “Smart” Home Data

Context

If you live in an older home, being too cold in the winter and too hot in the summer comes with the package. Last year, I thought of getting spray foam installed and figured it was time to start analyzing some of the data I’ve been collecting over the past three years. I told the installer that I had three years of data and would do a before-and-after test to see what the spray foam did in terms of performance. He said he would purchase the data analysis. Needless to say, one could just look at the heating bill to see if there is a difference. But with variance in the unit cost of fuel, admin, etc., I did not want to bother normalizing that info. The geek in me wants to explore data mining and inference, so off I go to explore linear discriminant analysis and random forests.

For this exercise, I have three in-home temperature points, one outside, and several power-related measurements. So far I have close to 3 million data points. I searched the web for an open source toolset that could help me with data analytics and had decent plotting capabilities. Since I had used the R programming language a few months back and liked the graphing capabilities of the ggplot2 package, it became my tool of choice. Note that Python is making inroads in the data analysis space, but for now I want to stay focused on the analysis itself, so R it is.