Image source [1]

Introduction

For several years now, I’ve heard acquaintances lament: “We don’t have spring anymore. We skip spring and go straight to summer. I miss spring-like temperatures.”

This project combines some of my favorite technologies - open data, cloud computing, and Jupyter notebooks - all to answer the question “Are milder spring temperatures being replaced by hotter ones?”

Finding the data

With some research, I found that weather observations are publicly available and free to use as part of NOAA’s Global Historical Climatology Network Daily (GHCN-D) dataset. The dataset contains observations from 106,200 weather stations, with daily measurements of precipitation, temperature, and other conditions spanning 260 years. There are several ways to access the data, including the NOAA website and Amazon Web Services (AWS) Open Data. The dataset is 256 GB in size.

Since this dataset is so big, and because of the success stories of using the data in the cloud (“Calculating growing degree days” and “Visualize over 200 years of global climate data”), I decided to use the NOAA data available on AWS.

Processing the data

The step-by-step instructions explain how to use AWS services to pinpoint the necessary data from those billions of observations, quickly and with minimal cost. The resulting dataset is 185 KB and has about 40,000 observations. Since the dataset is so small, it can be downloaded and used on a personal machine if needed.
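To give a feel for what that narrowing-down involves, here is a minimal local sketch of the same idea. It is not the AWS workflow from the step-by-step instructions. It assumes a CSV layout with ID, DATE, ELEMENT, and DATA_VALUE columns, and it relies on GHCN-D storing temperatures in tenths of a degree Celsius. The station IDs and sample rows are made up, and extract_daily_highs is a hypothetical helper. It uses only the standard library, so it runs anywhere.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows in an assumed GHCN-D-style layout (not real observations).
SAMPLE_CSV = """ID,DATE,ELEMENT,DATA_VALUE
STATION_A,19100415,TMAX,211
STATION_A,19100415,TMIN,72
STATION_A,19100415,PRCP,0
STATION_B,19100415,TMAX,250
STATION_A,20230415,TMAX,289
"""


def extract_daily_highs(csv_text, station_id):
    """Return (date, high in Fahrenheit) pairs for one station's TMAX rows."""
    highs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep only the daily maximum temperature for the chosen station.
        if row["ID"] != station_id or row["ELEMENT"] != "TMAX":
            continue
        day = datetime.strptime(row["DATE"], "%Y%m%d").date()
        celsius = int(row["DATA_VALUE"]) / 10  # stored as tenths of a degree C
        highs.append((day, celsius * 9 / 5 + 32))
    return highs


highs = extract_daily_highs(SAMPLE_CSV, "STATION_A")
for day, temp_f in highs:
    print(day, round(temp_f, 1))
```

Filtering by station and by element is what shrinks billions of rows down to tens of thousands. On the full dataset that filter is better run where the data lives than on a laptop.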

Analyzing the data

The full notebook is available on GitHub. My conclusion is that the data does support the perception that spring feels hotter nowadays in Knoxville, TN.

Thought process

I had to translate the vague complaint “Spring feels hotter” into measurable quantities. To that end, I had to make some assumptions and define some terms. For example, I had to set temperature thresholds for an expected spring day and for an abnormally hot spring day. I also had to define which time period people are complaining about. My acquaintances seem to expect hotter days near the end of spring, so mid-spring is the time frame they were complaining about. My takeaway is that it takes some research and interviews to quantify terms for analytics projects.
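Here is a rough sketch of what those definitions can look like once they are written down as code. The thresholds and the mid-spring window below are placeholders I picked for illustration, not the values used in the notebook, and the sample observations are made up.

```python
from collections import Counter
from datetime import date

# Hypothetical definitions, chosen only for illustration.
MILD_RANGE_F = (60.0, 75.0)   # daily high of an "expected" spring day
HOT_THRESHOLD_F = 85.0        # daily high of an "abnormally hot" spring day
WINDOW = ((4, 1), (5, 15))    # mid-spring: April 1 through May 15


def classify(temp_f):
    """Label a daily high as 'hot', 'mild', or 'other'."""
    if temp_f >= HOT_THRESHOLD_F:
        return "hot"
    if MILD_RANGE_F[0] <= temp_f <= MILD_RANGE_F[1]:
        return "mild"
    return "other"


def count_by_year(observations):
    """Count labels per year, ignoring days outside the mid-spring window."""
    counts = {}
    for day, temp_f in observations:
        if not (WINDOW[0] <= (day.month, day.day) <= WINDOW[1]):
            continue
        counts.setdefault(day.year, Counter())[classify(temp_f)] += 1
    return counts


# Made-up (date, daily high in Fahrenheit) pairs.
sample = [
    (date(1910, 4, 15), 70.0),
    (date(1910, 5, 1), 88.0),
    (date(1910, 6, 20), 91.0),  # outside the window, so it is skipped
    (date(2023, 4, 15), 86.5),
    (date(2023, 4, 16), 79.0),
]
counts = count_by_year(sample)
for year in sorted(counts):
    print(year, dict(counts[year]))
```

Keeping the thresholds and the window as named constants makes every assumption visible in one place. It also makes it easy to change a definition and rerun the counts to see whether the conclusion holds.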

Future work

This project sparked plenty more ideas. It would be interesting to expand to more locations to see if the results are similar. I’m also curious about how data collection technology changed over time. Surely the methods of 1910 differ from those of 2023, and I wonder how that affects the data. I’d also like to know how shifting the definitions in the analysis would sway the results. For example, if the temperature thresholds change, do the conclusions stay the same?

  1. Photo by Freddie Ramm at Pexels