Global Temperatures Analysis
A time series analysis of global surface temperatures, using linear regression to model the trend and project it 50 years into the future.
How fast is global temperature rising, and what does that tell us about the future?
Climate data exists going back centuries, but the noise of seasonal variation drowns out the signal. The question isn't whether temperatures fluctuate — they always do — but whether there's a consistent, measurable trend buried underneath.
The data comes from Berkeley Earth Global Land Temperatures: monthly records from 1743 to 2013, covering 243 countries and regions. This analysis uses US data only: approximately 3,240 monthly observations in the full record, filtered to reliable records from 1821 onward.
Processing: Forward-fill missing values, aggregate monthly data to yearly averages to smooth seasonal noise.
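A minimal sketch of that processing step, assuming the records sit in a pandas DataFrame with a date column `dt` and a value column `AverageTemperature`. Both column names and the helper `yearly_averages` are assumptions for illustration, and the small inline frame is toy data standing in for the real file.

```python
import numpy as np
import pandas as pd

def yearly_averages(df: pd.DataFrame, start_year: int = 1821) -> pd.Series:
    """Forward-fill gaps, then average monthly readings into one value per year."""
    out = df.copy()
    out["dt"] = pd.to_datetime(out["dt"])
    out = out.sort_values("dt")
    # Forward-fill: a missing month inherits the last observed value.
    out["AverageTemperature"] = out["AverageTemperature"].ffill()
    # Keep only the period treated as reliable.
    out = out[out["dt"].dt.year >= start_year]
    # Averaging the months of each year smooths out the seasonal cycle.
    return out.groupby(out["dt"].dt.year)["AverageTemperature"].mean()

# Hypothetical toy data: two years of monthly values with one gap.
toy = pd.DataFrame({
    "dt": pd.date_range("1821-01-01", periods=24, freq="MS"),
    "AverageTemperature": [float(m) for m in range(12)] * 2,
})
toy.loc[5, "AverageTemperature"] = np.nan  # simulate a missing month

yearly = yearly_averages(toy)
print(yearly)
```

Forward-filling before averaging keeps every year at a full set of months, at the cost of repeating stale values across long gaps, which is one reason to drop the sparse early records.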
Linear regression on roughly 190 years of yearly averages.
Fit an OLS line through yearly temperatures from 1821 to 2013. The slope tells us the warming rate per year (multiply by 10 for the rate per decade). Uncertainty bands (95% confidence interval) show where future temperatures could plausibly land given historical variation.
The model is intentionally simple: one variable (time), one target (temperature). No fancy nonlinear terms, no seasonal adjustments. The trend is strong enough that a straight line captures it well.
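    from sklearn.linear_model import LinearRegression
    import numpy as np

    # Fit line through yearly temps
    # (years and temps are the yearly averages from the processing step)
    X = np.array(years).reshape(-1, 1)  # sklearn expects a 2-D feature matrix
    y = np.array(temps)
    model = LinearRegression().fit(X, y)

    # Forecast 50 years ahead
    slope = model.coef_[0]  # warming per year
    forecast = model.predict(np.array([[2024], [2074]]))  # start and end of the 50-year window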
About 0.018°F of warming per year in US data.
That's roughly 1.8°F per century — small in any given year, but over two centuries it adds up to a visible shift in the baseline. The 50-year forecast extends that line to 2074, showing where we'd expect to be if current trends continue.
The uncertainty bands widen into the future (as they should), reflecting both the historical scatter around the trend and the limits of linear extrapolation. Real-world temperatures will bounce around this line; the line itself is our best guess at the underlying drift.
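One way such a widening band can be computed is the textbook prediction interval for simple linear regression. The sketch below is an illustration, not necessarily how the bands in this analysis were produced. It uses the normal quantile 1.96 as an approximation to the exact Student-t quantile (close for about 190 points), and the series is synthetic, generated only to have a similar slope. The function name `ols_prediction_band` is hypothetical.

```python
import numpy as np

def ols_prediction_band(years, temps, future_years, z=1.96):
    """Return (fitted, lower, upper) for an approximate 95% prediction band."""
    x = np.asarray(years, dtype=float)
    y = np.asarray(temps, dtype=float)
    x0 = np.asarray(future_years, dtype=float)
    n = x.size
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    s = np.sqrt(resid @ resid / (n - 2))  # residual standard error
    sxx = np.sum((x - x.mean()) ** 2)
    # The (x0 - mean)^2 term is what widens the band away from the data.
    se = s * np.sqrt(1.0 + 1.0 / n + (x0 - x.mean()) ** 2 / sxx)
    fitted = intercept + slope * x0
    return fitted, fitted - z * se, fitted + z * se

# Synthetic stand-in for the yearly series (not the real data).
rng = np.random.default_rng(0)
years = np.arange(1821, 2014)
temps = 50.0 + 0.018 * (years - 1821) + rng.normal(0.0, 0.8, years.size)

fitted, lower, upper = ols_prediction_band(years, temps, [2024, 2074])
widths = upper - lower
print(widths)  # the 2074 band is wider than the 2024 band
```

The leading `1.0` under the square root accounts for the scatter of a single future year. Dropping it gives the narrower interval for the trend line itself.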
Three things stand out.
Warming is steady.
No sharp breaks or reversals. The trend is consistent enough that a single line fits well.
Year-to-year scatter is real.
Some years warm, some cool. The trend emerges only when you zoom out to decades.
Uncertainty compounds.
The farther out we forecast, the wider the confidence band, and 2074 is about 60 years past the last observation in 2013.
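The second point, that scatter fades once you average over decades, can be checked with a short sketch. It assumes a synthetic series with roughly the slope reported above and independent year-to-year noise. Neither the numbers nor the noise model come from the real data.

```python
import numpy as np

# Synthetic stand-in: 190 years (19 full decades) around a linear trend.
rng = np.random.default_rng(1)
years = np.arange(1821, 2011)
temps = 50.0 + 0.018 * (years - 1821) + rng.normal(0.0, 0.8, years.size)

# Remove the fitted trend so only the scatter remains.
slope, intercept = np.polyfit(years, temps, 1)
resid = temps - (intercept + slope * years)

# Average the residuals within each block of 10 consecutive years.
decade_resid = resid.reshape(-1, 10).mean(axis=1)

yearly_scatter = resid.std()
decade_scatter = decade_resid.std()
print(f"yearly scatter: {yearly_scatter:.2f}, decadal scatter: {decade_scatter:.2f}")
```

With independent noise the decadal scatter shrinks by about the square root of 10. Real temperature series are autocorrelated, so the reduction there would likely be smaller.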
What this model doesn't capture.
Linear regression assumes the future will look like the past, meaning the warming rate stays constant. In reality, feedback loops, policy changes, and emissions scenarios could accelerate or dampen the trend. A more sophisticated model might use multiple variables (CO₂ levels, sunspot cycles, etc.), but it would require more data and introduce more assumptions.
This model is useful for one thing: as a baseline, a "do-nothing" projection. It's not a prediction so much as a reference point for comparison.
With more time, I'd segment by region instead of treating all US data as one blob. Coastal, inland, and high-elevation areas likely follow different trends. I'd also pull in CO₂ data as an exogenous variable, not to model causation, but to see how the relationship behaves.
The uncertainty bands would also benefit from a bootstrap approach instead of classical confidence intervals. Resampling would base the bands on the scatter actually observed in the data rather than on normality assumptions.
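A rough sketch of what a bootstrap band could look like, using a residual bootstrap: refit the line on histories rebuilt from resampled residuals, then read off percentiles of the simulated future value. This is one of several bootstrap variants and assumes the residuals are independent. A block bootstrap may suit autocorrelated temperature data better. The function name `bootstrap_forecast` and the synthetic series are illustrative assumptions.

```python
import numpy as np

def bootstrap_forecast(years, temps, target_year, n_boot=2000, seed=0):
    """Percentile interval for a future observation via residual resampling."""
    rng = np.random.default_rng(seed)
    x = np.asarray(years, dtype=float)
    y = np.asarray(temps, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    fitted = intercept + slope * x
    resid = y - fitted
    draws = np.empty(n_boot)
    for b in range(n_boot):
        # Rebuild a synthetic history: fitted line plus residuals
        # resampled with replacement.
        y_star = fitted + rng.choice(resid, size=resid.size, replace=True)
        s_b, i_b = np.polyfit(x, y_star, 1)
        # Add one more resampled residual to mimic a single future year's scatter.
        draws[b] = i_b + s_b * target_year + rng.choice(resid)
    return np.percentile(draws, [2.5, 97.5])

# Synthetic stand-in for the yearly series (not the real data).
rng = np.random.default_rng(42)
years = np.arange(1821, 2014)
temps = 50.0 + 0.018 * (years - 1821) + rng.normal(0.0, 0.8, years.size)

low, high = bootstrap_forecast(years, temps, 2074)
point = np.polyval(np.polyfit(years, temps, 1), 2074)
print(f"2074 point forecast {point:.2f}, bootstrap 95% band [{low:.2f}, {high:.2f}]")
```

Like the classical interval, this still extrapolates a straight line, so it reflects sampling noise and not changes in the warming rate itself.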