GraPO: Power Outage Prediction | GSU Research Project

Abstract

Predicting power outages in advance is critical for enhancing public safety, expediting emergency response, reducing economic losses, and minimizing damage to electrical infrastructure. During natural disasters and extreme weather events, power systems face elevated risks, making accurate and timely predictions essential. This paper introduces GraPO, a novel spatial-temporal graph neural network (GNN) approach designed to predict county-level power outage risk and severity during and after extreme weather events. We construct a spatial graph where each U.S. county is represented as a node, and edges encode the topology of the national electric transmission network based on publicly available datasets. GraPO integrates a rich set of node features derived from historical outage records (EAGLE-I), real-time and forecasted climate data (ERA5), severe weather alerts (NWS VTEC), and county-level socio-demographic attributes. By embedding spatial-temporal dynamics through weather event sequences and grid connectivity, GraPO models how outage risks propagate across the network. We evaluate the model using three years of historical data covering multiple storm, heatwave, and dark calm events. Results show that GraPO consistently outperforms traditional machine learning baselines, providing earlier and more accurate outage risk predictions. We also discuss practical implications, limitations related to data sparsity in rural regions, and future directions for improving predictive robustness and real-time deployment.

Introduction

Power outages significantly impact economic stability, critical infrastructure integrity, public safety, and the overall quality of life. Annually, these disruptions result in billions of dollars in economic losses, infrastructure damages, and increased risks to public welfare. Severe weather events such as hurricanes, floods, and thunderstorms are frequent causes of power outages, often overwhelming traditional forecasting methods that rely heavily on historical outage data and simplistic statistical relationships. The complexity and interconnectedness of modern power grids, coupled with the escalating frequency of extreme weather due to climate change, underscore the urgent need for advanced predictive capabilities to anticipate and mitigate power outages.

Traditional prediction methods, including statistical modeling and classical machine learning algorithms such as random forests and gradient boosting, have provided useful but incomplete insights. These methods typically neglect the spatial dependencies and temporal dynamics intrinsic to power grids, which limits both their predictive accuracy and how actionable their outputs are.

Graph neural networks (GNNs) offer promising solutions by explicitly modeling relationships and interactions across networked data structures. GNNs excel in capturing intricate patterns across interconnected nodes, making them highly suitable for spatially and temporally structured data. Spatial-temporal graph neural networks (ST-GNNs), in particular, have demonstrated significant efficacy in various applications including traffic prediction, infrastructure monitoring, and energy management.

To address the critical gaps in existing methodologies, this study proposes GraPO, a sophisticated GNN-based model specifically designed to predict the locations and severities of power outages approximately 12 hours in advance. Our approach integrates multiple, heterogeneous datasets including comprehensive climate data (ERA5), detailed US power grid topology, historical power outage occurrences (EAGLE-I), and real-time weather alerts from the National Weather Service (NWS VTEC archives). The resulting predictive model offers a strategic advantage to utility companies, emergency response organizations, and policymakers by providing early and accurate predictions, allowing for proactive and effective management of resources and responses.

Data Sources

EAGLE-I Outage Data

We utilize county-level outage counts from the EAGLE-I™ (Environment for Analysis of Geo-Located Energy Information) dataset maintained by Oak Ridge National Laboratory. This dataset provides comprehensive power outage information across the United States from 2014 to 2023, containing information on the number of customers without power every 15 minutes at the county level. Each model's input features comprise: (1) the maximum number of county outages over the 12-hour period preceding each weather alert issuance; (2) the categorical weather alert type; and (3) the list of counties impacted by the alert.
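Feature (1) can be sketched as follows. This is a minimal illustration, assuming the 15-minute snapshots for one county are available as in-memory `(timestamp, customers_out)` pairs; the function name, the data layout, and the sample values are hypothetical and not part of the EAGLE-I distribution.

```python
from datetime import datetime, timedelta

def max_outage_before_alert(records, alert_time, window_hours=12):
    """Peak customers-out value in the window preceding an alert.

    records: iterable of (timestamp, customers_out) pairs for one county
             (hypothetical layout for 15-minute snapshots).
    """
    start = alert_time - timedelta(hours=window_hours)
    # Keep snapshots in [alert_time - window, alert_time).
    in_window = [n for ts, n in records if start <= ts < alert_time]
    # A county with no snapshots in the window is treated as having no outages
    # (an assumption made for this sketch).
    return max(in_window, default=0)

# Made-up snapshots: one value every 15 minutes for 24 hours.
t0 = datetime(2022, 6, 1, 0, 0)
records = [(t0 + timedelta(minutes=15 * k), 10 * k) for k in range(96)]

alert = datetime(2022, 6, 1, 18, 0)
peak = max_outage_before_alert(records, alert)
print(peak)
```

The half-open window excludes the snapshot taken at the alert time itself, so the feature uses only information available before issuance.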

ERA5 Climate Reanalysis Data

We use the ERA5 Single Level Reanalysis dataset, hosted by the Google Cloud Public Datasets program via the arco-era5 archive. This dataset provides hourly global weather reanalysis fields produced by ECMWF on a native N320 reduced Gaussian grid (~31 km resolution).

Our analysis focuses on a single calendar year (2022), which includes 8,760 hourly timesteps. We partition the data chronologically, using the first 70% (January 1 through mid-September) for training, the next 10% for validation, and the remaining 20% for testing. This split preserves temporal coherence and reflects realistic forecasting conditions, because the model is never evaluated on data that precedes its training period.
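The chronological split can be sketched in a few lines. The helper name `chronological_split` is hypothetical, and rounding the boundaries to whole hours is an assumption of this sketch.

```python
from datetime import datetime, timedelta

def chronological_split(n_steps, train_frac=0.7, val_frac=0.1):
    """Split indices 0..n_steps-1 into contiguous train/val/test ranges."""
    train_end = round(n_steps * train_frac)
    val_end = train_end + round(n_steps * val_frac)
    return range(0, train_end), range(train_end, val_end), range(val_end, n_steps)

# 2022 is not a leap year: 365 days * 24 hours = 8,760 hourly timesteps.
train_idx, val_idx, test_idx = chronological_split(8760)

# Timestamp of the first hour that is NOT in the training set.
train_cutoff = datetime(2022, 1, 1) + timedelta(hours=len(train_idx))
print(len(train_idx), len(val_idx), len(test_idx), train_cutoff)
```

Because the ranges are contiguous and ordered, no shuffling occurs and every validation or test hour lies strictly after every training hour.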

The dataset includes a wide range of atmospheric and surface variables:

Surface meteorology: 2-meter temperature, dewpoint, surface pressure, mean sea level pressure
Wind components: at 10m and 100m height
Cloud cover: low, medium, high, and total cloud cover
Water and energy fluxes: total column water vapor, cloud liquid water, convective available potential energy (CAPE)
Soil and snow fields: soil temperature and moisture (layers 1-4), snow temperature
Sea state: sea surface temperature, sea ice fraction

County-Level Spatial Data

To produce county-level weather data, we use the US Census Bureau boundary files for each US county. We compute each county's centroid as the average of the coordinates of its boundary polygon's sides, weighted by side length. To convert the gridded ERA5 data to county-level values, we apply Delaunay triangulation (the dual of the Voronoi tessellation) to the ERA5 grid points, which yields a set of triangles whose vertices are grid points. For each county centroid, we find the triangle that contains it and interpolate a county value from the weather data at that triangle's three vertices.
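The final interpolation step can be illustrated for a single triangle. The text does not state the interpolation scheme, so this sketch assumes linear (barycentric) interpolation in planar longitude/latitude coordinates; the function names and the sample values are hypothetical.

```python
def barycentric_weights(p, a, b, c):
    """Weights (wa, wb, wc) with p = wa*a + wb*b + wc*c and wa + wb + wc = 1."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb

def interpolate_in_triangle(p, vertices, values):
    """Linearly interpolate vertex values at point p inside the triangle."""
    weights = barycentric_weights(p, *vertices)
    if min(weights) < -1e-12:
        raise ValueError("point lies outside the triangle")
    return sum(w * v for w, v in zip(weights, values))

# Three grid points (x, y) and a made-up temperature at each vertex.
grid_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
temperatures = [10.0, 20.0, 30.0]
centroid = (0.25, 0.25)

county_value = interpolate_in_triangle(centroid, grid_points, temperatures)
print(county_value)
```

For a full grid, SciPy's `scipy.interpolate.LinearNDInterpolator` performs the same barycentric interpolation over a Delaunay triangulation of all input points, which avoids locating the containing triangle by hand.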

Power Grid Topology

We leverage the US Power Transmission Line Dataset from the Department of Energy, which provides precise geographic start and end points for transmission lines across the United States. Each power line segment is geographically mapped to the counties it traverses, enabling us to construct a graph where edges represent physical electrical connections between counties.

Methodology

Our approach, GraPO, integrates spatial-temporal graph neural networks (GNNs) to predict power outage locations and severities approximately 12 hours in advance. The methodology consists of four key components: graph construction, node embedding initialization, model architecture design, and model training and evaluation.

Graph Construction

The first step is to construct a spatial graph whose nodes are US counties and whose edges are transmission lines. We map each segment in the US Power Transmission Line Dataset to the counties it traverses and add an edge between two county nodes wherever a transmission line directly connects them. Each edge is weighted by the voltage level of its associated power line, which captures the varying strength of connectivity within the grid. This graph encodes the interdependencies of the electrical infrastructure and serves as the spatial foundation for predictive modeling.
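A minimal sketch of the edge construction, under several assumptions that the text does not specify: each line has already been mapped to an ordered list of county codes, consecutive counties along a line are treated as directly connected, and when several lines join the same county pair the highest voltage is kept as the edge weight. The function name and the sample lines are hypothetical.

```python
def build_county_graph(lines):
    """Build undirected weighted edges between counties.

    lines: iterable of (voltage_kv, counties), where counties is the ordered
           list of county codes a transmission line traverses.
    Returns a dict mapping (county_a, county_b) with county_a < county_b
    to the edge weight in kV.
    """
    edges = {}
    for voltage_kv, counties in lines:
        for u, v in zip(counties, counties[1:]):
            if u == v:
                continue  # a segment that stays inside one county adds no edge
            key = (min(u, v), max(u, v))
            # Assumption: keep the highest voltage among parallel lines.
            edges[key] = max(edges.get(key, 0.0), voltage_kv)
    return edges

# Illustrative lines with FIPS-style county codes (made-up voltages).
lines = [
    (345.0, ["13121", "13089", "13135"]),
    (115.0, ["13089", "13121"]),
    (230.0, ["13135", "13135"]),
]
edges = build_county_graph(lines)
print(edges)
```

The resulting dictionary converts directly to an edge list plus a weight vector, which is the form most graph learning libraries expect.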

Node Embeddings Initialization

Each node in the graph corresponds to a US county and is initialized with rich, multi-source embeddings to effectively capture both static and dynamic county-level characteristics. We aggregate climate features from the ERA5 dataset, which includes critical weather parameters such as temperature, wind speed, and precipitation. This aggregation occurs by calculating each county's centroid and extracting climate information from the four nearest ERA5 data points, thus providing robust local climatic representations.

Furthermore, we enrich these embeddings by incorporating static demographic and geographic features from publicly available county datasets. Historical power outage information from the EAGLE-I dataset also forms a critical part of each node's embedding, providing the temporal context necessary for prediction tasks. Finally, we integrate event-specific weather advisories from the NWS Valid Time Extent Code (VTEC) archives, dynamically capturing real-time alert data relevant to outage occurrences. Collectively, these embeddings represent comprehensive county-specific features, essential for accurately modeling complex outage dynamics.

Model Architecture

To explore the predictive performance and generalization capabilities of various GNN configurations, we design and evaluate modeling approaches with varying architectures:

Encoder: We compare a Temporal Transformer encoder to a simple feedforward network used as a baseline.
Graph Neural Network (GNN) Layer: We evaluate both Graph Convolutional Networks (GCNs) and Graph Attention Networks (GATs). The GCN aggregates information uniformly from neighboring nodes, whereas the GAT introduces learned attention coefficients to dynamically weight node interactions.
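The difference between the two aggregation schemes can be illustrated with a small pure-Python sketch. It is a simplification rather than the GraPO layers: it uses scalar node features, omits the learned weight matrices and the degree normalization of a full GCN, and takes the attention scoring function as a hypothetical argument `score` instead of learning it.

```python
import math

def mean_aggregate(features, neighbors, i):
    """Uniform aggregation: average node i with its neighbors."""
    nbrs = neighbors[i] + [i]  # include a self-loop
    return sum(features[j] for j in nbrs) / len(nbrs)

def attention_aggregate(features, neighbors, i, score):
    """Attention aggregation: softmax-weighted average over the neighborhood."""
    nbrs = neighbors[i] + [i]
    raw = [score(features[i], features[j]) for j in nbrs]
    peak = max(raw)  # subtract the maximum for numerical stability
    exp = [math.exp(r - peak) for r in raw]
    total = sum(exp)
    return sum(e / total * features[j] for e, j in zip(exp, nbrs))

# Toy graph: three mutually connected counties with one scalar feature each.
features = {0: 1.0, 1: 3.0, 2: 5.0}
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

uniform = mean_aggregate(features, neighbors, 0)
flat_attention = attention_aggregate(features, neighbors, 0, lambda hi, hj: 0.0)
biased_attention = attention_aggregate(features, neighbors, 0, lambda hi, hj: hj)
print(uniform, flat_attention, biased_attention)
```

With a constant score the attention weights are equal and the result matches the uniform average; a score that favors larger neighbor values shifts the aggregate toward those neighbors.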

In addition to the architectural components, we incorporate a learnable county-specific bias term to capture static characteristics and persistent outage tendencies. The final model output is computed as:

$$ \hat{y}_{i,t+h} = \phi\left( \text{GNN} \left( \left\{\text{TemporalEncoder}\left(X_{t-w:t}^{(i)}\right) + c^{(i)} \mid X_{t-w:t}^{(i)} \in \mathbf{X}, c^{(i)} \in C\right\}, \text{E} \right) \right) $$

where:

$\hat{y}_{i,t+h}$ is the predicted target value for county $i$ at time $t+h$, where $h$ is the forecast horizon
$X_{t-w:t}^{(i)} \in \mathbf{X}$ denotes the input weather features for county $i$ over the historical window $[t-w, t]$
$\text{E}$ is the edge list
$\text{TemporalEncoder}$ is the temporal transformer layer
$\text{GNN}$ is a spatial layer (GCN or GAT)
$\phi$ is the feedforward decoder
$c^{(i)} \in C$ is a learnable county embedding vector for county $i$

Training Objective and Evaluation

Our primary training objective is to minimize the root mean squared error (RMSE) between predicted and actual customer outage counts per county, forecasting 12 hours in advance. Specifically, the RMSE loss function, defined as:

$$ \text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^N (y_i - \hat{y}_i)^2} $$

quantitatively measures model predictive accuracy, where $y_i$ and $\hat{y}_i$ denote the true and predicted outage counts, respectively, and $N$ is the number of predictions. To evaluate model performance, robustness, and generalization capacity, we apply a 5-fold cross-validation strategy.
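The RMSE definition translates directly into code. The sketch below uses made-up outage counts purely for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up true and predicted customer outage counts for four counties.
y_true = [0, 120, 4500, 30]
y_pred = [10, 100, 4000, 30]

error = rmse(y_true, y_pred)
print(error)
```

In this example the single 500-customer miss contributes almost all of the squared error, which shows how RMSE emphasizes large errors on severe outage events.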

Implementation Details

Models are implemented using the PyTorch and PyTorch Geometric frameworks, with GPU acceleration on an NVIDIA GeForce RTX 4090 for computationally intensive tasks. Hyperparameter optimization is conducted using Bayesian methods, tuning parameters such as learning rate, number of hidden units, number of layers, and number of attention heads, and selecting configurations based on validation performance. Training durations range from 2 to 8 hours, depending on the complexity of each model architecture.

Results and Discussion

Baseline Performance

To benchmark GraPO's performance against established approaches, we adopt the same state-level machine learning methods used by prior work. Specifically, we implement three widely used regression models: Random Forest (RF), k-Nearest Neighbors (kNN), and Extreme Gradient Boosting (XGBoost). Following the methodology from existing research, we sort events chronologically and allocate the first 80% for training and the remaining 20% for testing.

Below are the baseline results predicting the number of affected customers 12 hours in advance at the state level:

State	Data Size (# of events)	Random Forest R² (RMSE)	kNN R² (RMSE)	XGBoost R² (RMSE)
Texas	Large (15,272)	0.512 (6,692.8)	0.346 (7,750.9)	0.411 (7,356.0)
Michigan	Medium (2,162)	0.502 (9,257.6)	0.486 (9,396.8)	0.438 (9,831.8)
Hawaii	Small (658)	0.185 (1,040.9)	-0.000832 (1,154.1)	0.0485 (1,125.2)
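The R² values in the table can be read against the standard definition of the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$. The sketch below, using made-up numbers, shows why a value near zero or below zero (as for kNN on Hawaii) means the model does no better than always predicting the mean of the test targets.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r2_score(y_true, [1.0, 2.0, 3.0, 4.0])        # exact predictions
mean_only = r2_score(y_true, [2.5, 2.5, 2.5, 2.5])      # always predict the mean
worse = r2_score(y_true, [4.0, 3.0, 2.0, 1.0])          # worse than the mean
print(perfect, mean_only, worse)
```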

GraPO Model Performance

Our proposed approach, GraPO, demonstrates promising potential for accurately predicting power outages by effectively modeling spatial and temporal dynamics using graph neural networks. Unlike traditional regression models that rely primarily on historical outage patterns and weather alerts, our method explicitly incorporates the physical topology of the electrical grid and detailed climate data, significantly enhancing predictive capabilities on a more granular task.

While prior work has primarily focused on coarser, state-level prediction, our model is trained on the more challenging task of forecasting outages at the county scale across the entire United States. Despite the increased granularity and complexity, GraPO achieves a strong RMSE of 9,413.13 customers, demonstrating its ability to generalize across diverse areas of the US. These results highlight the promise of graph-based learning for critical infrastructure modeling and suggest that our derived features are informative enough to support accurate prediction.

Key Findings

Spatial Dependencies Matter: Incorporating power grid topology through graph structure significantly improves prediction accuracy compared to models that treat counties independently.
Multi-Modal Data Integration: Combining ERA5 climate data, EAGLE-I outage history, and NWS weather alerts provides a comprehensive feature set that captures diverse aspects of outage risk.
12-Hour Lead Time: GraPO successfully predicts outages 12 hours in advance, providing sufficient time for utility companies and emergency responders to prepare.
County-Level Granularity: Moving from state-level to county-level predictions enables more targeted and effective resource allocation.

GraPO: Predicting Power Outages During Extreme Weather Events Using Graph-Based Methods

Spatial graph representation of US power grid with county-level nodes and transmission line connectivity