Grid Small Foundation Model (GridSFM)

Model summary

Developer: Microsoft Corporation
Authorized representative: Microsoft Ireland Operations Limited, 70 Sir John Rogerson’s Quay, Dublin 2, D02 R296, Ireland
Description: GridSFM is a graph neural-network-based model trained on structured representations of power grid systems. The model leverages simulated data to learn relationships between topology, load distribution, and system behavior, enabling generalization across diverse grid configurations. Training involves large-scale synthetic scenario generation combined with supervised learning on simulation outputs.
Model architecture: Heterogeneous graph attention network with Hodge positional encoding, residual blocks, and dual heads for AC-OPF operating-point prediction and feasibility classification
Parameters: 1–500M
Inputs: Graph-structured numerical scenarios representing AC-OPF problem instances: heterogeneous node features (buses, generators, loads, voltage limits, dispatch capacities, cost coefficients, demand) and edge features (AC lines, transformers, thermal ratings, angle limits, impedance), together with the topology defining the graph.
Context length: N/A
Outputs: Graph-structured numerical predictions matching the input topology: per-bus voltage magnitude (V) and angle (θ), per-generator real and reactive power dispatch (Pg, Qg), per-edge active and reactive branch flows (Pij, Qij), and one per-scenario scalar feasibility logit.
Public data summary (or summaries): microsoft/GridSFM_US_power_grid (Hugging Face Datasets)
Training dates: May 1, 2026 to May 10, 2026
Release date:
Release date in the EU (if different): May 13, 2026
License: MIT license
Model dependencies: N/A
List and link to any additional related assets: http://github.com/microsoft/gridsfm
Acceptable use policy: N/A

1. Model overview

GridSFM is a graph neural-network-based model trained on structured representations of power grid systems. The model leverages simulated data to learn relationships between topology, load distribution, and system behavior, enabling generalization across diverse grid configurations. Training involves large-scale synthetic scenario generation combined with supervised learning on simulation outputs.

Unlike traditional physics-based solvers, GridSFM focuses on data-driven approximations and learning-based representations. This enables faster inference and supports research into scalable, AI-driven transmission grid modeling. However, the model is not intended to replace numerical solvers in production settings; it is for research purposes only.

1.1 Alignment approach

GridSFM is not a generative language model and does not generate open-ended content. As such, no standard LLM alignment techniques were applied. Instead, risk mitigation focuses on constrained outputs, clear documentation of intended use, and restricting use to research contexts involving simulated data.

2. Usage

2.1 Primary use cases

GridSFM is intended for research and experimentation in AI-driven power system modeling. It can be used for approximating power flow, studying system behavior under different operating conditions, and evaluating learning-based optimization strategies.

The model is particularly useful in scenarios where large-scale simulation is required, enabling faster evaluation of grid configurations and supporting research into resilient and efficient grid operation.

2.2 Out-of-scope use cases

GridSFM is not designed for real-time operational decision-making in power grids. It should not be used for safety-critical applications, infrastructure control, or deployment in production environments without extensive validation.

The model is not intended for use outside the domain of power system modeling and may produce unreliable outputs when applied to other domains or data formats.

2.3 Distribution channels

The model checkpoint can be downloaded from Hugging Face and Foundry.

2.4 Input formats

GridSFM expects structured input in JSON format representing a power grid instance. Each sample contains three top-level fields: grid, solution, and metadata.

The grid field defines the input state and includes:

  • nodes: entities such as buses, generators, loads, and shunts, represented as feature matrices where rows correspond to entities and columns to attributes

  • edges: connectivity information including transmission lines and transformers, with sender/receiver indices and associated physical features

  • context: global system information (e.g., baseMVA)

The input captures a full graph representation of a power system, where:

  • nodes encode local properties (e.g., voltage limits, generation bounds, demand)

  • edges encode physical connections and constraints (e.g., line impedance, thermal limits)

The expected format is consistent across all examples, with structured numerical tensors derived from the JSON representation. Inputs must follow this schema and match the feature ordering used during training.
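To make the schema above concrete, here is a minimal sketch of one JSON sample and a loader that enforces the "rows = entities, columns = attributes" convention. The top-level layout (grid / nodes / edges / context, plus solution and metadata) follows this section; the specific attribute names and column orderings are illustrative assumptions, not the released feature ordering.

```python
import json

# Hypothetical minimal instance; attribute names and column order are
# illustrative assumptions, not the released schema.
raw = json.dumps({
    "grid": {
        "nodes": {
            # rows = buses, columns = [v_min, v_max, p_demand, q_demand]
            "bus": [[0.94, 1.06, 0.0, 0.0],
                    [0.94, 1.06, 1.2, 0.4]],
            # rows = generators, columns = [p_min, p_max, cost_c1, cost_c2]
            "generator": [[0.0, 2.5, 20.0, 0.05]],
        },
        "edges": {
            "ac_line": {
                "senders": [0],
                "receivers": [1],
                # columns = [r, x, thermal_rating]
                "features": [[0.01, 0.1, 2.0]],
            },
        },
        "context": {"baseMVA": 100.0},
    },
    "solution": {},   # AC-OPF ground-truth labels would live here
    "metadata": {},
})

def load_instance(text):
    """Parse one JSON sample and run basic schema checks."""
    sample = json.loads(text)
    grid = sample["grid"]
    for etype, e in grid["edges"].items():
        # every edge type needs matching sender/receiver/feature rows
        assert len(e["senders"]) == len(e["receivers"]) == len(e["features"]), etype
    return grid["nodes"], grid["edges"], grid["context"]

nodes, edges, context = load_instance(raw)
print(len(nodes["bus"]), len(nodes["bus"][0]))  # 2 4
```

In a real pipeline these lists would be converted to dense tensors (e.g., PyTorch) in the exact feature order used during training; mismatched column ordering will silently produce wrong predictions.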

2.5 Technical requirements and integration guidance

GridSFM is a lightweight model (~15M parameters) and can be run on CPU for inference and small-scale experimentation. For larger datasets or more efficient processing, GPU acceleration is recommended but not required.

The model can be integrated into standard machine learning pipelines using common frameworks (e.g., PyTorch). Inputs must be preprocessed into structured graph representations derived from JSON grid data, and outputs require post-processing to map back to physical quantities such as voltages and power flows.

GridSFM is best suited for offline research workflows, simulation pipelines, and decision-support applications. It is not designed for real-time control or fully autonomous deployment.
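The post-processing step mentioned above can be sketched as follows. The exact output scaling of the released checkpoint is not documented here, so this assumes the common power-systems convention of per-unit power outputs converted to MW/MVAr via the instance's baseMVA, with a sigmoid applied to the raw feasibility logit; all key names are hypothetical.

```python
import math

def postprocess(raw, base_mva):
    """Map raw per-unit predictions to physical quantities.

    Assumes per-unit power outputs and a raw feasibility logit; both are
    assumed conventions, not the documented scaling of the release.
    """
    return {
        "Vm_pu": list(raw["v"]),                           # per-bus voltage magnitude (p.u.)
        "Va_deg": [math.degrees(t) for t in raw["theta"]], # per-bus angle, rad -> deg
        "Pg_MW": [p * base_mva for p in raw["pg"]],        # generator real dispatch
        "Qg_MVAr": [q * base_mva for q in raw["qg"]],      # generator reactive dispatch
        "feasible_prob": 1.0 / (1.0 + math.exp(-raw["feas_logit"])),
    }

out = postprocess(
    {"v": [1.02, 0.98], "theta": [0.0, -0.05], "pg": [1.3], "qg": [0.2],
     "feas_logit": 2.0},
    base_mva=100.0,
)
print(out["Pg_MW"][0], round(out["feasible_prob"], 3))  # 130.0 0.881
```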

2.6 Responsible AI considerations

GridSFM is a research model trained on simulated power system data and is intended for offline analysis and experimentation. As a result, its outputs may not fully capture real-world grid complexity, rare failure modes, or operational constraints. Performance may degrade when applied to out-of-distribution grid configurations or real-world systems.

A key risk is overreliance on model outputs. The model provides approximations based on learned patterns rather than exact physical guarantees, and incorrect predictions could lead to misleading conclusions if used without validation. In addition, biases in the simulated data (e.g., overrepresentation of certain grid topologies or operating conditions) may limit generalizability across regions or system types.

To support responsible use, users should restrict usage to research and evaluation scenarios, validate outputs against trusted simulation tools or domain expertise, and clearly communicate the limitations and uncertainty of model predictions.

3. Quality and performance evaluation

GridSFM is evaluated using AC Optimal Power Flow (AC-OPF) ground-truth labels generated with IPOPT in PowerModels.jl. We report (i) dispatch-cost accuracy against AC-OPF, (ii) feasibility classification accuracy (feasible vs. infeasible), and (iii) warm-start effectiveness when handing off GridSFM outputs to an exact numerical solver (IPOPT), measured as solve-time speedup relative to cold-start AC-OPF.
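Two of these metrics can be sketched directly. This assumes the usual conventions: per-scenario cost gap as |model − AC-OPF| / AC-OPF, and warm-start speedup as the geometric mean of per-grid cold/warm solve-time ratios; the released evaluation pipeline may differ in detail.

```python
import math

def cost_gap_pct(model_cost, acopf_cost):
    """Per-scenario dispatch-cost gap in percent, relative to AC-OPF."""
    return [100.0 * abs(m - a) / a for m, a in zip(model_cost, acopf_cost)]

def warmstart_speedup_geomean(cold_times, warm_times):
    """Geometric mean of per-grid speedups (cold solve time / warm solve time)."""
    ratios = [c / w for c, w in zip(cold_times, warm_times)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

print(cost_gap_pct([103.0, 101.0], [100.0, 100.0]))        # [3.0, 1.0]
print(warmstart_speedup_geomean([2.0, 8.0], [1.0, 4.0]))   # 2.0
```

The geometric mean is the appropriate aggregate for speedup ratios because it is symmetric under inversion: a 2× win and a 2× loss average to 1×, not 1.25×.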

Internal benchmarks (54-grid Open Benchmark). We evaluate on an internal 54-grid benchmark that combines 24 pglib-OPF transmission cases (500–4,661 buses) and 30 multi-state planning-region grids (“msr_”). Each grid contributes 10 test scenarios (540 scenarios total).

  • Standalone dispatch cost vs. AC-OPF ground truth: the per-scenario gap has a median of 2.4% and a mean of 3.9%, with a <5% gap on 80% of scenarios. A small set of outliers extends to ~25%, concentrated on a few grids; in these cases the AC-OPF reference itself required additional constraint relaxation to converge.

  • Feasibility classification (per-grid binary accuracy across the 54-grid mix): real-feasible scenarios are correctly passed through (mean 92.9%), real-infeasible scenarios are correctly flagged (mean 94.3%), and synthetic stress modes (voltage-squeeze, thermal-bottleneck, angle-tighten, DC-thermal) achieve a mean of 92.1%.

  • Warm-start handoff to a numerical solver (IPOPT solve_time, single-core CPU pinning): GridSFM-seeded warm-start is 1.45× faster than cold-start AC-OPF (geometric mean) and is faster on 52/54 grids. For comparison, a DC warm-start is 1.02× relative to cold-start (essentially tied; 31/54 wins). An oracle warm-start using ground-truth initializations provides a ceiling of 2.84×.

Case study (Texas2k summer-peak). In a focused analysis of a single grid, GridSFM achieves an ROC AUC of 0.985 and feasibility binary accuracy of 94.4% at the natural operating threshold. Per-mode detection is 99–100% on perturbation modes that drive a constraint cleanly past its limit.

Public benchmark integration (OPFData). We additionally report results on the OPFData benchmark (Google DeepMind, arXiv:2406.07234) on the test split of three OPFData pglib cases, evaluated under both standard load perturbations (FullTop) and topological perturbations (N-1). Test cost MAPE and feasibility F1 are:

  • pglib_opf_case500_goc: FullTop MAPE 1.14%, N-1 MAPE 1.44%; Feas F1 1.00 (FullTop) / 0.86 (N-1)

  • pglib_opf_case2000_goc: FullTop MAPE 0.51%, N-1 MAPE 0.64%; Feas F1 1.00 (FullTop) / 0.94 (N-1)

  • pglib_opf_case4661_sdet: FullTop MAPE 0.93%, N-1 MAPE 0.99%; Feas F1 1.00 (FullTop) / 0.96 (N-1)
Across these cases, N-1 cost MAPE remains within ~0.3 percentage points of the corresponding FullTop value, providing evidence of robustness to unseen line/generator outage topologies without per-topology retraining.

Learning-based third-party comparisons. Direct head-to-head comparisons against other learning-based AC-OPF surrogates are not meaningful here: prior approaches (e.g., DeepOPF-style methods and follow-ups) train a separate model per grid topology, whereas GridSFM is a single model trained to work across grid topologies.

Non-ML baselines (practical alternatives today). We compare against DC-OPF as a conventional approximation baseline. On the internal 54-grid mix, DC-OPF is in a similar accuracy class on dispatch-cost gap (DC mean 2.8% vs. GridSFM mean 3.9%), but DC-OPF does not produce a full AC operating point (voltages and reactive power). GridSFM does produce a full AC operating point, which enables solver warm-start workflows: on the 54-grid mix, DC warm-start is essentially tied with cold-start AC-OPF (1.02×), while GridSFM warm-start improves solver time by 1.45× relative to cold-start.
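Why a warm start helps can be illustrated with a toy iterative solve. Newton's method on a scalar nonlinear equation converges in fewer iterations from a point near the solution than from a distant one; this stands in for the real handoff, where a predicted AC operating point initializes IPOPT (the actual interior-point behavior is more complex than this sketch).

```python
# Toy stand-in for solver warm-starting: count Newton iterations to reach a
# fixed residual tolerance from a cold vs. a warm initial point.
def newton(f, df, x0, tol=1e-10, max_iter=100):
    x, iters = x0, 0
    while abs(f(x)) > tol and iters < max_iter:
        x -= f(x) / df(x)
        iters += 1
    return x, iters

f = lambda x: x ** 3 - 2.0      # root at 2**(1/3) ~ 1.2599
df = lambda x: 3.0 * x ** 2

_, cold = newton(f, df, x0=10.0)   # cold start, far from the solution
_, warm = newton(f, df, x0=1.26)   # warm start, near the solution
print(cold > warm)  # True
```

The same logic explains the reported gap between DC warm-start (a crude initial point, 1.02×) and GridSFM warm-start (a near-feasible AC operating point, 1.45×).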

Key differentiators. GridSFM offers (i) a single model spanning multiple grid topologies and sizes, (ii) trained robustness to N-1 topology perturbations, (iii) a joint per-scenario feasibility score in addition to dispatch prediction, and (iv) AC operating-point prediction that supports warm-start handoff to exact solvers.

3.1 Safety evaluation and red-teaming

GridSFM is a numerical optimization surrogate (predicts voltages, generator dispatch, branch flows, and a feasibility score) and does not generate natural language, images, code, or any other user-facing content. The standard generative-AI safety categories (disallowed sexual/violent/hateful/self-harm content, copyright/IP infringement, jailbreaks) are not applicable because there is no text or media generation surface and no instruction-following layer to jailbreak. The relevant safety surface for a power-grid optimization surrogate is operational: silently incorrect predictions that would cause harm if blindly trusted in a real-world dispatch or planning workflow.

4. Data overview

4.1 Training, testing, and validation datasets

4.1.1 Size of dataset and characteristics

GridSFM-Open is trained on ~540,000 AC-OPF scenarios spanning 54 base grid topologies: 24 pglib-OPF transmission cases (500–4,661 buses) and 30 multi-state region grids (“msr_”) covering US planning regions. Each scenario carries an AC-OPF ground-truth solution (bus voltages, angles, generator dispatch, branch flows, and a feasibility label) generated by IPOPT in PowerModels.jl with a tolerance of 1e-8 and full nonlinear AC physics. Beyond per-load demand variations on the canonical operating point, training scenarios include multi-element generator and line outages, line-rating derates, voltage-bound tightening, and shuffled generator-cost coefficients across cases, designed to force the model to generalize across topology, dispatch regime, and cost structure rather than memorize a single operating point per grid. A per-graph synthetic infeasibility wrapper generates additional adversarially stressed scenarios in nine labeled modes (heterogeneous load, gen outage, line derate, combined, voltage squeeze, thermal bottleneck, angle tightening, DC-thermal congestion, capacity-aware load spike) so the joint feasibility head can be trained on the boundary between feasible and infeasible operating points.

Validation and testing. Each of the 54 base grids is partitioned into train / validation / test chunks at the scenario level (PyTorch Geometric chunked_processed_20 layout, ~80% / 10% / 10%) before training begins. The validation split is used for early stopping (the released checkpoint is the epoch with lowest validation cost-MAPE) and for hyperparameter selection; the test split is held out from all training and tuning decisions and is the basis for the published per-grid and aggregate accuracy numbers (540 held-out test graphs across the 54 cases). All quantitative results in this release (median 2.4% cost gap, 92.9% / 94.3% / 92.1% feasibility accuracy across classes, 1.45× warm-start speedup, etc.) are computed on this held-out test set with the same reproducibility seed, single-core CPU pinning where wall-clock matters, and the full eval pipeline checked into the open repository.
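A scenario-level 80/10/10 split with a fixed seed can be sketched as below. This is a simplified illustration of the described partitioning; the released PyTorch Geometric "chunked_processed_20" layout is more involved.

```python
import random

def split_scenarios(n_scenarios, seed=0, frac=(0.8, 0.1, 0.1)):
    """Shuffle scenario indices deterministically and cut into train/val/test."""
    idx = list(range(n_scenarios))
    random.Random(seed).shuffle(idx)   # fixed seed -> reproducible split
    n_train = int(frac[0] * n_scenarios)
    n_val = int(frac[1] * n_scenarios)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_scenarios(10_000, seed=42)
print(len(train), len(val), len(test))  # 8000 1000 1000
```

Splitting at the scenario level (rather than the grid level) means every grid topology appears in all three splits; the held-out quantity is the operating condition, not the topology, which is what the published per-grid test numbers measure.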

4.1.2.A Text training data size: Not applicable

4.1.2.B Text training data content: Not applicable

4.1.2.C Image training data size: Not applicable

4.1.2.D Image training data content: Not applicable

4.1.3.A Audio training data size: Not applicable

4.1.3.B Audio training data content: Not applicable

4.1.4.A Video training data size: Not applicable

4.1.4.B Video training data content: Not applicable

4.1.5.A Other training data size: ~540,000 graph-structured AC-OPF scenarios spanning 54 base grid topologies (see Section 4.1.1).

4.1.5.B Other training data content: GridSFM's training data is a single modality: graph-structured numerical scenarios representing power-grid operating conditions. Each scenario carries node features (bus voltage limits and type, generator capacity and cost coefficients, load demand), edge features (line thermal ratings, angle limits, transformer parameters), and AC-OPF target labels (bus voltages and angles, generator dispatch, branch flows, feasibility flag), all numerical, all on the same graph.

4.1.6 Latest date of data (acquisition/collection for model training): January 15, 2026

4.1.7 Is data collection ongoing to update the model with new data collection after deployment? No

4.1.8 Date the training dataset was first used to train the model: March 2026

4.1.9 Rationale or purpose of data selection: The training data combines three publicly available sources, each chosen to cover a distinct slice of the AC-OPF generalization surface:

  • pglib-OPF transmission test cases (500–4,661 buses): the canonical open benchmark used throughout the power-systems research literature, selected for reproducibility and head-to-head comparability with published methods.

  • OPFData (Google DeepMind, arXiv:2406.07234): selected as the most appropriate publicly available benchmark for evaluating learning-based AC-OPF under topological perturbations (single-element N-1 outages), a regime not well represented elsewhere.

  • OpenStreetMap (OSM)-derived US transmission grids covering multi-state planning regions: selected to extend coverage into realistic state-area topologies relevant for capacity-planning and resilience workflows that the prior two sources do not span.

Together these sources allow a single model to train and evaluate across the spectrum of grid sizes, structural classes, and stress conditions encountered in research, planning, and contingency-screening use cases.

4.2 List of data sources

4.2.1 Publicly available datasets

4.2.1.A Have you used publicly available datasets to train the model? Yes

4.2.2 Private non-publicly available datasets obtained from third parties

4.2.2.1 Datasets commercially licensed by rightsholders or their representatives

4.2.2.1A Have you concluded transactional commercial licensing agreement(s) with rightsholder(s) or with their representatives? No

4.2.2.2.A Have you obtained private datasets from third parties that are not licensed as described in Section 4.2.2.1, such as data obtained from providers of private databases, or data intermediaries? No

4.3 Personal Data

4.3.1 Was personal data used to train the model? No. Microsoft follows applicable laws and best practices pertaining to personal data.

4.4 Synthetic data

4.4.1 Was any synthetic AI-generated data used to train the model? No

4.5 Data processing aspects

4.5.1 Respect of reservation of rights from text and data mining exception or limitation

4.5.1.A Does this dataset include any data protected by copyright, trademark, or patent? Microsoft follows applicable laws and best practices for processing data protected by copyright, trademark, or patent.

4.5.2 Other information

4.5.2.A Does the dataset include information about consumer groups without revealing individual consumer identities? No. Microsoft follows applicable laws and best practices for protecting consumer identities.

4.5.2.B Was the dataset cleaned or modified before model training? No

5. Contact

Requests for additional information can be directed to MSFTAIActRequest@microsoft.com.

Authorized representative: Microsoft Ireland Operations Limited 70 Sir John Rogerson’s Quay, Dublin 2, D02 R296, Ireland
