β˜€οΈ Suncast β€” Hourly Solar PV Generation Forecasting Model (China Region)

A machine learning model that predicts hourly solar PV power generation (kWh) for any location across mainland China, given latitude, longitude, and a date range.


📌 Model Overview

| Item | Detail |
| --- | --- |
| Task | Tabular Regression (Solar Irradiance → PV Power) |
| Algorithm | Random Forest Regressor (via PyCaret AutoML) |
| Target Region | Mainland China (UTC+8) |
| Temporal Resolution | 1-hour intervals |
| Output Unit | kWh (1 kW standard PV plant) |
| Training Period | 2024 full year |
| Training Samples | 4,861,296 |

📊 Performance

| Metric | Value |
| --- | --- |
| MAE | 76.19 W/m² |
| RMSE | 126.96 W/m² |
| R² | 0.748 |
| MAPE | 1.49% |

Notable observations:

  • ✅ High accuracy during summer months (abundant solar irradiance)
  • ⚠️ Increased error in winter (low irradiance, high meteorological variability)
  • The model's seasonal features (month_local, day_of_year, season) should make it straightforward to extend training to additional years
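
For readers who want to reproduce the metrics in the table on their own hold-out data, the following is a minimal sketch of the four definitions. The function name `regression_metrics` is hypothetical, and skipping zero-valued targets in MAPE (night-time irradiance) is an assumption; the card does not say how those samples were handled.

```python
import math


def regression_metrics(y_true, y_pred, eps=1e-9):
    """Return MAE, RMSE, R2 and MAPE (%) for two equal-length lists."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # Assumption: MAPE is undefined where the true value is 0 (e.g. at night),
    # so those samples are skipped rather than dividing by zero.
    nonzero = [(t, e) for t, e in zip(y_true, errors) if abs(t) > eps]
    if nonzero:
        mape = 100.0 * sum(abs(e / t) for t, e in nonzero) / len(nonzero)
    else:
        mape = float("nan")

    return {"MAE": mae, "RMSE": rmse, "R2": r2, "MAPE": mape}


# Illustrative values in W/m², not taken from the model's test set.
metrics = regression_metrics([0.0, 100.0, 200.0, 400.0], [0.0, 110.0, 190.0, 420.0])
print(metrics)
```

How near-zero targets are treated has a large effect on MAPE, so it is worth checking before comparing against the reported value.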

πŸ—‚οΈ Data Sources

Input β€” GFS (Global Forecast System, NOAA)

  • Spatial resolution: 1Β° Γ— 1Β°
  • Temporal resolution: 1 hour
  • Coverage: Lat 19°–53Β° (2Β° step), Lon 74°–134Β° (2Β° step) β†’ 558 grid points

| Variable | Unit |
| --- | --- |
| Surface Pressure | Pa |
| Surface Temperature | K |
| Relative Humidity (2m) | % |
| U-Component of Wind (10m) | m/s |
| V-Component of Wind (10m) | m/s |
| Sunshine Duration | s |
| Low / Mid / High Cloud Cover | % |
| Downward Short-Wave Radiation Flux | W/m² |

GFS DSWRF is a model-simulated value computed via the RRTMG radiative transfer scheme, not a direct satellite measurement.
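
The 558-point figure follows directly from the stated coverage (18 latitudes × 31 longitudes). A small sketch that enumerates the grid, with `build_grid` as a hypothetical helper name:

```python
def build_grid(lat_min=19, lat_max=53, lon_min=74, lon_max=134, step=2):
    """Enumerate (lat, lon) pairs on the 2-degree grid described above."""
    lats = list(range(lat_min, lat_max + 1, step))  # 19, 21, ..., 53
    lons = list(range(lon_min, lon_max + 1, step))  # 74, 76, ..., 134
    return [(lat, lon) for lat in lats for lon in lons]


grid = build_grid()
print(len(grid))  # 558
```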

Target: NASA POWER / CERES SYN1deg

  • Source: CERES SYN1deg (Ed4.x), cross-calibrated with Terra/Aqua CERES, MODIS, and GEO satellites
  • Spatial resolution: 1° × 1° (downsampled to 2° × 2°)
  • Temporal resolution: 1 hour (linearly interpolated from 3-hour data)
  • Time zone: UTC+8 fixed (unified across all of China)
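
As an illustration of the 3-hour to 1-hour step, here is a minimal sketch of linear interpolation between consecutive 3-hourly samples. It is not the author's preprocessing code; `interpolate_3h_to_1h` is a hypothetical name, and the input is assumed to be a gap-free list of values spaced exactly 3 hours apart.

```python
def interpolate_3h_to_1h(values_3h):
    """Linearly interpolate 3-hourly samples onto an hourly series.

    Assumes the samples are evenly spaced (3 h apart) with no missing values.
    """
    hourly = []
    for left, right in zip(values_3h, values_3h[1:]):
        # Emit the left sample plus the two interior hourly points.
        for k in range(3):
            hourly.append(left + (right - left) * k / 3)
    if values_3h:
        hourly.append(values_3h[-1])  # close the series with the last sample
    return hourly


# Illustrative irradiance values in W/m².
print(interpolate_3h_to_1h([0.0, 300.0, 600.0]))
```

Linear interpolation smooths over short cloud events within each 3-hour window, which is one reason hourly targets built this way are an approximation.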

🧠 Model Training Details

Feature Engineering

  • Spatiotemporal alignment and standardization of GFS input variables
  • Added temporal features: hour_local, month_local, day_of_year, season
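
The temporal features can be derived from a UTC timestamp roughly as follows. This is a sketch, not the author's code: the helper name `temporal_features` is hypothetical, and the season mapping (meteorological seasons, encoded as strings) is an assumption because the card does not define how `season` is encoded.

```python
from datetime import datetime, timedelta, timezone

CHINA_TZ = timezone(timedelta(hours=8))  # fixed UTC+8, no daylight saving

# Assumption: meteorological seasons; the actual encoding is not documented.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def temporal_features(ts_utc):
    """Derive local-time features from a timezone-aware UTC datetime."""
    if ts_utc.tzinfo is None:
        raise ValueError("ts_utc must be timezone-aware")
    local = ts_utc.astimezone(CHINA_TZ)
    return {
        "hour_local": local.hour,
        "month_local": local.month,
        "day_of_year": local.timetuple().tm_yday,
        "season": SEASON_BY_MONTH[local.month],
    }


# 04:00 UTC on 8 July 2024 is 12:00 local time, day 190 of the year.
print(temporal_features(datetime(2024, 7, 8, 4, 0, tzinfo=timezone.utc)))
```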

Candidate Models Compared

  • Extra Trees Regressor
  • Random Forest Regressor ✅ (selected)
  • LightGBM
  • Gradient Boosting Regressor

Random Forest was selected for its strong resistance to overfitting and balanced performance across all evaluation metrics.

Training Configuration

| Setting | Value |
| --- | --- |
| Train / Test Split | 80% / 20% |
| Cross-Validation | k-fold (k=10) |
| Hyperparameter Tuning | Grid Search |
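
The configuration above can be approximated in plain scikit-learn, which PyCaret wraps. The sketch below is a stand-in rather than the actual training script: the data is synthetic, and the parameter grid is an assumption because the card does not list the values that were searched.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data; the real features are the GFS variables listed above.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 4))
y = 800 * X[:, 0] * (1 - 0.5 * X[:, 1]) + rng.normal(0, 20, size=300)

# 80% / 20% train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Grid search with 10-fold cross-validation on the training portion.
# Assumption: this small grid is illustrative only.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [20, 50], "max_depth": [4, None]},
    cv=10,
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)

# R² of the refitted best model on the held-out 20%.
test_r2 = search.best_estimator_.score(X_test, y_test)
print(search.best_params_, round(test_r2, 3))
```

In PyCaret itself, the closer equivalent would be `setup`, `compare_models`, and `tune_model` from `pycaret.regression`.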

⚡ PV Power Conversion

Predicted solar irradiance (W/m²) is converted to power generation (kWh) using pvlib.

| Parameter | Value |
| --- | --- |
| Panel Tilt | 25° |
| Panel Azimuth | 180° (south-facing) |
| Temperature Coefficient | −0.004 /°C |
| Capacity | 1 kW (standard) |

Power generation is set to 0 kWh before 06:00 and after 19:00 (local time).
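
To make the conversion concrete, here is a simplified pure-Python sketch of a PVWatts-style DC formula with the parameters above. It is not the pvlib pipeline the model uses: it treats the predicted irradiance as the effective plane-of-array irradiance (so the 25° tilt and 180° azimuth transposition is ignored), it takes cell temperature as a given input, and the function name `pv_energy_kwh` is hypothetical. Treating the 19:00 hour itself as daytime is also an assumption.

```python
def pv_energy_kwh(irradiance_w_m2, cell_temp_c, hour_local,
                  capacity_kw=1.0, temp_coeff=-0.004,
                  day_start_hour=6, day_end_hour=19):
    """Estimate energy (kWh) produced in one hour by a PV plant.

    Simplified PVWatts-style model:
        P = capacity * (G / 1000) * (1 + temp_coeff * (T_cell - 25))
    """
    # Night mask: zero output before 06:00 and after 19:00 local time.
    # Assumption: the hour stamped 19 still counts as daytime.
    if hour_local < day_start_hour or hour_local > day_end_hour:
        return 0.0

    power_kw = (
        capacity_kw
        * (irradiance_w_m2 / 1000.0)                 # relative to 1000 W/m² reference
        * (1.0 + temp_coeff * (cell_temp_c - 25.0))  # temperature derating
    )
    # Constant power over a 1-hour interval, so kW equals kWh numerically.
    return max(power_kw, 0.0)


print(pv_energy_kwh(650.0, 45.0, 12))  # midday sample
print(pv_energy_kwh(650.0, 45.0, 3))   # masked night-time hour
```

With the full pvlib workflow, solar position and transposition models would additionally account for panel orientation, which this sketch leaves out.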


🚀 How to Use

1. Install dependencies

pip install huggingface_hub "pycaret[full]"

2. Download and load the model from Hugging Face Hub

from huggingface_hub import hf_hub_download
from pycaret.regression import load_model, predict_model
import pandas as pd

# Download model from Hugging Face Hub
model_path = hf_hub_download(
    repo_id="ryukkt62/Suncast",
    filename="Suncast_v1.pkl"
)

# Load PyCaret pipeline (load_model appends ".pkl" itself, so strip the extension)
model = load_model(model_path.removesuffix(".pkl"))

3. Prepare input features and predict

# Prepare input features (one row per location and hour)
input_data = pd.DataFrame([{
    "sp": 101325,       # Surface Pressure [Pa]
    "t": 300.15,        # Surface Temperature [K]
    "r2": 60.0,         # Relative Humidity [%]
    "u10": 2.0,         # U-Wind [m/s]
    "v10": -1.5,        # V-Wind [m/s]
    "SUNSD": 3200,      # Sunshine Duration [s]
    "lcc": 10.0,        # Low Cloud Cover [%]
    "mcc": 5.0,         # Mid Cloud Cover [%]
    "hcc": 20.0,        # High Cloud Cover [%]
    "sdswrf": 650.0,    # DSWRF [W/m²]
    "hour_local": 12,
    "month_local": 7,
    "day_of_year": 190
}])

# Predict irradiance [W/m²]; converting to PV power [kWh] is a separate pvlib step
prediction = predict_model(model, data=input_data)
print(prediction["prediction_label"])

Note: The model file is cached locally after the first download (~/.cache/huggingface/hub/), so subsequent calls will not re-download.


📁 Repository Files

| File | Description |
| --- | --- |
| Suncast_v1.pkl | Trained PyCaret Random Forest pipeline |
| config.json | Model metadata |

⚠️ Limitations

  • Training data is limited to 2024 only (originally planned for 2020–2024; reduced due to GFS server instability and storage constraints)
  • Grid resolution is 2° × 2°; predictions use the nearest grid point to the input coordinates
  • Not applicable outside mainland China grid coverage
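
The nearest-grid-point behaviour and the coverage limit can be sketched as below. The helper name `snap_to_grid` is hypothetical, and rejecting any coordinate outside the 19°–53° N, 74°–134° E box is an assumption about how out-of-coverage inputs should be handled.

```python
def snap_to_grid(lat, lon, lat_min=19, lat_max=53,
                 lon_min=74, lon_max=134, step=2):
    """Return the nearest (lat, lon) node on the 2-degree training grid."""
    # Assumption: inputs outside the grid's bounding box are rejected.
    if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
        raise ValueError(f"({lat}, {lon}) is outside the model's grid coverage")

    def nearest(value, start):
        # Index of the closest node, then convert back to degrees.
        return start + round((value - start) / step) * step

    return nearest(lat, lat_min), nearest(lon, lon_min)


print(snap_to_grid(39.9, 116.4))  # Beijing
print(snap_to_grid(31.2, 121.5))  # Shanghai
```

A location can be up to about 1° away from its grid node in each direction, so local terrain or coastal effects are not resolved.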


📄 License

This model is released under the Apache 2.0 License.
