| Title: | Electricity Load Curves Forecasting at Individual Level |
|---|---|
| Description: | Implements forecasting methods for individual electricity load curves, including Kernel Wavelet Functional (KWF), clustered KWF, Generalized Additive Models (GAM), Multivariate Adaptive Regression Splines (MARS), and Long Short-Term Memory (LSTM) models. Provides normalized dataset adapters for iFlex, StoreNet, Low Carbon London, and REFIT; download and read support for IDEAL and GX; explicit Python backend selection for TensorFlow-based LSTM fits; helpers for daily segmentation and rolling-origin benchmarking; and compact shipped example panels and benchmark-result datasets. |
| Authors: | Frederic Bertrand [cre, aut] |
| Maintainer: | Frederic Bertrand <[email protected]> |
| License: | GPL-3 |
| Version: | 0.4.1 |
| Built: | 2026-05-22 18:56:56 UTC |
| Source: | https://github.com/fbertran/elcf4r |
elcf4R provides methods and supporting workflows for day-ahead forecasting
of individual electricity load curves. The current package surface includes
Kernel Wavelet Functional models, clustered KWF, GAM, MARS and LSTM
estimators, an explicit helper to configure the Python backend used by the
LSTM path, dataset adapters for iFlex, StoreNet, Low Carbon London and
REFIT, scaffolded download/read support for IDEAL and GX, helpers to build
daily segments, and rolling-origin benchmarking utilities.
Maintainer: Frederic Bertrand [email protected] (ORCID)
Authors:
Fatima Fahs [email protected]
Myriam Maumy-Bertrand [email protected] (ORCID)
Useful links:
Report bugs at https://github.com/fbertran/elcf4R/issues
Assign segments to a fitted KWF clustering model
elcf4r_assign_kwf_clusters(object, segments)
object |
An |
segments |
Matrix or data frame of daily segments. |
A character vector of cluster labels.
Evaluate the package forecasting methods on a normalized panel using a
deterministic rolling-origin design. The runner supports the current
temperature-aware gam, mars, kwf, kwf_clustered and lstm wrappers
and returns both aggregate scores and, optionally, saved point forecasts.
elcf4r_benchmark(
  panel,
  benchmark_index = NULL,
  methods = NULL,
  entity_ids = NULL,
  cohort_size = NULL,
  train_days = 28L,
  test_days = 5L,
  benchmark_name = NULL,
  dataset = NULL,
  use_temperature = TRUE,
  method_args = NULL,
  include_predictions = TRUE,
  thermosensitivity_panel = NULL,
  benchmark_index_carry_cols = NULL,
  seed = NULL,
  tz = "UTC"
)
panel |
Normalized panel data, typically returned by one of the
|
benchmark_index |
Optional day-level index. If |
methods |
Character vector of method names to evaluate. Supported values
are |
entity_ids |
Optional character vector of entity IDs to benchmark. |
cohort_size |
Optional maximum number of eligible entities to keep after
sorting by |
train_days |
Number of days in each training window. |
test_days |
Number of one-day rolling test origins per entity. |
benchmark_name |
Optional benchmark identifier. If |
dataset |
Optional dataset label overriding |
use_temperature |
Logical; if |
method_args |
Optional named list of per-method argument overrides. |
include_predictions |
Logical; if |
thermosensitivity_panel |
Optional normalized panel used for
thermosensitivity classification. Defaults to |
benchmark_index_carry_cols |
Optional |
seed |
Optional integer seed forwarded to methods that support
user-supplied seeding, such as LSTM, unless overridden in |
tz |
Time zone used to derive dates and within-day positions. |
An object of class elcf4r_benchmark with elements results,
predictions, cohort_index, spec and backend.
id1 <- subset(
  elcf4r_iflex_example,
  entity_id == unique(elcf4r_iflex_example$entity_id)[1]
)
keep_dates <- sort(unique(id1$date))[1:6]
panel_small <- subset(id1, date %in% keep_dates)
bench <- elcf4r_benchmark(
  panel = panel_small,
  methods = "gam",
  cohort_size = 1,
  train_days = 4,
  test_days = 1,
  include_predictions = TRUE
)
head(bench$results)
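The rolling-origin design can be pictured with a small base R sketch. It assumes that the last `test_days` dates of an entity are the forecast targets and that each target is trained on the `train_days` dates immediately before it. The helper name `rolling_windows` is hypothetical, and this is an illustration of the idea rather than the package's internal code.

```r
# Hypothetical helper: enumerate rolling-origin train/test windows for one entity.
# Assumption: each of the last `test_days` dates is a target day, trained on the
# `train_days` dates immediately preceding it.
rolling_windows <- function(dates, train_days = 28L, test_days = 5L) {
  dates <- sort(unique(as.Date(dates)))
  n <- length(dates)
  stopifnot(n >= train_days + test_days)
  targets <- seq.int(n - test_days + 1L, n)
  lapply(targets, function(i) {
    list(
      train_start = dates[i - train_days],
      train_end   = dates[i - 1L],
      test_date   = dates[i]
    )
  })
}

dates <- seq(as.Date("2024-01-01"), by = "day", length.out = 6)
windows <- rolling_windows(dates, train_days = 4L, test_days = 1L)
length(windows)
format(windows[[1]]$test_date)
```

With six available days, four training days and one test origin, this sketch yields a single window whose target is the last date.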
Create a compact day-level index from a normalized panel. The returned object contains one row per complete entity-day and can be reused to define deterministic benchmark cohorts without shipping the full panel.
elcf4r_build_benchmark_index(
  data,
  carry_cols = NULL,
  id_col = "entity_id",
  timestamp_col = "timestamp",
  value_col = "y",
  temp_col = "temp",
  resolution_minutes = NULL,
  complete_days_only = TRUE,
  drop_na_value = TRUE,
  tz = "UTC"
)
data |
Normalized panel data, typically returned by one of the
|
carry_cols |
Optional character vector of additional day-level columns
to propagate into the benchmark index. If |
id_col |
Name of the entity identifier column. |
timestamp_col |
Name of the timestamp column. |
value_col |
Name of the load column. |
temp_col |
Name of the temperature column. |
resolution_minutes |
Sampling resolution in minutes. If |
complete_days_only |
Passed to |
drop_na_value |
Passed to |
tz |
Time zone used to derive dates and within-day positions. |
A day-level data frame suitable for elcf4r_benchmark().
idx <- elcf4r_build_benchmark_index(
  elcf4r_iflex_example,
  carry_cols = c("dataset", "participation_phase", "price_signal")
)
head(idx)
Convert a long-format load table into one row per entity-day and one column per within-day time index. This is the matrix representation required by functional load-curve models and rolling benchmark scripts.
elcf4r_build_daily_segments(
  data,
  id_col = "entity_id",
  timestamp_col = "timestamp",
  value_col = "y",
  temp_col = "temp",
  carry_cols = NULL,
  expected_points_per_day = NULL,
  resolution_minutes = NULL,
  complete_days_only = TRUE,
  drop_na_value = TRUE,
  tz = "UTC"
)
data |
Data frame containing at least entity id, timestamp and load. |
id_col |
Name of the entity identifier column. |
timestamp_col |
Name of the timestamp column. |
value_col |
Name of the load column. |
temp_col |
Optional name of a temperature column used to derive day summaries. |
carry_cols |
Optional day-level columns to propagate into the returned covariate table. Their first non-missing value within each day is kept. |
expected_points_per_day |
Expected number of samples per day. If |
resolution_minutes |
Sampling resolution in minutes. If |
complete_days_only |
If |
drop_na_value |
If |
tz |
Time zone used to derive dates and within-day positions. |
A list with components segments, covariates, resolution_minutes
and points_per_day.
id1 <- subset(
  elcf4r_iflex_example,
  entity_id == unique(elcf4r_iflex_example$entity_id)[1]
)
daily <- elcf4r_build_daily_segments(id1, carry_cols = "participation_phase")
dim(daily$segments)
names(daily$covariates)
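The long-to-wide conversion itself can be sketched in base R on a toy table. The data below are synthetic, the `entity_id__date` row key mirrors the key format described for the shipped benchmark index, and the reshaping shown is an assumed simplification that skips the completeness and missing-value checks.

```r
# Toy long-format table: one entity, three days of hourly data (synthetic values).
ts <- seq(as.POSIXct("2024-01-01 00:00:00", tz = "UTC"),
          by = "hour", length.out = 72)
long <- data.frame(entity_id = "home_1", timestamp = ts, y = seq_along(ts))

# Derive the calendar date and a 1-based within-day index from the timestamp.
long$date <- as.Date(long$timestamp, tz = "UTC")
long$time_index <- as.integer(format(long$timestamp, "%H", tz = "UTC")) + 1L

# One row per entity-day, one column per within-day position.
key <- paste(long$entity_id, long$date, sep = "__")
segments <- tapply(long$y, list(key, long$time_index), mean)

dim(segments)
rownames(segments)
```

Each row of `segments` is one daily curve, which is the shape the functional models consume.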
Build the deterministic day groups used by the residential KWF workflow:
weekdays, pre_holiday, and holiday.
elcf4r_calendar_groups(dates, holidays = NULL)
dates |
Vector coercible to |
holidays |
Optional vector of holiday dates. If supplied, holiday dates
are labelled |
An ordered factor with levels monday, tuesday, wednesday,
thursday, friday, saturday, sunday, pre_holiday, holiday.
elcf4r_calendar_groups(
  as.Date(c("2024-12-24", "2024-12-25", "2024-12-26")),
  holidays = as.Date("2024-12-25")
)
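A base R sketch of this kind of grouping is shown here. It assumes that the day immediately before a holiday is the one labelled `pre_holiday`; that rule and the helper name `sketch_calendar_groups` are assumptions made for illustration, not a description of the package's internals.

```r
# Hypothetical helper: label dates by weekday, pre_holiday and holiday.
sketch_calendar_groups <- function(dates, holidays = NULL) {
  dates <- as.Date(dates)
  day_names <- c("sunday", "monday", "tuesday", "wednesday",
                 "thursday", "friday", "saturday")
  # POSIXlt$wday is locale-independent: 0 = Sunday, ..., 6 = Saturday.
  groups <- day_names[as.POSIXlt(dates)$wday + 1L]
  if (!is.null(holidays)) {
    holidays <- as.Date(holidays)
    # Assumption: the day immediately before a holiday is "pre_holiday".
    groups[(dates + 1) %in% holidays] <- "pre_holiday"
    # Holiday labels take precedence over every other label.
    groups[dates %in% holidays] <- "holiday"
  }
  factor(
    groups,
    levels = c("monday", "tuesday", "wednesday", "thursday", "friday",
               "saturday", "sunday", "pre_holiday", "holiday"),
    ordered = TRUE
  )
}

g <- sketch_calendar_groups(
  as.Date(c("2024-12-24", "2024-12-25", "2024-12-26")),
  holidays = as.Date("2024-12-25")
)
as.character(g)
```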
Estimate thermosensitivity using the residential rule based on the ratio between mean winter load and mean summer load.
elcf4r_classify_thermosensitivity(
  data,
  id_col = "entity_id",
  date_col = "date",
  value_col = "y",
  threshold = 1.5,
  winter_months = c(12L, 1L, 2L),
  summer_months = c(6L, 7L, 8L)
)
data |
Data frame containing at least an identifier, a date and a load column. Long-format panels are accepted and are aggregated to mean daily load before classification. |
id_col |
Name of the entity identifier column. |
date_col |
Name of the date column. |
value_col |
Name of the load column. |
threshold |
Ratio threshold above which the series is classified as
thermosensitive. Defaults to |
winter_months |
Integer vector of winter months. |
summer_months |
Integer vector of summer months. |
A data frame with one row per entity and columns winter_mean,
summer_mean, ratio, thermosensitive, and status.
example_ts <- data.frame(
  entity_id = rep("home_1", 4),
  date = as.Date(c("2024-01-10", "2024-01-11", "2024-07-10", "2024-07-11")),
  y = c(12, 11, 6, 5)
)
elcf4r_classify_thermosensitivity(example_ts)
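The ratio rule can be reproduced by hand on the same toy data. The sketch below uses only base R with the documented default month sets and threshold; it handles a single entity and leaves out the status and coverage handling that the function reports.

```r
example_ts <- data.frame(
  entity_id = rep("home_1", 4),
  date = as.Date(c("2024-01-10", "2024-01-11", "2024-07-10", "2024-07-11")),
  y = c(12, 11, 6, 5)
)

# Month number of each daily observation.
month_num <- as.integer(format(example_ts$date, "%m"))

# Seasonal means using the documented default month sets.
winter_mean <- mean(example_ts$y[month_num %in% c(12L, 1L, 2L)])  # 11.5
summer_mean <- mean(example_ts$y[month_num %in% c(6L, 7L, 8L)])   # 5.5

# Winter/summer ratio compared with the documented default threshold of 1.5.
ratio <- winter_mean / summer_mean
thermosensitive <- ratio > 1.5
```

Here the ratio is 11.5 / 5.5, roughly 2.09, so this toy household would be flagged as thermosensitive under the default threshold.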
This function downloads the original ELMAS archive from its public figshare URL and unpacks it to a local directory.
elcf4r_download_elmas(dest_dir)
dest_dir |
Directory where the files should be unpacked. |
A character vector with the paths of the extracted files.
Download selected assets from the official GX figshare dataset record. The helper only uses the dataset record itself and does not rely on the authors' code repository.
elcf4r_download_gx(dest_dir, components = "shapefile", overwrite = FALSE)
dest_dir |
Directory where the downloaded files should be stored. |
components |
Character vector of GX components to fetch. Supported
values are |
overwrite |
Logical; if |
A character vector with the downloaded local file paths. Zip assets
are extracted into dest_dir and the extracted paths are returned.
Download selected assets from the IDEAL Household Energy Dataset record on
Edinburgh DataShare. The helper is docs-first: it always retrieves the
licence/readme files and documentation.zip, while heavy raw-data archives
must be requested explicitly through components.
elcf4r_download_ideal(
  dest_dir,
  components = "documentation",
  overwrite = FALSE
)
dest_dir |
Directory where the downloaded files should be stored. |
components |
Character vector of IDEAL components to fetch. Supported
values are |
overwrite |
Logical; if |
A character vector with the downloaded local file paths.
Download one or more StoreNet household files such as H6_W.csv into a
local directory. The helper uses the figshare article API to resolve the
actual file download URL when household-level article IDs are available.
Otherwise it falls back to the public StoreNet archive and extracts the
requested household files into dest_dir.
elcf4r_download_storenet(
  dest_dir,
  ids = "H6_W",
  article_ids = NULL,
  overwrite = FALSE,
  archive_url = "https://figshare.com/ndownloader/files/45123456"
)
dest_dir |
Directory where the downloaded files should be stored. |
ids |
Character vector of StoreNet household identifiers, for example
|
article_ids |
Optional named integer vector that maps each requested
household identifier to a figshare article ID. When |
overwrite |
Logical; if |
archive_url |
Optional figshare archive download URL used when a requested identifier is not present in the article-ID mapping. |
The default mapping currently covers the H6_W household file used by the
package examples. Additional households can be downloaded either by
providing a named article_ids vector or by relying on the public archive
fallback.
A character vector with the downloaded local file paths.
A compact subset of the public ELMAS dataset containing hourly load profiles for 3 commercial or industrial load clusters over 70 days. The object is intended for lightweight examples and tests that demonstrate time-series or segment-based workflows without shipping the full source archive.
A tibble with 5,040 rows and 3 variables:
Hourly timestamp.
Cluster identifier, one of 3 retained ELMAS clusters.
Cluster load in MWh.
Public ELMAS dataset, reduced with package data-raw scripts for
examples and tests.
Fit a GAM model for load curves
elcf4r_fit_gam(data, use_temperature = FALSE)
data |
Data frame with columns |
use_temperature |
Logical. If |
An object of class elcf4r_model with method = "gam".
id1 <- subset(
  elcf4r_iflex_example,
  entity_id == unique(elcf4r_iflex_example$entity_id)[1]
)
train_data <- subset(id1, date < sort(unique(id1$date))[11])
test_data <- subset(id1, date == sort(unique(id1$date))[11])
fit <- elcf4r_fit_gam(train_data[, c("y", "time_index", "dow", "month", "temp")], TRUE)
pred <- predict(fit, newdata = test_data[, c("y", "time_index", "dow", "month", "temp")])
length(pred)
Fit a day-ahead Kernel Wavelet Functional (KWF) model on ordered daily load curves. The implementation computes wavelet-detail distances on the historical context days, applies Gaussian kernel weights, restricts those weights to matching calendar groups when available, and can apply the approximation/detail correction used for mean-level non-stationarity.
elcf4r_fit_kwf(
  segments,
  covariates = NULL,
  target_covariates = NULL,
  use_temperature = FALSE,
  wavelet = "la12",
  bandwidth = NULL,
  use_mean_correction = TRUE,
  group_col = NULL,
  holidays = NULL,
  weights = NULL,
  recency_decay = NULL,
  temperature_bandwidth = NULL
)
segments |
Matrix or data frame of past daily load curves (rows are days, columns are within-day time points) in chronological order. |
covariates |
Optional data frame with one row per training segment.
When present, the function looks for deterministic grouping information in
|
target_covariates |
Optional one-row data frame describing the day to
forecast. When it contains |
use_temperature |
Deprecated and ignored. Kept for backward compatibility with earlier package examples. |
wavelet |
Wavelet filter name passed to |
bandwidth |
Optional positive bandwidth for the Gaussian kernel on
wavelet distances. If |
use_mean_correction |
Logical; if |
group_col |
Optional column name containing precomputed KWF groups in
|
holidays |
Optional vector of holiday dates used by
|
weights |
Optional numeric prior weights of length |
recency_decay |
Optional non-negative recency coefficient applied as an exponential prior on the historical context days. |
temperature_bandwidth |
Deprecated and ignored. Kept only for backward compatibility with older examples. |
An object of class elcf4r_model with method = "kwf".
id1 <- subset(
  elcf4r_iflex_example,
  entity_id == unique(elcf4r_iflex_example$entity_id)[1]
)
daily <- elcf4r_build_daily_segments(id1, carry_cols = "participation_phase")
fit <- elcf4r_fit_kwf(
  segments = daily$segments[1:10, ],
  covariates = daily$covariates[1:10, ],
  target_covariates = daily$covariates[11, , drop = FALSE]
)
length(predict(fit))
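The kernel-weighting step behind a forecast of this kind can be sketched in base R. To stay self-contained, the sketch swaps in plain Euclidean distances on the raw curves in place of wavelet-detail distances, picks the median distance as bandwidth, and omits calendar groups and the mean-level correction. All of these are simplifying assumptions, so it shows the weighting idea only and is not the package's algorithm.

```r
set.seed(1)
# Toy history: 6 days x 24 hourly points (synthetic values).
segments <- matrix(rnorm(6 * 24, mean = 5), nrow = 6)

n <- nrow(segments)
last_day  <- segments[n, ]                            # most recent observed day
context   <- segments[seq_len(n - 1L), , drop = FALSE] # days 1 .. n-1
followers <- segments[2:n, , drop = FALSE]             # day after each context day

# Swapped-in distance: Euclidean distance on raw curves
# (the documented method uses wavelet-detail distances instead).
d <- sqrt(rowSums(sweep(context, 2, last_day)^2))

# Gaussian kernel weights; the median-distance bandwidth is an assumption.
h <- median(d)
w <- exp(-d^2 / (2 * h^2))
w <- w / sum(w)

# Forecast: weighted average of the days that followed each context day.
forecast <- colSums(followers * w)
length(forecast)
```

Context days whose shape resembles the latest day receive larger weights, so the days that followed them dominate the forecast.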
Cluster dyadically resampled daily curves in a wavelet-energy feature space and use the resulting cluster labels as the grouping structure inside the KWF forecast.
elcf4r_fit_kwf_clustered(
  segments,
  covariates = NULL,
  target_covariates = NULL,
  wavelet = "la12",
  bandwidth = NULL,
  use_mean_correction = TRUE,
  max_clusters = 10L,
  nstart = 30L,
  cluster_seed = NULL,
  weights = NULL,
  recency_decay = NULL,
  clustering = NULL
)
segments |
Matrix or data frame of past daily load curves in chronological order. |
covariates |
Optional data frame with one row per segment. |
target_covariates |
Optional one-row data frame for the target day. |
wavelet |
Wavelet filter name passed to |
bandwidth |
Optional positive bandwidth for the Gaussian kernel in the underlying KWF fit. |
use_mean_correction |
Logical; if |
max_clusters |
Maximum number of candidate clusters considered by the Sugar jump heuristic. |
nstart |
Number of random starts for |
cluster_seed |
Deprecated and ignored. Clustered KWF now uses deterministic non-random starts. |
weights |
Optional prior weights passed through to the base KWF fit. |
recency_decay |
Optional recency prior passed through to the base KWF fit. |
clustering |
Optional |
An object of class elcf4r_model with method = "kwf_clustered".
The LSTM implementation uses one or more previous daily curves to predict the
next daily curve. When use_temperature = TRUE and temp_mean is available
in covariates, the daily mean temperature is added as a second input
feature repeated across the within-day time steps.
elcf4r_fit_lstm(
  segments,
  covariates = NULL,
  use_temperature = FALSE,
  lookback_days = 1L,
  units = 16L,
  epochs = 10L,
  batch_size = 8L,
  validation_split = 0,
  seed = NULL,
  verbose = 0L
)
segments |
Matrix or data frame of past daily load curves (rows are days, columns are time points). |
covariates |
Optional data frame with one row per training day. |
use_temperature |
Logical. If |
lookback_days |
Number of past daily curves used as one training input. |
units |
Number of hidden units in the LSTM layer. |
epochs |
Number of training epochs. |
batch_size |
Batch size used in |
validation_split |
Validation split passed to |
seed |
Optional integer seed passed to TensorFlow. When |
verbose |
Verbosity level passed to |
An object of class elcf4r_model with method = "lstm".
if (interactive() &&
    requireNamespace("reticulate", quietly = TRUE) &&
    reticulate::virtualenv_exists("r-tensorflow")) {
  elcf4r_use_tensorflow_env(virtualenv = "r-tensorflow")
  if (isTRUE(getFromNamespace(".elcf4r_lstm_backend_available", "elcf4R")())) {
    id1 <- subset(
      elcf4r_iflex_example,
      entity_id == unique(elcf4r_iflex_example$entity_id)[1]
    )
    daily <- elcf4r_build_daily_segments(id1)
    fit <- elcf4r_fit_lstm(
      segments = daily$segments[1:10, ],
      covariates = daily$covariates[1:10, ],
      use_temperature = TRUE,
      epochs = 1,
      units = 4,
      batch_size = 2,
      verbose = 0
    )
    length(predict(fit))
  }
}
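The input layout described for the LSTM (one previous daily curve, plus the daily mean temperature repeated across the within-day steps) can be sketched without TensorFlow. The array shape `(samples, time steps, features)` and the choice to pair each input day with its own temperature are assumptions made for illustration; the data are synthetic.

```r
set.seed(42)
points_per_day <- 24L
n_days <- 5L
segments <- matrix(runif(n_days * points_per_day), nrow = n_days)
temp_mean <- c(1.5, 2.0, 0.5, -1.0, 3.0)  # one daily mean temperature per day

# With a lookback of one day, day i is the input and day i + 1 is the target.
n_samples <- n_days - 1L

# Assumed layout: samples x within-day time steps x features.
x <- array(NA_real_, dim = c(n_samples, points_per_day, 2L))
x[, , 1] <- segments[seq_len(n_samples), ]
# Second feature: the daily mean temperature, constant along each row.
x[, , 2] <- matrix(temp_mean[seq_len(n_samples)],
                   nrow = n_samples, ncol = points_per_day)

# Targets: the next daily curve for each sample.
y <- segments[2:n_days, , drop = FALSE]

dim(x)
dim(y)
```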
Fit a MARS model for load curves
elcf4r_fit_mars(data, use_temperature = FALSE)
data |
Data frame with columns |
use_temperature |
Logical. If |
An elcf4r_model object with method = "mars".
id1 <- subset(
  elcf4r_iflex_example,
  entity_id == unique(elcf4r_iflex_example$entity_id)[1]
)
train_data <- subset(id1, date < sort(unique(id1$date))[11])
test_data <- subset(id1, date == sort(unique(id1$date))[11])
fit <- elcf4r_fit_mars(train_data[, c("y", "time_index", "dow", "month", "temp")], TRUE)
pred <- predict(fit, newdata = test_data[, c("y", "time_index", "dow", "month", "temp")])
length(pred)
A compact index of complete days derived from the public iFlex hourly panel. Each row represents one participant-day with enough metadata to define deterministic benchmark cohorts without shipping the full raw panel.
A data frame with 563,150 rows and 11 variables:
Unique key built as entity_id__date.
Participant identifier.
Calendar date.
Day of week.
Month as a two-digit factor.
Mean daily outdoor temperature in degrees Celsius.
Minimum daily outdoor temperature in degrees Celsius.
Maximum daily outdoor temperature in degrees Celsius.
Experiment phase from the source dataset.
Experimental price-signal label, when available.
Number of hourly samples retained for the day.
Public iFlex raw file data_hourly.csv, reduced with
data-raw/elcf4r_iflex_subsets.R.
Saved benchmark results for a deterministic rolling-origin evaluation on a
subset of the iFlex data. The shipped results use a fixed participant cohort,
a 28-day training window and multiple one-day rolling test forecasts per
participant. The current shipped benchmark includes the operational gam,
mars, kwf, kwf_clustered and lstm wrappers.
A data frame with 20 variables:
Identifier of the benchmark design.
Dataset label, always "iflex".
Participant identifier.
Forecasting method: gam, mars, kwf,
kwf_clustered or lstm.
Date of the forecast target day.
First day in the training window.
Last day in the training window.
Number of training days.
Number of hourly points in the target day.
Logical flag for temperature-aware fitting.
Thermosensitivity flag when seasonal coverage is
sufficient, otherwise NA.
Status of the winter/summer ratio classification step.
Estimated winter/summer mean-load ratio when available.
Elapsed fit-and-predict time in seconds.
Benchmark execution status.
Error message when a fit failed.
Normalized mean absolute error.
Normalized root mean squared error.
Symmetric mean absolute percentage error.
Mean absolute scaled error.
Derived from elcf4r_iflex_benchmark_index and the public iFlex raw
file with data-raw/elcf4r_iflex_benchmark_results.R.
A compact hourly electricity-demand panel extracted from the public iFlex dataset. The object contains 14 complete days for each of 3 participants and is intended for examples, tests and lightweight vignettes.
A data frame with 1,008 rows and 16 variables:
Dataset label, always "iflex".
Participant identifier.
Hourly UTC timestamp.
Calendar date of the observation.
Within-day hourly index from 1 to 24.
Hourly electricity demand in kWh.
Outdoor temperature in degrees Celsius.
Day of week.
Month as a two-digit factor.
Sampling resolution in minutes.
Experiment phase from the source dataset.
Experimental price-signal label, when available.
Experimental electricity price in NOK per kWh.
Lagged 24-hour temperature feature from the source file.
Lagged 48-hour temperature feature from the source file.
Lagged 72-hour temperature feature from the source file.
Public iFlex raw file data_hourly.csv, reduced with
data-raw/elcf4r_iflex_subsets.R.
Build a reusable clustering model for daily load-curve segments in the wavelet-energy feature space used by the clustered KWF workflow.
elcf4r_kwf_cluster_days(
  segments,
  wavelet = "la12",
  max_clusters = 10L,
  nstart = 30L,
  cluster_seed = NULL
)
segments |
Matrix or data frame of daily load curves in chronological order. |
wavelet |
Wavelet filter name passed to |
max_clusters |
Maximum number of candidate clusters considered by the Sugar jump heuristic. |
nstart |
Number of random starts for |
cluster_seed |
Deprecated and ignored. Clustering now uses deterministic non-random starts. |
An object of class elcf4r_kwf_clusters.
Saved rolling-origin benchmark results for the shipped methods on a fixed Low Carbon London cohort of households. The benchmark is based on 30-minute load curves and reports NMAE, NRMSE, sMAPE and MASE.
A data frame with the same benchmark-result schema as
elcf4r_iflex_benchmark_results.
Derived from the local LCL raw file with
data-raw/elcf4r_lcl_artifacts.R.
A compact normalized panel extracted from a small group of households in the Low Carbon London dataset. The object contains complete 30-minute days and is intended for examples and lightweight benchmarking workflows.
A data frame with normalized panel fields:
Dataset label, always "lcl".
Low Carbon London household identifier.
Common normalized panel fields.
Public LCL raw file LCL_2013.csv, reduced with
data-raw/elcf4r_lcl_artifacts.R.
Compute NMAE, NRMSE, sMAPE and MASE between observed and predicted load curves, as in the posters.
elcf4r_metrics(truth, pred, seasonal_period = NULL, naive_pred = NULL)
truth |
Numeric vector or matrix of observed values. |
pred |
Numeric vector or matrix of predicted values, same shape. |
seasonal_period |
Seasonal period for the naive benchmark in the MASE denominator (for daily curves with half-hourly sampling, a value of 48 is appropriate). |
naive_pred |
Optional numeric vector or matrix of naive benchmark
predictions with the same shape as |
A named list with elements nmae, nrmse, smape, mase.
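For orientation, the four scores can be written out in base R. The definitions below are common textbook forms and are assumptions here: MAE and RMSE are normalised by the mean absolute observed value, sMAPE is left as a fraction, and the MASE denominator is the seasonal-naive error computed on `truth` itself. The package may normalise differently, and the helper name `sketch_metrics` is hypothetical.

```r
# Hypothetical helper with assumed metric definitions (not the package's code).
sketch_metrics <- function(truth, pred, seasonal_period = 1L) {
  truth <- as.numeric(truth)
  pred <- as.numeric(pred)
  err <- truth - pred
  mae <- mean(abs(err))
  rmse <- sqrt(mean(err^2))
  # Assumption: normalise MAE and RMSE by the mean absolute observed value.
  scale <- mean(abs(truth))
  # Assumption: seasonal-naive absolute error on `truth` as MASE denominator.
  naive_err <- abs(diff(truth, lag = seasonal_period))
  list(
    nmae  = mae / scale,
    nrmse = rmse / scale,
    smape = mean(2 * abs(err) / (abs(truth) + abs(pred))),
    mase  = mae / mean(naive_err)
  )
}

m <- sketch_metrics(truth = c(2, 4, 6, 8), pred = c(3, 3, 7, 7))
str(m)
```

In this toy case every absolute error is 1 and the mean observed value is 5, so both normalised errors are 0.2, while the lag-1 naive error is 2, which gives a MASE of 0.5.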
Convert a raw long-format load table into a normalized panel that uses the column names expected by the package examples and model wrappers.
elcf4r_normalize_panel(
  data,
  id_col,
  timestamp_col,
  load_col,
  temp_col = NULL,
  dataset = NA_character_,
  resolution_minutes = NULL,
  tz = "UTC",
  keep_cols = NULL
)
data |
Data frame containing at least an entity identifier, a timestamp and a load column. |
id_col |
Name of the entity identifier column. |
timestamp_col |
Name of the timestamp column. |
load_col |
Name of the load column. |
temp_col |
Optional name of the temperature column. |
dataset |
Short dataset label stored in the normalized output. |
resolution_minutes |
Sampling resolution in minutes. If |
tz |
Time zone used to parse timestamps. |
keep_cols |
Optional character vector of extra source columns to keep. |
A data frame with normalized columns dataset, entity_id,
timestamp, date, time_index, y, temp, dow, month and
resolution_minutes, plus any requested keep_cols.
Read the GX dataset from either the official SQLite database or a flat export and return a normalized long-format panel. GX is treated as a transformer/community-level dataset rather than an individual-household dataset.
elcf4r_read_gx(
  path = "data-raw",
  ids = NULL,
  start = NULL,
  end = NULL,
  tz = "Asia/Shanghai",
  n_max = NULL,
  drop_na_load = TRUE
)
path |
Path to a GX SQLite database, a flat export file, or a directory containing one of them. |
ids |
Optional vector of GX community/profile identifiers to keep. |
start |
Optional inclusive lower time bound. |
end |
Optional inclusive upper time bound. |
tz |
Time zone used to parse timestamps. Defaults to |
n_max |
Optional maximum number of rows to read. |
drop_na_load |
Logical; if |
A normalized data frame with GX transformer-level data.
Read a direct IDEAL hourly aggregate-electricity file or search an extracted
auxiliarydata.zip directory for a matching hourly summary file, then return
a normalized long-format panel.
elcf4r_read_ideal(
  path = "data-raw",
  ids = NULL,
  start = NULL,
  end = NULL,
  tz = "Europe/London",
  n_max = NULL,
  source = "auxiliary_hourly",
  drop_na_load = TRUE
)
path |
Path to an IDEAL hourly summary file or to an extracted IDEAL auxiliary-data directory. |
ids |
Optional vector of IDEAL household identifiers to keep. |
start |
Optional inclusive lower time bound. |
end |
Optional inclusive upper time bound. |
tz |
Time zone used to parse timestamps. Defaults to
|
n_max |
Optional maximum number of rows to read. |
source |
IDEAL source flavor. Currently only |
drop_na_load |
Logical; if |
A normalized data frame with IDEAL household data.
Read the iFlex hourly consumption table and return a normalized long-format panel ready for feature engineering, segmentation and benchmarking.
elcf4r_read_iflex(
  path = file.path("data-raw", "iFlex"),
  ids = NULL,
  start = NULL,
  end = NULL,
  tz = "UTC",
  n_max = NULL
)
path |
Path to |
ids |
Optional vector of participant identifiers to keep. |
start |
Optional inclusive lower time bound. |
end |
Optional inclusive upper time bound. |
tz |
Time zone used to parse timestamps. Defaults to |
n_max |
Optional maximum number of rows to read. Intended for quick prototyping on a small subset of the raw file. |
A normalized data frame with load, temperature and calendar fields.
The output also keeps participation_phase, price_signal,
price_nok_kwh, temp24, temp48 and temp72.
Read a wide Low Carbon London (LCL) smart-meter file and reshape it into a normalized long-format panel with one row per household timestamp.
elcf4r_read_lcl(
  path = file.path("data-raw", "LCL_2013.csv"),
  ids = NULL,
  start = NULL,
  end = NULL,
  tz = "UTC",
  n_max = NULL,
  drop_na_load = TRUE
)
path |
Path to an LCL CSV file or to a directory containing one. |
ids |
Optional vector of LCL household identifiers to keep, for example
|
start |
Optional inclusive lower time bound. |
end |
Optional inclusive upper time bound. |
tz |
Time zone used to parse timestamps. |
n_max |
Optional maximum number of timestamp rows to read. |
drop_na_load |
Logical; if |
A normalized data frame with LCL household data.
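The wide-to-long reshape described for LCL files can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the column name `DateTime`, the household labels `hh_A` and `hh_B`, and the output columns are all hypothetical.

```r
# Toy wide table: one timestamp column plus one column per household.
wide <- data.frame(
  DateTime = c("2013-01-01 00:00:00", "2013-01-01 00:30:00"),
  hh_A = c(0.12, 0.10),
  hh_B = c(NA, 0.25),
  stringsAsFactors = FALSE
)

id_cols <- setdiff(names(wide), "DateTime")

# Stack the household columns: one row per household timestamp.
long <- data.frame(
  id = rep(id_cols, each = nrow(wide)),
  timestamp = rep(as.POSIXct(wide$DateTime, tz = "UTC"), times = length(id_cols)),
  load = unlist(wide[id_cols], use.names = FALSE)
)

# Assumed reading of a drop-missing-load step: discard rows whose load is NA.
long <- long[!is.na(long$load), ]
```

Because a row cap on the wide file limits timestamps rather than household rows, the long panel can hold up to `n_max` times the number of households.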
Read one or more CLEAN_House*.csv files from the REFIT dataset, optionally
select appliance channels, resample them to a regular time grid, and return a
normalized long-format panel.
elcf4r_read_refit( path = "data-raw", house_ids = NULL, channels = "Aggregate", start = NULL, end = NULL, tz = "UTC", resolution_minutes = 1L, agg_fun = c("mean", "sum", "last"), n_max = NULL, drop_na_load = TRUE )
path |
Path to a REFIT file or to a directory containing
|
house_ids |
Optional vector of house identifiers to keep. These are
matched against file stems such as |
channels |
Character vector of load channels to extract. Defaults to "Aggregate". |
start |
Optional inclusive lower time bound. |
end |
Optional inclusive upper time bound. |
tz |
Time zone used to parse timestamps. |
resolution_minutes |
Target regular resolution in minutes for the normalized output. Defaults to 1. |
agg_fun |
Aggregation used when resampling to the target grid. One of "mean", "sum" or "last". |
n_max |
Optional maximum number of raw rows to read per file. |
drop_na_load |
Logical; if |
A normalized data frame with REFIT household data.
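Resampling irregular readings to a regular grid with a selectable aggregation can be sketched in base R as below. The bucketing rule (flooring each timestamp to the start of its interval) is an assumption for illustration; the package may align or label buckets differently.

```r
# Toy irregular readings for one channel (values in watts are made up).
raw <- data.frame(
  timestamp = as.POSIXct("2014-03-01 00:00:00", tz = "UTC") +
    c(0, 420, 840, 960, 1500),
  Aggregate = c(300, 320, 310, 500, 480)
)

resolution_minutes <- 15L
agg_fun <- "mean"

# Map the aggregation name to a function; "last" assumes rows are time-ordered.
fun <- switch(
  agg_fun,
  mean = mean,
  sum = sum,
  last = function(x) x[length(x)]
)

# Floor each timestamp to the start of its bucket.
bucket_secs <- resolution_minutes * 60
bucket_start <- as.POSIXct(
  floor(as.numeric(raw$timestamp) / bucket_secs) * bucket_secs,
  origin = "1970-01-01", tz = "UTC"
)

# One output row per bucket.
resampled <- aggregate(
  raw["Aggregate"],
  by = list(timestamp = bucket_start),
  FUN = fun
)
```

For power readings in watts, `"mean"` preserves the unit, whereas `"sum"` only makes sense for energy-like quantities.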
Read one or more StoreNet-style household CSV files such as H6_W.csv,
derive the household identifier from the file name, and return a normalized
long-format panel.
elcf4r_read_storenet( path = file.path("data-raw", "H6_W.csv"), ids = NULL, start = NULL, end = NULL, tz = "UTC", n_max = NULL, load_col = "Consumption(W)", keep_cols = c("Discharge(W)", "Charge(W)", "Production(W)", "State of Charge(%)") )
path |
Path to a StoreNet CSV file or to a directory containing files
named like |
ids |
Optional vector of household identifiers to keep. Identifiers are
matched against the file stem, for example |
start |
Optional inclusive lower time bound. |
end |
Optional inclusive upper time bound. |
tz |
Time zone used to parse timestamps. |
n_max |
Optional maximum number of rows to read per file. |
load_col |
Name of the load column to normalize. Defaults to "Consumption(W)". |
keep_cols |
Optional extra source columns to keep. Defaults to the main battery and production fields when present. |
A normalized data frame with StoreNet household data.
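A minimal base R sketch of two ideas from this reader: taking the file stem as the starting point for a household identifier, and keeping extra columns only when they are present. The second file name `H12_W.csv`, the `date` column and the output layout are hypothetical, and how the package turns a stem into the final identifier is not shown here.

```r
# File stems from paths; "H12_W.csv" is a made-up second file.
files <- c(file.path("data-raw", "H6_W.csv"), file.path("data-raw", "H12_W.csv"))
stems <- tools::file_path_sans_ext(basename(files))

# Toy source table with non-syntactic column names kept as-is.
src <- data.frame(
  date = c("2020-01-01 00:00:00", "2020-01-01 00:01:00"),
  `Consumption(W)` = c(250, 260),
  `Production(W)` = c(0, 5),
  check.names = FALSE
)

load_col <- "Consumption(W)"
keep_cols <- c("Discharge(W)", "Charge(W)", "Production(W)", "State of Charge(%)")

# Keep only the requested extra columns that actually exist in the file.
present <- intersect(keep_cols, names(src))

panel <- data.frame(
  id = stems[1],
  timestamp = as.POSIXct(src$date, tz = "UTC"),
  load = src[[load_col]],
  src[present],
  check.names = FALSE
)
```

Using `intersect()` lets the same call work on files that lack battery or production fields.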
Saved rolling-origin benchmark results for the shipped methods on the REFIT example cohort after resampling to 15-minute resolution. The benchmark reports NMAE, NRMSE, sMAPE and MASE.
A data frame with the same benchmark-result schema as
elcf4r_iflex_benchmark_results.
Derived from the local REFIT raw files with
data-raw/elcf4r_refit_artifacts.R.
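The four reported metrics can be computed along the following lines. This is a sketch under stated assumptions: normalizing NMAE and NRMSE by the mean absolute actual load and scaling MASE by an in-sample naive forecast are common conventions, but the exact definitions used to produce the shipped results are not stated here. The helper name `forecast_metrics` is hypothetical.

```r
# Hypothetical helper; the normalization choices are assumptions.
forecast_metrics <- function(actual, forecast, train, season = 1L) {
  err <- actual - forecast
  scale_level <- mean(abs(actual))                   # assumed normalizer
  naive_mae <- mean(abs(diff(train, lag = season)))  # in-sample naive error
  c(
    nmae  = mean(abs(err)) / scale_level,
    nrmse = sqrt(mean(err^2)) / scale_level,
    smape = mean(2 * abs(err) / (abs(actual) + abs(forecast))),
    mase  = mean(abs(err)) / naive_mae
  )
}

train    <- c(1, 2, 1, 2, 1, 2)  # history used to scale MASE
actual   <- c(2, 4)
forecast <- c(1, 5)

m <- forecast_metrics(actual, forecast, train, season = 1L)
```

For daily load curves, setting `season` to the number of points per day would scale MASE against a same-time-yesterday forecast instead of a previous-point forecast.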
A compact normalized panel extracted from the REFIT cleaned dataset after resampling to 15-minute resolution. The object contains complete days for one house and is intended for examples and lightweight benchmarking workflows.
A data frame with normalized panel columns plus REFIT-specific fields:
Dataset label, always "refit".
Entity identifier, here the aggregate household channel.
Common normalized panel fields.
REFIT house identifier derived from the file name.
Load channel name, for example "Aggregate".
Minimum Unix timestamp within the resampling bucket.
Maximum issues flag within the resampling bucket.
Public REFIT cleaned raw files, reduced with
data-raw/elcf4r_refit_artifacts.R.
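Selecting complete days at 15-minute resolution and arranging them as daily segments might look like the base R sketch below. It is not the package's segmentation helper; the column names and the completeness rule (exactly 96 rows per calendar day) are assumptions for illustration.

```r
resolution_minutes <- 15L
points_per_day <- (24L * 60L) %/% resolution_minutes  # 96 points per day

# One full day followed by a partial day.
day1 <- as.POSIXct("2014-03-01 00:00:00", tz = "UTC") + 900 * 0:95
day2 <- as.POSIXct("2014-03-02 00:00:00", tz = "UTC") + 900 * 0:49
panel <- data.frame(timestamp = c(day1, day2), load = seq_len(146))

# Count rows per calendar day and keep only fully observed days.
panel$date <- as.Date(panel$timestamp, tz = "UTC")
counts <- table(panel$date)
complete_dates <- as.Date(names(counts)[as.vector(counts) == points_per_day])
complete_panel <- panel[panel$date %in% complete_dates, ]

# One row per day, one column per time step within the day.
segments <- matrix(complete_panel$load, ncol = points_per_day, byrow = TRUE)
```

The resulting matrix has the shape expected by functions that take a matrix of daily segments, with one curve per row.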
Saved rolling-origin benchmark results for the shipped methods on the local StoreNet household example. The benchmark is derived from complete 1-minute household days and reports NMAE, NRMSE, sMAPE and MASE for every shipped row. The clustered KWF variant is only included when the shipped StoreNet cohort is classified as thermosensitive.
A data frame with the same benchmark-result schema as
elcf4r_iflex_benchmark_results.
Derived from the local StoreNet raw file with
data-raw/elcf4r_storenet_artifacts.R.
A compact normalized panel extracted from the local StoreNet household file
H6_W.csv. The object contains a small set of complete 1-minute household
days and is intended for examples and lightweight benchmarking workflows.
A data frame with normalized panel columns plus StoreNet-specific fields:
Dataset label, always "storenet".
Household identifier derived from the file name.
Common normalized panel fields.
Battery and production fields from the source file in watts.
Battery state of charge in percent.
Source CSV file name.
Public StoreNet raw file H6_W.csv, reduced with
data-raw/elcf4r_storenet_artifacts.R.
This helper provides an explicit, user-invoked way to bind the Python
environment used by reticulate before calling elcf4r_fit_lstm().
elcf4r_use_tensorflow_env(python = NULL, virtualenv = NULL, required = TRUE)
python |
Optional path to a Python interpreter passed to
|
virtualenv |
Optional virtualenv name or path passed to
|
required |
Logical passed to the corresponding |
Invisibly returns the selected Python interpreter path when it can be determined.
if (interactive() && requireNamespace("reticulate", quietly = TRUE) && reticulate::virtualenv_exists("r-tensorflow")) { elcf4r_use_tensorflow_env(virtualenv = "r-tensorflow") }
Assign new segments to a fitted KWF clustering model
## S3 method for class 'elcf4r_kwf_clusters'
predict(object, segments, ...)
object |
An |
segments |
Matrix or data frame of new daily segments. |
... |
Unused, present for method compatibility. |
A character vector of cluster labels.
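How an S3 `predict()` method with this signature can return one label per segment is sketched below on a toy class. The class `toy_clusters`, its `centroids` field and the nearest-centroid rule in Euclidean distance are all hypothetical; the fitted KWF clustering model may store different components and assign segments differently.

```r
# Toy fitted object: two named centroid curves of length 3.
toy_clusters <- structure(
  list(centroids = rbind(low = c(1, 1, 1), high = c(5, 5, 5))),
  class = "toy_clusters"
)

# S3 method: label each row of `segments` with its nearest centroid.
predict.toy_clusters <- function(object, segments, ...) {
  segments <- as.matrix(segments)
  labels <- apply(segments, 1, function(curve) {
    # Squared Euclidean distance from this curve to every centroid.
    d <- rowSums(sweep(object$centroids, 2, curve)^2)
    names(d)[which.min(d)]
  })
  unname(labels)
}

new_segments <- rbind(c(0.9, 1.2, 1.1), c(4.8, 5.3, 5.0))
labels <- predict(toy_clusters, new_segments)
```

Calling the generic `predict()` dispatches on the class of `object`, which is why the method accepts `...` even when it ignores extra arguments.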
Predict from an elcf4r_model
## S3 method for class 'elcf4r_model'
predict(object, newdata = NULL, ...)
object |
An |
newdata |
Optional new data for methods that need it. For |
... |
Unused, present for method compatibility. |
Numeric predictions. For KWF and LSTM this is a forecast daily curve.