
Deep learning-driven hybrid model for short-term load forecasting and smart grid information management



Overview of our network

This paper proposes a hybrid algorithm that combines GRU, TCN, and an attention mechanism to address the challenges in short-term power load forecasting and energy information management. The algorithm leverages the advantages of GRU in capturing long-term dependencies in time series data, TCN’s ability to extract temporal features, and the attention mechanism’s capability to dynamically adjust the importance of different features, thereby improving the accuracy and reliability of predictions. Figure 1 represents the overall framework diagram of the proposed model.

Figure 1. The overall framework diagram of the proposed model.

The overall process of the method implementation is as follows:

1. Data Preparation: Collect historical load data and other relevant features, including weather data and holiday information. Preprocess the data by handling missing values and performing normalization.

2. Feature Engineering: In this stage, consider the specific characteristics of the load forecasting task. Extract and select features from the data based on domain knowledge and experience to capture the relevant aspects of load forecasting. Various methods such as statistical features, frequency domain features, or time domain features can be employed. This targeted feature engineering enhances the model's predictive performance.

3. Model Construction and Fusion: By combining GRU, TCN, and attention mechanisms, the model can capture features and patterns in load data from different perspectives to reduce the impact of uncertainty on prediction results.

Firstly, build a hybrid structure combining GRU and TCN, leveraging the strengths of both to improve the model’s ability to capture the dynamics of load data. GRU is utilized to capture long-term dependencies in time series data, while TCN focuses on learning local patterns and features in load data.

Secondly, introduce an attention mechanism to prioritize the input components most relevant to the load forecasting task. By learning weight allocations, the attention mechanism enhances the model's focus and predictive accuracy, enabling it to emphasize the features and time periods with the greatest impact on load variations.
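As a concrete illustration of this fusion, the sketch below pairs a GRU branch with a small stack of dilated causal convolutions standing in for the TCN, then applies an attention layer over the fused per-step features. The class name `HybridForecaster`, the hidden size, and the single-step output head are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalConv1d(nn.Module):
    """1-D convolution padded on the left only, so step t never sees future steps."""

    def __init__(self, c_in, c_out, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(c_in, c_out, kernel_size, dilation=dilation)

    def forward(self, x):                                  # x: (batch, channels, time)
        return self.conv(F.pad(x, (self.pad, 0)))


class HybridForecaster(nn.Module):
    """Minimal, illustrative sketch of a GRU + TCN + attention fusion."""

    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)   # long-term dependencies
        self.tcn = nn.Sequential(                                  # local patterns, growing receptive field
            CausalConv1d(n_features, hidden, dilation=1), nn.ReLU(),
            CausalConv1d(hidden, hidden, dilation=2), nn.ReLU(),
        )
        self.attn = nn.Linear(2 * hidden, 1)                       # per-step attention score
        self.head = nn.Linear(2 * hidden, 1)                       # one-step-ahead load forecast

    def forward(self, x):                                   # x: (batch, time, n_features)
        g, _ = self.gru(x)                                   # (batch, time, hidden)
        t = self.tcn(x.transpose(1, 2)).transpose(1, 2)      # (batch, time, hidden)
        feats = torch.cat([g, t], dim=-1)                    # fuse both branches per time step
        w = torch.softmax(self.attn(feats), dim=1)           # attention weights over time
        context = (w * feats).sum(dim=1)                     # weighted context vector (cf. Eq. 6)
        return self.head(context).squeeze(-1)                # predicted load
```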

4. Model Training:

- Split the prepared dataset into training, validation, and testing sets.

- Train the hybrid model on the training set by minimizing the disparity between the predicted outputs and the actual load demand. Optimization algorithms such as gradient descent can be utilized to update the model parameters.

- Monitor performance metrics on the validation set during the training process to determine the optimal model parameter settings.
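A minimal training-loop sketch for this step is shown below. It assumes hypothetical `model`, `train_loader`, and `val_loader` objects, and uses Adam as one possible gradient-based optimizer; the epoch count and learning rate are placeholders.

```python
import torch
import torch.nn as nn

# Hypothetical objects: `model`, `train_loader`, and `val_loader` are assumed to exist.
criterion = nn.MSELoss()                                     # disparity between prediction and actual load
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)    # gradient-based parameter updates

best_val, best_state = float("inf"), None
for epoch in range(100):
    model.train()
    for x, y in train_loader:                  # mini-batches of (history window, next load value)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

    # Monitor the validation error to select the best parameter settings
    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(x), y).item() for x, y in val_loader) / len(val_loader)
    if val_loss < best_val:
        best_val = val_loss
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
```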

5. Uncertainty modeling: During model training and prediction, it is important to model uncertainty factors. Uncertainty in load data can be captured by introducing stochasticity or Monte Carlo methods, which yields a range of possible outcomes that can be used to assess the reliability of the load forecast results and to provide a probability distribution or confidence interval.
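Monte Carlo dropout is one common way to realize the stochastic sampling described above; the sketch below assumes the model contains dropout layers and simply keeps them active at prediction time to draw multiple forecast samples. The function name and the 95% interval are illustrative choices.

```python
import torch

def mc_dropout_forecast(model, x, n_samples: int = 100):
    """Return a mean forecast and an approximate 95% interval via Monte Carlo dropout.

    Assumes `model` contains dropout layers; keeping them active at inference
    turns each forward pass into a stochastic sample of the forecast.
    """
    model.train()                          # keep dropout active (no weights are updated here)
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])  # (n_samples, batch)
    mean = samples.mean(dim=0)
    std = samples.std(dim=0)
    return mean, mean - 1.96 * std, mean + 1.96 * std     # point forecast and confidence band
```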

6. Forecast error analysis and Model Evaluation: By analyzing the error between predicted and actual values, the reliability and accuracy of the model can be assessed. At the same time, error analysis can identify potential sources of uncertainty and guide improvements to model performance.

Evaluate the trained model using the testing set by computing error metrics such as root mean square error (RMSE) and mean absolute error (MAE) between the predicted results and the actual load demand. For load prediction, input the latest data into the trained model to obtain the prediction results.
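The error metrics can be computed directly on the held-out test set; the helper functions below are a straightforward NumPy sketch of RMSE and MAE, with `y_test` and `model_predictions` as hypothetical placeholders.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between actual and predicted load."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error between actual and predicted load."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Example on the held-out test set (placeholder names):
# print(rmse(y_test, model_predictions), mae(y_test, model_predictions))
```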

7. Load forecasting and energy management: Utilize the trained model to predict future load, including short-term and medium-term forecasts. Adjust and optimize energy supply and demand based on the prediction results, such as adjusting power generation plans and managing energy storage.

8. Experimental validation and application: Validate the method using real-world data, assessing its accuracy, reliability, and efficiency. Apply the method to actual energy management systems, enabling practical load forecasting and energy information management.

Through this process, the hybrid algorithm effectively captures the spatial and temporal characteristics of power load data, leading to high-precision short-term load forecasting.

Gated Recurrent Unit (GRU)

GRU is a type of RNN that is widely used in sequence modeling tasks, including time series analysis and natural language processing. It addresses the vanishing gradient problem commonly encountered in traditional RNNs by incorporating gating mechanisms. Figure 2 shows the schematic diagram of the GRU.

Figure 2. The schematic diagram of the principle of GRU.

The basic principle of the GRU model involves the use of gating units to control the flow of information within the recurrent units. It consists of two main gates: the update gate \(\left( z \right)\) and the reset gate \(\left( r \right)\). These gates determine how much information from the previous time step should be passed on to the current time step. The update gate \(\left( z \right)\) determines the extent to which the previous hidden state \(\left( {h_{t - 1} } \right)\) should be updated for the current time step. It is computed as a sigmoid function of the concatenation of the current input \(\left( {x_{t} } \right)\) and the previous hidden state \(\left( {h_{t - 1} } \right)\):

$$z_{t} = \sigma \left( {W_{z} \cdot \left[ {h_{t - 1} ,x_{t} } \right] + b_{z} } \right)$$

(1)

where \(W_{z}\) and \(b_{z}\) are the weight matrix and bias vector for the update gate, and \(\sigma\) is the sigmoid activation function.

The reset gate \(\left( r \right)\) controls how much of the previous hidden state \(\left( {h_{t - 1} } \right)\) should be considered in combination with the current input \(\left( {x_{t} } \right)\) to compute the current hidden state candidate \(\left( {\tilde{h}_{t} } \right)\). It is also computed as a sigmoid function using the concatenation of \(h_{t - 1}\) and \(x_{t}\):

$$r_{t} = \sigma \left( {W_{r} \cdot \left[ {h_{t - 1} ,x_{t} } \right] + b_{r} } \right)$$

(2)

where \(W_{r}\) and \(b_{r}\) are the weight matrix and bias vector for the reset gate.

The current hidden state candidate \(\left( {\tilde{h}_{t} } \right)\) is computed by applying the tanh activation to the concatenation of the reset-gated previous hidden state \(\left( {r_{t} \cdot h_{t - 1} } \right)\) and the current input \(\left( {x_{t} } \right)\):

$$\tilde{h}_{t} = {\text{tanh}}\left( {W_{h} \cdot \left[ {r_{t} \cdot h_{t - 1} ,x_{t} } \right] + b_{h} } \right)$$

(3)

where \(W_{h}\) and \(b_{h}\) are the weight matrix and bias vector for the hidden state candidate.

Finally, the current hidden state \(\left( {h_{t} } \right)\) is computed as a linear interpolation between the previous hidden state \(\left( {h_{t - 1} } \right)\) and the hidden state candidate \(\left( {\tilde{h}_{t} } \right)\), based on the update gate \(\left( z \right)\):

$$h_{t} = \left( {1 - z_{t} } \right) \cdot h_{t - 1} + z_{t} \cdot \tilde{h}_{t}$$

(4)

In the proposed method, the GRU model plays a crucial role in capturing the long-term dependencies in the time series data. By processing the sequential input data, the GRU units learn to retain important information from previous time steps and propagate it to the current time step. This ability to capture temporal dependencies is essential for accurate short-term load forecasting. The GRU model is integrated into the hybrid algorithm alongside the TCN and Attention mechanism. The outputs of the GRU units, along with the outputs of the TCN layers, are fed into the Attention mechanism, which adaptively weighs the importance of different features. This combination of GRU, TCN, and Attention mechanism enhances prediction and management performance, improving the accuracy and reliability of load forecasting in the proposed method.
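Equations (1)–(4) translate directly into code. The following NumPy sketch performs a single GRU step under the assumption that each weight matrix acts on the concatenation of the previous hidden state and the current input; the shapes and names are illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W_z, b_z, W_r, b_r, W_h, b_h):
    """One GRU time step following Eqs. (1)-(4); each W acts on [h_{t-1}, x_t]."""
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ concat + b_z)                                   # Eq. (1): update gate
    r_t = sigmoid(W_r @ concat + b_r)                                   # Eq. (2): reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]) + b_h)  # Eq. (3): candidate state
    return (1.0 - z_t) * h_prev + z_t * h_tilde                         # Eq. (4): new hidden state
```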

Temporal Convolutional Network (TCN)

TCN is a type of deep learning model that is specifically designed for sequence modeling tasks, such as time series analysis and natural language processing. TCN leverages dilated causal convolutions to capture both local and global dependencies in the input sequence. The basic principle of the TCN model involves the use of dilated causal convolutions, which have receptive fields that grow exponentially with depth. This allows the TCN to capture long-term dependencies in the input sequence, making it suitable for tasks that require modeling temporal relationships. Figure 3 shows the schematic diagram of the TCN.

Figure 3. The schematic diagram of the principle of TCN.

In TCN, the input sequence is processed by a series of convolutional layers. Each convolutional layer applies a set of filters to the input sequence, extracting local patterns and features. The key difference in TCN is the use of dilated convolutions, which introduce gaps (dilation) between the filter elements. By increasing the dilation rate with each layer, TCN can capture dependencies over a larger range of time steps, effectively modeling both short-term and long-term dependencies. To ensure causality in the convolutional layers, padding is applied to the input sequence. This ensures that the filters only have access to past and current time steps, preventing information leakage from future time steps. The dilated convolutions allow TCN to have a large receptive field without increasing the number of parameters, making it computationally efficient.
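To make the padding and dilation concrete, the sketch below implements a single dilated causal convolution in NumPy: the sequence is left-padded so that output step t depends only on inputs at or before t, and stacking layers with dilations 1, 2, 4, ... grows the receptive field exponentially. The function and its arguments are illustrative, not the authors' implementation.

```python
import numpy as np

def dilated_causal_conv(x, w, dilation=1):
    """1-D dilated causal convolution: output[t] uses only x[t], x[t-d], x[t-2d], ...

    x: input sequence of shape (T,); w: filter taps, e.g. length 3.
    The sequence is left-padded with zeros so no future time step leaks into the output.
    """
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([sum(w[j] * xp[pad + t - j * dilation] for j in range(k))
                     for t in range(len(x))])

# Stacking layers with dilations 1, 2, 4, ... grows the receptive field exponentially:
# with kernel size 3 and L layers it covers 2 * (2**L - 1) + 1 time steps.
```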

In the proposed method, TCN plays a significant role in capturing the local and global dependencies in the input time series data. By applying dilated causal convolutions, TCN can capture temporal patterns and relationships at different time scales. This capability is crucial for accurate prediction and analysis of time series data, such as forecasting future load demand. TCN is integrated into the hybrid algorithm alongside other models, such as the GRU and Attention mechanism. The outputs of the TCN layers, along with the outputs of the GRU units, are fed into the Attention mechanism, which dynamically weighs the importance of different features. This integration allows the model to leverage the strengths of both TCN and GRU, capturing both short-term and long-term dependencies in the time series data and enhancing the overall prediction performance. The linear transformation at the core of the TCN can be written as:

$${\text{y}} = {\text{X}} \cdot {\text{W}} + {\text{b}}$$

(5)

where \({\text{y}}\) is the output vector of the TCN model, \({\text{X}}\) is the feature representation matrix of the input sequence, \({\text{W}}\) is the weight matrix, and \({\text{b}}\) is the bias vector. This formula represents the linear transformation part of the TCN model: the feature representation \({\text{X}}\) of the input sequence is multiplied by the weight matrix \({\text{W}}\) and added to the bias vector \({\text{b}}\) to obtain the output vector \({\text{y}}\). In TCN, the specific model architecture and parameter settings, such as the convolutional layers, dilation rates, and padding methods, determine the exact form of this transformation, and the non-linear activation functions and connections between layers can be adjusted based on the specific implementation.

TCN provides an effective and efficient solution for sequence modeling tasks by leveraging dilated causal convolutions to capture temporal dependencies and patterns. In the proposed method, TCN complements the GRU model and contributes to improved accuracy and reliability in load forecasting.

Attention mechanism

The Attention mechanism is a method used to enhance a model’s focus on different positions within an input sequence. It plays a crucial role in sequence modeling tasks such as machine translation, speech recognition, text generation, and more. Figure 4 shows the schematic diagram of the attention mechanism.

Figure 4. The schematic diagram of the principle of the Attention mechanism.

The basic principle of the Attention Mechanism is to dynamically allocate weights to different positions in the sequence based on their importance. These weights are then used to compute a weighted sum of the hidden states in the sequence, resulting in a context vector that is further processed by the model. There are three key components: query, key, and value. The query is a vector used to guide the attention process, while the key and value represent the hidden state vectors at different positions in the sequence. By calculating the similarity between the query and the keys, attention weights are obtained for each position. These weights are then used to weight the corresponding values and compute the context vector. The steps involved in the Attention mechanism are as follows:

  • Computing attention weights: By calculating the similarity between the query and the keys, attention weights are obtained for each position. Common methods for computing similarity include dot product attention, additive attention, and more. The attention weights are typically normalized using the softmax function.

  • Weighted sum: The attention weights are used to compute a weighted sum of the values, resulting in a context vector. The context vector represents a weighted average of the hidden states at different positions, with a focus on important positions.

  • Application of the context vector: The generated context vector can be used for further processing in the model, such as being used as input or as part of a prediction.

Here’s the equation representing the Attention mechanism:

$${\text{c}} = \mathop \sum \limits_{i = 1}^{n} \alpha_{i} \cdot {\text{h}}_{i}$$

(6)

where \({\text{c}}\) is the weighted context vector of the Attention mechanism, \(n\) is the length of the sequence, \(\alpha_{i}\) is the attention weight at position \(i\), and \({\text{h}}_{i}\) is the hidden state vector at position \(i\). This formula represents the weighted sum in the Attention mechanism: the attention weight \(\alpha_{i}\) at each position is multiplied by the corresponding hidden state vector \({\text{h}}_{i}\), and the products are summed to obtain the final weighted context vector \({\text{c}}\). The attention weights \(\alpha_{i}\) are typically normalized with a softmax function, and different mechanisms, such as dot product attention or additive attention, can be used to compute them depending on the task and model design. The weighted context vector \({\text{c}}\) from the Attention mechanism thus dynamically selects and weights the information in the sequence that is most relevant to the forecast.
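The weighted-sum step of Eq. (6) can be sketched directly; the function below uses dot-product scores between a query vector and the hidden states, followed by a softmax, which is one of the similarity choices mentioned above. Names and shapes are illustrative.

```python
import numpy as np

def attention_context(query, hidden_states):
    """Dot-product attention as in Eq. (6): softmax scores, then a weighted sum.

    query: (d,) vector guiding the attention; hidden_states: (n, d) matrix of h_i.
    Returns the context vector c and the attention weights alpha_i.
    """
    scores = hidden_states @ query                        # similarity between query and each h_i
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()                           # softmax normalization of the weights
    c = (alpha[:, None] * hidden_states).sum(axis=0)      # c = sum_i alpha_i * h_i
    return c, alpha
```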

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.
