Wind Power Generation


Utilizing renewable energy sources has become a global trend, as countries around the world have rapidly increased their share of renewable energy supplies such as solar, wind, tides and waves, and geothermal heat. While renewable sources are clean and environmentally friendly compared to traditional sources, solar and wind energy depend on the weather and can be unstable energy supplies. It is therefore important to predict their energy output in a timely and accurate way to ensure optimal supply and demand planning in an electric grid. The goal of this study is to investigate the power output of a wind turbine. Wind turbines convert the mechanical power of the wind into electricity, which makes wind energy very sensitive to weather factors such as wind speed. Machine learning (ML) techniques can play a vital role in addressing this problem when weather records are available. Here we apply SpeedWise® Machine Learning (a commercial AutoML solution) to predict wind energy production.
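The strong dependence on wind speed follows from the standard textbook expression for the power a turbine extracts from the wind (a general physical relation, not something fitted to this dataset). Here ρ is air density, A the rotor swept area, C_p the power coefficient (bounded above by the Betz limit, 16/27 ≈ 0.593) and v the wind speed:

```latex
P = \tfrac{1}{2}\,\rho\,A\,C_p\,v^{3}
```

Because power scales with the cube of wind speed, doubling v multiplies the available power by eight, until the turbine reaches its rated output and its controls hold power roughly constant.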

Problem Description, Solution and Machine Learning Results

Problem Description

This dataset was gathered from a wind turbine over a one-year period at 10-minute intervals and includes power generation data and weather records. Our goal is to accurately model the turbine's power generation based on the parameters measured in the weather records. For that we used SpeedWise Machine Learning (SML), a powerful AutoML solution that can automatically deal with the idiosyncrasies of real data (the dataset considered here is large, incomplete and noisy) while automatically extracting patterns, trends and correlations from it.

What were we looking to accomplish with machine learning?

The idea is to build a model that can automatically identify the relationship between wind power generation and a set of recorded variables (time, wind speed, wind direction, etc.). Our target in this case is therefore the wind power generation. Because power generation takes continuous values, this is a regression problem. Once an optimal machine learning model is identified, we review the evaluation metrics for this regression problem to understand how well the model predicts power generation. We assume that the meteorological inputs are known and can be used to generate the prediction.
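The text does not list which regression metrics SML reports; mean absolute error (MAE) and root mean squared error (RMSE) are two common choices, both expressed in the target's units (kW here). A minimal sketch with hypothetical readings:

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    """Reject empty or mismatched inputs."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the target's units (kW here)."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error; weights large misses more heavily than MAE."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    observed = [120.0, 340.0, 560.0, 800.0]   # hypothetical power readings, kW
    predicted = [100.0, 360.0, 550.0, 830.0]  # hypothetical model output, kW
    print(f"MAE  = {mae(observed, predicted):.1f} kW")   # 20.0 kW
    print(f"RMSE = {rmse(observed, predicted):.1f} kW")  # 21.2 kW
```

RMSE is never smaller than MAE; a large gap between the two signals a few large misses, which matter for grid planning.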

Unwrapping the Data

The dataset for this study includes a total of 58,415 entries with 13 features (attributes) each, i.e., a table of roughly 760,000 cells (58,415 × 13). Figure 1 shows the different measurement points on our wind turbine. The meteorological measurements include a date and time stamp for each observation. The system records include temperature measurements of different parts, nacelle position, rotor data, etc. This information can be used to identify faulty or suboptimally performing equipment. The generation data consists of a record of the amount of power generated by the turbine (in kW). Observations are recorded at 10-minute intervals.

Figure 1: Wind energy diagram.

Data preprocessing is an essential step in building a robust and meaningful machine learning model. SML allows users to cleanse their data efficiently, systematically and quickly. For this dataset, our first step is to convert the date and time parameters into a more useful format. Next, we handle missing data using SML's automatic imputation step. Alternatively, users can remove data by column or by row, operations that are also available in SML. In this problem the system records should be removed, since they will not be available when new predictions are generated. In fact, the system records could serve as the target of a different machine learning problem, such as predicting the failure of a particular part. To get a better understanding of the data, several variables were visualized using the Visualization function in SML.
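The three preprocessing steps above (dropping system records, parsing the timestamp, imputing gaps) can be sketched outside SML as follows. The column names and timestamp layout are hypothetical, and column-mean imputation is a simple stand-in: SML's own imputation method is not described in the text.

```python
from datetime import datetime
from statistics import mean
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical column names; the real dataset's headers are not given in the text.
SYSTEM_COLUMNS = {"GearboxTemperature", "NacellePosition", "RotorRPM"}
NUMERIC_INPUTS = ["WindSpeed", "WindDirection", "AmbientTemperature"]
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def parse_timestamp(row: Row) -> Row:
    """Replace the raw timestamp string with numeric calendar features."""
    ts = datetime.strptime(str(row.pop("timestamp")), TIMESTAMP_FORMAT)
    row["month"] = ts.month
    row["day_of_year"] = ts.timetuple().tm_yday
    row["hour"] = ts.hour + ts.minute / 60.0  # fractional hour of the day
    return row


def preprocess(rows: List[Row]) -> List[Row]:
    """Drop system records, parse timestamps and mean-impute missing inputs."""
    # 1. System records would not be available at prediction time, so drop them.
    #    Building new dicts here also leaves the caller's rows unmodified.
    cleaned = [{k: v for k, v in r.items() if k not in SYSTEM_COLUMNS} for r in rows]
    # 2. Turn the date/time string into features a regression model can use.
    cleaned = [parse_timestamp(r) for r in cleaned]
    # 3. Fill gaps with the column mean (a simple stand-in for automatic imputation).
    for col in NUMERIC_INPUTS:
        observed = [r[col] for r in cleaned if r.get(col) is not None]
        fill = mean(observed) if observed else 0.0
        for r in cleaned:
            if r.get(col) is None:
                r[col] = fill
    return cleaned


if __name__ == "__main__":
    sample = [
        {"timestamp": "2019-01-01 00:00:00", "WindSpeed": 5.0, "WindDirection": 210.0,
         "AmbientTemperature": None, "RotorRPM": 9.1, "ActivePower": 310.0},
        {"timestamp": "2019-01-01 00:10:00", "WindSpeed": None, "WindDirection": 205.0,
         "AmbientTemperature": 12.0, "RotorRPM": 9.4, "ActivePower": 330.0},
    ]
    for row in preprocess(sample):
        print(row)
```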

Figure 2: Visualization of the data in the processing step

The plots in Figure 2 show that wind speed has a strong correlation with energy generation. Power generation also stops increasing once wind speed exceeds 7.5 mi/hr. No clear trend is observed between energy generation and wind direction. The data also show that the nacelle position follows the wind direction, suggesting that the wind tracker performs well. To further visualize the relationships between variables, a Pearson correlation heatmap was plotted (Figure 3). It shows a positive correlation between power generation and wind speed, wind direction and time of day.
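Each cell of a Pearson heatmap is the coefficient r = cov(x, y) / (σ_x σ_y) for one pair of columns. A minimal from-scratch sketch (column names and values are hypothetical):

```python
import math
from typing import Dict, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def correlation_matrix(columns: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Pairwise Pearson r for every pair of named columns (the numbers behind a heatmap)."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}


if __name__ == "__main__":
    # Hypothetical readings: power rises with wind speed, then levels off.
    cols = {
        "wind_speed": [2.0, 4.0, 6.0, 8.0, 10.0],
        "power_kw": [10.0, 150.0, 500.0, 900.0, 950.0],
    }
    for (a, b), r in correlation_matrix(cols).items():
        print(f"{a:>10} vs {b:<10} r = {r:+.3f}")
```

Pearson r measures only linear association, so a relationship that plateaus (as power does at high wind speed) can be strong yet yield r noticeably below 1.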

Figure 3: Pearson correlation of the data


SpeedWise Machine Learning generated good machine learning models for the problem at hand in a matter of minutes. In this case, SML found an XGBoost model that explains 97% of the variation in wind power generation. Some key metrics of the model's performance are shown in Figure 4. The Ground Truth vs. Prediction plot is used to evaluate the accuracy of the model: the gray points are training results and the green points are validation and test results. The points lie along the unit (y = x) line, indicating that the model predicts wind power accurately. Finally, a feature contribution plot was generated to identify the heavy hitters for this model; the variable most relevant to the prediction was wind speed.
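"Explains 97% of the variation" corresponds to a coefficient of determination R² = 1 − SS_res / SS_tot of 0.97 (we assume this is the metric behind the figure; the text does not name it or the data split it was computed on). A minimal sketch:

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    n = len(y_true)
    if n < 2 or n != len(y_pred):
        raise ValueError("need at least two paired observations")
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variation
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant target")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    observed = [120.0, 340.0, 560.0, 800.0]   # hypothetical kW
    predicted = [100.0, 360.0, 550.0, 830.0]
    print(f"R^2 = {r_squared(observed, predicted):.3f}")
```

A model that always predicts the mean scores 0, and points falling exactly on the unit line of a Ground Truth vs. Prediction plot score 1.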

Figure 4: Results of the machine learning model.

To further explore the relationship between the model features and energy generation, SML generates partial dependence plots (Figure 5). A partial dependence plot shows whether the relationship between the target and a feature is linear, monotonic or more complex; for example, it can reveal whether predicted power levels off at higher wind speeds, as the raw data in Figure 2 suggested. This analysis provides insight into the effect that individual features have on the predicted outcome of a machine learning model.
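Partial dependence can be computed for any model by forcing one feature to each value on a grid for every row and averaging the predictions. The sketch below uses a hypothetical toy model (`toy_power_model`, with a made-up rated power) in place of the trained XGBoost model, and hypothetical column names; it illustrates the technique, not SML's implementation.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]

RATED_KW = 2000.0  # hypothetical rated power of the toy turbine


def toy_power_model(row: Row) -> float:
    """Hypothetical stand-in for a trained model: cubic rise in wind speed,
    capped at rated power, plus a small temperature term."""
    return min(RATED_KW, 2.0 * row["WindSpeed"] ** 3) + 0.5 * row["AmbientTemperature"]


def partial_dependence(model: Callable[[Row], float], rows: Sequence[Row],
                       feature: str, grid: Sequence[float]) -> List[float]:
    """Average prediction when `feature` is set to each grid value for every row.

    The other features keep their observed values, so the curve shows the
    marginal effect of `feature` as learned by the model.
    """
    return [mean(model({**row, feature: value}) for row in rows) for value in grid]


if __name__ == "__main__":
    data = [
        {"WindSpeed": 4.0, "AmbientTemperature": 10.0},
        {"WindSpeed": 9.0, "AmbientTemperature": 20.0},
    ]
    grid = [0.0, 5.0, 10.0, 15.0]
    for v, p in zip(grid, partial_dependence(toy_power_model, data, "WindSpeed", grid)):
        print(f"WindSpeed = {v:5.1f} -> mean predicted power {p:7.1f} kW")
```

With this toy model the curve flattens once the cap is reached, which is the kind of plateau a partial dependence plot makes visible.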

Figure 5: Partial dependence plots of the model's features.

When generating a forecast, uncertainty quantification is useful for understanding the risk associated with the predictions. Since our data is subject to sampling bias, which carries over into the trained model, our goal is to provide a distribution describing how sensitive each prediction is to that bias in the training data. In Figure 6, the predictions are sorted from low to high (blue line). The grey band describes the uncertainty and has an 80% chance of containing the true value (green stars). Because a steady supply of energy is expected from the grid, energy producers can be penalized with substantial government fines in the case of power outages. This model shows that wind energy is highly dependent on environmental factors such as wind speed. It is therefore critical for energy suppliers to predict wind energy production accurately in order to maximize profits.
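The text does not say how SML builds its 80% band. One common way to describe a prediction's sensitivity to the training sample is the bootstrap: refit the model on training rows resampled with replacement and read off the 10th and 90th percentiles of the resulting predictions. The sketch below assumes a hypothetical one-feature linear model and synthetic data, and adds a resampled residual to each draw so the band approximates a prediction interval rather than only model uncertainty.

```python
import random
from statistics import quantiles
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares intercept a and slope b for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical")
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def bootstrap_interval(xs: Sequence[float], ys: Sequence[float], x_new: float,
                       n_boot: int = 500, seed: int = 0) -> Tuple[float, float, float]:
    """10th percentile, median and 90th percentile of bootstrap predictions at x_new.

    Each replicate refits on rows resampled with replacement (sampling variability)
    and adds one of that fit's residuals (observation noise), so the 10th-90th
    percentile band is an approximate 80% prediction interval.
    """
    rng = random.Random(seed)
    n = len(xs)
    draws: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        except ValueError:
            continue  # degenerate resample; skip it
        residuals = [ys[i] - (a + b * xs[i]) for i in idx]
        draws.append(a + b * x_new + rng.choice(residuals))
    cuts = quantiles(draws, n=10)  # nine cut points: 10th, 20th, ..., 90th percentiles
    return cuts[0], cuts[4], cuts[8]


def empirical_coverage(truth: Sequence[float], lower: Sequence[float],
                       upper: Sequence[float]) -> float:
    """Fraction of true values inside their intervals; about 0.8 if calibrated."""
    hits = sum(lo <= t <= hi for t, lo, hi in zip(truth, lower, upper))
    return hits / len(truth)


if __name__ == "__main__":
    gen = random.Random(42)
    xs = [gen.uniform(0.0, 10.0) for _ in range(200)]         # synthetic feature
    ys = [3.0 + 2.0 * x + gen.gauss(0.0, 1.0) for x in xs]      # synthetic target
    lo, mid, hi = bootstrap_interval(xs, ys, x_new=5.0)
    print(f"prediction at x=5: {mid:.2f} (80% band {lo:.2f} to {hi:.2f})")
```

Checking `empirical_coverage` on held-out data is a simple way to confirm that a band described as 80% really contains about 80% of the true values.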

Figure 6: Uncertainty quantification plot

Key Insights

• We built an ML model to predict wind power generation that explains 97% of the variation in energy generation, using a dataset that includes observations such as wind speed, wind direction, temperature and time.

• These results show that accurate predictive models can be obtained very easily, even for relatively large datasets, by applying SML's smart built-in automated capabilities.

• The variable most relevant to the prediction of wind power generation was found to be wind speed.

• An uncertainty quantification analysis was executed to understand the risk associated with the predictions.

How the AutoML Workflow Solved This Problem

For this study we went through five basic steps in SML, which are standard in this technology:

1. Load Data: This is achieved with a simple browser-based "drag and drop" of the original dataset file (a .csv file in our case). A necessary step for the user was to define the output variable we wanted to predict and to indicate whether this was a classification or regression problem (in this case, a regression problem).

2. Clean and Visualize Data: The original data required attention before it could be effectively used to build a machine learning model. For instance, we had to change the date format so it could be used in a regression model. We solved this with the Datetime Parsing feature, which extracts information from a date-time string. Data visualization helped evaluate the nature of the data used in our problem, and specialized actions could be taken based on that visualization exercise. Nevertheless, SML offers an autopilot option that automatically deals with most of these data processing issues and generates a well-conditioned dataset, which is very handy for users who lack a data science background. This process also includes appropriately splitting the data into training, validation and test sets, which SML facilitates in a smart way.

3. Machine Learning Model Building/Optimization: SML allows the user to choose from a variety of machine learning models, or they can try all of them if desired. For each model, a hyperparameter optimization process is also necessary to identify the best possible machine learning configuration. This technology leverages cloud computing to carry out this model building and optimization process in a very efficient manner.

4. Machine Learning Model Evaluation and Uncertainty Quantification: Once the best possible model is identified, a series of quantitative metrics and plots are used to properly evaluate the model (we showed some of those above).

5. Machine Learning Model Deployment: While deployment was not the main objective of this study, it is also possible within SML to generate an API (in Python, MATLAB and/or JavaScript). This would facilitate deployment of the model and automation of predictions on new data as it becomes available.
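Steps 2 and 3 above mention splitting the data and optimizing hyperparameters; SML's exact strategies are not described here. A minimal sketch of one common approach for time-ordered 10-minute data: split chronologically (shuffling would leak near-identical neighbouring readings into the test set), then pick a hyperparameter by validation error. A 1-D k-nearest-neighbour regressor with a single hyperparameter k is used purely for illustration; it is not the XGBoost model from this study, and all data are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(rows: Sequence[T], train_frac: float = 0.7,
                        val_frac: float = 0.15) -> Tuple[List[T], List[T], List[T]]:
    """Split time-ordered rows into train/validation/test blocks without shuffling."""
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty test share")
    n_train = int(len(rows) * train_frac)
    n_val = int(len(rows) * val_frac)
    return (list(rows[:n_train]),
            list(rows[n_train:n_train + n_val]),
            list(rows[n_train + n_val:]))


def knn_predict(train_x: Sequence[float], train_y: Sequence[float], x: float, k: int) -> float:
    """1-D k-nearest-neighbour regression: mean target of the k closest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def tune_k(train_x: Sequence[float], train_y: Sequence[float],
           val_x: Sequence[float], val_y: Sequence[float],
           candidates: Sequence[int]) -> Tuple[int, float]:
    """Grid search: return the k with the lowest validation RMSE, and that RMSE."""
    best_k: Optional[int] = None
    best_err = math.inf
    for k in candidates:
        if not 1 <= k <= len(train_x):
            continue  # skip values the training set cannot support
        preds = [knn_predict(train_x, train_y, x, k) for x in val_x]
        err = rmse(val_y, preds)
        if err < best_err:
            best_k, best_err = k, err
    if best_k is None:
        raise ValueError("no valid candidate k")
    return best_k, best_err


if __name__ == "__main__":
    # Hypothetical time-ordered (wind_speed, power_kw) readings.
    series = [(float(v), min(2000.0, 2.0 * v ** 3)) for v in (i % 13 for i in range(200))]
    train, val, test = chronological_split(series)
    k, err = tune_k([x for x, _ in train], [y for _, y in train],
                    [x for x, _ in val], [y for _, y in val], candidates=[1, 3, 5, 9])
    print(f"selected k={k} (validation RMSE {err:.1f} kW); {len(test)} rows held out")
```

The test block is touched only once, after k is fixed, so its error remains an honest estimate of performance on new data.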