How to Build a Profitable Dataset

The foregoing highlights the properties of publicly available irradiance datasets that can be used to build a dataset for analyzing the performance of solar power facilities. The following subsections describe how this information can be combined to create a dataset that supports the financial analysis of solar projects. Ideally, such a dataset would come from a high-quality, well-maintained, site-specific solar monitoring station that has been operating for 30 years or more. However, very few records of that length exist, and the stations that hold them are rarely located where large-scale solar installations are planned. Fortunately, multiple solar datasets and methods are available to characterize the solar resource and to describe hourly, monthly, and yearly incident energy in detail.

Solar radiation datasets exist for locations all over the world; this article uses the US solar radiation dataset as an example. Suppose a developer wants to build a large-scale solar project in the deserts of California. The developer must consider many factors, such as how the electricity will be transmitted from the project site, whether there is enough land to build the power plant and later expand it, and whether the site has a sufficient solar resource. The solar map on the official NREL website can identify areas with sufficient solar energy resources. Once a project location has been selected and the relevant land tenure rights obtained, the solar resource must be fully characterized in order to predict production capacity, optimally design the solar power plant, and obtain the funding required for the project. In the United States, publicly available solar resource data is archived in the NCDC’s NSRDB database. The National Renewable Energy Laboratory (NREL) also hosts the NSRDB data online. The database contains data from 239 observation points in the United States from 1961 to 1990, and data from 1,454 observation points in the United States from 1991 to 2010. The 239 observation points of the 1961-1990 dataset are also represented by corresponding observation points in the 1991-2010 dataset.

  1. Objectives of profitable datasets

The goal of creating a profitable dataset is to combine the disparate data into a reliable long-term record of solar irradiance for the project site. A reliable record is one in which the uncertainties and biases in the dataset are known and characterized in detail. In general, the solar resource at the project location can be estimated from satellite-modeled data. Satellite-derived irradiance data from the SUNY-Albany model in NREL’s NSRDB database (1998-2005) are available for most US locations on a 0.1° grid.

In the area between 25° and 50° latitude, a 0.1° grid cell is roughly 10 km (10,000 meters) across. Similar satellite datasets exist or are available for most locations within ±66° latitude. Satellite data covering 1998-2010, or even 1998-2012, are not sufficient to create a profitable dataset because they do not cover all scenarios, in particular the years affected by major volcanic eruptions.
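The "roughly 10,000 meters" figure can be checked with a few lines of arithmetic. The sketch below assumes a spherical Earth with a mean radius of 6371 km; the function name `grid_cell_size_km` is only illustrative and does not come from NSRDB or any NREL tool.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)

def grid_cell_size_km(lat_deg, step_deg=0.1):
    """Return the (north-south, east-west) size in km of a step_deg cell at lat_deg."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0  # about 111 km per degree
    north_south = step_deg * km_per_degree
    # Meridians converge toward the poles, so the east-west size shrinks with cos(latitude).
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west

for lat in (25, 37.5, 50):
    ns, ew = grid_cell_size_km(lat)
    print(f"lat {lat:>4}: {ns:.1f} km N-S x {ew:.1f} km E-W")
```

Under these assumptions the north-south size stays near 11 km, while the east-west size falls from about 10 km at 25° to about 7 km at 50°, so "10 km" is a rounded figure rather than an exact cell size.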

Therefore, it is essential to use datasets from nearby observation sites that either recorded solar irradiance during the eruption-affected years or have meteorological data that can be used to model the solar resource for that period. Fortunately, in the U.S. the NSRDB database covers the periods affected by eruptions, namely El Chichón in Mexico (1982-1984) and Pinatubo in the Philippines (1991-1994). Among the 1,454 observation points in the latest version of the NSRDB database, there are usually one or more located in the vicinity of a potential solar power plant, which can provide the long-term data needed to assess variations in the plant's output.

  2. Steps to create a profitable dataset

Since the solar resource at the location of the solar power plant may differ from the solar resource at the NSRDB observation point, the data in the database must be adjusted so that it more closely approximates the solar irradiance at the project site. The steps to adjust the data are as follows:

(1) Download satellite data from the NREL official website or other data sources, and download data from nearby NSRDB database observation points.

(2) Plot the daily GHI and DNI for the selected project site and for the nearby NSRDB observation points. Select the NSRDB observation point that most closely mimics the solar resource at the site under evaluation (Perez et al., 2008).

(3) Use the satellite data to determine the average difference in solar irradiance between the NSRDB observation point and the project site, and apply that difference to the NSRDB data so that it simulates the solar irradiance at the project site. Because cloud conditions in a given month may differ between the NSRDB observation point and the target site, the comparison and adjustment should be made on a monthly (or shorter) basis.
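Step (3) can be sketched as a month-by-month ratio taken from the satellite data and applied to the long-term station record. This is a minimal illustration, not the procedure of any particular tool: the function names, the data layout (dictionaries keyed by month, lists of `(month, ghi)` pairs), and all numbers are hypothetical.

```python
def monthly_ratios(sat_site, sat_station):
    """Ratio of satellite monthly mean GHI, project site over station, for each month."""
    return {month: sat_site[month] / sat_station[month] for month in sat_site}

def adjust_station_record(daily_records, ratios):
    """Scale every (month, ghi) value of the station record by that month's ratio."""
    return [(month, ghi * ratios[month]) for month, ghi in daily_records]

# Hypothetical satellite monthly means in kWh/m^2/day for January (1) and July (7).
sat_site = {1: 3.3, 7: 8.0}
sat_station = {1: 3.0, 7: 8.0}

# Hypothetical daily GHI values from the long-term station record.
station_daily = [(1, 2.0), (1, 3.0), (7, 8.2)]

ratios = monthly_ratios(sat_site, sat_station)
adjusted = adjust_station_record(station_daily, ratios)
print(ratios)
print(adjusted)
```

Keeping one ratio per month, rather than a single annual ratio, is what lets the adjustment follow month-to-month differences between the two locations.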

If the difference between the NSRDB observation point and the target site is only a few percent, a simple ratio may suffice for the adjustment. However, if the difference is significant, the statistical characteristics of solar radiation must be considered. As first shown by Liu and Jordan and later confirmed by many researchers (Vignola and McDaniel, August 1991; Liu and Jordan 1960; Vijayakumar et al. 2005), the distribution of the daily clearness index is related to the monthly average clearness index. If, in a given month, the monthly clearness index at the solar power plant is on average 10% higher than at the selected NSRDB observation site, the data from the NSRDB observation site must be revised to account for that 10% difference. By comparing the distributions of the data recorded at sites with overlapping records, a pattern can be derived to guide the adjustment at sites without overlap. The point is to ensure that the adjusted data for the month match the expected distribution of the clearness index.
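To compare index distributions, the daily index has to be computed first. The sketch below assumes the index meant here is the clearness index in the sense of Liu and Jordan, that is, daily horizontal irradiation divided by the extraterrestrial irradiation on a horizontal surface. It uses the standard textbook expressions for declination and sunset hour angle, a solar constant of 1367 W/m² (an assumed value), and is only valid for latitudes without polar day or night. The function names are illustrative.

```python
import math

SOLAR_CONSTANT = 1367.0  # W/m^2, a commonly used value (assumption)

def extraterrestrial_daily(lat_deg, day_of_year):
    """Daily extraterrestrial irradiation on a horizontal surface, in kWh/m^2."""
    phi = math.radians(lat_deg)
    # Approximate solar declination for the given day of the year.
    decl = math.radians(23.45) * math.sin(2 * math.pi * (284 + day_of_year) / 365)
    sunset = math.acos(-math.tan(phi) * math.tan(decl))  # sunset hour angle, radians
    # Correction for the varying Earth-Sun distance.
    ecc = 1 + 0.033 * math.cos(2 * math.pi * day_of_year / 365)
    h0_wh = (24 / math.pi) * SOLAR_CONSTANT * ecc * (
        math.cos(phi) * math.cos(decl) * math.sin(sunset)
        + sunset * math.sin(phi) * math.sin(decl)
    )
    return h0_wh / 1000.0

def clearness_index(ghi_daily_kwh, lat_deg, day_of_year):
    """Daily clearness index: measured daily GHI over the extraterrestrial value."""
    return ghi_daily_kwh / extraterrestrial_daily(lat_deg, day_of_year)

# Hypothetical example: 8.0 kWh/m^2 measured at 35 degrees north on day 172.
print(round(clearness_index(8.0, 35.0, 172), 2))
```

With daily values of the index for both sites, the monthly means and the spread of the daily values can be compared directly, which is the comparison the paragraph above calls for.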

  3. NASA/SSE data and ground-based measurements

Because each NASA/SSE value covers a much larger area (a 1° grid), the data can differ considerably from measurements at a single point. Unless there is a problem with the data, however, the downloaded NASA/SSE satellite data should track the long-term variations in the NSRDB data, as shown in Figure 5.8. Where gaps in the meteorological data used to create the NSRDB data have been filled, there are often large differences between the NSRDB and NASA/SSE databases. Such NSRDB data are also flagged as having greater uncertainty.

Ground-based measurements are of great benefit in building a profitable dataset because they increase confidence in the dataset and reduce its uncertainty. The mean bias error (MBE) of satellite data is typically small, on the order of a few percent of the GHI (Zelenka et al. 1999; Perez et al. July 1987; Renne et al. 1999; Hoff and Perez 2012; Myers et al. 1989; Nottrott and Kleissl 2010). Even so, ground-based measurements can be used to validate satellite-modeled data and to help identify any systematic issues, for example in snow-covered areas or areas with large variations in surface albedo.
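The MBE referred to above is simply the mean of the differences between modeled and measured values. A minimal sketch follows; the function names and the sample values are hypothetical and only illustrate the arithmetic.

```python
def mean_bias_error(modeled, measured):
    """MBE = mean(modeled - measured); a positive value means the model overestimates."""
    diffs = [m - o for m, o in zip(modeled, measured)]
    return sum(diffs) / len(diffs)

def relative_mbe_percent(modeled, measured):
    """MBE expressed as a percentage of the mean measured value."""
    mean_measured = sum(measured) / len(measured)
    return 100.0 * mean_bias_error(modeled, measured) / mean_measured

# Hypothetical daily GHI in kWh/m^2: satellite-modeled versus ground-measured.
satellite = [5.1, 6.3, 7.0, 4.2]
ground = [5.0, 6.0, 7.1, 4.1]

print(f"MBE: {mean_bias_error(satellite, ground):.2f} kWh/m^2")
print(f"Relative MBE: {relative_mbe_percent(satellite, ground):.1f} %")
```

Because positive and negative differences cancel, a small MBE does not rule out large errors on individual days, which is one reason ground data remain useful for validation.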

Ground-based measurements from observation sites that have only a global (total) pyranometer and no systematic maintenance inevitably carry considerable uncertainty. The reason is that there are limited ways to validate the data, and sometimes dirt, moisture, or other factors reduce the accuracy of the data. Therefore, at least one year of accurate, validated ground-based data should be obtained in order to validate the satellite data for the site and tighten its uncertainty limits. This data is also valuable in the design and future operation of the solar power plant. However, as explained in the previous section on NSRDB and satellite data, one year of ground-based measurements has the same limitation: it cannot cover the full range of long-term conditions. Therefore, concurrent satellite data for the location must be purchased. In addition, the current satellite data model must be compared with the historical satellite data values in the NSRDB. Some overlap between the two records is also valuable, because the overlapping data can help adjust the dataset to a common standard. This procedure is typically called Measure-Correlate-Predict (MCP).
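One common way to carry out the correlate-and-predict part is an ordinary least-squares line fitted over the period in which ground and satellite data overlap, then applied to the long-term satellite record. The sketch below assumes a linear relationship, which the article does not prescribe; the function names and values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mcp_predict(long_term_satellite, concurrent_satellite, concurrent_ground):
    """Correlate over the overlap period, then predict site values for the long term."""
    slope, intercept = fit_line(concurrent_satellite, concurrent_ground)
    return [slope * value + intercept for value in long_term_satellite]

# Hypothetical monthly mean GHI in kWh/m^2/day for the overlap period.
concurrent_satellite = [4.0, 5.0, 6.0, 7.0]
concurrent_ground = [4.2, 5.25, 6.3, 7.35]

# Hypothetical long-term satellite values to be translated to the site.
long_term_satellite = [5.0, 6.5]
print(mcp_predict(long_term_satellite, concurrent_satellite, concurrent_ground))
```

In practice the fit would usually be done per month or per season and checked for scatter before being trusted, since a single year of overlap may not represent all conditions.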

The NASA/SSE dataset is the only freely available, up-to-date dataset on the Internet that covers all parts of the world. For a location in a region of the world, or for a recent period, that is not included in the NSRDB or similar datasets, the historical data will mostly come from the NASA dataset. In such cases, incorporating ground-based data into the resource analysis is a top priority because it increases the confidence in the satellite results.

The NASA/SSE dataset provides the long-term record of worst-case scenarios needed for P95 and P99 calculations. In the absence of such scenarios, an analysis-first, deterministic-second approach is used.
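A P95 or P99 value is usually understood as the annual value that is expected to be met or exceeded with 95% or 99% probability. The sketch below assumes that definition and further assumes that annual totals are normally distributed, which is a simplification the article does not state. The function name and the annual totals are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def exceedance_value(annual_totals, probability):
    """Annual value met or exceeded with the given probability (normal assumption)."""
    dist = NormalDist(mean(annual_totals), stdev(annual_totals))
    # Exceeded with probability p corresponds to the (1 - p) quantile.
    return dist.inv_cdf(1.0 - probability)

# Hypothetical annual GHI totals in kWh/m^2 from a long-term record.
annual_ghi = [2010, 2055, 1980, 2040, 1890, 2025, 2060, 1995, 1930, 2035]

for p in (0.50, 0.95, 0.99):
    print(f"P{int(p * 100)}: {exceedance_value(annual_ghi, p):.0f} kWh/m^2")
```

The low years in the record widen the spread and therefore pull P95 and P99 down, which is why a record that includes the worst-case years matters for these figures.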

