This is a story of Big Data program management and preparation. We could all learn a little...
Two years ago, Duke Energy set out to better understand the value of integrating siloed datasets and streaming data from field devices used in managing its smart grid system. Duke acknowledged that it "does not know what it does not know" and that there is much to learn about data ingestion, storage techniques, modeling, visualization and the value of integrated data. So Duke Energy developed a Sandbox, a data model and dataset that combined data elements from various systems, to analyze and identify new value opportunities for managing the smart grid system.
To accelerate this process, Duke Energy sponsored the Data Modeling and Analytics Initiative (DMAI), an innovative forum in which big data experts were given a slice of the dataset to analyze for opportunities and insights that Duke Energy could incorporate into its big data analytics strategy and activities.
Seventeen vendors submitted final reports that discussed data issues, the models and tools used to analyze the data, and use cases that could be developed for new value opportunities. Responses varied considerably based on the skills and expertise of each vendor. Vendors provided more than 150 unique use cases for consideration (see Figure 1). For each category, the figure shows the number of use cases (with duplicates omitted) along with the number of vendors that provided use cases in that category, which indicates how popular each category was with the vendors. For example, most vendors provided use cases in the meter analytics and customer analysis categories. Conversely, there were many use cases in the distribution grid analysis category, but only half the vendors provided examples. Surprisingly, only two vendors provided use cases that address security. Selection of vendors for future discussions might depend on which use case categories are given the highest priority.
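The kind of tally behind a figure such as Figure 1 can be sketched in a few lines of Python. This is only an illustration: the vendor names, categories and use cases below are hypothetical, and treating two use cases as duplicates when their text matches after normalization is an assumption, not the method used in the DMAI.

```python
from collections import defaultdict

def tally_use_cases(submissions):
    """Count unique use cases and contributing vendors per category.

    submissions: iterable of (vendor, category, use_case) tuples.
    Returns {category: (unique_use_case_count, vendor_count)}.
    """
    use_cases = defaultdict(set)
    vendors = defaultdict(set)
    for vendor, category, use_case in submissions:
        # Normalize the text so the same use case from two vendors counts once.
        use_cases[category].add(use_case.strip().lower())
        vendors[category].add(vendor)
    return {c: (len(use_cases[c]), len(vendors[c])) for c in use_cases}

# Hypothetical sample data, not taken from the vendor reports.
sample = [
    ("Vendor A", "meter analytics", "Theft detection"),
    ("Vendor B", "meter analytics", "theft detection"),
    ("Vendor B", "meter analytics", "Meter failure prediction"),
    ("Vendor A", "security", "Intrusion alerts"),
]
summary = tally_use_cases(sample)
print(summary)
```

A category with many use cases but few vendors, like distribution grid analysis in the article, would show up here as a large first number next to a small second one.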
Use Case Value
As part of the DMAI, Duke Energy asked the vendors to provide business case value estimates for the use cases they provided. Value could come in the form of reduced operational expenses, increased revenues from existing or new product or service ideas, or protection of existing revenues from increased reliability, customer satisfaction, theft reduction, etc.
Understanding that providing a financial metric would be difficult, vendors were instead asked to use one of the following codes:
To discourage guessing, Duke stressed that "N/A" should be specified when the vendor did not have specific experiences and insights in providing analytics within the category.
The value estimates were based on a wide range of assumptions. Some vendors developed "value potential" estimates. These represented the lifetime value of the use case. Lifetimes ranged from five to 20 years. Others developed a value per year estimate. Figure 2 summarizes use case counts and use case category value.
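Because some estimates covered a whole lifetime and others a single year, they have to be put on a common basis before they can be compared. A minimal sketch, assuming simple straight-line division with no discounting (the function name, the `basis` labels and the dollar figures are all hypothetical):

```python
def annualized_value(value, basis, lifetime_years=None):
    """Convert a value estimate to a per-year figure.

    basis: "per_year" or "lifetime" (hypothetical labels).
    Straight-line only; a real business case would likely discount cash flows.
    """
    if basis == "per_year":
        return float(value)
    if basis == "lifetime":
        if lifetime_years is None or lifetime_years <= 0:
            raise ValueError("lifetime estimates need a positive lifetime_years")
        return value / lifetime_years
    raise ValueError(f"unknown basis: {basis!r}")

# The same hypothetical lifetime value spread over the shortest and longest
# lifetimes mentioned in the article gives very different annual figures.
short_life = annualized_value(10_000_000, "lifetime", lifetime_years=5)
long_life = annualized_value(10_000_000, "lifetime", lifetime_years=20)
print(short_life, long_life)
```

The fourfold gap between the two results shows why the lifetime assumption needs to be standardized before estimates from different vendors are consolidated.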
The following insights came from review of the vendor use cases and value estimate responses:
- The average score for each category ranged from 2.6 to 3.6. Although this was expected, it is more interesting that value scores varied widely within most categories. Every category received the highest score (5-$$$$$) from at least one vendor.
- In every category, at least one financial metric or rule of thumb was given as the basis for value estimation. These will be reviewed and standardized before a consolidated business case is developed.
- Many examples of value came from applications that crossed more than one use case category. Some examples of cross-value include:
- Program development. Incorporates customer analytics, energy efficiency and demand response information and potential benefits; and
- Load forecasting. Considered a building block for other categories, such as demand response, energy efficiency or distribution grid analysis.
- Significant potential value might come from the incorporation of social media, socioeconomic and interval data to segment new customers and identify new revenue opportunities.
- Duke expects operational efficiency percentage gains through improved grid control based on one minute of data sampled from the supply and demand sides of the electric grid.
- Weather is one of the most influential independent variables in load forecasts. In the future, collection of streaming microcell weather data will sharpen local forecasting and, in turn, improve efficiency.
- Analysis of failure signatures, steady-state performance and cold-load pickup for both in-home and more expensive capital-intensive grid devices will show the benefits of condition-based monitoring and lead to new value streams.
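The role of weather as an independent variable in load forecasting can be illustrated with a one-variable least-squares fit. This is a deliberately small sketch, not a description of any vendor's model: the temperature and load figures are invented, and a real forecast would use many more variables and far more data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical observations: temperature (deg C) vs. feeder load (MW).
temps = [20, 25, 30, 35]
loads = [100, 110, 120, 130]

intercept, slope = fit_line(temps, loads)
forecast_at_40 = intercept + slope * 40  # extrapolated load at 40 deg C
print(intercept, slope, forecast_at_40)
```

Finer-grained local weather readings would simply give such a model a temperature input that better matches each part of the grid.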
Data Analytics Tools Used by Vendors
There was significant breadth and diversity in the software tools used by the vendors in the DMAI. The tools are organized into the process categories of ingestion, storage, visualization, analytics and business intelligence (BI) (see Figure 3).
Each vendor has a different history, which shows in its approach to analytics and reveals its strengths.
Each vendor in the DMAI also has strong interest in competing in the energy sector analytics space. Vendor expertise can be analyzed through the following characterizations:
- Energy sector expertise. These vendors have worked in the energy sector many years. They have hardware products, software systems and services that address the needs of this sector. They have moved into analytics through organic development or by purchasing or partnering with analytics and information technology (IT) companies.
- Analytics expertise. These vendors have deep analytics skills and approach the market by providing solutions across business sectors.
- IT expertise. These vendors have focused on the IT solutions that support the big data analytics ecosystems. Their products can be considered the base infrastructure for analytics and the enablers for handling big data. Many are working to grow their businesses into vertical offerings by business sector. Some have their beginnings in data warehouse products; others began with visualization tools.
The IT expertise space often includes open-source software (OSS) solutions. Standards do not exist in the visualization, analytics and business intelligence (BI) categories. Therefore, it is not typical to find OSS in these categories. Also, no one vendor can provide the best-in-class solution for all three skill categories.
Duke Energy took a cursory look at these tools, and by doing so, the following insights arose:
Ingestion. The data ingestion process involves processing data from sources and cleaning the data as it comes in. This process is called ETL (extract, transform and load). It involves understanding how to communicate with the data sources, rules to transform the data into formats that are recognizable by downstream systems and data-loading into new data stores. This process is costly and time-consuming and warrants questions on the initial data quality and formatting because significant cleaning and transforming is needed. Future systems, based on a distributed architecture enabling the autonomous grid, will address this process by standardizing translation and contextualization processes at the edge.
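The three ETL steps can be sketched with Python's standard library. Everything specific here is an assumption made for illustration: the column names, the two timestamp formats, the rule that rows with missing readings are rejected, and the use of an in-memory SQLite table as the target store.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical raw extract: one missing reading, two timestamp formats.
RAW = """meter_id,timestamp,kwh
M1,2014-01-01 00:00,1.2
M2,2014-01-01 00:00,
M1,01/01/2014 00:15,1.4
"""

def extract(text):
    """Extract: read source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def parse_timestamp(value):
    """Try each known source format; return None if none matches."""
    for fmt in ("%Y-%m-%d %H:%M", "%m/%d/%Y %H:%M"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None

def transform(rows):
    """Transform: standardize timestamps and set aside unusable rows."""
    clean, rejected = [], []
    for row in rows:
        ts = parse_timestamp(row["timestamp"])
        try:
            kwh = float(row["kwh"])
        except (TypeError, ValueError):
            kwh = None
        if ts is None or kwh is None:
            rejected.append(row)
        else:
            clean.append((row["meter_id"], ts.isoformat(), kwh))
    return clean, rejected

def load(conn, records):
    """Load: write the cleaned records into the target store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (meter_id TEXT, ts TEXT, kwh REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
clean, rejected = transform(extract(RAW))
load(conn, clean)
loaded = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(loaded, "rows loaded;", len(rejected), "rejected")
```

Even in this toy case, most of the code is cleaning and format handling rather than loading, which is consistent with the cost and effort the article describes.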
Storage. During the past 25 years, the primary data store has been the relational database management system (RDBMS) with its standardized structured query language (SQL). The relational database space has been ruled by enterprise giants Oracle Corp., IBM and Microsoft Corp. SQL is a strong standard, and the space has become more commoditized, allowing OSS providers to enter. To answer known questions quickly across large time-series data, online analytical processing (OLAP) was introduced, and cubes were pre-formed (computed in advance) to support fast visualization.
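The idea of forming a cube in advance can be shown with a small sketch that precomputes totals for every combination of dimension values, using "*" to mean "all values" of a dimension. The dimension names and readings are hypothetical, and production OLAP engines use far more elaborate storage than a Python dictionary.

```python
from collections import defaultdict
from itertools import product

def build_cube(facts, dims, measure):
    """Precompute sums of `measure` for every roll-up of `dims`.

    A key such as ("north", "*") holds the total for region north across
    all days, so a dashboard query becomes a single dictionary lookup.
    """
    cube = defaultdict(float)
    for fact in facts:
        # For each dimension, keep the actual value or roll it up to "*".
        options = [(fact[d], "*") for d in dims]
        for key in product(*options):
            cube[key] += fact[measure]
    return dict(cube)

# Hypothetical daily energy totals by region.
facts = [
    {"region": "north", "day": "2014-01-01", "kwh": 10.0},
    {"region": "north", "day": "2014-01-02", "kwh": 12.0},
    {"region": "south", "day": "2014-01-01", "kwh": 7.0},
]
cube = build_cube(facts, dims=("region", "day"), measure="kwh")
print(cube[("north", "*")], cube[("*", "2014-01-01")], cube[("*", "*")])
```

The trade-off is the one the article implies: answers to known questions are fast, but only for the dimensions chosen when the cube was built.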
During the past 10 years, the relational database model has been placed under strain by the growth of the Internet, unstructured data sources and Internet of Things devices coming online. The inability of RDBMS to handle vast quantities of unstructured data and the advent of high-speed commodity hardware opened the world of parallel processing to all, though at one time it was accessible only to those who could afford supercomputers. In the early 2000s, the Nutch project (from which Hadoop later emerged) was started, and in 2004, Google published its description of MapReduce. Parallel processing moved to the forefront to solve the problem of processing billions of Web pages.
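The MapReduce pattern itself is simple enough to sketch in a single process. The example below counts words across a few hypothetical "pages"; it runs sequentially and only illustrates the map, shuffle and reduce stages, whereas a framework such as Hadoop distributes those stages across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def map_reduce(documents):
    # Each map_phase call is independent, which is what allows a real
    # framework to run the calls in parallel on separate machines.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())

# Hypothetical input documents.
pages = ["smart grid data", "big data platform", "smart meter data"]
counts = map_reduce(pages)
print(counts)
```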
In-memory databases and complex event processing (CEP) have also become popular and necessary where high bandwidth and millions of transactions are present. For example, in the banking industry, fraud detection within hundreds of milliseconds is a requirement.
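A very small example of the CEP style is a sliding-window rule evaluated in memory as each event arrives. The rule below, flagging an account that exceeds a set number of transactions within a time window, is a hypothetical stand-in for a real fraud rule, and the class name, thresholds and events are invented for illustration. It assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque

class BurstDetector:
    """Flag an account when it exceeds max_events within window_seconds."""

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window = window_seconds
        self.recent = defaultdict(deque)  # account -> recent timestamps

    def observe(self, account, timestamp):
        """Record one event; return True if the account is now in a burst."""
        window = self.recent[account]
        window.append(timestamp)
        # Evict events that have fallen out of the time window.
        while window and timestamp - window[0] > self.window:
            window.popleft()
        return len(window) > self.max_events

detector = BurstDetector(max_events=3, window_seconds=1.0)
# Hypothetical event stream: (account, seconds since start).
events = [
    ("acct-1", 0.0), ("acct-1", 0.2), ("acct-2", 0.3),
    ("acct-1", 0.4), ("acct-1", 0.5), ("acct-1", 5.0),
]
alerts = [(acct, ts) for acct, ts in events if detector.observe(acct, ts)]
print(alerts)
```

Because all state lives in memory and each event touches only one short queue, the decision is made as the event arrives rather than after a batch query.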
Since then, several types of NoSQL database structures have been created to better manage unstructured data:
- Key-value store (example: Redis);
- Tabular (example: HBase); and
- Document-oriented (example: MongoDB).
NoSQL stores typically do not support joins, complex transactions or constraints. Instead, they make it possible to store and retrieve large quantities of data, support dynamic growth of data and easily take in new types of data added to the system. The data is not highly structured, and the structure that is present is allowed to change. Relationship understanding and constraint management are moved to the programs and scripts that process the data. Joins can be pre-formed in the data before storage.
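The last two points, constraints handled in application code and joins formed before storage, can be sketched as follows. A plain dictionary of JSON strings stands in for a document store, and the customer and meter records are hypothetical; no real NoSQL product's API is being shown.

```python
import json

# Hypothetical source records from two separate systems.
customers = {"C1": {"name": "Pat", "rate_plan": "residential"}}
meters = [
    {"meter_id": "M1", "customer_id": "C1"},
    {"meter_id": "M2", "customer_id": "C9"},  # refers to an unknown customer
]

def denormalize(meter, customers):
    """Embed the customer in the meter document (the join happens here)."""
    customer = customers.get(meter["customer_id"])
    if customer is None:
        # The store will not enforce this relationship, so the code must.
        raise KeyError(f"unknown customer {meter['customer_id']}")
    return {
        "_id": meter["meter_id"],
        "customer": {"customer_id": meter["customer_id"], **customer},
    }

store = {}    # stand-in for a document store: key -> JSON document
orphans = []  # meters that failed the application-level constraint
for meter in meters:
    try:
        doc = denormalize(meter, customers)
    except KeyError:
        orphans.append(meter["meter_id"])
        continue
    store[doc["_id"]] = json.dumps(doc)

# Reading needs no join: the customer data travels with the meter document.
meter_view = json.loads(store["M1"])
print(meter_view["customer"]["name"], orphans)
```

The cost of this design is duplication: if a customer's rate plan changes, every document that embeds it must be rewritten by the application.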
It will be important for Duke Energy architects to consider the types of data that support the primary use cases of the utility of the future so the determination of the right-fit data structures and stores can be made.
Visualization, analytics and BI. The DMAI was not focused on deep analysis of vendor visualization, analytics and BI tools. Nevertheless, these tools are an important aspect of a data analytics strategy.
Each vendor's final report included graphs, charts, geo-views and dashboards to show the results of their analyses and, in some cases, to differentiate their offerings. Given the data and use cases discovered in the initiative, a comprehensive dashboard would include transformer data, geo-location, socioeconomic information, outage data, meter readings data, time-series events and user interaction through queries, filters, tables and graphs.
The following are key observations and insights that came out of the DMAI:
- Significant potential value can come from implementing a big data platform across use case areas.
- Various problems were encountered in extracting data from Duke Energy's systems. Consolidation and integration of data elements are required to perform the analytics needed to realize the value identified. Issues with data include missing data, the lack of a common information model, problems linking datasets from different systems and challenges in extracting data.
- To understand big data implications for other areas of Duke Energy, more data is needed than what was included in the initial data set. The vendors provided numerous examples of use cases that could be implemented with the inclusion of additional data. Social media, asset attributes (e.g., age, type) and event alerts were the three most common data elements identified by vendors.
- The initiative provided significant insights into the tools and systems used to manage big data. Development of technical and functional specifications, along with development of overall solution architecture, should be Duke Energy's first priorities upon finalizing its strategy.
- Many new models and analytics were introduced to Duke Energy by the vendors in their final reports. They demonstrated the use of these models using the Duke Energy dataset.
- There are resource and skill gaps between what is available at Duke Energy vs. potential capabilities identified from the vendor final reports. The most important issues to address are: availability of data, a comprehensive analytics strategy and overcoming the silo-based structure of the data and systems.
The DMAI, along with the participation and support of Duke Energy's industry partners, has accelerated big data and use case development activities throughout Duke Energy.
A copy of the final report is available at www.duke-energy.com/pdfs/dma initiative.pdf.
David Lawrence is a technology development manager with Duke Energy working in the emerging technology office, where he provides leadership on technologies for the smart grid, including development of subject matter expertise, leading business case development and providing guidance to external organizations.
David Mulder is a senior consultant with Leidos Engineering. His responsibilities include technology evaluation and prioritization, business case analysis, data mining and analytics, and market potential analysis associated with new utility technologies.
Zac Canders is a program manager at Leidos Engineering. His experience includes leading the design, development and implementation of smart grid solutions at the nation's largest investor-owned utilities and municipalities, where he managed AMI, MDM, enterprise resource planning and grid optimization projects.