The big-data analysis process reduces to three elements: Collection, Synthesis, and Insight. We gather relevant data, harmonize and link it, and apply the analysis findings situationally. In the online/social/sensor era, “relevant” may mean enormous data volume, “harmonize” responds to variety, and situational applications must often accommodate high-velocity data. Context and latency considerations complicate matters. Latency refers to the acceptable lag in data collection, analysis, and reporting; low latency is crucial in online, mobile, and enterprise interactions. Context means metadata, good old data about data, which can boost analysis accuracy (and also aid in proper data governance). This article is about the roles of metadata and connection in the big-data story.
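To make the three elements concrete, here is a minimal Python sketch, not a description of any vendor's system. It assumes a toy record type whose metadata is a source label and a capture time. Every name in it (`Record`, `collect`, `synthesize`, `insight`, the sample users and sources) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Record:
    user_id: str           # linking key (hypothetical), may be inconsistently formatted
    payload: str           # the data itself
    source: str            # metadata: where the record came from
    captured_at: datetime  # metadata: when the record was captured


def collect(now):
    """Collection: gather records from several made-up sources."""
    return [
        Record("u1", "liked product page", "social", now - timedelta(seconds=5)),
        Record("U1 ", "bought item 42", "transactions", now - timedelta(seconds=20)),
        Record("u2", "checked in downtown", "mobile", now - timedelta(hours=3)),
    ]


def synthesize(records):
    """Synthesis: harmonize the linking key, then link records per person."""
    linked = {}
    for record in records:
        key = record.user_id.strip().lower()  # harmonize "U1 " and "u1"
        linked.setdefault(key, []).append(record)
    return linked


def insight(linked, now, max_lag):
    """Insight: keep only people with records fresh enough to act on."""
    fresh = {}
    for key, records in linked.items():
        recent = [r for r in records if now - r.captured_at <= max_lag]
        if recent:
            fresh[key] = sorted({r.source for r in recent})
    return fresh


# A fixed clock keeps the example deterministic.
NOW = datetime(2013, 1, 1, 12, 0, tzinfo=timezone.utc)
result = insight(synthesize(collect(NOW)), NOW, max_lag=timedelta(minutes=1))
print(result)
```

The capture-time metadata is what lets the last step enforce a latency budget, and the source metadata is what lets it report which kinds of data were linked for each person.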
Human Data: Fact, Feelings, and Intent
My particular interest is “human data,” communicated in intentionally expressive sources such as text, video, and social likes and shares, and in implicit expressions of sentiment. The implicit signals are ones we infer from behavior tracks (transaction records, click-/tap-streams, and geolocation) and from social-network links and interactions. Human data, from devices, online and social platforms, and enterprise transactional and operational systems, captures what Fernando Lucini characterizes as “the electronic essence of people.”
Elliot Turner is particularly interested in intent mining, applied, for example, in efforts to predict an individual’s purchasing behavior. Turner says, “success will combine elements like a person’s interests, relationships, geography – and ultimately his identity, purchase history and privacy preferences – so that applications can plot where a person is in his ‘buyer’s journey’ and provide the best offers at the best times.”
Natural Language Processing
Natural language processing (NLP), along with parsing and interpretation for formal languages, is a route to mining the information content of text and speech, complemented by techniques that extract interesting information from sound, images, and video. (Of course, network, geospatial, and temporal data come into play: matter for another article.) NLP includes both language understanding and language generation, the two parts of a conversation; think about, but also beyond, “question answering” systems such as Apple Siri. With that in mind, I asked my interviewees how well we are doing with NLP, and also about our ability to mine affective states, that is, mood, emotion, attitudes, and intent.
One particular technique, unsupervised learning, which AlchemyAPI CEO Turner describes as “enabl[ing] machines to discover new words without human-curated training sets,” is often seen as materially advancing language-understanding capabilities. According to Autonomy CTO Lucini, however, the real aim is a business one: “making sure that any piece of information fulfills its maximum potential… Businesses need to have a clear view how [information availability] translates to value.”
I see mobile computing as opening up a world of opportunity, exploitable in conjunction with advances on a variety of technical and business fronts. Which ones? I asked my interviewees. The responses bring us back to this article’s starting point: metadata, context, and connection.
Marie Wallace says, “Mobile is the mother lode of contextual metadata that will allow us to provide the type of situational insights the contextual enterprise requires.” Add longer-established sources to the picture, and “there is a significant opportunity to be realized in providing integration and analysis (at scale) of social and business data… Once we combine interactional information with the business action, we can derive insights that will truly transform the social business.”
This combination, which I referred to as “synthesis,” is at the core of advanced big-data analytics, the key to solutions from providers that include, in addition to IBM and HP Autonomy, companies such as Digital Reasoning and Palantir.
IBMer Wallace adds, “privacy, ethics, and governance frameworks are going to be increasingly important.”
According to Fernando Lucini, mobile is great for HP Autonomy because it means “more use of information — in our case, human information.” He sees opportunity in three areas: 1) supporting “better and more real-time decisions [that] connect consumer and product,” 2) information governance, because “securing or protecting information, as well as evaluating the risk in information and then being able to act suitably and in accordance with regulation and law, is a considerable integration and synthesis challenge,” and 3) provision of self-service, cloud tools.