In the last issue of The Record, the first of a series of articles on data and the digital feedback loop introduced the concept of digital transformation through data and some of the challenges in getting the complete and coherent data sets needed for maximal impact and optimisation. In this article, we’ll continue the story by looking at the next steps in the data journey, transforming data into information, knowledge, and insight.
The commonly used ‘General Definition of Information’, or GDI, can be paraphrased as saying that information consists of one or more pieces of well-described, meaningful data – that is, data that means something to its user. The reader might note that the first part of this journey was covered in the last article – adding metadata and transforming data to satisfy the ‘well-described, meaningful’ part. However, to become information, data often needs to be combined to provide the meaning required by the end user.
Another consideration is when this transformation takes place, and at what level of completeness. In the field of data analysis, practitioners often talk of ‘hot’, ‘warm’, and ‘cold’ path analytics. The degree of ‘heat’ indicates the lag time, or latency, between the event being monitored and the information being extracted, with ‘hot’ being the nearest to real time and ‘cold’ typically being based on an aggregation of broader historical data. A solution that supports such paths is said to have a Lambda architecture – Microsoft’s solutions for connected vehicles take such an approach. The ‘hottest’ analysis might well take place in the vehicle itself, and ‘edge’ technologies such as Microsoft’s Automotive Intelligent Edge are an effective way of rapidly developing and deploying such solutions through a dynamic continuous integration/continuous delivery pipeline while still meeting the requirements of in-vehicle environments.
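As a rough, hedged illustration of that hot/cold split in a Lambda-style design, the Python sketch below fans each incoming telemetry event out to a hot path that raises a near-real-time alert and a cold path that simply appends the raw event for later batch aggregation. The event fields, threshold, and storage file are illustrative assumptions, not part of any Microsoft product API.

```python
import json
from collections import deque

COOLANT_ALERT_C = 110  # illustrative threshold, not a real vehicle specification

recent = deque(maxlen=50)  # small in-memory window for the hot path


def hot_path(event: dict) -> None:
    """Near-real-time check performed on each event as it arrives."""
    recent.append(event["coolant_temp_c"])
    if event["coolant_temp_c"] > COOLANT_ALERT_C:
        print(f"ALERT vehicle {event['vehicle_id']}: coolant {event['coolant_temp_c']} C")


def cold_path(event: dict, raw_store: str = "raw_events.jsonl") -> None:
    """Append the untouched event for later batch (cold) analysis."""
    with open(raw_store, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def ingest(event: dict) -> None:
    """Lambda-style fan-out: every event feeds both paths."""
    hot_path(event)
    cold_path(event)


if __name__ == "__main__":
    ingest({"vehicle_id": "V123", "coolant_temp_c": 95.0, "ts": "2020-06-01T12:00:00Z"})
    ingest({"vehicle_id": "V123", "coolant_temp_c": 112.5, "ts": "2020-06-01T12:00:05Z"})
```

In a production system the hot path would typically run on an edge or stream-processing runtime and the cold path would write to durable storage, but the fan-out shape is the same.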
‘Colder’ analysis usually takes place across multiple sources of data collected over time, and an effective way of achieving this with the greatest flexibility is a data lake. Unlike earlier constructs such as data warehouses or marts, a data lake allows data to be captured in multiple forms and then made sense of later using compute cluster infrastructure. The pivot to cloud infrastructure has made large-scale data lakes practical, driven by a blend of open-source software and innovations from public cloud providers such as Microsoft with Azure Data Lake. Azure Data Lake Storage is built on Azure Blob Storage and supports access from multiple analytics engines and their APIs within a single environment.
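As a small, hedged sketch of landing raw data in a lake before any sense-making, the snippet below writes a batch of raw JSON telemetry into a date-partitioned path using the Azure Data Lake Storage Python SDK (azure-storage-file-datalake). The account URL, key, container name (‘raw’), and folder layout are placeholder assumptions for illustration.

```python
import json
from datetime import datetime, timezone

from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder account URL and key: substitute real values (or an Azure AD credential).
ACCOUNT_URL = "https://<storage-account>.dfs.core.windows.net"
ACCOUNT_KEY = "<storage-account-key>"


def land_raw_batch(events):
    """Write a batch of raw events, untouched, into a date-partitioned 'raw' zone."""
    service = DataLakeServiceClient(account_url=ACCOUNT_URL, credential=ACCOUNT_KEY)
    filesystem = service.get_file_system_client("raw")  # assumed container name

    now = datetime.now(timezone.utc)
    path = f"telemetry/{now:%Y/%m/%d}/batch-{now:%H%M%S}.jsonl"

    payload = "\n".join(json.dumps(e) for e in events).encode("utf-8")
    filesystem.get_file_client(path).upload_data(payload, overwrite=True)
    return path


if __name__ == "__main__":
    print(land_raw_batch([{"vehicle_id": "V123", "speed_kph": 52.4}]))
```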
However, just pumping data arbitrarily into a data lake is a risky strategy, with some commentators warning of ‘big data graveyards’. The great thing about data lakes, though, is the temporal and infrastructural flexibility over when and how the sense-making work is performed. ‘Delta Lake’ – an open-source storage layer that works with Apache Spark and is available on the Azure Databricks service – enables a pattern of successive refinement and optimisation, from raw data through to coherent, modelled, and useful datasets orientated to specific scenarios. It’s what we’d recommend for a range of scenarios, from connected vehicle services to manufacturing, supply chain optimisation and customer care.
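As a hedged sketch of that successive-refinement pattern, the PySpark snippet below could run in an Azure Databricks notebook (where a `spark` session is already provided): it appends raw JSON into an initial Delta table and then produces a deduplicated, filtered table ready for querying. The paths, column names, and filter rule are illustrative assumptions.

```python
# Runs on a Spark cluster with Delta Lake available, e.g. an Azure Databricks notebook.
from pyspark.sql import functions as F

raw_path = "/mnt/lake/raw/telemetry/2020/06/"   # assumed landing zone
bronze_path = "/mnt/lake/bronze/telemetry"      # raw, append-only Delta table
silver_path = "/mnt/lake/silver/telemetry"      # cleaned, query-ready Delta table

# First stage: capture the raw records as-is in an appendable Delta table.
raw_df = spark.read.json(raw_path)
raw_df.write.format("delta").mode("append").save(bronze_path)

# Second stage: successive refinement - drop duplicates and obviously bad readings.
bronze_df = spark.read.format("delta").load(bronze_path)
silver_df = (
    bronze_df
    .dropDuplicates(["vehicle_id", "ts"])          # assumed natural key
    .filter(F.col("speed_kph").between(0, 400))    # illustrative sanity filter
    .withColumn("ingest_date", F.to_date(F.col("ts")))
)
silver_df.write.format("delta").mode("overwrite").save(silver_path)
```

Further stages would model the refined data for specific scenarios, such as aggregates for supply chain optimisation or per-customer views for customer care.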
Now that we have information, we can progress to knowledge – a higher level of comprehension based on harvesting information on a specific topic. The key words here are comprehension and specific. There are multiple techniques for deriving that comprehension for a specific purpose or use case, well beyond the scope of this article. However, a few key classes include algorithmic, visualisation, application and cognitive approaches.
There are several technologies within the Microsoft family that support these paradigms. Algorithmic and application approaches both essentially require development tools, and Microsoft tooling and solutions support a broad set of these, including Python, R, .NET and its many language bindings, Java, scripting languages and many more. Supporting environments include Azure DevOps, Cloud Shell, and Azure Machine Learning workspaces and notebooks.
Power BI is a widely used visualisation platform with extensive facilities for drill-down and data inspection. It’s designed to empower users ranging from neophytes to power users, and support for free, pro, and web-based deployment makes it easy to develop and deploy dashboards and analytical solutions across an enterprise.
Power BI, combined with other elements such as Power Apps, Power Automate, Power Virtual Agents, and the Microsoft Common Data Model, constitutes the Microsoft Power Platform – an environment designed to accelerate and simplify the development of efficient, functional applications. A good example of where such an approach makes sense is the dealership, where a high degree of combinatorial diversity is driven by systems and requirements specified by the original equipment manufacturer or brand, the market, the distributor or importer, the dealer group, and the individual dealer. This means that almost every environment is unique, which has historically impeded the delivery of highly optimised and integrated apps due to complexity and cost.
For cognitive approaches to deriving knowledge and insight, Azure hosts a wide range of capabilities, from Cognitive Services designed to democratise and accelerate access to AI, to fully fledged environments enabling the development and deployment of advanced and highly tailored solutions. We will explore these – along with how we put derived insights to work – in more detail in the next issue of The Record.
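In the meantime, as a small, hedged taste of that ‘democratised’ end of the spectrum, the snippet below calls the Azure Text Analytics service (part of Cognitive Services) to score sentiment on customer-care feedback. The endpoint, key, and example comments are placeholder assumptions for illustration.

```python
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key for an assumed Cognitive Services resource.
ENDPOINT = "https://<resource-name>.cognitiveservices.azure.com/"
KEY = "<cognitive-services-key>"

client = TextAnalyticsClient(endpoint=ENDPOINT, credential=AzureKeyCredential(KEY))

# Illustrative customer-care comments; in practice these would come from the data lake.
feedback = [
    "The service appointment was quick and the courtesy car was spotless.",
    "Still waiting on the recall part after three weeks - very frustrating.",
]

for doc in client.analyze_sentiment(documents=feedback):
    if not doc.is_error:
        print(doc.sentiment, doc.confidence_scores)
```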
This is the second in a series of articles by John Stenlake, automotive lead for EMEA at Microsoft
This article was originally published in the Summer 2020 issue of The Record. To get future issues delivered directly to your inbox, sign up for a free subscription.