Operationalising big data analytics and machine learning

Harriet Barr, Data and Analytics Lead at BAE Systems Applied Intelligence
Harriet Barr examines the challenges faced in moving from research to operations and explains why machine learning is actually the easy bit
As with many challenges in IT, technology represents only part of the picture when it comes to operationalising new capability. Let’s take machine learning systems as a case in point.
 
You might think that code is the be-all and end-all, but in fact deploying the capability at enterprise scale raises a range of data processing and technology challenges, and that’s before considering the people and process issues that crop up too.
 
For example, experimentation often happens in isolation. The result is little consistency in the way analytics and models are written, so different data scientists who need the same data each struggle separately to get it into a usable format.
 
Operationalising analytics also requires engineering skills. Data scientists are highly skilled, but they are not software engineers: their focus is on validating the analytic idea rather than on questions of scalability, reliability and efficiency.
 
And remember, analytics and models are never static and experimentation never stops. I often find it useful to think of them as living things: highly sensitive to changes in their environment, always open to improvement, and destined one day to become obsolete. The challenge is that this is rarely accounted for when resourcing work, and data scientists often want or need to move on to their next innovation.
 
In large organisations we often find a community structure in which data scientists create analytic capabilities for wider use across the enterprise. Here the challenge is making new capability discoverable and trusted by end users, particularly if it automates something they previously did manually.
 
So there are challenges, but the good news is there are ways to avoid or resolve them.
 

To the tool shed

High-quality data tooling throughout the data science lifecycle addresses two of these challenges: experimentation in isolation, and the engineering skills that operationalising analytics requires.
 
Firstly, this requires separating data preparation and feature generation from the development of analytics and models. Too often the lines between these tasks are blurred and data formatting code can end up as part of an algorithm or model.
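To make the separation concrete, here is a minimal Python sketch. The record format, the `prepare_features` and `risk_model` names and the scoring rule are all hypothetical, chosen for illustration only. All parsing and cleaning lives in one feature-preparation step, and the model receives only clean, typed features, so neither side needs to know how the other works.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical raw record format: amounts arrive as strings with a currency
# symbol and thousands separators, as they might in a source extract.
RawRecord = dict[str, str]


@dataclass(frozen=True)
class Features:
    """Clean, model-ready features. Models only ever see this type."""
    amount: float
    is_domestic: bool


def prepare_features(raw: RawRecord, home_country: str = "GB") -> Features:
    """Data preparation and feature generation: all formatting logic lives here."""
    amount = float(raw["amount"].replace("£", "").replace(",", "").strip())
    return Features(amount=amount,
                    is_domestic=raw["country"].strip().upper() == home_country)


def risk_model(f: Features) -> float:
    """A toy model (hypothetical rule) containing no parsing or cleaning code."""
    score = min(f.amount / 10_000.0, 1.0)
    return score if f.is_domestic else min(score + 0.2, 1.0)


def run(records: Iterable[RawRecord]) -> list[float]:
    # The two stages are composed only here, so either can be replaced
    # or reused without touching the other.
    return [risk_model(prepare_features(r)) for r in records]
```

If the source system changes its currency formatting, only `prepare_features` changes; the model code and any other model consuming `Features` stay untouched.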
 
Secondly, there needs to be a common toolkit: a shared set of applications, frameworks and templates for creating analytics or models. The benefit is consistency. Reducing the variability between capabilities means that, when it comes to operationalising them later on, engineering teams are better prepared and the process is closer to business as usual.
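One way a template can enforce that consistency is an abstract base class every analytic must implement. The sketch below is an assumption about what such a template might contain; the `Analytic` class, its methods and the example analytic are hypothetical, not a specific product or framework.

```python
from abc import ABC, abstractmethod
from typing import Any, Mapping


class Analytic(ABC):
    """Hypothetical shared template that every analytic or model implements.

    A fixed interface lets engineering teams deploy, test and monitor any
    capability in the same way, whoever wrote it.
    """

    name: str = "unnamed"
    version: str = "0.0.0"

    @abstractmethod
    def required_inputs(self) -> set[str]:
        """Names of the prepared features this analytic expects."""

    @abstractmethod
    def apply(self, features: Mapping[str, Any]) -> Any:
        """Produce a result from already-prepared features."""

    def __call__(self, features: Mapping[str, Any]) -> Any:
        # Shared behaviour (here, input validation) lives in the template,
        # not copied into each individual analytic.
        missing = self.required_inputs() - set(features)
        if missing:
            raise ValueError(f"{self.name} v{self.version} missing inputs: {sorted(missing)}")
        return self.apply(features)


class LargeTransactionFlag(Analytic):
    """Example analytic written against the template (threshold is illustrative)."""
    name = "large_transaction_flag"
    version = "1.0.0"

    def required_inputs(self) -> set[str]:
        return {"amount"}

    def apply(self, features: Mapping[str, Any]) -> bool:
        return features["amount"] > 10_000
```

Because `Analytic` cannot be instantiated without both methods, a capability that does not follow the template fails immediately, long before it reaches an engineering team.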
 
Lastly, there is the idea of modules or components. Analytics and models often have to do exactly the same operations as one another, regardless of their purpose. Providing a facility to store and manage common modules will reduce duplication in code, reduce errors, and ultimately speed up model generation.
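A shared store of common modules can be as simple as a named registry. The following sketch is a hypothetical in-process version; in practice it might more likely be a versioned internal package, and the component names are illustrative only.

```python
from typing import Callable, Dict

# Hypothetical registry of shared components, keyed by a unique name.
_REGISTRY: Dict[str, Callable] = {}


def shared_component(name: str) -> Callable[[Callable], Callable]:
    """Decorator that registers a reusable operation under a unique name."""
    def decorator(fn: Callable) -> Callable:
        if name in _REGISTRY:
            # Refuse silent overwrites: duplicates are what we want to avoid.
            raise KeyError(f"component {name!r} is already registered")
        _REGISTRY[name] = fn
        return fn
    return decorator


def get_component(name: str) -> Callable:
    """Look up a shared operation instead of re-implementing it."""
    return _REGISTRY[name]


@shared_component("normalise_postcode")
def normalise_postcode(value: str) -> str:
    """An operation many analytics need: remove whitespace and upper-case."""
    return "".join(value.split()).upper()


@shared_component("min_max_scale")
def min_max_scale(values: list[float]) -> list[float]:
    """Scale values to the range [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Any analytic can then call `get_component("min_max_scale")` rather than writing its own scaling code, so a fix to the shared version benefits every consumer at once.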
 
Providing high-quality tooling to data science teams may incur high initial set-up costs, but the benefit is felt later, when capabilities are operationalised more quickly thanks to reduced technical debt and a smoother handover to engineering teams.
 
When it comes to operationalising analytics, it is important that data scientists remain involved, but teams should be made up of many roles with a focus on solution delivery. Data engineers, data scientists, developers and architects must work together as one team whose collective goal is delivering the output, rather than performing individual role functions.
 

Getting the governance right

Governance may not sound very sexy, but it’s crucial because it addresses the challenge of analytics and models never being static. Data governance gets a lot of airtime, and there are standards and established methods for ensuring the quality, consistency, validity and accuracy of data. But at enterprise scale, I would argue that governance of analytics is equally important, because analytics and models need to be trusted, and decisions made on the back of their results need to come with some assurance.
 
For analytic governance to work, it needs to be implemented alongside an effective cataloguing solution for analytics, just as data governance relies on data cataloguing. The benefit is a transparent analytic estate and less duplication, because people can find and use the analytic capability already available rather than building their own.
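As an illustration of how cataloguing and governance fit together, here is a minimal in-memory catalogue sketch. The fields, the three status values and the `AnalyticCatalogue` class are assumptions for illustration, not a description of any particular cataloguing product. Each analytic is registered with an owner, tags and a governance status, and search returns only approved entries by default.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    EXPERIMENTAL = "experimental"
    APPROVED = "approved"    # has passed a (hypothetical) governance review
    RETIRED = "retired"      # obsolete, but kept for the audit trail


@dataclass
class CatalogueEntry:
    name: str
    version: str
    owner: str
    description: str
    tags: set[str] = field(default_factory=set)
    status: Status = Status.EXPERIMENTAL


class AnalyticCatalogue:
    """Hypothetical minimal catalogue: register analytics, then discover them."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], CatalogueEntry] = {}

    def register(self, entry: CatalogueEntry) -> None:
        key = (entry.name, entry.version)
        if key in self._entries:
            raise ValueError(f"{entry.name} {entry.version} is already catalogued")
        self._entries[key] = entry

    def find(self, tag: str, approved_only: bool = True) -> list[CatalogueEntry]:
        """Search before building: return existing analytics carrying this tag."""
        return [
            e for e in self._entries.values()
            if tag in e.tags and (e.status is Status.APPROVED or not approved_only)
        ]

    def set_status(self, name: str, version: str, status: Status) -> None:
        """Governance decisions are recorded against a specific version."""
        self._entries[(name, version)].status = status
```

Keying entries on name and version means a retrained model is a new, separately governed entry, which fits the view of analytics as living things that change over time.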
 
Metrics are also key to informing decision points and, more broadly, to monitoring operational analytics and models. They should track operational analytics end to end, informing the management of technical resources and feeding into governance decisions. Ultimately, metrics underpin all our strategies for overcoming the challenges outlined above, and they are much easier to build in from the start.
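Building metrics in from the start can be as lightweight as wrapping every operational analytic in the same instrumentation. This sketch assumes a hypothetical in-memory `METRICS` store; a real deployment would export these figures to whatever monitoring system the organisation already runs. It records call counts, error counts and cumulative latency per analytic.

```python
import time
from collections import defaultdict
from functools import wraps
from typing import Any, Callable

# Hypothetical in-memory metrics store, one dictionary per named analytic.
METRICS: dict[str, dict[str, float]] = defaultdict(
    lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0}
)


def monitored(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Record call count, error count and cumulative latency for an analytic."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            stats = METRICS[name]
            stats["calls"] += 1
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception:
                stats["errors"] += 1
                raise  # record the failure, but never hide it
            finally:
                stats["total_seconds"] += time.perf_counter() - start
        return wrapper
    return decorator


def error_rate(name: str) -> float:
    """Fraction of calls that raised; 0.0 if the analytic has not been called."""
    stats = METRICS.get(name)
    if not stats or not stats["calls"]:
        return 0.0
    return stats["errors"] / stats["calls"]


@monitored("ratio_analytic")
def ratio(a: float, b: float) -> float:
    """Toy analytic used to demonstrate the instrumentation."""
    return a / b
```

Figures such as `error_rate` are the kind of input that can trigger a governance review, or flag that a model has drifted and needs retraining.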
 
And when it comes to introducing new capability, it is vital to embed it wherever possible into existing tools, rather than creating new applications or processes. This requires flexible deployment options for models and analytics, recognising that within an organisation different teams may want to apply the same model or analytic in different places in the data pipeline.
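Flexible deployment is easier when a model is defined once and then wrapped by thin adapters for each place it might run. The sketch below is hypothetical: the toy `score` rule, the record fields and the three adapters are illustrative assumptions, showing one definition served as a batch job, as a streaming stage, and as a JSON handler that an existing tool could call.

```python
import json
from typing import Callable, Iterable, Iterator, Mapping

Record = Mapping[str, float]
Scorer = Callable[[Record], float]


def score(record: Record) -> float:
    """The single model definition (a toy, hypothetical weighted rule)."""
    return round(0.6 * record["a"] + 0.4 * record["b"], 3)


def as_batch(model: Scorer) -> Callable[[Iterable[Record]], list[float]]:
    """For scheduled jobs over a whole dataset, late in the pipeline."""
    def run(records: Iterable[Record]) -> list[float]:
        return [model(r) for r in records]
    return run


def as_stream(model: Scorer) -> Callable[[Iterable[Record]], Iterator[tuple[Record, float]]]:
    """For scoring records one at a time as they pass through an ingest stage."""
    def run(records: Iterable[Record]) -> Iterator[tuple[Record, float]]:
        for r in records:
            yield r, model(r)
    return run


def as_json_handler(model: Scorer) -> Callable[[str], str]:
    """For embedding in an existing tool that sends a JSON request body."""
    def handle(body: str) -> str:
        return json.dumps({"score": model(json.loads(body))})
    return handle
```

Because every adapter calls the same `score` function, teams applying the model at different points in the data pipeline are guaranteed to get the same answer for the same record.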
 
As machine learning takes ever deeper root, these challenges are not going to go away any time soon. Tackling them won’t be straightforward, but doing something new never is. With practice, process and the right people involved, organisations can succeed in putting operational analytic capability in the hands of the wider organisation.
 

About the author
Harriet Barr is a Data and Analytics Lead at BAE Systems Applied Intelligence: Harriet.barr@baesystems.com

Explore more content from our Global Executive Client Forum

At our recent Global Executive Client Forum, we brought together leaders from around the world to discuss the trends, threats and technologies which are reshaping our world around us. Check out our range of new content sourced from the event.

Harriet Barr, Data and Analytics Lead at BAE Systems Applied Intelligence, 15 November 2021