The hype about machine learning (ML) is warranted. Machine learning is not just making things easier for the companies that are taking advantage of it. It’s also changing the way they do business. For example, machine learning is:
- Being used by financial institutions to quickly detect fraudulent activity
- Enabling healthcare practitioners to diagnose diseases and prescribe treatments more effectively
- Helping manufacturers monitor equipment so issues can be dealt with before they disrupt operations
- Allowing streaming services to identify customers at risk of taking their business elsewhere and helping determine what steps can be taken to retain them
With increasing data volumes, low-cost data storage, and less expensive, more powerful data processing, the potential applications of machine learning will grow exponentially.
So why are so many companies hesitant to jump on the machine learning bandwagon, and why is the success rate so low for those that do embark on these projects? After all, organizations such as Gartner note that up to 85% of machine learning projects ultimately fail to deliver on their intended promises to the business.
More importantly, what can companies do to ensure a higher success rate so they can leverage the promise of machine learning?
Machine Learning is Different
To increase the chances of machine learning project success, the first step is to understand that these projects are not the same as typical application and software development projects. There are different processes, terminology, workflows, and tools involved.
There are also different staffing requirements. Among the most important are data scientists, who are especially critical when it comes to defining the success criteria, deploying the final model, and continuously monitoring it.
Data engineers, business intelligence specialists, DevOps engineers, and application developers also play key roles. Few organizations have the internal resources to fill all these positions. That leaves two options: hire for the roles, which isn’t always easy given that machine learning is still a relatively new field with few experienced professionals, or outsource the work.
Even if an organization does have the staffing covered, it can be difficult to facilitate collaboration and communication between different teams. Traditional software and application development usually differs greatly from data science projects. Whereas software development tends to be more predictable and measurable, data science can entail multiple iterations and experimentation. Expectations are different. Deliverables are different.
The Issue of Data Quantity and Quality
According to a number of research initiatives (e.g., Hidden Technical Debt in Machine Learning Systems), technical debt resides in areas common to many machine learning projects: data quality, model quality, feature versioning, model monitoring, data labeling, model explainability and fairness, process automation, and in-place capabilities for human intervention (review).
There’s also the matter of data quantity and quality. Machine learning projects use large datasets, since larger datasets facilitate better predictions. But as the size of the data increases, so do the challenges.
Data is usually merged from multiple sources. Often that data is not in sync, which can create confusion. In addition, data can get merged that wasn’t meant to be merged, resulting in data points with the same name but different meanings. Bad data can generate results that aren’t actionable or insightful, or that are misleading.
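One way to catch same-name, different-meaning fields before a merge is to compare lightweight schema descriptions of each source. The following is a minimal sketch under stated assumptions: the `FieldSpec` type, the `find_conflicts` helper, and the `billing` and `crm` sources are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical description of one field in a data source."""
    name: str
    dtype: str
    unit: str


def find_conflicts(source_a, source_b):
    """Return names of fields that appear in both sources
    but disagree on data type or unit."""
    specs_b = {field.name: field for field in source_b}
    conflicts = []
    for field in source_a:
        other = specs_b.get(field.name)
        if other is None:
            continue  # field exists only in source_a, so nothing to compare
        if (field.dtype, field.unit) != (other.dtype, other.unit):
            conflicts.append(field.name)
    return sorted(conflicts)


# Illustrative sources: both expose "amount", but with different meanings.
billing = [
    FieldSpec("customer_id", "int", ""),
    FieldSpec("amount", "float", "USD"),
]
crm = [
    FieldSpec("customer_id", "int", ""),
    FieldSpec("amount", "int", "cents"),
]

conflicts = find_conflicts(billing, crm)
print(conflicts)
```

A check like this only flags fields whose declared type or unit differ. Fields that share a name, type, and unit but still mean different things need a human review of the source documentation.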
The lack of labeled data can also be an issue. Some teams may try to take on the laborious task of labeling and annotating training data themselves. Some may even try to create their own labeling and annotation automation technology. The problem is that a great deal of time and expertise is committed to the labeling process rather than machine learning model training.
Outsourcing can save both time and money but doesn’t work well if the labeling task requires specific domain knowledge. In those cases, organizations must also invest in formal, standardized training of annotators to ensure quality and consistency across datasets. If the data to be labeled is extremely complex, the other option is to develop a custom data labeling tool. However, this can require more engineering overhead than the machine learning task itself.
Yet another data-related issue is that the data required in a machine learning project often resides in different places with different security constraints and in different formats — structured, unstructured, video files, audio files, text, and images. Data preparation is required, a process that includes searching, cleaning, transforming, organizing, and collecting data. It’s a tedious activity that can require teams to spend up to 80% of their time converting raw data into high-quality, analysis-ready output.
For both data labeling and data preparation, automation can help remedy the situation, but again, it requires expertise that internal teams often lack.
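To make the cleaning and transforming steps of data preparation concrete, here is a minimal sketch in plain Python. The record layout (`email`, `age`) and the `prepare` function are assumptions chosen for illustration. Real pipelines typically rely on dedicated data processing tools and handle far more cases.

```python
def prepare(rows):
    """Clean raw records: normalize emails, convert ages to integers,
    drop incomplete or malformed rows, and remove duplicates."""
    seen = set()
    clean = []
    for row in rows:
        # Normalize: trim whitespace and lowercase the email.
        email = (row.get("email") or "").strip().lower()
        age_raw = str(row.get("age", "")).strip()

        # Drop rows with a missing email or a non-numeric age.
        if not email or not age_raw.isdigit():
            continue

        # Deduplicate on the normalized email.
        if email in seen:
            continue
        seen.add(email)

        clean.append({"email": email, "age": int(age_raw)})
    return clean


# Hypothetical raw input merged from several sources.
raw = [
    {"email": " Ann@Example.com ", "age": "34"},
    {"email": "ann@example.com", "age": "34"},   # duplicate after normalizing
    {"email": "", "age": "51"},                  # missing email
    {"email": "bob@example.com", "age": "n/a"},  # malformed age
    {"email": "cy@example.com", "age": 28},
]

prepared = prepare(raw)
print(prepared)
```

Of the five raw records, only two survive. This illustrates why preparation consumes so much project time: every rule here encodes a decision that someone had to make about what counts as usable data.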
Great Expectations
Machine learning projects aren’t cheap, so it’s not uncommon for organizations to have overly ambitious goals for them. There are often expectations that a project will completely transform the company or a product and generate an enormous return on investment. That creates a lot of pressure that can, in turn, lead to second-guessing on strategies and tactics.
Not surprisingly, these kinds of projects tend to drag out. As a result, both the project teams and management lose confidence and interest in the project, and budgets max out. Even the most expertly run projects are doomed to fail if the goals are unrealistic.
In other cases, machine learning projects kick off without alignment on expectations, goals, and success criteria between the business and project teams. Without clearly defined success indicators, it’s difficult to determine whether a project is successful, what changes need to be made, whether the model is effectively meeting the intended business needs, or whether other options should be considered.
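One way to make success indicators explicit is to write them down as measurable thresholds that both business and project teams agree on before work begins. The sketch below assumes a hypothetical fraud detection project. The `Criterion` type, the `evaluate` function, the metric names, and the threshold values are all illustrative, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One agreed success indicator with its pass threshold."""
    metric: str
    threshold: float
    higher_is_better: bool = True


def evaluate(criteria, measured):
    """Return a dict mapping each metric name to True (met) or False.
    A metric that was never measured counts as not met."""
    results = {}
    for criterion in criteria:
        value = measured.get(criterion.metric)
        if value is None:
            results[criterion.metric] = False
        elif criterion.higher_is_better:
            results[criterion.metric] = value >= criterion.threshold
        else:
            results[criterion.metric] = value <= criterion.threshold
    return results


# Hypothetical criteria agreed on at project kickoff.
criteria = [
    Criterion("recall", 0.90),
    Criterion("false_positive_rate", 0.05, higher_is_better=False),
    Criterion("latency_ms", 200.0, higher_is_better=False),
]

# Hypothetical measurements from an evaluation run (latency not yet measured).
measured = {"recall": 0.93, "false_positive_rate": 0.08}

report = evaluate(criteria, measured)
print(report)
```

Treating an unmeasured metric as a failure is a deliberate choice in this sketch: it surfaces gaps in evaluation instead of silently passing them.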
Machine Learning Success Factors
While there are no specific guidelines for ensuring a successful machine learning project, there are ways to overcome many of the issues that can lead to project failure. Among them:
- An understanding of how machine learning works, how it differs from other project types, and what’s required to execute a project
- A properly scoped project with realistic goals, budget, and leadership support
- The resources to run a machine learning project, including experienced team members — whether in-house or outsourced — and a commitment to collaboration and open communication
- Large amounts of data, preferably labeled
- Capabilities for collecting, storing, labeling, cleaning, quickly accessing, and processing large volumes of data
- Advanced tools for monitoring machine learning models and data
- Capabilities for a human to review the machine learning system and its inferences at any stage and at any point in time
- Software tools for executing machine learning algorithms
- A development platform, such as AWS, Baidu, Google, IBM, or Microsoft
Follow these tips and your machine learning project won’t derail before your organization can enjoy the many benefits that this modern technology provides.
About the author:
Artem Koval is Big Data and Machine Learning Practice Lead at ClearScale.