The Pillars to a Successful Machine Learning Initiative

Across both the business and academic landscapes, transformations are reshaping how organizations stay relevant in the ever-evolving digital world. As digital transformations enable market-disruptive innovation, industry leaders understand the need to prioritize initiatives so that they maximize impact while minimizing cost. This is essential to maintaining an edge over the next decade. Any organization without an aggressive, yet well-grounded, strategic plan to address the change can easily squander capital on unfruitful projects in the near term and lose relevance in the future.

Machine Learning is one of the major technology advancements that has changed, and continues to change, the technology landscape. With Gartner predicting $3.9 trillion in derived business value from AI by 2022, it is incumbent upon every executive, manager and individual to identify how to bring that value to their businesses, projects, or initiatives.

Figure 01. Forecast of Global AI-Derived Business Value (Billions of U.S. Dollars) – Source Gartner 2018

Because of this growth potential, Machine Learning is currently one of the most over-promising technologies we face. It is common to fall into the belief that Machine Learning and AI will solve all our problems. This, according to Gartner, is dangerous, especially since, alongside its prediction of growth for AI-related business, Gartner warns that we are at the peak of the hype curve for AI and ML related technologies.

When pursuing a strategy to integrate AI and Machine Learning into your business, it is all too common to take the traditional approach of throwing intelligent people at the problem, creating goals, establishing teams and often hiring consultants. With Machine Learning at the height of the hype curve, a lot of resources can be wasted on work that produces little or even negative value unless a proper foundational understanding is developed of what is needed and what is possible. An improperly developed Machine Learning based solution could produce incorrect data, leading to incorrect business decisions and, worse, a loss in customer engagement or revenue.

Three Pillars to Creating a Successful AI or ML Initiative: Data Density, Problem Selection, People

There are three foundational pillars that help guide the establishment of a Machine Learning initiative or practice. Without them, organizations are at great risk of wasting effort. At best, the organization will have to regroup around the foundational pillars; at worst, it will abandon the entire initiative and deem it not viable for the business.

The three foundational pillars are Data Density, Problem Selection, and People.

Figure 02. Three Foundational Pillars: Data Density, Problem Selection, and People

Data Density

As mentioned, Machine Learning is currently at the height of its hype, which means it is incredibly common for organizations to want to capitalize on it. Many attempt to dive right in, but quickly face the initial problem of data density. A large number of established businesses still struggle to embrace the massive data collection that has become standard. The importance of this commodity is detailed in a 2017 article by the Economist, which dubs data the new oil. Success in the Machine Learning space starts with establishing a solid strategy for data gathering. This is essential to acquiring the data density needed to make relevant and, most importantly, correct predictions.

To visualize the data density problem, we will use a very simple example of plotting a regression line using a robust data set vs a regression line resulting from a subset of the same data. While this is a very basic example with a small amount of data that would require no more than a simple spreadsheet, Machine Learning uses the same basic prediction algorithms as normal mathematical modeling, just at a larger scale.

Figure 03. More Dense Data Set With a Better Regression Line.

Figure 04. Less Dense Data Set With a Regression Line That Does Not Properly Predict Output.
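
The effect shown in the two figures can be reproduced with a small sketch. The Python below is illustrative only: the synthetic data, the true slope and intercept, the noise level, and the function names are all assumptions made for this example, not the data behind the figures. It fits an ordinary least-squares line to a dense data set and to a small random subset of the same data, then compares how far each fitted slope lands from the true one, averaged over many trials.

```python
import numpy as np

# Hypothetical ground truth used to generate synthetic data (assumed values).
TRUE_SLOPE = 2.0
TRUE_INTERCEPT = 1.0
NOISE_STD = 5.0


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)


def compare_dense_vs_subset(n_dense=200, n_subset=5, trials=200, seed=0):
    """Return the mean absolute slope error for (dense fit, subset fit)."""
    rng = np.random.default_rng(seed)
    dense_errors, subset_errors = [], []
    for _ in range(trials):
        # Dense data set: many noisy observations of the same linear trend.
        x = rng.uniform(0.0, 10.0, size=n_dense)
        noise = rng.normal(0.0, NOISE_STD, size=n_dense)
        y = TRUE_SLOPE * x + TRUE_INTERCEPT + noise
        # Sparse data set: a small random subset of the same observations.
        idx = rng.choice(n_dense, size=n_subset, replace=False)
        dense_slope, _ = fit_line(x, y)
        subset_slope, _ = fit_line(x[idx], y[idx])
        dense_errors.append(abs(dense_slope - TRUE_SLOPE))
        subset_errors.append(abs(subset_slope - TRUE_SLOPE))
    return float(np.mean(dense_errors)), float(np.mean(subset_errors))


if __name__ == "__main__":
    dense_err, subset_err = compare_dense_vs_subset()
    print(f"mean slope error, dense data set: {dense_err:.3f}")
    print(f"mean slope error, sparse subset:  {subset_err:.3f}")
```

Under these assumptions, the subset fit should miss the true slope by a noticeably wider margin on average, which is the same gap the figures illustrate.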

Let us take an imaginary construction company that has a national presence and a fleet of vehicles and equipment numbering in the tens of thousands. They keep 10 years of basic maintenance records on that equipment, including location, mileage, date of service, and reason for the service. An initial pilot Machine Learning initiative uses the company’s F-250 pickup truck maintenance records to build a predictive model for when a truck might fail. The data consists of 5,000 trucks nationally, general operating conditions (by location), and the four years that the current model year truck has been in service. With this data set, they are able to build a decently reliable model that predicts when a maintenance issue will occur with 85% accuracy. This predictive maintenance results in savings because equipment no longer fails during jobs. Company executives are ecstatic with the results and want to expand the program to cover all of their equipment. They also want to predict individual part failure so they can proactively replace items expected to fail while the truck is already in the shop.

They now encounter two problems related to data density. 

First, not all their equipment has the same density of data that their widely deployed F-250s have. For instance, they have only one paver (asphalt laying truck) in each of 10 major metropolitan areas, and the pavers are not all the same model. This results in insufficient historical data to predict failures and not enough breadth of equipment to tell whether the observed failures are outliers or part of a real pattern. A solution might be either to defer to the manufacturer to provide a predictive model or to buy external maintenance records from others using the same equipment. Since the company has no plans to expand their collection of pavers, they won’t be able to build up the data density needed to maintain an accurate prediction on their own.

Second, the jump from general maintenance prediction to specific part failures lacks the historical record that is required. Since the data collected covered only the equipment and the general failure, there is a gap around what was actually replaced or fixed. If that data had been collected, it could be fed into a model that predicts which other parts should be replaced the next time the equipment is in the shop, to proactively prevent a part failure in the field. The company recognizes that this predictive maintenance approach would result in overall cost savings by allowing the fleet to remain in use longer (cost to replace vs time to failure vs time until the next shop visit vs cost of a field failure). However, without that historical record, there is nothing for the model to build upon. Going forward, as in the previous example, a solution might be either to defer to the manufacturer to provide a predictive model or to buy external maintenance records from others using the same equipment. Additionally, the company could change its maintenance record process. Since the mechanism to gather this data is already present for many pieces of equipment, the process can require the technician to log which parts were replaced and for what reason. While it may take a year or two to build the data density required for a prediction, a small up-front investment can save millions of dollars down the line.
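
The trade-off described above (cost to replace a part now versus the chance and cost of it failing in the field before the next shop visit) can be sketched as a simple expected-cost comparison. This is a deliberately simplified illustration, not the company's actual method: the function names, the dollar amounts, and the failure probabilities are hypothetical, and a real model would also account for the remaining useful life given up by replacing a part early.

```python
def expected_cost_of_waiting(p_fail_before_next_service, field_failure_cost):
    """Expected cost of leaving the part in place until the next shop visit."""
    if not 0.0 <= p_fail_before_next_service <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return p_fail_before_next_service * field_failure_cost


def should_replace_now(part_cost, field_failure_cost, p_fail_before_next_service):
    """Replace proactively when that is cheaper than the expected cost of waiting."""
    waiting_cost = expected_cost_of_waiting(
        p_fail_before_next_service, field_failure_cost
    )
    return part_cost < waiting_cost


if __name__ == "__main__":
    # Hypothetical numbers: a $200 part and a $5,000 field failure
    # (tow, downtime, emergency repair).
    for p_fail in (0.01, 0.10):
        decision = should_replace_now(200.0, 5000.0, p_fail)
        print(f"P(fail before next service) = {p_fail:.2f} -> replace now: {decision}")
```

The failure probability is the input the company cannot yet supply: it is what a part-level model would estimate once technicians start logging which parts were replaced and why.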

When discussing data gathering, it is also important to talk about the shift in data privacy laws that has happened and is still under way. In the post Cambridge Analytica days, and with the implementation of the General Data Protection Regulation in the EU, it is critical to be cognizant of the new policies. Any company dealing with customer data must fully understand all the current laws and have a process in place to address new laws and ensure compliance.

Problem Selection

Choosing the right problem to solve for an initial pilot can mean the difference between continued funding and disillusionment. The choice falls into a couple of categories, primarily data availability and problem difficulty.

As we discussed above, choosing a problem that has the correct data density is fundamental. Getting funding, building a team and then attempting to build operational predictions that don’t always hold true costs money and can set back future projects. In the example above, our imaginary construction company made a good choice with their initial pilot. They chose a problem with the correct data density that could be properly modeled (their fleet of F-250s), which resulted in good predictions and a positive reception. Had they chosen to go after predictive parts replacement first, the initial investment (time and cost) may not have yielded the expected results, creating a negative stigma and reducing the likelihood of continued work.

There is also a sliding scale of readily solvable problems. Problems like recommendations or image recognition are well-defined spaces with off-the-shelf options that can easily be built upon. Increasing in complexity, there are well-researched areas, such as predictive part failure or anomaly detection, that require a depth of mathematical knowledge. Other spaces, like self-driving vehicles or stock market predictions, are still being developed and are best suited to organizations or academia with established data science and machine learning practices.

Figure 05. Scale of Solved to Hard Problems in Machine Learning

Again, using our imaginary example from above, the construction company chose a well-researched problem area as their pilot project. Had they instead tried to automate the pavers to lay down asphalt without drivers and operators, they would have been tackling a problem that is still immature (as with self-driving vehicles and robotics), one that would require a massive effort and take years to accomplish. That problem is better left to larger, more established institutions until it matures.

To successfully incorporate Machine Learning into an organization, it is imperative to not only identify the right problem to tackle, but also to start with the right people who understand the space, what is possible, and how to quickly assess data density.


People

Great people make great organizations. Much of the digital revolution over the past few decades has come from engaging motivated, smart individuals in enabling and creative environments, and letting them bring about innovation and change. We have a plethora of startups and great dynamic digital change from people with no formal education in a technology field, but rather those who taught themselves. For organizations that have been structured around that model, it is tempting to apply the same approach to Machine Learning and Data Science. This is where caution should be applied; for certain roles and problems it can lead to suboptimal solutions or negative results.

There are three major role types required for a Machine Learning effort: a subject matter expert, a developer and a data scientist. You will sometimes find an overlap between two of the roles, but it is extremely rare to have someone who can fulfill all three.

Figure 06. Major Role Types for Machine Learning: Subject Matter Expert, Developer, and Data Scientist

Any software development focused company will be well staffed with Subject Matter Expert and Developer roles. There is a wealth of information on the Software Development Lifecycle and on how those roles are staffed and interact.

The new role for organizations that are just breaking into the Machine Learning space is that of the Data Scientist. This person is responsible for narrowing down the solvable problem sets, defining what the available data can solve, cleansing data, and applying or building models to solve the problem. This role should have a background in statistical analysis and data analytics. For many problem spaces this typically means someone with a Master’s or PhD in statistics or another quantitative analysis subject.

The depth of knowledge required for the data science role is directly related to how ambitious you want to be on the scale of solving hard problems. At a minimum, the individuals or team should be able to identify where a problem will generally lie on the spectrum of problem space difficulty.

If the problem space is a well solved one, and a solution is available as a framework or a service, the barrier to implementation is relatively low. Recommendations, for instance, fall in that part of the problem spectrum; examples of services include Amazon Personalize, Azure Personalizer, Google Recommendations AI, RichRelevance, and Adobe Sensei.

For areas where an off-the-shelf solution is not available, the requirements of the Data Scientist ramp up very quickly to a Master’s or PhD level. To develop a proper strategy for solving a problem, the individual must be well versed in a very broad array of statistical and mathematical models. Bringing this back to our fictional construction company, the failure analysis would have required a Data Scientist to first recognize the need for a Weibull or Cox model, to relate the predicted failures to a cost analysis, and to train a model on top of those statistical foundations. Although the area is well researched, someone without the mathematical and analytical education would have a large knowledge gap to fill.
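
To make the Weibull reference concrete, the sketch below computes the probability that a part fails before its next service, given that it has survived to its current age, under a two-parameter Weibull model. It is an illustration under stated assumptions, not the model the fictional company would necessarily use: the shape and scale values, the hour-based units, and the function names are hypothetical, and in practice the parameters would be estimated from maintenance records.

```python
import math


def weibull_survival(t, shape, scale):
    """Probability of surviving beyond age t: S(t) = exp(-(t / scale) ** shape)."""
    if t < 0 or shape <= 0 or scale <= 0:
        raise ValueError("t must be >= 0; shape and scale must be > 0")
    return math.exp(-((t / scale) ** shape))


def prob_fail_before_next_service(age, interval, shape, scale):
    """P(fail in (age, age + interval] | survived to age) = 1 - S(age + interval) / S(age)."""
    survive_to_age = weibull_survival(age, shape, scale)
    survive_to_next = weibull_survival(age + interval, shape, scale)
    return 1.0 - survive_to_next / survive_to_age


if __name__ == "__main__":
    # Hypothetical wear-out part: shape > 1 means the failure rate rises with age.
    for age_hours in (500, 2000):
        p_fail = prob_fail_before_next_service(
            age_hours, interval=250, shape=2.0, scale=3000.0
        )
        print(f"age {age_hours} h: P(fail before next service) = {p_fail:.3f}")
```

With a shape parameter above 1, the conditional failure probability grows as the part ages, which is the behavior that makes proactive replacement attractive for parts that wear out. Choosing between this kind of parametric model and a Cox model, and validating the fit, is where the statistical background matters.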

The far right of the problem difficulty spectrum is the area reserved for academia and large research organizations, which have teams working to solve a single, complex problem, such as self-driving cars. Operating in this space are industry leaders and researchers who are pushing the bounds and measuring success incrementally through years of progress.

In summary…

Whether you are just starting or are re-evaluating a Machine Learning initiative, make sure to have a well-defined strategy in place to address the three fundamental pillars: Data Density, Problem Selection and People. Without them, you will get caught in the hype curve and will struggle to produce usable and viable products or solutions. With them, you will have a foundation to begin bringing near-term value through small initiatives and set yourself on the path to longer-term leadership in the market.