The remarkable advances in Data, Analytics and Artificial Intelligence (AI) have fostered the widespread adoption of Machine Learning (ML) within many products and services companies use and consume on a daily basis. However, implementing ML solutions is still a complex endeavor, and many organizations are struggling to fully exploit the strategic advantages of AI.
One of the main causes of this complexity is the computationally and data-intensive profile that characterizes most AI solutions. This implies that, to be useful, AI-powered products and services require the best of both worlds: 1) High-Performance Computing (HPC) to support the ever-growing demand for online calculations; and 2) modern Data and Analytics solutions that enable knowledge extraction from massive data sources.
Over the last few years, a simple yet effective iterative 3-step approach has gained a lot of traction, mainly aimed at easing the design and development of business-oriented data products:
The proposed first step is starting from a tailored Proof of Concept (PoC).
At a second stage, the solution should evolve into a functional Prototype.
The final step is to build the first production ready Minimum Viable Product (MVP).
Designing the PoC as the first product iteration lets teams embed the core principles of the target data product and evaluate the components whose performance and/or quality is critical to ensure that the whole project makes sense, even though the PoC is not yet fully connected to the broader application or ecosystem.
For instance, for a predictive algorithm, a PoC could be an initial batch of predictions produced with a hand-crafted model, used to assess whether the process is actually forecastable and what a baseline level of accuracy would be. Similarly, for an ingestion pipeline, a PoC could test the throughput of existing connectors and APIs under mass data ingestion, their real-time performance, etc.
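As a minimal sketch of such a PoC (with synthetic data and a deliberately simple moving-average model standing in for the hand-crafted one), the following example compares the model against a naive baseline to judge whether the series is forecastable at all and what baseline accuracy to expect:

```python
# Minimal PoC sketch: compare a naive "repeat last value" baseline against a
# simple moving-average model on held-out data. All data here is synthetic; a
# real PoC would use an initial batch export, not the production pipeline.
import numpy as np

def mean_absolute_error(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def naive_forecast(series, horizon):
    # Repeat the last observed value over the whole horizon.
    return np.full(horizon, series[-1])

def moving_average_forecast(series, horizon, window=7):
    # Repeat the mean of the last `window` observations over the horizon.
    return np.full(horizon, np.mean(series[-window:]))

rng = np.random.default_rng(42)
history = 100 + np.cumsum(rng.normal(0, 2, size=365))  # hypothetical demand series
train, test = history[:-30], history[-30:]

baseline_mae = mean_absolute_error(test, naive_forecast(train, 30))
model_mae = mean_absolute_error(test, moving_average_forecast(train, 30))

print(f"naive baseline MAE: {baseline_mae:.2f}")
print(f"moving average MAE: {model_mae:.2f}")
# If the hand-crafted model cannot beat the naive baseline by a useful margin,
# the forecasting product may not make sense yet.
```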
Once the core value has been validated, the second step is to evolve the PoC into a prototype: a functioning piece of software that can not only be integrated with the larger ecosystem, but also be deployed in test environments and operate across the end-to-end product landscape. This allows the team to focus on identifying weak links, bottlenecks, corner cases, and reliability issues.
It is worth noting that while a prototype is a working piece of software, it does not yet cover all the cases, nor is it resilient enough to be fully deployed in production. However, it is the point at which it becomes crucial to start gathering feedback from test users and sample use cases.
The MVP constitutes the third and final stage of this agile data product development approach. An MVP comprises the minimal set of working features, from a product definition standpoint, needed to accomplish the required task in its bare-bones form, yet it can be deployed in production and start gathering feedback from real-world use cases and users.
The MVP constitutes the first release of the tangible product. It is meant to perform in the real world and thus it must provide all the guarantees (and good practices) of a properly functioning piece of software. The MVP should enable the iterative development and improvement of subsequent features and quality additions as part of the natural data product evolution.
Over the years, this approach has been successfully applied (and refined) across different business verticals.
However, it became clear that, despite the simplicity and flexibility of the proposed iterative approach, developing effective data products is not always that straightforward in real-world scenarios. This complexity is caused by the fact that data products commonly:
Rely on heterogeneous computing, with a strong trend towards hardware specialization (GPUs, TPUs, etc.).
Make non-uniform use of resources across the different operational stages, such as modeling, training, inference, and serving (see the sketch after this list).
Present deployment and versioning dependencies, with constantly emerging AI methods, strategies, sources, etc.
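As an illustration of that non-uniform resource use, the following sketch (all stage names and values are hypothetical) declares a different compute profile per operational stage, which a platform would then translate into actual infrastructure requests:

```python
# Illustrative sketch (all values hypothetical): the same data product may need
# very different compute profiles depending on its operational stage.
from dataclasses import dataclass

@dataclass
class ResourceProfile:
    cpus: int
    memory_gb: int
    gpus: int
    autoscaling: bool

STAGE_PROFILES = {
    "exploration":     ResourceProfile(cpus=4,  memory_gb=16,  gpus=0, autoscaling=False),
    "training":        ResourceProfile(cpus=16, memory_gb=128, gpus=4, autoscaling=False),
    "batch_inference": ResourceProfile(cpus=8,  memory_gb=32,  gpus=1, autoscaling=True),
    "online_serving":  ResourceProfile(cpus=2,  memory_gb=8,   gpus=0, autoscaling=True),
}

def provision(stage: str) -> ResourceProfile:
    # In a real platform this would translate into infrastructure requests
    # (e.g. container resource specs); here it simply returns the profile.
    return STAGE_PROFILES[stage]

print(provision("training"))
```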
This complexity has shaped most modern data platforms, which have evolved to provide some unique features to tackle challenges such as:
Data versioning & lineage: to handle source data, processing code, trained models, deployment code, and usage logs.
Storage: to support different read/write patterns for data used in training vs. data used in production.
Compute: provisioning permissions, costs, configurations, and environments.
Process decoupling: separate streams for ingesting data and processing it into a reusable form, training models, evaluating them, and tracking performance and configurations in production across the board.
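As a minimal sketch of what such versioning and tracking features look like in practice, the example below uses MLflow (assuming the mlflow and scikit-learn packages are available; the experiment name and model are hypothetical) to record the parameters, metrics, and trained model of a single run:

```python
# A minimal experiment-tracking and model-versioning sketch with MLflow; real
# platforms would also version the source data and deployment code per run.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1_000, n_features=10, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlflow.set_experiment("demand-forecasting-poc")  # hypothetical experiment name

with mlflow.start_run():
    alpha = 0.5
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))

    # Record configuration, metrics, and the trained model so the run can be
    # audited, compared against other runs, and reproduced later.
    mlflow.log_param("alpha", alpha)
    mlflow.log_metric("mae", mae)
    mlflow.sklearn.log_model(model, "model")
```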
Undoubtedly, the natural evolution of modern data platforms has set the ground to address the complexity of real-world data products. However, some challenges remain, since AI development still carries a strong experimental or "lab" culture, largely inherited from its academic genesis. Furthermore, some organizations were not ready for large-scale integrations: many PoCs failed to evolve into usable products, scale-up attempts failed, and confidence in Data and AI solutions suffered greatly.
That is what gave rise to the ideas behind MLOps as an evolution of the concept of DevOps, enhanced with data science and processes around the Data & AI lifecycle. In this context, there are some critical elements that can be considered as the foundations of MLOps:
Governance: defining who has access to which resources, such as data, assets, models, and deployments.
Repeatability: a critical concept for understanding the quality of any AI model. It enables auditing and improvement of models by tracking the data sets, experiment configurations, analyses, and code used (see the sketch after this list).
Continuous process: one of the biggest lessons from the software world. Concepts such as quality assurance, short iterations, versioning, and support for distributed teams are all invaluable for something as complex as AI models.
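As a small illustration of repeatability, the sketch below (a hypothetical helper, not a specific framework) pins random seeds and fingerprints the configuration and dataset so that a run can be audited and reproduced later:

```python
# Repeatability sketch: fix random seeds and hash the run configuration and
# dataset, storing the fingerprints alongside the run's metrics.
import hashlib
import json
import random

import numpy as np

def fingerprint(obj) -> str:
    """Stable SHA-256 hash of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def set_seeds(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)

config = {"model": "ridge", "alpha": 0.5, "seed": 1234}  # hypothetical run config
dataset = [[0.1, 0.2], [0.3, 0.4]]                       # stand-in for real data

set_seeds(config["seed"])
run_record = {
    "config_hash": fingerprint(config),
    "data_hash": fingerprint(dataset),
}
print(run_record)  # store with the run so the exact experiment can be recreated
```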
In line with the constant evolution of AI, the ideas and concepts behind MLOps also evolve. Nowadays, the trend is moving towards MLOps platforms. A good MLOps platform enables companies to rely on specialized talent across the different platform dimensions. It removes the need for the "unicorn" profile that would otherwise be required to effectively handle the core concerns of data engineering, cloud engineering, data science, data architecture, DevOps, and many other disciplines.
Gonzalo Zarza
Global Head of Data & AI at Globant