Machine learning solutions are used to solve a wide variety of problems, but in nearly all cases the core components are the same. Whether you simply want to understand the skeleton of machine learning solutions better or are embarking on building your own, understanding these components - and how they interact - can help.
Here’s a visual and written explanation of what these are and what they do.
The components of a machine learning solution
- Data Generation: Every machine learning application lives off data. That data has to come from somewhere. Usually it’s generated by one of your core business functions.
- Data Collection: Data is only useful if it’s accessible, so it needs to be stored – ideally in a consistent structure and conveniently in one place.
- Feature Engineering Pipeline: Most algorithms can't learn much from raw data in the form it is stored. We have to select, transform, combine, and otherwise prepare our data so the algorithm can find useful patterns.
- Training: This is where the magic happens. We apply algorithms, and they learn patterns from the data. Then they use these patterns to perform particular tasks.
- Evaluation: We need to carefully test how well our algorithm performs on data it didn’t see during training. This keeps us from deploying prediction models that work well on “seen” data but not in real-world settings.
- Task Orchestration: Feature engineering, training, and prediction all need to be scheduled on our compute infrastructure (such as AWS or Azure) – usually with non-trivial dependencies between them. So we need to reliably orchestrate our tasks.
- Prediction: This is the moneymaker. We use the model we’ve trained to perform new tasks and solve new problems – which usually means making a prediction.
- Infrastructure: Even in the age of the cloud, the solution has to live and be served somewhere. This will require setup and maintenance.
- Authentication: This keeps our models secure and makes sure only those who have permission can use them.
- Interaction: We need some way to interact with our model and give it problems to solve. Usually this takes the form of an API, a user interface, or a command-line interface.
- Monitoring: We need to regularly check our model’s performance. This usually involves periodically generating a report or showing performance history in a dashboard.
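To make the flow between several of these components concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the synthetic "website visit" records, the field names, and the deliberately simple threshold model stand in for whatever data and algorithm a real solution would use.

```python
import random

def generate_data(n, seed=0):
    # Data generation and collection: synthetic stand-in for stored business records.
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        visits = rng.randint(1, 20)
        minutes_per_visit = rng.uniform(2.0, 6.0)
        purchased = 1 if visits * minutes_per_visit > 40 else 0
        rows.append({"visits": visits,
                     "minutes_per_visit": minutes_per_visit,
                     "purchased": purchased})
    return rows

def engineer_feature(row):
    # Feature engineering: combine two raw columns into one informative value.
    return row["visits"] * row["minutes_per_visit"]

def train(features, labels):
    # Training: pick the threshold that classifies the training data best.
    best_threshold, best_correct = None, -1
    for t in sorted(features):
        correct = sum((x > t) == bool(y) for x, y in zip(features, labels))
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold

def predict(threshold, feature):
    # Prediction: apply the learned pattern to a new example.
    return 1 if feature > threshold else 0

def evaluate(threshold, features, labels):
    # Evaluation: accuracy on examples that were not used for training.
    hits = sum(predict(threshold, x) == y for x, y in zip(features, labels))
    return hits / len(labels)

def run_pipeline(n=200, seed=0):
    rows = generate_data(n, seed)
    split = int(n * 0.8)  # hold out the last 20% for evaluation
    train_rows, test_rows = rows[:split], rows[split:]
    x_train = [engineer_feature(r) for r in train_rows]
    y_train = [r["purchased"] for r in train_rows]
    x_test = [engineer_feature(r) for r in test_rows]
    y_test = [r["purchased"] for r in test_rows]
    threshold = train(x_train, y_train)
    return threshold, evaluate(threshold, x_test, y_test)

threshold, accuracy = run_pipeline()
print(f"learned threshold: {threshold:.1f}, held-out accuracy: {accuracy:.2f}")
```

The point is the shape rather than the model: each function maps to one component from the list, and the evaluation step only ever sees rows that training did not.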
Building your own ML architecture
Data generation and collection, training, and evaluation are must-haves, but you may need domain-specific components too.
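One way to keep the core components and any domain-specific additions explicit is to declare them as a dependency graph and derive the run order from it. The sketch below uses the standard library's `graphlib` module (Python 3.9+). The task names, including the `fraud_rules_check` step, are hypothetical, and the base graph follows the component list above rather than any particular orchestration tool.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
BASE_TASKS = {
    "data_collection": {"data_generation"},
    "feature_engineering": {"data_collection"},
    "training": {"feature_engineering"},
    "evaluation": {"training"},
}

def plan(extra_tasks=None):
    """Return a valid execution order for the base tasks plus optional extras."""
    graph = {task: set(deps) for task, deps in BASE_TASKS.items()}
    for task, deps in (extra_tasks or {}).items():
        # Extras can add new tasks or add dependencies to existing ones.
        graph.setdefault(task, set()).update(deps)
    return list(TopologicalSorter(graph).static_order())

# Hypothetical domain-specific step that must run before training.
order = plan({
    "fraud_rules_check": {"data_collection"},
    "training": {"fraud_rules_check"},
})
print(order)
```

A real orchestrator adds scheduling, retries, and compute allocation on top, but the underlying idea is the same: tasks declare what they depend on, and the order is computed rather than hard-coded.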
A common mistake we see is focusing too much on the prediction component and too little on the feature engineering pipeline, or trying to skip that pipeline completely.
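To show what the feature engineering step actually does, here is a small sketch that turns one raw record into model-ready features by selecting, transforming, and combining fields. The order schema and field names are hypothetical, chosen only for illustration.

```python
import math
from datetime import datetime

def engineer_features(raw):
    """Turn one raw order record (hypothetical schema) into numeric features."""
    ordered_at = datetime.fromisoformat(raw["ordered_at"])
    # Select: only the fields used below are kept; the free-text note is dropped.
    return {
        # Transform: extract usable signals from a timestamp string.
        "hour": ordered_at.hour,
        "is_weekend": int(ordered_at.weekday() >= 5),
        # Combine: merge two raw columns into one more informative value.
        "order_value": raw["unit_price"] * raw["quantity"],
        # Transform: compress a count that can have a long tail.
        "log_past_orders": math.log1p(raw["past_orders"]),
    }

raw_order = {
    "ordered_at": "2024-03-16T14:30:00",
    "unit_price": 19.5,
    "quantity": 4,
    "past_orders": 0,
    "note": "gift wrap",
}
features = engineer_features(raw_order)
print(features)
```

None of these steps involves a model, yet they determine what the model can learn: a raw timestamp string carries no usable signal for most algorithms, while the hour and the weekend flag do.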
Do you need a second opinion on how to set up the architecture for your ML applications? Schedule a call with us.