When you build machine learning models, it’s common to run dozens or hundreds of experiments to find the correct input data, parameters, and algorithm. The more experiments you run, the harder it gets to remember what works and what doesn’t.
Sometimes you see a great result, but because it took hours or days to produce, you’ve already changed the code in the meantime. And now you can’t remember which parameters you used to get that result!
Of course, this is an eternal problem for all scientific research. It’s one that’s traditionally solved by the Lab Notebook: a journal where scientists carefully record every experiment they run.
Many data scientists follow a similar approach, keeping notes of their experiments digitally or by hand. But this doesn’t always work for machine learning. You need to keep track of three things: the data, the code, and the model. And even if you’re super precise in your note-taking, it’s not feasible to record everything in full.
Luckily, you can automate experiment tracking with MLFlow. Not only is it simple to set up, but it also adapts easily to your existing workflow.
What is MLFlow?
MLFlow is an extensive machine learning platform with several components. We’ll just focus on its experiment tracking features for now. We’ve talked before about model registries and why they’re valuable. There’s definitely some overlap between experiment tracking and model registries, but they aren’t the same.
Experiment tracking vs model registries
Model registries are for system managers and focus on models that make it to production. On the other hand, experiment tracking tools are for scientists to track all experiments, including unsuccessful ones.
Registries track and visualise these models’ performance and associated metadata. But for every model in production, there are usually hundreds of failed attempts. Experiment tracking is designed for scientists and researchers to help make sense of unsuccessful and not-yet-successful experiments. It makes the research process more efficient by ensuring that work isn’t lost or duplicated.
MLFlow’s Python library and dashboard
The two components of MLFlow you’ll use for experiment tracking are its Web UI and Python library. You can get them both by running a single install command:
pip install mlflow
You can immediately view the dashboard where all your experiments will live by running `mlflow ui` and visiting http://localhost:5000 in your browser.
This dashboard already has a bunch of similar features to a Lab Notebook: you can take notes, search through previous data, and set up new experiments.
Things get more interesting when you use this in conjunction with the same mlflow Python library you’ve just installed. By adding a few lines to your existing machine learning code files, you can easily track every single run. This automatically saves the parameters you used, the results, and even the full model binary.
The screenshot below shows how you can start to automatically track your experiments in around ten lines of code.
The highlighted lines are MLFlow-specific, while the rest is a standard scikit-learn example to predict wine quality.
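If you want to follow along without the screenshot, here’s a minimal sketch of what that code can look like. The CSV path and experiment ID below are placeholders for your own data and dashboard settings; the idea is simply to wrap your normal scikit-learn code in an MLFlow run and log what matters.

```python
import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical local copy of the wine quality dataset (semicolon-separated CSV).
data = pd.read_csv("winequality-red.csv", sep=";")
train, test = train_test_split(data, test_size=0.25)
train_x, test_x = train.drop("quality", axis=1), test.drop("quality", axis=1)
train_y, test_y = train["quality"], test["quality"]

alpha, l1_ratio = 0.5, 0.5

with mlflow.start_run(experiment_id="1"):  # use the ID shown in the dashboard
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio)
    model.fit(train_x, train_y)
    predictions = model.predict(test_x)

    rmse = np.sqrt(mean_squared_error(test_y, predictions))
    r2 = r2_score(test_y, predictions)

    mlflow.log_param("alpha", alpha)            # hyperparameters for this run
    mlflow.log_param("l1_ratio", l1_ratio)
    mlflow.log_metric("rmse", rmse)             # evaluation results
    mlflow.log_metric("r2", r2)
    mlflow.sklearn.log_model(model, "model")    # full model binary as an artifact
```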
Each time you run an experiment with this code, it logs an entry you can view in the dashboard. Note the following:
- In the Python code, we referred to an `experiment_id`. You can see this in the dashboard after you create a new experiment in the UI. One experiment can have many runs, so you won’t need a new experiment ID each time you try new hyperparameters or other variations.
- Your existing training and evaluation code needs to be within the `mlflow.start_run()` block.
- You can log parameters, metrics, and the entire model with the methods `mlflow.log_param`, `mlflow.log_metric`, and `mlflow.sklearn.log_model`.
From here, you can change parameters as you like, without worrying about forgetting which ones you’ve used. For example, in the screenshot you can see that MLFlow tracked the results of two experiments where we changed the alpha and l1_ratio parameters, making it easy to compare how these parameters affected the r2 and rmse metrics.
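Reusing the data and imports from the sketch above, a small (hypothetical) sweep like this would create one dashboard entry per parameter combination:

```python
# Each hyperparameter combination becomes its own run in the dashboard.
for alpha in (0.25, 0.5, 1.0):
    for l1_ratio in (0.25, 0.75):
        with mlflow.start_run(experiment_id="1"):
            model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio).fit(train_x, train_y)
            predictions = model.predict(test_x)
            mlflow.log_param("alpha", alpha)
            mlflow.log_param("l1_ratio", l1_ratio)
            mlflow.log_metric("rmse", np.sqrt(mean_squared_error(test_y, predictions)))
            mlflow.log_metric("r2", r2_score(test_y, predictions))
            mlflow.sklearn.log_model(model, "model")
```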
Keeping free-form notes
The reason many scientists keep their own notebook rather than using a structured platform is that pre-built templates often aren’t flexible enough.
Something we like about MLFlow is that you can keep notes at different levels: notes relating to an entire experiment, or to a specific run.
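You can write these notes directly in the dashboard, but you can also set them from code. Here’s a sketch, assuming the convention that MLFlow stores UI notes in the `mlflow.note.content` tag (the experiment ID and note text are placeholders):

```python
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Note attached to the whole experiment (experiment ID "1" is a placeholder).
client.set_experiment_tag("1", "mlflow.note.content",
                          "Baseline ElasticNet runs on the red wine data.")

# Note attached to a single run.
with mlflow.start_run(experiment_id="1"):
    mlflow.set_tag("mlflow.note.content", "Doubled alpha to check its effect on rmse.")
```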
How MLFlow works internally
MLFlow can be set up in different ways. You can do it locally, as we did in the example above, or on a remote server, which lets you share a single dashboard among a whole team of scientists.
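With a remote server, the only change on the client side is usually pointing the library at that server before starting your runs (the hostname below is a placeholder):

```python
import mlflow

# Send runs to a shared tracking server instead of the local mlruns directory.
mlflow.set_tracking_uri("http://mlflow.internal.example.com:5000")
mlflow.set_experiment("wine-quality")  # creates the experiment if it doesn't already exist
```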
If you install it locally, you’ll notice it creates a directory called `mlruns` in the same directory where you executed your Python code. Inside this directory you can find the full model binaries (as .pkl files) for every experiment run.
If you want to track lots of models and you need more space, you can configure MLFlow to store these files on S3 or another cloud storage provider instead.
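One way to set that up, sketched here with a placeholder bucket name and a simple SQLite backend store, is to start the tracking server with a default artifact root on S3:

```
# Requires AWS credentials in the environment (e.g. AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY).
mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root s3://my-mlflow-artifacts/ \
    --host 0.0.0.0
```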
Scaling up your machine learning infrastructure
We use MLFlow as part of our Open MLOps architecture. You can use MLFlow on its own, but it also works well in conjunction with other MLOps components, such as Prefect for scheduling and managing tasks, and a cloud-based Jupyter Notebook environment for easy collaboration.
We love discussing the challenges of building and keeping track of machine learning models. Contact us if you’d like to discuss your team’s approach to machine learning.