Most machine learning teams face the same tasks: managing a preprocessing pipeline, training and testing models, and deploying models as APIs.
Nearly every team ends up building its own hodgepodge of internal tools and scripts to handle these tasks. The alternatives are to buy into an expensive proprietary platform or to spend months learning and configuring open-source products. The former "just work" but are limiting; the latter are flexible but tough to set up.
We wanted “flexible and easy” so we built Open MLOps
Open MLOps is a set of Terraform scripts and user guides that let you set up a complete MLOps platform in a Kubernetes cluster. All of it is open source.
Who is Open MLOps for?
Open MLOps is for any engineering team that works with machine learning. We built it to be easy to set up and to accommodate nearly any machine learning use case. You'll find Open MLOps useful if you're:
- A machine learning research team: You know you should be following MLOps best practices, but you're not sure where to start and you lack the engineering capacity to build your own framework.
- A machine learning startup: You need to release your PoC quickly, but it also has to scale. You know Kubernetes could solve your problems, but you don't have much expertise with it.
- An individual machine learning practitioner: Even working alone, you struggle to keep track of the experiments you've run, and you waste time and effort on repetitive work. You need an easier, faster way to wrap your models in APIs, track experiments, and manage data pipelines.
- A machine learning team at a larger enterprise: Pre-built solutions aren't flexible enough for you. You know you should build your own, but you don't want to start from scratch.
- A machine learning student: If you want to make the jump from academia to industry, you can test the waters by setting up Open MLOps and following our guides.
Open MLOps lets you deploy a configurable set of tools into a Kubernetes cluster. If you need to hook up your own authentication or configure the services in a specific way, you can simply fork the repository and build off the existing Terraform scripts.
What is Open MLOps?
Open MLOps is an open source repository of Terraform scripts and walk-through tutorials to set up a variety of machine learning tools in a Kubernetes cluster.
By default, it sets up:
- Prefect: For scheduling tasks and building DAGs of dependent tasks. You can use it for cleaning data, training and deploying models, or other tasks that you need to automate. Tasks are defined using Python and you can easily build them into pipelines using intuitive decorators.
- JupyterHub: A shared Jupyter notebook environment for your team to run experiments, share code, and perform data wrangling.
- MLflow: To track experiments and models, keeping a close record of results and runs, so you know what's been tried and what's working.
- Dask: To easily parallelize your heavy computation tasks across multiple cores or machines.
- Seldon: To convert your models into production REST APIs and serve them at scale.
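To make the idea of "building tasks into pipelines with decorators" concrete, here is a minimal, stdlib-only sketch. It is a toy illustration of the pattern, not Prefect's actual API: the `task` decorator, the `TASKS` registry, the `run` function, and the three example tasks are all hypothetical names introduced here. Each decorated function declares its upstream tasks, and `run` executes a target only after its dependencies, which assumes the task graph is acyclic.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical registry: task name -> (function, names of upstream tasks)
TASKS: Dict[str, Tuple[Callable, List[str]]] = {}


def task(*upstream: str) -> Callable:
    """Hypothetical decorator: register a function as a pipeline task that
    depends on the named upstream tasks (toy version, not Prefect's API)."""
    def register(fn: Callable) -> Callable:
        TASKS[fn.__name__] = (fn, list(upstream))
        return fn
    return register


def run(target: str, results: Optional[Dict[str, object]] = None) -> object:
    """Run `target` after recursively running its upstream tasks.
    Each task runs at most once per call; assumes the DAG has no cycles."""
    if results is None:
        results = {}
    if target in results:
        return results[target]
    fn, upstream = TASKS[target]
    inputs = [run(name, results) for name in upstream]
    results[target] = fn(*inputs)
    return results[target]


@task()
def load_data():
    # Stand-in for reading raw data; None marks a missing value
    return [3.0, None, 1.0, 4.0, None, 5.0]


@task("load_data")
def clean_data(rows):
    # Drop missing values
    return [r for r in rows if r is not None]


@task("clean_data")
def train_model(rows):
    # Stand-in "model": the mean of the cleaned values
    return sum(rows) / len(rows)


if __name__ == "__main__":
    print(run("train_model"))
```

Running the script prints `3.25`, the mean of the four non-missing values. A real orchestrator such as Prefect adds what this toy leaves out: scheduling, retries, parallel execution, and a UI for monitoring runs. Its decorators and flow constructs differ from the ones shown here, so check the Prefect documentation for the version you deploy.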
What do I need to get started with Open MLOps?
The easiest way to get started with Open MLOps is on AWS. All you need is access to an AWS account and permission to create new services.
You also need some experience with command-line tools. Specifically, you'll use kubectl and Terraform during the set-up process, but our guides walk you through everything, so you needn't be an expert.
Using our set-up guide, you can get everything running and configured in a few hours. Within another hour, you can be training your first model.
Why should I use Open MLOps?
MLOps is an emerging but fast-growing field that helps machine learning teams iterate quickly, deliver results reliably, and collaborate efficiently. Your team will benefit in several ways from the established open-source tools that make up Open MLOps:
- Hire developers more easily. Many developers avoid roles that rely on proprietary software, because these can be dead-end jobs where they don't learn transferable skills.
- Don’t reinvent the wheel. It’s often tempting to develop an internal platform that does everything you need. But most teams find it’s more complicated than they expected. You can read our post on why machine learning engineering is more complicated than software engineering. Even if you have intricate, custom requirements, you can start with Open MLOps as a base layer and build your custom solution on top.
- Build reproducible models. One of the most common complaints from machine learning teams is how hard it is to reproduce their work across different systems or over time. By combining widely adopted frameworks and tools with MLOps practices, you can avoid being left with a large model binary that no one can rebuild.
- Collaborate efficiently. If everyone on your team uses the same system, it’s easier to work together, share, or hand over work to others.
Feedback, issues, or feature requests?
If you’ve given Open MLOps a try and need support or want to share your experience, please reach out. Alternatively, feel free to open an issue or a pull request directly on the Open MLOps repository.