Data Science stack
This Docker Compose stack is a collection of services that can be used to run a Data Science environment.
(Screenshots: JupyterLab, MLflow Server)
Prerequisites
Docker and Docker Compose must be installed. Docker Desktop for Mac and Windows includes Docker Compose, so no separate installation is needed on those systems.
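Services
MLflow
MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. MLflow currently offers four components:
- MLflow Tracking - Record and query experiments: code, data, config, and results
- MLflow Projects - Package data science code in a format that reproduces runs on any platform
- MLflow Models - Deploy machine learning models in diverse serving environments
- Model Registry - Store, annotate, discover, and manage models in a central repository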
JupyterLab
JupyterLab is a web-based interactive development environment for Jupyter notebooks, code, and data. JupyterLab is flexible: configure and arrange the user interface to support a wide range of workflows in data science, scientific computing, and machine learning. JupyterLab is extensible and modular: write plugins that add new components and integrate with existing ones.
Dataiku Data Science Studio (DSS)
Dataiku enables you to create, share, and reuse applications that leverage data and machine learning to extend and automate decision making.
Project setup
Environment variables
This Docker Compose stack is configured through environment variables. The best way to set them is to add them to a .env file in the same directory as docker-compose.yml. Docker Compose reads this file automatically.
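Before starting the stack, it can help to confirm which variables the .env file actually defines. The following is a minimal sketch, not part of the project: the function name list_env_names is hypothetical, and it assumes the file uses plain NAME=value lines. It prints only the names, since the values may contain secrets.

```shell
#!/bin/sh
# list_env_names [FILE]
# Print the variable names defined in a .env file (default: ./.env).
# Comment lines and blank lines are skipped; values are never printed.
# The function name is a hypothetical helper for this sketch.
list_env_names() {
  env_file="${1:-.env}"
  if [ ! -f "$env_file" ]; then
    echo "missing: $env_file" >&2
    return 1
  fi
  # Keep only lines of the form NAME=value and cut off everything after "=".
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$env_file" | cut -d= -f1
}
```

Calling list_env_names from the directory that contains docker-compose.yml prints one name per line. Running docker-compose config afterwards shows the compose file with the values substituted, which is a quick way to spot a variable that is still unset.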
Docker Compose up
To run all the required services, execute docker-compose up from the directory that contains docker-compose.yml (add -d to run the services in the background).
Dataiku Data Science Studio (DSS) is an optional service, available as a Compose profile. To run the stack with it, execute docker-compose --profile dataiku up.
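A small wrapper can make it easier to start the stack with or without optional profiles. The sketch below assumes the docker-compose CLI is on the PATH and that detached mode (up -d) is wanted. The stack_up function and the DRY_RUN variable are hypothetical names introduced here, not part of the project.

```shell
#!/bin/sh
# stack_up [PROFILE...]
# Start the stack in detached mode, enabling each named Compose profile.
# With DRY_RUN=1 the command is printed instead of executed.
# stack_up and DRY_RUN are hypothetical names used for this sketch.
stack_up() {
  args=""
  for profile in "$@"; do
    args="$args --profile $profile"
  done
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "docker-compose$args up -d"
  else
    # Word splitting of $args is intentional: profile names contain no spaces.
    docker-compose $args up -d
  fi
}
```

With no arguments, stack_up starts only the required services. stack_up dataiku adds the optional Dataiku service. Setting DRY_RUN=1 first prints the command that would run, which is useful for checking the profile list.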
You can now access the following endpoints:
Endpoint | Description |
---|---|
http://localhost:8888/ | JupyterLab |
http://localhost:5000/ | MLflow UI |
http://localhost:11000/ | Dataiku (optional) |
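Once the containers are running, a quick probe can confirm that each endpoint answers. The following is a minimal sketch, assuming curl is installed. The function names stack_endpoints and check_endpoints are hypothetical, and the ports are copied from the table above.

```shell
#!/bin/sh
# stack_endpoints [HOST]
# Print "URL description" for each service; HOST defaults to localhost.
# Both function names are hypothetical helpers for this sketch.
stack_endpoints() {
  host="${1:-localhost}"
  echo "http://$host:8888/ JupyterLab"
  echo "http://$host:5000/ MLflow UI"
  echo "http://$host:11000/ Dataiku (optional)"
}

# check_endpoints [HOST]
# Probe each URL with curl and report whether any HTTP response came back.
check_endpoints() {
  stack_endpoints "$@" | while read -r url name; do
    if curl --silent --output /dev/null --max-time 5 "$url"; then
      echo "up    $url  $name"
    else
      echo "down  $url  $name"
    fi
  done
}
```

The Dataiku line is expected to report down unless the stack was started with the dataiku profile.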
Next features
Add optional profiles with more tools such as:
- Airflow - Develop, schedule, and monitor workflows
- Redash - Visualization tool
- Postgres - Relational Database
- Feast - Feature Store
This way you can bring up your stack with optional tools using the same docker-compose.yml file. For example, enabling the Kubeflow and Redash profiles would launch JupyterLab, MLflow Server (with its Postgres and MinIO), Kubeflow and Redash.
Change Log
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
1.1.0 (2021-08-15)
- Added Dataiku service as a profile (--profile dataiku).
1.0.0 (2021-08-07)
Initial release with the following features:
- MLflow Tracking Server.
- JupyterLab.
- GitHub Pages site with the documentation.
- GitHub Actions (CI) to verify the docker-compose.yml file and deploy the documentation.