How to dockerize Data Science
This article is part of a multi-part series “How to dockerize [x]”.
The previous article is “How to dockerize Python environments”.

JupyterLab
For the first steps of many Data Science projects, JupyterLab is the way to go. Alternatively, you may use cloud-hosted services like Google Colab, Amazon SageMaker Studio Lab or Azure Machine Learning Studio.
The fastest and easiest way to run JupyterLab in a Docker container is by running:
docker run --rm -p 8888:8888 franzdiebold/datascience-ultimate:latest
This starts a Docker container running JupyterLab, which you can open in your browser at http://127.0.0.1:8888/lab?token=<your-token>.
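When the container runs in the foreground as above, the tokenized URL should show up directly in the terminal output. If you start the container detached instead (for example with -d and --name), the URL can be pulled from the logs. A minimal sketch, assuming the image logs the URL in the usual Jupyter format; jupyter_url and the container name my-jupyter are hypothetical:

```shell
# Hypothetical helper: print the first tokenized JupyterLab URL found in a
# container's logs. Jupyter writes its log to stderr, hence the 2>&1.
jupyter_url() {
  docker logs "$1" 2>&1 \
    | grep -Eo 'https?://127\.0\.0\.1:[0-9]+/[^[:space:]]*token=[[:alnum:]]+' \
    | head -n 1
}

# Usage, assuming a container started with: docker run -d --name my-jupyter ...
# jupyter_url my-jupyter
```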

By the way, this is using a custom Docker image that I open sourced as well! Do not forget to ⭐ it on GitHub:
Cool! 😎
But if you run the command from above, you will face two problems:
- No local file access
- Ephemeral dependencies
We may solve both problems using Docker bind mounts and Docker volumes.
For accessing local files, we can mount the current directory $PWD into the container by using the -v flag and also setting the working directory with -w:
docker run --rm -p 8888:8888 -v "${PWD}":/usr/src/my-current-project -w /usr/src/my-current-project franzdiebold/datascience-ultimate:latest
If we want to install dependencies (using pip) which will not disappear when restarting the container, we need to use a (persistent) Docker volume (my_volume) that is mapped to the directory where third-party packages are installed. This can also be done using the -v flag:
docker run --rm -p 8888:8888 -v my_volume:/opt/conda/lib/python3.9/site-packages -v "${PWD}":/usr/src/my-current-project -w /usr/src/my-current-project franzdiebold/datascience-ultimate:latest
In this case, you need a separate Docker volume for every environment you want to install dependencies in.
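Note that the mount target above hardcodes python3.9, so it stops matching if the image ever ships a different Python version. A minimal sketch for looking up the actual directory, assuming the image lets you override its default command and has python on its PATH (both are assumptions about this image); site_packages_dir is a hypothetical helper name:

```shell
# Hypothetical helper: print the site-packages directory used inside an image.
# Assumes the image accepts a command override and ships `python` on its PATH.
site_packages_dir() {
  docker run --rm "$1" python -c 'import site; print(site.getsitepackages()[0])'
}

# Usage (needs Docker and pulls the image if it is missing):
# site_packages_dir franzdiebold/datascience-ultimate:latest
```

If the printed path differs from the one used in the -v flag, adjust the mount target accordingly.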
But how could you automate this to get automagic 🪄 Jupyter environments?
Jupyter environments
You probably want an isolated Jupyter environment for each Data Science project you are working on. And each project probably has its own folder. So why not use the project path as the Docker volume name?
For better readability, the path should be slugified. So for a project path /Users/JohnDoe/Documents/dev/my-cool-data-science-project the Docker volume name would be users-johndoe-documents-dev-my-cool-data-science-project.
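The slug is more than cosmetic. To my knowledge, Docker only accepts local volume names matching [a-zA-Z0-9][a-zA-Z0-9_.-]+, and a -v source that starts with / is treated as a host path (a bind mount) rather than a volume name, so the raw project path cannot be used directly. A small sketch of that rule; is_valid_volume_name is a hypothetical helper, not part of Docker:

```shell
# Hypothetical helper: succeeds if "$1" looks like a valid local volume name.
# The pattern mirrors Docker's restriction as I understand it: an alphanumeric
# first character followed by at least one of [a-zA-Z0-9_.-].
is_valid_volume_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-zA-Z0-9][a-zA-Z0-9_.-]+$'
}

# The raw path is rejected, the slug is accepted.
is_valid_volume_name "/Users/JohnDoe/Documents/dev/my-cool-data-science-project" || echo "raw path: not a valid volume name"
is_valid_volume_name "users-johndoe-documents-dev-my-cool-data-science-project" && echo "slug: valid volume name"
```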
This can be automated using shell functions:
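slugify() {
  # Transliterate to ASCII, replace every run of other characters with "-",
  # trim dashes at both ends and lowercase the result.
  echo "$1" | iconv -t ascii//TRANSLIT | sed -E 's/[^a-zA-Z0-9-]+/-/g' | sed -E 's/^-+|-+$//g' | tr 'A-Z' 'a-z'
}

env_name() {
  local path="${1:-$PWD}"
  # Keep only the last 200 characters (this form works in both bash and zsh).
  if [ "${#path}" -gt 200 ]; then path="${path: -200}"; fi
  slugify "$path"
}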
The shell function slugify slugifies a given input. The env_name function returns the slugified version of the given path, or of the current directory ($PWD) if no argument is passed, using at most the last 200 characters of the path. The output of env_name can now be the name of our Docker volume for the Jupyter environment! 👍
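To see what these functions produce, here is a quick sanity check. It repeats both functions so it can be pasted into a fresh shell on its own. One assumption of this sketch: env_name truncates with tail -c (the last 200 bytes, not characters) so that it behaves the same in bash, zsh and plain sh.

```shell
# Same pipeline as in the article: transliterate to ASCII, turn every run of
# other characters into "-", trim dashes at both ends, lowercase.
slugify() {
  printf '%s\n' "$1" | iconv -t ascii//TRANSLIT | sed -E 's/[^a-zA-Z0-9-]+/-/g' | sed -E 's/^-+|-+$//g' | tr 'A-Z' 'a-z'
}

# Assumption of this sketch: keep the last 200 bytes via `tail -c`.
env_name() {
  slugify "$(printf '%s' "${1:-$PWD}" | tail -c 200)"
}

env_name /Users/JohnDoe/Documents/dev/my-cool-data-science-project
# prints: users-johndoe-documents-dev-my-cool-data-science-project

# Caveat: different paths can collapse to the same name.
env_name "/tmp/my project"   # prints: tmp-my-project
env_name "/tmp/my_project"   # prints: tmp-my-project
```

Two projects whose paths differ only in such characters would therefore share one volume, which is worth keeping in mind when naming project folders.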
In your shell startup file (e.g. .zshrc or .bashrc), just add the following, together with slugify and env_name from above:
jupyter-env() {
  local working_directory="/usr/src/$(basename "${PWD}")"
  local ssh_directory="${HOME}/.ssh"
  # Named volume for site-packages, bind mounts for the project and the SSH keys.
  docker run --rm -p 8888:8888 -p 4040:4040 \
    -v "$(env_name "$1")_jupyter":/opt/conda/lib/python3.9/site-packages \
    -v "${PWD}":"$working_directory" -v "$ssh_directory":/home/jovyan/.ssh \
    -w "$working_directory" franzdiebold/datascience-ultimate:latest
}
We can also define a shell alias to make it even easier:
alias je=jupyter-env
So next time you work on your Data Science project, you just run je in your project directory and let the magic happen!
Delete Jupyter environment 🧹
Using the following shell function you can delete a Jupyter environment:
jupyter-env-del() {
  # Quote the argument so paths containing spaces are passed through intact.
  docker volume rm "$(env_name "$1")_jupyter"
}
This will delete the corresponding Docker volume.
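Before deleting, it can help to see which environments exist at all. A minimal sketch that lists every volume whose name ends in _jupyter, which is the suffix used above; jupyter_env_ls is a hypothetical helper, named with underscores so that it also loads in strict POSIX shells:

```shell
# Hypothetical helper: list the names of all Docker volumes that end in
# "_jupyter", i.e. the volumes created through the jupyter-env function.
jupyter_env_ls() {
  docker volume ls -q | grep '_jupyter$'
}

# Usage (needs a running Docker daemon):
# jupyter_env_ls
```

Keep in mind that docker volume rm refuses to remove a volume that is still in use by a container, so stop the corresponding JupyterLab container first.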
Complete Code
The complete code is part of the accompanying git repository:
Check it out and don’t forget to ⭐️!
If you appreciate this post, here are a few things you can do to support my work:

- Give this story a clap. 👏
- Subscribe to my upcoming stories.
- Follow me on GitHub: https://github.com/franzdiebold