
How to dockerize Data Science


This article is part of a multi-part series “How to dockerize [x]”.
The previous article is How to dockerize Python environments.

Dockerizing Data Science meme
Source: imgflip Meme Generator

JupyterLab

For the first miles of many Data Science projects, JupyterLab is the way to go. You may also use cloud-hosted services like Google Colab, Amazon SageMaker Studio Lab, or Azure Machine Learning Studio.

The fastest and easiest way to run JupyterLab in a Docker container is by running:

docker run --rm -p 8888:8888 franzdiebold/datascience-ultimate:latest

This will start a Docker container running JupyterLab, which you can access in your browser at http://127.0.0.1:8888/lab?token=<your-token>. The full URL, including the token, is printed in the container's startup output.

JupyterLab screenshot
JupyterLab running in Data Science Ultimate Docker image — Source: Image by the author.

Cool! 😎

But if you run the command above, you will face two problems:

  1. No local file access
  2. Ephemeral dependencies

We may solve both problems using Docker bind mounts and Docker volumes.

For accessing local files, we can mount the current directory $PWD into the container by using the -v flag and also setting the working directory with -w:

docker run --rm -p 8888:8888 -v "${PWD}":/usr/src/my-current-project -w /usr/src/my-current-project franzdiebold/datascience-ultimate:latest

If we want to install dependencies (using pip) that do not disappear when the container is restarted, we need a persistent Docker volume (my_volume) mapped to the directory where third-party packages are installed. This can also be done using the -v flag:

docker run --rm -p 8888:8888 -v my_volume:/opt/conda/lib/python3.9/site-packages -v "${PWD}":/usr/src/my-current-project -w /usr/src/my-current-project franzdiebold/datascience-ultimate:latest
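If you want to verify that the volume does its job, install a package inside the running container (in a terminal or a notebook cell prefixed with !), restart the container with the same command, and the package will still be importable. You can also inspect the volume from the host. A minimal sketch (the package name is just an example):

# inside the JupyterLab container:
pip install pyjokes

# on the host: show the named volume that now holds the installed packages
docker volume inspect my_volume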

In this case, you will need a different Docker volume for every environment you want to install dependencies in.
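Docker creates a named volume automatically the first time it is referenced in a -v flag, so there is no extra setup per project; you can manage the environment volumes that accumulate with the standard volume commands, for example:

# list all named volumes (one per environment)
docker volume ls

# remove an environment volume you no longer need
docker volume rm my_volume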

But how could you automate this to get automagic 🪄 Jupyter environments?

Jupyter environments

You probably want an isolated Jupyter environment for each Data Science project you are working on. And each project probably has its own folder. So why not use the project path as the Docker volume name?
For better readability, the path should be slugified. So for a project path /Users/JohnDoe/Documents/dev/my-cool-data-science-project the Docker volume name would be users-johndoe-documents-dev-my-cool-data-science-project.
This can be automated using shell functions:

slugify() {
  echo "$1" | iconv -t ascii//TRANSLIT | sed -E 's/[^a-zA-Z0-9-]+/-/g' | sed -E 's/^-+|-+$//g' | tr A-Z a-z
}

env_name() {
  echo $(slugify ${${1:-$PWD}: -200})
}

The slugify shell function slugifies a given input. The env_name function returns the slugified version of a given path (defaulting to the current directory, $PWD), limited to the last 200 characters. The output of env_name can now be the name of our Docker volume for the Jupyter environment! 👍
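Assuming both functions are defined in your shell, a quick check with the example path from above:

cd /Users/JohnDoe/Documents/dev/my-cool-data-science-project
env_name
# -> users-johndoe-documents-dev-my-cool-data-science-project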

In your shell rc file (e.g. .zshrc or .bashrc) you just add (together with slugify and env_name from above):

jupyter-env() {
  local working_directory="/usr/src/$(basename ${PWD})"
  local ssh_directory="${HOME}/.ssh"
  docker run --rm -p 8888:8888 -p 4040:4040 \
    -v "$(env_name $1)_jupyter":/opt/conda/lib/python3.9/site-packages \
    -v "${PWD}":"$working_directory" -v "$ssh_directory":/home/jovyan/.ssh \
    -w "$working_directory" \
    franzdiebold/datascience-ultimate:latest
}

We can also define another shell alias to make it even easier:

alias je=jupyter-env

So next time you work on your Data Science project you just run je in your project directory and let the magic happen!
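For example (using the project path from above):

cd /Users/JohnDoe/Documents/dev/my-cool-data-science-project
je  # starts JupyterLab on port 8888, backed by this project's own _jupyter volume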

Delete Jupyter environment 🧹

Using the following shell function you can delete a Jupyter environment:

jupyter-env-del() {
  docker volume rm "$(env_name $1)_jupyter"
}

This will delete the corresponding Docker volume.
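For example, run it from the project directory (or pass the project path as an argument); with the example path from above:

cd /Users/JohnDoe/Documents/dev/my-cool-data-science-project
jupyter-env-del  # removes the volume users-johndoe-documents-dev-my-cool-data-science-project_jupyter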


If you appreciate this post, here are a few things you can do to support my work:

  1. Give this story a clap. 👏
  2. Subscribe to my upcoming stories.
  3. Follow me on GitHub: https://github.com/franzdiebold
