The Data Science Docker image you were waiting for!

Franz Diebold
Apr 5, 2022


Source: Tom Fisk from Pexels

When looking for a suitable Docker image for my Data Science projects, I could not find one that met my requirements, even though they are not particularly unusual.

What’s important in Data Science (for me)

  1. Languages
    - Python
    - Scala
    - R
  2. Technology
    - Spark
    - JupyterLab
  3. Data Libraries
    - Pandas
    - NumPy
    - Polars
  4. (ML) Algorithms
    - scikit-learn
    - SciPy
    - XGBoost
  5. Visualization
    - Plotly
    - Seaborn
    - Graphviz
  6. “Helpers”
    - Git support
    - Code formatting
    - Nice theming (for JupyterLab)

Many points in the list are already covered by the jupyter/all-spark-notebook Docker image.
To cover the remaining points as well, I created and published my own Docker image, franzdiebold/datascience-ultimate.

To give you a sneak peek, this is what JupyterLab looks like when using the image:

JupyterLab in franzdiebold/datascience-ultimate — Source: Image by the author.

How can I use it?

The fastest way is to just run the following command in your shell:

docker run -p 8888:8888 -p 4040:4040 franzdiebold/datascience-ultimate

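This will start JupyterLab in a Docker container, and the following web apps will be available:
  - JupyterLab: http://localhost:8888 (the login URL, including the access token, is printed in the container's output)
  - Spark UI: http://localhost:4040 (once a Spark session is running)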
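When you open JupyterLab for the first time, it asks for an access token, which the container prints to its logs on start-up. The sketch below is a hypothetical helper, `jupyter_url`, that extracts the first login URL from log text. It assumes the image keeps the log format of its jupyter/all-spark-notebook base (a line containing a URL like `http://127.0.0.1:8888/lab?token=...`) and that the container runs in the background under a name of your choosing, here the hypothetical `ds`.

```shell
# Hypothetical helper: print the first JupyterLab login URL (with token)
# found in the text read from stdin.
# Assumes log lines contain URLs like http://127.0.0.1:8888/lab?token=<hex>.
jupyter_url() {
  grep -Eo 'http://127\.0\.0\.1:8888/[^ ]*token=[0-9a-f]+' | head -n 1
}

# Usage, assuming the container runs in the background under the name "ds":
#   docker run -d --name ds -p 8888:8888 -p 4040:4040 franzdiebold/datascience-ultimate
#   docker logs ds 2>&1 | jupyter_url
```

The `2>&1` matters because the Jupyter server writes its log messages to stderr, which `docker logs` keeps separate from stdout.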

There’s an even better way

You probably want your local files and folders to be accessible from within the container. To achieve this, mount the current directory ($PWD) to the working directory (/home/jovyan) in the container:

docker run --rm -p 8888:8888 -p 4040:4040 -v "${PWD}":/home/jovyan franzdiebold/datascience-ultimate:latest
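If you start the container this way often, a small shell function saves some typing. The sketch below is hypothetical: the name `ds_lab` and the `DRY_RUN` switch are my own illustration, not part of the image. It mounts the directory given as its first argument (or the current directory if none is given), and with `DRY_RUN` set it only prints the command instead of running it.

```shell
# Hypothetical wrapper around the docker run command above.
# $1: host directory to mount into /home/jovyan (defaults to the current directory).
# Set DRY_RUN to any non-empty value to print the command instead of running it.
ds_lab() {
  ds_dir="${1:-$PWD}"
  # Replace the function's positional parameters with the full command,
  # so paths containing spaces stay a single argument when it is executed.
  set -- docker run --rm -p 8888:8888 -p 4040:4040 \
    -v "${ds_dir}":/home/jovyan franzdiebold/datascience-ultimate:latest
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Usage:
#   ds_lab ~/projects/my-analysis   # mount a specific project folder
#   DRY_RUN=1 ds_lab                # show the command for the current directory
```

Building the command in the positional parameters (instead of a string) keeps it POSIX-compatible while avoiding word-splitting problems with unusual directory names.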

If you like the Docker image:

Thanks! 🙏

If you appreciate this post, here are a few things you can do to support my work:

  1. Give this story a clap. 👏
  2. Subscribe to my upcoming stories.
  3. Follow me on GitHub: https://github.com/franzdiebold
