Using repo2docker¶
The core functionality of repo2docker is to fetch a repo (e.g., from GitHub or other locations) and build a container image based on the specifications found in the repo. Optionally, it can launch a local Jupyter Notebook which you can use to explore it.
This section describes the general ways in which you can use repo2docker.
Note
See the Frequently Asked Questions for more info.
Preparing your repository¶
repo2docker looks for configuration files in the repository being built to determine how to build it. It is philosophically similar to Heroku Build Packs.
repo2docker will look for files in two places:
- A folder called binder in the root of the repository.
- The root of the repository (if a folder called binder exists in the root of the repository, configuration files outside of that folder will be ignored).
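As an illustration (the file names here are hypothetical), a layout like the following would use only the configuration inside the binder folder:

```text
my-repo/
├── binder/
│   └── requirements.txt   <- used by repo2docker
├── requirements.txt       <- ignored, because binder/ exists
└── analysis.ipynb
```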
Note
In general, repo2docker uses configuration files that are already part of various data science workflows (e.g., requirements.txt), rather than creating new custom configuration files.
repo2docker configuration files are all composable - you can use any number of them in the same repository. There are a few notable rules:
- Dockerfile: if a Dockerfile is present in a repository, it will take precedence over all other configuration files (which will be ignored).
- environment.yml with requirements.txt: if both of these files are present, then environment.yml will be used to build the image, not requirements.txt. If you wish to pip install packages using an environment.yml file, you should do so with the *pip:* key.
Note
For a list of repositories demonstrating various configurations, see Sample build repositories.
Supported configuration files¶
Below is a list of supported configuration files.
requirements.txt¶
This specifies a list of Python packages that will be installed in a virtualenv (or conda environment).
Example Contents¶
numpy==1.7
matplotlib==2.1
environment.yml¶
This is a conda environment specification that lets you install packages with conda.
Example Contents¶
channels:
- conda-forge
- defaults
dependencies:
- matplotlib
- pip:
- sphinx-gallery
Important
You must leave the environment.yml’s name field empty for this to work out of the box.
apt.txt¶
A list of Debian packages that should be installed. The base image used is usually the latest released version of Ubuntu (currently Zesty).
Example Contents¶
cowsay
fortune
postBuild¶
A script that can contain arbitrary commands to be run after the whole repository has been built. If you want this to be a shell script, make sure the first line is #!/bin/bash.
Example Contents¶
wget <url-to-dataset>
python myfile.py
Note
This file must be executable to be used with repo2docker. To do this, run the following:
chmod +x postBuild
REQUIRE¶
This specifies a list of Julia packages.
Note
Using a REQUIRE file also requires that the repository contain an environment.yml file.
Example Contents¶
PyPlot
Stats
runtime.txt¶
This allows you to control the runtime of Python. To use Python 2, put the line python-2.7 in the file. A Python 2 kernel will be installed alongside Python 3.
Example Contents¶
python-2.7
Dockerfile¶
This will be treated as a regular Dockerfile and a regular Docker build will be performed. The presence of a Dockerfile prevents all other build behavior. See the Binder Documentation for best practices with Dockerfiles.
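As a rough sketch only (the base image and install steps below are illustrative assumptions, not a recommended template), a minimal Dockerfile might look like:

```dockerfile
# Hypothetical example: start from a Jupyter-enabled base image
FROM jupyter/base-notebook

# Install the repository's Python dependencies
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# Copy the repository contents into the image
COPY . ${HOME}
```

Remember that with a Dockerfile present, none of the other configuration files described above will be applied.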
Using repo2docker with a JupyterHub¶
It is possible to use repo2docker to build JupyterHub-ready Docker images. For this to work properly, the version of the jupyterhub package in your git repository must match the version in your JupyterHub deployment. For example, if your JupyterHub deployment runs jupyterhub==0.8, you should put the following in requirements.txt or environment.yml:
jupyterhub==0.8.*
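If you use environment.yml instead, the same pin can go under the pip: key (a sketch following the environment.yml format shown above):

```yaml
dependencies:
  - pip:
    - jupyterhub==0.8.*
```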
Running repo2docker locally¶
For information on installing repo2docker, see Installing repo2docker.
Note
Docker must be running on your machine in order to build images with repo2docker.
Building an image¶
The simplest invocation of repo2docker builds a Docker image from a git repo, then runs a Jupyter server within the image so you can explore the repository’s contents. You can do this with the following command:
jupyter-repo2docker https://github.com/jakevdp/PythonDataScienceHandbook
After building (it might take a while!), it should output a message in your terminal:
Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
http://0.0.0.0:36511/?token=f94f8fabb92e22f5bfab116c382b4707fc2cade56ad1ace0
If you copy/paste that URL into your browser you will see a Jupyter Notebook with the contents of the repository you have just built!
Debugging the build process¶
If you want to debug and understand the details of the docker image being built,
you can pass the debug
parameter to the commandline. This will print the
generated Dockerfile
before building and running it.
jupyter-repo2docker --debug https://github.com/jakevdp/PythonDataScienceHandbook
If you only want to see the Dockerfile output but not actually build the image, you can also pass --no-build on the command line. This Dockerfile output is for debugging purposes only - it cannot be used by docker directly.
jupyter-repo2docker --no-build --debug https://github.com/jakevdp/PythonDataScienceHandbook
Accessing help from the command line¶
For a list of all the build configurations at your disposal, see the CLI help:
jupyter-repo2docker -h