Version control and visual diffs for Jupyter notebooks with Jovian.ml
Jovian.ml is a sharing and collaboration platform for data science projects. This post is a follow-up to the introductory post on sharing & embedding Jupyter notebooks online with Jovian.
Jupyter notebooks are great for interactive programming and visualization of outputs. For this very reason, however, it is sometimes quite difficult to version control Jupyter notebooks. Git repositories don’t work well for Jupyter notebooks, for a number of reasons:
- Notebooks are often quite large (~10 to 100MB), so they can slow down your repository, since Git is designed to work well only with small code files.
- Notebooks are not plain code files: they use a custom JSON file format with HTML and images embedded as strings inside. So, commit logs and git diff outputs don’t make sense for notebooks.
- Data science is more experimental in nature compared to software development, which means there can be many failed experiments that you may not necessarily want to track in a version control system. Unfortunately, Git does not offer a simple way of removing intermediate versions.
- Not everyone working on data science projects or using Jupyter notebooks is familiar or comfortable with software development tools. Git has a fairly steep learning curve and can seem quite intimidating to beginners.
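To make the second point concrete, here is a minimal sketch that builds two tiny notebooks as Python dictionaries and diffs their serialized JSON line by line, roughly the way a text-based diff sees them. The helper names (make_notebook, raw_diff) are made up for this illustration, and the notebook structure is a stripped-down approximation of the real file format.

```python
import difflib
import json


def make_notebook(source, output_text, execution_count):
    """Build a minimal notebook-like dict with a single code cell."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {
                        "name": "stdout",
                        "output_type": "stream",
                        "text": [output_text],
                    }
                ],
                "source": [source],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def raw_diff(old_nb, new_nb):
    """Line-based diff of the serialized JSON, similar in spirit to git diff."""
    old_lines = json.dumps(old_nb, indent=1, sort_keys=True).splitlines()
    new_lines = json.dumps(new_nb, indent=1, sort_keys=True).splitlines()
    return list(
        difflib.unified_diff(
            old_lines, new_lines, "old.ipynb", "new.ipynb", lineterm=""
        )
    )


# One small code edit, followed by re-running the cell later in the session.
old = make_notebook("print(1 + 1)", "2\n", execution_count=1)
new = make_notebook("print(1 + 2)", "3\n", execution_count=7)

diff_lines = raw_diff(old, new)

# Keep only added/removed lines, skipping the "---"/"+++" file headers.
changed = [
    line
    for line in diff_lines
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("\n".join(diff_lines))
```

A single edit to the code shows up as changes to the source, the stored output and the execution counter. With real notebooks, the embedded HTML and base64-encoded images make such diffs far noisier.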
Most people end up creating copies of Jupyter notebooks and using long descriptive filenames like cifar10_preprocessed_resnet18_adam_lr1e-5_10epochs_accuracy_94_v3_final.ipynb, which is far from ideal.
That’s why we created a simple versioning system for Jupyter notebooks on Jovian. Just run jovian.commit() inside your notebook every time you wish to record a snapshot. On the first commit, the notebook is uploaded to your Jovian account as a new project, and subsequent commits automatically record new versions.
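As a rough sketch, a notebook cell that records a snapshot could look like the following. It assumes the jovian package has been installed (for example with pip install jovian). The wrapper function snapshot_notebook is a hypothetical helper for this post and not part of the library; only jovian.commit() comes from Jovian itself.

```python
def snapshot_notebook():
    """Record a snapshot of the current notebook with Jovian, if available.

    Returns True if jovian.commit() was called, False if the library
    could not be imported.
    """
    try:
        import jovian  # third-party package, assumed to be installed
    except ImportError:
        print("jovian is not installed; try: pip install jovian")
        return False

    # With no arguments, the first call creates a new project and
    # later calls record new versions, as described above.
    jovian.commit()
    return True
```

Calling snapshot_notebook() at the end of an experiment is equivalent to running import jovian followed by jovian.commit() directly in a cell. The wrapper only adds a friendlier message when the library is missing.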
On the notebook page, you can switch between different versions using the versions dropdown.
You can also click the Compare button to view the list of versions in a table. You can edit the version title, add notes and even archive (hide) or delete failed experiments. Jovian offers a flexible version control system that fits right into the workflow of a data science or machine learning project.
You can also attach helper scripts, model checkpoints, datasets, hyperparameters, metrics, outputs, etc. to each version of your notebook. Check out the API docs for jovian.commit for more information.
You can download a specific version of the notebook using the ‘Download Zip’ button, or use the jovian clone command:
jovian clone aakashns/keras-mnist-jovian -v 10
This behavior is intentionally different from git clone (which downloads the entire history of the project), because notebooks can be fairly large, so downloading a single version is much faster.
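If you want to fetch a particular version from a script rather than the terminal, one option is to build the same command in Python and hand it to subprocess. This is only a sketch: build_clone_command and clone_version are hypothetical helper names, and clone_version assumes the jovian command-line tool is installed and available on your PATH.

```python
import subprocess


def build_clone_command(project, version):
    """Return the argument list for cloning one version of a project."""
    return ["jovian", "clone", project, "-v", str(version)]


def clone_version(project, version):
    """Run the clone command; raises CalledProcessError if it fails."""
    subprocess.run(build_clone_command(project, version), check=True)


# Build (but do not run) the command shown above.
print(build_clone_command("aakashns/keras-mnist-jovian", 10))
# prints ['jovian', 'clone', 'aakashns/keras-mnist-jovian', '-v', '10']
```

Passing the arguments as a list, rather than as a single shell string, avoids quoting problems if a project name ever contains unusual characters.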
Jovian also lets you view a visual side-by-side diff between different versions of a notebook. You can see fine-grained changes between code cells, markdown, output values, graphs etc. It’s especially useful while comparing the results of two experiments.
Behind the scenes, the diff is powered by nbdime, but you don’t need to install or set up anything on your computer. jovian.commit() is all you need!
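To give a feel for why a notebook-aware diff reads better than a line-by-line one, here is a deliberately simplified sketch that compares two notebooks cell by cell using only the Python standard library. It is not nbdime’s algorithm and not how Jovian computes diffs: the function names are hypothetical, and it looks only at cell sources, ignoring outputs and metadata.

```python
import difflib


def cell_source(cell):
    """Return a cell's source as one string (it may be stored as a list)."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def diff_cells(old_nb, new_nb):
    """Return (operation, old_sources, new_sources) for each changed run of cells."""
    old = [cell_source(cell) for cell in old_nb["cells"]]
    new = [cell_source(cell) for cell in new_nb["cells"]]
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


old_nb = {"cells": [
    {"cell_type": "markdown", "source": ["# Training"]},
    {"cell_type": "code", "source": ["lr = 1e-3"]},
    {"cell_type": "code", "source": ["train(lr)"]},
]}
new_nb = {"cells": [
    {"cell_type": "markdown", "source": ["# Training"]},
    {"cell_type": "code", "source": ["lr = 1e-5"]},
    {"cell_type": "code", "source": ["train(lr)"]},
    {"cell_type": "code", "source": ["evaluate()"]},
]}

result = diff_cells(old_nb, new_nb)
for tag, before, after in result:
    print(tag, before, after)
```

Here the comparison reports one replaced cell (the learning rate) and one inserted cell (the evaluation step), which maps directly onto what changed between the two experiments.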
There’s a lot more you can do with Jovian. Over the next few weeks, we’ll publish several blog posts & videos describing other features. Visit www.jovian.ml to sign up to receive updates.