In this chapter, we will see several advanced features and usage examples of the Jupyter Notebook. As we have only seen basic features in the previous chapters, we will dive deeper into the architecture of the Notebook here.

The Notebook ecosystem

Jupyter notebooks are represented as JavaScript Object Notation (JSON) documents. JSON is a language-independent, text-based file format for representing structured documents. As such, notebooks can be processed by any programming language, and they can be converted to other formats such as Markdown, HTML, LaTeX/PDF, and others.
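Because a notebook is plain JSON, it can be inspected with nothing more than a standard JSON parser. The following is a minimal sketch: the notebook content is hand-written for illustration and only follows the general version 4 layout (a top-level cells list whose items carry cell_type and source fields), and the count_cell_types helper is a hypothetical name, not part of any Jupyter library.

```python
import json

# A tiny hand-written notebook (illustrative content, nbformat 4 layout).
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["print('hello')"]},
    ],
})


def count_cell_types(text):
    """Parse notebook JSON and count the cells of each type."""
    nb = json.loads(text)
    counts = {}
    for cell in nb["cells"]:
        counts[cell["cell_type"]] = counts.get(cell["cell_type"], 0) + 1
    return counts


cell_counts = count_cell_types(notebook_text)
print(cell_counts)
```

The same approach works on a real .ipynb file by reading its text from disk first. In practice, the nbformat package is the more robust way to read and validate notebooks.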

There is an ecosystem of tools around the Notebook. Notebooks are being used to create slides, teaching materials, blog posts, research papers, and even books. In fact, this very book is entirely written in the Notebook using the Markdown format and a custom-made Python tool.

JupyterLab is the next generation of the Jupyter Notebook. It is still in an early stage of development at the time of this writing. We cover it in the last recipe of this chapter.

Architecture of the Jupyter Notebook

Jupyter implements a two-process model, with a kernel and a client. The client is the interface through which the user sends code to the kernel. The kernel executes the code and returns the result to the client for display. In Read-Evaluate-Print Loop (REPL) terminology, the kernel implements the Evaluate step, whereas the client implements the Read and Print steps.

The client can be a Qt widget if we run the Qt console, or a browser if we run the Jupyter Notebook. In the Jupyter Notebook, the kernel receives entire cells at once, so it has no notion of a notebook. There is a strong decoupling between the linear document that makes up the notebook and the underlying kernel.
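The division of labor between client and kernel can be illustrated with a toy model. This is only a sketch under strong simplifying assumptions: ToyKernel and run_client are hypothetical names, everything runs in a single process, and no messaging layer is involved, whereas Jupyter runs the kernel as a separate process. The point is that the kernel only ever receives the source of one cell at a time and keeps its state in a single namespace.

```python
import contextlib
import io


class ToyKernel:
    """Evaluates whole cells of code in one persistent namespace."""

    def __init__(self):
        self.namespace = {}

    def execute(self, cell_source):
        # The kernel sees a string of code, never the notebook document.
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(cell_source, self.namespace)
        return buffer.getvalue()


def run_client(kernel, cells):
    """Plays the client role: reads cells, sends them, collects the output."""
    return [kernel.execute(cell) for cell in cells]


kernel = ToyKernel()
outputs = run_client(kernel, ["x = 2\ny = 3", "print(x * y)"])
print(outputs)
```

The second cell can use x and y because the kernel state persists between executions, regardless of where those cells sit in the document.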

All communication between the different processes is implemented on top of the ZeroMQ (or ZMQ) messaging library (http://zeromq.org). In the Notebook, the browser communicates with the Notebook server using WebSocket, a TCP-based protocol implemented in modern web browsers, and the server relays messages to and from the underlying kernel.

Connecting multiple clients to one kernel

In a notebook, typing %connect_info in a cell gives the information we need to connect a new client (such as a Qt console) to the underlying kernel:

%connect_info
{
  "shell_port": 58645,
  "iopub_port": 47422,
  "stdin_port": 60550,
  "control_port": 39092,
  "hb_port": 49409,
  "ip": "127.0.0.1",
  "key": "2298f955-7020b0ce534e7a8d81053d43",
  "transport": "tcp",
  "signature_scheme": "hmac-sha256",
  "kernel_name": ""
}

Paste the above JSON into a file, and connect with:
    $> jupyter <app> --existing <file>
or, if you are local, you can connect with just:
    $> jupyter <app> --existing kernel-4342f625-a8...
or even just:
    $> jupyter <app> --existing
if this is the most recent Jupyter kernel you have started.

Here, <app> is console, qtconsole, or notebook.
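The connection information is itself a small JSON document, so it can be read programmatically. As a rough sketch, the following code parses the JSON shown above and derives one address per channel by combining the transport, ip, and port fields. The channel_endpoints helper is a hypothetical name introduced for illustration, and the transport://ip:port form is assumed to apply to the tcp transport shown here.

```python
import json

# The connection information printed by %connect_info above.
connection_json = """
{
  "shell_port": 58645,
  "iopub_port": 47422,
  "stdin_port": 60550,
  "control_port": 39092,
  "hb_port": 49409,
  "ip": "127.0.0.1",
  "key": "2298f955-7020b0ce534e7a8d81053d43",
  "transport": "tcp",
  "signature_scheme": "hmac-sha256",
  "kernel_name": ""
}
"""


def channel_endpoints(info):
    """Map each channel name to a transport://ip:port address."""
    endpoints = {}
    for field, value in info.items():
        if field.endswith("_port"):
            channel = field[:-len("_port")]
            endpoints[channel] = f"{info['transport']}://{info['ip']}:{value}"
    return endpoints


endpoints = channel_endpoints(json.loads(connection_json))
for channel, address in sorted(endpoints.items()):
    print(channel, address)
```

Each of the five ports corresponds to a separate channel between a client and the kernel, which is why several clients can attach to the same kernel at once.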

JupyterHub

JupyterHub, available at https://jupyterhub.readthedocs.io/en/latest/, is a Python library that can be used to serve notebooks to a set of end-users, for example students of a particular class, or lab members in a research group. It handles user authentication and other low-level details.

Security in notebooks

It is possible for an attacker to put malicious code in a Jupyter notebook. Since notebooks may contain hidden JavaScript code in a cell output, it is theoretically possible for malicious code to execute surreptitiously when the user opens a notebook.

For this reason, Jupyter has a security model where HTML and JavaScript code in a notebook can be either trusted or untrusted. Outputs generated by the user are always trusted. However, outputs that were already there when the user first opened an existing notebook are untrusted.

The security model is based on a cryptographic signature present in every notebook. This signature is generated using a secret key owned by every user.
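The general idea of a keyed signature can be sketched with the standard library. This is a simplified illustration, not Jupyter's actual signing procedure: the sign_notebook and is_trusted helpers, the secret value, and the choice of serialization are all assumptions made for the example, and how the signature is stored is not modeled.

```python
import hashlib
import hmac
import json


def sign_notebook(notebook, secret_key):
    """Return a hex HMAC-SHA256 digest of a deterministic JSON serialization."""
    payload = json.dumps(notebook, sort_keys=True).encode("utf-8")
    return hmac.new(secret_key, payload, hashlib.sha256).hexdigest()


def is_trusted(notebook, secret_key, stored_signature):
    """Recompute the signature and compare it in constant time."""
    expected = sign_notebook(notebook, secret_key)
    return hmac.compare_digest(expected, stored_signature)


secret = b"hypothetical-user-secret"  # stands in for the user's secret key
nb = {"cells": [{"cell_type": "code", "source": "1 + 1", "outputs": []}]}
signature = sign_notebook(nb, secret)

# The same notebook with an output the user never generated.
tampered = {"cells": [{"cell_type": "code", "source": "1 + 1",
                       "outputs": ["<script>...</script>"]}]}

trusted_original = is_trusted(nb, secret, signature)
trusted_tampered = is_trusted(tampered, secret, signature)
print(trusted_original, trusted_tampered)
```

Since only the holder of the secret key can produce a matching signature, a notebook whose outputs were altered by someone else no longer verifies and its outputs are treated as untrusted.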
