Jupyter Notebook Tutorial

A Jupyter notebook is designed to let you create an interactive notebook containing a mixture of explanatory text, equations, images, plots and live code.

This brief tutorial will let you lift up the hood, kick the tires and see how this all works.

1. Notebooks are made up of cells

Click around this document and notice that your cursor selects certain portions of the browser page. These discrete pieces that you are selecting are called cells. All Jupyter notebooks are made up of cells.

You can put a lot of things in these cells, including:

  1. explanatory text
  2. equations
  3. images
  4. plots
  5. live code (usually Python, R or some other language)

Selection mode

When you click a cell once, the cell is outlined in a colored box, showing that it is in selection mode. Once selected, a cell can be copied and pasted, deleted or moved up or down on the page, using the editing icons in the toolbar at the top of the Jupyter window:

Jupyter toolbar

If you move your cursor over any of the icons in the toolbar and wait for a second or two, some helper text will appear that tells you the function of that icon.

2. All cells can be edited

To edit a cell, double-click on it. The box outlining the cell will now turn green, showing that it is in edit mode. Now make your changes. Once you're happy with your edits, you can render the cell by holding down the shift key and then hitting the return key.

Go ahead and give this a try.

3. Evaluating / rendering cells

All cells can be rendered by selecting the cell, holding down the shift key and then hitting the return key.

4. Text cells are formatted in Markdown

Text cells are formatted in a simple plain-text markup system called Markdown, which is quite easy to learn. Markdowntutorial.com has a great interactive tutorial.

The next few cells contain a buffet table of different Markdown styles you can use in your text cells.


Headings are created with one to six hash marks (#), giving six heading levels; the more hash marks, the smaller the heading:

# The largest heading
## The second largest heading
### Heading 3
#### Heading 4
##### Heading 5
###### The smallest heading

Text styling

Here is a mixture of bold text and italicized text. You can also use both bold and italics at the same time.

Several of the words in this sentence have been struck out.


You can create several types of lists as follows:

a numbered list:

  1. taco
  2. enchilada
  3. chalupa

a bulleted list:

  • lettuce
  • tomato
  • cabbage

a more complex list with indented elements:

  1. Parts of the arm
    1. humerus
    2. forearm
      • radius
      • ulna
  2. Parts of the wrist
    1. scaphoid
    2. triquetrum
    3. other bones
  3. Parts of the hand
    • metacarpals
    • fingers
      • phalanges


You can create tables fairly easily with Markdown. Here is a simple table:

| First Header | Second Header |
| ------------ | ------------- |
| Content Cell | Content Cell  |
| Content Cell | Content Cell  |

The columns of a table don't have to be painstakingly lined up in the Markdown source. The Markdown rendering engine of Jupyter will line everything up for you automatically. You can also add styled text within the cells of a table:

| Salter Number | Description |
| --- | --- |
| I | physis only |
| II | physis + corner of the metaphysis |
| III | physis + part of the epiphysis |
| IV | fracture plane involves epiphysis and metaphysis |
| V | crush injury to physis |

Cells can also be centered, right-justified or left-justified by adding appropriate colons to the second line of the table:

| Part | quantity | price |
| :--- | :---: | ---: |
| screw | 5 | 1.50 |
| nut | 6 | 22.50 |
| bracket | 10 | 123.45 |

URLs such as:


are automatically converted to live links to that site. One can also link words to URLs with a simple syntax:


Entering equations

Equations are entered using LaTeX notation and bounded by two dollar signs at the beginning and end of each equation. For example, the decay of a single radioisotope and its half-life can be expressed by the following equations:

$$N(t) = N_{0} e^{-\lambda t}$$

$$t_{\frac{1}{2}} = \frac{\ln(2)}{\lambda}$$

| variable | meaning |
| --- | --- |
| $N$ | number of atoms present at time $t$ |
| $N_{0}$ | number of atoms present initially |
| $t$ | time |
| $\lambda$ | decay constant |
| $t_{\frac{1}{2}}$ | half-life $= \frac{\ln(2)}{\lambda}$ |

Mathematical expressions such as $A = \frac{4}{3} \pi r^{3}$ can also be used inline, i.e., within a sentence. In this instance, one encloses the mathematical expression with only a single dollar sign at each end.
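As a quick numerical check of the decay equations above, the sketch below picks an arbitrary decay constant and initial number of atoms (λ = 0.05 per minute and N0 = 1000 are illustrative assumptions, not values from this tutorial) and confirms that after one half-life exactly half of the atoms remain.

```python
import math

def atoms_remaining(n0, lam, t):
    """N(t) = N0 * exp(-lambda * t)"""
    return n0 * math.exp(-lam * t)

def half_life(lam):
    """t_1/2 = ln(2) / lambda"""
    return math.log(2) / lam

lam = 0.05   # decay constant, per minute (illustrative assumption)
n0 = 1000    # initial number of atoms (illustrative assumption)

t_half = half_life(lam)
print(f"half-life = {t_half:.2f} min")                           # half-life = 13.86 min
print(f"N(t_half) = {atoms_remaining(n0, lam, t_half):.1f}")     # N(t_half) = 500.0
```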

5. Displaying an image file

Text is great, but images rule (dude, you're a radiologist).

It is fairly easy to display images in a Jupyter notebook.

Double-click this cell to see the Markdown notation that was used to display a .PNG file of a knee MR that is located elsewhere on the internet:

Knee MR image

If this image file were located in the same folder as this notebook file, one could use the following Markdown notation to display it:

![Knee MR image](knee_MR.png)

6. Cells can contain Python code

By default, any new cell you create is a code cell, which in this notebook expects Python code. If you enter valid Python commands in that cell and then press Shift-Enter, Jupyter will run that code and show the result in an output area directly below the cell.

For example, the cell below tells the Python interpreter to add 2 and 3. Running that cell produces the output 5.

In [2]:
2 + 3
5

If we import Python's math module, we can do more complex calculations:

In [3]:
import math
In [4]:
a = 5
b = 12
c = math.sqrt(a*a + b*b)
print("The hypotenuse is equal to", c)
The hypotenuse is equal to 13.0
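The math module also provides a hypot function that returns the length of the hypotenuse in a single call. A small sketch using the same a and b as above:

```python
import math

a = 5
b = 12

# math.hypot(a, b) computes sqrt(a*a + b*b) in one step
c = math.hypot(a, b)
print("The hypotenuse is equal to", c)  # The hypotenuse is equal to 13.0
```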

The following Python code does some basic text manipulation:

In [5]:
a = "bananas"
b = "kumquats"
print("I want to eat some", a, "and some", b)
I want to eat some bananas and some kumquats
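Python 3.6 and later also support f-strings, which let you place variables directly inside a piece of text. The sketch below reuses the fruit from the previous cell and adds a few other common string operations (the choice of operations is just for illustration):

```python
a = "bananas"
b = "kumquats"

# an f-string substitutes the values of a and b into the text
sentence = f"I want to eat some {a} and some {b}"
print(sentence)             # I want to eat some bananas and some kumquats

# a few other common string operations
print(sentence.upper())     # I WANT TO EAT SOME BANANAS AND SOME KUMQUATS
print(len(b))               # 8
print(a.replace("b", "B"))  # Bananas
```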

A brief caveat for Python cells

Many computer programs expect things to happen in a particular sequence. If one runs pieces of the program out of this sequence, the computer will get confused and throw up an error message.

For example, if we ask Python to evaluate "x + y" before we tell it the values of x and y, it will get confused and raise a NameError, because x and y have not been defined yet.

The next two cells of Python code are out of order. If you evaluate them in the order they are presented below, Jupyter will show an error message.

In [36]:
x + y
In [37]:
x = 3
y = 5

The easiest way to fix this is to reverse the order of the two cells in our notebook file. We can do this by cutting and pasting, using the appropriate icons in the toolbar. Another way is to select one of the cells and then click the up or down arrow in the toolbar to move that cell to a new location.
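To see exactly what goes wrong when cells run out of order, the sketch below evaluates a sum before its variables exist, catches the NameError that Python raises, and then defines the variables and tries again. The names first_value and second_value are hypothetical, chosen so they cannot already have been defined by an earlier cell; run it once in a fresh kernel to see the error.

```python
# Evaluate an expression before its variables have been defined.
try:
    total = first_value + second_value
except NameError as err:
    message = str(err)
    print("Error:", message)

# Define the variables first; now the same expression works.
first_value = 3
second_value = 5
total = first_value + second_value
print(total)  # 8
```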


Some of the exercises in this tutorial require loading a specific Python library. Therefore, when going through the examples in this notebook, be sure to start at the first step of each example, and to evaluate each of the cells in order.

7. Calculating a correlation coefficient

A common task in data analysis is estimating the Pearson product-moment correlation coefficient between two variables. Correlation coefficients are easy to calculate using NumPy (Numerical Python).

In [1]:
import numpy as np # load numpy

# now enter some x and y data
x = [10.0,8.0,13.0,9.0,11.0,14.0,6.0,4.0,12.0,7.0,5.0]
y =[8.04,6.95,7.58,8.81,8.33,9.96,7.24,4.26,10.84,4.82,5.68]

np.corrcoef(x,y) # calculate the correlation matrix
array([[ 1.        ,  0.81642052],
       [ 0.81642052,  1.        ]])

Ta dah!! The correlation coefficient for these two variables (the off-diagonal element of the matrix above) is 0.816 and change.
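To see what np.corrcoef is computing, here is a sketch of the Pearson formula written out directly with NumPy, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²), using the same x and y values as above:

```python
import numpy as np

x = np.array([10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0])
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

# deviations of each variable from its own mean
dx = x - x.mean()
dy = y - y.mean()

# Pearson r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))
r = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))
print(round(r, 3))  # 0.816

# same value as the off-diagonal element of np.corrcoef
print(np.isclose(r, np.corrcoef(x, y)[0, 1]))  # True
```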

8. Creating a data plot

Now, let's plot the data we entered for x and y in the last example.

In [2]:
%matplotlib inline

import numpy as np # load numpy
import matplotlib.pyplot as plt

# scatterplot of the x and y data entered in the previous example
plt.scatter(x, y)

Next, let's do a linear regression of this data, and add a regression line to this scatterplot.

In [8]:
# linear regression of x and y, where
# m is the slope and b is the y-intercept
m, b = np.polyfit(x, y, 1)

# print out the regression coefficients
print("slope =", m)
print("y-intercept =", b)

# plot the data points
plt.scatter(x, y)

# plot the regression line in red
plt.plot(x, m * np.array(x) + b, 'r-')
slope = 0.500090909091
y-intercept = 3.00009090909
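np.polyfit with degree 1 performs an ordinary least-squares fit. As a cross-check, the slope and intercept can also be computed from the closed-form least-squares formulas m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − m·x̄. A minimal sketch using the same data:

```python
import numpy as np

x = np.array([10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0])
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

# closed-form ordinary least-squares estimates
m_ls = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b_ls = y.mean() - m_ls * x.mean()

# np.polyfit returns [slope, intercept] for a degree-1 fit
m_fit, b_fit = np.polyfit(x, y, 1)

print(np.isclose(m_ls, m_fit), np.isclose(b_ls, b_fit))  # True True
print(round(m_ls, 4), round(b_ls, 4))                    # 0.5001 3.0001
```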

9. Un-paired t-test of two populations

Another common task in data analysis is the t-test, in which we look for evidence that two sets of data have statistically significantly different means.

The SciPy library has a number of useful open-source routines for mathematics, science, and engineering. This includes a variety of built-in statistical tests.

Let's start by importing the stats module:

In [10]:
from scipy import stats

Next, let's define some bogus data.

In [11]:
data1 = [1.83, 1.93, 1.88, 1.85, 1.85, 1.91, 1.91, 1.85, 1.78, 1.91, 1.93, 1.80, 1.80, 1.85, 1.93, 1.85, 1.83, 1.85, 1.91, 1.85, 1.91, 1.85, 1.80, 1.80, 1.85]
data2 = [1.96, 2.06, 2.03, 2.11, 1.88, 1.88, 2.08, 1.93, 2.03, 2.03, 2.03, 2.08, 2.03, 2.11, 1.93]

A good first step would be to visually inspect the data, perhaps with a histogram plot.

In [12]:
bins = np.linspace(1.7, 2.2, 30)

plt.hist(data1, bins, alpha=0.5, label='data1')
plt.hist(data2, bins, alpha=0.5, label='data2')
plt.legend(loc='upper right')

Another common method used to visually inspect data for trends is the boxplot. Let's see what that looks like for these two datasets:

In [13]:
data_to_boxplot = [data1, data2]

bp = plt.boxplot(data_to_boxplot)

So far, both exploratory plots suggest that, while there is some overlap between these two datasets, they look somewhat different.
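By default, matplotlib's boxplot draws each box from the first quartile to the third quartile, with a line at the median. A short sketch that prints those numbers directly with np.percentile, so they can be compared with the plot:

```python
import numpy as np

data1 = [1.83, 1.93, 1.88, 1.85, 1.85, 1.91, 1.91, 1.85, 1.78, 1.91, 1.93, 1.80, 1.80,
         1.85, 1.93, 1.85, 1.83, 1.85, 1.91, 1.85, 1.91, 1.85, 1.80, 1.80, 1.85]
data2 = [1.96, 2.06, 2.03, 2.11, 1.88, 1.88, 2.08, 1.93, 2.03, 2.03, 2.03, 2.08,
         2.03, 2.11, 1.93]

summary = {}
for name, data in (("data1", data1), ("data2", data2)):
    # first quartile, median and third quartile of each dataset
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    summary[name] = (q1, median, q3)
    print(f"{name}: Q1={q1:.3f}  median={median:.3f}  Q3={q3:.3f}")
```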

It's time to quantify what our eyeballs have been suggesting to us. We can do this by running the SciPy independent t-test method:

In [14]:
stats.ttest_ind(data1, data2)
Ttest_indResult(statistic=-7.7687469251313148, pvalue=2.2989223768606846e-09)

This gives us a t-statistic of -7.77 (negative because data1 has the smaller mean), with a p-value of $2.3 \times 10^{-9}$, showing that our two datasets have means that are statistically significantly different.
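By default, stats.ttest_ind assumes that the two populations have equal variances and uses the pooled-variance (Student's) t-statistic, t = (x̄1 − x̄2) / √(s_p² (1/n1 + 1/n2)), where s_p² = ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2). A sketch of that formula using only Python's standard library reproduces the statistic above (the p-value is left to SciPy):

```python
import math
import statistics

data1 = [1.83, 1.93, 1.88, 1.85, 1.85, 1.91, 1.91, 1.85, 1.78, 1.91, 1.93, 1.80, 1.80,
         1.85, 1.93, 1.85, 1.83, 1.85, 1.91, 1.85, 1.91, 1.85, 1.80, 1.80, 1.85]
data2 = [1.96, 2.06, 2.03, 2.11, 1.88, 1.88, 2.08, 1.93, 2.03, 2.03, 2.03, 2.08,
         2.03, 2.11, 1.93]

def pooled_t(a, b):
    """Student's two-sample t-statistic using a pooled variance estimate."""
    n1, n2 = len(a), len(b)
    # statistics.variance uses the n - 1 (sample) denominator
    sp2 = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

t_stat = pooled_t(data1, data2)
print(round(t_stat, 2))  # -7.77
```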

For various reasons (e.g. if our data don't seem to be normally distributed), we might wish to test this difference with a different method; perhaps with a non-parametric test like the Kolmogorov-Smirnov 2-sample test. SciPy also has a method for that test:

In [15]:
stats.ks_2samp(data1, data2)
Ks_2sampResult(statistic=0.7466666666666667, pvalue=1.9366397496230471e-05)

This gives us a KS statistic (D) of 0.75, with a p-value of $1.94 \times 10^{-5}$, which is strong evidence that these two datasets were not drawn from the same distribution.
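The two-sample Kolmogorov-Smirnov statistic D is the largest vertical gap between the empirical cumulative distribution functions (ECDFs) of the two datasets. A small standard-library sketch that evaluates both ECDFs at every observed value and keeps the biggest difference:

```python
from bisect import bisect_right

data1 = [1.83, 1.93, 1.88, 1.85, 1.85, 1.91, 1.91, 1.85, 1.78, 1.91, 1.93, 1.80, 1.80,
         1.85, 1.93, 1.85, 1.83, 1.85, 1.91, 1.85, 1.91, 1.85, 1.80, 1.80, 1.85]
data2 = [1.96, 2.06, 2.03, 2.11, 1.88, 1.88, 2.08, 1.93, 2.03, 2.03, 2.03, 2.08,
         2.03, 2.11, 1.93]

def ks_statistic(a, b):
    """Two-sample KS statistic D = max over v of |F_a(v) - F_b(v)|."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for v in a_sorted + b_sorted:
        # ECDF value = fraction of observations less than or equal to v
        fa = bisect_right(a_sorted, v) / len(a_sorted)
        fb = bisect_right(b_sorted, v) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d

D = ks_statistic(data1, data2)
print(round(D, 4))  # 0.7467
```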

10. Interactive graphics

Jupyter has a cool companion package called ipywidgets that can add interactivity to one's graphics. Radioactive decay presents a nice demonstration of this interactivity.

In [1]:
%matplotlib inline

from ipywidgets import interactive, interact
from IPython.display import display

Now, let's use the equations for radioactive decay featured earlier in this notebook to create a custom decay function.

In [6]:
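def decay_plot(halflife):
    N0 = 1000       # initial number of atoms
    maxtime = 200   # length of the time axis, min
    x = np.arange(0, maxtime, 1)
    # N(t) = N0 * exp(-lambda * t), with lambda = ln(2) / half-life
    y = N0 * np.exp(-x*np.log(2)/halflife)
    plt.plot(x, y)
    plt.ylabel("Number of atoms")
    plt.xlabel("time (min)")
    plt.title("Half-life = %s min" % halflife)
    plt.show()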

Next, let's display an interactive plot. A slider control for halflife will appear just above the plot. When this slider is adjusted, the decay plot will update in real time.

In [7]:
w = interactive(decay_plot, halflife=(1, 200, 1))
display(w)


11. Mapping demonstration

Where did our radiology residents go to medical school?

It is occasionally useful to plot demographic or other geospatial data onto a map. Python and Jupyter can make a wide assortment of such maps.

This map shows where our radiology residents went to medical school. The code below downloads a short data file exported from Excel in .csv (comma-separated values) format. This file contains the name of the medical school attended by each radiology resident, along with the latitude and longitude of that school. The code then plots these sites on a map of the U.S. and saves the map as a high-quality (300 dpi) .TIF file.

In [4]:
# display inline on this page
%matplotlib inline

import csv
import urllib.request

# download the data file (a .csv file exported from Excel)
url = 'http://uwmsk.org/jupyter/Resident_MS_data.csv'
filename, headers = urllib.request.urlretrieve(url)

# Create empty lists for the data we are interested in.
lats, lons = [], []
med_schools = []

# Read through the entire file, skip the first line,
#  and pull out just the name of the medical schools and their
#  latitudes and longitudes.

with open(filename, newline='') as f:
    # Create a csv reader object.
    reader = csv.reader(f)
    # Ignore the header row.
    next(reader)
    # Store the names, latitudes and longitudes in the appropriate lists.
    # ASSUMED column order: school name, latitude, longitude.
    #  Check the file's header row and adjust these indices if needed.
    for row in reader:
        med_schools.append(row[0])
        lats.append(float(row[1]))
        lons.append(float(row[2]))

# Now it's time to make a map, using the Basemap library

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np

# make this plot nice and big
plt.figure(figsize=(16, 12))

# Tell Basemap where to center the map and the lat/long of its margins.
# In this example we will use the Mercator projection centered over
#    the geographical center of the contiguous U.S.

res_map = Basemap(projection='merc', lat_0=39.8333333, lon_0=-98.585522,
    resolution = 'l', area_thresh = 1000.10,
    llcrnrlon=-125.5, llcrnrlat=23.5,
    urcrnrlon=-65.0, urcrnrlat=50.0)

# Draw the coastlines, national and state borders.
res_map.drawcoastlines()
res_map.drawcountries()
res_map.drawstates()

# Define the colors of the land and water on the map
#  (these particular colors are an arbitrary choice).
res_map.drawmapboundary(fill_color='lightblue')
res_map.fillcontinents(color='tan', lake_color='lightblue')

# Plot a point for each resident
# and add a label nearby (25 km north and 25 km east of med school location)

marker_size = 10.0
for lon, lat, med_school in zip(lons, lats, med_schools):
    x, y = res_map(lon, lat)
    res_map.plot(x, y, 'go', markersize=marker_size)
    plt.text(x + 25000, y + 25000, med_school)

# add a title to our map
title_string = "Medical schools attended by our radiology residents\n"
plt.title(title_string, fontsize=20)

# save a 300 dpi .TIF version of our map
plt.savefig('Resident_map.tif', dpi=300)


12. Creating and playing audio files

Radiologists use high frequency audio signals (ultrasound) to produce diagnostic images. In order to create and interpret sonographic images optimally, it is helpful to understand some of the underlying physics.

The sounds used in medical imaging have frequencies in the range of 1 to 10 MHz. However, many ultrasonic phenomena can be modeled with sounds audible to humans. A Jupyter notebook makes it easy to produce custom sounds, so we can create and share interactive modules that demonstrate the basic principles of diagnostic ultrasound.

Audio simulation 1: a simple sine wave tone generator

The amplitude of a single pure audio tone varies with time according to the following formula:

$$A(t) = \sin(2 \pi f t)$$

where t is time, and f is the frequency of the tone in Hz.

To create our tone, we will use the numpy (Numerical Python) library.

We will create a 1.5-second tone that is composed of a fixed number of samples per second. We will use a sampling frequency of 44,100 Hz, which is the same sampling frequency used for compact discs, and produce a 440 Hz tone, which is the A above middle C on a piano. One could use this tone to tune one's guitar or other instrument.

In [13]:
import numpy as np # load the Numerical Python library

fc = 440 # tone frequency --- A440
fs = 44100 # sampling frequency, Hz

T = 1.5 # length of tone in seconds
twopi = 2*np.pi

t = np.linspace(0, T, int(T*fs), endpoint=False) # time variable

output = np.sin(twopi*fc*t)

To play our tone, we will use the Audio class from IPython's display module.

In [14]:
from IPython.display import Audio

Audio(output, rate=fs)
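The Audio player above only plays the tone inside the notebook. To share the tone as an ordinary sound file, one option (not part of the original tutorial) is Python's built-in wave module. This sketch regenerates the same 440 Hz, 1.5-second tone, scales it to 16-bit integers, and writes a mono WAV file; the filename A440.wav is hypothetical.

```python
import wave
import numpy as np

fc = 440     # tone frequency, Hz
fs = 44100   # sampling frequency, Hz
T = 1.5      # length of tone in seconds

t = np.linspace(0, T, int(T * fs), endpoint=False)  # time variable
output = np.sin(2 * np.pi * fc * t)

# scale the -1..1 sine wave to little-endian 16-bit signed integers
samples = (output * 32767).astype('<i2')

# write a mono, 16-bit WAV file (hypothetical filename)
with wave.open("A440.wav", "wb") as wav_file:
    wav_file.setnchannels(1)   # mono
    wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
    wav_file.setframerate(fs)
    wav_file.writeframes(samples.tobytes())
```

Back in the notebook, `Audio(filename="A440.wav")` should then play the saved file.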

Audio simulation 2: Mixing two frequencies

Let's start with two pure sine waves. For this example, we have chosen frequencies of 15 and 17 Hz. The following bit of Python code will plot these two waveforms for us:

In [15]:
# display inline on this page
%matplotlib inline

import matplotlib.pyplot as plt # load the Python plotting library

freq_1 = 15 #  frequency 1
freq_2 = 17 # frequency 2
fs = 3500 # sampling frequency, Hz

T = 3.0 # length of tone in seconds
twopi = 2*np.pi

t = np.linspace(0, T, int(T*fs), endpoint=False) # time variable

freq_1_output = np.sin(twopi*freq_1*t) 
freq_2_output = np.sin(twopi*freq_2*t)

# now plot these two audio tones
f, axarr = plt.subplots(2, sharex=True)
axarr[0].plot(t, freq_1_output)
axarr[0].set_title('frequency = 15 Hz')
axarr[1].plot(t, freq_2_output) 
axarr[1].set_title('frequency = 17 Hz')

Next, we will mix the two tones together. As these two tones go in and out of phase with each other, an interference pattern will be produced.

The next bit of Python code plots this interference pattern for us:

In [16]:
f1 = 15 #  frequency 1
f2 = 17 # frequency 2
fs = 44100 # sampling frequency, Hz

T = 3.0 # length of tone in seconds
twopi = 2*np.pi

t = np.linspace(0, T, int(T*fs), endpoint=False) # time variable

output = np.sin(twopi*f1*t) + np.sin(twopi*f2*t)

import matplotlib.pyplot as plt # load the Python plotting library

plt.plot(t, output)
plt.xlabel("time (sec)")
plt.title("Interference pattern of two tones")

# Audio(output, rate=fs)

At the beginning of the plot, the two sounds are in phase with each other, so they add together, producing a signal that is twice as big as either of the original signals. When the two sounds are out of phase with each other, they cancel each other out. The frequency of this interference pattern (the beat frequency) is equal to the difference between the two frequencies. Since we have chosen two frequencies that differ by only 2 Hz, the pattern repeats twice per second.
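The beat frequency follows from the sum-to-product identity sin a + sin b = 2 sin((a + b)/2) cos((a − b)/2). With a = 2πf1·t and b = 2πf2·t, the mixture is a tone at the average frequency (f1 + f2)/2 multiplied by a slow envelope 2cos(π(f1 − f2)t), whose magnitude peaks |f1 − f2| times per second. A sketch that checks this numerically for the 15 and 17 Hz example:

```python
import numpy as np

f1, f2 = 15, 17   # frequencies from the example above, Hz
fs = 44100        # sampling frequency, Hz
T = 3.0           # length of signal in seconds

t = np.linspace(0, T, int(T * fs), endpoint=False)
mixed = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

# sum-to-product form: a carrier at the mean frequency times a slow envelope
carrier = np.sin(2 * np.pi * (f1 + f2) / 2 * t)
envelope = 2 * np.cos(np.pi * (f1 - f2) * t)
print(np.allclose(mixed, carrier * envelope))  # True

# |envelope| reaches its maximum of 2 once every 1/|f1 - f2| seconds
beat_frequency = abs(f1 - f2)
print("beats per second:", beat_frequency)     # beats per second: 2
```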

The frequencies of these two sine waves were chosen to make it easy to see the details of the interference pattern. However, both of these frequencies are below the range of human hearing. Therefore, let's raise each frequency up into the audio range, but still keep the difference in the two frequencies equal to 2 Hz. For this example, we will use tones of 370 and 372 Hz.

Let's plot these and see how the mixture looks...

In [17]:
f1 = 370 #  frequency 1
f2 = 372 # frequency 2
fs = 44100 # sampling frequency, Hz

T = 3.0 # length of tone in seconds
twopi = 2*np.pi

t = np.linspace(0, T, int(T*fs), endpoint=False) # time variable

output = np.sin(twopi*f1*t) + np.sin(twopi*f2*t)

import matplotlib.pyplot as plt # load the Python plotting library

plt.plot(t, output)
plt.xlabel("time (sec)")
plt.title("Interference pattern of two tones")

The following code will mix these two tones together and let us listen to the mixture.

In [18]:
f1 = 370 #  frequency 1
f2 = 372 # frequency 2
fs = 44100 # sampling frequency, Hz

T = 3.0 # length of tone in seconds
twopi = 2*np.pi

t = np.linspace(0, T, int(T*fs), endpoint=False) # time variable

output = np.sin(twopi*f1*t) + np.sin(twopi*f2*t)

Audio(output, rate=fs)