IPython Cookbook

IPython Interactive Computing and Visualization Cookbook

Packt Publishing (520 pages, $30, 09/2014)

This advanced-level book covers a wide range of methods for data science with Python:

  • Interactive computing in the IPython notebook
  • High-performance computing with Python
  • Statistics, machine learning, data mining
  • Signal processing and mathematical modeling

Highlights

  • 500+ pages
  • 100+ recipes
  • 15 chapters
  • Each recipe illustrates one method on one real-world example
  • Code for Python 3 (but works fine on Python 2.7)
  • All of the code freely available on GitHub
  • Contribute with issues and pull requests on GitHub

This is an advanced-level book: basic knowledge of IPython, NumPy, pandas, and matplotlib is required. Beginners may be interested in the minibook.

A selection of free recipes from the book:

  1. Getting the best performance out of NumPy
  2. Simulating a physical system by minimizing an energy
  3. Creating a route planner for a road network
  4. Introduction to machine learning in Python with scikit-learn
  5. Simulating a partial differential equation: reaction-diffusion systems and Turing patterns
  6. Getting started with Vispy
  7. Introduction to statistical data analysis in Python – frequentist and Bayesian methods
  8. more coming soon...

Part I: Advanced High-Performance Interactive Computing

Part I (chapters 1-6) covers advanced methods in interactive numerical computing, high-performance computing, and data visualization.

Chapter 1: A Tour of Interactive Computing with IPython

This chapter contains a brief but intense introduction to data analysis and numerical computing with IPython. It not only covers common packages such as NumPy, pandas, and matplotlib, but also advanced IPython topics such as interactive widgets in the notebook, custom magic commands, configurable IPython extensions, and new language kernels.

Chapter 2: Best Practices in Interactive Computing

This chapter details best practices for writing reproducible, high-quality code: task automation, version control with Git, workflows with IPython, unit testing with nose, continuous integration, debugging, and other related topics. The importance of these subjects in computational research and data analysis cannot be overstated.

  • 2.1. Choosing between Python 2 and Python 3 (or not)
  • 2.2. Efficient interactive computing workflows with IPython
  • 2.3. Learning the basics of the distributed version control system Git
  • 2.4. A typical workflow with Git branching
  • 2.5. Ten tips for conducting reproducible interactive computing experiments
  • 2.6. Writing high-quality Python code
  • 2.7. Writing unit tests with nose (Python 2 or Python 3)
  • 2.8. Debugging your code with IPython
  • Full list of references
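
To give a flavor of the unit-testing recipe, here is a minimal sketch of assert-based test functions. The `normalize` function is a hypothetical example and is not taken from the book. Test runners such as nose collect functions whose names begin with `test_`, so no special base class is needed.

```python
def normalize(values):
    """Scale a list of numbers so that they sum to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize values that sum to zero")
    return [v / total for v in values]


def test_normalize_sums_to_one():
    result = normalize([1, 1, 2])
    assert result == [0.25, 0.25, 0.5]
    assert abs(sum(result) - 1.0) < 1e-12


def test_normalize_rejects_zero_sum():
    # The function should refuse input it cannot scale.
    try:
        normalize([0, 0])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")


if __name__ == "__main__":
    # Fallback for running the file directly, without a test runner.
    test_normalize_sums_to_one()
    test_normalize_rejects_zero_sum()
    print("all tests passed")
```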

Chapter 3: Mastering the Notebook

This chapter covers advanced topics related to the IPython notebook, notably the notebook format, notebook conversions with nbconvert, and CSS/JavaScript customization. The new interactive widgets available in IPython 2.0+ are also extensively covered. These techniques make data analysis in the notebook more interactive than ever.

Chapter 4: Profiling and Optimization

This chapter covers methods for making your code faster and more efficient: CPU and memory profiling in Python, advanced NumPy optimization techniques (including large array manipulations), and memory mapping of huge arrays with the HDF5 file format and the PyTables library. These techniques are essential for big data analysis.
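
As an illustration of CPU profiling, the following sketch uses the standard library's `cProfile` and `pstats` modules. The functions `slow_sum_of_squares` and `profile_call` are hypothetical helpers written for this example, not code from the book.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop, used here only as something to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep the five most expensive entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


result, report = profile_call(slow_sum_of_squares, 100_000)
print(result)
print(report)
```

The report shows how many times each function was called and how much time was spent in it, which is usually the first thing to check before optimizing.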

Chapter 5: High-Performance Computing

This chapter covers advanced techniques for making your code much faster: code acceleration with Numba and Cython, wrapping of C libraries in Python with ctypes, parallel computing with IPython, OpenMP and MPI, and General-Purpose Computing on Graphics Processing Units (GPGPU) with CUDA and OpenCL. The chapter ends with an introduction to Julia, a recent language designed for high-performance numerical computing that can easily be used in the IPython notebook.

Chapter 6: Advanced Visualization

This chapter introduces a few data visualization libraries that go beyond matplotlib in terms of styling or programming interfaces (prettyplotlib and seaborn). It also covers interactive visualization in the notebook with Bokeh, mpld3, and D3.js. The chapter ends with an introduction to Vispy, a library that leverages the power of Graphics Processing Units (GPUs) for high-performance interactive visualization of big data.

Part II: Standard Methods in Data Science and Applied Mathematics

Part II (chapters 7-15) introduces standard methods in data science and mathematical modeling. All of these methods are applied to real-world data.

Chapter 7: Statistical Data Analysis

This chapter covers methods for getting insight into data. It introduces classic frequentist and Bayesian methods for hypothesis testing, parametric and nonparametric estimation, and model inference. The chapter leverages Python libraries such as pandas, SciPy, statsmodels, and PyMC. The last recipe introduces the statistical language R, which can be easily used in the notebook.

Chapter 8: Machine Learning

This chapter covers methods for learning and making predictions from data. Using the scikit-learn Python package, this chapter illustrates fundamental data mining and machine learning concepts such as supervised and unsupervised learning, classification, regression, feature selection, feature extraction, overfitting, regularization, cross-validation, and grid search. Algorithms addressed in this chapter include logistic regression, Naive Bayes, K-nearest neighbors, Support Vector Machines, random forests, and others. These methods are applied to various types of datasets: numerical data, images, and text.
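
To make the idea of supervised classification concrete, here is a hand-written K-nearest neighbors sketch in NumPy. It is not scikit-learn's implementation (the chapter itself relies on scikit-learn). The function `knn_predict` and the toy data are assumptions made for illustration only.

```python
import numpy as np


def knn_predict(X_train, y_train, X_new, k=3):
    """Label each row of X_new by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_new = np.asarray(X_new, dtype=float)

    # Pairwise squared Euclidean distances, shape (n_new, n_train).
    diff = X_new[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    dist2 = (diff ** 2).sum(axis=2)

    # Indices of the k closest training points for each new point.
    nearest = np.argsort(dist2, axis=1)[:, :k]

    predictions = []
    for row in y_train[nearest]:
        labels, counts = np.unique(row, return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)


# Toy data: two well-separated clusters labelled 0 and 1.
X_train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
           [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
y_train = [0, 0, 0, 1, 1, 1]
predicted = knn_predict(X_train, y_train, [[0.05, 0.05], [1.0, 0.95]])
print(predicted)
```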

Chapter 9: Numerical Optimization

This chapter is about minimizing or maximizing mathematical functions. This topic is pervasive in data science, notably in statistics, machine learning, and signal processing. This chapter illustrates a few root-finding, minimization, and curve fitting routines with SciPy.
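
As a rough idea of what a root-finding routine does, here is a hand-written bisection sketch in pure Python. It is a simplified stand-in, not SciPy's routine. The function `bisect_root` and the example equation cos(x) = x are assumptions chosen for illustration.

```python
import math


def bisect_root(f, a, b, tol=1e-10, max_iter=200):
    """Find a root of f in [a, b], assuming f(a) and f(b) have opposite signs."""
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")

    for _ in range(max_iter):
        mid = 0.5 * (a + b)
        fmid = f(mid)
        # Stop when the bracket is small enough or we hit the root exactly.
        if fmid == 0 or 0.5 * (b - a) < tol:
            return mid
        # Keep the half-interval where the sign change occurs.
        if fa * fmid < 0:
            b, fb = mid, fmid
        else:
            a, fa = mid, fmid
    return 0.5 * (a + b)


# Solve cos(x) = x on [0, 1].
root = bisect_root(lambda x: math.cos(x) - x, 0.0, 1.0)
print(root)
```

Bisection is slow but robust: each iteration halves the bracketing interval, so convergence is guaranteed for a continuous function with a sign change.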

Chapter 10: Signal Processing

This chapter is about extracting relevant information from complex and noisy data. These steps are sometimes required prior to running statistical and data mining algorithms. This chapter introduces standard signal processing methods like Fourier transforms and digital filters.
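
For example, a Fourier transform can recover the frequency of a periodic component buried in noise. The sketch below uses NumPy's FFT routines on a synthetic signal. The sampling rate, duration, frequency, and noise level are all made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the example is reproducible

sample_rate = 100.0  # samples per second (assumed)
duration = 2.0       # seconds (assumed)
true_freq = 5.0      # Hz, frequency of the hidden sine wave (assumed)

# Synthetic noisy signal: a sine wave plus Gaussian noise.
t = np.arange(int(sample_rate * duration)) / sample_rate
signal = np.sin(2 * np.pi * true_freq * t) + 0.5 * rng.standard_normal(t.size)

# Magnitude spectrum of the real-valued signal and the matching frequency axis.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(t.size, d=1.0 / sample_rate)

# Skip the zero-frequency (mean) bin when looking for the dominant frequency.
peak_index = np.argmax(spectrum[1:]) + 1
peak_freq = freqs[peak_index]
print(peak_freq)
```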

Chapter 11: Image and Audio Processing

This chapter covers signal processing methods for images and sounds. It introduces image filtering, segmentation, computer vision, and face detection with scikit-image and OpenCV. It also presents methods for audio processing and synthesis.

Chapter 12: Deterministic Dynamical Systems

This chapter describes dynamical processes underlying particular types of data. It illustrates simulation techniques for discrete-time dynamical systems, as well as for both Ordinary Differential Equations (ODEs) and Partial Differential Equations (PDEs).
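
A discrete-time dynamical system is simply a rule applied repeatedly to a state. As a minimal sketch, the code below iterates the logistic map x_{k+1} = r x_k (1 - x_k). The function name and the parameter values are assumptions chosen for illustration.

```python
def iterate_logistic_map(r, x0, n_steps):
    """Return the trajectory x_0, ..., x_n of x_{k+1} = r * x_k * (1 - x_k)."""
    trajectory = [x0]
    x = x0
    for _ in range(n_steps):
        x = r * x * (1.0 - x)
        trajectory.append(x)
    return trajectory


# For r = 2.5 the map has a stable fixed point at 1 - 1/r = 0.6.
trajectory = iterate_logistic_map(r=2.5, x0=0.1, n_steps=100)
print(trajectory[-1])
```

Larger values of r lead to oscillations and eventually chaotic behavior, which is why this map is a common first example.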

Chapter 13: Stochastic Dynamical Systems

This chapter describes dynamical random processes underlying particular types of data. It illustrates simulation techniques for discrete-time Markov chains, point processes, and stochastic differential equations.
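
As a minimal sketch of simulating a discrete-time Markov chain, the code below samples a trajectory from a transition matrix with NumPy. The helper `simulate_markov_chain` and the two-state matrix are hypothetical and used only for illustration.

```python
import numpy as np


def simulate_markov_chain(P, n_steps, start=0, seed=0):
    """Sample n_steps transitions of a chain whose row-stochastic matrix is P."""
    P = np.asarray(P, dtype=float)
    rng = np.random.default_rng(seed)
    cumulative = np.cumsum(P, axis=1)  # cumulative probabilities per row
    draws = rng.random(n_steps)        # one uniform draw per transition

    states = np.empty(n_steps + 1, dtype=int)
    states[0] = start
    last = P.shape[0] - 1
    for t in range(n_steps):
        # Next state: first column whose cumulative probability exceeds the draw.
        idx = np.searchsorted(cumulative[states[t]], draws[t], side="right")
        states[t + 1] = min(int(idx), last)  # guard against rounding in cumsum
    return states


# Two-state chain: state 0 is "sticky", state 1 leaves half of the time.
P = [[0.9, 0.1],
     [0.5, 0.5]]
states = simulate_markov_chain(P, n_steps=20_000)

# The long-run fraction of time in state 0 should approach 5/6.
print(np.mean(states == 0))
```

The value 5/6 comes from the stationary distribution: balancing the flows between the two states gives 0.1 * pi_0 = 0.5 * pi_1, hence pi = (5/6, 1/6).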

Chapter 14: Graphs, Geometry, and Geographic Information Systems

This chapter covers analysis and visualization methods for graphs, social networks, road networks, maps, and geographic data.
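
At the core of a route planner is a shortest-path search on a weighted graph. The sketch below is a hand-written version of Dijkstra's algorithm using the standard library's `heapq`. The `shortest_path` function and the tiny road network are made up for illustration. In practice, a dedicated graph library such as NetworkX provides tested implementations.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on graph: node -> {neighbor: non-negative weight}.

    Returns (distance, path), or (inf, []) if target is unreachable.
    """
    queue = [(0.0, source, [source])]  # (distance so far, node, path taken)
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == target:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (dist + weight, neighbor, path + [neighbor]))
    return float("inf"), []


# Hypothetical road network: edge weights are travel costs.
roads = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 1},
    "C": {"B": 2, "D": 6},
    "D": {},
}
distance, route = shortest_path(roads, "A", "D")
print(distance, route)
```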

Chapter 15: Symbolic and Numerical Mathematics

This chapter introduces SymPy, a Computer Algebra System in pure Python. SymPy can help you conduct detailed analyses of mathematical models. The chapter ends with a short introduction to Sage, another Python-based system for computational mathematics.