12. Libraries

Python's built-in modules cover a lot of what you need, but there are also many external libraries, modules and Application Programming Interfaces (APIs). We will look at some of these external libraries.

12.1. pip

One popular way to install external libraries is through pip. pip is a command-line tool that installs Python libraries from the Python Package Index (PyPI). To install a library from PyPI, all you need to know is the package name, e.g. pandas; you can then run the installation as follows.

pip install <package_name>

You can also install multiple packages in one line.

pip install <package_name_1> <package_name_2>


pip also resolves transitive dependencies and installs them for you. Transitive dependencies are the packages that the package you are installing itself depends on in order to work, together with the dependencies of those packages.
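To get a feel for what pip has to resolve, you can inspect the requirements that an installed package declares. The following is a minimal sketch, assuming Python 3.8 or later; direct_dependencies is a hypothetical helper name, not part of pip, and it only reports the direct requirements of a package that is already installed in the current environment.

```python
from importlib.metadata import PackageNotFoundError, requires


def direct_dependencies(package_name):
    """Return the requirement strings declared by an installed package."""
    try:
        # requires() returns None when the package declares no dependencies
        return requires(package_name) or []
    except PackageNotFoundError:
        # the package is not installed in this environment
        return []


# print each declared requirement on its own line (nothing if pandas is missing)
for requirement in direct_dependencies('pandas'):
    print(requirement)
```

Each of the listed packages may declare requirements of its own, which is why pip has to follow the chain rather than stop at the first level.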

12.2. Pandas

pip install pandas

Pandas is a library for working with tabular data. Writing CSV files is easy using Pandas.

import pandas as pd
import random

# 10 x 10 table of random integers; randint includes both endpoints
data = [[random.randint(0, 101) for _ in range(10)] for _ in range(10)]
df = pd.DataFrame(data, columns=[f'x{i}' for i in range(10)])

# write the header row, but not the row index
df.to_csv('test.csv', header=True, index=False)

Reading data from a CSV using Pandas is just as easy.

import pandas as pd

df = pd.read_csv('test.csv')
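To check that writing and reading are consistent with each other, the two steps can be combined into a round trip. The sketch below is only an illustration: it uses a small hand-written table and an in-memory buffer (io.StringIO) instead of test.csv, and the variable names original and restored are chosen here for the example.

```python
import io

import pandas as pd

# a small hand-written table instead of random values, so the result is predictable
original = pd.DataFrame({'x0': [1, 2, 3], 'x1': [4, 5, 6]})

# write the CSV text into an in-memory buffer instead of a file on disk
buffer = io.StringIO()
original.to_csv(buffer, header=True, index=False)

# rewind the buffer and read the CSV text back into a new DataFrame
buffer.seek(0)
restored = pd.read_csv(buffer)

# compare the column names and the values of the two tables
print(list(restored.columns))
print(restored.equals(original))
```

Passing index=False matters here: without it, the row index is written as an extra unnamed column and the table read back no longer matches the original.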

12.3. NumPy

pip install numpy scipy

NumPy is a numerical computing library. SciPy builds on NumPy and is a general-purpose scientific computing library. If we want to draw samples from a normal distribution with mean 0 and standard deviation 1, \(\mathcal{N}(0, 1)\), we can use the normal() function.

from numpy.random import normal

# draw 100 samples with mean 0 and standard deviation 1
values = normal(0, 1, 100)
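The samples drawn this way differ on every run. If you need repeatable results, one option is NumPy's Generator interface with a fixed seed. This is a sketch of that alternative, not the approach used above; the seed value 37 is an arbitrary choice for the example.

```python
import numpy as np

# a seeded generator produces the same samples on every run
rng = np.random.default_rng(37)

# draw 100 samples from a normal distribution with mean 0 and standard deviation 1
values = rng.normal(loc=0.0, scale=1.0, size=100)

# the sample statistics should be close to, but not exactly, 0 and 1
print(f'shape = {values.shape}')
print(f'sample mean = {values.mean():.3f}')
print(f'sample standard deviation = {values.std():.3f}')
```

With only 100 samples, the sample mean and standard deviation will deviate somewhat from the theoretical values; drawing more samples brings them closer.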

12.4. Scikit-Learn

pip install scikit-learn

Scikit-Learn is a data science library. We can use this library to learn predictive models, generate data, transform data and so on.

from sklearn.datasets import make_regression

X, y = make_regression(**{
   'n_samples': 1000,
   'n_features': 50,
   'n_informative': 10,
   'n_targets': 1,
   'bias': 5.3,
   'random_state': 37
})

print(f'X shape = {X.shape}, y shape = {y.shape}')
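As an example of learning a predictive model, the generated data can be fed to a linear regression. The sketch below is one possible illustration, assuming an 80/20 train/test split and the same random_state of 37; LinearRegression and train_test_split are standard Scikit-Learn tools, but the choice of model and split is made here for the example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# generate the same kind of data as above, passing the options as keyword arguments
X, y = make_regression(
    n_samples=1000,
    n_features=50,
    n_informative=10,
    n_targets=1,
    bias=5.3,
    random_state=37
)

# hold out 20% of the samples to evaluate the model on data it has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=37
)

# fit an ordinary least squares model on the training portion
model = LinearRegression()
model.fit(X_train, y_train)

print(f'R^2 on held-out data = {model.score(X_test, y_test):.3f}')
print(f'estimated intercept = {model.intercept_:.3f}')
```

Because make_regression adds no noise by default, the data is exactly linear, so the fitted model should score very close to 1 and its intercept should be close to the bias of 5.3.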

12.5. joblib

pip install joblib

Joblib is a library that makes multi-core processing easier in Python.

from math import sqrt
from joblib import Parallel, delayed

# compute sqrt(i ** 2) for i = 0..9 using two worker processes
results = Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))