12. Libraries

Python's built-in modules cover many common needs, but there is also a large ecosystem of external libraries (also called packages or modules), each exposing its own Application Programming Interface (API). We will look at some of these external libraries.

12.1. pip

One popular way to install external libraries is through pip. pip is a command-line tool that installs Python libraries from the Python Package Index (PyPI). To install a library from PyPI, all you need to know is the package name, e.g. pandas; you can then run the installation as follows.

pip install <package_name>

You can also install multiple packages in one line.

pip install <package_name_1> <package_name_2>
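After installing, it can be handy to confirm from Python which version ended up in the environment. The following is a minimal sketch using the standard library's importlib.metadata (Python 3.8+); the helper name installed_version and the package names are only illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# The package names below are only examples; substitute your own.
for name in ["pandas", "numpy"]:
    found = installed_version(name)
    print(f"{name}: {found if found is not None else 'not installed'}")
```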

Note

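pip will attempt to resolve a package's dependencies, including transitive ones, and install them as well. Direct dependencies are the packages that the package you are installing needs in order to work; transitive dependencies are the packages that those dependencies need in turn.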
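To see what an installed package declares as its direct dependencies, one option is importlib.metadata.requires from the standard library (Python 3.8+). This is a sketch only: declared_requirements is a hypothetical helper name, and "pandas" is just an example that may not be installed in your environment.

```python
from importlib.metadata import PackageNotFoundError, requires


def declared_requirements(package_name):
    """Return the requirement strings a distribution declares, or [] if none."""
    try:
        # requires() returns None when the package declares no dependencies
        return list(requires(package_name) or [])
    except PackageNotFoundError:
        # the package is not installed in this environment
        return []


for requirement in declared_requirements("pandas"):
    print(requirement)
```

Each printed line is a requirement string; calling the helper again on one of those package names walks one level deeper into the transitive dependencies.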

12.2. Pandas

pip install pandas

Pandas is a library for working with tabular data. Writing a CSV file is easy with Pandas.

import pandas as pd
import random

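# 10 x 10 grid of random integers; randint() includes both endpoints
data = [[random.randint(0, 101) for _ in range(10)] for _ in range(10)]

# name the columns x0 ... x9
df = pd.DataFrame(data, columns=[f'x{i}' for i in range(10)])
print(df.shape)

# write the header row, but leave out the row index
df.to_csv('test.csv', header=True, index=False)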

Reading data from a CSV using Pandas is just as easy.

import pandas as pd

df = pd.read_csv('test.csv')

print(df.shape)
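As a quick check that writing and reading are symmetric, the sketch below writes a small DataFrame to a temporary file and reads it back. The column names and the file name roundtrip.csv are made up for illustration.

```python
import os
import tempfile

import pandas as pd

original = pd.DataFrame({"x0": [1, 2, 3], "x1": [4, 5, 6]})

# write to a temporary directory so no existing file is overwritten
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "roundtrip.csv")
    original.to_csv(path, header=True, index=False)
    restored = pd.read_csv(path)

# equals() checks that shape, values and column dtypes all match
print(original.equals(restored))
```

Passing index=False matters here: if the row index were written out, read_csv() would read it back as an extra column named "Unnamed: 0".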

12.3. NumPy

pip install numpy scipy

NumPy is a numerical library. SciPy builds on NumPy and is a general-purpose scientific computing library. To draw samples from a normal distribution with mean 0 and standard deviation 1, \(\mathcal{N}(0, 1)\), we can use the normal() function.

from numpy.random import normal

# arguments: loc (mean) = 0, scale (standard deviation) = 1, size = 100 samples
values = normal(0, 1, 100)
print(values)
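If repeatable results are needed, NumPy also provides a seedable Generator through numpy.random.default_rng(). The sketch below draws from the same distribution and checks that the sample mean and standard deviation land near 0 and 1; the seed and the sample size are arbitrary choices.

```python
import numpy as np

# seed chosen arbitrarily so the run is repeatable
rng = np.random.default_rng(37)

values = rng.normal(loc=0.0, scale=1.0, size=100_000)

# with this many samples the estimates should be close to 0 and 1
print(f"mean = {values.mean():.3f}, std = {values.std():.3f}")
```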

12.4. Scikit-Learn

pip install scikit-learn

Scikit-Learn is a machine learning library. We can use it to fit predictive models, generate synthetic data, transform data and so on. Here we generate a synthetic regression dataset.

from sklearn.datasets import make_regression

X, y = make_regression(
    n_samples=1000,
    n_features=50,
    n_informative=10,
    n_targets=1,
    bias=5.3,
    random_state=37,
)

print(f'X shape = {X.shape}, y shape = {y.shape}')
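To illustrate the "learn predictive models" part, the sketch below fits an ordinary least-squares model to the same synthetic data. The choice of LinearRegression is ours, not something the text prescribes. Because make_regression() adds no noise by default, the fitted intercept should come out very close to the bias of 5.3.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# same generator settings as in the listing above
X, y = make_regression(
    n_samples=1000,
    n_features=50,
    n_informative=10,
    n_targets=1,
    bias=5.3,
    random_state=37,
)

model = LinearRegression().fit(X, y)

# score() returns the coefficient of determination (R^2) on the given data
r_squared = model.score(X, y)
print(f'intercept = {model.intercept_:.3f}, R^2 = {r_squared:.3f}')
```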

12.5. joblib

pip install joblib

Joblib is a library that makes multi-core processing easier in Python.

from math import sqrt
from joblib import Parallel, delayed

# delayed(sqrt)(x) packages the call so that Parallel can run it on one of 2 workers
results = Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
print(results)
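The same pattern works with your own functions. In the sketch below, square is a hypothetical stand-in for a more expensive computation; the parallel result is compared with a plain list comprehension to show that Parallel returns the results in input order.

```python
from joblib import Parallel, delayed


def square(x):
    """Stand-in for an expensive computation (hypothetical example function)."""
    return x * x


inputs = list(range(10))

# plain sequential version, for comparison
sequential = [square(x) for x in inputs]

# same work spread over two workers
parallel = Parallel(n_jobs=2)(delayed(square)(x) for x in inputs)

print(parallel == sequential)
```

For a function this cheap, the overhead of starting workers will likely outweigh any gain; the benefit shows up when each call does substantial work.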