12. Libraries
Python's built-in modules cover much of what you need, but there are also many external libraries, modules and Application Programming Interfaces (APIs). We will look at some of these external libraries.
12.1. pip
One popular way to install external libraries is through pip. pip is a command-line tool that installs Python libraries from the Python Package Index (PyPI). To install a Python library from PyPI, all you need to know is the package name, e.g. pandas, and then you can issue the installation as follows.
pip install <package_name>
You can also install multiple packages in one line.
pip install <package_name_1> <package_name_2>
Note
pip will do its best to resolve transitive dependencies and install them as well. Transitive dependencies are the packages that the package you are installing depends on, directly or indirectly, in order to work.
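To make the idea of dependencies concrete, the following minimal sketch lists the requirements that an already installed package declares, using the standard library's importlib.metadata module. The helper name direct_requirements is hypothetical and introduced only for illustration. It shows the first level of the dependency tree only; pip repeats this kind of lookup for each dependency in turn.

```python
from importlib.metadata import PackageNotFoundError, requires


def direct_requirements(package_name):
    """Return the requirement strings an installed package declares.

    Hypothetical helper for illustration. Returns an empty list if the
    package is not installed or declares no dependencies.
    """
    try:
        declared = requires(package_name)
    except PackageNotFoundError:
        return []
    # requires() returns None when the package declares no requirements
    return list(declared or [])


# Prints nothing if pandas is not installed in the current environment
for requirement in direct_requirements("pandas"):
    print(requirement)
```

Each printed line is a requirement string naming another package, usually with a version constraint.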
12.2. Pandas
pip install pandas
Pandas is a library for working with tabular data. Writing CSV files is easy using Pandas.
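import pandas as pd
import random

# 10 rows of 10 random integers; randint includes both endpoints (0 and 101)
data = [[random.randint(0, 101) for _ in range(10)] for _ in range(10)]

# Name the columns x0 through x9
df = pd.DataFrame(data, columns=[f'x{i}' for i in range(10)])
print(df.shape)

# Write the column names as a header row, but leave out the row index
df.to_csv('test.csv', header=True, index=False)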
Reading data from a CSV using Pandas is just as easy.
import pandas as pd

df = pd.read_csv('test.csv')

print(df.shape)
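As a quick check that writing and reading are symmetric, here is a minimal sketch of a CSV round trip. It assumes a small hand-written table and uses an in-memory buffer (io.StringIO) instead of a file on disk; the column names and values are made up for illustration.

```python
import io

import pandas as pd

# A small, hand-written table so the result is easy to check by eye
df = pd.DataFrame({"x0": [1, 2, 3], "x1": [4, 5, 6]})

# Write to an in-memory text buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind the buffer and read the CSV text back into a new DataFrame
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored.shape)
print(restored.equals(df))
```

Passing index=False matters here: without it, the row index is written as an extra unnamed column, and the table read back would no longer match the original.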
12.3. NumPy
pip install numpy scipy
NumPy is a numerical library. SciPy builds on NumPy and is a general-purpose scientific computing library. If we want to draw samples from a normal distribution centered on 0 with a scale of 1, \(\mathcal{N}(0, 1)\), we can use the normal() function.
from numpy.random import normal
values = normal(0, 1, 100)
print(values)
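The samples above change on every run. If reproducible draws are needed, one option is NumPy's seeded generator interface, numpy.random.default_rng. The sketch below assumes an arbitrary seed of 37 and a larger sample size of 10,000, so that the sample mean and standard deviation land close to the parameters of \(\mathcal{N}(0, 1)\).

```python
import numpy as np

# Seeded generator so repeated runs draw the same samples (seed is arbitrary)
rng = np.random.default_rng(37)

# loc is the mean, scale is the standard deviation, size is the sample count
values = rng.normal(loc=0.0, scale=1.0, size=10_000)

# With this many samples the estimates should be close to 0 and 1
print(f"mean = {values.mean():.3f}, std = {values.std():.3f}")
```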
12.4. Scikit-Learn
pip install scikit-learn
Scikit-Learn is a data science library. We can use this library to learn predictive models, generate data, transform data and so on.
from sklearn.datasets import make_regression

# Generate a synthetic regression problem: 1000 samples, 50 features,
# of which only 10 actually influence the target
X, y = make_regression(
    n_samples=1000,
    n_features=50,
    n_informative=10,
    n_targets=1,
    bias=5.3,
    random_state=37,
)

print(f'X shape = {X.shape}, y shape = {y.shape}')
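To illustrate the "learn predictive models" part, here is a minimal sketch that fits a linear regression to the same kind of generated data. The choice of LinearRegression, the 80/20 split and the reuse of 37 as the split seed are assumptions made for this example, not something the text prescribes.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(
    n_samples=1000,
    n_features=50,
    n_informative=10,
    n_targets=1,
    bias=5.3,
    random_state=37,
)

# Hold out 20% of the rows to evaluate the model on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=37
)

model = LinearRegression()
model.fit(X_train, y_train)

# score() reports the coefficient of determination (R^2) for regressors
print(f"R^2 on held-out data = {model.score(X_test, y_test):.3f}")
print(f"learned intercept = {model.intercept_:.3f}")
```

Because make_regression adds no noise by default, the data is exactly linear, so the held-out score should be very close to 1 and the learned intercept should recover the bias of 5.3. Real data will rarely fit this well.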
12.5. joblib
pip install joblib
Joblib is a library that makes multi-core processing easier in Python.
from math import sqrt
from joblib import Parallel, delayed

results = Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
print(results)
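The same pattern works with your own functions. The sketch below assumes a hypothetical function slow_square standing in for an expensive computation; delayed wraps the function so that each call is recorded rather than executed, and Parallel then runs the recorded calls on the requested number of workers.

```python
from joblib import Parallel, delayed


def slow_square(x):
    """Stand-in for an expensive computation (hypothetical example task)."""
    return x * x


# delayed(slow_square)(i) records the call without running it;
# Parallel then distributes the recorded calls across 2 workers
results = Parallel(n_jobs=2)(delayed(slow_square)(i) for i in range(10))

# Results come back in the same order as the inputs
print(results)
```

For a task this small, the overhead of starting workers outweighs any gain; parallelism pays off when each call does a substantial amount of work.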