Week 20.5

This page gives a quick introduction to two fundamental data visualization libraries, matplotlib and seaborn. We split plot types into three main categories according to the relationship they display: evolution, distribution, and correlation. Plots in the evolution category show how a variable changes over time or how its relationship with another feature evolves. As the name implies, plots in the distribution category display the distribution of a feature. Finally, plots in the correlation category display the relationship between two features in the dataset. There are many more plot types that can show relationships among features; to learn more about them, please refer to the Python Graph Gallery ( https://www.python-graph-gallery.com ), which was prepared by Yan Holtz.

Evolution

Line plots:

The main use case for line plots is time series analysis. A line chart displays the evolution of one or several numeric variables, showing how each value changes over time.

Here, you can see how the Bitcoin price evolves over the years.
Similar to the Bitcoin price displayed above, you can see how the number of babies given certain names changes over time.
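As a minimal sketch of a line plot in matplotlib, the snippet below draws a synthetic series. The data is a made-up random walk (not real Bitcoin prices), and the names `days` and `price` are only illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: a random walk standing in for a daily price series.
rng = np.random.default_rng(seed=0)
days = np.arange(365)
price = 100 + np.cumsum(rng.normal(loc=0.0, scale=1.0, size=days.size))

# One line per variable; call ax.plot again to add more series.
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(days, price, label="synthetic price")
ax.set_xlabel("Day")
ax.set_ylabel("Price")
ax.set_title("Line plot of a value changing over time")
ax.legend()
```

Calling `plt.show()` displays the figure, and `fig.savefig("line.png")` writes it to a file.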

Area chart:

An area chart is used to show the evolution of a numeric variable. It can also be used to show the evolution of several variables.
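Both cases can be sketched with matplotlib: `fill_between` for a single variable and `stackplot` for several. The yearly values below are randomly generated placeholders, and the series names are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical yearly values for three made-up series.
rng = np.random.default_rng(seed=1)
years = np.arange(2010, 2021)
series_a = rng.integers(5, 15, size=years.size)
series_b = rng.integers(5, 15, size=years.size)
series_c = rng.integers(5, 15, size=years.size)

fig, (ax_single, ax_stacked) = plt.subplots(1, 2, figsize=(10, 4))

# Single variable: shade the area between the line and zero.
ax_single.fill_between(years, series_a, alpha=0.5)
ax_single.plot(years, series_a)
ax_single.set_title("Area chart (one variable)")

# Several variables: stack the areas on top of each other.
ax_stacked.stackplot(years, series_a, series_b, series_c, labels=["A", "B", "C"])
ax_stacked.set_title("Stacked area chart")
ax_stacked.legend(loc="upper left")
```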

Distribution

Histogram

A histogram is a graph showing frequency distributions. It displays the number of observations within each given interval.
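A short sketch with matplotlib's `hist` shows the idea. The sample is synthetic (normally distributed values), and the choice of 20 bins is an arbitrary assumption.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample: 1000 draws from a normal distribution.
rng = np.random.default_rng(seed=2)
values = rng.normal(loc=50, scale=10, size=1000)

fig, ax = plt.subplots(figsize=(6, 4))
# hist returns the count per interval and the interval (bin) edges.
counts, bin_edges, _ = ax.hist(values, bins=20, edgecolor="white")
ax.set_xlabel("Value")
ax.set_ylabel("Number of observations")
ax.set_title("Histogram")
```

The returned `counts` array holds the number of observations in each interval, which is what the bar heights represent.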

Boxplot

Violin plot

Violin plots make it possible to visualize the distribution of a numeric variable for one or several groups. A violin plot is very close to a boxplot, but allows a deeper understanding of the distribution.
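To illustrate the difference, the sketch below draws a boxplot and a violin plot of the same data side by side with seaborn. The data frame is synthetic: group `A` is unimodal and group `B` is deliberately bimodal, which a boxplot tends to hide and a violin plot reveals. Column names are illustrative.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical data: group A has one peak, group B has two.
rng = np.random.default_rng(seed=3)
df = pd.DataFrame({
    "group": ["A"] * 200 + ["B"] * 200,
    "value": np.concatenate([
        rng.normal(0.0, 1.0, 200),   # group A
        rng.normal(-2.0, 0.5, 100),  # group B, first peak
        rng.normal(2.0, 0.5, 100),   # group B, second peak
    ]),
})

fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
sns.boxplot(data=df, x="group", y="value", ax=ax_box)
ax_box.set_title("Boxplot")
sns.violinplot(data=df, x="group", y="value", ax=ax_violin)
ax_violin.set_title("Violin plot")
```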

Correlation

Scatter Plot

A scatterplot displays the relationship between 2 numeric variables.
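A minimal matplotlib sketch follows. The two variables are synthetic, with `y` constructed as a noisy linear function of `x` so that the relationship is visible.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: y depends linearly on x, plus some noise.
rng = np.random.default_rng(seed=4)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(x, y, alpha=0.6)  # one point per (x, y) observation
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Scatter plot of two numeric variables")
```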

Heatmap

A heatmap is a graphical representation of data where the individual values contained in a matrix are represented as colors. It is a bit like looking at a data table from above. It is really useful for displaying a general view of numerical data, not for extracting specific data points.

It is widely used to plot the correlation matrix of a data frame.
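As a sketch of that use case, the snippet below computes a correlation matrix with pandas and plots it with seaborn's `heatmap`. The data frame and its column names (`a`, `b`, `c`) are made up for illustration; `b` is built from `a` so that the two are correlated.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical data frame: b is correlated with a, c is independent noise.
rng = np.random.default_rng(seed=5)
a = rng.normal(size=300)
df = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.5, size=300),
    "c": rng.normal(size=300),
})

corr = df.corr()  # pairwise Pearson correlation between columns

fig, ax = plt.subplots(figsize=(5, 4))
# Fix the color scale to [-1, 1] so colors are comparable across plots.
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation matrix heatmap")
```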

Plot types based on data type

It is sometimes quite confusing to choose the right plot for your data. Here, you can find a nice guide. You can look into possible plot types you can use depending on the data type or the number of variables you want to visualize.
