Week 20.5
On this page, we give a quick introduction to two fundamental data visualization libraries: matplotlib and seaborn. We split the plot types into three main categories according to the relationship they display: evolution, distribution, and correlation. Plots in the evolution category show how variables change over time or how their relationship with another feature evolves. As the name implies, plots in the distribution category display the distribution of certain features. Finally, plots in the correlation category display the relationship between two features in the dataset. There are many more plot types that can show these relationships among features; to learn more about them, please refer to the Python Graph Gallery ( https://www.python-graph-gallery.com ), which was prepared by Yan Holtz.
The main use case for line plots is time series analysis. A line chart displays the evolution of one or several numeric variables, showing how their values change over time.
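As a minimal sketch of a line plot in matplotlib, the example below draws two series against a shared time axis. The data and the names `months`, `series_a`, and `series_b` are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up data (assumption): 12 monthly values for two hypothetical series.
months = np.arange(1, 13)
series_a = 10 + 2 * months
series_b = 40 - 1.5 * months

fig, ax = plt.subplots(figsize=(6, 3))

# Each call to ax.plot adds one line to the same axes.
(line_a,) = ax.plot(months, series_a, marker="o", label="series A")
(line_b,) = ax.plot(months, series_b, marker="s", label="series B")

ax.set_xlabel("month")
ax.set_ylabel("value")
ax.set_title("Evolution of two variables over time")
ax.legend()
fig.tight_layout()
```

In a script, call `plt.show()` to open the figure, or `fig.savefig("line_plot.png")` to write it to a file.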
An area chart shows the evolution of a numeric variable. It can also show the evolution of several variables at once.
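The sketch below shows both cases with matplotlib: `fill_between` for a single variable and `stackplot` for several variables stacked on top of each other. The yearly values and product names are invented for the example.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up data (assumption): yearly totals for three hypothetical product lines.
years = np.arange(2015, 2023)
product_a = np.array([3, 4, 6, 7, 9, 10, 12, 13])
product_b = np.array([5, 5, 6, 6, 7, 7, 8, 8])
product_c = np.array([1, 2, 2, 4, 5, 7, 8, 11])

fig, (ax_single, ax_stacked) = plt.subplots(1, 2, figsize=(10, 3))

# One variable: fill the area between the curve and zero.
ax_single.fill_between(years, product_a, alpha=0.5)
ax_single.plot(years, product_a)
ax_single.set_title("Single area chart")

# Several variables: stack the areas so the top edge shows the total.
layers = ax_stacked.stackplot(
    years, product_a, product_b, product_c, labels=["A", "B", "C"]
)
ax_stacked.set_title("Stacked area chart")
ax_stacked.legend(loc="upper left")
fig.tight_layout()
```

Stacking is only one option; overlapping semi-transparent areas drawn with repeated `fill_between` calls work as well when the series should be compared against a common baseline.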
A histogram is a graph showing frequency distributions. It displays the number of observations within each given interval.
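A short matplotlib sketch follows. The sample is randomly generated for illustration, and the choice of 20 bins is arbitrary; `ax.hist` returns the per-bin counts and the bin edges, which makes the "observations per interval" idea explicit.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic sample (assumption): 500 draws from a normal distribution.
rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=500)

fig, ax = plt.subplots(figsize=(6, 3))

# counts[i] is the number of observations falling between
# bin_edges[i] and bin_edges[i + 1].
counts, bin_edges, patches = ax.hist(values, bins=20, edgecolor="black")

ax.set_xlabel("value")
ax.set_ylabel("number of observations")
ax.set_title("Histogram with 20 bins")
fig.tight_layout()
```

The number of bins changes how the distribution looks, so it is worth trying a few values before drawing conclusions.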
Violin plots let you visualize the distribution of a numeric variable for one or several groups. A violin plot is very close to a boxplot, but it gives a deeper understanding of the distribution because it also shows the shape of the distribution.
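The following sketch uses seaborn's `violinplot` on a small tidy data frame. The three groups and their values are synthetic, and the column names `group` and `value` are assumptions made for the example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (assumption): three groups with different spreads.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "group": np.repeat(["A", "B", "C"], 100),
        "value": np.concatenate(
            [
                rng.normal(0, 1, 100),
                rng.normal(2, 0.5, 100),
                rng.normal(1, 2, 100),
            ]
        ),
    }
)

fig, ax = plt.subplots(figsize=(6, 3))

# One violin per category in "group"; the width follows the estimated density.
sns.violinplot(data=df, x="group", y="value", ax=ax)
ax.set_title("Distribution of value per group")
fig.tight_layout()
```

Replacing `sns.violinplot` with `sns.boxplot` on the same data frame is an easy way to compare the two views side by side.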
A scatterplot displays the relationship between two numeric variables.
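As an illustration, the matplotlib sketch below plots two synthetic variables where `y` depends roughly linearly on `x` plus noise. The data is generated only for this example.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (assumption): y grows with x, with some random noise added.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 1.5 * x + rng.normal(0, 2, 200)

fig, ax = plt.subplots(figsize=(5, 4))

# Each point is one observation; alpha makes overlapping points visible.
points = ax.scatter(x, y, alpha=0.6)

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Relationship between x and y")
fig.tight_layout()
```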
A heatmap is a graphical representation of data where the individual values contained in a matrix are represented as colors. It is a bit like looking at a data table from above. It is really useful for getting a general view of numerical data, rather than for extracting specific data points.
It is widely used to plot the correlation matrix of a data frame.
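A minimal sketch of that use case with pandas and seaborn is shown below. The data frame and its columns `x`, `y`, and `z` are synthetic: `y` is built from `x` so that the two are strongly correlated, while `z` is independent noise.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (assumption): y depends on x, z is unrelated noise.
rng = np.random.default_rng(0)
x = rng.normal(0, 1, 200)
df = pd.DataFrame(
    {
        "x": x,
        "y": 2 * x + rng.normal(0, 0.5, 200),
        "z": rng.normal(0, 1, 200),
    }
)

# Pairwise Pearson correlations between the numeric columns.
corr = df.corr()

fig, ax = plt.subplots(figsize=(4, 3))

# Fix the color scale to [-1, 1] so colors are comparable across plots;
# annot=True writes each coefficient into its cell.
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation matrix")
fig.tight_layout()
```

The diagonal is always 1 because each column is perfectly correlated with itself; the interesting cells are the off-diagonal ones.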
Choosing the right plot for your data can be confusing. Here, you can find a nice guide. It lets you look up the possible plot types depending on the data type or the number of variables you want to visualize.