Simple Yet Practical Data Visualization Code
For common plotting and EDA scenarios
In the previous article, I shared my little toolbox for data cleaning after realizing that some code applies to most common scenarios of messy data.
In other words, there are patterns (or approaches) that are commonly used in data science for data cleaning, and I compiled them into functions for future reuse.
Interestingly, I have noticed the same kind of “pattern” in Exploratory Data Analysis (EDA), particularly in data visualization. So I am writing this article to share my code and explanations for the benefit of others.
Remember the article I posted a few months ago, Exploratory Data Analysis on E-Commerce Data? In that article, I talked about why EDA is important in data science and how data can be explored and visualized in a simple way to give meaningful insights to you, or potentially your stakeholders.
To understand your data and communicate results to stakeholders, data visualization is of utmost importance: it gives the data a story to tell.
Since the common scenarios here span different types of datasets, this article focuses on showing and explaining what the code and the resulting plots are for, so that you can plug and play easily in your own projects.
By the end of this article, I hope you’ll find the code useful and that it makes your data visualization process more fun, faster and more effective!
Let’s get started!
Background of Dataset
In short, the data consists of transactions by customers in different countries who make purchases from an online retail company based in the United Kingdom (UK) that sells unique all-occasion gifts.
The following code can in fact be generalized to other datasets with some minor adjustments based on your needs.
The goal here is to show you how I usually perform data visualization given a generic dataset. Also, the code is by no means an exhaustive compilation covering all kinds of plots, but it should be sufficient to get you started.
The data shown here has also gone through some data cleaning so that we can use it directly and focus on data visualization. In case you want to know how the data cleaning was done, you can always refer to my previous article.
The Jupyter notebook and clean data for this data visualization have been uploaded to my GitHub.
Each column is pretty self-explanatory given that we’re dealing with typical e-commerce data. Let’s see what we can do to visualize this data!
My Little Toolbox for Data Visualization
1. Boxplot — Unit Price
Unit price here means the price of each item. In the e-commerce world, we are curious about the spread of unit prices to understand how they are distributed.
We used Seaborn (one of my favourite tools!) to draw the boxplot with just one line of code; the rest is solely for labelling. From the plot we see that the majority of unit prices are less than $800 and the highest unit price exceeds $8000. Good. Let’s go to the next step.
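As a rough illustration of that "one line plus labelling" pattern, here is a minimal sketch. It runs on synthetic data rather than the real dataset, and the column name `unit_price` is an assumption, so swap in your own column.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic, right-skewed stand-in for the cleaned data.
# `unit_price` is an assumed column name, not necessarily the real one.
rng = np.random.default_rng(0)
df = pd.DataFrame({"unit_price": rng.lognormal(mean=1.0, sigma=1.2, size=1000)})

fig, ax = plt.subplots(figsize=(10, 3))

# The single line that actually draws the boxplot.
sns.boxplot(x=df["unit_price"], ax=ax)

# Everything below is labelling only.
ax.set_title("Spread of unit price")
ax.set_xlabel("Unit price")
fig.tight_layout()
```

With heavily skewed prices, most of the box sits near zero and the outliers stretch the axis, which is exactly what makes the boxplot a quick first check.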
2. Distribution Plot — Quantity Sold
Again, we used Seaborn to do the distribution plot. In this case, we only take quantity sold (less than 100) into account, as this is where the majority of the data lies.
We see that most items are sold in quantities of 30 or fewer. Cool. What about the number of orders from each country?
3. Horizontal Bar Chart
Since the online retail company is based in the UK, it is no surprise that the United Kingdom has the highest number of orders. Therefore, we intentionally left this country out for a more meaningful comparison among the other countries.
As you may have noticed by now, dataframe.groupby is extremely useful when it comes to plotting numeric variables grouped by categorical variables.
You can even plot directly from the dataframe without calling matplotlib yourself (pandas uses it under the hood). Whether to use a vertical or horizontal bar chart depends on your needs. We chose a horizontal bar chart in this case to show the name of each country more clearly.
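A minimal sketch of the groupby-then-plot pattern described here, using a tiny made-up table. The column names `invoice_num` and `country` are assumptions, and the counts are invented purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside Jupyter
import pandas as pd

# Tiny made-up table; `invoice_num` and `country` are assumed column names.
df = pd.DataFrame({
    "invoice_num": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "country": ["United Kingdom", "United Kingdom", "United Kingdom",
                "Germany", "Germany",
                "France", "France", "France",
                "Spain"],
})

# Count distinct orders per country, leave out the dominant country,
# and sort so the largest bar ends up at the top of the chart.
orders_per_country = (
    df.groupby("country")["invoice_num"]
      .nunique()
      .drop("United Kingdom")
      .sort_values()
)

# Plot straight from pandas; `kind="barh"` gives horizontal bars.
ax = orders_per_country.plot(kind="barh", figsize=(10, 6))
ax.set_title("Number of orders per country (excluding the UK)")
ax.set_xlabel("Number of orders")
ax.set_ylabel("Country")
```

Switching `kind="barh"` to `kind="bar"` is all it takes to flip the chart to vertical bars.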
We’ll see how a vertical bar chart can be used in the next section.
4. Vertical Bar Chart (With Annotation)
Here comes the vertical bar chart with annotation. Sometimes we may want to add percentage annotations to a vertical bar chart to show the share taken by each category.
In our context, we want to know the number of orders on different days and look at their respective percentages for more insight.
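One way such an annotated chart might be built is sketched below. The daily counts are made up for illustration, and the approach (looping over the bar patches and calling `ax.annotate`) is just one common option.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside Jupyter
import pandas as pd

# Made-up order counts per day, for illustration only.
orders_per_day = pd.Series(
    [120, 150, 180, 160, 90, 60],
    index=["Mon", "Tue", "Wed", "Thu", "Fri", "Sun"],
    name="orders",
)

ax = orders_per_day.plot(kind="bar", figsize=(10, 5), rot=0)
ax.set_title("Number of orders per day")
ax.set_xlabel("Day")
ax.set_ylabel("Number of orders")

# Write each bar's share of the total just above the bar.
total = orders_per_day.sum()
for patch, count in zip(ax.patches, orders_per_day):
    ax.annotate(
        f"{count / total:.1%}",
        (patch.get_x() + patch.get_width() / 2, patch.get_height()),
        ha="center",
        va="bottom",
    )
```

Anchoring the text at the top centre of each bar with `va="bottom"` keeps the labels clear of the bars themselves.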
A code sample is attached above to show you how to annotate percentages in the same plot without cluttering the visual.
5. Bar Chart & Line Plot (Combined)
Finally, we want to know the total amount spent by customers (or total sales made) for each month.
At some point we may also want to know the percentage change between consecutive periods. In this case, we can add a line plot showing the percentage change from the previous month to the current month, all in one plot.
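A sketch of such a combined plot, assuming monthly totals are already aggregated into a Series. The sales figures are invented, and the secondary axis via `twinx` is one possible layout rather than the only one.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up monthly totals, for illustration only.
monthly_sales = pd.Series([100.0, 120.0, 90.0, 135.0],
                          index=["Jan", "Feb", "Mar", "Apr"])

# Percentage change from the previous month; the first month has no
# prior value, so it is NaN and is simply skipped by the line plot.
pct_change = monthly_sales.pct_change() * 100

x = np.arange(len(monthly_sales))
fig, ax_bar = plt.subplots(figsize=(10, 5))

# Bars: total sales per month on the left axis.
ax_bar.bar(x, monthly_sales.values, color="lightsteelblue")
ax_bar.set_xticks(x)
ax_bar.set_xticklabels(monthly_sales.index)
ax_bar.set_ylabel("Total sales")

# Line: month-over-month change on a second y-axis sharing the same x-axis.
ax_line = ax_bar.twinx()
ax_line.plot(x, pct_change.values, color="darkred", marker="o")
ax_line.set_ylabel("Change from previous month (%)")

ax_bar.set_title("Monthly sales and month-over-month change")
fig.tight_layout()
```

Giving the line its own y-axis matters here, since sales totals and percentages live on very different scales.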
Use this combined plot wisely and sparingly, as packing too much information into one plot may confuse people. Again, the usefulness of the combined plot depends on the situation and your needs.
Thank you for reading.
Data visualization is nothing but storytelling. Who is your audience? What are the takeaways that you want your audience to get from the visualization? What are the actionable insights to be executed?
I hope this little data visualization toolbox helps you in some way.
If you’re interested in learning how to visualize data and tell a story that captures your audience’s attention and conveys your ideas effectively, I strongly encourage you to check out this book: Storytelling with Data: A Data Visualization Guide for Business Professionals.
As always, if you have any questions, feel free to leave your comments below. Till then, see you in the next post!