Python COVID-19 Data Visualization Example

DataSci YX ⋅ 2020-04-13 15:34:18 ⋅ 616 views

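Original article: Visualizing COVID-19 Data Beautifully in Python (in 5 Minutes or Less!!)
This post is adapted from the original and focuses on explaining the code. Thanks to the original author for the work and for sharing it!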

Section 1 - Download Data

Run the following command in a notebook to download the data and save it as countries-aggregated.csv in the current working directory:

!curl -o countries-aggregated.csv https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv

Section 2 - Loading and Selecting Data

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
import matplotlib.ticker as ticker
%matplotlib inline
# parse_dates: convert the date strings in the CSV into datetime values
df = pd.read_csv('countries-aggregated.csv', parse_dates=['Date'])
# List the countries present in the data
df['Country'].unique()

The dataset covers many countries:

array(['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola',
'Antigua and Barbuda', 'Argentina', 'Armenia', 'Australia',
'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh',
'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bhutan',
'Bolivia', 'Bosnia and Herzegovina', 'Botswana', 'Brazil',
'Brunei', 'Bulgaria', 'Burkina Faso', 'Burma', 'Burundi',
'Cabo Verde', 'Cambodia', 'Cameroon', 'Canada', ...
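
To see what parse_dates changes, the sketch below reads a tiny in-memory CSV instead of the downloaded file. The three sample rows are made up for illustration; only the column names follow the dataset.

```python
import io
import pandas as pd

# Hypothetical sample rows; column names mirror countries-aggregated.csv
sample_csv = io.StringIO(
    "Date,Country,Confirmed,Recovered,Deaths\n"
    "2020-04-10,Canada,100,20,5\n"
    "2020-04-11,Canada,120,25,6\n"
    "2020-04-11,US,900,50,30\n"
)

sample = pd.read_csv(sample_csv, parse_dates=["Date"])

# With parse_dates the column holds datetime values, not plain strings
print(sample["Date"].dtype)
# unique() returns each country once, in order of first appearance
print(sample["Country"].unique())
```

Having real datetime values matters later: after pivoting, the dates become the index, and pandas can then lay them out as a proper time axis when plotting.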

We select six countries to visualize:

countries = ['Canada', 'Germany', 'United Kingdom', 'US', 'France', 'China']
# Boolean-mask selection: keep only the rows whose country is in countries.
# To plot other countries, just edit the countries list.
df = df[df['Country'].isin(countries)]
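
The mask-based selection can be inspected on its own. This is a minimal sketch on made-up rows; `demo` and `wanted` are illustrative names, not part of the original code.

```python
import pandas as pd

# Hypothetical rows for illustration
demo = pd.DataFrame({
    "Country": ["Canada", "Italy", "US", "Spain"],
    "Confirmed": [100, 200, 900, 300],
})
wanted = ["Canada", "US"]

mask = demo["Country"].isin(wanted)  # boolean Series, one entry per row
selected = demo[mask]                # keeps only the rows where mask is True

print(mask.tolist())
print(selected)
```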

Section 3 - Creating a Summary Column

# Cases = Confirmed + Recovered + Deaths (axis=1 sums across the columns, row by row)
df['Cases'] = df[['Confirmed', 'Recovered', 'Deaths']].sum(axis=1)

Section 4 - Restructuring our Data

# Build a pivot table: one row per date, one column per country, values taken from Cases
df = df.pivot(index='Date', columns='Country', values='Cases')
covid = df
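
The reshaping from long to wide format is easier to follow on a toy table. The sketch below uses made-up numbers; `long_df` and `wide_df` are illustrative names.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, country) pair
long_df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2020-04-10", "2020-04-10", "2020-04-11", "2020-04-11"]
    ),
    "Country": ["Canada", "US", "Canada", "US"],
    "Cases": [125, 980, 151, 1100],
})

# Wide format: dates become the index, countries become the columns
wide_df = long_df.pivot(index="Date", columns="Country", values="Cases")
print(wide_df)
```

Note that pivot raises a ValueError if the same (Date, Country) pair appears more than once, so it only works when each pair is unique, as it is in this dataset.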

Section 5 - Calculating Rates per 100,000

populations = {'Canada':37664517, 'Germany': 83721496 , 'United Kingdom': 67802690 , 'US': 330548815, 'France': 65239883, 'China':1438027228}
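# copy() creates a new, independent DataFrame. It defaults to deep=True, so the data
# and index are copied and changes to percapita do not affect covid.
# A plain assignment (percapita = covid) would only bind a second name to the same object.
# See appendix reference [1] for details.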
percapita = covid.copy()
for country in list(percapita.columns):
    percapita[country] = percapita[country]/populations[country]*100000
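
The difference between assignment and copy(), and the per-capita arithmetic itself, can be checked on a small example. The numbers and populations below are hypothetical, and the vectorized division at the end is an alternative to the loop, not what the original article uses.

```python
import pandas as pd

original = pd.DataFrame({"Canada": [10.0, 20.0], "US": [100.0, 200.0]})
populations = {"Canada": 1000, "US": 4000}  # hypothetical populations

alias = original             # '=' binds a second name to the same object
per_capita = original.copy()  # deep=True by default: the data is copied

# Same loop as above: cases / population * 100,000
for country in per_capita.columns:
    per_capita[country] = per_capita[country] / populations[country] * 100000

# Alternative: divide every column by its population in one step.
# pandas aligns the Series index with the DataFrame's column labels.
vectorized = original / pd.Series(populations) * 100000

print(alias is original)             # the alias is the very same object
print(original["Canada"].tolist())   # the original is untouched by the loop
print(per_capita)
```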

Section 6 - Generating Colours and Style

colors = {'Canada':'#045275', 'China':'#089099', 'France':'#7CCBA2', 'Germany':'#FCDE9C', 'US':'#DC3977', 'United Kingdom':'#7C1D6F'}
# Set the plotting style; see [2]
plt.style.use('fivethirtyeight')

Section 7 - Creating the Visualization

# The core plotting call: sets the figure size, line colours, line width, and whether to draw a legend.
# figsize can be adjusted as needed
plot = covid.plot(figsize=(14,11), color=list(colors.values()), linewidth=5, legend=False)
# ticker.StrMethodFormatter is explained in [3]
# set_major_formatter is introduced in [4]
# Format the y-axis tick labels (thousands separators, no decimals)
plot.yaxis.set_major_formatter(ticker.StrMethodFormatter('{x:,.0f}'))
# Set the grid colour
plot.grid(color='#d4d4d4')
plot.set_xlabel('Date')
plot.set_ylabel('# of Cases')
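
As the note under [3] says, the template passed to StrMethodFormatter is an ordinary str.format template in which x stands for the tick value. The effect of '{x:,.0f}' can therefore be previewed without drawing anything; the sample values below are arbitrary.

```python
# Same template string that is passed to ticker.StrMethodFormatter above
fmt = "{x:,.0f}"

# ',' inserts thousands separators; '.0f' rounds to zero decimal places
for value in (0, 1500.4, 250000, 1234567.89):
    print(fmt.format(x=value))
```

Without the formatter, large y-axis values may be shown in a less readable form, which is why the tick format is set explicitly here.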

Section 8 - Assigning Colour

# x and y are the text coordinates and s is the text itself; this writes each country's name at the right edge of the plot
for country in list(colors.keys()):
    plot.text(x = covid.index[-1], y = covid[country].max(), color = colors[country], s = country, weight = 'bold')

Section 9 - Adding Labels

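# x and y are the text coordinates and s is the text itself.
# covid.max() returns the maximum for each country (a Series); covid.max().max() returns the
# maximum over all countries (a single number), i.e. the highest point of any line. Adding an
# offset such as 50000 to it gives the y position of the title text.
# The right offset depends on the figure size; here it is set to 50000 rather than the value used in the original article.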
plot.text(x = covid.index[1], y = int(covid.max().max())+50000, s = "COVID-19 Cases by Country", fontsize = 23, weight = 'bold', alpha = .75)
plot.text(x = covid.index[1], y = int(covid.max().max())+15000, s = "For the USA, China, Germany, France, United Kingdom, and Canada\nIncludes Current Cases, Recoveries, and Deaths", fontsize = 16, alpha = .75)
plot.text(x = covid.index[1], y = -100000,s = 'datagy.io                      Source: https://github.com/datasets/covid-19/blob/master/data/countries-aggregated.csv', fontsize = 10)
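
How the title position is derived from the data can be traced on a toy table. The values and the offset of 50 below are hypothetical, chosen only to fit this small example.

```python
import pandas as pd

# Hypothetical wide-format data, shaped like covid after the pivot
demo = pd.DataFrame(
    {"Canada": [10, 40, 90], "US": [100, 400, 900]},
    index=pd.to_datetime(["2020-04-10", "2020-04-11", "2020-04-12"]),
)

per_country_max = demo.max()     # Series: one maximum per column
overall_max = demo.max().max()   # scalar: the highest point of any line
title_y = int(overall_max) + 50  # offset scaled to this toy data

label_x = demo.index[-1]         # last date: x position of the country labels
title_x = demo.index[1]          # second date: x position of the title text

print(per_country_max)
print(overall_max, title_y)
```

Because the text is positioned in data coordinates, the offsets have to be retuned whenever the data range changes noticeably.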


Cases per 100,000 People

percapitaplot = percapita.plot(figsize=(12,8), color=list(colors.values()), linewidth=5, legend=False)
percapitaplot.grid(color='#d4d4d4')
percapitaplot.set_xlabel('Date')
percapitaplot.set_ylabel('# of Cases per 100,000 People')
for country in list(colors.keys()):
    percapitaplot.text(x = percapita.index[-1], y = percapita[country].max(), color = colors[country], s = country, weight = 'bold')
percapitaplot.text(x = percapita.index[1], y = percapita.max().max()+25, s = "Per Capita COVID-19 Cases by Country", fontsize = 23, weight = 'bold', alpha = .75)
percapitaplot.text(x = percapita.index[1], y = percapita.max().max(), s = "For the USA, China, Germany, France, United Kingdom, and Canada\nIncludes Current Cases, Recoveries, and Deaths", fontsize = 16, alpha = .75)
percapitaplot.text(x = percapita.index[1], y = -70,s = 'datagy.io                      Source: https://github.com/datasets/covid-19/blob/master/data/countries-aggregated.csv', fontsize = 10)
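
Both plots pass color=list(colors.values()), which matches colours to lines only because the dictionary is written in the same alphabetical order as the pivoted columns. A minimal sketch of a lookup that does not depend on that ordering follows; `columns` stands in for covid.columns, and the reordered dictionary is a hypothetical variation.

```python
# A reordered version of the colour dictionary (same pairs, different order)
colors = {
    'US': '#DC3977', 'Canada': '#045275', 'United Kingdom': '#7C1D6F',
    'China': '#089099', 'Germany': '#FCDE9C', 'France': '#7CCBA2',
}

# Stand-in for covid.columns: the pivoted columns come out sorted
columns = sorted(colors)

by_position = list(colors.values())        # depends on dictionary order
by_name = [colors[c] for c in columns]     # looked up per column name

print(columns)
print(by_position == by_name)  # False here: positions no longer line up
print(by_name)
```

Passing a list built like `by_name` as the color argument keeps each country on its intended colour even if the dictionary or the country list is edited later.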


Appendix:
[1] The equals sign '=' and copy() in pandas
[2] Choosing styles, fonts, and more for plot() in Python
[3] ticker.StrMethodFormatter

The field used for the value must be labeled x and the field used for the position must be labeled pos.

[4] Setting up dual y-axes and controlling the date format in Matplotlib plots
