Covid-19 Vaccinations Status Dashboard

How to use

Most of the information is in the tooltip of the map chart. The control bars are explained below.

  • Select Metric: Choose one of the 4 metrics (People fully vaccinated / People vaccinated / Total Vaccinated / Last N days Avg.); the result is shown in the 2 charts on the left.

  • Unit: By raw number / By per hundred (only applies to the first 3 metrics)

  • Vaccination Threshold: When viewing metrics by per hundred, the rank chart may include regions with small populations. This can be avoided by increasing the vaccination threshold.

  • N days: How many days are used to calculate the average of daily vaccinations (only applies to Last N days Avg.)
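
As a rough illustration of how the Vaccination Threshold and N days controls could act on the data, here is a minimal sketch. It assumes the threshold is applied to the total vaccination count, which the description above does not state explicitly. The helper names `rank_per_hundred` and `last_n_days_avg` and the toy values are hypothetical and are not taken from the dashboard's code.

```python
import pandas as pd

# Toy data: column names follow the dataset's naming style, values are invented.
latest = pd.DataFrame({
    "country": ["A", "B", "C"],
    "total_vaccinations": [5_000, 2_000_000, 750_000],
    "total_vaccinations_per_hundred": [40.0, 12.5, 8.0],
})

def rank_per_hundred(frame, threshold):
    """Rank countries by per-hundred rate, keeping only rows at or above the threshold.

    Assumption: the threshold is compared against the raw total vaccination count.
    """
    kept = frame[frame["total_vaccinations"] >= threshold]
    ranked = kept.sort_values("total_vaccinations_per_hundred", ascending=False)
    return ranked["country"].tolist()

def last_n_days_avg(daily, n):
    """Mean of the last n values of a daily vaccination series."""
    return daily.tail(n).mean()

daily = pd.Series([100, 200, 300, 0, 0])

print(rank_per_hundred(latest, 0))        # small region "A" leads the ranking
print(rank_per_hundred(latest, 100_000))  # "A" is filtered out
print(last_n_days_avg(daily, 3))          # average of 300, 0, 0
```

With a threshold of 0, the small region "A" tops the per-hundred ranking; raising the threshold removes it, which is the effect the control is meant to have.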

Data Cleaning

I did some data cleaning so that I could plot the cumulative chart and calculate the last N days average. Both need every country to have a value on the current date, but some countries do not update daily. So for dates with no update, I assume that the daily values are 0 and that all other values remain the same.

For example, on 2021-02-21 one country's data looks like this, with no data for 2/20 and 2/21:

Country        Date        Daily Vaccinations  People Fully Vaccinated
United States  2021-02-17  100                 NaN
United States  2021-02-18  200                 NaN
United States  2021-02-19  300                 200

After imputation it should look like this: 0 is imputed into the daily-related columns, and the latest data is copied forward for status features such as People Fully Vaccinated.

Country        Date        Daily Vaccinations  People Fully Vaccinated
United States  2021-02-17  100                 NaN
United States  2021-02-18  200                 NaN
United States  2021-02-19  300                 200
United States  2021-02-20  0                   200
United States  2021-02-21  0                   200
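
The imputation on this toy example can be sketched as follows. This is only an illustration of the idea, not the code used for the dashboard: the helper name `extend_to` and the shortened column set are assumptions. It clones the last reported row for each missing trailing day and sets the daily column to 0.

```python
import numpy as np
import pandas as pd

# The three reported rows from the example above (column names are illustrative).
toy = pd.DataFrame({
    "country": ["United States"] * 3,
    "date": ["2021-02-17", "2021-02-18", "2021-02-19"],
    "daily_vaccinations": [100, 200, 300],
    "people_fully_vaccinated": [np.nan, np.nan, 200],
})

def extend_to(frame, max_date):
    """Append one row per missing day after the last report, up to max_date."""
    frame = frame.sort_values("date")
    last_row = frame.iloc[[-1]]
    last_date = pd.to_datetime(last_row["date"].iloc[0])
    new_dates = pd.date_range(last_date + pd.Timedelta(days=1),
                              pd.to_datetime(max_date), freq="D")
    if len(new_dates) == 0:
        # Already up to date: nothing to impute.
        return frame.reset_index(drop=True)
    # Clone the last row once per missing day, so status columns carry forward.
    filler = pd.concat([last_row] * len(new_dates), ignore_index=True)
    filler["date"] = new_dates.strftime("%Y-%m-%d").tolist()
    # No report on these days, so daily counts are assumed to be 0.
    filler["daily_vaccinations"] = 0
    return pd.concat([frame, filler], ignore_index=True)

result = extend_to(toy, "2021-02-21")
print(result)
```

Only trailing days are filled here; gaps in the middle of a country's history are left untouched.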

Code

import numpy as np
import pandas as pd

df = pd.read_csv("/kaggle/input/covid-world-vaccination-progress/country_vaccinations.csv")

# latest report date
max_date = df['date'].max()

# create the set of unique countries/regions
country_lst = set(df['country'])

# create dict to access index of each column 
col_inx = {col_name:i for i,col_name in enumerate(df.columns.values)}

# impute country by country
temp_df_lst = []
for country in country_lst:
    
    # rows for this country only
    temp_df = df[df['country'] == country].copy()
    # latest date reported by this country
    last_update = temp_df['date'].max()
    
    if last_update < max_date:
        # gap between this country's last update and the latest report date
        time_delta = pd.to_datetime(max_date) - pd.to_datetime(last_update)
        # number of missing days to append
        append_len = time_delta.days
        # create date strings for each missing day
        time_lst = [pd.to_datetime(last_update)+pd.Timedelta(x, unit="day") for x in range(1,append_len+1)]
        time_lst = [x.strftime("%Y-%m-%d") for x in time_lst]
        
        # build the rows to append: start from the last day's row so other features stay the same
        last_data = temp_df[temp_df['date'] == last_update].values
        # clone it once per missing day
        append_data = np.tile(last_data,(append_len,1))
        # set the new dates, and set "daily" columns to 0 since there were no updates on these dates
        append_data[:,col_inx['date']] = time_lst
        append_data[:,col_inx['daily_vaccinations_raw']] = 0
        append_data[:,col_inx['daily_vaccinations']]=0
        append_data[:,col_inx['daily_vaccinations_per_million']]=0
        # create df 
        append_df = pd.DataFrame(append_data,columns=df.columns)
        # append to origin df and add to list 
        # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
        temp_df_lst.append(pd.concat([temp_df,append_df],ignore_index=True))
    else:
        temp_df_lst.append(temp_df)

# concatenate into a new df
new_df = pd.concat(temp_df_lst,axis=0,ignore_index=True)

# sort by country and date
new_df = new_df.sort_values(by=['country','date'])

new_df.reset_index(inplace=True,drop=True)
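
One way to check the result is to confirm that no country's latest date is earlier than the overall latest date. The sketch below uses a hypothetical helper, `countries_behind`, on small invented frames rather than on `new_df` itself. Like the code above, it relies on dates being ISO-formatted strings (YYYY-MM-DD), which sort chronologically when compared as text.

```python
import pandas as pd

def countries_behind(frame):
    """Return countries whose latest date is earlier than the overall latest date."""
    overall_latest = frame["date"].max()
    latest_per_country = frame.groupby("country")["date"].max()
    return sorted(latest_per_country[latest_per_country < overall_latest].index)

# Invented data: country "B" stops reporting two days before country "A".
before = pd.DataFrame({
    "country": ["A", "A", "B"],
    "date": ["2021-02-20", "2021-02-21", "2021-02-19"],
})
# The same data after the trailing days for "B" have been filled in.
after = pd.DataFrame({
    "country": ["A", "A", "B", "B", "B"],
    "date": ["2021-02-20", "2021-02-21", "2021-02-19", "2021-02-20", "2021-02-21"],
})

print(countries_behind(before))  # ['B']
print(countries_behind(after))   # []
```

Calling `countries_behind(new_df)` after the imputation should return an empty list if every country was extended to the latest report date.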