31 Mar 2020
4 MINS READ
We are living in the middle of a pandemic that is likely to lead to a humanitarian crisis, and we still do not know how bad it is going to be. While measured optimism is no bad thing, it is important to read the signals and understand the numbers to gauge the scale and magnitude of the crisis.
We have seen in the past that when wrong metrics are quoted with incorrect numbers, incorrect inferences are drawn from them. So the idea behind this post is to provide an overview of the metrics referred to in the COVID-19 analytics work shared by data professionals. Of course, these metrics are common to the study of any epidemic.
Coming to the data: like many of you, I have been following the coronavirus data on a daily basis, and on some days on an hourly basis too. Over the last month or so, a good number of data dashboards, analyses and Kaggle kernels have been shared by data professionals who diligently analysed the data and published their findings.
So what kind of data are data practitioners and analysts using for their analysis? While the majority rely on public datasets published on sites like Kaggle, most of these can be traced back to the following three key sources.
Key datasets for Covid-19 cases:
One of the websites I have been referring to regularly is this, which shows a high-level summary of cases. However, news sites around the world are quoting vastly different numbers for the fatality rate, which was the trigger for this blog.
Firstly, I wanted to understand the key metrics used by healthcare bodies and government organizations for COVID-19, and how they compare them with previous epidemics like SARS or MERS to understand the relative scale and magnitude of the current one.
CFR (Case Fatality Rate): The most commonly quoted metric is the fatality rate, which is obtained simply by dividing the number of deaths by the total number of cases at a point in time. WHO first quoted this number at around 2.1% in February and at 3.4% in the first week of March.
Mortality Rate: Some websites and newspapers quote the fatality rate as the mortality rate. However, according to WHO's definition, the mortality rate is the number of deaths caused by the disease as a proportion of the given population.
Morbidity Rate: Another important metric for understanding the spread of COVID-19 in a location, obtained by dividing the number of people infected with the disease by the population. A variant of this is reported in the CEDC’s dashboard as cases per 100K of population.
Neither the mortality rate nor the morbidity rate is meaningful while an outbreak is ongoing, which is why WHO usually releases these numbers after the epidemic. The CFR, however, can still be used to understand the severity of cases at different stages of the epidemic, as well as to compare it with other epidemics at a similar stage.
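To make the difference between the three metrics concrete, here is a minimal Python sketch. The function names and the figures for the region are made up purely for illustration; they are not taken from any of the datasets mentioned above.

```python
def case_fatality_rate(deaths: int, cases: int) -> float:
    """Deaths as a percentage of reported cases at a point in time."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases


def rate_per_100k(count: int, population: int) -> float:
    """Scale a count (deaths or infections) to a rate per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100_000 * count / population


# Hypothetical figures for an imaginary region, for illustration only.
deaths, cases, population = 34, 1_000, 2_000_000

cfr = case_fatality_rate(deaths, cases)        # deaths / cases
mortality = rate_per_100k(deaths, population)  # deaths / population
morbidity = rate_per_100k(cases, population)   # infected / population

print(f"CFR: {cfr:.1f}%")
print(f"Mortality rate: {mortality:.1f} per 100K")
print(f"Morbidity rate: {morbidity:.1f} per 100K")
```

The same number of deaths gives a very different figure depending on whether the denominator is the cases or the whole population, which is why quoting the CFR as the mortality rate is misleading.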
But what are the challenges in using these metrics during an epidemic?
Firstly, the numbers keep changing with the stage of the epidemic in the respective country.
Secondly, unreported cases are not taken into consideration in these metrics. Ignoring unreported cases inflates the CFR. Unreported deaths could balance this out, but the chances of that are minimal for obvious reasons.
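The inflation effect can be sketched as follows. This is only an illustration: `reporting_fraction` is a hypothetical input (the share of all infections that end up as reported cases), not an estimate from any study, and the counts are made up.

```python
def cfr_with_unreported(deaths: int, reported_cases: int,
                        reporting_fraction: float) -> float:
    """CFR (%) after scaling reported cases up by an assumed reporting fraction."""
    if not 0 < reporting_fraction <= 1:
        raise ValueError("reporting_fraction must be in (0, 1]")
    # If only a fraction of infections is reported, the true case count is larger.
    estimated_cases = reported_cases / reporting_fraction
    return 100.0 * deaths / estimated_cases


# Hypothetical counts; the reporting fractions are assumptions, not measurements.
for fraction in (1.0, 0.5, 0.25):
    adjusted = cfr_with_unreported(34, 1_000, fraction)
    print(f"reporting fraction {fraction:.2f}: CFR {adjusted:.2f}%")
```

The smaller the share of infections that gets reported, the further the quoted CFR sits above the rate one would see if every case were counted.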
Mean Incubation Period: Why is a 14-day self-quarantine advised for people with a travel history? It is because the current estimated range for the incubation period is 2 to 14 days (with outliers, it is reported at 0 to 27 days). The mean incubation period is the average duration from the time of infection to the appearance of the first symptoms of the disease. As with any data, there were notable outliers here as well, where a few people did not report any symptoms for 25 days or more.
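How much a single outlier moves the mean can be seen with a small sketch. The incubation periods below are invented for illustration and are not real patient data.

```python
from statistics import mean, median

# Hypothetical incubation periods in days; 27 plays the role of the outlier.
incubation_days = [2, 3, 4, 5, 5, 5, 6, 6, 7, 8, 10, 14, 27]

# Restrict to the commonly quoted 2 to 14 day range to see the outlier's effect.
typical = [d for d in incubation_days if d <= 14]

mean_all = mean(incubation_days)
mean_typical = mean(typical)
median_all = median(incubation_days)

print(f"mean with outlier:    {mean_all:.2f} days")
print(f"mean without outlier: {mean_typical:.2f} days")
print(f"median with outlier:  {median_all} days")
```

The median barely reacts to the outlier while the mean shifts noticeably, which is worth keeping in mind when a single "mean incubation period" is quoted.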
Median Hospital Stay: The median duration of hospital stay is the median period for which patients are admitted to hospital for treatment, up to the time of outcome. This number is crucial for governments planning the healthcare facilities and supplies required. The scale of COVID-19 shows that even developed countries are struggling to accommodate and treat patients, and facilities are being stretched beyond their capacities.
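Given case-level admission and outcome dates, the median stay is straightforward to compute. The sketch below assumes a hypothetical list of (admitted, outcome) date pairs; the records are invented for illustration.

```python
from datetime import date
from statistics import median

# Hypothetical (admission date, outcome date) pairs, one per patient.
stays = [
    (date(2020, 3, 1), date(2020, 3, 9)),
    (date(2020, 3, 2), date(2020, 3, 14)),
    (date(2020, 3, 5), date(2020, 3, 12)),
    (date(2020, 3, 6), date(2020, 3, 27)),
    (date(2020, 3, 8), date(2020, 3, 18)),
]


def median_stay_days(records):
    """Median number of days between admission and outcome."""
    durations = [(outcome - admitted).days for admitted, outcome in records]
    return median(durations)


print(f"Median hospital stay: {median_stay_days(stays)} days")
```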
Ratio of Community to Imported Spread: For COVID-19 especially, governments keenly follow this ratio to understand whether the virus is confined to travellers who arrive in the country already infected or whether there is community spread. India is on the brink of community spread at the time of writing, as the ratio is close to 1.
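As a rough sketch, the ratio is just locally transmitted cases divided by imported ones. The function name and the counts below are hypothetical, and reading a ratio of 1 or more as community spread matching or overtaking imported cases is an interpretation of the threshold mentioned above, not an official definition.

```python
import math


def community_to_imported_ratio(community_cases: int, imported_cases: int) -> float:
    """Locally transmitted cases divided by imported cases."""
    if imported_cases == 0:
        # No imported cases: the ratio is unbounded if any local cases exist.
        return math.inf if community_cases > 0 else 0.0
    return community_cases / imported_cases


# Hypothetical counts for two points in time.
for community, imported in [(40, 50), (60, 50)]:
    ratio = community_to_imported_ratio(community, imported)
    status = "community spread dominates" if ratio >= 1 else "mostly imported"
    print(f"{community} community / {imported} imported -> {ratio:.2f} ({status})")
```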
The Unreported Problem: The problem of unreported cases is twofold. On the operational front, identifying unreported cases poses containment challenges, as people with mild or no symptoms could be actively transmitting the virus. On the modelling front, estimating the volume of unreported cases is vital for evaluating the size and severity of the epidemic so that the response can be optimized. There are advanced modelling techniques available to estimate unreported cases, but those techniques would require a separate blog post of their own!
There is a whole lot of advanced analytics that can be done around this, such as forecasting case volumes, predicting the average length of stay (which can help hospitals and governments prepare), predicting an individual's COVID-19 risk score based on symptoms and related variables (Apollo Hospitals just released one such AI model to predict the COVID risk score), modelling the spread (R0), modelling survival and progression rates etc.
Lack of case-level data: Most of the datasets mentioned above are time series of daily cases and fatalities by geography; they are not at the grain of individual cases. Of course, such data will be difficult to collect unless governments, hospitals, agencies and all others involved are willing to share it. I am sure that with case-level data a whole lot of analytics could be done, and useful insights obtained, on other aspects of the epidemic such as the response, the efficacy of treatment methods, medications tried, side effects etc.
About the Author
Natarajan Ganapathi has more than 18 years of experience spread across Consulting, Solutions and Delivery, predominantly in the Financial Services, Manufacturing & Consumer verticals. His primary expertise is in data, across the spectrum of Data Engineering, Analytics, Visualization & Machine Learning. He has led transformation engagements in data and analytics, advising and consulting customers on formulating strategy, defining architectures and roadmaps, and planning execution and delivery.