Palo Alto, California / June 8, 2020 – Based on data from about 700,000 contributors, of whom 13,311 volunteered their PCR or serological test results for SARS-CoV-2, we compared the performance of four symptom constellations. Each constellation is a set of diagnostic symptoms. For example, the CDC criteria for "flu" include fever and body aches, whereas the diagnostic criteria for COVID include fever and shortness of breath. How well do these symptom constellations perform in the real world at discriminating among a cold, the flu, and COVID?
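One way to compare a person's reported symptoms with a constellation is the Hamming distance between binary symptom vectors. The release later says its classifiers are trained on this measure. The sketch below assumes a hypothetical six-symptom vocabulary (`SYMPTOMS`). It also uses a hypothetical "cold" constellation. Only the flu and COVID criteria come from the text, and the fourth constellation is left out because the release does not describe it. The distances here are computed in the clear, not with the privacy-preserving computation the release refers to.

```python
# Hypothetical symptom vocabulary; the release does not publish the real one.
SYMPTOMS = ["fever", "cough", "shortness_of_breath", "body_aches", "sore_throat", "runny_nose"]

# Constellations as sets of symptom names. "flu" and "covid" follow the
# criteria named in the text; "cold" is a hypothetical placeholder.
CONSTELLATIONS = {
    "flu": {"fever", "body_aches"},
    "covid": {"fever", "shortness_of_breath"},
    "cold": {"runny_nose", "sore_throat"},
}

def to_vector(symptoms):
    """Encode a set of symptom names as a 0/1 vector over SYMPTOMS."""
    return [1 if s in symptoms else 0 for s in SYMPTOMS]

def hamming(a, b):
    """Count the positions where two equal-length 0/1 vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def distances(reported):
    """Hamming distance from one person's report to each constellation."""
    v = to_vector(reported)
    return {name: hamming(v, to_vector(c)) for name, c in CONSTELLATIONS.items()}

print(distances({"fever", "cough", "shortness_of_breath"}))
# {'flu': 3, 'covid': 1, 'cold': 5}
```

A smaller distance means the report is closer to that constellation. In this toy example, the report matches the COVID criteria except for the extra cough.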
As in our preliminary work in May, we found that people with positive and with negative COVID tests reported symptoms across all four diagnostic symptom vectors. This means there are few, if any, truly distinctive symptoms of COVID that allow it to be reliably differentiated from the common cold and the flu. We did, however, note a difference in the magnitude of reported symptoms. People with negative and people with positive COVID tests both reported fever and cough, but those with a positive test tended to report more symptoms. The figure shows the total scores (the sum of the four sub-scores) of people with a positive COVID test (orange) and with a negative COVID test (blue). The total scores of the COVID-positive population have a longer tail, largely because that population reported more symptoms.
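The comparison in the figure can be approximated numerically by summing each person's four sub-scores and measuring what share of each group falls above some threshold. A heavier upper tail shows up as a larger share. The sketch below is built on assumptions. The sub-score values are made up, not study data. The threshold of 5 is arbitrary. `tail_fraction` is a hypothetical helper name.

```python
# Toy sub-scores (four per person): made-up values, not data from the study.
positives = [(1, 2, 0, 1), (3, 2, 2, 1), (2, 3, 3, 2), (0, 1, 1, 0)]
negatives = [(1, 0, 0, 1), (1, 1, 0, 0), (2, 1, 1, 0), (0, 1, 0, 0)]

def total_score(sub_scores):
    """Total score: the sum of a person's four sub-scores."""
    return sum(sub_scores)

def tail_fraction(totals, threshold):
    """Share of people whose total score is strictly above `threshold`."""
    return sum(t > threshold for t in totals) / len(totals)

pos_totals = [total_score(s) for s in positives]  # [4, 8, 10, 2]
neg_totals = [total_score(s) for s in negatives]  # [2, 2, 4, 1]

# The positive group has more mass above the (arbitrary) threshold of 5.
print(tail_fraction(pos_totals, 5), tail_fraction(neg_totals, 5))  # 0.5 0.0
```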

We then used standard machine learning and statistical approaches to parametrize a classifier for COVID, using the four sub-scores we securely compute for each person from their self-reported symptoms. Because we use privacy-preserving computation, individual symptoms are hidden from us, so we cannot construct new classifiers based on them. We can, however, train classifiers on the Hamming distances between each individual's symptoms and the four preconfigured symptom constellations. The figure shows the positive predictive value (PPV) and the true positive rate (TPR) as the classification cutoff varies, and the point where the two curves intersect. In this example, a cutoff of ~0.32 gives balanced performance on positive and negative labels. The cutoff can be chosen according to how the classifier will later be used, for example for early detection or for diagnosis.
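The PPV/TPR crossing point can be located by sweeping candidate cutoffs over a classifier's output scores. At each cutoff, PPV = TP / (TP + FP) and TPR = TP / (TP + FN) are computed, and the cutoff where they are closest is kept. The sketch below makes several assumptions. It assumes a person is predicted positive when their score is at or above the cutoff, and it uses toy scores and labels rather than the study's data. It therefore lands on 0.4, not the ~0.32 reported above. The function names are hypothetical. scikit-learn's `precision_recall_curve` computes the same two quantities, since precision is PPV and recall is TPR.

```python
def ppv_tpr(scores, labels, cutoff):
    """PPV and TPR when predicting positive for score >= cutoff (labels are 0/1)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0
    return ppv, tpr

def balanced_cutoff(scores, labels):
    """Candidate cutoff at which PPV and TPR are closest (their crossing point)."""
    def gap(cutoff):
        ppv, tpr = ppv_tpr(scores, labels, cutoff)
        return abs(ppv - tpr)
    # Every distinct score is a candidate cutoff.
    return min(sorted(set(scores)), key=gap)

# Toy classifier outputs and test results (1 = positive), not study data.
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

cut = balanced_cutoff(scores, labels)
print(cut, ppv_tpr(scores, labels, cut))  # 0.4 (0.75, 0.75)
```

Raising the cutoff above the crossing point trades TPR for PPV, and lowering it does the reverse. This is the lever the text describes for tailoring the classifier to early detection or to diagnosis.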

With this unique real-world training set, with symptoms from almost 1 million people and test results for more than 10,000 people across many geographies, we can provide machine learning tools that help businesses and schools make the best decisions. These tools also help their employees and students get the best care, and testing when it is warranted.