Chronology Current Month Current Thread Current Date [Year List] [Month List (current year)] [Date Index] [Thread Index] [Thread Prev] [Thread Next] [Date Prev] [Date Next]

# Re: [Phys-L] Confessions of a Repentant Modeler in a Time of Plague.

Regarding:

It has been a wild ride.
I began with exponential fits - which worked quite well
for Oklahoma until they didn't. Then I swung over to straight-
line fits until, as it seemed to me, the new day hits were all
on the low side. So what else but a logistic fit.
Mea Culpa!
I used a logistic form that concealed even the time value of
the peak rate instead of the infinitely more sensible standard
form used by bc when he tried this fit for Andorra.(note 1)

When I shared these plots with others, I noted a certain rolling
of the eyes, concerning their simplicity - but I persisted,
until I finally realized my exponential fits were getting daily
hits on the upside: not from a data uptrend, but from the
logistic curve downtrend in face of a continued straight line
fit.
So I retreated from the simple three factor fit, to the simpler
two factor straight line fit, whence I should not have too
eagerly jumped.
My data source was Note 2.
Here are some comparisons of the [accumulated] US CASES time
series, as a straight line (which fits well) or a logistic,
which is trying its best.
https://imgur.com/pcj4yR0 US CASES LINEAR https://imgur.com
YKCgfYP US CASES LOGISTIC

For comparison, here is a rate projection by a commercial
source:
https://imgur.com/pekQmWH MORGAN-STANLEY FORECAST

Onto plots of US deaths:
You can easily see the linear plot for Deaths is concave up in
data.

https://imgur.com/e7uhJVf US DEATH LIN

Here, an exponential or even logistic curve fits the facts
better.
https://imgur.com/EfXprh0 US DEATH LOG

Though of less general interest, I offer the Oklahoma plots
below, where The straight line plots fit the facts for recent
Cases and Deaths, and the logistic is betrayed by the upside\
departures of the data.
https://imgur.com/bMVsB0v OK DEATH LOGISTIC https://imgur.com/
sE0zP3G OK DEATH LINEAR

https://imgur.com/Kzsr1rW OK CASES LOGISTIC https://imgur.com/
9UAKU0b OK CASES LINEAR

Finally, an academic forecast for Oklahoma from IMHE.

Search Results

IMHE is the Institute for Health Metrics & Evaluation at U
Washington-Medicine - a source favored by President Trump.

https://imgur.com/GLdSIf3 IHME FORECAST (OKLAHOMA)

Compare the timid US resources forecast from that source for
ventilators
- which correlates with US Deaths - as most CV-19 patients on
that device do not emerge alive.
https://covid19.healthdata.org/united-states-of-america

The purple and gray shading represents their uncertainty!

Brian W

Note 1) I initially used the logistic form
(plateau level) /(1 + exp(a - b*t)

Instead of the better form used by bc
(plateau level) / (1 + exp( a*b - b*t)

Actually both forms here are fully equivalent. They both have the same form with the same 3 independent degrees of variation. The only difference is that the value of a*b in the second form is simply the value of parameter a in the first form.

...which provides a direct value for the peak rate at t = a
without the need for differentiating the function.

Note 2) My data source was Worldometers.info - not a name that
initially inspires confidence - which seems to provide a quite
reliable data source for CV-19 data for this and other countries.
/end

For those (bc & BW?) who want to attempt to fit past data to a previously defined form for a multi-parameter family of functions for the purpose of predicting future outcomes, *without* attempting to model any of the complicated underlying dynamics of a real communicable disease epidemic, (which would typically involve many coupled independent degrees of freedom, each obeying their own mutually coupled ODEs, with frequent time dependent updating of parameters of said degrees of freedom, and which would have no closed form solution, if modeled well) and who *only* want to fit some prior data, without understanding how & why the data are what they are, so as to just make some simple model predictions from a reasonable data fit, I hereby offer the simplest 1-degree of freedom model form which I could think of which has 4
parameters independently separately controlling 4 different constraining aspects for any semi-realistic model, *and* which also has a closed form solution of a single ODE describing its temporal variation.

I consider a simple model for a cumulative sigmoidal growth process with single dependent variable y for a temporal independent variable, t, being the real line (or uniformly sampled real line) having the following 4 aspects which need to be determined by the fitting procedure. These are:

1.) An initial exponential growth rate in the t--> - ∞ limit. I'll call this rate a (for alpha) in that initial limit the growth is assumed be exponential in character behaving like
(Constant)*exp(a*t). For uniformly sampled data with sample time interval dt we can treat the initial growth behavior for the n-th sample as having the equivalent form

(Constant)*(1 + g)^n

where ln(1 + g) = a*dt & g = exp(a*dt) - 1. Here g is the initial compounded growth 'interest' rate for unit sampling time interval dt. We discretely model the continuous time t = dt*n for the nth data point, where we take the starting time t = 0 to coincide with starting sample n = 0 for convenience.

2.) Asymptotic long time (t --> + ∞, or n --> + ∞) maximum limiting cumulative value y_∞. IOW, lim t --> + ∞ y(t) --> y_∞ and/or lim n --> + ∞ y_n --> y_∞.

3.) The ratio of the final exponential decay rate to the maximum asymptote at infinite time to the initial exponential growth rate @ t --> - ∞. We call this rate ratio p. Thus the final asymptote is approached as t --> + ∞ like y(t) = y_∞ - (Constant)*exp(-a*p*t) , or for discrete samples y_n = y_∞ - (Constant)/(1 + g)^(p*n) .

4.) The location of the growth process's inflection point along the independent parameter, t* for continuous time or n* for uniformly sampled data. Here t* signals when the process stops accelerating and begins to decelerate toward the final bound y_∞.

The ODE describing the continuous process does not refer explicitly to this 4th parameter t* (or n*) as its value is determined by an integration constant which is found by the conditions for the particular fit calibrating the horizontal offset needed in the independent parameter for the fit.

The simplest (for me) ODE having the above properties which is still explicitly solvable (for me) is the equation

dy/dt = a*y*(1 - (y/y_∞)^p) .

Note the ODE depends on a, y_∞, & p, but not on t*. Note also the special case of p = 1 is the logistic ODE for a logistic process. The general solution to this ODE for a growth process is

y(t) = y_∞/(1 + p*exp(a*p*(t* - t)))^(1/p)

This functional form has initial exponential growth rate a (at t --> - ∞), asymptotic bound y_∞, final decay rate a*p (at t --> + ∞) and inflection point at t = t* where the value of the process at the inflection point is y(t*) = y_∞/(1 + p)^(1/p).

The discrete version of the process for uniformly sampled values is

y_n = y_∞/(1 + p*(1 + g)^(p*(n* - n))^(1/p) .

Again the inflection point (where the growth rate is maximum) has a value of y_n* = y_∞/(1 + p)^(1/p). Also again the p = 1 special case is the logistic process function.

Using uniformly sampled data and plotting the relative growth per sample d_n == y_n/y_(n-1) - 1 versus the cumulative value y_n allows one to determine g, y_∞ & p.

If the plot is a straight line the process is indeed logistic with p = 1, and the vertical intercept is g & horizontal intercept is y_∞. If the plot is smooth and is fittable with a curved graph using a value for p ≠ 1 then the simple growth/decay asymmetric process with p ≠ 1 indeed adequately describes the process. If the fit is not good because the underlying dynamics is not captured by the model, or if the statistical noise is too great then the model doesn't work so well. BTW in the p ≠ 1 situation the fit's vertical intercept is *still* g and its horizontal intercept is *still* y_∞. The parameter p just introduces more curvature to the graph between these intercepts as p deviates from 1 in value. The value for n* (or equivalently t*) can still be found later after parameters g, y_∞ & p are found from the d_n vs y_n plot by a method similar to what I already described previously for the logistic model. The p-dependent functional form for the curved d_n vs y_n plot has is the messy formula

d_n = (1 + g)*(1 - (1 - 1/(1 + g)^p)*(y_n/y_∞)^p)^(1/p) - 1 .

I tried out this 4-parameter model on the US CoVID-19 death data from Worldometers.info and did all the fitting by hand with some eyeballing of how the curvature went. The resulting fit I got seemed to agree with the data to within the statistical sampling noise with nearly imperceptible apparent systematic deviation. Since I did the fitting by hand I did not properly deal with the error bars/bands in the data. I figured someone else could do that if they were so inclined. I only wanted a proof-of-concept fit to see if this phenomenological model may have any bearing on anything. It ends up it performs way better than the simple logistic model does.

Here are my parameters I got from my fit using data from March 19 - April 17. After doing the fit I added in April 18's data and that data point had no influence on the parameters as it fit just as well as all the other previous data. Here are the parameters I got for that fit.
g = 0.850453366
y_∞ = 69994.5
p = 0.1555328
n* = 24.822212 (note n = 0 corresponds to March 19, 2020)

This means the process's inflection point value is

y_n* = 27631.5 ( = 69994.5/2.53314124 = y_∞/(1 + p)^(1/p) ) .

Caveat note, *please* do not be misled by all the sig figs listed here. In reality these parameters have *much* lower precision than quoted above. But since I didn't do any extensive error analysis (since I actually did the fit by hand) I don't know the real uncertainty ranges on them, nor their mutual covariances, as their variations interact with each other. So I just kept an over-supply of lots of guard digits in all calculations & report those raw numbers.

Because the low (0.1555328) p-value this causes the d_n vs. y_n curve to have most of its strong curvature near the low n-number region where the data number statistics are very poor. It also means that for none of the data used is the actual effective daily growth value anywhere near as large as 0.8504. This g value is only the *projected* initial daily growth rate in the *limit* of the *infinite* past n > - ∞. The actual growth factors (where the data are) are typically around (d_n =) 0.37 (with up over 50% noise) for early days near March 19 where the number statistics are extremely poor anyway. The p ≠ 1 model's projected value for y_∞ near 70000 is apparently much more believable than the anomalously low value projected by a naive logistic model projecting from early April data last week to the vicinity of y_∞ around 32000. The model's asymptotic final decay-to-asymptotic bound daily compounded decay rate is 1/(1 + g)^p - 1 = -0.09128 in the tail after nearly all the deaths have happened.

BTW, the Worldometers.info US death data has a huge anomalous spike in the daily count for April 14 (I think because NY reported all its prior COVID19 deaths happening outside of hospitals all at once on that day). This means a naive fit to the raw Worldometers death data will necessarily give bad results from this reporting spike. So I took the liberty of massaging the data as follows. Out of the 6185 daily deaths reported for April 14 I guesstimated that probably no more than about 2800 of them were really from that day and the remainder must have happened at some earlier time. Since I didn't know when those extra non-hospital deaths actually happened I just assumed they mostly likely would have spread out over all the previous days in a relatively uniform way. So I increased the death counts of all days before April 14 uniformly by a constant *relative* 14% above their raw daily values, and this then accounted for all the excess deaths reported on April 14 above my guesstimate of 2800.

David Bowman