
Re: [Phys-L] Finally ... normalized (by population) data, and Andorra



Regarding:

On 4/12/2020 3:47 PM, bernard cleyet wrote:
/snip/
http://cleyet.org/covid-19/Europe/Andorra/

Finally! I’ve plotted cases wherein the better fit is
logistic —considerably better than a simple exponential.
(Thanks to Brian W.) Deaths next.

It may be I am a prisoner of my own device; but bc's
Andorra scatter plot of cases per day, using the raw rate data,
suffers from the inevitable noise.
Can YOU define the peak case rate day from this plot?

How much better to derive cases per day using the logistic
parameters with which he modeled the cumulative cases, like this:

https://imgur.com/BR1US5H

Being old school I prefer to fit my data to a line whenever possible. That way I can see for myself in a more natural way how well the data fit the model, and which regions of the data do better, and which ones do worse in fitting said model. The main caveat is that any initial error bars/bands on the original raw data need to be carried through the nonlinear transformation to the resulting linear model so the transformed data points can be properly weighted in the fitting procedure for the line. After the fit/regression to the line is made the function is inverse-transformed back to the nonlinear original model that was to be fitted.

It is possible to massage data that is to be fit to a simple logistic model in a way so it is to be fit to a line, *even* though the logistic model has 3 parameters (asymptote height, initial exponential growth rate & inflection point location, i.e. location of the half-max-asymptote), and a line has just 2 parameters. The trick is to not plot a time series (which contains the temporal data needed for the location of the inflection point) and rather plot just the data values, and their relative changes, without indicating *when* those data values were taken in any absolute sense. After the fit is made to the line the location of the inflection point in time can be found later by other means. Below is the scheme for fitting discrete logistic data to a line as a means for finding the data's fitted logistic parameters.

Let y_n be the n-th data value that is to be fitted to the logistic model. The adjacent n-values are assumed to be equally separated in time. This means one is attempting to fit the data to a model of the form:

y_n = y_∞/(1 + (1 + g)^(n* - n))

where y_∞ is the final asymptotic maximum y-value (i.e. total number of deaths after the plague is over), and 1 + g is the initial adjacent value increase factor between y_n and y_(n-1), i.e. y_n/y_(n-1) --> 1 + g initially when the process is starting out (near n --> - ∞ ). So g is the initial adjacent datum relative increase. (It's sort of a daily compounded interest rate that is supposed to be gamma, but it's written as g to avoid font problems with Greek letters.) Note the initial exponential amplification rate for unit sample time is ln(1 + g). Here n* is the effective sample number at the inflection point where y_n* = y_∞/2 and the rate of increase peaks.

To fit the data to a line and extract the values of y_∞ and g one simply plots y_n/y_(n-1) - 1 vertically versus y_n horizontally for all y_n values available. Do not *plot* n horizontally. This works because the model above gives y_n/y_(n-1) - 1 = g*(1 - y_n/y_∞) exactly, which is linear in y_n. Because the ordinates are adjacent relative differences (i.e. approximations to a logarithmic derivative) they are quite noisy, especially so initially where the number statistics are low and the propagated error bars are huge. But for the higher y_n values the statistics become more manageable, the error bars narrow, and the plot settles down to a reasonable line to which the model parameters can be fit.

Once one has the fitted line drawn, its horizontal axis intercept is the model's long time asymptotic maximum value y_∞. The vertical axis intercept is the initial relative growth parameter g (initial daily compounded interest rate) at very low n-values.
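The line-fitting recipe above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (logistic, fit_logistic_line) and the synthetic, noise-free parameter values are made up here and are not from any real data set. The optional weights are meant to be supplied by the user from the propagated error bars (NumPy's polyfit expects them as 1/sigma for each point).

```python
import numpy as np

def logistic(n, y_inf, g, n_star):
    """Discrete logistic model y_n = y_inf / (1 + (1 + g)**(n_star - n))."""
    return y_inf / (1.0 + (1.0 + g) ** (n_star - n))

def fit_logistic_line(y, weights=None):
    """Fit r_n = y_n/y_(n-1) - 1 (vertical) against y_n (horizontal) to a line.

    Returns (g, y_inf): the vertical-axis and horizontal-axis intercepts.
    `weights`, if given, needs len(y) - 1 entries (one per plotted point),
    each 1/sigma of the corresponding r_n.
    """
    y = np.asarray(y, dtype=float)
    r = y[1:] / y[:-1] - 1.0      # adjacent relative differences
    x = y[1:]                     # plotted against y_n, not against n
    slope, intercept = np.polyfit(x, r, 1, w=weights)
    g = intercept                 # value of the line at y_n = 0
    y_inf = -intercept / slope    # where the line crosses r = 0
    return g, y_inf

# Synthetic, noise-free data with made-up parameters (illustration only).
n = np.arange(0, 40)
y = logistic(n, y_inf=30000.0, g=0.25, n_star=20.0)
g_fit, y_inf_fit = fit_logistic_line(y)
print(g_fit, y_inf_fit)
```

With exact model data the line is exact, so the fit should return the input g and y_∞ to within rounding. With real counts the early points scatter widely, and one would either down-weight them or drop them before fitting.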

After the parameters y_∞ & g are found one can turn one's attention to the value of n* (the location of the inflection point). One way to do it is to interpolate the n-value where the fitted model curve hits y_∞/2 in the data. A probably better way is to take a weighted average over all the data points of the values of

ln(y_∞/y_n - 1)/ln(1 + g) + n

If the fit was perfect each of these averaged values would be n*. So the average is an estimate for n*. The weightings used in the averaging could be taken from the propagated error bars on all the data. But I would prefer to overweight the data values with the steepest slopes near the inflection point because any error in the value of n* will affect the fit the most near those values. This is because horizontally displacing a steep line does more damage than horizontally displacing a shallow line by the same amount.
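The weighted-average estimate of n* can be sketched as follows. The function name estimate_n_star and the test values are hypothetical. The default weighting, y_n*(1 - y_n/y_∞), is one possible choice assumed here: it is proportional to the slope of the continuous logistic curve, so it favors points near the inflection point. Weights derived from propagated error bars can be passed in instead.

```python
import numpy as np

def estimate_n_star(n, y, y_inf, g, weights=None):
    """Weighted average of ln(y_inf/y_n - 1)/ln(1 + g) + n over the data."""
    n = np.asarray(n, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = (y > 0) & (y < y_inf)          # the logarithm needs 0 < y_n < y_inf
    per_point = np.log(y_inf / y[keep] - 1.0) / np.log1p(g) + n[keep]
    if weights is None:
        # Assumed default: weight by the local model slope (up to a constant).
        w = y[keep] * (1.0 - y[keep] / y_inf)
    else:
        w = np.asarray(weights, dtype=float)[keep]
    return np.average(per_point, weights=w)

# Synthetic, noise-free data with made-up parameters (illustration only).
y_inf, g, n_star = 30000.0, 0.25, 20.0
n = np.arange(0, 40)
y = y_inf / (1.0 + (1.0 + g) ** (n_star - n))
n_star_est = estimate_n_star(n, y, y_inf, g)
print(n_star_est)
```

For exact model data every per-point value equals n*, so any weighting returns the same answer. The choice of weights only matters once the data are noisy.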

Notwithstanding JD's comments about combining data from differing regions with different local dynamics, using the Worldometers death data for the whole US from about March 19 to April 12 gives a fairly decent extractable line, at least for the later data values with the best statistics. It appears that a simple logistic model gives an asymptotic y_∞ value of around 32000 - 33000 US deaths (with much larger error bands) and the date of the inflection point was sometime around April 8 - 9.

Nevertheless I should mention that I doubt that the simple logistic model is appropriate, not only because of JD's warnings about regional variations, but also because the dynamical process modeled is far more complex than a simple logistic model can accommodate. If, in the end, the model works out well it would do so only by accident. One problem with a logistic function, being the simplest function that has exponential growth at the left side end and exponential decay to an upper bound at the right side end, is that both the left side exponential growth and right side exponential decay necessarily have the *same* exponential rate. Also, the slope of the function (i.e. density) is the square of a hyperbolic secant function centered on the inflection point. Such a function has an even symmetry in time, a symmetry that a realistic model would probably not have. And the original cumulative logistic function is actually a vertically shifted (by y_∞/2) hyperbolic tangent function with odd symmetry about the inflection point:

2*y_n/y_∞ - 1 = tanh(½*(n-n*)*ln(1 + g))

which means that

y_n + y_(2n* - n) = y_∞

There is absolutely no reason to expect any real contagion to possess such symmetries--even when confined to local regional values.
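For anyone who wants to see these symmetries numerically, here is a small check in Python. The parameter values are made up for illustration. The odd symmetry is tested in the offset form y_(n* + k) + y_(n* - k) = y_∞, and the equal-rate claim is tested by comparing the far-left growth factor of y_n with the far-right decay factor of y_∞ - y_n.

```python
import numpy as np

# Made-up parameters, for illustration only.
y_inf, g, n_star = 1000.0, 0.2, 15.0

def y(n):
    """Discrete logistic model y_n = y_inf / (1 + (1 + g)**(n_star - n))."""
    return y_inf / (1.0 + (1.0 + g) ** (n_star - n))

k = np.arange(0, 15)

# Odd symmetry about the inflection point: each sum should equal y_inf.
odd_sum = y(n_star + k) + y(n_star - k)

# Shifted hyperbolic tangent form of the same curve.
tanh_form = 0.5 * y_inf * (1.0 + np.tanh(0.5 * k * np.log1p(g)))

# Growth factor far to the left, and decay factor of the remaining gap
# (y_inf - y_n) far to the right; both should be close to 1 + g.
left_ratio = y(-40.0) / y(-41.0)
right_ratio = (y_inf - y(70.0)) / (y_inf - y(71.0))

print(np.allclose(odd_sum, y_inf), np.allclose(tanh_form, y(n_star + k)))
print(left_ratio, right_ratio)
```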

David Bowman