Tests with synthetic and sample (real) data, July 2018

Synthetic data, minimize cost fn
Synthetic data, Reed's method
Real data, minimize cost fn
Real data, Reed's method



Tests with synthetic data

A. Minimising a cost function (solution option A in the notes)

This works well (the results I sent you a few months ago had a bug, which is now fixed). The method works by minimising the cost function (4). (First link above)
There are 8 channels and 80 observations of each channel.
The synthetic data are chosen in a way that is consistent with the imposed error covariances.
The first set of plots in the above link shows the estimated components of c and m as a function of iteration, together with their true values. The vectors c and m correspond, respectively, to "a" and to the diagonal elements of the diagonal matrix "B" in (1).
The values converge very quickly. Even though the values don't converge to their true values, I don't think that this is too important as there are numerous solutions consistent with the truth. I think that an error analysis (if and when we do that) will show that the converged values are OK.
The next plot is the value of J vs iteration. This too shows convergence. The value is reasonable too. Statistically we expect something of order Jmin ~ Nobs/2 = 8*80/2 = 320.
The final set of plots are scatterplots of the data with the true, first guess, and best fit lines added. The first guess is y=x, so m=1, c=0 for all channels. The true and best fit lines are indistinguishable. Note that x corresponds to the reference data (modified according to (2)), and y corresponds to the target data.
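The estimate Jmin ~ Nobs/2 quoted above can be checked numerically. The following is a minimal sketch, not the cost function (4) itself: it assumes that J is a Gaussian quadratic form, J = 1/2 * sum of r^T R^-1 r over the observations, with a single made-up channel error covariance R (the names R, cost and J_mean are illustrative only). If the residuals are drawn consistently with R, the average J should be close to 8*80/2 = 320.

```python
import numpy as np

rng = np.random.default_rng(0)

n_chan, n_obs = 8, 80          # sizes quoted in the text
n_total = n_chan * n_obs       # 640 scalar observations

# Hypothetical (assumed) error covariance between channels, symmetric positive definite.
A = rng.normal(size=(n_chan, n_chan))
R = A @ A.T + n_chan * np.eye(n_chan)
R_inv = np.linalg.inv(R)


def cost(residuals, R_inv):
    """J = 1/2 * sum_i r_i^T R^-1 r_i, with residuals of shape (n_obs, n_chan)."""
    return 0.5 * np.einsum("ij,jk,ik->", residuals, R_inv, residuals)


# Draw residuals that are consistent with R many times and average J.
J_values = [
    cost(rng.multivariate_normal(np.zeros(n_chan), R, size=n_obs), R_inv)
    for _ in range(200)
]
J_mean = float(np.mean(J_values))

print("expected Nobs/2 :", n_total / 2)
print("mean J (sampled):", J_mean)
```

Strictly, once parameters have been fitted the expectation drops to (Nobs - number of fitted parameters)/2, which is why the text says "of order".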

B. Using the Reed method (solution option B in the notes)

This uses a formula to find the best fit parameters instead of minimising a cost function. There are still 8 channels and 80 observations of each channel, but the observations themselves are different to the above, as the computer chooses different random numbers (perhaps I ought to do other runs with the same data, but at this stage I just want to know if the approach works). There is no need for a first guess, and there are no iterations. (Second link above)
Remember that I first project the data to modes that are uncorrelated (according to the given covariances), solve the problem for each component in that space separately, and then project back to the original variables. There are many ways of defining uncorrelated modes, which makes this potentially a more tricky problem.
The first set of plots shows the synthetic data in the uncorrelated space with the true and best fit lines. The standard deviations of the data in this space are always unit valued. The method seems to work reasonably well, although I'll need to use the same data as the cost function method to compare them properly. The best fit and true values are shown in a table.
The second set of plots show the data in the space where the data are correlated (the channels). It does not make sense to add lines of best fit because there is no best fit in this space -- a y-value in one channel is a function of the corresponding x-values in *all* other channels. The best fit and true c and m are shown in the last table (recall that the m values form a matrix in this space). c and m are the matrices a and B in (14).
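To illustrate the remark that there are many ways of defining uncorrelated modes, here is a minimal sketch (not the projection actually used for the results above). It builds a made-up covariance C and two different whitening matrices, one from a Cholesky factor and one from an eigen-decomposition. Both give unit variances and zero correlations in the projected space, yet they are different projections, related by an orthogonal rotation. All names (C, W_chol, W_eig, Q) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_chan = 8

# Hypothetical (assumed) channel error covariance, symmetric positive definite.
A = rng.normal(size=(n_chan, n_chan))
C = A @ A.T + n_chan * np.eye(n_chan)

# Way 1: Cholesky factor C = L L^T, whitening matrix W = L^-1.
L = np.linalg.cholesky(C)
W_chol = np.linalg.inv(L)

# Way 2: eigen-decomposition C = V diag(w) V^T, W = diag(w^-1/2) V^T.
w, V = np.linalg.eigh(C)
W_eig = (V / np.sqrt(w)).T

# Covariance of the projected variables W x is W C W^T; both should be the identity.
cov_chol = W_chol @ C @ W_chol.T
cov_eig = W_eig @ C @ W_eig.T

# The two projections differ by an orthogonal matrix Q = W_eig L.
Q = W_eig @ L

print("Cholesky modes uncorrelated:", np.allclose(cov_chol, np.eye(n_chan)))
print("Eigen modes uncorrelated   :", np.allclose(cov_eig, np.eye(n_chan)))
print("Same projection?           :", np.allclose(W_chol, W_eig))
print("Related by a rotation?     :", np.allclose(Q @ Q.T, np.eye(n_chan)))
```

Projecting back to the channels uses the inverse of whichever W was chosen, so any fit done in the uncorrelated space depends on that choice.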

Tests with real data

A. Minimising a cost function (solution option A in the notes)

The plots from applying the cost function method to the real data are here. (Third link above)
The first set of plots show c and m with iteration. There is clearly convergence, although we don't have the true values for comparison.
The next fig is J vs iteration. The value of Jmin is far too small. This is expected to be Jmin ~ Nobs/2 = 816*19/2 = 7752 (there are 816 observations of each of 19 channels), but the actual value is orders of magnitude smaller. This indicates that the error covariances of the data are far too large.
The next set of figs are the scatter plots with the best fit lines for each channel. It is obvious from this plot that the error bars are too large. The data also seem to fall into two clusters in some of the channels.
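The link between a too-small Jmin and too-large error covariances can be sketched as follows. This is an illustration with made-up numbers, not the real data: it assumes independent errors with a single standard deviation (sigma_true) and an assumed variance that is too large by a uniform factor (inflation, here 100, chosen arbitrarily). Under those assumptions J shrinks by the same factor, so the ratio of the expected Nobs/2 to the J actually obtained gives a crude estimate of how much the variances are overestimated.

```python
import numpy as np

rng = np.random.default_rng(2)

n_chan, n_obs = 19, 816            # sizes quoted in the text
expected_J = n_chan * n_obs / 2    # 7752

sigma_true = 1.0                   # hypothetical true error standard deviation
inflation = 100.0                  # hypothetical: assumed variances are 100x too large

# Residuals drawn with the true error statistics (independent errors, for simplicity).
residuals = rng.normal(scale=sigma_true, size=(n_obs, n_chan))

# J evaluated with the correct variances, and with the inflated ones.
J_correct = 0.5 * np.sum(residuals**2 / sigma_true**2)
J_inflated = 0.5 * np.sum(residuals**2 / (inflation * sigma_true**2))

# Crude diagnostic: how much too large do the assumed variances appear to be?
estimated_inflation = expected_J / J_inflated

print("expected J         :", expected_J)
print("J, correct errors  :", J_correct)
print("J, inflated errors :", J_inflated)
print("estimated inflation:", estimated_inflation)
```

With real data the overestimate need not be uniform across channels, and correlations matter, so this ratio is only a first indication of the size of the problem.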

B. Using the Reed method (solution option B in the notes)

The results are here (Fourth link above)
The first set of plots show the data projected onto the uncorrelated components. The relationships don't even look linear. If the covariances are incorrect then this projection will not be correct. I suspect that the incorrect error covariance matrices make this procedure nonsense.
