[Scilab-users] "Smoothing" very localised discontinuities in B/H curves.

Rafael Guerra jrafaelbguerra at hotmail.com
Mon Apr 4 17:03:13 CEST 2016


Buk,

Could you please provide the data points in your example, so that we can test
different methods?

Note that in the moving median filter solution presented there is no propagation
of errors because the original dataset is always used and only one filtering
pass is made using a very short 3-point filter.
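For reference, a single 3-point moving-median pass of the kind described can be
sketched in Scilab as follows (`medfilt3` is just an illustrative name, not a
built-in function):

```scilab
// One pass of a 3-point moving median over a column vector y.
// Each interior sample is replaced by the median of itself and its
// two neighbours; the endpoints are copied through unchanged, so the
// filter always reads from the original dataset, never its own output.
function yf = medfilt3(y)
    yf = y;
    for n = 2:size(y, 'r') - 1
        yf(n) = median(y(n-1:n+1));
    end
endfunction
```

A single spike can therefore only affect the three outputs whose windows
contain it, which is why no error propagates through the rest of the dataset.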

Regards,
Rafael

-----Original Message-----
From: users [mailto:users-bounces at lists.scilab.org] On Behalf Of
scilab.20.browseruk at xoxy.net
Sent: Monday, April 04, 2016 4:45 PM
To: users at lists.scilab.org
Subject: Re: [Scilab-users] "Smoothing" very localised discontinuities in
B/H curves.

Rafael/Stéphane/Tom,

The problem with using a median filter -- and actually any continuous filter --
is that it implies that the median value of any n-group of adjacent values is
"more reliable" than the actual value *for every value in the dataset*. And I'm
really not convinced that is true for this data.

In other words: continuous filtering can adjust all the values in the dataset,
rather than just adjusting or rejecting the anomalous ones. One (large)
erroneous data point early in the dataset would impose an influence on the
rest of the entire dataset, causing a subtle shift in one direction or the other.
If there are multiple erroneous values that all tend in the same direction
-- as appears to be the case with these data -- then that shift accumulates
through the dataset.

And as an engineer, that feels wrong. If you're taking a set of measurements and
some external influence messes with one of them -- a fly blocks your sensor --
you reject that single data point; not spread some percentage of it through the
rest of your readings.

I'm going to put in a request to the manufacturer of the equipment that produces
this data, asking for an explanation of the cause of the discontinuities, in the
hope that might shed some light on the best way to deal with them. (With luck
they'll have some standard mechanism for doing so.)

(I've been trying to word the request all weekend, but it's difficult to phrase
it correctly.  These are the pre-eminent people in their field; they don't know
me, and I don't have an introduction; and their equipment defines the standard
for these types of measurements. It is extremely difficult to formulate the
request such that it does not imply some shortcoming in their equipment or
techniques.)

The data is magnetic field intensity vs field strength for samples of amorphous
metal. The measurement involves ramping the surrounding field with one set of
coils, and measuring the field strength induced in the material with another set
of coils. The samples have hysteresis; the coils have hysteresis; and the
ambient surroundings can have an influence. The equipment goes to great lengths
to adjust the speed of ramping and sampling, to try to eliminate discontinuities
due to hysteresis and eddy-current effects.

I believe (at this point) that the discontinuities are due to these effects
"settling out"; and the right thing to do is to essentially ignore them. My
problem is how to go about that.

I've come up with something. (It can almost certainly be written more
concisely, but I'm still finding my feet in Scilab):

    plot2d( ptype, h*1000, b, style = [ rgb( i ) ] );
    e = gce(); e.children.mark_style = 2;

    // First pass: keep only points where the local slope db/dh is positive.
    h1 = [h(1)]; b1 = [b(1)];
    for n = 2:size(h, 'r')
        if (b(n) - b(n-1)) / (h(n) - h(n-1) + %eps) > 0 then
            h1 = [ h1, h(n) ]; b1 = [ b1, b(n) ];
        end
    end
    plot2d( ptype, h1*1000, b1, style = [ rgb( i + 1 ) ] );

    // Second pass: repeat the same rejection on the survivors.
    h = h1'; b = b1';
    h1 = [h(1)]; b1 = [b(1)];
    for n = 2:size(h, 'r')
        if (b(n) - b(n-1)) / (h(n) - h(n-1) + %eps) > 0 then
            h1 = [ h1, h(n) ]; b1 = [ b1, b(n) ];
        end
    end
    plot2d( ptype, h1*1000, b1, style = [ rgb( i + 2 ) ] );

See the attached png. The black Xs are the raw data. 
The red is the results of the first pass.
The green is the results of the second pass.
The purple are hand-drawn "what I think I'd like" lines.

What I like about this is that it only adjusts (currently omits, though it
could interpolate replacements) points that fall outside the criteria. As you
said of the median filter, it doesn't guarantee monotonicity after one pass
(or even two), but it only makes changes where they are strictly required,
leaving most of the raw data intact.
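If interpolated replacements were wanted instead of omissions, one sketch
(reusing the h, b, h1, b1 names from the passes above, and assuming h1 ends up
strictly increasing, which interp1 requires) would be:

```scilab
// Rebuild b on the full h grid: surviving points keep their raw
// values, while rejected points get linearly interpolated
// replacements from their surviving neighbours.
b_filled = interp1(h1, b1, h, 'linear');
```

That would keep the dataset at its original length while still only touching
the rejected points.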

(Note: At this stage I'm not saying that is the right thing to do; just that it
seems to be :)

I'm not entirely happy with the results:

a) I think the hand-drawn purple lines are a better representation of the
replaced data; but I can't divine the criteria that would produce them.
b) I've hard-coded two passes for this particular dataset; but I need to repeat
until no negative slopes remain; and I haven't worked out how to do that yet.
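On (b), one way to sketch a repeat-until-clean loop -- the same slope criterion
as the passes above, vectorised, and untested against this data -- is:

```scilab
// Repeat the rejection pass until every remaining slope db/dh is
// positive. h and b are column vectors, as in the passes above.
h1 = h; b1 = b;
changed = %t;
while changed
    // Always keep the first point; then keep each point whose slope
    // relative to its current predecessor is positive.
    keep = [%t; (b1(2:$) - b1(1:$-1)) ./ (h1(2:$) - h1(1:$-1) + %eps) > 0];
    changed = or(~keep);          // did this pass reject anything?
    h1 = h1(keep); b1 = b1(keep);
end
```

The loop terminates because each pass either removes at least one point or
leaves the data unchanged, and the first point is never removed.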

Comments, rebuttals, and referrals to the Scilab/math abuse police -- along
with better implementations of what I have, or better criteria for solving my
problem -- are all actively sought.

Thanks, Buk.





