[scilab-Users] optimizing non-differentiable functions

Michaël Baudin michael.baudin at scilab.org
Mon Aug 24 10:54:49 CEST 2009


Hi,

As Eric Dubois already said, Scilab can compute numerical
derivatives of a given function.

The "naive" approach is to implement
a numerical derivative with a somewhat arbitrary small step h.
Instead, I recommend that you use either
the "derivative" or the "numdiff" function, as they provide the "optimal"
step with respect to double precision numbers.
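
For example, here is a minimal sketch (the test function f and the point
x = 1 below are arbitrary choices, not part of the discussion above)
comparing a naive forward difference with both functions:

function y = f(x)
    y = exp(x) + x.^2    // arbitrary test function
endfunction

x = 1;
h = 1e-3;                          // somewhat arbitrary small step
g_naive = (f(x + h) - f(x)) / h;   // naive forward difference
g_der = derivative(f, x);          // step chosen by derivative
g_num = numdiff(f, x);             // step chosen by numdiff
exact = exp(x) + 2*x;              // analytical derivative, for reference
disp([g_naive g_der g_num exact])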

Indeed, it is known that the total error associated with the numerical
derivative is the sum of:
* the linearization error (i.e. the fact that higher order terms are
neglected in the Taylor expansion),
* the rounding error associated with the term f(x+h)-f(x).

If we consider a forward numerical derivative, make the
assumption that the ratio f(x)/f''(x) ~ 1 and minimize the
total error, the optimal step is sqrt(%eps) ~ 1e-8 for double
precision numbers. The step used by derivative is based
on that formula, which is valid only for x ~ 1.
For values of x outside this range, or to take into
account ratios f/f'' which are not close to 1, an optional
scaling is available in derivative through the optional
argument Q.
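
The following small experiment (f = exp and x = 1 are arbitrary choices of
mine) illustrates the trade-off: the error of the forward difference first
decreases with h, then increases again because of rounding, with a minimum
near sqrt(%eps) ~ 1e-8:

x = 1;
exact = exp(x);          // f'(x) = exp(x) when f = exp
for k = 1:15
    h = 10^(-k);         // h = 1e-1, 1e-2, ..., 1e-15
    err = abs((exp(x + h) - exp(x)) / h - exact);
    mprintf("h = %8.1e   error = %8.1e\n", h, err);
end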

Several formulas are available in derivative, with orders 1, 2 and 4.
Notice that, because of rounding error, the increase in accuracy
is smaller than what one would expect from the additional
computation required to evaluate f at more points.
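
For instance (a sketch: f = sin and x = 0.5 are arbitrary, and order = n is
the optional-argument syntax described in the derivative help page):

function y = f(x)
    y = sin(x)    // arbitrary test function
endfunction

x = 0.5;
exact = cos(x);
g1 = derivative(f, x, order = 1);   // forward formula
g2 = derivative(f, x, order = 2);   // centered formula
g4 = derivative(f, x, order = 4);   // 4-point formula
disp(abs([g1 g2 g4] - exact))       // accuracy gain is limited by rounding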

The strategy used in numdiff is slightly different.
It scales the step depending on the value of x.
Indeed, the constant step used by default in derivative
produces inaccurate results when x is either very small
or very large. This is why the step used in numdiff is
sqrt(%eps)*(1+1d-3*abs(x))
Notice that if x is a vector, abs(x) is also a vector.
The term 1+1d-3*abs(x) is a compromise between the
optimal accuracy when x is near 1 (the first part of the sum)
and the optimal accuracy when x is small or large (the term
1d-3*abs(x)).
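
A quick way to see how this step behaves (the values of x below are
arbitrary choices of mine):

x = [1e-8 1 1e8];
dx = sqrt(%eps) * (1 + 1d-3 * abs(x));
disp(dx)   // about 1.5e-8 for x = 1e-8 and x = 1, about 1.5e-3 for x = 1e8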

Choosing between numdiff and derivative can be based
on the following rules:
* if the optimal parameters are of moderate magnitude,
use derivative,
* if the optimal parameters are large or small, use numdiff
(see the small example below),
* if there is only one parameter and the ratio between f and
the second derivative is not close to 1, use derivative and
the optional argument Q to scale by sqrt(f/f'').
There are many other cases for which I do not know the
optimal step. For example, what is the optimal step when
there are two parameters near 1 in magnitude, with one eigenvalue of the
Hessian matrix near 0 and the other near infinity?
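
As an illustration of the second rule (a sketch with arbitrary choices
g(x) = x^2 and x0 = 1e8), the constant step of derivative loses accuracy
for a large x, while the scaled step of numdiff behaves better:

function y = g(x)
    y = x.^2    // arbitrary test function
endfunction

x0 = 1e8;
exact = 2 * x0;
err_der = abs(derivative(g, x0) - exact) / exact;
err_num = abs(numdiff(g, x0) - exact) / exact;
mprintf("derivative: %.1e   numdiff: %.1e\n", err_der, err_num)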

References for this subject include:
* For an introduction: W. H. Press, S. A. Teukolsky, W. T. Vetterling,
and B. P. Flannery. Numerical Recipes in C, Second Edition.
Cambridge University Press, 1992.
* Scaling with function values only: P. E. Gill, W. Murray, and
M. H. Wright. Practical Optimization. Academic Press, London, 1981.
* Sparse numerical derivatives: J. Nocedal and S. J. Wright.
Numerical Optimization. Springer, 1999.

Best regards,

Michaël Baudin

Eric Dubois wrote:
> If your problem is only that you cannot calculate the derivative (and
> not that it does not exist), then you can use a numerical derivative
> (such as the Scilab function derivative). To that end, if your
> function is func, then create a function newf:
>
> function [f, g, ind] = newf(param, ind)
>     f = func(param)
>     g = derivative(func, param)
> endfunction
>  
>
> 2009/8/16 Sebastian Urban <urban at stoch.uni-karlsruhe.de>
>
>     Hello everybody,
>
>     is there a way to optimize a function whose gradient I don't know /
>     cannot compute? The command 'optim' seems to require the cost function
>     to provide the gradient...
>
>     Thanks for help, suggestions and comments!
>     Sebastian
>
>



