Skip to content

Regression of complex-valued functions

8 messages · Andrea Graziani, Frede Aakmann Tøgersen, David Winsemius +2 more

#
Hi everyone,

I previously posted this question but my message was not well written and did not contain any code so I will try to do a better job this time.

The goal is to perform a non-linear regression on complex-valued data.
I will first give a short description of the data and then describe the complex-valued non-linear function.

***The Data
Obtained from mechanical tests performed at 5 loading frequencies 0.1, 0.25, 1, 0.4 and 12 Hz, and at 5 temeperatures.
The independent variable used in the regression is:
f_data <- rep(c(0.1,0.25,1,4,12),5)

The measured values of the response variable are:
E <- c(289.7411+ 225.0708i , 386.4417+ 303.5021i ,  671.5132+ 521.1253i , 1210.8638+ 846.6253i , 1860.9623+1155.4139i ,
862.8984+ 636.2637i , 1159.0436+ 814.5609i , 1919.0369+1186.5679i , 3060.7207+1573.6088i  , 4318.1781+1868.4761i  ,
2760.7782+1418.5450i , 3306.3013+1612.2712i , 4746.6958+1923.8662i , 6468.5769+2148.9502i , 8072.2642+2198.5344i  ,
6757.7680+2061.3110i , 7591.9787+2123.9168i , 9522.9471+2261.8489i , 11255.0952+2166.6411i , 12601.3970+2120.7178i ,
11913.6543+2016.0828i , 12906.8294+2030.0610i , 14343.7693+1893.4877i , 15942.7703+1788.0910i , 16943.2261+1665.9847i)

To visualize the data:
plot(f_data,Re(E),log="xy")
plot(f_data,Im(E),log="xy")
plot(E)

***Non-linear regression function:
Obtained from an analytical model

E_2S2P1D <- function(f,logaT,Eg,Ee,k,h,delta,logbeta,logtau) 
           Ee+(Eg-Ee)*(
           1+delta*(2i*pi*10^(logtau)*f*10^logaT)^-k +
                   (2i*pi*10^(logtau)*f*10^logaT)^-h +
                   (2i*pi*10^(logtau)*f*10^logaT*10^(logbeta))^-1
                     )^-1 

E_2S2P1D is a complex-valued function (note the imaginary unit "i") where:
"f" is the real-valued independent variable (i.e. the frequency),
"logaT","Eg","Ee","k","h","delta","logbeta","logtau" are real-valued parameters.

For example:
[1] 9544.759+2204.974i
[1] 6283.748+2088.473i

In order to find the parameters of the regression function I use ?nls".

Executing

fit <- nls(E ~ E_2S2P1D(f_data,c(rep(loga40,5),rep(loga30,5),rep(loga20,5),rep(0,5),rep(loga0,5)),
                            Eg,Ee,k,h,delta,logbeta,logtau), 
          start=list(loga40=-3.76,loga30=-2.63,loga20=-1.39,loga0=1.68,
                     Eg=27000,Ee=200,k=0.16,h=0.47,delta=1.97,logbeta=2.2,logtau=-0.02) )

results in a lot of warnings (my translation into english):
"In numericDeriv(form[[3L]], names(ind), env) :
imaginary parts removed during the conversion?

I?ve the same error trying:
y <- E_2S2P1D(f_data,c(rep(-3.76,5),rep(-2.63,5),rep(-1.39,5),rep(0,5),rep(1.68,5)),27000,200,0.16,0.47,1.97,2.2,0.02)
plot(E,y)

If I got it right R is not able to handle regression problems on complex-valued functions, correct?
If yes, is there some workaround?
For example using MSExcel I write the Real and Imaginary parts separately, and than I minimize 

error = sum [ ( Re(E) - Re(E_E_2S2P1D) ) / Re(E) )^2 ] + sum [ ( Im(E) - Im(E_E_2S2P1D) ) / Im(E) )^2 ]

using the MSExcel solver function.

thank you for your help
Andrea

PS 
Thanks to David Winsemius for his suggestions
1 day later
#
I have not the mental energy to go through your somewhat complicated 
example, but I suspect that your problem is simply the following:  The 
function nls() is trying to minimize a sum of squares, and that does not 
make sense in the context of complex observations.  That is, nls() is 
trying to minimize a sum of terms of the form

	(f(x_i,theta) - z_i)^2

where f(x_i,theta) and z_i are complex quantities (and theta is the 
vector of parameters with respect to which minimization is "trying to be 
done".  But as I said, this makes no sense.  Not that
(f(x_i,theta) - z_i)^2 is a complex quantity and *not* a positive real 
quantity.

You need to minimize a sum of terms of the form

	|f(x_i,theta) - z_i|^2

where |w|^2 equal to w times the complex conjugate of w.  You get this 
in R via the Mod() function, i.e. |w|^2 is (Mod(w))^2.

Thus it seems to me that nls() cannot handle what you want handled.  You 
need to code up the function to be minimized yourself, using Mod() and 
then use optim() or nlm() to minimize it that function with respect to 
"theta".

HTH

cheers,

Rolf Turner
On 10/02/14 11:45, Andrea Graziani wrote:
#
Hi Andrea

Although there are theory on linear models for complex valued normal distributions (see e.g. http://www.amazon.com/Linear-Graphical-Models-Multivariate-Distribution/dp/0387945210) to my knowledge there is no function for doing complex-valued regression in R.

However as my reference probably explain (long time since I saw the textbook) you can consider the Im and Re parts separately. But first some questions. I don't know anything about the physics behind your formula. It seems that you want to do non-linear regression for five curves (one for each value of components of the vector logaT). Why are loga40, loga30, loga20, 0 (zero value), and loga0 not independent variables that you control? If so you don't want to estimate those.

If the logaT parameters are not under your control do you still really want to estimate the same set of values of the parameters of the model for each set of your data? And not separate estimates for each curve?


However (putting my head under my shoulder) you can do something like this (Having just noticed Rolf Turner's answer you should absolutely explore that solutions. That would probably the best way to go).


###### R script:

## added some NAs do make separat plot of curves
## then you need to set na.action argument of nls()

## your data
f_data <- rep(c(0.1,0.25,1,4,12, NA),5)

E <- c(289.7411+ 225.0708i , 386.4417+ 303.5021i ,  671.5132+ 521.1253i , 1210.8638+ 846.6253i ,     1860.9623+1155.4139i,NA,
       862.8984+ 636.2637i , 1159.0436+ 814.5609i , 1919.0369+1186.5679i , 3060.7207+1573.6088i  ,   4318.1781+1868.4761i,NA,
       2760.7782+1418.5450i , 3306.3013+1612.2712i , 4746.6958+1923.8662i , 6468.5769+2148.9502i ,   8072.2642+2198.5344i,NA,
       6757.7680+2061.3110i , 7591.9787+2123.9168i , 9522.9471+2261.8489i , 11255.0952+2166.6411i , 12601.3970+2120.7178i,NA,
       11913.6543+2016.0828i , 12906.8294+2030.0610i ,14343.7693+1893.4877i, 15942.7703+1788.0910i, 16943.2261+1665.9847i,NA)

## function
E_2S2P1D <- function(f,logaT,Eg,Ee,k,h,delta,logbeta,logtau) {
    Ee+(Eg-Ee)*(
        1+delta*(2i*pi*10^(logtau)*f*10^logaT)^-k +
        (2i*pi*10^(logtau)*f*10^logaT)^-h +
        (2i*pi*10^(logtau)*f*10^logaT*10^(logbeta))^-1
        )^-1 }


mycol <- rep(c("red", "green", "blue", "magenta", "black"), each = 6)

png("plot-raw-data.png", height = 3*480)
par(mfrow = c(3,1))
plot(f_data, Re(E), type = "p", col =  mycol)# log="xy")
plot(f_data,Im(E), type = "p", col =  mycol)# log="xy")
plot(f_data,abs(E), type = "p", col = mycol)# log="xy")
dev.off()


myE <- data.frame(f = f_data, E = E)

## this seems to run
fit.re <- nls(Re(E) ~ Re(E_2S2P1D(f,c(rep(loga40,5),rep(loga30,5),rep(loga20,5),rep(0,5),rep(loga0,5)),
                                      Eg,Ee,k,h,delta,logbeta,logtau)),
              data = myE, na.action = "na.exclude",  
              start=list(loga40=-3.76,loga30=-2.63,loga20=-1.39,loga0=1.68,
                  Eg=27000,Ee=200,k=0.16,h=0.47,delta=1.97,logbeta=2.2,logtau=-0.02) )

fit.re

## plot fit overlayed observations
png("fitted-Re.png")
plot(f_data, Re(E), type = "p", col = mycol)# log="xy")
lines(f_data, fitted(fit.re), col = "grey80") #, log = "xy"
dev.off()

## this gives an error
## probably due to the 5 curves do not share parameters. See 3rd panel in attached plot-raw-data.png
fit.im <- nls(Im(E) ~ Im(E_2S2P1D(f,c(rep(loga40,5),rep(loga30,5),rep(loga20,5),rep(0,5),rep(loga0,5)),
                                  Eg,Ee,k,h,delta,logbeta,logtau)),
              data = myE, na.action = "na.exclude",
              start=list(loga40=-3.76,loga30=-2.63,loga20=-1.39,loga0=1.68,
                   Eg=27000,Ee=200,k=0.16,h=0.47,delta=1.97,logbeta=2.2,logtau=-0.02) )


## Now try abs()
## which works and gives allmost same estimates as for fit.re
fit.abs <- nls(abs(E) ~ abs(E_2S2P1D(f,c(rep(loga40,5),rep(loga30,5),rep(loga20,5),rep(0,5),rep(loga0,5)),
                                      Eg,Ee,k,h,delta,logbeta,logtau)),
              data = myE, na.action = "na.exclude",
              start=list(loga40=-3.76,loga30=-2.63,loga20=-1.39,loga0=1.68,
                  Eg=27000,Ee=200,k=0.16,h=0.47,delta=1.97,logbeta=2.2,logtau=-0.02))

fit.abs

png("fitted-abs.png")
plot(f_data, abs(E), type = "p", col = mycol)# log="xy")
lines(f_data, predict(fit.abs), col = "grey80") #, log = "xy"
dev.off()










Yours sincerely / Med venlig hilsen


Frede Aakmann T?gersen
Specialist, M.Sc., Ph.D.
Plant Performance & Modeling

Technology & Service Solutions
T +45 9730 5135
M +45 2547 6050
frtog at vestas.com
http://www.vestas.com

Company reg. name: Vestas Wind Systems A/S
This e-mail is subject to our e-mail disclaimer statement.
Please refer to www.vestas.com/legal/notice
If you have received this e-mail in error please contact the sender.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: plot-raw-data.png
Type: image/png
Size: 22434 bytes
Desc: plot-raw-data.png
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20140211/1e0e282e/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fitted-Re.png
Type: image/png
Size: 15021 bytes
Desc: fitted-Re.png
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20140211/1e0e282e/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fitted-abs.png
Type: image/png
Size: 15622 bytes
Desc: fitted-abs.png
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20140211/1e0e282e/attachment-0002.png>
#
On Feb 9, 2014, at 2:45 PM, Andrea Graziani wrote:

            
Others have pointed out the theoretical difficulties regarding curve fitting on the complex plane. My concern looking at this is that you have apparently lapse into mathematical notation rather than functional programming notation above:

This 
2i*pi*10^(logtau)*f*10^logaT

Would need to be this:

 2*i*pi*10^(logtau)*f*10^logaT

You would start getting parser errors since '2i' is not a valid token name. I would also worry about negative powers. They can sometimes be numerically unstable. 

And you have not offered the values for temperatures which I suspect are the items in your start list: loga40=-3.76,loga30=-2.63,loga20=-1.39,loga0=1.68. I would think these are fixed rather than something to be optimized. You can leave them in the global environment or build them into the function but I don't think they should be in your start list. (Frede Aakmann T?gersen noted that issue but did not build a solution that incorporated addressed that concern.)

To examine that relationship you would:

loga40=-3.76;loga30=-2.63;loga20=-1.39;loga0=1.68
logaT <- c(rep(loga40,5),rep(loga30,5),rep(loga20,5),rep(0,5),rep(loga0,5))
 # To visualize the data:

 plot(logaT,Re(E), log="y")
 plot(logaT,Im(E), log="y")
#
On 11/02/2014 2:10 PM, David Winsemius wrote:
2i is fine:  it is the same as 0+2i, which is how it normally prints.

Duncan Murdoch
1 day later
#
Dear Rolf,

Thank you for your suggestion.
Based on your remarks I solved my problem using nlm().

Actually there are two quite straightforward ways to split the complex-valued problem into two ?linked? real-valued problems.

### 1. Real part and Imaginary part

# Experimental data
E1_data <- Re(E)
E2_data <- Im(E)

# Model function (same of my previous post only different notations)
E_2S2P1D <- function(f,logaT,logEg,logEe,k,h,delta,logbeta,logtau) 
  10^logEe+(10^logEg-10^logEe)*(
    1+delta*(2i*pi*10^(logtau)*f*10^logaT)^-k +
      (2i*pi*10^(logtau)*f*10^logaT)^-h +
      (2i*pi*10^(logtau)*f*10^logaT*10^(logbeta))^-1
  )^-1 

# Distance function
dist <- function(p)
sum( ( (E1_data - Re(E_2S2P1D(f_data,c(rep(p[1],5),rep(p[2],5),rep(p[3],5),rep(0,5),rep(p[4],5)),                                 p[5],p[6],p[7],p[8],p[9],p[10],p[11])) ) / E1_data )^2 ) + 
sum( ( (E2_data - Im(E_2S2P1D(f_data,c(rep(p[1],5),rep(p[2],5),rep(p[3],5),rep(0,5),rep(p[4],5)),                                 p[5],p[6],p[7],p[8],p[9],p[10],p[11])) ) / E2_data )^2 )

# fitting
fit <- nlm(dist, p = c(-3.76,-2.63,-1.39,1.68,log10(27000),log10(200),0.16,0.47,1.97,2.2,0.02))


### 2. Modulus and argument

# Experimental data
Emod_data <- Mod(E)
Earg_data <- Arg(E)

# Model function: same as above

# Distance function
dist <- function(p)
sum(((Emod_data - log10(Mod(E_2S2P1D(f_data,c(rep(p[1],5),rep(p[2],5),rep(p[3],5),rep(0,5),rep(p[4],5)),                                            p[5],p[6],p[7],p[8],p[9],p[10],p[11])) ) / Emod_data )^2 ) + 
sum( ( (Earg_data - Arg(E_2S2P1D(f_data,c(rep(p[1],5),rep(p[2],5),rep(p[3],5),rep(0,5),rep(p[4],5)),                                 p[5],p[6],p[7],p[8],p[9],p[10],p[11])) ) / Earg_data )^2 )

# fitting: same as above

Using the same starting values, the two approaches bring to slightly different solutions:

### 1. Real part and Imaginary part
[1] -3.8519181 -2.7342861 -1.4823740  1.7173982  4.4529298  1.4383334  0.1564904  0.4856774  2.2789567  3.9011926  0.1227758

### 2. Modulus and argument
[1] -3.8674680 -2.7250640 -1.4574350  1.6447868  4.4355748  1.2400092  0.1574215  0.4731295  2.0624878  3.7526197  0.1161640

Do you believe this is due only to ?numerical" reasons linked to how the nls() function works?

Anyway...Thanks again!
Andrea




Il giorno 11/feb/2014, alle ore 09:42, Rolf Turner <r.turner at auckland.ac.nz> ha scritto:
#
Hi Frede,

Thank you for your accurate answer!

If I understand well, your way to use nls() solves the problem using too many physical parameters.
I solved the problem following the other way that you and Rolf Turner suggested (i.e. splitting the complex-valued problem into two real-valued problems and using nlm() ).
If you are interested in more details, please see my answer to Rolf?s mail.
You noticed a key point of the mechanical problem!
However I?m looking for a ?single curve?, so-called ?master curve?, to fit all experimental data!
;)

Thanks again!
Andrea
#
On 13/02/14 12:03, Andrea Graziani wrote:
<SNIP>
I find it a bit surprising that there are differences in the third 
decimal place here.  I would have thought that the results would be 
exactly the same, or ***very*** close to it.

If you have done it right (I have not checked your code at all) the "two 
approaches" should amount to the same approach.  I.e. Mod(z)^2 is
equal to Re(z)^2 + Im(z)^2.

Check your code carefully.

Also it would be a good idea to try a "toy example", something much less 
complicated than your real problem, and see what happens there.

cheers,

Rolf Turner