
downsampling

6 messages · Michael Knudsen, Warren Young, Jan M. Wiener +1 more

#
Hi,
I am looking for ways to downsample one-dimensional vectors.
For example,

x=sample(1:5, 115, replace=TRUE)

How do I downsample this vector to 100 entries? Are there any R functions or packages that provide such functionality?

I did find the zoo package and the aggregate() function, but these appear to be rather specific to time series.

Thanks in advance,
Jan
#
On Fri, Jul 24, 2009 at 9:32 AM, Jan Wiener<jan.wiener at tuebingen.mpg.de> wrote:

What exactly do you mean by downsampling? Do you just want to sample
100 random entries from x?

sample(sample(1:5, 115, replace=TRUE), 100, replace=FALSE)
#
Michael Knudsen wrote:
It means that the original 115 points should be treated as a continuous 
function of x, or t, or whatever the horizontal axis is, with new values 
coming from this function at 100 evenly-spaced points along this function.

This procedure is how a sound editing program can produce a 
good-sounding 44.1 kHz CD quality file from material recorded at 48 kHz, 
for instance.  Something similar happens when you ask your photo editing 
program to give you a smaller version of, say, a 12 Mpix picture for 
emailing or putting up on the web.  These are all forms of interpolation.

There's a degenerate case, where the number of output samples divides 
evenly into the number of input samples.  For instance, to downsample a 
96 kHz audio file to 48 kHz, just throw away every other sample.
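In R, that degenerate case is just indexing with a stride; a minimal sketch (on made-up data):

```r
# Decimate a vector by keeping every k-th sample. Only appropriate when
# the target length divides the source length evenly (the degenerate case).
x <- sin(2 * pi * (1:96) / 96)    # pretend this is 96 kHz material
k <- 2                            # 96 kHz -> 48 kHz
y <- x[seq(1, length(x), by = k)]
length(y)                         # 48
```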

I, too, wish I knew how to do the harder interpolation case in R.  I've 
been in the OP's shoes, fighting with zoo and failing.  The last time I 
had to do this, I gave up on R and did it in Mathematica.  I also 
remember that it was easy to do this in Igor Pro when I played with its 
demo version.
#
On Fri, Jul 24, 2009 at 03:16:58AM -0600, Warren Young wrote:
There probably is a proper function for that and some expert will
point it out. Until then I'll share my thoughts:

# make up some data
foo <- data.frame(x= 1:115, y=jitter(sin(1:115/10), 1000))
plot(foo)

# use approx for interpolation
bar <- approx(foo, n=30)
lines(bar, col='red', lwd=2)

# or use spline for interpolation
bar <- spline(foo, n=30)
lines(bar, col='green', lwd=2)

# or fit a loess curve
# had to play with span to make it look ok
model <- loess(y ~ x, foo, span = 1/2)
x <- seq(1, 115, length.out = 30)
bar <- predict(model, newdata = data.frame(x = x))
lines(x, bar, col='blue', lwd=2)


Jan, does that help a little?

cu
	Philipp
3 days later
#
Dear Philipp and R-Users,

thank you very much for the help.

However, both approx() and spline() seem to select the number of
required data points from the original data (at the correct positions,
of course) and ignore the remaining data points, as the following
example demonstrates:
$x
[1] 1 3 5

$y
[1] 1 2 0

Essentially, what approx has done (spline does the same) is to simply
select the first, third, and fifth entry (as we want to downsample a 5
point vector into a three point vector). The second and fourth data
point are completely ignored. This can result in quite dramatic changes
to your data if the points selected by approx() or spline() happen to be
outliers and you downsample by a large factor.

Best,
Jan
Philipp Pagel wrote:

#
On Mon, Jul 27, 2009 at 02:42:33PM +0200, Jan M. Wiener wrote:
That seems to be what Warren described as the 'degenerate case',
where approx will 'just throw away every other sample'. If you choose
a different n (e.g. n=4), interpolation does happen.
Yes, that could affect your downsampled data. For more
robustness it would probably be better to fit a proper model (if you
have one) or a lowess curve (or smooth.spline) and go from there.
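A minimal sketch of the smooth.spline route, on the same kind of made-up data as before:

```r
# Fit a smoothing spline to all 115 observations, then evaluate it at
# 100 evenly spaced positions, so every original point informs the fit.
set.seed(1)
foo <- data.frame(x = 1:115, y = jitter(sin(1:115 / 10), 1000))
fit <- smooth.spline(foo$x, foo$y)
bar <- predict(fit, x = seq(1, 115, length.out = 100))
length(bar$y)  # 100
```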

cu
	Philipp