Is there a way for an apply-type function to return a data frame?
the closest thing I can think of is
foo <- as.data.frame(sapply(...))
names(foo) <- c(....)
is there a more "elegant" way?
Thanks!
* Sam Steingold <fqf at tah.bet> [2012-08-30 08:56:17 -0400]:
Is there a way for an apply-type function to return a data frame?
the closest thing I can think of is
foo <- as.data.frame(t(sapply(...)))
names(foo) <- c(....)
alas, this has a problem of creating a "homogeneous" data frame, i.e.,
all the columns are numbers or characters, because the function passed
to sapply returns c(....) and
c(1,2,"a")
[1] "1" "2" "a"
e.g.,
as.data.frame(t(sapply(c("a,1","b,2","c,3"),function (n) strsplit(n,",")[[1]])))
V1 V2
a,1 a 1
b,2 b 2
c,3 c 3
'data.frame': 3 obs. of 2 variables:
$ V1: Factor w/ 3 levels "a","b","c": 1 2 3
..- attr(*, "names")= chr "a,1" "b,2" "c,3"
$ V2: Factor w/ 3 levels "1","2","3": 1 2 3
..- attr(*, "names")= chr "a,1" "b,2" "c,3"
I wanted the V1 column to be a string, and V2 to be a number.
(I know stringsAsFactors=FALSE would replace factors with strings, but I
need a string and a number)
I could, of course, do ret$V2 <- as.numeric(ret$V2) but this would mean
a double conversion: from number to string first (by c()) and then back.
thanks.
data.frame(lapply(z,"[",1:2)) ## Is this not what you want?
a b
1 1 a
2 2 b
You really should spend a little more time with the docs figuring out
what R _does_ and a little less complaining about what you think R
cannot do.
-- Bert
On Thu, Aug 30, 2012 at 9:44 AM, Sam Steingold <sds at gnu.org> wrote:
--
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://mideasttruth.com http://truepeace.org http://openvotingconsortium.org http://ffii.org http://www.memritv.org
Diplomacy is the art of saying "nice doggy" until you can find a nice rock.
I could, of course, do ret$V2 <- as.numeric(ret$V2) but this would mean
a double conversion: from number to string first (by c()) and then back.
It is starting as a 'string' ('character' in R parlance) so you will
need to coerce it to "numeric" at some point:
Consider this alternate route:
> do.call(rbind, strsplit(c("a,1","b,2","c,3"), ",") )
[,1] [,2]
[1,] "a" "1"
[2,] "b" "2"
[3,] "c" "3"
> as.data.frame( do.call(rbind, strsplit(c("a,1","b,2","c,3"), ",") ) )
V1 V2
1 a 1
2 b 2
3 c 3
> str( as.data.frame( do.call(rbind, strsplit(c("a,1","b,2","c,3"),
",") ) , stringsAsFactors=FALSE) )
'data.frame': 3 obs. of 2 variables:
$ V1: chr "a" "b" "c"
$ V2: chr "1" "2" "3"
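As a minimal sketch of finishing the route above: once the character matrix exists, the data frame can be built one column at a time, so only the second column is coerced to numeric. The column names Letter and Number are illustrative and do not come from the original post.

```r
parts <- c("a,1", "b,2", "c,3")
# strsplit() returns a list of character vectors; rbind stacks them
# into a character matrix with one row per input string
m <- do.call(rbind, strsplit(parts, ","))
# Build the data frame column by column so each column gets its own type
df <- data.frame(Letter = m[, 1],
                 Number = as.numeric(m[, 2]),
                 stringsAsFactors = FALSE)
str(df)
```

The conversion happens exactly once, from the strings that strsplit() produced to numbers.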
Here are two ways of turning a character vector like yours into a data.frame,
neither of which uses an apply-like function.
> s <- c(XVI="p,16", XVII="q,17", XVIII="r,18")
> d1 <- data.frame(Letter=sub(",.*", "", s), Number=as.integer(sub(".*,","",s)))
> d2 <- read.table(text=s, sep=",", col.names=c("Letter","Number"), row.names=names(s))
> d1
Letter Number
XVI p 16
XVII q 17
XVIII r 18
> all.equal(d1, d2)
[1] TRUE
I don't agree with your analysis of what went wrong with your example:
> z0 <- as.data.frame(t(sapply(c("a,1","b,2","c,3"),function (n) strsplit(n,",")[[1]])))
> str(z0)
'data.frame': 3 obs. of 2 variables:
$ V1: Factor w/ 3 levels "a","b","c": 1 2 3
..- attr(*, "names")= chr "a,1" "b,2" "c,3"
$ V2: Factor w/ 3 levels "1","2","3": 1 2 3
..- attr(*, "names")= chr "a,1" "b,2" "c,3"
You wrote
I could, of course, do ret$V2 <- as.numeric(ret$V2) but this would mean
a double conversion: from number to string first (by c()) and then back.
The "numbers" 1,2,3 were always considered to be strings, because
strsplit() takes strings, like "a,1", and returns a list of vectors of strings,
like list(c("a","1")). The c() function has nothing to do with it.
Functions like read.table() will guess when it is appropriate to convert things
from strings to numbers, but most times you have to do the conversion explicitly.
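The point can be checked directly. The sketch below (variable names are illustrative) shows that both pieces returned by strsplit() are character, and that the type guess read.table() makes can also be requested by hand with type.convert().

```r
pieces <- strsplit("a,1", ",")[[1]]
class(pieces)   # "character": the "1" never was a number

# type.convert() applies the same kind of guess that read.table() uses
guessed <- type.convert(c("1", "2", "3"), as.is = TRUE)
class(guessed)

# read.table() guesses per column, so the two columns end up with different types
d <- read.table(text = c("a,1", "b,2", "c,3"), sep = ",",
                col.names = c("Letter", "Number"),
                stringsAsFactors = FALSE)
sapply(d, class)
```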
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
Hello,
Yet another alternative.
library(plyr)
dfr <- ldply(strsplit(c("a,1", "b,2", "c,3"), ","), identity)
str(dfr)
#dfr$V2 <- as.numeric(dfr$V2)
So, if the original post was about conversion to a data frame, the answer is yes.
Rui Barradas
On 30-08-2012 18:14, David Winsemius wrote:
It is starting as a 'string' ('character' in R parlance) so you will
need to coerce it to "numeric" at some point:
Consider this alternate route:
* Bert Gunter <thagre.oregba at trar.pbz> [2012-08-30 09:59:46 -0700]:
You really should spend a little more time with the docs figuring out
what R _does_ and a little less complaining about what you think R
cannot do.
The only thing I think R cannot do is compact its memory, thus,
effectively, leaking it in _some_ situations.
The rest are just my humble questions...
PS. speaking about "complaining", my pet peeve atm is the speed (or,
rather, lack thereof) of e1071 functions read.matrix.csr and
write.matrix.csr (they are implemented in R, not in C, and do a lot of
string manipulation, so their slowness is not surprising).
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
Of Sam Steingold
Sent: Thursday, August 30, 2012 9:17 PM
To: William Dunlap
Cc: r-help at r-project.org
Subject: Re: [R] apply --> data.frame
* William Dunlap <jqhaync at gvopb.pbz> [2012-08-30 17:35:08 +0000]:
I don't agree with your analysis of what went wrong with your example
a double conversion: from number to string first (by c()) and then
back.
I did not make myself quite clear, sorry.
I should have written something like
c(1,2,"a") ==> "1" "2" "a" =[as.numeric]=> 1 2 "a"
But a vector cannot contain values of different modes (character, numeric); other structures, such as lists, can. Converting such a vector with as.numeric() inserts NA wherever it encounters a non-numeric string.
as.numeric(c(1,2,"a") )
[1] 1 2 NA
Warning message:
NAs introduced by coercion
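A small sketch of the difference, using illustrative variable names: an atomic vector coerces every element to one type, while a list keeps each element as it is.

```r
v <- c(1, 2, "a")       # atomic vector: everything becomes character
l <- list(1, 2, "a")    # list: each element keeps its own type
class(v)
sapply(l, class)

# as.numeric() on the character vector gives NA for "a";
# the warning is suppressed here only to keep the output short
n <- suppressWarnings(as.numeric(v))
is.na(n)
```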
Regards
Petr
do.call/rbind appeared to be TRT. I tried it and got a data frame with
list columns (instead of vectors);
as.data.frame(do.call(rbind,lapply(list.files(...), function (name) {
....
c(name,list(num1,num2,num3), # num* come from some calculations above
strsplit(sub("[^-]*(train|test)[^-]*(-(S)?pca([0-9]*))?-s([0-9]*)c([0-9.]*)\\.score",
"\\1,\\3,\\4,\\5,\\6",name),",")[[1]])
})), stringsAsFactors = FALSE)
'data.frame': 2 obs. of 8 variables:
$ file :List of 2
..$ : chr "zzz_test_0531_0630-Spca181-s0c10.score"
..$ : chr "zzz_train_0531_0630-Spca181-s0c10.score"
$ lift.quality:List of 2
..$ : num 0.59
..$ : num 0.621
$ proficiency :List of 2
..$ : num 0.0472
..$ : num 0.0472
$ set :List of 2
..$ : chr "test"
..$ : chr "train"
$ scale :List of 2
..$ : chr "S"
..$ : chr "S"
$ pca :List of 2
..$ : chr "181"
..$ : chr "181"
$ s :List of 2
..$ : chr "0"
..$ : chr "0"
$ c :List of 2
..$ : chr "10"
..$ : chr "10"
I guess the easiest way is to replace c(...list()...) with c(...) but
that would mean converting num1,num2,num3 to string and back which I
want to avoid for aesthetic reasons. Any better suggestions?
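One possible way around the list columns, sketched under assumptions: have the per-file function return a one-row data frame instead of c(name, list(...)), and rbind those rows. The function name scoreRow, the simplified regular expression, and the placeholder calculations are all hypothetical; only the two file names are taken from the output above.

```r
# Hypothetical per-file function: returns ONE ROW as a data frame, so every
# field keeps its own type and nothing is round-tripped through strings
scoreRow <- function(name) {
  num1 <- nchar(name)   # placeholder for a real calculation
  num2 <- num1 / 2      # placeholder for a real calculation
  set <- sub(".*_(train|test)_.*", "\\1", name)
  data.frame(file = name, num1 = num1, num2 = num2, set = set,
             stringsAsFactors = FALSE)
}

files <- c("zzz_test_0531_0630-Spca181-s0c10.score",
           "zzz_train_0531_0630-Spca181-s0c10.score")
# rbind on data frames stacks rows and preserves the column types
res <- do.call(rbind, lapply(files, scoreRow))
str(res)
```

The columns come out as plain vectors rather than lists, at the cost of building many small data frames, which may be slow for a large number of files.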
thanks a lot!
It is hard to help when you don't give an example of your input data
and what you want to be computed (in a form one can source or copy
into an R session). Is the following something like what you are doing?
Suppose you have a function that takes a file name and
returns a list of things of various types extracted from the
file. A toy example would be
fileExtract <- function(fileName) {
    fi <- file.info(fileName)
    byte0 <- if (fi$isdir || fi$size < 1) NA_integer_ else readBin(fileName, what="integer", size=1, n=1)
    list(Name=basename(fileName), IsDir=fi$isdir, Size=fi$size, FirstByte = byte0, ModTime=fi$mtime)
}
Then you can get the list of rows that you want converted to a data.frame
with
rows <- lapply(dir(R.home(), full.names=TRUE), fileExtract)
E.g., I get
> dput(rows[1:2])
list(structure(list(Name = "bin", IsDir = TRUE, Size = 0, FirstByte = NA_integer_,
ModTime = structure(1343316337, class = c("POSIXct", "POSIXt"
))), .Names = c("Name", "IsDir", "Size", "FirstByte", "ModTime"
)), structure(list(Name = "CHANGES", IsDir = FALSE, Size = 28204,
FirstByte = 87L, ModTime = structure(1340406834, class = c("POSIXct",
"POSIXt"))), .Names = c("Name", "IsDir", "Size", "FirstByte",
"ModTime")))
Note that the j'th element of each row has a fixed type.
You want a data.frame with columns named "Name", "IsDir",
"Size", and "FirstByte" where the i'th row contains the data in rows[[i]].
If that is what you want then here is a function, f, that does a pretty good job of it:
f <- function (listOfRows, nItemsPerRow = unique(vapply(listOfRows,
    length, 0)), col.names = names(rowTemplate), rowTemplate = listOfRows[[1]],
    ...)
{
    stopifnot(length(nItemsPerRow) == 1, nItemsPerRow == length(rowTemplate))
    if (is.null(col.names)) {
        col.names <- sprintf("V%d", seq_len(nItemsPerRow))
    }
    else {
        stopifnot(nItemsPerRow == length(col.names))
    }
    columns <- lapply(structure(seq_len(nItemsPerRow), names = col.names),
        FUN = function(i) {
            v <- vapply(listOfRows, function(Row) Row[[i]], rowTemplate[[i]])
            if (is.matrix(v)) { # for when length(rowTemplate[[i]])>1
                v <- t(v)
            }
            v
        })
    data.frame(columns, ...)
}
E.g.,
str(f(rows))
'data.frame': 19 obs. of 5 variables:
$ Name : Factor w/ 19 levels "bin","CHANGES",..: 1 2 3 4 5 6 7 8 9 10 ...
$ IsDir : logi TRUE FALSE FALSE TRUE TRUE TRUE ...
$ Size : num 0 28204 18351 0 0 ...
$ FirstByte: int NA 87 9 NA NA NA NA 101 NA 82 ...
$ ModTime : num 1.34e+09 1.34e+09 1.34e+09 1.34e+09 1.34e+09 ...
Note that the POSIXct item, ModTime, got converted to numeric because
vapply didn't handle that class properly.
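The lost class can be put back after the fact. The following is only a sketch with made-up timestamps and variable names: vapply() returns the underlying seconds since the epoch without the class attribute, and as.POSIXct() with an explicit origin reattaches it.

```r
times <- list(as.POSIXct("2012-08-30 08:56:17", tz = "UTC"),
              as.POSIXct("2012-08-31 09:11:00", tz = "UTC"))

# vapply() checks only type and length, so the result is a bare numeric vector
raw <- vapply(times, function(x) x, times[[1]])
class(raw)

# Reattach the class from the seconds-since-epoch values
restored <- as.POSIXct(raw, origin = "1970-01-01", tz = "UTC")
class(restored)
```

Inside a function like the one above, the same step could be applied to any column whose template element inherits from POSIXct.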
An advantage of vapply is that it will do some type checking:
f(list(list(a=1,b=11), list(a=2,b="Twelve")))
Error in vapply(listOfRows, function(Row) Row[[i]], rowTemplate[[i]]) :
values must be type 'double',
but FUN(X[[2]]) result is type 'character'
It will also deal with things like the following, where each row element
contains a few vectors and you want each vector element in its
own column:
> str(f(list(list(1:2, 1+1i, letters[1:3]), list(11:12, 11+11i, letters[4:6]))))
'data.frame': 2 obs. of 6 variables:
$ V1.1: int 1 11
$ V1.2: int 2 12
$ V2 : cplx 1+1i 11+11i
$ V3.1: Factor w/ 2 levels "a","d": 1 2
$ V3.2: Factor w/ 2 levels "b","e": 1 2
$ V3.3: Factor w/ 2 levels "c","f": 1 2
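To make the type-checking advantage of vapply mentioned earlier concrete, here is a small comparison with sapply on the same kind of mismatched rows. The variable names are illustrative.

```r
rows <- list(list(a = 1, b = 11), list(a = 2, b = "Twelve"))

# sapply() silently falls back to a character vector when the types differ
loose <- sapply(rows, function(r) r$b)
class(loose)

# vapply() refuses, because the second value is not a double;
# tryCatch() captures the error message instead of stopping
strict <- tryCatch(vapply(rows, function(r) r$b, numeric(1)),
                   error = function(e) conditionMessage(e))
strict
```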
There are other ways to do this, but I don't know if this is the problem
you want to solve.
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com