How to create a numeric data.frame

13 messages · Sarah Goslee, Barry Rowlingson, Joshua Wiley +4 more

#
Hi All

I am new to R and am not sure how this should be done. I have a matrix of
985x100 values, and its class is data.frame.

A sample of my dataset looks like this (since it is a huge dataset and the full
output would clutter the screen, I am pasting only the first few rows and
columns):

 V2           V3           V4           V5           V6
2   0.009953966  -0.01586103 -0.016227028  0.016774711 -0.021342598
3  -0.230181145  0.203303786 -0.685321843  0.147050709 -0.122269004
4  -0.552905273 -0.034039644 -0.511356309 -0.330524909 -0.239088566
5  -0.089739322 -0.082768643 -0.411209134 -0.301011664  1.560185991
6  -1.986059137 -0.252217616 -0.369044526 -0.585619405  0.545903757
7  -1.635875161  2.741310455 -0.058411313 -1.458825827  0.078480977
8   0.525846706 -1.134643662 -0.067014844 -1.431990219 -0.557057121
9  -0.913511821  0.688374777  0.376412044 -0.861746434  2.065507172
10 -1.538179621  0.814330376  1.639939042  -1.41478931  1.802738289
11  0.817957993 -0.426560507  2.773380242 -0.123291817  1.316883748


When I try to convert it to numeric with the command

as.numeric(leu_cluster1)

I get the error: Error: (list) object cannot be coerced to type 'double'.
I tried several functions and looked in other forums too, but could not find a
solution. I am trying to change it to a numeric data.frame, not to a matrix.


thanks in advance. :)
#
What are you trying to do? It looks numeric, although a visual
assessment isn't reliable.

The output of str() would be helpful.

But I'm not sure what your objective is. What do you think your data
frame is now, and what do you think it should be?

Sarah
On Mon, Jun 13, 2011 at 6:06 AM, Aparna <aparna.sampath26 at gmail.com> wrote:

  
    
#
Hi,

If your matrix is already numeric, then:

as.data.frame(your_matrix_name)

will do the trick.  However, if you have a matrix that is not numeric
(say it is character), then you could use:

mode(your_matrix_name) <- "numeric"  # as.numeric() alone would drop the dimensions
as.data.frame(your_matrix_name)

Matrices can only hold one type of data (for example, all numeric or
all character), so if *any* of your data is character
(say one column contains people's names), then the entire matrix will
be character, and calling as.numeric() on it is probably not what you
want (the character data will be turned into NA).  In which case, you
might convert the matrix to a data frame first:

as.data.frame(your_matrix_name)

because data frames can contain different classes of data in their
different columns.  Once it is a data frame, you could convert the
columns that should be numeric to numeric (say, columns 2 through 6
only) by:

your_data_name[, 2:6] <- lapply(your_data_name[, 2:6], as.numeric)
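
A minimal sketch of that route, using a made-up character matrix `m` (the object name and values are illustrative only, not the poster's data). `stringsAsFactors = FALSE` is passed explicitly so the columns stay character regardless of the R version's default:

```r
# Hypothetical character matrix: one name column plus two numeric-looking columns
m <- matrix(c("a", "b", "c",
              "1.5", "2.5", "3.5",
              "10", "20", "30"),
            nrow = 3)

# Convert to a data frame first, keeping the strings as character
df <- as.data.frame(m, stringsAsFactors = FALSE)

# Convert only the columns that should be numeric (here columns 2 and 3)
df[, 2:3] <- lapply(df[, 2:3], as.numeric)

# Inspect the class of every column
col_classes <- sapply(df, class)
print(col_classes)
```

If the columns had been stored as factors instead of character, as.numeric() would return the internal level codes rather than the printed values, so it is worth checking the classes before converting.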

For relevant documentation, see

?as.numeric
?as.data.frame
## under the "Details" section, it shows the hierarchy of data types
## that is how I could know that if there is character data, the numeric
## data will be converted up to the character class
?matrix


Hope this helps,

Josh
On Mon, Jun 13, 2011 at 3:06 AM, Aparna <aparna.sampath26 at gmail.com> wrote:

  
    
#
On Mon, Jun 13, 2011 at 11:06 AM, Aparna <aparna.sampath26 at gmail.com> wrote:
You don't have a 'matrix' in the R sense of the word. You seem to
have a table of numbers stored in an object of class
'data.frame'.
A data frame doesn't have an overall sense of itself being numeric or
character. Only the columns have that, and they can be independent,
because a data frame is implemented as a list where each element has
the same length. Each element is a vector of numbers or characters.
You are doing the equivalent of:

 as.numeric(list(foo=c(1,2,3)))

 Now you may think it reasonable to do an 'as.numeric' on that, but what about:

 as.numeric(list(foo=list(bar=c(1,2,3),baz=c(34,5)),bar=c("Hello","World")))

 How would you 'as.numeric' that?
There is no numeric data frame. There is only a numeric matrix, or a
data frame with all numeric columns. Do summary(mydataframe) to see
what kind of data each of your columns holds.
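
A small sketch of the point above, with a made-up two-column data frame (the names `df`, `x` and `y` are placeholders for illustration). It shows that a data frame is a list of equal-length columns, that as.numeric() fails on the whole object, and that it works on a single column:

```r
# Illustrative data frame; the column names are made up for this sketch
df <- data.frame(x = c(1.5, 2.5, 3.5), y = c("a", "b", "c"))

is_list <- is.list(df)        # a data frame is a list of columns
col_lengths <- lengths(df)    # every column has the same length

# as.numeric() on the whole data frame fails, as in the original question;
# tryCatch() turns the error into a marker string so the script keeps running
whole <- tryCatch(as.numeric(df), error = function(e) "error")

# as.numeric() on a single column works
one_col <- as.numeric(df$x)

print(sapply(df, class))      # classes are a per-column property
```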

Barry
#
On Mon, Jun 13, 2011 at 7:45 AM, Barry Rowlingson
<b.rowlingson at lancaster.ac.uk> wrote:
[snip]
Well, "Hello" is one of the first words spoken when meeting someone
and in programming, and I think "World" represents not just the earth,
but everything that exists.  Everything seems best represented by a
continuous circle, so logically I would 'as.numeric' that

$foo
$foo$bar
[1] 1 2 3

$foo$baz
[1] 34  5

$bar
[1] 1 0

However, R does not agree with my logic ;)

Josh
#
On Mon, Jun 13, 2011 at 4:45 PM, Barry Rowlingson
<b.rowlingson at lancaster.ac.uk> wrote:
but you could have one:

a <- data.frame(matrix(rnorm(100),10)) # get some data
class(a) # check for its class
as.numeric(a) # whoops, won't work

class(as.matrix(a)) # change class, and
as.numeric(as.matrix(a)) # bingo, it works

PF
#
Hi
r-help-bounces at r-project.org wrote on 13.06.2011 17:19:39:
Which results in a vector of numbers:

str(as.numeric(as.matrix(a)))
 num [1:100] 0.82 -1.339 1.397 0.673 -0.461 ...

A data frame is a convenient list structure which can contain vectors of
various kinds (numeric, character, factor, logical, ...)
and looks quite similar to an Excel table.

A matrix is a vector with (2) dimensions, but because it is a vector it cannot
consist of objects of different kinds (classes). Therefore you can have a
numeric or a character matrix, but not numeric and character columns in the
same matrix.

And a vector is a vector (numeric, character, logical, ...), but again you
cannot mix items of different classes in one vector.
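
To make the distinction concrete, here is a short sketch with made-up values (the object names `v`, `m` and `df` are only for illustration). Mixing kinds in a vector or matrix coerces everything to one type, while a data frame keeps each column's own class:

```r
# Mixed input in a vector is coerced to a single type (here character)
v <- c(1, "a", TRUE)

# A matrix is a vector with dimensions, so it is coerced the same way
m <- cbind(1:3, c("x", "y", "z"))

# A data frame keeps a separate class for each column
df <- data.frame(num = 1:3, chr = c("x", "y", "z"), stringsAsFactors = FALSE)

v_class <- class(v)
m_type <- typeof(m)
df_classes <- sapply(df, class)
print(df_classes)
```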

Regards
Petr
#
Hi Sarah

Thanks for your advice. My dataset contains all the normalized values. I have to
give this dataset as input to the ClusterCons package in R. In order to run the
package, the data must be converted to a numeric data.frame.

When I check my data using class(mydataset), it is a data.frame.
But when I try to convert it to numeric using as.numeric(mydataset), it gives me
an error saying

Error: (list) object cannot be coerced to type 'double'.
Could you tell me why this error occurs?

Thanks
Aparna
#
Hi Joshua

Looking at the data, all the values seem to be numeric. As I mentioned,
the dataset is already a data.frame.

As suggested, I used str(mydata) and got the following result:


str(leu_cluster1)
'data.frame':   984 obs. of  100 variables:
 $ V2  : Factor w/ 986 levels "-0.00257361",..: 543 116 252 54 520 ...
 $ V3  : Factor w/ 986 levels "-0.000790437",..: 7 666 14 32 105 ...
 $ V4  : Factor w/ 986 levels "-0.0023231","-0.004207663",..: 6 353 267 208 187 ...
 $ V5  : Factor w/ 986 levels "-0.006466083",..: 585 627 146 131 263 ...
 $ V6  : Factor w/ 986 levels "-0.002119173",..: 11 56 111 898 780...
 
 
The columns are not numeric, which I understood from this. 
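
As a hedged illustration of why factor columns need care (the values below are made up, not taken from leu_cluster1): calling as.numeric() directly on a factor returns the internal level codes, not the numbers the labels spell out.

```r
# A factor whose labels look like numbers (illustrative values)
f <- factor(c("0.5", "-1.2", "0.5", "3"))

# Direct conversion gives the integer level codes, not the values
codes <- as.numeric(f)

# Going through the labels recovers the intended numbers
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

So a data frame whose str() output shows "Factor" for every column cannot be fixed with as.numeric() alone, even column by column.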


There is a function called data_check in the ClusterCons package
I am using for this project. It checks whether the input data is
numeric or not. Using it I could tell that my data is not numeric,
which is why I was trying to convert it to numeric data.
 
This forum is of great help since I am able to learn more and 
thanks for making this forum so helpful to people like us who are new to R.

Aparna
#
of course it is. I forgot to say that the way I proposed works only if
the data-frame contains numeric objects only.

R is a great tool because you can get to the very same results in many
different ways.
Depending on the problem you're dealing with, you have to choose the
most efficient one.
Often, in my research work, the most efficient is the one that uses as
few lines of code as possible:

Suppose a is a data.frame which contains numeric objects only

a <- data.frame(matrix(rnorm(100),10)) # some data

## 1 not very nice
b <- 0
for (j in 1:length(a)) b <- c(b, as.numeric(a[[j]])) # a[[j]] extracts column j as a vector
b <- b[-1]

## 2 long time ago I was a fortran guy
b <- numeric(prod(dim(a))) # one slot per cell
for (j in 1:dim(a)[2]){
  for (i in 1:dim(a)[1]){
     b[dim(a)[1]*(j-1)+i] <- as.numeric(a[i,j]) # column-major position
  }
}

## 3 better: sapply function
as.numeric(sapply(a,function(x)as.numeric(x)))

## 4 shorter
as.numeric(as.matrix(a))

## which type of data a has
a <- data.frame(a,fact=sample(c('F1','F2'),dim(a)[1],replace=T))
class_a <- sapply(a,function(x)class(x))
class_a
a_numeric <- a[,class_a=='numeric']
as.numeric(as.matrix(a_numeric))
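
As a quick check, here is a sketch comparing approaches 3 and 4 on an all-numeric data frame. The seed is arbitrary, and unlist() is included as one more option that is not part of the list above:

```r
set.seed(1)                                   # arbitrary seed, for reproducibility
a <- data.frame(matrix(rnorm(100), 10))       # all-numeric data frame

v_sapply <- as.numeric(sapply(a, as.numeric)) # approach 3
v_matrix <- as.numeric(as.matrix(a))          # approach 4
v_unlist <- unlist(a, use.names = FALSE)      # extra option: flatten the list of columns

# All three stack the columns one after another
same <- identical(v_sapply, v_matrix) && identical(v_matrix, v_unlist)
print(same)
```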

Regards,

PF
#
On Mon, Jun 13, 2011 at 6:47 PM, Aparna <aparna.sampath26 at gmail.com> wrote:
Your data columns are indeed not numeric but factors.
You may try this one:

a <- as.character(rnorm(100))		# some numeric data stored as text
adf <- data.frame(matrix(a,10), stringsAsFactors=TRUE)	# which are misinterpreted as factors
adf
adf[,1]
class(adf[,1]) # check for the class of the first column
sapply(adf,function(x)class(x)) # check classes for all columns

# as.character: use levels literally, as.numeric: transforms into numbers
b <- sapply(adf,function(x)as.numeric(as.character(x)))
b # look at b

class(b) # which is now a numeric matrix

best regards

PF
#
On Mon, Jun 13, 2011 at 2:11 PM, Patrizio Frederic
<frederic.patrizio at gmail.com> wrote:
But coercing to a character class first is not the recommended method.
 Also, I am leery about using sapply() with data frames, because it
converts them to matrices, which can cause havoc, if you have
different classes of data.  You mentioned that as a first step, you
had removed the names column from the data frame before trying to
convert it to numeric.  I would simply leave the names in, and then
(supposing they are in column 101)

leu_cluster1[, 1:100] <- lapply(leu_cluster1[, 1:100], function(x)
as.numeric(levels(x))[x])

apply the conversion to numeric on only the necessary columns.  This
simplifies life because you are not making interim data sets.  Using
lapply() allows you to work with (potentially) different classes of
data (although I realize in this particular case you are only dealing
with one class).  So long as you are assigning the results back into a
data frame (as above), the data frame assignment method will automatically
convert the list back to data frame columns.  If you are concerned about
this, just wrap the call in as.data.frame()

leu_cluster1[, 1:100] <- as.data.frame(lapply(
  leu_cluster1[, 1:100], function(x) as.numeric(levels(x))[x]))
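
A self-contained sketch of this conversion on a toy stand-in for leu_cluster1 (the data frame `df`, its values, and the `id` column are invented for illustration, with the names column placed last as supposed above):

```r
# Toy stand-in: two factor columns of numbers plus a names column
df <- data.frame(V2 = factor(c("0.0099", "-0.2301", "-0.5529")),
                 V3 = factor(c("-0.0158", "0.2033", "-0.0340")),
                 id = c("g1", "g2", "g3"),
                 stringsAsFactors = FALSE)

# Convert only the factor columns: turn the levels into numbers once,
# then index them by the factor to restore the original order
df[, 1:2] <- lapply(df[, 1:2], function(x) as.numeric(levels(x))[x])

# Confirm the converted columns are numeric and the names column is untouched
all_numeric <- all(sapply(df[, 1:2], is.numeric))
print(sapply(df, class))
```

Converting the levels rather than every element means each distinct value is parsed only once, which matters little here but helps on long columns with few distinct values.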

Cheers,

Josh