how to address last and all but last column in dataframe

3 messages · drflxms, Mark Difford, David Winsemius

#
Dear R-colleagues,

another question from a newbie: I am creating a lot of simple
pivot-charts from my raw data using the reshape-package. In these charts
we have medical doctors judging videos in the columns and the videos
they judge in the rows. Simple example of chart/data.frame "input" with
two categories 1/0:

video 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

1      1 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
2      2 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  1
3      3 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
4      4 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
5      5 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  1  0
6      6 0 0 0 0 0 0 0 0 0  0  0  0  0  1  0  0  0  0  0  0  0
7      7 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
8      8 0 0 0 0 0 0 0 0 0  0  0  0  0  0  1  0  0  0  0  0  0
9      9 0 0 0 0 0 0 0 0 0  1  0  1  1  0  1  1  0  0  0  1  0
10    10 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0

I recently learned that I can easily create a confusion matrix from
this data using the following commands:

pairs<-data.frame(pred=factor(unlist(input[2:21])),ref=factor(input[,22]))
pred<-pairs$pred
ref <- pairs$ref
library (caret)
confusionMatrix(pred, ref, positive=1)

- where the column named "21" (position 22 in the data.frame) is the
reference/gold standard.

My problem now is that I analyse data.frames with an unknown number of
columns. So to drop the first and last column for the "pred"
variable and to select the last column for the "ref" variable, I have to
look at each data.frame before running the above commands in order to
set the proper column numbers.

It would be very convenient if I could address the last column not by
number (which I have to count beforehand) but by something like a
variable "last column".

There is probably also an easier solution to this problem using the
names of the columns: the reference is always named "21" and the first
column is always called "video". So I tried:

attach(input)
pairs<-data.frame(pred=factor(unlist(input[[,-c(video,21)]])),ref=factor(input[[21]]))

which does not work unfortunately :-(.

I'd be very happy if someone could help me out, because I am really
tired of counting - there are a lot of tables to analyse...

Cheers and greetings from Munich,
Felix
#
Hi Felix,
Doubtless there are other routes. Generally I use ?length to get the number
of columns. Then do your arithmetic within the indexing operator ?"[" to
select what you want.

## Dummy ex. to select first and last column of any data frame ( = DF )
DF[ , c(1, length( names( DF ) ) ) ]

## Dummy ex. to select first and penultimate column of any data frame
DF[ , c(1, length( names( DF ) ) -1 ) ]
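The same arithmetic also works with negative indices, which drop columns instead of selecting them. A minimal sketch, assuming a made-up toy data frame (the column names "a", "b" and "gold" are hypothetical and only serve the illustration):

```r
# Hypothetical toy data frame standing in for DF
DF <- data.frame(video = 1:3,
                 a     = c(0, 1, 0),
                 b     = c(1, 1, 0),
                 gold  = c(0, 1, 0))

n <- length(names(DF))   # number of columns; ncol(DF) gives the same value

# Everything except the first and the last column.
# drop = FALSE keeps a data frame even if only one column is left.
middle <- DF[ , -c(1, n), drop = FALSE]

# The last column on its own, returned as a plain vector
last <- DF[ , n]
```

Here `middle` keeps the columns "a" and "b", and `last` holds the values of "gold", without any column having been counted by hand.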

HTH, Mark.
#
Not sure where your "input" came from. It's not in a format I would
have expected of an R object, and the first line is not in a form that
would be particularly easy to read into a valid R object. Numbers are
not legitimate object names. It's also not clear what you want to do
with the duplicated line numbers at the beginning. Your question
implies that you do not consider them part of the data.

In the future a worked example along the lines of that constructed by  
Jorge Ivan Velez in a recent answer to another question might increase  
chances of a prompt reply with tested code:

# Data set
DF=read.table(textConnection("V1 V2 V3
a    b    0:1:12
d    f    1:2:1
c    d    1:0:9
b    e    2:2:6
f    c    5:5:0"),header=TRUE)
closeAllConnections()

The "length" of a dataframe is the number of columns.

?length
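As a quick check of that statement, here is a small sketch using a shortened copy of the example data above (the `text=` argument of read.table is used here in place of textConnection, purely for brevity):

```r
# Shortened copy of the example data set
DF <- read.table(text = "V1 V2 V3
a b 0:1:12
d f 1:2:1", header = TRUE)

n_len   <- length(DF)          # a data frame is a list of columns
n_col   <- ncol(DF)            # same number, arguably more readable
n_names <- length(names(DF))   # same again, via the column names

last_name <- names(DF)[length(DF)]   # name of the last column
```

All three counts agree, so whichever form reads best can be used inside the indexing brackets.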

Dataframes can be referenced using the extract operation, e.g.
df[<row>, <col>]

?Extract       # for additional information on indexing using column vectors.

So:

input[ , length(input)]   # should return the last column as a vector,
                          # although it will no longer be named.

The rest of the dataframe, with intact column names, could be obtained
with:

input[ , -length(input)]
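
Putting the pieces of this thread together for the original pred/ref question, one possible sketch looks like this. It assumes a small made-up stand-in for "input" (a "video" column, three rater columns and the reference in the last column); the real tables have more columns, but the code never refers to their count. The reference is repeated explicitly with rep() so that both vectors have the same length, which is an illustrative choice rather than the poster's original code.

```r
# Hypothetical stand-in for the poster's 'input': a 'video' id column,
# rater columns "1".."3", and the reference in the last column ("4").
input <- data.frame(video = 1:4,
                    `1` = c(0, 1, 0, 0),
                    `2` = c(0, 1, 1, 0),
                    `3` = c(0, 0, 1, 0),
                    `4` = c(0, 1, 1, 0),
                    check.names = FALSE)

last <- ncol(input)   # position of the reference column, no counting needed

# Drop 'video' (first) and the reference (last); keep a data frame
raters <- input[ , -c(1, last), drop = FALSE]

# The same selection by column name instead of by position
by_name <- input[ , setdiff(names(input), c("video", names(input)[last])),
                  drop = FALSE]

# unlist() stacks the rater columns one after another, so the reference
# is repeated once per rater column to line up with it
pred <- factor(unlist(raters), levels = c(0, 1))
ref  <- factor(rep(input[[last]], times = ncol(raters)), levels = c(0, 1))

tab <- table(pred, ref)   # plain cross-tabulation of judgements vs. reference
```

The resulting `pred` and `ref` factors should also be usable with confusionMatrix() from the caret package, as in the original post; table() is used here only to keep the sketch free of extra packages.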