Problem with comparing multiple data sets

3 messages · Mohammad Alimohammadi, John Kane, Jim Lemon

#
Hi everyone,

I am very new to R and I have a task to do. I appreciate any help. I have 3
data sets. Each data set has 4 columns. For example:

Class  Comment   Term   Text
0           com1        aac    text1
2           com2        aax    text2
1           com3        vvx    text3

Now I need to compare the Class column across the 3 data sets and assign the
most frequent class to each text. For example, if text1 is assigned to
class 0 in data sets 1 and 2 but to class 2 in data set 3, then it should be
assigned to class 0. If all three agree, the class stays the
same. Ideally I would keep the same format and just update the
class. Is there an easy way to do this?

Thanks a lot.
#
Hi Mohammad 

Welcome to the R-help list.

There probably is a fairly easy way to do what you want, but I think we need a bit more background information on what you are trying to achieve. I know I'm not exactly clear on your decision rule(s).

It would also be very useful to see some actual sample data in usable R format. Have a look at these links, http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and http://adv-r.had.co.nz/Reproducibility.html, for some hints on what you might want to include in your question.

In particular, read up about dput() in those links and/or see ?dput. This is the generally preferred way to supply sample or illustrative data to the R-help list. It basically creates a perfect copy of the data as it exists on 'your' machine so that R-help readers see exactly what you do.
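
As a rough illustration of what that looks like (the data frame here is made up from the example in the original post, and the names df, tmp and df_copy are only placeholders):

```r
# a small data frame standing in for one of the real data sets
df <- data.frame(Class = c(0, 2, 1),
                 Text = c("text1", "text2", "text3"))

# prints a structure(...) call that can be pasted into an email
dput(df)

# the same text can be written to a file and read back with dget()
tmp <- tempfile(fileext = ".R")
dput(df, file = tmp)
df_copy <- dget(tmp)
```

Anyone who pastes the printed structure(...) text into their own R session should end up with the same object, column types included.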







John Kane
Kingston ON Canada
#
Hi Mohammad,
You know, I thought this would be fairly easy, but it wasn't really.

df1<-data.frame(Class=c(0,2,1),Comment=c("com1","com2","com3"),
 Term=c("aac","aax","vvx"),Text=c("text1","text2","text3"))
df2<-data.frame(Class=c(0,2,1),Comment=c("com1","com2","com3"),
 Term=c("aac","aax","vvx"),Text=c("text1","text2","text3"))
df3<-data.frame(Class=c(2,1,0),Comment=c("com1","com2","com3"),
 Term=c("aac","aax","vvx"),Text=c("text1","text2","text3"))
dflist<-list(df1,df2,df3)
dflist

# define a function that extracts the value from one field
# selected by a value in another field
extract_by_value<-function(x,field1,value1,field2) {
 return(x[x[,field1]==value1,field2])
}

# define another function that sets field2 to value2
# in every row where field1 equals value1
sub_value<-function(x,field1,value1,field2,value2) {
 x[x[,field1]==value1,field2]<-value2
 return(x)
}

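# make the value of fieldname2 agree across all data frames in the list
# for the rows where fieldname1 equals value1
conformity<-function(x,fieldname1,value1,fieldname2) {
 # get the most frequent value in fieldname2
 # for the desired value in fieldname1
 # (on a tie, which.max picks the first, i.e. smallest, value;
 # as.numeric assumes the classes are numbers)
 most_freq<-as.numeric(names(which.max(table(unlist(lapply(x,
  extract_by_value,fieldname1,value1,fieldname2))))))
 # now set all the values to the most frequent
 for(i in seq_along(x))
  x[[i]]<-sub_value(x[[i]],fieldname1,value1,fieldname2,most_freq)
 return(x)
}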

conformity(dflist,"Text","text1","Class")
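
Note that the call above only reconciles text1. One possible way to handle every text in a single pass is sketched below. This is only a sketch: majority_class and its arguments are made-up names, and it assumes every data frame has Text and Class columns and that the classes are numbers.

```r
# hypothetical helper: majority vote on `field` for each value of `key`
majority_class <- function(dfs, key = "Text", field = "Class") {
  # stack the key and field columns from all data frames
  stacked <- do.call(rbind, lapply(dfs, function(d) d[, c(key, field)]))
  # for each key value, find the most frequent field value
  # (on a tie, which.max takes the first entry of the sorted table)
  winners <- vapply(
    split(stacked[[field]], as.character(stacked[[key]])),
    function(v) names(which.max(table(v))),
    character(1)
  )
  # write the winning value back into every data frame,
  # leaving the other columns untouched
  lapply(dfs, function(d) {
    d[[field]] <- as.numeric(winners[as.character(d[[key]])])
    d
  })
}

# cut-down versions of the example data (Class and Text only)
a <- data.frame(Class = c(0, 2, 1), Text = c("text1", "text2", "text3"))
b <- data.frame(Class = c(0, 2, 1), Text = c("text1", "text2", "text3"))
d <- data.frame(Class = c(2, 1, 0), Text = c("text1", "text2", "text3"))

result <- majority_class(list(a, b, d))
result[[3]]
```

With this data the third data frame is outvoted on every row, so its Class column should end up matching the other two.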

Jim
On Sat, May 23, 2015 at 11:23 PM, John Kane <jrkrideau at inbox.com> wrote: