Problem with comparing multiple data sets

5 messages · Mohammad Alimohammadi, David L Carlson, John Kane

#
OK, so I read about the modeest package, which gives the result that I
am looking for (the most repeated value).

I modified the data frame a little and moved the text to the first column.
This is the data frame with all 3 possible classes for each term.

=================================
structure(list(terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L,
4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("#dac",
"#mac,#security",
"accountability,anonymous", "data security,encryption,security"
), class = "factor"), class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L,
1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), class.2 = c(2L, 2L,
2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L,
0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L),
    class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L,
    0L, 0L, 0L, 0L, 2L, 1L, 2L)), .Names = c("terms", "class.1",
"class.2", "class.3"), class = "data.frame", row.names = c(NA,
-49L))
=============================================
#Then I applied the function below:

======================
library(modeest)
df <- read.csv(file = "short.csv", header = TRUE, sep = ",")
# Row-wise most frequent value over the class columns (all but the first)
apply(df[, 2:ncol(df)], 1, mfv)

============================
# It gives the most frequent value for each row, which is what I need. The
only problem is that all the values are displayed in a single row.

 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 2 1 1 1 1 0 0 0 0 2 1 2

It would be much better to show them in separate rows.
For example:

 [1] 0

 [2] 0

 [3] 1
....

Any idea how to do this?
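
As an aside on the row-wise step above, here is a minimal base-R sketch of the same idea for readers without modeest installed. The helper name `row_mode` and the data frame `toy` are made up for illustration. On ties this sketch keeps the smallest value, which is an assumption of the sketch; as far as I know, `mfv` returns every tied value, in which case `apply` would return a list instead of a vector.

```r
# Hypothetical helper: most frequent value of a vector, in base R.
# Ties are broken by taking the smallest value (an assumption of this sketch).
row_mode <- function(x) {
  counts <- table(x)
  winners <- names(counts)[counts == max(counts)]
  min(as.numeric(winners))
}

# Made-up data with the same layout as the data frame in this thread.
toy <- data.frame(
  terms   = c("#dac", "#dac", "#mac,#security"),
  class.1 = c(0L, 1L, 2L),
  class.2 = c(2L, 1L, 1L),
  class.3 = c(0L, 2L, 2L)
)

# Apply the helper to every row of the class columns only.
toy_modes <- apply(toy[, -1], 1, row_mode)
print(toy_modes)
```

The data in this thread has no tied rows, so the tie rule does not matter there, but it is worth checking before relying on a plain vector result.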




On Wed, May 27, 2015 at 10:11 AM, Mohammad Alimohammadi <
mxalimohamma at ualr.edu> wrote:

#
Save the result of the apply() function:

Out <- apply(df[, 2:ncol(df)], 1, mfv)

Then there are several options.

Approximately what you asked for:
data.frame(Out)
t(t(Out))

More typing, but exactly what you asked for:
cat(paste0("[", seq_along(Out), "] ", Out), sep="\n")
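
To see how these options differ, here is a small sketch on a made-up three-element vector. The names `out_demo`, `as_df`, `as_col` and `lines_demo` are hypothetical and stand in for the real `Out`.

```r
# Made-up stand-in for the vector returned by apply() in this thread.
out_demo <- c(0, 0, 1)

# One-column data frame: one value per row, with row numbers on the left.
as_df <- data.frame(Out = out_demo)
print(as_df)

# Double transpose: turns the plain vector into a one-column matrix.
as_col <- t(t(out_demo))
print(dim(as_col))

# Build the "[i] value" strings first, then print them one per line.
lines_demo <- paste0("[", seq_along(out_demo), "] ", out_demo)
cat(lines_demo, sep = "\n")
```

The first two options only change how R prints the object, while the `cat` version writes plain text and returns nothing useful, so keep `Out` itself if the values are needed later.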


David L. Carlson
Department of Anthropology
Texas A&M University


-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Mohammad Alimohammadi
Sent: Wednesday, May 27, 2015 1:47 PM
To: John Kane; r-help at r-project.org
Subject: Re: [R] Problem with comparing multiple data sets

#
Thanks David, it worked!

One more thing; I hope it's not complicated. Is it also possible to display
the term for each row next to its value?

For example:

[1] #dac    2
[2] #dac    0
[3] #dac    1
...

On Wed, May 27, 2015 at 2:18 PM, David L Carlson <dcarlson at tamu.edu> wrote:

#
# Use each row's own term; a fixed "#dac" label would be wrong for the other terms
cat(paste0("[", seq_along(Out), "] ", df$terms, "     ", Out), sep="\n")

David
From: Mohammad Alimohammadi [mailto:mxalimohamma at ualr.edu]
Sent: Wednesday, May 27, 2015 2:29 PM
To: David L Carlson; r-help at r-project.org
Subject: Re: [R] Problem with comparing multiple data sets

--
Mohammad Alimohammadi | Graduate Assistant
University of Arkansas at Little Rock | College of Science and Mathematics (CSAM)
501.346.8007 | mxalimohamma at ualr.edu | ualr.edu

Public URL: http://scholar.google.com/citations?user=MsfN_i8AAAAJ

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



#
Lovely solution, Mohammad. I had not even heard of the modeest package.

For names, I'd just create another data.frame:

mode.names <- data.frame(terms = df[, 1], mode = Out)
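
A minimal sketch of what that combined data frame looks like, plus an aligned plain-text version built with `sprintf` and `format`. The vectors `toy_terms` and `toy_out` are made-up stand-ins for `df[, 1]` and `Out`, and the other names are hypothetical too.

```r
# Made-up stand-ins for the terms column and the row-wise modes.
toy_terms <- c("#dac", "#dac", "accountability,anonymous")
toy_out <- c(0, 0, 1)

# Terms and modes side by side, one row per original row.
mode_names_demo <- data.frame(terms = toy_terms, mode = toy_out)
print(mode_names_demo)

# Plain-text variant: format() pads the terms to a common width so the
# values line up in a column.
labels_demo <- sprintf("[%d] %s  %s", seq_along(toy_out), format(toy_terms), toy_out)
cat(labels_demo, sep = "\n")
```

The data frame is usually the more convenient result, since it can be filtered or written out with `write.csv` afterwards.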

John Kane
Kingston ON Canada