data frame manipulation with condition

14 messages · Uwe Ligges, Arnaud Gaboury, Sarah Goslee +1 more

#
Dear list,

n00b question, but still can't find any easy answer.

Here is a df:
   x y
1 AA 1
2 BB 2
3 CC 3
4 AA 4


I want to modify this df this way:
 if df$x=="AA" then df$y=df$y*10
 if df$x=="BB" then df$y=df$y*25

and so on with other conditions.

TY for any help.

Arnaud Gaboury
A2CT2 Ltd.
#
On 24.02.2012 16:25, Arnaud Gaboury wrote:
Change your df to

  df <- data.frame(x = c("AA","BB","CC","AA"), y = 1:4)

to make your object a sensible data.frame. Then:

  df$y[df$x=="AA"] <- df$y[df$x=="AA"] * 10

...


Uwe Ligges
#
TY Uwe,

So I will have to write a line for each condition? Right?

In fact I was trying to do something with apply in one line, but couldn't get any result. All my transformations multiply one column by a specific number according to the value of df$x.

Arnaud Gaboury
A2CT2 Ltd.


#
On 24.02.2012 16:59, Arnaud Gaboury wrote:
In that case:

mult <- c(AA = 10, BB = 25)

Then:


df$y <- df$y * mult[df$x]


Uwe Ligges
#
x  y
1 AA 10
2 BB 50
3 CC 45
4 AA 40
5 DD NA
6 DD NA

My df is in fact much longer than the example shown here. It seems your tip didn't do the job.
I am expecting this result:
   x  y
1 AA 10   ----> if df$x==AA, df$y<-1*10
2 BB 50   ----> if df$x==BB, df$y<-2*25
3 CC  3   NOTHING
4 AA 40   ----> if df$x==AA, df$y<-4*10
5 DD 75   ----> if df$x==DD, df$y<-5*15
6 DD 90   ----> if df$x==DD, df$y<-6*15

Arnaud Gaboury
A2CT2 Ltd.

#
On 24.02.2012 17:36, Arnaud Gaboury wrote:
This is not the "I do the job for you" hotline. You are free to think a
little bit yourself, given that you have not managed in two attempts to
describe your problem sufficiently well!

Uwe Ligges
#
You need, as I already suggested, to use a value of 1 for levels you don't want
to change.
AA BB CC AA DD DD
10 25  1 10 15 15
AA BB CC AA DD DD
10 50  3 40 75 90
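
The commands that produced the two vectors above did not survive in the archive. A minimal sketch that reproduces them, assuming the six-row example from earlier in the thread and a multiplier vector with CC = 1 (the object names and the exact calls are assumptions, not necessarily what was posted):

```r
# Assumed example data: the six-row data frame discussed in the thread.
df <- data.frame(x = c("AA", "BB", "CC", "AA", "DD", "DD"), y = 1:6)

# One multiplier per level; CC = 1 leaves those rows unchanged.
mult <- c(AA = 10, BB = 25, CC = 1, DD = 15)

# Look up each row's multiplier by name. as.character() guards against
# df$x being a factor, which would be used as integer codes instead.
m <- mult[as.character(df$x)]
m
# AA BB CC AA DD DD
# 10 25  1 10 15 15

df$y * m
# AA BB CC AA DD DD
# 10 50  3 40 75 90
```

The names on the result come from the multiplier vector; assigning the product back with df$y <- df$y * m stores the values in the column.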


On Fri, Feb 24, 2012 at 11:36 AM, Arnaud Gaboury
<arnaud.gaboury at a2ct2.com> wrote:
#
OK Uwe, I understand, and I will be more explicit.

Here is what my df could look like:

reported <-
structure(list(Product = structure(c(1L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 4L, 5L, 5L, 5L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 
9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 
11L, 12L, 12L, 13L, 14L, 14L), .Label = c("CL", "Cocoa", "Coffee C", 
"GC", "HG", "HO", "NG", "PL", "RB", "SI", "Sugar No 11", "ZC", 
"ZL", "ZW"), class = "factor"), reported.Price = c(105.35, 2380, 
2407, 2408, 202.35, 202.8, 202.95, 205.85, 206.05, 206.1, 206.2, 
1748, 378.8, 379.25, 379.5, 320.61, 2.538, 2.543, 1669, 1678.5, 
304.49, 321.39, 321.6, 321.65, 322.5, 322.55, 322.8, 323.04, 
3390, 3397.5, 24.16, 24.2, 24.22, 24.23, 24.54, 25.5, 25.55, 
631.75, 638, 53.77, 630.75, 633), reported.Nbr.Lots = c(6L, 3L, 
-1L, -2L, -40L, -1L, -1L, 10L, 5L, 6L, 19L, 17L, 23L, 12L, 35L, 
11L, -54L, -52L, 26L, 26L, 10L, -10L, 1L, 4L, 4L, 1L, 5L, 5L, 
17L, 17L, 114L, 71L, 16L, 27L, -3L, 3L, -3L, -89L, -1L, -1L, 
-51L, -51L)), .Names = c("Product", "reported.Price", "reported.Nbr.Lots"
), row.names = c(7L, 4L, 5L, 6L, 13L, 14L, 15L, 16L, 17L, 18L, 
19L, 8L, 9L, 10L, 11L, 12L, 20L, 21L, 22L, 23L, 35L, 36L, 37L, 
38L, 39L, 40L, 41L, 42L, 31L, 32L, 24L, 25L, 26L, 27L, 28L, 29L, 
30L, 2L, 3L, 1L, 33L, 34L), class = "data.frame")


The rows will change. I am looking to multiply reported.Price by 100 if Product is CL, by 10 if Product is GC, by 100 if Product is HG, by 1000 if Product is NG, and by 100 if Product is RB.
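
A minimal sketch of that rule, assuming a named vector of multipliers and a lookup by product name. The names rep_small, mult and i are hypothetical, and rep_small holds only a few rows copied from the dput above to keep the example short:

```r
# A few rows in the shape of 'reported' (values copied from the dput above).
rep_small <- data.frame(
  Product = factor(c("CL", "Cocoa", "GC", "HG", "NG", "RB")),
  reported.Price = c(105.35, 2380, 1748, 378.8, 2.538, 304.49)
)

# Multipliers as stated in the post; products not listed are left alone.
mult <- c(CL = 100, GC = 10, HG = 100, NG = 1000, RB = 100)

# Rows whose product has a multiplier.
i <- as.character(rep_small$Product) %in% names(mult)

# Index mult by product name, not by the factor's integer code.
rep_small$reported.Price[i] <-
  rep_small$reported.Price[i] * mult[as.character(rep_small$Product[i])]
rep_small
```

Because the lookup goes by name, the result should not depend on which products happen to be present on a given day or on the order of the factor levels.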

I hope I am clear enough, and YES I have tried many workarounds myself before posting. Feel free to ignore my post if you think I am lazy and disrespectful to the list.


Arnaud Gaboury
A2CT2 Ltd.



#
TY very much Sarah: your tip is doing the job:

reported <-
structure(list(Product = structure(c(1L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 4L, 5L, 5L, 5L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 
9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 
11L, 12L, 12L, 13L, 14L, 14L), .Label = c("CL", "Cocoa", "Coffee C", 
"GC", "HG", "HO", "NG", "PL", "RB", "SI", "Sugar No 11", "ZC", 
"ZL", "ZW"), class = "factor"), reported.Price = c(105.35, 2380, 
2407, 2408, 202.35, 202.8, 202.95, 205.85, 206.05, 206.1, 206.2, 
1748, 378.8, 379.25, 379.5, 320.61, 2.538, 2.543, 1669, 1678.5, 
304.49, 321.39, 321.6, 321.65, 322.5, 322.55, 322.8, 323.04, 
3390, 3397.5, 24.16, 24.2, 24.22, 24.23, 24.54, 25.5, 25.55, 
631.75, 638, 53.77, 630.75, 633), reported.Nbr.Lots = c(6L, 3L, 
-1L, -2L, -40L, -1L, -1L, 10L, 5L, 6L, 19L, 17L, 23L, 12L, 35L, 
11L, -54L, -52L, 26L, 26L, 10L, -10L, 1L, 4L, 4L, 1L, 5L, 5L, 
17L, 17L, 114L, 71L, 16L, 27L, -3L, 3L, -3L, -89L, -1L, -1L, 
-51L, -51L)), .Names = c("Product", "reported.Price", "reported.Nbr.Lots"
), row.names = c(7L, 4L, 5L, 6L, 13L, 14L, 15L, 16L, 17L, 18L, 
19L, 8L, 9L, 10L, 11L, 12L, 20L, 21L, 22L, 23L, 35L, 36L, 37L, 
38L, 39L, 40L, 41L, 42L, 31L, 32L, 24L, 25L, 26L, 27L, 28L, 29L, 
30L, 2L, 3L, 1L, 33L, 34L), class = "data.frame")
reported <-
structure(list(Product = structure(c(1L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 4L, 5L, 5L, 5L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 
9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 
11L, 12L, 12L, 13L, 14L, 14L), .Label = c("CL", "Cocoa", "Coffee C", 
"GC", "HG", "HO", "NG", "PL", "RB", "SI", "Sugar No 11", "ZC", 
"ZL", "ZW"), class = "factor"), reported.Price = c(10535, 23800, 
24070, 24080, 2023.5, 2028, 2029.5, 2058.5, 2060.5, 2061, 2062, 
1748000, 3788, 3792.5, 3795, 32061, 25.38, 25.43, 166900, 167850, 
30449, 32139, 32160, 32165, 32250, 32255, 32280, 32304, 3390, 
3397.5, 24.16, 24.2, 24.22, 24.23, 24.54, 25.5, 25.55, 631.75, 
638, 53.77, 630.75, 633), reported.Nbr.Lots = c(6L, 3L, -1L, 
-2L, -40L, -1L, -1L, 10L, 5L, 6L, 19L, 17L, 23L, 12L, 35L, 11L, 
-54L, -52L, 26L, 26L, 10L, -10L, 1L, 4L, 4L, 1L, 5L, 5L, 17L, 
17L, 114L, 71L, 16L, 27L, -3L, 3L, -3L, -89L, -1L, -1L, -51L, 
-51L)), .Names = c("Product", "reported.Price", "reported.Nbr.Lots"
), row.names = c(7L, 4L, 5L, 6L, 13L, 14L, 15L, 16L, 17L, 18L, 
19L, 8L, 9L, 10L, 11L, 12L, 20L, 21L, 22L, 23L, 35L, 36L, 37L, 
38L, 39L, 40L, 41L, 42L, 31L, 32L, 24L, 25L, 26L, 27L, 28L, 29L, 
30L, 2L, 3L, 1L, 33L, 34L), class = "data.frame")

Have a good weekend.

Arnaud Gaboury
A2CT2 Ltd.
#
Use mult[as.character(df$x)] instead of mult[df$x].
They are different when df$x is a factor and the
character version is what you want.

  > df<- data.frame(x = c("AA","BB","CC","AA","DD","DD"), y = 1:6)
  > mult <- c(AA = 10, BB = 25,DD=15)
  > df$y <- df$y * mult[as.character(df$x)]
  > df
     x  y
  1 AA 10
  2 BB 50
  3 CC NA
  4 AA 40
  5 DD 75
  6 DD 90

This gets the order right.  The NA for "CC" is because
your vector of multipliers didn't include an entry for
CC.  You can either add CC=1 to mult or work only on the
subset of the data which has entries in the mult vector.

  > df<- data.frame(x = c("AA","BB","CC","AA","DD","DD"), y = 1:6)
  > mult <- c(AA = 10, BB = 25,DD=15)
  > i <- as.character(df$x) %in% names(mult)
  > df$y[i] <- df$y[i] * mult[as.character(df$x[i])]
  > df
     x  y
  1 AA 10
  2 BB 50
  3 CC  3
  4 AA 40
  5 DD 75
  6 DD 90
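
The same idea can be wrapped in a small function. This is only a sketch: scale_by_key and its default argument are hypothetical names, and instead of subsetting it replaces missing multipliers with 1:

```r
# Hypothetical helper: multiply 'values' by the multiplier named after each
# key; keys without an entry in 'mult' get 'default' (1 = unchanged).
scale_by_key <- function(values, keys, mult, default = 1) {
  m <- unname(mult[as.character(keys)])  # NA where no multiplier is defined
  m[is.na(m)] <- default
  values * m
}

df <- data.frame(x = factor(c("AA", "BB", "CC", "AA", "DD", "DD")), y = 1:6)
mult <- c(AA = 10, BB = 25, DD = 15)

df$y <- scale_by_key(df$y, df$x, mult)
df$y
# [1] 10 50  3 40 75 90
```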

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
#
On Fri, Feb 24, 2012 at 12:23 PM, William Dunlap <wdunlap at tibco.com> wrote:
R will coerce a factor to character to perform the comparison; explicitly
calling as.character() is not necessary:
[1] AA BB CC AA DD DD
[1]  TRUE FALSE FALSE  TRUE FALSE FALSE

See ?factor for details.
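
The commands behind the two printed lines above are missing from the archive. A sketch of the comparison being described, assuming x is stored as a factor (the object name x is an assumption):

```r
# Assumed data: the x column from the example, stored as a factor.
x <- factor(c("AA", "BB", "CC", "AA", "DD", "DD"))

# Comparing a factor with a string compares the level labels.
x == "AA"
# [1]  TRUE FALSE FALSE  TRUE FALSE FALSE

# Same result as converting explicitly first.
identical(x == "AA", as.character(x) == "AA")
# [1] TRUE
```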

Sarah
#
In fact I need to use William's tip: use mult[as.character(df$x)] instead of mult[df$x].


Let's try again with a shorter df as example:

The rule: if AA, then multiply y by 2, if BB multiply y by 5, if CC do nothing, if DD multiply by 2.


Let's say day 1 I have df1:

df1 <-
structure(list(x = structure(c(1L, 2L, 2L, 3L), .Label = c("AA", 
"BB", "CC"), class = "factor"), y = 1:4), .Names = c("x", "y"
), row.names = c(NA, -4L), class = "data.frame")
x y
1 AA 1
2 BB 2
3 BB 3
4 CC 4
x  y
1 AA  2
2 BB 10
3 BB 15
4 CC  4

WORKING

Now day 2 with df2:
x  y
1 AA  2
2 AA  4
3 BB 15
4 BB 20
5 BB 25
6 CC  6
7 DD 14
8 DD 16

WORKING
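
The commands for the two days are not shown above. One way to get these results is a lookup table combined with match(). This is a sketch only: rules, by and apply_rules are hypothetical names, and the day-2 input (y = 1:8) is inferred from the posted result rather than taken from the post:

```r
# Rule table; CC has no row, so it falls back to a multiplier of 1.
rules <- data.frame(x = c("AA", "BB", "DD"), by = c(2, 5, 2),
                    stringsAsFactors = FALSE)

apply_rules <- function(df, rules) {
  idx <- match(as.character(df$x), rules$x)  # NA where no rule exists
  m <- ifelse(is.na(idx), 1, rules$by[idx])
  df$y <- df$y * m
  df
}

# Day 1: levels AA, BB, CC.
df1 <- data.frame(x = factor(c("AA", "BB", "BB", "CC")), y = 1:4)
apply_rules(df1, rules)$y
# [1]  2 10 15  4

# Day 2 (assumed input): an extra level DD appears.
df2 <- data.frame(x = factor(c("AA", "AA", "BB", "BB", "BB", "CC", "DD", "DD")),
                  y = 1:8)
apply_rules(df2, rules)$y
# [1]  2  4 15 20 25  6 14 16
```

The lookup is by label, so the same rule table works on both days even though the set of factor levels differs.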


Ty both of you and have a good weekend.


Arnaud Gaboury
A2CT2 Ltd.


#
Whatever makes you happy.
+ structure(list(x = structure(c(1L, 2L, 2L, 3L), .Label = c("AA",
+ "BB", "CC"), class = "factor"), y = 1:4), .Names = c("x", "y"
+ ), row.names = c(NA, -4L), class = "data.frame")
AA BB BB CC
 2 10 15  4
AA BB BB CC
 2 10 15  4
AA AA BB BB BB CC DD DD
 2  4 15 20 25  6 14 16
AA AA BB BB BB CC DD DD
 2  4 15 20 25  6 14 16


On Fri, Feb 24, 2012 at 12:52 PM, Arnaud Gaboury
<arnaud.gaboury at a2ct2.com> wrote:
#
When a factor is used as a subscript it is treated
as its integer codes so explicit conversion to character
is needed if you want to subscript by names:
  > f <- factor(c("One","Three","Two"), levels=c("One","Two","Three"))
  > x <- c(Two=2, One=1, Three=3)
  > x[f]
    Two Three   One 
      2     3     1 
  > x[as.character(f)]
    One Three   Two 
      1     3     2
For most other functions (e.g., %in%, paste, sprintf("%s"))
you do not need an explicit conversion to character, but '['
requires you to choose.
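
A short sketch of the point above, reusing f and x from the transcript: subscripting by the factor gives the same result as subscripting by its integer codes, while the character version looks elements up by name.

```r
f <- factor(c("One", "Three", "Two"), levels = c("One", "Two", "Three"))
x <- c(Two = 2, One = 1, Three = 3)

# The integer codes are positions in the levels vector.
as.integer(f)
# [1] 1 3 2

# Subscripting by the factor is subscripting by those codes ...
identical(x[f], x[as.integer(f)])
# [1] TRUE

# ... whereas the character version subscripts by name.
identical(x[as.character(f)], x[c("One", "Three", "Two")])
# [1] TRUE
```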

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com