
Issue replacing dataset values from read data

3 messages · Chang, Emily, Jim Lemon, William Michels

#
Dear all,

I am reading a modest dataset (2297 x 644) with specific values I want to change. The code is inelegant but looks like this:

df <- read.csv("mydata.csv", header = TRUE, stringsAsFactors = FALSE)

# yrsquit, packyrs missing for following IDs. Manually change.
for(myid in c(2165, 2534, 2553, 2611, 2983, 3233)){
     temp <- subset(df, id == myid)
     df[df$id == myid , "yrsquit"] <- 0
     temp.yrssmoke <- temp$age-(temp$agesmoke+temp$yrsquit)
     df[df$id == myid , "yrssmoke"]  <- temp.yrssmoke
     df[df$id == myid , "packyrs"] <- (temp$cigsdaytotal/20)*(temp.yrssmoke)
}

If I run just the first line and then the for loop, it works.
If I run the first line and the for loop together, yrsquit is correctly replaced with 0, but packyrs is still NA.
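A minimal sketch on a small hypothetical data frame (the ids and values are invented, not taken from mydata.csv) suggests one likely cause. `temp` is a snapshot of the row taken *before* `yrsquit` is set to 0. If `yrsquit` is NA in the file, `temp$yrsquit` is therefore still NA, and the NA propagates into `yrssmoke` and `packyrs`. Taking `temp` after the assignment avoids this. On a second run of the loop, `df$yrsquit` would already be 0 when `temp` is taken, which may be why the loop sometimes appears to work.

```r
# Hypothetical data: yrsquit, yrssmoke and packyrs are missing (NA) for ids 2 and 3,
# mirroring the "missing for following IDs" comment in the original script.
make_data <- function() {
  data.frame(
    id           = c(1, 2, 3),
    age          = c(60, 55, 70),
    agesmoke     = c(20, 18, 25),
    yrsquit      = c(5, NA, NA),
    cigsdaytotal = c(20, 10, 40),
    yrssmoke     = c(35, NA, NA),
    packyrs      = c(35, NA, NA)
  )
}
ids <- c(2, 3)

# 1. Original ordering: temp is copied BEFORE yrsquit is set to 0,
#    so temp$yrsquit is still NA and yrssmoke/packyrs become NA.
df <- make_data()
for (myid in ids) {
  temp <- subset(df, id == myid)
  df[df$id == myid, "yrsquit"] <- 0
  temp.yrssmoke <- temp$age - (temp$agesmoke + temp$yrsquit)
  df[df$id == myid, "yrssmoke"] <- temp.yrssmoke
  df[df$id == myid, "packyrs"] <- (temp$cigsdaytotal / 20) * temp.yrssmoke
}
print(df[df$id %in% ids, c("id", "yrsquit", "yrssmoke", "packyrs")])

# 2. Reordered: set yrsquit first, then take temp, so temp sees the 0.
df2 <- make_data()
for (myid in ids) {
  df2[df2$id == myid, "yrsquit"] <- 0
  temp <- subset(df2, id == myid)
  temp.yrssmoke <- temp$age - (temp$agesmoke + temp$yrsquit)
  df2[df2$id == myid, "yrssmoke"] <- temp.yrssmoke
  df2[df2$id == myid, "packyrs"] <- (temp$cigsdaytotal / 20) * temp.yrssmoke
}
print(df2[df2$id %in% ids, c("id", "yrsquit", "yrssmoke", "packyrs")])
```

In the first printout yrsquit is 0 but yrssmoke and packyrs stay NA, matching the symptom described; in the second they are filled in.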

Obviously there are many ways around this specific problem, but I was wondering what the issue is here, so that I can look out for it and avoid it in the future.

Apologies for the lack of reproducible code; I haven't yet reproduced the problem with generated data.

Many thanks in advance.

Best regards,
Emily
#
Hi Emily,
I haven't tested this exhaustively, but it seems to work:

df <- data.frame(id = 2001:3300, yrssmoke = sample(1:40, 1300, TRUE),
 cigsdaytotal = sample(1:60, 1300, TRUE), yrsquit = sample(1:20, 1300, TRUE))
# logical index of the rows whose id is in the list
dfNA <- df$id %in% c(2165, 2534, 2553, 2611, 2983, 3233)
# create your NA values
df[dfNA, c("yrsquit", "packyrs")] <- NA
# since you know the NA id values
df[dfNA, "yrsquit"] <- 0
df[dfNA, "packyrs"] <- df[dfNA, "yrssmoke"] * df[dfNA, "cigsdaytotal"] / 20
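A vectorized sketch of the same approach, using `%in%` for the row index and recomputing `yrssmoke` from `age` and `agesmoke` with the formula in the original question. The generated data, value ranges and seed are assumptions for illustration only. The key point is that `yrsquit` is updated first, and the derived columns are then computed from `df` itself rather than from an earlier copy.

```r
# Hypothetical generated data with the columns used in the original question.
set.seed(1)
n <- 1300
df <- data.frame(
  id           = 2001:(2000 + n),
  age          = sample(40:80, n, TRUE),
  agesmoke     = sample(14:30, n, TRUE),
  yrsquit      = sample(0:10, n, TRUE),
  cigsdaytotal = sample(1:60, n, TRUE)
)
df$yrssmoke <- df$age - (df$agesmoke + df$yrsquit)
df$packyrs  <- df$cigsdaytotal / 20 * df$yrssmoke

ids <- c(2165, 2534, 2553, 2611, 2983, 3233)
fix <- df$id %in% ids                                # logical row index, no loop needed
df[fix, c("yrsquit", "yrssmoke", "packyrs")] <- NA   # simulate the missing values

# Set yrsquit first, then derive the other columns from the updated df.
df$yrsquit[fix]  <- 0
df$yrssmoke[fix] <- df$age[fix] - (df$agesmoke[fix] + df$yrsquit[fix])
df$packyrs[fix]  <- df$cigsdaytotal[fix] / 20 * df$yrssmoke[fix]

print(df[fix, ])
```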

Jim
On Sat, May 7, 2016 at 8:19 AM, Chang, Emily <Emily.Chang2 at ucsf.edu> wrote:
#
1. It's not immediately clear why you need the line "temp <- subset(df, id == myid)".

2. The objects described by "temp$age", "temp$agesmoke", and "temp$yrsquit" are
all vectors, so temp.yrssmoke is also a vector. This means that when you
assign, it should be with "<- temp.yrssmoke[i]", where "i" is the (row)
number you're looping over (note that "temp" re-numbers rows to 1 through 6,
another reason to remove the "temp" line).

3. Ditto for "<- (temp$cigsdaytotal[i]/20)*(temp.yrssmoke[i])".
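Following point 1, a sketch of the loop with the "temp" line removed, so every value is read from df after yrsquit has been updated. It assumes each id occurs in exactly one row, so each selection returns a single value; the small data frame and the helper name `sel` are hypothetical.

```r
# Hypothetical data: yrsquit is missing (NA) for ids 2 and 3.
df <- data.frame(
  id           = c(1, 2, 3),
  age          = c(60, 55, 70),
  agesmoke     = c(20, 18, 25),
  yrsquit      = c(5, NA, NA),
  cigsdaytotal = c(20, 10, 40)
)
df$yrssmoke <- NA_real_
df$packyrs  <- NA_real_

for (myid in c(2, 3)) {
  sel <- df$id == myid                 # logical index of this id's row
  df[sel, "yrsquit"]  <- 0             # update first ...
  # ... then read the updated value straight from df (no stale copy)
  df[sel, "yrssmoke"] <- df[sel, "age"] - (df[sel, "agesmoke"] + df[sel, "yrsquit"])
  df[sel, "packyrs"]  <- df[sel, "cigsdaytotal"] / 20 * df[sel, "yrssmoke"]
}
print(df)
```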

Hope this helps!

Bill

W. Michels, Ph.D.
On Fri, May 6, 2016 at 3:19 PM, Chang, Emily <Emily.Chang2 at ucsf.edu> wrote: