I have the following .csv file containing about 40000 values (here only an extract and simplified version): NAME ; YEAR; ID; VALUE; CUMMB Sample1; 1998; 354; 45; 45 Sample1; 1999; 354; 23; 68 Sample1; 2000; NA; 66; 134 Sample1; 2001; NA; 98; 232 Sample1; 2002; NA; 36; 268 Sample1; 2003; NA; 59; 327 Sample1; 2004; NA; 64; 391 Sample1; 2005; 354; 23; 414 Sample1; 2006; 354; 69; 483 Sample1; 2007; 354; 94; 577 Sample1; 2008; 354; 24; 601 Sample2; 1964; 1342; 7; 7 Sample2; 1965; 1342; 24; 31 Sample3; 2002; 859; 90; 90 Sample3; 2003; NA; 93; 183 Sample3; 2004; NA; 53; 236 Sample3; 2005; 859; 98; 334 What I would like to do is to replace the NA values in ID with the values from the ID. E.g. all values in ID from Sample 1 should have the value 354; all values in ID from Sample 3 should have the value 859 etc. Is there a simple way to do this? Thanks for your help. -- View this message in context: http://r.789695.n4.nabble.com/Filling-in-missing-values-in-a-column-based-on-previous-and-following-values-tp4694993.html Sent from the R help mailing list archive at Nabble.com.
Filling in missing values in a column based on previous and following values
8 messages · fd, Gerrit Eichner, John McKown +1 more
Hello, Florian, function na.locf() from package zoo mightdo what you want. Hth -- Gerrit --------------------------------------------------------------------- Dr. Gerrit Eichner Mathematical Institute, Room 212 gerrit.eichner at math.uni-giessen.de Justus-Liebig-University Giessen Tel: +49-(0)641-99-32104 Arndtstr. 2, 35392 Giessen, Germany Fax: +49-(0)641-99-32109 http://www.uni-giessen.de/cms/eichner ---------------------------------------------------------------------
On Mon, 4 Aug 2014, fd wrote:
I have the following .csv file containing about 40000 values (here only an extract and simplified version): NAME ; YEAR; ID; VALUE; CUMMB Sample1; 1998; 354; 45; 45 Sample1; 1999; 354; 23; 68 Sample1; 2000; NA; 66; 134 Sample1; 2001; NA; 98; 232 Sample1; 2002; NA; 36; 268 Sample1; 2003; NA; 59; 327 Sample1; 2004; NA; 64; 391 Sample1; 2005; 354; 23; 414 Sample1; 2006; 354; 69; 483 Sample1; 2007; 354; 94; 577 Sample1; 2008; 354; 24; 601 Sample2; 1964; 1342; 7; 7 Sample2; 1965; 1342; 24; 31 Sample3; 2002; 859; 90; 90 Sample3; 2003; NA; 93; 183 Sample3; 2004; NA; 53; 236 Sample3; 2005; 859; 98; 334 What I would like to do is to replace the NA values in ID with the values from the ID. E.g. all values in ID from Sample 1 should have the value 354; all values in ID from Sample 3 should have the value 859 etc. Is there a simple way to do this? Thanks for your help. -- View this message in context: http://r.789695.n4.nabble.com/Filling-in-missing-values-in-a-column-based-on-previous-and-following-values-tp4694993.html Sent from the R help mailing list archive at Nabble.com.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
this is great, thanks! one problem I noticed though is that it fills all NA values in every column, is it possible to specify only one column, e.g. only ID (I have NA values in another column I want to keep) Kind regards, Florian Am 04.08.2014 um 11:31 schrieb Gerrit Eichner <Gerrit.Eichner at math.uni-giessen.de>:
Hello, Florian, function na.locf() from package zoo mightdo what you want. Hth -- Gerrit --------------------------------------------------------------------- Dr. Gerrit Eichner Mathematical Institute, Room 212 gerrit.eichner at math.uni-giessen.de Justus-Liebig-University Giessen Tel: +49-(0)641-99-32104 Arndtstr. 2, 35392 Giessen, Germany Fax: +49-(0)641-99-32109 http://www.uni-giessen.de/cms/eichner --------------------------------------------------------------------- On Mon, 4 Aug 2014, fd wrote:
I have the following .csv file containing about 40000 values (here only an extract and simplified version): NAME ; YEAR; ID; VALUE; CUMMB Sample1; 1998; 354; 45; 45 Sample1; 1999; 354; 23; 68 Sample1; 2000; NA; 66; 134 Sample1; 2001; NA; 98; 232 Sample1; 2002; NA; 36; 268 Sample1; 2003; NA; 59; 327 Sample1; 2004; NA; 64; 391 Sample1; 2005; 354; 23; 414 Sample1; 2006; 354; 69; 483 Sample1; 2007; 354; 94; 577 Sample1; 2008; 354; 24; 601 Sample2; 1964; 1342; 7; 7 Sample2; 1965; 1342; 24; 31 Sample3; 2002; 859; 90; 90 Sample3; 2003; NA; 93; 183 Sample3; 2004; NA; 53; 236 Sample3; 2005; 859; 98; 334 What I would like to do is to replace the NA values in ID with the values from the ID. E.g. all values in ID from Sample 1 should have the value 354; all values in ID from Sample 3 should have the value 859 etc. Is there a simple way to do this? Thanks for your help. -- View this message in context: http://r.789695.n4.nabble.com/Filling-in-missing-values-in-a-column-based-on-previous-and-following-values-tp4694993.html Sent from the R help mailing list archive at Nabble.com.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
On Mon, 4 Aug 2014, Florian Denzinger wrote:
this is great, thanks! one problem I noticed though is that it fills all NA values in every column, is it possible to specify only one column, e.g. only ID (I have NA values in another column I want to keep)
Yes, of course. Just access only one column, not all: Something like yourdataframe$ID <- na.locf( yourdataframe$ID) should replace the ID-column with the modified version you want. Regards -- Gerrit
Kind regards, Florian Am 04.08.2014 um 11:31 schrieb Gerrit Eichner <Gerrit.Eichner at math.uni-giessen.de>:
Hello, Florian, function na.locf() from package zoo mightdo what you want. Hth -- Gerrit --------------------------------------------------------------------- Dr. Gerrit Eichner Mathematical Institute, Room 212 gerrit.eichner at math.uni-giessen.de Justus-Liebig-University Giessen Tel: +49-(0)641-99-32104 Arndtstr. 2, 35392 Giessen, Germany Fax: +49-(0)641-99-32109 http://www.uni-giessen.de/cms/eichner --------------------------------------------------------------------- On Mon, 4 Aug 2014, fd wrote:
I have the following .csv file containing about 40000 values (here only an extract and simplified version): NAME ; YEAR; ID; VALUE; CUMMB Sample1; 1998; 354; 45; 45 Sample1; 1999; 354; 23; 68 Sample1; 2000; NA; 66; 134 Sample1; 2001; NA; 98; 232 Sample1; 2002; NA; 36; 268 Sample1; 2003; NA; 59; 327 Sample1; 2004; NA; 64; 391 Sample1; 2005; 354; 23; 414 Sample1; 2006; 354; 69; 483 Sample1; 2007; 354; 94; 577 Sample1; 2008; 354; 24; 601 Sample2; 1964; 1342; 7; 7 Sample2; 1965; 1342; 24; 31 Sample3; 2002; 859; 90; 90 Sample3; 2003; NA; 93; 183 Sample3; 2004; NA; 53; 236 Sample3; 2005; 859; 98; 334 What I would like to do is to replace the NA values in ID with the values from the ID. E.g. all values in ID from Sample 1 should have the value 354; all values in ID from Sample 3 should have the value 859 etc. Is there a simple way to do this? Thanks for your help. -- View this message in context: http://r.789695.n4.nabble.com/Filling-in-missing-values-in-a-column-based-on-previous-and-following-values-tp4694993.html Sent from the R help mailing list archive at Nabble.com.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20140804/76e49d1a/attachment.pl>
On Mon, Aug 4, 2014 at 6:34 AM, Florian Denzinger
<florian.denzinger at uzh.ch> wrote:
Shortly after answering to your first email, I got to solution. Sorry for the unnecessary noise. Regards, F
Florian, I'd be interested in seeing your solution. I need as many techniques in my "bag of tricks" as I can find.
There is nothing more pleasant than traveling and meeting new people! Genghis Khan Maranatha! <>< John McKown
Hi John, I only meant the subsetting of the dataframe by using the zoo function suggested by Gerrit! Not a new solution/function, I am afraid. Regards, F Am 04.08.2014 um 14:13 schrieb John McKown <john.archie.mckown at gmail.com>:
On Mon, Aug 4, 2014 at 6:34 AM, Florian Denzinger <florian.denzinger at uzh.ch> wrote:
Shortly after answering to your first email, I got to solution. Sorry for the unnecessary noise. Regards, F
Florian, I'd be interested in seeing your solution. I need as many techniques in my "bag of tricks" as I can find. -- There is nothing more pleasant than traveling and meeting new people! Genghis Khan Maranatha! <>< John McKown
On Mon, Aug 4, 2014 at 9:01 AM, Florian Denzinger
<florian.denzinger at uzh.ch> wrote:
Hi John, I only meant the subsetting of the dataframe by using the zoo function suggested by Gerrit! Not a new solution/function, I am afraid. Regards, F
I take it that means you _don't_ have a solution. If not, or even if
you do, my solution is below. There likely is a better way to do it.
I'm still learning R and there is a _LOT_ of things in all those
packages which I don't know.
library(reshape2); # needed for dcast()
#
# create variable
florian <- data.frame(
NAME=c('Sample1','Sample1','Sample1','Sample1','Sample1','Sample1','Sample1',
'Sample1','Sample1','Sample1','Sample1','Sample2','Sample2','Sample3','Sample3',
'Sample3','Sample3'),
YEAR=c( 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 1964,
1965, 2002, 2003, 2004, 2005),
ID=c( 354 , 354 ,NA ,NA ,NA ,NA ,NA , 354 , 354 , 354 , 354 , 1342 ,
1342 , 859 ,NA ,NA , 859 ),
VALUE=c( 45 , 23 , 66 , 98 , 36 , 59 , 64 , 23 , 69 , 94 , 24 , 7 , 24 , 90 ,
93 , 53 , 98 ),
CUMMB=c( 45 , 68 , 134 , 232 , 268 , 327 , 391 , 414 , 483 , 577 , 601 , 7 , 31
, 90 , 183 , 236 , 334 ));
#
# start of solution: y is a temporary table I need to
# find the non-NA ID values per NAME value.
# I actually use max() but ASSuME that all non-NA value are ==
# !is.na() is used to remove NA values from consideration.
y <- dcast(florian[!is.na(florian$ID),],NAME ~
.,value.var="ID",fun.aggregate=max,fill=-1,drop=FALSE);
names(y) <- c("NAME","IDy"); # Nicer names that dcast() makes.
#
# The real work is done in merge()
florian <- merge(x=florian,y=y,by=c("NAME"),all.x=TRUE);
florian$ID <- florian$IDy; #copy merged values to ID column
florian$IDy <- NULL; # remove temporary column
rm(y); # erase temporary frame.
There is nothing more pleasant than traveling and meeting new people! Genghis Khan Maranatha! <>< John McKown