creating multiple new variables from one data.frame according to columns and rows in that frame
My guess is that we are being affected by FAQ 7.31 (good old floating-point numbers). The test 'age %in% 5:50' requires an exact match, so it can fail for ages that differ from a whole year by round-off error. Something like the following might be better:

age < 5 | (abs(age - round(age)) < 0.001)

This should give TRUE for all ages below 5 and for all ages that are 'close' to a whole year. Take a look at your data where you think values might be missing and set 'options(digits=20)' to print out the full values.
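To make the round-off point concrete, here is a minimal sketch comparing the exact test with the tolerance-based one. The ages are made up for illustration (two of them are deliberately perturbed by 1e-12); they are not taken from Daniel's data, and the 0.001 tolerance is simply the value suggested above.

```r
# Made-up ages: 9 and 11 are deliberately off by a tiny amount,
# mimicking the kind of round-off error described in FAQ 7.31.
age <- c(4.5, 5, 5.5, 9 + 1e-12, 11 - 1e-12, 20)

# Exact matching: the two near-integers are silently dropped.
exact <- age < 5 | age %in% 5:50

# Tolerance-based matching: keeps anything within 0.001 of a whole year.
tolerant <- age < 5 | (abs(age - round(age)) < 0.001)

print(data.frame(age, exact, tolerant))

# Default printing hides the discrepancy; asking for more digits reveals it.
print(age[4])
print(age[4], digits = 20)
```

If the real ages were built by repeatedly adding 1/12, errors of this kind could plausibly accumulate, but that is an assumption about how the column was generated.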
On Wed, Nov 4, 2009 at 8:03 AM, Hayes, Daniel <D.J.Hayes at liverpool.ac.uk> wrote:
Jim Holtman,

Thank you for your reply. Your script is very concise and I think it could help me. However, when I run it on my real data object (musigma.lat.m), the age range from 5-50 skips certain full years (see script below). I am not sure why that is, and no error is given. Hoping you can help.

Thank you in advance for your time and energy.

All the best,
Daniel
dput(musigma.lat.m[580:620,])
structure(list(age = c(48.25, 48.3333333333333, 48.4166666666667,
48.5, 48.5833333333333, 48.6666666666667, 48.75, 48.8333333333333,
48.9166666666667, 49, 49.0833333333333, 49.1666666666667, 49.25,
49.3333333333333, 49.4166666666667, 49.5, 49.5833333333333, 49.6666666666667,
49.75, 49.8333333333333, 49.9166666666667, 50, 0, 0.0833333333333333,
0.166666666666667, 0.25, 0.333333333333333, 0.416666666666667,
0.5, 0.583333333333333, 0.666666666666667, 0.75, 0.833333333333333,
0.916666666666667, 1, 1.08333333333333, 1.16666666666667, 1.25,
1.33333333333333, 1.41666666666667, 1.5), country = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Bolivia", "Brazil",
"Colombia", "Dominican Rep.", "El Salvador", "Guatemala", "Guyana",
"Haiti", "Honduras", "Nicaragua", "Paraguay", "Peru", "Suriname"
), class = "factor"), mu = c(10.7198320154036, 10.7193221119285,
10.7188036231439, 10.7182764259851, 10.7177406001273, 10.7171962535812,
10.7166435245754, 10.7160826629999, 10.7155141252060, 10.7149385933270,
10.7143568116012, 10.7137696820872, 10.7131779280271, 10.7125822168258,
10.7119832145823, 10.7113816139594, 10.7107780960397, 10.7101732860418,
10.7095677728307, 10.7089620284128, 10.7083564497153, 10.7077512971194,
11.548875536071, 11.4634458099448, 11.4113675486745, 11.3384424250672,
11.2435706626324, 11.1313969585720, 11.0086560681222, 10.8827443523793,
10.7598371816865, 10.6440424747848, 10.5382165128003, 10.4442220905656,
10.3633207905823, 10.2961499250469, 10.2427320635721, 10.2025802100475,
10.1749531325293, 10.1590477762319, 10.1540156426321), sigma = c(0.0947487228789027,
0.0947760295260326, 0.0948033853581562, 0.0948307832769866, 0.094858216728106,
0.0948856796527442, 0.0949131660004063, 0.0949406718763748, 0.0949681949273155,
0.0949957322607503, 0.0950232806230888, 0.095050836445582, 0.0950783958990592,
0.0951059550037287, 0.0951335102859937, 0.0951610590705954, 0.0951885984623664,
0.0952161256367413, 0.0952436392777666, 0.0952711384472643, 0.0952986226318235,
0.0953260918098295, 0.108394172852678, 0.112555919942990, 0.114345649992535,
0.115763779372203, 0.116984886895669, 0.118065092089138, 0.119029362771532,
0.119887968678076, 0.120638553936562, 0.121278180095107, 0.121810743569063,
0.122245010348365, 0.122590801228219, 0.122858869689557, 0.123059216409329,
0.123199542683827, 0.123286339009648, 0.123324768295488, 0.123319375423601
)), .Names = c("age", "country", "mu", "sigma"), row.names = c("580",
"581", "582", "583", "584", "585", "586", "587", "588", "589",
"590", "591", "592", "593", "594", "595", "596", "597", "598",
"599", "600", "601", "602", "603", "604", "605", "606", "607",
"608", "609", "610", "611", "612", "613", "614", "615", "616",
"617", "618", "619", "620"), class = "data.frame")
result <- lapply(split(musigma.lat.m, musigma.lat.m$country), function(.ctry){
    # keep all < 5 and only integers over 5
    subset(.ctry, .ctry$age < 5 | .ctry$age %in% 5:50)
})
result
$Bolivia
            age country       mu      sigma
1    0.00000000 Bolivia 11.42168 0.10148719
2    0.08333333 Bolivia 11.33625 0.10538375
3    0.16666667 Bolivia 11.28417 0.10705943
4    0.25000000 Bolivia 11.21125 0.10838720
5    0.33333333 Bolivia 11.11637 0.10953050
...
59   4.83333333 Bolivia 10.49080 0.10671819
60   4.91666667 Bolivia 10.48562 0.10653400
109  9.00000000 Bolivia 10.43279 0.10180158
133 11.00000000 Bolivia 10.33394 0.10160484
169 14.00000000 Bolivia 10.24878 0.09946659
193 16.00000000 Bolivia 10.20148 0.09694376
205 17.00000000 Bolivia 10.16589 0.09573946

$Brazil
            age country       mu      sigma
602  0.00000000  Brazil 11.54888 0.10839417
603  0.08333333  Brazil 11.46345 0.11255592
604  0.16666667  Brazil 11.41137 0.11434565
605  0.25000000  Brazil 11.33844 0.11576378
...
660  4.83333333  Brazil 10.61799 0.11398118
661  4.91666667  Brazil 10.61281 0.11378445
710  9.00000000  Brazil 10.55999 0.10872996
734 11.00000000  Brazil 10.46113 0.10851983
770 14.00000000  Brazil 10.37597 0.10623606
794 16.00000000  Brazil 10.32867 0.10354153

-----Original Message-----
From: jim holtman [mailto:jholtman at gmail.com]
Sent: 04 November 2009 03:12
To: Hayes, Daniel
Cc: r-help at lists.R-project.org
Subject: Re: [R] creating multiple new variables from one data.frame according to columns and rows in that frame

try this:
x <- read.table(textConnection("    Age(yrs) country       mu     sigma
1   0.00000000   Bolivia 11.42168 0.1014872
2   0.08333333   Bolivia 11.33625 0.1053837
3   0.16666667   Bolivia 11.28417 0.1070594
4   0.25000000   Bolivia 11.21125 0.1083872
5   0.33333333   Bolivia 11.11637 0.1095305
5.1   5   Bolivia 11.11637 0.1095305
5.2   5.5   Bolivia 11.11637 0.1095305
5.3   6   Bolivia 11.11637 0.1095305
5.4   20   Bolivia 11.11637 0.1095305
5.5   20.1   Bolivia 11.11637 0.1095305
5.6   50   Bolivia 11.11637 0.1095305
602  0.00000000  Brazil 11.54888 0.10839417
603  0.08333333  Brazil 11.46345 0.11255592
604  0.16666667  Brazil 11.41137 0.11434565
605  0.25000000  Brazil 11.33844 0.11576378
606  0.33333333  Brazil 11.24357 0.11698489"), header=TRUE)
closeAllConnections()
result <- lapply(split(x, x$country), function(.ctry){
    # keep all < 5 and only integers over 5
    subset(.ctry, .ctry$Age.yrs. < 5 | .ctry$Age.yrs. %in% 5:50)
})
result
$Bolivia
       Age.yrs. country       mu     sigma
1    0.00000000 Bolivia 11.42168 0.1014872
2    0.08333333 Bolivia 11.33625 0.1053837
3    0.16666667 Bolivia 11.28417 0.1070594
4    0.25000000 Bolivia 11.21125 0.1083872
5    0.33333333 Bolivia 11.11637 0.1095305
5.1  5.00000000 Bolivia 11.11637 0.1095305
5.3  6.00000000 Bolivia 11.11637 0.1095305
5.4 20.00000000 Bolivia 11.11637 0.1095305
5.6 50.00000000 Bolivia 11.11637 0.1095305

$Brazil
      Age.yrs. country       mu     sigma
602 0.00000000  Brazil 11.54888 0.1083942
603 0.08333333  Brazil 11.46345 0.1125559
604 0.16666667  Brazil 11.41137 0.1143456
605 0.25000000  Brazil 11.33844 0.1157638
606 0.33333333  Brazil 11.24357 0.1169849

On Tue, Nov 3, 2009 at 9:31 AM, Hayes, Daniel <D.J.Hayes at liverpool.ac.uk> wrote:
Dear R-helpers,

I have a data.frame (bcpe.lat.m) containing 13 countries, ages 0-50 yrs per month, and the corresponding mu & sigma (see below).

* I would like to limit the age range to include all 12 months for the first 5 years and only whole years for all ages thereafter, for each of the countries present in the data frame.
* I would like to create separate data.frames according to the country the data is from (Bolivia.bcpe.lat.m, brazil.bcpe.lat.m, etc).

I have tried using:

c(seq(0,5,1/12),seq(5,50,1) )

to select the desired ages but am unsure how to repeat that sequence for consecutive countries.

I have tried using:

split(bcpe.lat.m, bcpe.lat.m$country)

but end up with a string from which I am no longer able to select the specific ages I want, and all the data still remains in one variable.

I have also looked at 'by', 'apply' and things like 'for (i in 1:13)'.

Help with either or both steps would be greatly appreciated.

Greetings from Formentera,
Daniel

      Age(yrs) country       mu      sigma
1   0.00000000 Bolivia 11.42168 0.1014872
2   0.08333333 Bolivia 11.33625 0.1053837
3   0.16666667 Bolivia 11.28417 0.1070594
4   0.25000000 Bolivia 11.21125 0.1083872
5   0.33333333 Bolivia 11.11637 0.1095305
...
602 0.00000000  Brazil 11.54888 0.10839417
603 0.08333333  Brazil 11.46345 0.11255592
604 0.16666667  Brazil 11.41137 0.11434565
605 0.25000000  Brazil 11.33844 0.11576378
606 0.33333333  Brazil 11.24357 0.11698489
...
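The two steps requested above (thin the ages, then separate by country) can be sketched together as follows. This is only an illustration under stated assumptions: `toy` is a hypothetical stand-in for the real data frame with placeholder mu and sigma values, and `keep_age` is a helper name introduced here, not something from the thread.

```r
# Hypothetical stand-in for the real data: two countries, monthly ages 0-50.
months <- 0:600
toy <- data.frame(
  age     = rep(months / 12, times = 2),
  country = rep(c("Bolivia", "Brazil"), each = length(months)),
  mu      = 10,    # placeholder, not a real estimate
  sigma   = 0.1    # placeholder, not a real estimate
)

# Hypothetical helper: keep every month below age 5, and from age 5 on
# only ages within `tol` of a whole year (tolerant of round-off error).
keep_age <- function(age, tol = 0.001) {
  age < 5 | (abs(age - round(age)) < tol)
}

# split() returns a named list with one data frame per country;
# lapply() applies the same age filter to each of them.
by_country <- lapply(split(toy, toy$country), function(ctry) {
  ctry[keep_age(ctry$age), ]
})

names(by_country)
nrow(by_country$Bolivia)
```

Each country's data frame is then available as an element of the list, for example `by_country$Bolivia` or `by_country[["Brazil"]]`, which is usually easier to loop over than 13 separately named variables.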
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?