removing specified length of text after a period in dataframe of char's
Hi Sarah, this is a neat solution. Thanks very much for your help, and your patience with my poorly posed questions. I've learned a lot from your approach. best regards, Aidan
On Wed, Dec 7, 2011 at 1:40 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
Hi, If you really wanted precision (significant figures) rather than decimal places, it would be easy: format() handles that, I believe. Your original email said you'd been reading about regular expressions; continuing that reading will lead you to the meaning of the cryptic ^ and all the \. As for the final ., you're right: I didn't think about having nothing following the decimal place. It's much easier to do in two steps:
testdata <- data.frame(values=c("10,000.0", "5.321", "1.1"), digits=c(0, 1, 2))
intermediate <- apply(testdata, 1, function(x)sub(paste("(^.*\\.\\d{", x[2], "})(\\d*)", sep=""), "\\1", x[1]))
intermediate
[1] "10,000." "5.3"     "1.1"
sub("\\.$", "", intermediate)
[1] "10,000" "5.3"    "1.1"
Sarah
On Wed, Dec 7, 2011 at 8:20 AM, Aidan Corcoran <aidan.corcoran11 at gmail.com> wrote:
Hi Sarah,
apologies for the excess. A smaller example:
f<-structure(list(c("GDP per capita (LCU)", "Ratio to EZ GDP Per Cap"
), `2005` = c(32128, 0.1), `2009` = c(52163, 0.1), `2010` = c(63100,
0.1), `2011` = c(72461, 0.1), `2012` = c(81313, 0.1)), .Names = c("",
"2005", "2009", "2010", "2011", "2012"), row.names = 3:4, class = c("cast_df",
"data.frame"))
nam2<-
structure(list(var1 = c("GDP per capita (LCU)", "Ratio to EZ GDP Per Cap"
), digi = c(0, 1)), .Names = c("var1", "digi"), row.names = c("98",
"110"), class = "data.frame")
I'm trying to place a thousand separator in the numbers in the table f:
f
                             2005    2009    2010    2011    2012
3    GDP per capita (LCU) 32128.0 52163.0 63100.0 72461.0 81313.0
4 Ratio to EZ GDP Per Cap     0.1     0.1     0.1     0.1     0.1
and also have precision given by variable digi:
nam2
                       var1 digi
98     GDP per capita (LCU)    0
110 Ratio to EZ GDP Per Cap    1
Using format,
hi<-format(f,big.mark=",",scientific=F)
gives me the comma, but now I'm not sure how to get the precision. Your answer seems to be doing what I want, although when I changed the testdata slightly
testdata[1,1]<-10000
hi<-format(testdata,big.mark=",",scientific=F)
hi
    values digits
1 10,000.0      0
2      5.3      1
3      1.1      2
apply(hi, 1, function(x)sub(paste("(^.*\\.\\d{", x[2], "})(\\d*)", sep=""), "\\1", x[1]))
         1          2          3
 "10,000." "     5.3" "     1.1"
The decimal appears to be left behind in 10,000. Unfortunately your approach is a bit too advanced for me, so I can't adapt it. Perhaps you could recommend somewhere where I could read up on what the caret and other symbols mean in your paste call?
thanks for your help!
Aidan
On Wed, Dec 7, 2011 at 12:05 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
Hi, Example data is crucial, but small simple example data is even better. I'm too lazy to figure out which bits I need from your data, so here's a simple example of one way to approach your question. You could use gsub() in very much the same manner if you need more complex output.
testdata <- data.frame(values=c(2.0, 5.3, 1.1), digits=c(0, 1, 2))
testdata
  values digits
1    2.0      0
2    5.3      1
3    1.1      2
# a nice way that works on numbers
apply(testdata, 1, function(x)sprintf(paste("%0.", x[2], "f", sep=""), x[1]))
[1] "2"    "5.3"  "1.10"
# a messy way that works on strings
apply(testdata, 1, function(x)sub(paste("(^.*\\.\\d{", x[2], "})(\\d*)", sep=""), "\\1", x[1]))
[1] "2"   "5.3" "1.1"
Also note that the second method will not add zeros to pad out the end. If you need that, I'd consider rearranging the order of your steps so that you can use sprintf(). Someone else might have a more flexible way too; I'd be interested to see it. Unfortunately I don't think sprintf() has a way to insert a thousands separator, or that would be a one-step solution.
Sarah
On Wed, Dec 7, 2011 at 6:05 AM, Aidan Corcoran <aidan.corcoran11 at gmail.com> wrote:
Dear all,
I'm trying to remove some text after the period (a decimal point) in
the data frame 'hi', below. This is one step in formatting a table. So
I would like e.g.
"2.0" to become "2"
and "5.3" to be "5.3",
where the variable digordered contains the number of digits after the
decimal that I would like to display, in the same order in which the
variables appear in hi. If it makes it easier to use, this info is
also contained in the dataframe nam2. The reason the numbers are
recorded as characters is because I used format to get a thousand
separator, which I also need.
The string manipulation functions in R generally don't seem to work
with matrices or data frames, so e.g. regexpr("\\.", hi[1,2]) works
but not regexpr("\\.", hi). Finding the location of the period and
then using substring was the approach I was thinking of taking, but
this would seem to need for loops here. I was wondering if anyone
knows any easier ways.
Thanks very much for any help!
Aidan
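On the point above about regexpr() and data frames: regexpr() expects a character vector, and a data frame is a list of columns, so one loop-free option may be to call it column by column with sapply(). A minimal sketch, assuming a small made-up data frame (hi_small is a hypothetical stand-in for hi, not the real data):

```r
# hi_small is a hypothetical stand-in for the formatted data frame 'hi'
hi_small <- data.frame(a = c("2.0", "10,000.0"),
                       b = c("5.3", "7"),
                       stringsAsFactors = FALSE)

# regexpr() handles one character vector at a time, so apply it per column.
# fixed = TRUE treats "." as a literal period rather than a regex wildcard.
# The result is a matrix of positions of the first period; -1 means no match.
pos <- sapply(hi_small, function(col) as.integer(regexpr(".", col, fixed = TRUE)))
pos
```

The positions could then be passed to substring(), although the sub()-based replies elsewhere in this thread avoid the need to locate the period at all.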
digordered<- c(0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1)
f<-structure(list(c("GDP (LCU,bn)", "GDP ($, bn)", "GDP per capita (LCU)",
"Ratio to EZ GDP Per Cap", "Share of World GDP (Intl $, %)",
"Real GDP Growth (%)", "Population (mn)", "Unemployment Rate (%)",
"Ratio of Employed/Unemployed", "PPP Exchange Rate", "Nominal Exchange Rate (LCU per $)",
"Inflation (%)", "Main Lending Rate to Private Sector (%)", "Claims on Central Gov",
"Claims on Private Sector", "Bank Assets", "Regulator Capital to RWA",
"Tier 1 Capital to RWA", "Return on Equity", "Liquid Assets to ST Liabilities"
), `2005` = c(35662, 809, 32128, 0.1, 4.3, 9, 1110, 3.5, NA,
14.7, 44.1, 4, 10.8, 7, 15, 22835, NA, NA, NA, NA), `2009` = c(61240,
1265, 52163, 0.1, 5.2, 6.8, 1174, NA, NA, 16.8, 48.4, 10.9, 12.2,
14, 31, 47180, 13.6, 9, 10.8, 42.8), `2010` = c(75122, 1632,
63100, 0.1, 5.5, 10.1, 1191, NA, NA, 18.5, 45.7, 12, NA, 15,
39, 56787, 14.7, 9.9, 10.5, 41.1), `2011` = c(87455, 1843, 72461,
0.1, 5.7, 7.8, 1207, NA, NA, 19.6, NA, 10.6, NA, NA, NA, NA,
13.5, 9.3, 14.3, 35.8), `2012` = c(99459, 2013, 81313, 0.1, 5.9,
7.5, 1223, NA, NA, 20.5, NA, 8.6, NA, NA, NA, NA, NA, NA, NA,
NA)), .Names = c("", "2005", "2009", "2010", "2011", "2012"), row.names = c(NA,
20L), class = c("cast_df", "data.frame"))
hi<-format(f,big.mark=",",scientific=F)
regexpr("\\.", hi) #don't know how to get location of "." in a dataframe of chars
nam2<- structure(list(var1 = c("GDP (LCU,bn)", "GDP ($, bn)", "GDP per capita (LCU)",
"Ratio to EZ GDP Per Cap", "GDP per capita (Intl $)", "EU GDP per capita (Intl $)",
"Share of World GDP (Intl $, %)", "Real GDP Growth (%)", "Population (mn)",
"Unemployment Rate (%)", "Ratio of Employed/Unemployed", "Employment (1000s)",
"Unemployment (1000s)", "PPP Exchange Rate", "Nominal Exchange Rate (LCU per $)",
"Inflation (%)", "Main Lending Rate to Private Sector (%)", "Claims on Central Gov",
"Claims on Private Sector", "Bank Assets", "Regulator Capital to RWA",
"Tier 1 Capital to RWA", "Return on Equity", "Liquid Assets to ST Liabilities",
"Reserves"), digi = c(0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0,
1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0)), .Names = c("var1", "digi"
), row.names = c("96", "97", "98", "110", "99", "100", "101",
"102", "103", "111", "112", "104", "105", "106", "107", "108",
"109", "114", "115", "113", "119", "120", "121", "122", "116"
), class = "data.frame")
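For the formatting goal described in this thread (a thousands separator plus a per-row number of decimals), one possible one-step route is to format the numbers before they become strings, using formatC(), which accepts both digits and big.mark. A minimal sketch on made-up values (vals and digs are illustrative names, not objects from the thread). Note that, unlike the sub() approach, this rounds rather than truncates and pads with trailing zeros:

```r
vals <- c(10000, 5.321, 1.1)   # illustrative numbers, not the thread's data
digs <- c(0, 1, 2)             # decimals wanted for each value

# mapply() pairs each value with its own digit count;
# format = "f" requests fixed-point notation, big.mark adds the separator
out <- mapply(function(v, d) formatC(v, format = "f", digits = d, big.mark = ","),
              vals, digs)
out
```

The same function could be applied across the year columns of a table like f, with the digit counts taken from a vector such as digordered.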