Replacing values
On Mon, 2006-12-18 at 10:58 -0800, downunder wrote:
Hi all, I have to recode some values in a dataset, for example changing all zeros to "." (999 would also be OK). Does anybody know how to do this? Thanks in advance. Lars
R has its own missing value designator, which is NA. A "." or "999" would not be handled in a consistent fashion by most R functions, whereas NA would be. As you will note below, "." would be rejected in numerical operations. For example (see ?mean):
mean(c(1, 2, 3, 0))
[1] 1.5
mean(c(1, 2, 3, NA))
[1] NA
mean(c(1, 2, 3, NA), na.rm = TRUE)
[1] 2
mean(c(1, 2, 3, .), na.rm = TRUE)
Error in mean(c(1, 2, 3, .), na.rm = TRUE) : object "." not found
mean(c(1, 2, 3, 999), na.rm = TRUE)
[1] 251.25

See ?NA and ?is.na and take note of the assignment usage in the latter.

To provide some examples:

1. Vector

Vec <- sample(0:5, 10, replace = TRUE)
Vec
[1] 5 3 4 5 1 4 4 0 1 0
is.na(Vec) <- Vec == 0
Vec
[1]  5  3  4  5  1  4  4 NA  1 NA

2. Matrix

Mat <- matrix(sample(0:5, 20, replace = TRUE), ncol = 4)
Mat
     [,1] [,2] [,3] [,4]
[1,]    4    4    1    4
[2,]    3    1    1    3
[3,]    3    0    1    0
[4,]    2    2    0    5
[5,]    4    0    5    1
is.na(Mat) <- Mat == 0
Mat
     [,1] [,2] [,3] [,4]
[1,]    4    4    1    4
[2,]    3    1    1    3
[3,]    3   NA    1   NA
[4,]    2    2   NA    5
[5,]    4   NA    5    1

3. Dataframe

iris.tmp <- iris[1:10, ]
iris.tmp
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
iris.tmp$Sepal.Length[sample(10, 3)] <- 0
iris.tmp$Sepal.Width[sample(10, 3)] <- 0
iris.tmp$Petal.Length[sample(10, 3)] <- 0
iris.tmp$Petal.Width[sample(10, 3)] <- 0
iris.tmp
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         0.0          0.0         0.2  setosa
2           4.9         0.0          1.4         0.2  setosa
3           4.7         0.0          1.3         0.0  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.0  setosa
6           5.4         3.9          0.0         0.0  setosa
7           0.0         3.4          1.4         0.3  setosa
8           0.0         3.4          0.0         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          0.0         3.1          1.5         0.1  setosa
is.na(iris.tmp) <- iris.tmp == 0
iris.tmp
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1          NA           NA         0.2  setosa
2           4.9          NA          1.4         0.2  setosa
3           4.7          NA          1.3          NA  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4          NA  setosa
6           5.4         3.9           NA          NA  setosa
7            NA         3.4          1.4         0.3  setosa
8            NA         3.4           NA         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10           NA         3.1          1.5         0.1  setosa
summary(iris.tmp)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
 Min.   :4.400   Min.   :2.900   Min.   :1.300   Min.   :0.1
 1st Qu.:4.650   1st Qu.:3.100   1st Qu.:1.400   1st Qu.:0.2
 Median :4.900   Median :3.400   Median :1.400   Median :0.2
 Mean   :4.871   Mean   :3.343   Mean   :1.414   Mean   :0.2
 3rd Qu.:5.050   3rd Qu.:3.500   3rd Qu.:1.450   3rd Qu.:0.2
 Max.   :5.400   Max.   :3.900   Max.   :1.500   Max.   :0.3
 NA's   :3.000   NA's   :3.000   NA's   :3.000   NA's   :3.0
       Species
 setosa    :10
 versicolor: 0
 virginica : 0
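To double-check a recoding like the ones above, it can help to count and locate the resulting NAs. The following is only a minimal sketch on a made-up data frame: the names df, a, b and na.counts are illustrative assumptions and not part of the examples above.

```r
# Hypothetical toy data frame; names and values are illustrative only
df <- data.frame(a = c(1, 0, 3, 0), b = c(0, 2, 5, 7))

# Same idiom as above: flag every zero as missing
is.na(df) <- df == 0

# Number of NAs in each column
na.counts <- colSums(is.na(df))
na.counts

# Row and column positions of the NAs
which(is.na(df), arr.ind = TRUE)
```

See ?colSums and ?which for the details of these two functions.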
If you want a more generic approach to replacing values based upon
logical conditions, there is also the replace() function:
iris.tmp$Sepal.Length <- with(iris.tmp,
                              replace(Sepal.Length,
                                      Sepal.Length > 5.0, 999))
iris.tmp
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1         999.0          NA           NA         0.2  setosa
2           4.9          NA          1.4         0.2  setosa
3           4.7          NA          1.3          NA  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4          NA  setosa
6         999.0         3.9           NA          NA  setosa
7            NA         3.4          1.4         0.3  setosa
8            NA         3.4           NA         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10           NA         3.1          1.5         0.1  setosa

See ?replace for more information, and note that the replacement does not happen "in place": replace() returns a modified copy, so you need to assign the result.

Finally, if you are reading in data sets from ASCII files using one of the read.table() family of functions, take note of the 'na.strings' argument, which defines the incoming values that you want to explicitly set to missing (NA) during the import process. See ?read.table for more information.

HTH,

Marc Schwartz
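
As a minimal sketch of the 'na.strings' point above: the inline data, the column names (id, score, weight) and the object names raw and dat are all made up for illustration, and the 'text' argument of read.table() is used here only to avoid needing a real file.

```r
# Hypothetical file contents, supplied inline instead of via a file name
raw <- "id score weight
1 10 .
2 999 55.5
3 7 60.1"

# Treat '.', '999' and the default 'NA' as missing while importing
dat <- read.table(text = raw, header = TRUE,
                  na.strings = c("NA", ".", "999"))
dat

# Number of NAs in each column after the import
colSums(is.na(dat))
```

Note that 'na.strings' applies to every column, so a code such as 999 should only be listed there if it never occurs as a legitimate value.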