Problems with reading data by readWorksheetFromFile of XLConnect Package
On May 2, 2013, at 11:00 PM, jpm miao wrote:
Hi Anthony, Thank you very much. It works very well. However, after this line
temp <- sapply( temp , as.numeric )
the data becomes a series of numbers instead of a matrix. Is there any way to keep it a matrix?
Perhaps (assuming this were a data.frame to be coerced: temp <- matrix( sapply( temp , as.numeric ), dim(temp)[1]) But the persistence of the "-"'s is puzzling. You should (as always) have posted the output from dput(temp). Thanks,
Miao
temp<-readWorksheetFromFile("130502temp.xlsx", sheet=1, header=FALSE,
startRow=2, endRow= 11, startCol=2, endCol=5)
temp <- sapply( temp , function( x ) gsub( ',' , '' , x ) ) temp
Col1 Col2 Col3 Col4 [1,] "647853" "1413" "57662" "27897" [2,] "491400" "1365" "40919" "20411" [3,] "38604" "-" "5505" "985" [4,] "576" "-" "20" "54" [5,] "80845" "21" "10211" "4494" [6,] "36428" "27" "1007" "1953" [7,] "269915" "587" "32988" "12779" [8,] "224494" "-" "30554" "9184" [9,] "11858" "587" "-" "686" [10,] "3742" "-" "81" "415"
temp <- sapply( temp , as.numeric )
Warning messages: 1: In lapply(X = X, FUN = FUN, ...) : NAs introduced by coercion 2: In lapply(X = X, FUN = FUN, ...) : NAs introduced by coercion 3: In lapply(X = X, FUN = FUN, ...) : NAs introduced by coercion 4: In lapply(X = X, FUN = FUN, ...) : NAs introduced by coercion 5: In lapply(X = X, FUN = FUN, ...) : NAs introduced by coercion
temp
647853 491400 38604 576 80845 36428 269915
647853 491400 38604 576 80845 36428 269915
224494 11858 3742 1413 1365 - -
224494 11858 3742 1413 1365 NA NA
21 27 587 - 587 - 57662
21 27 587 NA 587 NA 57662
40919 5505 20 10211 1007 32988 30554
40919 5505 20 10211 1007 32988 30554
- 81 27897 20411 985 54 4494
NA 81 27897 20411 985 54 4494
1953 12779 9184 686 415
1953 12779 9184 686 415
temp[ is.na( temp ) ] <- 0 temp
647853 491400 38604 576 80845 36428 269915
647853 491400 38604 576 80845 36428 269915
224494 11858 3742 1413 1365 - -
224494 11858 3742 1413 1365 0 0
21 27 587 - 587 - 57662
21 27 587 0 587 0 57662
40919 5505 20 10211 1007 32988 30554
40919 5505 20 10211 1007 32988 30554
- 81 27897 20411 985 54 4494
0 81 27897 20411 985 54 4494
1953 12779 9184 686 415
1953 12779 9184 686 415
2013/5/2 Anthony Damico <ajdamico at gmail.com>
try adding colTypes = 'numeric' to your readWorkSheetFromFile() call if that doesn't work, try a few other steps # view what data types your file is being read in as sapply( temp , class ) # convert all fields to character if they're factor variables.. but i don't think you need this, readWorksheet defaults to `character` temp <- sapply( temp , as.character ) # you can also convert a subset like this temp[ , c( 1 , 3:4 ) ] <- sapply( temp[ , c( 1 , 3:4 ) ] , as.character ) # remove commas from character strings temp <- sapply( temp , function( x ) gsub( ',' , '' , x ) ) # convert all fields to numeric temp <- sapply( temp , as.numeric ) # convert all NA fields to zeroes if you prefer temp[ is.na( temp ) ] <- 0 On Wed, May 1, 2013 at 11:55 PM, jpm miao <miaojpm at gmail.com> wrote:
Hi, Attached are two datasheet to be read. My raw data "130502temp.xlsx" contains numbers with ' symbols, and they can't be read as numbers. Even if I copy and paste as numbers to form a new file "130502temp_number1.xlsx", they could not be read smoothly. 1. How can I read the datasheet as numbers? 2. How can I treat the notation "-" as (1) "NA" or (2) zero? Thanks, Miao
temp<-readWorksheetFromFile("130502temp.xlsx", sheet=1, header=FALSE,
startRow=2, endRow= 11, startCol=2, endCol=5)
temp
Col1 Col2 Col3 Col4
1 647,853 1,413 57,662 27,897
2 491,400 1,365 40,919 20,411
3 38,604 - 5,505 985
4 576 - 20 54
5 80,845 21 10,211 4,494
6 36,428 27 1,007 1,953
7 269,915 587 32,988 12,779
8 224,494 - 30,554 9,184
9 11,858 587 - 686
10 3,742 - 81 415
temp[2,2]
[1] "1,365"
temp[2,2]+3
Error in temp[2, 2] + 3 : non-numeric argument to binary operator
temp_num<-readWorksheetFromFile("130502temp_number1.xlsx", sheet=1,
header=FALSE, startRow=2, endRow= 11, startCol=2, endCol=5)
temp_num[2,2]
[1] "1,365"
temp_num[2,2]+3
Error in temp_num[2, 2] + 3 : non-numeric argument to binary operator
as.numeric(temp_num[2,2])+3
[1] NA Warning message: NAs introduced by coercion
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
David Winsemius Alameda, CA, USA