
R ignores number only with a nine under 10000

6 messages · R. Michael Weylandt, Jeff Newmiller, Dennis Murphy +2 more

set
#
Hello R users,

I'm trying to replace numerical values in a datamatrix with strings. R does
this except for numbers under 10000 starting with a 9 (e.g. 98, 970, 9504,
etc.). This is really weird, and I wondered whether someone has encountered
such a problem or knows the solution. I'm using the following script:

test_1 <- read.table("5+ref_151111clusters3.csv", header = TRUE, sep = ",",
colClasses = "numeric")
test_1[test_1 > 94885 & test_1 <= 113835] = "KE3926OT"
test_1[test_1 != 0 & test_1 <= 18954] = "I8456"
test_1[test_1 > 75944 & test_1 <= 94885] = "KE3873"
test_1[test_1 > 56951 & test_1 <= 75944] = "KE3870"
test_1[test_1 > 37991 & test_1 <= 56951] = "Cyprus1"
test_1[test_1 > 18954 & test_1 <= 37991] = "ref"
write.table(test_1, file = "test_replace7.txt", quote = FALSE, sep="\t") 

Thanks,
Set

#
This can't be reproduced without data -- kindly supply the result of
test_1 right after the first line using dput() if you would.
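
For readers unfamiliar with dput(), here is a minimal sketch of what is being asked for. The data frame below is a made-up stand-in for the poster's file (its column names and values are assumptions, not the real data).

```r
# Hypothetical stand-in for the data read from the CSV file
test_1 <- data.frame(a = c(0, 98, 20000), b = c(970, 9504, 100000))

# dput() prints R code that recreates the object;
# pasting that output into a message lets others reproduce the problem
dput(head(test_1))

# Round trip: capture the printed text and rebuild the object from it
rebuilt <- eval(parse(text = capture.output(dput(test_1))))
print(all.equal(rebuilt, test_1))
```

For a large file, dput(head(test_1)) is usually enough, provided the first rows include some of the values that misbehave.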

Michael
On Mon, Nov 21, 2011 at 10:42 AM, set <astareh at hotmail.com> wrote:
#
1) "datamatrix" is not a defined term. I think you mean "data.frame".

2) you have not supplied any sample data, so your example is not reproducible.

3) All of the values in a vector (i.e. a column of a data.frame) must be of the same type, be that character or numeric (or anything else, such as factor). We cannot tell what data you have in your file, but if you are already trying to mix numbers and strings, then the data are probably being imported as factors, which act like numbers in some cases and like strings in others. You might need to look at the arguments for read.table to turn off conversion to factor.
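
A quick way to check point 3 is to look at the class of every column right after import. The sketch below reads a small inline CSV (hypothetical content, not the poster's file) and uses the stringsAsFactors argument of read.table to keep text columns as character.

```r
# Hypothetical CSV content with one numeric and one text column
csv_text <- "id,label\n98,ref\n970,I8456\n"

# stringsAsFactors = FALSE keeps text columns as character vectors
# (this has been the default since R 4.0.0)
dat <- read.table(text = csv_text, header = TRUE, sep = ",",
                  stringsAsFactors = FALSE)

# Inspect the type of every column before doing any comparisons
col_classes <- sapply(dat, class)
print(col_classes)
```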
Jeff Newmiller
#
Hi:

Strictly a guess, but the following might be helpful. The call below
assumes that the referent data frame is test1, which consists of a
single column named x. Modify as appropriate.

test_lab <- with(test1, cut(x, c(0, 18954, 37991, 56951, 75944, 94885, 113835),
                 labels = c('I8456', 'ref', 'Cyprus1', 'KE3870', 'KE3873',
                            'KE3926OT')))

cut() creates a factor from a numeric variable. The second argument
consists of the cut points and the third argument generates the labels
to be associated with values falling between the cut points. See ?cut
for more details, and pay attention to the options.

The object test_lab is a vector external to test1; if you want it to
be a column of test1, then add it to the data frame in one of the
usual ways.
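
A runnable sketch of the cut() approach, with the break points taken from the ranges in the original script. The data frame test1 and its values are invented for illustration.

```r
# Hypothetical single-column data frame standing in for the real data
test1 <- data.frame(x = c(98, 970, 9504, 20000, 60000, 100000))

breaks <- c(0, 18954, 37991, 56951, 75944, 94885, 113835)
labs   <- c("I8456", "ref", "Cyprus1", "KE3870", "KE3873", "KE3926OT")

# Intervals are (lower, upper] by default, so a value of exactly 0 becomes NA
test1$label <- cut(test1$x, breaks = breaks, labels = labs)

# cut() returns a factor; convert it if plain strings are wanted
test1$label_chr <- as.character(test1$label)
print(test1)
```

Because cut() works on the numeric values directly, the numbers starting with 9 land in the lowest interval like any other value at or below 18954.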

HTH,
Dennis
On Mon, Nov 21, 2011 at 7:42 AM, set <astareh at hotmail.com> wrote:
#
I think others have already hinted at the problem, but here it is once
again more explicitly: your line

test_1[test_1 > 94885 & test_1 <= 113835] = "KE3926OT"

converts the entire test_1 to character (or at least the columns in
which a replacement happens). When something is a character, you will
find "strange" results:

a = "109"
b = "9"

a<b
[1] TRUE

Note that when one side of a comparison is numeric and the other
character, the numeric is converted to character and then they are
compared:

b = 9
class(a)
[1] "character"
class(b)
[1] "numeric"
a<b
[1] TRUE

This is why your entries starting with 9 are "ignored" - because as
character strings they are the largest.
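
To see the effect on the values from the original question, here is a small sketch using a plain vector instead of the data frame (the vector x and the variable names are made up for illustration).

```r
# Numbers compare by value, strings compare character by character
num_cmp   <- 970 <= 18954        # numeric comparison
str_cmp   <- "970" <= "18954"    # string comparison: "9" sorts after "1"

# Mixed comparison: the number is converted to a string first
mixed_cmp <- "970" <= 18954

print(c(numeric = num_cmp, string = str_cmp, mixed = mixed_cmp))

# A single character assignment turns a numeric vector into character
x <- c(98, 970, 9504, 100000)
x[x > 94885 & x <= 113835] <- "KE3926OT"
print(class(x))

# The next condition from the script no longer selects 98, 970 or 9504
sel <- x != 0 & x <= 18954
print(sel)
```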


The solution is simple: create a test_2 initialized to test_1:

test_2 = test_1

then replace elements in test_2 depending on test_1, for example

test_2[test_1 > 94885 & test_1 <= 113835] = "KE3926OT"

This way your test_1 remains numeric and the comparisons will work as you expect.
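
Put together, the whole replacement could look like the sketch below. The small data frame stands in for the contents of the CSV file (its columns and values are assumptions); the ranges and labels are those from the original script.

```r
# Hypothetical numeric data frame standing in for the CSV contents
test_1 <- data.frame(a = c(0, 98, 970, 9504),
                     b = c(20000, 40000, 60000, 100000))

# Work on a copy; test_1 itself is never modified and stays numeric
test_2 <- test_1

# Every condition is evaluated on the numeric test_1,
# every replacement is written into test_2
test_2[test_1 > 94885 & test_1 <= 113835] <- "KE3926OT"
test_2[test_1 != 0    & test_1 <= 18954]  <- "I8456"
test_2[test_1 > 75944 & test_1 <= 94885]  <- "KE3873"
test_2[test_1 > 56951 & test_1 <= 75944]  <- "KE3870"
test_2[test_1 > 37991 & test_1 <= 56951]  <- "Cyprus1"
test_2[test_1 > 18954 & test_1 <= 37991]  <- "ref"

print(test_2)
```

Values that match none of the ranges (here the 0) are left in place, but they end up as the string "0" because the column as a whole has become character.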

HTH

Peter