
= vs. ==?

7 messages · Linn, Gábor Csárdi, Pedro de Barros +3 more

#
I'm sure you'll get a friendlier answer, but...  see

?"="
?"=="
Introduction to R

G.
On Tue, Apr 15, 2008 at 05:28:53AM -0700, Linn wrote:

  
    
#
Hi.

= means assignment (as in a=2, which may be used instead of a <- 2, although 
I prefer to always use <-). It is also used to pass values to 
arguments in named argument lists, or to set default argument values.

== is the boolean (logical) operator for testing whether two values are equal,
e.g. if

a <- 2
b <- 2

then a == b is TRUE
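
As a rough sketch of the uses described above (the function scale_by() and its argument names are made up for this illustration, they are not part of R):

```r
# Assignment: at top level both forms bind the value 2 to a name.
a <- 2
b = 2

# Comparison: == tests equality and returns a logical value.
a == b
# [1] TRUE

# Default argument value (in the definition) and named argument (in the call).
# scale_by() is a hypothetical helper, not part of R.
scale_by <- function(x, factor = 10) x * factor
scale_by(3)              # uses the default factor = 10
# [1] 30
scale_by(3, factor = 2)  # passes factor by name
# [1] 6
```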

Hope this helps,
Pedro
At 13:28 2008/04/15, Linn wrote:

            
#
On 15-Apr-08 12:28:53, Linn wrote:
While these are indeed documented in ?"=" and ?"==", as
Gabor Csardi has pointed out, these particular help pages
(especially ?"=") devote so much attention to deep issues
in the implementation of R that they are unlikely to give
much to a newcomer to R. (Though ?"==" is not too bad).

Putting it simply:

"==" is a comparison operator. If 'x' and 'y' are two
variables of the same type, then "x==y" has value TRUE
if 'x' and 'y' have the same value.

There are a couple of traps here, which even beginners
should take care to be aware of.

One is that "NA" is not a value. Its logical status is,
in effect, "value not known". Therefore, when 'y' is "NA",
 "x==y" cannot have a definite resolution, since it is
possible for the unknown value of 'y' to be equal to the
value of 'x'; and equally possible that it may not be.
Hence the value of "x==y" is itself "NA". Similarly
the value of "x==y" is "NA" when both of 'x' and 'y'
are "NA". The function to use for testing whether (say)
'x' is "NA" is is.na(x).
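
A small sketch of the NA behaviour just described (the variable names are arbitrary):

```r
x <- 1
y <- NA

x == y        # unknown, so the result is NA rather than TRUE or FALSE
# [1] NA
NA == NA      # two unknowns: still NA
# [1] NA
is.na(y)      # the way to ask "is y missing?"
# [1] TRUE

# In a vector, == propagates NA element by element.
v <- c(1, NA, 3)
v == 1
# [1]  TRUE    NA FALSE
which(v == 1) # which() keeps only the positions that are TRUE
# [1] 1
```

One practical consequence: a test such as if (x == y) stops with an error when the comparison is NA, so it is usually worth checking is.na() first.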

The other is that the comparison of two floating-point
numbers which (mathematically) should be equal may be
FALSE, since their internal binary representations may
be different. Floating-point arithmetic in fixed-precision
computers is almost always approximate (though, in R,
to a very close degree of approximation). Thus, for instance,

  x <- sqrt(2)
  x^2 == 2
# [1] FALSE

and the reason for this is

  2 - sqrt(2)^2
# [1] -4.440892e-16

But, as pointed out in ?"==", a better test for this kind
of "equality" is the function all.equal():

  all.equal(x^2,2)
# [1] TRUE

since all.equal(x,y) considers x and y to be "equal" if
the numerical values corresponding to their representations
do not differ by more than a certain "tolerance" which
has a default value, but can be changed by the user.
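
To make the "tolerance" concrete, here is a minimal sketch. The numbers are chosen only for illustration; the default tolerance for numeric comparison is roughly 1.5e-8.

```r
x <- sqrt(2)

# all.equal() returns TRUE or a character description of the difference,
# so wrap it in isTRUE() before using it in if().
isTRUE(all.equal(x^2, 2))
# [1] TRUE

# A difference of 1e-10 is inside the default tolerance ...
isTRUE(all.equal(1, 1 + 1e-10))
# [1] TRUE
# ... but outside a tighter one,
isTRUE(all.equal(1, 1 + 1e-10, tolerance = 1e-12))
# [1] FALSE
# and a looser tolerance accepts a much larger difference.
isTRUE(all.equal(1, 1.001, tolerance = 0.01))
# [1] TRUE
```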

So much for "==". Where "=" is concerned, it functions
rather like an assignment, but with complications. All that
incomprehensible stuff in ?"=" has to do with the complications.

In R, use "<-" rather than "=" for assigning a value to
a variable. Using "=" works at the top level, but inside
a function call it is read as naming an argument rather
than as an assignment. So write "x <- sqrt(2)", as
above, rather than "x = sqrt(2)" -- though in this case
the latter works as you would expect:

  y=sqrt(2)
  x==y
# [1] TRUE

In programming in R, it is a useful rule of thumb to
think "use something I know will work" rather than
thinking "use something which will work unless ... ";
unless, of course, you know all about those "..."!
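
One of the places where "=" and "<-" part company is inside a function call. The sketch below assumes a fresh session (no existing variables called vals or nums), and total() is a made-up helper, not an R function:

```r
total <- function(vals) sum(vals)   # hypothetical helper for this sketch

total(vals = 1:3)                   # = here only names the argument
# [1] 6
before <- exists("vals", inherits = FALSE)
before                              # no variable called vals was created
# [1] FALSE

total(vals <- 1:3)                  # <- assigns first, then passes the value
# [1] 6
vals
# [1] 1 2 3

# With a name the function does not know, = is an error; <- still assigns.
res <- tryCatch(total(nums = 1:3), error = function(e) "error")
res
# [1] "error"
total(nums <- 1:3)
# [1] 6
```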

Where you will routinely use "=" is in naming elements
of lists and dataframes, and in assigning values to named
parameters in functions.
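
A short sketch of these routine uses of "=" (the names pt and df and the values are invented for the example):

```r
# Inside list(), = attaches a name to each element.
pt <- list(x = 1.5, y = -2, label = "P")
names(pt)
# [1] "x"     "y"     "label"
pt$label
# [1] "P"

# The same in data.frame(): = names the columns.
df <- data.frame(Indep = c(1, 2, 3), Depend = c(2, 4, 6))
names(df)
# [1] "Indep"  "Depend"

# In a function call, = matches a value to a named parameter,
# so the order of the arguments no longer matters.
seq(from = 2, to = 10, by = 4)
# [1]  2  6 10
seq(by = 4, to = 10, from = 2)
# [1]  2  6 10
```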

Thus, if you already have vectors X and Y and you want
to make a dataframe in which you want X to play the role
of the "independent variable" in a subsequent regression,
and Y the role of the "dependent variable", then you
could write

 MyData <- data.frame(Indep=X, Depend=Y)

and then, later, execute the linear modelling function lm()
in the form:

 lm(Depend ~ Indep, data=MyData)

This executes lm() using what it finds in "MyData" with
name "Depend" as the dependent variable, and what it
finds in "MyData" with name "Indep" as the independent
variable.

This lm() call, in turn, illustrates the other typical
use of "=" in assigning a value to a parameter in a
function call, since the lm() function has a parameter
called "data", and "data=MyData" then tells it which
dataframe to use as the parameter "data" in this call
of lm().

Not that you necessarily *have* to do it that way,
of course, since often you may simply write

  lm(Y ~ X)

without reference to a dataframe, just referring to
variables you happen to have around at the time. But
the lm(...,data=...) form is useful in two kinds of
context: one, where the data come to you as a dataframe
in the first place, and it then saves you explicitly
extracting the variables from the dataframe; the other,
where the call (e.g., as above)

  lm(Depend ~ Indep, data=MyData)

is in some "generic" part of your code, and you do not
want to change it. Then it makes sense to change the
contents of "MyData", but keeping the names "Depend"
and "Indep", so that whatever you actually put in as
X and Y will be used in the same way.
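
A sketch of that second use, with made-up numbers throughout (X, Y, U and V are purely illustrative): the lm() line stays the same while the contents of MyData change.

```r
# First data set: Y is exactly 2*X + 1.
X <- c(1, 2, 3, 4, 5)
Y <- 2 * X + 1
MyData <- data.frame(Indep = X, Depend = Y)
fit1 <- lm(Depend ~ Indep, data = MyData)
coef(fit1)   # intercept close to 1, slope close to 2

# Second data set under the same column names; the lm() call is unchanged.
U <- c(10, 20, 30, 40)
V <- c(5, 3, 2, 0)
MyData <- data.frame(Indep = U, Depend = V)
fit2 <- lm(Depend ~ Indep, data = MyData)
coef(fit2)   # a negative slope this time
```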

Hoping this helps!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 15-Apr-08                                       Time: 14:36:30
------------------------------ XFMail ------------------------------
#
Linn wrote:
"=" is the assignment operator as in

 > x = 3
 > x
[1] 3

(but use the "<-" operator instead, see a post by Bill Venables a few 
days ago called ``What to use for assignment, " = " or " <- "?'')

whereas "==" is the binary operator for testing equality, as in

 > x == 3
[1] TRUE

Have a look at any introductory R book or at the R intro at 
http://cran.r-project.org/doc/manuals/R-intro.html
#
Ted Harding wrote:
Just as an off-topic tangent, I found it quite interesting
that the real-world language Aymara (see ...
http://en.wikipedia.org/wiki/Aymara_language
...) uses this three-valued logic system (I think the
computer jargon is "trollean logic").
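
For what it is worth, R's logical operators treat NA in this three-valued way; a quick sketch:

```r
NA & FALSE   # FALSE whatever the unknown value turns out to be
# [1] FALSE
NA & TRUE    # depends on the unknown value, so it stays unknown
# [1] NA
NA | TRUE    # TRUE whatever the unknown value turns out to be
# [1] TRUE
NA | FALSE
# [1] NA
!NA
# [1] NA
```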

Alberto Monteiro
#
On 16-Apr-08 11:13:40, Alberto Monteiro wrote:
Equally interesting (and equally OT) to us as people concerned
with information and evidence is that some real-world languages
have a specific provision for expressing the evidential status
of the information they express. It seems that this has only
relatively recently been formalised by linguists under the
heading of Evidentiality.

For example, Turkish has the suffix [transcribed] mish/mush
to indicate whether what is stated is known directly or
indirectly.

Thus, at the Department Meeting we are all waiting for
X to turn up, so that we can start. Finally, I phone his
home. His wife answers. I ask about X. She says "Geliyor"
[He is coming]. She knows this because she has just seen
him leave the house.

I turn round to report this: I say "Geliyormush", because
I don't know it. What I do know is that I have been told it.

For more: http://en.wikipedia.org/wiki/Evidentiality

In the same vein, you might tell me what I regard as a
tall tale. I could respond "Mush!", with meaning on the
lines (in escalating order) of: "So you say!", "Pull the
other one!", "Come off it!", ...

Best wishes to all,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 16-Apr-08                                       Time: 15:38:24
------------------------------ XFMail ------------------------------