Importing data using Foreign

9 messages · Elham Daadmehr, Eric Berger, Peter Dalgaard

#
Hi all,

I have a simple problem: I am stuck using SPSS data (.sav) that I imported
with "read.spss".
I imported the data (z) without any problem. After importing, the first column
doesn't contain any NA, but when I take a subset of it (like:
z[z[,8]=="11"|z[,8]=="12"|z[,8]=="14",]), lots of NAs appear (even in the
first column).

The (.sav) file is the output of Compustat (WRDS).

It is frustrating; I can't find the mistake.

Thank you in advance for your help,
Elham
#
Hi Elham,
You are not giving us much to go on here.
Show us the commands that (a) confirm there are no NAs in the first column
of z and (b) output a row of z that has an NA in the first column.
Here's how one might do this:
(a) sum(is.na(z[,1]))
(b) z[ match(TRUE, z[,8] %in% c("11","12","14")), ]
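
As a rough illustration of why %in% is used in (b) instead of ==, here is a small sketch on made-up data. The vector `code` is a hypothetical stand-in for z[,8] and is not taken from the actual file.

```r
# Made-up codes standing in for z[,8]; the second entry is missing.
code <- c("20", NA, "12", "11")

code == "12"                        # comparing with NA gives NA, not FALSE

in_set <- code %in% c("11", "12", "14")
in_set                              # %in% gives FALSE for the NA entry, never NA

first_hit <- match(TRUE, in_set)    # position of the first TRUE
first_hit
```

Because %in% never returns NA, indexing with it cannot introduce all-NA rows.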

Eric
On Wed, Aug 26, 2020 at 3:56 PM Elham Daadmehr <e.daadmehr at gmail.com> wrote:

            

  
  
#
Thanks for your reply.

You're right, here is what I did:
folder/2014/1.sav", to.data.frame=TRUE)

Warning message:
In read.spss("/Users/e.daadmehr/Desktop/Term/LastLast/untitled folder/2014/1.sav",  :
  /Users/e.daadmehr/Desktop/Term/LastLast/untitled folder/2014/1.sav: Compression bias (0) is not the usual value of 100
[1] TRUE
[1] TRUE
[1] 0
[1] 399


My file is not compressed.


Thank you in advance,

Elham
On Wed, Aug 26, 2020 at 3:31 PM Eric Berger <ericjberger at gmail.com> wrote:

            

  
  
#
Offhand, I suspect that the NAs are in the 8th column.
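
A minimal sketch of how that could produce the symptom, assuming the subsetting column has missing values. The data frame and its column names below are invented for illustration and have nothing to do with the real Compustat file.

```r
# Toy stand-in for the imported data; 'id' plays the role of column 1,
# 'code' the role of column 8. All values are made up.
z <- data.frame(id = 1:5,
                code = c("11", NA, "12", "20", NA),
                stringsAsFactors = FALSE)

sum(is.na(z$id))          # the first column has no NAs

keep <- z$code == "11" | z$code == "12" | z$code == "14"
keep                      # NA wherever 'code' is NA

z_sub <- z[keep, ]        # every NA in 'keep' yields a row filled with NA
z_sub
sum(is.na(z_sub$id))      # so NAs now appear in the first column too
```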

  
    
#
Good point! :-)
On Wed, Aug 26, 2020 at 5:55 PM peter dalgaard <pdalgd at gmail.com> wrote:

            

  
  
#
Thanks guys, but I'm a bit confused. The input is the first column (z[,1]
and z1[,1]).
How is it possible that a subset of a non-NA vector contains NA?
On Wed, Aug 26, 2020 at 4:58 PM Eric Berger <ericjberger at gmail.com> wrote:

            

  
  
#
c(1:3)[c(1,NA,3)]
[1] 1 NA 3
On Wed, Aug 26, 2020 at 6:06 PM Elham Daadmehr <e.daadmehr at gmail.com> wrote:

            

  
  
#
It is because, when the index is NA, you don't know whether you want that element or not.

It is a bit more obvious with integer indexing, as in color[race]: if race is NA you don't know what color to put in, but the result should be the same length as race. 
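
A small sketch of the color[race] case, with invented values for both vectors (the names come from the sentence above, the contents are assumptions):

```r
color <- c("red", "green", "blue")   # hypothetical lookup table
race  <- c(2L, NA, 3L, 1L)           # hypothetical integer codes, one missing

picked <- color[race]
picked                               # NA where the code is unknown
length(picked) == length(race)       # one result per element of race
```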

With logical indices, the behaviour is a bit annoying, but ultimately follows from the coercion rules: You might think that you could treat NA as FALSE (& the subset() function does just that), but then you'd get the problem that x[NA] would differ from x[as.integer(NA)] because NA is of mode "logical", lowest in the coercion hierarchy.
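
To make the logical case concrete, here is a short sketch on a made-up vector. As things stand, both kinds of NA index yield NA (with different lengths), and which() or subset() can be used when NA should count as FALSE.

```r
x <- 1:3

a <- x[NA]              # logical NA is recycled to the length of x
b <- x[NA_integer_]     # integer NA is a single index
a
b

keep <- c(TRUE, NA, FALSE)   # made-up logical index with one NA
x[keep]                 # the NA index yields an NA element
x[which(keep)]          # which() drops the NA
subset(x, keep)         # subset() treats NA as FALSE
```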

-pd

  
    
#
Thanks a lot. I've got it just now.
On Wed, Aug 26, 2020 at 6:03 PM peter dalgaard <pdalgd at gmail.com> wrote: