Importing data using Foreign - R-help

Elham Daadmehr

Wed, Aug 26, 2020 1:57 AM #

Hi all,

I have a simple problem. I am stuck using SPSS data (.sav) imported
with "read.spss".
I imported the data (z) without any problem. After importing, the first column
doesn't contain any NA, but when I choose a subset of it (like:
z[z[,8]=="11"|z[,8]=="12"|z[,8]=="14",]), lots of NAs appear (even in the
first column).

The (.sav) file is the output of Compustat (WRDS).

It is terrible, I can't find the mistake.

Thank you in advance for your help,
Elham

Eric Berger

Wed, Aug 26, 2020 6:31 AM #

Hi Elham,
You are not giving us much to go on here.
Show us the commands that (a) confirm there are no NA's in the first column
of z
and (b) output a row of z that has an NA in the first column.
Here's how one might do this:
(a) sum(is.na(z[,1]))
(b) z[ match(TRUE, z[,8] %in% c("11","12","14")), ]

Eric
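
A minimal sketch of the kind of check Eric asks for, run on a toy data frame because the original .sav file is not available. The column names `id` and `code` are hypothetical, and the filter column is column 2 here rather than column 8.

```r
# Toy stand-in for the imported data; names and values are made up.
z <- data.frame(id = 1:5,
                code = c("11", NA, "12", "20", NA),
                stringsAsFactors = FALSE)

# Count the NAs in every column at once, not only in the first one.
na_per_column <- colSums(is.na(z))
print(na_per_column)   # id has 0 NAs, code has 2

# Tabulate the filter column, keeping NA as its own category.
print(table(z$code, useNA = "ifany"))
```

Checking every column, rather than only the first, is what reveals whether the column used in the filter condition contains missing values.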

On Wed, Aug 26, 2020 at 3:56 PM Elham Daadmehr <e.daadmehr at gmail.com> wrote:

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Elham Daadmehr

Wed, Aug 26, 2020 7:27 AM #

Thanks for your reply.

You're right, here is what I did:

folder/2014/1.sav", to.data.frame=TRUE)

Warning message:
In read.spss("/Users/e.daadmehr/Desktop/Term/LastLast/untitled folder/2014/1.sav",  :
  /Users/e.daadmehr/Desktop/Term/LastLast/untitled folder/2014/1.sav: Compression bias (0) is not the usual value of 100

[1] TRUE

[1] TRUE

[1] 0

[1] 399


My file is not compressed.


Thank you in advance,

Elham

On Wed, Aug 26, 2020 at 3:31 PM Eric Berger <ericjberger at gmail.com> wrote:

Peter Dalgaard

Wed, Aug 26, 2020 7:51 AM #

Offhand, I suspect that the NAs are in the 8th column.
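
The following sketch reproduces the symptom on toy data, assuming (as suggested above) that the filter column contains NA while the first column does not. The data frame and its column names are hypothetical, and the filter column is column 2 here rather than column 8.

```r
# Toy data: first column complete, filter column with two NAs.
z <- data.frame(id = 1:5,
                code = c("11", NA, "12", "20", NA),
                stringsAsFactors = FALSE)

# Same style of condition as in the original post.
keep <- z[, 2] == "11" | z[, 2] == "12" | z[, 2] == "14"
print(keep)            # TRUE NA TRUE FALSE NA

# Each NA in the logical index produces a row made entirely of NA.
z1 <- z[keep, ]
print(z1)

na_before <- sum(is.na(z[, 1]))    # 0
na_after  <- sum(is.na(z1[, 1]))   # 2
```

The NAs in the first column of `z1` do not come from the first column of `z`; they come from the NA entries of the row index.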

Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

Eric Berger

Wed, Aug 26, 2020 7:57 AM #

Good point! :-)

On Wed, Aug 26, 2020 at 5:55 PM peter dalgaard <pdalgd at gmail.com> wrote:

Elham Daadmehr

Wed, Aug 26, 2020 8:06 AM #

Thanks guys, but I'm a bit confused. The input is the first column (z[,1]
and z1[,1]).
How is it possible that a subset of a non-NA vector contains NA?

On Wed, Aug 26, 2020 at 4:58 PM Eric Berger <ericjberger at gmail.com> wrote:

Eric Berger

Wed, Aug 26, 2020 8:09 AM #

c(1:3)[c(1,NA,3)]
[1]  1 NA  3
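
The same effect appears with a logical index, which is the form used in the original subsetting expression. A small sketch, with variable names chosen only for illustration:

```r
x <- 1:3

# Integer index containing NA: one result element per index element.
by_position <- x[c(1, NA, 3)]

# Logical index containing NA: the NA position also yields NA.
by_logical <- x[c(TRUE, NA, TRUE)]

print(by_position)
print(by_logical)
```

In both cases the vector `x` itself has no missing values; the NA in the result comes from the index.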

On Wed, Aug 26, 2020 at 6:06 PM Elham Daadmehr <e.daadmehr at gmail.com> wrote:

Peter Dalgaard

Wed, Aug 26, 2020 9:03 AM #

It is because, when the index is NA, you don't know whether you want that element or not.

It is a bit more obvious with integer indexing, as in color[race]: if race is NA you don't know what color to put in, but the result should be the same length as race. 

With logical indices, the behaviour is a bit annoying, but ultimately follows from the coercion rules: You might think that you could treat NA as FALSE (& the subset() function does just that), but then you'd get the problem that x[NA] would differ from x[as.integer(NA)] because NA is of mode "logical", lowest in the coercion hierarchy.

-pd
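
A short sketch of some common ways to drop the NA positions when subsetting, together with what `x[NA]` and `x[as.integer(NA)]` return. The vectors `x` and `cond` are made up for illustration, and this is one possible set of workarounds, not the only one.

```r
x <- c(10, 20, 30)
cond <- c(TRUE, NA, FALSE)

with_na     <- x[cond]            # NA in the index gives NA in the result
via_which   <- x[which(cond)]     # which() keeps only the TRUE positions
via_subset  <- subset(x, cond)    # subset() treats NA as FALSE
via_in_true <- x[cond %in% TRUE]  # %in% never returns NA

# A logical NA is recycled to the length of x; an integer NA is not.
len_logical_na <- length(x[NA])
len_integer_na <- length(x[as.integer(NA)])
```

For a data frame, the same idea applies to the row index, for example `z[which(z[, 8] == "11"), ]` or a condition built with `%in%` as in Eric's earlier suggestion.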

Elham Daadmehr

Wed, Aug 26, 2020 9:16 AM #

Thanks a lot. I've got it just now.

On Wed, Aug 26, 2020 at 6:03 PM peter dalgaard <pdalgd at gmail.com> wrote:
