Skip to content

List / Matrix to Data Frame

4 messages · Bill Dunlap, Avi Gross, Sparks, John

#
Hi R-Helpers,

I am taking it upon myself to delve into the world of lists for R.  In no small part because I appear to have discovered a source of data for an exceptionally good price but that delivers much of that data in json format.

So over the last day or so I managed to fight the list processing tools to a draw and get a list that has only selected elements (actually it ends up in matrix form).  But when I try to convert that to a data frame I can't get it to a form that is workable.

I have visited some pages about converting a matrix to a data frame but they result in highly redundant and inelegant data.

I am thinking that someone who works with lists and matrices knows how to do this quite easily and would be willing to provide a solution.

The reproducible example is shown below.  Just to be explicit, what I am trying to get to is something along the lines of a data frame like this.

Date              netIncome      Gross Profit
2020-09-30 57411000000 104956000000
2019-09-30 55256000000   98392000000

.....

The closest I get is a matrix that looks like this
2020-09-30        2019-09-30       2018-09-30        2017-09-30       2016-09-30       2015-09-30       2014-09-30
date        "2020-09-30"      "2019-09-30"     "2018-09-30"      "2017-09-30"     "2016-09-30"     "2015-09-30"     "2014-09-30"
netIncome   "57411000000.00"  "55256000000.00" "59531000000.00"  "48351000000.00" "45687000000.00" "53394000000.00" "39510000000.00"
grossProfit "104956000000.00" "98392000000.00" "101839000000.00" "88186000000.00" "84263000000.00" "93626000000.00" "70537000000.00"

Code for example

library(jsonlite)
test <- jsonlite::fromJSON("https://eodhistoricaldata.com/api/fundamentals/AAPL.US?api_token=OeAFFmMliFG5orCUuwAKQ8l4WWFQ67YX")

hist<-test[[13]]
ISY<-hist$Income_Statement$yearly
wanted<-sapply(ISY, "[", j = c("date","netIncome","grossProfit"))


Your guidance would be much appreciated.

--John J. Sparks, Ph.D.
#
Does this do what you want?
lapply(c(Date="date",netIncome="netIncome",`Gross Profit`="grossProfit"),
function(nm)vapply(ISY, "[[", nm, FUN.VALUE=NA_character_)))
'data.frame':   36 obs. of  3 variables:
 $ Date        : chr  "2020-09-30" "2019-09-30" "2018-09-30" "2017-09-30"
...
 $ netIncome   : chr  "57411000000.00" "55256000000.00" "59531000000.00"
"48351000000.00" ...
 $ Gross Profit: chr  "104956000000.00" "98392000000.00" "101839000000.00"
"88186000000.00" ...
'data.frame':   36 obs. of  3 variables:
 $ Date        : Date, format: "2020-09-30" "2019-09-30" "2018-09-30"
"2017-09-30" ...
 $ netIncome   : num  5.74e+10 5.53e+10 5.95e+10 4.84e+10 4.57e+10 ...
 $ Gross Profit: num  1.05e+11 9.84e+10 1.02e+11 8.82e+10 8.43e+10 ...

        
On Thu, Jul 1, 2021 at 5:35 PM Sparks, John <jspark4 at uic.edu> wrote:

            

  
  
#
Bill,

A Matrix can only contain one kind of data. I ran your code after modifying
it to be proper and took a transpose to get it risght-side up:

t(wanted)
date         netIncome        grossProfit      
2020-09-30 "2020-09-30" "57411000000.00" "104956000000.00"
2019-09-30 "2019-09-30" "55256000000.00" "98392000000.00" 
2018-09-30 "2018-09-30" "59531000000.00" "101839000000.00"
2017-09-30 "2017-09-30" "48351000000.00" "88186000000.00"

That looks better, I think.

So I did this:

wanted <- t(wanted)

It has rownames and colnames and you can make it a data frame easily enough:

mydf <- data.frame(wanted)

But the columns are all character strings, so CONVERT them as you wish:
date      netIncome     grossProfit
2020-09-30 2020-09-30 57411000000.00 104956000000.00
2019-09-30 2019-09-30 55256000000.00  98392000000.00
2018-09-30 2018-09-30 59531000000.00 101839000000.00

Your numbers are quite large and may or may not be meant to be integers:
date  netIncome grossProfit
2020-09-30 2020-09-30 5.7411e+10 1.04956e+11
2019-09-30 2019-09-30 5.5256e+10 9.83920e+10
2018-09-30 2018-09-30 5.9531e+10 1.01839e+11
2017-09-30 2017-09-30 4.8351e+10 8.81860e+10
2016-09-30 2016-09-30 4.5687e+10 8.42630e+10
2015-09-30 2015-09-30 5.3394e+10 9.36260e+10

The first entries may have something wrong as they become NA when I make
them integers.

The date column is the same as the rownames and is not in a normal vector
format. It shows as a list and you may want to convert it to one of several
formats R supports for dates or a more normal character string. 

So here is how I made it a character string:
date  netIncome grossProfit
2020-09-30 2020-09-30 5.7411e+10 1.04956e+11
2019-09-30 2019-09-30 5.5256e+10 9.83920e+10
2018-09-30 2018-09-30 5.9531e+10 1.01839e+11
2017-09-30 2017-09-30 4.8351e+10 8.81860e+10
2016-09-30 2016-09-30 4.5687e+10 8.42630e+10
2015-09-30 2015-09-30 5.3394e+10 9.36260e+10

If you want a DATE, it can now be converted again using one of many methods.

Just FYI, numbers that big and rounded might work just as well measured in
millions as in what I did to grossProfit:
date netIncome grossProfit
2020-09-30 2020-09-30     57411      104956
2019-09-30 2019-09-30     55256       98392
2018-09-30 2018-09-30     59531      101839
2017-09-30 2017-09-30     48351       88186
2016-09-30 2016-09-30     45687       84263
2015-09-30 2015-09-30     53394       93626

Some of the numbers are negative though.

If rownames are not needed:
date netIncome grossProfit
1 2020-09-30     57411      104956
2 2019-09-30     55256       98392
3 2018-09-30     59531      101839
4 2017-09-30     48351       88186
5 2016-09-30     45687       84263
6 2015-09-30     53394       93626

It may be easier to work with this, but again, if you need the dates to be
real dates, as in graphing.

Hope that helps. 





-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Bill Dunlap
Sent: Thursday, July 1, 2021 9:01 PM
To: Sparks, John <jspark4 at uic.edu>
Cc: r-help at r-project.org
Subject: Re: [R] List / Matrix to Data Frame

Does this do what you want?
lapply(c(Date="date",netIncome="netIncome",`Gross Profit`="grossProfit"),
function(nm)vapply(ISY, "[[", nm, FUN.VALUE=NA_character_)))
'data.frame':   36 obs. of  3 variables:
 $ Date        : chr  "2020-09-30" "2019-09-30" "2018-09-30" "2017-09-30"
...
 $ netIncome   : chr  "57411000000.00" "55256000000.00" "59531000000.00"
"48351000000.00" ...
 $ Gross Profit: chr  "104956000000.00" "98392000000.00" "101839000000.00"
"88186000000.00" ...
'data.frame':   36 obs. of  3 variables:
 $ Date        : Date, format: "2020-09-30" "2019-09-30" "2018-09-30"
"2017-09-30" ...
 $ netIncome   : num  5.74e+10 5.53e+10 5.95e+10 4.84e+10 4.57e+10 ...
 $ Gross Profit: num  1.05e+11 9.84e+10 1.02e+11 8.82e+10 8.43e+10 ...

        
On Thu, Jul 1, 2021 at 5:35 PM Sparks, John <jspark4 at uic.edu> wrote:

            
format.
this.
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Works like a charm.

Thanks so much.

--John Sparks