Skip to content

Data manipulation: transforming SQL query in an appropriate dataframe

3 messages · Hebbertt de Farias Soares, Gabor Grothendieck, M@rk_Otto m@iii@g oii iws@gov

#
Dear list,

I am a beginner with R and I have  a question which may sound stupid: How can I transform my SQL query into an appropriate dataframe?

I've looking around, and haven't found any solution.

My return data is stored in a SQL database. When I query it, it  gives me the following output:
Id            Fund                     Return                 Period
1             Fund01                0,5                         may/08
2             Fund01                0,4                         june/08
3             Fund 02               -0,3                        may/08
4             Fund02                -0,4                        june/2008

However this data is not suitable for analysis. I would like to transform it to a format like:

Period                  Fund01                Fund02
May/08                0,5                         0,4
June/08               -0,3                        -0,4

Is it possible to do this with R? Could you please recommend me references or packages which are useful for handling this kind of data?

Thank you very much for your time.

Regards,

Hebbertt
#
Correcting the inconsistencies in the data, try this
(and also see ?reshape and the reshape package for
other approaches):

DF <- structure(list(Id = 1:4, Fund = structure(c(1L, 1L, 2L, 2L),
.Label = c("Fund01",
"Fund02"), class = "factor"), Return = c(0.5, 0.4, -0.3, -0.4
), Period = structure(c(2L, 1L, 2L, 1L), .Label = c("june/08",
"may/08"), class = "factor")), .Names = c("Id", "Fund", "Return",
"Period"), class = "data.frame", row.names = c(NA, -4L))
DF

tapply(DF$Return, DF[c(2, 4)], c)


On Fri, Sep 5, 2008 at 10:13 AM, Hebbertt de Farias Soares
<hebbertt.soares at rosasoffice.com> wrote:
#
Here are three ways of handling the problem of taking values of a variable 
and making them columns of another table.  Let the original data be 
OldTable.  Note I had to clean up your example a little bit.

fh<-textConnection("Id            Fund                  Return  Period
1             Fund01                0,5                         may/08
2             Fund01                0,4                         june/08
3             Fund02               -0,3                        may/08
4             Fund02                -0,4                        june/08
")

OldTable<-read.table(fh,header=T,dec=",",sep="",row.names="Id")


1. Use reshape in R.  This is used with longitudinal data, so you have to 
look closely at the manual page to get what you need out of it.

 
NewTable<-reshape(OldTable,idvar="Period",v.names='Return',timevar='Fund',direction="wide")

2. I you can to the data rearranging in your database.  SQL is not great 
at converting values of variables to columns.  It is great at doing lots 
of manipulations on large amounts of data better than R and without the 
memory worries.  For this reshaping, you can use pivot tables to do this 
in a straight forward way in Excel or MS Access. 

Here we let OldTable be a table in the database too. So, make a view in 
the database:

  create view WhatIWantMyRTableToBeView as 
  select Period,Fund01,Fund02
   from (
    select Period,Return as Fund01
     from OldTable
     where Fund='Fund01'
   ) as f1 inner join (
    select Period,Return as Fund02
     from OldTable
     where Fund=in('Fund02','Fund 02')
   ) as f1 on(f1.Period=f2=Period)
   order by Period;
 
Then, you can then read in the data to R using the RODBC or other database 
package such as RSQLite

  library(RODBC)
  dbh<-odbcConnect("<path>/<database name>")
  NewTable<-sqlFetch(dbh,"WhatIWantMyRTableToBeView")

or with an SQL query in R. 

 NewTable<-sqlQuery(dbh,"
  select Period,Fund01,Fund02
   from (
    select Period,Return as Fund01
     from OldTable
     where Fund='Fund01'
   ) as f1 inner join (
    select Period,Return as Fund02
     from OldTable
     where Fund=in('Fund02','Fund 02')
   ) as f1 on(f1.Period=f2=Period)
   order by Period;
 ")

3. These could be done all in R with a merge:

 fund1<-OldTable[OldTable$Fund=="Fund01",]
 fund1<-dataframe(Period=fund1$Period,Fund01=fund1$Fund)
 fund2<-OldTable[OldTable$Fund%in%c("Fund02","Fund 02"),]
 fund2<-dataframe(Period=fund2$Period,Fund02=fund2$Fund)
 NewTable<-merge(fund1,fund2)  # merge keys on the variables common to 
both datasets.

I haven't run the last two parts, so you may need to tweak the code. 
Others on this list are more knowledgeable on these matters. 

Mark
 


Mark Otto, Biometrician 
U. S. Fish and Wildlife Service 
Patuxent Wildlife Research Center
11510 American Holly Dr
Laurel MD  20708-4002




"Gabor Grothendieck" <ggrothendieck at gmail.com> 
Sent by: r-sig-db-bounces at stat.math.ethz.ch
09/05/2008 10:39 AM

To
"Hebbertt de Farias Soares" <hebbertt.soares at rosasoffice.com>
cc
"r-sig-db at stat.math.ethz.ch" <r-sig-db at stat.math.ethz.ch>
Subject
Re: [R-sig-DB] Data manipulation: transforming SQL query in an appropriate 
dataframe






Correcting the inconsistencies in the data, try this
(and also see ?reshape and the reshape package for
other approaches):

DF <- structure(list(Id = 1:4, Fund = structure(c(1L, 1L, 2L, 2L),
.Label = c("Fund01",
"Fund02"), class = "factor"), Return = c(0.5, 0.4, -0.3, -0.4
), Period = structure(c(2L, 1L, 2L, 1L), .Label = c("june/08",
"may/08"), class = "factor")), .Names = c("Id", "Fund", "Return",
"Period"), class = "data.frame", row.names = c(NA, -4L))
DF

tapply(DF$Return, DF[c(2, 4)], c)


On Fri, Sep 5, 2008 at 10:13 AM, Hebbertt de Farias Soares
<hebbertt.soares at rosasoffice.com> wrote:
How can I transform my SQL query into an appropriate dataframe?
me the following output:
transform it to a format like:
references or packages which are useful for handling this kind of data?
_______________________________________________
R-sig-DB mailing list -- R Special Interest Group
R-sig-DB at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-db