Dear list, I am a beginner with R and I have a question which may sound stupid: How can I transform my SQL query into an appropriate dataframe? I've looking around, and haven't found any solution. My return data is stored in a SQL database. When I query it, it gives me the following output: Id Fund Return Period 1 Fund01 0,5 may/08 2 Fund01 0,4 june/08 3 Fund 02 -0,3 may/08 4 Fund02 -0,4 june/2008 However this data is not suitable for analysis. I would like to transform it to a format like: Period Fund01 Fund02 May/08 0,5 0,4 June/08 -0,3 -0,4 Is it possible to do this with R? Could you please recommend me references or packages which are useful for handling this kind of data? Thank you very much for your time. Regards, Hebbertt
Data manipulation: transforming SQL query in an appropriate dataframe
3 messages · Hebbertt de Farias Soares, Gabor Grothendieck, M@rk_Otto m@iii@g oii iws@gov
Correcting the inconsistencies in the data, try this
(and also see ?reshape and the reshape package for
other approaches):
DF <- structure(list(Id = 1:4, Fund = structure(c(1L, 1L, 2L, 2L),
.Label = c("Fund01",
"Fund02"), class = "factor"), Return = c(0.5, 0.4, -0.3, -0.4
), Period = structure(c(2L, 1L, 2L, 1L), .Label = c("june/08",
"may/08"), class = "factor")), .Names = c("Id", "Fund", "Return",
"Period"), class = "data.frame", row.names = c(NA, -4L))
DF
tapply(DF$Return, DF[c(2, 4)], c)
On Fri, Sep 5, 2008 at 10:13 AM, Hebbertt de Farias Soares
<hebbertt.soares at rosasoffice.com> wrote:
Dear list,
I am a beginner with R and I have a question which may sound stupid: How can I transform my SQL query into an appropriate dataframe?
I've looking around, and haven't found any solution.
My return data is stored in a SQL database. When I query it, it gives me the following output:
Id Fund Return Period
1 Fund01 0,5 may/08
2 Fund01 0,4 june/08
3 Fund 02 -0,3 may/08
4 Fund02 -0,4 june/2008
However this data is not suitable for analysis. I would like to transform it to a format like:
Period Fund01 Fund02
May/08 0,5 0,4
June/08 -0,3 -0,4
Is it possible to do this with R? Could you please recommend me references or packages which are useful for handling this kind of data?
Thank you very much for your time.
Regards,
Hebbertt
[[alternative HTML version deleted]]
_______________________________________________ R-sig-DB mailing list -- R Special Interest Group R-sig-DB at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-db
Here are three ways of handling the problem of taking values of a variable
and making them columns of another table. Let the original data be
OldTable. Note I had to clean up your example a little bit.
fh<-textConnection("Id Fund Return Period
1 Fund01 0,5 may/08
2 Fund01 0,4 june/08
3 Fund02 -0,3 may/08
4 Fund02 -0,4 june/08
")
OldTable<-read.table(fh,header=T,dec=",",sep="",row.names="Id")
1. Use reshape in R. This is used with longitudinal data, so you have to
look closely at the manual page to get what you need out of it.
NewTable<-reshape(OldTable,idvar="Period",v.names='Return',timevar='Fund',direction="wide")
2. I you can to the data rearranging in your database. SQL is not great
at converting values of variables to columns. It is great at doing lots
of manipulations on large amounts of data better than R and without the
memory worries. For this reshaping, you can use pivot tables to do this
in a straight forward way in Excel or MS Access.
Here we let OldTable be a table in the database too. So, make a view in
the database:
create view WhatIWantMyRTableToBeView as
select Period,Fund01,Fund02
from (
select Period,Return as Fund01
from OldTable
where Fund='Fund01'
) as f1 inner join (
select Period,Return as Fund02
from OldTable
where Fund=in('Fund02','Fund 02')
) as f1 on(f1.Period=f2=Period)
order by Period;
Then, you can then read in the data to R using the RODBC or other database
package such as RSQLite
library(RODBC)
dbh<-odbcConnect("<path>/<database name>")
NewTable<-sqlFetch(dbh,"WhatIWantMyRTableToBeView")
or with an SQL query in R.
NewTable<-sqlQuery(dbh,"
select Period,Fund01,Fund02
from (
select Period,Return as Fund01
from OldTable
where Fund='Fund01'
) as f1 inner join (
select Period,Return as Fund02
from OldTable
where Fund=in('Fund02','Fund 02')
) as f1 on(f1.Period=f2=Period)
order by Period;
")
3. These could be done all in R with a merge:
fund1<-OldTable[OldTable$Fund=="Fund01",]
fund1<-dataframe(Period=fund1$Period,Fund01=fund1$Fund)
fund2<-OldTable[OldTable$Fund%in%c("Fund02","Fund 02"),]
fund2<-dataframe(Period=fund2$Period,Fund02=fund2$Fund)
NewTable<-merge(fund1,fund2) # merge keys on the variables common to
both datasets.
I haven't run the last two parts, so you may need to tweak the code.
Others on this list are more knowledgeable on these matters.
Mark
Mark Otto, Biometrician
U. S. Fish and Wildlife Service
Patuxent Wildlife Research Center
11510 American Holly Dr
Laurel MD 20708-4002
"Gabor Grothendieck" <ggrothendieck at gmail.com>
Sent by: r-sig-db-bounces at stat.math.ethz.ch
09/05/2008 10:39 AM
To
"Hebbertt de Farias Soares" <hebbertt.soares at rosasoffice.com>
cc
"r-sig-db at stat.math.ethz.ch" <r-sig-db at stat.math.ethz.ch>
Subject
Re: [R-sig-DB] Data manipulation: transforming SQL query in an appropriate
dataframe
Correcting the inconsistencies in the data, try this
(and also see ?reshape and the reshape package for
other approaches):
DF <- structure(list(Id = 1:4, Fund = structure(c(1L, 1L, 2L, 2L),
.Label = c("Fund01",
"Fund02"), class = "factor"), Return = c(0.5, 0.4, -0.3, -0.4
), Period = structure(c(2L, 1L, 2L, 1L), .Label = c("june/08",
"may/08"), class = "factor")), .Names = c("Id", "Fund", "Return",
"Period"), class = "data.frame", row.names = c(NA, -4L))
DF
tapply(DF$Return, DF[c(2, 4)], c)
On Fri, Sep 5, 2008 at 10:13 AM, Hebbertt de Farias Soares
<hebbertt.soares at rosasoffice.com> wrote:
Dear list, I am a beginner with R and I have a question which may sound stupid:
How can I transform my SQL query into an appropriate dataframe?
I've looking around, and haven't found any solution. My return data is stored in a SQL database. When I query it, it gives
me the following output:
Id Fund Return Period 1 Fund01 0,5 may/08 2 Fund01 0,4 june/08 3 Fund 02 -0,3 may/08 4 Fund02 -0,4 june/2008 However this data is not suitable for analysis. I would like to
transform it to a format like:
Period Fund01 Fund02 May/08 0,5 0,4 June/08 -0,3 -0,4 Is it possible to do this with R? Could you please recommend me
references or packages which are useful for handling this kind of data?
Thank you very much for your time.
Regards,
Hebbertt
[[alternative HTML version deleted]]
_______________________________________________ R-sig-DB mailing list -- R Special Interest Group R-sig-DB at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-db
_______________________________________________ R-sig-DB mailing list -- R Special Interest Group R-sig-DB at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-db