using match-type function to return correctly ordered data from a dataframe
4 messages · Markus Weisner, Jeff Newmiller, William Dunlap
Have you actually read
?"%in%"
?
Although a valuable tool, not all answers are most effectively obtained by Googling.
Also, your repeated assertions that the answers are not maintained in order are poorly framed. They DO stay in order according to the zipcode database order. That said, your desire for numeric indexes is only as far away as your help file.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
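The help page that ?"%in%" opens also documents match(), which is where the numeric indexes come from. A minimal sketch of the difference, using a made-up character vector (lookup and wanted are illustrative names, not from the thread):

```r
# Toy lookup vector standing in for the zip column (values are made up).
lookup <- c("A", "B", "C", "D")
wanted <- c("C", "A")

# %in% answers "is each element of the left side present in the right side?"
# The result is logical, with one entry per element of 'lookup'.
in_flags <- lookup %in% wanted

# match() answers "at which position of the table does each element occur?"
# The result is integer, with one entry per element of 'wanted'.
positions <- match(wanted, lookup)

print(in_flags)
print(positions)
print(lookup[in_flags])    # comes back in table order
print(lookup[positions])   # comes back in request order
```

Indexing with the logical vector can only keep or drop rows, so the table's own order survives. Indexing with the positions from match() picks rows one at a time in the order they were asked for.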
Markus Weisner <r at themarkus.com> wrote:
I am regularly running into a problem where I can't seem to figure out how to maintain correct data order when selecting data out of a dataframe. The code below shows an example of trying to pull data from a dataframe using ordered zip codes. My problem is returning the pulled data in the correct order. This is a very simple example, but it illustrates a regular problem that I am running into.
In the past, I have used fairly complicated solutions to pull this off. There has got to be a simpler and more straightforward method ... probably some function that I missed in all my googling.
Thanks in advance for anybody's help figuring this out.
~Markus
### Function Definitions ###
# FUNCTION #1 (returns wrong order)
getLatitude1 = function(myzips) {
  # load libraries and data
  library(zipcode)
  data(zipcode)
  # get latitude values
  # problem is that this code does not maintain order
  mylats = zipcode[zipcode$zip %in% myzips, "latitude"]
  # return data
  return(mylats)
}
# FUNCTION #2 (also returns wrong order)
getLatitude2 = function(myzips) {
  # load libraries and data
  library(zipcode)
  data(zipcode)
  # convert myzips to DF
  myzips = as.data.frame(as.character(myzips))
  # merge in zipcode data based on zip
  results = merge(myzips, zipcode[,c("zip", "latitude")],
                  by.x = "as.character(myzips)", by.y = "zip", all.x = TRUE)
  # return data
  return(results$latitude)
}
### Code ###
# specify a set of zip codes
myzips = c("74432", "72537", "06026", "01085", "65793")
# create a DF
myzips.df = data.frame(zip=myzips, latitude=NA, longitude=NA)
# look at data to determine what should be returned and in what order
library(zipcode)
data(zipcode)
zipcode[zipcode$zip %in% myzips,]
# test function #1 (function definition above)
myzips.df$latitude = getLatitude1(myzips.df$zip) # returns wrong order
# test function #2 (function definition above)
myzips.df$latitude = getLatitude2(myzips.df$zip) # also returns wrong order
# need "myzips %in% zipcode$zip" to return array/df indices rather than logical
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
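One way the latitude lookup from the original post could be written so that results follow the order of myzips is to index with match() instead of %in%. This is only a sketch: a small stand-in table with invented latitude values replaces the zipcode data so it runs without that package, and getLatitude3 and zip_table are hypothetical names.

```r
# Stand-in for the zipcode data frame; the latitude values are invented.
zip_table <- data.frame(
  zip = c("01085", "06026", "65793", "72537", "74432"),
  latitude = c(1.5, 2.5, 3.5, 4.5, 5.5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return latitudes in the order of 'myzips'.
getLatitude3 <- function(myzips, table = zip_table) {
  # as.character() guards against 'myzips' arriving as a factor column
  idx <- match(as.character(myzips), table$zip)  # NA where a zip is unknown
  table$latitude[idx]
}

myzips <- c("74432", "72537", "06026", "01085", "65793")
lats <- getLatitude3(myzips)
print(data.frame(zip = myzips, latitude = lats))
```

Because match() returns one position per requested zip, the result always has the same length as myzips, with NA for zips missing from the table and repeated values for repeated zips.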
Is the following what you want?
> dfLETTERS <- data.frame(LETTER=LETTERS[1:5], lData=c("Ay","Bee","Cee","Dee","Eee"), row.names=sprintf("LRow%d",1:5))
> z <- c("D", "B", "A", "B")
> dfLETTERS[match(z, dfLETTERS$LETTER), ]
        LETTER lData
LRow4        D   Dee
LRow2        B   Bee
LRow1        A    Ay
LRow2.1      B   Bee
> # or when z includes things not in the list to match:
> dfLETTERS[match(c("E",NA,"notALetter","A"), dfLETTERS$LETTER), ]
      LETTER lData
LRow5      E   Eee
NA      <NA>  <NA>
NA.1    <NA>  <NA>
LRow1      A    Ay
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Markus Weisner
Sent: Saturday, October 27, 2012 9:01 AM
To: Jeff Newmiller
Cc: r-help at r-project.org
Subject: Re: [R] using match-type function to return correctly ordered data from a dataframe
Hi Jeff. I believe my Function #1 actually does use "%in%" to select the
data. I use "%in%" all the time but, as far as I can tell, it can only
return a vector of logical values. As a result, it does keep the order of
the dataframe from which you are selecting data. It does not, however,
appear that you can return the data in the order of the values that you
were specifying the data to be in.
To try and clarify my order assertion, take for example a dataframe that
has a column "LETTER" with a record for each alphabetical letter. The
dataframe is ordered so that "A" is record 1 and "Z" is record 26. Say
that I want to pull records from this dataframe based on a list of letters
and I want it to return those records in the order of the letters I passed
it. I could use something like the following code to pull records ...
myDataFrame[myDataFrame$LETTER %in% myPassedListOfLetters, ]
If I pass it the list, myPassedListOfLetters <- c("C", "B", "A"), I will
receive the data back in the order "A", "B", "C". What I am trying to
figure out is how to get the data back in the order of the list that I
specified I want the data in ("C", "B", "A").
Hope that clarifies what I am trying to figure out a bit. Thanks for your
help!
Best,
Markus
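The merge() approach in Function #2 of the original post loses the requested order for a related reason: by default merge() sorts its result on the key column, and with sort = FALSE the row order is documented as unspecified. A sketch of one workaround, assuming a made-up letter table like the one described above (req_order is a hypothetical helper column):

```r
# Made-up table with one record per letter, as in the description above.
myDataFrame <- data.frame(LETTER = LETTERS, position = 1:26,
                          stringsAsFactors = FALSE)
myPassedListOfLetters <- c("C", "B", "A")

# Default merge(): the result is sorted on the key column (sort = TRUE).
sorted <- merge(data.frame(LETTER = myPassedListOfLetters,
                           stringsAsFactors = FALSE),
                myDataFrame, by = "LETTER", all.x = TRUE)

# Carry the request order along as a column and sort on it after merging.
request <- data.frame(LETTER = myPassedListOfLetters,
                      req_order = seq_along(myPassedListOfLetters),
                      stringsAsFactors = FALSE)
merged <- merge(request, myDataFrame, by = "LETTER", all.x = TRUE,
                sort = FALSE)
restored <- merged[order(merged$req_order), ]

print(sorted$LETTER)
print(restored$LETTER)
```

Sorting on the carried index does not rely on whatever order merge() happens to return, which is why it is used here instead of sort = FALSE alone.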
On Fri, Oct 26, 2012 at 11:00 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
Have you actually read
?"%in%"
?
Although a valuable tool, not all answers are most effectively obtained by
Googling.
Also, your repeated assertions that the answers are not maintained in
order are poorly framed. They DO stay in order according to the zipcode
database order. That said, your desire for numeric indexes is only as far
away as your help file.