Hello experts, I am sorry that my subject line is confusing, because I am confused as nuts. Let me take a shot at explaining what I am trying to do. I have a data set of log GDP, education, democracy index, and a whole bunch of variables for every country from 1950 to 2000. Each country accounts for 10 observations with each observation representing the mean GDP for each 5 year interval. Example: Country log GDP yr Canada 9.115211 1950 Canada 9.205848 1955 Canada 9.247975 1960 Canada 9.429002 1965 Canada 9.554069 1970 Canada 9.719351 1975 Canada 9.851376 1980 Canada 9.937892 1985 Canada 10.01457 1990 Canada 10.04093 1995 Canada 10.20005 2000 USA 9.27824 1950 USA 9.38968 1955 USA 9.415136 1960 USA 9.594625 1965 USA 9.70207 1970 USA 9.800418 1975 USA 9.96813 1980 USA 10.07001 1985 USA 10.18331 1990 USA 10.25446 1995 USA 10.4131 2000 For log GDP: I want to create a new object in R with one line for each country and the average log GDP from the 10 5yr interval observations. With the subset I want to then create a table with 3 columns and 4 rows. (I have no idea how to write the codes to create the new object. Friend said something about conditional median.) Columns 1) All countries 2) High income countries 3) Low income countries Rows 1) Democracy index 2) Log GDP 3) Obs 4) Countries To create a high and low income columns, I am using the median as the boundary. (i.e. high income for gdp > median of the mean for each country, low income for gdp <= median of the mean for each country.) I hope someone can understand what I am writing here and help me out with it. Thanks so much! -- View this message in context: http://r.789695.n4.nabble.com/keep-average-values-and-delete-duplicate-rows-tp4647677.html Sent from the R help mailing list archive at Nabble.com.
keep average values and delete duplicate rows
5 messages · arun, Jeff Newmiller, fuckecon
HI, I guess it is a bit confusing as to what you want.? In the example dataset, there was no democracy_index, but in the result you want it.? Regarding the median calculation, I guess you want to calculate the median for each country.? I created one more country (China) with fake data.? May be this helps: dat1<-read.table(text=" Country? log_GDP yr Canada 9.115211 1950 Canada 9.205848 1955 Canada 9.247975 1960 Canada 9.429002 1965 Canada 9.554069 1970 Canada 9.719351 1975 Canada 9.851376 1980 Canada 9.937892 1985 Canada 10.01457 1990 Canada 10.04093 1995 Canada 10.20005 2000 USA?????? 9.27824??? 1950 USA?????? 9.38968??? 1955 USA?????? 9.415136? 1960 USA?????? 9.594625? 1965 USA?????? 9.70207??? 1970 USA?????? 9.800418? 1975 USA?????? 9.96813??? 1980 USA?????? 10.07001? 1985 USA?????? 10.18331? 1990 USA?????? 10.25446? 1995 USA?????? 10.4131??? 2000 China????? 7.5?????? 1950 China????? 7.32????? 1955 China?????? 7.33? 1960 China?????? 7.6? 1965 China?????? 7.8??? 1970 China?????? 8.0?? 1975 China?????? 8.2??? 1980 China?????? 8.3? 1985 China?????? 8.5? 1990 China?????? 8.6? 1995 China?????? 8.7??? 2000?? ",sep="",header=TRUE,stringsAsFactors=FALSE) ?dat2<-with(dat1,aggregate(log_GDP,by=list(Country=Country),mean)) colnames(dat2)[2]<-"Mean" ?dat3<-with(dat1,aggregate(log_GDP,by=list(Country=Country),median)) colnames(dat3)[2]<-"Median" dat4<-merge(dat3,dat2) dat4$HighIncome<-ifelse(dat4$Mean>dat4$Median,dat4$Country[dat4$Mean>dat4$Median],NA) ?dat4$LowIncome<-ifelse(dat4$Mean>dat4$Median,NA,dat4$Country[!dat4$Mean>dat4$Median]) dat5<-dat4[,-2] dat5 #? Country???? Mean HighIncome LowIncome #1? Canada 9.665116?????? <NA>??? Canada #2?? China 7.986364?????? <NA>???? China #3???? USA 9.824471??????? USA????? <NA> res<-merge(dat1,dat5) ?head(res) #? Country? log_GDP?? yr???? Mean HighIncome LowIncome #1? Canada 9.115211 1950 9.665116?????? <NA>??? Canada #2? Canada 9.205848 1955 9.665116?????? <NA>??? Canada #3? Canada 9.247975 1960 9.665116?????? <NA>??? Canada #4? Canada 9.429002 1965 9.665116?????? <NA>??? Canada #5? Canada 9.554069 1970 9.665116?????? <NA>??? Canada #6? Canada 9.719351 1975 9.665116?????? <NA>??? Canada A.K. ----- Original Message ----- From: fuckecon <iamstanhu at gmail.com> To: r-help at r-project.org Cc: Sent: Sunday, October 28, 2012 12:16 AM Subject: [R] keep average values and delete duplicate rows Hello experts, I am sorry that my subject line is confusing, because I am confused as nuts. Let me take a shot at explaining what I am trying to do. I have a data set of log GDP, education, democracy index, and a whole bunch of variables for every country from 1950 to? 2000. Each country accounts for 10 observations with each observation representing the mean GDP for each 5 year interval. Example: Country? log GDP yr Canada??? 9.115211 1950 Canada??? 9.205848 1955 Canada??? 9.247975 1960 Canada??? 9.429002 1965 Canada??? 9.554069 1970 Canada??? 9.719351 1975 Canada??? 9.851376 1980 Canada??? 9.937892 1985 Canada??? 10.01457 1990 Canada??? 10.04093 1995 Canada??? 10.20005 2000 USA? ? ? 9.27824? ? 1950 USA? ? ? 9.38968? ? 1955 USA? ? ? 9.415136? 1960 USA? ? ? 9.594625? 1965 USA? ? ? 9.70207? ? 1970 USA? ? ? 9.800418? 1975 USA? ? ? 9.96813? ? 1980 USA? ? ? 10.07001? 1985 USA? ? ? 10.18331? 1990 USA? ? ? 10.25446? 1995 USA? ? ? 10.4131? ? 2000 For log GDP: I want to create a new object in R with one line for each country and? the average log GDP from the 10 5yr interval observations. With the subset I want to then create a table with 3 columns and 4 rows. (I have no idea how to write the codes to create the new object. Friend said something about conditional median.) Columns 1) All countries 2) High income countries 3) Low income countries Rows 1) Democracy index 2) Log GDP 3) Obs 4) Countries To create a high and low income columns, I am using the median as the boundary. (i.e. high income for gdp > median of the mean for each country, low income for gdp <= median of the mean for each country.) I hope someone can understand what I am writing here and help me out with it. Thanks so much! -- View this message in context: http://r.789695.n4.nabble.com/keep-average-values-and-delete-duplicate-rows-tp4647677.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Please read the posting guide... this is not a homework help line.
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
fuckecon <iamstanhu at gmail.com> wrote:
Hello experts, I am sorry that my subject line is confusing, because I am confused as nuts. Let me take a shot at explaining what I am trying to do. I have a data set of log GDP, education, democracy index, and a whole bunch of variables for every country from 1950 to 2000. Each country accounts for 10 observations with each observation representing the mean GDP for each 5 year interval. Example: Country log GDP yr Canada 9.115211 1950 Canada 9.205848 1955 Canada 9.247975 1960 Canada 9.429002 1965 Canada 9.554069 1970 Canada 9.719351 1975 Canada 9.851376 1980 Canada 9.937892 1985 Canada 10.01457 1990 Canada 10.04093 1995 Canada 10.20005 2000 USA 9.27824 1950 USA 9.38968 1955 USA 9.415136 1960 USA 9.594625 1965 USA 9.70207 1970 USA 9.800418 1975 USA 9.96813 1980 USA 10.07001 1985 USA 10.18331 1990 USA 10.25446 1995 USA 10.4131 2000 For log GDP: I want to create a new object in R with one line for each country and the average log GDP from the 10 5yr interval observations. With the subset I want to then create a table with 3 columns and 4 rows. (I have no idea how to write the codes to create the new object. Friend said something about conditional median.) Columns 1) All countries 2) High income countries 3) Low income countries Rows 1) Democracy index 2) Log GDP 3) Obs 4) Countries To create a high and low income columns, I am using the median as the boundary. (i.e. high income for gdp > median of the mean for each country, low income for gdp <= median of the mean for each country.) I hope someone can understand what I am writing here and help me out with it. Thanks so much! -- View this message in context: http://r.789695.n4.nabble.com/keep-average-values-and-delete-duplicate-rows-tp4647677.html Sent from the R help mailing list archive at Nabble.com.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Hi Arun,
Thanks for replying.
Sorry I didn't list it, I do have democracy index in my dataset.
The full set includes these columns:
Country Code (3 letter abbreviation)
Country
Education
Freedom house demo index
log population
log real gdp
nominal savings
polity demo index
year (5 yr interval: 1950, 1955, 1960....)
sample (not sure what it's for yet)
world income instrument
This is a set of panel data for some 200 countries with 10 obs each. each
country has it's own missing data for various columns.
I imported the csv file into an object in R called Table 1.
Here is the few lines I wrote so far:
#Importing data from excel to R
Table1 <- read.csv("5YearPanel.csv")
Table1 <- data.frame(Table1)
Table1
# Deleting Netherlands data from Table1 and naming the new table deDutch
deDutch <- subset(Table1, country!="Netherlands")
deDutch
What I am trying to do next is to clean the data in R as follows:
1) Take avg values of each column for each country.
2) Store these values in a new object.
3) Based on the median income, I want to divide them into a subset called
high income (i.e. >median), and a subset called low income (i.e. <=median).
4) Once I get it cleaned, I believe I can start running regressions with the
data.
I'll look at your comments and try things out first.
Thank you!
--
View this message in context: http://r.789695.n4.nabble.com/keep-average-values-and-delete-duplicate-rows-in-R-tp4647677p4647681.html
Sent from the R help mailing list archive at Nabble.com.
Hi Jeff, Sorry for the misunderstanding. I may have phrased my question as such, but my intention is really learn the workings of R. I am trying to replicate the results from an econometric paper in R, and I don't have any experience. I will reconsider my posts in the future. Regards, -- View this message in context: http://r.789695.n4.nabble.com/keep-average-values-and-delete-duplicate-rows-in-R-tp4647677p4647682.html Sent from the R help mailing list archive at Nabble.com.