
Splitting dataframes and cleaning extraneous characters

Hi,
One problem with using substring() is that the result depends on the number of digits or characters that come before the part you want to keep.

For example:
substring("-005-190",6)
#[1] "190"
substring("-0057-190",6)
#[1] "-190"

#whereas

gsub("^-[^-]*-","","-0057-190")
#[1] "190"

Probably, your dataset doesn't have that sort of problem.
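
The difference can be sketched on a whole vector at once. The vector `ids` below is made up for illustration, and the pattern assumes every value starts with a hyphen-delimited prefix, as in the values above; it is not taken from the real data.

```r
# Hypothetical IDs whose prefixes differ in length
ids <- c("-005-190", "-0057-190", "-12-7")

# Position-based: correct only when the prefix is exactly five characters long
by_position <- substring(ids, 6)

# Pattern-based: drop everything up to and including the second hyphen
by_pattern <- sub("^-[^-]*-", "", ids)

print(by_position)
print(by_pattern)
```

Since the pattern is anchored at the start of the string, sub() and gsub() behave the same here; only one match per string is possible.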

dat1<- read.table(text="
project boro
123 m
134 k
123 m
123 m
543 q
543 q
134 k
",sep="",header=TRUE,stringsAsFactors=FALSE)
res<-split(dat1,gsub("\\.","",as.character(interaction(dat1[,2],dat1[,1]))))
res
#$k134
#  project boro
#2     134    k
#7     134    k
#
#$m123
#  project boro
#1     123    m
#3     123    m
#4     123    m
#
#$q543
#  project boro
#5     543    q
#6     543    q
str(res$k134)
#'data.frame':    2 obs. of  2 variables:
# $ project: int  134 134
# $ boro   : chr  "k" "k"
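
An alternative sketch that builds the group label directly with paste0() instead of removing the "." from the interaction() labels. The names `key`, `res2` and `pieces` are introduced here only for illustration, and the data frame repeats the toy data from the example above.

```r
# Same toy data as in the example above
dat1 <- data.frame(
  project = c(123, 134, 123, 123, 543, 543, 134),
  boro    = c("m", "k", "m", "m", "q", "q", "k"),
  stringsAsFactors = FALSE
)

# Build the group label directly, e.g. "m123", and split on it
key  <- paste0(dat1$boro, dat1$project)
res2 <- split(dat1, key)

names(res2)
sapply(res2, nrow)

# Optional: copy each piece into an environment as a separately named object
# (keeping everything in the list is usually easier to work with)
pieces <- new.env()
list2env(res2, envir = pieces)
ls(pieces)
```

Keeping the result as a list lets you loop over the pieces with lapply(); list2env() is only needed if separately named data frames are really required.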
A.K.



I was able to split the extraneous stuff using 

a<-substring(Project_NBR, first=6) 

and then cbind to add the edited column to the df. I have a sample
but I am not sure how to provide it to you. I will try to produce an
example that's similar to what I have:

project	boro 
123	m 
134	k 
123	m 
123 	m 
543	q 
543	q 
134	k 


Basically I am trying to subset the data frame according to 
project and boro with the name of the subset being boro-project (ex. 
m123, k134) 

I hope this provides more clarity to my problem. 


----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: R help <r-help at r-project.org>
Cc: 
Sent: Wednesday, July 17, 2013 11:06 AM
Subject: Re: Splitting dataframes and cleaning extraneous characters

Hi,
You could try:
?split()
split(ats,ats$Project_NBR)
You also mentioned two columns:

split(ats,list(ats$col1, ats$col2))
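
One thing to watch with a list of two columns: by default split() returns an entry for every combination of levels, including combinations that never occur. A small sketch with made-up data follows; this `ats` and its `col1`/`col2` values are placeholders, not the real dataset.

```r
# Hypothetical stand-in for the poster's data; col1 and col2 are placeholder names
ats <- data.frame(
  col1 = c("m", "k", "m", "q"),
  col2 = c(123, 134, 123, 543),
  stringsAsFactors = FALSE
)

# Default: one entry per combination of levels, even the empty ones
all_combos <- split(ats, list(ats$col1, ats$col2))
length(all_combos)

# drop = TRUE keeps only combinations that occur;
# sep = "" joins the two labels without the default "."
observed <- split(ats, list(ats$col1, ats$col2), drop = TRUE, sep = "")
names(observed)
```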

You should have provided an example dataset using dput(), e.g. dput(head(data,10)), for testing.
Also,

gsub("^-[^-]*-","","-005-190")
#[1] "190"
A.K.




Problem: I have a large data set and need to separate it based on factors
in 2 columns. The final output would be a collection of dataframes
renamed to the corresponding factor levels.

So far I know that for each corresponding factor I can execute 

x190<-ats[which(Project_NBR=='-005-190'),] 

However there are about 400 factors needing to be separated. 
Also, I would like to remove the "-005-". Any guidance will be greatly
appreciated.