splitting first 10 words in a string
...I would like, for example, to split this sentence from the field Opis in a data.frame:

  Opis : "I have a sentence with ten words"

so that it would convert to something like this:

  Opis : "I have a sentence with ten words"; Column1 : "I"; Column2 : "have"; Column3 : "a"; Column4 : "sentence"; Column5 : "with"; Column6 : "ten"; Column7 : "words" ...

or, in a data.frame, something like this (as I understand it):

'data.frame': xx obs. of 12 variables:
 $ Opis    : factor "I have a sentence with ten words"
 $ Column1 : factor "I"
 $ Column2 : factor "have"
 $ Column3 : factor "a"
 $ Column4 : factor "sentence"
 $ Column5 : factor "with"
 $ Column6 : factor "ten"
 $ Column7 : factor "words"

Hope that explains it better. I am still having some trouble understanding R and all...

m

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Matevž Pavlič
Sent: Monday, November 01, 2010 10:34 PM
To: David Winsemius
Cc: r-help at r-project.org
Subject: Re: [R] splitting first 10 words in a string

Hi,

I am sorry, I will try to be more exact from now on... I have a data.frame with a field called Opis. It contains sentences that I would like to split into words, or fields in the data.frame. When I say columns, I mean columns as in an Excel table. I would like to split "Opis" into ten fields holding the first ten words of the Opis field. Here is an example of my data.frame:

'data.frame': 22928 obs. of 12 variables:
 $ VrtinaID        : int 1 1 1 1 2 2 2 2 2 2 ...
 $ ZapStev         : int 1 2 3 4 1 2 3 4 5 6 ...
 $ GlobinaOd       : num 0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
 $ GlobinaDo       : num 0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
 $ Opis            : Factor w/ 12754 levels "","(MIVKA) DROBEN MELJAST PESEK, GOST, SIVORJAV",..: 2060 11588 2477 11660 7539 3182 7884 9123 2500 4756 ...
 $ ACklasifikacija : Factor w/ 290 levels "","(CL)","(CL)/(SC)",..: 154 125 101 101 NA 106 125 80 106 101 ...
 $ GeolNastOd      : num 0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
 $ GeolNastDo      : num 0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
 $ GeolNastOpis    : Factor w/ 113 levels "","B. M. S.",..: 56 53 53 53 56 53 53 53 53 53 ...
 $ NacinVrtanjaOd  : num 0e+00 1e+09 1e+09 1e+09 0e+00 ...
 $ NacinVrtanjaDo  : num 1.1e+01 1.0e+09 1.0e+09 1.0e+09 1.0e+01 ...
 $ NacinVrtanjaOpis: Factor w/ 43 levels "","H. N.","IZKOP",..: 26 1 1 1 26 1 1 1 1 1 ...

Hope that explains better...

Thank you,
m

-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net]
Sent: Monday, November 01, 2010 10:13 PM
To: Matevž Pavlič
Cc: r-help at r-project.org
Subject: Re: [R] splitting first 10 words in a string

On Nov 1, 2010, at 4:39 PM, Matevž Pavlič wrote:
Hi all, I have a columnn with text that has quite a few words in it. I would like to split these words in separate columns, but just first ten words in the string. Is that possible in R?
Not sure what a column means to you. It's not a precisely defined R
type or class. (And you are requested to offer a concrete example
rather than making us guess.)
> words <- "I have a columnn with text that has quite a few words in it. I would like to split these words in separate columns, but just first ten words in the string. Is that possible in R?"
> strsplit(words, " ")[[1]][1:10]
 [1] "I"       "have"    "a"       "columnn" "with"    "text"    "that"    "has"     "quite"   "a"
Or if in a dataframe:
> words <- c("I have a columnn with text that has quite a few words in it.",
+            "I would like to split these words in separate columns",
+            "but just first ten words in the string. Is that possible in R?")
> worddf <- data.frame(words=words)
> # as.character() is needed when the column is a factor; strsplit() only accepts character input
> t(sapply(strsplit(as.character(worddf$words), " "), "[", 1:10) )
     [,1]  [,2]    [,3]    [,4]      [,5]    [,6]    [,7]    [,8]      [,9]       [,10]
[1,] "I"   "have"  "a"     "columnn" "with"  "text"  "that"  "has"     "quite"    "a"
[2,] "I"   "would" "like"  "to"      "split" "these" "words" "in"      "separate" "columns"
[3,] "but" "just"  "first" "ten"     "words" "in"    "the"   "string." "Is"       "that"

David Winsemius, MD
West Hartford, CT
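
For the layout asked about at the top of the thread (the original Opis field plus Column1 to Column10), the word matrix can be bound back onto the data frame. The following is only a sketch under stated assumptions: df is a small stand-in data frame with a factor column Opis, and the names df, parts, wordmat, result and Column1..Column10 are illustrative rather than taken from the poster's real data.

```r
# Stand-in data (assumption): Opis stored as a factor, as in the str() output in the thread.
df <- data.frame(
  Opis = factor(c("I have a sentence with ten words",
                  "(MIVKA) DROBEN MELJAST PESEK, GOST, SIVORJAV"))
)

n <- 10  # number of leading words to keep

# Convert the factor to character first; strsplit() does not accept factors.
parts <- strsplit(as.character(df$Opis), " ", fixed = TRUE)

# Indexing with 1:n pads sentences shorter than n words with NA,
# so every row ends up with exactly n entries.
wordmat <- t(sapply(parts, "[", seq_len(n)))
colnames(wordmat) <- paste0("Column", seq_len(n))

# Keep the original column and append the word columns as character vectors.
result <- cbind(df, as.data.frame(wordmat, stringsAsFactors = FALSE))
str(result)
```

Sentences with fewer than ten words leave the remaining columns as NA. Splitting on a single space keeps punctuation attached to the words ("PESEK," stays as is); whether that is acceptable depends on the data.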