convert list to Dataframe

Hi. I have a huge list called twitter:
dim(twitter)
NULL
str(twitter)
List of 1
 $ :Classes 'PlainTextDocument', 'TextDocument', 'character'  atomic
[1:35575] 11999;10:47:14;20;10;2009;ObamaLouverture;Trails Mixed Lessons For
Governance From Campaigner-in-chief: President obama jumps  campaign 09 
tuesday.. http://bit.ly/2eHMaN;Florida;USA;FL;;;27.6648274;-81.5157535
12210;10:47:37;20;10;2009;David_Stringer;William Hague heading  Washington 
meets  Gen. Jim Jones, Sen. John McCain  others. Will Obama team raise
worries  EU ties?;London, England;United Kingdom;Greater
London;Westminster;;51.5001524;-0.1262362
12355;10:47:53;20;10;2009;Singsabit;RT @Drudge_Report PAPER: Excuses wearing
thin  Obama, media pals... http://tinyurl.com/yfw6cd9;So.
California;USA;CA;;;36.778261;-119.4179324
12407;10:47:59;20;10;2009;obamavideonews;Obama News Obama   Afghanistan
troop decision timing (AFP) : AFP - Pres.. http://bit.ly/3KPUr8 #obama
#video;USA;USA;;;;37.09024;-95.712891 ...
  .. ..- attr(*, "Author")= chr(0) 
  .. ..- attr(*, "DateTimeStamp")= POSIXlt[1:9], format: "2009-10-31
04:46:56"
  .. ..- attr(*, "Description")= chr(0) 
  .. ..- attr(*, "Heading")= chr(0) 
  .. ..- attr(*, "ID")= chr "1"
  .. ..- attr(*, "Language")= chr "en"
  .. ..- attr(*, "LocalMetaData")= list()
  .. ..- attr(*, "Origin")= chr(0) 
 - attr(*, "CMetaData")=List of 3
  ..$ NodeID  : num 0
  ..$ MetaData:List of 2
  .. ..$ create_date: POSIXlt[1:9], format: "2009-10-31 04:46:56"
  .. ..$ creator    : Named chr ""
  .. .. ..- attr(*, "names")= chr "LOGNAME"
  ..$ Children: NULL
  ..- attr(*, "class")= chr "MetaDataNode"
 - attr(*, "DMetaData")='data.frame':   1 obs. of  1 variable:
  ..$ MetaID: num 0
 - attr(*, "class")= chr [1:3] "VCorpus" "Corpus" "list"

It contains tweets but in many languages. The "columns" are separated by
semi-colons. I am using the tm package and it is a "corpus".

It looks like this:

547282;06:37:17;21;10;2009;dani_jade18;@Laura_Whyte1   day
:p;Huddersfield/Lincoln;United
Kingdom;Kirklees;Kirklees;;53.6468475;-1.7727296
547283;06:37:17;21;10;2009;fabiomafra;algu?m traz mais lenha pro computador
da facool? BOM DIA.;Belo Horizonte - MG -
BR;Brazil;MG;;;-19.8157306;-43.9542226
547284;06:37:17;21;10;2009;romanotr;???, "????????? ??? ??????" ????????????
?????? ????? ?? ???????? ?????, ?? 173 ?????? ?? 81 ????? ???????? ???????.
??????,??????...;Portugal Aveiro;Portugal;Aveiro;;;40.6411848;-8.6536169
547285;06:37:18;21;10;2009;Y_T_;Playing: Beth Orton &lt\;Someone's
Daughter&gt\;;Kanazawa, Japan;Japan;Ishikawa
Prefecture;;;36.5613254;136.6562051
Error: invalid input
'547286;06:37:18;21;10;2009;Atogey;????????????????????????????????????????????????????????????????????????RT
@zuola ???????????? @wenyunc

I want to convert it to "fields" or columns and so I thought I should
convert it to a dataframe. I tried
twitterDF<-as.data.frame(twitter)
Error in sort.list(y) : 
  invalid input
'547286;06:37:18;21;10;2009;Atogey;????????????????????????????????????????????????????????????????????????RT
@zuola ???????????? @wenyunchao
????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????*????????????????????????????????????????????????????????????????????????????????????;???????????????;China;Zhejiang;;;28.695035;119.751054'
in 'utf8towcs'

Can anyone suggest what I can do? 

P.S. Actually, I would love to remove all the non-English tweets but I have
no clue about how to do that.
View this message in context: http://old.nabble.com/convert-list-to-Dataframe-tp26148889p26148889.html
Sent from the R help mailing list archive at Nabble.com.
Three suggestions:

-- drop the idea of using a dataframe. It's only appropriate when the  
data is rectangular.
-- look at strsplit for separating at "@" characters.
-- post the output of dput() on your sample, since email is probably  
not capable of rendering this data without creating distortions.
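
A minimal sketch of the strsplit suggestion, applied here to the ';' delimiter described in the original post rather than to "@". The two strings are copied from the sample shown above; the names `tweets`, `fields` and `n_fields` are hypothetical.

```r
# Two records copied from the sample in the original post
# (the vector name is an assumption made for illustration).
tweets <- c(
  "12355;10:47:53;20;10;2009;Singsabit;RT @Drudge_Report PAPER: Excuses wearing thin  Obama, media pals... http://tinyurl.com/yfw6cd9;So. California;USA;CA;;;36.778261;-119.4179324",
  "12407;10:47:59;20;10;2009;obamavideonews;Obama News Obama   Afghanistan troop decision timing (AFP) : AFP - Pres.. http://bit.ly/3KPUr8 #obama #video;USA;USA;;;;37.09024;-95.712891"
)

# Split every record at each literal semicolon.
fields <- strsplit(tweets, ";", fixed = TRUE)

# Count the pieces per record; equal counts suggest rectangular data.
n_fields <- sapply(fields, length)
print(n_fields)

# The sixth piece of a record is the user name.
print(fields[[1]][6])
```

Records whose text itself contains a semicolon (the sample includes an escaped one in "&lt\;") would yield a different count, which this check would reveal.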
David

On Nov 1, 2009, at 7:43 AM, onyourmark wrote:

>
> Hi. I have a huge list called twitter:
>
>> dim(twitter)
> NULL
>> str(twitter)

This looks to have been converted into an R object through some process  
on some unspecified input. You should describe that process, and the  
only unambiguous method of doing so is by including the code.

>
> It contains tweets but in many languages. The "columns" are  
> separated by
> semi-colons. I am using the tm package and it is a "corpus".
>
> It looks like this:

It is difficult to see any connection with what you have above.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT
Hello. The "fields" are separated by a ';'. I think that the data is
"rectangular" in the sense that there are about 15 fields for each row. Some
of the fields are empty. In the dput() display below, it seems that the rows
are delimited by ' " ' .
Any idea from this?

Here is the end of the output for dput(twitter)

"4927861;05:04:14;28;10;2009;HOYTSTHEATRES;GameStop Brings  15K  Manage
Holiday Rush [Black Friday]
http://bit.ly/2d3OJg;Australia;Australia;;;;-25.274398;133.775136", 
"4927863;05:04:14;28;10;2009;padden;Rachel  master chef  cook 
anytime!;Sydney, Australia;Australia;NSW;;;-33.867139;151.207114", 
"4927878;05:04:17;28;10;2009;GSpotMagazine;The penalty  success   bored 
attentions  people  formerly snubbed you. -Mary Wilson Little
#quote;UK;United Kingdom;;;;55.378051;-3.435973", 
"4927885;05:04:20;28;10;2009;super_assassin;@triplejsr flight  conchords,
pleeeeeaaase :) thanks rosie
xx;Australia;Australia;;;;-25.274398;133.775136", 
"4927893;05:04:21;28;10;2009;SLMFE;Gestern:Achso,ja okey,um 5 nach las ich
jemanden komen der dir die Akupunkturnadel(zb 5!im Ohr!)entfernt..Um 10 n.
kommt immer noch keiner..;Germany;Germany;;;;51.165691;10.451526", 
"4927901;05:04:23;28;10;2009;mikesemple;HHS Secretary pushes health care
reform  rural America: By Christopher Smart The health-care crisis  ..
http://bit.ly/49Iqcu;London;United Kingdom;Greater
London;Westminster;;51.5001524;-0.1262362", 
"4927913;05:04:26;28;10;2009;coax_k;Facebook Headquarters  Studio O+A: San
Francisco based interior design firm Studio O+A  designed  ..
http://bit.ly/hdqWp;Sydney;Australia;NSW;;;-33.867139;151.207114"
), Author = character(0), DateTimeStamp = structure(list(sec =
56.4049999713898, 
    min = 46L, hour = 4L, mday = 31L, mon = 9L, year = 109L, 
    wday = 6L, yday = 303L, isdst = 0L), .Names = c("sec", "min", 
"hour", "mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXt", 
"POSIXlt"), tzone = "GMT"), Description = character(0), Heading =
character(0), ID = "1", Language = "en", LocalMetaData = list(), Origin =
character(0), class = c("PlainTextDocument", 
"TextDocument", "character"))), CMetaData = structure(list(NodeID = 0, 
    MetaData = structure(list(create_date = structure(list(sec =
56.4059998989105, 
        min = 46L, hour = 4L, mday = 31L, mon = 9L, year = 109L, 
        wday = 6L, yday = 303L, isdst = 0L), .Names = c("sec", 
    "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
    ), class = c("POSIXt", "POSIXlt"), tzone = "GMT"), creator =
structure("", .Names = "LOGNAME")), .Names = c("create_date", 
    "creator")), Children = NULL), .Names = c("NodeID", "MetaData", 
"Children"), class = "MetaDataNode"), DMetaData = structure(list(
    MetaID = 0), .Names = "MetaID", row.names = c(NA, -1L), class =
"data.frame"), class = c("VCorpus", 
"Corpus", "list"))
> Hi. I have a huge list called twitter:
It's a list, but more importantly it's a VCorpus and a Corpus.  You 
should use the functions appropriate to those classes to extract the 
strings making up the data, declare their encoding properly (or convert 
them to your native encoding), then use read.delim() on a textConnection 
to read them in.
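
A rough sketch of the last two steps, assuming the strings have already been extracted from the corpus into a plain character vector (how to do that depends on the tm version and is not shown). The names `lines`, `valid` and `tweets` are hypothetical, and the two records are copied from the dput() output posted earlier in the thread. Treating an NA result from iconv() as "not valid UTF-8" is a common idiom, but its behaviour can vary by platform.

```r
# Hypothetical stand-in for the strings extracted from the corpus.
lines <- c(
  "4927863;05:04:14;28;10;2009;padden;Rachel  master chef  cook anytime!;Sydney, Australia;Australia;NSW;;;-33.867139;151.207114",
  "4927878;05:04:17;28;10;2009;GSpotMagazine;The penalty  success   bored attentions  people  formerly snubbed you. -Mary Wilson Little #quote;UK;United Kingdom;;;;55.378051;-3.435973"
)

# Keep only strings that convert cleanly as UTF-8; iconv() gives NA otherwise.
valid <- !is.na(iconv(lines, from = "UTF-8", to = "UTF-8"))
lines <- lines[valid]

# Read the strings as semicolon-delimited records through a text connection.
# quote = "" stops quotation marks inside tweets from being read as delimiters.
con <- textConnection(lines)
tweets <- read.delim(con, sep = ";", header = FALSE, quote = "",
                     stringsAsFactors = FALSE)
close(con)

str(tweets)
```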

Duncan Murdoch


> Hello. The "fields" are separated by a ';'. I think that the data is
> "rectangular" in the sense that there are about 15 fields for each row.
There either are 15 fields or there aren't. You can't make a dataframe  
with an approximate number of fields. In the fragment below there  
appear to be 14 fields. Try:

twitfrag <- strsplit(c(
"4927861;05:04:14;28;10;2009;HOYTSTHEATRES;GameStop Brings  15K  Manage Holiday Rush [Black Friday] http://bit.ly/2d3OJg;Australia;Australia;;;;-25.274398;133.775136",
"4927863;05:04:14;28;10;2009;padden;Rachel  master chef  cook anytime!;Sydney, Australia;Australia;NSW;;;-33.867139;151.207114",
"4927878;05:04:17;28;10;2009;GSpotMagazine;The penalty  success   bored attentions  people  formerly snubbed you. -Mary Wilson Little #quote;UK;United Kingdom;;;;55.378051;-3.435973",
"4927885;05:04:20;28;10;2009;super_assassin;@triplejsr flight  conchords, pleeeeeaaase :) thanks rosie xx;Australia;Australia;;;;-25.274398;133.775136",
"4927893;05:04:21;28;10;2009;SLMFE;Gestern:Achso,ja okey,um 5 nach las ich jemanden komen der dir die Akupunkturnadel(zb 5!im Ohr!)entfernt..Um 10 n. kommt immer noch keiner..;Germany;Germany;;;;51.165691;10.451526",
"4927901;05:04:23;28;10;2009;mikesemple;HHS Secretary pushes health care reform  rural America: By Christopher Smart The health-care crisis  .. http://bit.ly/49Iqcu;London;United Kingdom;Greater London;Westminster;;51.5001524;-0.1262362",
"4927913;05:04:26;28;10;2009;coax_k;Facebook Headquarters  Studio O+A: San Francisco based interior design firm Studio O+A  designed  .. http://bit.ly/hdqWp;Sydney;Australia;NSW;;;-33.867139;151.207114"
), ";")
twitfrag

I think you will see some patterns emerging.
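
One way to inspect such a result is sketched below: count the pieces in each record and bind them into a data frame only when every record has the same number. The two records are copied from the fragment above; the names `n` and `twitDF` are hypothetical, and the column meanings are not stated anywhere in the thread.

```r
# Stand-in for the strsplit() result, using two records from the fragment.
twitfrag <- strsplit(c(
  "4927863;05:04:14;28;10;2009;padden;Rachel  master chef  cook anytime!;Sydney, Australia;Australia;NSW;;;-33.867139;151.207114",
  "4927901;05:04:23;28;10;2009;mikesemple;HHS Secretary pushes health care reform  rural America: By Christopher Smart The health-care crisis  .. http://bit.ly/49Iqcu;London;United Kingdom;Greater London;Westminster;;51.5001524;-0.1262362"
), ";")

# Number of fields found in each record.
n <- sapply(twitfrag, length)
print(table(n))

# Only rectangular data can become a data frame, so bind the records
# row-wise only if every one of them has the same number of fields.
if (length(unique(n)) == 1) {
  twitDF <- as.data.frame(do.call(rbind, twitfrag), stringsAsFactors = FALSE)
  print(dim(twitDF))
}
```

All columns come out as character; numeric fields such as the coordinates would still need as.numeric().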
> Some of the fields are empty. In the dput() display below, it seems that
> the rows are delimited by ' " ' .
> Any idea from this?
They are strings (in our aRgot, objects of type character.) That is an  
effect of whatever processing you have done with components of the tm  
package, the entirety of which you are failing to share with us.
> Here is the end of the output for dput(twitter)
The whole point of using dput is to create a complete representation  
of an object.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
I did this on the source files, which are semicolon-delimited (the semicolons
delimit the fields; I am not sure what character marks the start of a new tweet).

After loading the tm package:
txt <- system.file("texts", "txt", package = "tm")
(twitter <- Corpus(DirSource(txt),
                   readerControl = list(language = "lat")))

then

twitter <- tm_map(twitter, removeWords, stopwords("english"))

That last command took about an hour to complete.