Convert list to data frame
Hello. The "fields" are separated by ';'. I think the data is "rectangular" in the sense that there are about 15 fields in each row. Some of the fields are empty. In the dput() output below, each row seems to be a separate string enclosed in double quotes. Does this give anyone an idea? Here is the end of the output of dput(twitter):

"4927861;05:04:14;28;10;2009;HOYTSTHEATRES;GameStop Brings 15K Manage Holiday Rush [Black Friday] http://bit.ly/2d3OJg;Australia;Australia;;;;-25.274398;133.775136",
"4927863;05:04:14;28;10;2009;padden;Rachel master chef cook anytime!;Sydney, Australia;Australia;NSW;;;-33.867139;151.207114",
"4927878;05:04:17;28;10;2009;GSpotMagazine;The penalty success bored attentions people formerly snubbed you. -Mary Wilson Little #quote;UK;United Kingdom;;;;55.378051;-3.435973",
"4927885;05:04:20;28;10;2009;super_assassin;@triplejsr flight conchords, pleeeeeaaase :) thanks rosie xx;Australia;Australia;;;;-25.274398;133.775136",
"4927893;05:04:21;28;10;2009;SLMFE;Gestern:Achso,ja okey,um 5 nach las ich jemanden komen der dir die Akupunkturnadel(zb 5!im Ohr!)entfernt..Um 10 n. kommt immer noch keiner..;Germany;Germany;;;;51.165691;10.451526",
"4927901;05:04:23;28;10;2009;mikesemple;HHS Secretary pushes health care reform rural America: By Christopher Smart The health-care crisis .. http://bit.ly/49Iqcu;London;United Kingdom;Greater London;Westminster;;51.5001524;-0.1262362",
"4927913;05:04:26;28;10;2009;coax_k;Facebook Headquarters Studio O+A: San Francisco based interior design firm Studio O+A designed .. http://bit.ly/hdqWp;Sydney;Australia;NSW;;;-33.867139;151.207114"
), Author = character(0),
DateTimeStamp = structure(list(sec = 56.4049999713898, min = 46L, hour = 4L,
  mday = 31L, mon = 9L, year = 109L, wday = 6L, yday = 303L, isdst = 0L),
  .Names = c("sec", "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"),
  class = c("POSIXt", "POSIXlt"), tzone = "GMT"),
Description = character(0), Heading = character(0), ID = "1", Language = "en",
LocalMetaData = list(), Origin = character(0),
class = c("PlainTextDocument", "TextDocument", "character"))),
CMetaData = structure(list(NodeID = 0, MetaData = structure(list(
  create_date = structure(list(sec = 56.4059998989105, min = 46L, hour = 4L,
    mday = 31L, mon = 9L, year = 109L, wday = 6L, yday = 303L, isdst = 0L),
    .Names = c("sec", "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"),
    class = c("POSIXt", "POSIXlt"), tzone = "GMT"),
  creator = structure("", .Names = "LOGNAME")),
  .Names = c("create_date", "creator")), Children = NULL),
  .Names = c("NodeID", "MetaData", "Children"), class = "MetaDataNode"),
DMetaData = structure(list(MetaID = 0), .Names = "MetaID",
  row.names = c(NA, -1L), class = "data.frame"),
class = c("VCorpus", "Corpus", "list"))
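Since each element of that vector is one ';'-separated record, one minimal base-R sketch is to hand the lines to read.table() through its text argument. It assumes the records are already in a plain character vector x (here, four of the rows above); the column names are hypothetical guesses, and each sample row has 14 fields. quote = "" and comment.char = "" matter because tweets contain apostrophes and '#' hashtags. One row printed further down (Y_T_) contains "\;" inside the tweet text, which sep = ";" would split into extra fields, so rows like that may need separate handling.

```r
# Four records copied from the dput() output above.
x <- c(
  "4927861;05:04:14;28;10;2009;HOYTSTHEATRES;GameStop Brings 15K Manage Holiday Rush [Black Friday] http://bit.ly/2d3OJg;Australia;Australia;;;;-25.274398;133.775136",
  "4927863;05:04:14;28;10;2009;padden;Rachel master chef cook anytime!;Sydney, Australia;Australia;NSW;;;-33.867139;151.207114",
  "4927878;05:04:17;28;10;2009;GSpotMagazine;The penalty success bored attentions people formerly snubbed you. -Mary Wilson Little #quote;UK;United Kingdom;;;;55.378051;-3.435973",
  "4927901;05:04:23;28;10;2009;mikesemple;HHS Secretary pushes health care reform rural America: By Christopher Smart The health-care crisis .. http://bit.ly/49Iqcu;London;United Kingdom;Greater London;Westminster;;51.5001524;-0.1262362"
)

# Hypothetical column names (guessed from the sample rows, 14 fields each).
cols <- c("id", "time", "day", "month", "year", "user", "text",
          "location", "country", "region", "subregion", "extra",
          "lat", "lon")

tweets <- read.table(
  text = x,                  # each element of x is read as one line
  sep = ";",
  quote = "",                # tweets contain ' and " characters
  comment.char = "",         # '#' starts hashtags, not comments
  col.names = cols,
  colClasses = "character"   # read every field as text first
)

# Convert the coordinate columns once the table is parsed.
tweets$lat <- as.numeric(tweets$lat)
tweets$lon <- as.numeric(tweets$lon)
```

Reading everything as character first avoids surprises such as an all-empty column being guessed as logical; individual columns can then be converted explicitly.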
onyourmark wrote:
Hi. I have a huge list called twitter:
dim(twitter)
NULL
str(twitter)
List of 1
 $ :Classes 'PlainTextDocument', 'TextDocument', 'character'  atomic [1:35575]
      11999;10:47:14;20;10;2009;ObamaLouverture;Trails Mixed Lessons For Governance From Campaigner-in-chief: President obama jumps campaign 09 tuesday.. http://bit.ly/2eHMaN;Florida;USA;FL;;;27.6648274;-81.5157535
      12210;10:47:37;20;10;2009;David_Stringer;William Hague heading Washington meets Gen. Jim Jones, Sen. John McCain others. Will Obama team raise worries EU ties?;London, England;United Kingdom;Greater London;Westminster;;51.5001524;-0.1262362
      12355;10:47:53;20;10;2009;Singsabit;RT @Drudge_Report PAPER: Excuses wearing thin Obama, media pals... http://tinyurl.com/yfw6cd9;So. California;USA;CA;;;36.778261;-119.4179324
      12407;10:47:59;20;10;2009;obamavideonews;Obama News Obama Afghanistan troop decision timing (AFP) : AFP - Pres.. http://bit.ly/3KPUr8 #obama #video;USA;USA;;;;37.09024;-95.712891
      ...
  .. ..- attr(*, "Author")= chr(0)
  .. ..- attr(*, "DateTimeStamp")= POSIXlt[1:9], format: "2009-10-31 04:46:56"
  .. ..- attr(*, "Description")= chr(0)
  .. ..- attr(*, "Heading")= chr(0)
  .. ..- attr(*, "ID")= chr "1"
  .. ..- attr(*, "Language")= chr "en"
  .. ..- attr(*, "LocalMetaData")= list()
  .. ..- attr(*, "Origin")= chr(0)
 - attr(*, "CMetaData")=List of 3
  ..$ NodeID : num 0
  ..$ MetaData:List of 2
  .. ..$ create_date: POSIXlt[1:9], format: "2009-10-31 04:46:56"
  .. ..$ creator : Named chr ""
  .. .. ..- attr(*, "names")= chr "LOGNAME"
  ..$ Children: NULL
  ..- attr(*, "class")= chr "MetaDataNode"
 - attr(*, "DMetaData")='data.frame': 1 obs. of 1 variable:
  ..$ MetaID: num 0
 - attr(*, "class")= chr [1:3] "VCorpus" "Corpus" "list"

It contains tweets in many languages. The "columns" are separated by semicolons. I am using the tm package, and the object is a "corpus".
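The str() output explains the NULL from dim(twitter): the object is not a table but a list holding one document, and that document is a single character vector of 35575 records with classes and metadata attached. The sketch below mocks that shape with a hypothetical two-record stand-in (tm is not loaded) to show that stripping the attributes leaves an ordinary character vector, one record per element, which is what any parser needs as input.

```r
# Hypothetical stand-in for the object described by str(twitter) above:
# a list holding ONE document, and that document is a character vector
# (one ';'-separated record per element) carrying classes and metadata.
doc <- structure(
  c("12355;10:47:53;20;10;2009;Singsabit;RT @Drudge_Report PAPER: Excuses wearing thin Obama, media pals... http://tinyurl.com/yfw6cd9;So. California;USA;CA;;;36.778261;-119.4179324",
    "12407;10:47:59;20;10;2009;obamavideonews;Obama News Obama Afghanistan troop decision timing (AFP) : AFP - Pres.. http://bit.ly/3KPUr8 #obama #video;USA;USA;;;;37.09024;-95.712891"),
  ID = "1",
  Language = "en",
  class = c("PlainTextDocument", "TextDocument", "character")
)
twitter <- list(doc)

length(twitter)  # 1: the whole document is a single list element
dim(twitter)     # NULL: a list has no dimensions

# as.vector() drops every attribute of an atomic vector (class included),
# leaving a plain character vector with one record per element.
lines <- as.vector(twitter[[1]])
length(lines)
```

With the real corpus the same idea would be as.vector(twitter[[1]]), or content(twitter[[1]]) in tm versions that provide that accessor; both are untested assumptions here, since tm's methods may change how the document is coerced.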
Its contents look like this:

547282;06:37:17;21;10;2009;dani_jade18;@Laura_Whyte1 day :p;Huddersfield/Lincoln;United Kingdom;Kirklees;Kirklees;;53.6468475;-1.7727296
547283;06:37:17;21;10;2009;fabiomafra;algu?m traz mais lenha pro computador da facool? BOM DIA.;Belo Horizonte - MG - BR;Brazil;MG;;;-19.8157306;-43.9542226
547284;06:37:17;21;10;2009;romanotr;???, "????????? ??? ??????" ???????????? ?????? ????? ?? ???????? ?????, ?? 173 ?????? ?? 81 ????? ???????? ???????. ??????,??????...;Portugal Aveiro;Portugal;Aveiro;;;40.6411848;-8.6536169
547285;06:37:18;21;10;2009;Y_T_;Playing: Beth Orton <\;Someone's Daughter>\;;Kanazawa, Japan;Japan;Ishikawa Prefecture;;;36.5613254;136.6562051
Error: invalid input '547286;06:37:18;21;10;2009;Atogey;????????????????????????????????????????????????????????????????????????RT @zuola ???????????? @wenyunc

I want to split it into "fields" (columns), so I thought I should convert it to a data frame. I tried:
twitterDF<-as.data.frame(twitter)
Error in sort.list(y) : invalid input '547286;06:37:18;21;10;2009;Atogey;????????????????????????????????????????????????????????????????????????RT @zuola ???????????? @wenyunchao ????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????*????????????????????????????????????????????????????????????????????????????????????;???????????????;China;Zhejiang;;;28.695035;119.751054' in 'utf8towcs'
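The 'utf8towcs' message suggests that some strings are not valid UTF-8 (the runs of '?' in the output above point the same way). Re-reading the source with its actual encoding, for example through the fileEncoding argument of read.table(), is probably the better fix. As a fallback, here is a sketch that flags invalid strings with validUTF8() (base R, 3.3.0 and later) and drops the offending bytes with iconv(..., sub = ""); the example vector is made up.

```r
# A made-up vector whose second element contains a byte (0xff) that can
# never occur in valid UTF-8.
x <- c("valid text", "bad\xff text")

# Which elements are not valid UTF-8?
bad <- !validUTF8(x)

# Replace bytes that cannot be interpreted as UTF-8 with "" (i.e. drop
# them); strings that are already valid pass through unchanged.
clean <- iconv(x, from = "UTF-8", to = "UTF-8", sub = "")
```

Dropping bytes loses information, so it is worth inspecting x[bad] first to see whether a wrong encoding declaration, rather than genuinely broken data, is the cause.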
Can anyone suggest what I can do?

P.S. Actually, I would love to remove all the non-English tweets, but I have no idea how to do that.
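Reliable language identification needs a dedicated tool (the CRAN package textcat, for instance, offers n-gram based identification through textcat()). As a crude base-R first pass, the sketch below keeps only tweets whose text is entirely printable ASCII. It removes tweets in non-Latin scripts or with accented letters, but it keeps non-English tweets that happen to be plain ASCII, such as the German one in the dput() output. is_plain_ascii() is a hypothetical helper; the first two texts come from the post and the third is a made-up non-Latin example.

```r
# Hypothetical helper: TRUE when every character is printable ASCII
# (space through '~'). perl = TRUE makes the range compare code points.
is_plain_ascii <- function(s) !grepl("[^\\x20-\\x7E]", s, perl = TRUE)

texts <- c(
  "Rachel master chef cook anytime!",
  "Gestern:Achso,ja okey,um 5 nach las ich jemanden komen der dir die Akupunkturnadel(zb 5!im Ohr!)entfernt..Um 10 n. kommt immer noch keiner..",
  "\u4f60\u597d"  # made-up non-Latin example
)

keep <- is_plain_ascii(texts)
english_guess <- texts[keep]  # note: the German tweet survives this filter
```

Applied to a data frame whose tweet text sits in a column named text (a hypothetical name), the filter would be df[is_plain_ascii(df$text), ].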
View this message in context: http://old.nabble.com/convert-list-to-Dataframe-tp26148889p26148893.html
Sent from the R help mailing list archive at Nabble.com.