Message-ID: <26148893.post@talk.nabble.com>
Date: 2009-11-01T13:24:54Z
From: onyourmark
Subject: convert list to Dataframe
In-Reply-To: <26148889.post@talk.nabble.com>

Hello. The "fields" are separated by ';'. I think the data is
"rectangular" in the sense that each row has about 15 fields, some of
which are empty. In the dput() output below, each row appears to be
wrapped in double quotes.
Does this suggest any ideas?
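
For reference, here is a minimal sketch of how records like these could be split into columns. It assumes each element of the character vector is one complete record, that a semicolon inside the tweet text is escaped as '\;' (as in the "Beth Orton" row quoted below), and that every record has 14 fields, which is what the sample rows contain. The column names are made up for illustration, and the way to get the lines out of the corpus is an assumption as well.

```r
# Two sample records copied from this thread (unwrapped to one line each).
# With the real corpus the lines would presumably come from something like
# as.character(twitter[[1]]); that accessor is an assumption, not tested here.
lines <- c(
  "4927863;05:04:14;28;10;2009;padden;Rachel  master chef  cook anytime!;Sydney, Australia;Australia;NSW;;;-33.867139;151.207114",
  "547285;06:37:18;21;10;2009;Y_T_;Playing: Beth Orton &lt\\;Someone's Daughter&gt\\;;Kanazawa, Japan;Japan;Ishikawa Prefecture;;;36.5613254;136.6562051"
)

# Split on ';' unless it is preceded by a backslash, i.e. an escaped
# semicolon inside the tweet text. The lookbehind requires perl = TRUE.
parts <- strsplit(lines, "(?<!\\\\);", perl = TRUE)

# Keep only records with the expected number of fields. The value 14 is
# counted from the sample rows and is not confirmed for the whole data set.
n_fields <- 14L
ok <- sapply(parts, length) == n_fields

# Bind the records row-wise into a character matrix, then a data frame.
twitterDF <- as.data.frame(do.call(rbind, parts[ok]),
                           stringsAsFactors = FALSE)

# Hypothetical column names; the post does not say what the fields mean.
names(twitterDF) <- c("id", "time", "day", "month", "year", "user", "text",
                      "location", "country", "region", "subregion", "extra",
                      "lat", "lon")

# The last two fields look like coordinates, so convert them to numbers.
twitterDF$lat <- as.numeric(twitterDF$lat)
twitterDF$lon <- as.numeric(twitterDF$lon)
```

Records that do not have exactly 14 fields are dropped rather than padded, so sum(!ok) is worth checking before trusting the result.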

Here is the end of the output of dput(twitter):

"4927861;05:04:14;28;10;2009;HOYTSTHEATRES;GameStop Brings  15K  Manage
Holiday Rush [Black Friday]
http://bit.ly/2d3OJg;Australia;Australia;;;;-25.274398;133.775136", 
"4927863;05:04:14;28;10;2009;padden;Rachel  master chef  cook 
anytime!;Sydney, Australia;Australia;NSW;;;-33.867139;151.207114", 
"4927878;05:04:17;28;10;2009;GSpotMagazine;The penalty  success   bored 
attentions  people  formerly snubbed you. -Mary Wilson Little
#quote;UK;United Kingdom;;;;55.378051;-3.435973", 
"4927885;05:04:20;28;10;2009;super_assassin;@triplejsr flight  conchords,
pleeeeeaaase :) thanks rosie
xx;Australia;Australia;;;;-25.274398;133.775136", 
"4927893;05:04:21;28;10;2009;SLMFE;Gestern:Achso,ja okey,um 5 nach las ich
jemanden komen der dir die Akupunkturnadel(zb 5!im Ohr!)entfernt..Um 10 n.
kommt immer noch keiner..;Germany;Germany;;;;51.165691;10.451526", 
"4927901;05:04:23;28;10;2009;mikesemple;HHS Secretary pushes health care
reform  rural America: By Christopher Smart The health-care crisis  ..
http://bit.ly/49Iqcu;London;United Kingdom;Greater
London;Westminster;;51.5001524;-0.1262362", 
"4927913;05:04:26;28;10;2009;coax_k;Facebook Headquarters  Studio O+A: San
Francisco based interior design firm Studio O+A  designed  ..
http://bit.ly/hdqWp;Sydney;Australia;NSW;;;-33.867139;151.207114"
), Author = character(0), DateTimeStamp = structure(list(sec =
56.4049999713898, 
    min = 46L, hour = 4L, mday = 31L, mon = 9L, year = 109L, 
    wday = 6L, yday = 303L, isdst = 0L), .Names = c("sec", "min", 
"hour", "mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXt", 
"POSIXlt"), tzone = "GMT"), Description = character(0), Heading =
character(0), ID = "1", Language = "en", LocalMetaData = list(), Origin =
character(0), class = c("PlainTextDocument", 
"TextDocument", "character"))), CMetaData = structure(list(NodeID = 0, 
    MetaData = structure(list(create_date = structure(list(sec =
56.4059998989105, 
        min = 46L, hour = 4L, mday = 31L, mon = 9L, year = 109L, 
        wday = 6L, yday = 303L, isdst = 0L), .Names = c("sec", 
    "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
    ), class = c("POSIXt", "POSIXlt"), tzone = "GMT"), creator =
structure("", .Names = "LOGNAME")), .Names = c("create_date", 
    "creator")), Children = NULL), .Names = c("NodeID", "MetaData", 
"Children"), class = "MetaDataNode"), DMetaData = structure(list(
    MetaID = 0), .Names = "MetaID", row.names = c(NA, -1L), class =
"data.frame"), class = c("VCorpus", 
"Corpus", "list"))




onyourmark wrote:
> 
> Hi. I have a huge list called twitter:
> 
>> dim(twitter)
> NULL
>> str(twitter)
> List of 1
>  $ :Classes 'PlainTextDocument', 'TextDocument', 'character'  atomic
> [1:35575] 11999;10:47:14;20;10;2009;ObamaLouverture;Trails Mixed Lessons
> For Governance From Campaigner-in-chief: President obama jumps  campaign
> 09  tuesday.. http://bit.ly/2eHMaN;Florida;USA;FL;;;27.6648274;-81.5157535
> 12210;10:47:37;20;10;2009;David_Stringer;William Hague heading  Washington 
> meets  Gen. Jim Jones, Sen. John McCain  others. Will Obama team raise
> worries  EU ties?;London, England;United Kingdom;Greater
> London;Westminster;;51.5001524;-0.1262362
> 12355;10:47:53;20;10;2009;Singsabit;RT @Drudge_Report PAPER: Excuses
> wearing thin  Obama, media pals... http://tinyurl.com/yfw6cd9;So.
> California;USA;CA;;;36.778261;-119.4179324
> 12407;10:47:59;20;10;2009;obamavideonews;Obama News Obama   Afghanistan
> troop decision timing (AFP) : AFP - Pres.. http://bit.ly/3KPUr8 #obama
> #video;USA;USA;;;;37.09024;-95.712891 ...
>   .. ..- attr(*, "Author")= chr(0) 
>   .. ..- attr(*, "DateTimeStamp")= POSIXlt[1:9], format: "2009-10-31
> 04:46:56"
>   .. ..- attr(*, "Description")= chr(0) 
>   .. ..- attr(*, "Heading")= chr(0) 
>   .. ..- attr(*, "ID")= chr "1"
>   .. ..- attr(*, "Language")= chr "en"
>   .. ..- attr(*, "LocalMetaData")= list()
>   .. ..- attr(*, "Origin")= chr(0) 
>  - attr(*, "CMetaData")=List of 3
>   ..$ NodeID  : num 0
>   ..$ MetaData:List of 2
>   .. ..$ create_date: POSIXlt[1:9], format: "2009-10-31 04:46:56"
>   .. ..$ creator    : Named chr ""
>   .. .. ..- attr(*, "names")= chr "LOGNAME"
>   ..$ Children: NULL
>   ..- attr(*, "class")= chr "MetaDataNode"
>  - attr(*, "DMetaData")='data.frame':   1 obs. of  1 variable:
>   ..$ MetaID: num 0
>  - attr(*, "class")= chr [1:3] "VCorpus" "Corpus" "list"
> 
> It contains tweets in many languages. The "columns" are separated by
> semicolons. I am using the tm package, and the object is a "corpus".
> 
> It looks like this:
> 
> 547282;06:37:17;21;10;2009;dani_jade18;@Laura_Whyte1   day
> :p;Huddersfield/Lincoln;United
> Kingdom;Kirklees;Kirklees;;53.6468475;-1.7727296
> 547283;06:37:17;21;10;2009;fabiomafra;algu?m traz mais lenha pro
> computador da facool? BOM DIA.;Belo Horizonte - MG -
> BR;Brazil;MG;;;-19.8157306;-43.9542226
> 547284;06:37:17;21;10;2009;romanotr;???, "????????? ??? ??????"
> ???????????? ?????? ????? ?? ???????? ?????, ?? 173 ?????? ?? 81 ?????
> ???????? ???????. ??????,??????...;Portugal
> Aveiro;Portugal;Aveiro;;;40.6411848;-8.6536169
> 547285;06:37:18;21;10;2009;Y_T_;Playing: Beth Orton &lt\;Someone's
> Daughter&gt\;;Kanazawa, Japan;Japan;Ishikawa
> Prefecture;;;36.5613254;136.6562051
> Error: invalid input
> '547286;06:37:18;21;10;2009;Atogey;????????????????????????????????????????????????????????????????????????RT
> @zuola ???????????? @wenyunc
> 
> I want to split it into "fields" or columns, so I thought I should
> convert it to a data frame. I tried:
> 
>> twitterDF<-as.data.frame(twitter)
> Error in sort.list(y) : 
>   invalid input
> '547286;06:37:18;21;10;2009;Atogey;????????????????????????????????????????????????????????????????????????RT
> @zuola ???????????? @wenyunchao
> ????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????*????????????????????????????????????????????????????????????????????????????????????;???????????????;China;Zhejiang;;;28.695035;119.751054'
> in 'utf8towcs'
>> 
> 
> Can anyone suggest what I can do? 
> 
> P.S. Actually, I would love to remove all the non-English tweets, but I
> have no clue how to do that.
> 
> 
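
The "invalid input ... in 'utf8towcs'" error above suggests that some records contain bytes R cannot interpret in the current encoding. One crude way to sidestep both that error and the non-English tweets is to keep only records made up entirely of 7-bit ASCII bytes. The sketch below uses made-up sample lines, and it is only a rough proxy for "English": it drops English tweets that contain accented letters or symbols, and it keeps tweets in other languages that happen to be written in plain ASCII (such as the German one in the dput() output).

```r
# Hypothetical sample: one plain ASCII record, one with an accented
# letter, and one containing a byte that is not valid UTF-8.
lines <- c(
  "1;05:04:14;padden;Rachel master chef cook anytime!",
  "2;06:37:17;fabiomafra;algu\u00e9m traz mais lenha",
  "3;06:37:18;someone;broken \xff bytes"
)

# TRUE when every byte of the string is in the 7-bit ASCII range.
# charToRaw() works on raw bytes, so it does not fail on invalid input.
is_ascii <- function(s) all(as.integer(charToRaw(s)) < 128L)

# Check each record and keep only the pure-ASCII ones.
keep <- vapply(lines, is_ascii, logical(1), USE.NAMES = FALSE)
ascii_lines <- lines[keep]
```

Filtering this way before any splitting or conversion means the later steps never see the bytes that trigger the error.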

-- 
View this message in context: http://old.nabble.com/convert-list-to-Dataframe-tp26148889p26148893.html
Sent from the R help mailing list archive at Nabble.com.