merging corpora and metadata
Hi Henri-Paul,

This can be rather tricky. It would really help if you could give us a reproducible example. In this case, because you are dealing with non-standard data structures (or at least added attributes), we need the data exactly as R "sees" it. This means either A) code to create some data that demonstrates your problem or B) the output of calling dput(corpus.1) (see ?dput for what it does and what to do).

One possibility (though it does not concatenate per se):

    combined <- list(corpus.1, corpus.2)

*If* (there are only attributes in corpus.1 OR corpus.2) OR (the attribute names in corpus.1 and corpus.2 are unique), then you could do:

    combined <- c(corpus.1, corpus.2)
    attributes(combined) <- c(attributes(corpus.1), attributes(corpus.2))

but note that it is *very* likely that at least the names attributes overlap, so you would need to address that somehow. If attributes overlap, you need to merge them, and what an appropriate way to do that would be, I have no idea without knowing more about the data and what is expected by the functions that work with it.

Best regards,

Josh

On Thu, Nov 17, 2011 at 1:43 PM, Henri-Paul Indiogine
<hindiogine at gmail.com> wrote:
Greetings!

I lose all my metadata after concatenating corpora. This is an example of what happens:
meta(corpus.1)
   MetaID cid fid selfirst selend                      fname
1       0   1  11     2169   2518 WCPD-2001-01-29-Pg217.scrb
2       0   1  14     9189   9702  WCPD-2003-01-13-Pg39.scrb
3       0   1  14     2109   2577  WCPD-2003-01-13-Pg39.scrb
....
17      0   1 114    17863  18256 WCPD-2007-04-30-Pg515.scrb
meta(corpus.2)
   MetaID cid fid selfirst selend                       fname
1       0   2   2    11016  11600         DCPD-200900595.scrb
2       0   2   6    19510  20098         DCPD-201000636.scrb
3       0   2   6    23935  24573         DCPD-201000636.scrb
....
94      0   2 127    16225  17128 WCPD-2009-01-12-Pg22-3.scrb
tot.corpus <- c(corpus.1, corpus.2)
meta(tot.corpus)
    MetaID
1        0
2        0
3        0
....
111      0
This is from the structure of corpus.1:

 ..$ MetaData:List of 2
 .. ..$ create_date: POSIXlt[1:1], format: "2011-11-17 21:09:57"
 .. ..$ creator    : chr "henk"
 ..$ Children: NULL
 ..- attr(*, "class")= chr "MetaDataNode"
 - attr(*, "DMetaData")='data.frame':   17 obs. of  6 variables:
 ..$ MetaID  : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
 ..$ cid     : int [1:17] 1 1 1 1 1 1 1 1 1 1 ...
 ..$ fid     : int [1:17] 11 14 14 17 46 80 80 80 91 91 ...
 ..$ selfirst: num [1:17] 2169 9189 2109 8315 9439 ...
 ..$ selend  : num [1:17] 2518 9702 2577 8881 10102 ...
 ..$ fname   : chr [1:17] "WCPD-2001-01-29-Pg217.scrb" "WCPD-2003-01-13-Pg39.scrb" "WCPD-2003-01-13-Pg39.scrb" "WCPD-2004-05-17-Pg856.scrb" ...
 - attr(*, "class")= chr [1:3] "VCorpus" "Corpus" "list"

Any idea on what I could do to keep the metadata in the merged corpus?

Thanks,
Henri-Paul

--
Henri-Paul Indiogine
Curriculum & Instruction
Texas A&M University
TutorFind Learning Centre

Email: hindiogine at gmail.com
Skype: hindiogine
Website: http://people.cehd.tamu.edu/~sindiogine
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
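
As a rough illustration of the attribute-merging idea in the reply above, here is a minimal base-R sketch. It does not use the real corpus objects: make_toy() is a hypothetical helper that builds plain lists carrying a "DMetaData" data frame as an attribute, loosely mirroring the str() output in the question. The toy values, the reduced column set, and the assumption that both metadata data frames share the same columns are all mine, so whether the package's own functions would accept the result is untested.

```r
# Hypothetical helper: a plain list of texts with a "DMetaData" data frame
# attached as an attribute (a stand-in, not a real corpus object).
make_toy <- function(texts, meta) {
  structure(as.list(texts), DMetaData = meta)
}

corpus.1 <- make_toy(
  c("text a", "text b"),
  data.frame(MetaID = 0, cid = 1L, fid = c(11L, 14L),
             fname = c("WCPD-2001-01-29-Pg217.scrb",
                       "WCPD-2003-01-13-Pg39.scrb"),
             stringsAsFactors = FALSE)
)
corpus.2 <- make_toy(
  "text c",
  data.frame(MetaID = 0, cid = 2L, fid = 2L,
             fname = "DCPD-200900595.scrb",
             stringsAsFactors = FALSE)
)

# c() on plain lists keeps the elements but drops the custom attribute.
combined <- c(corpus.1, corpus.2)
lost <- is.null(attr(combined, "DMetaData"))

# Row-bind the per-document metadata in the same order as the documents,
# then reattach it. Assumes both data frames have identical columns.
merged_meta <- rbind(attr(corpus.1, "DMetaData"),
                     attr(corpus.2, "DMetaData"))
rownames(merged_meta) <- NULL
attr(combined, "DMetaData") <- merged_meta

# One metadata row per document after the merge.
stopifnot(nrow(attr(combined, "DMetaData")) == length(combined))
```

Binding rows rather than concatenating the attribute lists sidesteps the overlapping-names problem for this one attribute; any other attributes (such as the class vector) would still need to be handled separately.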