
Parsing "back" to API structure

4 messages · Eric Fail, jim holtman

#
Dear R experts,

I'm reading data from an online database via an API, and it is delivered in this messy comma-separated structure.
I have this script that parses it nicely into a data frame:
df <- read.table(file = textConnection(RAW.API), header = TRUE, sep = ",", na.strings = "", stringsAsFactors = FALSE)
I then do some calculations and write the results to pushed_text and pushed_calc, after which I need to format the data back into the messy comma-separated structure it came in.

I imagine something like this:
some command that can format my data from the data frame I made, df, back into the structure that the raw API object, RAW.API, came in.

Any help would be appreciated.

Thanks for reading.

Eric
#
This is close: it does quote the header names, but it produces
the same data frame when read back in:
id     event_arm name        dob pushed_text pushed_calc complete
1  1 event_1_arm_1 John 1979-05-01                      NA        2
2  1 event_2_arm_1 John 2012-09-02         abc         123        1
3  1 event_3_arm_1 John 2012-09-10                      NA        2
4  2 event_1_arm_1 Mary 1951-09-10         def         456        2
5  2 event_2_arm_1 Mary 1978-09-12                      NA        2
id     event_arm name        dob pushed_text pushed_calc complete
1  1 event_1_arm_1 John 1979-05-01                      NA        2
2  1 event_2_arm_1 John 2012-09-02         abc         123        1
3  1 event_3_arm_1 John 2012-09-10                      NA        2
4  2 event_1_arm_1 Mary 1951-09-10         def         456        2
5  2 event_2_arm_1 Mary 1978-09-12                      NA        2
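Jim's code itself is not preserved in this archive. A minimal sketch of the round trip he describes, on a hypothetical two-row data frame (not the thread's data): write.csv() quotes the header names and the character columns, yet reading the lines back with read.csv() gives an identical data frame.

```r
# Hypothetical two-row data frame standing in for `df` from the thread
df <- data.frame(id = c(1L, 2L), name = c("John", "Mary"),
                 stringsAsFactors = FALSE)

# write.csv() output captured as a character vector, one element per line
lines <- capture.output(write.csv(df, row.names = FALSE))
print(lines)  # header names are quoted, and so is the character column

# Reading those lines back reproduces the data frame exactly
df2 <- read.csv(text = lines, stringsAsFactors = FALSE)
print(identical(df, df2))
```

So equal data frames after a round trip say nothing about whether the text matches RAW.API character for character, which is what the original question asks for.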

        
On Wed, Sep 12, 2012 at 8:21 PM, Eric Fail <eric.fail at gmx.us> wrote:

  
    
#
Dear Jim,

Thank you for your response; I appreciate your effort!

It is close, I must admit. What I am looking for is an object
that is identical to 'RAW.API', or at least identical in structure (I guess
I do not need the "`Content-Type`" = structure(c("text/html",
"utf-8"), .Names = c("", "charset")) part).
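On the Content-Type part: identical() compares attributes as well as values, so a rebuilt string with exactly the right characters is still not identical to RAW.API unless it carries the same attribute. A small sketch with a hypothetical payload of the same shape as RAW.API:

```r
# Hypothetical payload shaped like RAW.API: a string plus a `Content-Type` attribute
raw <- structure("id\n\"01\"\n",
                 "`Content-Type`" = structure(c("text/html", "utf-8"),
                                              .Names = c("", "charset")))
rebuilt <- "id\n\"01\"\n"

same_text  <- identical(rebuilt, as.vector(raw))  # as.vector() drops the attribute
same_whole <- identical(rebuilt, raw)             # attribute missing, so FALSE

# Copy the attributes across and compare again
attributes(rebuilt) <- attributes(raw)
identical(rebuilt, raw)
```

If the attribute really is not needed, comparing against as.vector(RAW.API) checks the text alone.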

When I inspect 'x.out', it also has the NAs. I've tried to fix
it, but I had to give up. It is strange, because getting there seems so
easy (warning: false logic!).
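Why the NAs appear: write.table(), and therefore write.csv(), writes a missing value as a bare NA, never as a quoted empty string. Replacing NA with "" in the data before writing (the first step of the some_magic() solution at the end of this thread) turns the column into character, so it is written as "". A sketch on a hypothetical two-column data frame:

```r
df <- data.frame(name = c("John", "Mary"), pushed_calc = c(NA, 456),
                 stringsAsFactors = FALSE)

# As is: the missing value comes out as a bare NA
before <- capture.output(write.table(df, sep = ",", row.names = FALSE,
                                     col.names = FALSE))

# Replace NA with ""; assigning a string turns pushed_calc into a character column
df$pushed_calc[is.na(df$pushed_calc)] <- ""
after <- capture.output(write.table(df, sep = ",", row.names = FALSE,
                                    col.names = FALSE))
before
after
```

Note that 456 is now written as "456" as well, which matches the quoted "123" and "456" in RAW.API.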

Here is what I came up with on my long, roundabout route, in the hope that
someone on the list might be able to help:

RAW.API <- structure("id,event_arm,name,dob,pushed_text,pushed_calc,complete\n\"01\",\"event_1_arm_1\",\"John\",\"1979-05-01\",\"\",\"\",2\n\"01\",\"event_2_arm_1\",\"John\",\"2012-09-02\",\"abc\",\"123\",1\n\"01\",\"event_3_arm_1\",\"John\",\"2012-09-10\",\"\",\"\",2\n\"02\",\"event_1_arm_1\",\"Mary\",\"1951-09-10\",\"def\",\"456\",2\n\"02\",\"event_2_arm_1\",\"Mary\",\"1978-09-12\",\"\",\"\",2\n",
"`Content-Type`" = structure(c("text/html", "utf-8"), .Names =
c("","charset")))

# I used an alternative way of converting it to a data frame, to keep the
# leading 0 in the id variable
x <- read.table(file = textConnection(RAW.API), header = TRUE, sep = ",",
                na.strings = "", stringsAsFactors = FALSE, colClasses = "character")
x

# now put it back into the same string; write.csv quotes character columns
# (capture.output() avoids the locked binding a 'w' textConnection leaves behind)
output <- capture.output(write.csv(x, row.names = FALSE))
# removes the quotes around the header names
output[1] <- gsub("\"", "", output[1])
# replaces the bare NAs with empty quoted strings
# (note: this also hits any "NA" inside a value, e.g. "DIANA")
output <- gsub("NA", "\"\"", output)
# removes the " at the beginning of each line
output <- gsub("^\"", "", output)
# removes the " at the end of each line
output <- gsub("\"$", "", output)
# joins the lines, putting a leading " back on every line after the header
x.out <- paste(output, collapse = "\n\"")
# adds a line break at the end
x.out <- paste0(x.out, "\n")

# so much manual gsub ...

Any help would be very much appreciated.
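Since the colClasses = "character" read already keeps the leading zeros, one hedged way to drop most of the gsub() calls is to let write.table() do the quoting: its quote argument also accepts a numeric vector of column indices to quote. A sketch on a hypothetical miniature of RAW.API (same layout, fewer columns and rows; it does not reattach the Content-Type attribute):

```r
# Hypothetical miniature of RAW.API: same layout, fewer columns and rows
raw <- "id,name,pushed_text,complete\n\"01\",\"John\",\"\",2\n\"02\",\"Mary\",\"def\",2\n"

# Read every column as character so the ids keep their leading zero
x <- read.table(text = raw, header = TRUE, sep = ",", na.strings = "",
                colClasses = "character")

# Turn any missing values back into empty strings, so they are written as ""
x[] <- lapply(x, function(col) { col[is.na(col)] <- ""; col })

# Header built separately, without quotes
header <- paste(names(x), collapse = ",")

# Body: quote every column except the last one (complete), as in the payload
body <- capture.output(write.table(x, sep = ",", quote = seq_len(ncol(x) - 1),
                                   row.names = FALSE, col.names = FALSE))

x.out <- paste0(c(header, body, ""), collapse = "\n")
identical(x.out, raw)
```

The header is built with paste() because write.table() quotes column names whenever it writes them and quote is not FALSE. The trailing "" in paste0() supplies the final line break.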
On Wed, Sep 12, 2012 at 5:54 PM, jim holtman <jholtman at gmail.com> wrote:
4 days later
#
Problem solved by Josh O'Brien on Stack Overflow:
http://stackoverflow.com/questions/12393004/parsing-back-to-messy-api-strcuture/12435389#12435389

some_magic <- function(df) {
    ## Replace NA with "", converting column types as needed
    df[] <- lapply(df, function(X) {
                if(any(is.na(X))) {X[is.na(X)] <- ""; X} else {X}
            })

    ## Print integers in first column as 2-digit character strings
    ## (DO NOTE: Hardwiring the number of printed digits here is probably
    ## inadvisable, though needed to _exactly_ reconstitute RAW.API.)
    df[[1]] <- sprintf("%02.0f", df[[1]])

    ## Separately build header and table body, then suture them together
    l1 <- paste(names(df), collapse=",")
    l2 <- capture.output(write.table(df, sep=",", col.names=FALSE,
                                     row.names=FALSE))
    out <- paste0(c(l1, l2, ""), collapse="\n")

    ## Reattach attributes
    att <- list("`Content-Type`" = structure(c("text/html", "utf-8"),
                .Names = c("", "charset")))
    attributes(out) <- att
    out
}

identical(some_magic(df), RAW.API)
# [1] TRUE
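The DO NOTE comment in some_magic() flags the hardwired two-digit width. A hedged alternative is to derive the width from the data with formatC(); the id vector below is hypothetical. Caveat: once read.table() has turned "01" into 1, the original width is gone, so if every id in a payload were a single digit this would pad to width 1, not 2. Reading id as character, as in the colClasses = "character" approach earlier in the thread, avoids that loss.

```r
# Hypothetical id column as read.table() returns it: integers, leading zeros lost
id <- c(1L, 1L, 2L, 12L)

# Pad with zeros to the widest id instead of hardwiring two digits
width <- max(nchar(id))
padded <- formatC(id, width = width, flag = "0")
padded
```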
On Thu, Sep 13, 2012 at 11:32 AM, Eric Fail <eric.fail at gmx.us> wrote: