Mayukh,
I apologize for taking so long to get back to your problem. You may have found a solution by now; if so, I would be interested to see it. I have developed a hack that solves the problem, but I expect that someone who knew how to handle JSON objects, or even text parsing, better could develop a more elegant solution.
As I understand the problem, your text file contains more than one JSON object in text form. There are three: the first two are very similar, and the last is a trailer indicating what was done, when it was done, and the number of JSON objects sent. The problem is that fromJSON() only reads the first of the JSON objects.
I have defined three helper functions to separate the JSON objects, read them in, and store them in a list.
library(RJSONIO)
library(stringi, quietly = TRUE)
#library(jsonlite) # also works
#' Returns dataframe with ordered locations of the matching braces.
#'
#' There is almost certainly a better function to do this.
#' @param txt character vector of length one having 0 or more matching braces.
#' @import stringi
#' @examples
#' library(rmsutilityr)
#' match_braces("{123{456{78}9}10}")
#' @export
match_braces <- function(txt) {
  txt <- txt[1] # just in the case of having more than one element
  left <- stri_locate_all_regex(txt, "\\{")[[1]][ , 1]
  right <- stri_locate_all_regex(txt, "\\}")[[1]][ , 2]
  len <- length(left)
  braces <- data.frame(left = rep(0, len), right = rep(0, len))
  # Pair each right brace with the nearest unused left brace that precedes it.
  for (i in seq_along(right)) {
    for (j in rev(seq_along(left))) {
      if (left[j] < right[i] && left[j] != 0) {
        braces$left[i] <- left[j]
        braces$right[i] <- right[i]
        left[j] <- 0 # mark this left brace as used
        break
      }
    }
  }
  braces[order(braces$left), ]
}
#' Returns a list containing two objects in the text of a character vector
#' of length one: (1) object = the first json object found and (2) remainder =
#' the remaining text.
#'
#' Properly formed messages are assumed. Error checking is non-existent.
#' @param json_txt character vector of length one having one or more JSON
#' objects in character form.
#' @import stringi
#' @export
get_first_json_message <- function(json_txt) {
  len <- stri_length(json_txt)
  braces <- match_braces(json_txt)
  if (braces$right[1] + 1 > len) {
    remainder <- ""
  } else {
    remainder <- stri_trim_both(stri_sub(json_txt, braces$right[1] + 1))
  }
  list(object = stri_sub(json_txt, braces$left[1], to = braces$right[1]),
       remainder = remainder)
}
#' Returns list of lists made by call to fromJSON()
#' @param json_txt character vector of length 1 having one or more
#' JSON objects in text form.
#' @import stringi
#' @export
get_json_list <- function(json_txt) {
  t_json_txt <- json_txt
  i <- 0
  json_list <- list()
  repeat {
    i <- i + 1
    message_remainder <- get_first_json_message(t_json_txt)
    json_list[i] <- list(fromJSON(message_remainder$object))
    if (message_remainder$remainder == "")
      break
    t_json_txt <- message_remainder$remainder
  }
  json_list
}
json_file <- "../data/json_file.txt"
json_txt <- stri_trim_both(stri_c(readLines(json_file), collapse = " "))
json_list <- get_json_list(json_txt)
length(json_list)
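One limitation of the helpers above is that match_braces() counts every brace, including any that appear inside a quoted string value such as a tweet body. A possible alternative, offered only as a sketch, is a single pass that tracks nesting depth and skips quoted strings. The function name split_json_objects() is hypothetical and not part of any package; it uses base R only and assumes well-formed input whose top-level values are all objects.

```r
# Hypothetical helper (not from any package): split a string holding several
# concatenated JSON objects into one string per top-level object.
# Assumes well-formed input; top-level arrays are not handled.
split_json_objects <- function(txt) {
  chars <- strsplit(txt, "", fixed = TRUE)[[1]]
  depth <- 0L           # current brace nesting depth
  in_string <- FALSE    # TRUE while inside a double-quoted JSON string
  escaped <- FALSE      # TRUE when the previous character was a backslash
  start <- NA_integer_  # position of the opening brace of the current object
  out <- character(0)
  for (k in seq_along(chars)) {
    ch <- chars[k]
    if (in_string) {
      # Inside a string, braces are ordinary characters; only watch for the end.
      if (escaped) {
        escaped <- FALSE
      } else if (ch == "\\") {
        escaped <- TRUE
      } else if (ch == "\"") {
        in_string <- FALSE
      }
    } else if (ch == "\"") {
      in_string <- TRUE
    } else if (ch == "{") {
      if (depth == 0L) start <- k
      depth <- depth + 1L
    } else if (ch == "}") {
      depth <- depth - 1L
      if (depth == 0L) {
        # A top-level object just closed; keep its text.
        out <- c(out, paste(chars[start:k], collapse = ""))
      }
    }
  }
  out
}

# Made-up input: three objects, the first with a brace inside a string value.
txt <- '{"a":{"b":"has } brace"}} {"c":[1,2]}   {"info":{"n":2}}'
parts <- split_json_objects(txt)
length(parts)  # 3
```

Each element of parts could then be passed to fromJSON() from RJSONIO or jsonlite, for example with lapply(parts, fromJSON), to get a list like the one returned by get_json_list().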
R. Mark Sharp, Ph.D.
Director of Primate Records Database
Southwest National Primate Research Center
Texas Biomedical Research Institute
P.O. Box 760549
San Antonio, TX 78245-0549
Telephone: (210)258-9476
e-mail: msharp at TxBiomed.org
On Jul 27, 2015, at 5:16 PM, Mark Sharp <msharp at TxBiomed.org> wrote:
Mayukh,
I think you are missing an argument to paste() and a right parenthesis character. Try
json_data <- fromJSON(paste(readLines(json_file), collapse = " "))
Mark
R. Mark Sharp, Ph.D.
msharp at TxBiomed.org
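To illustrate why the collapse argument matters: without it, paste() returns one element per input line, so fromJSON() is not handed a single string. The small sketch below uses a made-up two-line vector as a stand-in for the result of readLines(json_file).

```r
# Stand-in for readLines(json_file): a character vector, one element per line.
lines <- c('{"a":', '1}')

length(paste(lines))   # 2: still one element per line

# collapse joins all elements into a single string, separated by a space.
one_string <- paste(lines, collapse = " ")
length(one_string)     # 1
one_string             # the single string {"a": 1}
```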
On Jul 27, 2015, at 3:41 PM, Mayukh Dass <mayukh.dass at gmail.com> wrote:
Hello,
I am trying to read a set of json files containing tweets using the
following code:
json_data <- fromJSON(paste(readLines(json_file))
Unfortunately, it only reads the first record on the file. For example, in
the file below, it only reads the first record starting with "id":"tag:
search.twitter.com,2005:3318539389". What is the best way to retrieve these
records? I have 20 such json files, each with a varying number of tweets.
Thank you in advance.
Best,
Mayukh
{"id":"tag:search.twitter.com
,2005:3318539389","objectType":"activity","actor":{"objectType":"person","id":"id:
twitter.com:2859421","link":"http://www.twitter.com/meetjenn","displayName":"Jenn","postedTime":"2007-01-29T17:06:00.000Z","image":"06-19-07_2010.jpg","summary":"I
say 'like' a lot. I fall down a lot. I walk into everything. Love Pgh Pens,
NE Pats, Fundraising, Dogs & History. Craft Beer & Running
Novice.","links":[{"href":"http://meetjenn.tumblr.com","rel":"me"}],"friendsCount":0,"followersCount":0,"listedCount":0,"statusesCount":0,"twitterTimeZone":"Eastern
Time (US &
Canada)","verified":false,"utcOffset":"0","preferredUsername":"meetjenn","languages":["en"],"location":{"objectType":"place","displayName":"Pgh/Philajersey"},"favoritesCount":0},"verb":"post","postedTime":"2009-08-15T00:00:12.000Z","generator":{"displayName":"tweetdeck","link":"
http://twitter.com
"},"provider":{"objectType":"service","displayName":"Twitter","link":"
http://www.twitter.com"},"link":"
http://twitter.com/meetjenn/statuses/3318539389","body":"Cool story about
the man who created the @Starbucks logo. Additional link at the bottom on
how it came to be: http://bit.ly/16bOJk
","object":{"objectType":"note","id":"object:search.twitter.com,2005:3318539389","summary":"Cool
story about the man who created the @Starbucks logo. Additional link at the
bottom on how it came to be: http://bit.ly/16bOJk","link":"
http://twitter.com/meetjenn/statuses/3318539389
","postedTime":"2009-08-15T00:00:12.000Z"},"twitter_entities":{"urls":[{"expanded_url":null,"indices":[111,131],"url":"
http://bit.ly/16bOJk
"}],"hashtags":[],"user_mentions":[{"id":null,"name":null,"indices":[41,51],"screen_name":"@Starbucks","id_str":null}]},"retweetCount":0,"gnip":{"matching_rules":[{"value":"Starbucks","tag":null}]}}
{"id":"tag:search.twitter.com
,2005:3318543260","objectType":"activity","actor":{"objectType":"person","id":"id:
twitter.com:61595468","link":"http://www.twitter.com/FastestFood","displayName":"FastFood
Bob","postedTime":"2009-01-30T20:51:10.000Z","image":"","summary":"Just A
little food for
thought","links":[{"href":"http://www.TeamSantilli.com","rel":"me"}],"friendsCount":0,"followersCount":0,"listedCount":0,"statusesCount":0,"twitterTimeZone":"Pacific
Time (US &
Canada)","verified":false,"utcOffset":"0","preferredUsername":"FastestFood","languages":["en"],"location":{"objectType":"place","displayName":"eating
some
thoughts"},"favoritesCount":0},"verb":"post","postedTime":"2009-08-15T00:00:23.000Z","generator":{"displayName":"oauth:17","link":"
http://twitter.com
"},"provider":{"objectType":"service","displayName":"Twitter","link":"
http://www.twitter.com"},"link":"
http://twitter.com/FastestFood/statuses/3318543260","body":"Oregon Biz
Report ? How Starbucks saved millions. Oregon closures ...
http://u.mavrev.com/02bdj","object":{"objectType":"note","id":"object:
search.twitter.com,2005:3318543260","summary":"Oregon Biz Report ? How
Starbucks saved millions. Oregon closures ... http://u.mavrev.com/02bdj
","link":"http://twitter.com/FastestFood/statuses/3318543260
","postedTime":"2009-08-15T00:00:23.000Z"},"twitter_entities":{"urls":[{"expanded_url":null,"indices":[70,95],"url":"
http://u.mavrev.com/02bdj
"}],"hashtags":[],"user_mentions":[]},"retweetCount":0,"gnip":{"matching_rules":[{"value":"Starbucks","tag":null}]}}
{"info":{"message":"Replay Request
Completed","sent":"2015-02-18T00:05:15+00:00","activity_count":2}}