I wrote a package that requires downloading data from an external server based on parameters specified by the user. I have used RMySQL or RCurl to accomplish this, but in the interest of simplicity for users of this package, I'd like this communication to not require installing other packages and their dependencies (e.g. RMySQL, RCurl, etc.). These dependencies are sometimes a deterrent to using my package. I set up a CGI script on the server to handle data requests. Now, is there a *built-in* R function that could be adapted to post forms, or that otherwise allows long URLs to be passed to this script to retrieve the requested data? I tried using just "scan", but it fails with long URLs (sometimes a long list of parameters is required). I think my only option is to use POST, but that doesn't appear to be possible without RCurl. Can anyone help or suggest a "native" solution to the problem? Thanks.
Post CGI forms with built-in R function?
4 messages · Duncan Temple Lang, Mike Schaffer
Hi Mike
If you don't want to have to download any packages but stick entirely
within R, then you can mimic the code in httpRequest. But,
as far as I know, there is no function in the standard R distribution to
POST an HTTP request.
As for using
scan("http://....")
what is the string you are using? Are you escaping all the characters
correctly? What's the error message?
If what you are doing involves only relatively basic HTTP requests,
perhaps the simplest thing to do is use the httpRequest package on CRAN.
It does the basics. You may have to handle chunked responses yourself
and escaping certain characters. But that is a pure R solution that can
be easily installed via install.packages().
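On the point about escaping characters, here is a minimal sketch that builds a form-encoded parameter string with base R only. `encode_form` is a hypothetical helper, not part of R or httpRequest, and the `note` parameter is made up for illustration. It relies on `utils::URLencode` with `reserved = TRUE`, which percent-encodes commas, spaces and ampersands (the `repeated` argument assumes a reasonably recent R).

```r
# Hypothetical helper: turn a named list of parameters into an
# application/x-www-form-urlencoded string using only base R (utils).
encode_form <- function(params) {
  stopifnot(is.list(params), !is.null(names(params)), all(nzchar(names(params))))
  # repeated = TRUE forces encoding even if the input already contains "%xx"
  enc <- function(x) utils::URLencode(as.character(x), reserved = TRUE, repeated = TRUE)
  pairs <- vapply(names(params), function(key) {
    # Collapse vector values with commas before encoding
    value <- paste(params[[key]], collapse = ",")
    paste0(enc(key), "=", enc(value))
  }, character(1))
  paste(pairs, collapse = "&")
}

body <- encode_form(list(
  user_id = 0,
  table = "seqs_ucsc_hg18",
  gene_set_ids = c("NM_000029", "NM_000064"),
  note = "a b&c"   # made-up value to show escaping of space and ampersand
))
print(body)
```

The commas come out as `%2C`, which a CGI library would normally decode back to commas on the server side. Spaces are encoded as `%20` rather than `+`; both forms are commonly accepted, but that is worth checking against the receiving script.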
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
-- 
Duncan Temple Lang    duncan at wald.ucdavis.edu
Department of Statistics    work: (530) 752-4782
4210 Mathematical Sciences Building    fax: (530) 752-7099
One Shields Ave.
University of California at Davis
Davis, CA 95616, USA
There is a hard-coded limit of 4096 characters in RxmlNanoHTTPScanURL and the other ScanURL routines in nanohttp.c and nanoftp.c. Your URI is 5138 characters long and so walks past the bounds of the array of length 4096.

I am not yet convinced that it is worthwhile to increase this limit to a larger number. Using POST in this context really is a better solution. But we do need to add checks to the code to ensure that the URI string is shorter than 4096 characters. I'll try to get an opportunity to do that tomorrow before I take off.

D.
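Until such a check exists inside R, a package could guard against the limit on its own side. The following is only a sketch: `choose_method` and `URL_LIMIT` are hypothetical names, the example URL is a placeholder, and the 4096 figure is simply the buffer size mentioned above.

```r
# Assumed limit, taken from the 4096-character buffer described above.
URL_LIMIT <- 4096L

# Hypothetical helper: decide between GET and POST for a CGI request.
choose_method <- function(base_url, query, limit = URL_LIMIT) {
  full_url <- paste0(base_url, "?", query)
  # Count bytes rather than characters, since the C buffer is a byte array
  if (nchar(full_url, type = "bytes") < limit) {
    list(method = "GET", url = full_url, body = NULL)
  } else {
    list(method = "POST", url = base_url, body = query)
  }
}

# Placeholder URL and parameters, for illustration only
short <- choose_method("http://example.org/cgi-bin/get.cgi", "ids=NM_000029")
long  <- choose_method("http://example.org/cgi-bin/get.cgi",
                       paste0("ids=", strrep("N", 5000)))
print(c(short$method, long$method))
```

Short requests can then keep using `readLines()` or `scan()` on the URL, and only the long ones need a hand-written POST.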
Thanks Duncan. I figured there was some limit.

Your suggestion to check out the httpRequest code has me headed in the right direction, but I am having problems with the data returned from the socketConnection. Some of the returned data appears to be improperly decoded. I don't know if I've stumbled on a low-level socket bug, or if I need to error-check the results myself. Can anyone figure out the problem in the code below? This was run using R version 2.3.1 (2006-06-01) on both Linux and OS X with the same results.

# First, the correctly formatted data for comparison is returned by the code below. As we've established, this works fine, but I can't use it with long URIs:

full.url <- "http://genomics11.bu.edu/cgi-bin/Tractor_dev/external/get_msa.cgi?user_id=0&table=seqs_ucsc_hg18&len=350&gene_set_ids=NM_000029,NM_000064,NM_000066&orgs=Hs,mm8,canFam2"
read <- readLines(full.url)

# This returns three tab-delimited lines with a header row.

> read
[1] "NA\tHs\tmm8\tcanFam2"
[2] "NM_000029\tTAAGCA--AGACTC-TCCCCTGCCCTCTGCCCTCTGCACCTCCGG--- CCTGCATGTC----------CCTGTGGCCTCTTGGGGGTACATCTCCCGGGG--- CTGGGTCAGAAG---------GCCTGGGTGGTTGGCCTCAGG------------------------ CTGTCACACACCTAGGGAGATGCTC------------------ CCGTTTCTGGGAACCTTGGCCCCGACTCCTGCA---- AACTTCGGTAAATGTGTAACTCGACCCTGCACCGGCTC---------------- ACTCTGTTCAGCA----GTGAAACTCTGCATCGATCACTAAGACTTCCTGG- AAGAGGTCCCAGCGT----GAGTGTCGCT--- TCTGGCATCTGTCCTTCTGG---------------------CCAGCCTGTGGTC-------------- TGG-CCAAGTGATGTAACCCTCCTCT---CCAGCCT\tTGCAAGTGAGCCCC- CTTCCTG-----------------------------GCATGCC----------CAGAGAGGCTTACG-- AGTGCATCACGAGGGGG-CTTTCATCCCAAG--------- GTCTGCATGGCTGGCTTCAGG------------------------TTGTCACAACCC----- ACTCAATC------------------CTGTGACTG-------TGGTCCTGGCTCCAGGG---- AACTGGGGTAAATGTGTAACCCAAGGCCAGCC---------------------- TATTTTTGCATGA----GGCT-------CATCTGCCAGTAGGGCTTCCTGG-AAGGGG- CCCAGAG-----GAACATCAC----CCTGGCCCTGATCCATCTTGGT------------------- CAAGCCTGGATTCTCA-----------TGG-TTCCCTGATCTGGGTCCTCCC----CCAGCCT 
\tCCGG----GGCTCC-TTCCCTG--------------CGCCCTGGGGCCTCAGCACATT---------- CTTGGGGACTCTCAGAAGCACACCTCGAGAGG--GCTCTGTCAGAAG---------GCTTG- GTGGCTGGCCTCGGC------------------------ TTGTCACAGCTCAGGGCAGAGACGCGACACACACACCTACACACAGGTACGGGGCGCTCCGGACCCGGCCCG GGCAGGGGAGCTGCGGTCAATGTGTAACTCGGCGGCCCAGCGGCTC---------------- GTTCTGCTCAGCA----CAGAAAGTGTGCATCGATCTCCCTGACTTCCTGG- AAGGCGTCCCAGCCT----GAGAGTAGCT----CTGGCGCCTGTACCCCCCACC------ CCCGTGGGGCCCCCACCCCCATGGTC--------------GGG- CCAAGTGATGTCACCTCCCGCCTCCCCAGCCT" [3] "NM_000064\tC------------------------------------- CAAAAGTGAACTGGGG-ATGAG-GTCCAAGACATCTGCGGTGGGGGGTT- CTCCAGACCTTAGTGTTCTTC--CACTACAAAGTGGGTCCAACAGAGAAAGG------------ TCTGTG----------------TTCACCAGGTGG---CCCTGACCC--- TGGGAGAGTCCAGGGCAGGGTGCAGCTGCATTCATGCTGCTGGG----GAACATGC- CCTCAGGTTACTCACCCCATGGA----CATGTTGGCC-CCAGGGACTGAAAA-GCTTAG---- GAAATGGTATTGAGAAATCTGGGGCAGC-CCCAAAAGGG-GAGAGG--CCATGGGGAGAAGGGG-- GGGCTGAG----TGGGGGAAAGGCAGGAGCCAG--ATAAAA----AGCCAGCTCCAGCAGGCGCTGCTCA \tATTTAGCAAGACCTTGGGGGTAGGGAGAACCAGCCATCCAGAAGTG--CTGGGTTACTGG- GACCCAGCTAAGTGTGGGAGGAGGTCACTCTAGACTTCAATGGTCTCTGGTGTAACCAAGTA---- CAACAGGGACCAG------------CCCAGG----------------TTCAGCATCTGG--- CCTTGACCC---CAAGAAAAGCCTGAGCCAAG-- CAGGTACTTTCAAGCTCCAGGGTAATGGAAATGTGCCTAGGGTTACTCACCCCA-AGG---- CTTGTTGCCC-CAGGTTTGTGAAAAAGCTTAG----GAAACTATGTTGCGAAATTTTGGGCAGT- CCCTGGTG--------------CAGGAACAGGGAG--GGACCAGA------GAGGA------- GAGCCAT--ATAAAG----AGCCAGCGGCTACAGCCCCAGCTCG \t---------------------------------------------------------------------- --------------------------------------------------------- CCACGGGGAAAGG------------T----------------------TCACCAGCTGG--- CCTTGACCC---TGAGGGAGGCCATGGCAAGGGGAAGGTGTGTTCATGTTGCAGGA----GGACATGC- CCTTGGGTTAGTTACCCCC--GA----CACACTGGCC-CCGGGGATTGAAAA-ACTTAG---- GAAATGGTATTGAGTAATCTGGGGCAGC-TGCAGGGAGG-GGGAGG-- CTACAGGAGCTGTGGGCTGGGCTGAA---GGTGGGGGGAGGCTGGGGCCAG--ATAAAA---- GGCAATCCCCAACAGCCTCTGCTCA" [4] "NM_000066 
\tAGCTGTTAGGTTGGTGCAAAAGTAATTGTGGTTTTTGCCATTAAAAGCAATGACAA-------------- --AAACTG------------- CAATTACTTTTGCACCAACCTAGTCAGTGGCAGAGAATGTACTTGAACCCAGGCTGTCTAGACCTAGATCCC ACAGTCCTTGCCACCTCA--CTAATAGCCTGTCCAC---TTGGCAGCTTACCCTAAAGTTA----- CAGAGGAATAAACACCATGCTGCTACA- GATTTTTCATTAT----------------------------------- TCTGGTTGGTTTCCAGAGTGACAGG---TAAGTTT-TTGGTC-TGTGCAAAGTCTG----- TTTCCAGTCACTAGTGGCTTTCTGTTTACTTTGCAGAGCTATTTGCTCT-TGGGGACAGAAGCTGACAGT \t---------------------------------------------------------------------- ------------------------------------------------------------------------ -------------------------------------------------------------- GGCTGCTT-CCATGGAATCA--------------------------------- CAGTTCTCACTGT-----------------------------------CCTAG--- GTGTGCGGTATCACAAG---TGAGTACATTCGTGCTGTGCAAAGCTGA---- GGTTCCAGGTACAAGCG----CCTGTTTGCTTTGCTGACCTGTTTGCTCTATGTCAACAGAA- CTGAGAGC \t---------------------------------------------------------------------- ------------------------------------------- GCAAGTGGCAGAGAAAGGATTTGAACCCAGGCAGTCTGGACCTCGAGCC-------------- CCTCATCCTAACAGCTTGTCCTT---TTGGCTGCCT--------CTTT----- CTTGTGTAGAA------------------- CTTTCCATTGT----------------------------------- TCTCCCTGGTGTCCAGAGTCATAGG---TAAGT-T-TCTGTC- TGTACAAAGTCTAAGGGGTTTCCGGTCACTGTTGATTTTTTGTTTACTTTGCTGACCTGTTTGCTCTATGGG GACAGAAGCTGACAGC"

# Now, if I try to use sockets with a POST request, I get some of the correct data, but some of it is incorrect or improperly decoded:

host <- "genomics11.bu.edu"
path <- "/cgi-bin/Tractor_dev/external/get_msa.cgi"
dat <- "user_id=0&table=seqs_ucsc_hg18&len=350&gene_set_ids=NM_000029,NM_000064,NM_000066&orgs=Hs,mm8,canFam2"
len <- nchar(dat)  # length of the request body, for the Content-length header
request <- paste("POST ", path, " HTTP/1.1\nHost: ", host,
                 "\nReferer: \nContent-type: application/x-www-form-urlencoded\nContent-length: ", len,
                 "\nConnection: Keep-Alive\n\n", dat, sep = "")
fp <- socketConnection(host = host, port = 80, server = FALSE, blocking = TRUE)
write(request, fp)
socketSelect(list(fp))  # Wait until results are ready
sock <- readLines(fp)
close(fp)

# Returns:

> sock
[1] "HTTP/1.1 200 OK"
[2] "Date: Thu, 20 Jul 2006 14:27:37 GMT"
[3] "Server: Apache/2.0.53 (Fedora)"
[4] "Connection: close"
[5] "Transfer-Encoding: chunked"
[6] "Content-Type: text/plain; charset=ISO-8859-1"
[7] ""
[8] "fd0"
[9] "NA\tHs\tmm8\tcanFam2"
[10] "NM_000029\tTAAGCA--AGACTC-TCCCCTGCCCTCTGCCCTCTGCACCTCCGG--- CCTGCATGTC----------CCTGTGGCCTCTTGGGGGTACATCTCCCGGGG--- CTGGGTCAGAAG---------GCCTGGGTGGTTGGCCTCAGG------------------------ CTGTCACACACCTAGGGAGATGCTC------------------ CCGTTTCTGGGAACCTTGGCCCCGACTCCTGCA---- AACTTCGGTAAATGTGTAACTCGACCCTGCACCGGCTC---------------- ACTCTGTTCAGCA----GTGAAACTCTGCATCGATCACTAAGACTTCCTGG- AAGAGGTCCCAGCGT----GAGTGTCGCT--- TCTGGCATCTGTCCTTCTGG---------------------CCAGCCTGTGGTC-------------- TGG-CCAAGTGATGTAACCCTCCTCT---CCAGCCT\tTGCAAGTGAGCCCC- CTTCCTG-----------------------------GCATGCC----------CAGAGAGGCTTACG-- AGTGCATCACGAGGGGG-CTTTCATCCCAAG--------- GTCTGCATGGCTGGCTTCAGG------------------------TTGTCACAACCC----- ACTCAATC------------------CTGTGACTG-------TGGTCCTGGCTCCAGGG---- AACTGGGGTAAATGTGTAACCCAAGGCCAGCC---------------------- TATTTTTGCATGA----GGCT-------CATCTGCCAGTAGGGCTTCCTGG-AAGGGG- CCCAGAG-----GAACATCAC----CCTGGCCCTGATCCATCTTGGT------------------- CAAGCCTGGATTCTCA-----------TGG-TTCCCTGATCTGGGTCCTCCC----CCAGCCT \tCCGG----GGCTCC-TTCCCTG--------------CGCCCTGGGGCCTCAGCACATT---------- CTTGGGGACTCTCAGAAGCACACCTCGAGAGG--GCTCTGTCAGAAG---------GCTTG- GTGGCTGGCCTCGGC------------------------ TTGTCACAGCTCAGGGCAGAGACGCGACACACACACCTACACACAGGTACGGGGCGCTCCGGACCCGGCCCG GGCAGGGGAGCTGCGGTCAATGTGTAACTCGGCGGCCCAGCGGCTC---------------- GTTCTGCTCAGCA----CAGAAAGTGTGCATCGATCTCCCTGACTTCCTGG- AAGGCGTCCCAGCCT----GAGAGTAGCT----CTGGCGCCTGTACCCCCCACC------ CCCGTGGGGCCCCCACCCCCATGGTC--------------GGG- CCAAGTGATGTCACCTCCCGCCTCCCCAGCCT"
[11] "NM_000064\tC------------------------------------- CAAAAGTGAACTGGGG-ATGAG-GTCCAAGACATCTGCGGTGGGGGGTT- 
CTCCAGACCTTAGTGTTCTTC--CACTACAAAGTGGGTCCAACAGAGAAAGG------------ TCTGTG----------------TTCACCAGGTGG---CCCTGACCC--- TGGGAGAGTCCAGGGCAGGGTGCAGCTGCATTCATGCTGCTGGG----GAACATGC- CCTCAGGTTACTCACCCCATGGA----CATGTTGGCC-CCAGGGACTGAAAA-GCTTAG---- GAAATGGTATTGAGAAATCTGGGGCAGC-CCCAAAAGGG-GAGAGG--CCATGGGGAGAAGGGG-- GGGCTGAG----TGGGGGAAAGGCAGGAGCCAG--ATAAAA----AGCCAGCTCCAGCAGGCGCTGCTCA \tATTTAGCAAGACCTTGGGGGTAGGGAGAACCAGCCATCCAGAAGTG--CTGGGTTACTGG- GACCCAGCTAAGTGTGGGAGGAGGTCACTCTAGACTTCAATGGTCTCTGGTGTAACCAAGTA---- CAACAGGGACCAG------------CCCAGG----------------TTCAGCATCTGG--- CCTTGACCC---CAAGAAAAGCCTGAGCCAAG-- CAGGTACTTTCAAGCTCCAGGGTAATGGAAATGTGCCTAGGGTTACTCACCCCA-AGG---- CTTGTTGCCC-CAGGTTTGTGAAAAAGCTTAG----GAAACTATGTTGCGAAATTTTGGGCAGT- CCCTGGTG--------------CAGGAACAGGGAG--GGACCAGA------GAGGA------- GAGCCAT--ATAAAG----AGCCAGCGGCTACAGCCCCAGCTCG \t---------------------------------------------------------------------- --------------------------------------------------------- CCACGGGGAAAGG------------T----------------------TCACCAGCTGG--- CCTTGACCC---TGAGGGAGGCCATGGCAAGGGGAAGGTGTGTTCATGTTGCAGGA----GGACATGC- CCTTGGGTTAGTTACCCCC--GA----CACACTGGCC-CCGGGGATTGAAAA-ACTTAG---- GAAATGGTATTGAGTAATCTGGGGCAGC-TGCAGGGAGG-GGGAGG-- CTACAGGAGCTGTGGGCTGGGCTGAA---GGTGGGGGGAGGCTGGGGCCAG--ATAAAA---- GGCAATCCCCAACAGCCTCTGCTCA" [12] "NM_000066 \tAGCTGTTAGGTTGGTGCAAAAGTAATTGTGGTTTTTGCCATTAAAAGCAATGACAA-------------- --AAACTG------------- CAATTACTTTTGCACCAACCTAGTCAGTGGCAGAGAATGTACTTGAACCCAGGCTGTCTAGACCTAGATCCC ACAGTCCTTGCCACCTCA--CTAATAGCCTGTCCAC---TTGGCAGCTTACCCTAAAGTTA----- CAGAGGAATAAACACCATGCTGCTACA- GATTTTTCATTAT----------------------------------- TCTGGTTGGTTTCCAGAGTGACAGG---TAAGTTT-TTGGTC-TGTGCAAAGTCTG----- TTTCCAGTCACTAGTGGCTTTCTGTTTACTTTGCAGAGCTATTTGCTCT-TGGGGACAGAAGCTGACAGT \t---------------------------------------------------------------------- ------------------------------------------------------------------------ -------------------------------------------------------------- 
GGCTGCTT-CCATGGAATCA--------------------------------- CAGTTCTCACTGT-----------------------------------CCTAG--- GTGTGCGGTATCACAAG---TGAGTACATTCGTGCTGTGCAAAGCTGA---- GGTTCCAGGTACAAGCG----CCTGTTTGCTTTGCTGACCTGTTTGCTCTATGTCAACAGAA- CTGAGAGC \t---------------------------------------------------------------------- ------------------------------------------- GCAAGTGGCAGAGAAAGGATTTGAACCCAGGCAGTCTGGACCTCGAGCC-------------- CCTCATCCTAACAGCTTGTCCTT---TTGGCTGCCT--------CTTT----- CTTGTGTAGAA-------------------CTTTCCATTGT------"
[13] "a1"
[14] "-----------------------------TCTCCCTGGTGTCCAGAGTCATAGG---TAAGT- T-TCTGTC- TGTACAAAGTCTAAGGGGTTTCCGGTCACTGTTGATTTTTTGTTTACTTTGCTGACCTGTTTGCTCTATGGG GACAGAAGCTGACAGC"
[15] ""
[16] "0"
[17] ""

Element 13 appears to be improperly decoded hexadecimal data. Can anyone shed some light on why this is? Are the strings too long for the readLines command to properly read from the socket?

Thanks for any additional help anyone can provide.

-- Mike
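The `Transfer-Encoding: chunked` header in the response suggests that the stray `fd0`, `a1` and `0` lines are chunk-size markers (hexadecimal for 4048, 161 and 0) rather than corrupted data, which would match the earlier warning about having to handle chunked responses yourself. A minimal sketch of decoding such a body in base R follows. `decode_chunked` is a hypothetical helper, and it assumes the body has been read as one string with its CRLF line endings intact (for example with `readChar()` or `readBin()`), since `readLines()` has already split a line at the chunk boundary.

```r
# Hypothetical helper: decode an HTTP/1.1 "Transfer-Encoding: chunked" body.
# `body` is a single string holding everything after the blank line that ends
# the headers, with the original CRLF line endings intact.
decode_chunked <- function(body) {
  bytes <- charToRaw(body)
  crlf <- charToRaw("\r\n")
  out <- raw(0)
  pos <- 1L
  repeat {
    # Find the CRLF that terminates the chunk-size line
    eol <- pos
    while (eol < length(bytes) && !identical(bytes[eol:(eol + 1L)], crlf)) {
      eol <- eol + 1L
    }
    if (eol >= length(bytes) || eol == pos) stop("malformed chunked body: no size line")
    size_line <- rawToChar(bytes[pos:(eol - 1L)])
    # Drop optional chunk extensions (";name=value") before parsing the hex size
    size <- strtoi(sub(";.*$", "", trimws(size_line)), base = 16L)
    if (is.na(size)) stop("malformed chunk size: ", size_line)
    if (size == 0L) break             # a zero-length chunk ends the body
    start <- eol + 2L
    end <- start + size - 1L
    if (end > length(bytes)) stop("truncated chunk")
    out <- c(out, bytes[start:end])
    pos <- end + 3L                   # skip the CRLF that follows the chunk data
  }
  rawToChar(out)
}

# Toy body whose first chunk ends in the middle of a line, as in the output above
toy <- "4\r\nAB\nC\r\n2\r\nD\n\r\n0\r\n\r\n"
lines <- strsplit(decode_chunked(toy), "\n")[[1]]
print(lines)
```

A possibly simpler alternative is to send the request as `HTTP/1.0` with `Connection: close`: chunked transfer coding belongs to HTTP/1.1, so a server should not use it when answering an HTTP/1.0 request, though that is worth confirming against the actual server.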