
Post CGI forms with built-in R function?

Thanks Duncan.  I figured there was some limit.

Your suggestion to check out the httpRequest code has me headed in
the right direction, but I am having problems with the data returned
from the socketConnection.  Some of the returned data appears to be
improperly decoded.  I don't know whether I've stumbled on a low-level
socket bug or whether I need to error-check the results myself.  Can
anyone figure out the problem in the code below?

This was run using R version 2.3.1 (2006-06-01) on both Linux and OS  
X with the same results.



# First, the correctly formatted data for comparison is returned by
# the code below.  As we've established, this works fine, but I can't
# use it with long URIs:

full.url <- paste("http://genomics11.bu.edu/cgi-bin/Tractor_dev/external/get_msa.cgi",
                  "?user_id=0&table=seqs_ucsc_hg18&len=350",
                  "&gene_set_ids=NM_000029,NM_000064,NM_000066",
                  "&orgs=Hs,mm8,canFam2", sep="")
read <- readLines(full.url)


# This returns three tab-delimited lines with a header row.

 > read
[1] "NA\tHs\tmm8\tcanFam2"

[2] "NM_000029\tTAAGCA--AGACTC-TCCCCTGCCCTCTGCCCTCTGCACCTCCGG--- 
CCTGCATGTC----------CCTGTGGCCTCTTGGGGGTACATCTCCCGGGG--- 
CTGGGTCAGAAG---------GCCTGGGTGGTTGGCCTCAGG------------------------ 
CTGTCACACACCTAGGGAGATGCTC------------------ 
CCGTTTCTGGGAACCTTGGCCCCGACTCCTGCA---- 
AACTTCGGTAAATGTGTAACTCGACCCTGCACCGGCTC---------------- 
ACTCTGTTCAGCA----GTGAAACTCTGCATCGATCACTAAGACTTCCTGG- 
AAGAGGTCCCAGCGT----GAGTGTCGCT--- 
TCTGGCATCTGTCCTTCTGG---------------------CCAGCCTGTGGTC-------------- 
TGG-CCAAGTGATGTAACCCTCCTCT---CCAGCCT\tTGCAAGTGAGCCCC- 
CTTCCTG-----------------------------GCATGCC----------CAGAGAGGCTTACG-- 
AGTGCATCACGAGGGGG-CTTTCATCCCAAG--------- 
GTCTGCATGGCTGGCTTCAGG------------------------TTGTCACAACCC----- 
ACTCAATC------------------CTGTGACTG-------TGGTCCTGGCTCCAGGG---- 
AACTGGGGTAAATGTGTAACCCAAGGCCAGCC---------------------- 
TATTTTTGCATGA----GGCT-------CATCTGCCAGTAGGGCTTCCTGG-AAGGGG- 
CCCAGAG-----GAACATCAC----CCTGGCCCTGATCCATCTTGGT------------------- 
CAAGCCTGGATTCTCA-----------TGG-TTCCCTGATCTGGGTCCTCCC----CCAGCCT 
\tCCGG----GGCTCC-TTCCCTG--------------CGCCCTGGGGCCTCAGCACATT---------- 
CTTGGGGACTCTCAGAAGCACACCTCGAGAGG--GCTCTGTCAGAAG---------GCTTG- 
GTGGCTGGCCTCGGC------------------------ 
TTGTCACAGCTCAGGGCAGAGACGCGACACACACACCTACACACAGGTACGGGGCGCTCCGGACCCGGCCCG 
GGCAGGGGAGCTGCGGTCAATGTGTAACTCGGCGGCCCAGCGGCTC---------------- 
GTTCTGCTCAGCA----CAGAAAGTGTGCATCGATCTCCCTGACTTCCTGG- 
AAGGCGTCCCAGCCT----GAGAGTAGCT----CTGGCGCCTGTACCCCCCACC------ 
CCCGTGGGGCCCCCACCCCCATGGTC--------------GGG- 
CCAAGTGATGTCACCTCCCGCCTCCCCAGCCT"

[3] "NM_000064\tC------------------------------------- 
CAAAAGTGAACTGGGG-ATGAG-GTCCAAGACATCTGCGGTGGGGGGTT- 
CTCCAGACCTTAGTGTTCTTC--CACTACAAAGTGGGTCCAACAGAGAAAGG------------ 
TCTGTG----------------TTCACCAGGTGG---CCCTGACCC--- 
TGGGAGAGTCCAGGGCAGGGTGCAGCTGCATTCATGCTGCTGGG----GAACATGC- 
CCTCAGGTTACTCACCCCATGGA----CATGTTGGCC-CCAGGGACTGAAAA-GCTTAG---- 
GAAATGGTATTGAGAAATCTGGGGCAGC-CCCAAAAGGG-GAGAGG--CCATGGGGAGAAGGGG-- 
GGGCTGAG----TGGGGGAAAGGCAGGAGCCAG--ATAAAA----AGCCAGCTCCAGCAGGCGCTGCTCA 
\tATTTAGCAAGACCTTGGGGGTAGGGAGAACCAGCCATCCAGAAGTG--CTGGGTTACTGG- 
GACCCAGCTAAGTGTGGGAGGAGGTCACTCTAGACTTCAATGGTCTCTGGTGTAACCAAGTA---- 
CAACAGGGACCAG------------CCCAGG----------------TTCAGCATCTGG--- 
CCTTGACCC---CAAGAAAAGCCTGAGCCAAG-- 
CAGGTACTTTCAAGCTCCAGGGTAATGGAAATGTGCCTAGGGTTACTCACCCCA-AGG---- 
CTTGTTGCCC-CAGGTTTGTGAAAAAGCTTAG----GAAACTATGTTGCGAAATTTTGGGCAGT- 
CCCTGGTG--------------CAGGAACAGGGAG--GGACCAGA------GAGGA------- 
GAGCCAT--ATAAAG----AGCCAGCGGCTACAGCCCCAGCTCG 
\t---------------------------------------------------------------------- 
--------------------------------------------------------- 
CCACGGGGAAAGG------------T----------------------TCACCAGCTGG--- 
CCTTGACCC---TGAGGGAGGCCATGGCAAGGGGAAGGTGTGTTCATGTTGCAGGA----GGACATGC- 
CCTTGGGTTAGTTACCCCC--GA----CACACTGGCC-CCGGGGATTGAAAA-ACTTAG---- 
GAAATGGTATTGAGTAATCTGGGGCAGC-TGCAGGGAGG-GGGAGG-- 
CTACAGGAGCTGTGGGCTGGGCTGAA---GGTGGGGGGAGGCTGGGGCCAG--ATAAAA---- 
GGCAATCCCCAACAGCCTCTGCTCA"

[4] "NM_000066 
\tAGCTGTTAGGTTGGTGCAAAAGTAATTGTGGTTTTTGCCATTAAAAGCAATGACAA-------------- 
--AAACTG------------- 
CAATTACTTTTGCACCAACCTAGTCAGTGGCAGAGAATGTACTTGAACCCAGGCTGTCTAGACCTAGATCCC 
ACAGTCCTTGCCACCTCA--CTAATAGCCTGTCCAC---TTGGCAGCTTACCCTAAAGTTA----- 
CAGAGGAATAAACACCATGCTGCTACA- 
GATTTTTCATTAT----------------------------------- 
TCTGGTTGGTTTCCAGAGTGACAGG---TAAGTTT-TTGGTC-TGTGCAAAGTCTG----- 
TTTCCAGTCACTAGTGGCTTTCTGTTTACTTTGCAGAGCTATTTGCTCT-TGGGGACAGAAGCTGACAGT 
\t---------------------------------------------------------------------- 
------------------------------------------------------------------------ 
-------------------------------------------------------------- 
GGCTGCTT-CCATGGAATCA--------------------------------- 
CAGTTCTCACTGT-----------------------------------CCTAG--- 
GTGTGCGGTATCACAAG---TGAGTACATTCGTGCTGTGCAAAGCTGA---- 
GGTTCCAGGTACAAGCG----CCTGTTTGCTTTGCTGACCTGTTTGCTCTATGTCAACAGAA- 
CTGAGAGC 
\t---------------------------------------------------------------------- 
------------------------------------------- 
GCAAGTGGCAGAGAAAGGATTTGAACCCAGGCAGTCTGGACCTCGAGCC-------------- 
CCTCATCCTAACAGCTTGTCCTT---TTGGCTGCCT--------CTTT----- 
CTTGTGTAGAA------------------- 
CTTTCCATTGT----------------------------------- 
TCTCCCTGGTGTCCAGAGTCATAGG---TAAGT-T-TCTGTC- 
TGTACAAAGTCTAAGGGGTTTCCGGTCACTGTTGATTTTTTGTTTACTTTGCTGACCTGTTTGCTCTATGGG 
GACAGAAGCTGACAGC"
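
For reference, a minimal sketch of how tab-delimited lines of this shape could be split into a character matrix.  The three input lines are made-up stand-in data, not the server's output, and the names `lines`, `fields` and `msa` are only illustrative:

```r
# Made-up stand-in for the character vector returned by readLines()
lines <- c("NA\tHs\tmm8\tcanFam2",
           "NM_X\tAC-GT\tAC-TT\tACGGT",
           "NM_Y\t--GGA\tT-GGA\t--GCA")

# Split every line on literal tab characters
fields <- strsplit(lines, "\t", fixed = TRUE)

# Stack the data rows; the first line supplies the column names
msa <- do.call(rbind, fields[-1])
colnames(msa) <- fields[[1]]

# Use the first column (the gene IDs) as row names, then drop it
rownames(msa) <- msa[, 1]
msa <- msa[, -1, drop = FALSE]

print(dim(msa))
```

This assumes every row has the same number of fields as the header; a ragged response would need extra checking.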





# Now, if I try to use sockets with a POST request, I get some of
# the correct data, but some of it is incorrect or improperly decoded:

host <- "genomics11.bu.edu"
path <- "/cgi-bin/Tractor_dev/external/get_msa.cgi"
dat <- paste("user_id=0&table=seqs_ucsc_hg18&len=350",
             "&gene_set_ids=NM_000029,NM_000064,NM_000066",
             "&orgs=Hs,mm8,canFam2", sep="")

len <- nchar(dat)  # number of characters in the request body
request <- paste("POST ", path, " HTTP/1.1\nHost: ", host, "\nReferer: ",
                 "\nContent-type: application/x-www-form-urlencoded",
                 "\nContent-length: ", len,
                 "\nConnection: Keep-Alive\n\n", dat, sep="")
fp <- socketConnection(host=host, port=80, server=FALSE, blocking=TRUE)
write(request, fp)
socketSelect(list(fp)) # Wait until results are ready
sock <- readLines(fp)
close(fp)
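
A possible variant of the request construction, offered only as a sketch.  It uses CRLF line endings as the HTTP specification requires, computes the body length in bytes with nchar(), and asks for HTTP/1.0, under which a server should not reply with chunked transfer coding.  The function name `build_post_request`, the host `example.org` and the path `/cgi-bin/demo.cgi` are hypothetical and are not part of the code above:

```r
# Hypothetical helper: assemble a form-encoded POST request as one string.
build_post_request <- function(host, path, body, version = "1.0") {
  paste(
    "POST ", path, " HTTP/", version, "\r\n",
    "Host: ", host, "\r\n",
    "Content-Type: application/x-www-form-urlencoded\r\n",
    # Content-Length counts bytes, not characters
    "Content-Length: ", nchar(body, type = "bytes"), "\r\n",
    "Connection: close\r\n",
    "\r\n",            # blank line separates headers from body
    body,
    sep = ""
  )
}

# Placeholder host, path and body, for illustration only
req <- build_post_request("example.org", "/cgi-bin/demo.cgi", "a=1&b=2")
cat(req)
```

The resulting string could then be written to a socketConnection with writeLines(req, fp, sep=""); whether a given server honours the HTTP/1.0 request in this way would need to be checked against that server.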


# Returns:

 > sock
[1] "HTTP/1.1 200 OK"

[2] "Date: Thu, 20 Jul 2006 14:27:37 GMT"

[3] "Server: Apache/2.0.53 (Fedora)"

[4] "Connection: close"

[5] "Transfer-Encoding: chunked"

[6] "Content-Type: text/plain; charset=ISO-8859-1"

[7] ""

[8] "fd0"

[9] "NA\tHs\tmm8\tcanFam2"

[10] "NM_000029\tTAAGCA--AGACTC-TCCCCTGCCCTCTGCCCTCTGCACCTCCGG--- 
CCTGCATGTC----------CCTGTGGCCTCTTGGGGGTACATCTCCCGGGG--- 
CTGGGTCAGAAG---------GCCTGGGTGGTTGGCCTCAGG------------------------ 
CTGTCACACACCTAGGGAGATGCTC------------------ 
CCGTTTCTGGGAACCTTGGCCCCGACTCCTGCA---- 
AACTTCGGTAAATGTGTAACTCGACCCTGCACCGGCTC---------------- 
ACTCTGTTCAGCA----GTGAAACTCTGCATCGATCACTAAGACTTCCTGG- 
AAGAGGTCCCAGCGT----GAGTGTCGCT--- 
TCTGGCATCTGTCCTTCTGG---------------------CCAGCCTGTGGTC-------------- 
TGG-CCAAGTGATGTAACCCTCCTCT---CCAGCCT\tTGCAAGTGAGCCCC- 
CTTCCTG-----------------------------GCATGCC----------CAGAGAGGCTTACG-- 
AGTGCATCACGAGGGGG-CTTTCATCCCAAG--------- 
GTCTGCATGGCTGGCTTCAGG------------------------TTGTCACAACCC----- 
ACTCAATC------------------CTGTGACTG-------TGGTCCTGGCTCCAGGG---- 
AACTGGGGTAAATGTGTAACCCAAGGCCAGCC---------------------- 
TATTTTTGCATGA----GGCT-------CATCTGCCAGTAGGGCTTCCTGG-AAGGGG- 
CCCAGAG-----GAACATCAC----CCTGGCCCTGATCCATCTTGGT------------------- 
CAAGCCTGGATTCTCA-----------TGG-TTCCCTGATCTGGGTCCTCCC----CCAGCCT 
\tCCGG----GGCTCC-TTCCCTG--------------CGCCCTGGGGCCTCAGCACATT---------- 
CTTGGGGACTCTCAGAAGCACACCTCGAGAGG--GCTCTGTCAGAAG---------GCTTG- 
GTGGCTGGCCTCGGC------------------------ 
TTGTCACAGCTCAGGGCAGAGACGCGACACACACACCTACACACAGGTACGGGGCGCTCCGGACCCGGCCCG 
GGCAGGGGAGCTGCGGTCAATGTGTAACTCGGCGGCCCAGCGGCTC---------------- 
GTTCTGCTCAGCA----CAGAAAGTGTGCATCGATCTCCCTGACTTCCTGG- 
AAGGCGTCCCAGCCT----GAGAGTAGCT----CTGGCGCCTGTACCCCCCACC------ 
CCCGTGGGGCCCCCACCCCCATGGTC--------------GGG- 
CCAAGTGATGTCACCTCCCGCCTCCCCAGCCT"

[11] "NM_000064\tC------------------------------------- 
CAAAAGTGAACTGGGG-ATGAG-GTCCAAGACATCTGCGGTGGGGGGTT- 
CTCCAGACCTTAGTGTTCTTC--CACTACAAAGTGGGTCCAACAGAGAAAGG------------ 
TCTGTG----------------TTCACCAGGTGG---CCCTGACCC--- 
TGGGAGAGTCCAGGGCAGGGTGCAGCTGCATTCATGCTGCTGGG----GAACATGC- 
CCTCAGGTTACTCACCCCATGGA----CATGTTGGCC-CCAGGGACTGAAAA-GCTTAG---- 
GAAATGGTATTGAGAAATCTGGGGCAGC-CCCAAAAGGG-GAGAGG--CCATGGGGAGAAGGGG-- 
GGGCTGAG----TGGGGGAAAGGCAGGAGCCAG--ATAAAA----AGCCAGCTCCAGCAGGCGCTGCTCA 
\tATTTAGCAAGACCTTGGGGGTAGGGAGAACCAGCCATCCAGAAGTG--CTGGGTTACTGG- 
GACCCAGCTAAGTGTGGGAGGAGGTCACTCTAGACTTCAATGGTCTCTGGTGTAACCAAGTA---- 
CAACAGGGACCAG------------CCCAGG----------------TTCAGCATCTGG--- 
CCTTGACCC---CAAGAAAAGCCTGAGCCAAG-- 
CAGGTACTTTCAAGCTCCAGGGTAATGGAAATGTGCCTAGGGTTACTCACCCCA-AGG---- 
CTTGTTGCCC-CAGGTTTGTGAAAAAGCTTAG----GAAACTATGTTGCGAAATTTTGGGCAGT- 
CCCTGGTG--------------CAGGAACAGGGAG--GGACCAGA------GAGGA------- 
GAGCCAT--ATAAAG----AGCCAGCGGCTACAGCCCCAGCTCG 
\t---------------------------------------------------------------------- 
--------------------------------------------------------- 
CCACGGGGAAAGG------------T----------------------TCACCAGCTGG--- 
CCTTGACCC---TGAGGGAGGCCATGGCAAGGGGAAGGTGTGTTCATGTTGCAGGA----GGACATGC- 
CCTTGGGTTAGTTACCCCC--GA----CACACTGGCC-CCGGGGATTGAAAA-ACTTAG---- 
GAAATGGTATTGAGTAATCTGGGGCAGC-TGCAGGGAGG-GGGAGG-- 
CTACAGGAGCTGTGGGCTGGGCTGAA---GGTGGGGGGAGGCTGGGGCCAG--ATAAAA---- 
GGCAATCCCCAACAGCCTCTGCTCA"

[12] "NM_000066 
\tAGCTGTTAGGTTGGTGCAAAAGTAATTGTGGTTTTTGCCATTAAAAGCAATGACAA-------------- 
--AAACTG------------- 
CAATTACTTTTGCACCAACCTAGTCAGTGGCAGAGAATGTACTTGAACCCAGGCTGTCTAGACCTAGATCCC 
ACAGTCCTTGCCACCTCA--CTAATAGCCTGTCCAC---TTGGCAGCTTACCCTAAAGTTA----- 
CAGAGGAATAAACACCATGCTGCTACA- 
GATTTTTCATTAT----------------------------------- 
TCTGGTTGGTTTCCAGAGTGACAGG---TAAGTTT-TTGGTC-TGTGCAAAGTCTG----- 
TTTCCAGTCACTAGTGGCTTTCTGTTTACTTTGCAGAGCTATTTGCTCT-TGGGGACAGAAGCTGACAGT 
\t---------------------------------------------------------------------- 
------------------------------------------------------------------------ 
-------------------------------------------------------------- 
GGCTGCTT-CCATGGAATCA--------------------------------- 
CAGTTCTCACTGT-----------------------------------CCTAG--- 
GTGTGCGGTATCACAAG---TGAGTACATTCGTGCTGTGCAAAGCTGA---- 
GGTTCCAGGTACAAGCG----CCTGTTTGCTTTGCTGACCTGTTTGCTCTATGTCAACAGAA- 
CTGAGAGC 
\t---------------------------------------------------------------------- 
------------------------------------------- 
GCAAGTGGCAGAGAAAGGATTTGAACCCAGGCAGTCTGGACCTCGAGCC-------------- 
CCTCATCCTAACAGCTTGTCCTT---TTGGCTGCCT--------CTTT----- 
CTTGTGTAGAA-------------------CTTTCCATTGT------"

[13] "a1"

[14] "-----------------------------TCTCCCTGGTGTCCAGAGTCATAGG---TAAGT- 
T-TCTGTC- 
TGTACAAAGTCTAAGGGGTTTCCGGTCACTGTTGATTTTTTGTTTACTTTGCTGACCTGTTTGCTCTATGGG 
GACAGAAGCTGACAGC"

[15] ""

[16] "0"

[17] ""




Element 13 appears to be improperly decoded hexadecimal data.  Can  
anyone shed some light on why this is?  Are the strings too long for  
the readLines command to properly read from the socket?  Thanks for  
any additional help anyone can provide.
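
One possible reading, given the "Transfer-Encoding: chunked" header in element 5: in HTTP/1.1 chunked transfer coding, the body arrives as a series of chunks, each preceded by its size in hexadecimal on a line of its own, with a final chunk of size 0.  On that reading "fd0" (4048 bytes) and "a1" (161 bytes) would be chunk sizes rather than corrupted data, and element 12 would be cut short because the first chunk ends in the middle of a line.  A minimal sketch of a decoder follows; the function name `decode_chunked` is hypothetical, it expects the whole body as a single string with CRLF line endings (not the output of readLines), and it ignores trailers:

```r
# Hypothetical helper: decode an HTTP/1.1 chunked body held in one string.
decode_chunked <- function(body) {
  bytes <- charToRaw(body)
  n <- length(bytes)
  out <- raw(0)
  pos <- 1
  repeat {
    # Locate the CRLF that ends the chunk-size line
    eol <- pos
    while (eol < n &&
           !(bytes[eol] == as.raw(13) && bytes[eol + 1] == as.raw(10))) {
      eol <- eol + 1
    }
    if (eol >= n) stop("malformed chunked body: no CRLF after chunk size")
    size_line <- rawToChar(bytes[seq(pos, length.out = eol - pos)])
    # Drop any chunk extension after ';' and parse the hexadecimal size
    size <- strtoi(sub(";.*$", "", size_line), base = 16L)
    if (is.na(size)) stop("malformed chunk size: ", size_line)
    if (size == 0) break                 # last chunk reached
    start <- eol + 2                     # first byte of the chunk data
    if (start + size - 1 > n) stop("truncated chunk")
    out <- c(out, bytes[seq(start, length.out = size)])
    pos <- start + size + 2              # skip the data and its trailing CRLF
  }
  rawToChar(out)
}

# Toy body with two chunks ("hello" and " world"), for illustration only
decoded <- decode_chunked("5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n")
print(decoded)
print(strtoi(c("fd0", "a1"), base = 16L))
```

To apply something like this to a real response, the body would have to be read as raw bytes or a single string (for example with readBin or readChar) rather than line by line.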


--
Mike
On Jul 20, 2006, at 9:49 AM, Duncan Temple Lang wrote: