
[External] Re: read.csv fails in R console in Ubuntu terminal but works in RStudio after R 3.6.3 upgrade to R 4.0.2?

4 messages · Rui Barradas, Rasmus Liland, Sam H

#
Hello,

Yes, I thought it's a site policy issue too. But the file can be accessed and read/downloaded from RStudio and Firefox, so apparently there's no reason why the R console shouldn't.

Anyway, I believe it's time for the OP to say something, maybe he has solved it and there's no point in continuing.

Rui Barradas

Sent from my Samsung Galaxy smartphone.

-------- Original message --------
From: luke-tierney at uiowa.edu
Date: 17/07/2020 02:59 (GMT+00:00)
To: Ista Zahn <istazahn at gmail.com>
Cc: Rui Barradas <ruipbarradas at sapo.pt>, r-help at r-project.org, Sam H <sam.hhh1 at gmail.com>
Subject: Re: [External] Re: [R] read.csv fails in R console in Ubuntu terminal but works in RStudio after R 3.6.3 upgrade to R 4.0.2?

On my Ubuntu system the download with read.csv succeeds in an R
console if I set the HTTPUserAgent and download.file.method options to
match the ones used by RStudio.

Given how picky the server is being I would worry about whether this
use is in line with the site's terms of service.

Best,

luke

On Thu, 16 Jul 2020, Ista Zahn wrote:

> On Thu, Jul 16, 2020 at 5:15 PM Ista Zahn <istazahn at gmail.com> wrote:
>>
>> On Thu, Jul 16, 2020 at 8:18 AM Rui Barradas <ruipbarradas at sapo.pt> wrote:
>>>
>>> Hello,
>>>
>>> Thanks, but no, download.file still gives 403 Forbidden with both method
>>> = "libcurl" and method = "wget".
>>
>> I think that makes it "not an R question". Ask on
>> https://unix.stackexchange.com/ maybe?
>
> Oh, sorry I misread your message. Nevertheless:
>
> $ curl "https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download"
> <HTML><HEAD>
> <TITLE>Access Denied</TITLE>
> </HEAD><BODY>
> <H1>Access Denied</H1>
>
> You don't have permission to access
> "http&#58;&#47;&#47;old&#46;nasdaq&#46;com&#47;screening&#47;companies&#45;by&#45;name&#46;aspx&#63;"
> on this server.<P>
> Reference&#32;&#35;18&#46;5506d217&#46;1594934303&#46;938edcb
> </BODY>
> </HTML>
>
> $ wget "https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download"
> --2020-07-16 17:19:12--
> https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download
> Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
> Resolving old.nasdaq.com (old.nasdaq.com)... 2600:1400:9000:28f::1b46,
> 2600:1400:9000:29b::1b46, 23.78.161.120
> Connecting to old.nasdaq.com
> (old.nasdaq.com)|2600:1400:9000:28f::1b46|:443... connected.
> HTTP request sent, awaiting response... 403 Forbidden
> 2020-07-16 17:19:12 ERROR 403: Forbidden.
>
> I don't think this is an R problem.
>
> Best,
> Ista
>
>>
>> Best,
>> Ista
>>
>>>
>>> Rui Barradas
>>>
>>> At 05:31 on 16/07/20, Jeff Newmiller wrote:
>>>> Perhaps read FAQ 7.43? [1]
>>>>
>>>> [1] https://cran.r-project.org/doc/FAQ/R-FAQ.html#How-can-I-enable-secure-https-downloads-in-R_003f
>>>>
>>>> On July 15, 2020 4:02:27 PM PDT, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>>>>> Hello,
>>>>>
>>>>> R 4.0.2 on Ubuntu 20.04 LTS, sessionInfo below.
>>>>>
>>>>> I'm also unable to read the file with Rscript from the Ubuntu terminal
>>>>> but the error is not the same as the OP's.
>>>>>
>>>>> The first try was a file test1.R with the following commands.
>>>>>
>>>>> x<-"https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download"
>>>>> read.csv(x, as.is=TRUE, na="n/a")
>>>>>
>>>>> And run with Rscript
>>>>>
>>>>> rui@rui:~$ Rscript --vanilla test1.R
>>>>> Error in file(file, "rt") :
>>>>>   cannot open the connection to
>>>>> 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download'
>>>>> Calls: read.csv -> read.table -> file
>>>>> In addition: Warning message:
>>>>> In file(file, "rt") :
>>>>>   cannot open URL
>>>>> 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download':
>>>>> HTTP status was '403 Forbidden'
>>>>> Execution halted
>>>>>
>>>>> The second try was download.file() and then read it.
>>>>> File test2.R is:
>>>>>
>>>>> x<-"https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download"
>>>>> download.file(x, "companylist.csv")
>>>>> read.csv("companylist.csv", as.is=TRUE, na="n/a")
>>>>>
>>>>> But this too failed with error 403 Forbidden.
>>>>>
>>>>> rui@rui:~$ Rscript --vanilla test2.R
>>>>> trying URL
>>>>> 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download'
>>>>> Error in download.file(x, "companylist.csv") :
>>>>>   cannot open URL
>>>>> 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download'
>>>>> In addition: Warning message:
>>>>> In download.file(x, "companylist.csv") :
>>>>>   cannot open URL
>>>>> 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download':
>>>>> HTTP status was '403 Forbidden'
>>>>> Execution halted
>>>>>
>>>>> This is my session info.
>>>>>
>>>>> rui@rui:~$ Rscript --vanilla -e 'sessionInfo()'
>>>>> R version 4.0.2 (2020-06-22)
>>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>> Running under: Ubuntu 20.04 LTS
>>>>>
>>>>> Matrix products: default
>>>>> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
>>>>> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
>>>>>
>>>>> locale:
>>>>>  [1] LC_CTYPE=pt_PT.UTF-8       LC_NUMERIC=C
>>>>>  [3] LC_TIME=pt_PT.UTF-8        LC_COLLATE=pt_PT.UTF-8
>>>>>  [5] LC_MONETARY=pt_PT.UTF-8    LC_MESSAGES=pt_PT.UTF-8
>>>>>  [7] LC_PAPER=pt_PT.UTF-8       LC_NAME=C
>>>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>> [11] LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C
>>>>>
>>>>> attached base packages:
>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] compiler_4.0.2
>>>>>
>>>>> At 08:45 on 15/07/20, Sam H wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I am trying to download some data using read.csv and it works perfectly in
>>>>>> RStudio and fails in the R console in the terminal in Ubuntu 18.04 after
>>>>>> upgrading from R 3.6.3 to 4.0.2. Before upgrading this worked in the R
>>>>>> console in the terminal also without any issues.
>>>>>>
>>>>>> Why would that be? How to fix this?
>>>>>>
>>>>>> Below please find R code output and sessionInfo().
>>>>>>
>>>>>> *Works in RStudio*
>>>>>>
>>>>>> > read.csv("https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download",
>>>>>>     header=TRUE, as.is=TRUE, na="n/a")
>>>>>>   Symbol                                   Name LastSale MarketCap IPOyear
>>>>>> 1    TXG                     10x Genomics, Inc.  87.4400     $8.6B    2019
>>>>>> 2     YI                              111, Inc.   6.4800  $533.69M    2018
>>>>>> 3    PIH 1347 Property Insurance Holdings, Inc.   4.5350   $27.52M    2014
>>>>>>
>>>>>> > sessionInfo()
>>>>>> R version 4.0.2 (2020-06-22)
>>>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>>> Running under: Ubuntu 18.04.4 LTS
>>>>>>
>>>>>> Matrix products: default
>>>>>> BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
>>>>>> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
>>>>>>
>>>>>> locale:
>>>>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>>      LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>>>>      LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>>>      LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>
>>>>>> loaded via a namespace (and not attached):
>>>>>> [1] compiler_4.0.2 tools_4.0.2
>>>>>>
>>>>>> *Fails in R console in terminal*
>>>>>>
>>>>>> > read.csv("https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download",
>>>>>>     header=TRUE, as.is=TRUE, na="n/a")
>>>>>> Error in file(file, "rt") :
>>>>>>   cannot open the connection to
>>>>>> 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download'
>>>>>> In addition: Warning message:
>>>>>> In file(file, "rt") :
>>>>>>   URL
>>>>>> 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download':
>>>>>> status was 'Failure when receiving data from the peer'
>>>>>> > traceback()
>>>>>> 3: file(file, "rt")
>>>>>> 2: read.table(file = file, header = header, sep = sep, quote = quote,
>>>>>>        dec = dec, fill = fill, comment.char = comment.char, ...)
>>>>>> 1: read.csv("https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download",
>>>>>>        header = TRUE, as.is = TRUE, na = "n/a")
>>>>>> > sessionInfo()
>>>>>> R version 4.0.2 (2020-06-22)
>>>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>>> Running under: Ubuntu 18.04.4 LTS
>>>>>>
>>>>>> Matrix products: default
>>>>>> BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
>>>>>> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
>>>>>>
>>>>>> locale:
>>>>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>>>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>
>>>>>> loaded via a namespace (and not attached):
>>>>>> [1] compiler_4.0.2
>>>>>>
>>>>>> I also asked this question here
>>>>>> https://stackoverflow.com/questions/62898008/why-read-csv-fails-in-r-console-in-ubuntu-terminal-but-works-in-rstudio-after-r
>>>>>> . Since there was no answer on stackoverflow I sent this question also to
>>>>>> this list.
>>>>>>
>>>>>> Best regards,
>>>>>> Sam
>>>>>>
>>>>>>     [[alternative HTML version deleted]]
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
#
Hello,

Thank you all very much for looking into this.

I came across this problem when I was using TTR::stockSymbols()
(https://github.com/joshuaulrich/TTR/blob/e6609b9f7621f3a4b1a204c159af61aebc89997e/R/WebData.R).

As a workaround I added this function to my private R package and instead
of read.csv I am now using data.table::fread() which properly (without
failing) downloads the file and reads it.

Best,
Sam
On Fri, Jul 17, 2020 at 4:30 AM ruipbarradas <ruipbarradas at sapo.pt> wrote:

#
On 2020-07-17 07:54 -0400, Sam H wrote:
| On 2020-07-17 09:30 +0100, ruipbarradas wrote:
| | On 2020-07-16 20:59 -0500, luke-tierney at uiowa.edu wrote:
| | | On 15/07/20 at 08:45, Sam H wrote:
| | | | Hi,
| | | | 
| | | | I am trying to download some 
| | | | data using read.csv and it works 
| | | | perfectly in RStudio and fails 
| | | | in the R console in the terminal 
| | | | in Ubuntu 18.04 after upgrading 
| | | | from R 3.6.3 to 4.0.2. 
| | | 
| | | On my Ubuntu system the download 
| | | with read.csv succeeds in an R 
| | | console if I set the HTTPUserAgent 
| | | and download.file.method options to 
| | | match the ones used by RStudio.
| | | 
| | | Given how picky the server is being 
| | | I would worry about whether this use 
| | | is in line with the site's terms of 
| | | service.
| |
| | Yes, I thought it's a site policy 
| | issue too. But the file can be 
| | accessed and read/downloaded from 
| | RStudio and Firefox, so apparently 
| | there's no reason why the R console 
| | shouldn't.
| 
| Hello,
| 
| Thank you all very much for looking into this.
| 
| I came across this problem when I was using TTR::stockSymbols()
| (https://github.com/joshuaulrich/TTR/blob/e6609b9f7621f3a4b1a204c159af61aebc89997e/R/WebData.R).
| 
| As a workaround I added this function 
| to my private R package and instead of 
| read.csv I am now using 
| data.table::fread() which properly 
| (without failing) downloads the file 
| and reads it.

Dear Sam,

Good thing you solved this.  

Like Luke said, to use read.csv you need 
to set the HTTPUserAgent option:

	options("HTTPUserAgent"="User-Agent: RStudio Desktop (1.3.959)")

... or with cURL directly:

	rasmus@twentyfive ~ % curl -H 'User-Agent: RStudio Desktop (1.3.959)' 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download'
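
If you would rather not change HTTPUserAgent for the whole session, the option can be set only around the one call and restored afterwards. This is a minimal sketch in base R; `with_user_agent` is a hypothetical helper name (not part of R or of TTR), and the agent string is simply the one quoted above. Whether the server accepts it, and whether sending it is within the site's terms of service, is not something the code can settle.

```r
# Hypothetical helper (not part of base R): evaluate `expr` while the
# HTTPUserAgent option is temporarily set to `agent`, then restore the
# previous value even if `expr` fails.
with_user_agent <- function(agent, expr) {
  old <- options(HTTPUserAgent = agent)  # returns the previous setting
  on.exit(options(old), add = TRUE)      # restore on exit, error or not
  force(expr)                            # `expr` is lazy, so it runs here
}

# Offline check: the option is changed only inside the call.
inside <- with_user_agent("User-Agent: RStudio Desktop (1.3.959)",
                          getOption("HTTPUserAgent"))
print(inside)

# Network use, left commented out because it depends on the server:
# x <- "https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download"
# companies <- with_user_agent("User-Agent: RStudio Desktop (1.3.959)",
#                              read.csv(x, as.is = TRUE, na.strings = "n/a"))
```

The helper relies on lazy evaluation: the read.csv call passed as `expr` is not evaluated until `force(expr)` runs, by which point the option is already in place.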

On 15/07/20 at 08:45, Sam H wrote:
| Before upgrading this worked in the R 
| console in the terminal also without 
| any issues.

In R 3.6.3, I could not get these 
lines to work either:

	> R.Version()$version.string
	[1] "R version 3.6.3 (2020-02-29)"
	> options()[c("download.file.method", "HTTPUserAgent")]
	$<NA>
	NULL
	
	$HTTPUserAgent
	[1] "R (3.6.3 x86_64-pc-linux-gnu x86_64 linux-gnu)"
	
	> x<-"https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download"
	> read.csv(x, as.is=TRUE, na="n/a")
	Error in file(file, "rt") :
	  cannot open the connection to 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download'
	In addition: Warning message:
	In file(file, "rt") :
	  cannot open URL 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download': HTTP status was '403 Forbidden'
	>

Running data.table::fread in 4.0.2:

	> options()[c("download.file.method", "HTTPUserAgent")]
	$<NA>
	NULL
	
	$HTTPUserAgent
	[1] "R (4.0.2 x86_64-pc-linux-gnu x86_64 linux-gnu)"
	> x <- "https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download"
	> data.table::fread(x, header=TRUE)[1:2,]
	   Symbol               Name LastSale
	1:    TXG 10x Genomics, Inc.    89.19
	2:     YI          111, Inc.     6.53
	   MarketCap IPOyear        Sector
	1:    $8.77B    2019 Capital Goods
	2:  $537.81M    2018   Health Care
	                                           industry
	1: Biotechnology: Laboratory Analytical Instruments
	2:                         Medical/Nursing Services
	                       Summary Quote V9
	1: https://old.nasdaq.com/symbol/txg NA
	2:  https://old.nasdaq.com/symbol/yi NA

Does anyone know what data.table::fread 
does differently from read.csv here (so 
that setting HTTPUserAgent is not needed)?  

Without HTTPUserAgent, I think 
data.table::fread just reports something 
like "libcurl/7.71.1", like read.csv 
would have done ...
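
One way to narrow this down is to print the relevant settings in a session where the download works (RStudio) and in one where it fails (terminal), and compare. A minimal sketch, base R only; `report_download_settings` is a hypothetical helper name. It only shows what base R's own download code would use, so it cannot say what data.table::fread actually sends.

```r
# Hypothetical helper: collect the settings base R consults when it
# opens a URL, so two sessions can be compared side by side.
report_download_settings <- function() {
  list(
    r_version       = R.version.string,
    user_agent      = getOption("HTTPUserAgent"),
    # NULL when unset, which is what the `$<NA>` / `NULL` output above shows
    download_method = getOption("download.file.method", default = "(unset)"),
    has_libcurl     = unname(capabilities("libcurl"))
  )
}

str(report_download_settings())
```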

Best,
Rasmus
1 day later
#
This issue has now been fixed in TTR::stockSymbols() by the package author:
https://github.com/joshuaulrich/TTR/commit/98dec2b5aa68c3cee750397c7d11b164895e0140

Thanks for all the help and ideas.

Best,
Sam
On Fri, Jul 17, 2020, 13:54 Sam H <sam.hhh1 at gmail.com> wrote: