Download data from Internet contained in a Zip file

8 messages · Peter Dalgaard, Gábor Csárdi, David Winsemius +1 more

#
Hi again,

I posted this on the general R list, but it was suggested that I ask
this group instead, since I am using Mac OS 10.7.5.

I was following the instructions at
"http://stackoverflow.com/questions/3053833/using-r-to-download-zipped-data-file-extract-and-import-data"
to download data contained in a zip file from this
address:

https://npscra.nsdl.co.in/download.php?path=download/&filename=NAV_File_23122016.zip

However, when I tried to follow the instructions, I got the error below:
Error in download.file("https://npscra.nsdl.co.in/download.php?path=download/&filename=NAV_File_23122016.zip",  :
  unsupported URL scheme

Can someone here please tell me what went wrong above?

Highly appreciate your feedback.

Thanks for your time.
#
Which R version is this?

-pd

  
    
#
Your R build does not support HTTPS.

I suggest that you use the curl package if you can. HTTP support in
base R is very limited currently.
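
A minimal sketch of the curl route, assuming the curl package is installed from CRAN. fetch_with_curl is a hypothetical helper name, and the destination "temp.zip" in the usage comment is only illustrative:

```r
# Hypothetical helper: download a URL to a local file with the curl package
# instead of base R's download.file().
fetch_with_curl <- function(url, destfile) {
  if (!requireNamespace("curl", quietly = TRUE)) {
    stop("the 'curl' package is needed; try install.packages(\"curl\")")
  }
  # mode = "wb" writes the file in binary mode, which matters for zip archives
  curl::curl_download(url, destfile, mode = "wb")
}

# Usage (needs network access):
# fetch_with_curl(
#   "https://npscra.nsdl.co.in/download.php?path=download/&filename=NAV_File_23122016.zip",
#   "temp.zip")
```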

Gabor



On Sun, Dec 25, 2016 at 10:37 PM, Christofer Bogaso
<bogaso.christofer at gmail.com> wrote:
#
I generally use the downloader package. It sets up the call to download.file so that it succeeds with https URLs.


 install.packages("downloader", dependencies=TRUE)
trying URL 'http://cran.cnr.Berkeley.edu/bin/macosx/mavericks/contrib/3.3/downloader_0.4.tgz'
Content type 'application/x-gzip' length 19459 bytes (19 KB)
==================================================
downloaded 19 KB


The downloaded binary packages are in
	/var/folders/68/vh2f8kzn09j8954r6q9100yh0000gn/T//Rtmpq8DVG4/downloaded_packages
starting httpd help server ... done
# Requires both a source and destination file name.

trying URL 'https://npscra.nsdl.co.in/download.php?path=download/&filename=NAV_File_23122016.zip'
Content type 'application/octet-stream' length 1228 bytes
==================================================
downloaded 1228 bytes
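
The call that produced the output above did not survive in the archive. A sketch of what such a call might look like, assuming downloader's download() is given the URL plus a destination file. The file name "temp.zip" is an assumption, and fetch_zip is a hypothetical wrapper:

```r
library(downloader)

# Hypothetical wrapper around downloader::download(); arguments other than
# the URL are passed on to download.file().
fetch_zip <- function(url, destfile = "temp.zip") {
  # mode = "wb" keeps the zip archive byte-for-byte intact
  download(url, destfile = destfile, mode = "wb")
  invisible(destfile)
}

# Usage (needs network access):
# fetch_zip("https://npscra.nsdl.co.in/download.php?path=download/&filename=NAV_File_23122016.zip")
```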
#
Hi David et al,

Thanks for the pointers. With your approach, I see the
"temp.zip" file in my working folder.

However, I still could not extract the data from it. I tried the
unzip() function, but it fails:
Warning message:
In unzip("temp.zip") : error 1 in extracting from zip file
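
One way to narrow down a warning like this is to check whether the downloaded file is really a zip archive, since a failed download can leave an error page or a truncated file behind. A small sketch using only base R; looks_like_zip is a hypothetical helper, and it only tests the usual 4-byte signature ("PK" followed by bytes 3 and 4) at the start of the file:

```r
# Hypothetical helper: TRUE if the file starts with the zip local-file-header
# signature 0x50 0x4B 0x03 0x04, FALSE otherwise (including very short files).
looks_like_zip <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  sig <- readBin(con, what = "raw", n = 4L)
  identical(sig, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# Usage:
# file.info("temp.zip")$size    # compare with the size reported by the download
# looks_like_zip("temp.zip")    # FALSE suggests the download itself went wrong
```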

When I open the link
"https://npscra.nsdl.co.in/download.php?path=download/&filename=NAV_File_23122016.zip"
manually, download the zip file, and unzip it, I get a file
called "NAV_File_23122016.out", which I can then open in Excel to see all
the data.

I am trying to do the same thing through R, so that
I can load the data automatically, directly from the web.

Any ideas, please? I am using the version of R below (I know this is quite
an old version, but I am not currently in a position to upgrade my
MacBook):
$platform
[1] "x86_64-apple-darwin10.8.0"

$arch
[1] "x86_64"

$os
[1] "darwin10.8.0"

$system
[1] "x86_64, darwin10.8.0"

$status
[1] ""

$major
[1] "3"

$minor
[1] "2.1"

$year
[1] "2015"

$month
[1] "06"

$day
[1] "18"

$`svn rev`
[1] "68531"

$language
[1] "R"

$version.string
[1] "R version 3.2.1 (2015-06-18)"

$nickname
[1] "World-Famous Astronaut"
On Mon, Dec 26, 2016 at 7:18 AM, David Winsemius <dwinsemius at comcast.net> wrote:
#
Things happened with HTTPS in 3.2.2, so upgrade... 

With 3.3.2 on Mavericks, I see
trying URL 'https://npscra.nsdl.co.in/download.php?path=download/&filename=NAV_File_23122016.zip'
Content type 'application/octet-stream' length 1228 bytes
==================================================
downloaded 1228 bytes

-pd

  
    
#
I didn't try to use R to unzip it. Just using my system facilities worked fine.

I'm not able to reproduce:
chr "./NAV_File_23122016.out"
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  line 1 did not have 17 elements
'data.frame':	75 obs. of  6 variables:
 $ V1: Factor w/ 1 level "12/23/2016": 1 1 1 1 1 1 1 1 1 1 ...
 $ V2: Factor w/ 7 levels "PFM001","PFM002",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ V3: Factor w/ 7 levels "HDFC PENSION MANAGEMENT COMPANY LIMITED",..: 6 6 6 6 6 6 6 6 6 6 ...
 $ V4: Factor w/ 75 levels "SM001001","SM001002",..: 5 7 8 11 12 13 1 2 3 4 ...
 $ V5: Factor w/ 75 levels "HDFC PENSION MANAGEMENT COMPANY LIMITED SCHEME A - TIER I",..: 62 59 63 37 56 57 54 55 60 58 ...
 $ V6: num  21.7 21.1 20.8 11.7 10.1 ...
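
The commands behind this output were not preserved in the archive. As a rough sketch, the extracted file could be read with read.table() once the field separator is known. The separator is not stated in the thread, so sep is left as a required argument, and read_nav_file is a hypothetical helper:

```r
# Hypothetical helper: read a headerless delimited file such as the extracted
# "NAV_File_23122016.out". The caller must supply the field separator.
read_nav_file <- function(path, sep) {
  read.table(path, sep = sep, header = FALSE,
             quote = "", comment.char = "",   # treat quotes and '#' as plain text
             stringsAsFactors = FALSE)        # keep text columns as character
}

# Usage, assuming the archive was saved as "temp.zip":
# files <- unzip("temp.zip")                  # character vector of extracted paths
# nav <- read_nav_file(files[1], sep = ",")   # sep = "," is only a guess
# str(nav)
```

Passing the separator explicitly avoids the default whitespace splitting, which breaks on fields that contain spaces, such as the company and scheme names.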
#
Thanks, David, for the details. However, it is still not working for me:
100   996  100   996    0     0    917      0  0:00:01  0:00:01 --:--:--  2028
0      0      0 --:--:-- --:--:-- --:--:--     0
Warning message:
In unzip("~/temp.zip") : error 1 in extracting from zip file
chr(0)

Do I need to install something to get it to work?

Alternatively, as a workaround, can you please suggest whether the data
below can be downloaded directly into R? My only goal is to get the data
automatically, so if this works I can happily live with that:

http://www.utimf.com/UTI-MF-Microsites/retirement/pdf/Scheme_1_NAV_since_inception.pdf

Thanks again

On Mon, Dec 26, 2016 at 11:10 PM, David Winsemius
<dwinsemius at comcast.net> wrote: