Hi Friends,
I searched extensively online and tried every recommended solution I found, but couldn't figure this out.
Please see this link:
http://www.dli.gov.in/data6/upload/0159/808/PTIFF/00000007.tif
That link works.
But when I try the following command:
wget -r -nd --no-parent -U firefox -A tif http://www.dli.gov.in/data6/upload/0159/808/PTIFF/
I get a 403 Forbidden error.
Could you please suggest a way around it?
I cannot see that link. No such server.
In any case, there's no reason a server needs to permit you to see the index of a folder. If it's also forbidden in a browser, then it's just plain forbidden because they don't want you to do that.
A quick Google search for that URL suggests the image you want is part of the "Brihatkathamanjari", which is available here in a variety of forms:
Hi Corona,
I could access that link.
But anyway, thanks for your response.
---------- Post updated at 10:13 PM ---------- Previous update was at 10:04 PM ----------
corona688:
I cannot see that link. No such server.
In any case, there's no reason a server needs to permit you to see the index of a folder. If it's also forbidden in a browser, then it's just plain forbidden because they don't want you to do that.
I figured out that the file names are numbers zero-padded to 8 digits, so the number of leading zeroes drops by one each time the number gains a digit.
For example:
00000001.tif through 00000009.tif (7 leading zeroes)
Then
00000010.tif through 00000099.tif (6 leading zeroes)
Then
00000100.tif through 00000999.tif (5 leading zeroes)
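For reference, that naming pattern is just the number padded with zeroes to a total width of 8, which printf's %08d conversion produces directly. A minimal sketch, assuming bash; the helper name tif_name is hypothetical and used only for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: zero-pad a number to 8 digits and append .tif
tif_name() {
    printf '%08d.tif\n' "$1"
}

# Print one name from each digit range described above.
for n in 1 9 10 99 100 999; do
    tif_name "$n"
done
```

Because the width is fixed at 8, there is no need to handle each range of leading zeroes separately.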
I used this command:
wget http://www.dli.gov.in/data6/upload/0159/808/PTIFF/0000000{1..94}.tif
But I only got files up to 00000009.tif. Could you please suggest a for loop?
Thanks
It simply cannot be accessed from here. DNS returns nothing. Very very strange.
If it's somehow valid where you are, you could try playing with the referer settings:
wget -r -nd --no-parent --referer=http://www.dli.gov.in/ -U netscape -A tif http://www.dli.gov.in/data6/upload/0159/808/PTIFF/
...which should make wget look a little more like a web browser and less like a mining robot.
But actually, it would be simpler to go to http://www.dli.gov.in/data6/upload/0159/808/PTIFF/ in your browser, since you say it works from there, and just save the list of URLs.
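If the folder page does open in a browser, the saved page can be turned into a URL list for wget -i. A rough sketch, assuming the page was saved as a file (the name index.html and the function extract_tif_urls are hypothetical) and that each image appears as a relative link of the form href="NAME.tif"; absolute links would need the prefixing step removed:

```shell
#!/usr/bin/env bash
# Assumption: the listing links to each image as href="NAME.tif" (relative).
base="http://www.dli.gov.in/data6/upload/0159/808/PTIFF"

# Hypothetical helper: print one full URL per .tif link found in a saved page.
extract_tif_urls() {
    grep -o 'href="[^"]*\.tif"' "$1" |      # keep only the href="...tif" parts
        sed -e 's/^href="//' -e 's/"$//' |  # strip href=" and the closing quote
        sed "s|^|$base/|"                   # prefix the folder URL
}

# Uncomment to download everything the saved page links to
# (-i - makes wget read the URL list from standard input):
# extract_tif_urls index.html | wget -i -
```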
corona688:
It simply cannot be accessed from here. DNS returns nothing. Very very strange.
If it's somehow valid where you are, you could try playing with the referer settings:
wget --referer=http://www.dli.gov.in/ -U netscape
...which should pretend a little more to be a web browser and not a mining robot.
But actually, it would be simpler to go to http://www.dli.gov.in/data6/upload/0159/808/PTIFF/ in your browser and save that webpage, and get all the URLs from there.
Corona,
Actually, only the .tif files are public; all the folders above them are forbidden.
Could you please comment on my for loop request above?
Then you have the answer to your original question: if it doesn't work in your browser, it won't work with wget either. For the loop, try:
for ((N=1; N<100; N++))
do
    # pad N to 8 digits, e.g. 7 -> 00000007.tif
    printf "%s/%08d.tif\n" "http://www.dli.gov.in/data6/upload/0159/808/PTIFF" "$N"
done | wget -i -    # -i - : read the URL list from standard input
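A variant of the loop above that writes the URL list to a file first, so the names can be checked before anything is downloaded. A sketch only: the file name urls.txt is an arbitrary choice, and the upper bound of 94 is taken from the original {1..94} attempt, so adjust it to the actual page count:

```shell
#!/usr/bin/env bash
base="http://www.dli.gov.in/data6/upload/0159/808/PTIFF"
last=94   # assumption: taken from the original {1..94} attempt

# Generate one zero-padded URL per page into urls.txt.
for ((n = 1; n <= last; n++)); do
    printf '%s/%08d.tif\n' "$base" "$n"
done > urls.txt

head -n 3 urls.txt    # quick sanity check of the generated names
# wget -i urls.txt    # then fetch them; -i reads URLs from the given file
```

Keeping generation and download separate makes it easy to spot a padding mistake (such as %06d) before sending any requests.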
RudiC
October 5, 2014, 4:39am
jacobs.smith:
. . .
I used this command
wget http://www.dli.gov.in/data6/upload/0159/808/PTIFF/0000000{1..94}.tif
But I could only get until 00000009.tif. Could you please suggest a for loop?
. . .
Isn't that obvious? Those filenames have 8 digits, but your pattern produces 9 digits from "000000010" onward. Try:
wget http://www.dli.gov.in/data6/upload/0159/808/PTIFF/0000000{1..9}.tif
wget http://www.dli.gov.in/data6/upload/0159/808/PTIFF/000000{10..94}.tif
That's also why Corona688's proposal above needs printf "%s/%08d.tif\n": the names are 8 digits wide.
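The two commands above can probably be collapsed into one on bash 4.0 or later, where a brace range whose endpoints start with a zero is padded to a common width. A small sketch comparing the original pattern with a padded range (older shells, such as the bash 3.2 shipped on macOS, do not pad):

```shell
#!/usr/bin/env bash
# Requires bash >= 4.0 for zero-padded brace expansion.

# Original pattern: fixed 7-zero prefix + unpadded number,
# so 10 and 11 come out 9 digits wide (000000010.tif, 000000011.tif).
broken=(0000000{9..11}.tif)

# Padded range: endpoints start with 0, so every term is padded to 8 digits.
fixed=({00000009..00000011}.tif)

printf '%s\n' "${broken[@]}"
printf '%s\n' "${fixed[@]}"

# Whole range in one command:
# wget http://www.dli.gov.in/data6/upload/0159/808/PTIFF/{00000001..00000094}.tif
```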