When I try to download videos from the URL, I get a 403 Forbidden partway through my downloads. I can download about a third of the videos, and then the script fails with this error:
Retrieving file 'blah-video.f4v'...
Traceback (most recent call last): ] ETA: --:--:-- 0.00 B/s
  File "./download.py", line 90, in <module>
    retrieve_flv(hashval_url)
  File "./lynda.saint.py", line 61, in retrieve_flv
    remote = urllib2.urlopen(hashval_url)
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 407, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 520, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 445, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 379, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 528, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
Here is the code that I think is pertinent:
persistent = requests.Session()
heads = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:18.0) Gecko/20100101 Firefox/18.0'}
auths = requests.get('http://www.blah.com/user/login/modal', auth=('blah@blah.com', 'blah-password'), headers=heads)

def find_all_flvs(url):
    soup = BeautifulSoup(urllib2.urlopen(url))
    flvs = []
    for link in soup.findAll(onclick=re.compile("doShowCHys=1*")):
        link = str(link)
        #startpos = link.find("lpk4=") + 5
        #endpos = link.find("&")
        vidnum = re.findall("(?<==)\d{5,7}", link, re.U)
        vidurl = "http://www.blah.com/home/GetPlayerXML.aspx?lpk4=%s" % vidnum[0]
        for hashval_url in BeautifulSoup(urllib2.urlopen(vidurl)).findAll("flv"):
            flvs.append(hashval_url.text)
    return flvs

def retrieve_flv (url):
    blah blah blah code.....
I suspect that my requests.Session() is not keeping my cookies across all of the requests, which causes the site to return a 403 Forbidden. What am I doing wrong?
---------- Post updated at 01:48 PM ---------- Previous update was at 12:57 PM ----------
It must be that my requests.Session is not persisting cookies across my requests. When I check "auths" after my script fails to download the video, I see this at the end:
We require that your browser can accept cookies in order to login. Please enable your browser cookies and then close and restart your browser to login.\');\r\n\t\t\t});\r\n\t\t</script>\r\n\t</body>\r\n</html
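To make the cookie behaviour I suspect concrete, here is a minimal sketch that uses only the standard library. It is written with Python 3 names (urllib.request is the successor to urllib2), and everything site-specific in it is made up for the demonstration: the local test server, the /login and /video paths, and the sid cookie are hypothetical and have nothing to do with the real site. It only shows that an opener with a cookie jar replays the login cookie on later requests, while a bare opener (which is what a plain urlopen() call amounts to) does not, and so gets a 403.

```python
import threading
import urllib.error
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the site: /login sets a cookie,
    every other path answers 403 unless that cookie is sent back."""

    def do_GET(self):
        if self.path == "/login":
            self.send_response(200)
            self.send_header("Set-Cookie", "sid=abc123; Path=/")
            body = b"logged in"
        elif "sid=abc123" in self.headers.get("Cookie", ""):
            self.send_response(200)
            body = b"video bytes"
        else:
            self.send_response(403)
            body = b"forbidden"
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def status_of(opener, url):
    """Return the HTTP status code for url; an HTTPError counts as a status."""
    try:
        with opener.open(url) as response:
            return response.getcode()
    except urllib.error.HTTPError as err:
        return err.code


def run_demo():
    # Start the throwaway server on a free local port.
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port

    # Opener with a cookie jar: cookies set by /login are sent on later requests.
    jar = CookieJar()
    with_cookies = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    # Opener without a jar: no cookies are stored or sent.
    without_cookies = urllib.request.build_opener()

    with_cookies.open(base + "/login").read()
    results = {
        "with_cookies": status_of(with_cookies, base + "/video"),
        "without_cookies": status_of(without_cookies, base + "/video"),
    }
    server.shutdown()
    server.server_close()
    return results


if __name__ == "__main__":
    print(run_demo())
```

If this is what is happening in my script, then the login and every later page or video fetch would all have to go through the same cookie-aware object. In my code above, "persistent" is created but never used, the login goes through requests.get() rather than the session, and the downloads go through urllib2.urlopen(), which knows nothing about cookies held by requests.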