Python Requests: persistent session

metallica1973 · May 11, 2013, 1:48pm

When downloading videos from the site, I get a 403 Forbidden partway through my downloads. About a third of the videos download fine, and then the script fails with:

Retrieving file 'blah-video.f4v'...
Traceback (most recent call last):
  File "./download.py", line 90, in <module>
    retrieve_flv(hashval_url)
  File "./lynda.saint.py", line 61, in retrieve_flv
    remote = urllib2.urlopen(hashval_url)
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 407, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 520, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 445, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 379, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 528, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

Here is the code that I think is pertinent:

import re
import urllib2

import requests
from bs4 import BeautifulSoup  # assumed; findAll() exists in both bs4 and BeautifulSoup 3

persistent = requests.Session()
heads = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:18.0) Gecko/20100101 Firefox/18.0'}
auths = requests.get('http://www.blah.com/user/login/modal', auth=('blah@blah.com', 'blah-password'), headers=heads)

def find_all_flvs(url):
    soup = BeautifulSoup(urllib2.urlopen(url))
    flvs = []
    for link in soup.findAll(onclick=re.compile("doShowCHys=1*")):
        link = str(link)
        #startpos = link.find("lpk4=") + 5
        #endpos   = link.find("&amp")
        vidnum   = re.findall(r"(?<==)\d{5,7}", link, re.U)
        vidurl   = "http://www.blah.com/home/GetPlayerXML.aspx?lpk4=%s" % vidnum[0]

        for hashval_url in BeautifulSoup(urllib2.urlopen(vidurl)).findAll("flv"):
            flvs.append(hashval_url.text)

    return flvs

def retrieve_flv(url):
    # ... rest of the download code omitted; per the traceback above it calls
    # urllib2.urlopen() on the video URL, which is where the 403 is raised
    ...

I suspect that requests.Session() is not keeping my cookies across all of the requests, which causes the site to return 403 Forbidden. What am I doing wrong?
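A likely explanation, based on the code as posted: a requests.Session only keeps cookies for requests sent through that same object (persistent.get(...), persistent.post(...)), and here nothing goes through it. auths = requests.get(...) uses the module-level function, which opens and closes its own temporary session, so any login cookie is thrown away; and find_all_flvs and retrieve_flv fetch pages with urllib2.urlopen, which has no access to the session's cookies at all. The sketch below reproduces the effect with the Python 3 standard library only (urllib2 became urllib.request there). The local server, the /login and /video paths, and the cookie value are hypothetical, made up for the demo.

```python
import http.cookiejar
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieCheckHandler(BaseHTTPRequestHandler):
    """Hypothetical server: /login sets a cookie, /video returns 403 without it."""

    def do_GET(self):
        if self.path == "/login":
            self.send_response(200)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            body = b"logged in"
        elif self.path == "/video":
            has_cookie = "session=abc123" in (self.headers.get("Cookie") or "")
            self.send_response(200 if has_cookie else 403)
            body = b"video bytes" if has_cookie else b"forbidden"
        else:
            self.send_response(404)
            body = b""
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def make_opener(*handlers):
    # ProxyHandler({}) ignores environment proxies so requests reach the local server.
    return urllib.request.build_opener(urllib.request.ProxyHandler({}), *handlers)


def status_of(opener, url):
    """Return the HTTP status code; a 403 comes back as a value instead of an exception."""
    try:
        with opener.open(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), CookieCheckHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    try:
        # Like bare urllib2.urlopen() calls: no cookie jar, so the login cookie is dropped.
        plain = make_opener()
        plain.open(base + "/login").close()
        without_jar = status_of(plain, base + "/video")

        # One opener with a CookieJar, playing the role of a single requests.Session.
        shared = make_opener(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
        shared.open(base + "/login").close()
        with_jar = status_of(shared, base + "/video")
    finally:
        server.shutdown()
        server.server_close()
    return without_jar, with_jar


if __name__ == "__main__":
    print(run_demo())  # (403, 200)
```

The bare opener behaves like the separate urlopen calls and gets 403, while the cookie-aware opener behaves like routing every request through one session and gets 200.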

Update (posted 01:48 PM; previous update at 12:57 PM):

It looks like my requests.Session is not persisting cookies across requests. When I check "auths" after the script fails to download the video, the response ends with:

We require that your browser can accept cookies in order to login. Please enable your browser cookies and then close and restart your browser to login.\');\r\n\t\t\t});\r\n\t\t</script>\r\n\t</body>\r\n</html
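That message suggests a common login pattern (an assumption about this site, not confirmed): the login page first sets a test cookie, and the form submission is rejected with a "please enable cookies" page if that cookie does not come back. A one-shot requests.get(..., auth=(...)) cannot satisfy this, because it carries no cookie from an earlier visit and auth= sends HTTP Basic credentials rather than filling in a login form. The sketch below simulates such a site locally with the Python 3 standard library; the /user/login path, the email/password field names, the credentials, and the cookie names are hypothetical stand-ins. The client GETs the login page, POSTs the form, and then requests a protected URL, all through one opener.

```python
import http.cookiejar
import threading
import urllib.error
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

NEEDS_COOKIES = b"We require that your browser can accept cookies in order to login."


class LoginHandler(BaseHTTPRequestHandler):
    """Hypothetical site: the login POST only succeeds if the test cookie set on GET comes back."""

    def _reply(self, code, body, cookie=None):
        self.send_response(code)
        if cookie:
            self.send_header("Set-Cookie", cookie)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        cookies = self.headers.get("Cookie") or ""
        if self.path == "/user/login":
            # Showing the login form plants a test cookie.
            self._reply(200, b"login form", cookie="cookie_test=1; Path=/")
        elif self.path == "/video":
            ok = "auth=yes" in cookies
            self._reply(200 if ok else 403, b"video bytes" if ok else b"forbidden")
        else:
            self._reply(404, b"")

    def do_POST(self):
        cookies = self.headers.get("Cookie") or ""
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode())
        if "cookie_test=1" not in cookies:
            self._reply(200, NEEDS_COOKIES)  # test cookie never came back
        elif form.get("email") == ["me@example.com"] and form.get("password") == ["secret"]:
            self._reply(200, b"welcome", cookie="auth=yes; Path=/")
        else:
            self._reply(200, b"bad credentials")

    def log_message(self, *args):
        pass  # silence per-request logging


def login_and_fetch(use_jar):
    """GET the form, POST credentials, then request /video; return (login reply, video status)."""
    server = HTTPServer(("127.0.0.1", 0), LoginHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    handlers = [urllib.request.ProxyHandler({})]  # bypass environment proxies
    if use_jar:
        handlers.append(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
    opener = urllib.request.build_opener(*handlers)
    try:
        opener.open(base + "/user/login").close()
        data = urllib.parse.urlencode({"email": "me@example.com", "password": "secret"}).encode()
        with opener.open(base + "/user/login", data) as resp:  # passing data makes it a POST
            login_page = resp.read().decode()
        try:
            with opener.open(base + "/video") as resp:
                video_status = resp.status
        except urllib.error.HTTPError as err:
            video_status = err.code
    finally:
        server.shutdown()
        server.server_close()
    return login_page, video_status


if __name__ == "__main__":
    print(login_and_fetch(use_jar=False))
    print(login_and_fetch(use_jar=True))
```

Without a cookie jar the POST gets the "accept cookies" page and /video returns 403; with one, login succeeds and /video returns 200. In the original script the same idea would mean sending the login-page GET, the form POST (persistent.post(url, data={...}, headers=heads)) and every video download through persistent; the real login URL and form field names have to be read from the site's login form and are not known here.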