Python scraping with Mechanize, cookies (login) and proxies


I'm scraping a badly designed and badly managed government agricultural website application for data. It requires me to log in, kicks me out, and temporarily blocks my IP (SE Asia, don't ask!). I'm trying to use Python and mechanize along with a proxy list (http://proxy-hunter.blogspot.sg/2013/01/01-01-13-l1l2l3-http-proxies-1502.html) to do the work of scraping permits. However, while my script can log in, it does not seem to use the proxies in the set_proxies list. Can anyone tell me why, and suggest how I can fix it? The only answer I found on Stack Overflow was "don't use mechanize"... well, I have a working script minus the proxy aspect, so that isn't a very helpful answer.
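Because the site temporarily blocks an IP, one possible approach is to treat a failed request as the signal to move on to the next proxy in the list. The sketch below (Python 3, standard library only) assumes a hypothetical caller-supplied `fetch(url, proxy)` function that performs the request through `proxy` and raises an exception when that proxy is blocked, dead or times out; `fetch_with_rotation` is likewise an illustrative name, not part of mechanize.

```python
from typing import Callable, Iterable, Tuple


def fetch_with_rotation(
    url: str,
    proxies: Iterable[str],
    fetch: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each proxy address ("host:port") in order until one fetch succeeds.

    `fetch(url, proxy)` is a hypothetical function supplied by the caller; it
    must raise an exception when the request through `proxy` fails.
    Returns (proxy_used, body). Raises RuntimeError if every proxy fails.
    """
    errors = []
    for proxy in proxies:
        try:
            return proxy, fetch(url, proxy)
        except Exception as exc:  # blocked or dead proxy: remember why, try the next one
            errors.append((proxy, exc))
    raise RuntimeError("all proxies failed: %r" % (errors,))
```

With mechanize, `fetch` could call `br.set_proxies({"http": proxy})` and then return `br.open(url).read()`; mechanize raises an HTTP error for non-2xx responses by default, so a block answered with e.g. 403 would trigger the switch, although a block served as a normal 200 page would need an explicit content check inside `fetch`.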

My current code (it reads an IP-reporting site to test whether the IP changes, i.e. whether the proxies are used; they aren't, and it prints my real IP):

    import mechanize
    import cookielib
    from BeautifulSoup import BeautifulSoup
    import html2text
    import urllib2

    # mechanize browser/cookie stuff
    br = mechanize.Browser()
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)

    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)
    br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
    br.addheaders = [('User-agent', 'Firefox')]

    # proxies to use
    br.set_proxies({"84.2.35.44:80": "http", "1.62.68.201:6675": "http", "119.254.90.18:8080": "http"})

    # testing whether the proxy works by checking the IP
    page = br.open('http://whatismyipaddress.com/')
    soup = BeautifulSoup(br.response().read())
    ip = soup.findAll('div', {'style': 'text-align:center;padding-top:4px;'})
    print ip
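Matching a `div` by its inline `style` ties the check to one site's markup, which can change at any time. A less markup-dependent sketch (Python 3, standard library only; `first_ipv4` is a hypothetical helper) pulls the first valid IPv4 address out of the response text. It assumes the page's first IPv4-looking string is the one being reported, which may not hold on pages that embed other addresses; a page that returns only the IP is easier to check.

```python
import ipaddress
import re

# Loose dotted-quad pattern; each candidate is validated by ipaddress afterwards.
_IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def first_ipv4(text):
    """Return the first valid IPv4 address found in `text`, or None."""
    for candidate in _IPV4_CANDIDATE.findall(text):
        try:
            return str(ipaddress.IPv4Address(candidate))
        except ValueError:  # e.g. "999.1.1.1" matches the pattern but is not an address
            continue
    return None
```

Comparing `first_ipv4(br.response().read().decode("utf-8", "replace"))` with the proxy's host part (e.g. `"84.2.35.44"`) then shows directly whether the request went through the proxy.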

So it runs, but in the terminal it comes back with my real IP rather than one of the 3 proxies I set.
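A likely cause: mechanize's `Browser.set_proxies()` expects a dict keyed by URL scheme, with the proxy `"host:port"` as the value (its documentation gives `{"http": "joe:password@myproxy.example.com:3128"}` as an example). The dict above has keys and values swapped, so there is no `"http"` entry and plain `http://` requests most likely go out directly, which would explain seeing the real IP. A single dict also holds at most one proxy per scheme, so three proxies mean three separate settings. The sketch below (Python 3, standard library only; both helper names are hypothetical) builds the expected shape and splits the swapped dict into one setting per proxy.

```python
def proxy_settings(address, scheme="http"):
    """Return {scheme: address}: scheme as the key, "host:port" as the value,
    which is the shape Browser.set_proxies() expects."""
    return {scheme: address}


def split_swapped_mapping(address_to_scheme):
    """Convert an {address: scheme} dict (keys and values swapped) into a list
    of {scheme: address} dicts, one per proxy, since one dict can hold only
    one proxy per scheme."""
    return [proxy_settings(address, scheme)
            for address, scheme in address_to_scheme.items()]
```

Each entry can then be applied before a request, e.g. `br.set_proxies(settings)` followed by `br.open('http://whatismyipaddress.com/')`, and the reported IP checked for each proxy in turn.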

Finally, please take into account in your answer that the actual code logs in to the website, using something like:

    br.open('http://www.terriblegovernmentsite.hk/login.php')
    br.select_form(nr=3)
    br.form['login[user]'] = 'researcher01'
    br.form['login[pass]'] = 'mypassword'
    br.submit()
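For comparison, the same login-through-a-proxy flow can be sketched with the standard library instead of mechanize: Python 3's `urllib.request` and `http.cookiejar`, the successors of the `urllib2` and `cookielib` modules imported above. This swaps in a different technique rather than fixing mechanize itself. The helper names are hypothetical, and the form field names must be read from the real login form's HTML.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_opener(proxy_address, jar=None):
    """Return (opener, jar): the opener stores cookies in `jar` and sends
    http:// requests through `proxy_address` ("host:port")."""
    jar = jar if jar is not None else http.cookiejar.LWPCookieJar()
    opener = urllib.request.build_opener(
        # Scheme is the key, proxy URL is the value.
        urllib.request.ProxyHandler({"http": "http://" + proxy_address}),
        urllib.request.HTTPCookieProcessor(jar),
    )
    opener.addheaders = [("User-agent", "Firefox")]
    return opener, jar


def login_request(url, fields):
    """Build a POST request carrying the login form `fields` (a dict of
    field name -> value; the names depend on the site's form)."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=body)
```

Usage would be `opener, jar = build_opener("84.2.35.44:80")`, then `opener.open(login_request('http://www.terriblegovernmentsite.hk/login.php', {...}))`; later `opener.open(...)` calls reuse the session cookie in `jar` through the same proxy. If the site ties the session to the client IP, switching proxies mid-session may log you out, so it may be safer to log in again after each proxy change.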

However, that is a bit irrelevant, since I can't get the proxy to work for simply reading a static webpage, never mind whilst handling cookies.

Thanks in advance for any help, it's much appreciated.

