Опитвам се просто да изтегля страница с python.
Ако получа кода за отговор от сървъра, получавам 200
import urllib2
url = 'http://webapps.rrc.state.tx.us/CMPL/viewPdfReportFormAction.do?method=cmplP4FormPdf&packetSummaryId=97770'
file_pointer = urllib2.urlopen(url)
print file_pointer.getcode()
Ако обаче получа URL адреса, получавам страницата за пренасочване
file_pointer.geturl()
Опитах urllib, urllib2,requests и mechanize всички отделно и не мога да накарам нито един да работи. Очевидно пропускам нещо, защото други хора в офиса имат код, който работи. SOS
Тук също има повече информация, предоставена от заявки
import requests
url = 'http://webapps.rrc.state.tx.us/CMPL/viewPdfReportFormAction.do?method=cmplP4FormPdf&packetSummaryId=97770'
proxy = { 'https': '200.35.152.93:1212'}
response = requests.get(url, proxies=proxy)
send: 'GET /CMPL/viewPdfReportFormAction.do?method=cmplP4FormPdf&packetSummaryId=97770 HTTP/1.1\r\nHost: webapps.rrc.state.tx.us\r\nConnection: keep-alive\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nUser-Agent: python-requests/2.7.0 CPython/2.7.10 Windows/7\r\n\r\n'
reply: 'HTTP/1.1 302 Found\r\n'
header: Date: Wed, 26 Aug 2015 19:33:12 GMT
header: Server: Apache/2.2.15 (Red Hat)
header: Location: http://www.rrc.state.tx.us/site-policies/railroad-commission-of-texas-site-policies/?method=cmplP4FormPdf&packetSummaryId=97770
header: Content-Length: 405
header: Connection: close
header: Content-Type: text/html; charset=iso-8859-1
send: 'GET /site-policies/railroad-commission-of-texas-site-policies/?method=cmplP4FormPdf&packetSummaryId=97770 HTTP/1.1\r\nHost: www.rrc.state.tx.us\r\nConnection: keep-alive\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nUser-Agent: python-requests/2.7.0 CPython/2.7.10 Windows/7\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Cache-Control: private
header: Content-Type: text/html; charset=utf-8
header: server: one
header: Date: Wed, 26 Aug 2015 19:33:11 GMT
header: Content-Length: 41216