python - Regular Expression: (or not) Looking to print only data after the header - some homework help appreciated
Using socket, I need to parse data from a website (http://www.py4inf.com/code/romeo.txt).
I'm using the regular expression '^\s*$' to locate the first blank line after the header and above the data.
Any tips on how to extract the data (and not print the header)?
    import socket
    import re

    mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        userurl = raw_input('Enter url: ')
        d = userurl.split('/')
        d.remove("")
        host = d[1]
        mysock.connect((host, 80))
        mysock.send('GET %s HTTP/1.0\n\n' % (userurl))
        while True:
            data = mysock.recv(3000)
            if len(data) < 1:
                break
            print (''.join([x for x in re.findall('^\s*$', data, re.DOTALL)]))
    except Exception as e:
        print (str(e))
I'm assuming that since it's a homework problem you have to use socket, and can't use the more user-friendly requests.
I would first loop until you have the complete response in a string, and then iterate over this:
    ...
    response = ""
    while True:
        data = mysock.recv(3000)
        if len(data) < 1:
            break
        response += data

    iterator = iter(response.split("\n"))

    for line in iterator:
        if not line.strip():  # empty line: end of the header
            break

    body = "\n".join(iterator)  # put the rest of the data in a string
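If you would still like to use a regular expression, the same idea can be sketched by splitting the complete response once at the first blank line. This is only a rough sketch and makes some assumptions that are not in the code above: it is written for Python 3, where recv returns bytes; split_http_response is a made-up helper name; and sample is a hand-written response for illustration, not real server output.

```python
import re

# A blank line separates the header from the body. HTTP uses CRLF line
# endings, but bare LF is tolerated here as well.
_BLANK_LINE = re.compile(rb"\r?\n\r?\n")


def split_http_response(raw):
    """Split a complete raw response (bytes) into (header, body) strings.

    Hypothetical helper for illustration; assumes the whole response
    has already been read from the socket.
    """
    # maxsplit=1 so blank lines inside the body are left untouched.
    parts = _BLANK_LINE.split(raw, maxsplit=1)
    header = parts[0].decode("utf-8", "replace")
    # If no blank line was found, treat everything as header.
    body = parts[1].decode("utf-8", "replace") if len(parts) > 1 else ""
    return header, body


# Hand-written sample standing in for the data collected by the recv loop.
sample = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"But soft what light through yonder window breaks\n"
    b"It is the east and Juliet is the sun\n"
)

header, body = split_http_response(sample)
print(body)
```

In the socket code you would collect the chunks from recv first (for example in a list joined with b"".join) and pass the result to the helper. Matching on each 3000-byte chunk separately is risky because the blank line could be cut in half between two chunks.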