Python: Search word on a given webpage
This is my first Python program for the web. I want to count the occurrences of a specific word (e.g. "football") on the Google and FIFA home pages.
1) On Google
    def wordOnTheWebGoogle():
        import urllib2
        import re
        page = urllib2.urlopen("http://www.google.com").read()
        print re.findall("football", page)
        print page.find("football")

    wordOnTheWebGoogle()

The output is:
    []
    -1
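Note that neither call counts anything: re.findall returns the list of matches ([] when there are none) and str.find returns the index of the first match (-1 when there is none). A minimal Python 3 sketch of an actual counter, assuming you want whole-word, case-insensitive matches (so "Football" counts but "Footballers" does not); the helper name count_word and the sample strings are hypothetical:

```python
import re


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    # \b anchors the match at word boundaries, so "Footballers" is not counted;
    # re.escape guards against regex metacharacters in `word`.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


sample = "Football news: the football season starts. Footballers rest."
print(sample.find("football"))           # 19: index of the first lowercase match
print(re.findall("football", sample))    # ['football']: case-sensitive matches
print(count_word(sample, "football"))    # 2: whole-word, case-insensitive count

nothing = "no match here"
print(nothing.find("football"))          # -1: the word is absent
print(re.findall("football", nothing))   # []: no matches
```

If you only need a quick case-sensitive count of a substring, page.count("football") also works, but it will count "football" inside longer words as well.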
2) On the FIFA home page
    def wordOnTheWebFifa():
        import urllib2
        import re
        page = urllib2.urlopen("http://www.fifa.com").read()
        print re.findall("football", page)
        print page.find("football")

    wordOnTheWebFifa()

The output is:

    Traceback (most recent call last):
      File "<ipython-input-51-4e40573ed4fb>", line 1, in <module>
        wordOnTheWebFifa()
      File "D:\L12Problem.py", line 21, in wordOnTheWebFifa
        page = urllib2.urlopen("http://www.fifa.com").read()
      File "C:\Anaconda\lib\urllib2.py", line 127, in urlopen
        return _opener.open(url, data, timeout)
      File "C:\Anaconda\lib\urllib2.py", line 410, in open
        response = meth(req, response)
      File "C:\Anaconda\lib\urllib2.py", line 523, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Anaconda\lib\urllib2.py", line 448, in error
        return self._call_chain(*args)
      File "C:\Anaconda\lib\urllib2.py", line 382, in _call_chain
        result = func(*args)
      File "C:\Anaconda\lib\urllib2.py", line 531, in http_error_default
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    HTTPError: HTTP Error 403: Forbidden

I thought that at least the Google search would return something, but it returned nothing. Can anyone help me solve both issues? And for fifa.com, why do I get this "Forbidden" error?
Question 1:
- You can't find the word "football" on www.google.com because the word "football" does not appear on that page. Load www.google.com in your browser and see whether you can find the word "football" anywhere. If you want to search the page that Google returns when you search for "football", you have to simulate typing "football" into the search box and hitting the "Google Search" button. As you will see if you look at the source of google.com, finding the form fields in that huge blob of code and submitting them is not trivial. And, as noted in the comments, it may violate Google's terms of use.
Question 2:
- It's a mystery why urllib2 fails to load www.fifa.com. I can't see anything you are doing wrong, and it fails for me too. The only thing I can think of is that urllib2 is not sending some header information that the server at fifa.com requires, so the request is rejected (the "Forbidden" error is fifa.com telling us that it is refusing our connection).
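If the missing-header guess is right, one thing to try with the standard library (Python 3's urllib.request, the successor of urllib2) is sending an explicit User-Agent header. This is a minimal sketch under that assumption: the User-Agent string and the helper names build_request and fetch_text are hypothetical, and there is no guarantee that fifa.com accepts the request.

```python
import urllib.error
import urllib.request

# Hypothetical browser-like User-Agent string; any realistic value may do.
USER_AGENT = "Mozilla/5.0 (compatible; word-counter-example/0.1)"


def build_request(url):
    """Return a GET Request that sends an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_text(url, timeout=10):
    """Download `url` and decode it; return None if the server answers with an HTTP error."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            # Fall back to UTF-8 when the server does not declare a charset.
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as err:
        # A 403 here means the server still rejects the request.
        print("HTTP error", err.code, err.reason)
        return None
```

For example, html = fetch_text("http://www.fifa.com") returns the page as a string, or prints the status and returns None if the server still refuses. Network errors other than HTTP status errors (e.g. URLError for DNS failures) are deliberately left to propagate.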
My advice is to use a better library, such as requests. This does what you want:
    def wordOnTheWebFifa():
        import requests
        import re
        page = requests.get("http://www.fifa.com").text
        print re.findall("football", page)
        print page.find("football")

    wordOnTheWebFifa()

Result:
    mgregory$ python foo.py
    [u'football', u'football', u'football', ..., u'football']
    2590
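The matches above come from searching the raw HTML, which includes scripts, URLs and attribute values, not just the words a visitor sees. If you only want visible text (an assumption about what you want to count), one option is to strip the markup first with the standard library's html.parser module. This is a minimal Python 3 sketch; the class, the helper and the sample HTML are illustrative, not part of the original answer:

```python
import re
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script/style element.
        if not self._skip_depth:
            self.chunks.append(data)


def visible_word_count(html, word):
    """Count whole-word, case-insensitive matches in the page's visible text."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return len(re.findall(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE))


SAMPLE_HTML = (
    "<html><head><script>var football = 1;</script></head>"
    '<body><a href="/football">Football</a> <p>World football news</p></body></html>'
)
print(len(re.findall("football", SAMPLE_HTML)))     # 3: raw HTML, case-sensitive
print(visible_word_count(SAMPLE_HTML, "football"))  # 2: visible text only
```

The parser skips everything inside script and style elements and joins the remaining text nodes with spaces before counting, so a word that only appears in a URL or a JavaScript variable is not counted. It is not a full browser rendering: text in elements hidden by CSS, for example, is still counted.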