Home/ Scripters/ Group items tagged link

Jac Londe

Little DoubleClick trick - 0 views

http://ad.uk.doubleclick.net/5185/jump/timeout.uk.paris.fr/restaurant;type=mpu;page=feature;itemid=p...; http://ad.uk.doubleclick.net/5185/ad/timeout.uk.paris.fr/restaurant;type=mpu;page=featu...

started by Jac Londe on 28 Feb 15 no follow-up yet
Jac Londe

Parsing HTML in Python (Shallow Thoughts) - 0 views

  • Parsing HTML in Python
  • Up until now, I've avoided doing any HTML parsing in my RSS reader FeedMe.
  • from HTMLParser import HTMLParser
    import urllib2

    # Python 2 excerpt. has_attr(), make_absolute(), self.outfile and
    # self.encoding are not shown in this excerpt.
    class MyFancyHTMLParser(HTMLParser):
        def fetch_url(self, url):
            request = urllib2.Request(url)
            response = urllib2.urlopen(request)
            link = response.geturl()
            html = response.read()
            response.close()
            self.feed(html)   # feed() starts the HTMLParser parsing

        def handle_starttag(self, tag, attrs):
            if tag == 'img':
                # attrs is a list of tuples, (attribute, value)
                srcindex = self.has_attr('src', attrs)
                if srcindex < 0:
                    return   # img with no src attribute? skip it
                src = attrs[srcindex][1]
                # Make relative URLs absolute
                src = self.make_absolute(src)
                attrs[srcindex] = (attrs[srcindex][0], src)

            print '<' + tag
            for attr in attrs:
                print ' ' + attr[0]
                # isinstance() replaces the comparison type(...) == 'str',
                # which is never true
                if len(attr) > 1 and isinstance(attr[1], basestring):
                    # make sure attr[1] doesn't have any embedded double-quotes
                    val = attr[1].replace('"', '&quot;')
                    print '="' + val + '"'
            print '>'

        def handle_endtag(self, tag):
            self.outfile.write('</' + tag.encode(self.encoding) + '>\n')
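The excerpt above is Python 2 and leans on helpers it does not show. As a rough, self-contained sketch of the same idea in Python 3 (where the module is html.parser), the following re-emits tags and rewrites relative img src values to absolute URLs. The names ImgAbsolutizer and absolutize_images are hypothetical, and urljoin stands in for the excerpt's make_absolute(), whose real implementation is not shown. Output is collected in a list instead of being printed, and comments, doctype declarations, and the self-closing form of tags are not preserved.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgAbsolutizer(HTMLParser):
    """Re-emit tags and text, making relative <img src> URLs absolute.

    Hypothetical class for illustration; not the original post's parser.
    """

    def __init__(self, base_url):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.out = []  # collected output fragments

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) tuples; value may be None
        parts = ['<' + tag]
        for name, value in attrs:
            if tag == 'img' and name == 'src' and value is not None:
                # assumption: urljoin plays the role of make_absolute()
                value = urljoin(self.base_url, value)
            if value is None:
                parts.append(' ' + name)
            else:
                # escape embedded quotes (and &, <, >) in the value
                parts.append(' %s="%s"' % (name, escape(value, quote=True)))
        parts.append('>')
        self.out.append(''.join(parts))

    def handle_endtag(self, tag):
        self.out.append('</' + tag + '>')

    def handle_data(self, data):
        # character references were decoded by the parser; re-escape them
        self.out.append(escape(data, quote=False))


def absolutize_images(html_text, base_url):
    """Return html_text with relative img src attributes made absolute."""
    parser = ImgAbsolutizer(base_url)
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return ''.join(parser.out)
```

Calling absolutize_images(page_source, page_url) on a fetched page returns the rewritten markup as a string. Fetching is left out on purpose, so the sketch can be tried without network access.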
Jac Londe

Google Script - 0 views

  • GDrive Linker