Home/ Scripters/ Group items tagged parsing

Jac Londe

Parsing HTML in Python (Shallow Thoughts)

  • Parsing HTML in Python
  • Up until now, I've avoided doing any HTML parsing in my RSS reader, FeedMe.
  • import urllib2
    from HTMLParser import HTMLParser

    class MyFancyHTMLParser(HTMLParser):
        # has_attr(), make_absolute(), self.outfile and self.encoding
        # are not defined in this excerpt.

        def fetch_url(self, url):
            request = urllib2.Request(url)
            response = urllib2.urlopen(request)
            link = response.geturl()
            html = response.read()
            response.close()
            self.feed(html)   # feed() starts the HTMLParser parsing

        def handle_starttag(self, tag, attrs):
            if tag == 'img':
                # attrs is a list of tuples, (attribute, value)
                srcindex = self.has_attr('src', attrs)
                if srcindex < 0:
                    return   # img with no src attribute? skip it
                src = attrs[srcindex][1]
                # Make relative URLs absolute
                src = self.make_absolute(src)
                attrs[srcindex] = (attrs[srcindex][0], src)

            print '<' + tag
            for attr in attrs:
                print ' ' + attr[0]
                # isinstance() replaces type(...) == 'str', which is never true
                if len(attr) > 1 and isinstance(attr[1], basestring):
                    # make sure attr[1] doesn't have any embedded double-quotes
                    val = attr[1].replace('"', '&quot;')
                    print '="' + val + '"'
            print '>'

        def handle_endtag(self, tag):
            self.outfile.write('</' + tag.encode(self.encoding) + '>\n')
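The excerpt above targets Python 2 (the `HTMLParser` and `urllib2` modules, `print` statements). As a rough illustration of the same idea on Python 3, here is a minimal sketch that re-emits tags and makes `<img src>` URLs absolute. The names `ImgAbsolutizer`, `absolutize_images`, and `base_url` are hypothetical and not from the original post, and `urljoin` stands in for the post's own `make_absolute()` helper, which the excerpt does not show. Comments, doctypes, and the raw contents of `<script>`/`<style>` are not handled faithfully here.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgAbsolutizer(HTMLParser):
    """Re-emit tags and text, rewriting <img src> to an absolute URL."""

    def __init__(self, base_url):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url  # assumption: the URL the page was fetched from
        self.out = []             # collected output fragments

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; value is None for bare attributes
        if tag == 'img' and not any(name == 'src' for name, _ in attrs):
            return  # img with no src attribute: skip it, as in the excerpt
        parts = ['<' + tag]
        for name, value in attrs:
            if tag == 'img' and name == 'src' and value:
                value = urljoin(self.base_url, value)  # make relative URLs absolute
            if value is None:
                parts.append(' ' + name)
            else:
                # escape() turns embedded double quotes into &quot;
                parts.append(' %s="%s"' % (name, escape(value, quote=True)))
        parts.append('>')
        self.out.append(''.join(parts))

    def handle_startendtag(self, tag, attrs):
        # Treat <img ... /> like <img ...> so no stray </img> is emitted
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append('</' + tag + '>')

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape the text
        self.out.append(escape(data, quote=False))


def absolutize_images(html, base_url):
    """Parse html and return it with absolute <img src> URLs."""
    parser = ImgAbsolutizer(base_url)
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)
```

Collecting fragments in a list rather than printing them avoids the stray newlines that one `print` per fragment would add, and leaves the caller free to write the result wherever it is needed.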
Jac Londe

Parsing HTML using Javascript

  • I've written a small JavaScript application that gets HTML content as a string (this only has to work in Mozilla)
  • req = new XMLHttpRequest();
    req.open('GET', URI, true);
    req.onreadystatechange = function (aEvt) {
      if (req.readyState == 4) {
        if (req.status == 200) {
          var myTxt = req.responseText;
  • This works fine: myTxt contains the HTML code as a string. However, I want to be able to parse the code using the DOM. Is there an easy way to create an HTML DOM in JavaScript?
  • var myTxt = req.responseXML.documentElement;
    alert(myTxt.getElementsByTagName("XMLElementName")[0].firstChild.data);
Jac Londe

What is the correct way to write HTML using Javascript? - Stack Overflow

  • document.write() only works as intended while the page is being originally parsed and the DOM is being created. Once the browser gets to the closing </body> tag and the DOM is ready, calling document.write() no longer appends to the page: it implicitly reopens the document and replaces the existing content.
  • Using innerHTML on a node:
  • var node = document.getElementById('node-id'); node.innerHTML = '<p>some dynamic html</p>';
Jac Londe

Feeds HTML Parser for Node Creation | Drupal.org

  • Feeds HTML Parser for Node Creation
  • I currently have a Drupal site set up and am using the Feeds module (http://drupal.org/project/feeds) to create CCK nodes from a few RSS feeds. This is a standard use case of that module, at least as far as I know.
  • The need has arisen to also create content from non-RSS/non-XML sources. I need a new Parser created for the Feeds module that would allow one to populate CCK fields by parsing raw HTML content. My first thought is that the user should be allowed to define a regular expression for each field, with the field then being populated by the output of that regular expression applied to the raw HTML content. However, I am open to suggestions for different solutions that might be easier for the developer.
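The "one regular expression per field" idea in the request above can be sketched in a few lines. This is only an illustration in Python, not Feeds module code (a real Feeds parser would be a PHP plugin): the field names, the patterns in `FIELD_PATTERNS`, and the `extract_fields` function are all hypothetical. Each pattern is assumed to contain exactly one capture group, whose first match becomes the field value.

```python
import re

# Hypothetical field-to-pattern mapping; each pattern has one capture group.
FIELD_PATTERNS = {
    "title": r"<h1[^>]*>(.*?)</h1>",
    "price": r'<span class="price">\s*\$?([\d.]+)\s*</span>',
}


def extract_fields(raw_html, patterns):
    """Return {field: first captured group, or None if the pattern does not match}."""
    fields = {}
    for name, pattern in patterns.items():
        # DOTALL lets a value span several lines; IGNORECASE tolerates <H1> vs <h1>
        match = re.search(pattern, raw_html, re.IGNORECASE | re.DOTALL)
        fields[name] = match.group(1).strip() if match else None
    return fields
```

Regular expressions like these are brittle against markup changes such as reordered attributes or nested tags, which is one reason a selector-based or XPath-based mapping might be among the "different solutions" worth considering.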