<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Web harvesting solution's feed | Diigo Group</title>
    <link>http://groups.diigo.com/web_harvesting/bookmark</link>
    <description>Bookmarks from Web harvesting solution</description>
    <pubDate>Wed, 06 Jun 2007 21:45:27 -0000</pubDate>
    <item>
      <title>Aduna - Aperture</title>
      <link>http://www.aduna-software.com/technologies/aperture/overview.view</link>
      <description>&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Highlights and Sticky Notes:&lt;/strong&gt;&lt;p&gt;&lt;div class=&quot;content&quot;&gt;&lt;h2&gt;Flexible content and metadata extraction framework&lt;/h2&gt;
	&lt;p&gt;
		Aperture is a Java framework for extracting and querying full-text content and metadata from various information systems (e.g. file systems, web sites, mail boxes) and the file formats (e.g. documents, images) occurring in these systems.&lt;/p&gt;&lt;/div&gt;&lt;/p&gt;&lt;p&gt;&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/crawler&quot;&gt;crawler&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/data&quot;&gt;data&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/extraction&quot;&gt;extraction&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/framework&quot;&gt;framework&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/rdf&quot;&gt;rdf&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/semweb&quot;&gt;semweb&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/toolkit&quot;&gt;toolkit&lt;/a&gt; &lt;/p&gt;&lt;p&gt;&lt;strong&gt;Posted by:&lt;/strong&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/ishtadasah&quot;&gt;ishtadasah&lt;/a&gt;&lt;/p&gt;</description>
      <pubDate>Wed, 06 Jun 2007 21:45:27 -0000</pubDate>
    </item>
    <item>
      <title>Universal Feed Parser</title>
      <link>http://feedparser.org</link>
      <description>&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Comments:&lt;/strong&gt;&lt;ul&gt;&lt;li&gt;Easy-to-use Python feed parser, open source, well tested (3000 unit tests)&lt;br /&gt; &lt;small&gt;posted by &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/ishtadasah&quot;&gt;ishtadasah&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/atom&quot;&gt;atom&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/feeds&quot;&gt;feeds&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/opensource&quot;&gt;opensource&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/parser&quot;&gt;parser&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/python&quot;&gt;python&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/rss&quot;&gt;rss&lt;/a&gt; &lt;/p&gt;&lt;p&gt;&lt;strong&gt;Posted by:&lt;/strong&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/ishtadasah&quot;&gt;ishtadasah&lt;/a&gt;&lt;/p&gt;</description>
      <pubDate>Wed, 06 Jun 2007 08:18:20 -0000</pubDate>
    </item>
    <item>
      <title>Aperture Framework</title>
      <link>http://aperture.sourceforge.net</link>
      <description>&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Comments:&lt;/strong&gt;&lt;ul&gt;&lt;li&gt;Java framework for data extraction, crawling, and harvesting, &lt;br /&gt;able to process different data sources, extract metadata, &lt;br /&gt;and output RDF.&lt;br /&gt;Pluggable architecture, open source, RDF insertion.&lt;br /&gt; &lt;small&gt;posted by &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/ishtadasah&quot;&gt;ishtadasah&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Highlights and Sticky Notes:&lt;/strong&gt;&lt;p&gt;&lt;div class=&quot;content&quot;&gt;
Aperture is a Java framework for extracting and querying full-text
content and metadata from various information systems (e.g. file systems,
web sites, mail boxes) and the file formats (e.g. documents, images)
occurring in these systems.&lt;/div&gt;&lt;/p&gt;&lt;p&gt;&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/aduna&quot;&gt;aduna&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/crawling&quot;&gt;crawling&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/extraction&quot;&gt;extraction&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/metadata&quot;&gt;metadata&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/rdf&quot;&gt;rdf&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/sesame&quot;&gt;sesame&lt;/a&gt; &lt;/p&gt;&lt;p&gt;&lt;strong&gt;Posted by:&lt;/strong&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/ishtadasah&quot;&gt;ishtadasah&lt;/a&gt;&lt;/p&gt;</description>
      <pubDate>Wed, 06 Jun 2007 06:46:04 -0000</pubDate>
    </item>
    <item>
      <title>XPather :: Firefox Add-ons</title>
      <link>https://addons.mozilla.org/en-US/firefox/addon/1192</link>
      <description>&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Comments:&lt;/strong&gt;&lt;ul&gt;&lt;li&gt;Firefox extension for browsing and evaluating XPath expressions &lt;br /&gt; &lt;small&gt;posted by &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/ishtadasah&quot;&gt;ishtadasah&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Highlights and Sticky Notes:&lt;/strong&gt;&lt;p&gt;&lt;div class=&quot;content&quot;&gt;&lt;div class=&quot;addon-feature-header&quot;&gt;&lt;div class=&quot;addon-feature-titleby&quot;&gt;&lt;h2 class=&quot;addon-feature-name&quot;&gt;XPather &lt;span&gt;1.3&lt;/span&gt;
                                    &lt;span class=&quot;addon-feature-homepage&quot;&gt;&lt;a href=&quot;http://xpath.alephzarro.com&quot;&gt;&lt;img src=&quot;https://addons.mozilla.org/img/developers/homepage_small.png&quot; alt=&quot;Homepage&quot; /&gt;&lt;/a&gt;&lt;/span&gt;
                                &lt;/h2&gt;
                &lt;span class=&quot;addon-feature-developer&quot; id=&quot;authors&quot;&gt; by                &lt;a href=&quot;http://addons.mozilla.org/en-US/firefox/user/5688&quot; class=&quot;profileLink&quot;&gt;Viktor Zigo&lt;/a&gt; &lt;/span&gt;
            &lt;/div&gt;
        &lt;/div&gt;

        &lt;div class=&quot;addon-feature-text&quot;&gt;
            &lt;p&gt;Feature rich XPath generator, editor, inspector and simple extraction tool.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
http://xpath.alephzarro.com&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;/p&gt;&lt;p&gt;&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/addon&quot;&gt;addon&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/content&quot;&gt;content&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/extraction&quot;&gt;extraction&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/firefox&quot;&gt;firefox&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/inspectore&quot;&gt;inspectore&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/tool&quot;&gt;tool&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/xpath&quot;&gt;xpath&lt;/a&gt; &lt;/p&gt;&lt;p&gt;&lt;strong&gt;Posted by:&lt;/strong&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/ishtadasah&quot;&gt;ishtadasah&lt;/a&gt;&lt;/p&gt;</description>
      <pubDate>Thu, 10 May 2007 05:57:29 -0000</pubDate>
    </item>
    <item>
      <title>Solvent - SIMILE</title>
      <link>http://simile.mit.edu/wiki/Solvent</link>
      <description>&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Highlights and Sticky Notes:&lt;/strong&gt;&lt;p&gt;&lt;div class=&quot;content&quot;&gt;Solvent is a Firefox extension that helps you write screen scrapers for &lt;a href=&quot;http://simile.mit.edu/wiki/Piggy_Bank&quot; title=&quot;Piggy Bank&quot;&gt;Piggy Bank&lt;/a&gt;.&lt;/div&gt;&lt;/p&gt;&lt;p&gt;&lt;div class=&quot;content&quot;&gt;&lt;p&gt;
&lt;/p&gt;&lt;p&gt;In short, screen scrapers allow you to turn a regular web page into a regular web page plus semantic data, and thus frees the data from the page/site that contains it.&lt;/p&gt;&lt;/div&gt;&lt;ul&gt;&lt;li&gt;This page has a nice definition of a screen scraper and of what information extraction for the semantic web is all about &lt;small&gt;posted by &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/ishtadasah&quot;&gt;ishtadasah&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/p&gt;&lt;p&gt;&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/no_tag&quot;&gt;no_tag&lt;/a&gt; &lt;/p&gt;&lt;p&gt;&lt;strong&gt;Posted by:&lt;/strong&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/ishtadasah&quot;&gt;ishtadasah&lt;/a&gt;&lt;/p&gt;</description>
      <pubDate>Sat, 28 Apr 2007 10:12:02 -0000</pubDate>
    </item>
    <item>
      <title>XULRunner Hall of Fame - MDC</title>
      <link>http://developer.mozilla.org/en/docs/XULRunner_Hall_of_Fame</link>
      <description>&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Comments:&lt;/strong&gt;&lt;ul&gt;&lt;li&gt;List of Mozilla XULRunner (Gecko-based) applications &lt;br /&gt; &lt;small&gt;posted by &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/ishtadasah&quot;&gt;ishtadasah&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Highlights and Sticky Notes:&lt;/strong&gt;&lt;p&gt;&lt;div class=&quot;content&quot;&gt;This page tracks existing XULRunner-based applications.&lt;/div&gt;&lt;/p&gt;&lt;p&gt;&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/application&quot;&gt;application&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/dom&quot;&gt;dom&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/embedding&quot;&gt;embedding&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/gecko&quot;&gt;gecko&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/mozilla&quot;&gt;mozilla&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/xul&quot;&gt;xul&lt;/a&gt; &lt;/p&gt;&lt;p&gt;&lt;strong&gt;Posted by:&lt;/strong&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/ishtadasah&quot;&gt;ishtadasah&lt;/a&gt;&lt;/p&gt;</description>
      <pubDate>Sat, 28 Apr 2007 09:46:05 -0000</pubDate>
    </item>
    <item>
      <title>PyXPCOM - MDC</title>
      <link>http://developer.mozilla.org/en/docs/PyXPCOM</link>
      <description>&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Comments:&lt;/strong&gt;&lt;ul&gt;&lt;li&gt;Is it about controlling Firefox from a Python script?&lt;br /&gt; &lt;small&gt;posted by &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/ishtadasah&quot;&gt;ishtadasah&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Highlights and Sticky Notes:&lt;/strong&gt;&lt;p&gt;&lt;div class=&quot;content&quot;&gt;&lt;p&gt;&lt;b&gt;PyXPCOM&lt;/b&gt; allows for communication between &lt;a href=&quot;http://www.python.org/&quot; title=&quot;http://www.python.org/&quot; class=&quot;external text&quot; rel=&quot;nofollow&quot;&gt;Python&lt;/a&gt; and &lt;a href=&quot;http://developer.mozilla.org/en/docs/XPCOM&quot; title=&quot;XPCOM&quot;&gt;XPCOM&lt;/a&gt;, such that a Python application can access XPCOM objects, and XPCOM can access any Python class that implements an XPCOM interface. With PyXPCOM, a developer can talk to XPCOM or embed &lt;a href=&quot;http://developer.mozilla.org/en/docs/Gecko&quot; title=&quot;Gecko&quot;&gt;Gecko&lt;/a&gt; from a Python application. PyXPCOM is similar to &lt;a href=&quot;http://developer.mozilla.org/en/docs/JavaXPCOM&quot; title=&quot;JavaXPCOM&quot;&gt;JavaXPCOM&lt;/a&gt; (Java-XPCOM bridge) or &lt;a href=&quot;http://developer.mozilla.org/en/docs/XPConnect&quot; title=&quot;XPConnect&quot;&gt;XPConnect&lt;/a&gt; (JavaScript-XPCOM bridge).
&lt;/p&gt;
Python classes and interfaces: Mozilla defines many external interfaces available to embeddors and component developers. PyXPCOM provides access to these interfaces as Python interfaces. PyXPCOM also contains several classes that provide access to functions for initializing and shutting down XPCOM and Gecko from Python, as well as some XPCOM helper functions.&lt;/div&gt;&lt;/p&gt;&lt;p&gt;&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/firefox&quot;&gt;firefox&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/gecko&quot;&gt;gecko&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/python&quot;&gt;python&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/xpcom&quot;&gt;xpcom&lt;/a&gt; &lt;/p&gt;&lt;p&gt;&lt;strong&gt;Posted by:&lt;/strong&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/ishtadasah&quot;&gt;ishtadasah&lt;/a&gt;&lt;/p&gt;</description>
      <pubDate>Sat, 28 Apr 2007 09:46:05 -0000</pubDate>
    </item>
    <item>
      <title>Amara equivalents of Mike Kay's XSLT 2.0, XQuery examples ✏Copia</title>
      <link>http://copia.ogbuji.net/blog/2005-06-12/Amara_equi</link>
      <description>&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Comments:&lt;/strong&gt;&lt;ul&gt;&lt;li&gt;This page shows Uche's comparison of the Pythonic and XQuery approaches to XML processing&lt;br /&gt; &lt;small&gt;posted by &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/ishtadasah&quot;&gt;ishtadasah&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Highlights and Sticky Notes:&lt;/strong&gt;&lt;p&gt;&lt;div class=&quot;content&quot;&gt;&lt;p&gt;Since &lt;a href=&quot;http://copia.ogbuji.net/blog/2005-06-03/XTech__Mik&quot;&gt;seeing Mike Kay's presentation at XTech
2005&lt;/a&gt; I've been meaning to write up some
&lt;a href=&quot;http://uche.ogbuji.net/tech/4Suite/amara/&quot;&gt;Amara&lt;/a&gt; equivalents to the
examples in the paper, &lt;a href=&quot;http://idealliance.org/proceedings/xtech05/papers/02-03-01/&quot;&gt;&quot;Comparing XSLT and
XQuery&quot;&lt;/a&gt;.
Here they are.&lt;/p&gt;

&lt;p&gt;This is not meant to be an advocacy piece, but rather a set of useful
examples.  I think the Amara examples tend to be easier to follow for
typical programmers (although they also expose some things I'd like to
improve), but with XSLT and XQuery you get cleaner declarative
semantics, and cross-language support.&lt;/p&gt;&lt;/div&gt;&lt;/p&gt;&lt;p&gt;&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/amara&quot;&gt;amara&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/comparison&quot;&gt;comparison&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/dom&quot;&gt;dom&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/python&quot;&gt;python&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/xml&quot;&gt;xml&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/xquery&quot;&gt;xquery&lt;/a&gt; &lt;/p&gt;&lt;p&gt;&lt;strong&gt;Posted by:&lt;/strong&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/ishtadasah&quot;&gt;ishtadasah&lt;/a&gt;&lt;/p&gt;</description>
      <pubDate>Sat, 28 Apr 2007 09:46:05 -0000</pubDate>
    </item>
    <item>
      <title>Sam Ruby: Bleach Alternatives</title>
      <link>http://www.intertwingly.net/blog/2006/05/31/Bleach-Alternatives</link>
      <description>&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Highlights and Sticky Notes:&lt;/strong&gt;&lt;p&gt;&lt;div class=&quot;content&quot;&gt;&lt;p xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;It occurs to me that I’ve seen these problems solved before, and with a better tool.  And I even have the important piece installed on my machine...&lt;/p&gt;
&lt;p xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;I’d love to see all HTML processing in UFP become pluggable, and for a plug-in based on Mozilla to become a reality.  Many of the pieces seem to be in place.  After an &lt;code&gt;apt-get install python2.4-gtk2&lt;/code&gt;, I find that I can import &lt;a href=&quot;http://www.pygtk.org/pygtkmozembed/class-gtkmozembed.html&quot;&gt;gtkmozembed&lt;/a&gt; from within Python.  It looks like more pieces to the puzzle are (or will) become available with &lt;a href=&quot;http://gnome.org/~robsta/gtkmozedit.html&quot;&gt;GtkMozEdit&lt;/a&gt;.  But I don’t believe that fine grained access to the DOM from within Python is either necessary or even desirable.&lt;/p&gt;
&lt;p xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;To my way of thinking, the ideal would be to run Mozilla in a &lt;a href=&quot;http://java.sun.com/j2se/1.4.2/docs/guide/awt/AWTChanges.html#headless&quot;&gt;headless&lt;/a&gt; mode.  I’d simply &lt;a href=&quot;http://www.pygtk.org/pygtkmozembed/class-gtkmozembed.html#constructor-gtkmozembed&quot;&gt;construct a MozEmbed object&lt;/a&gt;, &lt;a href=&quot;http://www.pygtk.org/pygtkmozembed/class-gtkmozembed.html#method-gtkmozembed--open-stream&quot;&gt;stream in some data&lt;/a&gt;, that data would have some &lt;a href=&quot;http://www.onlinetools.org/articles/unobtrusivejavascript/&quot;&gt;unobtrusive javascript&lt;/a&gt; or would use an &lt;a href=&quot;http://www.xulplanet.com/references/xpcomref/ifaces/nsIXPCComponents_Utils.html#method_evalInSandbox&quot;&gt;evalInSandbox&lt;/a&gt; technique to make adjustments to the DOM tree, and finally either an &lt;a href=&quot;http://xerces.apache.org/xerces-j/apiDocs/org/apache/xml/serialize/HTMLSerializer.html&quot;&gt;HTMLSerializer&lt;/a&gt; or an &lt;a href=&quot;http://xerces.apache.org/xerces-j/apiDocs/org/apache/xml/serialize/XHTMLSerializer.html&quot;&gt;XHTMLSerializer&lt;/a&gt; would be used to return back sanitized content.&lt;/p&gt;
&lt;p xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;I’d much rather use &lt;a href=&quot;http://diveintogreasemonkey.org/patterns/index.html&quot;&gt;DOM/XPath techniques&lt;/a&gt; than &lt;a href=&quot;http://jcooney.net/archive/2005/06/26/3937.aspx&quot;&gt;regular expressions&lt;/a&gt;.&lt;/p&gt;
&lt;p xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;At this point, it occurs to me that a number of people who read this weblog have far more experience and/or better contacts than I do to help pull these pieces together.&lt;/p&gt;&lt;/div&gt;&lt;/p&gt;&lt;p&gt;&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/dom&quot;&gt;dom&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/headless&quot;&gt;headless&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/mozilla&quot;&gt;mozilla&lt;/a&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/tag/sanitizing&quot;&gt;sanitizing&lt;/a&gt; &lt;/p&gt;&lt;p&gt;&lt;strong&gt;Posted by:&lt;/strong&gt; &lt;a href=&quot;http://groups.diigo.com/web_harvesting/bookmark/ishtadasah&quot;&gt;ishtadasah&lt;/a&gt;&lt;/p&gt;</description>
      <pubDate>Sat, 28 Apr 2007 09:46:04 -0000</pubDate>
    </item>
  </channel>
</rss>