nutchwax - Home Page
Jack Park on 28 Apr 09

NutchWAX ("Nutch + Web Archive eXtensions") searches web archive collections. The Web Archive eXtensions (WAX) include an adaptation of the Nutch fetcher step that reads from web archives instead of crawling the open web (the adaptation currently handles only Internet Archive ARC files). They also include plugins that add extra fields to the index, such as an Archive Record's location in the repository and its collection name.