Arquitectura?: group items tagged "scraping"

Pablo Lalloni

Metascraper

  • "A Scala library for scraping page metadata. Scraping metadata (e.g. title, description, URL) from a URL is something that Facebook currently does for you when you paste a URL into the "Update Status" box. For a service that I'm currently building out, we wanted to do this as well for our users. Thus Metascraper was born. There was already a Ruby solution called link_thumbnailer, but since this is an I/O-heavy operation, I knew I wanted to build a solution using tools that supported non-blocking I/O and could be used without getting caught in callback spaghetti. Scala, Akka, and the Play framework immediately came to mind."
Pablo Lalloni

zaphar / go-html-transform - Bitbucket

  • "This library provides a way to parse, scrape, and transform HTML5 pages using CSS selector queries."
Pablo Lalloni

Project Honey Pot

  • "Project Honey Pot is the first and only distributed system for identifying spammers and the spambots they use to scrape addresses from your website. Using the Project Honey Pot system, you can install addresses that are custom-tagged to the time and IP address of a visitor to your site. If one of these addresses begins receiving email, we can tell not only that the messages are spam, but also the exact moment when the address was harvested and the IP address that gathered it."
Pablo Lalloni

PhantomJS: Headless WebKit with JavaScript API

  • PhantomJS is a headless WebKit browser with a JavaScript API. It has fast, native support for various web standards: DOM handling, CSS selectors, JSON, Canvas, and SVG.