"A Scala Library for Scraping Page Metadata.
Scraping metadata (title, description, URL, and so on) from a URL is something that Facebook currently does for you when you paste a URL into the "Update Status" box. For a service that I'm currently building out, we wanted to do this for our users as well. Thus Metascraper was born.
There was already a Ruby solution called link_thumbnailer, but since this is an I/O-heavy operation, I knew I wanted to build a solution using tools that supported non-blocking I/O and could be used without getting caught in callback spaghetti. Scala, Akka, and the Play framework immediately came to mind."
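To make "scraping metadata" concrete, here is a minimal sketch of the extraction step only, written with the Python standard library rather than Scala. It is not Metascraper's API: the names `MetaParser` and `scrape_metadata` are hypothetical, the choice to prefer Open Graph tags (`og:title`, `og:description`, `og:url`) over the plain `<title>` and `description` tags is an assumption, and fetching the page (the I/O-heavy part the quote is about) is left out.

```python
from html.parser import HTMLParser


class MetaParser(HTMLParser):
    """Collects <title> text and <meta> name/property pairs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph uses property="og:...", classic tags use name="..."
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if key and content:
                # Keep the first occurrence of each key
                self.meta.setdefault(key.lower(), content)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def scrape_metadata(html):
    """Return title, description, and url found in an HTML string."""
    parser = MetaParser()
    parser.feed(html)
    parser.close()
    meta = parser.meta
    page_title = "".join(parser.title_parts).strip()
    return {
        # Assumption: Open Graph values win over the plain HTML fallbacks
        "title": meta.get("og:title") or page_title or None,
        "description": meta.get("og:description") or meta.get("description"),
        "url": meta.get("og:url"),
    }
```

Calling `scrape_metadata(page_source)` returns a dictionary with the keys `title`, `description`, and `url`, with `None` for anything the page does not declare.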
"Project Honey Pot is the first and only distributed system for identifying spammers and the spambots they use to scrape addresses from your website. Using the Project Honey Pot system, you can install addresses that are custom-tagged to the time and IP address of a visitor to your site. If one of these addresses begins receiving email, we can tell not only that the messages are spam, but also the exact moment when the address was harvested and the IP address that gathered it."
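The idea of an address that is "custom-tagged" to a visit can be illustrated with a small sketch. This is not how Project Honey Pot is implemented, which the quote does not describe; it only assumes that each page view receives a unique random address and that the issuer keeps a record of who saw it. The class `TrapAddressRegistry`, its methods, and the in-memory dictionary are all hypothetical.

```python
import secrets
from datetime import datetime, timezone


class TrapAddressRegistry:
    """Issues one unique trap address per visit and remembers who saw it."""

    def __init__(self, domain):
        self.domain = domain.lower()
        self._issued = {}  # address -> (visitor_ip, visited_at)

    def issue(self, visitor_ip, visited_at=None):
        """Create a fresh address tied to this visitor and moment."""
        if visited_at is None:
            visited_at = datetime.now(timezone.utc)
        # Random local part, so an address cannot be guessed or reused
        address = f"{secrets.token_hex(8)}@{self.domain}"
        self._issued[address] = (visitor_ip, visited_at)
        return address

    def trace(self, recipient):
        """Given the recipient of an incoming mail, return (ip, time) or None."""
        return self._issued.get(recipient.lower())
```

Under these assumptions, any mail that arrives at an issued address is spam by construction, and `trace` recovers the harvesting IP address and the time of the visit.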
PhantomJS is a headless WebKit scriptable with a JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.