Arquitectura? / Group items tagged scraping

Metascraper - 0 views

www.beachape.com/...ary-for-scraping-page-metadata

development programming scala scraping library internet www web web-crawling

shared by Pablo Lalloni on 12 Sep 13

"A Scala Library for Scraping Page Metadata. Scraping metadata (e.g. title, description, URL) from a URL is something that Facebook currently does for you when you paste a URL into the 'Update Status' box. For a service that I'm currently building out, we wanted to do this as well for our users. Thus Metascraper was born. There was already a Ruby solution called link_thumbnailer, but since this is an I/O-heavy operation, I knew I wanted to build a solution using tools that supported non-blocking I/O and could be used without getting caught in callback spaghetti. Scala, Akka, and the Play framework immediately came to mind."

zaphar / go-html-transform - Bitbucket - 0 views

bitbucket.org/...go-html-transform

development golang libraries html5 html

shared by Pablo Lalloni on 16 May 15

"This library provides a way to parse, scrape, and transform HTML5 pages using CSS selector queries."

Project Honey Pot - 0 views

www.projecthoneypot.org/about_us.php

web spam malware spamming security

shared by Pablo Lalloni on 09 Dec 18

"Project Honey Pot is the first and only distributed system for identifying spammers and the spambots they use to scrape addresses from your website. Using the Project Honey Pot system, you can install addresses that are custom-tagged to the time and IP address of a visitor to your site. If one of these addresses begins receiving email, we can tell not only that the messages are spam, but also the exact moment when the address was harvested and the IP address that gathered it."

PhantomJS: Headless WebKit with JavaScript API - 0 views

phantomjs.org

javascript headless webkit testing scraping tools development web-development

shared by Pablo Lalloni on 28 Jun 12

"PhantomJS is a headless WebKit with a JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selectors, JSON, Canvas, and SVG."
