Churnalism | Search
Janos Haits on 12 Mar 11
The site compresses all articles published on national newspaper websites, on BBC News, and on Sky News online into a series of numbers based on 15-character strings (using a hash function) and then stores them in a fast-access database. When someone pastes in some text and clicks 'compare', the churn engine compresses the entered text and then searches for similar compressions (or 'common hashes'). If the engine finds any articles where the similarity is greater than 20%, it suggests the article may be churn. Churnalism.com runs on the database of over three million compressed articles held by journalisted.com.
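
The matching process described above can be sketched in Python. This is a rough illustration, not Churnalism's actual code: the description does not say which hash function is used, how text is normalised, or exactly how the similarity percentage is calculated. SHA-256, lowercasing with collapsed whitespace, overlapping 15-character windows, and "share of the pasted text's hashes also found in the article" are all assumptions made here, and every function name is hypothetical.

```python
import hashlib
from typing import Dict, List, Set, Tuple

WINDOW = 15       # characters per string, as stated in the description
THRESHOLD = 0.20  # similarity above which an article is flagged as possible churn


def normalise(text: str) -> str:
    # Assumption: lowercase and collapse whitespace so layout differences do not matter.
    return " ".join(text.lower().split())


def compress(text: str) -> Set[int]:
    # Turn a text into a set of numbers, one per overlapping 15-character window.
    # Assumption: SHA-256 truncated to 64 bits stands in for the unnamed hash function.
    text = normalise(text)
    hashes: Set[int] = set()
    for i in range(len(text) - WINDOW + 1):
        chunk = text[i:i + WINDOW].encode("utf-8")
        digest = hashlib.sha256(chunk).digest()
        hashes.add(int.from_bytes(digest[:8], "big"))
    return hashes


def similarity(pasted: Set[int], article: Set[int]) -> float:
    # Assumption: similarity is the fraction of the pasted text's hashes
    # that also appear in the stored article ("common hashes").
    if not pasted:
        return 0.0
    return len(pasted & article) / len(pasted)


def find_churn(pasted_text: str, database: Dict[str, Set[int]]) -> List[Tuple[str, float]]:
    # Compare the pasted text against every stored article and keep those
    # whose similarity is greater than the 20% threshold, best match first.
    pasted = compress(pasted_text)
    matches = []
    for title, article_hashes in database.items():
        score = similarity(pasted, article_hashes)
        if score > THRESHOLD:
            matches.append((title, score))
    return sorted(matches, key=lambda match: match[1], reverse=True)
```

A plain dictionary stands in for the fast-access database here. At the scale of three million articles, a real system would more plausibly index each hash to the articles containing it rather than scan every article in turn, though the description does not say how the lookup is done.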