


thinkahol

Parsing the Data and Ideology of the We Are 99% Tumblr | Rortybomb

One of the most fascinating things to come out of the current We Are 99%/Occupy Wall Street protests is the We Are 99% Tumblr. At the site, people hold up signs that explain their current circumstances, and together they tell the story of a whole range of Americans struggling in the Lesser Depression. It is highly recommended.

DATA

The site features pictures of individuals holding their signs, and the Tumblr occasionally reproduces the text of a sign underneath the image as HTML text. Sometimes the text under the image is blank, sometimes it is a different message, but often it is the sign itself.

To get a slightly better empirical handle on this important Tumblr, I wrote a script that reads all of the pages and parses out the HTML text on the site. It doesn't read the images (can anyone in the audience automate calls to an OCR?), just the HTML text. After collecting all the text on all the pages, the code goes through it looking for interesting patterns.

It's a fun exercise, and it points out things I wouldn't have seen otherwise. For instance, I found this adorable little rascal, pictured below, mucking up the algorithm, because the first version of the code assumed all ages would have two digits. He, and the sign his mom made for him as a confession to her son, hit me a ton harder than any of the more direct signs of despair in this economy.
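The post does not include the script, so the following is only a minimal sketch of the two steps it describes: pulling the HTML text out of each page, then scanning that text for ages. Everything here is an assumption for illustration: the hypothetical helper names (`fetch_page`, `page_text`), the Tumblr-style `/page/N` URL scheme, and the "N years old" phrasing the age patterns look for. It uses only the Python standard library and contrasts a two-digit age pattern, like the one that tripped up the first version, with one that accepts one to three digits.

```python
import re
import urllib.request
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects visible text from a page, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip = 0      # >0 while inside <script>/<style>
        self.chunks = []    # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def fetch_page(base_url, n):
    """Download page n. The '/page/N' URL scheme is an assumption, not from the post."""
    url = f"{base_url.rstrip('/')}/page/{n}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def page_text(html):
    """Return all visible text of one HTML page joined by single spaces."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Naive pattern: assumes every age has exactly two digits, so "3 years old" is missed.
TWO_DIGIT_AGE = re.compile(r"\b(\d{2})\s*(?:years?\s+old|yrs?\s+old|y/o)\b", re.IGNORECASE)

# Relaxed pattern: accepts one- to three-digit ages.
AGE = re.compile(r"\b(\d{1,3})\s*(?:years?\s+old|yrs?\s+old|y/o)\b", re.IGNORECASE)


if __name__ == "__main__":
    # Made-up sample page (not real data from the site).
    sample = (
        "<html><body><p>I am 24 years old and owe $40,000.</p>"
        "<script>var x = 1;</script>"
        "<p>My son is 3 years old.</p></body></html>"
    )
    text = page_text(sample)
    print(TWO_DIGIT_AGE.findall(text))  # ['24']
    print(AGE.findall(text))            # ['24', '3']
```

On the sample page the two-digit pattern silently drops the child's age, which is the kind of miss the author describes; widening the digit count is one simple fix, though real sign text phrases ages in many more ways than these patterns cover.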