Memedia: group items tagged "atom"


Qien Kuen

May 1st Designated as RSS Day | 精品博客 - 0 views

  • The goal of RSS Day is to get as many people as possible talking about RSS and its benefits on May 1st. Bloggers and blog readers already know the RSS format, but if enough people are talking about RSS, the mainstream media may pick it up too, and more ordinary users will be exposed to RSS. Once people start reading about RSS and its benefits, I believe at least some of them will try it out of curiosity. Over time, RSS adoption should accelerate, and all internet users will benefit.
  •  
    Fun holidays keep multiplying :-)
feng37

My heart's in Accra » Studying Twitter and the Moldovan protests - 0 views

  • At some point on Friday, we hit a peak tweet density - 410 of 100,000 tweets included the #pman tag. Had I been scraping results by iterating 100,000 tweets at a time, I would have had four pages of new results - my script is only looking at the first page, so I’d be dropping results. If I ran the script again, I’d try to figure out the maximum tweet density by looking for the moment where the meme was most hyped, try to do a back of the envelope calculation as to an optimum step size and then halve it - that would probably have me using 20,000 steps for this set.
  • Density of tweets charted against blocks of 100,000 tweets
  • http://search.twitter.com/search?max_id=1511783811&page=2&q=%23pman&rpp=100
    Picking apart the URL:
    max_id=1511783811 - Only return results up to tweet #1511783811 in the database
    page=2 - Hand over the second page of results
    q=%23pman - The query is for the string #pman, encoded to escape the hash
    rpp=100 - Give the user 100 results per page
    While you can manipulate these variables to your heart’s content, you can’t get more than 100 results per page. And if you retrieve 100 results per page, your results will stop at around 15 pages - the engine, by default, wants to give you only 1500 results on any search. This makes sense from a user perspective - it’s pretty rare that you actually want to read the last 1500 posts that mention the fail whale - but it’s a pain in the ass for researchers.
  • What you need to do is figure out the approximate tweet ID number that was current when the phenomenon you’re studying was taking place. If you’re a regular twitterer, go to your personal timeline, find a tweet you posted on April 7th, and click on the date to get the ID of the tweet. In the early morning (GMT) of the 7th, the ID for a new tweet was roughly 1468000000 - the URL http://search.twitter.com/search?max_id=1468000000&q=%23pman&rpp=100 retrieves the first four tweets to use the tag #pman, including our Ur-tweet: evisoft: neata, propun sa utilizam tag-ul #pman pentru mesajele din piata marii adunari nationale My Romanian’s a little rusty, but Vitalie Eşanu appears to be suggesting we use the tag #pman - short for Piata Marii Adunari Nationale, the main square in Chisinau where the protests were slated to begin - in reference to posts about the protests. His post is timestamped 4:40am GMT, suggesting that there were at least some discussions about promoting the protests on Twitter before protesters took to the streets.
  • Now the key is to grab URLs from Twitter, increasing the max_id variable in steps so that we’re getting all results from the start tweet ID to the current tweet ID. My perl script to do this steps by 10,000 results at a time, scraping the results I get from Twitter (using the Atom feed, not the HTML) and dumping novel results into a database. This seems like a pretty fine-toothed comb to use… but if you want to be comprehensive, it’s important to figure out what maximum “tweet density” is before running your code.
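The URL anatomy picked apart above can be sketched in code. A minimal Python sketch, assuming the old search.twitter.com endpoint described in the post (long since retired); the parameter names come straight from the excerpt, while `search_url` itself is a hypothetical helper, not part of any real client library:

```python
# Sketch of the search URL the post picks apart. Assumes the old
# search.twitter.com API (since retired); search_url is illustrative.
from urllib.parse import urlencode

BASE = "http://search.twitter.com/search"

def search_url(query, max_id, page=1, rpp=100):
    """Build a query for results up to tweet `max_id`."""
    params = {
        "max_id": max_id,  # only return results up to this tweet ID
        "page": page,      # which page of results to hand over
        "q": query,        # urlencode escapes '#' as %23
        "rpp": rpp,        # results per page (the engine caps this at 100)
    }
    return BASE + "?" + urlencode(params)

url = search_url("#pman", 1511783811, page=2)
# With rpp=100 and the roughly 15-page limit, any single max_id value
# yields at most about 1500 results:
max_results = 15 * 100
```

That 1500-result ceiling is exactly why the post resorts to stepping the max_id value rather than paging ever deeper into one query.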
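The stepping strategy the excerpt describes (walk max_id upward in fixed increments, page through each window, keep only novel results) can be sketched as follows. This is a hedged Python sketch of the logic, not the author's actual Perl script; the Atom-feed fetch is stubbed out with a fake function, and `scrape_range` is a hypothetical helper:

```python
# Sketch of the max_id stepping loop: walk from a start tweet ID to an
# end ID in fixed steps, page through each window (at most ~15 pages
# per query), and keep only tweets not already seen.
def scrape_range(fetch_page, start_id, end_id, step=10_000):
    """fetch_page(max_id, page) -> list of (tweet_id, text) pairs."""
    seen = {}                      # tweet_id -> text, novel results only
    max_id = start_id
    while max_id <= end_id:
        for page in range(1, 16):  # the engine's ~15-page cap per query
            results = fetch_page(max_id, page)
            if not results:
                break
            for tweet_id, text in results:
                seen.setdefault(tweet_id, text)  # drop duplicates
        max_id += step
    return seen

# Stand-in for the real Atom-feed fetch, for demonstration only.
_tweets = [(1468000001, "#pman start"), (1470000000, "#pman march"),
           (1511000000, "#pman peak")]

def fake_fetch(max_id, page):
    hits = [(i, t) for i, t in _tweets if i <= max_id]
    return hits if page == 1 else []

found = scrape_range(fake_fetch, 1468000000, 1520000000, step=10_000_000)
```

The `setdefault` call is what makes repeated windows harmless: overlapping steps simply re-surface tweets that are already in the collection, which matches the post's point that the step size only needs to be small enough that no window's peak tweet density overflows the per-query result cap.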