Frankenlinks / Keeping Links Relevant (sort of)

Internet Archive dead links win win

started by The Ravine / Joseph Dunphy on 24 Sep 09

#1 The Ravine / Joseph Dunphy on 24 Sep 09

First of all, I'd like to introduce everybody to a site worth knowing about, called the Internet Archive. Tagline: Surf the net as it was.

Let's say you have a site. Let's choose one that nobody would think is mine: the official homepage of the city of Chicago, located at

http://egov.cityofchicago.org/city/webportal/home.do

Suppose that, out of idle curiosity, you'd like to see what that site looked like in the past. You go to

http://web.archive.org/web/*/http://egov.cityofchicago.org/city/webportal/home.do

where you find yourself presented with a menu of past copies of that site, each of which you can reach by clicking on the date on which the copy was made. For example, here we have the October 8, 2003 copy of that site:

http://web.archive.org/web/20031008120348/http://egov.cityofchicago.org/city/webportal/home.do
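For anybody who'd rather see the pattern spelled out, here is a minimal sketch of how those two addresses are put together. The function names are made up for illustration, and the 14-digit stamp is assumed to be year, month, day, hour, minute, second, which is what the example above looks like.

```python
from datetime import datetime

# Prefix taken from the example addresses in this post.
WAYBACK_PREFIX = "http://web.archive.org/web"


def wayback_index_url(original_url: str) -> str:
    """Address of the menu page listing the archived copies of original_url."""
    return f"{WAYBACK_PREFIX}/*/{original_url}"


def wayback_snapshot_url(original_url: str, captured_at: datetime) -> str:
    """Address of one dated copy; the stamp format is an assumption (YYYYMMDDhhmmss)."""
    stamp = captured_at.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{original_url}"


if __name__ == "__main__":
    chicago = "http://egov.cityofchicago.org/city/webportal/home.do"
    print(wayback_index_url(chicago))
    print(wayback_snapshot_url(chicago, datetime(2003, 10, 8, 12, 3, 48)))
```

Nothing here talks to the Archive; it only builds the strings, so whether a given dated copy actually exists still has to be checked on the menu page.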

What makes this more than an idle source of amusement for us, at this point, is the fact that unless the owner of the url asks that his page be removed from the archive, the archived copy will stay there even after the original page ceases to exist. The archive can be used to recover much of a site that has been hacked, deleted or otherwise lost, if the site has been archived. Anybody who wants to can submit a site, any site, to be archived merely by going to

http://www.alexa.com/help/webmasters#crawl_site

entering the url for the site in the box marked "url", and clicking on "crawl my site". The site doesn't have to belong to the person submitting it, and any url will do. It doesn't have to be an index page or the top page in a domain or anything like that. Which brings me to my suggestion for dealing with the problem of links going dead, and the cleanup that might go with that.

Set the system so that when a page is bookmarked, the url for that page is automatically submitted to the Internet Archive. The staff at the Archive recently gained some good press by working overtime to preserve the sites at Geocities before that provider closed; what Diigo would be doing is very much in keeping with what the Archive is trying to do, and the staff would probably be very happy to work with Diigo on this one.

Then, should the link, on testing, keep coming up dead for a long enough time that one would be fairly sure that the site wouldn't come back, the Diigo system would realign the links on the reviews associated with that url, so that instead of pointing to

(old url)

they would instead point to

h t t p : / / web . archive . org / web / * / (old url)

(spaces introduced in an attempt to keep the Diigo system from creating links to nowhere)

The system would then put a small annotation on the link indicating that this link is to "an archived site which might no longer be in existence". If it is in existence, there will be a link to it on the top of that menu page for it on the Archive site, and visitors to Diigo will still be able to find it.
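To make the suggestion a little more concrete, here is a rough sketch of the rule in code. Everything in it is hypothetical: the Bookmark record, the function names, and the 30-day grace period are my own placeholders, not anything Diigo actually has, and the link test itself is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Wording of the annotation proposed in this post.
ARCHIVE_NOTE = "an archived site which might no longer be in existence"


@dataclass
class Bookmark:
    url: str
    # When the current unbroken run of failed checks began (None if the link is alive).
    dead_since: Optional[datetime] = None


def record_check(bookmark: Bookmark, is_alive: bool, now: datetime) -> None:
    """Update the bookmark after one link test."""
    if is_alive:
        bookmark.dead_since = None       # the site came back, so start over
    elif bookmark.dead_since is None:
        bookmark.dead_since = now        # first failure in this run


def display_link(
    bookmark: Bookmark,
    now: datetime,
    grace: timedelta = timedelta(days=30),  # assumed waiting period, pick your own
) -> Tuple[str, Optional[str]]:
    """Return (href, annotation) to show alongside the reviews for this bookmark."""
    if bookmark.dead_since is not None and now - bookmark.dead_since >= grace:
        return f"http://web.archive.org/web/*/{bookmark.url}", ARCHIVE_NOTE
    return bookmark.url, None
```

The original url is never thrown away, so if a later test finds the site alive again, the link simply goes back to pointing at it.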

This way, the old pages, even after vanishing, can still be read and enjoyed by the visitors, along with the user comments about those sites. The need for cleanup is reduced, and visitors are introduced to a wonderful resource.

Originally Posted to the Diigo Feature Request Community, crossposted here because I'm not sure that Diigo is reading that group, any more.

But I hope they are.

#2 Graham Perrin on 29 Jun 10

> links going dead, and the cleanup that might go with that

http://groups.diigo.com/group/Diigo_HQ/content/547626#2 (2007-09) shows that verification of links/checking for dead links was on Diigo's to-do list. Please enable e-mail notification for that topic.
