
“The Internet is a Garbage Dump”

A strong article in John Dvorak’s typical style.

The Internet is full of broken links and accrued junk.

Let’s archive it and start over again.

The Internet is a giant garbage dump filled with abandoned images, blogs, Websites—abandoned everything. And no one cares enough to clean any of it up, hoping instead that it will magically fix itself after years of neglect and server shutdowns.

I have joined or tried out most of the online products and ideas that have sprung up since AOL first introduced a convoluted tool to let people design hokey pages, back in the 1990s. Most recently, I tried Posterous, one of the hottest up-and-coming sites in the country right now. Essentially you e-mail something to these folks and they post it—whatever it might be—on their servers and give you a URL that you can pass around. It’s pretty similar to sites like, except for the e-mail gimmick.

I have no idea how backed up Posterous is, but the assertion that the site replies “instantly” after you send an e-mail could not be further from the truth—that is, unless you have a very liberal definition of the word “instant.” I tried the service using my private e-mail system. I gave up after getting no response for half an hour. I tried Gmail next. That took 20 minutes. Here’s the link, if you’re curious.

That photo is now on a server someplace, languishing, like most things on the Internet. I once joined Facebook under an assumed name and never bothered going back. It’s wasted junk that still exists on a server. I must have a half-dozen blogs that I’ve started and since forgotten about.

Yahoo did the right thing when it decided to shut down Geocities, close down servers, and take all of the junk offline. Of course some important sites were probably shuttered in the process, but thanks to all of the junk the service had accrued over the years, it was impossible to save them.

This brings up a parallel problem. People create canonical one-shot Websites and post them on various blogging platforms. They generally get very light traffic, but they may be referenced by a link someplace. So you’ll read something and run into a link to Thomas Jefferson’s unique formula for wine preservation. You click on the link, and the site has been taken down for one of any number of reasons.

I know some of my Blogger sites disappeared after Google bought the company. I lost a complete backup of all of my contact information when an “always free” Website went out of business. And I haven’t been able to access my Flickr photos since Yahoo bought the site. It’s one thing after another, and the end result is a collection of junk, missing pages, and dead ends. And all the while, sites like Posterous, Reddit, and Twitter come and go. Does anyone even use LiveJournal anymore?

The usefulness of the Internet—the Web in particular—has peaked, thanks to the limitations of search engines, a problem I’ve addressed before. Missing or moved pages, combined with an accumulation of crap dumped on the Internet for no particular reason, don’t bode well for the future. There’s no evidence that the junk accumulation and missing pages are going to stop any time soon.

So instead of just complaining, we need to start the cleanup, in a way that works. Personal responsibility alone won’t do it. I think the cache of information should be archived in a closed Internet: an elaborate version of the Wayback Machine, only without the history. Just close the Internet as we know it today. Archive it and start over. Make the current Internet read-only, and search and study it, so it can be organized properly. Everything from now on can be fluid, but let’s start over from scratch. Now that would be an interesting solution.


Designing re-search engines

Greg Linden writes about a topic that is hooeey webprint’s raison d’être: finding stuff that one has searched for or seen before.

Greg quotes a 2010 paper by Microsoft Research:

The most obvious way that a search tool can improve the user experience given the prevalence of re-finding is for the tool to explicitly remember and expose that user’s search history.

We believe that a tool that remembers not only the user’s search history but also the user’s web history can be more effective. An astonishing 40% of searches are in fact re-searches (attempts to find a page that the user has seen before), so there is a strong case for redesigning search engines or building specialized applications to better support re-finding.
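As a rough illustration of re-finding over web history, here is a minimal Python sketch. It is not how hooeey webprint or any search engine actually works. The `Visit` and `HistoryIndex` names, the integer timestamps, and the example.com URLs are all hypothetical. The idea is simply to record every page a user visits and, on a later query, return matching pages from that history, ranking those that match more query words first and breaking ties by recency.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    url: str
    title: str
    visited_at: int  # hypothetical timestamp; larger means more recent


class HistoryIndex:
    """Toy store of one user's page visits, searchable by keyword."""

    def __init__(self):
        self.visits = []

    def record(self, url, title, visited_at):
        """Remember a page the user has seen."""
        self.visits.append(Visit(url, title, visited_at))

    def refind(self, query):
        """Return visits matching any query word, best match first."""
        words = query.lower().split()
        scored = []
        for visit in self.visits:
            text = f"{visit.title} {visit.url}".lower()
            hits = sum(1 for word in words if word in text)
            if hits:
                scored.append((hits, visit.visited_at, visit))
        # More matching words first, then more recent visits.
        scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
        return [visit for _, _, visit in scored]


# Hypothetical browsing history for illustration only.
history = HistoryIndex()
history.record("https://example.com/wine", "Jefferson on wine preservation", 1)
history.record("https://example.com/search", "Designing search engines", 2)
history.record("https://example.com/wine-news", "Wine news", 3)

results = history.refind("jefferson wine")
for visit in results:
    print(visit.title, visit.url)
```

A real tool would need stemming, full-text indexing of page content, and persistent storage. The sketch only shows why keeping the web history alongside the search history makes re-finding a lookup rather than a fresh search.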

Where does your data live?

Here’s an interesting article from New Scientist about long-term personal data storage. The key idea is that while it will become easier and cheaper to store, well, an infinite amount of data, it’s important that better ways of organising, retrieving and presenting that data be developed.

“Last week New Scientist pondered the fragility of digital data stores over the very long term, in the event of a civilisation-wide calamity. But anyone worried about civilisation’s chances would do well to look to their own data stores first.

Most of us today are blithely heading for our own personal data disasters. We generate and store vast volumes of information, but few of us really look after it.”

The Implicit Web–an exploration

Here’s a presentation from USID 2008.

The Implicit Web–what it means for us

The implicit web is a fascinating and, of late, practicable idea. Here’s Brad Feld’s blog post:

I’ve been fascinated with the notion of the Implicit Web since I determined that I was tired of my computer (and the Internet in general) being stupid. I wanted it (my computer as well as the Internet) to pay attention to what I, and others, were doing. Theoretically “my compute infrastructure” should learn, automate repeated tasks (automatically), figure out what information I actually want, and make sure I get it when I want it.