Sunday, November 21, 2004

Response to David Weinberger's lecture

I agree with most of what David Weinberger had to say in his lecture, except that he left the impression that there is an either-or relationship between structured, hierarchical organization of information and the messier networked organization of information. It's pretty obvious to me that both are necessary. Example: David Weinberger pitted Encyclopaedia Britannica against Wikipedia -- and this is kind of a raging debate on the internet. We shouldn't be talking about "versus" as a competition, but about the strengths and weaknesses of each and where they complement each other. If I were teaching a class of middle schoolers, I'd tell my students to go read both sources. How cool would it be if a student came in and said, "Britannica says X and Wikipedia says Y. Who should I believe?" These are the kinds of lessons students need to learn!

Now for a few geekier points...

E.F. "Ted" Codd's relational database idea (from 1970!) is ingenious. What is a network, really? It looked really messy in the visualization David Weinberger showed, but to most computer scientists/mathematicians it's just a set of nodes (objects/information) connected by a set of edges (relationships or links between objects/information) -- a network = a graph. But the edges of a graph are just a relation over its nodes, and any binary relation can be drawn as a graph, so the two are equivalent. Since relations can also be represented in tables, it seems like the world is just a giant database, not just of bits and bytes, but of real objects (books, photographs, etc.).

I think this is best shown with an example. Let's take David Weinberger's right brain resume, a pictorial representation of a network. Mathematically, the network is a graph structure, with the nodes being the jobs and the edges (links) being the lines that establish some relationship between the jobs. So, the relational tables that represent the network in the picture would be:

Table 1: Relationships (links) between jobs
Each row represents a job-to-job link in the pictorial representation:
Philosophy Teacher, Interleaf
Philosophy Teacher, Writer (1)
Philosophy Teacher, Writer (2)
Writer (1), Marketing Guy
Writer (1), Journalist
Gag Writer, Journalist
Writer (2), Interleaf
Writer (2), Journalist
Journalist, Industry Analyst
and so on...
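
To make the "table = graph" point concrete, here is a minimal Python sketch. It only uses the rows listed in Table 1 (the "and so on..." rows are left out), and it assumes the links are undirected, which is my reading of the picture rather than something the resume states. The names `LINKS`, `to_graph`, and `to_rows` are made up for this illustration.

```python
from collections import defaultdict

# Rows of Table 1 as listed above; the elided "and so on..." rows are not guessed at.
LINKS = [
    ("Philosophy Teacher", "Interleaf"),
    ("Philosophy Teacher", "Writer (1)"),
    ("Philosophy Teacher", "Writer (2)"),
    ("Writer (1)", "Marketing Guy"),
    ("Writer (1)", "Journalist"),
    ("Gag Writer", "Journalist"),
    ("Writer (2)", "Interleaf"),
    ("Writer (2)", "Journalist"),
    ("Journalist", "Industry Analyst"),
]


def to_graph(rows):
    """Build the network view: each job maps to the set of jobs it links to.

    Assumption: links are undirected, so each row is recorded in both directions.
    """
    graph = defaultdict(set)
    for a, b in rows:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def to_rows(graph):
    """Recover the table view: one unordered pair per link in the graph."""
    return {frozenset((a, b)) for a, neighbors in graph.items() for b in neighbors}


graph = to_graph(LINKS)
print(sorted(graph["Journalist"]))
```

Going from the table to the graph and back gives the same set of links, which is all "equivalent" means here.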

Table 2: Job-Time Relationship
Each row in this table connects each job with a year (I made approximations).
Philosophy Teacher, 1970
Writer (1), 1970
Gag Writer, 1972
Writer (2), 1975
Journalist, 1975
Interleaf, 1980
and so on...

Table 3: Pre-Web or Post-Web Relationship
There are a number of ways to represent pre- or post-Web. We can append a third column to Table 2 above that indicates pre- or post-Web for each (job, year) pair. We can tag each year as being pre- or post-Web. Or we can tag each job as being pre- or post-Web. Or we can handle it as a special case in our database. It really depends on how we will use the data -- this is where that classic CS/engineering tradeoff comes in. Which is more important, efficient storage or faster and more flexible retrieval?
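
Here is a rough sketch of two of those options using Python's built-in `sqlite3` module and the rows from Table 2. The cutoff year `WEB_YEAR = 1991` is purely an assumption so the example runs; I haven't pinned down where the resume draws the line. The table and column names are also invented for illustration.

```python
import sqlite3

WEB_YEAR = 1991  # assumption for illustration only; not taken from the resume

# Rows of Table 2 as listed above (approximate years, elided rows omitted).
JOB_YEARS = [
    ("Philosophy Teacher", 1970),
    ("Writer (1)", 1970),
    ("Gag Writer", 1972),
    ("Writer (2)", 1975),
    ("Journalist", 1975),
    ("Interleaf", 1980),
]

conn = sqlite3.connect(":memory:")

# Option A: store the era as a third column (extra storage, trivial reads).
conn.execute("CREATE TABLE job_year_era (job TEXT PRIMARY KEY, year INTEGER, era TEXT)")
conn.executemany(
    "INSERT INTO job_year_era VALUES (?, ?, ?)",
    [(job, year, "pre-Web" if year < WEB_YEAR else "post-Web") for job, year in JOB_YEARS],
)
stored = conn.execute("SELECT job, era FROM job_year_era ORDER BY job").fetchall()

# Option B: store only (job, year) and derive the era when querying.
conn.execute("CREATE TABLE job_year (job TEXT PRIMARY KEY, year INTEGER)")
conn.executemany("INSERT INTO job_year VALUES (?, ?)", JOB_YEARS)
derived = conn.execute(
    "SELECT job, CASE WHEN year < ? THEN 'pre-Web' ELSE 'post-Web' END "
    "FROM job_year ORDER BY job",
    (WEB_YEAR,),
).fetchall()

print(stored == derived)
```

Both options answer the same question. Option A spends storage to make reads simple, while Option B keeps the table smaller and lets the cutoff change without rewriting any rows.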

On the surface, the network looks messy, but there is a way to represent it (whether or not it is feasible to represent it is another question altogether, storage being the issue that comes to mind). It raises the question: did Codd's relational model imitate real life, or did real-life libraries adopt principles of Ted Codd's research indirectly? Maybe my old advisor John Pfaltz was right after all: "Databases are one of the most misunderstood computer concepts, even among computer professionals. There is a kind of naive assumption that they are little more than glorified file systems. In fact, good database systems represent a kind of complete microcosm of the computer world. Databases are an abstract model of the real world; in much the same way that a computer program is an abstract representation of a real process."

So why does it matter if libraries and archives continue to collect and organize information hierarchically? It doesn't, as long as the hierarchical structure is not the only way to index the information. In a database system, you separate the physical representation of the information (B-trees, a hierarchical structure) from the logical representation and access of the information (query languages, programs that access the database). The same principle of information hiding / separation of concerns applies here: you can use any physical storage mechanism, so long as you provide sufficient access mechanisms to the end user.
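
A toy Python sketch of that separation, under heavy assumptions: `HashStore` and `SortedStore` are hypothetical stand-ins for two physical storage schemes (the sorted list searched with `bisect` is only a rough substitute for an ordered index like a B-tree, not an implementation of one), and `find_year` plays the role of the logical access layer.

```python
import bisect


class HashStore:
    """Physical scheme 1: an unordered hash table."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items.get(key)


class SortedStore:
    """Physical scheme 2: keys kept in sorted order and found by binary search."""

    def __init__(self):
        self._keys = []
        self._values = []

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite an existing key
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None


def find_year(store, job):
    """Logical access: the caller never sees how the store is laid out."""
    return store.get(job)


stores = [HashStore(), SortedStore()]
for store in stores:
    store.put("Journalist", 1975)
    store.put("Interleaf", 1980)

print([find_year(store, "Journalist") for store in stores])
```

The caller gets the same answer from either store, which is the point: the physical layout can stay hierarchical (or anything else) as long as the access layer is there.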

So in the larger scheme, users can use a myriad of access schemes, including the internet, to query, search, explore, etc. When they find an item of interest, the local library/database that stores the actual item navigates its own storage scheme (boxes, filing cabinets) to retrieve the item for them.

My preliminary claim is that libraries should continue to collect, archive, and store actual items, and not bother with coming up with the master index. Instead, they should let the networks and their communities of users come up with the indexing schemes. After all, those users are the ones who use the data. The libraries need to focus on providing quick access to actual objects, when necessary, and providing a means for other people (the distributed collective of users) to create the index for them.

I'm not totally crisp in my thoughts yet, but this is just what was running through my head after the lecture. :)

All in all, I think David Weinberger did a great job on the lecture, and it really got me thinking. The television cameras did me an injustice though. Although it might have looked like I was sleeping on C-SPAN, I was actually taking copious notes in the notebook on my lap. (Thanks B for letting me know, and feel free to drop your ideas in here -- I enjoyed reading your response.)
