Thursday, July 24, 2008

Open Source Databases

I'm listening to an illustrious panel of database experts, many of them with much experience in proprietary realms (Oracle, Sun... Autodesk).

History: Ingres and Sybase grew out of "open source" BSD research (not called that back then). Sybase later sold its code to Microsoft. The point: databases had open source beginnings. IBM's code, in contrast, was always closed source and, as a consequence, either went nowhere or progressed more slowly.

The free and open exchange of ideas, and code, is what spurs innovation (in many cases it's dangerous not to take advantage of this if you expect to stay competitive). Open source folks were fastest in adopting open GIS-related standards.

CouchDB is more focused on the commodification of server farms and hearkens back to node-based architectures (I'm thinking of MUMPS). Web services like EC2 have become a platform for open source databases. SimpleDB is Amazon's solution for running atop its S3 web service.

The newer generation is making better use of pre-existing standards (e.g. HTTP), not investing in one-of-a-kind in-house protocols. There's more than just lip service in favor of interoperability these days. The USA's DoD is moving towards open source, increasing demand.

Autodesk guy: the future is where big data stores come over the web and mix with local stuff. Semi-structured data, semi-structured searches... these are cutting edge areas.

Sun guy: "All open source databases suck at blobs" (large binary files -- like entire DVDs). He also worries MapReduce is "anti-green" in terms of rack space required, kilowatt-hours consumed and so on. Maybe the optimization literature isn't getting applied, in favor of mere brute force solutions? Other experts in the audience weren't so sure (apples & oranges?).
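
For anyone who hasn't run into MapReduce, here's a toy, single-machine sketch of the pattern in Python. It's my own illustration, not anything shown on the panel, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for the example. A real deployment spreads the map and reduce steps across many machines, which is where the rack space and power worries come in.

```python
from collections import defaultdict


def map_phase(doc):
    # Emit a (word, 1) pair for every word in one document.
    return [(word.lower(), 1) for word in doc.split()]


def shuffle(pairs):
    # Group values by key, as a framework would between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Collapse all the values for one key into a single count.
    return key, sum(values)


def word_count(docs):
    # Run map over every document, then shuffle, then reduce per key.
    pairs = [pair for doc in docs for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["open source databases", "open standards"]))
```

Every document gets scanned in full, with no index or query planner involved, which is presumably the "brute force" the Sun guy had in mind.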

What's the difference between a filesystem, database, and version control system anyway? Databases sit on top of a filesystem and are more CPU intensive. Versioning systems tend to not use relational databases, but why? Subversion uses SQLite. The database might be for metadata...

I asked about user-friendly front ends, like Microsoft Access, which write the SQL for you, based on drag and drop, filling in forms and so on. There's nothing quite like that in the open source world, and that's a frustration for people wanting to start small medical databases on desktops, and incubate them without bothering the IT department.

Panelists mostly agreed that these front ends are hard to develop and in short supply. In the old days, SQL itself was envisioned as the user-friendly front end, but it's not taught except in a few majors, such as finance. Steve reminded me that OpenOffice has some tools along these lines -- we did some testing over the break, just for kicks.
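
To make "writes the SQL for you" a bit more concrete, here's a minimal sketch in Python using the standard library's sqlite3 module. It's only my illustration of the idea, not how Access or OpenOffice actually work: the build_insert helper, the patients table and the form dictionary are all hypothetical.

```python
import sqlite3


def build_insert(table, form):
    # Turn a filled-in form (field name -> value) into a parameterized INSERT.
    if not form:
        raise ValueError("form has no fields")
    # Identifiers can't be bound as parameters, so accept only plain names.
    for name in [table, *form]:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    columns = ", ".join(form)
    placeholders = ", ".join("?" for _ in form)
    sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return sql, list(form.values())


def demo():
    # Hypothetical table for a small desktop medical database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (name TEXT, age INTEGER)")
    sql, params = build_insert("patients", {"name": "Ada", "age": 36})
    conn.execute(sql, params)
    rows = conn.execute("SELECT name, age FROM patients").fetchall()
    conn.close()
    return sql, rows


if __name__ == "__main__":
    print(demo())
```

The user only ever sees the form; the SQL is generated behind the scenes. The hard part the panelists were pointing at is everything around this: designing the forms, the schema, the reports.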

Speaking of front ends, my next session for today was Stupid Django Tricks (except they didn't seem stupid). Django is one of the Python flagships for writing web applications, also a space served by Ruby, PHP, Perl... Erlang.

Then Steve, Duncan and I went to American Cowgirls, close to the convention center, to kill time before Emma Jane Hogbin's talk on how to get more women involved in FOSS. Duncan and I compared notes on Arabic, a beautifully logical language, while I joked with Steve about a credentialing system based on tattoos (use your own imagination).