Monday, January 14, 2008

Rich Data Structures

Another trend that emerges when you view the Internet as a primary source of raw curriculum materials is a willingness to build lesson plans around big data sets, as opposed to the little sample data sets in hardcopy, which in many cases are deliberately ridiculous.

So, for example, a chemistry class might begin with 'import pt' (for Periodic Table), and for the rest of the class students follow along interactively, e.g. pt.au.neutrons would spit back the number of neutrons in gold (we'll get to isotopes later). Entering all this data the first time is tedious, but chances are someone has already done that for you or (and this is important) has provided an API to some updatable back-end database.
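Here is a minimal sketch of what such a module might look like on the inside. The pt namespace, the Element class and its field names are all hypothetical, made up for illustration; a real module would carry the whole table or query a database. The neutron count assumes the most common isotope of each element, which is why isotopes come later.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class Element:
    """One entry in a toy periodic table (hypothetical layout)."""
    name: str
    symbol: str
    protons: int       # atomic number
    mass_number: int   # protons + neutrons for the most common isotope

    @property
    def neutrons(self) -> int:
        # Neutrons are whatever is left of the mass number after the protons.
        return self.mass_number - self.protons


# Stand-in for "import pt": only three elements are filled in here.
pt = SimpleNamespace(
    h=Element("Hydrogen", "H", 1, 1),
    c=Element("Carbon", "C", 6, 12),
    au=Element("Gold", "Au", 79, 197),
)

print(pt.au.neutrons)  # 118
```

In a classroom setting the same three lines of table would simply live in a file named pt.py, so that students type 'import pt' and start exploring with dot notation.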

The ridiculous little examples were sufficient for computer science majors because their topic was "parsing," and the data might as well be nonsense so long as the challenges were realistic, i.e. there were delimiters and other patterns a regular expression could match, which served as raw material for programmed algorithms. Today we're using XML a lot, or YAML.

However, as we move more deeply into this cyberfrontier, professors of other subjects start wanting more topical richness. Their ability to import a Shakespeare play as a namespace, or some other (possibly holy) writ, in the form of a densely populated module of highly organized multimedia, is nothing to sneeze at.
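To make "a play as a namespace" a bit more concrete, here is a rough sketch. The hamlet object, its acts layout and the lines_by helper are all assumptions for illustration, not an existing package, and only a one-line excerpt is filled in where a real module would hold the full text plus any multimedia.

```python
from types import SimpleNamespace

# Hypothetical layout: acts -> scenes -> list of (speaker, line) pairs.
hamlet = SimpleNamespace(
    title="Hamlet",
    author="William Shakespeare",
    acts={
        3: {
            1: [("HAMLET", "To be, or not to be, that is the question:")],
        },
    },
)


def lines_by(play, speaker):
    """Yield (act, scene, text) for every line spoken by `speaker`."""
    for act, scenes in sorted(play.acts.items()):
        for scene, lines in sorted(scenes.items()):
            for who, text in lines:
                if who == speaker:
                    yield act, scene, text


for act, scene, text in lines_by(hamlet, "HAMLET"):
    print(f"Act {act}, Scene {scene}: {text}")
```

A literature class could then ask questions of the text interactively (who speaks most, where a phrase first appears) the same way the chemistry class pokes at pt.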

Related thread on comp.lang.python.