Tuesday, March 10, 2009

PPUG 2009.3.10

PPUG (Python)
Wow, we're a huge group tonight, packing a big room. What beer joint will handle this crowd? (Produce Row tonight.)

PyParsing (Brett Carter):

Brett Carter is talking about PyParsing ("the best thing ever", heh). It's a Python library that lets you easily create BNF-style grammars. It's geeky, very CS (shades of bison).

Backus-Naur Form (BNF) is a formal notation for languages. Is this linguistics, "are we not men?" (citing Devo)? Yes, all 35 - 40 of us are XYs (not always the case, as I've met many an XX "FOSS boss").

There's a full grammar specification for Python 2.6 on the screen (yow -- or maybe not that bad?). "Regular expressions on steroids" might be a way to describe this language game. How simple an application might still be interesting?
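To make the "how simple might still be interesting" question concrete, here's a minimal sketch of a BNF-style grammar for arithmetic, handled by a hand-written recursive-descent parser in plain Python. This is not PyParsing's API and not anything from Brett's slides; the grammar and the names `tokenize`, `parse_expr`, `parse_term`, `parse_factor` and `evaluate` are my own assumptions for illustration. The nested parentheses are the part a plain regular expression can't cope with.

```python
import re

# Grammar (BNF-ish), assumed for illustration only:
#   <expr>   ::= <term>   (("+" | "-") <term>)*
#   <term>   ::= <factor> (("*" | "/") <factor>)*
#   <factor> ::= NUMBER | "(" <expr> ")"

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into number, operator and parenthesis tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unexpected character at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_expr(tokens):
    """<expr>: one term, then any number of +/- terms."""
    value = parse_term(tokens)
    while tokens and tokens[0] in ("+", "-"):
        op = tokens.pop(0)
        rhs = parse_term(tokens)
        value = value + rhs if op == "+" else value - rhs
    return value


def parse_term(tokens):
    """<term>: one factor, then any number of * or / factors."""
    value = parse_factor(tokens)
    while tokens and tokens[0] in ("*", "/"):
        op = tokens.pop(0)
        rhs = parse_factor(tokens)
        value = value * rhs if op == "*" else value / rhs
    return value


def parse_factor(tokens):
    """<factor>: a number, or a parenthesized expression (the recursion)."""
    if not tokens:
        raise ValueError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == "(":
        value = parse_expr(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise ValueError("expected ')'")
        return value
    if tok.isdigit():
        return int(tok)
    raise ValueError("unexpected token %r" % tok)


def evaluate(text):
    """Parse and evaluate a whole string, rejecting leftover tokens."""
    tokens = tokenize(text)
    value = parse_expr(tokens)
    if tokens:
        raise ValueError("trailing tokens: %r" % tokens)
    return value


print(evaluate("2 + 3 * (4 - 1)"))
```

Each grammar rule becomes one function, which is roughly the bookkeeping a parsing library takes off your hands.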

Now we're eyeballing a parser for SQL statements, now for Icon (another language). Why Brett, why? Trying to convert a CVS repository to Mercurial...

Machine Learning (John Melesky):

I lost power in the middle of blogging about Machine Learning, which is unfortunate, but I think there's a video recording happening somewhere...

The presentation was all about geometric approaches to analyzing documents (hyper-dimensional -- lots of geographic metaphors) versus the more statistical approaches, as in Bayesian (naive and not). Detecting spam is a common example for the latter technique.
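For the geometric side, here's a minimal sketch of what "documents as points in a high-dimensional space" can look like: each distinct word is a dimension, each document is a vector of word counts, and closeness is measured by the cosine of the angle between vectors. John showed no code, so everything here (the names `vectorize`, `cosine`, `nearest` and the toy `docs`) is my own assumption, not his method.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a document into a sparse word-count vector (word -> count)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy, made-up "documents", one per topic.
docs = {
    "parsing": "grammar parser tokens grammar syntax",
    "storage": "database server key value replication",
}


def nearest(text):
    """Return the name of the toy document closest in angle to the text."""
    query = vectorize(text)
    return max(docs, key=lambda name: cosine(query, vectorize(docs[name])))


print(nearest("a parser for a grammar"))
```

Real systems weight the dimensions (rare words count for more) and use far more than two reference documents, but the geometry is the same idea.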

Targeting advertising to specific contexts is a typical application for machine learning, as is automatic tagging. Getting a handle on large stashes of documents, like blog posts, is the name of the game.

Divmod's Reverend (Bayes was a reverend) is way better than Ruby's Bishop, the port, in terms of speed and accuracy (plus Bayes wasn't a Bishop).
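For a feel of what a naive Bayesian classifier does under the hood, here's a minimal sketch in plain Python. It is not Reverend's API (or Bishop's); the class `NaiveBayes`, its `train`, `score` and `guess` methods, and the toy training strings are all my own assumptions. It uses add-one smoothing and log probabilities, which is one common way to do it, not necessarily what either library does.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Toy word-count naive Bayes classifier (hypothetical, for illustration)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()                       # every word seen in training

    def train(self, label, text):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def score(self, label, text):
        """log P(label) + sum of log P(word | label), add-one smoothed."""
        total_docs = sum(self.doc_counts.values())
        log_prob = math.log(self.doc_counts[label] / total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        for word in text.lower().split():
            log_prob += math.log((counts[word] + 1) / denom)
        return log_prob

    def guess(self, text):
        """Return the trained label with the highest score for the text."""
        return max(self.doc_counts, key=lambda label: self.score(label, text))


clf = NaiveBayes()
clf.train("spam", "cheap pills buy now")
clf.train("spam", "buy cheap watches now")
clf.train("ham", "python meeting tonight at seven")
clf.train("ham", "slides from the python meeting")

print(clf.guess("buy cheap pills"))
print(clf.guess("python meeting slides"))
```

The "naive" part is treating every word as independent of the others given the label, which is plainly false for real text and works surprisingly well anyway.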

Support Vector Machines: PyML... Orange (academic i.e. semi-broken, still usable). Technique: munge with math to make the non-linear linear, then employ linear separation.
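Here's a minimal sketch of the "make the non-linear linear, then separate linearly" move, in plain Python. To be clear about the swap: this is not an SVM and not PyML or Orange code. It uses an explicit feature map plus a plain perceptron (no margin maximization, no kernels), and the names `lift`, `train_perceptron`, `predict` and the toy points are my own assumptions. Points near the origin versus points on a ring around them can't be split by a straight line in 2D, but adding x² + y² as a third coordinate lets a flat plane do the job.

```python
def lift(point):
    """Feature map: (x, y) -> (x, y, x^2 + y^2)."""
    x, y = point
    return (x, y, x * x + y * y)


def train_perceptron(samples, max_epochs=100):
    """Train on (features, label) pairs, label in {-1, +1}.

    Returns (weights, converged); the last weight is the bias.
    """
    w = [0.0] * (len(samples[0][0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in samples:
            f = list(features) + [1.0]  # append constant bias input
            if label * sum(wi * fi for wi, fi in zip(w, f)) <= 0:
                w = [wi + label * fi for wi, fi in zip(w, f)]
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: data is separated
            return w, True
    return w, False


def predict(w, features):
    """Classify by which side of the learned hyperplane the point falls on."""
    f = list(features) + [1.0]
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else -1


# Toy data: inner points are -1, the surrounding ring is +1.
inner = [(0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (0.0, -0.5)]
outer = [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0)]
points = [(p, -1) for p in inner] + [(p, 1) for p in outer]

# In the raw 2D space no line separates the classes, so this gives up.
_, raw_ok = train_perceptron(points)

# After lifting to 3D, a plane separates them and training converges.
w, lifted_ok = train_perceptron([(lift(p), label) for p, label in points])

print(raw_ok, lifted_ok)
print(predict(w, lift((0.1, 0.1))), predict(w, lift((3.0, 0.0))))
```

An SVM goes further by picking the separating plane with the widest margin, and by using kernels so the lifted coordinates never have to be computed explicitly.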

Sorry I'm skipping the light bulb jokes (despite their mnemonic value). John also said: "I have no Python code, I am a horrible, horrible man, but I do have an excuse, we found rats in our kitchen this morning". This guy cracks me up.

PyTyrant etc. (Michael Schurter):

Tokyo Cabinet, a data store (for key:value pairs, B+ trees, and other formats), is wrapped by Tokyo Tyrant, with PyTyrant the Python remote client. The Tokyo Tyrant daemon is the database server and offers a lot: replication, Lua extensions, hot backup, an HTTP option, high concurrency, multi-threading. PyTyrant is lagging, probably because Tokyo Tyrant is evolving so quickly. Other languages are ahead at the moment.

Followup: this post to the PPUG list; link to Michael's slides.