Friday, December 27, 2024

Data Science for All

I was listening to one of Josh Starmer's StatQuest videos again, this one on entropy, and it really hit me why I'm glad data science has taken off, not leaving statistics in the dust so much as giving it a much-needed makeover. That was my old lingo in the 1990s: math was overdue for a makeover, which for me of course included phasing in more Synergetics. That much hasn't changed. "More than Cosmetic" was a slogan, to counter the stereotypes "makeover" comes with.

In my high school, statistics was, like trig, one of those semester courses one could take towards filling a math credits requirement, even while opting to get off the main track: pre-calc + calc. The latter was considered the more stringently "college prep" path, whereas if one already knew the goal was a business degree and not to shoot for a "next Einstein" award, then why not take statistics, a kind of knucklehead math that'd be useful in Economics, a knucklehead science? 

I was happy to take stats and trig, but I took pre-calc + calc also, and was good at them. I placed straight into honors calc at Princeton, with Dr. Thurston as my prof. Then I went on to study linear programming (featuring its simplex algorithm) with one of the leaders in the field, Dr. Harold Kuhn. I'd circle back to linear algebra much later, when getting into the geometry of rotating polyhedra (we use matrices for that, if not quaternions).

My point here is I no longer have that old prep school mindset. I'm happy to see calc (Newtonian delta calculus) postponed until college, and have it feature inside a major, such as physics, or why not more stats? 

Slicing and dicing into infinitesimals sits at the limit of discrete math (where we meet the so-called manifold), and discrete math is what became the focus of my lobbying at the state level. Let kids take a discrete math approach all the way through high school if they like, with less focus on calc (by definition) and increasing focus on number theory and group theory, say, with computer programming.

Data science is providing the glue though, bringing math back to everyday coping, like people do, with words like expectation and surprise, entropy and probability. Likelihood. These are the everyday words of common experience, and to see them treated mathematically, with notation, with symbols, is a kind of music to my ears, in that it provides segues into what many find a turn-off.
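To make "surprise" and "entropy" concrete, here's a minimal Python sketch, assuming the usual information-theory definitions: the surprise of an outcome with probability p is log2(1/p), and entropy is the expected surprise. The function names and the two example coins are my own illustrations, not anything taken from the StatQuest video.

```python
import math

def surprise(p):
    """Surprise, in bits, of an outcome with probability p: log2(1/p)."""
    return math.log2(1.0 / p)

def entropy(probs):
    """Entropy as expected surprise: each outcome's surprise, weighted by its probability."""
    # Outcomes with probability 0 never happen, so they contribute nothing.
    return sum(p * surprise(p) for p in probs if p > 0)

# Hypothetical example distributions (heads, tails).
fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]

print("fair coin entropy:  ", entropy(fair_coin))
print("biased coin entropy:", entropy(biased_coin))
print("surprise of the rare side:", surprise(0.1))
```

The fair coin comes out at one bit per flip, the biased coin at less than half a bit: the rare side is very surprising when it shows up, but it shows up too seldom to raise the average much.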

Hey, I hadn't realized that Edward Teller and his wife, more Martians (Hungarian heritage), had contributed to Markov Chain Monte Carlo (MCMC) taking off, as later abetted (translated to stats language) by Hastings.
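For the curious, here's a minimal sketch of the basic idea, assuming a random-walk proposal and a standard normal target, both chosen just for illustration. Because the proposal is symmetric, this is the plain Metropolis rule; the Hastings version adds a correction factor for asymmetric proposals, which I've left out. The names `log_target` and `metropolis` are my own.

```python
import math
import random

def log_target(x):
    """Log of an unnormalized standard normal density (illustrative target)."""
    return -0.5 * x * x

def metropolis(n_samples, step=2.0, start=0.0, seed=0):
    """Random-walk Metropolis: propose a nearby point, accept with prob min(1, ratio)."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_samples):
        # Symmetric proposal: a uniform jump of at most `step` either way.
        proposal = x + rng.uniform(-step, step)
        log_ratio = log_target(proposal) - log_target(x)
        # Always accept uphill moves; accept downhill moves with probability exp(log_ratio).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        # A rejected proposal means the current point is recorded again.
        samples.append(x)
    return samples

samples = metropolis(20000)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print("sample mean:", mean)
print("sample variance:", variance)
```

The chain only ever needs the ratio of target densities, so the normalizing constant never has to be computed. With enough steps the sample mean and variance should drift toward 0 and 1.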

Data Science connects us to Information Theory, signal versus noise, entropy and cryptography, and does so by anchoring some of our everyday bread-and-butter notions as agentic humans in Universe. Bayesian thinking connects to machine learning, which is also the kind of learning we do when push comes to shove.