
[Phys-L] integrated science, data science, big data



Hi --

Let's take a higher-level look at the big-data discussion.

Just because it needs to be done doesn't mean the physics
faculty has to do it, although sometimes it seems that way.
A big part of the introductory physics class is remedial
math, but it doesn't have to be that way.

There is such a thing as integrated science instruction.
At Princeton, a typical first-year homework problem is
to write a program to simulate the diffusion of some
biomolecule. That ticks the boxes for computing, physics,
chemistry, and biology all at once.
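
To give the flavor of such an exercise, here is a minimal
sketch of my own. It is not the actual Princeton assignment.
It assumes the simplest possible model, an unbiased 1-D
random walk, and checks the textbook result that the mean
squared displacement grows linearly with the number of
steps. The function name and parameters are made up for
illustration.

```python
import random

def mean_squared_displacement(n_particles=1000, n_steps=200,
                              step=1.0, seed=0):
    """Simulate independent 1-D random walkers starting at x = 0.

    Each walker moves +step or -step with equal probability at
    every time step. Returns the average of x**2 over all walkers.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = 0.0
    for _ in range(n_particles):
        x = 0.0
        for _ in range(n_steps):
            x += step if rng.random() < 0.5 else -step
        total += x * x
    return total / n_particles

if __name__ == "__main__":
    for n in (50, 100, 200):
        # For this walk the expected value is n * step**2.
        print(n, mean_squared_displacement(n_steps=n))
```

A real assignment would presumably go further (3-D, real
diffusion constants, boundaries), but even this much lets
students compare a simulation against a formula.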

This isn't a particularly new idea. When I was a college
freshman, many moons ago, we noticed a conspiracy between
the math and physics classes. Math did derivatives, physics
did the principle of virtual work. Math did matrices, physics
did photon polarization. And so on. There was no formal
arrangement AFAIK; it was just that the math professors
and physics professors were friends. They talked. They
shared ideas about authentic homework exercises.

In this vein, let me point out that there are some huge
natural-language corpora. One possible exercise would be
to use one, plus some nontrivial programming, to form an
estimate of the entropy of English text. That ticks the
boxes for big data, algorithms, linguistics, and physics
(twice). Entropy is important in every branch of physics,
and big data techniques are becoming almost as important.
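
As a starting point, here is a minimal sketch, assuming the
crudest approach: count n-character blocks and plug the
observed frequencies into Shannon's formula. The function
names are hypothetical. This plug-in estimate is known to
be biased low when n is large relative to the amount of
text, which is exactly why a huge corpus (and cleverer
algorithms) would be needed for a serious answer.

```python
import math
from collections import Counter

def block_entropy(text, n):
    """Plug-in Shannon entropy, in bits, of the n-character blocks."""
    if n == 0:
        return 0.0
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("text is shorter than n")
    return -sum(c / total * math.log2(c / total)
                for c in counts.values())

def conditional_entropy(text, n):
    """Estimated bits per character given the previous n-1 characters.

    Computed as H(n-block) - H((n-1)-block); requires n >= 1.
    """
    return block_entropy(text, n) - block_entropy(text, n - 1)

if __name__ == "__main__":
    # Toy input only; substitute text read from a real corpus.
    sample = "the quick brown fox jumps over the lazy dog " * 50
    for n in (1, 2, 3):
        print(n, conditional_entropy(sample, n))
```

Students can watch the estimate drop as n increases, and
then discover for themselves where the data runs out.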

There's also such a thing as computational drug discovery.
This involves reeeeeally big databases. The data is so
valuable that you might have a hard time laying hands
on it, but on the other side of the same coin the drug
companies are so desperate for trained data scientists
that they might bend over backwards to help you teach.

I'm almost afraid to mention it, but there is a monstrous
industry that uses big data to target consumers and voters.
That's why nowadays it is almost impossible to buy bread
or milk or anything else without coughing up an ID number.
Cambridge Analytica and all that. Nasty business.

Since this is at heart a pedagogy list, let me add that
there are some interesting pedagogical and psychological
corpora. There is fun stuff you can do with this, e.g.:
Wang & Bao (2010)
Analyzing Force Concept Inventory with Item Response Theory
https://arxiv.org/abs/1007.5473
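
For readers who haven't met item response theory, here is
a minimal sketch of the standard three-parameter logistic
item response function from the general IRT literature.
I am not claiming it matches the parameterization or the
fitting procedure used in the paper above; the function
name and the example numbers are made up for illustration.

```python
import math

def p_correct(theta, a, b, c):
    """Three-parameter logistic item response function.

    theta: student ability
    a:     item discrimination (slope)
    b:     item difficulty (location)
    c:     guessing floor, the probability of a correct answer
           for a student of very low ability
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

if __name__ == "__main__":
    # Hypothetical item: five choices, so a guessing floor near 0.2.
    for theta in (-2.0, 0.0, 2.0):
        print(theta, p_correct(theta, a=1.0, b=0.0, c=0.2))
```

Fitting a, b, and c for every item from a large set of
student responses is where the real data work comes in.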

Physicists often argue, rightly, that the skills learned
in physics class are portable to other fields of endeavor.
My point is that the reverse is sometimes true. Physics
majors need to learn big data techniques, but if they
learn them in some other context, not much is lost.


So ...... putting on my science manager hat ......

I suspect it might be about time to convene a Big Data
BoF (birds of a feather) meeting. I find pizza helps with
this. Order a bunch of pizza and invite all the big-data
guys you can find, from physics, astronomy, linguistics,
biochem, educational psychology, comp sci, B-school, and
everywhere else you can think of. Ask the first-order
invitees who else should be invited.

It would be prohibitively difficult to integrate the
whole university-wide instructional program on Day One,
but you don't need to do that. Instead, start by coming
up with one integrated course that ticks as many boxes
as possible. It might help to start by identifying a few
interdisciplinary exercises and projects. Then figure
out who's going to teach it. I predict students will
sign up in droves.

You know what else helps with this? Money. See if you
can get a grant from somewhere, to buy somebody a year
of relief from teaching and other routine duties, so
they can do the curriculum development work.

I guarantee this is not easy. However, the alternatives
are worse in every way: they are just as difficult, and
end up in a less-desirable place.