
Re: [Phys-L] Data Science for Physicists - Is there a dataset all students should see?



I am tickled with the Genetic Algorithm method. Though a usable example can be made in fewer than a dozen lines of Basic, I show instead a coder creating one in Python to calculate a root of a cubic function here:
https://www.youtube.com/watch?v=4XZoVQOt-0I
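
For readers who would rather see text than video, here is a minimal sketch of the same idea. It is not the code from the video. The cubic, the function names, and every parameter value (population size, mutation width, cooling rate) are assumptions chosen only for illustration. Each generation keeps the candidates with the smallest |f(x)|, breeds children by averaging two parents, and adds Gaussian noise as mutation.

```python
import random

def cubic(x):
    # Hypothetical example polynomial; its roots are 1, 2 and 3.
    return x**3 - 6*x**2 + 11*x - 6

def find_root(f, lo=-10.0, hi=10.0, pop_size=50, generations=200, seed=0):
    """Search for an x with f(x) close to 0 using a very small genetic algorithm."""
    rng = random.Random(seed)            # fixed seed so runs are repeatable
    population = [rng.uniform(lo, hi) for _ in range(pop_size)]
    sigma = (hi - lo) / 10               # mutation width, shrunk every generation
    for _ in range(generations):
        # Fitness: candidates with smaller |f(x)| are better.
        population.sort(key=lambda x: abs(f(x)))
        parents = population[:pop_size // 5]      # the best 20% survive unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            # Crossover (average of two parents) plus Gaussian mutation.
            children.append((a + b) / 2 + rng.gauss(0.0, sigma))
        population = parents + children
        sigma *= 0.95
    return min(population, key=lambda x: abs(f(x)))

root = find_root(cubic)
print(root, cubic(root))
```

Which of the three roots it lands on depends on the seed. Nothing here uses the derivative of the function, which is the main appeal of the method.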


Trained Neural Nets.
I recently ran a deep-learning trial on machine health. I attached a 3-axis accelerometer to a fan and captured a string of data samples, then repeated the process with a degraded machine (I dinged the fan, its blades, etc.). I ran the datasets through an app that chose a suitable neural net and produced an application that uses a small microprocessor to listen to the machine and report impending trouble of the types I had taught it. The experiment was essentially funded with pocket change.
Self-trained Neural Nets.
I was greatly impressed with a video review of a self-classifier using a huge medical test database, which could associate clusters in selected dimensions of the high-dimension space. When the clusters were shown to diagnosticians in the field, they could readily identify the disease associated with each cluster. The creator could then name the clusters in what was a powerful diagnostic device.





On Wednesday, November 16, 2022 at 03:51:22 PM CST, Paul Nord <paul.nord@valpo.edu> wrote:

Steve,

Yes, the HR diagram is one of the first examples that came to my mind.  I
have a colleague here that does a lab with the astronomy students to
produce an HR diagram from the Gaia dataset.  I should put you in touch.
There's a nice piece of software they use for that which doesn't require R.

The HR diagram is a perfect example of what I was suggesting.  So many
theories of stellar evolution were suggested from that plot.  The main
sequence, the gaps, and the outliers each motivate finding a good
theoretical explanation.

And there are so many other things you can do with the Simbad and Gaia
data.  There are many more dimensions than just position, brightness, and
color.  There are rogue stars in there whose velocity does not put them in
an orbit around our solar system:  Where did they come from, where will
they go?  Are they just passing through from another galaxy collision?  Or
did they recently have a close interaction with another galactic body that
flung them out?

That said, is an astronomy database the right data set for all
universities?  If you don't have an astronomer on staff, or if you'd like
to engage other faculty members in teaching analysis of a "Big Data" set,
what else might be the motivation?

Also, I'd like to suggest to the AAPT committees that we run a workshop to
teach an HR diagram lab exercise.  Perhaps I can get the two of you to
collaborate on such a thing for the Summer 2024 meeting?  It would be nice
to get a poster or 10 minute talk for the Summer 2023 meeting.  We're in
Sacramento in 23, and Boston in 24.

Paul

On Wed, Nov 16, 2022 at 3:15 PM Gollmer, Steve via Phys-l <
phys-l@mail.phys-l.org> wrote:

Paul,
I use the SIMBAD astronomical database (https://simbad.unistra.fr/simbad/)
for an analysis project.  I had my students use R to make a query to the
database and pull down stars of different spectral classes.  Using
brightness and parallax data, I had them generate an H-R diagram.  Nothing
as nice as those generated by researchers, but enough to get the main
sequence, white dwarfs, and red giants.  I think there is a Python library
that gives you access to SIMBAD.  I rolled my own for R.  Gaia Archive (
https://gea.esac.esa.int/archive/) has a billion entries as opposed to
millions for SIMBAD.  I plan on adapting my project to this data in the
future, although the query procedure might not be the same.
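
The step from brightness and parallax to the vertical axis of an H-R diagram is the distance modulus: with parallax p in milliarcseconds the distance is d = 1000/p parsecs, and the absolute magnitude is M = m - 5 log10(d) + 5. A rough Python sketch of that conversion follows. It is not the R code described above, the function name is made up for illustration, and the two sample stars are invented numbers rather than catalog entries.

```python
import math

def absolute_magnitude(apparent_mag, parallax_mas):
    """Absolute magnitude from apparent magnitude and parallax in milliarcseconds.

    Distance in parsecs is d = 1000 / parallax_mas, so
    M = m - 5*log10(d) + 5 = m + 5*log10(parallax_mas) - 10.
    """
    if parallax_mas <= 0:
        # Zero or negative parallaxes carry no usable distance here.
        raise ValueError("parallax must be positive")
    return apparent_mag + 5.0 * math.log10(parallax_mas) - 10.0

# Invented (apparent magnitude, parallax in mas) pairs, not real stars.
stars = [(5.0, 100.0), (8.0, 10.0)]
for m, p in stars:
    print(m, p, absolute_magnitude(m, p))
```

Plotting M against a color index or spectral class, with the magnitude axis inverted so that bright stars sit at the top, gives the diagram.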

Steve



On Wed, Nov 16, 2022 at 12:38 PM Paul Nord <Paul.Nord@valpo.edu> wrote:

Lots of lip service is given to the term "Big Data" these days.
Increasingly many fields of physics find themselves with huge sets of
data.  Undergraduate students are not typically exposed to analysis of such
data sets unless they get involved in a research internship.  And even
then, in classic physicist style, students learn only enough about data
management to get the job done.  The data they analyze is very specific to
one physics field and to a particular research project.

*Is there one data set that every physics student should see?*  Some
complex set of data that requires searching, classifying, and summarizing
data with more advanced statistical tools.  It would need to be in a field
that is generally accessible to undergraduates.

That suggests that we might look to a kinematics domain.  But I could
imagine a couple of sets depending on the subfield of the instructor.
Astronomy might have one dataset while an undergraduate program that
emphasises materials science would use another.  The fundamental
pedagogical goals of teaching this analysis should be clear from the
results.  In whatever field students go into, they would feel prepared to
grapple with a large data set.


FOOTNOTE:  What Is Big Data?
As a simple definition of "Big Data" I'm going to say that it is any data
set that will not fit into an Excel Spreadsheet.  Excel allows 1M rows and
16k columns.  (That's cell XFD1048576 if you were curious.)  But I think we
are all aware that Excel breaks long before you fill up all of those
possible locations.  It may fail to load without lots of RAM.  And
recalculating anything complex in a large data set using Excel can be
painfully slow.  One is never quite sure if the computer is hung or if
Excel will finish in 5 minutes or 5 hours.  If your data set has between a
few thousand and a million data values we'll call that "Big Data."
Functionally it means that one will need to use a tool other than Excel to
analyze it.
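
The "1M rows and 16k columns" figures can be checked with a few lines of arithmetic: Excel column labels are a base-26 numbering with letters A through Z standing for 1 through 26. The helper name below is an assumption made up for this sketch.

```python
def column_number(letters):
    """Convert an Excel column label such as 'XFD' to its 1-based column index."""
    n = 0
    for ch in letters.upper():
        # Each letter is one base-26 digit, with A=1 ... Z=26.
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

print(column_number("XFD"), 2**14)   # 16384 16384
print(2**20)                         # 1048576
print(column_number("XFD") * 2**20)  # 17179869184
```

So XFD is column 2^14 and the last row is 2^20, which gives about 17 billion addressable cells on a single sheet.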

This definition is rather small still in the modern data-driven world.  An
itemized list of sales at your local supermarket for a month would likely
exceed this size.  Receipts from the major online retailers will include
orders of magnitude more transactions.  Astronomy surveys, daily weather
recordings, particle physics experiments, and many more scientific studies
are creating enormous data sets.  Even a simple sensor in the lab is
capable of generating thousands of data values per second.  Fortunately,
most physics lab experiments last less than 2 seconds.
=====

I'm working with the AAPT committee on labs to suggest that the
recommendations on curriculum include some data science concepts.  I
haven't yet convinced my colleagues to add this, but this idea is my little
part of the discussion.
Intro level: Make an organized table in Excel.
Advanced level: ...?
I'd like to write, "Understand 3rd Normal Form," but that doesn't mean
anything to physicists.  Nor does it motivate the need.

Happy to hear your thoughts.  Or, critique anything I've said.

Paul
_______________________________________________
Forum for Physics Educators
Phys-l@mail.phys-l.org
https://www.phys-l.org/mailman/listinfo/phys-l



--
Steven Gollmer
*Senior Professor of Physics*
Science and Mathematics
*Cedarville University*
o: 937-766-7764
https://stevegollmer.people.cedarville.edu/