
Re: Finding information on the Internet



On 04/05/2003 03:46 PM, Dwight Souder wrote:

> The "information superhighway" is jammed with garbage trucks! Things
> were so much easier when students got their information from the
> books in the library.

Were things really so much easier?

My experience has been that brick-and-mortar libraries
are jammed with non-authoritative information. I find
the web to be
-- much much better in some areas,
-- about the same in some areas, and
-- markedly worse only in a few rather odd areas.

An example of an odd area is information that is
out-of-date but not old enough or important enough
to be a classic. You can find this stuff in libraries
but usually not online. Few people care.

An example of an area where the web excels is medical
info. I can find good information that is simply
not available in 99+% of the world's libraries, and
that would require an all-day fishing expedition even
given access to an exceptional library.

> frustrated using the search engines and finding nothing but "junk".

With searching, as with many other activities,
doing it badly is easy. Doing it well requires
skill.

I've examined the logs of some search engines.
It is just amazing how unsophisticated the typical
query is. Surprisingly, the engine often returns
useful information in response to such a query,
performing far better than basic principles
(garbage in, garbage out) might lead you to expect.

Google in particular is famous for pioneering
methods for finding the most authoritative sites.
Before Google, most engines just returned the
"closest match" to the query, without trying very
hard to select authoritative sites.

Recommendation: Spend some time learning how to
use the search engine like a virtuoso. Read the
instructions, e.g.
http://www.google.com/help/

> Especially if you are doing research (on just about any
> topic...chemistry, physics or non-science related), where do you
> start and would recommend?

Let's do some examples.

If you want fundamental constants, you can try
http://www.google.com/search?q=constants

which leads immediately to
http://physics.nist.gov/cuu/Constants/
as it should.


Suppose you are interested in homeopathy. Start
with
http://www.google.com/search?q=homeopathy

The first 12 hits are all homeopathy partisan sites.
But the 13th is titled "homeopathy, the ultimate fake"
so that ought to raise your suspicions that there is
another side to the story, if you didn't already know
that. So try this:
http://www.google.com/search?q=homeopathy+fake+OR+quack
http://www.google.com/search?q=homeopathy+skeptic

That should get you all the information you need to
figure things out.

You see that a more-refined search cuts through the
junk. Often this is an iterative process. You can
start with a naive query, see what sort of junk
comes back, and then design a query to filter out
the worst of it. Then see what's left, and iterate.
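
The query-building step in that loop can be sketched in a few
lines of Python. This is only an illustration, assuming the same
http://www.google.com/search?q=... URL form used in the examples
above. The function name build_search_url and its parameters are
made up for this sketch, and it only builds the URL; it does not
fetch anything.

```python
from urllib.parse import urlencode

# Base URL taken from the examples in this message.
BASE = "http://www.google.com/search"


def build_search_url(required, any_of=()):
    """Build a search URL from required terms plus optional OR-alternatives.

    `required` and `any_of` are hypothetical parameter names for this sketch.
    """
    parts = list(required)
    if any_of:
        # Alternatives are joined with the OR operator, as in the
        # "fake OR quack" query above.
        parts.append(" OR ".join(any_of))
    # urlencode turns spaces into '+' and escapes anything unsafe.
    return BASE + "?" + urlencode({"q": " ".join(parts)})


# First pass: the naive query.
print(build_search_url(["homeopathy"]))
# Second pass: refined after seeing what came back.
print(build_search_url(["homeopathy"], any_of=["fake", "quack"]))
```

The second call reproduces the refined homeopathy query shown
earlier; each further iteration is just another call with a
different term list.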

I've been trying to think of a counterexample, namely
something that I know is out there, but is not readily
findable using a modest level of search skill. The
only example I can think of is something that has
been up for only a few days and hasn't been spidered
yet.

Does anybody have a specific counterexample we could
discuss?

> We've been going over some things to look for on a web-site
> to try to judge if the source is reliable.

From the keen-grasp-of-the-obvious department (tm):
you look at the _content_ and see if it makes sense.

For physics issues, that shouldn't be too hard. If
it is just PbBA (Proof by Bold Assertion) you
should give it little weight compared to a site that
explains things in enough detail that you can
rederive the results.

Then you start cross-checking. Collect N documents
and see if they agree. You will have to go to the
trouble of understanding what they are saying.
This will require an effort for the first document,
but the other N-1 should be easy; just skim them
looking for inconsistencies. If all N documents
agree with each other, and with your common sense,
then you probably aren't risking much if you go
along with them. If they disagree, you will have to
reproduce the critical steps and see who got it right.

For non-scientific questions, if you just want to
find out what's popular without any notion of right
versus wrong, then Google has already done that for
you. It's unlikely that you can do better, unless
you have vast resources at your disposal. For info
on Google's PageRank algorithm, try:
http://www.google.com/search?q=pagerank
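
For a rough feel of the idea behind link-based ranking, here is
a minimal sketch of the textbook power-iteration form of PageRank
in Python. It is not Google's actual implementation. The tiny
link graph, the function name, the damping factor of 0.85 and the
iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    All names and defaults here are illustrative assumptions.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere

    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly over its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: a and b both link to c, c links back to a.
example_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
for page, score in sorted(pagerank(example_links).items()):
    print(page, round(score, 3))
```

In this toy graph the page with the most incoming links ends up
with the highest score, which is the sense in which the ranking
measures popularity rather than correctness.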