On Being Certain

Continuing the theme of books that have implications for meaning extraction, this post comments upon On Being Certain by Robert Burton, M.D.  Burton is the associate chief of the Department of Neurosciences at Mt. Zion-University of California Hospital.  (His website is rburton.com.)

In On Being Certain, Burton addresses the question of how we know something.  How do we know that the sky is blue, that the traffic light has turned green, that a flash of light is from a camera and not from an explosion?   Burton’s writing style is concise, relaxed, entertaining, and often quite unexpectedly funny.  He strikes a very nice balance on the continuum between being too technically detailed and not being scientific enough.  I highly recommend this book to anyone interested in neuroscience, artificial intelligence, text analytics, or meaning extraction.

The Feeling of Knowing

Burton opens On Being Certain with a discussion of how the feeling of knowing affects our mental state.  On page 5 (in the paperback edition) he provides a really cool exercise to demonstrate this, which I am going to reproduce here:

“Don’t skim, give up halfway through, or skip to the explanation.  Because this experience can’t be duplicated once you know the explanation, take a moment to ask yourself how you feel about the paragraph.  As you do so, please pay attention to the shifts in your mental state and your feelings about the paragraph.

“A newspaper is better than a magazine.  A seashore is a better place than the street.  At first it is better to run than to walk.  You may have to try several times.  It takes some skill, but it is easy to learn.  Even young children can enjoy it.  Once successful, complications are minimal.  Birds seldom get too close.  Rain, however, soaks in very fast.  Too many people doing the same thing can cause problems.  One needs lots of room.  If there are no complications, it can be very peaceful.  A rock will serve as an anchor.  If things break loose from it, however, you will not get a second chance.

“Is this paragraph comprehensible or meaningless?  Feel your mind sort through potential explanations.  [David Seuss note inserted here:  are you uncomfortable as you do so?  Do you feel something is wrong as you read the paragraph and you can’t put your finger on it?  When I read it the first time I felt disjointed, off balance, out of kilter – and I wanted to hurry through it as fast as I could to remove the sensation of struggle.]  Now watch what happens with the presentation of a single word: kite.  As you reread the paragraph, feel the prior discomfort of something amiss shifting to a pleasing sense of rightness.  Everything fits; every sentence works and has meaning.”

It occurs to me that Burton’s compelling exercise captures the experience of researchers facing the task of using traditional search engines to gain command of a body of knowledge.  Every one of the thousands of documents returned from a search is a little bit of relevant information, just as each sentence in the example passage above is a little bit of relevant information.  But the user who doesn’t know what it all means, what the overall themes are, is left to flail away trying to synthesize the individual bits into an overall picture.  It is a struggle in part because there are so many documents, so many bits of information, and all of it is uncorrelated.  Business professionals doing market research or technical research with traditional search engines often experience the feelings of disjointedness and unease that manifest themselves when we know we don’t know.  Wouldn’t it be marvelous if the search engine could read the passage in advance, figure out the unifying idea, and whisper in our ear, “kite,” before we had to deal with the words ourselves?  That is what meaning extraction-enabled search engines strive to do.

The Committee In Your Brain

Further on, in the chapter on neural networks, Burton discusses how the brain interprets sensory data, a topic that is also relevant to meaning extraction.  Generally, I hate analogies in scientific writing because of my suspicion that their use is often founded on the author’s belief that I, the reader, can’t understand the actual information, data, analysis, or reasoning.  (It is as if the author is saying: let me dumb it down for you.)  However, Burton’s writing style is so self-deprecating and full of goodwill, and the analogy in this case so entertaining, that I found myself engaged by it.  Burton postulates that there is a sudden flash of light.  How does the brain decide whether to notice it, and how does the brain interpret what it is?  He suggests that the flash doesn’t travel a direct route from the optic nerves to consciousness, but first “goes to a holding station where it is scrutinized, evaluated, and discussed by a screening committee representing all of your biological tendencies and past experiences.  This committee meets behind closed doors, operating outside of consciousness in the hidden layer.”

Each committee member is a set of neural connections.  One represents a childhood memory of the sudden onset of an electrical fire after the flash of an electrical short in a toaster.  Another represents a sensitivity to general alarms about the possibility of terrorism.  The third committee member is a composite memory of rock concerts.  The fourth is a genetically based predisposition toward a heightened startle reflex to bright lights.  As Burton explains, “Each committee member has his own opinion and gets one vote.  After hearing all the arguments, each committee member casts his vote and they are tallied (weighted)…The childhood memory votes yes: send the flash into awareness.  The terrorist alarm network, fearing that the flash could indicate an explosion, votes yes.  The rock concert memory is blasé, has seen the same flashes a zillion times at rock concerts…The genetic predisposition votes yes.  The third member is outvoted and the flash is sent on high priority into consciousness.”

In this manner, the brain decides whether to pass the stimulus into conscious awareness (perhaps with interpretive tags like: Fire! Bomb! Danger!  Pay attention! Duck! Run!)  or to suppress the stimulus altogether so that you are not even consciously aware of the flash of light.
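Burton’s metaphor maps neatly onto a weighted-voting scheme.  Here is a toy Python sketch of the committee; the member names, weights, and tallying rule are my own inventions for illustration, not anything Burton specifies:

```python
# A toy model of Burton's "committee": each member is a set of neural
# connections with a weight and a yes/no vote on whether the flash of
# light should be sent into conscious awareness.  Members and weights
# are invented for illustration; Burton describes an idea, not code.

committee = {
    "toaster_fire_memory":    {"weight": 1.0, "vote": 1},  # yes: could be a fire
    "terrorism_alarm":        {"weight": 1.2, "vote": 1},  # yes: could be a bomb
    "rock_concert_memory":    {"weight": 0.8, "vote": 0},  # blasé: seen it all before
    "startle_predisposition": {"weight": 1.5, "vote": 1},  # genetic yes
}

def reaches_consciousness(committee):
    """Tally the weighted votes; the flash is noticed if yes outweighs no."""
    yes = sum(m["weight"] for m in committee.values() if m["vote"] == 1)
    no = sum(m["weight"] for m in committee.values() if m["vote"] == 0)
    return yes > no

print(reaches_consciousness(committee))  # True: sent on high priority
```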

After a stressful and perhaps embarrassing moment (if, for example, you dove for cover), your consciousness processes the stimulus along with the interpretations, and it turns out that you are at a wedding and the flash was simply from a camera.  The rock concert committee member will have enhanced status now, and the other committee members will raise the weight they give him going forward.  Plus, I imagine, there is a new member of the committee now, the wedding-memory member, who will have the attitude that flashes are no big deal.  Burton explains, “The next time a similar flash is received, the committee reminds each of the members that last time was a false alarm.  Some of the committee members who previously voted yes now feel sheepish and don’t vote.  The committee votes to nearly completely suppress the image.  The genetic predisposition is ignored.  So you barely notice the flashbulbs going off while you watch your child play Elmer Fudd in his grammar school play.”

In physical terms, the committee members are bundles of neurons.  What is happening is that bundles of neurons representing specific memories, interpretations, skills, or predispositions have stronger or weaker ties to other bundles.  The strength of these connections influences the weight one bundle gives another.  As experiences build up, the connections between some bundles of neurons get stronger and others get weaker.  In this way the brain is literally rewiring itself all the time, with the effect that we learn.
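Continuing the toy sketch above (the update rule is my own invention, meant only to mimic the flavor of the reweighting Burton describes), the wedding false alarm might look like this:

```python
def record_outcome(committee, was_real_threat):
    """After the fact, raise the weight of members who called the event
    correctly and lower the weight of those who did not -- a crude
    stand-in for connections between neuron bundles strengthening and
    weakening with experience."""
    correct_vote = 1 if was_real_threat else 0
    for member in committee.values():
        if member["vote"] == correct_vote:
            member["weight"] *= 1.1   # enhanced status going forward
        else:
            member["weight"] *= 0.5   # feels sheepish, counts for less

# The flash was just a wedding photographer, so the blasé rock concert
# member gains weight and the alarmist members lose it; the next similar
# flash is largely suppressed.
record_outcome(committee, was_real_threat=False)
```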

There are two implications of this discussion of the brain’s committee process for interpreting data from the outside world.  First, observe that for any interesting problem there is rarely only one possible interpretation.  The flash could be a fire from an electrical spark, a bomb, a flashbulb, or something we haven’t seen before.  On the first pass, all interpretations are invoked.  For a meaning extraction-enabled search engine, we see an analog to this in that sentences or even documents may be interpreted as meaning different, even opposite, things.  We may see a discussion in one news story of how commodity prices rose last quarter, adversely affecting profits, while another news story discusses how commodity prices fell in a prior quarter, raising profits.  The meaning extraction-enabled search engine may surface scenarios like “higher commodity prices have decreased profits” as well as “lower commodity prices have increased profits.”  Well, you ask, which one is it?

This is where the second similarity comes in: weight.  While a search engine doesn’t have bundles of neurons that strengthen or weaken ties, it can count the number of documents in which a scenario of interpretation is found, and that count gives the user all that is needed to select the scenario most likely to be the best one.  For example, in recent financial news the body of stories will heavily skew toward explaining the most recent quarter, and articles about higher commodity prices decreasing profits will be far more numerous than those that give the opposite explanation for the prior quarter.
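A minimal sketch of that counting approach, again in Python (the documents and scenario tags here are invented; a real meaning extraction pipeline is of course far more involved):

```python
from collections import Counter

# Pretend each search-result document has been tagged with the scenarios
# extracted from it.  These tags are invented for illustration.
doc_scenarios = [
    ["higher commodity prices have decreased profits"],
    ["higher commodity prices have decreased profits"],
    ["higher commodity prices have decreased profits"],
    ["lower commodity prices have increased profits"],
]

# Weight each scenario by the number of documents it appears in.
tally = Counter(s for doc in doc_scenarios for s in set(doc))
for scenario, count in tally.most_common():
    print(f"{count:2d}  {scenario}")
# 3  higher commodity prices have decreased profits
# 1  lower commodity prices have increased profits
```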

Meaning extraction doesn’t have to get it right at the level of every sentence, paragraph, or even document, so long as we get it right at the level of the body of documents representing the entire search result.  We can weight the competing scenarios just as the committee in your brain weighs its judgments.

The Future

There is one aspect of Robert Burton’s description of how the brain’s committee works that is beyond our current capabilities in meaning extraction: the spontaneous adjustment of the interpretation criteria in response to both accurate and inaccurate calls.  At the current state of the art, meaning extraction systems represent human understanding with rules for interpreting text.  For example, a human domain expert adds the rule: if the algorithm finds “commodity price increase” near “falling profits,” infer “higher commodity prices have decreased profits.”  Meaning extraction-enabled search engines are not neural networks, which can at least in theory adjust themselves; they are closer to expert systems that apply rules developed and maintained by human domain experts.
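To make the expert-system flavor concrete, here is a bare-bones sketch of such a rule as a proximity pattern (a hypothetical regular expression of my own; production systems use much richer linguistic rules than this):

```python
import re

# Hypothetical hand-written rule: if a commodity-price-increase phrase
# occurs within ~50 characters of a falling-profits phrase, infer the
# scenario.  Invented for illustration only.
RULE = re.compile(
    r"(commodity price increase|rising commodity prices)"
    r".{0,50}?"
    r"(falling profits|profits fell)",
    re.IGNORECASE,
)

def apply_rule(text):
    """Return the inferred scenario, or None if the rule does not fire."""
    if RULE.search(text):
        return "higher commodity prices have decreased profits"
    return None

print(apply_rule("Rising commodity prices led to falling profits last quarter."))
# higher commodity prices have decreased profits
```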

Neural networks have not worked well in practice for large-scale applications, for a variety of reasons most often relating to scalability.  For example, we can efficiently apply a set of rules to a body of 30 million journal articles to extract scenarios of meaning, while a tiny fraction of that number will defeat a neural network running on high-end enterprise servers.  (We humans don’t have anything remotely close to the brain’s computing capacity in our IT infrastructure.)  So the idea of a self-learning meaning extraction-enabled search engine is still out there, beyond what is practical right now.

But in the 50-year history of the software industry, there is one lesson that software pioneers have learned over and over: if the hardware can’t do it now, just wait a little while.