Reading the Visual Mind


by Jim Schnabel

August 19, 2008

“Imagine a general brain-reading device that could reconstruct a picture of a person’s visual experience at any moment in time.”

So begins a remarkable paper in the March 20 issue of Nature by researchers at the University of California at Berkeley, who report the development of a visual decoder that uses functional magnetic resonance imaging (fMRI) brain-scan data to determine, from a large set, what image a person is viewing. Led by Kendrick Kay, a graduate student in the laboratory of psychologist Jack Gallant, the researchers suggest that techniques like theirs might someday be used “even to access the visual content of purely mental phenomena such as dreams and imagery.”

“Gallant and his colleagues advance the field of fMRI mind-reading,” says Stanford psychologist Brian Wandell, “because they base their predictions on a model of the neuronal networks in the visual cortex of individual subjects. They go on to use that model to make successful predictions about the fMRI response to any image.”

A huge task

Determining how the brain takes raw input from the optic nerves and generates the experience of seeing things is difficult. It amounts to the reverse-engineering of a mechanism that has evolved over hundreds of millions of years.

Researchers have long known that the visual cortex contains myriad hierarchical networks of neurons that sift through raw visual data for various features, including luminance, orientation, curves, colors and motion. But these networks are so complex that neuroscientific models of them have for the most part performed poorly.

In recent years, neuroscientists have implanted electrodes into the brains of cats to record the firings of visual neurons directly, generating videos of the scenes the cats are seeing, but the resulting images have been low-resolution and very “noisy” (carrying fluctuations in signal strength and lots of “extraneous” information in the signal stream). Other neuroscientists, using standard statistical techniques, have tried to correlate raw brain-scan data from people who viewed images with the characteristics of the images. Such techniques do not model the brain in any sophisticated way, and they have been successful only with test-pattern-type images, not the naturalistic images a person sees in ordinary life.

Connecting the voxels

“Imagine that we begin with a large set of photographs chosen at random,” Kay suggests in explaining how his decoder works. “You [the test subject] secretly select just one of these and look at it while we measure your brain activity. Given the set of possible photographs and the measurements of your brain activity, the decoder attempts to identify which specific photograph you saw.”

The process is based on a model of visual processing networks derived from the work of Gallant and others. Given an image’s basic characteristics, such as the arrangement of elements in the image field and their orientations, the model predicts some of the visual-neuron activity the image will stimulate in an average person. It would predict, for example, that a relatively complex, fine-grained image of an angular building would generate a significantly different brain-activity pattern from that for a soft-focused, misty beachscape.

Currently it is not possible to view a subject’s neurons directly using fMRI. The scanning technique detects only the relative activity of several thousand tiny volumes, or “voxels,” of a subject’s brain, each of which contains not one but many thousands of neurons.

The prediction of neuronal activity by the basic visual-processing model therefore has to be translated into a prediction of what an fMRI scan will show during that same neuronal activity. And because no two brains are precisely alike in their visual-processing architectures, this image-to-fMRI-data decoder also has to be “tuned” for each subject.

Kay himself was one of the two subjects in the study; the other was fellow student Thomas Naselaris. To train the decoder to match their brain-scan patterns to image characteristics, they first viewed 1,750 photographs of everyday objects and scenes while they underwent fMRI scans. Then they tested the decoder’s reliability on a completely new set of 120 images. For each of the two subjects, the decoder generated a prediction for what his fMRI voxel patterns would be when he viewed these images. The decoder then tried to find the best match, among its 120 predicted patterns, to the actual fMRI voxel patterns observed when the subject viewed the images.
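The matching step described above can be illustrated with a toy sketch. It is not the Berkeley team’s code: the voxel patterns here are random numbers, the 50-voxel size and noise level are arbitrary, and the use of Pearson correlation as the “best match” score is an assumption made for illustration. The function names pearson and identify are hypothetical.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def identify(measured, predicted_patterns):
    """Return the index of the predicted pattern most correlated with `measured`."""
    scores = [pearson(measured, p) for p in predicted_patterns]
    return max(range(len(scores)), key=scores.__getitem__)

# Synthetic stand-in for the model's output: one predicted voxel pattern
# per candidate image (120 images, 50 voxels; both sizes are illustrative).
rng = random.Random(0)
N_IMAGES, N_VOXELS = 120, 50
predicted = [[rng.gauss(0, 1) for _ in range(N_VOXELS)] for _ in range(N_IMAGES)]

# Simulate a noisy "measured" pattern for each image, then ask the
# decoder which of the 120 predictions it matches best.
correct = 0
for i, pattern in enumerate(predicted):
    measured = [v + rng.gauss(0, 0.5) for v in pattern]
    if identify(measured, predicted) == i:
        correct += 1

accuracy = correct / N_IMAGES
print(f"identified {correct} of {N_IMAGES} images ({accuracy:.0%})")
```

In this toy version the noise is mild, so nearly every image is identified; raising the noise level passed to rng.gauss makes the task harder in the way a real scanner would.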

In the end, the decoder correctly identified 110 out of the 120 images Naselaris viewed, an accuracy rate of 92 percent. For Kay’s images the decoder’s score was 86 out of 120, or 72 percent accurate. An average “chance” score would have been one out of 120, or 0.8 percent.
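For readers who want to check the arithmetic, the percentages follow directly from the counts reported in the article. The short sketch below simply redoes the division; the helper name percent is hypothetical.

```python
def percent(hits, total):
    """Express hits out of total as a percentage."""
    return 100.0 * hits / total

N_IMAGES = 120

naselaris = percent(110, N_IMAGES)  # 110 of 120 identified
kay = percent(86, N_IMAGES)         # 86 of 120 identified
chance = percent(1, N_IMAGES)       # guessing at random among 120 images

print(round(naselaris), round(kay), round(chance, 1))
```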

“That their model could predict the subject’s fMRI responses to a wide variety of novel images, and could do so reliably, does represent an impressive step forward,” says Wandell.

Boston University neuroscientist Frank Guenther, who works on the decoding of speech-controlling networks in the brain, describes the report by Kay and his colleagues as “very interesting work that will indeed apply to other cortical decoding problems.” Guenther adds that it is “amazing” that the Berkeley researchers’ decoder can effectively guess the viewed image from such a large set, “given the fact that each fMRI voxel contains hundreds of thousands of neurons that are effectively averaged together when determining the voxel’s activity level.”

Not a mind reader

In a way, the Berkeley team’s fMRI-based decoder worked too well—it raised concerns in the popular press about the prospect of machines that could intercept private thoughts remotely.

The technology to “read minds” has long been demonstrated in other areas of neuroscience research. Electroencephalogram-based devices that enable a paralyzed person to control a computer cursor with his or her thoughts have been in use for years. More recent experiments with higher-resolution fMRI techniques have enabled researchers, in tightly controlled clinical settings, to determine the private thoughts of subjects on matters that include product preferences, intentions to act, racial prejudices and sexual arousal in more or less real time. Some laboratories now hire out their fMRIs for private and corporate use as lie detectors.

“The brain doesn’t lie,” says Martin Paulus, a psychiatrist at the University of California, San Diego, who has worked with fMRI-based techniques to determine levels of drug addiction. “If I were to ask you, ‘Do you like to watch porn movies?’ you might say, ‘Well, no, of course not.’ And then I show you a porn movie, and your nucleus accumbens lights up, and I know that you’re not telling me the truth,” he says. “Therein lie some real ethical issues.”

Gallant agrees that “decoding brain activity could have serious ethical and privacy implications downstream.” But he also points out that there are many years’ worth of technical hurdles to be overcome before machines could covertly and remotely read people’s thoughts, and in the meantime even his decoder remains relatively primitive. For example, the decoder makes use of a relatively small set of image features.

“Our initial experiments used black and white, still images,” he says, “because including color and motion makes model estimation more difficult.”

The decoder also focused only on the primary visual cortex. Actual visual perception is known to involve not only “bottom-up” processing from primary sensory areas but also “top-down” inputs, essentially from memory areas, which tend to drive the perception of images into expected, readily recognizable forms. “A more general brain decoding device would certainly have to take these top-down effects into account,” Gallant says.

The largest set of technological hurdles comes from fMRI itself. “The fMRI technique is fairly noisy,” Gallant says, “because it doesn’t measure neural activity directly. Instead, it measures changes in blood flow indirectly caused by neural activity.”

Like a primitive camera system, fMRI can’t track rapid changes in brain activity. Several seconds must elapse, for example, between the presentation of an image to the subject and the appearance on fMRI of resulting blood flow changes. The tracking of real-world, fast-changing imagery would therefore be, at best, extremely difficult using such a technique.

Even with the static images used in the study, the low time resolution of fMRI meant that Kay’s team’s decoder could produce accurate results only after the subjects viewed the 120-image series multiple times, averaging their voxel data over each presentation of a given image to reduce the effects of fMRI-related noise. To get their 92 percent and 72 percent accuracy levels, Naselaris and Kay actually had to view each image 13 times.

By contrast, when each image was viewed only once, the decoding model predicted correctly only 51 percent and 32 percent of the time for the two subjects—and it did so after the fact, not in real time.
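Why repeated viewings help can be seen in a simple simulation. The sketch below is not a model of real fMRI data. It assumes, purely for illustration, that each viewing adds independent, bell-curve noise to a fixed voxel response. Under that assumption, averaging 13 viewings shrinks the noise by a factor of about the square root of 13. The signal and noise values are arbitrary.

```python
import math
import random
import statistics

rng = random.Random(1)
N_REPEATS = 13        # viewings per image, as in the study
NOISE_SD = 1.0        # assumed noise level of a single viewing (arbitrary)
TRUE_SIGNAL = 0.5     # assumed noise-free voxel response (arbitrary)
N_SIMULATIONS = 5000  # number of simulated voxels

single = []    # response from one viewing only
averaged = []  # response averaged over all 13 viewings
for _ in range(N_SIMULATIONS):
    trials = [TRUE_SIGNAL + rng.gauss(0, NOISE_SD) for _ in range(N_REPEATS)]
    single.append(trials[0])
    averaged.append(sum(trials) / N_REPEATS)

sd_single = statistics.stdev(single)
sd_averaged = statistics.stdev(averaged)
expected_ratio = 1 / math.sqrt(N_REPEATS)

print(f"noise after one viewing:  {sd_single:.2f}")
print(f"noise after 13 viewings:  {sd_averaged:.2f}")
print(f"ratio {sd_averaged / sd_single:.2f} vs. theoretical {expected_ratio:.2f}")
```

Real scanner noise is not perfectly independent from one viewing to the next, so the actual benefit of averaging may be smaller than this idealized figure suggests.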

Even those lower accuracy levels would have been impossible if the subjects had moved around within the fMRI housing during their viewings, thus blurring the voxel data. Outside the fMRI housing no recording could have taken place.

“Real-time decoding of dynamic brain activity will require new technology,” Gallant says.

In the meantime, he says, his lab will continue to develop the existing decoding model, not only to further the understanding of visual perception but perhaps also to enable devices to assist blind people.