Rapid discrimination of visual patterns


IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, VOL. SMC-13, NO. 5, SEPTEMBER/OCTOBER 1983 857

ACKNOWLEDGMENT

The author would like to thank Prof. Kenji Hiwatashi for his encouragement during this work.

REFERENCES

[1] J. R. Anderson and G. H. Bower, Human Associative Memory. Washington, DC: Winston, 1973.

[2] N. V. Findler, Ed., Associative Networks: Representation and Use of Knowledge by Computers. New York: Academic, 1979.

[3] T. Kohonen, Associative Memory: A System-Theoretical Approach. New York: Springer-Verlag, 1978.

[4] J. A. Anderson, "Neural models with cognitive implications," in Basic Processes in Reading: Perception and Comprehension, D. LaBerge and S. J. Samuels, Eds. Hillsdale, NJ: Erlbaum, 1977.

[5] Y. Hirai, "A template matching model for pattern recognition: Self-organization of template and template matching by a disinhibitory neural network," Biol. Cybern., vol. 38, pp. 91-101, 1980.

[6] Y. Hirai, "A learning network resolving multiple match in associative memory," in Proc. 6th Int. Conf. Pattern Recognition, 1982.

[7] G. Palm, "On associative memory," Biol. Cybern., vol. 36, pp. 19-31, 1980.

[8] T. Kohonen and M. Ruohonen, "Representation of associated data by matrix operations," IEEE Trans. Comput., vol. C-22, pp. 701-702, 1973.

[9] T. Kohonen and E. Oja, "Fast adaptive formation of orthogonalizing filters and associative memory in recurrent networks of neuron-like elements," Biol. Cybern., vol. 21, pp. 85-95, 1976.

[10] V. Chan-Palay, Cerebellar Dentate Nucleus: Organization, Cytology and Transmitters. New York: Springer-Verlag, 1977.

[11] M. Ito, M. Sakurai, and P. Tongroach, "Climbing fiber induced depression of both mossy fiber responsiveness and glutamate sensitivity of cerebellar Purkinje cells," J. Physiol., vol. 324, pp. 113-134, 1982.

Rapid Discrimination of Visual Patterns

JAMES R. BERGEN AND BELA JULESZ

Abstract—Experiments involving the rapid discrimination of visual patterns are used to infer the spatial information available to an observer within the first few hundred ms of inspection. Eye movements are prevented by a very brief presentation of the stimulus, and the inspection interval is terminated by the presentation of a masking pattern. It is shown that detection of a single vertical target line segment, embedded in an array of differently oriented background segments, improves as the mask delay increases. This improvement is rapid if the difference in angular orientation between the target and background segments is large, but it becomes much slower as this difference is reduced. For a 90° orientation difference, reliable detection of the target is obtained in about 60 ms, while for a 20° difference over 200 ms is required. Reducing the area in which the target may lie reduces the inspection time required to determine the target's presence or absence. The phenomena are invariant under changes of spatial scale within the fovea and parafovea. These results are interpreted in the context of a model in which the diameter of the area that can be searched in parallel is proportional to the distance in a feature space between the target and background elements. The geometry of this feature space is similar to the functional architecture of the visual cortex. A theory of texture perception based on a qualitative "all-or-none" feature space of "textons" is described in [1]. In this paper we propose a quantitative model and show that it is essentially equivalent to the previous theory in the limiting case of very large feature differences.

Manuscript received September 21, 1982; revised July 20, 1983. B. Julesz is with Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974. J. R. Bergen was with Bell Laboratories. He is now with the David Sarnoff Center, RCA, Princeton, NJ 08540.

WHEN A VISUAL pattern is inspected under normal viewing conditions, we make frequent shifts of fixation, seldom staring at a single point for more than a few tenths of a second. Consequently, our impression of the pattern is the result of a sequence of different retinal images. Under experimental conditions, this kind of inspection by eye movements can be prevented by using very briefly presented stimuli that disappear before the observer has time to initiate an eye movement. Since it takes well over 100 ms to generate a voluntary eye movement, this condition is easily met. Because of visual persistence, however, even a very briefly presented stimulus remains available for inspection for about 300 ms. Thus, to make the inspection interval shorter than this, the test stimulus must be followed by an erasing (masking) stimulus that overwrites the afterimage of the test. The experiments discussed in this paper use this technique of temporal restriction to determine the type of information extracted in very rapid visual inspection.

The experiments that will be described require the observer to discriminate among a number of different stimulus conditions. One type of stimulus is shown in Fig. 1. Here the observer must decide whether the array of line segments consists entirely of horizontal ("background") segments, as shown in Fig. 1(a), or whether one (the "target") is

0018-9472/83/0900-0857$01.00 ©1983 IEEE


Fig. 1. (a) Stimulus array consisting entirely of horizontal line segments. (b) Stimulus array with one vertical "target" line segment. (c) Masking array. These stimuli were presented on a CRT display on which the lines were white against a dark background. The total angular subtense of the array is 15° when viewed from a distance of 1 m.

vertical, as in Fig. 1(b). The position of the vertical segment, if present, is unknown. Under free viewing, the observer responds to this task by rapidly foveating the target. It is of interest, therefore, that even if the stimuli are presented for only 15 ms, precluding any possibility of eye movements, observers discriminate between the two stimulus conditions with complete accuracy. As Fig. 2 shows, if the inspection interval is terminated by a 60-ms masking pattern (Fig. 1(c)), essentially error-free discrimination is obtained unless the inspection time falls below about 60 ms for one observer (JRB) and about 100 ms for the other (AMB). This interobserver variation seems to be quite consistent from one task to another. The remaining data are all for observer JRB unless otherwise noted. This result shows that the eye movement that foveates the target is not part of a scanning process in search of the target; rather, it occurs well after the target has been detected. Note that the detectability of the target goes from chance (50 percent) to perfect (100 percent) within just a few tens of milliseconds. If we decrease the difference in angular orientation between the target and background elements, the response curve changes shape.
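The temporal-restriction procedure can be summarized as a trial timeline. The following Python sketch uses the 15-ms test and 60-ms mask durations quoted above and treats the roughly 300-ms visible persistence as a hard ceiling on the inspection interval. That sharp ceiling, the class name `Trial`, and its fields are illustrative assumptions, not the authors' display software.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Approximate visible persistence of an unmasked brief stimulus (from the text);
# treating it as a sharp cutoff is a simplifying assumption.
PERSISTENCE_MS = 300.0


@dataclass
class Trial:
    """One masked (or unmasked) presentation; SOA is mask onset minus test onset."""
    soa_ms: Optional[float]          # None means no mask is shown
    test_duration_ms: float = 15.0   # brief test flash, too short for an eye movement
    mask_duration_ms: float = 60.0   # masking array of Fig. 1(c)

    def __post_init__(self) -> None:
        if self.soa_ms is not None and self.soa_ms < self.test_duration_ms:
            raise ValueError("mask must not start before the test flash ends")

    def events(self) -> List[Tuple[str, float]]:
        """Onset/offset times in ms relative to test onset."""
        ev = [("test_on", 0.0), ("test_off", self.test_duration_ms)]
        if self.soa_ms is not None:
            ev.append(("mask_on", float(self.soa_ms)))
            ev.append(("mask_off", float(self.soa_ms) + self.mask_duration_ms))
        return ev

    def inspection_interval_ms(self) -> float:
        """Time available for inspection: cut off by the mask, else by persistence."""
        if self.soa_ms is None:
            return PERSISTENCE_MS
        return min(float(self.soa_ms), PERSISTENCE_MS)


print(Trial(soa_ms=60).events())
print(Trial(soa_ms=60).inspection_interval_ms(), Trial(soa_ms=None).inspection_interval_ms())
```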

As shown in Fig. 3, we keep the target element vertical but change the orientation of the background elements. Fig. 4 compares how detectability increases with inspection time for the 90° orientation difference and for smaller differences. As the angular difference is reduced, the slope of the function becomes less steep. When the difference is only 20°, about 200 ms is required before the asymptotic level is reached, while for a 10° difference the afterimage has apparently decayed below a useful level before reliable detection of the target is achieved.

[Fig. 2 plot: percent correct vs. SOA in milliseconds (60 to 180 ms) for subjects AMB and JRB.]

Fig. 2. Data showing probability of correct discrimination between the two stimulus conditions as a function of mask delay or stimulus onset asynchrony (SOA). Data for two subjects are shown. The angular difference between target and background elements is 90°.

[Fig. 3: stimulus arrays of oblique background segments, each containing one vertical target; panels (a) to (c).]

Fig. 3. Stimulus arrays similar to those in Fig. 1 in which the orientation of the background elements has been varied. The background orientations are as follows: (a) 45°. (b) 20°. (c) 10°. Note that the target element is always kept vertical.

Since the target is the same in all cases, we cannot attribute this variation in the time required to detect it to differences in the detectability of different orientations. We are left with the implication that small differences in angular orientation of the line segments take longer to detect than large ones.


[Fig. 4 plot: percent correct vs. SOA in milliseconds (0 to 300 ms), one curve for each background orientation: 90°, 75°, 60°, 45°, 30°, 20°, and 10°.]

Fig. 4. Data analogous to those shown in Fig. 2, measured with stimuli such as those in Fig. 3. The background orientations used are 90°, 75°, 60°, 45°, 30°, 20°, and 10°. The 90° data are redrawn from Fig. 2.

This result, on the surface, is not very surprising. The opposite would certainly be counterintuitive, and arbitrarily small differences, which are invisible, must take "infinitely long" to detect. However, we are left with the question of why clearly visible small differences take longer to find. One type of intuitive interpretation is as follows: "you have to look more closely to see small differences, so searching the space takes longer." We must remember, of course, that searching by eye movements cannot take place under our experimental conditions, so if we wish to pursue this class of explanation we need to consider carefully what we mean by "look more closely" and "searching the space." Another class of interpretations proposes that the difference between the responses to lines of similar orientation is smaller and thus more likely to be lost in noise internal to the nervous system. This explains why large differences are more reliably detected than small ones when viewed under comparable conditions, i.e., for the same time, but it requires additional assumptions to explain why detectability increases with time.

The idea that seeing smaller differences requires a more detailed inspection of the response space may be formulated more precisely as follows. Imagine a three-dimensional space in which the responses to our experimental stimuli occur. Let two of the dimensions correspond to retinal location and the third to angular orientation. In this space, the spatial position and orientation of the line segment generating a particular response are completely defined by the position of that response in the response space. Thus the task of deciding whether one segment has a different orientation from the others becomes one of determining whether one response is separated from the others along the orientation axis. This suggests that the ability to perform the task depends on the level of positional resolution within the response space. For large orientation differences only crude positional information is required, while for smaller differences more accurate localization is necessary to decide whether there is a difference in orientation. This leaves us with the question of why it takes longer to obtain more accurate positional information. It should be emphasized that the resolution we are considering here is for the localization of the response within the response space. It is unrelated to acuity, which depends on sensitivity to high spatial frequencies.
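The response-space formulation can be made concrete with a small Python sketch. Each response is a point (x, y, θ), and orientation is treated as periodic with a period of 180°, since a line segment rotated by 180° is unchanged. The tolerance `resolution_deg` is a hypothetical stand-in for positional resolution along the orientation axis; the function names are illustrative and not taken from the paper.

```python
from typing import List, Optional, Tuple

# A response in the three-dimensional response space: (x, y, orientation in degrees).
Response = Tuple[float, float, float]


def orientation_distance(a_deg: float, b_deg: float) -> float:
    """Separation of two line orientations; orientation repeats every 180 degrees."""
    d = abs(a_deg - b_deg) % 180.0
    return min(d, 180.0 - d)


def find_orientation_outlier(responses: List[Response],
                             resolution_deg: float) -> Optional[int]:
    """Index of the single response separated from every other response along the
    orientation axis by more than `resolution_deg`; None if there is not exactly one.

    `resolution_deg` is a hypothetical stand-in for positional resolution along the
    orientation axis; x and y are carried along but not used in this decision.
    """
    candidates = [
        i for i, (_, _, th_i) in enumerate(responses)
        if all(orientation_distance(th_i, th_j) > resolution_deg
               for j, (_, _, th_j) in enumerate(responses) if j != i)
    ]
    return candidates[0] if len(candidates) == 1 else None


# Vertical target (90 deg) among six horizontal (0 deg) background responses.
display = [(0.0, 0.0, 90.0)] + [(float(i), 0.0, 0.0) for i in range(1, 7)]
print(find_orientation_outlier(display, resolution_deg=45.0))  # 0
```

With a 90° difference even a coarse resolution such as 45° isolates the target; with a 20° difference (background at 70°) the same coarse resolution finds nothing, and only a finer value such as 10° separates the target, mirroring the argument above.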

Returning to the metaphor of needing to look more closely, we may now restate this idea as follows: "in order to obtain more accurate positional information, the volume being inspected must be reduced." In other words, the coarseness of the available positional information varies directly with the volume of response space being considered. If only crude positional resolution is required, a large region of the space may be inspected simultaneously, i.e., in parallel. If better positional resolution is required, the size of this region must be reduced, and serial inspection of different areas of the response space may become necessary, thus increasing the time required.

The essence of these intuitive observations can be for­malized in two assumptions:

1) the extent of the region in the response space that can be inspected in parallel is proportional to the positional resolution required to discriminate between local features; and

2) this positional resolution cannot be independently varied for a single dimension.

Thus, the uncertainty in position for any response is the same along all dimensions, subject to normalization. The region inspected in parallel is often called the "aperture of attention." Since by assumption 2) each axis of the space has the same resolution, by assumption 1) the extent of the aperture of attention must be the same in all directions. Thus the shape of the N-dimensional "sphere of attention" can be described by a single diameter. Since the term "attention" has higher level cognitive connotations, we prefer to refer to it simply as "the region inspected in parallel."
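Assumptions 1) and 2) can be turned into a rough count of serial steps. The Python sketch below is a hedged illustration rather than the authors' model: "search area" and "feature distance" are in arbitrary normalized units, the proportionality constant `c` is a hypothetical free parameter, and the count ignores the overlap real coverings need, so it is an idealized lower bound.

```python
import math


def inspections_needed(search_area: float, feature_distance: float,
                       c: float = 1.0) -> int:
    """Idealized lower bound on the serial inspections needed to cover a search area.

    Hypothetical assumptions: (1) the region inspected in parallel has a diameter
    proportional to the target-background feature distance, d = c * feature_distance;
    (2) the same normalized diameter applies along every axis, so in the two spatial
    dimensions each inspection covers a disc of area pi * d**2 / 4. Overlap between
    discs is ignored.
    """
    if search_area <= 0 or feature_distance <= 0 or c <= 0:
        raise ValueError("arguments must be positive")
    d = c * feature_distance
    disc_area = math.pi * d * d / 4.0
    return max(1, math.ceil(search_area / disc_area))


for delta in (20.0, 10.0, 5.0):
    print(delta, inspections_needed(search_area=100.0, feature_distance=delta))
```

Under these assumptions the number of inspections grows roughly as the inverse square of the feature distance and shrinks with the search area, which is the qualitative trade-off that the spatial-restriction experiment probes.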

In all of the experiments described thus far, the target may appear at any of the 30 locations in the outer two shells of the stimulus field. One way or another, the region of the stimulus space corresponding to all possible target locations must be inspected. If the possible target positions are restricted to a smaller region, the model just described predicts an improvement in performance. Fig. 5 compares the 90°, 45°, and 30° data from Fig. 4 with those obtained using the same orientations but with the target restricted to the 12 positions of the second shell. The 30° and 45° data show an improvement in performance in the restricted condition that is best described as a shift of about 20 ms toward shorter times. The 90° data also show an improvement for some inspection times; however, no change in performance is seen at the shortest time (30 ms), so the change in curve shape is better described as a steepening. One interpretation of this difference is that there is a minimum time required for the response being utilized to build up. The extremely rapid rise in performance between 30 and 45 ms suggests that some such absolute limit is being reached.
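The shell counts quoted above are consistent with a regular hexagonal lattice, in which shell s contains 6s positions: 12 in the second shell, and 12 + 18 = 30 in the second and third shells together. The Python sketch below generates such shells using axial hexagonal coordinates. The element spacing and any positional jitter in the actual displays are not given in the text, so the unit spacing here is an assumption.

```python
import math
from typing import List, Tuple


def hex_shell(s: int, spacing: float = 1.0) -> List[Tuple[float, float]]:
    """Cartesian positions of the lattice sites at hexagonal distance s from the center.

    Shell 0 is the single center site; shell s >= 1 has 6 * s sites.
    """
    if s < 0:
        raise ValueError("shell index must be non-negative")
    if s == 0:
        return [(0.0, 0.0)]
    # The six neighbor steps in axial (q, r) coordinates, in ring order.
    directions = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]
    q, r = -s, s  # start s steps out along direction (-1, 1)
    sites = []
    for dq, dr in directions:
        for _ in range(s):
            # Axial -> Cartesian for a lattice with nearest-neighbor distance `spacing`.
            sites.append((spacing * (q + r / 2.0), spacing * (r * math.sqrt(3) / 2.0)))
            q, r = q + dq, r + dr
    return sites


outer_two = hex_shell(2) + hex_shell(3)   # unrestricted target positions
restricted = hex_shell(2)                 # restricted condition of Fig. 5
print(len(outer_two), len(restricted))    # 30 12
```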

We can summarize these results by saying that there is a trade-off between the angular difference separating the target and background elements and the size of the area that must be searched. Reducing the difference in angular orientation increases the time required to detect the target, while restricting the locations in which it may occur decreases the time.

[Fig. 5 plot: percent correct vs. SOA in milliseconds (30 to 210 ms) for background angles of 90°, 45°, and 30°, with the target in shell 2 only or in shells 2+3.]

Fig. 5. Data showing the effects of spatial restriction of the target. The open symbols show the 90°, 45°, and 30° data from Fig. 4. The filled symbols show measurements made using the same background orientations but with the target restricted to twelve locations in the second hexagonal shell.

[Fig. 6: texture composed of randomly oriented L, T, and + elements, with an embedded square region in each quadrant.]

Fig. 6. Demonstration of texture segregation. For description, see text.

The next set of data that we shall examine concerns stimuli that are somewhat more complicated than the line segments used thus far. These consist of pairs of perpendicular line segments that meet in one of the following three ways: at one end, forming an "L"; with the end of one line touching the midpoint of the other, forming a "T"; and by crossing in the middle, forming a "+" or an "X." We can refer to this last element either way because in this set of experiments the angular orientations of all elements are varied at random. When these elements are used as target and background elements in experiments such as those discussed above, we obtain behavior similar to what we have already seen. Before examining this in detail, however, we consider the phenomenon displayed in Fig. 6.
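The three element types can be specified as pairs of perpendicular segments. In this Python sketch the two segments are assumed to have equal length (the text does not give the dimensions), and each element is rotated by a uniformly random angle, matching the statement that orientations were varied at random. The L and T are built from identical segments and differ only in where the segments meet.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def make_element(kind: str, length: float = 1.0) -> List[Segment]:
    """Two perpendicular segments of equal (assumed) length, before rotation."""
    h = length / 2.0
    if kind == "L":  # segments meet at one end
        return [((0.0, 0.0), (length, 0.0)), ((0.0, 0.0), (0.0, length))]
    if kind == "T":  # end of one segment touches the midpoint of the other
        return [((-h, 0.0), (h, 0.0)), ((0.0, 0.0), (0.0, -length))]
    if kind == "+":  # segments cross at their midpoints ("+" or "X")
        return [((-h, 0.0), (h, 0.0)), ((0.0, -h), (0.0, h))]
    raise ValueError(f"unknown element kind: {kind!r}")


def rotate(segments: List[Segment], angle_deg: float) -> List[Segment]:
    """Rotate every segment endpoint about the origin."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [tuple((c * x - s * y, s * x + c * y) for x, y in seg) for seg in segments]


def random_element(kind: str, rng: random.Random) -> List[Segment]:
    """Element at a uniformly random orientation."""
    return rotate(make_element(kind), rng.uniform(0.0, 360.0))


rng = random.Random(1)
print(random_element("T", rng))
```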

Here we have a visual texture made up of the three elements just described. Notice that in the upper two quadrants of the figure there are highly conspicuous embedded square regions composed of elements different from those surrounding them. The lower two quadrants also contain embedded regions differing from their surrounds, but these are not at all conspicuous. This is not to say that they are hard to see if they are looked for, but the rapid segregation of an embedded square does not occur. Julesz [2] has studied this phenomenon and attributes it to the differential occurrence, in the segregating regions, of particular local features termed "textons." Treisman and Gelade [3] have also described the effects of combinations of color and shape differences on the occurrence of segregation. For the moment we simply note that the boundary between the + and L regions is very distinct, while the boundary separating the T and L regions is less so. The boundary between the + and T regions is intermediate in strength.

[Fig. 7 plot: percent correct (50 to 100) vs. SOA in milliseconds (100 to 500 ms).]

Fig. 7. Results of discrimination experiment using elements from Fig. 6. The target in both cases is a randomly oriented L. The filled and open symbols show measurements when the background elements are randomly oriented +'s and T's, respectively.

When these elements are used in discrimination experiments we see the pattern of results shown in Fig. 7. Here the target is a single L presented in any orientation, and the background consists of either T's or +'s, also in random orientations. The format is the same as in Figs. 2-5, with the interval between test and mask onsets on the abscissa and the percent correct discrimination on the ordinate. The task is the same as before, i.e., to determine whether the elements presented are all alike or whether one of them is different. The two backgrounds obviously have very different effects. When the background element is a +, the detectability of the target rises rapidly with increasing mask delay. This is reminiscent of the data for a 90° orientation difference. Against a field of T's, however, detection of the L improves slowly, requiring about 300 ms to reach an approximate asymptote, and furthermore never rises above 65 or 70 percent correct, even when no mask is used. These data are similar in form to those obtained using a small orientation difference. The asymptotic level of 65 or 70 percent correct is what would be expected if the observer were able to inspect at random only seven or eight of the 30 possible target positions in the time available [11].

[Fig. 8 plot: percent correct vs. number of elements (5 to 15) at SOA = 100 ms, for an L target on a + background (filled symbols) and on a T background (open symbols).]

Fig. 8. Effect of number of elements on detection of target. See text.

Some additional measurements using these elements are shown in Fig. 8. Here the inspection time (mask delay) is fixed at 100 ms, the target is an L, and the number of background elements, either T's or +'s, is varied. All of the elements are presented in a single ring. When the background consists of +'s, the detectability of the target declines only slightly as the number of background elements increases, whereas when the background consists of T's the decline is large. The dashed curve shows the expected performance if the observer has time to inspect two elements at random [11].
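The "inspect at random" predictions cited from [11] can be illustrated with one simple decision rule. In the Python sketch below, the observer checks k randomly chosen positions out of n, reports "present" only on finding the target, and otherwise reports "absent"; present and absent trials are taken as equally likely. This rule is an illustrative assumption, not necessarily the model of [11].

```python
def p_correct_partial_inspection(k: int, n: int) -> float:
    """Predicted proportion correct when only k of n possible target positions are checked.

    Hypothetical decision rule: the k positions are chosen at random on each trial,
    the observer says "present" only if the target is among them and "absent"
    otherwise (no false alarms), and present/absent trials are equally likely.
    """
    if n <= 0 or k < 0:
        raise ValueError("need n > 0 and k >= 0")
    hit_rate = min(k, n) / n  # chance that a present target lies in the inspected subset
    return 0.5 * hit_rate + 0.5 * 1.0  # target-absent trials are always answered correctly


# Ring sizes spanning the range of Fig. 8, with two elements inspected per trial.
for n in (5, 7, 10, 15):
    print(n, round(p_correct_partial_inspection(2, n), 3))
```

Under this particular rule, inspecting 7 or 8 of 30 positions gives roughly 62 to 63 percent correct, somewhat below the level quoted in the text, so the model of [11] presumably differs in detail.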

It would seem from both of these experiments that detection of a single L against a background of T's requires a serial search for the target, while against a background of +'s it does not. Returning to the model of discrimination discussed earlier, we note that this distinction is equivalent to the statement that discrimination of the T and L elements requires accurate positional information, i.e., accurate localization of responses in the response space, while that of the + and L does not. Why should this be? It is not too difficult to see why the T versus L discrimination might require good positional resolution. The two line segments that make up each of these figures are identical. If each segment generates approximately the same response as it would alone, then the only thing that distinguishes the two figures is the slightly different positional relationship of the two responses. That the + behaves so differently suggests that when the two segments overlap so much they behave not as two separate entities but as a single, different one. The structure of simple orientation-selective units in the visual cortex suggests that the + should produce a maximum response in a somewhat different unit than the T or L [3]. If units that prefer inputs of different widths are separated in the response space, then our simple model can explain the different curve shapes in Figs. 7 and 8. Briefly, if discrimination of the T and L requires better positional resolution than that of the + and L, then a smaller area of the response space can be inspected in parallel, i.e., simultaneously. Consequently, a greater number of separate, spatially restricted inspections must be made, which requires more time. The data of Figs. 7 and 8 suggest that for the T and L case, one possible target location can be inspected every 40 or 50 ms.

[Fig. 9 plots: (a) percent correct vs. SOA in milliseconds (100 to 600 ms), seven elements, array radii 6.9°, 4.6°, 2.3°, and 1.4°; (b) percent correct vs. SOA in milliseconds (30 to 240 ms) for angles of 90°, 45°, and 30° at viewing distances of 1 m and 3 m.]

Fig. 9. Demonstration of scaling invariance. (a) Numbers in upper right give the radius of the stimulus array in degrees of visual angle. The array consists of seven elements arranged in a single ring. (b) Comparison of data from Fig. 4 for 90°, 45°, and 30° with those measured using identical stimuli viewed from three times the distance. At this distance the stimulus subtends 2.5°.
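The 40 to 50 ms estimate links inspection time to the number of locations that can be checked. The Python sketch below assumes strictly serial, one-location-at-a-time inspection at a hypothetical 45 ms per location (the midpoint of the quoted range), and a decision rule in which the observer reports "present" only on finding the target. Both are illustrative assumptions, not necessarily the model of [11].

```python
def locations_inspected(inspection_ms: float, ms_per_location: float = 45.0) -> int:
    """Whole number of locations a serial observer can check in the available time.

    The default of 45 ms per location is a hypothetical midpoint of the
    40-50 ms range estimated in the text.
    """
    if inspection_ms < 0 or ms_per_location <= 0:
        raise ValueError("times must be non-negative and the rate positive")
    return int(inspection_ms // ms_per_location)


def predicted_p_correct(inspection_ms: float, n_positions: int,
                        ms_per_location: float = 45.0) -> float:
    """Serial-search prediction: 'present' only if the target is among the checked
    locations; present and absent trials equally likely (assumed)."""
    k = min(locations_inspected(inspection_ms, ms_per_location), n_positions)
    return 0.5 + 0.5 * k / n_positions


print(locations_inspected(100.0))  # 2
print(locations_inspected(300.0, 40.0), locations_inspected(300.0, 50.0))  # 7 6
```

At the 100-ms SOA of Fig. 8 this gives two locations, matching the two-element dashed curve; at 300 ms it gives six or seven, close to the seven or eight quoted for Fig. 7.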

It is interesting that this simple hypothesis about the limits of positional resolution allows one to convert observations about the time needed to detect a stimulus against a background into statements about the geometry of the response space. Thus far the representation of retinal position, angular orientation, and width is entirely consistent with what we know of the geometry of the primary visual cortex [4], [5].

One additional interesting observation is that the phenomena we have been describing seem to be independent of the overall angular subtense of the display. Over a considerable range, uniform contraction or dilation of the stimulus does not significantly affect the detectability of the target. In Fig. 9(a), data are shown for T versus L discrimination using seven elements arranged in a ring. The display was contracted to fall entirely within the fovea (2.8° in diameter) and expanded by a factor of five to subtend almost 14°, with no change in performance. In Fig. 9(b), the distance from which the display was viewed was increased from 1 to 3 m, again with little change in performance. Clearly there must be a lower limit on such invariance; if the stimulus is contracted too far it will become unresolvable. Performance may also be degraded if the pattern is extended into the far periphery. This general type of invariance has been known for over a century; it was first systematically studied by Aubert and Foerster in 1857 [6]. Interestingly, Aubert and Foerster prevented eye movements in exactly the same way as we do in this study, i.e., by very brief presentation of the stimulus. Lacking expensive computer systems, they used a single discharge across a spark gap to illuminate figures drawn on a card.
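The two scaling manipulations can be related with the standard visual-angle formula θ = 2 arctan(s / 2D). The Python sketch below takes the array of Fig. 1 to subtend 15° at 1 m and computes what the same array subtends at 3 m: about 5° across, i.e., a radius of about 2.5°, which may correspond to the 2.5° in the Fig. 9 caption. That correspondence is a reading of the numbers, not something the text states.

```python
import math


def visual_angle_deg(size_m: float, distance_m: float) -> float:
    """Full visual angle subtended by an extent `size_m` centered on the line of sight."""
    return math.degrees(2.0 * math.atan(size_m / (2.0 * distance_m)))


def size_for_angle(angle_deg: float, distance_m: float) -> float:
    """Physical extent that subtends `angle_deg` at `distance_m` (inverse of the above)."""
    return 2.0 * distance_m * math.tan(math.radians(angle_deg) / 2.0)


array_size_m = size_for_angle(15.0, 1.0)               # array taken to subtend 15 deg at 1 m
print(round(array_size_m, 3))                          # 0.263
print(round(visual_angle_deg(array_size_m, 3.0), 2))   # 5.03
```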

Our hypothesis about positional resolution suggests that these contractions and dilations of the stimulus preserve the ratio between the spatial separation of the responses to be discriminated and the volume of response space that must be searched to find the target. If we consider an annulus in visual (or retinal) space, the cortical volume devoted to its representation will be more or less unaffected by shrinking or expanding it, as long as we avoid the very central retina. The spacing of units preferring different angular orientations is also roughly constant with retinal eccentricity [7]. Thus the observed invariance is, under our hypothesis, also consistent with cortical structure.
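Scaling would leave the cortical area of a retinal annulus roughly unchanged if the retinocortical map were approximately logarithmic. The sketch below adopts a map of the form w = log(z + a), in the spirit of Schwartz's complex-logarithmic model [7]. Here z is retinal position written as a complex number, and a is a small foveal offset. The exact form of the map, the value of a, and the function name are assumptions made for illustration. The code integrates the areal magnification |dw/dz|² = 1/|z + a|² over the annulus using a midpoint rule in polar coordinates.

```python
import cmath
import math

def cortical_area_annulus(r_inner, r_outer, a=0.0, n_r=200, n_theta=180):
    """Approximate cortical area of the retinal annulus r_inner <= |z| <= r_outer
    under the hypothetical conformal map w = log(z + a).

    For a conformal map the local area magnification is |dw/dz|**2, which here
    equals 1 / |z + a|**2. We integrate it over the annulus with the polar
    area element r dr dtheta, using midpoints of an n_r x n_theta grid.
    The annulus must not contain the singular point z = -a.
    """
    dr = (r_outer - r_inner) / n_r
    dtheta = 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = r_inner + (i + 0.5) * dr
        for j in range(n_theta):
            z = cmath.rect(r, (j + 0.5) * dtheta)  # retinal point as complex number
            total += r * dr * dtheta / abs(z + a) ** 2
    return total

# Pure log map (a = 0): scaling the annulus by 5 leaves its cortical area
# unchanged, and both values approximate the closed form 2*pi*ln(2).
print(cortical_area_annulus(1.0, 2.0))
print(cortical_area_annulus(5.0, 10.0))
print(2.0 * math.pi * math.log(2.0))
```

In the pure log map (a = 0), an annulus from r to 2r receives about 2π ln 2 ≈ 4.36 units of cortical area at any eccentricity. With a nonzero offset such as a = 0.5, annuli near the center (for example, r from 0.1 to 0.2) receive far less area than annuli with the same ratio farther out. This behavior matches the caveat above about avoiding the very central retina.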

Returning to the phenomenon of texture segregation demonstrated in Fig. 6, let us attempt to analyze it in terms of our fundamental hypothesis. Recall that we concluded from the rapid discrimination experiments with these elements that the responses generated by the T and L patterns require detailed positional information to be distinguished, while those for the + and L do not. This suggests that we see no boundary between the T and L regions because, whenever we look on a scale much larger than that of a single pair of elements, the distinctness of the elements disappears. We suggest that in order to see a boundary between two such regions, one must be able to see the difference between the elements while looking at a number of them large enough to define a sufficient piece of the contour. This is conjecture, but it is consistent with what we have seen thus far. In the case of the segregating regions, we have evidence that the difference between their elements can be seen while looking at a large area simultaneously. However, we predict that this large-scale inspection, while not losing the distinctness of the elements, should degrade the positional information available. Thus the shape of the segregating region should be somewhat indistinct. Under unrestricted viewing conditions, this missing information may be obtained by subsequent, more detailed scrutiny. If the inspection interval is restricted by a mask, however, we predict that detailed information about the shape of the segregating region will not be available.

To test this prediction, stimuli of the type shown in Fig. 10 were used. Here a strongly segregating region has one corner cut off, and the observer's task is to say which corner it is.

Fig. 10. Stimuli used to investigate rapid texture segregation. Upper texture has a corner of three elements missing; lower has four missing.

The position of the entire embedded region was varied slightly from trial to trial so that the judgment had to be made on the basis of shape rather than position. Results are shown in Fig. 11. The format is the same as that used previously, but because there are four possible corners, chance performance is 25 percent rather than 50 percent. Performance does become poorer as the inspection interval is shortened below about 100 ms. It should be noted that at all of the times shown, the presence or absence of the embedded region can be reliably reported. Thus at the short times the observer knows that something is there but has no precise information about its shape. The two curves differ in that the one on the right was measured with a smaller corner cut off, which requires more precise positional information to perform the task. For all three observers, this change in the required positional resolution causes a shift in time of about 40 ms.
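One possible way to quantify a horizontal shift like the roughly 40 ms reported above is to locate a criterion on each curve and subtract. The criterion is the SOA at which performance first reaches the midpoint between chance (25 percent for four alternatives) and perfect. The Python sketch below finds it by linear interpolation. The criterion, the function names, and the logistic test curves are all assumptions for illustration. They do not reproduce the authors' analysis or the data of Fig. 11.

```python
import math

CHANCE = 0.25  # four possible corners

def threshold_soa(soas, p_correct, chance=CHANCE):
    """SOA at which p_correct first reaches the midpoint between chance and 1,
    linearly interpolating between neighbouring samples (nan if never reached)."""
    level = chance + 0.5 * (1.0 - chance)
    for i, p in enumerate(p_correct):
        if p >= level:
            if i == 0:
                return float(soas[0])
            p0, t0 = p_correct[i - 1], soas[i - 1]
            frac = (level - p0) / (p - p0)  # p0 < level <= p, so p > p0
            return t0 + frac * (soas[i] - t0)
    return math.nan

def synthetic_curve(soas, midpoint_ms, slope_ms=15.0, chance=CHANCE):
    """Hypothetical logistic psychometric function rising from chance to 1."""
    return [chance + (1.0 - chance) / (1.0 + math.exp(-(t - midpoint_ms) / slope_ms))
            for t in soas]

soas = list(range(0, 321, 40))  # spans the 0-320 ms range plotted in Fig. 11
larger_corner = synthetic_curve(soas, midpoint_ms=70.0)
smaller_corner = synthetic_curve(soas, midpoint_ms=110.0)
shift = threshold_soa(soas, smaller_corner) - threshold_soa(soas, larger_corner)
print(f"estimated shift: {shift:.1f} ms")
```

Real data are noisy and sparsely sampled. In practice, fitting a psychometric function to each curve and comparing the fitted midpoints would be more robust than interpolating between raw points.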

We see, therefore, that the occurrence of sharp texture segregation as seen in Fig. 6 seems to depend on two factors. First, the difference between the elements making up the various regions must be visible when a large area is viewed simultaneously. Second, there must be sufficient time to allow the closer scrutiny needed for accurate localization of the boundaries between the segregating regions.

The process of internal scanning or searching upon which we have based much of this discussion is often described as a function of attention. Much has been written about phenomena attributable to attention [8], [9], [10], and the reader of this literature rapidly becomes aware that the word "attention" means many different things to different people. If the term is to be used to describe the phenomena discussed here, it should be made clear that it refers to a selective process operating at or near the level of the primary visual cortex. This process seeks to minimize the amount of serial searching necessary by matching the diameter of the parallel process to the discrimination required by the stimulus. It is, in other words, a perceptual rather than a cognitive process. The extent to which higher-level control can influence this process is an open question.

Fig. 11. Probability that observer correctly identified missing corner, plotted against SOA in milliseconds (0-320 ms). Upper graph shows data for observers JRB and AMB, lower for RAS (curves labeled RAS3 and RAS4). Note that for JRB the corner sizes are two and three, while for the other observers they are three and four. However, the effect of increasing the corner size is similar in all cases. The difference between JRB and AMB parallels that seen in Fig. 2.

REFERENCES

[1] B. Julesz and J. R. Bergen, "Textons, the fundamental elements in preattentive vision and perception of textures," Bell Syst. Tech. J., July/Aug. 1983.

[2] B. Julesz, "Textons, the elements of texture perception, and their interactions," Nature, vol. 290, pp. 91-97, 1981.

[3] A. Treisman and G. Gelade, "A feature integration theory of attention," Cognitive Psychol., vol. 12, pp. 97-136, 1980.

[4] D. Hubel and T. N. Wiesel, "Receptive fields and functional architecture of monkey striate cortex," J. Physiol., vol. 195, pp. 215-243, 1968.

[5] S. M. Zeki, "The functional organization of projections from striate to prestriate cortex in the Rhesus monkey," in Cold Spring Harbor Symp. Quantitative Biology, vol. 15, 1976, pp. 591-600.

[6] H. von Helmholtz, Treatise on Physiological Optics, vol. II. Opt. Soc. Am., 1925, pp. 34-45.

[7] E. L. Schwartz, "A quantitative model of the functional architecture of human striate cortex with application to visual illusion and cortical texture analysis," Biol. Cybern., vol. 37, pp. 63-76, 1980.

[8] M. I. Posner, "Cumulative development of attentional theory," Amer. Psychol., vol. 37, no. 2, pp. 168-179, 1982.

[9] M. L. Shaw, "Attending to multiple sources of information: Part 1, The integration of information in decision making," Cognitive Psychol., vol. 14, pp. 353-409, 1982.

[10] G. Sperling and M. J. Melchner, "Visual search, visual attention and the attention operating characteristic," in Attention and Performance, vol. VII, J. Requin, Ed. New York: Academic, 1978.

[11] J. R. Bergen and B. Julesz, "Parallel versus serial processing in rapid pattern discrimination," Nature, vol. 303, pp. 696-698, 1983.