Making Sense of Probability Through Paradox and Programming: A Case Study in a Connected Mathematics Framework*

URI WILENSKY
Center for Connected Learning
Northwestern University
Annenberg Hall 311
2120 Campus Drive
Evanston, IL 60208
uriw@media.mit.edu
847-647-3818

Epistemology & Learning Group
Learning & Common Sense Section
The Media Laboratory
Massachusetts Institute of Technology
20 Ames Street Room E15-315
Cambridge, MA 02139
uriw@media.mit.edu

* This paper appeared in the edited book "Kafai & Resnick (1996). Constructionism In Practice". A version of this paper appeared in the Journal of Mathematical Behavior, Volume 14 No. 2, June 1995.

1. Introduction

The disciplines of probability and statistics have fundamentally changed the way we do science and the way we think about our world. Many scholars have argued (e.g., Cohen, 1990; Gigerenzer, 1990; Hacking, 1990) that a probabilistic revolution has occurred in our century and that notions of randomness and uncertainty have opened up whole new areas of mathematics and science. This has released a groundswell of interest in subjects such as complexity, chaos, and artificial life. Statistical methods are ubiquitous in the scientific literature. Courses in probability and statistics are required for virtually all students in the natural and social sciences. Our daily newspapers are full of statistics about such matters as lung cancer risks, divorce rates, birth control failure rates, variation in temperature, the purity of soap, etc.

Yet, despite the rapid infiltration of probability and statistics into our science and media, there is substantial documentation of the widespread lack of understanding of the meaning of the statistics we encounter (Gould, 1991; Konold, 1991; Phillips, 1988; Piaget, 1975; Tversky & Kahneman, 1971). Even highly educated professionals who use probability and statistics in their daily work have great difficulty interpreting the statistics they produce (Kahneman & Tversky, 1982).

Besides a lack of competence and understanding, students express a great deal of dislike towards courses in probability and statistics-an antipathy well captured by the oft-quoted line, attributed to both Mark Twain and Benjamin Disraeli: "There are three kinds of lies: lies, damn lies, and statistics."

Most students first encounter the subject of probability in the form of school exercises in calculating ratios of frequencies and binomial coefficients. As a result, the subject matter of probability and statistics is seen as an assemblage of formulae to be committed to memory. When students fail to master the techniques taught to them, better methods are sought to improve their ability to calculate and apply the formulae. But very little is done in school to explore basic ideas of probability or respond to questions such as: "what is a normal distribution and what makes it useful?" or "how can something be both random and structured?" Partially because the meanings of core probabilistic notions are still being debated by philosophers of mathematics and science (e.g., Chaitin, 1987; Kolmogorov, 1950; Savage, 1954; Suppes, 1984; von Mises, 1957), it is assumed that these meanings are too hard for students to access. "Safe probability" is best practiced through formal exercises without too much attention to the meanings of underlying concepts.

There is a substantial literature concerning the topic of decision-making under uncertainty (e.g., Cohen, 1979; Edwards & von Winterfeldt, 1986; Evans, 1993; Kahneman & Tversky, 1973; 1982; Nisbett, 1980; Nisbett et al., 1983; Tversky & Kahneman, 1974; 1980; 1984). Much of this literature documents the systematic errors and biases people display when attempting to make judgments under uncertainty. A common conclusion drawn by educators and researchers from this research is that "people just aren't built for doing probability": our intuitions are faulty and are not to be trusted. So, again, the safe practice for educators wishing their students to master the material is to instill in them a mistrust of their intuitive responses and a healthy respect for the formulae [1].

The cost of this highly formal instruction in probability and statistics is high. While the best and brightest do manage to learn to use the right statistical tests in the appropriate contexts, even they do not really understand what they are doing. They experience a kind of "epistemological anxiety" (Wilensky, 1993; in preparation)-anxiety about the nature of the knowledge they are producing and what justifies it. This anxiety leads to skepticism about the validity of statistical knowledge. Add to this mix the unscrupulous use of statistical arguments to mislead voters and consumers and we begin to understand why the subject stimulates so much distaste. The cost of this educational approach is to deprive learners of access to core probabilistic and statistical notions, which are powerful means of making sense of the world.

In this paper, I present a case study of a learner engaged in a classical probability paradox. The learner was one of seventeen interviewees studied in depth as part of the Connected Probability project. I start by briefly describing the Connected Probability project and its theoretical framework-the Connected Mathematics research program. Part of the learning environment provided in the Connected Probability project is a computer modeling language suitable for probability investigations-a version of the language StarLogo (Resnick, 1992; Wilensky, 1993). I then present the probability paradox with which the subject is engaged and an account of her investigation. The paradox was selected because of its potential for engaging learners in a deeper investigation into the meaning of the concept of "random"-a fundamental concept of probability theory. I conclude by arguing three points illustrated in the case study:

that providing support for seriously engaging such paradoxes is an important avenue to relieving epistemological anxiety about the nature of probabilistic concepts;
that programming can be an effective tool for resolving mathematical paradoxes (by making their hidden assumptions [2] explicit and concrete); and
that through programming their own computational models (and thus making their own mathematics), learners gain a much deeper understanding of probabilistic concepts than through the use of simulations or pre-built computational models.

2. Theoretical Framework

2.1 The Connected Mathematics Research Program

The name "Connected Mathematics" comes from two seemingly disparate sources, the literature of emergent artificial intelligence (AI) and the literature of feminist critique. From emergent AI and in particular from the Society of Mind theory (Minsky, 1987; Papert, 1980), Connected Mathematics takes the idea that concepts cannot have only one meaning [3]. Only through their multiple connections do concepts gain meaning. From the feminist literature (e.g., Belenky et al., 1986; Keller, 1983; Gilligan, 1977; Surrey, 1991), it takes the idea of "connected knowing": knowing that is intimate and contextual as opposed to an alienated, disconnected and formalistic knowing. In this section, I only briefly sketch the Connected Mathematics approach. A more comprehensive description can be found in (Wilensky, 1993; forthcoming-a).

The Connected Mathematics approach is rooted in the constructionist (Papert, 1991; 1993) learning paradigm. As such, it holds that the character of mathematical knowledge is inextricably interwoven with its genesis-both its historical genesis and its development in the mathematical learner. A conception of mathematics as disconnected from its development leads to the misguided pedagogy of the traditional mathematics curriculum-a "litany" of definition-theorem-proof and its attendant concepts stipulated by formal definition [4]. In contrast to approaches that attempt to explain failures of mathematical understanding in technical or information processing terms, Connected Mathematics seeks to explain these obstacles in epistemological terms. Obstacles to understanding are failures of meaning making and since meaning is made through building connections, Connected Mathematics sees these as fundamentally failures of connection.

Paradox can be an important tool of a Connected Mathematics learning environment. The recognition of paradox is the recognition that (at least) two conceptual structures have not been integrated. This explicit recognition is the first step in making the connections between the two structures that will resolve the paradox and, most often, thereby generate new mathematics.

The vision of mathematics as being made and not simply received leads naturally to a role for technology. Technology is not there simply to animate received truth; it is an expressive medium-a medium for the making of new mathematics. It follows that we can make better use of computational technologies than simply running black-box simulations-we can make mathematics by constructing computational embodiments of mathematical models. The true power of the computer will be seen not in assisting the teaching of the old topics but in transforming ideas about what can be learned.

Technology here is to be construed in a broad sense-the notations in which we express mathematics and the mathematical concepts themselves are artifacts of the technology of the period of their creation. The emergence of new powerful computational technologies therefore implies a radical change in both the concepts and semiotic activities of a newly contextualized mathematics.

2.1.1 Connected Mathematics and Current Standards of Mathematics Reform

Connected Mathematics moves beyond reform documents such as the Standards of the National Council of Teachers of Mathematics (NCTM Standards, 1991a; 1991b) in its serious reexamination of the warrant for the current mathematical curriculum (see also Confrey, 1993a). In so doing, it proposes new standards for the curriculum in terms of content, process, beliefs and context. It expands mathematics content beyond the boundaries circumscribed by school and outdated technology. New technologies are used imaginatively to make abstract mathematical concepts concrete, to explore areas of mathematics previously inaccessible and to create new mathematics (e.g., Abelson & diSessa, 1980; Abelson & Goldenberg, 1977; Cuoco & Goldenberg, 1992; Edwards, 1992; Feurzeig, 1989; Harel, 1992; Leron & Zazkis, 1992; Noss & Hoyles, 1991; Papert, 1972; 1980; Resnick, 1991; Wilensky, 1993).

In contrast to the NCTM Standards, which portray an "image" of mathematics (see Brandes, this volume) as essentially a problem solving activity, the vision of Connected Mathematics is more generative-the central activity being making new mathematics. In so doing, it fosters a culture of design and exploration-designing new representations of mathematics and encouraging critique of those designs.

Connected Mathematics acknowledges and attends to the affective side of learning mathematics and looks critically at the role of shame in the mathematical community. Listening to learners and fostering an environment in which it becomes safe for mathematical learners to express their partial understandings [5] results in a dismantling of the culture of shame which paralyzes learners-preventing them from proposing the tentative conjectures and representations necessary to make mathematical progress. In doing so, it parts company with the literature on misconceptions which highlights the gulf between expert and novice. Instead, Connected Mathematics stresses the continuity between expert and novice understanding [6], noticing that even expert mathematicians have had to laboriously carve out small areas of well connected clarity from the generally messy terrain (see also Smith, diSessa, & Roschelle, 1994).

2.2 Connected Probability

The Connected Probability project is one major branch of the Connected Mathematics program. In the Connected Probability project, we are engaged in building Connected Mathematics environments for learning probability. As a first step towards this goal, seventeen in-depth interviews [7] about probability were conducted with learners age fourteen to sixty-four. Interviews were open ended and most often experienced by the interviewees as extended conversations. The interviewer guided these conversations so that the majority of a list of twenty-three topics was addressed. The interview topics ranged from attitudes toward situations of uncertainty, to interpretation of newspaper statistics, to the design of studies to collect desired statistics and to formal probability problems.

These topics were valuable for gaining insight into people's ideas about probability, and encouraging them to think through probabilistic issues. In one sense probability is ubiquitous in our everyday lives. Yet, since probability is fundamentally about large numbers of instances, these singular everyday experiences may not be useful for building our probabilistic intuition [8]. Rarely, in our everyday lives, do we have direct and controlled access to large numbers of experimental trials, measurements of large populations, or repeated assessments of likelihood with feedback. We do regularly assess the probability of specific events occurring. However, when the event either occurs or not, we don’t know how to feed this result back into our original assessment. After all, if we assess the probability of some event occurring as, say, 30%, and the event occurs, we have not gotten much information about the adequacy of our original judgment. Only by repeated trials can we get the feedback we need to evaluate our judgments. Without the necessary feedback, it is difficult to develop our probabilistic intuitions and make probabilistic concepts concrete. A powerful way to bridge this gap between singular experiences and probabilistic reasoning is through the use of exploratory computational environments. The processing power of the computer can give learners immediate access to large amounts of data usually distributed widely over space or time. This can provide the necessary feedback needed to develop concrete understandings of probabilistic concepts. In creating an environment for learning probability, it is therefore natural to consider computational tools.
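The point about feedback from repeated trials can be illustrated with a small sketch. The function name relative_frequency, the seed, and the trial counts are illustrative assumptions and do not come from the study; the sketch only shows that a single outcome says little about a 30% estimate, whereas the relative frequency over many trials settles near it.

```python
import random

def relative_frequency(p, trials, rng):
    """Fraction of `trials` independent events that occur,
    when each one occurs with probability p."""
    hits = sum(1 for _ in range(trials) if rng.random() < p)
    return hits / trials

# Fixed seed chosen arbitrarily so that repeated runs give the same numbers.
rng = random.Random(0)

# With one trial the observed frequency can only be 0.0 or 1.0;
# with many trials it approaches the underlying 0.3.
for n in (1, 10, 100, 10_000):
    print(n, relative_frequency(0.3, n, rng))
```

A learner working in a computational environment gets this kind of feedback in seconds, which is the gap between singular experience and probabilistic reasoning that the paragraph above describes.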

One of the investigatory tools made available to the learners in this study was a programming language, StarLogo (Resnick, 1992; Wilensky, 1993), specially adapted for modeling probabilistic phenomena. StarLogo is a massively parallel version of the computer language Logo. It allows the user to control thousands of "turtles" on a computer screen. Each of the turtles (or agents) has its own local state and can be given its own local procedures and rules of interaction [9]. Thus, the user can model the emergent effects of the behavior of many distributed agents each following its own local rules. In particular, the key probabilistic notion of distribution can be seen to arise from the actions of many independent agents (see Wilensky, in preparation). StarLogo facilitates the design of probability experiments which allow learners to test their conjectures. They can use the feedback to modify them and clarify their underlying structures via successive refinement (Leron, 1983). The use of StarLogo to design probability experiments is in keeping with the constructionist (Papert, 1991) model of learning-that a particularly felicitous way to build strong mental models is to produce physical or computational constructs which can be manipulated and debugged. As we shall see in the case that follows, being able to articulate a model of a probabilistic concept (through programming) can lead to rich insights into the nature of probabilistic concepts such as randomness and distribution. In contrast to consumers of ready-made models, learners who construct computational models are afforded the opportunity to refine their models through debugging. Through debugging their programs they can debug their probabilistic concepts and make them concrete (Wilensky, 1991).

3. The Case Study

3.1 Overview

In this paper, I present a case study of a student engaged in exploring the meaning of randomness in the context of a particular mathematical problem. This problem, first proposed by Bertrand over a hundred years ago (Bertrand, 1889), has led a fascinating mathematical life over the last century. Over the last hundred years, mathematicians have given many different solutions (e.g., Borel, 1909; Poincaré, 1912; Uspensky, 1937; Keynes, 1921; von Mises, 1964) and continue to propose new solutions and reject old arguments to this day (e.g., Marinoff, 1994). This problem, which became known as "Bertrand's paradox [10]", engaged leading mathematicians in a debate over the range of applicability of the principle of non-sufficient reason (also called the principle of indifference-i.e., the assignment of uniform probability distributions in situations of ignorance) that was a keystone of the evolving notion of randomness. Given the attention of Connected Mathematics to the genesis of mathematical knowledge both historically and developmentally, Bertrand's paradox was a natural candidate for inclusion in this study. It was hoped that the role of Bertrand's paradox in the historical development of the notion of random would be mirrored in the development of the interviewee's thinking. Interviewees were capable of calculating more than one numerical answer to the problem. The contradiction between two or more numerical answers to a seemingly well specified probability question might then be experienced by the interviewee as a paradox. The resolution of this contradiction could then lead to a deeper and richer understanding of the notion of randomness.

In the case reported on below, the paradoxical element was introduced by the interviewer. Once that intervention occurred, no further support was needed for the interviewee to recognize it as a paradox [11]. The resolution of the paradox, however, was greatly facilitated by use of the programming language.

Some researchers and educators have recently argued that students cannot make practical use of general purpose programming languages in their subject domain learning (e.g., Soloway, 1993; Steinberger, 1994)[12]. It is my hope that the case study below will help dispel this claim and show the interesting and productive interactions that can occur between programming and learning mathematics. Indeed, it is through programming that the interviewee first "thickens" (Geertz, 1973) her understanding of "random"-by seeing the need for a process, be it computational or physical, to generate randomness. Thus, a computational idea leads to a mathematical idea, resulting in the recognition of the intimate tie between a random process and a probability distribution [13]. This thickening of the notion of random and embedding it in a web of related concepts and activities leads to a Connected Mathematics understanding of randomness [14].

3.2 The Paradoxical Question:

From a given circle, choose a random chord.
What's the probability that the chord is longer than a radius?

This question was, in one sense, the most formally presented question in the interview. On the surface, it most mirrored the kinds of questions that students get in school. But, because of its hidden ambiguity many rich interviews arose from it. It was particularly rich in evoking epistemological questioning and investigations into the meaning of the word random and how "randomness" connects to both mathematical distribution functions and the physical world.

The question was selected because it had many possible "reasonable" answers. Among the answers (backed up by solid reasonable arguments) that interviewees gave for the requested probability were 1/2, 2/3, 3/4, and √3/2. The language of the question, "choose a random chord", implies that the meaning of random chord is well specified-there should only be one way to choose chords that are truly random. All of the interviewees shared this assumption. When they encountered two different seemingly correct solutions to the question that led to different values for the probabilities, they were therefore confronted with a paradox.

Each of the answers listed above is in fact the correct answer for a particular probability experiment. Depending on the physical experiment conducted, or, in the corresponding mathematical language, the initial distribution that is assigned to the chord lengths, different answers will be obtained. By exploring the paradox, learners came to see there was no unique way to specify "random chord" (or in one interviewee's language "there's no such thing as a random chord"). Different chords are appropriate for different occasions. Different physical experiments lead to different probabilities and correspond to different ways of choosing chords. Depending on which method is used to select chords, different distributions of chord lengths will ensue. This leads to seeing the deep and powerful connections between randomness, distributions and physical experiments. It lays the intuitive substrate needed to create probabilistic models and to make sense of more advanced probability concepts such as probability measures.
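The dependence of the answer on the chord-generating experiment can be checked numerically. The following is a minimal Python sketch, not the StarLogo code used in the study. The endpoint, midpoint and uniform-length constructions follow the arguments that the case study goes on to discuss. The radial-distance construction is an assumption: it is one standard construction that yields √3/2, and the text does not say how interviewees arrived at that value. All function names, the seed and the trial count are illustrative.

```python
import math
import random

R = 1.0  # circle radius; the resulting probabilities do not depend on it

def chord_from_endpoints(rng):
    """Two independent, uniformly chosen points on the circumference."""
    a = rng.uniform(0, 2 * math.pi)
    b = rng.uniform(0, 2 * math.pi)
    return 2 * R * abs(math.sin((a - b) / 2))

def chord_from_midpoint(rng):
    """Chord whose midpoint is uniform over the area of the disk."""
    d = R * math.sqrt(rng.random())  # sqrt makes the point uniform in area
    return 2 * math.sqrt(R * R - d * d)

def chord_from_radial_distance(rng):
    """Chord whose midpoint lies at a uniform distance along a radius
    (assumed construction for the sqrt(3)/2 answer)."""
    d = R * rng.random()
    return 2 * math.sqrt(R * R - d * d)

def chord_from_length(rng):
    """Chord whose length is itself uniform between 0 and the diameter."""
    return rng.uniform(0, 2 * R)

def estimate(method, trials=200_000, seed=0):
    """Estimated probability that a chord drawn by `method` exceeds R."""
    rng = random.Random(seed)
    longer = sum(1 for _ in range(trials) if method(rng) > R)
    return longer / trials

for name, method, exact in [
    ("endpoints", chord_from_endpoints, 2 / 3),
    ("midpoint in disk", chord_from_midpoint, 3 / 4),
    ("distance along radius", chord_from_radial_distance, math.sqrt(3) / 2),
    ("uniform length", chord_from_length, 1 / 2),
]:
    print(f"{name:22s} estimate={estimate(method):.3f} exact={exact:.3f}")
```

Each function is a different "physical experiment" in the sense used above, and each induces its own distribution of chord lengths; the four estimates differ because the experiments differ, not because any one of them is miscounting chords.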

3.3 Case Study: Ellie

Of the seventeen participants in the Connected Probability project, fifteen engaged with the random chord problem. These fifteen interview fragments include many different themes and investigation paths. Each case is different. Nonetheless, the case that follows can be described as typical, if not in its specific details, then in the general outline of the investigation.

3.3.1 First Encounter

Ellie is a computer professional who has a solid undergraduate math background. Like many of the other interviewees, Ellie gets into trouble trying to understand the meaning of random. We could have resolved her difficulty by specifying a particular distribution of chords, or by describing a specific experiment to generate the chords. But had we done that, Ellie would not have developed her insights into the meaning of randomness. As teachers, it is often difficult for us to watch learners struggle with foundational questions and not "clear up the misunderstanding". However, the temptation to intervene is more easily resisted if we keep in mind that it is only by negotiating the meaning of the fundamental concepts, by following unproductive, semi-productive and multiple paths to this meaning that learners can make these concepts concrete.

Many interviewees answered this question fairly quickly using the following argument. Chords range in size from 0 to 2r. Since we're picking chords at random, they're just as likely to be shorter than r as they are to be longer than r. Hence the probability is equal to 1/2.

Ellie engaged herself with this question but approached it differently. She began thinking about the problem by drawing a circle and a chord on it which she knew had length equal to the circle’s radius, as shown below.

After contemplating this drawing for a while, she then drew the following figure:

Figure 2.

With the drawing of this picture came an insight. She pointed at the triangle in the figure and said:

Ellie: It has to be equilateral because all the sides are equal to a radius. So that means six of them fit around a circle. That's right, 6 * 60 = 360 degrees. So, that means if you pick a point on a circle and label it P, then to get a chord that's smaller than a radius you have to pick the second point on either this section of the circle [labeled A in the figure above] or this one [labeled B in the figure above]. So since each of those are a sixth of the circle, you get a one third chance of getting a chord smaller than a radius and a two thirds chance of a chord larger than a radius [15].

Ellie was quite satisfied with this answer and I believe would not have pursued the question any more if not for my prodding.
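Ellie's argument can be restated as a short calculation. The symbols θ (the central angle between the fixed point P and the second point, taken as uniform on the circle) and ℓ (the chord length) are notation introduced here for illustration and are not part of the interview.

```latex
\[
  \ell(\theta) = 2r\sin\frac{\theta}{2}, \qquad 0 \le \theta \le 2\pi,
\]
\[
  \ell(\theta) > r
  \iff \sin\frac{\theta}{2} > \frac{1}{2}
  \iff \frac{\pi}{3} < \theta < \frac{5\pi}{3},
\]
\[
  P(\ell > r) = \frac{5\pi/3 - \pi/3}{2\pi} = \frac{2}{3}.
\]
```

The two excluded arcs, each of angle π/3, are the sixths of the circle that Ellie labeled A and B.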

3.3.2 Introducing the Paradox

U: I have another way of looking at this problem that gives a different answer.

E: Really? I don’t see how that could be.

U: Can I show you?

E: Sure. But I bet it's got a mistake in it and you're trying to trick me.

U: OK. Let me show you and you tell me.

I then drew the figure below:

Figure 3.

U: Consider a circle, C1, of radius r. Draw a chord, AB, of length r. Then drop a perpendicular onto AB from the center of the circle, O, intersecting AB in a point, P. Then P is the mid-point of AB. Now we calculate the length of OP. We have OA = r and AP = r/2. By Pythagoras, we have OP = (√3/2) * r. Now draw a circle, C2, of radius OP centered at O. If we pick any point on C2 and draw the tangent to C2 at that point, then the resultant chord of C1 has length r. If we pick a point, P, inside C2 and draw the chord which has P as mid-point then that chord must be longer than r.

Similarly, if we pick a point inside C1 but outside C2 and draw the chord which has that point as mid-point, then that chord must be shorter than r. Now pick any point, Q, inside C1. Draw a chord, EF, which has Q as mid-point. EF will be bigger than a radius if and only if Q is inside C2. It follows that the probability of choosing a chord larger than a radius is the ratio of the area of C2 to the area of C1. The area of C1 = π * r^2. The area of C2 = π * OP^2 = π * 3/4 * r^2. So the ratio of their areas is 3/4 and therefore the probability of a chord being larger than a radius is also 3/4, not 2/3 as you said.
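The midpoint argument can be written out in the same way. Here d denotes the distance of the chord's mid-point from O and ℓ the chord length; both symbols are introduced for illustration, and the mid-point is assumed to be uniformly distributed over the area of C1.

```latex
\[
  \ell(d) = 2\sqrt{r^2 - d^2}, \qquad 0 \le d \le r,
\]
\[
  \ell(d) > r
  \iff r^2 - d^2 > \frac{r^2}{4}
  \iff d < \frac{\sqrt{3}}{2}\,r = OP,
\]
\[
  P(\ell > r) = \frac{\pi\,OP^2}{\pi r^2} = \frac{3}{4}.
\]
```

The calculation is sound on its own terms; it differs from the 2/3 result only in which quantity is taken to be uniformly distributed.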

This explanation had a disquieting effect on Ellie. She went over it many times but was not able to find a "bug" in the argument. After repeatedly struggling to resolve the conflict, she let out her frustration:

E: I don't get it. One of these arguments must be wrong! The probability of choosing a random chord bigger than a radius is either 2/3 or 3/4. It can't be both. I'm still pretty sure that it's really 2/3 but I can't find a hole in the other argument.

U: Can both of the arguments be right?

E: No. Of course not.

U: Why not?

E: It's obvious! Call the probability of choosing a chord larger than a radius p. Then argument #1 says p = 2/3 and argument #2 says p = 3/4. If both argument #1 and #2 are correct then 2/3 = 3/4 which is absurd [16].

Here Ellie is quite sure that there is a definite and unique meaning to the concept "probability of choosing a random chord larger than a radius" even though she admits that she is not completely certain what that meaning is.

3.3.3 Programming

U: Would writing a computer program help to resolve this dilemma?

E: Good idea. I can program up a simulation of this experiment and compute which value for the probability is correct! I should have thought of that earlier.

Ellie then spent some time writing a StarLogo program. As she worked to code this up, she soon began to feel uneasy with her formulation. A few times she protested: "But I have to generate the chords somehow. Which of the two methods shall I use to generate them?" Nevertheless, she continued writing her program, using an approach based on argument #1. Basically, she made each turtle turn randomly and move forward a distance equal to the circle's radius to pick a point on the circle. Then, she made the turtle return to the center, turn randomly again, and move forward to pick a second point on the circle, thus defining a chord. At various points, she was unsure how to model the situation. She experimented with using the same radius for each turtle as well as giving each turtle its own radius. She experimented with calculating the statistics over all trials of each turtle as opposed to calculating it over all the trials of all the turtles. Finally, she decided both were interesting and printed out the probability over all trials as well as the minimum and maximum probability of any turtle.

Below are the main procedures of Ellie's program. Comments (preceded by semi-colons) have been added by the author for clarity of the exposition.

TURTLE PROCEDURES[17]

;;; this turtle procedure sets up the turtles[18]
to setup
setxy 0 0            ;;; place myself at the origin
make "radius 10   ;;;; make my radius 10 units
make "p1x 0        ;;;; initialize temporary variables
make "p1y 0
make "p2x 0
make "p2y 0
make "chord-length 0
make "trials 0
make "big 0
make "prob 0
end

;;; This is a turtle procedure which generates a random chord.
to gen-random-chord
fd :radius               ;;;; go to the circumference of the circle
make "p1x xpos      ;;;; remember where I am 
make "p1y ypos
bk :radius              ;;;; go back to the center of the circle (the origin)
rt random 360         ;;;; turn randomly
fd :radius                ;;;;go to a new point on the circumference of the circle
make "chord-length distance :p1x :p1y     ;;;; the chord length is the distance
                                          ;;;; from where I was before to where
                                          ;;;; I am now
bk :radius                 ;;;; go back to the center of the circle
                           ;;;; (the origin)
end
   
;;;; this turtle-procedure gets executed by each turtle at each tick of the
;;;; clock
to turtle-demon
gen-random-chord                 ;;;; choose a new chord by the procedure
                                 ;;;; above
make "trials :trials + 1         ;;;; increment the number of chords chosen
if bigger? [make "big :big + 1]  ;;;; if the new chord is bigger than the
                                 ;;;; radius, increment the number of chords
                                 ;;;; chosen so far which are bigger than the
                                 ;;;; radius
make "prob :big / :trials        ;;;; the probability (so far) of choosing a
                                 ;;;; chord bigger than the radius is the  
                                 ;;;; proportion of chords chosen so far which
                                 ;;;; are bigger than the radius
end

;;;; is the turtle's chord bigger than a radius?
to bigger?
:chord-length > :radius          ;;;; return "true" if chord chosen is bigger
                                 ;;;; than the radius
end
                                 
OBSERVER PROCEDURES

;;;; observer-demon summarizes the results of all the turtles
;;;; it gets executed at every clock tick.
to observer-demon
make "total-trials turtle-sum [:trials]  ;;;; get the total number of chords 
                                         ;;;; chosen by all the turtles
make "total-big turtle-sum [:big]        ;;;; get the total number of chords
                                         ;;;; chosen by all the turtles  
                                         ;;;; bigger than a radius
make "total-prob :total-big / :total-trials  ;;;;  the final probability of
                                             ;;;; choosing a chord bigger than
                                             ;;;; a radius is the ratio of the
                                             ;;;;  above two totals
every 10 [type :total-big type :total-trials
          print :total-prob type turtle-min [prob]  ;;;; print some statistics
                                         ;;;; including the probabilities of
                                         ;;;; the turtles with the smallest
                                         ;;;; and largest probabilities
          print turtle-max [prob]]
end
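For readers without access to StarLogo, the listing above can be approximated by the following Python sketch. It is a loose port under stated assumptions, not Ellie's program: the number of turtles and clock ticks are not given in the text and are chosen arbitrarily, the turtles run in sequence rather than in parallel, and the turn is a continuous angle whereas Logo's random 360 returns whole degrees. The class and function names are illustrative.

```python
import math
import random

RADIUS = 10         # same radius as in the listing
NUM_TURTLES = 100   # assumed; the text does not say how many turtles ran
TICKS = 200         # assumed number of clock ticks

class Turtle:
    def __init__(self, rng):
        self.rng = rng
        self.heading = 0.0  # degrees
        self.trials = 0
        self.big = 0

    def point_on_circle(self):
        """Point reached by going forward RADIUS from the origin."""
        angle = math.radians(self.heading)
        return (RADIUS * math.cos(angle), RADIUS * math.sin(angle))

    def gen_random_chord(self):
        p1 = self.point_on_circle()              # remember where I am
        self.heading += self.rng.uniform(0, 360)  # turn randomly (continuous)
        p2 = self.point_on_circle()              # new point on the circle
        return math.dist(p1, p2)                 # chord length

    def demon(self):
        """One clock tick: draw a chord and update this turtle's counts."""
        chord_length = self.gen_random_chord()
        self.trials += 1
        if chord_length > RADIUS:
            self.big += 1

def run(num_turtles=NUM_TURTLES, ticks=TICKS, seed=0):
    rng = random.Random(seed)
    turtles = [Turtle(rng) for _ in range(num_turtles)]
    for _ in range(ticks):
        for turtle in turtles:
            turtle.demon()
    total_trials = sum(t.trials for t in turtles)
    total_big = sum(t.big for t in turtles)
    probs = [t.big / t.trials for t in turtles]
    # overall proportion, then the smallest and largest per-turtle proportions
    return total_big / total_trials, min(probs), max(probs)

total_prob, lowest, highest = run()
print(total_prob, lowest, highest)
```

As in the StarLogo version, the sketch reports both the proportion over all trials and the smallest and largest per-turtle proportions, so the spread among individual turtles can be compared with the pooled estimate.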

Ellie ran her program and it indeed confirmed her original analysis. On 2/3 of the total trials the chord was larger than a radius. For a while she worried about the fact that her extreme turtles had probabilities quite far away from 2/3, but eventually convinced herself that this was OK and that it was the average turtle "that mattered". But Ellie was still bothered by the way the chords were generated.

E: OK, so we got 2/3 as we should have. But what's bothering me is that if I generate the chords using the idea you had then I'll probably get 3/4 [19]. Which is the real way to generate random chords? (emphasis added)

The need to explicitly program the generation of the chords precipitated an epistemological shift. The focus was no longer on determining the probability. It now moved to finding the "true" way to generate random chords. This takes Ellie immediately into an investigation of what "random" means. At this stage she is still convinced, as she was before about the probability, that there can be only one set of random chords. She assumes that the problem is to discover this unique set.
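The point that different generating processes yield different "random" chords can be illustrated outside StarLogo. The following is a minimal Python sketch, not Ellie's program: it assumes that her method amounts to fixing one endpoint and choosing the other uniformly on the circumference, and that the interviewer's method amounts to choosing the chord's midpoint uniformly over the area of the disk. The function names are hypothetical and introduced only for illustration.

```python
import math
import random


def chord_by_endpoint(r, rng):
    """Assumed version of method #1: fix one endpoint P and pick the
    other endpoint uniformly on the circumference."""
    theta = rng.uniform(0.0, 2.0 * math.pi)  # central angle between endpoints
    return 2.0 * r * math.sin(theta / 2.0)   # chord length for that angle


def chord_by_midpoint(r, rng):
    """Assumed version of method #2: pick the chord's midpoint uniformly
    over the area of the disk (rejection sampling from a square)."""
    while True:
        x = rng.uniform(-r, r)
        y = rng.uniform(-r, r)
        d2 = x * x + y * y                   # squared distance from center
        if d2 <= r * r:
            return 2.0 * math.sqrt(r * r - d2)


def fraction_longer_than_radius(method, trials=200_000, r=1.0, seed=0):
    """Estimate the probability that a chord drawn by `method` is longer
    than the radius."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if method(r, rng) > r)
    return hits / trials
```

Calling fraction_longer_than_radius(chord_by_endpoint) should give an estimate near 2/3, and fraction_longer_than_radius(chord_by_midpoint) an estimate near 3/4. Neither process is more "random" than the other; they simply induce different distributions of chord lengths.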

U: That's an interesting question.

E: Oh, I see. We have two methods for generating random chords. What we have to do is figure out which produces really random chords and which produces non-random chords. Only one of these would produce really random chords and that's the one that would work in the real world.

U: The real world? Do you mean you could perform a physical experiment?

E: Yes. I suppose I could. ...Say we have a circle drawn on the floor and I throw a stick on it and it lands on the circle. Then the stick makes a chord on the circle. We can throw sticks and see how many times we get a chord larger than a radius.

U: And what do you expect the answer to be in the physical experiment?

E: Egads. (very excitedly) We have the same problem in the real world!!! We could instead do the experiment by letting a pin drop on the circle and wherever the pin dropped we could draw a chord with the pin as midpoint. Depending on which experiment we try we will get either answer #1 [20] or #2. Whoa, this is crazy. So which is a random chord? Both correspond to reality? ...

This was a breakthrough moment for Ellie, but she was not done yet. Though her insight above suggests that both answers are physically realizable, Ellie was still worried "on the mathematics side" that one of the methods for generating chords might be "missing some chords" or "counting chords twice". Ellie needed to connect her insight about the physical experiment to her knowledge about randomness and distribution. She spent quite a bit of time looking over the two methods for generating chords to see if they were counting "all the chords once and only once". She determined that in her method, once she fixed a point P, there was a one-to-one correspondence between the points on the circle and the chords having P as an endpoint. She concluded therefore that there "are as many chords passing through P as there are points in the circle". However, there will be more chords of a large size than chords of a small size. As could be seen from her original argument, there will be twice as many chords of length between r and 2r as there are chords of length between 0 and r. Now, for the first time, Ellie advanced the argument that many interviewees had given first.

3.3.4 Reflection

E: I never thought of the obvious. I've been sort of assuming all along that every chord of a given size is equally likely. But if that were true then I could have solved this problem simply. Each chord would have an equal chance of being of length between 0 and the diameter. So half the chords would be bigger than a radius and half smaller.

Ellie went on to see that, in argument #2, large chords are more probable than small chords. She reasoned that for every chord of a given size (or more accurately a small size interval) there was a thin annulus of points that would generate chords of that size by method #2. Annuli closer to the center of the circle would correspond to large chords and annuli near the circumference would correspond to small chords. She went on to demonstrate that annuli close to the center would have larger areas than annuli close to the circumference. Thus large chords become increasingly more probable [21].
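The annulus argument can be written out as a short derivation. This is a sketch under the assumption that method #2 places the chord's midpoint uniformly over the area of the disk; the symbols \rho (distance of the midpoint from the center), \ell (chord length), A (area) and f (density of chord length) are introduced here for illustration and do not appear in the interview.

```latex
% r: radius of the circle; \rho: distance of the chord's midpoint from the center;
% \ell: chord length. Assumption: the midpoint is uniform over the disk's area.
\begin{align*}
\ell &= 2\sqrt{r^2 - \rho^2},
  \qquad \ell > r \iff \rho < \tfrac{\sqrt{3}}{2}\, r, \\
P(\ell > r) &= \frac{\pi \left(\tfrac{\sqrt{3}}{2}\, r\right)^2}{\pi r^2} = \frac{3}{4}, \\
\mathrm{d}A &= 2\pi \rho \, |\mathrm{d}\rho| = \frac{\pi}{2}\, \ell \, \mathrm{d}\ell
  \quad\Longrightarrow\quad
  f(\ell) = \frac{\ell}{2 r^2}, \qquad 0 \le \ell \le 2r .
\end{align*}
```

The last line says that the annulus of midpoints producing chords of length between \ell and \ell + d\ell has area proportional to \ell, so under this method longer chords are proportionally more likely, which is the observation Ellie arrived at.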

Another interesting feature: The program that Ellie wrote placed all the turtles at the origin and since Ellie, as a professional programmer, wrote state-transparent code [22] they stayed at the origin. Initially, she had placed the turtles at the origin of the screen’s coordinate system because she recognized a potential bug in her program. If she created the turtles in random positions, as is typical in StarLogo, the turtles might "wrap" [23] around the screen when drawing their circles and thus incorrectly calculate their chord lengths. But, because the turtles remained centered at the origin, the program was not very visually appealing. While we were engaged in the interview, a student came by and watched. He asked us why nothing was happening on the screen. Ellie explained what she was investigating and then had an idea of how to make the program more interesting. She decided to spread the turtles out a bit so each could be seen tracing its circle, turning yellow if its chord was longer than a radius and green if it was shorter. To spread the turtles out without getting too close to the screen edge, Ellie told each turtle to execute the command fd random (60 - radius), so that each turtle moved a random amount out from the origin. The result wasn't quite what Ellie had hoped for. Near the origin there was a splotch of color [mostly yellow] as all the turtles were squeezed together very tightly, while near the edges the turtles were spaced out more sparsely (as in the following figure).

What had happened here quite by accident was a mirroring of the original dilemma. Ellie had used a linear random function to move points into a circular planar area. There were an equal number of turtles in each equally thick ring (annulus) around the origin, but the outer rings had greater area than the inner rings and therefore appeared less crowded.

So Ellie's function which successfully spread turtles out evenly (and what she then called randomly) along a line did not spread them out evenly on the planar screen. This experience was an important component of her subsequent "aha" moment, exposing her as it did to a crack in her solid and fixed notion of random.
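The crowding effect can be reproduced with a small Python sketch. It is an illustration under stated assumptions rather than a translation of Ellie's StarLogo code: each turtle is assumed to pick a uniformly random heading and then move a uniformly random distance, and the maximum distance of 50.0 used in the usage note is an arbitrary stand-in for her 60 - radius. The function names are hypothetical.

```python
import math
import random


def linear_spread(n, max_radius, rng):
    """Distances chosen uniformly along a line, as in 'fd random ...'."""
    return [rng.uniform(0.0, max_radius) for _ in range(n)]


def area_uniform_spread(n, max_radius, rng):
    """Distances chosen so that points are uniform over the disk's area:
    the square root compensates for outer rings having more area."""
    return [max_radius * math.sqrt(rng.random()) for _ in range(n)]


def ring_densities(distances, max_radius, rings=6):
    """Count points in equally thick rings around the origin and divide
    each count by that ring's area (points per unit area)."""
    width = max_radius / rings
    counts = [0] * rings
    for rho in distances:
        index = min(int(rho / width), rings - 1)  # clamp rho == max_radius
        counts[index] += 1
    densities = []
    for k, count in enumerate(counts):
        area = math.pi * (((k + 1) * width) ** 2 - (k * width) ** 2)
        densities.append(count / area)
    return densities
```

With rng = random.Random(1), ring_densities(linear_spread(60000, 50.0, rng), 50.0) should show roughly equal counts per ring but a density that falls steadily from the innermost ring to the outermost, which is the splotch near the origin that Ellie saw. Replacing linear_spread with area_uniform_spread should give approximately equal densities in every ring. Which of the two counts as spreading the turtles "randomly" depends, again, on the chosen process.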

4. Discussion

As can be seen from the above interview fragment, the primary obstacles to Ellie's resolving the paradox are epistemological in nature. She faced such questions as: Can a definitive probability problem admit two different numerical answers? Is the notion of a random chord well defined? What is the relationship between a physical experiment and a mathematical calculation? How do you put into correspondence an infinite number of chords and an infinite number of points? When can you say you have selected a reference set for which it is justified to say all chords in it are equally likely to be selected?[24] As Ellie's interview suggests, an important finding of the Connected Probability research is that the primary obstacles to the interviewees' facility with probability are epistemological in nature. Their difficulties stem from fundamental confusion about such notions as randomness, distribution and expectation. The epistemological status of these concepts was in doubt (What kinds of things are they? What makes them work? Are they "natural" or constructed?). As a result, many interviewees reported an inability to resolve the competing claims of conflicting probabilistic or statistical arguments. Faced with two equally compelling arguments, they are in the position of Buridan’s ass: paralyzed between two equally appealing bales of hay. They can’t choose either one and so never "get" any probability.

4.1 Paradox as a Learning Tool

Responses to this paralyzing situation include:

Blaming themselves: "I just can't see why one of these is better than the other" and, in their discouragement, abandoning the domain to experts;
Blaming the subject: "You can say anything with statistics and there's no way of proving you wrong";
Rejecting the importance of the conflict: "So, no big deal. Hey, they're both right." [25]

No amount of purely formal instruction in the use of probabilistic and statistical formulae can begin to address the "epistemological anxiety" that engenders these responses. What is needed is a therapeutic intervention: the valuation of both sides of a contradictory argument together with validation of the learner's competence to resolve these competing claims. In contrast to the literature on misconceptions, it is important to emphasize the continuity between the learner's confused and messy understanding of the domain and that of the experts [26][27].

Essentially, the gist of the intervention concerning paradox is creating an environment in which learners are encouraged to pay attention to a situation in which there are conflicting probabilistic arguments, and to replace the experience of helplessness or anxiety in the face of this conflict with the feeling of excitement associated with a meaningful learning opportunity.

By the time Ellie had gotten to the circle chords question, she had already spent five and a half hours during three separate days over a three-week period in a Connected Mathematics interview. During this time, she had encountered and constructed many paradoxes and, along the way, gained confidence in her ability to resolve them to her satisfaction. She, therefore, did not need much support on this occasion in accepting the paradox as an opportunity for learning. She took both arguments (her own and the interviewer's) seriously. Even though she suspected that the interviewer's argument was a clever trick, she felt a need to find a flaw in that argument. This need to find a flaw in one side of the paradox (as opposed to just embracing the argument that seems good to her) is a powerful avenue for learning. Less sophisticated learners are content to find an argument they can believe and do not feel a need to refute any counterarguments. Seizing on the plausible argument without refuting the counterargument was a common phenomenon in the interviews and was particularly salient in discussions of the Monty Hall family of problems (Gillman, 1992; Wilensky, 1993).[28]

4.2 Programming - Making Probability Concrete

It was not until Ellie programmed a simulation of the problem that she began to resolve the paradox. Note that she had already begun to see the direction of resolution before she ran her simulation, even before she completed writing the program.[29] This was a common phenomenon across interviewees. Explicitly representing the situation in which the probability problem is embedded, making it concrete, was frequently enough to resolve the present difficulty and move to the next level of subtlety.[30] Writing the code to generate the chords forced Ellie to embody the randomness of the chords in a computational process. This led her to see that different computational processes generate different sets of chords (or distributions of chord lengths). Still clinging to the idea that there was only one truly random set of chords, she moved to the level of physical simulation where surely, she thought, she could see which set of chords would really be picked out. At that point came the "aha" that there was no unique set of real and truly random chords: different physical experiments would lead to different sets of "random" chords. She had made the connections between the physical experiments, the computational processes and the mathematics of randomness and distribution. Equipped with this connected web of relations, we might venture to say that Ellie would now also be ready to deal with the formalisms of measure theory and probability measures without getting lost. The concrete foundation built up during the interview would provide support in navigating through the formalism, guiding its use and preventing its abuse.

4.3 Using Models vs. Building Models

Some researchers have argued (e.g., Soloway & Guzdial, 1993; Steinberger, 1994) that using specialized applications, domain-specific models, and exploratory simulations can provide the benefit of programming without the "overhead" associated with learning the language. An illuminating comparison can be made between the experience of programming random chords and that of using specialized probability courseware.

One such package, ConStats (Cohen et al., in press), was designed with the objective of helping students gain "a deep conceptual understanding of introductory probability and statistics" through an "active experimental style of learning" (Cohen et al., 1994). As such, it is based in constructivist principles. However, the experiments students can conduct with ConStats consist of manipulating the parameters of a preconceived model. Students cannot program in ConStats or build models to pursue questions that arise.

The software is impressive, with well implemented graphics, an easy-to-learn user interface, extensive contextual help facilities, and a large selection of features. A principal emphasis of the software package is on distributions. The package contains many different distributions, both continuous and discrete, each with its own name and associated text describing its characteristics. In addition, for each kind of distribution, users have a host of parameters which they can manipulate and view the resultant change in the graph of the "random variable".

ConStats has both the strengths and weaknesses of the broader class of what can be called "black-box" simulations (i.e., simulations in which the user does not have explicit access to the modeling algorithm). These strengths include the ability of users to engage quickly with high level models, the availability of specialized domain specific tools, engaging user interfaces and broad coverage of the subject domain. The chief weakness is the lack of "read/write" access to the model. As a result, learners cannot explore what processes govern the way the parameters change the model. More importantly, they cannot explore the consequences of changing the structure of these processes themselves. As a result, they do not develop a solid understanding of these underlying processes.

The ConStats software was used extensively by students in a number of university-level courses. After each course was completed, the students were given a post-test designed to measure their comprehension of concepts "covered" by the software. The researchers conducting the evaluation (Cohen et al., 1994) report that conceptual comprehension was significantly greater for those students using the software than for the control group. One of the questions on the post-test was: "What is it about a variable that makes it a random variable?" The first author of the evaluation study reported (Cohen, 1993) that in all the exams he had seen, not a single student had "given the correct answer", nor had a single one mentioned the concept of distribution in his/her answer.[31] Most students just left it blank. The most frequent non-blank answer was: "a variable that has equal probability of taking on any of its possible values". Despite the fact that they had spent hours manipulating distributions and had plotted and histogrammed their "random variables", they missed the connection between these activities and the concept of random variable. The connection between distribution and randomness was perhaps too obvious to the software designers. They did not recognize the necessity of the learners constructing that connection for themselves if they are to explore it further through the software.

The ConStats software encourages exploration through changing parameters, which may explain its success in improving conceptual understanding in courseware subject matter. But ConStats users' understanding of randomness is seemingly impoverished. They have not made connections between the distributions they manipulated and observed and the concept of random variable. It is unlikely that they have developed a widely-connected and intuitive sense of the concept of randomness. In contrast, most of the interviewees in the Connected Probability project developed a deeper understanding of randomness. By explicitly confronting the question of the meaning of randomness and by explicitly representing it in a program, the interviewees developed strong intuitions that were not developed by the users of the courseware.

Leaving aside the differences in conceptual understanding promoted by the two approaches, there is also an important issue of educational goals. Particularly in the area of statistics, the educational goal should emphasize interpreting and designing statistics from science and life rather than mastery of curricular materials. In order to make sense of scientific studies, it is not sufficient to be able to verify the stated model; one needs to see why those models are superior to alternative models. In order to understand a newspaper statistic, one must be able to reason about the underlying model used to create that statistic [32] and evaluate its plausibility. For these purposes, building probabilistic and statistical models is essential.

Computer-based exploratory environments for learning probability can facilitate greater conceptual understanding. The computer's capacity to repeat and vary large numbers of trials, and its ability to show the results of these trials in compressed time (and often in visual form), make it possible to encapsulate events that are usually distributed over time and space. This can provide learners with the kinds of concrete experiences they need to build solid probabilistic intuitions.

A central issue, then, is between learners using pre-built models and learners making their own models. The ability to run pre-built models interactively is an improvement over static textbook based approaches. By manipulating parameters of the model, users can make useful distinctions and test out some conjectures. The results of the Connected Probability project suggest that for learners to make use of these pre-built models, they must first build their own models and design their own investigations.

It is possible to combine the two approaches (e.g., Eisenberg, 1991; Wilensky, forthcoming-b) by providing pre-built models that are embedded in programming environments, creating so-called "extensible applications" (Eisenberg, 1991). This combined approach has the advantages of both pre-built and buildable models. The challenge of such an approach is to design the right middle level of primitives so that they are neither
(a) too low-level, so that the application becomes identical to its programming language, nor
(b) too high-level, so that the application turns into an exercise of running pre-conceived experiments.

The metric by which the optimal level can be judged is the usefulness to learners. This requires an extensive research program. The findings from this research must inform the development enterprise.

5. Concluding Remarks

In the Connected Probability project, learners such as Ellie succeeded in making deep probabilistic arguments that probed at the foundations of the discipline. Having understood the foundational concepts in this deep way, they developed a strong intuitive understanding of such concepts as randomness, distribution and expectation. Solid intuitions about probability and statistics were clearly developed by learners in this study. This shows that we are not, by our natures, as some have argued, unable to reason intuitively about probability.

The Connected Probability project is an instantiation of the Connected Mathematics approach. The key elements of the Connected Mathematics approach that enabled these changes are:

The exploration of multiple meanings of concepts and making connections between these different representations: Like Ellie, learners saw the connections between representations of randomness in different domains including physical experiments, probability distributions and computational processes.
A focus on epistemological issues (as they specifically relate to how learners construct understandings): Ellie focused on what it means for something (a process) to be random. Is there only one way of choosing chords randomly or can there be multiple ways?
The use of paradox: The paradox reinforced the focus on epistemological issues; it placed her epistemology of mathematics in doubt. Ellie wondered: What kind of discipline is mathematics if a "unique" probability can be equal to both 2/3 and 3/4?
Conducting a learner-owned investigation (as opposed to problem solving) as the central activity of mathematical learning: Even though the random chord problem started out as a classic formal problem, Ellie engaged herself with it to see it as her own.
Acknowledgment of and attention to the affective side of learning mathematics: Ellie would not have engaged herself with the paradox had she not been encouraged to believe in the pursuit enough to overcome the epistemological anxiety that usually prevents learners from getting so engaged. Crucial to this self-confidence is a "cognitive-emotive" therapy for the sense of shame produced by a mathematical culture that prevents learners from expressing the epistemological anxiety and tentative understandings that are at its root.
Making mathematics (and articulating it in a concrete form): Ellie is encouraged to create alternate representations of the problem and work out definitions of randomness that make sense to her. This enables her to see mathematics as a personal odyssey of meaning making, not an externally given corpus to be assimilated but not affected by her.
The use of technology as a medium for making and articulating mathematics: Ellie was able to design her own experiment to explore the different sides of the paradox. This ability to express her partial understandings of "random chord" in a computational model was key to the refinement of her mental model and provided a powerful semiotic context for her articulation of her mathematical thought.

The availability of the programming environment facilitated many of these goals. The programming environment facilitated Ellie's conducting her own investigation. It provided a language, a different notation in which Ellie could express her mathematical ideas. It provided a signing environment, a place-holder for these ideas to exist outside of Ellie. And because this language is dynamic (it can be "run"), it provided feedback to Ellie's ideas. This trialogue between Ellie's mental model, the expression of her mental model in encapsulated code and the running of that code, allowed Ellie to successively refine the creative structure of her thought. While one might concede that it is theoretically possible for Ellie to have resolved her problem with a pre-built model in which a randomizing parameter was modified, the learner-modeling approach is clearly significant in its outcome and arguably more practical in its implementation. This is because, for Ellie to have come to a similar set of insights, a model designer would have had to anticipate all of Ellie's concerns and build them into the model. Clearly, this is impossible to do in general: educational software designers cannot anticipate all the directions that a learner might want to investigate and incorporate them into a parameter model. Moreover, users of "parameter-twiddling" software realize that they are pursuing someone else's investigation. This realization decreases the motivation of discovery. Lastly, such closed environments reinforce a view of mathematics learning as a process of verifying already known mathematics as opposed to seeing it as a personal odyssey of mathematics making. In designing computer-based environments for learning probability, we must remember that allowing users to create their own models is necessary for truly learner-owned investigations.

For many learners in the Connected Probability project, this experience of doing Connected Mathematics was so different from their experience in regular mathematics classrooms that they did not recognize their activity as being mathematics. Learners who had "always hated mathematics" and had been told that they were not "good at mathematics" were excitedly engaged in doing mathematics that could be easily recognized by mathematicians as "good mathematics". Having created a strong intuitive foundation for the conceptual domain, learners could also go on to engage the formal approaches and techniques with an appreciation for how they connect to core ideas of probability and statistics. Even more importantly, they now understood that mathematics is a living growing entity which they could literally make their own.

ACKNOWLEDGEMENTS

The preparation of this paper was supported by the National Science Foundation (Grant # MDR 8751190), the LEGO Group, and Nintendo Inc., Japan. The ideas expressed here do not necessarily reflect the positions of the supporting agencies. I'd like to thank Seymour Papert, Mitchel Resnick and David Chen for extensive feedback about this research. I'd also like to thank Donna Woods, Paul Whitmore, Ken Ruthven, Walter Stroup, David Rosenthal, Richard Noss, Yasmin Kafai, Wally Feurzeig, Laurie Edwards, Barbara Brizuela and Aaron Brandes for helpful comments on drafts of this paper. Finally, I'd like to thank Ellie and all the participants in the Connected Probability project. I have learned a lot from (and with) you.

References

Abelson, H. & diSessa, A. (1980). Turtle Geometry: The Computer as a Medium for Exploring Mathematics. Cambridge, MA: MIT Press.

Abelson, H. & Goldenberg, P. (1977). Teacher's Guide for Computational Model of Animal Behavior. LOGO Memo No. 46. Cambridge, MA.

Ball, D. (1990). With an eye on the Mathematical Horizon: Dilemmas of Teaching. Paper presented at the annual meeting of the American Educational Research Association, Boston, MA.

Belenky, M., Clinchy, B., Goldberger, N., & Tarule, J. (1986). Women's Ways of Knowing. New York: Basic Books.

Bertrand, J. (1889). Calcul des Probabilités. Paris: Gauthier-Villars.

Borel, E. (1909). Éléments de la Théorie des Probabilités. Paris: Hermann et Fils.

Brandes, A. (1994). Elementary School Children's Images of Science. In Y. Kafai and M. Resnick (Eds.), Constructionism in Practice: Rethinking Learning and Its Contexts. Presented at the National Educational Computing Conference, Boston, MA, June 1994. The MIT Media Laboratory, Cambridge, MA.

Chaitin, G. (1987). Information, Randomness and Incompleteness: Papers on Algorithmic Information Theory. Singapore; Teaneck, NJ: World Scientific.

Cohen, B. (1990). Scientific Revolutions, Revolutions in Science, and a Probabilistic Revolution 1800-1930. In Kruger, L., Daston, L., & Heidelberger, M. (Eds.) The Probabilistic Revolution. Vol. 1. Cambridge, MA: MIT Press.

Cohen, J. (1979). On the Psychology of Prediction: Whose is the Fallacy? Cognition, 7, pp. 385-407.

Cohen, S. (1993). Personal Communication. Tufts University, Medford, MA.

Cohen, S., Smith, G., Chechile, R. & Cock, R. (in press). Designing Curricular Software for Conceptualizing Statistics. Proceedings of the 1st Conference of the International Association for Statistics Education.

Cohen, S., Chechile, R., Smith, G., Tsai, F. & Burns, G. (1994). A method for evaluating the effectiveness of educational software. Behavior Research Methods, Instruments & Computers, 26 (2), pp. 236-241.

Cohen, S. (1995). Personal Communication. Tufts University, Medford, MA.

Collins, A. & Brown, J.S. (1988). The Computer as a Tool for Learning Through Reflection. In H. Mandl & A. Lesgold (Eds.) Learning Issues for Intelligent Tutoring Systems (pp. 1-18). New York: Springer Verlag.

Confrey, J. (1993a). A Constructivist Research Programme Towards the Reform of Mathematics Education. An introduction to a symposium for the Annual Meeting of the American Educational Research Association, April 12-17, 1993 in Atlanta, Georgia.

Confrey, J. (1993b). Learning to See Children's Mathematics: Crucial Challenges in Constructivist Reform. In K. Tobin (Ed.) The Practice of Constructivism in Science Education. Washington, D.C.: American Association for the Advancement of Science. pp. 299-321.

Cuoco, A. & Goldenberg, E. P. (1992). Reconnecting Geometry: A Role for Technology. Proceedings of Computers in the Geometry Classroom conference. St. Olaf College, Northfield, MN, June 24-27, 1992.

Edwards, L. (in press). Microworlds as Representations. In Noss, R., Hoyles, C., diSessa A. and Edwards, L. (Eds.) Proceedings of the NATO Advanced Technology Workshop on Computer Based Exploratory Learning Environments. Asilomar, Ca.

Edwards, L. (1992). A LOGO Microworld for Transformational Geometry. In Hoyles, C. & Noss, R. (Eds.) Learning Mathematics and Logo. London: MIT Press.

Edwards, W. & von Winterfeldt, D. (1986). On Cognitive Illusions and their Implications. Southern California Law Review, 59(2), 401-451.

Evans, J. (1993). Bias and Rationality. In Manktelow & Over (Eds.) Rationality (in press). London: Routledge.

Eisenberg, M. (1991). Programmable Applications: Interpreter Meets Interface. MIT AI Memo 1325. Cambridge, Ma., AI Lab, MIT.

Feurzeig, W. (1989). A Visual Programming Environment for Mathematics Education. Paper presented at the fourth international conference for Logo and Mathematics Education. Jerusalem, Israel.

Fischbein, E. (1987). Intuition in Science and Mathematics: An Educational Approach. Dordrecht, Holland: D. Reidel Publishing Company.

Geertz, C. (1973). The Interpretation of Cultures. New York: Basic Books.

Gigerenzer, G. (1990). The Probabilistic Revolution in Psychology - an Overview. In Kruger, L., Daston, L., & Heidelberger, M. (Eds.) The Probabilistic Revolution. Vol. 1. Cambridge, Ma: MIT Press.

Gilligan, C. (1977). In a Different Voice: Psychological Theory and Women's Development. Cambridge, MA: Harvard University Press.

Gillman, L. (1992). The Car and the Goats. American Mathematical Monthly, volume 99, number 1, January, 1992.

Greeno, J. G. (1991). Number Sense as Situated Knowing in a Conceptual Domain. Journal for Research on Mathematics Education, 22, 170-218.

Goldenberg, E. P., Cuoco, A. & Mark, J. (1993). Connected Geometry. Proceedings of the Tenth International Conference on Technology and Education. Cambridge, MA, March 21-24, 1993.

Gould, S. J. (1991). The Streak of Streaks. In Gould, S. J. Bully for Brontosaurus. Cambridge, MA., W.W. Norton. (Chapter 31).

Hacking, I. (1990). Was there a Probabilistic Revolution 1800-1930? In Kruger, L., Daston, L., & Heidelberger, M. (Eds.) The Probabilistic Revolution. Vol. 1. Cambridge, Ma: MIT Press.

Harel, I. (1992). Children Designers. Norwood, NJ: Ablex.

Harel, I. & Papert, S. (1990). Software Design as a Learning Environment. Interactive Learning Environments Journal. Vol. 1 (1). Norwood, NJ: Ablex.

Kafai, Y. & Harel, I. (1991). Learning through Design and Teaching: Exploring Social and Collaborative Aspects of Constructionism. In I. Harel & S. Papert (Eds.) Constructionism. Norwood, N.J. Ablex Publishing Corp. Chapter 5.

Kahneman, D. (1991). Judgment and Decision Making: A Personal View. Psychological Science. vol. 2, no. 3, May 1991.

Kahneman, D., & Tversky, A. (1982). On the study of Statistical Intuitions. In D. Kahneman, A. Tversky, & D. Slovic (Eds.) Judgment under Uncertainty: Heuristics and Biases. Cambridge, England: Cambridge University Press.

Kahneman, D., & Tversky, A. (1973). On the Psychology of Prediction. Psychological Review, 80 (4), pp. 237-251.

Kaput, J. (1989). Notations and Representations. In Von Glaserfeld, E. (Ed.) Radical Constructivism in Mathematics Education. Netherlands: Kluwer Academic Press.

Keller, E.F. (1983). A Feeling for the Organism: The Life and Work of Barbara McClintock. San Francisco, CA: W.H. Freeman.

Keynes, J.M. (1921). A Treatise on Probability. London: MacMillan.

Kolmogorov, A.N. (1950). Foundations of the Theory of Probability. New York: Chelsea Pub. Co.

Konold, C. (1989). Informal Conceptions of Probability. Cognition and Instruction, 6, 59-98.

Konold, C. (1991). Understanding Students' beliefs about Probability. In Von Glaserfeld, E. (Ed.) Radical Constructivism in Mathematics Education. Netherlands: Kluwer Academic Press.

Lakatos, I. (1976). Proofs and Refutations. Cambridge: Cambridge University Press.

Lampert, M. (1990). When the problem is not the question and the solution is not the answer: Mathematical knowing and teaching. In American Education Research Journal, spring, vol. 27, no. 1, pp. 29-63.

Leron, U. & Zazkis, R. (1992). Of Geometry, Turtles, and Groups. In Hoyles, C. & Noss, R. (Eds.) Learning Mathematics and LOGO. London: MIT Press.

Leron, U. (1983). Structuring Mathematical Proofs. American Mathematical Monthly, Vol. 90, 3, 174-185.

Marinoff, L. (1994). A Resolution of Bertrand's Paradox. Philosophy of Science, 61, 1-24.

Mason, J. (1987). What do symbols represent? In C. Janvier (Ed.) Problems of Representation in the Teaching and Learning of Mathematics. Hillsdale, NJ: Lawrence Erlbaum Associates.

Minsky, M. (1987). The Society of Mind. New York: Simon & Schuster Inc.

National Council of Teachers of Mathematics (1991a). Curriculum and Evaluation Standards for School Mathematics. Reston, Va: NCTM.

National Council of Teachers of Mathematics (1991b). Professional Standards for Teaching Mathematics. Reston, Va: NCTM.

Nisbett, R. (1980). Human Inference: Strategies and Shortcomings of Social Judgment. Englewood Cliffs, NJ: Prentice-Hall.

Nisbett, R., Krantz, D., Jepson, C., & Kunda, Z. (1983). The Use of Statistical Heuristics in Everyday Inductive Reasoning. Psychological Review, 90 (4), pp. 339-363.

Noss, R. & Hoyles, C. (1991). Logo and the Learning of Mathematics: Looking Back and Looking Forward. In Hoyles, C. & Noss, R. (Eds.) Learning Mathematics and Logo. London: MIT Press.

Papert, S. (1980). Mindstorms: Children, Computers, and Powerful Ideas. New York: Basic Books.

Papert, S. (1991). Situating Constructionism. In I. Harel & S. Papert (Eds.) Constructionism. Norwood, N.J. Ablex Publishing Corp. (Chapter 1).

Papert, S. (1993). The Children's Machine. New York: Basic Books.

Papert, S. (1972). Teaching Children to be Mathematicians vs. Teaching About Mathematics. International Journal of Mathematics Education. Vol. 3.

Phillips, J. (1988). How to Think About Statistics. New York: W.H. Freeman.

Piaget, J. (1952). The Origins of Intelligence in Children. New York: International University Press.

Piaget, J. (1975). The Origin of the Idea of Chance in Children. New York: Norton.

Poincaré, H. (1912). Calcul des Probabilités. Paris: Gauthier-Villars.

Resnick, M. (1992). Beyond the Centralized Mindset: Explorations in Massively Parallel Microworlds. Unpublished doctoral dissertation, Cambridge, MA: Media Laboratory, MIT.

Resnick, M. (1991). Animal Simulations with *Logo: Massive Parallelism for the Masses. In J. Meyer & S. Wilson (Eds.), From animals to animats. Cambridge, MA: MIT Press.

Richmond, B. & Peterson, S. (1990). STELLA IIM. Hanover, NH: High Performance Systems, Inc.

Savage, L. (1954). The Foundations of Statistics. New York: Wiley.

Scheffler, I. (1991). In Praise of the Cognitive Emotions. London: Routledge, Chapman and Hall.

Schoenfeld, A. (1991). On Mathematics as Sense-Making: An Informal Attack on the Unfortunate Divorce of Formal and Informal Mathematics. In Perkins, Segal, & Voss (Eds.) Informal Reasoning and Education.

Schwartz, J. & Yerushalmy, M. (1987). The Geometric Supposer: an Intellectual Prosthesis for Making Conjectures. The College Mathematics Journal, 18 (1): 58-65.

Smith, E. & Confrey, J. (in press). Multiplicative Structures and the Development of Logarithms: What was Lost by the Invention of Function? In G. Harel & J. Confrey (Eds.) The Development of Multiplicative Reasoning in the Learning of Mathematics. Albany: State University of New York Press.

Smith, J.P., diSessa, A.A., & Roschelle, J. (1994). Reconceiving Misconceptions: A Constructivist Analysis of Knowledge in Transition. Journal of the Learning Sciences, 3, pp. 115-163.

Soloway, E. (1993). Should We Teach Students to Program? CACM, October 1993, 36(1).

Steinberger, M. (1994). Where does Programming fit in? Logo Update. Vol. 2 (3).

Suppes, P. (1984). Probabilistic Metaphysics. Oxford, UK: Blackwell.

Surrey, J. (1991). Relationship and Empowerment. In Jordan, J., Kaplan, A., Miller, J.B., Stiver, I. & Surrey, J. Women's Growth in Connection: Writings from the Stone Center. New York: The Guilford Press.

Tierney, J. (1991). Behind Monty Hall's Doors: Puzzle, Debate and Answer? The New York Times National, July 21, 1991, page 1.

Thurston, W. (1994). On Proof and Progress in Mathematics. Bulletin of the American Mathematical Society. Volume 30, Number 2, April, 1994.

Turkle, S., & Papert, S. (1991). Epistemological Pluralism and Revaluation of the Concrete. In I. Harel & S. Papert (Eds.) Constructionism. Norwood N.J. Ablex Publishing Corp.

Tversky, A. (1973). Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5, 207-232.

Tversky, A. & Kahneman, D. (1974). Judgment Under Uncertainty: Heuristics and Biases. Science, 185, pp. 1124-1131.

Tversky, A. & Kahneman, D. (1980). Causal Schemas in Judgments Under Uncertainty. In M. Fischbein (Ed.), Progress in Social Psychology. Hillsdale, NJ: Erlbaum.

Tversky, A. & Kahneman, D. (1983). Extensional vs. Intuitive Reasoning: The Conjunction Fallacy in Probability Judgment. Psychological Review, 90 (4), pp. 293-315.

Uspensky, J. (1937). Introduction to Mathematical Probability. New York: Prentice Hall.

von Glaserfeld, E. (1987). Preliminaries to any Theory of Representation. In C. Janvier (Ed.) Problems of Representation in Mathematics Learning and Problem Solving, Hillsdale, NJ: Erlbaum.

von Mises, R. (1964). Mathematical Theory of Probability and Statistics. New York: Academic Press.

von Mises, R. (1957). Probability, Statistics and Truth. New York: Dover Publications.

Wilensky, U. (1991). Abstract Meditations on the Concrete and Concrete Implications for Mathematics Education. In I. Harel & S. Papert (Eds.) Constructionism. Norwood N.J.: Ablex Publishing Corp.

Wilensky, U. (1993). Connected Mathematics: Building Concrete Relationships with Mathematical Knowledge. Unpublished doctoral dissertation, Cambridge, MA: Media Laboratory, MIT.

Wilensky, U. (forthcoming-a). GPCEE an Extensible Microworld for Exploring Micro- and Macro- Views of Gases. Interactive Learning Environments Journal. Norwood, NJ: Ablex.

Wilensky, U. (forthcoming-b). Learning Probability through Building Computational Models. Proceedings of the Nineteenth International Conference on the Psychology of Mathematics Education. Recife, Brazil, July 1995.

Wilensky, U. (in preparation). What is Normal Anyway? Therapy for Epistemological Anxiety.

[1]In a graduate probability course at MIT, the professor explicitly admonished the class members not to try to do inverse probabilities in their heads since their intuitions were not reliable. Instead, he said, always use the Bayes formula to calculate inverse probabilities.

[2]Assumptions implicit in the formulation of the paradox or in the preconceptions of the learner.

[3]A nice example of the many meanings of "derivative" can be found in a recent paper by Thurston (1994).

[4]Formal proof and definition is an after-the-fact reconstruction of the processes of coming to know in mathematics. The justification of such reconstruction for the purposes of communication within expert culture is certainly allowed. What is unfortunate and damaging pedagogically is that this re-presentation becomes an active conception of what mathematics is and what it is to know mathematics.

[5]The taboo against expressing partial understandings is endemic to school discourse. To break it, teachers must explicitly model expressing their own confusions and groping for clarity. One reason this is hard to do is that it is very difficult to remember what it was like not to grasp a mathematical concept that is now self-evident. There are many striking parallels between the development of conservation in children (Piaget, 1952) and the acquisition of new mathematical concepts. One feature they share is the inconceivability of one's previous understanding: "what is it like to think that there is more water in a tall glass than there was in the shorter glass which you emptied into the taller container?" [For further discussion of this point, see (Wilensky, 1993)].

[6]Because it suggests that making is endemic to mathematical activity, the Connected Mathematics view is that learners make connections; they don't cross intellectual ravines. Thus the process of becoming expert in mathematics is one of adding connections, not of removing or replacing novice knowledge.

[7]The shortest interview was two hours long, the longest eighteen hours and the median seven hours. These figures refer to the face-to-face interview time. Some interviews continued over electronic mail for up to two months following face-to-face interactions.

[8]Part of what makes an event singular is that we do not interpret it as a member of a class of events. It is only when we can stand at a distance from the event and see it in the context of many other events, that we can begin to make the reference classes needed to make probabilistic judgments.

[9]These "object-oriented" features of the language make StarLogo a more accessible environment for modeling. In contrast to other modeling environments, such as STELLA (Richmond & Peterson, 1990), which model with aggregate quantities and flows, StarLogo is "object" based, thus facilitating concrete interactions with the basic units of the model.

[10]The name "Bertrand's paradox" was given by Poincaré.

[11]This was true in roughly half of the interviews in this study. The later in the interview the paradox occurred, the more likely that it was recognized and owned.

[12]A weaker form of this claim is that programming requires too much "overhead" that distracts learners from the mathematics at hand. This paper does not respond directly to this weaker claim. Let me note briefly that:

Logo and StarLogo are conceived here as lifelong tools and powerful expressive media across many domains, not just probability.
In contrast to languages such as Fortran or Basic, meaningful StarLogo programs are usually quite short, and StarLogo has a “low threshold” (i.e., it is easy for novices to write meaningful programs) as a primary language design criterion (Papert, 1980; Resnick, 1991).

[13]Or as some interviewees saw it, each process leads to a different "meaning" of random.

[14]A positivist or strict formalist critic might object that in fact the notion of randomness has been replaced by a more precise and technical notion. In practice, however, the new ideas coexist with the old and take much of their sustenance from their connections to prior conceptions and other contexts for recognizing randomness.

[15]The transcripts have been "cleaned up" some (removing pauses, umms, and many interjections) for clarity of the exposition. Bracketed comments are the author's clarifying remarks.

[16]At this point, Ellie actually wrote down a formal mathematical proof by contradiction. The last line of the proof was: 2/3 = 3/4. Contradiction.

[17]The StarLogo procedures are divided into turtle procedures and observer procedures. Turtle procedures are executed by each turtle in parallel. Observer procedures set up the general environment and summarize the behavior of turtles.

[18]In this case, each turtle sets itself up at the origin on the circumference of a circle of radius 10.

[19]Ellie did go on and write the code to do this experiment just as a check of her insight. Her new code is the same as the old code except for a rewrite of the procedure "gen-random-chord".

[20]I chose not to intervene at this juncture and point out that the first experiment Ellie proposed did not correspond exactly to her first analysis and method of generating chords.

[21]Here is her argument: Choose a circle of radius R and an interval, a, small relative to R. For calculating convenience, let's say R is large and a is 2. Then the annulus corresponding to very small chords (length nearly zero) has area equal to π. But the disk corresponding to large chords (length near 2R) has area equal to π*(2R - 1), which is substantially larger.
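
Ellie's two areas can be checked directly. A chord of length L in a circle of radius R has its midpoint at distance sqrt(R^2 - (L/2)^2) from the center, so chords with lengths in a given interval have midpoints in an annulus (or, for the longest chords, a disk). The following is a minimal sketch in Python rather than the StarLogo used in the interviews; the function name and the choice R = 10 are assumptions made only for illustration.

```python
import math

def midpoint_region_area(radius, shortest, longest):
    """Area of the region holding the midpoints of all chords whose
    length lies between `shortest` and `longest` (hypothetical helper)."""
    # A chord of length L has its midpoint at distance
    # sqrt(radius^2 - (L/2)^2) from the center, so shorter chords
    # have midpoints farther out.
    outer_sq = radius ** 2 - (shortest / 2) ** 2
    inner_sq = radius ** 2 - (longest / 2) ** 2
    return math.pi * (outer_sq - inner_sq)

R = 10.0  # assumed radius, chosen only for illustration

# Chords of length 0 to 2: a thin annulus next to the circumference.
small_chords = midpoint_region_area(R, 0.0, 2.0)

# Chords of length 2R - 2 to 2R: a disk around the center.
large_chords = midpoint_region_area(R, 2 * R - 2.0, 2 * R)

# Express both areas as multiples of pi.
print(small_chords / math.pi, large_chords / math.pi)
```

With these assumptions the two areas come out as approximately π and π*(2R - 1), so intervals of equal width in chord length correspond to very unequal areas of midpoints, which is the asymmetry Ellie's argument turns on.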

[22]In Logo, a procedure is "state transparent" if, after its execution, the state of the turtle is unaltered.

[23]In a typical Logo or StarLogo screen, when a turtle goes off the screen to the right or at the top, it reappears at the left or bottom.

[24]As was mentioned in section 3.1, this is Ellie's version of the question debated by mathematicians as to the applicability of the principle of insufficient reason.

[25]These three responses are equivalent to

giving up on the paradox,
asserting that the paradox is in fact a contradiction in the domain and thus invalidating the domain, or
refusing the paradox.

[26]Reflecting on the intellectual development of historical figures in the origins of probability can be very encouraging as they are often seen to be confused about the very same things. As was noted in section 3.1, Ellie's concerns and confusions paralleled those of leading mathematicians over the last hundred years.

[27]The paradox will not be resolved through getting the received solution of experts. The expert solution relies on intermediate mental constructs that must be built by the learner. If instead, learners "dive in" to the paradox (as the experts did long ago), they will construct a richer intuitive conception of the key ideas in the domain.

[28]One of these “Monty Hall” probability paradoxes made newspaper headlines (Tierney, 1991) after appearing in a column in Parade magazine (Vos Savant, 1991).

[29]As shall be elaborated in section 4.3, the fact of Ellie's getting her insight before running her program further emphasizes the value of building a model over exploring the outputs of a model.

[30]Again, this was the norm in the Monty Hall family of problems. As soon as the interviewee started to write down a program for simulating the problem, what was going on became apparent. Often, they didn't even bother to finish the programs. In contrast, in another paradox discussed in the interviews, the "envelope paradox", writing a program most often served to instantiate an already held theory and therefore running it "confirmed" the theory, often prematurely. In effect, the authority of the program sanctioned the solution. However, in that particular case, a community formed spontaneously to resolve the dilemma. Interviewees in the study got together and compared their different solutions. Since the interviewees wrote different programs with different results, the conflict created an opportunity for them to talk about how their programs encapsulated their theories. This led to more sophisticated critiques of the theories. [For a discussion of how programs and microworlds can encapsulate theories, see (Edwards, in press)].

[31]More recently, Cohen (1995) has replicated these post-tests and a "very small number" of students do refer to distributions in their answers.

[32]A case study of two students trying to make sense of a divorce-rate statistic reported in the newspaper is presented in Wilensky (1993). In order to make sense of the statistic, they designed and critiqued many different models.