CM: NetLogo ProbLab: Central Limit Theorem

Download Central Limit Theorem.nlogo (36 KB)

Central Limit Theorem

Central Limit Theorem is authored in the NetLogo modeling-and-simulation environment. The model is part of ProbLab, a curricular unit designed to enrich student understanding of the domain. The online unit package will include a suite of models, student worksheets, and a teacher guide. Below is an applet of Central Limit Theorem. You can interact with this model by changing the slider values and switch settings and then pressing Setup and Go to run this model under different settings. For more details, please see the model itself in the NetLogo library. Note that this model is still under development and is yet to undergo our rigorous checkout procedure.


Gist

Central Limit Theorem demonstrates relations between population distributions and their sample mean distributions as well as the effect of sample size on this relation. In this model, a population is distributed by some variable, for instance by their total assets in thousands of dollars. The population is distributed randomly -- not necessarily 'normally' -- but sample means from this population nevertheless accumulate in a distribution that approaches a normal curve. The program allows for repeated sampling of individual specimens in the population.
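For readers without the model at hand, the idea can be sketched in a few lines of Python. This is only an illustrative sketch, not the model's NetLogo code: the skewed `counts` list, the sample size, the number of samples, and the choice to sample without replacement are all assumptions made for the example.

```python
import random
import statistics
from collections import Counter

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population: counts[x] people stand in column x (0..10).
counts = [30, 20, 12, 8, 5, 3, 2, 1, 1, 1, 1]
population = [x for x, count in enumerate(counts) for _ in range(count)]

sample_size = 10     # stands in for the SAMPLE-SIZE slider
num_samples = 2000   # how many samples to draw (an assumed value)

# Assumption: each sample is drawn without replacement from the whole population.
means = [
    statistics.mean(rng.sample(population, sample_size))
    for _ in range(num_samples)
]

print("population mean:", round(statistics.mean(population), 2))
print("mean of sample means:", round(statistics.mean(means), 2))

# Crude text histogram of the sample means, binned to the nearest half unit.
histogram = Counter(round(m * 2) / 2 for m in means)
for value in sorted(histogram):
    print(f"{value:4.1f} {'#' * (histogram[value] // 20)}")
```

Even though most of this population sits on the far left, the printed histogram of sample means should look roughly bell-shaped and be centered near the population mean.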

Questions to Ponder

Can you see any connections between the distribution of the population (in the graphics display window) and the mean value of the histogram (in the plot window)? For instance, if there happen to be more population specimens ("people") on the left side of the range, where do you expect to see most of the sample means?

Try running the model with a SAMPLE-SIZE of just 1. What do you get? Now try with a SAMPLE-SIZE of 2. Has anything changed? How about a larger sample size?

Are there any connections between SAMPLE-SIZE, RANGE, and STD-DEV? One way to explore this question is to keep two of these variables constant and examine what happens when you change the third variable. You may want to take an equal number of samples for each of these trials.
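The same experiment can be tried outside the model. The sketch below is an assumption-laden illustration, not the model's own code: it uses a hypothetical population of 200 people with values 0 to 10, draws each sample without replacement, and takes "range" to mean the largest observed sample mean minus the smallest. It compares the observed standard deviation of the sample means with the textbook prediction for sampling without replacement, sigma / sqrt(n) multiplied by the finite-population correction sqrt((N - n) / (N - 1)).

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: 200 people with values 0..10.
population = [rng.randint(0, 10) for _ in range(200)]
N = len(population)
sigma = statistics.pstdev(population)  # population standard deviation

def spread_of_means(sample_size, num_samples=4000):
    """Return (std-dev, range) of the means of repeated samples."""
    means = [
        sum(rng.sample(population, sample_size)) / sample_size
        for _ in range(num_samples)
    ]
    return statistics.pstdev(means), max(means) - min(means)

results = {}
for n in (1, 4, 16, 64):
    observed_sd, observed_range = spread_of_means(n)
    # Prediction for sampling without replacement from a finite population.
    predicted_sd = sigma / math.sqrt(n) * math.sqrt((N - n) / (N - 1))
    results[n] = (observed_sd, predicted_sd, observed_range)
    print(f"n={n:2d}  observed sd={observed_sd:.3f}  "
          f"predicted sd={predicted_sd:.3f}  range={observed_range:.2f}")
```

The same number of samples is taken for every sample size, so that the rows can be compared fairly.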

If you set the model to a sample size that is larger than the total size of the population, you will receive a message telling you that you cannot do this. However, you may set a sample size that is larger than most columns. This means that the entire sample cannot fit into those columns. Is this a problem? What does this do to the distribution of sample means?

Using the CREATE MY OWN PEOPLE option, build some "unusual" populations. Some of these have already been put into the PRESET buttons. For instance, you could create people only in one or two columns, or you could make the population "U-shaped" (more on the outside and fewer and fewer as you go towards the middle). What are your findings?

Again, using the CREATE MY OWN PEOPLE option, build one very tall column off on the right side of the screen (at about x-value 8) and build a few very short columns. Set the SAMPLE-SIZE to 10. Press GO ONCE. What can you say about the number of persons that happened to be chosen from the tall column? Try this again and then press GO. Look at the plot. Do you see any connection between the chance of getting samples from the tall column and the location of the mean in the plot?

Set ALSO-SUMS? to "On," and activate the program. Can you explain the similarities and differences between the two histograms you get? For instance, you can look at their range, the total area they cover, their height, and their shape. Try to explain the transformation between the two histograms. For instance, why is the histogram on the left taller than the histogram on the right? Look at the standard deviation of the means and of the sums. What is the ratio between these two values? Does this ratio relate to any other value in the settings of this model?
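One way to check your answer to the ratio question away from the model is a short simulation. The sketch below is illustrative only: the population, the sample size of 5, and sampling without replacement are assumptions. It relies on one certain fact, namely that each sample's sum is its mean multiplied by the sample size.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical population of 100 people with values 0..10.
population = [rng.randint(0, 10) for _ in range(100)]
sample_size = 5  # stands in for the SAMPLE-SIZE slider

sums, means = [], []
for _ in range(1000):
    sample = rng.sample(population, sample_size)  # assumed: without replacement
    total = sum(sample)
    sums.append(total)
    means.append(total / sample_size)

# Ratio of the two standard deviations, to compare against the settings.
ratio = statistics.pstdev(sums) / statistics.pstdev(means)
print("std-dev of sums: ", round(statistics.pstdev(sums), 3))
print("std-dev of means:", round(statistics.pstdev(means), 3))
print("ratio:", round(ratio, 3), " sample size:", sample_size)
```

Changing `sample_size` and rerunning shows whether the printed ratio follows it.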

Relating to the two histograms, can you find a case in which the two histograms converge to a single histogram?

Press SETUP, press CREATE MY OWN PEOPLE and make 6 columns of height 2 (two persons each), set the sample size to 2, and set ALSO-SUMS? to "On." Now press GO. It is interesting to compare this statistics activity with a probability activity in which you are rolling a pair of dice. For instance, how many possible columns are there in the sums histogram? In fact, such a comparison can help us think through similarities and differences between statistics and probability.
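The comparison with dice can also be worked out exactly by enumeration. The sketch below rests on two assumptions that the text above does not state: that the six columns sit at x-values 1 through 6, and that the two people in a sample are drawn without replacement. Under those assumptions it lists the chance of each sum next to the chance of the same sum on a pair of fair dice.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

# Assumption: six columns at x-values 1..6, two people in each column.
population = [value for value in range(1, 7) for _ in range(2)]

# Sampling 2 people without replacement: every unordered pair of
# distinct people is equally likely.
pairs = list(combinations(range(len(population)), 2))
sample_sums = Counter(population[i] + population[j] for i, j in pairs)

# A pair of fair dice: every ordered pair of faces is equally likely.
dice_sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))

print("sum  sample   dice")
for total in range(2, 13):
    p_sample = Fraction(sample_sums[total], len(pairs))
    p_dice = Fraction(dice_sums[total], 36)
    print(f"{total:3d}  {str(p_sample):>6}  {str(p_dice):>5}")
```

Under these assumptions both activities have the same possible sums, but the chances differ because a person who has been chosen cannot be chosen again, whereas the second die is unaffected by the first.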

What is the relation between the number of samples you take, the size of each sample, and the resulting distribution of sample means? For instance, if you have a budget to sample 1,000 people, should you take 10 samples of size 100 each or 100 samples of size 10 each? What do you gain and what do you, perhaps, lose in each of these choices, for instance in terms of confidence or in terms of information about the population you are sampling from? To explore this question, you may want to extend the range of the sample size. You may also want to resize the Graphics Window so as to allow for more specimens in your population. Finally, it may be helpful to have a slider and corresponding code for controlling the total number of samples you are taking.
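The budget question can be explored with a small simulation before changing the model itself. The sketch below is not the model's code: the population of 500 people, the helper name `run`, and sampling without replacement are all assumptions for illustration. Both plans examine 1,000 people in total.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed so the run is repeatable

# Hypothetical population of 500 people with values 0..10.
population = [rng.randint(0, 10) for _ in range(500)]

def run(num_samples, sample_size):
    """Return the list of sample means for one sampling plan."""
    return [
        sum(rng.sample(population, sample_size)) / sample_size
        for _ in range(num_samples)
    ]

few_large = run(10, 100)   # 10 samples of size 100
many_small = run(100, 10)  # 100 samples of size 10

print("population mean:", round(statistics.mean(population), 3))
for label, means in (("10 x 100", few_large), ("100 x 10", many_small)):
    print(f"{label}: {len(means):3d} means, "
          f"average {statistics.mean(means):.3f}, "
          f"std-dev {statistics.pstdev(means):.3f}")
```

In this sketch the larger samples give means that cluster more tightly, but there are only ten of them to plot, so the shape of their histogram is harder to judge.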

For further background, pedagogical notes, and ideas, please download the model.

[Last updated May 10, 2005]