Food Reward Learning

If you download the NetLogo application, this model is included. You can also try running it in NetLogo Web.


This model explores a possible mechanism through which individuals can form food preferences and the role that food environments may play in shaping these preferences. Agents repeatedly choose between two food options that differ based on where in the environment they are located. Each agent’s expected reward value of the two food types (i.e., their expectation of how much they would enjoy eating these foods) gradually changes through consumption using a temporal difference learning algorithm.

This model is based on the original paper by Hammond et al. (2012) and can be used to replicate several of the authors' findings.


In the base model, there are two food types: low palatability (L) and high palatability (H). Each food type has an associated palatability between zero and one that controls the “true” reward value of the food. The model environment consists of a square lattice where each cell can contain a set of two food types: either HH (colored black), HL (colored gray), or LL (colored green).

Every agent has a learned reward value for the low and high value food types. These reward values can be thought of as the agent’s expectation of how much they would enjoy eating each food type. At the start of the model, agents have no prior expectations of the reward value of each food type and are initialized with learned reward values of zero. The value of each agent's expected palatability of high value food is represented with color, where darker colored agents have higher learned reward values than lighter colored agents.

Agents begin on the left side of the lattice and progress to the right, one cell at a time, until they have reached the right side of the lattice, at which point the simulation terminates.

At each time step, agents consume one of the food objects in the cell that they are currently occupying. If the cell contains either HH or LL, the agent will select H or L, respectively. However, if the cell contains both food types, the agent will select the food type that they currently perceive as having a higher value, i.e., the food type with the larger corresponding learned reward value. If the agent considers both food types to be exactly equivalent in value, they will select one of the food types at random. Subsequently, there is a small probability that an agent will switch to the other food option prior to consumption, changing their mind for some reason exogenous to the model.
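The selection rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the model's NetLogo code: the function name `choose_food`, the parameter `switch_prob`, and its default value are hypothetical, and the sketch assumes that the random switch applies only in mixed (HL) cells, since HH and LL cells offer no alternative.

```python
import random


def choose_food(cell, v_high, v_low, switch_prob=0.05, rng=random):
    """Return "H" or "L" for one time step.

    cell is "HH", "HL", or "LL"; v_high and v_low are the agent's learned
    reward values. switch_prob is a hypothetical name and value.
    """
    if cell == "HH":
        return "H"
    if cell == "LL":
        return "L"
    # Mixed cell: prefer the food with the larger learned reward value,
    # breaking exact ties at random.
    if v_high > v_low:
        choice = "H"
    elif v_low > v_high:
        choice = "L"
    else:
        choice = rng.choice(["H", "L"])
    # Small exogenous chance of changing one's mind before consumption
    # (assumed here to apply only when both foods are available).
    if rng.random() < switch_prob:
        choice = "L" if choice == "H" else "H"
    return choice
```

Setting `switch_prob=0.0` makes the choice in a mixed cell fully determined by the learned values whenever they differ, which is convenient for reasoning about the rule in isolation.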

Finally, the agent will consume their selected food type and update one of their learned reward values, corresponding to the food type they just consumed. The learned reward values are updated using an adapted temporal difference learning algorithm as follows:

$$V_F(t + 1) = V_F(t) + \alpha \left[ \beta \, p_F - V_F(t) \right]$$

where:

  • $F$ is the food type consumed (either H or L).
  • $V_F(t)$ is the agent’s learned reward value of food type $F$ at time $t$.
  • $\alpha$ is the agent’s learning rate, the speed at which they update their learned reward value.
  • $\beta$ is the agent’s responsivity to food, a factor that modifies their reward values for each food type.
  • $p_F$ is the “true” reward value of food type $F$.
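As a rough illustration of the update rule, the Python sketch below applies it five times to a single food type, starting from a learned value of zero. The function name `td_update` and the parameter values (learning rate 0.4, responsivity 1.0, true reward value 0.8) are assumptions chosen for illustration, not the model's defaults.

```python
def td_update(v, alpha, beta, p):
    """One step of the update: move v a fraction alpha toward beta * p."""
    return v + alpha * (beta * p - v)


v = 0.0  # agents start with no prior expectation
history = []
for _ in range(5):
    v = td_update(v, alpha=0.4, beta=1.0, p=0.8)
    history.append(round(v, 4))

print(history)
```

Each step closes a fraction α of the remaining gap, so the learned value rises toward β × p (0.8 with these illustrative values) without overshooting, as long as α is between zero and one.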


The two buttons, SETUP and GO, control execution of the model. The SETUP button will initialize the food environment, agents, and other variables, preparing the model to be run. The GO button will then run the model until all agents have reached the right side of the world.

GRID-SIZE controls both the number of agents in the model (equal to the slider's value) and the size of the world (equal to the square of the slider's value).

FOOD-ENVIRONMENT controls the initialization of the food types in the model.

  • “random” initializes each cell randomly to HH, HL, or LL with equal probability.
  • “gradient” splits the world into two environments starting in a mostly HH or LL environment and transitioning to an environment of mostly HL.
  • “uniform” splits the world in half such that one environment is solely HH and the other environment is solely HL.
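For illustration, two of these options might be sketched in Python as follows. This is not the model's NetLogo code. The function names are hypothetical, the grid is represented as a list of rows, and the orientation of the "uniform" split (top half HH, bottom half HL) is an assumption, since the description above does not say which half is which. The "gradient" option is omitted because its exact proportions are not specified here.

```python
import random

CELL_TYPES = ["HH", "HL", "LL"]


def random_setup(size, rng=random):
    """Each cell is HH, HL, or LL with equal probability."""
    return [[rng.choice(CELL_TYPES) for _ in range(size)] for _ in range(size)]


def uniform_setup(size):
    """Split the world in half: one half solely HH, the other solely HL.

    Assumption: the split is by rows, with HH in the top half.
    """
    half = size // 2
    return [["HH" if row < half else "HL" for _ in range(size)]
            for row in range(size)]


world = uniform_setup(4)
for row in world:
    print(" ".join(row))
```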

The slider ALPHA-1 controls the learning rate that half of all agents will be initialized with (shown as circles), while ALPHA-2 controls the learning rate of the remaining agents (shown as squares). If ALPHA-1 is equal to ALPHA-2, all agents will share a single learning rate.

If the HETEROGENEOUS-RESPONSIVITY? switch is "Off", all agents will share the same responsivity to food. If it is set to "On", responsivity to food can differ across agents, with half of all agents having increased responsivity (beta-1) and the remaining agents having decreased responsivity (beta-2).

The P-LOW and P-HIGH sliders set the "true" reward values of low (L) and high (H) value foods, respectively.


Early food environments can create a “lock-in” effect where current and future agent preferences are strongly influenced by initial consumption decisions and food environments. For example, if two agents are currently in identical food environments, their future consumption decisions may differ if their consumption and/or environment option set in the past was different.

The “gradient” FOOD-ENVIRONMENT demonstrates a good example of “lock-in” effects. Notice that even though agents are in identical environment cells toward the end of the model run, the food options along the paths that they took to the right side are different, resulting in different learned reward values.
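A minimal Python sketch can make the lock-in mechanism concrete. It is an illustration under stated assumptions, not the model's code: the names `run_path` and `td_update`, the two hand-written paths, and the parameter values are all hypothetical, and the random switching step is left out so that the outcome is deterministic.

```python
def td_update(v, alpha, beta, p):
    """Move the learned value v a fraction alpha toward beta * p."""
    return v + alpha * (beta * p - v)


def run_path(path, alpha=0.4, beta=1.0, p_high=0.8, p_low=0.3):
    """Walk one agent along a list of cells ("HH", "HL", or "LL")."""
    learned = {"H": 0.0, "L": 0.0}
    true_value = {"H": p_high, "L": p_low}
    eaten = []
    for cell in path:
        if cell == "HH":
            food = "H"
        elif cell == "LL":
            food = "L"
        else:
            # Ties go to H only to keep this sketch deterministic;
            # the model description breaks ties at random.
            food = "H" if learned["H"] >= learned["L"] else "L"
        learned[food] = td_update(learned[food], alpha, beta, true_value[food])
        eaten.append(food)
    return learned, eaten


# Two agents end in identical HL cells but arrive by different paths.
values_a, eaten_a = run_path(["HH"] * 5 + ["HL"] * 5)
values_b, eaten_b = run_path(["LL"] * 5 + ["HL"] * 5)

print(eaten_a[-5:], values_a)
print(eaten_b[-5:], values_b)
```

In this sketch the agent that starts among LL cells keeps choosing L in the mixed cells, because its learned value for H is still zero when it arrives. In the model itself, the small probability of switching gives such an agent an occasional chance to sample the other food.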

This finding could help provide greater insight into research on eating behaviors because solely studying current food environments and current eating behaviors may not fully account for early “lock-in” effects (Hammond et al. 2012).


Start by running the model with ALPHA-1 and ALPHA-2 set to 0.4. Now try running the model with different values of ALPHA-1 and ALPHA-2. What happens to the final learned reward value, and how quickly do agents reach equilibrium? How do these outcomes change as you raise or lower the learning rates?

Try setting HETEROGENEOUS-RESPONSIVITY? to "On". What happens to the final learned reward value and how quickly do agents reach equilibrium? What happens when HETEROGENEOUS-RESPONSIVITY? is "On" and learning rates ALPHA-1 and ALPHA-2 are not equal?

Try answering the above questions with different FOOD-ENVIRONMENT settings. Are there certain food environments where the spread (or distribution) of learned reward values is greater than others? If so, can you hypothesize why this might be the case?

Try adjusting the reward values (P-LOW and P-HIGH) of the two food types. How does this change your previous findings?


Add other FOOD-ENVIRONMENT options to the model.

Add a reporter to calculate the time it took each group of agents to learn the reward value of each food type.
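One possible starting point for such a reporter, sketched in Python, is to count how many consumptions it takes for a learned value to come within some tolerance of its equilibrium, β × p. The function name `steps_to_learn`, the `tolerance` threshold, and the parameter values are assumptions for illustration. The sketch also assumes the agent eats the same food at every step, which is not generally true in the model.

```python
def steps_to_learn(alpha, beta, p, tolerance=0.01, max_steps=10_000):
    """Count updates until the learned value is within tolerance of beta * p.

    Returns None if the value has not converged after max_steps updates.
    """
    target = beta * p
    v = 0.0
    for step in range(1, max_steps + 1):
        v = v + alpha * (target - v)
        if abs(target - v) <= tolerance:
            return step
    return None


fast = steps_to_learn(alpha=0.4, beta=1.0, p=0.8)
slow = steps_to_learn(alpha=0.1, beta=1.0, p=0.8)
print(fast, slow)
```

In the model, a reporter along these lines would need to track each agent's own consumption history and then summarize the result for each group of agents.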

Add another food type to the model.


The world is resized dynamically based on user input.

The legend in the interface tab was created by using notes with large Unicode characters and setting the text color to the desired element color.

To concisely manage the setup of different food environments, the string name of each environment (e.g., "random") has a corresponding setup procedure with the suffix "-setup" (e.g., to random-setup). The command word food-environment "-setup" concatenates the selected environment string name with the suffix "-setup" to get the string name of the setup procedure. This procedure is then executed using the run command. This avoids having to use an ifelse statement in the setup to match each environment string name to a corresponding setup procedure.


Some other NetLogo models that explore algorithms for agent learning are the "El Farol" models and the "Piaget-Vygotsky Game". Any models using a temporal difference learning (TDL) algorithm will also be related to this model.


This NetLogo model was adapted and implemented by Adam B. Sedlak and Matt Kasman, based on the original paper:

Many thanks to Ross Hammond for his helpful comments as we created this model.


If you mention this model or the NetLogo software in a publication, we ask that you include the citations below.

For the model itself:

Please cite the NetLogo software as:


Copyright 2023 Uri Wilensky.


This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License. To view a copy of this license, visit or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.

Commercial licenses are also available. To inquire about commercial licenses, please contact Uri Wilensky at
