powered by NetLogo
view/download model file: MicroArray_Analysis.nlogo
Microarrays are a powerful tool for detecting the quantities of gene expression in an organism. They are the only mechanism that can simultaneously detect every type of gene being expressed. However, the error in the process is not widely understood, and microarrays have received a reputation for inaccuracy.
This model shows how error enters the microarray process attempts to give the user a more intuitive feel about the limitations and advantages to using microarrays.
This model is based off of (as yet unpublished) work by Patrick McMullen and Luis Amaral.
Stage 1: Amplification
The goal of this stage is to amplify the initial signal to the point where it is easily read by the microarray.
During this stage there are three types of turtles: polymerase, cDNA and mRNA. Every time a cDNA and a polymerase are on the same patch a mRNA strand created with the same sequence as the cDNA used to create it. After the first time limit runs out, all the polymerase dies.
The important point of this sequence is that random error enters the model based off of the spatial location of the turtles. If there are less turtles
Stage 2: Microarray binding
During this stage the microarray patches are created. There are no polymerase molecules, so no more cDNA is being created. If a cDNA strand has a complimentary sequence that is similar to the sequence of the patch, then they can bind. If the sequences match exactly then the binding is specific, if they only partially match, than the binding is non-specific. When this stage is over all unbound strands are destroyed.
The Interface is divided into several distinct blocks. The first column allows you to change the setup conditions for cDNA as well as for the microarray itself. The second column allow you to change the Run Options that change the rules of the system. The third column contains the 'Setup' and 'Go' buttons and the options for Outputs and Alerts.
- cDNA
Distribution - There are three options to generate a concentration distribution of sequences used. With Power-law, one sequence is initially present at a significantly higher amount and each following sequence has a smaller amount than the previous sequence. With Bimodal distribution, half of the sequences will have a high initial amount (mode1) and half the sequences will have a low initial amount (mode2). For uniform, all the sequences will have the same initial amount.
initial-amount - This changes the initial number of cDNA strands initially placed into the system
num-comp-used - This changes the number of sequences used to create the distribution of cDNA.
alpha - this is a variable that allows you to tune the type of sequence distribution. The role alpha plays depends on the type of distribution chosen above. With Power law alpha is the scaling exponent (amount = C * X ^ alpha). With bimodal alpha is the ratio of the larger mode to the smaller mode. With uniform distribution alpha plays no role.
- Microarray
gap - this allows you to set how many blank patches are placed between patches in the array
Arangement - this allows you to choose if the patches in the microarray are placed in a random order or in the order of the sequence list.
- Amplification
time-limit-stage1 - this controls how many ticks the amplification stage will run.
amount-of-polymerase - this controls how many polymerase are added to the
(poymerase allow cDNA to create mRNA strands)
Linear-Amplification? - If it is turned on then the sequences created by amplification cannot be used to create more sequences (amplification is linear). If it is turned on they can (amplification is exponential). In this case, exponential amplification isn't biologically acurate, but the visualization was nice so I left it in.
- Microarray Binding
time-limit-stage2 - this controls how many ticks the strands have to bind the microarray before they are destroyed.
Bind-chance-specific - the chance that a strand on a patch with it's exact complementary sequence will bind to that patch.
Unbind-chance-specific - the chance that a specifically bound strand will spontaniously unbind.
Bind-chance-non-specific - the chance that a strand on a patch with one amino acid in it's sequence matching the complementary sequence will bind to that patch.
Unbind-chance-non-specific - the chance that a non - specifically bound strand will spontaniously unbind.
Stage1-Alert? - Will display a message window alerting you when the amplification stage is over
Save-Output? - Will save the data generated to a text file of your specification
Show-Output? - will display the data generated in the command center
Make sure you have Show-Output? set to 'On'.
While the only way to increase the acuracy of the True Signal is to have more strands after the amplification stage, the magnitude of the False-signal also increases due to increased non-specific binding. Dispite this trend, the signal to noise ratio is improved with a larger the number of strands.
Things that increase noise in the setup: lowering the inital amount, increasing the number of components used (just because they will lower the initial amounts you
Things that increase noise in the amplification stage: decreaseing the time limit or decreasing the amount of polymerase (Changing the amplification from linear to nonlinear is just for fun)
Things that increase noise in the microarray binding stage: decreasing the time limit, bind-chance specific, or unbind-chance-nonspecific; increasing bind-chance-nonspecific or unbind-chance-specific.
The quickest way to see different error distributions is with the 'Bimodal' distribution. Change the initial amounts and the ratio between starting amounts (with the 'alpha' slider).
A feature that would really help the user visualize what this model is trying to acomplish is a plot showing the distribution of each sequence amount after each stage.
Another great feature would be a format of output that could be put directly into a spreadsheet and graphed.
Plenty of embelishments could be added that make the model more acurately reflect the physical processes involved. I don't think these are really necessary. However, if such features are added, then they should reflect aditional errors that could occur such as strand-strand binding or transcription errors in the amplification stage.
The main technical challenge while programming this model was in managing lists.
The sequences were in lists, the amounts of each sequence at each stage were in lists, virtually every changing variable in this system was managed in a list.
The create-temporary-plot-pen feature was also used in this model to only plot the amounts of components that were present.
By Peter Winter
Thanks to Patrick McMullen and Luis Amaral for providing sugestions and direction to this project.
globals
[
all-tags main-tags lesser-tags randomized-list1 plot-pens
sequence-list init-amt-list stage1-amt-list stage2-amt-list
actual-amt-list detection-list true-signal-list false-signal-list
]
patches-own [psequence pcomplimentary place-check]
turtles-own [sequence complimentary bound]
breed [polymerase a-polymerase]
breed [mRNA a-mRNA]
breed [cDNA a-cDNA]
;;;;;;;;;;;;;;;;;;;;; General Setup ;;;;;;;;;;;;;;;;;;;;;
to setup
set-lists
setup-all-tags-list
turtle-setup
world-setup
end
to set-lists
set sequence-list []
set init-amt-list []
set stage1-amt-list []
set stage2-amt-list []
set actual-amt-list []
set detection-list []
set true-signal-list []
set false-signal-list []
end
;;;;;;;;;;;;;;;;;;;;; Turtle Setup ;;;;;;;;;;;;;;;;;;;;;
;this subroutine is long because it includes the methods to make each of the three distributions
to turtle-setup
clear-turtles
reset-ticks
set-current-plot "Component-Amounts"
clear-plot
set plot-pens []
make-plot-pens
if Distribution = "Power-Law"
[
;set randomized-list1 [8 4 9 15 5 7 0 1 14 10 11 6 2 13 3 12]
; I abandoned the randomized list after trying to format output by hand.
; It is much easier if all turtles with a given initial amount are located next to each other in a list.
set randomized-list1 [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
; this loop is used to calculate the norm-factor
;the norm-factor is used to normalize the distribution so that the given 'initial-amount' of turtles is created
let make-number 0
let i 0 ; i is used at as an index counter
let j 0 ; j is used to sum the total amount of turtles that would be created without the norm-factor
let norm-factor 1
repeat num-comp-used
[
set make-number ( ( i + 1 ) ^ ( - alpha ) )
set j ( j + make-number)
set i ( i + 1 )
]
set norm-factor ( initial-amount / j)
; this creates the actual distribution
set i 0
set make-number 0
; this dummy list was added after I had a hard time initializeing the list sucessfully
; it contains the amounts of each sequence of turtle that should be created
let dummy []
foreach randomized-list1
[
set make-number ceiling ( norm-factor * ( (i + 1) ^ ( - alpha )) )
;show (word "make-number:" make-number " norm-factor:" norm-factor " num-comp-used:" num-comp-used " alpha:" alpha)
if i = 0
[
set dummy fput make-number dummy
;set sequence-list [ ["A" "C" ] ] ; this is left here in case I ever switch back to the randomized list
set sequence-list [ ["C" "C"] ]
set init-amt-list dummy ;item 0 init-amt-list
]
if i < num-comp-used and i > 0
[
set dummy fput make-number dummy
set init-amt-list dummy
set sequence-list fput (item ? all-tags) sequence-list
]
if i >= num-comp-used [set make-number 0]
;show (word "i:" i " ?:" ? " make-number:" make-number)
create-cDNA make-number
[
setxy random-xcor random-ycor
set sequence item ? all-tags
find-complimentary
set color white
set bound 0 ; bound=0 is unbound, bound=1 is specific binding, bound=2 is non-specific, bound=3 is dimer
]
set i ( i + 1 )
]
set sequence-list reverse sequence-list
set init-amt-list reverse init-amt-list
show (word "Sequences Used:" sequence-list)
show (word "Initial Amounts:" init-amt-list)
]
; a lot of this code is similar to the Power-Law distribution so most of the comments are there
if Distribution = "Bimodal"
[
;set randomized-list1 [8 4 9 15 5 7 0 1 14 10 11 6 2 13 3 12]
set randomized-list1 [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
; this calculates the number of components used in each mode
let num-comp-mode1 floor ( num-comp-used / 2)
let num-comp-mode2 ceiling ( num-comp-used / 2)
; this calculates the amount of turtles that should be created for sequences in the smaller mode.
let conc-mode2 ceiling ( initial-amount / ( alpha * num-comp-mode1 + num-comp-mode2 ) )
;show (word "num-comp-mode1:" num-comp-mode1 " num-comp-mode2:" num-comp-mode2 " conc-mode2:" conc-mode2)
let make-number 0
let i 0
let dummy []
; for each sequence calculate how many turtles to create
foreach randomized-list1
[
;calculate how many turtles to create for this particular sequence
if i < num-comp-mode1 [ set make-number (alpha * conc-mode2) ] ; this sets the concentration of turtles
if i >= num-comp-mode1 and i <= num-comp-used [ set make-number conc-mode2 ]
if i >= num-comp-used [set make-number 0]
if i = 0
[
set dummy fput make-number dummy
;set sequence-list [ ["A" "C" ] ]
set sequence-list [ ["C" "C" ] ]
set init-amt-list dummy ;item 0 init-amt-list
]
; this makes all the turtles for mode 1
if i < num-comp-used and i > 0
[
set dummy fput make-number dummy
set init-amt-list dummy ;fput make-number init-amt-list
set sequence-list fput (item ? all-tags) sequence-list
]
;show (word "i:" i " ?:" ? " make-number:" make-number)
create-cDNA make-number
[
setxy random-xcor random-ycor
set sequence item ? all-tags
find-complimentary
set color white
set bound 0 ; bound=0 is unbound, bound=1 is specific binding, bound=2 is non-specific, bound=3 is dimer
]
set i ( i + 1 )
]
set sequence-list reverse sequence-list
set init-amt-list reverse init-amt-list
show (word "Sequences Used:" sequence-list)
show (word "Initial Amounts:" init-amt-list)
]
; a lot of this code is similar to the Power-Law distribution so most of the comments are there
if Distribution = "Uniform"
[
;set randomized-list1 [8 4 9 15 5 7 0 1 14 10 11 6 2 13 3 12]
set randomized-list1 [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
let i 0
let dummy []
foreach randomized-list1
[
let make-number floor (initial-amount / num-comp-used)
if i = 0
[
set dummy fput make-number dummy
;set sequence-list [ ["A" "C" ] ]
set sequence-list [ ["C" "C" ] ]
set init-amt-list dummy ;item 0 init-amt-list
]
if i < num-comp-used and i > 0 [
set init-amt-list fput make-number init-amt-list
set sequence-list fput (item ? all-tags) sequence-list
]
if i >= num-comp-used [set make-number 0]
create-cDNA make-number
[
setxy random-xcor random-ycor
set sequence item ? all-tags
find-complimentary
set color white
set bound 0 ; bound=0 is unbound, bound=1 is specific binding, bound=2 is non-specific, bound=3 is dimer
]
set i (i + 1)
]
set sequence-list reverse sequence-list
set init-amt-list reverse init-amt-list
show (word "Sequences Used:" sequence-list)
show (word "Initial Amounts:" init-amt-list)
]
make-polymerase
end
; polymerase don't need a special distribution or sequence
; they just need to be present during the amplification stage
to make-polymerase
create-polymerase amount-of-polymerase
[
setxy random-xcor random-ycor
set sequence ["-" "-"]
set color red
set bound 0
]
end
;;;;;;;;;;;;;;;;;;;;; Sequences Setup ;;;;;;;;;;;;;;;;;;;;;
;; Creates a list of all possible sequences (of length 2 with 4 nucleotides)
to setup-all-tags-list
set all-tags []
let tag-elements ["C" "G" "A" "T"] ; these are the four nucleotides used in DNA
foreach tag-elements
[
let i ?
foreach tag-elements [
let j ?
set all-tags fput (list i ?) all-tags
]
]
;; as a result of using FPUT the result is in backwards order
set all-tags reverse all-tags
;show all-tags
end
; this
to find-complimentary
set complimentary []
foreach sequence
[
if ? = "C" [set complimentary fput "G" complimentary]
if ? = "G" [set complimentary fput "C" complimentary]
if ? = "A" [set complimentary fput "T" complimentary]
if ? = "T" [set complimentary fput "A" complimentary]
]
set complimentary reverse complimentary
end
;;;;;;;;;;;;;;;;;;;;; Patches Setup ;;;;;;;;;;;;;;;;;;;;;
; makes all the patches in the world blank
to world-setup
clear-patches
ask patches [set psequence ["-" "-"]]
end
; only 16 patches will be used as microarray patches
; this is used to determine their location and their sequence
to array-setup
; these variables are used to determine the position of each microarray patch
let upper-x (4 * (gap + 1))
let upper-y (4 * (gap + 1))
let sequence-length 2
let placeholder 0
let start-x (- sequence-length - gap)
let start-y (- sequence-length - gap)
if arangement = "Left-to-Right"
[
LET X start-x
WHILE [ X < (upper-x + start-x)]
[
LET Y start-y
WHILE [ Y < (upper-y + start-y)]
[
; (x - start-x) this shows how far right of the starting square we are. all patches counted.
; (x - start-x) / (gap + 1) this shows how many array patches we are to the right of the starting square.
; (Y - start-y) this shows how far above the starting square we are. all patches counted.
; (upper-x / (gap + 1) ) this shows how many array tiles we have in one row\
; placeholder is used to determine which sequence is pulled out of the list 'all-tags'
set placeholder ( (x - start-x) / (gap + 1) + ((upper-x / (gap + 1) ) * (Y - start-y) ) / (gap + 1))
ASK PATCH X Y
[
set pcolor yellow
set place-check placeholder
set psequence item (place-check) all-tags
find-pcomplimentary
]
SET Y (Y + gap + 1)
]
SET X (X + gap + 1)
]
]
if arangement = "Random-Order"
[
let dummy-list all-tags
let randomvar 0
LET X start-x
WHILE [ X < (upper-x + start-x)]
[
LET Y start-y
WHILE [ Y < (upper-y + start-y)]
[
; (x - start-x) this shows how far right of the starting square we are. all patches counted.
; (x - start-x) / (gap + 1) this shows how many array patches we are to the right of the starting square.
; (Y - start-y) this shows how far above the starting square we are. all patches counted.
; (upper-x / (gap + 1) ) this shows how many array tiles we have in one row\
; placeholder is used to determine which sequence is pulled out of the list 'all-tags'
set placeholder ( (x - start-x) / (gap + 1) + ((upper-x / (gap + 1) ) * (Y - start-y) ) / (gap + 1))
ASK PATCH X Y
[
set pcolor yellow
set randomvar random (length dummy-list)
set place-check randomvar
set psequence item (place-check) dummy-list
find-pcomplimentary
set dummy-list remove-item randomvar dummy-list
]
SET Y (Y + gap + 1)
]
SET X (X + gap + 1)
]
]
end
; this is just like the find-complimentary for turtles, but the letter p is in front of each varible (for patch)
to find-pcomplimentary
set pcomplimentary []
foreach psequence
[
if ? = "C" [set pcomplimentary fput "G" pcomplimentary]
if ? = "G" [set pcomplimentary fput "C" pcomplimentary]
if ? = "A" [set pcomplimentary fput "T" pcomplimentary]
if ? = "T" [set pcomplimentary fput "A" pcomplimentary]
]
set pcomplimentary reverse pcomplimentary
end
;;;;;;;;;;;;;;;;;; Main Program ;;;;;;;;;;;;;;;;;;;;;;;;;;
to go
plot-components
if ticks = 0 [initial-analysis]
ifelse ticks <= time-limit-stage1
[
move-turtles1
amplify
tick
]
[
ifelse ticks < (time-limit-stage1 + time-limit-stage2)
[
move-turtles2
tick
]
[
ask turtles with [bound = 0] [die]
ask turtles with [bound = 3] [die]
stage2-analysis
plot-components
if Save-Output? = true [print-everything]
stop
]
]
if ticks = time-limit-stage1 [end-stage1]
end
to end-stage1
ask polymerase [die]
stage1-analysis
if Stage1-Alert? = true
[
display
user-message ( "Amplification stage complete" )
]
array-setup
end
; this moves all turtles not including binding rules
to move-turtles1
ask turtles
[
left random 720
jump random 5
]
end
; this moves all turtles including binding rules
to move-turtles2
ask turtles with [bound = 0]
[
left random 720
jump random 5
if random 100 < Bind-chance-specific [specific-binding]
if random 100 < Bind-chance-nonspecific [nonspecific-binding]
strand-strand-binding
]
ask turtles with [bound = 1]
[
unbind-specific
]
ask turtles with [bound = 2]
[
unbind-nonspecific
]
ask turtles with [bound = 3]
[
left random 720
jump random 5
unbind-strand-strand
]
end
; this is used to create more turtles with a given sequence
to amplify
ask polymerase
[
ask cDNA-here
[
; l
ifelse Linear-Amplification? = TRUE
[
; with linear-amplification mRNA cannot further amplify themselves
; only the initial concentration of cDNA will create more strands
hatch-mRNA 1
[
set color blue
]
]
[
hatch-cDNA 1
[
; with non linear-amplification more cDNA is created so that it can further amplify itself
; NOTE: THIS IS NOT BIOLOGICALLY ACURATE (cDNA does not create more cDNA)
; this was just the easiest way to write the code.
set color white
]
]
]
]
end
; all these specific/nonspecific binding/unbinding subroutines are pretty self explanitory.
to specific-binding
;if complimentary = [psequence] of patch-here [set bound 1 ]
let i 0
let match 0
let sequence-length 2
let dummy ([psequence] of patch-here)
repeat sequence-length
;foreach complimentary dummy
[
if (item i complimentary) = (item i dummy) [set match (match + 1)]
set i (i + 1)
]
if match = 2 [set bound 1 set color green]
end
to nonspecific-binding
let i 0
let match 0
let sequence-length 2
let dummy ([psequence] of patch-here)
repeat sequence-length
;foreach complimentary dummy
[
if (item i complimentary) = (item i dummy) [set match (match + 1)]
set i (i + 1)
]
if match = 1 [set bound 2 set color orange]
end
; strand-strand binding was feature that was eventually left out of the final code
to strand-strand-binding
end
to unbind-specific
if random 100 < Unbind-chance-specific [ set bound 0 set color white]
end
to unbind-nonspecific
if random 100 < Unbind-chance-nonspecific [ set bound 0 set color white]
end
to unbind-strand-strand
end
;;;;;;;;;;;;;;;;;; Plotting ;;;;;;;;;;;;;;;;;;;;;;;;;;
to make-plot-pens
set-current-plot "Component-Amounts"
let i 0
repeat num-comp-used
[
create-temporary-plot-pen ( word "component-" i )
set plot-pens lput ( word "component-" i ) plot-pens
set i ( i + 1 )
]
end
to plot-components
set-current-plot "Component-Amounts"
let i 0
let sequence-marker 0
let this-sequence []
repeat num-comp-used
[
set sequence-marker (item i randomized-list1)
set this-sequence (item sequence-marker all-tags)
;show (word "i:" i " sequence-marker:" sequence-marker " this-sequence:" this-sequence)
set-current-plot-pen (item i plot-pens)
plot count turtles with [sequence = this-sequence]
set i ( i + 1)
]
end
;;;;;;;;;;;;;;;;;; Analysis ;;;;;;;;;;;;;;;;;;
to initial-analysis
; this calculates the initial amount of each strand containing each sequence
let i 0
let new-amt 0
let ratio 0
set init-amt-list []
foreach all-tags
[
set new-amt (count turtles with [sequence = ?])
set init-amt-list fput new-amt init-amt-list
set i (i + 1)
]
set init-amt-list reverse init-amt-list
; This prints the data generated to the Command Center box if Show-Output? is true (1 of 3 outputs)
If Show-Output? = True
[
print "Results"
print (word "All Sequences:" all-tags)
print (word "Initial Amounts" init-amt-list)
]
end
to stage1-analysis
; this calculates the new amounts of strands containing each sequence after amplification
let i 0
let new-amt 0
let ratio 0
foreach all-tags
[
set new-amt (count turtles with [sequence = ?])
set stage1-amt-list fput new-amt stage1-amt-list
set i (i + 1)
]
; since I was adding things to the front of the list (instead of the back), the list needs to be reversed to be acurate
set stage1-amt-list reverse stage1-amt-list
; This prints the data generated to the Command Center box if Show-Output? is true (2 of 3 outputs)
If Show-Output? = True
[
print (word "Amount After Amplification" stage1-amt-list)
]
end
to stage2-analysis
let detection 0
let true-signal 0
let false-signal 0
let actual-amt 0
foreach all-tags
[
set true-signal 0
set false-signal 0
set actual-amt (count turtles with [sequence = ?])
ask patches with [pcomplimentary = ?] [set detection count turtles-here]
ask patches with [pcomplimentary = ?] [ask turtles-here with [bound = 1] [ set true-signal (true-signal + 1)]]
ask patches with [pcomplimentary = ?] [ask turtles-here with [bound = 2] [ set false-signal (false-signal + 1)]]
set actual-amt-list fput actual-amt actual-amt-list
set detection-list fput detection detection-list
set true-signal-list fput true-signal true-signal-list
set false-signal-list fput false-signal false-signal-list
;print ( word "seq:" ? " actual:" actual-amt " detection:" detection " true-signal:" true-signal " false-signal:" false-signal)
]
; since I was adding things to the front of the list (instead of the back), all the lists need to be reversed to be acurate
set actual-amt-list reverse actual-amt-list
set detection-list reverse detection-list
set true-signal-list reverse true-signal-list
set false-signal-list reverse false-signal-list
; This prints the data generated to the Command Center box if Show-Output? is true (3 of 3 outputs)
if Show-Output? = true
[
print (word "Amount Bound:" actual-amt-list)
print (word "Amount Detected:" detection-list)
print (word "True Signal:" true-signal-list)
print (word "False Signal:" false-signal-list)
]
end
to print-everything
; this writes all the data generated to a file of the user's specification
file-open user-new-file
file-print "Results"
file-print (word "All Sequences:" all-tags)
file-print (word "Initial Amounts:" init-amt-list)
file-print (word "Amount After Amplification:" stage1-amt-list)
file-print (word "Amount Bound:" actual-amt-list)
file-print (word "Amount Detected:" detection-list)
file-print (word "True Signal:" true-signal-list)
file-print (word "False Signal:" false-signal-list)
file-flush
file-close
end