# Agent Based Modelling with data.table OR how to model urban migration with R

## Introduction

Recently I found a good introduction to the Schelling-Segregation Model and to Agent Based Modelling (ABM) for Python (Binpress Article by Adil). The model follows an ABM approach to simulate how urban segregation can be explained.

I will concentrate on the R-code, if you want to know more about the Schelling-Segregation Model (which brought its creator a Nobel Price) and Agent Based Modelling you should head over to the binpress page! As my Python knowledge is not sufficiently large enough, I try to rebuild the ABM in R with some guidelines from the Python script, but as I use `data.table` and its specific functions, the code naturally deviates quite a lot.

## Schelling-Segregation Model

The idea behind the Schelling Model is that we create an `M x N` grid that contains homes for our agents, which we simulate to belong to `n` different races, with a certain possibility that homes are empty.

In each round I calculate a ratio of same-race neighbors over the total amount neighbors for each home (position on the grid). If the ratio falls below a certain threshold (in this case 50%), the agent becomes unhappy and will move to another (random) home. This process is iterated to find an equilibrium.

## Basic Principles of the Code

The ABM is based on three self-written functions:

1. `initiateSchelling()`
• takes inputs of dimensions of the simulated area and the number of different races
• creates a data.table called schelling that contains the id for each position in the grid, x and y the coordinates, the race as well as distance and unsatisfied, which we will use later
2. `plotSchelling()`
• takes a text input that is used as the graph’s title
• it uses the `schelling` data.table and plots each agent
3. `iterate()`
• takes the number of iterations (= number of simulations) as well as the similarity threshold
• the function has another subfunction `is.unsatisfied()` that checks for each row if the agent is unsatisfied
• iterate first checks for each agent if she is unsatisfied, if so the agent will be moved

## R-Code of the Functions

To fasten up the speed of the code, I use the `data.table` package including some of its specialties. If you are unfamiliar with the syntax of `data.table`, I recommend you to have a look at this excellent intro by yhat or the CheatSheet by DataCamp.

For visualization, I use `ggplot2` and `RColorBrewer`.

The packages are loaded with:

```require(data.table)
# for the visualization
require(ggplot2)
require(RColorBrewer)
```

The code for the `initiateSchelling()`-function looks like this:

```initiateSchelling <- function(dimensions = c(10, 10), n_races = 2){
# create "races" based on colours
races <- colours()[1:n_races]
# what percentage of the homes will be empty
perc_empty <- 0.2

# how many homes will be simulated
n_homes = prod(dimensions)
# calculates the number of agents
count_agents <- floor(n_homes * (1 - perc_empty))

# the characteristics that a home can have
races <- c("empty", races)
# the probabilities of each characteristics
probabilities <- c(perc_empty, rep((1 - perc_empty)/(length(races) - 1),
times = length(races) - 1))

# creates the global schelling data.table
schelling <<- data.table(id = 1:prod(dimensions),
x = rep(1:dimensions,
dimensions),
y = rep(1:dimensions,
each = dimensions),
race = sample(x = races,
size = n_homes,
prob = probabilities,
replace = TRUE),
# used to find the satisfaction of each home
distance = rep(NA, prod(dimensions)),
unsatisfied = rep(NA, prod(dimensions)))
}
```

Secondly, the `plotSchelling()`-function looks like this:

```plotSchelling <- function(title = "Schelling-Segregation-Model"){
# get races to get the right number of colors
races <- unique(schelling\$race)

# find the dimensions of the grid to get the best dot size
dims <- c(max(schelling\$x), max(schelling\$y))

# create colours
# check if there are less than 3 races,
# this would create issues with brewer.pal from
# RColorBrewer otherwise
if (length(races) <= 3) {
colors <- brewer.pal(3, "Dark2")[c(1,3)]
} else {
colors <- brewer.pal(length(races), "Dark2")
}

# plot the graph
p <- ggplot(data = schelling[race != "empty"],
aes(x = x, y = y, color = race)) +
# workaround to get relatively large dots that
# resize with the size of the grid
geom_point(size = 100/sqrt(prod(dims))) +
scale_colour_manual(values = colors) +
# create a beatiful and mostly empty chart
theme_bw() +
theme(axis.line = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
legend.position = "none",
panel.background = element_blank(),
panel.border = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.background = element_blank(),
plot.title = element_text(lineheight=3,
face="bold",
color="black",
size=14)) +
# fixes the axis to avoid distortion in the picture
coord_fixed(ratio = 1) +
# lastly adds the title
ggtitle(title)

return(p)
}
```

And lastly, the `iterate()`-function, which iterates over the checks for satisfaction and moves of the agents if necessary, looks like this:

```iterate <- function(n = 10, similiarity_threshold){
# subfunction that checks for a given x and y value if the agent is
# unsatisfied (returns TRUE or FALSE)
is.unsatisfied <- function(x_value, y_value, similiarity_threshold = 0.5){
# gets the race for the agent
cur_race <- schelling[x == x_value && y == y_value, race]

# checks if the home is empty to
if (cur_race == "empty"){
return(FALSE) # empty houses are not satisfied, therefore will not move!
}else{
# creates the square of the distance
# I avoid to take the squareroot to speed up the code
schelling[, distance := (x_value - x)^2 + (y_value - y)^2]

# counts the number of agents that live less than two fields away
# (includes all adjacent agents) and that are similar
count_similar <- nrow(schelling[distance <= 2 &&
race == cur_race &&
distance != 0])
# same here except that it looks into different agents
count_different <- nrow(schelling[distance <= 2 &&
race != cur_race &&
race != "empty"])

# calculates the ratio
ratio <- count_similar/(count_similar + count_different)

# returns TRUE if the ratio is below the threshold
return(ratio < similiarity_threshold)
}
}

# creates a ProgressBar, although this is not necessary, it does look nice..
pb <- txtProgressBar(min = 0, max = 1, style = 3)
# for time-keeping-purposes
t <- Sys.time()

# iterates
for (iterate in 1:n){
# fills the boolean vector "unsatisfied"
# indicates if the household is unsatisfied
schelling[, unsatisfied := is.unsatisfied(x_value = x,
y_value = y,
similiarity_threshold =
similiarity_threshold),
by = 1:nrow(schelling)]

# move unsatisfied agents to an empty house
# find the IDs that are empty where agents can migrate to
emptyIDs <- schelling[race == "empty", id] # finds the id of empty houses

# finds the origin of the agents moving,
# aka. the old ID of each household moving
oldIDs <- schelling[(unsatisfied), id] # origin

# generates new IDs for each household moving
# note that households can move to the house of other moving agents
# also, agents can (by a small chance) choose to "move" to their
# existing home
newIDs <- sample(x = c(emptyIDs, oldIDs),
size = length(oldIDs),
replace = F) # target

# a new data.table that shows
# what race migrates from which origin_id to which target-id
transition <- data.table(origin = oldIDs,
oldRace = schelling[id %in% oldIDs, race],
target = newIDs)

# moves the agents to the new homes
schelling[id %in% transition\$origin]\$race = "empty"
schelling[id %in% transition\$target]\$race = transition\$oldRace

# orders the schelling, although this takes some time,
# it is necessary for the other operations
schelling <- schelling[order(id)]

# updates the ProgressBar
setTxtProgressBar(pb, iterate/n)
}
close(pb)
timedif <- Sys.time() - t

# print out statistics for the calculation time
print(paste0("Time for calculation in seconds: ", round(timedif, 3), " or: ",
round(n / as.numeric(timedif), 3), " iterations per second"))

return(schelling)
}
```

## Results 1: 2 Races

By using the function I create a 50×50 grid with 2.500 agents and simulate 20 rounds (this process takes roughly a minute). A visualization is produced at 0, 10, and 20 iterations; after 20 rounds the 2-race simulation reaches an equilibrium as we can see by the few changes between the two states (10 and 20 iterations).

```
set.seed(42^2)
# initiate schelling
initiateSchelling(dimensions = c(50, 50), n_races = 2)
# plot schelling
plotSchelling(title = "Schelling-Segregation-Model after 0 iterations")

# iterate 10 times
schelling <- iterate(n = 10, similiarity_threshold = 0.5)
# plot the result after 10 iterations
plotSchelling(title = "Schelling-Segregation-Model after 10 iterations")

# iterate another 10 times
schelling <- iterate(n = 10, similiarity_threshold = 0.5)
# plot again after 20 iterations total
plotSchelling(title = "Schelling-Segregation-Model after 20 iterations")
``` Initial State ABM with 2 Races ABM with 2 Races after 10 Iterations ABM with 2 Races after 20 Iterations

Although it seems that the model found an equilibrium after 10 iterations, there are still some minor changes between the two states, albeit only few.

## Results 2: 4 Races

To see the ABM with 4 different races I used the following code to generate the following images.

```set.seed(42^3)
# initiate schelling
initiateSchelling(dimensions = c(50, 50), n_races = 4)
# plot schelling
plotSchelling(title = "Schelling-Segregation-Model after 0 iterations")

# iterate 10 times
schelling <- iterate(n = 10, similiarity_threshold = 0.5)
# plot the result after 10 iterations
plotSchelling(title = "Schelling-Segregation-Model after 10 iterations")

# iterate another 10 times
schelling <- iterate(n = 10, similiarity_threshold = 0.5)
# plot again after 20 iterations total
plotSchelling(title = "Schelling-Segregation-Model after 20 iterations")

# iterate another 30 times
schelling <- iterate(n = 30, similiarity_threshold = 0.5)
# plot again after 50 iterations total
plotSchelling(title = "Schelling-Segregation-Model after 50 iterations")
``` ABM with 4 Races Initial State ABM with 4 Races after 10 Iterations

There are almost no visible differences between the initial state and state after 10 iterations. ABM with 4 Races after 20 Iterations

A more notable change between the states after 10 and 20 iterations. ABM with 4 Races after 50 Iterations

Here we see that the model took more iterations to find an almost steady-state after 50 iterations (there are still some agents that will move in the next round, can you spot them?).

## Outro

These few lines show nicely what ABMs are, how they can be used and how they simplify complex human behavior.

Although `data.table` is enormously fast, the process still takes a lot of time to calculate. If you have any idea how to speed up the code, I would love to hear about it in the comments! If you have any other questions or remarks, I am of course happy to hear about them in the comments as well.

Lastly, if you want to see other implementations of the Schelling-Segregation Model in R, you can visit R Snippets or R-bloggers.