Local Indicators of Spatial Association in Homicides in Baltimore, 2019

Last year was yet another terrible year for homicides in Baltimore, with 348 reported for the year. While that number was only the second-highest total on record, it was probably the highest rate per 100,000 residents, since Baltimore's population has been shrinking. (The previous recent peaks came in 2015 and 2017, with 342 homicides each.) The epidemic of violence and homicides that began in 2015 continues today.

When we epidemiologists analyze data, we need to be mindful of the spatial associations we see. The first law of geography states that “everything is related to everything else, but near things are more related than distant things.” As a result, apparent clusters of cases of a disease or condition, or of an outcome such as homicides, may reflect similar things happening in close proximity to each other rather than a true cluster.

To study this phenomenon and account for it in our analyses, we use different techniques to assess spatial autocorrelation and spatial dependence. In this blog post, I will guide you through doing a LISA (Local Indicators of Spatial Association) analysis of homicides in Baltimore City in 2019 using R programming.

First, The Data

Data on homicides were extracted from the Baltimore City Open Data portal. Under the “Public Safety” category, we find a data set titled “BPD Part 1 Victim Based Crime Data.” In those data, I filtered the observations down to homicides occurring or reported in 2019. This yielded 348 records, matching the number reported for the year.

Data on Community Statistical Areas (agglomerations of neighborhoods grouped by similar neighborhood characteristics) came from the Baltimore Neighborhood Indicators Alliance (BNIA), a very good source of social and demographic data on Baltimore. From their open data portal, I extracted the shapefile of CSAs in the city.

Next, The R Code

First, I load the libraries that I’m going to use for this project. (I’ve tried to annotate the code as much as possible.)

library(tidyverse) # To wrangle and summarize the data
library(rgdal) # To read the shapefiles (readOGR)
library(ggmap) # For additional mapping tools
library(spatialEco) # To join points to polygons (point.in.poly)
library(tmap) # To create the maps
library(tigris) # To join dataframes to shapefiles (geo_join)
library(spdep) # For the spatial statistics: neighbors, weights, Moran's I
library(classInt) # For class intervals in choropleth maps
library(gstat) # For geostatistical functions
library(maptools) # For handling spatial objects

Next, I load the data.

csas <-
  readOGR("2017_csas",
          "Vital_Signs_17_Census_Demographics") # Baltimore Community Statistical Areas
jail <-
  readOGR("Community_Statistical_Area",
          "community_statistical_area") # For the central jail
pro <-
  sp::CRS("+proj=longlat +datum=WGS84 +unit=us-ft") # Projection
csas <-
  sp::spTransform(csas,
                  pro) # Setting the projection for the CSAs shapefile
homicides <- read.csv("baltimore_homicides.csv",
                      stringsAsFactors = F) # Homicides in Baltimore
homicides <-
  as.data.frame(lapply(homicides, function(x)
    x[!is.na(homicides$Latitude)])) # Remove crimes without spatial data (i.e. addresses)

Note that I had to do some wrangling here. First, there is an area on the Baltimore map that corresponds to a large jail complex in the center of the city, so I added a second map layer called jail to the data. I also corrected the projection on the layers. (This helps make sure that the points and polygons line up correctly, since you’re overlaying flat data on a spherical part of the planet.) Finally, I took out homicides for which there were no spatial data (i.e., addresses). This sometimes happens when the true location of a homicide cannot be determined… or when there is a lag or gap in data entry.

Next, I’m going to convert the homicides data frame into a spatial points file.

coordinates(homicides) <-
  ~ Longitude + Latitude # Makes the homicides CSV file into a spatial points file
proj4string(homicides) <-
  CRS("+proj=longlat +datum=WGS84 +unit=us-ft") # Give the right projection to the homicides spatial points

Note that I am telling R which variables in homicides correspond to the longitude and latitude. Then, as before, I set the projection to match that of the CSAs layer.

Now, let’s join the points (homicides) to the polygons (csas).

homicides_csas <- point.in.poly(homicides,
                                csas) # Gives each homicide the attributes of the CSA it happened in
homicide.counts <-
  as.data.frame(homicides_csas) # Creates the table to see how many homicides in each CSA
counts <-
  homicide.counts %>%
  group_by(CSA2010, tpop10) %>%
  summarise(homicides = n()) # Determines the number of homicides in each CSA.
counts$tpop10 <-
  as.numeric(levels(counts$tpop10))[counts$tpop10] # Fixes tpop10 not being numeric
counts$rate <-
  (counts$homicides / counts$tpop10) * 100000 # Calculates the homicide rate per 100,000
map.counts <- geo_join(csas,
                       counts,
                       "CSA2010",
                       "CSA2010",
                       how = "left") # Joins the counts to the CSA shapefile
map.counts$homicides[is.na(map.counts$homicides)] <-
  0 # Turns all the NA's homicides into zeroes
map.counts$rate[is.na(map.counts$rate)] <-
  0 # Turns all the NA's rates into zeroes
jail <-
  jail[jail$NEIG55ID == 93, ] # Subsets only the jail from the CSA shapefile

Here, I also created a table to see how many points fell inside the polygons. It allowed me to look at how many homicides were in each of the CSAs. Here are the top five:

| Community Statistical Area | Homicide Count |
|---|---|
| Southwest Baltimore | 29 |
| Sandtown-Winchester/Harlem Park | 25 |
| Greater Rosemont | 21 |
| Clifton-Berea | 16 |
| Greenmount East | 15 |
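
If you want to pull that ranking yourself, here is a minimal sketch using the counts dataframe built above:

counts %>%
  arrange(desc(homicides)) %>% # Sort CSAs from most to fewest homicides
  head(5) # Keep the top five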

Of course, case counts alone do not give us the whole picture. We need to account for differences in population. For that, you’ll see that I created a rate variable. Here are the top five CSAs by homicide rate per 100,000 residents:

| Community Statistical Area | Homicide Rate per 100,000 Residents |
|---|---|
| Greenmount East | 191 |
| Sandtown-Winchester/Harlem Park | 189 |
| Clifton-Berea | 176 |
| Southwest Baltimore | 172 |
| Madison/East End | 167 |

As you can see, the ranking based on homicide rates is different, and it is more informative: it accounts for the fact that you may see more homicides in a CSA simply because more people live there. Now we have the rates. I then joined the counts and rates to the csas shapefile; this is what we’ll use to map. I also made sure that all the NAs in the rates and counts became zeroes. Finally, I subset only the polygon labeled “93” for the jail. This, again, is to make sure that the map represents that area while making clear it is not a place where people live.

The Maps

Look at the following simple plot of homicide locations in Baltimore:

Notice Something?

This is not a map. This is just a plot of the homicide locations, with latitude on the Y axis and longitude on the X axis. If you think you see some clustering of cases, you would be correct. Point-based cluster analysis can also be done on these data alone, but that is for a different blog post at a later time. (And you have to take other things into consideration.)

Next, I’m going to create a map of the points above (using the tmap package) but also show the boundaries of the CSAs. This will look more like a map and show you the spatial distribution of the homicide cases in context. You’ll see why there are blank spaces in some parts.

Here is the code:

tmap_mode("plot") # Set tmap to plot. To make interactive, use "view"

homicide.dots.map <-
  tm_shape(map.counts) +
  tm_borders(col = "black",
             lwd = 0.5) +
  tm_text("OBJECTID",
          size = 1) +
  tm_shape(homicides) +
  tm_dots(col = "red",
          title = "2019 Homicides",
          size = 0.2, ) +
  tm_compass() +
  tm_layout(
    main.title = "Map of Homicides and Homicide Rate in Baltimore City, 2019",
    main.title.size = 0.8,
    legend.position = c("left", "bottom"),
    compass.type = "4star",
    legend.text.size = 1,
    legend.title.size = 1,
    legend.outside = T
  ) +
  tm_scale_bar(
    text.size = 0.5,
    color.dark = "black",
    color.light = "yellow",
    lwd = 1
  ) +
  tm_add_legend(
    type = "symbol",
    col = "red",
    shape = 21,
    title = "Homicide",
    size = 0.5
  ) +
  tmap_options(unit = "mi") +
  tm_add_legend(
    type = "text",
    col = "black",
    title = "CSA ID#",
    text = "00"
  )
homicide.dots.map

And here is the map:


Again, we see the clustering of homicides in some areas. Let’s look at homicide rates. First, the code:

homicide.rate.map <-
  tm_shape(map.counts) +
  tm_fill(
    "rate",
    title = "Homicide Rate per 100,000",
    style = "fisher",
    palette = "OrRd",
    colorNA = "white",
    textNA = "No Homicides"
  ) +
  tm_borders(col = "black",
             lwd = 0.5) +
  tm_text("OBJECTID",
          size = 1) +
  tm_shape(homicides) +
  tm_dots(col = "red",
          title = "2019 Homicides") +
  tm_shape(jail) +
  tm_fill(col = "#d0e2ec",
          title = "Jail") +
  tm_compass() +
  tm_layout(
    main.title = "Map of Homicides and Homicide Rate in Baltimore City, 2019",
    main.title.size = 0.8,
    legend.position = c("left", "bottom"),
    compass.type = "4star",
    legend.text.size = 1,
    legend.title.size = 1,
    bg.color = "#d0e2ec",
    legend.outside = T
  ) +
  tm_scale_bar(
    text.size = 0.5,
    color.dark = "black",
    color.light = "yellow",
    lwd = 1
  ) +
  tm_add_legend(
    type = "symbol",
    col = "red",
    shape = 21,
    title = "Homicide",
    size = 0.5
  ) +
  tm_add_legend(
    type = "text",
    col = "black",
    title = "CSA ID#",
    text = "00"
  ) +
  tmap_options(unit = "mi")
homicide.rate.map

You might expect that the CSAs with the most points would also have the highest rates, though, as we saw above, population size matters. Here is the map:


From here, we see that the CSAs with the highest rates are close together, as are those with the lowest rates, with the others somewhere in between. If there were no spatial autocorrelation, we would expect CSAs with high or low rates to be distributed evenly all over the map. If there were perfect autocorrelation, all the highs would be on one side and all the lows on the other. What we have here is something in the middle, as is the case in most real-life examples.

LISA Analysis

In the next few lines of code, I’m going to do several things. First, I’m going to create a list of the neighbors to each CSA. One CSA, #4 at the far southern tip of the city, does not touch other CSAs. It looks like an island. So I’m going to manually add its neighbors to the list.

Next, I’m going to create a list of neighbors and weights. By weights, I mean that each neighbor will “weigh” the same. If a CSA has one neighbor, then the weight of that neighbor is 1. If it has four, each neighbor of that CSA weighs 0.25.

Next, I’m going to calculate the Local Moran’s I value. For a full discussion of Moran’s I, I recommend you visit this page: http://ceadserv1.nku.edu/longa//geomed/ppa/doc/LocalI/LocalI.htm
There are a lot of nuances to the statistic. But, basically, it tells us how far each CSA departs from what we would expect under no spatial autocorrelation. Once you see it on the map, you’ll understand better.
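
If you like seeing the math, the local statistic for area $i$ is usually written as follows, where $x_i$ is the homicide rate in CSA $i$, $\bar{x}$ is the citywide mean, and $w_{ij}$ are the neighbor weights described above:

$$I_i = \frac{x_i - \bar{x}}{m_2} \sum_{j} w_{ij}\,(x_j - \bar{x}), \qquad m_2 = \frac{1}{n}\sum_{i}(x_i - \bar{x})^2$$

A positive $I_i$ means a CSA resembles its neighbors (high among highs, or low among lows); a negative $I_i$ means it differs from them.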

Here’s the code:

map_nbq <-
  poly2nb(map.counts) # Creates list of neighbors to each CSA
added <-
  as.integer(c(7, 50)) # Adds neighbors 7 and 50 to CSA 4 (it looks like an island)
map_nbq[[4]] <- added # Fixes the region (4) without neighbors
View(map_nbq) # View to make sure CSA 4 has neighbors 7 and 50.

map_nbq_w <-
  nb2listw(map_nbq) # Creates list of neighbors and weights. Weight = 1/number of neighbors.
View(map_nbq_w) # View the list

local.moran <-
  localmoran(map.counts$rate, # Which variable you'll use for the autocorrelation
             map_nbq_w, # The list of weighted neighbors
             zero.policy = T) # Calculate local Moran's I
local.moran <- as.data.frame(local.moran)
summary(local.moran) # Look at the summary of Moran's I

map.counts$srate <-
  scale(map.counts$rate) # save to a new column the standardized rate

map.counts$lag_srate <- lag.listw(map_nbq_w,
                                  map.counts$srate) # Spatially lagged standardized rate in new column

summary(map.counts$srate) # Summary of the srate variable
summary(map.counts$lag_srate) # Summary of the lag_srate variable
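
As an optional sanity check (my own addition, not part of the original workflow), you can confirm the row-standardized weighting described earlier; each CSA's neighbor weights should sum to 1:

sapply(map_nbq_w$weights, sum) # Should return 1 for every CSA
map_nbq_w$weights[[4]] # CSA 4 has the two neighbors we added, so each weighs 0.5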

Now, we’ll take a look at the Moran’s I plot created with this code:

x <- map.counts$srate %>% as.vector()
y <- map.counts$lag_srate %>% as.vector()
xx <- data.frame(x, y) # Matrix of the variables we just created

moran.plot(x, map_nbq_w) # One way to make a Moran Plot

And here is the plot:


This plot tells us which CSAs are in the High-High (upper right quadrant), High-Low (lower right quadrant), Low-Low (lower left quadrant) and Low-High (upper left quadrant) designations for Moran’s I values. The High-High CSAs are those whose value was high and whose neighbors’ values were also high. That is expected, since everything is like everything else close to it, right?

The Low-Low CSAs are the same. The outliers in this regard are the High-Low and Low-High CSAs, because these are CSAs with high values (by values, we mean a quantity derived from the standardized homicide rates) in a sea of low values, or low values in a sea of high values. The plot above labels the CSAs flagged as influential points. (More on the p-value later.)

There is another way to make the same plot, if you’re so inclined:

plot(
  x = map.counts$srate,
  y = map.counts$lag_srate,
  main = "Moran Scatterplot Homicide Rate",
  xlab = "Homicide Rate (Scaled)",
  ylab = "Lagged Homicides Rate (Scaled)"
) # Another way to make a Moran Plot
abline(h = 0,
       v = 0) # Adds the crosshairs at (0,0)
abline(
  lm(map.counts$lag_srate ~ map.counts$srate),
  lty = 3,
  lwd = 4,
  col = "red"
) # Adds a red dotted line to show the line of best fit

And this is the resulting plot from that:


So those are the plots of the Moran’s I value. What about a map of these values? For that, we need to prepare our CSAs with the appropriate quadrant designations.

map.counts$quad <-
  NA # Create variable for where the pair falls on the quadrants of the Moran plot
map.counts@data[(map.counts$srate >= 0 &
                   map.counts$lag_srate >= 0), "quad"] <-
  "High-High" # High-High
map.counts@data[(map.counts$srate <= 0 &
                   map.counts$lag_srate <= 0), "quad"] <-
  "Low-Low" # Low-Low
map.counts@data[(map.counts$srate >= 0 &
                   map.counts$lag_srate <= 0), "quad"] <-
  "High-Low" # High-Low
map.counts@data[(map.counts$srate <= 0 &
                   map.counts$lag_srate >= 0), "quad"] <-
  "Low-High" # Low-High

This gives us this map:


Here is the code for that map:

local.moran$OBJECTID <- 0 # Creating a new variable
local.moran$OBJECTID <- 1:nrow(local.moran) # Adding an object ID
local.moran$pvalue <-
  round(local.moran$`Pr(z > 0)`, 3) # Rounding the p value to three decimal places

map.counts <- geo_join(map.counts,
                       local.moran,
                       "OBJECTID",
                       "OBJECTID")

colors <-
  c("red", "blue", "lightpink", "skyblue2", "white") # Color Palette

local.moran.map <-
  tm_shape(map.counts) +
  tm_fill("quad",
          title = "Local Moran's I",
          palette = colors,
          colorNA = "white") +
  tm_borders(col = "black",
             lwd = 0.5) +
  tm_text("pvalue",
          size = 0.5) +
  tm_shape(jail) +
  tm_fill(col = "gray",
          title = "Jail") +
  tm_compass() +
  tm_layout(
    main.title = "Map of Local Moran's I for Homicide Rates in Baltimore, 2019",
    main.title.size = 0.8,
    legend.position = c("left", "bottom"),
    compass.type = "4star",
    legend.text.size = 1,
    legend.title.size = 1,
    legend.outside = T,
    bg.color = "#d0e2ec"
  ) +
  tm_scale_bar(
    text.size = 0.5,
    color.dark = "black",
    color.light = "yellow",
    lwd = 1
  ) +
  tm_add_legend(
    type = "text",
    col = "black",
    title = "Moran's I p-value",
    text = "0.000"
  ) +
  tmap_options(unit = "mi")
local.moran.map

The CSAs in red are those with high values that are surrounded by other CSAs with high values. The light blue ones have low values and are surrounded by lows. The dark blue ones have high values but low-value neighbors, and the light pink ones have low values but high-value neighbors. The number inside each CSA is the p-value.

The Thing About Significance

If you’ve done your coursework in biostatistics, you’ll know that a lot of attention is paid to p-values less than 0.05. In this case, only a few CSAs have such low p-values. But that cutoff may not be enough. In this explainer on spatial association, the author states the following:

An important methodological issue associated with the local spatial autocorrelation statistics is the selection of the p-value cut-off to properly reflect the desired Type I error. Not only are the pseudo p-values not analytical, since they are the result of a computational permutation process, but they also suffer from the problem of multiple comparisons (for a detailed discussion, see de Castro and Singer 2006). The bottom line is that a traditional choice of 0.05 is likely to lead to many false positives, i.e., rejections of the null when in fact it holds.

Source: https://geodacenter.github.io/workbook/6a_local_auto/lab6a.html

For this exercise, we are going to keep the p-value significance at 0.05 and create a map that uses the resulting CSAs. The code:

map.counts$quad_sig <-
  NA # Creates a variable for where the significant pairs fall on the Moran plot
map.counts@data[(map.counts$srate >= 0 &
                   map.counts$lag_srate >= 0) &
                  (local.moran[, 5] <= 0.05), "quad_sig"] <-
  "High-High, p <0.05" # High-High
map.counts@data[(map.counts$srate <= 0 &
                   map.counts$lag_srate <= 0) &
                  (local.moran[, 5] <= 0.05), "quad_sig"] <-
  "Low-Low, p <0.05" # Low-Low
map.counts@data[(map.counts$srate >= 0 &
                   map.counts$lag_srate <= 0) &
                  (local.moran[, 5] <= 0.05), "quad_sig"] <-
  "High-Low, p <0.05" # High-Low
map.counts@data[(map.counts$srate <= 0 &
                   map.counts$lag_srate >= 0) &
                  (local.moran[, 5] <= 0.05), "quad_sig"] <-
  "Low-High, p <0.05" # Low-High
map.counts@data[(map.counts$srate <= 0 &
                   map.counts$lag_srate >= 0) &
                  (local.moran[, 5] <= 0.05), "quad_sig"] <-
  "Not Significant. p>0.05" # Non-significant

Note that I am using the p-value from the local.moran calculation (column #5) in addition to where the CSAs fall on the Moran plot. This eliminates the CSAs that are not significant at p = 0.05 and leaves us with only eight. Here is the code for the map:

local.moran.map.sig <-
  tm_shape(map.counts) +
  tm_fill(
    "quad_sig",
    title = "Local Moran's I",
    palette = colors,
    colorNA = "white",
    textNA = "Not Significant"
  ) +
  tm_borders(col = "black",
             lwd = 0.5) +
  tm_text("pvalue",
          size = 0.5) +
  tm_shape(jail) +
  tm_fill(col = "gray",
          title = "Jail") +
  tm_compass() +
  tm_layout(
    main.title = "Map of Local Moran's I for Homicide Rates in Baltimore, 2019",
    main.title.size = 0.8,
    legend.position = c("left", "bottom"),
    compass.type = "4star",
    legend.text.size = 1,
    legend.title.size = 1,
    legend.outside = T,
    bg.color = "#d0e2ec"
  ) +
  tm_scale_bar(
    text.size = 0.5,
    color.dark = "black",
    color.light = "yellow",
    lwd = 1
  ) +
  tm_add_legend(
    type = "text",
    col = "black",
    title = "Moran's I p-value",
    text = "0.000"
  ) +
  tmap_options(unit = "mi")
local.moran.map.sig

And here is the map itself:


What this is telling us is that only those eight CSAs had a statistically significant Moran’s I value, and they were all in the “High-High” classification. This means that they have high standardized values (which are themselves derived from homicide rates) and are surrounded by CSAs with high values. There were no significant CSAs from the other quadrants on the Moran plot. (Remember that significance here comes with a big caveat.)

In Closing

What does this all mean? It means that the “butterfly pattern” seen in so many maps of social and health indicators in Baltimore still holds. The two regions in red in that last map make up the butterfly’s wings. Those CSAs are also the ones with the highest homicide rates.


You could say that it is in those eight CSAs where homicides are the most concentrated. These are known as spatial clusters. If we had significant “Low-High” or “High-Low” CSAs, they would be known as spatial outliers. Significant “Low-Low” CSAs would also be spatial clusters.

As I mentioned above, we can also do other analyses, some of them looking at the individual locations of the homicides, the dots in the first plot and the first map. You could generate statistics that tell us whether those dots are closer to each other than you would expect if they were just tossed onto the map at random.
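
To give you a flavor, here is a rough sketch of one such test, the Clark-Evans nearest-neighbor test from the spatstat package. (This is my own illustration, not part of the original analysis; it treats the homicide coordinates as a point pattern in a rectangular window, which you would want to refine with the actual city boundary.)

library(spatstat) # For point pattern analysis

xy <- coordinates(homicides) # Longitude-latitude pairs of the homicide points
pts <- ppp(xy[, 1], xy[, 2], # Build a point pattern object
           window = owin(range(xy[, 1]), range(xy[, 2])))
clarkevans.test(pts, alternative = "clustered") # R < 1 suggests clustering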

A full geostatistical analysis includes these steps, and you can see an example of the whole thing if you look at my dissertation. You may want to include the Local Moran’s I value or some other term (I used the average number of homicides in neighboring areas) in a regression analysis. Or you could use a composite index to account for different factors.

Crab Cakes and Circus

As of December 15, 2019, there have been 325 homicides for the year in Baltimore, Maryland. Over the 349 days of the year so far, that is an average of 0.93 homicides per day. If the trend continues, Baltimore will come very close to breaking the record for homicides in a single year that it set in 2015 and 2017, with 342 homicides each. This is the continuation of an epidemic of homicides that started in 2015 and has continued unabated ever since.

You can read more about it in my doctoral dissertation.

This week, the Baltimore Ravens clinched a berth in the playoffs of the National Football League. The team has been playing well, and many have them as favorites not only to make the championship but also to win it. They have a star quarterback and a good defensive corps. They probably will go all the way.

Often, on my drive to work, I’ll tune in and listen to the local Baltimore news just to see what’s going on. I’ve been surprised that almost everyone, from the top to the bottom of the political structure, mentions the homicide epidemic and the accompanying violence almost in passing, but they seemingly go nuts when talking about their football team. They say that the team’s success brings “hope” to Baltimore, and they encourage the downtrodden citizens of Charm City to pay attention to the games and celebrate.

I don’t blame them, though. For almost as long as I’ve been alive, I’ve cheered on the Mexican Men’s National Soccer Team even as horrible things were going on around me. Even once I came to the United States, I’d ignore the political situations around the teams that I cheered for and just watch the games. I’d make a big deal of the smallest drama from athletes who, frankly, have zero direct effect on my everyday life.

Heck, I’d find “hope” and “inspiration” in athletes who would hurt themselves and make a comeback, or have a bad game and keep it together enough to win. I thought long and hard about what it meant when Adam Jones, then an outfielder for the Baltimore Orioles, said, “Success makes you more confident, but I try to keep an even keel. One day at a time.” His point, I think: don’t get so caught up in the moment, in the celebration, that you stop thinking.

We humans need entertainment. We can’t be too focused on all the crazy things going on around us for too long because it will eat us alive. We need to be distracted from the horrors and the pains of life. We go to football games, the theater, the movies, a concert… Or we go watch wrestling that isn’t wrestling:

Start it at 21:34 to get to the main point, or watch it all to see that wrestling isn’t wrestling.

But is there a limit to all this? Is there a point at which we are too entertained and not involved enough in real life? I certainly think so. I see it all the time in children and young people who are hyperconnected to their phones but not to each other, or in older people who treat the current political goings-on in Washington as more of a reality television show than events impacting real lives.

So where do we draw the line? How do I teach my child to look for entertainment that allows her to get away while teaching her to keep her eye on the realities of life? When is too much fanboy-ism too much?

I guess it varies from person to person and depends on maturity. I’ve found myself being more cynical of organizations like the NFL, MLB, NBA and the many soccer federations around the world. Yeah, I watch the games, and I love seeing the sport being performed by some incredibly talented athletes. But I don’t hold those athletes to much of a higher standard anymore. Their scandals are not scandalous to me anymore. They’re regular people, not gods; so they’re going to do some very stupid things.

At the same time, some of them will do great things, things that should be emulated by as many of us as possible. For example, Dwyane Wade talking about his child and his acceptance of his child’s gender identity is something that every parent should listen to:

Sadly, it is very rare to see this in athletes. Most of them are too young and self-absorbed to really try to bring good into the world through their actions off the field. Many do try, but often through public relations stunts that lure more fans to their events and to their websites to buy jerseys and hats and all other sorts of memorabilia.

Rarely do you see any of them in the most violent neighborhoods, talking the gangs out of their wars. But, again, I don’t blame them. I’d do what they do if it guaranteed me tens of millions of dollars a year to play a fun game that I’ve enjoyed since a young age and promised me fame. Fame and fortune are what it’s all about.

It is now December 20, 2019, as I finish this blog post, five days after I started it. The homicide count in Baltimore now stands at 333, eight more than five days ago. In eleven days, 2019 will end, and it will likely be the deadliest year in the city’s history on a per-capita basis… But, hey, go Ravens.

How Many Guns Were Within 1,000 Feet of Schools in Baltimore in 2018?

According to the Gun-Free School Zones Act of 1990, people may not possess a gun within 1,000 feet of a school if that gun traveled across state lines to get there (since Congress can regulate guns by regulating interstate commerce) and if the person carrying the gun is not licensed to do so. With that in mind, I got curious about how many firearm-related crimes happened within 1,000 feet of schools in Baltimore. This was part of my dissertation work, but it had to be cut from the final document and the presentation because of content and time constraints.

My data analysis for the dissertation only went up to 2017, so I’ve been curious to see what happened in 2018. So how about we take a look, using R? (Please note that a firearm homicide or shooting within 1,000 feet of a school was not necessarily committed with an unlicensed gun.)

The Data

First, we get data on firearm-related crimes in Baltimore in 2018 from the Baltimore City Open Data portal:

This gives me a list of 5,343 incidents in which a firearm was used and a location was noted. They boil down to this. (Note: these are reported crimes; the actual numbers may be different.)

| Type of Crime | Count |
|---|---|
| Aggravated Assault | 1,474 |
| Homicide | 273 |
| Robbery – Carjacking | 337 |
| Robbery – Commercial | 533 |
| Robbery – Residence | 121 |
| Robbery – Street | 1,928 |
| Shooting | 677 |

With that in hand, including the longitude and latitude, we throw them on a map to see where they happened. To create that map, I used the shapefiles of the Community Statistical Areas (CSAs) from the city’s Open Data portal.

Of course, I could have done all this in QGIS or ArcGIS, and it would have produced a map like this:


It’s very crowded because there were that many firearm-related crimes. The thing about doing this in QGIS and other programs with point-and-click interfaces is that you’re not exactly recording what you just did. To replicate the map above exactly, you’d have to take the time to write a how-to manual, or do a screen recording.

This is where programs like R come in. With R, you’re literally writing the how-to manual as you write your code. Well-commented code allows the next person to grab your code and replicate your results exactly, with little effort spent figuring out what you did and how you did it. So let’s replicate the map above in R, and even get some statistics on which school had the most firearm crimes within 1,000 feet.

The Code

First, we’re going to load some libraries. I’ve commented the use of each library.

# Load the required libraries ----

library(rgdal) # To manage spatial data
library(tigris) # To manage spatial data
library(tmap) # To create the maps
library(tmaptools) # To add more functionality to the maps created with tmap
library(raster) # To manipulate spatial data
library(sp) # To manipulate spatial data
library(rgeos) # To manipulate spatial data
library(sf) # To manipulate spatial data
library(spatialEco) # To join the layers
library(dplyr) # To manage data (make tables and summarize values)

Now, let’s bring in the data.

# Import the data and make it readable ----

crimes <-
  read.csv("data/gun_crimes_2018.csv", stringsAsFactors = F) # Import gun crime data
crimes <-
  as.data.frame(lapply(crimes, function(x)
    x[!is.na(crimes$Latitude)])) # Remove crimes without spatial data (i.e. addresses)
csas <-
  readOGR("data", "community_statistical_area") # Import Community Statistical Area shapefiles
schools <- readOGR("data", "bcpss_school") # Import Schools

Now, let’s turn crimes from a dataframe into a proper shapefile, and give it the right projection.

# Create point shapefile from dataframe ----

coordinates(crimes) <-
  ~ Longitude + Latitude # Assign the variables that are the coordinates
writeOGR(crimes, "data", "guncrimes", driver = "ESRI Shapefile") # Write it as a shapefile
guncrimes <- readOGR("data", "guncrimes") # Read it back in
proj4string(guncrimes) <-
  CRS("+proj=longlat +datum=WGS84 +unit=us-ft") # Give the right projection

A simple line of code to look at the number of firearm crimes by type.

table(crimes$Description)

Next, because we’re going to be putting layers on top of each other, we need to make sure that they all have the right projection (Coordinate Reference System).

# Fix the projections of the other layers ---

pro <- sp::CRS("+proj=longlat +datum=WGS84 +unit=us-ft")
csas <- sp::spTransform(csas, pro)
schools <- sp::spTransform(schools, pro)

With all the layers on the same projection, let’s build our first map using tmap.

# Create map of Firearm Crimes ----

tmap_mode("plot") # Set tmap to "plot" (instead of "view")

baltimore1 <-
  tm_shape(csas) + # Adds the Community Statistical Areas layer
  tm_borders("black", lwd = .5) + # Makes those borders black and thin
  tm_shape(guncrimes) + # Adds the layer of gun-related crimes
  tm_dots("Dscrptn", # Creates a dot for each gun-related crime, color-coded by type of crime
          title = "Crime Type",
          size = 0.1) +
  tm_compass() + # Adds the north star compass
  tm_legend() + # Adds the legend
  tm_layout( # Controls the layout of the different elements of the map
    main.title = "Map of Firearm Crimes in Baltimore, 2018",
    main.title.size = 1,
    legend.position = c("left", "bottom"),
    compass.type = "4star",
    legend.text.size = 0.7,
    legend.title.size = 0.9
  ) +
  tm_scale_bar( # Adds the scale bar
    size = 0.5,
    color.dark = "black",
    color.light = "white",
    lwd = 1
  ) +
  tmap_options(unit = "mi") # Makes sure the scale bar is in miles

baltimore1 # Let's look at the map

tmap_save(tm = baltimore1, # Saves the map
          filename = "Maps/baltimore1.bmp", # File name of the image
          dpi = 600) # Resolution of the image saved

This is what we create:


Let’s build a second map, of school locations. (Here, I don’t have as many comments because the code replicates the structure of the first map above.)

# Create map of School locations ----

baltimore2 <-
  tm_shape(csas) +
  tm_borders("black", lwd = .5) +
  tm_shape(schools) +
  tm_dots(
    "CATEGORY",
    shape = 17,
    size = 0.1,
    title = "School Type",
    col = "red",
    legend.show = T
  ) +
  tm_legend() +
  tm_compass() +
  tm_layout(
    main.title = "Map of School Locations in Baltimore, 2018",
    main.title.size = 1,
    legend.position = c("left", "bottom"),
    compass.type = "4star",
    legend.text.size = 0.7,
    legend.title.size = 0.9
  ) +
  tm_scale_bar(
    size = 0.5,
    color.dark = "black",
    color.light = "white",
    lwd = 1
  ) +
  tm_add_legend(
    "symbol",
    col = "red",
    shape = 17,
    size = 0.5,
    labels = "SCHOOL"
  ) +
  tmap_options(unit = "mi")

baltimore2

tmap_save(tm = baltimore2,
          filename = "Maps/baltimore2.jpg",
          dpi = 600)

This is what we get:


Now, let’s create a map of firearm-related crimes and school locations:

# Create map of school locations and firearm crimes locations ----

baltimore3 <-
  tm_shape(csas) +
  tm_borders("black", lwd = .5) +
  tm_shape(schools) +
  tm_dots(
    "CATEGORY",
    shape = 17,
    size = 0.1,
    title = "School Type",
    col = "red",
    legend.show = T
  ) +
  tm_shape(guncrimes) +
  tm_dots("Dscrptn",
          title = "Crime Type",
          size = 0.08) +
  tm_compass() +
  tm_layout(
    main.title = "Map of Firearm Crimes and Schools in Baltimore, 2018",
    main.title.size = 1,
    legend.position = c("left", "bottom"),
    compass.type = "4star",
    legend.text.size = 0.7,
    legend.title.size = 0.9
  ) +
  tm_scale_bar(
    size = 0.5,
    color.dark = "black",
    color.light = "white",
    lwd = 1
  ) +
  tm_add_legend(
    "symbol",
    col = "red",
    shape = 17,
    size = 0.5,
    labels = "SCHOOL"
  ) +
  tmap_options(unit = "mi")

baltimore3

tmap_save(tm = baltimore3,
          filename = "Maps/baltimore3.jpg",
          dpi = 600)

This is the map:


Now, we’re going to create a 1,000-ft buffer around the schools:

# Create a 1,000-foot buffer around the schools ----
# https://www.rdocumentation.org/packages/raster/versions/2.8-19/topics/buffer

buf <- buffer(schools, # Layer for buffers
              width = 304.8, # Buffer in meters
              dissolve = F)
plot(buf) # Take a look at the result

Make sure you convert your feet to meters, and make sure to verify that what you’ve created is what you were looking for.
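
Since buffer() here works in meters, a tiny helper function (my own addition) makes the conversion explicit:

feet_to_meters <- function(ft) ft * 0.3048 # One foot is exactly 0.3048 meters
feet_to_meters(1000) # 304.8, the buffer width used above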

Now, let’s see those buffers around the schools on a map:

# Create map with buffers and schools only ----

baltimore4 <-
  tm_shape(csas) +
  tm_borders("black", lwd = .5) +
  tm_shape(buf) +
  tm_borders("blue", lwd = .5) +
  tm_shape(schools) +
  tm_dots(
    "CATEGORY",
    shape = 17,
    size = 0.1,
    title = "School Type",
    col = "red"
  ) +
  tm_compass() +
  tm_layout(
    main.title = "Map of 1000-ft Buffers Around Schools in Baltimore, 2018",
    main.title.size = 1,
    legend.position = c("left", "bottom"),
    compass.type = "4star",
    legend.text.size = 0.7,
    legend.title.size = 0.9
  ) +
  tm_scale_bar(
    size = 0.5,
    color.dark = "black",
    color.light = "white",
    lwd = 1
  ) +
  tm_add_legend(
    "symbol",
    col = "red",
    shape = 17,
    size = 0.5,
    labels = "SCHOOL"
  ) +
  tm_add_legend("line",
                col = "blue",
                size = 0.1,
                labels = "1000-ft Buffer") +
  tmap_options(unit = "mi")

baltimore4

tmap_save(tm = baltimore4,
          filename = "Maps/baltimore4.jpg",
          dpi = 600)

Here is what it looks like:

Not bad, right?

And now, let’s create the final map:

# Create map with buffer, schools and crimes. This is the final map. ----

baltimore5 <-
  tm_shape(csas) +
  tm_borders("black", lwd = .5) +
  tm_shape(buf) +
  tm_borders("blue", lwd = .5) +
  tm_shape(schools) +
  tm_dots(
    "CATEGORY",
    title = "Schools",
    shape = 17,
    size = 0.1,
    col = "red"
  ) +
  tm_shape(guncrimes) +
  tm_dots("Dscrptn",
          title = "Crime Type",
          size = 0.08) +
  tm_compass() +
  tm_layout(
    main.title = "Map of Firearm Crimes, Schools and 1,000-ft Buffers in Baltimore, 2018",
    main.title.size = 0.8,
    legend.position = c("left", "bottom"),
    compass.type = "4star",
    legend.text.size = 0.7,
    legend.title.size = 0.9
  ) +
  tm_scale_bar(
    size = 0.5,
    color.dark = "black",
    color.light = "white",
    lwd = 1
  ) +
  tm_add_legend(
    "symbol",
    col = "red",
    shape = 17,
    size = 0.5,
    labels = "SCHOOL"
  ) +
  tm_add_legend("line",
                col = "blue",
                size = 0.1,
                labels = "1000-ft Buffer") +
  tmap_options(unit = "mi")

baltimore5

tmap_save(tm = baltimore5,
          filename = "Maps/baltimore5.jpg",
          dpi = 600)

And this is what that looks like:


Does that look like the map created in QGIS? Absolutely. With some tweaking of the marker shapes and sizes, we get two very similar maps.

The good thing about tmap is that it allows you to create an interactive map with just a quick line of code. (The other R package, leaflet, does something similar, and might even be a little more “robust,” in my opinion.)

# View the final map in interactive mode ---

tmap_mode("view") # Set the mode to view in order to create an interactive map
baltimore5
tmap_mode("plot") # Return to plotting if further maps are going to be made

Now, for analyzing the data, let’s create a dataframe by joining all the points into the buffer zones.

# Now to create a dataframe of the gun crimes within the buffer zones ----

crimespoints <- as.data.frame(point.in.poly(guncrimes, # Points
                                            buf)) # Polygons
crimesbuffer <-
  subset(crimespoints, NAME != "NA") # Creating a dataframe of only the points inside the polygons (NAME is not NA)

If “NAME” is NA (not available), that means the point is not inside a buffer. I subsetted the remaining points into “crimesbuffer,” where I can analyze the 182 schools with at least one (1) gun-related crime within 1,000 feet, and I could even do an analysis of the 2,485 crimes that happened within 1,000 feet of a school.

# Finally, let's see which school had the most gun crimes within 1,000 feet from it ----

table <-
  count(crimesbuffer,
        NAME,
        sort = T) # Create a table of names and numbers and sorts it descending
sort(table$n,
     decreasing = T) # Sorting the table
table # View the table
hist(table$n) # Histogram of crime counts (Notice a distribution?)
sum(table$n)

From this, I found that the school with the most firearm-related crimes within 1,000 feet of it was William Paca Elementary at 200 N Lakewood Ave.

Lots of guns were around these schools.

That school had 81 firearm-related crimes within 1,000 feet from it.

Further Analysis?

From here, the types of analyses you can do are only limited by your data and your imagination. You can extract socioeconomic data from the CSA shapefiles and do a regression to see if anything is predictive of the number of firearm-related crimes within 1,000 feet of schools. (Remember that you’re dealing with count data, so you’ll need to use the Poisson distribution or maybe even a negative binomial regression. Also remember to deal with spatial autocorrelation.)
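
Here is a minimal sketch of what such a model could look like. (This is my own illustration under assumed names: it supposes you have assembled a dataframe called school_data with a column n for the count of firearm crimes within 1,000 feet of each school and a column poverty for the percent of households under the poverty level in the school's CSA.)

# Poisson regression of firearm-crime counts on poverty (hypothetical dataframe)
model <- glm(n ~ poverty,
             family = poisson(link = "log"),
             data = school_data)
summary(model)
# If the residual deviance is much larger than the residual degrees of
# freedom, the counts are overdispersed; consider MASS::glm.nb() instead.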

At the very least, you can now go to policymakers or the parent-teacher associations and tell them which schools are at higher risk of having firearm-related crimes close to them. If you’re really ambitious, you could use the time codes on the crime data and compare crimes during school hours versus crimes after hours. Finally, you can expand or reduce the buffer to something that is significant to your audience.

The Full Code and Data

As with all of these mini-projects of mine, the full code and data can be found by clicking the button below. Please remember to link back to this blog post or my blog home page and to give proper attribution for this within your code or in your write-up. The code is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0).

And please do let me know if you have ideas on improving this in any way. I’m all ears when it comes to collaboration.

Analysis of Public Health Inequities Using R Programming

Note: A previous version of this blog post appeared on my Github page: https://rfnajera.github.io/baltimore-homicide-inequity/

Poverty and Violence

There is evidence that poverty and violence go hand-in-hand for a number of reasons. Those who are poor may come into friction with those who are not. Or the lack of wealth is systemic, leading to a poor public safety infrastructure. Or those who are poor do things deemed criminal by society in order to survive.

In this project, I take data from Baltimore on homicides in 2016 and data from the Baltimore Neighborhood Indicators Alliance and see if homicides were inequitably distributed in Baltimore.

This is work that public health professionals can reproduce in their own locales (or even in Baltimore) using their own data and looking at other social and health indicators. This project is based on “Methods for measuring inequalities in health” by Schneider et al. (The “et al.” includes my doctoral thesis advisor and mentor, Dr. Carlos Castillo-Salgado.)

For more information, please feel free to contact me via Twitter: @EpiRen

Data Sources

For my doctoral dissertation, I received a list of homicide cases from Justin Fenton at The Baltimore Sun. That list contained the names, locations, and some demographic (age, gender, race) information on homicides reported in Baltimore between 2005 and 2016. To those data, I added the same information for 2017, which I had collected myself. To validate the dataset, I randomly pulled 400 cases and searched news archives, obituaries, and official sources to confirm them. All 400 cases were confirmed this way. I then geocoded these data through ArcGIS to get the latitude and longitude of each incident location.

The information on the Community Statistical Areas (CSAs) came from the Baltimore Neighborhood Indicators Alliance. BNIA collects and cleans data about Baltimore every year. For this project, I used their Vital Signs 16 data, which you can explore. From it, I obtained the percent of households under the poverty level by CSA.

All of the data and code that you need for this project are in the project’s GitHub repository. The R code has all the comments you’ll need to understand what is happening in each line of code, as well as the questions that we will be answering below.

Alternatively, you can download the zip file with all the code and data.

The Main Aim of This Project

The main aim of this project is to give Public Health professionals a primer on using R programming to determine if a health indicator (homicides, in this case) is equitably distributed in a location (Baltimore) according to that location’s indicator of wealth (percent of households under the poverty level, in this case).

You Might Want to Watch This First

Here is a quick YouTube video I made of me running the code in RStudio. It’s 15 minutes long, and you can see how easy it is to look at the code, run it, and then understand what each line did.

Step One: Are Homicides and Poverty Distributed in a Spatially Similar Way in Baltimore?

For this first part, we bring in all the data on homicides from 2005 to 2017. I’ve already geocoded these data; if they only had street addresses (at the block level), they would need to be geocoded into latitude-longitude coordinates, but that’s for a different project at a different time. Also included in the dataset are the names of the Community Statistical Areas (CSAs) the homicides belong to. CSAs are a way to bring neighborhoods together into bigger areas while maintaining as much as possible of the neighborhoods’ geographic and sociodemographic characteristics.

# Read the homicides between 2005 and 2017 into a dataframe.
# Since we're using 2016 CSA data, we'll only use 2016 homicides.
# The data we will use was obtained from The Baltimore Sun and
# cleaned up by the author for work in his doctoral (DrPH) dissertation.
homicides_all <- read.csv("data/baltimore_homicides_2005_2017.csv")

#Subset only the homicides in 2016
library(tidyr)
library(dplyr)
homicides_2016 <- homicides_all %>% filter(year==2016)

# Save this subset as a .csv file if you want to use it later
write.csv(homicides_2016, file = "data/baltimore_homicides_2016.csv")

Question 1

How many homicides were reported in 2016?
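
(A hint, using the dataframe we just created:)

nrow(homicides_2016) # The number of homicide records in 2016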

Now we’re ready to bring in a shapefile from BNIA which contains the shape of the CSAs in Baltimore for mapping purposes.

# Bring in shapefile with population data obtained from the
# Baltimore Neighborhood Indicators Alliance at:
# https://bniajfi.org/vital_signs/data_downloads/
library(rgdal)
ogrInfo(dsn="data/bnia_16", 
        layer="bnia_16") # see shapefile info
csa_baltimore <- readOGR("data/bnia_16",
                         "bnia_16") # read the shapefile into R
csa_baltimore <- spTransform(csa_baltimore, 
                             CRS("+init=epsg:4326"))  #transform coord system:
# More projections information: http://trac.osgeo.org/proj/wiki/GenParms

Because ggplot2 (which we will use for the choropleth maps) needs to deal with this as a dataframe, we transform the shapefile into a dataframe.

# Transform the shapefile into a dataframe, keeping the CSA name information
library(ggplot2)
csa_baltimore <- fortify(csa_baltimore, 
                         region = "CSA2010") # "id" is the name of the CSA column

Now we get the total number of homicides per CSA

# Count the number of homicides per CSA. (This assumes the CSA name column
# in the homicide data is CSA2010, the same name used in the join below.)
homicides_csa <- homicides_2016 %>%
  group_by(CSA2010) %>%
  summarise(total = n())

Question 2

Which CSA had the highest number of homicides reported in 2016?
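
(A hint: sort the per-CSA totals in descending order.)

homicides_csa %>%
  arrange(desc(total)) %>% # CSAs from most to fewest homicides
  head(1) # The CSA at the top is the answer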

Now, we’re going to bring in BNIA data on population and percent of households under the poverty line. We will then join that information with the homicide counts we just created. We then save the data for later use.

# Add the population and percent of households under poverty level
# to the information by CSA.
#These data also from BNIA at https://bniajfi.org/vital_signs/data_downloads/

bnia_demo <- read.csv("data/bnia_demographics_2016.csv") # Reads the data

csa_info <- left_join(bnia_demo,
                      homicides_csa, 
                      by = "CSA2010", 
                      type = "right") # Joins the data to the homicide per CSA table.
                                      # There will be some NAs because some CSAs did not have homicides.

# Some of the CSAs did not have homicides, so we need to turn those into zeroes. 
# It prevents NAs in the next step, calculating the homicide rate.
csa_info[is.na(csa_info)] <- 0  

#Calculate the homicide rate per 100,000 residents
csa_info$homicide_rate <- csa_info$total / (csa_info$pop/100000)

# Save this for later use as a .csv
write.csv(csa_info, file = "data/baltimore_homicides_2016_with_counts.csv")

Question 3

Which CSA had the highest homicide rate per 100,000 residents reported in 2016?

Making a Choropleth Map

You’ve probably seen choropleth maps before. They’re those maps where the shading of a geographic unit (e.g. a US State) depends on the value of some variable (e.g. Population). For this exercise, we’re going to make two choropleth maps: One showing the percent of households under the poverty level by CSA in Baltimore (based off the Vital Signs 16 data), and one showing the homicide rate per 100,000 residents we just calculated above from the homicide data and the population per CSA from BNIA.

For this, we will join the homicide counts, rates, and percent poverty to the CSA shapefile, then make the map using ggmap. Look at the code that customizes the map. The neat thing about doing this in code is that you don’t have to remember which buttons you clicked if you want to reproduce your work. If you use a graphic user interface (GUI), or point-and-click, you might forget how you did something. Here, it’s all in code, with comments and such, ready for you to plug in your data, tweak a few things in the code to fit your needs, and output something you can use.

(NOTE: ggmap requires a Google API key, which you can get for free, so long as you don’t overuse it to fetch maps.)

# Join CSA info (the counts of homicides, homicide rate, etc.) to CSA polygon
csa_baltimore2 <- merge(x = csa_baltimore, 
                        y = csa_info, 
                        by.x = "id", 
                        by.y = "CSA2010", 
                        all.x = TRUE)

# Choropleth of poverty

library(scales) # For the palette and pretty breaks
library(ggmap) # For the map
baltimore_roadmap <- get_map("Baltimore", 
                             zoom = 12,
                             maptype = c("toner"),
                         source = "stamen")
baltimore_poverty <- ggmap(baltimore_roadmap)
baltimore_poverty <- baltimore_poverty +
  geom_polygon(aes(x=long, 
                   y=lat, 
                   group=group, 
                   fill = poverty),
               size=.3, 
               color='black', 
               data=csa_baltimore2,
               alpha=0.8) +
  scale_fill_distiller(type="seq", 
                       palette = "PuOr",  # Gives a nice scale. 
                       breaks=pretty_breaks(n=5), 
                       name="% of Households Under Poverty", 
                       na.value = "transparent") +  # There should not be any NAs, right?
  labs(x=NULL, 
       y=NULL,
       # Remember to use a descriptive title
       title="Percent of Households Under the Poverty Level by Community Statistical Area in Baltimore",
       subtitle=NULL,
       caption="Source: bnia.org") +
  theme(plot.title = element_text(face="bold", 
                                  family="Arial", 
                                  size=8)) +
  theme(plot.caption = element_text(face="bold", 
                                    family="Arial", 
                                    size=7, 
                                    color="gray", 
                                    margin=margin(t=10, r=80))) +
  theme(legend.position="right") +
  theme(axis.line =  element_blank(), #Eliminates the plot axes and labels
        axis.text =  element_blank(),
        axis.ticks =  element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        panel.background = element_blank())

# Now, let's look at the map
baltimore_poverty

# Save the map so you can use it later as want
ggsave("images/baltimore_poverty_choropleth.png")

Your result should look like this:

Choropleth Map of Poverty in Baltimore

Other ways to map?

Yes, there are other ways to create a map in R, but this is a very “robust” way that allows you a lot of customization. Another package is Leaflet, but we will leave that for another project at a later time.

For the choropleth map of homicide rate, we use the same code, just with a different value. (This is what I meant by code being better than point-and-click: all I had to do was replace the value poverty with homicide_rate.)

# Choropleth of homicide rate in 2016 by CSA
# Just like the choropleth above, but with homicide rate as the filler.
baltimore_hr <- ggmap(baltimore_roadmap)
baltimore_hr <- baltimore_hr +
  geom_polygon(aes(x=long, 
                   y=lat, 
                   group=group, 
                   fill = homicide_rate), # Fill by "homicide_rate" instead of "poverty"
               size=.3, 
               color='black', 
               data=csa_baltimore2, # Same data as the choropleth for poverty
               alpha=0.8) +
  scale_fill_distiller(type="seq", 
                       palette = "PuOr", 
                       breaks=pretty_breaks(n=5), 
                       name="Homicide Rate per 100k Residents", 
                       na.value = "transparent") +
  labs(x=NULL, 
       y=NULL,
       title="2016 Homicide Rate by Community Statistical Area in Baltimore",
       subtitle=NULL,
       caption="Source: René Najera, DrPH & bnia.org") + # Remember to cite your sources
  #geom_text(aes(x=long, y=lat, label=total), data=homtxt, col="black", cex=5) +
  # Add the label of number of shootings. Will get warning about removing NAs
  theme(plot.title  =element_text(face="bold", family="Arial", size=8)) +
  theme(plot.caption = element_text(face="bold", family="Arial", size=7, color="gray", margin=margin(t=10, r=80))) +
  theme(legend.position="right") +
  theme(axis.line =  element_blank(),
        axis.text =  element_blank(),
        axis.ticks =  element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        panel.background = element_blank())

# Now, look at the homicide rate map
baltimore_hr

# Save the map so you can use it later
ggsave("images/baltimore_homicide_rate_choropleth.png")

Your homicide rate choropleth should look like this:

Choropleth Map of Homicide Rate in Baltimore

The next few lines of code will show you the maps side-by-side.

# Save the side-by-side to a file for viewing later, or editing, etc.
install.packages("devtools")
devtools::install_github("thomasp85/patchwork")
# More on patchwork: https://github.com/thomasp85/patchwork

library(patchwork)
baltimore_poverty / baltimore_hr
ggsave("images/baltimore_maps_side_by_side.png")

Question 4

Does there seem to be an association between poverty and homicides?

There does seem to be, at the very least, a spatial association between poverty and homicide rate in Baltimore in 2016. That is, the CSAs with higher homicide rates also seem to be the ones with a higher percent of households under the poverty level. But let’s look at a scattergram to see this association a little more mathematically.

# Now, a simple scatterplot. What does this tell you?
# (More on scatter plots using ggplot:
# http://www.sthda.com/english/wiki/ggplot2-scatter-plots-quick-start-guide-r-software-and-data-visualization)
scatter_plot <- ggplot(csa_info, aes(x=poverty, y=homicide_rate)) +
  geom_point(size=2, shape=23, fill = "black") +
  geom_smooth(method=lm, se=FALSE, color = "blue", linetype = "dashed") + # Adds regression line
  geom_smooth(method = "loess", formula = "y ~ x", color = "red") + # Adds Loess line
  labs(title="Scatterplot of Percent Households Under Poverty vs. Homicide Rate",
       x="% of Households Under Poverty", y = "Homicide Rate per 100,000 Residents")

# Look at the line
scatter_plot

ggsave("images/scatterplot.png")

Step Two: Looking at the Inequality in a Quantified Way

Now, let’s look at the inequality. We will do this by ranking the CSAs in order of poverty, from the ones with the most households under the poverty level to the ones with the fewest. Those will be plotted on the X axis. Then, the cumulative percentage of 2016 homicides will be plotted on the Y axis. If all CSAs shared the same number of homicides, the line would be a 45-degree line from the origin. (There are packages in R that will do this for you almost automatically, but this is good practice to see how it is done.)

First, let’s process the data.

# Rank CSAs in csa_info by "poverty" in descending order, poorest to wealthiest
csa_info_ranked <- csa_info[order(-csa_info$poverty),] 

# Get the cumulative number of homicides
csa_info_ranked$cum_hom <- cumsum(csa_info_ranked$total)

# Calculate the cumulative percent of homicides
csa_info_ranked$cum_hompct <- csa_info_ranked$cum_hom / sum(csa_info_ranked$total)

# Generate the values if all things were equal, to plot the line of equality
csa_info_ranked$equality <- sum(csa_info_ranked$total) / 55 / 319
# There are 55 CSAs; dividing by 319 turns each equal share into a proportion.
# Why 319? (Hint: how many homicides were there in total?)

csa_info_ranked$equality_line <- cumsum(csa_info_ranked$equality)

Now, let’s plot it using ggplot2.

# Using ggplot() to make the concentration curve
# Save the ranked CSA names to use as factor levels; otherwise,
# ggplot plots the CSAs alphabetically
csa_levels <- csa_info_ranked$CSA2010
csa_info_ranked$CSA2010 <- factor(csa_info_ranked$CSA2010, levels = csa_levels)
csa_info_ranked$group <- 1 # ggplot needs observations to be grouped :-/

# Making the curve
# More on customizing ggplot:
# http://www.sthda.com/english/wiki/ggplot2-axis-ticks-a-guide-to-customize-tick-marks-and-labels

con_curve <- ggplot(data=csa_info_ranked,
       aes(x = CSA2010,
           y = cum_hompct,
           group = 1)) +
  geom_line(color="red",
            size = 1) +
  geom_line(aes(y = equality_line),
            color = "black", 
            size = 0.5, 
            linetype = "dashed") +
  labs(title = "Concentration Curve of Homicides and Wealth in Baltimore", 
       subtitle = "Based on 2005-2017 Homicides and 2016 Poverty Data",
       x = "CSAs Ranked from Poorest to Wealthiest, Based on % of Households Below Poverty", 
       y = "Cumulative Percentage of Homicides",
       caption = "Data: Rene Najera & bnia.org") +
  scale_y_continuous(labels = scales::percent, 
                     expand = c(0, 0), 
                     limits = c(0, 1.01),
                     breaks = c(0.25, 0.5, 0.75, 0.9, 1)) +
  geom_hline(yintercept = 0.5, 
             linetype = "dashed", 
             color = "gray", 
             size = 0.25) + # Created a horizontal line to help visualize which CSAs had 50% of the homicide burden
  geom_hline(yintercept = 0.9, 
             linetype = "dashed", 
             color = "gray", 
             size = 0.25) + # Created a horizontal line to help visualize what percentage of homicides the wealthiest CSAs had
  geom_vline(xintercept = 16, 
             linetype = "dashed", 
             color = "gray", 
             size = 0.25) + # Created a vertical line to help visualize which CSAs had 50% of the homicide burden
  geom_vline(xintercept = 36, 
             linetype = "dashed", 
             color = "gray", 
             size = 0.25) + # Created a vertical line to help visualize what percentage of homicides the wealthiest CSAs had
  theme(axis.line =  element_line(color = "black"),
        axis.text.x = element_text(color = "blue", angle = 90, hjust = 1),
        axis.text.y = element_text(color = "red"),
        axis.ticks.length = unit(.25, "cm"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        panel.background = element_blank())

# Look at your creation
con_curve

# Save the concentration curve so you can use it in a document, or edit it with a graphics editing software
ggsave("images/baltimore_inequality_curve.png")

So what does the curve tell you? Can you answer the questions below?

Question 5

Are homicides distributed equitably in Baltimore?

Question 6

How many CSAs had 50% of the homicides in 2016?

Question 7

What percent of the total homicides did the wealthiest 20 CSAs have in 2016?

Final Thoughts

There is a process to figure out the Gini Coefficient (range 0 to 1). If homicides were equitably distributed, the Gini Coefficient would be 0. Clearly, the Gini Coefficient here is not 0. There are packages in R that can calculate it for you, but it helps you understand code and how this curve works if you do it on your own. You would do this by calculating the area between the dashed line and the red curve. Then you divide that area by 0.5, which is the area of the graph above (or below) the dashed line. You would then use the Gini Coefficient to track movement of the inequality line and understand equitability of a health condition based on an indicator over time. In our example, if the Gini Coefficient got smaller over time, then it would indicate that homicides are becoming more equitably distributed in Baltimore. (This is not necessarily a good thing. It could be that there are more homicides in wealthier neighborhoods.)
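To make that concrete, here is a minimal sketch of the trapezoid-rule version of that calculation, assuming the csa_info_ranked data frame built above:

# Minimal sketch: Gini coefficient from the concentration curve above
n <- nrow(csa_info_ranked)

# Cumulative shares of CSAs and homicides, with the origin (0, 0) prepended
x <- c(0, seq_len(n) / n)
y <- c(0, csa_info_ranked$cum_hompct)

# Area under the red curve via the trapezoid rule
auc <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

# Area between the curve and the 45-degree line, divided by 0.5
# (our curve bows above the diagonal, so take the absolute value)
gini <- abs(auc - 0.5) / 0.5
gini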

That is all for this project. Please feel free to use the code for your own needs, but do cite your data sources and give me a shout-out to let me know how you’re using it. If you have any questions, leave them below.

Suggested Citation (APA): Najera, R. F. (2018, August). Analysis of Public Health Inequalities Using R Programming. Retrieved from https://rfnajera.github.io/baltimore-homicide-inequality/

Thank you for your time.

The Canaries in the Coalmine

There’s an old adage in epidemiology and biostatistics: “the plural of anecdotes is not data.” What we mean by this is that the unverified experiences of a group of people may not always reflect what is happening in the population at large. Certainly, I use this point of view when it comes to the anti-vaccine crowd and their claims that vaccines turn you into the Hulk or whatever.

“Doveryai, no proveryai” (“Trust, but verify”) – Old Russian Proverb

When Andrew Wakefield released his paper in 1998, he claimed that it was his gut feeling that the MMR vaccine caused autism. (The paper itself, however, found no evidence of this.) His statements made researchers jump into action and look into vaccines as a cause of autism because, at the time, the idea that vaccines could do this was novel. (Even if its plausibility was, at best, questionable.) Twenty-plus years later, we know that vaccines do not cause autism, and neither does thimerosal, the preservative that used to be in most childhood vaccines (though never the MMR) and is now found only in some multi-dose vials of the influenza vaccine.

In essence, we verified Wakefield’s fraudulent claims, and we’re a little better for it in terms of our knowledge (and acceptance) of autism. We’re not doing so well on the anti-vaccine front, but there are some signs that the tide is turning against their misleading ways. Not only that, but vaccine safety and effectiveness surveillance is better now than it ever has been because we went through the trouble of verifying the fraud perpetrated by Wakefield and amplified by his followers.

The Woman in the Hospital Gown, and Not Much More

About a year ago, a bystander at a bus stop across from a hospital in Baltimore recorded a woman in a hospital gown, and not much else, as she muttered to herself and appeared disoriented. She was taken back to the hospital after 911 was called and help arrived. As it turns out, someone had released her from the hospital and left her at the bus stop even though she was in the middle of some sort of mental health crisis.

The question in my mind at the time (and in the mind of others) was how often this happens, to whom, and why it happens. Is she the canary in the coal mine when it comes to how people with mental health issues are treated by healthcare providers and facilities? Or was she just the unfortunate recipient of a clerical error that landed her outside in a gown in the middle of winter?

The Pearlie Brothers, Sheamon and Damon

The first homicide of 2017 was that of 20-year-old Sheamon Pearlie. A year and a half later, his brother Damon was killed just a few blocks away from where Sheamon had been killed.

Just a short walk from one place to the other.

By themselves, these two homicides are just a couple of data points in what is a very big and very sad dataset coming out of Baltimore City. However, put them together, and you get the idea of what is happening in the city with regards to where homicides are happening and who is being affected the most by them: African American men between 15 and 35 years of age living/working/existing in poor and disordered neighborhoods in the city. That is, they tell a bigger story.

The Little Lors

One of the first versions of my doctoral thesis work was going to look at the social network of victims of homicide in Baltimore. To do this, I was planning on going to social media and seeing if one victim was associated with another through their social media interactions. (For example, were they friends on Facebook?) So I worked on a proof of concept before proposing the work to my advisor.

For a few weeks in 2015, I tracked homicides happening in Baltimore almost as they happened. As soon as I saw that there had been a homicide, I searched social media to see who was talking about it. Almost inevitably, someone would mention the name of the victim and link to the victim’s social media profile on Facebook or Twitter, or even SnapChat. Later, when another victim was identified and their social media identity revealed, I would compare the profiles and see if they knew each other.

Little by little, over the course of about six months, I discovered that a handful of young men knew each other on Facebook. They all identified each other with the moniker of “Lor” followed by a nickname based on their real name or how they were known in their circle(s) of friends. There were only a few of them, but they all were called “Lor” Something by their friends and on their social media accounts. I nicknamed them “The Little Lors.” Little, because they were all so young.

Again, questions started forming in my head. What were the Little Lors a symptom of in the city? Most were still school-aged, so why were they not in school? And those who were out of school: had they graduated high school, and were they employed? I wouldn’t get the answers to these questions because that aspect of my research was not approved to go forward. (It was deemed too intrusive to look at their social media profiles, even if the profiles were open for the public to see.)

Those Nagging Questions

Many of those questions nag me. While I’m not directly involved in solving the problems that led to the woman at the bus stop or the siblings and friends being killed, those “canaries in the coal mine” are relevant to me as someone involved in public health. They’re valuable pieces of an interesting puzzle that, when completely elucidated, will give some idea of the level of disparity and institutionalized discrimination and racism toward entire groups of people.

People with mental health issues are abandoned to their own fortune because the fallible systems created by fallible humans are not able to serve them for some reason. When one sibling is killed in a neighborhood, the other is perhaps destined for the same fate because no intervention was available to protect them from the contagion that is violence. What intervention could have worked?

And so on and so forth…

Early Warning or More of the Same?

When I told the story of the brothers getting killed, one of my colleagues asked if they were an early warning of something yet to come or just another set of statistics in what we already know about Baltimore. My response was that they were both. On the one hand, yes, there are a lot of homicides in Baltimore, and they have shown few signs of slowing down.

On the other hand, when the first Pearlie brother was killed, there should have been a mobilization of all available resources to make sure that his sibling did not end up the same way. Because, if violence is contagious (as it has been shown to be), then who could be at greater risk of more violence than the next-of-kin of someone killed by homicide? At the same time, the fact that this was not done is an indicator of where we (all of us) could do better.

The same is true with the woman at the bus stop. She would be the harbinger of things to come if she were the exception, and more attention should/could be placed on how people with mental health problems are discharged from medical facilities. And, if she is not the exception, then we can use her experience to begin to understand what is happening… And do something about it.

Turning Anecdotes Into Data

So now comes the $64,000 question: How do we turn these anecdotes into data that is actionable? In the case of the woman at the bus stop, I start looking at medical records of people with mental health issues and try to understand the habits and procedures of the health care providers when it comes to where those patients are discharged, when, and if any follow-up is done. In the case of the siblings, I collect information on other homicides and what kind of follow-up and services were offered and delivered to the victims’ next-of-kin. Same with the Little Lors.

Then I step back and analyze the information, consult with experts, and put forward proposals to stop the next woman from being sent on her own into the winter’s cold in a hospital gown, or to deliver services to the next-of-kin and friends of a homicide victim to stop — or at least slow — that contagion right then and there. Then, of course, we double-back and analyze the interventions and re-assess the approach.

Better Said Than Done

This is all better said than done, right? There are a lot of moving parts to catching these anecdotes, verifying them and then going out into the world of data that is out there to measure and intervene to prevent what these anecdotes are forecasting. That doesn’t mean we don’t try.

The minute Wakefield put forth his fraudulent “study,” it scared parents away from vaccinating. Not all parents, but enough to let vaccine-preventable diseases make a comeback. The minute someone heard what he said at that press conference twenty years ago, someone else should have jumped higher and shouted louder with data in hand, both testing the plausibility and probability of vaccines causing autism and doing something to keep parents from withholding vaccines from their children.

Unfortunately, too many were afraid to be divisive, so they chose to be indecisive. And here we are today… This can’t continue to be the case with the social ailments and public health threats in a city like Baltimore, or anywhere else. If we see a problem, we point it out, we science the heck out of it, and we make sure that the anecdote remains an anecdote. And, if it isn’t an anecdote but a widespread problem, we take the decisive action needed to stop and reverse whatever perverse trend threatens others.

Seeing Data in Time and Space

Alright… Real quick. I’ve been looking at ways to present in two dimensions data that are in three dimensions: location, time and rate. I’ve been looking at my homicide data (based on data compiled by The Baltimore Sun and others) and there was no good way to show it in two dimensions.

I wanted to show that some Community Statistical Areas (CSA) in Baltimore had gone from high homicide rates to low, and others from low to high. Usually, people would just use a complicated line graph. But, with 54 CSAs in Baltimore, that would mean 54 lines on a graph. That would be nuts.

For my dissertation, I used a tool in ArcGIS called a “space-time cube” in “emerging hot spot analysis.” That rendered a neat map of Baltimore with cubes 13 levels high (one for each year from 2005 to 2017):

Cool, right?

While I gained the ability to show which CSAs had high homicide rates (in red) and remained that way, and show that these CSAs were clustered in space (near each other) while those with lower homicide rates throughout the 13 years were also clustered, I lost the ability to show how poverty influenced these rates. That is, I couldn’t show that poorer CSAs had red throughout while wealthier ones had yellow or even blank (no homicides in that year) spaces.

I had almost forgotten about this problem until tonight, when I came upon a Hovmöller diagram by accident (while looking at wind data). Look at this example from this paper:

Hovmöller!!!

As you can see, the wind speed was different by date (on the Y axis on the left) within and between reading stations (on the X axis on the bottom). Here we have geographic data (the location of the stations), time data (the date) and a measurement (the speed and direction represented by the scale on the right).

I’ve been dabbling in the R programming language, so I got to do some coding tonight, and I came up with this:

Click to see it in all its glory!

As you can see, we have the CSAs on the left in alphabetical order, and the years are on the bottom. (I flipped the graph because time is better viewed from left to right.) You can see that some CSAs are white throughout, meaning that their homicide rate per 10,000 residents was quite low. Others are red. And others vary as time goes by. For example, Cherry Hill (near the bottom) is darker red in 2005 and then gets lighter as the years go by. On the other hand, Fells Point enjoyed several years with no homicides but has had several since the current epidemic of homicides started in 2015.
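If you want to try this yourself, here is a minimal sketch of how a diagram like this can be built with ggplot2’s geom_tile(). The data frame and column names (csa_rates, CSA2010, year, rate) are hypothetical stand-ins, not my actual variables; all you need is one row per CSA per year:

library(tidyverse)

# Minimal sketch, assuming a long-format data frame (hypothetical names):
#   csa_rates: CSA2010 (CSA name), year, rate (homicides per 10,000)
hovmoller <- ggplot(csa_rates,
                    aes(x = year, y = CSA2010, fill = rate)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "white", high = "darkred",
                      na.value = "white",
                      name = "Rate per 10,000") +
  labs(x = "Year", y = "Community Statistical Area") +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 6))

hovmoller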

Of course, this doesn’t tell the whole story. As I mentioned above, the space-time cube analysis tells the same story. What I’m interested in is the social determinants of health component. So a simple sorting of the CSAs by the percentage of households living under the poverty line does the trick:

Click to see it in all its glory!
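Under the same assumptions as the sketch above, the re-sorting is a one-liner with forcats (loaded with the tidyverse), provided a poverty column has been joined onto the plotting data frame:

# Reorder the y axis by poverty so the poorest CSAs end up at the top
# (assumes a hypothetical "poverty" column joined onto csa_rates)
hovmoller + aes(y = forcats::fct_reorder(CSA2010, poverty))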

Now the story is very clear. The poorer CSAs at the top have the higher homicide rates, and they tend to stay at high levels throughout the years. At the bottom of the diagram, you can see wealthier CSAs having a lot of empty spots where years went by without any homicides. They’re lighter because they had lower homicide rates, too. But do you see some exceptions?

At the bottom, Downtown/Seton Hill stands out as having a higher homicide rate per 10,000 residents compared to its “neighbors” on the poverty scale. Why? Downtown/Seton Hill is close to Poppleton/The Terraces/Hollins Market, which is at the top of the diagram in terms of poverty and, in the last four years (2015 to 2018), has had the highest homicide rates in the city. So there is probably some spatial autocorrelation going on here. (Think of spatial autocorrelation as “things that are similar are close together in space and time.”)

At the top, Southeastern stands out as having a lot of white among a sea of red in its poverty neighborhood. Like Downtown/Seton Hill, Southeastern is influenced by nearby Canton and Fells Point, which are rather wealthy CSAs that have enjoyed relative peace. Again, spatial autocorrelation is probably at play there as well.

I’m probably going to work on some statistical trend analysis of these data and then overlay them on a map. My goal is to be able to replicate what ArcGIS does with the space-time cubes and emerging hot spot analysis in R programming. (R is free and open source. ArcGIS, while very powerful and highly recommended if you’re going to do spatial analytics, can get expensive if you need to do a lot more than just make pretty maps.)

The things I do at one in the morning…

A big thanks to the folks at the Baltimore Neighborhood Indicators Alliance – Jacob France Institute for their phenomenal data on social indicators around Baltimore (and their geographic data as well).

Mr. Bloomberg, Please Don’t Segregate Baltimore More Than It Already Is

A few months ago, while I was walking from a parking lot to a building at the Johns Hopkins Bloomberg School of Public Health, I heard a car crash about a block away. The crash was between two cars at an intersection about 150 meters from the parking lot. I looked at the security man as he just stood there, looking in the direction of the crash, and then I ran towards the accident.

As it turns out, the accident was not that bad. The two drivers almost went after each other physically, though. One thought the other was on the phone and distracted when they ran the stop sign. The other thought the first driver was going too fast and caused them to miscalculate crossing the intersection. As I stood between the two of them, asking them to calm down, I looked in the direction of the parking lot. A small group of security guards just looked at us.

Eventually, Baltimore City police showed up. It was then that I left and went to my building to work on my dissertation… I also fired off an email to the Dean. I explained to her that none of the security guards did anything at all to come help. They just stood there and gawked. I doubted whether they even called the police. (I called 911 as I ran toward the accident.)

This all happened at a time when the Johns Hopkins University leadership was trying very hard to get permission from Maryland to have an armed police force at the campuses (medical and academic) around the city. Many opposed the idea for different reasons. Some agreed with it, for their own reasons.

Me? At first, I agreed with the idea. It seemed reasonable to me that the school should have a police force just like so many other colleges and universities do. Then I started listening to the opposing side, and then I got deep into my doctoral research. There are things about Baltimore that cannot be ignored.

Baltimore is deeply segregated along racial lines. Entire neighborhoods are populated by only white or only black residents. Few neighborhoods are mixed, and it just so happens that the black neighborhoods are the most disadvantaged. They’re disadvantaged because of a still-lingering institutional racism that we could discuss at a later point… Believe me, it would be a long discussion.

That segregation has brought with it a lot of resentment toward the police department from the poorest and most marginalized communities in Baltimore. Why? Because they are disproportionately targeted for arrests and questioning under the idea that it is their residents who are the most likely to be criminals. (Crime rates may be higher in poorer neighborhoods, but it’s not their racial makeup that dictates this. It’s the poverty.)

Slap on top of this a Baltimore that leads the world in terms of per capita homicides among major cities. Why is this happening? My research points to poverty and urban disorder (e.g. abandoned buildings, dilapidated infrastructure) as potential causes of increased homicide rates. Other research goes further, pointing to the opioid epidemic and the high number of people using and abusing opioids (especially heroin and fentanyl) on a daily basis in and around Baltimore.

All those drugs are not free. People from in and around Baltimore buy millions of dollars of drugs, and those millions are being sought by the poorest in the city. After all, if you didn’t finish high school and can’t get an honest job because you were once arrested by the police for, say, standing at the corner with your friends, then your only hope is to sell drugs or work for those who sell drugs.

When you sell drugs, you have to defend your turf. Since selling drugs is illegal — unless you’re a big pharmaceutical corporation — your only recourse when someone wants to take your business is violence. If enough people are walking around with guns to protect their business, you’re probably more inclined to get a gun yourself to defend yourself. And you’re likely to get a gun illegally and use it against someone else or yourself.

It is this very violence that Mr. Michael Bloomberg, a graduate of Johns Hopkins University and major donor (hence the “Bloomberg School of Public Health”), is citing as a reason why an armed police force is needed at the university campuses around Baltimore. This is from the Baltimore Sun:

“Michael Bloomberg, the billionaire ex-mayor of New York and benefactor of the Johns Hopkins University, said Tuesday it’s “ridiculous” the institution doesn’t have an armed police force.

“When you have a city that has the murder rate that Baltimore has, I think it’s ridiculous to think that they shouldn’t be armed,” Bloomberg said of the Hopkins security force.

I know for a fact that Mr. Bloomberg has funded a lot of the research into gun violence being done at the university. (He did not fund any of my research. My research was funded, in part, by the Brown Scholarship in Community Health.) That research, and my research, has shown that the victims of violence in Baltimore are not the university students and staff. The typical victim of gun violence is an African American male between 15 and 35 years of age living in the poorest neighborhoods in Baltimore. The relatively wealthier, whiter students at JHU, and the people working at the different campuses, are definitely not the typical victims.

Do crimes against JHU students and staff happen? Of course they do, but at a much lower rate and severity than what happens to a child growing up in deep poverty. (That kind of poverty is mostly associated with “third world” countries but is alive and well in America in 2019.) And the solution to those crimes is not more police with more guns and more dividing lines between the community and the law enforcers.

Imagine a police force at Hopkins that will not go 150 meters out to help because it’s not their jurisdiction… Or imagine a police force that is suspicious of anyone who doesn’t fit the description of the typical university student. What will that do? How can we assure that won’t happen?

Why this whole thing has become a priority for the university leadership above a lot of other more pressing issues is beyond me. Perhaps someone important had something stolen or was on the receiving end of some crime at one of the campuses? It certainly seems to me that there is some pressure coming from somewhere, demanding an armed police force that is separate from the Baltimore City Police Department.

I wish that Mr. Bloomberg would listen to the evidence on why Baltimore is so violent, and that he would put effort into fighting that rather than just adding another layer of boundaries between the university and the community. A good basic education, a living wage, and opportunities to prosper go a longer way toward making a city safer than just throwing more cops and more guns at it.

Mr. Bloomberg, don’t segregate Baltimore any further. Help us integrate it… And let’s use science and reason to do so.

The Epidemic of Violence in Baltimore Continues

Back in 2017, I received a message from one of the automated bots I’ve deployed to the web to look for information for me to consume. As I was working on my doctoral dissertation on homicides in Baltimore, one bot reported that a firebombing had occurred in Baltimore, killing two teens.

The main suspect in that firebombing (and a shooting at the same location a few days before) was Mr. Antonio Wright, a 27-year-old Black man. After being named “Baltimore’s Public Enemy Number One,” Mr. Wright turned himself in to the authorities. Throughout his arrest and trial, he continued to assert that he was innocent.

A few weeks ago, Mr. Wright was found not guilty on all counts. Because of his background and this incident, Mr. Wright was advised by his lawyer to move out of the area and start anew somewhere else.


“It Only Happens to Mexicans” (Or How Not to Frame a Critique of the Immigration Policy)

I’ll start off by saying that I like Dan Rodricks. He’s an okay guy. He may be a little hard to follow on some of his arguments, but he generally means well. So I was a little bit surprised to read his latest opinion piece for The Baltimore Sun. In it, he tells us the story of a British academic from Johns Hopkins University who was unable to continue her work at the university because of the immigration policies of the Trump Administration.

The only problem I, and a few Mexican friends of mine, had is the way Mr. Rodricks framed the discussion from the get-go:

“Last month, when it became clear that Tamsyn Mahoney-Steel would have to leave her job at Johns Hopkins University and return to England, the reaction of several friends and colleagues was uncannily the same: “I thought this only happened to Mexicans.””

Uh… What?

The opinion article tells us the story of Dr. Mahoney-Steel, how her H-1B visa was denied just as her academic career at Hopkins was taking off, and how the university was not very strong in its reaction to how she was treated and eventually deported. (Though “deported” is kind of a strong word, in that she wasn’t forcibly removed from the country or separated from her children… from what I’ve read.)

Here’s the final part of the opinion article:

“Mahoney-Steel found that last claim by Tabb “galling,” adding the distinctly British modifier “somewhat,” and remarking that “sometimes you have to laugh at the absurdities of life.” She says she has been buoyed by the kindness of her American friends and colleagues, who surely must understand by now that this kind of lousy thing does not happen “only to Mexicans.””

Again… What?

The reason I keep asking “what?” is because this issue — immigration — is a very, very sensitive issue with immigrants from Latin America in general and Mexico in particular. We’ve been made to feel like we don’t belong here, or even in places where our ancestors existed way before the Europeans arrived, like Texas and New Mexico. So to hear someone say that they’re surprised that a British academic was “deported” and that they thought it only happened “to Mexicans” is, well, offensive.

It’s as if someone were saying that deportations and discrimination in immigration proceedings are part of the plan when they happen to Mexicans. But when it happens to a British academic, then we must do something. Then it’s appalling. Then it’s a problem.

Of course, I don’t think that this is what Mr. Rodricks intended. That’s just not how he rolls. He’s more Left-of-Center than that. But, man, it hit a nerve.

I think the thesis of his opinion article was more like: “Don’t ignore the immigration policies of the Trump Administration just because it seems that only a certain segment of the population is being affected. Everyone is being affected.” Because, written the way it was, it’s almost as if, in writing about homicides, he’d said that homicides happen “only to Black people.”

Know what I mean?

Here’s what others are saying about this:

Dr. Collado-Torres: Problems with an article from the Baltimore Sun covering Dr. Mahoney-Steel’s immigration issues

A Web-Based Application to Visualize Homicides in Baltimore City Between 2005 and 2017

“How many African American women were killed by firearm in the Cherry Hill Community Statistical Area of Baltimore in 2016?” None. I was able to answer this question for you quickly because I have a dataset on homicides in Baltimore between 2005 and 2017.

The dataset is the result of data from 2005 to 2016 collected by The Baltimore Sun and data from 2017 collected by me. I then took a random sample of about 20% of those cases and verified that they matched what was published about them. I also cross-checked the counts of homicides per year against what the Baltimore Police Department reports to the FBI’s Uniform Crime Reporting program.

So the data are publicly available, and I plan to release the dataset — once it’s all cleaned up and I’ve created a codebook for it — for others to use. Right now, you can go to the Baltimore City Open Data Portal and look at more general data on crime and other indicators.

Anyway, back to the app…

The top portion of the screen has a little bit of a description of the app and a section where you can filter what you want to see.

app_top.png

First, you can filter what the markers will show. When you first load the app, the markers are all black, indicating all the homicide locations. You can choose to have different colors represent different race/ethnicity categories, different genders, or different causes of death.

Next, you can choose what date range you want to look at. The default is all data from January 2, 2005, to December 27, 2017. (These are the earliest and latest dates in the dataset.)

Next, you can choose to restrict the results to just one Community Statistical Area. (I’m working on a way to select more than one CSA at a time, but that part of the coding gets tricky because the CSAs are all coded as strings… Hmmm?)

Because just throwing points on a map without some sort of context is kind of non-informative (or even misinformative), I also give the user the ability to select some layers to display on the map. You can choose to see the average yearly homicide rate by CSA. (This was calculated by taking the count of homicides for the 13 years — 2005 to 2017 — dividing it by 13, and then dividing that quotient by the population of the CSA according to the 2010 US Census.) Or you can choose to see the proportion of families living under the poverty level (a good indicator of poverty) according to the Vital Signs 16 report from the Baltimore Neighborhood Indicators Alliance.
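As a sketch, that rate calculation looks something like this (the column names are hypothetical, and the per-10,000 scaling is my assumption; use whatever denominator you prefer):

# Average yearly homicide rate by CSA, as described above
csa_info$avg_yearly_rate <-
  (csa_info$total_homicides / 13) / # average homicides per year, 2005-2017
  csa_info$pop2010 *                # per capita, 2010 US Census population
  10000                             # scaled per 10,000 residents (assumed)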

I’ve also included a layer to show the 1935 Home Owners’ Loan Corporation ratings for Baltimore. These are the ratings people refer to when they’re talking about “redlining” or “red districting” neighborhoods, something that is still affecting the socioeconomics of Baltimore today. Finally, I have a layer showing median household income, also from the BNIA report and another indicator of poverty.

Yes, you guessed it, poverty and crime go hand-in-hand, and in a very complex way.

In the middle section, you can choose the personal attributes of the victims you want displayed on the map, starting with age. The age range defaults to the age range in the dataset, 0 to 97 years. (Zero years is for infants under one year of age.) Then you can choose the race/ethnicity, and, finally, the gender of the victims.

In the far-right section (or the bottom, if you’re looking at the app on your phone), you can choose the cause of death from three categories: Shooting, Stabbing or Other. I chose these categories because they make up the bulk (in that order) of homicides. The “Other” category includes things like blunt force trauma, asphyxiation, or arson.

Finally, if you want to reset everything back to the original choices, there’s a button to do that. So what do you get in the end?

As you make the choices, the back-end code automatically filters the dataset based on what you’re choosing. That is then automatically sent to create the map, some graphs, and a datatable. These three presentations of the data are then rendered below the choosing area through tabs.
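That reactive filter-then-render pattern is easy to sketch in Shiny. What follows is a minimal illustration of it (not the app’s actual code), with hypothetical column names (date, age, cause, Latitude, Longitude) and only a few of the controls described above:

library(shiny)
library(leaflet)

# Minimal sketch of the pattern: one reactive filter feeds every output.
# Assumes a "homicides" data frame with hypothetical columns:
# date, age, cause, Latitude, Longitude
ui <- fluidPage(
  dateRangeInput("dates", "Date range",
                 start = "2005-01-02", end = "2017-12-27"),
  sliderInput("ages", "Age range", min = 0, max = 97, value = c(0, 97)),
  selectInput("cause", "Cause of death",
              choices = c("All", "Shooting", "Stabbing", "Other")),
  tabsetPanel(
    tabPanel("Map", leafletOutput("map")),
    tabPanel("Data", dataTableOutput("table"))
  )
)

server <- function(input, output) {
  # Filter once; the map and the table both reuse the result
  filtered <- reactive({
    d <- homicides
    d <- d[d$date >= input$dates[1] & d$date <= input$dates[2], ]
    d <- d[d$age >= input$ages[1] & d$age <= input$ages[2], ]
    if (input$cause != "All") d <- d[d$cause == input$cause, ]
    d
  })

  output$map <- renderLeaflet({
    leaflet(filtered()) %>%
      addTiles() %>%
      addCircleMarkers(~Longitude, ~Latitude, radius = 3)
  })

  output$table <- renderDataTable(filtered())
}

shinyApp(ui, server)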

app_tab1.png
App showing the first tab with the 1935 HOLC layer.

The first tab is the map, and it’s what a lot of people like to see. The second tab has graphs:

app_tab2.png

In that tab, you can see a frequency histogram of ages of the victims. If you look at different epidemiological segments (e.g. African American men versus African American women), you can see that the age distributions are different. You can also see some bar plots of counts of homicides by race/ethnicity, gender, and cause of death. The last graph shows a count of homicides per day, displayed by cause of death, for the time range you choose above.

The final tab has a data table of the data you asked for. Future functionality will include the ability to download the table as a comma-separated values (CSV) file.

So let’s do a quick video walkthrough to answer the question, “How many Hispanic men over the age of 18 were killed in all of Baltimore between 2010 and 2015?” (There is no audio.)

There you have it. Twenty-two Hispanic men were killed in that time. Do you think it’s interesting that they show a couple of clusters?

app_hisp_men_over_18
Location of homicides of Hispanic men over the age of 18 between 2010 and 2015.

So these are the kinds of questions you can have answered using the app. Of course, in public health in general and epidemiology in particular, answering some questions leads to the asking of others. For example, in the map directly above, are those homicides happening there because there are more Hispanic people living there? Or is there something else at work?

There are, of course, some technical things that I did to make the app work, but those are better left for a discussion on how to use the R programming language and the Shiny Apps package to create it. I’ll do that at a later time.

For now, please feel free to play with the app and let me know if you see something that could be a bug. I try to correct those as soon as they’re discovered. (The fourth tab is a change log of the app.) Also, the app is hosted on a free server, so there are limits on how many people can use it at the same time and how many hours it can be viewed during the month.

You can find the app here: https://rfnajera.shinyapps.io/homicide_app/