A Web-Based Application to Visualize Homicides in Baltimore City Between 2005 and 2017

“How many African American women were killed by firearm in the Cherry Hill Community Statistical Area of Baltimore in 2016?” None. I was able to answer this question for you quickly because I have a dataset on homicides in Baltimore between 2005 and 2017.

The dataset combines data from 2005 to 2016 collected by The Baltimore Sun with data from 2017 that I collected myself. I then took a random sample of about 20% of those cases and verified that they matched what was published about them. I also cross-checked the yearly homicide counts against what the Baltimore Police Department reports to the FBI’s Uniform Crime Reporting program.

So the data are publicly available, and I plan to release the dataset — once it’s all cleaned up and I’ve created a codebook for it — for others to use. Right now, you can go to the Baltimore City Open Data Portal and look at more general data on crime and other indicators.

Anyway, back to the app…

The top portion of the screen has a little bit of a description of the app and a section where you can filter what you want to see.


First, you can filter what the markers will show. When you first load the app, the markers are all black, indicating all the homicide locations. You can choose to have different colors represent different race/ethnicity categories, different genders, or different causes of death.

Next, you can choose what date range you want to look at. The default is all data from January 2, 2005, to December 27, 2017. (These are the earliest and latest dates in the dataset.)

Next, you can choose to restrict the results to just one Community Statistical Area. (I’m working on a way to choose different CSAs, but that part of the coding gets tricky because the CSAs are all coded as strings… Hmmm?)

Because just throwing points on a map without some sort of context is kind of non-informative (or even misinformative), I also give the user the ability to select some layers to display on the map. You can choose to see the average yearly homicide rate by CSA. (This was calculated by taking the count of homicides for the 13 years from 2005 to 2017 and dividing it by 13, then dividing that quotient by the population of the CSA according to the 2010 US Census.) Or you can choose to see the proportion of families living under the poverty level (a good indicator of poverty) according to the Vital Signs 16 report from the Baltimore Neighborhood Indicators Alliance.
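Here is a minimal sketch of that rate calculation. It is written in Python for illustration and is not the app's actual R code. The function name and the example numbers are hypothetical, and scaling the result to 100,000 residents is an added assumption to make the number easier to read.

```python
def average_yearly_rate(homicide_count, population, years=13, per=100_000):
    """Average yearly homicide rate for one area.

    Follows the steps in the text: divide the total count by the number
    of years, then divide by the population. The `per` scaling is an
    assumption added here, not something stated in the post.
    """
    if population <= 0 or years <= 0:
        raise ValueError("population and years must be positive")
    return (homicide_count / years) / population * per


# Hypothetical CSA (not real data): 130 homicides over 13 years, 10,000 residents.
example_rate = average_yearly_rate(130, 10_000)
print(round(example_rate, 1))
```

With these made-up inputs, the area averages 10 homicides a year, which works out to about 100 per 100,000 residents per year.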

I’ve also included a layer to show the 1935 Home Owners’ Loan Corporation ratings for Baltimore. These are the ratings people refer to when they’re talking about “redlining” or “red districting” neighborhoods, something that is still affecting the socioeconomics of Baltimore today. Finally, I have a layer showing median household income, also from the BNIA report and another indicator of poverty.

Yes, you guessed it, poverty and crime go hand-in-hand, and in a very complex way.

In the middle section, you can choose the personal attributes of the victims you want displayed on the map, starting with age. The age range defaults to the age range in the dataset, 0 to 97 years. (Zero years is for infants under one year of age.) Then you can choose the race/ethnicity, and, finally, the gender of the victims.

In the far-right section (or the bottom, if you’re looking at the app on your phone), you can choose the cause of death from three categories: Shooting, Stabbing or Other. I chose these categories because they make up the bulk (in that order) of homicides. The “Other” category includes things like blunt force trauma, asphyxiation, or arson.

Finally, if you want to reset everything back to the original choices, there’s a button to do that. So what do you get in the end?

As you make the choices, the back-end code automatically filters the dataset based on what you’re choosing. That is then automatically sent to create the map, some graphs, and a datatable. These three presentations of the data are then rendered below the choosing area through tabs.
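To give a rough idea of what that filtering step looks like, here is a small sketch in Python. The real app does this in R with Shiny, so treat this as an illustration only. The `Homicide` record, its field names, and the sample rows are all hypothetical and do not come from the dataset.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Homicide:
    # Hypothetical record layout; the real dataset's columns may differ.
    when: date
    age: int
    race: str
    gender: str
    cause: str  # "Shooting", "Stabbing", or "Other"
    csa: str


def filter_homicides(records, start, end, min_age=0, max_age=97,
                     races=None, genders=None, causes=None, csa=None):
    """Keep records that pass every active filter (None means no restriction)."""
    selected = []
    for r in records:
        if not (start <= r.when <= end):
            continue
        if not (min_age <= r.age <= max_age):
            continue
        if races is not None and r.race not in races:
            continue
        if genders is not None and r.gender not in genders:
            continue
        if causes is not None and r.cause not in causes:
            continue
        if csa is not None and r.csa != csa:
            continue
        selected.append(r)
    return selected


# Made-up sample rows, for illustration only.
sample = [
    Homicide(date(2012, 5, 1), 25, "Hispanic", "Male", "Shooting", "CSA A"),
    Homicide(date(2012, 6, 9), 16, "Hispanic", "Male", "Stabbing", "CSA B"),
    Homicide(date(2016, 3, 3), 40, "White", "Female", "Other", "CSA A"),
]

matches = filter_homicides(sample, date(2010, 1, 1), date(2015, 12, 31),
                           min_age=19, races={"Hispanic"}, genders={"Male"})
print(len(matches))
```

The filtered list is what would then feed the map, the graphs, and the data table.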

App showing the first tab with the 1935 HOLC layer.

The first tab is the map, and it’s what a lot of people like to see. The second tab has graphs:


In that tab, you can see a frequency histogram of ages of the victims. If you look at different epidemiological segments (e.g. African American men versus African American women), you can see that the age distributions are different. You can also see some bar plots of counts of homicides by race/ethnicity, gender, and cause of death. The last graph shows a count of homicides per day, displayed by cause of death, for the time range you choose above.

The final tab has a data table of the data you asked for. A future version will let you download the table as a comma-separated values (CSV) file.

So let’s do a quick video walkthrough to answer the question, “How many Hispanic men over the age of 18 were killed in all of Baltimore between 2010 and 2015?” (There is no audio.)

There you have it. Twenty-two Hispanic men were killed in that time. Do you think it’s interesting that they show a couple of clusters?

Location of homicides of Hispanic men over the age of 18 between 2010 and 2015.

So these are the kinds of questions you can have answered using the app. Of course, in public health in general and epidemiology in particular, answering some questions leads to the asking of others. For example, in the map directly above, are those homicides happening there because there are more Hispanic people living there? Or is there something else at work?

There are, of course, some technical things that I did to make the app work, but those are better left for a discussion on how to use the R programming language and the Shiny Apps package to create it. I’ll do that at a later time.

For now, please feel free to play with the app and let me know if you see something that could be a bug. I try to correct those as soon as they’re discovered. (The fourth tab is a change log of the app.) Also, the app is hosted on a free server, so there are limits on how many people can use it at the same time and how many hours it can be seen each month.

You can find the app here: https://rfnajera.shinyapps.io/homicide_app/

The Places Where People Are Not Exercising For Some Reason

In early January, I took a drive to one of Baltimore’s most disadvantaged neighborhoods. It was there that the first homicide of the year had happened. I wanted to see where it had happened and get a better feel for the kind of environment where a homicide would happen. Yes, I took every precaution to not stand out like a sore thumb. I didn’t take expensive cameras with me, and I didn’t linger around like a tourist at an archaeological dig. Still, there were those who asked me why I would ever want to go to “a place like that.”

Don’t Go There

Someone on Instagram commented that it is precisely that attitude of “don’t go there because it’s too dangerous” that further isolates those communities from the rest of the world. It’s true. If no one goes there, then that place doesn’t exist in the minds of the people and — most importantly — the policymakers. The problems that happen in “a place like that” are believed to be contained in “a place like that” when all the evidence shows us that violence doesn’t really respect artificial boundaries much.

If you’ve ever seen a map of Baltimore, you probably have heard of the “butterfly pattern” seen on the map when you overlay socioeconomic measures like crime and poverty. There are very wealthy neighborhoods to the north, but a swath of low-level indicators spreads out from the downtown area and into the east and west. Like this:

[Screenshot: map of homicide rates in Baltimore]

You can see that the homicide rate is highest in the east and west, but the central north is relatively quiet with regards to homicides. The same pattern can be seen with poverty, urban blight, even cancer rates. Look at life expectancy, where the lighter colors mean lower life expectancies:

[Screenshot: map of life expectancy in Baltimore]

It’s the same pattern. Higher life expectancies in the central north and downtown, but lower spreading out from downtown into two “wings” over east and west. Believe me when I tell you that a lot of these indicators correlate closely with poverty. But let’s look at something else…

Let’s go for a run?

Strava (Strava.com) is a health tracking app that is available on major phone platforms like Android or iOS. You turn on the app and allow it to track you via GPS. As you walk, run, bike, or do some other exercise, the app tracks your location and gives you a summary of where you were when you worked out. The distance traveled and the time also give you the speed and intensity of your workout.

As it turns out, Strava compiles the location data into a “Heat Map.” The Heat Map basically shows where Strava users have been exercising. In the past, the Heat Map made news when it was found that some members of the military were uploading their location data with their workout data, which revealed places that were supposed to be secret. Oops.

Anyway, let me take you on a jog in St. Louis, Missouri. Here’s the heat map:

[Screenshot: Strava heat map of St. Louis]

I want you to notice the area in the north of St. Louis, to the southeast of the St. Louis Lambert International Airport. The red and blue lines show where people run or bike using the Strava app. The area I pointed out is almost devoid of any kind of activity. Now, look at this map:

[Screenshot: 2010 US Census map of residents by race in St. Louis]

This map aggregates data from the US Census Bureau, from the 2010 Federal Census. The blue dots are White residents. The green dots are African American residents. Does the map look familiar? Let’s lay them one on top of the other:

[Screenshot: Strava heat map overlaid on the census map of St. Louis]

Can you see where the areas with green dots (African American residents) don’t have as many people working out (red and blue lines)? Do you think it’s a coincidence?

So let’s go for a jog in Baltimore. This is the Strava data around Baltimore:

[Screenshot: Strava heat map of Baltimore]

And this is the US Census data on race in Baltimore in 2010:

[Screenshot: 2010 US Census map of residents by race in Baltimore]

Look familiar? Does it show the butterfly pattern as above, and, also, does it show a decreased use of the Strava app for workouts in primarily African American areas? Yes and yes. Let’s overlay again:

[Screenshot: Strava heat map overlaid on the census map of Baltimore]

I don’t have full access to the Strava data to do an analysis of my own, but I’m willing to bet that there is a statistically significant pattern with regards to where people work out and where they don’t.

Now, some of you epidemiologists out there might be thinking of theories of why this pattern arises. Do African Americans work out less as a group than whites? Do they have less access to tracking apps? Do they not like to use GPS tracking? Or are the places where African Americans live simply not conducive to exercise for a variety of reasons? Do public spaces and big streets/avenues just draw everyone away from tightly packed neighborhoods?

For a clue, let me show you something…

[Screenshot: Strava heat map zoomed in on the Sandtown-Winchester and Easterwood neighborhoods]

When you zoom in on Baltimore, particularly near the Sandtown-Winchester and Easterwood neighborhoods, you see that people do exercise with the app. It’s just that their numbers are much lower than in Baltimore as a whole. Gilmor Street and the nearby track at Carver Vo-Tech are hotspots of running and cycling activity. So there is some exercise going on, but it’s much less than in the rest of the city, especially when compared to the areas where residents are primarily white.

Using some geostatistical analyses like the ones I did for my doctoral dissertation, we could figure out some of the answers for this disparity. So who’s with me?

(Hat tip to Dr. Hojoon Lee for pointing out the heat map and how it looked in Baltimore.)

The Thing About Hotspots

No, I’m not talking about wifi hotspots that help you connect to the internet so you can watch cat videos. I’m talking about the symbology used on maps in order to emphasize an area (or areas) where there is a lot of something going on. For example, in infectious disease epidemiology, I might use a map to show where there are a lot of cases in a relatively small area, or where the number of cases observed has exceeded the number of cases expected.

You know, something like this:


This map was created using a geographic information system (GIS) and data from the Centers for Disease Control and Prevention (CDC). It took incidence of deaths from heart disease and mapped them out, then broke down the data to show where there were increased levels of heart disease deaths, where there were the expected levels, and where there were the lower levels (aka “cold spots”). As you can see, Texas has a big problem. (Then again, Texas always has some big problem or another.)
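One simple way to think about hot and cold spots is to compare the observed count in each area with the count you would expect. The Python sketch below is a simplified stand-in and is not the method used to make the map above; GIS tools typically use spatial statistics that also take neighboring areas into account. The county names, the counts, and the 20% thresholds are all hypothetical.

```python
def classify_area(observed, expected, hot=1.2, cold=0.8):
    """Label an area by its observed-to-expected ratio.

    The thresholds are arbitrary illustrative cutoffs, not a standard.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    ratio = observed / expected
    if ratio >= hot:
        return "hot spot"
    if ratio <= cold:
        return "cold spot"
    return "as expected"


# Made-up (observed, expected) death counts for three imaginary counties.
areas = {
    "County A": (150, 100),
    "County B": (95, 100),
    "County C": (60, 100),
}

labels = {name: classify_area(obs, exp) for name, (obs, exp) in areas.items()}
for name, label in labels.items():
    print(name, label)
```

In this toy example, County A comes out as a hot spot, County B as expected, and County C as a cold spot.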

But there are hotspots and then there are hotspots.

The map above doesn’t tell us the whole story. The higher incidence in the DC Metropolitan area might be due to social, environmental and genetic conditions different than those in Texas or South Carolina. The “cold spot” in Oklahoma might be indicative of something at play there that is completely the opposite of what is going on in Texas or DC. This is where epidemiology comes in.

This is where we ask the questions.

And that right there is the problem with these maps. Taken out of context, they might be misused. For example, what do we know about heart disease and deaths from it? Well, we know that a bad diet and sedentary lifestyle can contribute to obesity and heart disease. We know that lack of access to primary healthcare can also contribute to heart disease complications. And we know that some people are just more predisposed to heart disease based on how their bodies process saturated fat and cholesterol and deposit those compounds in their arteries.

If you make the mistake of thinking that the hotspot in Texas is due to one thing and not the other — or a combination of things — and you try an intervention, you might fail miserably. Or what if you implement the same intervention in Texas that you did in DC but get completely different results? Again, you might be wasting resources. So you need to understand the context of the hotspot analysis that you’re performing.

To understand the context, you need to ask questions of person, place and time. Who is dying from heart disease? What are their ages? What are their genders? What are their racial/ethnic profiles? What are their socioeconomic profiles?

While the map answers some of the questions about place, we need to be a little more analytical and find out if those counties showing the hotspot are poorer/richer than their neighboring counties, or if the people dying are dying at a hospital or at home. Are they dying in particular neighborhoods or buildings?

And, when it comes to time, are the deaths happening with any kind of seasonality? Is there a variation in the time of day? Are these hotspots new, or have they been there for several years or across generations?

Yes, I know these are a lot of questions to ask, but they’re the kinds of questions that help inform the stakeholders on what needs to be done about the hotspots that have been detected. It is not nearly enough to just present the map and walk away. You have to tell the story of what is going on so you can one day tell the story of how it was fixed, and to guide the response(s) toward fixing things.

After all, the world is much simpler when you’re fixing things. I know it is for me.


The True Size of the Ebola Epidemic in West Africa

One of the best skills I learned from my dad is how to read a map. I can hold it every which way and understand the layout of landmarks in it. When I drove from Texas up to Pennsylvania, before the advent of smartphones and consumer GPS units, I bought a road atlas from Walmart and used it to guide my way. I would use it again to go back and visit my relatives. Even now with Google Maps, I still can grab a map and a compass and get myself out of a jam… Or out of an adventure race.

That early interest in maps that dad instilled in me also got me interested in general cartography. I still remember reading all about the different map projections and how they distort the true size and shape of things because you’re trying to project a three-dimensional sphere onto a two-dimensional rectangle. But it wasn’t until I found a website showing the “true size” of countries compared to each other that I really, really understood how a person’s view of the world could be distorted.

In the Mercator Projection, which is the projection most commonly used by consumers to look at the world on screens and world maps, the countries near the equator look smaller than they actually are. The countries farthest away from the equator look much bigger than they actually are. (Some say that this projection is okay with world powers because it shows them as big in land area compared to developing nations in Africa and Latin America.)
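To put a number on that distortion: on a Mercator map, lengths at a given latitude are stretched by a factor of 1/cos(latitude) relative to the equator, so areas are stretched by the square of that. The short Python sketch below computes that area factor. The function name and the sample latitudes are just illustrative choices.

```python
import math


def mercator_area_inflation(latitude_deg):
    """Factor by which Mercator exaggerates area at a latitude, relative to the equator.

    Uses the area scale 1 / cos(latitude)^2 (the square of the linear scale).
    """
    if not -90 < latitude_deg < 90:
        raise ValueError("latitude must be strictly between -90 and 90 degrees")
    return 1.0 / math.cos(math.radians(latitude_deg)) ** 2


# A few sample latitudes, from the equator toward the poles.
for lat in (0, 10, 40, 60):
    print(lat, round(mercator_area_inflation(lat), 2))
```

At the equator the factor is 1 (no exaggeration), while at 60 degrees it is about 4, so a region there is drawn roughly four times larger than a region of the same true area on the equator.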

You have to keep this in mind when you are told the story of the Ebola Epidemic of 2014-2016 that occurred in West Africa.

The epidemic started in 2014 in Guinea, with cases popping up in neighboring Sierra Leone and Liberia. By the time it was all over, more than 11,000 people were dead and close to 30,000 had been infected. From news reports and from friends and colleagues who were there, the affected towns and cities were hit so hard that many basic services shut down and bodies were dumped in public places. Basically, it was the worst imaginable scenario for an epidemic.

While there was plenty of panic in the United States, particularly from political talking heads calling for quarantines and travel bans, a lot of people seemed to not worry much about Guinea, Sierra Leone and Liberia. After all, those countries were “over there,” and they were tiny. That kind of mindset also affected the response. There was no sense of urgency from the same panicked politicians to get as many people and supplies as possible over to West Africa to combat the epidemic. (Cooler heads prevailed when it came to quarantines and travel bans, by the way.)

However, when you use the “True Size” site to see the true size of those three countries overlaid on the United States, and when you see the population estimates for those same countries, you kind of get a shiver down your spine… Especially if you’re an epidemiologist like me.

First, let’s talk population sizes. According to the World Bank, Guinea has a population of about 12.4 million people. Sierra Leone’s population is about 7.4 million, and the population of Liberia is about 4.6 million. Put together, those were over 24 million people living in the middle of the epidemic. That’s more or less the same population as Shanghai, China, or the urban area in and around New York City. Of course, the population density is not the same, which brings me to the size of Guinea, Sierra Leone and Liberia.

Click on the following images to see the three countries in question compared to different places in the western United States.

They’re huge, right? But you wouldn’t know it from the Mercator Projection…

Screenshot 2018-03-03 17.45.29.png

Africa is a huge continent, actually. You could put all of the US, China, and even India inside of it and still have room left over for several European countries.

I mean… Come on!

Now imagine being in the position of having to respond to an epidemic of an incredibly deadly virus that kills you by liquefying you from the inside out. Imagine that the outbreak is happening over that size of a land area. And then imagine that you have to safeguard that many lives.

Daunting, right? (Or exciting, if you’re like me. I’m a weirdo.)

Yet Congress budgeted only $1.7 billion to the response, with only $603 million dedicated to international response activities. By comparison, the Defense Department budget request for next fiscal year is $639 billion. Since we love us some strong military overspending, the budget request is probably going to be approved. (And, to be perfectly honest, some of that money is probably going to go to a military response to the next big epidemic.) Just to make it a little more “exciting,” the current administration is proposing a budget cut of about 80% to the overseas response operations from CDC.

If Ebola, or something nastier, rises again in a place that’s not the US, they’re on their own. And then we’re on our own when that nasty thing gets on a plane or a ship and heads over here. Because you’re a fool if you think our borders and the police forces guarding them will be able to stop it.

In many ways, the Ebola epidemic in West Africa was much bigger than many people in the United States could fathom or even understand. It took place over a large land mass and it affected millions of people. Something similar here is the stuff of horror novels and movies, not something we think can actually happen. But it can, and that is the biggest lesson from that epidemic. The response was slow, underfunded, understaffed, and left us in the public health community with a lot of questions.

Now, we could have learned a lot from what happened in West Africa, and we were very lucky that the virus was contained. But get just a few more people traveling north to Europe or east to the rest of Africa, and we could have seen the horror that was seen there happen right here. And then what?

Then what indeed.

Another way to look at the distribution of wealth

When I was growing up in Mexico, I remember seeing the occasional plane fly overhead. My home in Juarez was near the airport. My grandparents’ homes in Aldama were on the final approach route to the Chihuahua City airport. But these planes were few and far between. Once we moved to El Paso, Texas, I saw plane after plane land at El Paso International Airport (ELP). It wasn’t until I was in high school that I first flew in a plane, and I wouldn’t fly again until I was in college. Flight tickets were very expensive for me, and I really wasn’t able to afford a ticket anywhere until I started working full-time as a lab tech (a very good-paying job) after college.

Today, I stumbled across this website. “Contrailz” is a visualization of flight data from all over the globe. For example, here is a composite of all the flight paths taken in and out of London:

Here is the view of the Washington, DC, and Baltimore, MD, region:

The startling thing was the view of Mexico:

No wonder I didn’t see any planes!

And Africa:

Northern Africa


Southern Africa

And South America:

Southern South America


Northern South America. Note the Caribbean.

Now, look at the United States:

“You are now free to move about the country.” – Southwest Airlines tagline.

Yes, the United States was the first country with working airplanes. The infrastructure for flight is very robust. But, infrastructure or not, a lot of us in this country can fly from one city to another when we want to. We can afford tickets now more than ever.

It’s very clear that flying from one city to another within and between countries is very much associated with wealth. Don’t believe me? Check out these maps of wealth distribution and income around the world:

The wealthier countries have more flight paths. Poorer countries don’t see many planes in the air.

Yes, there are people actually surviving on less than $2 per day, and the majority of them will never fly in a plane to a far away place for vacation, to visit loved ones, or to move to a new home for a new job.

So how’s your day going at your job?