Epi 101: Direct Age Adjustment by Hand and With R

When you’re comparing two populations’ death rates, you need to take into account the differences in the age distributions of the two populations. For example, if one population has more young people in it, you might not see as many deaths from heart conditions as you would see in a population with more adults over the age of 50. It’s not that the two are really different in the rates of death from heart conditions, it’s just that the population differences are accounting for this.

For Example

Let’s say you are comparing deaths from cancer in two populations, A and B, and you know that the particular cancer you’re looking at hits a lot of people over the age of 75, but only a few under the age of 15. The other age groups are affected somewhere in the middle. Let’s look at the data:

Age GroupPopulation ANumber of Deaths
Over 7530,0004,500
Crude Death Rate:8,750/92,000 = 95 per 1,000
Age GroupPopulation BNumber of Deaths
Over 7511,0001,650
Crude Death Rate:5,250/83,000 = 63 per 1,000

So population A has a death rate of 95 deaths per 1,000 residents. Population B has a death rate of 63 deaths per 1,000 residents. Clearly, if you want to stay away from dying from this kind of cancer, you need to go to population B, right? Well…

Look Closer

If you pay close attention, you will notice that the population distributions between the two are different. Population A has most of its residents in the 65-75 and Over 75 age groups. Population B, on the other hand, has most of its residents in the 0-15 and 36-55 age groups. That is, population B is a bit younger. So could it be that they have a lower crude death rate because they’re younger?

Let’s Find Out!

To compare the two populations, we need to use a standard population. A quick way to get a standard population is to use either population A or population B as a standard. Another way is to use the United States’ population distribution as a standard. Whatever you choose, you have to then compare the two populations’ age-specific death rates to that standard. For this example, I created a standard population:

Age GroupPopulation – Standard
Over 7528,000

To this standard population, we are going to apply the age-specific death rates from each population, and use that to calculate the expected number of deaths in the standard population if the age-specific death rates for population A (and then B) are applied to the standard population.

Age GroupPopulation
death rate
Population A
Expected Deaths
Population A
0-1520,00010 per 1,000 (0.01)200
16-3530,00050 per 1,000 (0.05)1,500
36-5535,00070 per 1,000 (0.07)2,450
56-7537,000100 per 1,000 (0.10)3,700
Over 7528,000150 per 1,000 (0.15)4,200
Total150,000Total Expected Deaths12,050

Now, let’s do it for population B (and see if you notice a pattern):

Age GroupPopulation
death rate
Population B
Expected Deaths
Population B
0-1520,00010 per 1,000 (0.01)200
16-3530,00050 per 1,000 (0.05)1,500
36-5535,00070 per 1,000 (0.07)2,450
56-7537,000100 per 1,000 (0.10)3,700
Over 7528,000150 per 1,000 (0.15)4,200
Total150,000Total Expected Deaths12,050

Do you see it? The age-specific death rates for the two populations are exactly the same. You just got different totals in the crude rates because of the different age distributions. Once we applied the age-specific death rates to the standard population (aka “reference population”), we get the same number of expected deaths: 12,050.

With that, we get the same adjusted death rates for the two populations: 12,050 divided by 150,000, which is 0.080 or 80 per 1,000 residents. From this, you can conclude that the observed differences in the crude deaths rates is more a function of different age distributions between the populations than from actual differences in death rates between the two populations.

What About R?

To do this in R, we are going to need three packages: “dsr”, “epitools” and “knitr”. So pop open a new R script in R Studio and type:


Next, we are going to create our data:

age_groups <- c("0-15", "16-35", "36-55", "56-75", ">75") # Labels for Age Groups
population_a <- c(10000, 12000, 15000, 25000, 30000) # Population A
population_b <- c(25000, 15000, 20000, 12000, 11000) # Population B
deaths_a <- c(100, 600, 1050, 2500, 4500) # Deaths in Population A
deaths_b <- c(250, 750, 1400, 1200, 1650) # Deaths in Population B
std_pop <- c(20000,30000,35000,37000, 28000) # Population of Standard Population

Note that we basically created six vectors to represent the data in the tables above. From these vectors, we are going to make some data frames.

df_a <- data.frame(age=age_groups, # Data frame for Population A
                   fu=population_a, # fu (follow-up) can be person-time or population counts

df_b <- data.frame(age=age_groups, # Data frame for Population B
                   fu=population_b, # fu (follow-up) can be person-time or population counts

df_all <- rbind(df_a, df_b) # Data frame combining both populations

df_pop <- data.frame(age=age_groups, # Data frame for the Standard/Reference Population

With our main data frame “df_all” we can now use “dsr” to do the hard work for us, and we are going to store the results in “my_results” so we can look at them later.

my_results <- dsr(data=df_all, # What data frame are you using for your data
                  event=death, # The count of the events, in this case, deaths
                  fu=fu, # The follow-up time or, in this case, population counts
                  subgroup=population, # The subgroups "population" and "age"
                  age,                 # "population" meaning A and B, and "age" meaning the age groups
                  refdata=df_pop, # The reference data frame
                  method="gamma", # The method for the confidence interval
                  sig=0.95, # The level of confidence for the 95% confidence interval
                  mp=1000, # The rate multiplier is "per 1,000" in this case
                  decimals=0) # The number of decimals for your results

Now that we have the results in “my_results”, let’s look at them in a table:

SubgroupNumeratorDenominatorCrude Rate
(per 1000)
95% LCL
95% UCL
Std Rate
(per 1000)
95% LCL (Crude)95% UCL

Voila! You have the same results as you did doing it by hand, above. Congrats.

Of course, you can also use “epitools” to calculate these adjusted rates:

pop_a_adjust <- ageadjust.direct(deaths_a, # The vector of values for deaths for Pop A
                                 population_a, # The population of Pop A
                                 rate = NULL, # If you had a standard rate, you'd put it here
                                 std_pop, # Your standard/reference population
                                 conf.level = 0.95) # The confidence level for the 95% confidence interval
round(10^3*pop_a_adjust,0) # Look at the results

pop_b_adjust <- ageadjust.direct(deaths_b,
                                 rate = NULL,
                                 conf.level = 0.95)

From this, you don’t get a fancy table (though you could make it one with some extra coding. You get an output, for Population A:

crude.rate   adj.rate        lci        uci 
        95         80         79         82 

And for Population B:

crude.rate   adj.rate        lci        uci 
        63         80         78         83 

And there you have it. Two ways to do this in R. You can now save the R file (or download the file I used by clicking here). If you have a lot of data, and you don’t want to write out the individual vectors at the beginning, then I suggest you learn the basics of importing data into R and then use that with the packages mentioned above.

Big thanks to these folks and these folks for their examples on how to use dsr and epitools, respectively.

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

  One thought on “Epi 101: Direct Age Adjustment by Hand and With R

Comments are closed.

%d bloggers like this: