Epi 101: Direct Age Adjustment by Hand and With R

When you’re comparing two populations’ death rates, you need to take into account the differences in the age distributions of the two populations. For example, if one population has more young people in it, you might not see as many deaths from heart conditions as you would see in a population with more adults over the age of 50. It’s not that the two are really different in the rates of death from heart conditions, it’s just that the population differences are accounting for this.

For Example

Let’s say you are comparing deaths from cancer in two populations, A and B, and you know that the particular cancer you’re looking at hits a lot of people over the age of 75, but only a few under the age of 15. The other age groups are affected somewhere in the middle. Let’s look at the data:

 Age Group Population A Number of Deaths 0-15 10,000 100 15-35 12,000 600 36-55 15,000 1,050 56-75 25,000 2,500 Over 75 30,000 4,500 Total 92,000 8,750 Crude Death Rate: 8,750/92,000 = 95 per 1,000
 Age Group Population B Number of Deaths 0-15 25,000 250 15-35 15,000 750 36-55 20,000 1,400 56-75 12,000 1,200 Over 75 11,000 1,650 Total 83,000 5,250 Crude Death Rate: 5,250/83,000 = 63 per 1,000

So population A has a death rate of 95 deaths per 1,000 residents. Population B has a death rate of 63 deaths per 1,000 residents. Clearly, if you want to stay away from dying from this kind of cancer, you need to go to population B, right? Well…

Look Closer

If you pay close attention, you will notice that the population distributions between the two are different. Population A has most of its residents in the 65-75 and Over 75 age groups. Population B, on the other hand, has most of its residents in the 0-15 and 36-55 age groups. That is, population B is a bit younger. So could it be that they have a lower crude death rate because they’re younger?

Let’s Find Out!

To compare the two populations, we need to use a standard population. A quick way to get a standard population is to use either population A or population B as a standard. Another way is to use the United States’ population distribution as a standard. Whatever you choose, you have to then compare the two populations’ age-specific death rates to that standard. For this example, I created a standard population:

 Age Group Population – Standard 0-15 20,000 16-35 30,000 36-55 35,000 56-75 37,000 Over 75 28,000 Total 150,000

To this standard population, we are going to apply the age-specific death rates from each population, and use that to calculate the expected number of deaths in the standard population if the age-specific death rates for population A (and then B) are applied to the standard population.

 Age Group PopulationStandard Age-specificdeath ratePopulation A Expected DeathsPopulation A 0-15 20,000 10 per 1,000 (0.01) 200 16-35 30,000 50 per 1,000 (0.05) 1,500 36-55 35,000 70 per 1,000 (0.07) 2,450 56-75 37,000 100 per 1,000 (0.10) 3,700 Over 75 28,000 150 per 1,000 (0.15) 4,200 Total 150,000 Total Expected Deaths 12,050

Now, let’s do it for population B (and see if you notice a pattern):

 Age Group PopulationStandard Age-specificdeath ratePopulation B Expected DeathsPopulation B 0-15 20,000 10 per 1,000 (0.01) 200 16-35 30,000 50 per 1,000 (0.05) 1,500 36-55 35,000 70 per 1,000 (0.07) 2,450 56-75 37,000 100 per 1,000 (0.10) 3,700 Over 75 28,000 150 per 1,000 (0.15) 4,200 Total 150,000 Total Expected Deaths 12,050

Do you see it? The age-specific death rates for the two populations are exactly the same. You just got different totals in the crude rates because of the different age distributions. Once we applied the age-specific death rates to the standard population (aka “reference population”), we get the same number of expected deaths: 12,050.

With that, we get the same adjusted death rates for the two populations: 12,050 divided by 150,000, which is 0.080 or 80 per 1,000 residents. From this, you can conclude that the observed differences in the crude deaths rates is more a function of different age distributions between the populations than from actual differences in death rates between the two populations.

To do this in R, we are going to need three packages: “dsr”, “epitools” and “knitr”. So pop open a new R script in R Studio and type:

``````library(knitr)
library(dsr)
library(epitools)``````

Next, we are going to create our data:

``````age_groups <- c("0-15", "16-35", "36-55", "56-75", ">75") # Labels for Age Groups
population_a <- c(10000, 12000, 15000, 25000, 30000) # Population A
population_b <- c(25000, 15000, 20000, 12000, 11000) # Population B
deaths_a <- c(100, 600, 1050, 2500, 4500) # Deaths in Population A
deaths_b <- c(250, 750, 1400, 1200, 1650) # Deaths in Population B
std_pop <- c(20000,30000,35000,37000, 28000) # Population of Standard Population``````

Note that we basically created six vectors to represent the data in the tables above. From these vectors, we are going to make some data frames.

``````df_a <- data.frame(age=age_groups, # Data frame for Population A
death=deaths_a,
fu=population_a, # fu (follow-up) can be person-time or population counts
population='A')

df_b <- data.frame(age=age_groups, # Data frame for Population B
death=deaths_b,
fu=population_b, # fu (follow-up) can be person-time or population counts
population='B')

df_all <- rbind(df_a, df_b) # Data frame combining both populations

df_pop <- data.frame(age=age_groups, # Data frame for the Standard/Reference Population
pop=std_pop)``````

With our main data frame “df_all” we can now use “dsr” to do the hard work for us, and we are going to store the results in “my_results” so we can look at them later.

``````my_results <- dsr(data=df_all, # What data frame are you using for your data
event=death, # The count of the events, in this case, deaths
fu=fu, # The follow-up time or, in this case, population counts
subgroup=population, # The subgroups "population" and "age"
age,                 # "population" meaning A and B, and "age" meaning the age groups
refdata=df_pop, # The reference data frame
method="gamma", # The method for the confidence interval
sig=0.95, # The level of confidence for the 95% confidence interval
mp=1000, # The rate multiplier is "per 1,000" in this case
decimals=0) # The number of decimals for your results``````

Now that we have the results in “my_results”, let’s look at them in a table:

``knitr::kable(my_results)``
 Subgroup Numerator Denominator Crude Rate(per 1000) 95% LCL(Crude) 95% UCL(Crude) Std Rate(per 1000) 95% LCL (Crude) 95% UCL(Crude) A 8750 92000 95 93 97 80 79 82 B 5250 83000 63 62 65 80 79 83

Voila! You have the same results as you did doing it by hand, above. Congrats.

Of course, you can also use “epitools” to calculate these adjusted rates:

``````pop_a_adjust <- ageadjust.direct(deaths_a, # The vector of values for deaths for Pop A
population_a, # The population of Pop A
rate = NULL, # If you had a standard rate, you'd put it here
conf.level = 0.95) # The confidence level for the 95% confidence interval
round(10^3*pop_a_adjust,0) # Look at the results

population_b,
rate = NULL,
std_pop,
conf.level = 0.95)

From this, you don’t get a fancy table (though you could make it one with some extra coding. You get an output, for Population A:

``````crude.rate   adj.rate        lci        uci
95         80         79         82 ``````

And for Population B:

``````crude.rate   adj.rate        lci        uci
63         80         78         83 ``````

And there you have it. Two ways to do this in R. You can now save the R file (or download the file I used by clicking here). If you have a lot of data, and you don’t want to write out the individual vectors at the beginning, then I suggest you learn the basics of importing data into R and then use that with the packages mentioned above.

Big thanks to these folks and these folks for their examples on how to use dsr and epitools, respectively.