Finding myocardial infarction in obese patients

This vignette assumes a SQL server at localhost (we use PostgreSQL), with patient data in OMOP Common Data Model v5.4 format in schema cdm_new_york3. The data shown in this example is synthetic, generated by Synthea(TM) Patient Generator.

suppressPackageStartupMessages(library(dplyr))
library(phea)

# Connect to SQL server.
dbcon <- DBI::dbConnect(RPostgres::Postgres(),
  host = fabcred$pg$host, port = fabcred$pg$port, dbname = fabcred$pg$database,
  user = fabcred$pg$user, password = fabcred$pg$pass)

# Setup Phea
setup_phea(dbcon, 'cdm_new_york3')

How to find patients who had an acute myocardial infarction (AMI) when they were clinically obese (body mass index, BMI > 30)?

A patient can have multiple myocardial infarctions, and their weight can change over time. In this example we are only interested in cases where the AMI happened while the patient’s BMI was known to be > 30.

The problem is, we cannot go back in time and measure the BMI of the patient on the day the AMI happened. Maybe someone did measure the patient’s weight on that day, but maybe not. So, what do we mean when we say “known to be obese”?

Maybe the only weight measurement available for that person is from 3 years before, or after, the day of the AMI. Or maybe the AMI happened between two weight measurements, one giving BMI > 30 hence obesity, the other giving BMI < 30 hence no obesity. Which one do you pick?

Here, we will pick whichever body measurements are closest in time to each AMI event.

Say a patient:

…that means that on day 40, the closest weight record available was the one from day 1. That record said that the patient was obese. So, we count that as an AMI that happened in a patient known to be obese.

That would not be the case had the AMI been on day 90, for example. Similarly, if that same patient had a second AMI (god forbid) on day 120, or any day after 60, that second AMI would count in the non-obese group.

Create components

Let us first compute body mass index like in another vignette.

# Weight component
# Loinc 29463-7 Body weight, OMOP concept ID 3025315
weight_component <- sqlt(measurement) |>
  filter(measurement_concept_id == 3025315) |>
  make_component(
    .ts = measurement_datetime,
    .pid = person_id)

# Height component
# Loinc 8302-2 Body height, OMOP concept ID 3036277
height_component <- sqlt(measurement) |>
  filter(measurement_concept_id == 3036277) |>
  make_component(
    .ts = measurement_datetime,
    .pid = person_id)

To which we add a component for acute myocardial infarction (AMI), from table CONDITION_OCCURRENCE.

# Acute myocardial infarction (AMI) component
# SNOMED 22298006 Myocardial infarction, OMOP concept ID 4329847
ami_component <- sqlt(condition_occurrence) |>
  filter(condition_concept_id == 4329847) |>
  make_component(
    .ts = condition_start_datetime,
    .pid = person_id)

Calculate the phenotype

Spelling out the BMI formula is straightforward:

bmi = height/(weight * weight)

From which we can write a formula for “is obese”:

is_obese = bmi > 30

Notice that while the first formula is numerical (gives a number), the second one is logical (gives a TRUE or FALSE). Both are supported transparently, as long as they’re not mixed, e.g. 3 / TRUE.

However, doing the same for “has an AMI” may not be as clear. What is the formula for “patient has an AMI event”?

Since we have a column that gives the date when the AMI happened (it’s called condition_start_datetime), we can check whether that column is empty. In each patient, for all phenotype calculations with a timestamp past the timestamp of an AMI event, that column will be populated by Phea with the date of that AMI event. For timestamps prior to the patient’s first AMI, the column is empty. If the patient has a new AMI, the date is updated.

To check if a column is empty in SQL, we use is not null:

has_ami = ami_condition_start_datetime is not null

Armed with this, we can write the final formula for “has AMI and is obese”:

obese_ami = is_obese AND has_ami

# AMI in obese patient
ami_obese <- calculate_formula(
  components = list(
    weight = weight_component,
    height = height_component,
    ami = ami_component),
  fml = list(
    height_in_meters = 'height_value_as_number / 100',
    
    a = list( # the name 'a' doesn't matter
      bmi = 'weight_value_as_number / (height_in_meters * height_in_meters)',
    
      has_ami = 'ami_condition_start_datetime is not null'),
    
    b = list( # the name 'b' doesn't matter
      has_bmi = 'bmi is not null',
    
      is_obese = 'bmi > 30'),
    
    ami_obese = 'has_ami and is_obese'
  )
)
#> Called from: calculate_formula(components = list(weight = weight_component, 
#>     height = height_component, ami = ami_component), fml = list(height_in_meters = "height_value_as_number / 100", 
#>     a = list(bmi = "weight_value_as_number / (height_in_meters * height_in_meters)", 
#>         has_ami = "ami_condition_start_datetime is not null"), 
#>     b = list(has_bmi = "bmi is not null", is_obese = "bmi > 30"), 
#>     ami_obese = "has_ami and is_obese"))
#> debug: board <- dplyr::mutate(board, window = dplyr::sql(sql_ts_greatest) - 
#>     dplyr::sql(sql_ts_least), phea_ts_row = dplyr::sql(sql_txt))
#> debug: if (require_all || !is.null(dont_require)) {
#>     required_components <- setdiff(names(components), dont_require)
#>     if (length(required_components) > 0) {
#>         sql_txt <- paste0(DBI::dbQuoteIdentifier(paste0(required_components, 
#>             "_ts"), conn = .pheaglobalenv$con), " is not null", 
#>             collapse = " and ")
#>         if (has_content(window)) {
#>             board <- dplyr::filter(board, phea_row_id == phea_ts_row && 
#>                 dplyr::sql(sql_txt) && window < local(window))
#>         }
#>         else {
#>             board <- dplyr::filter(board, phea_row_id == phea_ts_row && 
#>                 dplyr::sql(sql_txt))
#>         }
#>     }
#>     else {
#>         if (has_content(window)) {
#>             board <- dplyr::filter(board, phea_row_id == phea_ts_row && 
#>                 window < local(window))
#>         }
#>         else {
#>             board <- dplyr::filter(board, phea_row_id == phea_ts_row)
#>         }
#>     }
#> } else {
#>     if (has_content(window)) {
#>         board <- dplyr::filter(board, phea_row_id == phea_ts_row && 
#>             window < local(window))
#>     }
#>     else {
#>         board <- dplyr::filter(board, phea_row_id == phea_ts_row)
#>     }
#> }
#> debug: if (has_content(window)) {
#>     board <- dplyr::filter(board, phea_row_id == phea_ts_row && 
#>         window < local(window))
#> } else {
#>     board <- dplyr::filter(board, phea_row_id == phea_ts_row)
#> }
#> debug: board <- dplyr::filter(board, phea_row_id == phea_ts_row)
#> debug: if (!is.null(filters) && any(!is.na(filters))) {
#>     sql_txt <- paste0("(", paste0(filters[!is.na(filters)], collapse = ") AND ("), 
#>         ")")
#>     board <- filter(board, sql(sql_txt))
#> }
#> debug: if (isFALSE(is.na(limit))) board <- head(board, n = lim)
#> debug: board <- dplyr::select(board, phea_row_id, pid, ts, window, !!!g_vars)
#> debug: res_vars <- NULL
#> debug: if (!is.null(fml)) {
#>     if (cascaded) {
#>         for (i in seq(fml)) {
#>             cur_fml <- fml[[i]]
#>             if (class(cur_fml) == "list") {
#>                 if (any(lapply(cur_fml, class) == "list")) 
#>                   stop("Formulas cannot be nested deeper than 1 level.")
#>                 res_vars <- c(res_vars, names(cur_fml))
#>                 commands <- unlist(purrr::map2(names(cur_fml), 
#>                   cur_fml, ~rlang::exprs(`:=`(!!..1, dplyr::sql(!!..2)))))
#>                 board <- dplyr::mutate(board, !!!commands)
#>             }
#>             else {
#>                 res_vars <- c(res_vars, names(fml)[i])
#>                 board <- dplyr::mutate(board, `:=`(!!rlang::sym(names(fml)[i]), 
#>                   dplyr::sql(cur_fml)))
#>             }
#>         }
#>     }
#>     else {
#>         if (any(lapply(fml, class) == "list")) 
#>             stop("Nested formulas require cascaded = TRUE.")
#>         res_vars <- c(res_vars, names(fml))
#>         commands <- unlist(purrr::map2(names(fml), fml, ~rlang::exprs(`:=`(!!..1, 
#>             dplyr::sql(!!..2)))))
#>         board <- dplyr::mutate(board, !!!commands)
#>     }
#> }
#> debug: if (cascaded) {
#>     for (i in seq(fml)) {
#>         cur_fml <- fml[[i]]
#>         if (class(cur_fml) == "list") {
#>             if (any(lapply(cur_fml, class) == "list")) 
#>                 stop("Formulas cannot be nested deeper than 1 level.")
#>             res_vars <- c(res_vars, names(cur_fml))
#>             commands <- unlist(purrr::map2(names(cur_fml), cur_fml, 
#>                 ~rlang::exprs(`:=`(!!..1, dplyr::sql(!!..2)))))
#>             board <- dplyr::mutate(board, !!!commands)
#>         }
#>         else {
#>             res_vars <- c(res_vars, names(fml)[i])
#>             board <- dplyr::mutate(board, `:=`(!!rlang::sym(names(fml)[i]), 
#>                 dplyr::sql(cur_fml)))
#>         }
#>     }
#> } else {
#>     if (any(lapply(fml, class) == "list")) 
#>         stop("Nested formulas require cascaded = TRUE.")
#>     res_vars <- c(res_vars, names(fml))
#>     commands <- unlist(purrr::map2(names(fml), fml, ~rlang::exprs(`:=`(!!..1, 
#>         dplyr::sql(!!..2)))))
#>     board <- dplyr::mutate(board, !!!commands)
#> }
#> debug: for (i in seq(fml)) {
#>     cur_fml <- fml[[i]]
#>     if (class(cur_fml) == "list") {
#>         if (any(lapply(cur_fml, class) == "list")) 
#>             stop("Formulas cannot be nested deeper than 1 level.")
#>         res_vars <- c(res_vars, names(cur_fml))
#>         commands <- unlist(purrr::map2(names(cur_fml), cur_fml, 
#>             ~rlang::exprs(`:=`(!!..1, dplyr::sql(!!..2)))))
#>         board <- dplyr::mutate(board, !!!commands)
#>     }
#>     else {
#>         res_vars <- c(res_vars, names(fml)[i])
#>         board <- dplyr::mutate(board, `:=`(!!rlang::sym(names(fml)[i]), 
#>             dplyr::sql(cur_fml)))
#>     }
#> }
#> debug: cur_fml <- fml[[i]]
#> debug: if (class(cur_fml) == "list") {
#>     if (any(lapply(cur_fml, class) == "list")) 
#>         stop("Formulas cannot be nested deeper than 1 level.")
#>     res_vars <- c(res_vars, names(cur_fml))
#>     commands <- unlist(purrr::map2(names(cur_fml), cur_fml, ~rlang::exprs(`:=`(!!..1, 
#>         dplyr::sql(!!..2)))))
#>     board <- dplyr::mutate(board, !!!commands)
#> } else {
#>     res_vars <- c(res_vars, names(fml)[i])
#>     board <- dplyr::mutate(board, `:=`(!!rlang::sym(names(fml)[i]), 
#>         dplyr::sql(cur_fml)))
#> }
#> debug: res_vars <- c(res_vars, names(fml)[i])
#> debug: board <- dplyr::mutate(board, `:=`(!!rlang::sym(names(fml)[i]), 
#>     dplyr::sql(cur_fml)))
#> debug: cur_fml <- fml[[i]]
#> debug: if (class(cur_fml) == "list") {
#>     if (any(lapply(cur_fml, class) == "list")) 
#>         stop("Formulas cannot be nested deeper than 1 level.")
#>     res_vars <- c(res_vars, names(cur_fml))
#>     commands <- unlist(purrr::map2(names(cur_fml), cur_fml, ~rlang::exprs(`:=`(!!..1, 
#>         dplyr::sql(!!..2)))))
#>     board <- dplyr::mutate(board, !!!commands)
#> } else {
#>     res_vars <- c(res_vars, names(fml)[i])
#>     board <- dplyr::mutate(board, `:=`(!!rlang::sym(names(fml)[i]), 
#>         dplyr::sql(cur_fml)))
#> }
#> debug: if (any(lapply(cur_fml, class) == "list")) stop("Formulas cannot be nested deeper than 1 level.")
#> debug: res_vars <- c(res_vars, names(cur_fml))
#> debug: commands <- unlist(purrr::map2(names(cur_fml), cur_fml, ~rlang::exprs(`:=`(!!..1, 
#>     dplyr::sql(!!..2)))))
#> debug: board <- dplyr::mutate(board, !!!commands)
#> debug: cur_fml <- fml[[i]]
#> debug: if (class(cur_fml) == "list") {
#>     if (any(lapply(cur_fml, class) == "list")) 
#>         stop("Formulas cannot be nested deeper than 1 level.")
#>     res_vars <- c(res_vars, names(cur_fml))
#>     commands <- unlist(purrr::map2(names(cur_fml), cur_fml, ~rlang::exprs(`:=`(!!..1, 
#>         dplyr::sql(!!..2)))))
#>     board <- dplyr::mutate(board, !!!commands)
#> } else {
#>     res_vars <- c(res_vars, names(fml)[i])
#>     board <- dplyr::mutate(board, `:=`(!!rlang::sym(names(fml)[i]), 
#>         dplyr::sql(cur_fml)))
#> }
#> debug: if (any(lapply(cur_fml, class) == "list")) stop("Formulas cannot be nested deeper than 1 level.")
#> debug: res_vars <- c(res_vars, names(cur_fml))
#> debug: commands <- unlist(purrr::map2(names(cur_fml), cur_fml, ~rlang::exprs(`:=`(!!..1, 
#>     dplyr::sql(!!..2)))))
#> debug: board <- dplyr::mutate(board, !!!commands)
#> debug: cur_fml <- fml[[i]]
#> debug: if (class(cur_fml) == "list") {
#>     if (any(lapply(cur_fml, class) == "list")) 
#>         stop("Formulas cannot be nested deeper than 1 level.")
#>     res_vars <- c(res_vars, names(cur_fml))
#>     commands <- unlist(purrr::map2(names(cur_fml), cur_fml, ~rlang::exprs(`:=`(!!..1, 
#>         dplyr::sql(!!..2)))))
#>     board <- dplyr::mutate(board, !!!commands)
#> } else {
#>     res_vars <- c(res_vars, names(fml)[i])
#>     board <- dplyr::mutate(board, `:=`(!!rlang::sym(names(fml)[i]), 
#>         dplyr::sql(cur_fml)))
#> }
#> debug: res_vars <- c(res_vars, names(fml)[i])
#> debug: board <- dplyr::mutate(board, `:=`(!!rlang::sym(names(fml)[i]), 
#>     dplyr::sql(cur_fml)))
#> debug: if (class(kco) == "logical") {
#>     if (length(kco) > 1) 
#>         stop("If logical, kco must be of length 1.")
#>     if (kco && !is.null(res_vars)) 
#>         board <- keep_change_of(board, res_vars, partition = "pid", 
#>             order = "ts")
#> } else {
#>     if (class(kco) != "character") 
#>         stop("kco must be logical or character vector.")
#>     board <- keep_change_of(board, kco, partition = "pid", order = "ts")
#> }
#> debug: if (length(kco) > 1) stop("If logical, kco must be of length 1.")
#> debug: if (kco && !is.null(res_vars)) board <- keep_change_of(board, 
#>     res_vars, partition = "pid", order = "ts")
#> debug: board <- dplyr::collapse(board, cte = TRUE)
#> debug: attr(board, "phea") <- "phenotype"
#> debug: attr(board, "phea_res_vars") <- res_vars
#> debug: attr(board, "phea_out_vars") <- g_vars
#> debug: if (get_sql) {
#>     return(dbplyr::sql_render(board))
#> } else {
#>     return(board)
#> }
#> debug: return(board)

Keep only the records closest to each other

Phea will compute ami_obese for all points in time in each patient. If a patient has a single AMI event, but had his body weight and height measured on 6 different dates, that is 7 computations of ami_obese, unless the date of the AMI coincides with one of the dates of the body measurements.

Since we want to consider only the body measurements made closest in time to the AMI event, we can filter the rows by the window.

Phea returns the window alongside each phenotype calculation. The window is the time distance between the oldest record, and the most recent one, that went into the phenotype calculation.

Therefore, in each patient, for each AMI event, whichever row has the smallest window, that row is using the body measurements that are closest possible, in time, to the AMI event.

# Pull the IDs of the patients who ever had an AMI
ami_patients <- ami_obese |>
  filter(has_ami) |>
  select(pid) |>
  distinct() |>
  pull()

# Keep only the smallest window for each AMI event:
ami_obese_f <- ami_obese |>
  filter(has_ami & has_bmi) |>
  pick_row_by('window', c('pid', 'ami_condition_start_datetime'))

head_shot(ami_obese_f) |>
  kable()
phea_row_id pid ts window height_value_as_number weight_value_as_number ami_condition_start_datetime height_in_meters bmi has_ami has_bmi is_obese ami_obese
8098 196 2007-12-22 364 days 186.0 94.4 2007-12-22 1.860 27.28639 TRUE TRUE FALSE FALSE
8391 204 1975-04-20 14 days 176.1 93.2 1975-04-20 1.761 30.05365 TRUE TRUE TRUE TRUE
10861 272 1990-12-30 105 days 182.8 91.2 1990-12-30 1.828 27.29245 TRUE TRUE FALSE FALSE
11404 286 1987-11-04 189 days 168.9 53.3 1987-11-04 1.689 18.68392 TRUE TRUE FALSE FALSE
11777 295 2018-07-08 154 days 181.0 91.8 2018-07-08 1.810 28.02112 TRUE TRUE FALSE FALSE
12787 321 2009-01-28 112 days 191.4 110.4 2009-01-28 1.914 30.13597 TRUE TRUE TRUE TRUE
13092 328 1989-09-28 98 days 162.4 55.3 1989-09-28 1.624 20.96781 TRUE TRUE FALSE FALSE
15510 379 2005-09-09 42 days 182.8 92.1 2005-09-09 1.828 27.56178 TRUE TRUE FALSE FALSE
15914 389 2012-03-03 147 days 169.1 79.8 2012-03-03 1.691 27.90716 TRUE TRUE FALSE FALSE
18539 452 1932-09-15 160 days 187.3 75.5 1932-09-15 1.873 21.52144 TRUE TRUE FALSE FALSE

Plot the phenotype for a random patient

ami_obese_patients <- ami_obese |>
  filter(ami_obese) |>
  select(pid) |>
  distinct() |>
  pull()

random_patient <- sample(ami_obese_patients, 1)
message('Sampled patient: ', random_patient)
#> Sampled patient: 13001
ami_obese |>
  select(
    -has_bmi,
    -height_value_as_number,
    -ami_condition_start_datetime) |>
  phea_plot(pid = random_patient)
#> Collecting lazy table, done. (turn this message off with `verbose` or `.verbose` in setup_phea())

Obtain the SQL query that computes the phenotype

To see the SQL query underlying the phenotype, use helper function code_shot(), or dbplyr::sql_render(), or the .clip_sql option in calculate_formula().

code_shot(ami_obese)
SELECT *, has_ami and is_obese AS "ami_obese"
FROM (
  SELECT *, bmi is not null AS "has_bmi", bmi > 30 AS "is_obese"
  FROM (
    SELECT
      *,
      weight_value_as_number / (height_in_meters * height_in_meters) AS "bmi",
      ami_condition_start_datetime is not null AS "has_ami"
    FROM (
      SELECT
        "phea_row_id",
        "pid",
        "ts",
        "window",
        "height_value_as_number",
        "weight_value_as_number",
        "ami_condition_start_datetime",
        height_value_as_number / 100 AS "height_in_meters"
      FROM (
        SELECT
          *,
          greatest("weight_ts", "height_ts", "ami_ts") - least("weight_ts", "height_ts", "ami_ts") AS "window",
          MAX("phea_row_id") OVER (PARTITION BY "pid", "ts") AS "phea_ts_row"
        FROM (
          SELECT
            row_number() over (order by "pid", "ts") AS "phea_row_id",
            "pid",
            "ts",
            phea_last_value_ignore_nulls(case when "name" = 184746 then "value_as_number" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "weight_value_as_number",
            phea_last_value_ignore_nulls(case when "name" = 184746 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "weight_ts",
            phea_last_value_ignore_nulls(case when "name" = 476286 then "value_as_number" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "height_value_as_number",
            phea_last_value_ignore_nulls(case when "name" = 476286 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "height_ts",
            phea_last_value_ignore_nulls(case when "name" = 144573 then "condition_start_datetime" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "ami_condition_start_datetime",
            phea_last_value_ignore_nulls(case when "name" = 144573 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "ami_ts"
          FROM (
            (
              SELECT *, NULL AS "condition_start_datetime"
              FROM (
                (
                  SELECT
                    184746 AS "name",
                    "person_id" AS "pid",
                    "measurement_datetime" AS "ts",
                    "value_as_number"
                  FROM "cdm_new_york3"."measurement"
                  WHERE ("measurement_concept_id" = 3025315.0)
                )
                UNION ALL
                (
                  SELECT
                    476286 AS "name",
                    "person_id" AS "pid",
                    "measurement_datetime" AS "ts",
                    "value_as_number"
                  FROM "cdm_new_york3"."measurement"
                  WHERE ("measurement_concept_id" = 3036277.0)
                )
              ) "q01"
            )
            UNION ALL
            (
              SELECT
                "name",
                "pid",
                "ts",
                NULL AS "value_as_number",
                "condition_start_datetime"
              FROM (
                SELECT
                  144573 AS "name",
                  "person_id" AS "pid",
                  "condition_start_datetime" AS "ts",
                  "condition_start_datetime"
                FROM "cdm_new_york3"."condition_occurrence"
                WHERE ("condition_concept_id" = 4329847.0)
              ) "q02"
            )
          ) "q03"
        ) "q04"
      ) "q05"
      WHERE ("phea_row_id" = "phea_ts_row")
    ) "q06"
  ) "q07"
) "q08"