Finding myocardial infarction in obese patients

This vignette assumes a SQL server at localhost (we use PostgreSQL), with patient data in OMOP Common Data Model v5.4 format in schema cdm_new_york3. The data shown in this example is synthetic, generated by Synthea(TM) Patient Generator.

How to find patients who had an acute myocardial infarction (AMI) when they were clinically obese (body mass index, BMI > 30)?

A patient can have multiple myocardial infarctions, and their weight can change over time. In this example we are only interested in cases where the AMI happened while the patient’s BMI was known to be > 30.

The problem is, we cannot go back in time and measure the BMI of the patient on the day the AMI happened. Maybe someone did measure the patient’s weight on that day, but maybe not. So, what do we mean when we say “known to be obese”?

Maybe the only weight measurement available for that person is from 3 years before, or after, the day of the AMI. Or maybe the AMI happened between two weight measurements, one giving BMI > 30 hence obesity, the other giving BMI < 30 hence no obesity. Which one do you pick?

Here, we will pick whichever body measurements are closest in time to each AMI event.

…that means that on day 40, the closest weight record available was the one from day 1. That record said that the patient was obese. So, we count that as an AMI that happened in a patient known to be obese.

That would not be the case had the AMI been on day 90, for example. Similarly, if that same patient had a second AMI (god forbid) on day 120, or any day after 60, that second AMI would count in the non-obese group.

Create components

Let us first compute body mass index like in another vignette.

# Weight component
# Loinc 29463-7 Body weight, OMOP concept ID 3025315
weight_component <- sqlt(measurement) |>
  filter(measurement_concept_id == 3025315) |>
  make_component(
    .ts = measurement_datetime,
    .pid = person_id)

# Height component
# Loinc 8302-2 Body height, OMOP concept ID 3036277
height_component <- sqlt(measurement) |>
  filter(measurement_concept_id == 3036277) |>
  make_component(
    .ts = measurement_datetime,
    .pid = person_id)

To which we add a component for acute myocardial infarction (AMI), from table CONDITION_OCCURRENCE.

# Acute myocardial infarction (AMI) component
# SNOMED 22298006 Myocardial infarction, OMOP concept ID 4329847
ami_component <- sqlt(condition_occurrence) |>
  filter(condition_concept_id == 4329847) |>
  make_component(
    .ts = condition_start_datetime,
    .pid = person_id)

Calculate the phenotype

Spelling out the BMI formula is straightforward:

bmi = height/(weight * weight)

From which we can write a formula for “is obese”:

is_obese = bmi > 30

Notice that while the first formula is numerical (gives a number), the second one is logical (gives a TRUE or FALSE). Both are supported transparently, as long as they’re not mixed, e.g. 3 / TRUE.

However, doing the same for “has an AMI” may not be as clear. What is the formula for “patient has an AMI event”?

Since we have a column that gives the date when the AMI happened (it’s called condition_start_datetime), we can check whether that column is empty. In each patient, for all phenotype calculations with a timestamp past the timestamp of an AMI event, that column will be populated by Phea with the date of that AMI event. For timestamps prior to the patient’s first AMI, the column is empty. If the patient has a new AMI, the date is updated.

To check if a column is empty in SQL, we use is not null:

has_ami = ami_condition_start_datetime is not null

Armed with this, we can write the final formula for “has AMI and is obese”:

obese_ami = is_obese AND has_ami

# AMI in obese patient
ami_obese <- calculate_formula(
  components = list(
    weight = weight_component,
    height = height_component,
    ami = ami_component),
  fml = list(
    height_in_meters = 'height_value_as_number / 100',
    
    a = list( # the name 'a' doesn't matter
      bmi = 'weight_value_as_number / (height_in_meters * height_in_meters)',
    
      has_ami = 'ami_condition_start_datetime is not null'),
    
    b = list( # the name 'b' doesn't matter
      has_bmi = 'bmi is not null',
    
      is_obese = 'bmi > 30'),
    
    ami_obese = 'has_ami and is_obese'
  )
)
#> Called from: calculate_formula(components = list(weight = weight_component, 
#>     height = height_component, ami = ami_component), fml = list(height_in_meters = "height_value_as_number / 100", 
#>     a = list(bmi = "weight_value_as_number / (height_in_meters * height_in_meters)", 
#>         has_ami = "ami_condition_start_datetime is not null"), 
#>     b = list(has_bmi = "bmi is not null", is_obese = "bmi > 30"), 
#>     ami_obese = "has_ami and is_obese"))
#> debug: board <- dplyr::mutate(board, window = dplyr::sql(sql_ts_greatest) - 
#>     dplyr::sql(sql_ts_least), phea_ts_row = dplyr::sql(sql_txt))
#> debug: if (require_all || !is.null(dont_require)) {
#>     required_components <- setdiff(names(components), dont_require)
#>     if (length(required_components) > 0) {
#>         sql_txt <- paste0(DBI::dbQuoteIdentifier(paste0(required_components, 
#>             "_ts"), conn = .pheaglobalenv$con), " is not null", 
#>             collapse = " and ")
#>         if (has_content(window)) {
#>             board <- dplyr::filter(board, phea_row_id == phea_ts_row && 
#>                 dplyr::sql(sql_txt) && window < local(window))
#>         }
#>         else {
#>             board <- dplyr::filter(board, phea_row_id == phea_ts_row && 
#>                 dplyr::sql(sql_txt))
#>         }
#>     }
#>     else {
#>         if (has_content(window)) {
#>             board <- dplyr::filter(board, phea_row_id == phea_ts_row && 
#>                 window < local(window))
#>         }
#>         else {
#>             board <- dplyr::filter(board, phea_row_id == phea_ts_row)
#>         }
#>     }
#> } else {
#>     if (has_content(window)) {
#>         board <- dplyr::filter(board, phea_row_id == phea_ts_row && 
#>             window < local(window))
#>     }
#>     else {
#>         board <- dplyr::filter(board, phea_row_id == phea_ts_row)
#>     }
#> }
#> debug: if (has_content(window)) {
#>     board <- dplyr::filter(board, phea_row_id == phea_ts_row && 
#>         window < local(window))
#> } else {
#>     board <- dplyr::filter(board, phea_row_id == phea_ts_row)
#> }
#> debug: board <- dplyr::filter(board, phea_row_id == phea_ts_row)
#> debug: if (!is.null(filters) && any(!is.na(filters))) {
#>     sql_txt <- paste0("(", paste0(filters[!is.na(filters)], collapse = ") AND ("), 
#>         ")")
#>     board <- filter(board, sql(sql_txt))
#> }
#> debug: if (isFALSE(is.na(limit))) board <- head(board, n = lim)
#> debug: board <- dplyr::select(board, phea_row_id, pid, ts, window, !!!g_vars)
#> debug: res_vars <- NULL
#> debug: if (!is.null(fml)) {
#>     if (cascaded) {
#>         for (i in seq(fml)) {
#>             cur_fml <- fml[[i]]
#>             if (class(cur_fml) == "list") {
#>                 if (any(lapply(cur_fml, class) == "list")) 
#>                   stop("Formulas cannot be nested deeper than 1 level.")
#>                 res_vars <- c(res_vars, names(cur_fml))
#>                 commands <- unlist(purrr::map2(names(cur_fml), 
#>                   cur_fml, ~rlang::exprs(`:=`(!!..1, dplyr::sql(!!..2)))))
#>                 board <- dplyr::mutate(board, !!!commands)
#>             }
#>             else {
#>                 res_vars <- c(res_vars, names(fml)[i])
#>                 board <- dplyr::mutate(board, `:=`(!!rlang::sym(names(fml)[i]), 
#>                   dplyr::sql(cur_fml)))
#>             }
#>         }
#>     }
#>     else {
#>         if (any(lapply(fml, class) == "list")) 
#>             stop("Nested formulas require cascaded = TRUE.")
#>         res_vars <- c(res_vars, names(fml))
#>         commands <- unlist(purrr::map2(names(fml), fml, ~rlang::exprs(`:=`(!!..1, 
#>             dplyr::sql(!!..2)))))
#>         board <- dplyr::mutate(board, !!!commands)
#>     }
#> }
#> debug: if (cascaded) {
#>     for (i in seq(fml)) {
#>         cur_fml <- fml[[i]]
#>         if (class(cur_fml) == "list") {
#>             if (any(lapply(cur_fml, class) == "list")) 
#>                 stop("Formulas cannot be nested deeper than 1 level.")
#>             res_vars <- c(res_vars, names(cur_fml))
#>             commands <- unlist(purrr::map2(names(cur_fml), cur_fml, 
#>                 ~rlang::exprs(`:=`(!!..1, dplyr::sql(!!..2)))))
#>             board <- dplyr::mutate(board, !!!commands)
#>         }
#>         else {
#>             res_vars <- c(res_vars, names(fml)[i])
#>             board <- dplyr::mutate(board, `:=`(!!rlang::sym(names(fml)[i]), 
#>                 dplyr::sql(cur_fml)))
#>         }
#>     }
#> } else {
#>     if (any(lapply(fml, class) == "list")) 
#>         stop("Nested formulas require cascaded = TRUE.")
#>     res_vars <- c(res_vars, names(fml))
#>     commands <- unlist(purrr::map2(names(fml), fml, ~rlang::exprs(`:=`(!!..1, 
#>         dplyr::sql(!!..2)))))
#>     board <- dplyr::mutate(board, !!!commands)
#> }
#> debug: for (i in seq(fml)) {
#>     cur_fml <- fml[[i]]
#>     if (class(cur_fml) == "list") {
#>         if (any(lapply(cur_fml, class) == "list")) 
#>             stop("Formulas cannot be nested deeper than 1 level.")
#>         res_vars <- c(res_vars, names(cur_fml))
#>         commands <- unlist(purrr::map2(names(cur_fml), cur_fml, 
#>             ~rlang::exprs(`:=`(!!..1, dplyr::sql(!!..2)))))
#>         board <- dplyr::mutate(board, !!!commands)
#>     }
#>     else {
#>         res_vars <- c(res_vars, names(fml)[i])
#>         board <- dplyr::mutate(board, `:=`(!!rlang::sym(names(fml)[i]), 
#>             dplyr::sql(cur_fml)))
#>     }
#> }
#> debug: cur_fml <- fml[[i]]
#> debug: if (class(cur_fml) == "list") {
#>     if (any(lapply(cur_fml, class) == "list")) 
#>         stop("Formulas cannot be nested deeper than 1 level.")
#>     res_vars <- c(res_vars, names(cur_fml))
#>     commands <- unlist(purrr::map2(names(cur_fml), cur_fml, ~rlang::exprs(`:=`(!!..1, 
#>         dplyr::sql(!!..2)))))
#>     board <- dplyr::mutate(board, !!!commands)
#> } else {
#>     res_vars <- c(res_vars, names(fml)[i])
#>     board <- dplyr::mutate(board, `:=`(!!rlang::sym(names(fml)[i]), 
#>         dplyr::sql(cur_fml)))
#> }
#> debug: res_vars <- c(res_vars, names(fml)[i])
#> debug: board <- dplyr::mutate(board, `:=`(!!rlang::sym(names(fml)[i]), 
#>     dplyr::sql(cur_fml)))
#> debug: cur_fml <- fml[[i]]
#> debug: if (class(cur_fml) == "list") {
#>     if (any(lapply(cur_fml, class) == "list")) 
#>         stop("Formulas cannot be nested deeper than 1 level.")
#>     res_vars <- c(res_vars, names(cur_fml))
#>     commands <- unlist(purrr::map2(names(cur_fml), cur_fml, ~rlang::exprs(`:=`(!!..1, 
#>         dplyr::sql(!!..2)))))
#>     board <- dplyr::mutate(board, !!!commands)
#> } else {
#>     res_vars <- c(res_vars, names(fml)[i])
#>     board <- dplyr::mutate(board, `:=`(!!rlang::sym(names(fml)[i]), 
#>         dplyr::sql(cur_fml)))
#> }
#> debug: if (any(lapply(cur_fml, class) == "list")) stop("Formulas cannot be nested deeper than 1 level.")
#> debug: res_vars <- c(res_vars, names(cur_fml))
#> debug: commands <- unlist(purrr::map2(names(cur_fml), cur_fml, ~rlang::exprs(`:=`(!!..1, 
#>     dplyr::sql(!!..2)))))
#> debug: board <- dplyr::mutate(board, !!!commands)
#> debug: cur_fml <- fml[[i]]
#> debug: if (class(cur_fml) == "list") {
#>     if (any(lapply(cur_fml, class) == "list")) 
#>         stop("Formulas cannot be nested deeper than 1 level.")
#>     res_vars <- c(res_vars, names(cur_fml))
#>     commands <- unlist(purrr::map2(names(cur_fml), cur_fml, ~rlang::exprs(`:=`(!!..1, 
#>         dplyr::sql(!!..2)))))
#>     board <- dplyr::mutate(board, !!!commands)
#> } else {
#>     res_vars <- c(res_vars, names(fml)[i])
#>     board <- dplyr::mutate(board, `:=`(!!rlang::sym(names(fml)[i]), 
#>         dplyr::sql(cur_fml)))
#> }
#> debug: if (any(lapply(cur_fml, class) == "list")) stop("Formulas cannot be nested deeper than 1 level.")
#> debug: res_vars <- c(res_vars, names(cur_fml))
#> debug: commands <- unlist(purrr::map2(names(cur_fml), cur_fml, ~rlang::exprs(`:=`(!!..1, 
#>     dplyr::sql(!!..2)))))
#> debug: board <- dplyr::mutate(board, !!!commands)
#> debug: cur_fml <- fml[[i]]
#> debug: if (class(cur_fml) == "list") {
#>     if (any(lapply(cur_fml, class) == "list")) 
#>         stop("Formulas cannot be nested deeper than 1 level.")
#>     res_vars <- c(res_vars, names(cur_fml))
#>     commands <- unlist(purrr::map2(names(cur_fml), cur_fml, ~rlang::exprs(`:=`(!!..1, 
#>         dplyr::sql(!!..2)))))
#>     board <- dplyr::mutate(board, !!!commands)
#> } else {
#>     res_vars <- c(res_vars, names(fml)[i])
#>     board <- dplyr::mutate(board, `:=`(!!rlang::sym(names(fml)[i]), 
#>         dplyr::sql(cur_fml)))
#> }
#> debug: res_vars <- c(res_vars, names(fml)[i])
#> debug: board <- dplyr::mutate(board, `:=`(!!rlang::sym(names(fml)[i]), 
#>     dplyr::sql(cur_fml)))
#> debug: if (class(kco) == "logical") {
#>     if (length(kco) > 1) 
#>         stop("If logical, kco must be of length 1.")
#>     if (kco && !is.null(res_vars)) 
#>         board <- keep_change_of(board, res_vars, partition = "pid", 
#>             order = "ts")
#> } else {
#>     if (class(kco) != "character") 
#>         stop("kco must be logical or character vector.")
#>     board <- keep_change_of(board, kco, partition = "pid", order = "ts")
#> }
#> debug: if (length(kco) > 1) stop("If logical, kco must be of length 1.")
#> debug: if (kco && !is.null(res_vars)) board <- keep_change_of(board, 
#>     res_vars, partition = "pid", order = "ts")
#> debug: board <- dplyr::collapse(board, cte = TRUE)
#> debug: attr(board, "phea") <- "phenotype"
#> debug: attr(board, "phea_res_vars") <- res_vars
#> debug: attr(board, "phea_out_vars") <- g_vars
#> debug: if (get_sql) {
#>     return(dbplyr::sql_render(board))
#> } else {
#>     return(board)
#> }
#> debug: return(board)

Keep only the records closest to each other

Phea will compute ami_obese for all points in time in each patient. If a patient has a single AMI event, but had his body weight and height measured on 6 different dates, that is 7 computations of ami_obese, unless the date of the AMI coincides with one of the dates of the body measurements.

Since we want to consider only the body measurements made closest in time to the AMI event, we can filter the rows by the window.

Phea returns the window alongside each phenotype calculation. The window is the time distance between the oldest record, and the most recent one, that went into the phenotype calculation.

Therefore, in each patient, for each AMI event, whichever row has the smallest window, that row is using the body measurements that are closest possible, in time, to the AMI event.

# Pull the IDs of the patients who ever had an AMI
ami_patients <- ami_obese |>
  filter(has_ami) |>
  select(pid) |>
  distinct() |>
  pull()

# Keep only the smallest window for each AMI event:
ami_obese_f <- ami_obese |>
  filter(has_ami & has_bmi) |>
  pick_row_by('window', c('pid', 'ami_condition_start_datetime'))

head_shot(ami_obese_f) |>
  kable()

phea_row_id	pid	ts	window	height_value_as_number	weight_value_as_number	ami_condition_start_datetime	height_in_meters	bmi	has_ami	has_bmi	is_obese	ami_obese
8098	196	2007-12-22	364 days	186.0	94.4	2007-12-22	1.860	27.28639	TRUE	TRUE	FALSE	FALSE
8391	204	1975-04-20	14 days	176.1	93.2	1975-04-20	1.761	30.05365	TRUE	TRUE	TRUE	TRUE
10861	272	1990-12-30	105 days	182.8	91.2	1990-12-30	1.828	27.29245	TRUE	TRUE	FALSE	FALSE
11404	286	1987-11-04	189 days	168.9	53.3	1987-11-04	1.689	18.68392	TRUE	TRUE	FALSE	FALSE
11777	295	2018-07-08	154 days	181.0	91.8	2018-07-08	1.810	28.02112	TRUE	TRUE	FALSE	FALSE
12787	321	2009-01-28	112 days	191.4	110.4	2009-01-28	1.914	30.13597	TRUE	TRUE	TRUE	TRUE
13092	328	1989-09-28	98 days	162.4	55.3	1989-09-28	1.624	20.96781	TRUE	TRUE	FALSE	FALSE
15510	379	2005-09-09	42 days	182.8	92.1	2005-09-09	1.828	27.56178	TRUE	TRUE	FALSE	FALSE
15914	389	2012-03-03	147 days	169.1	79.8	2012-03-03	1.691	27.90716	TRUE	TRUE	FALSE	FALSE
18539	452	1932-09-15	160 days	187.3	75.5	1932-09-15	1.873	21.52144	TRUE	TRUE	FALSE	FALSE

Plot the phenotype for a random patient

ami_obese_patients <- ami_obese |>
  filter(ami_obese) |>
  select(pid) |>
  distinct() |>
  pull()

random_patient <- sample(ami_obese_patients, 1)
message('Sampled patient: ', random_patient)
#> Sampled patient: 13001

ami_obese |>
  select(
    -has_bmi,
    -height_value_as_number,
    -ami_condition_start_datetime) |>
  phea_plot(pid = random_patient)
#> Collecting lazy table, done. (turn this message off with `verbose` or `.verbose` in setup_phea())

Obtain the SQL query that computes the phenotype

To see the SQL query underlying the phenotype, use helper function code_shot(), or dbplyr::sql_render(), or the .clip_sql option in calculate_formula().

code_shot(ami_obese)

SELECT *, has_ami and is_obese AS "ami_obese"
FROM (
  SELECT *, bmi is not null AS "has_bmi", bmi > 30 AS "is_obese"
  FROM (
    SELECT
      *,
      weight_value_as_number / (height_in_meters * height_in_meters) AS "bmi",
      ami_condition_start_datetime is not null AS "has_ami"
    FROM (
      SELECT
        "phea_row_id",
        "pid",
        "ts",
        "window",
        "height_value_as_number",
        "weight_value_as_number",
        "ami_condition_start_datetime",
        height_value_as_number / 100 AS "height_in_meters"
      FROM (
        SELECT
          *,
          greatest("weight_ts", "height_ts", "ami_ts") - least("weight_ts", "height_ts", "ami_ts") AS "window",
          MAX("phea_row_id") OVER (PARTITION BY "pid", "ts") AS "phea_ts_row"
        FROM (
          SELECT
            row_number() over (order by "pid", "ts") AS "phea_row_id",
            "pid",
            "ts",
            phea_last_value_ignore_nulls(case when "name" = 184746 then "value_as_number" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "weight_value_as_number",
            phea_last_value_ignore_nulls(case when "name" = 184746 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "weight_ts",
            phea_last_value_ignore_nulls(case when "name" = 476286 then "value_as_number" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "height_value_as_number",
            phea_last_value_ignore_nulls(case when "name" = 476286 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "height_ts",
            phea_last_value_ignore_nulls(case when "name" = 144573 then "condition_start_datetime" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "ami_condition_start_datetime",
            phea_last_value_ignore_nulls(case when "name" = 144573 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "ami_ts"
          FROM (
            (
              SELECT *, NULL AS "condition_start_datetime"
              FROM (
                (
                  SELECT
                    184746 AS "name",
                    "person_id" AS "pid",
                    "measurement_datetime" AS "ts",
                    "value_as_number"
                  FROM "cdm_new_york3"."measurement"
                  WHERE ("measurement_concept_id" = 3025315.0)
                )
                UNION ALL
                (
                  SELECT
                    476286 AS "name",
                    "person_id" AS "pid",
                    "measurement_datetime" AS "ts",
                    "value_as_number"
                  FROM "cdm_new_york3"."measurement"
                  WHERE ("measurement_concept_id" = 3036277.0)
                )
              ) "q01"
            )
            UNION ALL
            (
              SELECT
                "name",
                "pid",
                "ts",
                NULL AS "value_as_number",
                "condition_start_datetime"
              FROM (
                SELECT
                  144573 AS "name",
                  "person_id" AS "pid",
                  "condition_start_datetime" AS "ts",
                  "condition_start_datetime"
                FROM "cdm_new_york3"."condition_occurrence"
                WHERE ("condition_concept_id" = 4329847.0)
              ) "q02"
            )
          ) "q03"
        ) "q04"
      ) "q05"
      WHERE ("phea_row_id" = "phea_ts_row")
    ) "q06"
  ) "q07"
) "q08"