Kidney injury lab presentation

This vignette assumes a SQL server at localhost (we use PostgreSQL), with data in OMOP Common Data Model v5.4 format in schema cdm_new_york3. The patient records shown in this example are synthetic data from Synthea(TM) Patient Generator.

library(phea)
library(dplyr)

# Connect to SQL server.
dbcon <- DBI::dbConnect(RPostgres::Postgres(),
  host = fabcred$pg$host, port = fabcred$pg$port, dbname = fabcred$pg$database,
  user = fabcred$pg$user, password = fabcred$pg$pass)

# Setup Phea
setup_phea(dbcon, 'cdm_new_york3')

In this vignette we identify:

Case A.

Case B.

Case C.

Here’s how we compute it:

In both cases, we will normalize the unit of measurement of serum creatinine to md/dL prior to computing formulas. We could also use formulas for that, and the result would be the same.

Create components

Serum creatinine

We collect SCr records from MEASUREMENT and convert the units to mg/dL.

# Serum creatinine codes used: 
# Loinc 38483-4 Creatinine [Mass/volume] in Blood, OMOP CDM concept ID 3051825
# Loinc 2160-0 Creatinine [Mass/volume] in Serum or Plasma, OMOP CDM concept ID 3016723

# "A" records: Unit is 'mg/dL'.
scr_records_a <- sqlt(measurement) |>
  filter(measurement_concept_id %in% c(3051825, 3016723) &&
    unit_source_value == 'mg/dL')

# "B" records: Unit is 'µmol/L', but we convert to 'mg/dL'.
scr_records_b <- sqlt(measurement) |>
  filter(measurement_concept_id %in% c(3051825, 3016723) &&
    unit_source_value == 'µmol/L') |>
  mutate( # Convert µmol/L to mg/dL
    value_as_number = value_as_number / 88.42,
    unit_source_value = 'mg/dL')

# Combine all available serum creatinine records.
scr_records <- union_all(scr_records_a, scr_records_b)

# Make a record source.
scr_record_source <- make_record_source(scr_records,
  ts = 'measurement_datetime',
  pid = 'person_id')

Glomerular filtration rate

We collect GFR records from MEASUREMENT.

# GFR codes used:
# Loinc 77147-7 Glomerular filtration rate/1.73 sq M.predicted [Volume Rate/Area] in Serum, Plasma or Blood by Creatinine-based formula (MDRD)
# OMOP CDM concept ID 46236952 
gfr_records <- sqlt(measurement) |>
  filter(measurement_concept_id == 46236952)

gfr_record_source <- make_record_source(gfr_records,
    ts = 'measurement_datetime',
    pid = 'person_id')

Calculate the phenotype

Formula scr_case_a contains the logic of case A (difference ≥0.3 mg/dL), scr_case_b contains case B (ratio ≥1.5), and gfr_case_c contains case C (two GFRs under 60 within 3-5 months).

Notice we use 48 days instead of 48 hours, and 7 months instead of 7 days. This is just to make a visually meaningful example out of the limited data that is produced by Synthea(TM). Those time intervals can be changed to “48 hours” and “7 days” to produce the real phenotype.

scr_change <- calculate_formula(
  components = list(
    # Current SCr
    scr = make_component(scr_record_source),
    
    # Minimum value within 48-hour window
    scr_48h_min = make_component(scr_record_source,
      window = "'48 days'::interval",
      fn = list(value_as_number = 'min')),
    
    # Minimum value within 7-day window
    scr_7d_min = make_component(scr_record_source,
      window = "'7 months'::interval",
      fn = list(value_as_number = 'min')),
    
    # Current glomerular filtration rate (GFR)
    gfr = make_component(gfr_record_source),
    
    # Glomerular filtration rate 3 to 5 months older than phenotype date
    gfr_prior = make_component(gfr_record_source,
      delay = "'3 months'::interval",
      window = "'5 months'::interval",
      fn = list(value_as_number = 'min'))),
    
  fml = list(
    scr_case_a = 'scr_value_as_number - scr_48h_min_value_as_number >= 0.3',
    
    scr_case_b = 'scr_value_as_number / scr_7d_min_value_as_number >= 1.5',
    
    gfr_case_c = 'gfr_value_as_number < 60 AND gfr_prior_value_as_number < 60'),
  
  export = c(
    'scr_measurement_datetime',
    'scr_48h_min_measurement_datetime',
    'scr_7d_min_measurement_datetime',
    'gfr_measurement_datetime',
    'gfr_prior_measurement_datetime'),
  
  cascaded = FALSE, # Because we don't need to use results of prior formulas inside other formulas.
)

Let us take a small peek at 15 rows from the phenotype results.

head_shot(scr_change, 15) |>
  kable()
phea_row_id pid ts window scr_value_as_number scr_48h_min_value_as_number scr_7d_min_value_as_number gfr_value_as_number gfr_prior_value_as_number scr_measurement_datetime scr_48h_min_measurement_datetime scr_7d_min_measurement_datetime gfr_measurement_datetime gfr_prior_measurement_datetime scr_case_a scr_case_b gfr_case_c
1 1 2014-03-07 00:00:00 1.4 1.4 1.4 NA NA 2014-03-07 2014-03-07 2014-03-07 NA NA FALSE FALSE NA
2 1 2014-03-07 00:00:00 1.4 1.4 1.4 124.5 NA 2014-03-07 2014-03-07 2014-03-07 2014-03-07 NA FALSE FALSE FALSE
3 1 2016-03-11 735 days 1.4 1.3 1.3 120.9 NA 2014-03-07 2016-03-11 2016-03-11 2016-03-11 NA FALSE FALSE FALSE
4 1 2016-03-11 00:00:00 1.3 1.3 1.3 120.9 NA 2016-03-11 2016-03-11 2016-03-11 2016-03-11 NA FALSE FALSE FALSE
5 1 2018-03-16 735 days 1.3 1.3 1.3 120.9 NA 2018-03-16 2018-03-16 2018-03-16 2016-03-11 NA FALSE FALSE FALSE
6 1 2018-03-16 00:00:00 1.3 1.3 1.3 151.7 NA 2018-03-16 2018-03-16 2018-03-16 2018-03-16 NA FALSE FALSE FALSE
7 1 2020-03-20 735 days 1.3 1.3 1.3 107.3 NA 2018-03-16 2020-03-20 2020-03-20 2020-03-20 NA FALSE FALSE FALSE
8 1 2020-03-20 00:00:00 1.3 1.3 1.3 107.3 NA 2020-03-20 2020-03-20 2020-03-20 2020-03-20 NA FALSE FALSE FALSE
9 1 2022-02-11 693 days 1.5 1.5 1.5 107.3 NA 2022-02-11 2022-02-11 2022-02-11 2020-03-20 NA FALSE FALSE FALSE
10 1 2022-02-11 693 days 1.8 1.5 1.5 107.3 NA 2022-02-11 2022-02-11 2022-02-11 2020-03-20 NA TRUE FALSE FALSE
11 1 2022-02-11 00:00:00 1.8 1.5 1.5 84.8 NA 2022-02-11 2022-02-11 2022-02-11 2022-02-11 NA TRUE FALSE FALSE
12 1 2022-02-11 00:00:00 1.8 1.5 1.5 132.3 NA 2022-02-11 2022-02-11 2022-02-11 2022-02-11 NA TRUE FALSE FALSE
13 1 2022-03-25 42 days 1.8 1.5 1.5 84.1 NA 2022-02-11 2022-02-11 2022-02-11 2022-03-25 NA TRUE FALSE FALSE
14 1 2022-03-25 42 days 1.8 1.5 1.5 105.7 NA 2022-02-11 2022-02-11 2022-02-11 2022-03-25 NA TRUE FALSE FALSE
15 1 2022-03-25 00:00:00 1.5 1.5 1.5 105.7 NA 2022-03-25 2022-02-11 2022-02-11 2022-03-25 NA FALSE FALSE FALSE

Plot the phenotype for a random patient

For demonstration purposes, let us pick an “interesting” patient. By “interesting” I just mean a patient who at different times was TRUE and FALSE in each of the three criteria. A patient with variability in their timeline.

# Find an "interesting" case to plot: patients with each criteria at times TRUE, at times FALSE
patients <- scr_change |>
  group_by(pid) |>
  summarise(.groups = 'drop',
    n_a = n_distinct(scr_case_a),
    n_b = n_distinct(scr_case_b),
    n_c = n_distinct(gfr_case_c)) |>
  mutate(sort_variable = n_a + n_b + n_c) |>
  select(pid, sort_variable) |>
  collect()

patients <- patients |>
  arrange(desc(sort_variable)) |>
  pull('pid')

patients <- patients[1:20]

random_patient <- sample(patients, 1)

message('Sampled patient: ', random_patient)
#> Sampled patient: 729

Then we plot all data for the chosen patient (pid = 729).

scr_change |>
  select(-ends_with('datetime')) |>
  mutate(
    scr_case_a_or_b = scr_case_a | scr_case_b) |>
  phea_plot(random_patient)
#> Collecting lazy table, done. (turn this message off with `verbose` or `.verbose` in setup_phea())

At the end of this report I include a large table with all the data for this patient, directly from the record sources, for maximum verification.

Obtain the SQL query that computes the phenotype

To see the SQL query underlying the phenotype, use helper function code_shot(), or dbplyr::sql_render(), or the .clip_sql option in calculate_formula().

code_shot(scr_change)
SELECT
  *,
  scr_value_as_number - scr_48h_min_value_as_number >= 0.3 AS "scr_case_a",
  scr_value_as_number / scr_7d_min_value_as_number >= 1.5 AS "scr_case_b",
  gfr_value_as_number < 60 AND gfr_prior_value_as_number < 60 AS "gfr_case_c"
FROM (
  SELECT
    "phea_row_id",
    "pid",
    "ts",
    greatest("scr_ts", "scr_48h_min_ts", "scr_7d_min_ts", "gfr_ts", "gfr_prior_ts") - least("scr_ts", "scr_48h_min_ts", "scr_7d_min_ts", "gfr_ts", "gfr_prior_ts") AS "window",
    "scr_value_as_number",
    "scr_48h_min_value_as_number",
    "scr_7d_min_value_as_number",
    "gfr_value_as_number",
    "gfr_prior_value_as_number",
    "scr_measurement_datetime",
    "scr_48h_min_measurement_datetime",
    "scr_7d_min_measurement_datetime",
    "gfr_measurement_datetime",
    "gfr_prior_measurement_datetime"
  FROM (
    SELECT
      row_number() OVER (ORDER BY "pid", "ts") AS "phea_row_id",
      "pid",
      "ts",
      phea_last_value_ignore_nulls(case when "name" = 236253 then "measurement_datetime" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "scr_measurement_datetime",
      phea_last_value_ignore_nulls(case when "name" = 236253 then "value_as_number" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "scr_value_as_number",
      phea_last_value_ignore_nulls(case when "name" = 236253 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "scr_ts",
      min(case when "name" = 236253 then "measurement_datetime" else null end) over (partition by "pid" order by "ts" range between '48 days'::interval preceding and current row) AS "scr_48h_min_measurement_datetime",
      min(case when "name" = 236253 then "value_as_number" else null end) over (partition by "pid" order by "ts" range between '48 days'::interval preceding and current row) AS "scr_48h_min_value_as_number",
      phea_last_value_ignore_nulls(case when "name" = 236253 then "ts" else null end) over (partition by "pid" order by "ts" range between '48 days'::interval preceding and current row) AS "scr_48h_min_ts",
      min(case when "name" = 236253 then "measurement_datetime" else null end) over (partition by "pid" order by "ts" range between '7 months'::interval preceding and current row) AS "scr_7d_min_measurement_datetime",
      min(case when "name" = 236253 then "value_as_number" else null end) over (partition by "pid" order by "ts" range between '7 months'::interval preceding and current row) AS "scr_7d_min_value_as_number",
      phea_last_value_ignore_nulls(case when "name" = 236253 then "ts" else null end) over (partition by "pid" order by "ts" range between '7 months'::interval preceding and current row) AS "scr_7d_min_ts",
      phea_last_value_ignore_nulls(case when "name" = 666684 then "measurement_datetime" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "gfr_measurement_datetime",
      phea_last_value_ignore_nulls(case when "name" = 666684 then "value_as_number" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "gfr_value_as_number",
      phea_last_value_ignore_nulls(case when "name" = 666684 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "gfr_ts",
      min(case when "name" = 666684 then "measurement_datetime" else null end) over (partition by "pid" order by "ts" range between '5 months'::interval preceding and '3 months'::interval preceding) AS "gfr_prior_measurement_datetime",
      min(case when "name" = 666684 then "value_as_number" else null end) over (partition by "pid" order by "ts" range between '5 months'::interval preceding and '3 months'::interval preceding) AS "gfr_prior_value_as_number",
      phea_last_value_ignore_nulls(case when "name" = 666684 then "ts" else null end) over (partition by "pid" order by "ts" range between '5 months'::interval preceding and '3 months'::interval preceding) AS "gfr_prior_ts"
    FROM (
      (
        SELECT
          236253 AS "name",
          "person_id" AS "pid",
          "measurement_datetime" AS "ts",
          "measurement_datetime",
          "value_as_number"
        FROM (
          (
            SELECT *
            FROM "cdm_new_york3"."measurement"
            WHERE ("measurement_concept_id" IN (3051825.0, 3016723.0) AND "unit_source_value" = 'mg/dL')
          )
          UNION ALL
          (
            SELECT
              "measurement_id",
              "person_id",
              "measurement_concept_id",
              "measurement_date",
              "measurement_datetime",
              "measurement_time",
              "measurement_type_concept_id",
              "operator_concept_id",
              "value_as_number" / 88.42 AS "value_as_number",
              "value_as_concept_id",
              "unit_concept_id",
              "range_low",
              "range_high",
              "provider_id",
              "visit_occurrence_id",
              "visit_detail_id",
              "measurement_source_value",
              "measurement_source_concept_id",
              'mg/dL' AS "unit_source_value",
              "unit_source_concept_id",
              "value_source_value",
              "measurement_event_id",
              "meas_event_field_concept_id"
            FROM "cdm_new_york3"."measurement"
            WHERE ("measurement_concept_id" IN (3051825.0, 3016723.0) AND "unit_source_value" = 'µmol/L')
          )
        ) "q01"
      )
      UNION ALL
      (
        SELECT
          666684 AS "name",
          "person_id" AS "pid",
          "measurement_datetime" AS "ts",
          "measurement_datetime",
          "value_as_number"
        FROM "cdm_new_york3"."measurement"
        WHERE ("measurement_concept_id" = 46236952.0)
      )
    ) "q02"
  ) "q03"
) "q04"

See the raw data that went into calculate_formula()

Sometimes the best way to check a result is to manually, painstakingly go over the data. Below I print all rows from the record sources for the patient that was plotted above.

Serum creatinine records

scr_records |>
  filter(person_id == random_patient) |>
  collect() |>
  arrange(measurement_datetime) |>
  select(measurement_id, person_id, measurement_datetime, value_as_number, unit_source_value) |>
  kable()
measurement_id person_id measurement_datetime value_as_number unit_source_value
291341 729 2003-06-08 1.3 mg/dL
292202 729 2004-06-13 1.4 mg/dL
292012 729 2005-06-19 1.3 mg/dL
292605 729 2006-04-30 1.5 mg/dL
292619 729 2006-04-30 2.1 mg/dL
291412 729 2006-06-25 1.6 mg/dL
291427 729 2006-06-25 2.1 mg/dL
292230 729 2007-07-01 2.1 mg/dL
291480 729 2008-07-06 1.9 mg/dL
292272 729 2008-08-17 1.9 mg/dL
292289 729 2009-07-12 1.9 mg/dL
292662 729 2010-03-14 3.5 mg/dL
291377 729 2010-07-18 2.8 mg/dL
292542 729 2011-07-24 2.2 mg/dL
291722 729 2012-04-01 2.5 mg/dL
292160 729 2012-07-29 4.0 mg/dL
291992 729 2012-08-12 3.2 mg/dL
292444 729 2013-08-04 3.1 mg/dL
292643 729 2013-11-24 3.4 mg/dL
292328 729 2014-05-18 3.0 mg/dL
291836 729 2014-08-10 2.4 mg/dL
291907 729 2015-08-16 2.6 mg/dL
291605 729 2015-09-13 2.0 mg/dL
292345 729 2016-08-21 2.9 mg/dL
292477 729 2016-10-09 2.9 mg/dL
291877 729 2016-11-06 2.4 mg/dL
292389 729 2017-08-27 2.3 mg/dL
291943 729 2017-12-03 6.9 mg/dL
292679 729 2018-01-28 4.3 mg/dL
291918 729 2018-07-29 4.8 mg/dL
292568 729 2018-08-26 5.1 mg/dL
292502 729 2018-09-09 4.1 mg/dL
291636 729 2018-09-30 3.9 mg/dL
291495 729 2018-12-23 5.2 mg/dL
291822 729 2019-01-27 4.4 mg/dL
292140 729 2019-07-21 5.2 mg/dL
292096 729 2019-09-15 5.8 mg/dL
291687 729 2019-12-22 6.9 mg/dL
291588 729 2020-02-16 4.2 mg/dL
291801 729 2020-06-21 4.9 mg/dL
291732 729 2020-09-20 4.9 mg/dL
291447 729 2020-11-15 6.1 mg/dL
291516 729 2021-03-14 4.9 mg/dL
291559 729 2021-04-11 6.8 mg/dL
292372 729 2021-07-11 4.3 mg/dL
292419 729 2021-09-26 4.1 mg/dL
292031 729 2021-10-10 4.8 mg/dL
291759 729 2022-01-16 2.9 mg/dL
291779 729 2022-01-16 4.4 mg/dL
291533 729 2022-02-06 4.6 mg/dL
292521 729 2022-07-10 4.6 mg/dL
291964 729 2022-10-02 5.3 mg/dL

Glomerular filtration rate records

gfr_records |>
  filter(person_id == random_patient) |>
  collect() |>
  arrange(measurement_datetime) |>
  select(measurement_id, person_id, measurement_datetime, value_as_number, unit_source_value) |>
  kable()
measurement_id person_id measurement_datetime value_as_number unit_source_value
291362 729 2003-06-08 128.3 mL/min/{1.73_m2}
292187 729 2004-06-13 93.7 mL/min/{1.73_m2}
292026 729 2005-06-19 159.2 mL/min/{1.73_m2}
292599 729 2006-04-30 121.2 mL/min
292608 729 2006-04-30 63.1 mL/min/{1.73_m2}
291406 729 2006-06-25 146.8 mL/min
291416 729 2006-06-25 63.4 mL/min/{1.73_m2}
292216 729 2007-07-01 62.6 mL/min/{1.73_m2}
291482 729 2008-07-06 70.6 mL/min/{1.73_m2}
292271 729 2008-08-17 70.7 mL/min/{1.73_m2}
292304 729 2009-07-12 70.0 mL/min/{1.73_m2}
292665 729 2010-03-14 38.0 mL/min/{1.73_m2}
291366 729 2010-07-18 47.2 mL/min/{1.73_m2}
292546 729 2011-07-24 58.8 mL/min/{1.73_m2}
291702 729 2012-04-01 47.1 mL/min/{1.73_m2}
292166 729 2012-07-29 30.0 mL/min/{1.73_m2}
292003 729 2012-08-12 36.8 mL/min/{1.73_m2}
292435 729 2013-08-04 38.7 mL/min/{1.73_m2}
292634 729 2013-11-24 34.9 mL/min/{1.73_m2}
292317 729 2014-05-18 38.4 mL/min/{1.73_m2}
291847 729 2014-08-10 49.7 mL/min/{1.73_m2}
291893 729 2015-08-16 44.3 mL/min/{1.73_m2}
291620 729 2015-09-13 57.5 mL/min/{1.73_m2}
292360 729 2016-08-21 39.0 mL/min/{1.73_m2}
292488 729 2016-10-09 40.1 mL/min/{1.73_m2}
291881 729 2016-11-06 47.8 mL/min/{1.73_m2}
292393 729 2017-08-27 48.6 mL/min/{1.73_m2}
291949 729 2017-12-03 16.5 mL/min/{1.73_m2}
292690 729 2018-01-28 26.2 mL/min/{1.73_m2}
291929 729 2018-07-29 23.4 mL/min/{1.73_m2}
292578 729 2018-08-26 22.2 mL/min/{1.73_m2}
292509 729 2018-09-09 27.2 mL/min/{1.73_m2}
291632 729 2018-09-30 28.7 mL/min/{1.73_m2}
291485 729 2018-12-23 21.7 mL/min/{1.73_m2}
291833 729 2019-01-27 25.0 mL/min/{1.73_m2}
292150 729 2019-07-21 21.3 mL/min/{1.73_m2}
292109 729 2019-09-15 19.2 mL/min/{1.73_m2}
291698 729 2019-12-22 16.0 mL/min/{1.73_m2}
291592 729 2020-02-16 25.8 mL/min/{1.73_m2}
291797 729 2020-06-21 22.3 mL/min/{1.73_m2}
291743 729 2020-09-20 22.6 mL/min/{1.73_m2}
291458 729 2020-11-15 17.9 mL/min/{1.73_m2}
291525 729 2021-03-14 22.1 mL/min/{1.73_m2}
291570 729 2021-04-11 15.9 mL/min/{1.73_m2}
292382 729 2021-07-11 25.0 mL/min/{1.73_m2}
292425 729 2021-09-26 26.6 mL/min/{1.73_m2}
292047 729 2021-10-10 22.6 mL/min/{1.73_m2}
291768 729 2022-01-16 24.3 mL/min/{1.73_m2}
291788 729 2022-01-16 89.6 mL/min
291530 729 2022-02-06 23.4 mL/min/{1.73_m2}
292532 729 2022-07-10 23.5 mL/min/{1.73_m2}
291975 729 2022-10-02 20.1 mL/min/{1.73_m2}

Produce events for Atlas

One approach to use Phea’s results inside Atlas is to produce novel records (e.g. new rows in OBSERVATION or MEASUREMENT) using custom concept IDs, then ETL those back into the dataset.

Below I illustrate how to do it with case A, assuming the new records will go into MEASUREMENT. Notice I put the difference inside value_as_number, just in case.

case_a_custom_concept_id <- 2000000001
# Case A: SCr increase by >=0.3 over 48 hours
data_for_etl <- scr_change |>
  filter(scr_case_a) |> # Keep only rows where case A was TRUE
  transmute(
    measurement_datetime = ts,
    person_id = pid,
    value_as_number = scr_value_as_number - scr_48h_min_value_as_number,
    measurement_concept_id = case_a_custom_concept_id)

head_shot(data_for_etl) |>
  kable()
measurement_datetime person_id value_as_number measurement_concept_id
2022-02-11 1 0.3 2000000001
2022-02-11 1 0.3 2000000001
2022-02-11 1 0.3 2000000001
2022-03-25 1 0.3 2000000001
2022-03-25 1 0.3 2000000001
2016-01-29 3 0.5 2000000001
2016-11-11 3 1.7 2000000001
2018-02-09 3 0.7 2000000001
2022-03-04 3 1.5 2000000001
2006-09-24 19 0.7 2000000001

Author contact

Fabrício Kury – Please be always welcome to reach me at .