Kidney injury lab presentation

This vignette assumes a SQL server at localhost (we use PostgreSQL), with data in OMOP Common Data Model v5.4 format in schema cdm_new_york3. The patient records shown in this example are synthetic data from Synthea(TM) Patient Generator.

In both cases, we will normalize the unit of measurement of serum creatinine to md/dL prior to computing formulas. We could also use formulas for that, and the result would be the same.

Create components

Serum creatinine

We collect SCr records from MEASUREMENT and convert the units to mg/dL.

# Serum creatinine codes used: 
# Loinc 38483-4 Creatinine [Mass/volume] in Blood, OMOP CDM concept ID 3051825
# Loinc 2160-0 Creatinine [Mass/volume] in Serum or Plasma, OMOP CDM concept ID 3016723

# "A" records: Unit is 'mg/dL'.
scr_records_a <- sqlt(measurement) |>
  filter(measurement_concept_id %in% c(3051825, 3016723) &&
    unit_source_value == 'mg/dL')

# "B" records: Unit is 'µmol/L', but we convert to 'mg/dL'.
scr_records_b <- sqlt(measurement) |>
  filter(measurement_concept_id %in% c(3051825, 3016723) &&
    unit_source_value == 'µmol/L') |>
  mutate( # Convert µmol/L to mg/dL
    value_as_number = value_as_number / 88.42,
    unit_source_value = 'mg/dL')

# Combine all available serum creatinine records.
scr_records <- union_all(scr_records_a, scr_records_b)

# Make a record source.
scr_record_source <- make_record_source(scr_records,
  ts = 'measurement_datetime',
  pid = 'person_id')

Glomerular filtration rate

We collect GFR records from MEASUREMENT.

# GFR codes used:
# Loinc 77147-7 Glomerular filtration rate/1.73 sq M.predicted [Volume Rate/Area] in Serum, Plasma or Blood by Creatinine-based formula (MDRD)
# OMOP CDM concept ID 46236952 
gfr_records <- sqlt(measurement) |>
  filter(measurement_concept_id == 46236952)

gfr_record_source <- make_record_source(gfr_records,
    ts = 'measurement_datetime',
    pid = 'person_id')

Calculate the phenotype

Formula scr_case_a contains the logic of case A (difference ≥0.3 mg/dL), scr_case_b contains case B (ratio ≥1.5), and gfr_case_c contains case C (two GFRs under 60 within 3-5 months).

Notice we use 48 days instead of 48 hours, and 7 months instead of 7 days. This is just to make a visually meaningful example out of the limited data that is produced by Synthea(TM). Those time intervals can be changed to “48 hours” and “7 days” to produce the real phenotype.

scr_change <- calculate_formula(
  components = list(
    # Current SCr
    scr = make_component(scr_record_source),
    
    # Minimum value within 48-hour window
    scr_48h_min = make_component(scr_record_source,
      window = "'48 days'::interval",
      fn = list(value_as_number = 'min')),
    
    # Minimum value within 7-day window
    scr_7d_min = make_component(scr_record_source,
      window = "'7 months'::interval",
      fn = list(value_as_number = 'min')),
    
    # Current glomerular filtration rate (GFR)
    gfr = make_component(gfr_record_source),
    
    # Glomerular filtration rate 3 to 5 months older than phenotype date
    gfr_prior = make_component(gfr_record_source,
      delay = "'3 months'::interval",
      window = "'5 months'::interval",
      fn = list(value_as_number = 'min'))),
    
  fml = list(
    scr_case_a = 'scr_value_as_number - scr_48h_min_value_as_number >= 0.3',
    
    scr_case_b = 'scr_value_as_number / scr_7d_min_value_as_number >= 1.5',
    
    gfr_case_c = 'gfr_value_as_number < 60 AND gfr_prior_value_as_number < 60'),
  
  export = c(
    'scr_measurement_datetime',
    'scr_48h_min_measurement_datetime',
    'scr_7d_min_measurement_datetime',
    'gfr_measurement_datetime',
    'gfr_prior_measurement_datetime'),
  
  cascaded = FALSE, # Because we don't need to use results of prior formulas inside other formulas.
)

Let us take a small peek at 15 rows from the phenotype results.

head_shot(scr_change, 15) |>
  kable()

phea_row_id	pid	ts	window	scr_value_as_number	scr_48h_min_value_as_number	scr_7d_min_value_as_number	gfr_value_as_number	gfr_prior_value_as_number	scr_measurement_datetime	scr_48h_min_measurement_datetime	scr_7d_min_measurement_datetime	gfr_measurement_datetime	gfr_prior_measurement_datetime	scr_case_a	scr_case_b	gfr_case_c
1	1	2014-03-07	00:00:00	1.4	1.4	1.4	NA	NA	2014-03-07	2014-03-07	2014-03-07	NA	NA	FALSE	FALSE	NA
2	1	2014-03-07	00:00:00	1.4	1.4	1.4	124.5	NA	2014-03-07	2014-03-07	2014-03-07	2014-03-07	NA	FALSE	FALSE	FALSE
3	1	2016-03-11	735 days	1.4	1.3	1.3	120.9	NA	2014-03-07	2016-03-11	2016-03-11	2016-03-11	NA	FALSE	FALSE	FALSE
4	1	2016-03-11	00:00:00	1.3	1.3	1.3	120.9	NA	2016-03-11	2016-03-11	2016-03-11	2016-03-11	NA	FALSE	FALSE	FALSE
5	1	2018-03-16	735 days	1.3	1.3	1.3	120.9	NA	2018-03-16	2018-03-16	2018-03-16	2016-03-11	NA	FALSE	FALSE	FALSE
6	1	2018-03-16	00:00:00	1.3	1.3	1.3	151.7	NA	2018-03-16	2018-03-16	2018-03-16	2018-03-16	NA	FALSE	FALSE	FALSE
7	1	2020-03-20	735 days	1.3	1.3	1.3	107.3	NA	2018-03-16	2020-03-20	2020-03-20	2020-03-20	NA	FALSE	FALSE	FALSE
8	1	2020-03-20	00:00:00	1.3	1.3	1.3	107.3	NA	2020-03-20	2020-03-20	2020-03-20	2020-03-20	NA	FALSE	FALSE	FALSE
9	1	2022-02-11	693 days	1.5	1.5	1.5	107.3	NA	2022-02-11	2022-02-11	2022-02-11	2020-03-20	NA	FALSE	FALSE	FALSE
10	1	2022-02-11	693 days	1.8	1.5	1.5	107.3	NA	2022-02-11	2022-02-11	2022-02-11	2020-03-20	NA	TRUE	FALSE	FALSE
11	1	2022-02-11	00:00:00	1.8	1.5	1.5	84.8	NA	2022-02-11	2022-02-11	2022-02-11	2022-02-11	NA	TRUE	FALSE	FALSE
12	1	2022-02-11	00:00:00	1.8	1.5	1.5	132.3	NA	2022-02-11	2022-02-11	2022-02-11	2022-02-11	NA	TRUE	FALSE	FALSE
13	1	2022-03-25	42 days	1.8	1.5	1.5	84.1	NA	2022-02-11	2022-02-11	2022-02-11	2022-03-25	NA	TRUE	FALSE	FALSE
14	1	2022-03-25	42 days	1.8	1.5	1.5	105.7	NA	2022-02-11	2022-02-11	2022-02-11	2022-03-25	NA	TRUE	FALSE	FALSE
15	1	2022-03-25	00:00:00	1.5	1.5	1.5	105.7	NA	2022-03-25	2022-02-11	2022-02-11	2022-03-25	NA	FALSE	FALSE	FALSE

Plot the phenotype for a random patient

For demonstration purposes, let us pick an “interesting” patient. By “interesting” I just mean a patient who at different times was TRUE and FALSE in each of the three criteria. A patient with variability in their timeline.

# Find an "interesting" case to plot: patients with each criteria at times TRUE, at times FALSE
patients <- scr_change |>
  group_by(pid) |>
  summarise(.groups = 'drop',
    n_a = n_distinct(scr_case_a),
    n_b = n_distinct(scr_case_b),
    n_c = n_distinct(gfr_case_c)) |>
  mutate(sort_variable = n_a + n_b + n_c) |>
  select(pid, sort_variable) |>
  collect()

patients <- patients |>
  arrange(desc(sort_variable)) |>
  pull('pid')

patients <- patients[1:20]

random_patient <- sample(patients, 1)

message('Sampled patient: ', random_patient)
#> Sampled patient: 729

Then we plot all data for the chosen patient (pid = 729).

scr_change |>
  select(-ends_with('datetime')) |>
  mutate(
    scr_case_a_or_b = scr_case_a | scr_case_b) |>
  phea_plot(random_patient)
#> Collecting lazy table, done. (turn this message off with `verbose` or `.verbose` in setup_phea())

At the end of this report I include a large table with all the data for this patient, directly from the record sources, for maximum verification.

Obtain the SQL query that computes the phenotype

To see the SQL query underlying the phenotype, use helper function code_shot(), or dbplyr::sql_render(), or the .clip_sql option in calculate_formula().

code_shot(scr_change)

SELECT
  *,
  scr_value_as_number - scr_48h_min_value_as_number >= 0.3 AS "scr_case_a",
  scr_value_as_number / scr_7d_min_value_as_number >= 1.5 AS "scr_case_b",
  gfr_value_as_number < 60 AND gfr_prior_value_as_number < 60 AS "gfr_case_c"
FROM (
  SELECT
    "phea_row_id",
    "pid",
    "ts",
    greatest("scr_ts", "scr_48h_min_ts", "scr_7d_min_ts", "gfr_ts", "gfr_prior_ts") - least("scr_ts", "scr_48h_min_ts", "scr_7d_min_ts", "gfr_ts", "gfr_prior_ts") AS "window",
    "scr_value_as_number",
    "scr_48h_min_value_as_number",
    "scr_7d_min_value_as_number",
    "gfr_value_as_number",
    "gfr_prior_value_as_number",
    "scr_measurement_datetime",
    "scr_48h_min_measurement_datetime",
    "scr_7d_min_measurement_datetime",
    "gfr_measurement_datetime",
    "gfr_prior_measurement_datetime"
  FROM (
    SELECT
      row_number() OVER (ORDER BY "pid", "ts") AS "phea_row_id",
      "pid",
      "ts",
      phea_last_value_ignore_nulls(case when "name" = 236253 then "measurement_datetime" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "scr_measurement_datetime",
      phea_last_value_ignore_nulls(case when "name" = 236253 then "value_as_number" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "scr_value_as_number",
      phea_last_value_ignore_nulls(case when "name" = 236253 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "scr_ts",
      min(case when "name" = 236253 then "measurement_datetime" else null end) over (partition by "pid" order by "ts" range between '48 days'::interval preceding and current row) AS "scr_48h_min_measurement_datetime",
      min(case when "name" = 236253 then "value_as_number" else null end) over (partition by "pid" order by "ts" range between '48 days'::interval preceding and current row) AS "scr_48h_min_value_as_number",
      phea_last_value_ignore_nulls(case when "name" = 236253 then "ts" else null end) over (partition by "pid" order by "ts" range between '48 days'::interval preceding and current row) AS "scr_48h_min_ts",
      min(case when "name" = 236253 then "measurement_datetime" else null end) over (partition by "pid" order by "ts" range between '7 months'::interval preceding and current row) AS "scr_7d_min_measurement_datetime",
      min(case when "name" = 236253 then "value_as_number" else null end) over (partition by "pid" order by "ts" range between '7 months'::interval preceding and current row) AS "scr_7d_min_value_as_number",
      phea_last_value_ignore_nulls(case when "name" = 236253 then "ts" else null end) over (partition by "pid" order by "ts" range between '7 months'::interval preceding and current row) AS "scr_7d_min_ts",
      phea_last_value_ignore_nulls(case when "name" = 666684 then "measurement_datetime" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "gfr_measurement_datetime",
      phea_last_value_ignore_nulls(case when "name" = 666684 then "value_as_number" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "gfr_value_as_number",
      phea_last_value_ignore_nulls(case when "name" = 666684 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "gfr_ts",
      min(case when "name" = 666684 then "measurement_datetime" else null end) over (partition by "pid" order by "ts" range between '5 months'::interval preceding and '3 months'::interval preceding) AS "gfr_prior_measurement_datetime",
      min(case when "name" = 666684 then "value_as_number" else null end) over (partition by "pid" order by "ts" range between '5 months'::interval preceding and '3 months'::interval preceding) AS "gfr_prior_value_as_number",
      phea_last_value_ignore_nulls(case when "name" = 666684 then "ts" else null end) over (partition by "pid" order by "ts" range between '5 months'::interval preceding and '3 months'::interval preceding) AS "gfr_prior_ts"
    FROM (
      (
        SELECT
          236253 AS "name",
          "person_id" AS "pid",
          "measurement_datetime" AS "ts",
          "measurement_datetime",
          "value_as_number"
        FROM (
          (
            SELECT *
            FROM "cdm_new_york3"."measurement"
            WHERE ("measurement_concept_id" IN (3051825.0, 3016723.0) AND "unit_source_value" = 'mg/dL')
          )
          UNION ALL
          (
            SELECT
              "measurement_id",
              "person_id",
              "measurement_concept_id",
              "measurement_date",
              "measurement_datetime",
              "measurement_time",
              "measurement_type_concept_id",
              "operator_concept_id",
              "value_as_number" / 88.42 AS "value_as_number",
              "value_as_concept_id",
              "unit_concept_id",
              "range_low",
              "range_high",
              "provider_id",
              "visit_occurrence_id",
              "visit_detail_id",
              "measurement_source_value",
              "measurement_source_concept_id",
              'mg/dL' AS "unit_source_value",
              "unit_source_concept_id",
              "value_source_value",
              "measurement_event_id",
              "meas_event_field_concept_id"
            FROM "cdm_new_york3"."measurement"
            WHERE ("measurement_concept_id" IN (3051825.0, 3016723.0) AND "unit_source_value" = 'µmol/L')
          )
        ) "q01"
      )
      UNION ALL
      (
        SELECT
          666684 AS "name",
          "person_id" AS "pid",
          "measurement_datetime" AS "ts",
          "measurement_datetime",
          "value_as_number"
        FROM "cdm_new_york3"."measurement"
        WHERE ("measurement_concept_id" = 46236952.0)
      )
    ) "q02"
  ) "q03"
) "q04"

See the raw data that went into `calculate_formula()`

Sometimes the best way to check a result is to manually, painstakingly go over the data. Below I print all rows from the record sources for the patient that was plotted above.

Serum creatinine records

scr_records |>
  filter(person_id == random_patient) |>
  collect() |>
  arrange(measurement_datetime) |>
  select(measurement_id, person_id, measurement_datetime, value_as_number, unit_source_value) |>
  kable()

measurement_id	person_id	measurement_datetime	value_as_number	unit_source_value
291341	729	2003-06-08	1.3	mg/dL
292202	729	2004-06-13	1.4	mg/dL
292012	729	2005-06-19	1.3	mg/dL
292605	729	2006-04-30	1.5	mg/dL
292619	729	2006-04-30	2.1	mg/dL
291412	729	2006-06-25	1.6	mg/dL
291427	729	2006-06-25	2.1	mg/dL
292230	729	2007-07-01	2.1	mg/dL
291480	729	2008-07-06	1.9	mg/dL
292272	729	2008-08-17	1.9	mg/dL
292289	729	2009-07-12	1.9	mg/dL
292662	729	2010-03-14	3.5	mg/dL
291377	729	2010-07-18	2.8	mg/dL
292542	729	2011-07-24	2.2	mg/dL
291722	729	2012-04-01	2.5	mg/dL
292160	729	2012-07-29	4.0	mg/dL
291992	729	2012-08-12	3.2	mg/dL
292444	729	2013-08-04	3.1	mg/dL
292643	729	2013-11-24	3.4	mg/dL
292328	729	2014-05-18	3.0	mg/dL
291836	729	2014-08-10	2.4	mg/dL
291907	729	2015-08-16	2.6	mg/dL
291605	729	2015-09-13	2.0	mg/dL
292345	729	2016-08-21	2.9	mg/dL
292477	729	2016-10-09	2.9	mg/dL
291877	729	2016-11-06	2.4	mg/dL
292389	729	2017-08-27	2.3	mg/dL
291943	729	2017-12-03	6.9	mg/dL
292679	729	2018-01-28	4.3	mg/dL
291918	729	2018-07-29	4.8	mg/dL
292568	729	2018-08-26	5.1	mg/dL
292502	729	2018-09-09	4.1	mg/dL
291636	729	2018-09-30	3.9	mg/dL
291495	729	2018-12-23	5.2	mg/dL
291822	729	2019-01-27	4.4	mg/dL
292140	729	2019-07-21	5.2	mg/dL
292096	729	2019-09-15	5.8	mg/dL
291687	729	2019-12-22	6.9	mg/dL
291588	729	2020-02-16	4.2	mg/dL
291801	729	2020-06-21	4.9	mg/dL
291732	729	2020-09-20	4.9	mg/dL
291447	729	2020-11-15	6.1	mg/dL
291516	729	2021-03-14	4.9	mg/dL
291559	729	2021-04-11	6.8	mg/dL
292372	729	2021-07-11	4.3	mg/dL
292419	729	2021-09-26	4.1	mg/dL
292031	729	2021-10-10	4.8	mg/dL
291759	729	2022-01-16	2.9	mg/dL
291779	729	2022-01-16	4.4	mg/dL
291533	729	2022-02-06	4.6	mg/dL
292521	729	2022-07-10	4.6	mg/dL
291964	729	2022-10-02	5.3	mg/dL

Glomerular filtration rate records

gfr_records |>
  filter(person_id == random_patient) |>
  collect() |>
  arrange(measurement_datetime) |>
  select(measurement_id, person_id, measurement_datetime, value_as_number, unit_source_value) |>
  kable()

measurement_id	person_id	measurement_datetime	value_as_number	unit_source_value
291362	729	2003-06-08	128.3	mL/min/{1.73_m2}
292187	729	2004-06-13	93.7	mL/min/{1.73_m2}
292026	729	2005-06-19	159.2	mL/min/{1.73_m2}
292599	729	2006-04-30	121.2	mL/min
292608	729	2006-04-30	63.1	mL/min/{1.73_m2}
291406	729	2006-06-25	146.8	mL/min
291416	729	2006-06-25	63.4	mL/min/{1.73_m2}
292216	729	2007-07-01	62.6	mL/min/{1.73_m2}
291482	729	2008-07-06	70.6	mL/min/{1.73_m2}
292271	729	2008-08-17	70.7	mL/min/{1.73_m2}
292304	729	2009-07-12	70.0	mL/min/{1.73_m2}
292665	729	2010-03-14	38.0	mL/min/{1.73_m2}
291366	729	2010-07-18	47.2	mL/min/{1.73_m2}
292546	729	2011-07-24	58.8	mL/min/{1.73_m2}
291702	729	2012-04-01	47.1	mL/min/{1.73_m2}
292166	729	2012-07-29	30.0	mL/min/{1.73_m2}
292003	729	2012-08-12	36.8	mL/min/{1.73_m2}
292435	729	2013-08-04	38.7	mL/min/{1.73_m2}
292634	729	2013-11-24	34.9	mL/min/{1.73_m2}
292317	729	2014-05-18	38.4	mL/min/{1.73_m2}
291847	729	2014-08-10	49.7	mL/min/{1.73_m2}
291893	729	2015-08-16	44.3	mL/min/{1.73_m2}
291620	729	2015-09-13	57.5	mL/min/{1.73_m2}
292360	729	2016-08-21	39.0	mL/min/{1.73_m2}
292488	729	2016-10-09	40.1	mL/min/{1.73_m2}
291881	729	2016-11-06	47.8	mL/min/{1.73_m2}
292393	729	2017-08-27	48.6	mL/min/{1.73_m2}
291949	729	2017-12-03	16.5	mL/min/{1.73_m2}
292690	729	2018-01-28	26.2	mL/min/{1.73_m2}
291929	729	2018-07-29	23.4	mL/min/{1.73_m2}
292578	729	2018-08-26	22.2	mL/min/{1.73_m2}
292509	729	2018-09-09	27.2	mL/min/{1.73_m2}
291632	729	2018-09-30	28.7	mL/min/{1.73_m2}
291485	729	2018-12-23	21.7	mL/min/{1.73_m2}
291833	729	2019-01-27	25.0	mL/min/{1.73_m2}
292150	729	2019-07-21	21.3	mL/min/{1.73_m2}
292109	729	2019-09-15	19.2	mL/min/{1.73_m2}
291698	729	2019-12-22	16.0	mL/min/{1.73_m2}
291592	729	2020-02-16	25.8	mL/min/{1.73_m2}
291797	729	2020-06-21	22.3	mL/min/{1.73_m2}
291743	729	2020-09-20	22.6	mL/min/{1.73_m2}
291458	729	2020-11-15	17.9	mL/min/{1.73_m2}
291525	729	2021-03-14	22.1	mL/min/{1.73_m2}
291570	729	2021-04-11	15.9	mL/min/{1.73_m2}
292382	729	2021-07-11	25.0	mL/min/{1.73_m2}
292425	729	2021-09-26	26.6	mL/min/{1.73_m2}
292047	729	2021-10-10	22.6	mL/min/{1.73_m2}
291768	729	2022-01-16	24.3	mL/min/{1.73_m2}
291788	729	2022-01-16	89.6	mL/min
291530	729	2022-02-06	23.4	mL/min/{1.73_m2}
292532	729	2022-07-10	23.5	mL/min/{1.73_m2}
291975	729	2022-10-02	20.1	mL/min/{1.73_m2}

Produce events for Atlas

One approach to use Phea’s results inside Atlas is to produce novel records (e.g. new rows in OBSERVATION or MEASUREMENT) using custom concept IDs, then ETL those back into the dataset.

Below I illustrate how to do it with case A, assuming the new records will go into MEASUREMENT. Notice I put the difference inside value_as_number, just in case.

case_a_custom_concept_id <- 2000000001
# Case A: SCr increase by >=0.3 over 48 hours
data_for_etl <- scr_change |>
  filter(scr_case_a) |> # Keep only rows where case A was TRUE
  transmute(
    measurement_datetime = ts,
    person_id = pid,
    value_as_number = scr_value_as_number - scr_48h_min_value_as_number,
    measurement_concept_id = case_a_custom_concept_id)

head_shot(data_for_etl) |>
  kable()

measurement_datetime	person_id	value_as_number	measurement_concept_id
2022-02-11	1	0.3	2000000001
2022-02-11	1	0.3	2000000001
2022-02-11	1	0.3	2000000001
2022-03-25	1	0.3	2000000001
2022-03-25	1	0.3	2000000001
2016-01-29	3	0.5	2000000001
2016-11-11	3	1.7	2000000001
2018-02-09	3	0.7	2000000001
2022-03-04	3	1.5	2000000001
2006-09-24	19	0.7	2000000001

Author contact

Fabrício Kury – Please be always welcome to reach me at fab@kury.dev.