Calculating the ASCVD Risk Estimator Plus score

This vignette assumes a SQL server at localhost (we use PostgreSQL), with data in OMOP Common Data Model v5.4 format in schema cdm_new_york3. The patient records shown in this example are synthetic data from Synthea^TM Patient Generator.

The Atherosclerotic Cardiovascular Disease (ASCVD) Events Risk Estimator+ (ASCVD Risk Estimator Plus, or ASCVD+ for short) is a score developed by the American College of Cardiology/American Heart Association Task Force on Practice Guidelines for estimating the 10-year risk of said events. It was published in Nov. 12, 2013 on the journal Circulation. See https://doi.org/10.1161/01.cir.0000437741.48606.98.

To get a sense of what data goes into the score, see the online calculator provided by the American College of Cardiology:

The mathematical formula for the score is a bit convoluted. I discuss the formula further down in this vignette.

To compute this phenotype, we first prepare the components, then the formula parts, then we put it all together in a call to calculate_formula().

Components

Systolic Blood Pressure (mm Hg)

Loinc 8480-6 Systolic blood pressure
OMOP CDM concept ID 3004249

sbp <- sqlt(measurement) |>
  filter(measurement_concept_id == 3004249) |>
  make_component(
    .pid = person_id,
    .ts = measurement_datetime)

Diastolic Blood Pressure (mm Hg)

Loinc 8462-4 Diastolic blood pressure
OMOP CDM concept ID 3012888

dbp <- sqlt(measurement) |>
  filter(measurement_concept_id == 3012888) |>
  make_component(
    .pid = person_id,
    .ts = measurement_datetime)

Total Cholesterol (mg/dL)

Loinc 2093-3 Cholesterol [Mass/volume] in Serum or Plasma
OMOP CDM concept ID 3027114

tc <- sqlt(measurement) |>
  filter(measurement_concept_id == 3027114) |>
  make_component(
    .pid = person_id,
    .ts = measurement_datetime)

HDL Cholesterol (mg/dL)

Loinc 2085-9 Cholesterol in HDL [Mass/volume] in Serum or Plasma
OMOP CDM concept ID 3007070

hdl <- sqlt(measurement) |>
  filter(measurement_concept_id == 3007070) |>
  make_component(
    .pid = person_id,
    .ts = measurement_datetime)

LDL Cholesterol (mg/dL)

[Not required]
Loinc 18262-6 Cholesterol in LDL [Mass/volume] in Serum or Plasma by Direct assay
OMOP CDM concept ID 3009966

ldl <- sqlt(measurement) |>
  filter(measurement_concept_id == 3009966) |>
  make_component(
    .pid = person_id,
    .ts = measurement_datetime)

History of Diabetes?

SNOMED 44054006 Type 2 diabetes mellitus
OMOP CDM concept ID 201826

diabetes <- sqlt(condition_occurrence) |>
  filter(condition_concept_id == 201826) |>
  make_component(
    .pid = person_id,
    .ts = condition_start_datetime)

Smoker?

SNOMED 449868002 Smokes tobacco daily
OMOP CDM concept ID 42709996

smoker <- sqlt(observation) |>
  filter(observation_concept_id %in% c(42709996)) |>
  make_component(
    .pid = person_id,
    .ts = observation_datetime)

On Hypertension Treatment?

ATC codes:

C02 - Antihypertensives
C03 - Diuretics
C07 - Beta blocking agents
C08 - Calcium channel blockers and
C09 - Agents acting on the renin-angiotensin system.

atc_concepts <- sqlt(concept) |>
  filter(vocabulary_id == 'ATC' &&
      concept_code %in% c('C02', 'C03', 'C07', 'C08', 'C09')) |>
  select(concept_id)

drug_concepts <- sqlt(concept_ancestor) |>
  inner_join(atc_concepts, by = c('ancestor_concept_id' = 'concept_id')) |>
  select(descendant_concept_id)

htt <- sqlt(drug_exposure) |>
  inner_join(drug_concepts, by = c('drug_concept_id' = 'descendant_concept_id')) |>
  make_component(
    .pid = person_id,
    .ts = drug_exposure_start_datetime)

On a Statin?

[Not required]

On Aspirin Therapy?

[Not required]

Patient age

person <- sqlt(person) |>
  make_component(
    .pid = person_id,
    .ts = birth_datetime)

Formula

The ASCVD Risk Estimator+ formula can be a bit confusing at first, because of a misprint in the paper. Here is how the paper defined it: There are three main parts here:

The individual sum (“IndX’B”)
The survival baseline (“S₁₀”)
The group mean (“_Mean_X’B”)

The truth is, the formula written over there is wrong. Credits to the authors, nevertheless, for including in the text careful demonstrations of the use of the formula, including a walk through each calculation. By going through the numbers and examples in the paper, and calculating by hand, one can realize that the ASCVD+ formula actually is:

estimated_risk = (1-b) * exp(a-c)

We prepare each part a, b and c separately, then assemble them together into a phentype using calculate_formula().

The individual sum (a) comes from multiplying the values of the components by their coefficients as published in Table A. of Appendix 7:

Notice that the coefficients have 4 variations according to whether the patient is:

female yes/no
black yes/no

We use CASE WHEN ... to construct the formula according to these 4 groups.

Some coefficients are N/A for some groups, meaning the formula loses that component. For example, the term ln(age)*ln(total cholesterol) is not used for Blacks.

The paper also is clear that, in regards to race, the question really is of type Boolean yes/no. As consequence, Asians and Latinos are computed using the coefficients of Whites.

Individual sum

Let us first compute the individual sum using the coeficients according to each of the 4 groups.

# WHITE WOMEN
white_women_sum <- paste(sep = ' + ',
  # Ln Age (y) –29.799
  '-29.799 * ln(age)',
  
  # Ln Age, Squared 4.884
  '4.884 * ln(age)^2',
  
  # Ln Total Cholesterol (mg/dL) 13.540
  '13.540 * ln(tc_value_as_number)',

  # Ln Age × Ln Total Cholesterol –3.114
  '-3.114 * ln(age) * ln(tc_value_as_number)',
  
  # Ln HDL-C (mg/dL) –13.578
  '-13.578 * ln(hdl_value_as_number)',
  
  # Ln Age × Ln HDL-C 3.149
  '3.149 * ln(age) * ln(hdl_value_as_number)',
  
  # Ln Treated Systolic BP (mm Hg) 2.019
  'case when htt is true then
    2.019 * ln(sbp_value_as_number) else 0 end',
  
  ### # Ln Age × Ln Treated Systolic BP
  # N/A
  
  # Ln Untreated Systolic BP (mm Hg) 1.957
  'case when htt is false then
    1.957 * ln(sbp_value_as_number) else 0 end',
  
  ### # Ln Age × Ln Untreated Systolic BP
  # N/A
  
  # Current Smoker (1=Yes, 0=No) 7.574
  '7.574 * case when is_smoker then 1 else 0 end',
  
  # Ln Age × Current Smoker –1.665
  '-1.665 * ln(age) * case when is_smoker then 1 else 0 end',
 
  # Diabetes (1=Yes, 0=No) 0.661 
  '0.661 * case when has_diabetes then 1 else 0 end'
)

# BLACK WOMEN
black_women_sum <- paste(sep = ' + ',
  # Ln Age (y) 17.114
  '17.114 * ln(age)',
  
  # Ln Age, Squared
  # N/A
  
  # Ln Total Cholesterol (mg/dL) 0.940
  '0.940 * ln(tc_value_as_number)',
  
  # Ln Age × Ln Total Cholesterol
  # N/A
  
  # Ln HDL-C (mg/dL) –18.920
  '-18.920 * ln(hdl_value_as_number)',
  
  # Ln Age × Ln HDL-C 4.475
  '4.475 * ln(age) * ln(hdl_value_as_number)',
  
  # Ln Treated Systolic BP (mm Hg) 29.291
  'case when htt is true then
    29.291 * ln(sbp_value_as_number) else 0 end',
  
  # Ln Age × Ln Treated Systolic BP –6.432
  'case when htt is true then
    -6.432 * ln(age) * ln(sbp_value_as_number) else 0 end',
  
  # Ln Untreated Systolic BP (mm Hg) 27.820
  'case when htt is false then
    27.820 * ln(sbp_value_as_number) else 0 end',
  
  # Ln Age × Ln Untreated Systolic BP –6.087
  'case when htt is false then
    -6.087 * ln(age) * ln(sbp_value_as_number) else 0 end',
  
  # Current Smoker (1=Yes, 0=No) 0.691
  '0.691 * case when is_smoker then 1 else 0 end',
  
  # Ln Age × Current Smoker
  # N/A
  
  # Diabetes (1=Yes, 0=No) 0.874
  '0.874 * case when has_diabetes then 1 else 0 end'
)

# WHITE MEN
white_men_sum <- paste(sep = ' + ',
  # Ln Age (y) 12.344
  '12.344 * ln(age)',
  
  # Ln Total Cholesterol (mg/dL) 11.853
  '11.853 * ln(tc_value_as_number)',
  
  # Ln Age × Ln Total Cholesterol –2.664
  '-2.664 * ln(age) * ln(tc_value_as_number)',
  
  # Ln HDL-C (mg/dL) –7.990
  '-7.990 * ln(hdl_value_as_number)',
  
  # Ln Age × Ln HDL-C 1.769
  '1.769 * ln(age) * ln(hdl_value_as_number)',
  
  # Ln Treated Systolic BP (mm Hg) 1.797
  'case when htt is true then 
    1.797 * ln(sbp_value_as_number) else 0 end',
  
  # Ln Untreated Systolic BP (mm Hg) 1.764
  'case when htt is false then
    1.764 * ln(sbp_value_as_number) else 0 end',
  
  # Current Smoker (1=Yes, 0=No) 7.837
  '7.837 * case when is_smoker then 1 else 0 end',
  
  # Ln Age × Current Smoker –1.795
  '-1.795 * ln(age) * case when is_smoker then 1 else 0 end',
  
  # Diabetes (1=Yes, 0=No) 0.658
  '0.658 * case when has_diabetes then 1 else 0 end'
)

# BLACK MEN
black_men_sum <- paste(sep = ' + ',
  # Ln Age (y) 2.469
  '2.469 * ln(age)',
  
  # Ln Total Cholesterol (mg/dL) 0.30
  '0.30 * ln(tc_value_as_number)',
  
  # Ln Age × Ln Total Cholesterol
  # N/A
  
  # Ln HDL-C (mg/dL) –0.307
  '-0.307 * ln(hdl_value_as_number)',
  
  # Ln Age × Ln HDL-C
  # N/A
  
  # Ln Treated Systolic BP (mm Hg) 1.916
  'case when htt is true then
    1.916 * ln(sbp_value_as_number) else 0 end',
  
  # Ln Untreated Systolic BP (mm Hg) 1.809
  'case when htt is false then
    1.809 * ln(sbp_value_as_number) else 0 end',
  
  # Current Smoker (1=Yes, 0=No) 0.549
  '0.549 * case when is_smoker then 1 else 0 end',
  
  # Ln Age × Current Smoker
  # N/A
  
  # Diabetes (1=Yes, 0=No) 0.645
  '0.645 * case when has_diabetes then 1 else 0 end'
)

Formula parts

To facilitate reading and using the code, we assemble parts of the phenotype in separate formulas.

individual_sum <- paste0(
  'case when is_woman then (
    case when is_black then (', black_women_sum, ')
    else (', white_women_sum, ')
    end)
  else (
    case when is_black then (', black_men_sum, ')
    else (', white_men_sum, ')
    end)
  end')

baseline_survival <- 
  'case when is_woman then (
    case when is_black then 0.9533
    else 0.9665
    end)
  else (
    case when is_black then 0.8954
    else 0.9144
    end)
  end'

group_mean <-
  'case when is_woman then (
    case when is_black then 86.61
    else -29.18
    end)
  else (
    case when is_black then 19.54
    else 61.18
    end)
  end'

Calculate formula

Armed with the components and formula parts, we put it all together in calculate_formula().

ascvd = calculate_formula(
  components = list(
    sbp = sbp,
    # dbp = dbp,
    tc = tc,
    hdl = hdl,
    # ldl = ldl,
    diabetes = diabetes,
    smoker = smoker,
    htt = htt,
    person = person),
  export = c(
    'diabetes_condition_concept_id',
    'htt_drug_concept_id'),
  fml = list(
    a = list(
      is_woman = 'person_gender_concept_id = 8532',
      
      is_black = 'person_race_concept_id = 8516',
      
      age = 'extract(year from age(ts, person_birth_datetime))',
      
      is_smoker = 'smoker_observation_datetime is not null',
      
      has_diabetes = 'diabetes_condition_start_datetime is not null',
      
      # Calculate difference as positive number, regardless of which came first
      htt_sbp_dist = "case when htt_drug_exposure_start_datetime < sbp_measurement_datetime
        then sbp_measurement_datetime - htt_drug_exposure_start_datetime
        else htt_drug_exposure_start_datetime - sbp_measurement_datetime end"
    ),
    
    b = list(
      baseline_survival = baseline_survival,
      
      group_mean = group_mean,
      # SBP and HTT within 4 months of each other
      htt = "htt_drug_exposure_start_datetime is not null
        and htt_sbp_dist < '4 months'"
    ),

    individual_sum = individual_sum,
    
    calculated_risk = '(1 - baseline_survival) * exp(individual_sum - group_mean)',
    
    # Cap to 1%-30%, as indicated by the paper
    estimated_risk = 'least(0.3, greatest(0.01, calculated_risk))'
    ),
  
  dont_require = c('smoker', 'htt', 'diabetes'),
  
  out_window = c('person', 'diabetes'))

Results

Let us peek at 20 rows from the phenotype.

ascvd |>
  head_shot(20, blind = TRUE) |>
  kable()

phea_row_id	pid	ts	window	person_gender_concept_id	person_race_concept_id	person_birth_datetime	smoker_observation_datetime	diabetes_condition_start_datetime	htt_drug_exposure_start_datetime	sbp_measurement_datetime	tc_value_as_number	hdl_value_as_number	sbp_value_as_number	diabetes_condition_concept_id	htt_drug_concept_id	is_woman	is_black	age	is_smoker	has_diabetes	htt_sbp_dist	baseline_survival	group_mean	htt	individual_sum	calculated_risk	estimated_risk
4	1	2005-05-13	00:00:00	8507	8527	1974-03-01	NA	NA	NA	2005-05-13	199.0	79.1	121	NA	NA	FALSE	FALSE	31	FALSE	FALSE	NA	0.9144	61.18	FALSE	56.79543	0.0010673	0.0100000
5	1	2006-05-12	364 days	8507	8527	1974-03-01	NA	NA	NA	2006-05-12	199.0	79.1	121	NA	NA	FALSE	FALSE	32	FALSE	FALSE	NA	0.9144	61.18	FALSE	56.98511	0.0012902	0.0100000
6	1	2008-05-16	1099 days	8507	8527	1974-03-01	NA	NA	NA	2008-05-16	199.0	79.1	117	NA	NA	FALSE	FALSE	34	FALSE	FALSE	NA	0.9144	61.18	FALSE	57.28800	0.0017466	0.0100000
7	1	2008-05-16	1099 days	8507	8527	1974-03-01	NA	NA	NA	2008-05-16	182.2	79.1	117	NA	NA	FALSE	FALSE	34	FALSE	FALSE	NA	0.9144	61.18	FALSE	57.07114	0.0014061	0.0100000
8	1	2008-05-16	00:00:00	8507	8527	1974-03-01	NA	NA	NA	2008-05-16	182.2	73.8	117	NA	NA	FALSE	FALSE	34	FALSE	FALSE	NA	0.9144	61.18	FALSE	57.19264	0.0015878	0.0100000
9	1	2011-05-20	1099 days	8507	8527	1974-03-01	NA	NA	NA	2008-05-16	182.2	60.7	117	NA	NA	FALSE	FALSE	37	FALSE	FALSE	NA	0.9144	61.18	FALSE	58.02043	0.0036332	0.0100000
10	1	2011-05-20	1099 days	8507	8527	1974-03-01	NA	2011-05-20	NA	2008-05-16	182.2	60.7	117	201826	NA	FALSE	FALSE	37	FALSE	TRUE	NA	0.9144	61.18	FALSE	58.67843	0.0070154	0.0100000
11	1	2011-05-20	1099 days	8507	8527	1974-03-01	NA	2011-05-20	NA	2008-05-16	183.1	60.7	117	201826	NA	FALSE	FALSE	37	FALSE	TRUE	NA	0.9144	61.18	FALSE	58.68943	0.0070931	0.0100000
12	1	2011-05-20	00:00:00	8507	8527	1974-03-01	NA	2011-05-20	NA	2011-05-20	183.1	60.7	134	201826	NA	FALSE	FALSE	37	FALSE	TRUE	NA	0.9144	61.18	FALSE	58.92875	0.0090109	0.0100000
13	1	2014-03-07	1022 days	8507	8527	1974-03-01	NA	2011-05-20	NA	2014-03-07	183.1	60.7	108	201826	NA	FALSE	FALSE	40	FALSE	TRUE	NA	0.9144	61.18	FALSE	58.99479	0.0096261	0.0100000
14	1	2014-03-07	1022 days	8507	8527	1974-03-01	NA	2011-05-20	NA	2014-03-07	209.9	60.7	108	201826	NA	FALSE	FALSE	40	FALSE	TRUE	NA	0.9144	61.18	FALSE	59.27152	0.0126949	0.0126949
15	1	2014-03-07	00:00:00	8507	8527	1974-03-01	NA	2011-05-20	NA	2014-03-07	209.9	43.5	108	201826	NA	FALSE	FALSE	40	FALSE	TRUE	NA	0.9144	61.18	FALSE	59.75942	0.0206787	0.0206787
16	1	2016-03-11	735 days	8507	8527	1974-03-01	NA	2011-05-20	NA	2014-03-07	209.9	57.3	108	201826	NA	FALSE	FALSE	42	FALSE	TRUE	NA	0.9144	61.18	FALSE	59.61266	0.0178562	0.0178562
17	1	2016-03-11	735 days	8507	8527	1974-03-01	NA	2011-05-20	NA	2014-03-07	224.1	57.3	108	201826	NA	FALSE	FALSE	42	FALSE	TRUE	NA	0.9144	61.18	FALSE	59.73677	0.0202156	0.0202156
18	1	2016-03-11	00:00:00	8507	8527	1974-03-01	NA	2011-05-20	NA	2016-03-11	224.1	57.3	123	201826	NA	FALSE	FALSE	42	FALSE	TRUE	NA	0.9144	61.18	FALSE	59.96618	0.0254284	0.0254284
19	1	2018-03-16	735 days	8507	8527	1974-03-01	NA	2011-05-20	NA	2018-03-16	224.1	57.3	111	201826	NA	FALSE	FALSE	44	FALSE	TRUE	NA	0.9144	61.18	FALSE	60.02178	0.0268821	0.0268821
20	1	2018-03-16	735 days	8507	8527	1974-03-01	NA	2011-05-20	NA	2018-03-16	221.5	57.3	111	201826	NA	FALSE	FALSE	44	FALSE	TRUE	NA	0.9144	61.18	FALSE	60.00110	0.0263320	0.0263320
21	1	2018-03-16	00:00:00	8507	8527	1974-03-01	NA	2011-05-20	NA	2018-03-16	221.5	52.7	111	201826	NA	FALSE	FALSE	44	FALSE	TRUE	NA	0.9144	61.18	FALSE	60.10953	0.0293479	0.0293479
22	1	2020-03-20	735 days	8507	8527	1974-03-01	NA	2011-05-20	NA	2018-03-16	221.5	55.5	111	201826	NA	FALSE	FALSE	46	FALSE	TRUE	NA	0.9144	61.18	FALSE	60.26748	0.0343694	0.0343694
23	1	2020-03-20	735 days	8507	8527	1974-03-01	NA	2011-05-20	NA	2018-03-16	205.0	55.5	111	201826	NA	FALSE	FALSE	46	FALSE	TRUE	NA	0.9144	61.18	FALSE	60.13948	0.0302400	0.0302400

Plots

To make a visually interesting plot, let us pick a patient at random, but from among those that have the most data points.
To do that we use function sqla(), which allows you to run arbitrary SQL on lazy tables from the dbplyr/dplyr interface.

patients_with_most_data <- list(ascvd = ascvd) |>
  sqla('
    select pid, count(*) as rows
    from ascvd
    group by pid
    order by rows desc
    limit 20
    ') |>
  pull('pid')

From the 20 patients_with_most_data, we pick one at random.

random_patient <- sample(patients_with_most_data, 1)

message('Sampled patient: ', random_patient)
#> Sampled patient: 7770

And then we plot the chosen patient.

ascvd |> select(
  # Let's NOT plot the columns below
    -ends_with('datetime'),
    -ends_with('concept_id'),
    -htt_sbp_dist,
    -group_mean,
    -individual_sum,
    -calculated_risk,
    -baseline_survival) |>
  phea_plot(random_patient, verbose = FALSE)

Obtain the SQL query that computes the phenotype

To see the SQL query that computes the phenotype, use helper function code_shot(), or dbplyr::sql_render(), or the .clip_sql option in calculate_formula().

code_shot(ascvd)

SELECT *, least(0.3, greatest(0.01, calculated_risk)) AS "estimated_risk"
FROM (
  SELECT
    *,
    (1 - baseline_survival) * exp(individual_sum - group_mean) AS "calculated_risk"
  FROM (
    SELECT
      *,
      case when is_woman then (
    case when is_black then (17.114 * ln(age) + 0.940 * ln(tc_value_as_number) + -18.920 * ln(hdl_value_as_number) + 4.475 * ln(age) * ln(hdl_value_as_number) + case when htt is true then
    29.291 * ln(sbp_value_as_number) else 0 end + case when htt is true then
    -6.432 * ln(age) * ln(sbp_value_as_number) else 0 end + case when htt is false then
    27.820 * ln(sbp_value_as_number) else 0 end + case when htt is false then
    -6.087 * ln(age) * ln(sbp_value_as_number) else 0 end + 0.691 * case when is_smoker then 1 else 0 end + 0.874 * case when has_diabetes then 1 else 0 end)
    else (-29.799 * ln(age) + 4.884 * ln(age)^2 + 13.540 * ln(tc_value_as_number) + -3.114 * ln(age) * ln(tc_value_as_number) + -13.578 * ln(hdl_value_as_number) + 3.149 * ln(age) * ln(hdl_value_as_number) + case when htt is true then
    2.019 * ln(sbp_value_as_number) else 0 end + case when htt is false then
    1.957 * ln(sbp_value_as_number) else 0 end + 7.574 * case when is_smoker then 1 else 0 end + -1.665 * ln(age) * case when is_smoker then 1 else 0 end + 0.661 * case when has_diabetes then 1 else 0 end)
    end)
  else (
    case when is_black then (2.469 * ln(age) + 0.30 * ln(tc_value_as_number) + -0.307 * ln(hdl_value_as_number) + case when htt is true then
    1.916 * ln(sbp_value_as_number) else 0 end + case when htt is false then
    1.809 * ln(sbp_value_as_number) else 0 end + 0.549 * case when is_smoker then 1 else 0 end + 0.645 * case when has_diabetes then 1 else 0 end)
    else (12.344 * ln(age) + 11.853 * ln(tc_value_as_number) + -2.664 * ln(age) * ln(tc_value_as_number) + -7.990 * ln(hdl_value_as_number) + 1.769 * ln(age) * ln(hdl_value_as_number) + case when htt is true then 
    1.797 * ln(sbp_value_as_number) else 0 end + case when htt is false then
    1.764 * ln(sbp_value_as_number) else 0 end + 7.837 * case when is_smoker then 1 else 0 end + -1.795 * ln(age) * case when is_smoker then 1 else 0 end + 0.658 * case when has_diabetes then 1 else 0 end)
    end)
  end AS "individual_sum"
    FROM (
      SELECT
        *,
        case when is_woman then (
    case when is_black then 0.9533
    else 0.9665
    end)
  else (
    case when is_black then 0.8954
    else 0.9144
    end)
  end AS "baseline_survival",
        case when is_woman then (
    case when is_black then 86.61
    else -29.18
    end)
  else (
    case when is_black then 19.54
    else 61.18
    end)
  end AS "group_mean",
        htt_drug_exposure_start_datetime is not null
        and htt_sbp_dist < '4 months' AS "htt"
      FROM (
        SELECT
          "phea_row_id",
          "pid",
          "ts",
          "window",
          "person_gender_concept_id",
          "person_race_concept_id",
          "person_birth_datetime",
          "smoker_observation_datetime",
          "diabetes_condition_start_datetime",
          "htt_drug_exposure_start_datetime",
          "sbp_measurement_datetime",
          "tc_value_as_number",
          "hdl_value_as_number",
          "sbp_value_as_number",
          "diabetes_condition_concept_id",
          "htt_drug_concept_id",
          person_gender_concept_id = 8532 AS "is_woman",
          person_race_concept_id = 8516 AS "is_black",
          extract(year from age(ts, person_birth_datetime)) AS "age",
          smoker_observation_datetime is not null AS "is_smoker",
          diabetes_condition_start_datetime is not null AS "has_diabetes",
          case when htt_drug_exposure_start_datetime < sbp_measurement_datetime
        then sbp_measurement_datetime - htt_drug_exposure_start_datetime
        else htt_drug_exposure_start_datetime - sbp_measurement_datetime end AS "htt_sbp_dist"
        FROM (
          SELECT
            *,
            greatest("sbp_ts", "tc_ts", "hdl_ts", "smoker_ts", "htt_ts") - least("sbp_ts", "tc_ts", "hdl_ts", "smoker_ts", "htt_ts") AS "window"
          FROM (
            SELECT
              row_number() OVER (ORDER BY "pid", "ts") AS "phea_row_id",
              "pid",
              "ts",
              phea_last_value_ignore_nulls(case when "name" = 932754 then "measurement_datetime" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "sbp_measurement_datetime",
              phea_last_value_ignore_nulls(case when "name" = 932754 then "value_as_number" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "sbp_value_as_number",
              phea_last_value_ignore_nulls(case when "name" = 932754 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "sbp_ts",
              phea_last_value_ignore_nulls(case when "name" = 727777 then "value_as_number" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "tc_value_as_number",
              phea_last_value_ignore_nulls(case when "name" = 727777 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "tc_ts",
              phea_last_value_ignore_nulls(case when "name" = 577443 then "value_as_number" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "hdl_value_as_number",
              phea_last_value_ignore_nulls(case when "name" = 577443 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "hdl_ts",
              phea_last_value_ignore_nulls(case when "name" = 790069 then "condition_concept_id" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "diabetes_condition_concept_id",
              phea_last_value_ignore_nulls(case when "name" = 790069 then "condition_start_datetime" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "diabetes_condition_start_datetime",
              phea_last_value_ignore_nulls(case when "name" = 790069 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "diabetes_ts",
              phea_last_value_ignore_nulls(case when "name" = 79017 then "observation_datetime" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "smoker_observation_datetime",
              phea_last_value_ignore_nulls(case when "name" = 79017 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "smoker_ts",
              phea_last_value_ignore_nulls(case when "name" = 731432 then "drug_concept_id" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "htt_drug_concept_id",
              phea_last_value_ignore_nulls(case when "name" = 731432 then "drug_exposure_start_datetime" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "htt_drug_exposure_start_datetime",
              phea_last_value_ignore_nulls(case when "name" = 731432 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "htt_ts",
              phea_last_value_ignore_nulls(case when "name" = 308404 then "gender_concept_id" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "person_gender_concept_id",
              phea_last_value_ignore_nulls(case when "name" = 308404 then "birth_datetime" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "person_birth_datetime",
              phea_last_value_ignore_nulls(case when "name" = 308404 then "race_concept_id" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "person_race_concept_id",
              phea_last_value_ignore_nulls(case when "name" = 308404 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "person_ts"
            FROM (
              (
                SELECT
                  *,
                  NULL AS "gender_concept_id",
                  NULL AS "birth_datetime",
                  NULL AS "race_concept_id"
                FROM (
                  (
                    SELECT
                      *,
                      NULL AS "drug_concept_id",
                      NULL AS "drug_exposure_start_datetime"
                    FROM (
                      (
                        SELECT *, NULL AS "observation_datetime"
                        FROM (
                          (
                            SELECT
                              *,
                              NULL AS "condition_concept_id",
                              NULL AS "condition_start_datetime"
                            FROM (
                              (
                                (
                                  SELECT
                                    932754 AS "name",
                                    "person_id" AS "pid",
                                    "measurement_datetime" AS "ts",
                                    "measurement_datetime",
                                    "value_as_number"
                                  FROM "cdm_new_york3"."measurement"
                                  WHERE ("measurement_concept_id" = 3004249.0)
                                )
                                UNION ALL
                                (
                                  SELECT
                                    "name",
                                    "pid",
                                    "ts",
                                    NULL AS "measurement_datetime",
                                    "value_as_number"
                                  FROM (
                                    SELECT
                                      727777 AS "name",
                                      "person_id" AS "pid",
                                      "measurement_datetime" AS "ts",
                                      "value_as_number"
                                    FROM "cdm_new_york3"."measurement"
                                    WHERE ("measurement_concept_id" = 3027114.0)
                                  ) "q01"
                                )
                              )
                              UNION ALL
                              (
                                SELECT
                                  "name",
                                  "pid",
                                  "ts",
                                  NULL AS "measurement_datetime",
                                  "value_as_number"
                                FROM (
                                  SELECT
                                    577443 AS "name",
                                    "person_id" AS "pid",
                                    "measurement_datetime" AS "ts",
                                    "value_as_number"
                                  FROM "cdm_new_york3"."measurement"
                                  WHERE ("measurement_concept_id" = 3007070.0)
                                ) "q02"
                              )
                            ) "q03"
                          )
                          UNION ALL
                          (
                            SELECT
                              "name",
                              "pid",
                              "ts",
                              NULL AS "measurement_datetime",
                              NULL AS "value_as_number",
                              "condition_concept_id",
                              "condition_start_datetime"
                            FROM (
                              SELECT
                                790069 AS "name",
                                "person_id" AS "pid",
                                "condition_start_datetime" AS "ts",
                                "condition_concept_id",
                                "condition_start_datetime"
                              FROM "cdm_new_york3"."condition_occurrence"
                              WHERE ("condition_concept_id" = 201826.0)
                            ) "q04"
                          )
                        ) "q05"
                      )
                      UNION ALL
                      (
                        SELECT
                          "name",
                          "pid",
                          "ts",
                          NULL AS "measurement_datetime",
                          NULL AS "value_as_number",
                          NULL AS "condition_concept_id",
                          NULL AS "condition_start_datetime",
                          "observation_datetime"
                        FROM (
                          SELECT
                            79017 AS "name",
                            "person_id" AS "pid",
                            "observation_datetime" AS "ts",
                            "observation_datetime"
                          FROM "cdm_new_york3"."observation"
                          WHERE ("observation_concept_id" IN (42709996.0))
                        ) "q06"
                      )
                    ) "q07"
                  )
                  UNION ALL
                  (
                    SELECT
                      "name",
                      "pid",
                      "ts",
                      NULL AS "measurement_datetime",
                      NULL AS "value_as_number",
                      NULL AS "condition_concept_id",
                      NULL AS "condition_start_datetime",
                      NULL AS "observation_datetime",
                      "drug_concept_id",
                      "drug_exposure_start_datetime"
                    FROM (
                      SELECT
                        731432 AS "name",
                        "person_id" AS "pid",
                        "drug_exposure_start_datetime" AS "ts",
                        "drug_concept_id",
                        "drug_exposure_start_datetime"
                      FROM (
                        SELECT
                          "drug_exposure_id",
                          "person_id",
                          "drug_concept_id",
                          "drug_exposure_start_date",
                          "drug_exposure_start_datetime",
                          "drug_exposure_end_date",
                          "drug_exposure_end_datetime",
                          "verbatim_end_date",
                          "drug_type_concept_id",
                          "stop_reason",
                          "refills",
                          "quantity",
                          "days_supply",
                          "sig",
                          "route_concept_id",
                          "lot_number",
                          "provider_id",
                          "visit_occurrence_id",
                          "visit_detail_id",
                          "drug_source_value",
                          "drug_source_concept_id",
                          "route_source_value",
                          "dose_unit_source_value"
                        FROM "cdm_new_york3"."drug_exposure" AS "LHS"
                        INNER JOIN (
                          SELECT "descendant_concept_id"
                          FROM (
                            SELECT
                              "ancestor_concept_id",
                              "descendant_concept_id",
                              "min_levels_of_separation",
                              "max_levels_of_separation"
                            FROM "cdm_new_york3"."concept_ancestor" AS "LHS"
                            INNER JOIN (
                              SELECT "concept_id"
                              FROM "cdm_new_york3"."concept"
                              WHERE ("vocabulary_id" = 'ATC' AND "concept_code" IN ('C02', 'C03', 'C07', 'C08', 'C09'))
                            ) "RHS"
                              ON ("LHS"."ancestor_concept_id" = "RHS"."concept_id")
                          ) "q08"
                        ) "RHS"
                          ON ("LHS"."drug_concept_id" = "RHS"."descendant_concept_id")
                      ) "q09"
                    ) "q10"
                  )
                ) "q11"
              )
              UNION ALL
              (
                SELECT
                  "name",
                  "pid",
                  "ts",
                  NULL AS "measurement_datetime",
                  NULL AS "value_as_number",
                  NULL AS "condition_concept_id",
                  NULL AS "condition_start_datetime",
                  NULL AS "observation_datetime",
                  NULL AS "drug_concept_id",
                  NULL AS "drug_exposure_start_datetime",
                  "gender_concept_id",
                  "birth_datetime",
                  "race_concept_id"
                FROM (
                  SELECT
                    308404 AS "name",
                    "person_id" AS "pid",
                    "birth_datetime" AS "ts",
                    "gender_concept_id",
                    "birth_datetime",
                    "race_concept_id"
                  FROM "cdm_new_york3"."person"
                ) "q12"
              )
            ) "q13"
          ) "q14"
        ) "q15"
        WHERE ("sbp_ts" is not null and "tc_ts" is not null and "hdl_ts" is not null and "person_ts" is not null)
      ) "q16"
    ) "q17"
  ) "q18"
) "q19"

Author contact

Fabrício Kury
2023/Jan/16
Be always welcome to reach me at fab@kury.dev.