This vignette assumes a SQL server at localhost
(we
use PostgreSQL), with data in OMOP Common Data Model v5.4 format in
schema cdm_new_york3
. The patient records shown in this
example are synthetic data from SyntheaTM
Patient Generator.
library(phea)
suppressPackageStartupMessages(library(dplyr))
# Connect to SQL server.
<- DBI::dbConnect(RPostgres::Postgres(),
dbcon host = fabcred$pg$host, port = fabcred$pg$port, dbname = fabcred$pg$database,
user = fabcred$pg$user, password = fabcred$pg$pass)
# Setup Phea
setup_phea(dbcon, 'cdm_new_york3')
The Atherosclerotic Cardiovascular Disease (ASCVD) Events Risk Estimator+ (ASCVD Risk Estimator Plus, or ASCVD+ for short) is a score developed by the American College of Cardiology/American Heart Association Task Force on Practice Guidelines for estimating the 10-year risk of said events. It was published in Nov. 12, 2013 on the journal Circulation. See https://doi.org/10.1161/01.cir.0000437741.48606.98.
To get a sense of what data goes into the score, see the online
calculator provided by the American College of Cardiology:
The mathematical formula for the score is a bit convoluted. I discuss the formula further down in this vignette.
To compute this phenotype, we first prepare the components, then the
formula parts, then we put it all together in a call to
calculate_formula()
.
Loinc 8480-6 Systolic blood pressure
OMOP CDM concept ID 3004249
<- sqlt(measurement) |>
sbp filter(measurement_concept_id == 3004249) |>
make_component(
.pid = person_id,
.ts = measurement_datetime)
Loinc 8462-4 Diastolic blood pressure
OMOP CDM concept ID 3012888
<- sqlt(measurement) |>
dbp filter(measurement_concept_id == 3012888) |>
make_component(
.pid = person_id,
.ts = measurement_datetime)
Loinc 2093-3 Cholesterol [Mass/volume] in Serum or Plasma
OMOP CDM concept ID 3027114
<- sqlt(measurement) |>
tc filter(measurement_concept_id == 3027114) |>
make_component(
.pid = person_id,
.ts = measurement_datetime)
Loinc 2085-9 Cholesterol in HDL [Mass/volume] in Serum or
Plasma
OMOP CDM concept ID 3007070
<- sqlt(measurement) |>
hdl filter(measurement_concept_id == 3007070) |>
make_component(
.pid = person_id,
.ts = measurement_datetime)
[Not required]
Loinc 18262-6 Cholesterol in LDL [Mass/volume] in Serum or Plasma by
Direct assay
OMOP CDM concept ID 3009966
<- sqlt(measurement) |>
ldl filter(measurement_concept_id == 3009966) |>
make_component(
.pid = person_id,
.ts = measurement_datetime)
SNOMED 44054006 Type 2 diabetes mellitus
OMOP CDM concept ID 201826
<- sqlt(condition_occurrence) |>
diabetes filter(condition_concept_id == 201826) |>
make_component(
.pid = person_id,
.ts = condition_start_datetime)
SNOMED 449868002 Smokes tobacco daily
OMOP CDM concept ID 42709996
<- sqlt(observation) |>
smoker filter(observation_concept_id %in% c(42709996)) |>
make_component(
.pid = person_id,
.ts = observation_datetime)
ATC codes:
<- sqlt(concept) |>
atc_concepts filter(vocabulary_id == 'ATC' &&
%in% c('C02', 'C03', 'C07', 'C08', 'C09')) |>
concept_code select(concept_id)
<- sqlt(concept_ancestor) |>
drug_concepts inner_join(atc_concepts, by = c('ancestor_concept_id' = 'concept_id')) |>
select(descendant_concept_id)
<- sqlt(drug_exposure) |>
htt inner_join(drug_concepts, by = c('drug_concept_id' = 'descendant_concept_id')) |>
make_component(
.pid = person_id,
.ts = drug_exposure_start_datetime)
[Not required]
[Not required]
<- sqlt(person) |>
person make_component(
.pid = person_id,
.ts = birth_datetime)
The ASCVD Risk Estimator+ formula can be a bit confusing at first,
because of a misprint in the paper. Here is how the paper defined it:
There are three main parts here:
The truth is, the formula written over there is wrong. Credits to the authors, nevertheless, for including in the text careful demonstrations of the use of the formula, including a walk through each calculation. By going through the numbers and examples in the paper, and calculating by hand, one can realize that the ASCVD+ formula actually is:
estimated_risk = (1-b) * exp(a-c)
We prepare each part a, b and c separately, then assemble them
together into a phentype using calculate_formula()
.
The individual sum (a) comes from multiplying the values of
the components by their coefficients as published in Table
A. of Appendix 7:
Notice that the coefficients have 4 variations according to whether the patient is:
We use CASE WHEN ...
to construct the formula according
to these 4 groups.
Some coefficients are N/A
for some groups, meaning the
formula loses that component. For example, the term
ln(age)*ln(total cholesterol)
is not used for Blacks.
The paper also is clear that, in regards to race, the question really is of type Boolean yes/no. As consequence, Asians and Latinos are computed using the coefficients of Whites.
Let us first compute the individual sum using the coeficients according to each of the 4 groups.
# WHITE WOMEN
<- paste(sep = ' + ',
white_women_sum # Ln Age (y) –29.799
'-29.799 * ln(age)',
# Ln Age, Squared 4.884
'4.884 * ln(age)^2',
# Ln Total Cholesterol (mg/dL) 13.540
'13.540 * ln(tc_value_as_number)',
# Ln Age × Ln Total Cholesterol –3.114
'-3.114 * ln(age) * ln(tc_value_as_number)',
# Ln HDL-C (mg/dL) –13.578
'-13.578 * ln(hdl_value_as_number)',
# Ln Age × Ln HDL-C 3.149
'3.149 * ln(age) * ln(hdl_value_as_number)',
# Ln Treated Systolic BP (mm Hg) 2.019
'case when htt is true then
2.019 * ln(sbp_value_as_number) else 0 end',
### # Ln Age × Ln Treated Systolic BP
# N/A
# Ln Untreated Systolic BP (mm Hg) 1.957
'case when htt is false then
1.957 * ln(sbp_value_as_number) else 0 end',
### # Ln Age × Ln Untreated Systolic BP
# N/A
# Current Smoker (1=Yes, 0=No) 7.574
'7.574 * case when is_smoker then 1 else 0 end',
# Ln Age × Current Smoker –1.665
'-1.665 * ln(age) * case when is_smoker then 1 else 0 end',
# Diabetes (1=Yes, 0=No) 0.661
'0.661 * case when has_diabetes then 1 else 0 end'
)
# BLACK WOMEN
<- paste(sep = ' + ',
black_women_sum # Ln Age (y) 17.114
'17.114 * ln(age)',
# Ln Age, Squared
# N/A
# Ln Total Cholesterol (mg/dL) 0.940
'0.940 * ln(tc_value_as_number)',
# Ln Age × Ln Total Cholesterol
# N/A
# Ln HDL-C (mg/dL) –18.920
'-18.920 * ln(hdl_value_as_number)',
# Ln Age × Ln HDL-C 4.475
'4.475 * ln(age) * ln(hdl_value_as_number)',
# Ln Treated Systolic BP (mm Hg) 29.291
'case when htt is true then
29.291 * ln(sbp_value_as_number) else 0 end',
# Ln Age × Ln Treated Systolic BP –6.432
'case when htt is true then
-6.432 * ln(age) * ln(sbp_value_as_number) else 0 end',
# Ln Untreated Systolic BP (mm Hg) 27.820
'case when htt is false then
27.820 * ln(sbp_value_as_number) else 0 end',
# Ln Age × Ln Untreated Systolic BP –6.087
'case when htt is false then
-6.087 * ln(age) * ln(sbp_value_as_number) else 0 end',
# Current Smoker (1=Yes, 0=No) 0.691
'0.691 * case when is_smoker then 1 else 0 end',
# Ln Age × Current Smoker
# N/A
# Diabetes (1=Yes, 0=No) 0.874
'0.874 * case when has_diabetes then 1 else 0 end'
)
# WHITE MEN
<- paste(sep = ' + ',
white_men_sum # Ln Age (y) 12.344
'12.344 * ln(age)',
# Ln Total Cholesterol (mg/dL) 11.853
'11.853 * ln(tc_value_as_number)',
# Ln Age × Ln Total Cholesterol –2.664
'-2.664 * ln(age) * ln(tc_value_as_number)',
# Ln HDL-C (mg/dL) –7.990
'-7.990 * ln(hdl_value_as_number)',
# Ln Age × Ln HDL-C 1.769
'1.769 * ln(age) * ln(hdl_value_as_number)',
# Ln Treated Systolic BP (mm Hg) 1.797
'case when htt is true then
1.797 * ln(sbp_value_as_number) else 0 end',
# Ln Untreated Systolic BP (mm Hg) 1.764
'case when htt is false then
1.764 * ln(sbp_value_as_number) else 0 end',
# Current Smoker (1=Yes, 0=No) 7.837
'7.837 * case when is_smoker then 1 else 0 end',
# Ln Age × Current Smoker –1.795
'-1.795 * ln(age) * case when is_smoker then 1 else 0 end',
# Diabetes (1=Yes, 0=No) 0.658
'0.658 * case when has_diabetes then 1 else 0 end'
)
# BLACK MEN
<- paste(sep = ' + ',
black_men_sum # Ln Age (y) 2.469
'2.469 * ln(age)',
# Ln Total Cholesterol (mg/dL) 0.30
'0.30 * ln(tc_value_as_number)',
# Ln Age × Ln Total Cholesterol
# N/A
# Ln HDL-C (mg/dL) –0.307
'-0.307 * ln(hdl_value_as_number)',
# Ln Age × Ln HDL-C
# N/A
# Ln Treated Systolic BP (mm Hg) 1.916
'case when htt is true then
1.916 * ln(sbp_value_as_number) else 0 end',
# Ln Untreated Systolic BP (mm Hg) 1.809
'case when htt is false then
1.809 * ln(sbp_value_as_number) else 0 end',
# Current Smoker (1=Yes, 0=No) 0.549
'0.549 * case when is_smoker then 1 else 0 end',
# Ln Age × Current Smoker
# N/A
# Diabetes (1=Yes, 0=No) 0.645
'0.645 * case when has_diabetes then 1 else 0 end'
)
To facilitate reading and using the code, we assemble parts of the phenotype in separate formulas.
<- paste0(
individual_sum 'case when is_woman then (
case when is_black then (', black_women_sum, ')
else (', white_women_sum, ')
end)
else (
case when is_black then (', black_men_sum, ')
else (', white_men_sum, ')
end)
end')
<-
baseline_survival 'case when is_woman then (
case when is_black then 0.9533
else 0.9665
end)
else (
case when is_black then 0.8954
else 0.9144
end)
end'
<-
group_mean 'case when is_woman then (
case when is_black then 86.61
else -29.18
end)
else (
case when is_black then 19.54
else 61.18
end)
end'
Armed with the components and formula parts, we put it all together
in calculate_formula()
.
= calculate_formula(
ascvd components = list(
sbp = sbp,
# dbp = dbp,
tc = tc,
hdl = hdl,
# ldl = ldl,
diabetes = diabetes,
smoker = smoker,
htt = htt,
person = person),
export = c(
'diabetes_condition_concept_id',
'htt_drug_concept_id'),
fml = list(
a = list(
is_woman = 'person_gender_concept_id = 8532',
is_black = 'person_race_concept_id = 8516',
age = 'extract(year from age(ts, person_birth_datetime))',
is_smoker = 'smoker_observation_datetime is not null',
has_diabetes = 'diabetes_condition_start_datetime is not null',
# Calculate difference as positive number, regardless of which came first
htt_sbp_dist = "case when htt_drug_exposure_start_datetime < sbp_measurement_datetime
then sbp_measurement_datetime - htt_drug_exposure_start_datetime
else htt_drug_exposure_start_datetime - sbp_measurement_datetime end"
),
b = list(
baseline_survival = baseline_survival,
group_mean = group_mean,
# SBP and HTT within 4 months of each other
htt = "htt_drug_exposure_start_datetime is not null
and htt_sbp_dist < '4 months'"
),
individual_sum = individual_sum,
calculated_risk = '(1 - baseline_survival) * exp(individual_sum - group_mean)',
# Cap to 1%-30%, as indicated by the paper
estimated_risk = 'least(0.3, greatest(0.01, calculated_risk))'
),
dont_require = c('smoker', 'htt', 'diabetes'),
out_window = c('person', 'diabetes'))
Let us peek at 20 rows from the phenotype.
|>
ascvd head_shot(20, blind = TRUE) |>
kable()
phea_row_id | pid | ts | window | person_gender_concept_id | person_race_concept_id | person_birth_datetime | smoker_observation_datetime | diabetes_condition_start_datetime | htt_drug_exposure_start_datetime | sbp_measurement_datetime | tc_value_as_number | hdl_value_as_number | sbp_value_as_number | diabetes_condition_concept_id | htt_drug_concept_id | is_woman | is_black | age | is_smoker | has_diabetes | htt_sbp_dist | baseline_survival | group_mean | htt | individual_sum | calculated_risk | estimated_risk |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4 | 1 | 2005-05-13 | 00:00:00 | 8507 | 8527 | 1974-03-01 | NA | NA | NA | 2005-05-13 | 199.0 | 79.1 | 121 | NA | NA | FALSE | FALSE | 31 | FALSE | FALSE | NA | 0.9144 | 61.18 | FALSE | 56.79543 | 0.0010673 | 0.0100000 |
5 | 1 | 2006-05-12 | 364 days | 8507 | 8527 | 1974-03-01 | NA | NA | NA | 2006-05-12 | 199.0 | 79.1 | 121 | NA | NA | FALSE | FALSE | 32 | FALSE | FALSE | NA | 0.9144 | 61.18 | FALSE | 56.98511 | 0.0012902 | 0.0100000 |
6 | 1 | 2008-05-16 | 1099 days | 8507 | 8527 | 1974-03-01 | NA | NA | NA | 2008-05-16 | 199.0 | 79.1 | 117 | NA | NA | FALSE | FALSE | 34 | FALSE | FALSE | NA | 0.9144 | 61.18 | FALSE | 57.28800 | 0.0017466 | 0.0100000 |
7 | 1 | 2008-05-16 | 1099 days | 8507 | 8527 | 1974-03-01 | NA | NA | NA | 2008-05-16 | 182.2 | 79.1 | 117 | NA | NA | FALSE | FALSE | 34 | FALSE | FALSE | NA | 0.9144 | 61.18 | FALSE | 57.07114 | 0.0014061 | 0.0100000 |
8 | 1 | 2008-05-16 | 00:00:00 | 8507 | 8527 | 1974-03-01 | NA | NA | NA | 2008-05-16 | 182.2 | 73.8 | 117 | NA | NA | FALSE | FALSE | 34 | FALSE | FALSE | NA | 0.9144 | 61.18 | FALSE | 57.19264 | 0.0015878 | 0.0100000 |
9 | 1 | 2011-05-20 | 1099 days | 8507 | 8527 | 1974-03-01 | NA | NA | NA | 2008-05-16 | 182.2 | 60.7 | 117 | NA | NA | FALSE | FALSE | 37 | FALSE | FALSE | NA | 0.9144 | 61.18 | FALSE | 58.02043 | 0.0036332 | 0.0100000 |
10 | 1 | 2011-05-20 | 1099 days | 8507 | 8527 | 1974-03-01 | NA | 2011-05-20 | NA | 2008-05-16 | 182.2 | 60.7 | 117 | 201826 | NA | FALSE | FALSE | 37 | FALSE | TRUE | NA | 0.9144 | 61.18 | FALSE | 58.67843 | 0.0070154 | 0.0100000 |
11 | 1 | 2011-05-20 | 1099 days | 8507 | 8527 | 1974-03-01 | NA | 2011-05-20 | NA | 2008-05-16 | 183.1 | 60.7 | 117 | 201826 | NA | FALSE | FALSE | 37 | FALSE | TRUE | NA | 0.9144 | 61.18 | FALSE | 58.68943 | 0.0070931 | 0.0100000 |
12 | 1 | 2011-05-20 | 00:00:00 | 8507 | 8527 | 1974-03-01 | NA | 2011-05-20 | NA | 2011-05-20 | 183.1 | 60.7 | 134 | 201826 | NA | FALSE | FALSE | 37 | FALSE | TRUE | NA | 0.9144 | 61.18 | FALSE | 58.92875 | 0.0090109 | 0.0100000 |
13 | 1 | 2014-03-07 | 1022 days | 8507 | 8527 | 1974-03-01 | NA | 2011-05-20 | NA | 2014-03-07 | 183.1 | 60.7 | 108 | 201826 | NA | FALSE | FALSE | 40 | FALSE | TRUE | NA | 0.9144 | 61.18 | FALSE | 58.99479 | 0.0096261 | 0.0100000 |
14 | 1 | 2014-03-07 | 1022 days | 8507 | 8527 | 1974-03-01 | NA | 2011-05-20 | NA | 2014-03-07 | 209.9 | 60.7 | 108 | 201826 | NA | FALSE | FALSE | 40 | FALSE | TRUE | NA | 0.9144 | 61.18 | FALSE | 59.27152 | 0.0126949 | 0.0126949 |
15 | 1 | 2014-03-07 | 00:00:00 | 8507 | 8527 | 1974-03-01 | NA | 2011-05-20 | NA | 2014-03-07 | 209.9 | 43.5 | 108 | 201826 | NA | FALSE | FALSE | 40 | FALSE | TRUE | NA | 0.9144 | 61.18 | FALSE | 59.75942 | 0.0206787 | 0.0206787 |
16 | 1 | 2016-03-11 | 735 days | 8507 | 8527 | 1974-03-01 | NA | 2011-05-20 | NA | 2014-03-07 | 209.9 | 57.3 | 108 | 201826 | NA | FALSE | FALSE | 42 | FALSE | TRUE | NA | 0.9144 | 61.18 | FALSE | 59.61266 | 0.0178562 | 0.0178562 |
17 | 1 | 2016-03-11 | 735 days | 8507 | 8527 | 1974-03-01 | NA | 2011-05-20 | NA | 2014-03-07 | 224.1 | 57.3 | 108 | 201826 | NA | FALSE | FALSE | 42 | FALSE | TRUE | NA | 0.9144 | 61.18 | FALSE | 59.73677 | 0.0202156 | 0.0202156 |
18 | 1 | 2016-03-11 | 00:00:00 | 8507 | 8527 | 1974-03-01 | NA | 2011-05-20 | NA | 2016-03-11 | 224.1 | 57.3 | 123 | 201826 | NA | FALSE | FALSE | 42 | FALSE | TRUE | NA | 0.9144 | 61.18 | FALSE | 59.96618 | 0.0254284 | 0.0254284 |
19 | 1 | 2018-03-16 | 735 days | 8507 | 8527 | 1974-03-01 | NA | 2011-05-20 | NA | 2018-03-16 | 224.1 | 57.3 | 111 | 201826 | NA | FALSE | FALSE | 44 | FALSE | TRUE | NA | 0.9144 | 61.18 | FALSE | 60.02178 | 0.0268821 | 0.0268821 |
20 | 1 | 2018-03-16 | 735 days | 8507 | 8527 | 1974-03-01 | NA | 2011-05-20 | NA | 2018-03-16 | 221.5 | 57.3 | 111 | 201826 | NA | FALSE | FALSE | 44 | FALSE | TRUE | NA | 0.9144 | 61.18 | FALSE | 60.00110 | 0.0263320 | 0.0263320 |
21 | 1 | 2018-03-16 | 00:00:00 | 8507 | 8527 | 1974-03-01 | NA | 2011-05-20 | NA | 2018-03-16 | 221.5 | 52.7 | 111 | 201826 | NA | FALSE | FALSE | 44 | FALSE | TRUE | NA | 0.9144 | 61.18 | FALSE | 60.10953 | 0.0293479 | 0.0293479 |
22 | 1 | 2020-03-20 | 735 days | 8507 | 8527 | 1974-03-01 | NA | 2011-05-20 | NA | 2018-03-16 | 221.5 | 55.5 | 111 | 201826 | NA | FALSE | FALSE | 46 | FALSE | TRUE | NA | 0.9144 | 61.18 | FALSE | 60.26748 | 0.0343694 | 0.0343694 |
23 | 1 | 2020-03-20 | 735 days | 8507 | 8527 | 1974-03-01 | NA | 2011-05-20 | NA | 2018-03-16 | 205.0 | 55.5 | 111 | 201826 | NA | FALSE | FALSE | 46 | FALSE | TRUE | NA | 0.9144 | 61.18 | FALSE | 60.13948 | 0.0302400 | 0.0302400 |
To make a visually interesting plot, let us pick a patient at random,
but from among those that have the most data points.
To do that we use function sqla()
, which allows you to run
arbitrary SQL on lazy tables from the dbplyr/dplyr
interface.
<- list(ascvd = ascvd) |>
patients_with_most_data sqla('
select pid, count(*) as rows
from ascvd
group by pid
order by rows desc
limit 20
') |>
pull('pid')
From the 20 patients_with_most_data
, we pick one at
random.
<- sample(patients_with_most_data, 1)
random_patient
message('Sampled patient: ', random_patient)
#> Sampled patient: 7770
And then we plot the chosen patient.
|> select(
ascvd # Let's NOT plot the columns below
-ends_with('datetime'),
-ends_with('concept_id'),
-htt_sbp_dist,
-group_mean,
-individual_sum,
-calculated_risk,
-baseline_survival) |>
phea_plot(random_patient, verbose = FALSE)
To see the SQL query that computes the phenotype, use helper function
code_shot()
, or dbplyr::sql_render()
, or the
.clip_sql
option in calculate_formula()
.
code_shot(ascvd)
SELECT *, least(0.3, greatest(0.01, calculated_risk)) AS "estimated_risk"
FROM (
SELECT
*,
1 - baseline_survival) * exp(individual_sum - group_mean) AS "calculated_risk"
(FROM (
SELECT
*,
case when is_woman then (
case when is_black then (17.114 * ln(age) + 0.940 * ln(tc_value_as_number) + -18.920 * ln(hdl_value_as_number) + 4.475 * ln(age) * ln(hdl_value_as_number) + case when htt is true then
29.291 * ln(sbp_value_as_number) else 0 end + case when htt is true then
-6.432 * ln(age) * ln(sbp_value_as_number) else 0 end + case when htt is false then
27.820 * ln(sbp_value_as_number) else 0 end + case when htt is false then
-6.087 * ln(age) * ln(sbp_value_as_number) else 0 end + 0.691 * case when is_smoker then 1 else 0 end + 0.874 * case when has_diabetes then 1 else 0 end)
else (-29.799 * ln(age) + 4.884 * ln(age)^2 + 13.540 * ln(tc_value_as_number) + -3.114 * ln(age) * ln(tc_value_as_number) + -13.578 * ln(hdl_value_as_number) + 3.149 * ln(age) * ln(hdl_value_as_number) + case when htt is true then
2.019 * ln(sbp_value_as_number) else 0 end + case when htt is false then
1.957 * ln(sbp_value_as_number) else 0 end + 7.574 * case when is_smoker then 1 else 0 end + -1.665 * ln(age) * case when is_smoker then 1 else 0 end + 0.661 * case when has_diabetes then 1 else 0 end)
end)
else (
case when is_black then (2.469 * ln(age) + 0.30 * ln(tc_value_as_number) + -0.307 * ln(hdl_value_as_number) + case when htt is true then
1.916 * ln(sbp_value_as_number) else 0 end + case when htt is false then
1.809 * ln(sbp_value_as_number) else 0 end + 0.549 * case when is_smoker then 1 else 0 end + 0.645 * case when has_diabetes then 1 else 0 end)
else (12.344 * ln(age) + 11.853 * ln(tc_value_as_number) + -2.664 * ln(age) * ln(tc_value_as_number) + -7.990 * ln(hdl_value_as_number) + 1.769 * ln(age) * ln(hdl_value_as_number) + case when htt is true then
1.797 * ln(sbp_value_as_number) else 0 end + case when htt is false then
1.764 * ln(sbp_value_as_number) else 0 end + 7.837 * case when is_smoker then 1 else 0 end + -1.795 * ln(age) * case when is_smoker then 1 else 0 end + 0.658 * case when has_diabetes then 1 else 0 end)
end)
end AS "individual_sum"
FROM (
SELECT
*,
case when is_woman then (
case when is_black then 0.9533
else 0.9665
end)
else (
case when is_black then 0.8954
else 0.9144
end)
end AS "baseline_survival",
case when is_woman then (
case when is_black then 86.61
else -29.18
end)
else (
case when is_black then 19.54
else 61.18
end)
end AS "group_mean",
is not null
htt_drug_exposure_start_datetime and htt_sbp_dist < '4 months' AS "htt"
FROM (
SELECT
"phea_row_id",
"pid",
"ts",
"window",
"person_gender_concept_id",
"person_race_concept_id",
"person_birth_datetime",
"smoker_observation_datetime",
"diabetes_condition_start_datetime",
"htt_drug_exposure_start_datetime",
"sbp_measurement_datetime",
"tc_value_as_number",
"hdl_value_as_number",
"sbp_value_as_number",
"diabetes_condition_concept_id",
"htt_drug_concept_id",
= 8532 AS "is_woman",
person_gender_concept_id = 8516 AS "is_black",
person_race_concept_id extract(year from age(ts, person_birth_datetime)) AS "age",
is not null AS "is_smoker",
smoker_observation_datetime is not null AS "has_diabetes",
diabetes_condition_start_datetime case when htt_drug_exposure_start_datetime < sbp_measurement_datetime
then sbp_measurement_datetime - htt_drug_exposure_start_datetime
else htt_drug_exposure_start_datetime - sbp_measurement_datetime end AS "htt_sbp_dist"
FROM (
SELECT
*,
greatest("sbp_ts", "tc_ts", "hdl_ts", "smoker_ts", "htt_ts") - least("sbp_ts", "tc_ts", "hdl_ts", "smoker_ts", "htt_ts") AS "window"
FROM (
SELECT
row_number() OVER (ORDER BY "pid", "ts") AS "phea_row_id",
"pid",
"ts",
case when "name" = 932754 then "measurement_datetime" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "sbp_measurement_datetime",
phea_last_value_ignore_nulls(case when "name" = 932754 then "value_as_number" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "sbp_value_as_number",
phea_last_value_ignore_nulls(case when "name" = 932754 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "sbp_ts",
phea_last_value_ignore_nulls(case when "name" = 727777 then "value_as_number" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "tc_value_as_number",
phea_last_value_ignore_nulls(case when "name" = 727777 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "tc_ts",
phea_last_value_ignore_nulls(case when "name" = 577443 then "value_as_number" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "hdl_value_as_number",
phea_last_value_ignore_nulls(case when "name" = 577443 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "hdl_ts",
phea_last_value_ignore_nulls(case when "name" = 790069 then "condition_concept_id" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "diabetes_condition_concept_id",
phea_last_value_ignore_nulls(case when "name" = 790069 then "condition_start_datetime" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "diabetes_condition_start_datetime",
phea_last_value_ignore_nulls(case when "name" = 790069 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "diabetes_ts",
phea_last_value_ignore_nulls(case when "name" = 79017 then "observation_datetime" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "smoker_observation_datetime",
phea_last_value_ignore_nulls(case when "name" = 79017 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "smoker_ts",
phea_last_value_ignore_nulls(case when "name" = 731432 then "drug_concept_id" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "htt_drug_concept_id",
phea_last_value_ignore_nulls(case when "name" = 731432 then "drug_exposure_start_datetime" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "htt_drug_exposure_start_datetime",
phea_last_value_ignore_nulls(case when "name" = 731432 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "htt_ts",
phea_last_value_ignore_nulls(case when "name" = 308404 then "gender_concept_id" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "person_gender_concept_id",
phea_last_value_ignore_nulls(case when "name" = 308404 then "birth_datetime" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "person_birth_datetime",
phea_last_value_ignore_nulls(case when "name" = 308404 then "race_concept_id" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "person_race_concept_id",
phea_last_value_ignore_nulls(case when "name" = 308404 then "ts" else null end) OVER (PARTITION BY "pid" ORDER BY "ts" ROWS UNBOUNDED PRECEDING) AS "person_ts"
phea_last_value_ignore_nulls(FROM (
(SELECT
*,
NULL AS "gender_concept_id",
NULL AS "birth_datetime",
NULL AS "race_concept_id"
FROM (
(SELECT
*,
NULL AS "drug_concept_id",
NULL AS "drug_exposure_start_datetime"
FROM (
(SELECT *, NULL AS "observation_datetime"
FROM (
(SELECT
*,
NULL AS "condition_concept_id",
NULL AS "condition_start_datetime"
FROM (
(
(SELECT
932754 AS "name",
"person_id" AS "pid",
"measurement_datetime" AS "ts",
"measurement_datetime",
"value_as_number"
FROM "cdm_new_york3"."measurement"
WHERE ("measurement_concept_id" = 3004249.0)
)UNION ALL
(SELECT
"name",
"pid",
"ts",
NULL AS "measurement_datetime",
"value_as_number"
FROM (
SELECT
727777 AS "name",
"person_id" AS "pid",
"measurement_datetime" AS "ts",
"value_as_number"
FROM "cdm_new_york3"."measurement"
WHERE ("measurement_concept_id" = 3027114.0)
"q01"
)
)
)UNION ALL
(SELECT
"name",
"pid",
"ts",
NULL AS "measurement_datetime",
"value_as_number"
FROM (
SELECT
577443 AS "name",
"person_id" AS "pid",
"measurement_datetime" AS "ts",
"value_as_number"
FROM "cdm_new_york3"."measurement"
WHERE ("measurement_concept_id" = 3007070.0)
"q02"
)
)"q03"
)
)UNION ALL
(SELECT
"name",
"pid",
"ts",
NULL AS "measurement_datetime",
NULL AS "value_as_number",
"condition_concept_id",
"condition_start_datetime"
FROM (
SELECT
790069 AS "name",
"person_id" AS "pid",
"condition_start_datetime" AS "ts",
"condition_concept_id",
"condition_start_datetime"
FROM "cdm_new_york3"."condition_occurrence"
WHERE ("condition_concept_id" = 201826.0)
"q04"
)
)"q05"
)
)UNION ALL
(SELECT
"name",
"pid",
"ts",
NULL AS "measurement_datetime",
NULL AS "value_as_number",
NULL AS "condition_concept_id",
NULL AS "condition_start_datetime",
"observation_datetime"
FROM (
SELECT
79017 AS "name",
"person_id" AS "pid",
"observation_datetime" AS "ts",
"observation_datetime"
FROM "cdm_new_york3"."observation"
WHERE ("observation_concept_id" IN (42709996.0))
"q06"
)
)"q07"
)
)UNION ALL
(SELECT
"name",
"pid",
"ts",
NULL AS "measurement_datetime",
NULL AS "value_as_number",
NULL AS "condition_concept_id",
NULL AS "condition_start_datetime",
NULL AS "observation_datetime",
"drug_concept_id",
"drug_exposure_start_datetime"
FROM (
SELECT
731432 AS "name",
"person_id" AS "pid",
"drug_exposure_start_datetime" AS "ts",
"drug_concept_id",
"drug_exposure_start_datetime"
FROM (
SELECT
"drug_exposure_id",
"person_id",
"drug_concept_id",
"drug_exposure_start_date",
"drug_exposure_start_datetime",
"drug_exposure_end_date",
"drug_exposure_end_datetime",
"verbatim_end_date",
"drug_type_concept_id",
"stop_reason",
"refills",
"quantity",
"days_supply",
"sig",
"route_concept_id",
"lot_number",
"provider_id",
"visit_occurrence_id",
"visit_detail_id",
"drug_source_value",
"drug_source_concept_id",
"route_source_value",
"dose_unit_source_value"
FROM "cdm_new_york3"."drug_exposure" AS "LHS"
INNER JOIN (
SELECT "descendant_concept_id"
FROM (
SELECT
"ancestor_concept_id",
"descendant_concept_id",
"min_levels_of_separation",
"max_levels_of_separation"
FROM "cdm_new_york3"."concept_ancestor" AS "LHS"
INNER JOIN (
SELECT "concept_id"
FROM "cdm_new_york3"."concept"
WHERE ("vocabulary_id" = 'ATC' AND "concept_code" IN ('C02', 'C03', 'C07', 'C08', 'C09'))
"RHS"
) ON ("LHS"."ancestor_concept_id" = "RHS"."concept_id")
"q08"
) "RHS"
) ON ("LHS"."drug_concept_id" = "RHS"."descendant_concept_id")
"q09"
) "q10"
)
)"q11"
)
)UNION ALL
(SELECT
"name",
"pid",
"ts",
NULL AS "measurement_datetime",
NULL AS "value_as_number",
NULL AS "condition_concept_id",
NULL AS "condition_start_datetime",
NULL AS "observation_datetime",
NULL AS "drug_concept_id",
NULL AS "drug_exposure_start_datetime",
"gender_concept_id",
"birth_datetime",
"race_concept_id"
FROM (
SELECT
308404 AS "name",
"person_id" AS "pid",
"birth_datetime" AS "ts",
"gender_concept_id",
"birth_datetime",
"race_concept_id"
FROM "cdm_new_york3"."person"
"q12"
)
)"q13"
) "q14"
) "q15"
) WHERE ("sbp_ts" is not null and "tc_ts" is not null and "hdl_ts" is not null and "person_ts" is not null)
"q16"
) "q17"
) "q18"
) "q19" )