ehrQL cheatsheet🔗
Frames🔗
Patient frames contain one row per patient
patient id date_of_birth sex
123 1980-01-01 m
456 1990-06-06 f
789 2020-01-01 i
Event frames contain many rows per patient
patient_id event_date event_code
123 2020-04-01 abc
123 2020-04-02 def
123 2021-01-01 ghi
Simple dataset🔗
from ehrql import create_dataset
from ehrql.tables.core import patients
from ehrql.tables.tpp import addresses
dataset = create_dataset()
...
dataset.define_population(
patients.exists_for_patient()
)
Codelists🔗
statin_medications = codelist_from_csv("codelists/dm_cod.csv", column="code")
Debug🔗
from ehrql import debug
debug(patients)
Tables🔗
core🔗
clinical_events
medications
ons_deaths
patients
practice_registrations
selected values🔗
clinical_events.date
clinical_events.snomedct_code
clinical_events.numeric_value
patients.date_of_birth
patients.sex
patients.date_of_death
tpp🔗
addresses
apcs
apcs_cost
appointments
clinical_events
clinical_events_ranges
covid_therapeutics
ec
ec_cost
emergency_care_attendances
ethnicity_from_sus
household_memberships_2020
medications
occupation_on_covid_vaccine_record
ons_deaths
opa
opa_cost
opa_diag
opa_proc
open_prompt
parents
patients
practice_registrations
sgss_covid_all_tests
ukrr
vaccinations
wl_clockstops
wl_openpathways
selected values🔗
addresses.address_id
addresses.start_date
addresses.end_date
addresses.imd_rounded
apcs.admission_date
medications.date
medications.dmd_code
practice_registration.start_date
practice_registration.end_date
practice_registration.practice_pseudo_id
practice_registration.practice_stp
ukrr.renal_centre
vaccinations.date
vaccinations.product_name
adding data to a dataset🔗
from a patient frame🔗
value series
dataset.sex = patients.sex
dataset.date_of_birth = patients.date_of_birth
dataset.birth_year = patients.date_of_birth.year
dataset.age = patients.age_on("2024-01-01")
boolean series
dataset.died_with_X = ons_deaths.cause_of_death_is_in(cause_of_death_X_codelist)
from an event frame🔗
value series
dataset.imd = addresses.for_patient_on("2023-01-01").imd_rounded
aggregated value series
dataset.mean_hba1c = clinical_events.where(
clinical_events.snomedct_code.is_in(hba1c_codelist)
).where(
clinical_events.date.is_on_or_after("2022-07-01")
).numeric_value.mean_for_patient()
sorted value series
dataset.first_statin_prescription_date = medications.where(
medications.dmd_code.is_in(statin_medications)
).sort_by(
medications.date
).first_for_patient().date
boolean series
dataset.has_had_asthma_diagnosis = clinical_events.where(
clinical_events.snomedct_code.is_in(asthma_codelist)
).exists_for_patient()
boolean series with a date range
dataset.has_recent_cardiac_admission = apcs.where(
apcs.primary_diagnosis.is_in(cardiac_diagnosis_codes)
).where(
apcs.admission_date.is_on_or_between("2022-07-01", "2023-01-01")
).exists_for_patient()
logic operators🔗
==
equals!=
not equals&
and|
or~
not>
greater than>=
greater than or equals<=
less than or equals<
less than
selected functions🔗
common🔗
.is_null()
.is_not_null()
.is_in()
.contains()
.map_values()
aggregation🔗
.minimum_for_patient()
.maximum_for_patient()
.sum_for_patient()
.mean_for_patient()
.count_for_patient()
date🔗
.is_before(other)
.is_on_or_before(other)
.is_after(other)
.is_on_or_after(other)
.is_between_but_not_on(start, end)
.is_on_or_between(start, end)
.is_during(interval)
sorted event frames🔗
.first_for_patient()
.last_for_patient()