How to work with codelists
A codelist is a collection of clinical codes that can be used to classify patients as having certain clinical events or demographic properties. For example, in a clinical coding system such as SNOMED CT, an asthma diagnosis may be indicated by more than 100 codes.
Adding codelists to your dataset definition
Codelists need to be stored as data within your study repository, from where they can be used in your dataset definition.
They live as CSV files in the codelists/ directory (for more details see "Adding codelists to a project").
Codelists are loaded into variables as follows:
from ehrql import codelist_from_csv
ethnicity_codelist = codelist_from_csv(
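To illustrate what a codelist CSV holds and what gets read out of it, here is a minimal sketch that uses only Python's standard csv module rather than ehrQL itself. The helper `read_codes`, the file name, the column name `code`, and the term text are all hypothetical. It is not how ehrQL loads codelists, only a picture of the data involved.

```python
import csv
import tempfile
from pathlib import Path


def read_codes(csv_path, column):
    """Return the values of one named column from a CSV file, in file order."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [row[column] for row in csv.DictReader(f)]


# Hypothetical codelist file: a header row, then one code per row.
# The term text is a placeholder, not the real description of each code.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example-codelist.csv"
    path.write_text(
        "code,term\n27113001,placeholder term 1\n162763007,placeholder term 2\n",
        encoding="utf-8",
    )
    example_codes = read_codes(path, column="code")
```

The codes are read as strings, not integers, which matches how codes are written elsewhere on this page.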
You can add codelists to your analysis/dataset_definition.py, but we recommend that you add all your codelists to a file called analysis/codelists.py and import them at the top of your dataset definition.
We recommend that you name each codelist you want to use explicitly in your import statement.
This makes it easier to read and understand your code.
from codelists import my_codelist1, my_codelist2
To combine different codelists you can use the + operator.
This lets you keep separate codelists for some variable definitions while still combining them where needed.
For example, the two codelists chronic_cardiac_codes and acute_cardiac_codes can be combined as follows:
all_cardiac_codes = chronic_cardiac_codes + acute_cardiac_codes
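As a minimal sketch of what this combination does, assume both codelists are plain Python lists of code strings. That is an assumption about how they were loaded, and the codes below are made-up placeholders, not real clinical codes. Under that assumption, + is ordinary list concatenation:

```python
# Hypothetical placeholder codes; real codelists would be loaded from CSV files.
chronic_cardiac_codes = ["1000001", "1000002"]
acute_cardiac_codes = ["1000002", "2000001"]

# `+` builds a new list and leaves both original lists unchanged.
all_cardiac_codes = chronic_cardiac_codes + acute_cardiac_codes

# Plain list concatenation does not deduplicate: a code present in both
# inputs appears twice in the combined list.
duplicated = sorted(
    {code for code in all_cardiac_codes if all_cardiac_codes.count(code) > 1}
)
```

Because the originals are left unchanged, the separate codelists remain available for variables that need only one of them.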
Using a small number of codes
In some cases you may want to use only one or two clinical codes. You can define a collection of codes as follows:
weight_codes = ["27113001", "162763007"]
When you use user-defined codelists like this, ehrQL will check whether the codes you specified are valid clinical codes in the clinical system you're querying. For discoverability and reproducibility, we recommend building codelists using OpenCodelists or re-using existing ones.