Skip to content

Language reference

DatasetπŸ”—

create_dataset() πŸ”—

A dataset defines the patients you want to include in your population and the variables you want to extract for them.

A dataset definition file must define a dataset called dataset:

dataset = create_dataset()

Add variables to the dataset as attributes:

dataset.age = patients.age_on("2020-01-01")

class Dataset() πŸ”—

To create a dataset use the create_dataset function.

define_population(population_condition) πŸ”—

Define the condition that patients must meet to be included in the Dataset, in the form of a boolean patient series.

Example usage:

dataset.define_population(patients.date_of_birth < "1990-01-01")

For more detail see the how-to guide on defining populations.

add_column(column_name, ehrql_query) πŸ”—

Add a column to the dataset.

column_name
The name of the new column, as a string.

ehrql_query
An ehrQL query that returns one row per patient.

Example usage:

dataset.add_column("age", patients.age_on("2020-01-01"))

Using .add_column is equivalent to = for adding a single column but can also be used to add multiple columns, for example by iterating over a dictionary. For more details see the guide on "How to assign multiple columns to a dataset programmatically".

configure_dummy_data(population_size=10, legacy=False, timeout=60, additional_population_constraint=None) πŸ”—

Configure the dummy data to be generated.

population_size
Maximum number of patients to generate.

Note that you may get fewer patients than this if the generator runs out of time – see timeout below.

legacy
Use legacy dummy data.

timeout
Maximum time in seconds to spend generating dummy data.

additional_population_constraint
An additional ehrQL query that can be used to constrain the population that will be selected for dummy data. This is incompatible with legacy mode.

For example, if you wanted to ensure that two dates appear in a particular order in your dummy data, you could add additional_population_constraint = dataset.first_date < dataset.second_date.

You can also combine constraints with & as normal in ehrQL. E.g. additional_population_constraint = patients.sex.is_in(['male', 'female']) & ( patients.age_on(some_date) < 80) would give you dummy data consisting of only men and women who were under the age of 80 on some particular date.

Example usage:

dataset.configure_dummy_data(population_size=10000)


FramesπŸ”—

Frames are the starting point for building any query in ehrQL. You can think of a Frame as being like a table in a database, in that it contains multiple rows and multiple columns. But a Frame can have operations applied to it like filtering or sorting to produce a new Frame.

You don't need to define any Frames yourself. Instead you import them from the various schemas available in ehrql.tables e.g.

from ehrql.tables.core import patients

Frames have columns which you can access as attributes on the Frame e.g.

dob = patients.date_of_birth

The schema documentation contains the full list of available columns for each Frame. For example, see ehrql.tables.core.patients.

Accessing a column attribute on a Frame produces a Series, which are documented elsewhere below.

Some Frames contain at most one row per patient, we call these PatientFrames; others can contain multiple rows per patient, we call these EventFrames.

class PatientFrame() πŸ”—

Frame containing at most one row per patient.

exists_for_patient() πŸ”—

Return a boolean patient series which is True for each patient that has a row in this frame and False otherwise.

Example usage:

pratice_registrations.for_patient_on("2020-01-01").exists_for_patient()

count_for_patient() πŸ”—

Return an integer patient series giving the number of rows each patient has in this frame.

Note that if a patient has no rows at all in the frame the result will be zero rather than NULL.

Example usage:

clinical_events.where(clinical_events.date.year == 2020).count_for_patient()

class EventFrame() πŸ”—

Frame which may contain multiple rows per patient.

where(condition) πŸ”—

Return a new frame containing only the rows in this frame for which condition evaluates True.

Note that this excludes any rows for which condition is NULL.

Example usage:

clinical_events.where(clinical_events.date >= "2020-01-01")

except_where(condition) πŸ”—

Return a new frame containing only the rows in this frame for which condition evaluates False or NULL i.e. the exact inverse of the rows included by where().

Example usage:

practice_registrations.except_where(practice_registrations.end_date < "2020-01-01")

Note that except_where() is not the same as where() with an inverted condition, as the latter would exclude rows where condition is NULL.

sort_by(*sort_values) πŸ”—

Return a new frame with the rows sorted for each patient, by each of the supplied sort_values.

Where more than one sort value is supplied then the first (i.e. left-most) value has highest priority and each subsequent sort value will only be used as a tie-breaker in case of an exact match among previous values.

Note that NULL is considered smaller than any other value, so you may wish to filter out NULL values before sorting.

Example usage:

clinical_events.sort_by(clinical_events.date, clinical_events.snomedct_code)

exists_for_patient() πŸ”—

Return a boolean patient series which is True for each patient that has a row in this frame and False otherwise.

Example usage:

pratice_registrations.for_patient_on("2020-01-01").exists_for_patient()

count_for_patient() πŸ”—

Return an integer patient series giving the number of rows each patient has in this frame.

Note that if a patient has no rows at all in the frame the result will be zero rather than NULL.

Example usage:

clinical_events.where(clinical_events.date.year == 2020).count_for_patient()

class SortedEventFrame() πŸ”—

Frame which contains multiple rows per patient and has had one or more sort operations applied.

where(condition) πŸ”—

Return a new frame containing only the rows in this frame for which condition evaluates True.

Note that this excludes any rows for which condition is NULL.

Example usage:

clinical_events.where(clinical_events.date >= "2020-01-01")

except_where(condition) πŸ”—

Return a new frame containing only the rows in this frame for which condition evaluates False or NULL i.e. the exact inverse of the rows included by where().

Example usage:

practice_registrations.except_where(practice_registrations.end_date < "2020-01-01")

Note that except_where() is not the same as where() with an inverted condition, as the latter would exclude rows where condition is NULL.

sort_by(*sort_values) πŸ”—

Return a new frame with the rows sorted for each patient, by each of the supplied sort_values.

Where more than one sort value is supplied then the first (i.e. left-most) value has highest priority and each subsequent sort value will only be used as a tie-breaker in case of an exact match among previous values.

Note that NULL is considered smaller than any other value, so you may wish to filter out NULL values before sorting.

Example usage:

clinical_events.sort_by(clinical_events.date, clinical_events.snomedct_code)

exists_for_patient() πŸ”—

Return a boolean patient series which is True for each patient that has a row in this frame and False otherwise.

Example usage:

pratice_registrations.for_patient_on("2020-01-01").exists_for_patient()

count_for_patient() πŸ”—

Return an integer patient series giving the number of rows each patient has in this frame.

Note that if a patient has no rows at all in the frame the result will be zero rather than NULL.

Example usage:

clinical_events.where(clinical_events.date.year == 2020).count_for_patient()

first_for_patient() πŸ”—

Return a PatientFrame containing, for each patient, the first matching row according to whatever sort order has been applied.

Note that where there are multiple rows tied for first place then the specific row returned is picked arbitrarily but consistently i.e. you shouldn't depend on getting any particular result, but the result you do get shouldn't change unless the data changes.

Example usage:

medications.sort_by(medications.date).first_for_patient()

last_for_patient() πŸ”—

Return a PatientFrame containing, for each patient, the last matching row according to whatever sort order has been applied.

Note that where there are multiple rows tied for last place then the specific row returned is picked arbitrarily but consistently i.e. you shouldn't depend on getting any particular result, but the result you do get shouldn't change unless the data changes.

Example usage:

medications.sort_by(medications.date).last_for_patient()


SeriesπŸ”—

A Series represents a column of values of a certain type. Some Series contain at most one value per patient, we call these PatientSeries; others can contain multiple values per patient, we call these EventSeries. Values can be NULL (i.e. missing) but a Series can never mix values of different types.

class BoolPatientSeries() πŸ”—

One row per patient series of type boolean

self == other πŸ”—

Return a boolean series comparing each value in this series with its corresponding value in other.

Note that the result of comparing anything with NULL (including NULL itself) is NULL.

Example usage:

patients.sex == "female"

self != other πŸ”—

Return the inverse of == above.

Note that the same point regarding NULL applies here.

Example usage:

patients.sex != "unknown"

self & other πŸ”—

Logical AND

Return a boolean series which is True where both this series and other are True, False where either are False, and NULL otherwise.

Example usage:

is_female_and_alive = patients.is_alive_on("2020-01-01") & patients.sex.is_in(["female"])

self | other πŸ”—

Logical OR

Return a boolean series which is True where either this series or other is True, False where both are False, and NULL otherwise.

Example usage:

is_alive = patients.date_of_death.is_null() | patients.date_of_death.is_after("2020-01-01")
Note that the above example is equivalent to patients.is_alive_on("2020-01-01").

~ self πŸ”—

Logical NOT

Return a boolean series which is the inverse of this series i.e. where True becomes False, False becomes True, and NULL stays as NULL.

Example usage:

is_born_outside_period = ~ patients.date_of_birth.is_on_or_between("2020-03-01", "2020-06-30")

is_null() πŸ”—

Return a boolean series which is True for each NULL value in this series and False for each non-NULL value.

Example usage:

patients.date_of_death.is_null()

is_not_null() πŸ”—

Return the inverse of is_null() above.

Example usage:

patients.date_of_death.is_not_null()

when_null_then(other) πŸ”—

Replace any NULL value in this series with the corresponding value in other.

Note that other must be of the same type as this series.

Example usage:

(patients.date_of_death < "2020-01-01").when_null_then(False)

is_in(other) πŸ”—

Return a boolean series which is True for each value in this series which is contained in other.

See how to combine is_in with a codelist in the how-to guide.

Example usage:

medications.dmd_code.is_in(["39113311000001107", "39113611000001102"])

other accepts any of the standard "container" types (tuple, list, set, frozenset, or dict) or another event series.

is_not_in(other) πŸ”—

Return the inverse of is_in() above.

map_values(mapping, default=None) πŸ”—

Return a new series with mapping applied to each value. mapping should be a dictionary mapping one set of values to another.

Example usage:

school_year = patients.age_on("2020-09-01").map_values(
    {13: "Year 9", 14: "Year 10", 15: "Year 11"},
    default="N/A"
)

as_int() πŸ”—

Return each value in this Boolean series as 1 (True) or 0 (False).

class BoolEventSeries() πŸ”—

Multiple rows per patient series of type boolean

self == other πŸ”—

Return a boolean series comparing each value in this series with its corresponding value in other.

Note that the result of comparing anything with NULL (including NULL itself) is NULL.

Example usage:

patients.sex == "female"

self != other πŸ”—

Return the inverse of == above.

Note that the same point regarding NULL applies here.

Example usage:

patients.sex != "unknown"

self & other πŸ”—

Logical AND

Return a boolean series which is True where both this series and other are True, False where either are False, and NULL otherwise.

Example usage:

is_female_and_alive = patients.is_alive_on("2020-01-01") & patients.sex.is_in(["female"])

self | other πŸ”—

Logical OR

Return a boolean series which is True where either this series or other is True, False where both are False, and NULL otherwise.

Example usage:

is_alive = patients.date_of_death.is_null() | patients.date_of_death.is_after("2020-01-01")
Note that the above example is equivalent to patients.is_alive_on("2020-01-01").

~ self πŸ”—

Logical NOT

Return a boolean series which is the inverse of this series i.e. where True becomes False, False becomes True, and NULL stays as NULL.

Example usage:

is_born_outside_period = ~ patients.date_of_birth.is_on_or_between("2020-03-01", "2020-06-30")

is_null() πŸ”—

Return a boolean series which is True for each NULL value in this series and False for each non-NULL value.

Example usage:

patients.date_of_death.is_null()

is_not_null() πŸ”—

Return the inverse of is_null() above.

Example usage:

patients.date_of_death.is_not_null()

when_null_then(other) πŸ”—

Replace any NULL value in this series with the corresponding value in other.

Note that other must be of the same type as this series.

Example usage:

(patients.date_of_death < "2020-01-01").when_null_then(False)

is_in(other) πŸ”—

Return a boolean series which is True for each value in this series which is contained in other.

See how to combine is_in with a codelist in the how-to guide.

Example usage:

medications.dmd_code.is_in(["39113311000001107", "39113611000001102"])

other accepts any of the standard "container" types (tuple, list, set, frozenset, or dict) or another event series.

is_not_in(other) πŸ”—

Return the inverse of is_in() above.

map_values(mapping, default=None) πŸ”—

Return a new series with mapping applied to each value. mapping should be a dictionary mapping one set of values to another.

Example usage:

school_year = patients.age_on("2020-09-01").map_values(
    {13: "Year 9", 14: "Year 10", 15: "Year 11"},
    default="N/A"
)

as_int() πŸ”—

Return each value in this Boolean series as 1 (True) or 0 (False).

count_distinct_for_patient() πŸ”—

Return an integer patient series counting the number of distinct values for each patient in the series (ignoring any NULL values).

Note that if a patient has no values at all in the series the result will be zero rather than NULL.

Example usage:

medications.dmd_code.count_distinct_for_patient()

class StrPatientSeries() πŸ”—

One row per patient series of type string

self == other πŸ”—

Return a boolean series comparing each value in this series with its corresponding value in other.

Note that the result of comparing anything with NULL (including NULL itself) is NULL.

Example usage:

patients.sex == "female"

self != other πŸ”—

Return the inverse of == above.

Note that the same point regarding NULL applies here.

Example usage:

patients.sex != "unknown"

self > other πŸ”—

Return a boolean series which is True for each value in this series that is strictly less than its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_underage = patients.age_on("2020-01-01") < 18

self >= other πŸ”—

Return a boolean series which is True for each value in this series that is less than or equal to its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_underage = patients.age_on("2020-01-01") <= 17

self <= other πŸ”—

Return a boolean series which is True for each value in this series that is greater than or equal to its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_adult = patients.age_on("2020-01-01") >= 18

self < other πŸ”—

Return a boolean series which is True for each value in this series that is strictly greater than its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_adult = patients.age_on("2020-01-01") > 17

is_null() πŸ”—

Return a boolean series which is True for each NULL value in this series and False for each non-NULL value.

Example usage:

patients.date_of_death.is_null()

is_not_null() πŸ”—

Return the inverse of is_null() above.

Example usage:

patients.date_of_death.is_not_null()

when_null_then(other) πŸ”—

Replace any NULL value in this series with the corresponding value in other.

Note that other must be of the same type as this series.

Example usage:

(patients.date_of_death < "2020-01-01").when_null_then(False)

is_in(other) πŸ”—

Return a boolean series which is True for each value in this series which is contained in other.

See how to combine is_in with a codelist in the how-to guide.

Example usage:

medications.dmd_code.is_in(["39113311000001107", "39113611000001102"])

other accepts any of the standard "container" types (tuple, list, set, frozenset, or dict) or another event series.

is_not_in(other) πŸ”—

Return the inverse of is_in() above.

map_values(mapping, default=None) πŸ”—

Return a new series with mapping applied to each value. mapping should be a dictionary mapping one set of values to another.

Example usage:

school_year = patients.age_on("2020-09-01").map_values(
    {13: "Year 9", 14: "Year 10", 15: "Year 11"},
    default="N/A"
)

contains(other) πŸ”—

Return a boolean series which is True for each string in this series which contains other as a sub-string and False otherwise. For NULL values, the result is NULL.

Example usage:

is_female = patients.sex.contains("fem")

other can be another string series, in which case corresponding values are compared. If either value is NULL the result is NULL.

class StrEventSeries() πŸ”—

Multiple rows per patient series of type string

self == other πŸ”—

Return a boolean series comparing each value in this series with its corresponding value in other.

Note that the result of comparing anything with NULL (including NULL itself) is NULL.

Example usage:

patients.sex == "female"

self != other πŸ”—

Return the inverse of == above.

Note that the same point regarding NULL applies here.

Example usage:

patients.sex != "unknown"

self > other πŸ”—

Return a boolean series which is True for each value in this series that is strictly less than its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_underage = patients.age_on("2020-01-01") < 18

self >= other πŸ”—

Return a boolean series which is True for each value in this series that is less than or equal to its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_underage = patients.age_on("2020-01-01") <= 17

self <= other πŸ”—

Return a boolean series which is True for each value in this series that is greater than or equal to its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_adult = patients.age_on("2020-01-01") >= 18

self < other πŸ”—

Return a boolean series which is True for each value in this series that is strictly greater than its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_adult = patients.age_on("2020-01-01") > 17

is_null() πŸ”—

Return a boolean series which is True for each NULL value in this series and False for each non-NULL value.

Example usage:

patients.date_of_death.is_null()

is_not_null() πŸ”—

Return the inverse of is_null() above.

Example usage:

patients.date_of_death.is_not_null()

when_null_then(other) πŸ”—

Replace any NULL value in this series with the corresponding value in other.

Note that other must be of the same type as this series.

Example usage:

(patients.date_of_death < "2020-01-01").when_null_then(False)

is_in(other) πŸ”—

Return a boolean series which is True for each value in this series which is contained in other.

See how to combine is_in with a codelist in the how-to guide.

Example usage:

medications.dmd_code.is_in(["39113311000001107", "39113611000001102"])

other accepts any of the standard "container" types (tuple, list, set, frozenset, or dict) or another event series.

is_not_in(other) πŸ”—

Return the inverse of is_in() above.

map_values(mapping, default=None) πŸ”—

Return a new series with mapping applied to each value. mapping should be a dictionary mapping one set of values to another.

Example usage:

school_year = patients.age_on("2020-09-01").map_values(
    {13: "Year 9", 14: "Year 10", 15: "Year 11"},
    default="N/A"
)

contains(other) πŸ”—

Return a boolean series which is True for each string in this series which contains other as a sub-string and False otherwise. For NULL values, the result is NULL.

Example usage:

is_female = patients.sex.contains("fem")

other can be another string series, in which case corresponding values are compared. If either value is NULL the result is NULL.

count_distinct_for_patient() πŸ”—

Return an integer patient series counting the number of distinct values for each patient in the series (ignoring any NULL values).

Note that if a patient has no values at all in the series the result will be zero rather than NULL.

Example usage:

medications.dmd_code.count_distinct_for_patient()

minimum_for_patient() πŸ”—

Return the minimum value in the series for each patient (or NULL if the patient has no values).

Example usage:

clinical_events.where(...).numeric_value.minimum_for_patient()

maximum_for_patient() πŸ”—

Return the maximum value in the series for each patient (or NULL if the patient has no values).

Example usage:

clinical_events.where(...).numeric_value.maximum_for_patient()

class IntPatientSeries() πŸ”—

One row per patient series of type integer

self == other πŸ”—

Return a boolean series comparing each value in this series with its corresponding value in other.

Note that the result of comparing anything with NULL (including NULL itself) is NULL.

Example usage:

patients.sex == "female"

self != other πŸ”—

Return the inverse of == above.

Note that the same point regarding NULL applies here.

Example usage:

patients.sex != "unknown"

self > other πŸ”—

Return a boolean series which is True for each value in this series that is strictly less than its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_underage = patients.age_on("2020-01-01") < 18

self >= other πŸ”—

Return a boolean series which is True for each value in this series that is less than or equal to its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_underage = patients.age_on("2020-01-01") <= 17

self <= other πŸ”—

Return a boolean series which is True for each value in this series that is greater than or equal to its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_adult = patients.age_on("2020-01-01") >= 18

self < other πŸ”—

Return a boolean series which is True for each value in this series that is strictly greater than its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_adult = patients.age_on("2020-01-01") > 17

self + other πŸ”—

Return the sum of each corresponding value in this series and other (or NULL if either is NULL).

self - other πŸ”—

Return each value in this series with its corresponding value in other subtracted (or NULL if either is NULL).

self * other πŸ”—

Return the product of each corresponding value in this series and other (or NULL if either is NULL).

self / other πŸ”—

Return a series with each value in this series divided by its correponding value in other (or NULL if either is NULL).

Note that the result is always if a float even if the inputs are integers.

self // other πŸ”—

Return a series with each value in this series divided by its correponding value in other and then rounded down to the nearest integer value (or NULL if either is NULL).

Note that the result is always if an integer even if the inputs are floats.

- self πŸ”—

Return the negation of each value in this series.

is_null() πŸ”—

Return a boolean series which is True for each NULL value in this series and False for each non-NULL value.

Example usage:

patients.date_of_death.is_null()

is_not_null() πŸ”—

Return the inverse of is_null() above.

Example usage:

patients.date_of_death.is_not_null()

when_null_then(other) πŸ”—

Replace any NULL value in this series with the corresponding value in other.

Note that other must be of the same type as this series.

Example usage:

(patients.date_of_death < "2020-01-01").when_null_then(False)

is_in(other) πŸ”—

Return a boolean series which is True for each value in this series which is contained in other.

See how to combine is_in with a codelist in the how-to guide.

Example usage:

medications.dmd_code.is_in(["39113311000001107", "39113611000001102"])

other accepts any of the standard "container" types (tuple, list, set, frozenset, or dict) or another event series.

is_not_in(other) πŸ”—

Return the inverse of is_in() above.

map_values(mapping, default=None) πŸ”—

Return a new series with mapping applied to each value. mapping should be a dictionary mapping one set of values to another.

Example usage:

school_year = patients.age_on("2020-09-01").map_values(
    {13: "Year 9", 14: "Year 10", 15: "Year 11"},
    default="N/A"
)

as_int() πŸ”—

Return each value in this series rounded down to the nearest integer.

as_float() πŸ”—

Return each value in this series as a float (e.g. 10 becomes 10.0).

class IntEventSeries() πŸ”—

Multiple rows per patient series of type integer

self == other πŸ”—

Return a boolean series comparing each value in this series with its corresponding value in other.

Note that the result of comparing anything with NULL (including NULL itself) is NULL.

Example usage:

patients.sex == "female"

self != other πŸ”—

Return the inverse of == above.

Note that the same point regarding NULL applies here.

Example usage:

patients.sex != "unknown"

self > other πŸ”—

Return a boolean series which is True for each value in this series that is strictly less than its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_underage = patients.age_on("2020-01-01") < 18

self >= other πŸ”—

Return a boolean series which is True for each value in this series that is less than or equal to its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_underage = patients.age_on("2020-01-01") <= 17

self <= other πŸ”—

Return a boolean series which is True for each value in this series that is greater than or equal to its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_adult = patients.age_on("2020-01-01") >= 18

self < other πŸ”—

Return a boolean series which is True for each value in this series that is strictly greater than its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_adult = patients.age_on("2020-01-01") > 17

self + other πŸ”—

Return the sum of each corresponding value in this series and other (or NULL if either is NULL).

self - other πŸ”—

Return each value in this series with its corresponding value in other subtracted (or NULL if either is NULL).

self * other πŸ”—

Return the product of each corresponding value in this series and other (or NULL if either is NULL).

self / other πŸ”—

Return a series with each value in this series divided by its correponding value in other (or NULL if either is NULL).

Note that the result is always if a float even if the inputs are integers.

self // other πŸ”—

Return a series with each value in this series divided by its correponding value in other and then rounded down to the nearest integer value (or NULL if either is NULL).

Note that the result is always if an integer even if the inputs are floats.

- self πŸ”—

Return the negation of each value in this series.

is_null() πŸ”—

Return a boolean series which is True for each NULL value in this series and False for each non-NULL value.

Example usage:

patients.date_of_death.is_null()

is_not_null() πŸ”—

Return the inverse of is_null() above.

Example usage:

patients.date_of_death.is_not_null()

when_null_then(other) πŸ”—

Replace any NULL value in this series with the corresponding value in other.

Note that other must be of the same type as this series.

Example usage:

(patients.date_of_death < "2020-01-01").when_null_then(False)

is_in(other) πŸ”—

Return a boolean series which is True for each value in this series which is contained in other.

See how to combine is_in with a codelist in the how-to guide.

Example usage:

medications.dmd_code.is_in(["39113311000001107", "39113611000001102"])

other accepts any of the standard "container" types (tuple, list, set, frozenset, or dict) or another event series.

is_not_in(other) πŸ”—

Return the inverse of is_in() above.

map_values(mapping, default=None) πŸ”—

Return a new series with mapping applied to each value. mapping should be a dictionary mapping one set of values to another.

Example usage:

school_year = patients.age_on("2020-09-01").map_values(
    {13: "Year 9", 14: "Year 10", 15: "Year 11"},
    default="N/A"
)

as_int() πŸ”—

Return each value in this series rounded down to the nearest integer.

as_float() πŸ”—

Return each value in this series as a float (e.g. 10 becomes 10.0).

count_distinct_for_patient() πŸ”—

Return an integer patient series counting the number of distinct values for each patient in the series (ignoring any NULL values).

Note that if a patient has no values at all in the series the result will be zero rather than NULL.

Example usage:

medications.dmd_code.count_distinct_for_patient()

minimum_for_patient() πŸ”—

Return the minimum value in the series for each patient (or NULL if the patient has no values).

Example usage:

clinical_events.where(...).numeric_value.minimum_for_patient()

maximum_for_patient() πŸ”—

Return the maximum value in the series for each patient (or NULL if the patient has no values).

Example usage:

clinical_events.where(...).numeric_value.maximum_for_patient()

sum_for_patient() πŸ”—

Return the sum of all values in the series for each patient.

mean_for_patient() πŸ”—

Return the arithmetic mean of any non-NULL values in the series for each patient.

class FloatPatientSeries() πŸ”—

One row per patient series of type float

self == other πŸ”—

Return a boolean series comparing each value in this series with its corresponding value in other.

Note that the result of comparing anything with NULL (including NULL itself) is NULL.

Example usage:

patients.sex == "female"

self != other πŸ”—

Return the inverse of == above.

Note that the same point regarding NULL applies here.

Example usage:

patients.sex != "unknown"

self > other πŸ”—

Return a boolean series which is True for each value in this series that is strictly less than its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_underage = patients.age_on("2020-01-01") < 18

self >= other πŸ”—

Return a boolean series which is True for each value in this series that is less than or equal to its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_underage = patients.age_on("2020-01-01") <= 17

self <= other πŸ”—

Return a boolean series which is True for each value in this series that is greater than or equal to its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_adult = patients.age_on("2020-01-01") >= 18

self < other πŸ”—

Return a boolean series which is True for each value in this series that is strictly greater than its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_adult = patients.age_on("2020-01-01") > 17

self + other πŸ”—

Return the sum of each corresponding value in this series and other (or NULL if either is NULL).

self - other πŸ”—

Return each value in this series with its corresponding value in other subtracted (or NULL if either is NULL).

self * other πŸ”—

Return the product of each corresponding value in this series and other (or NULL if either is NULL).

self / other πŸ”—

Return a series with each value in this series divided by its correponding value in other (or NULL if either is NULL).

Note that the result is always if a float even if the inputs are integers.

self // other πŸ”—

Return a series with each value in this series divided by its correponding value in other and then rounded down to the nearest integer value (or NULL if either is NULL).

Note that the result is always if an integer even if the inputs are floats.

- self πŸ”—

Return the negation of each value in this series.

is_null() πŸ”—

Return a boolean series which is True for each NULL value in this series and False for each non-NULL value.

Example usage:

patients.date_of_death.is_null()

is_not_null() πŸ”—

Return the inverse of is_null() above.

Example usage:

patients.date_of_death.is_not_null()

when_null_then(other) πŸ”—

Replace any NULL value in this series with the corresponding value in other.

Note that other must be of the same type as this series.

Example usage:

(patients.date_of_death < "2020-01-01").when_null_then(False)

is_in(other) πŸ”—

Return a boolean series which is True for each value in this series which is contained in other.

See how to combine is_in with a codelist in the how-to guide.

Example usage:

medications.dmd_code.is_in(["39113311000001107", "39113611000001102"])

other accepts any of the standard "container" types (tuple, list, set, frozenset, or dict) or another event series.

is_not_in(other) πŸ”—

Return the inverse of is_in() above.

map_values(mapping, default=None) πŸ”—

Return a new series with mapping applied to each value. mapping should be a dictionary mapping one set of values to another.

Example usage:

school_year = patients.age_on("2020-09-01").map_values(
    {13: "Year 9", 14: "Year 10", 15: "Year 11"},
    default="N/A"
)

as_int() πŸ”—

Return each value in this series rounded down to the nearest integer.

as_float() πŸ”—

Return each value in this series as a float (e.g. 10 becomes 10.0).

class FloatEventSeries() πŸ”—

Multiple rows per patient series of type float

self == other πŸ”—

Return a boolean series comparing each value in this series with its corresponding value in other.

Note that the result of comparing anything with NULL (including NULL itself) is NULL.

Example usage:

patients.sex == "female"

self != other πŸ”—

Return the inverse of == above.

Note that the same point regarding NULL applies here.

Example usage:

patients.sex != "unknown"

self > other πŸ”—

Return a boolean series which is True for each value in this series that is strictly less than its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_underage = patients.age_on("2020-01-01") < 18

self >= other πŸ”—

Return a boolean series which is True for each value in this series that is less than or equal to its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_underage = patients.age_on("2020-01-01") <= 17

self <= other πŸ”—

Return a boolean series which is True for each value in this series that is greater than or equal to its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_adult = patients.age_on("2020-01-01") >= 18

self < other πŸ”—

Return a boolean series which is True for each value in this series that is strictly greater than its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_adult = patients.age_on("2020-01-01") > 17

self + other πŸ”—

Return the sum of each corresponding value in this series and other (or NULL if either is NULL).

self - other πŸ”—

Return each value in this series with its corresponding value in other subtracted (or NULL if either is NULL).

self * other πŸ”—

Return the product of each corresponding value in this series and other (or NULL if either is NULL).

self / other πŸ”—

Return a series with each value in this series divided by its correponding value in other (or NULL if either is NULL).

Note that the result is always if a float even if the inputs are integers.

self // other πŸ”—

Return a series with each value in this series divided by its correponding value in other and then rounded down to the nearest integer value (or NULL if either is NULL).

Note that the result is always if an integer even if the inputs are floats.

- self πŸ”—

Return the negation of each value in this series.

is_null() πŸ”—

Return a boolean series which is True for each NULL value in this series and False for each non-NULL value.

Example usage:

patients.date_of_death.is_null()

is_not_null() πŸ”—

Return the inverse of is_null() above.

Example usage:

patients.date_of_death.is_not_null()

when_null_then(other) πŸ”—

Replace any NULL value in this series with the corresponding value in other.

Note that other must be of the same type as this series.

Example usage:

(patients.date_of_death < "2020-01-01").when_null_then(False)

is_in(other) πŸ”—

Return a boolean series which is True for each value in this series which is contained in other.

See how to combine is_in with a codelist in the how-to guide.

Example usage:

medications.dmd_code.is_in(["39113311000001107", "39113611000001102"])

other accepts any of the standard "container" types (tuple, list, set, frozenset, or dict) or another event series.

is_not_in(other) πŸ”—

Return the inverse of is_in() above.

map_values(mapping, default=None) πŸ”—

Return a new series with mapping applied to each value. mapping should be a dictionary mapping one set of values to another.

Example usage:

school_year = patients.age_on("2020-09-01").map_values(
    {13: "Year 9", 14: "Year 10", 15: "Year 11"},
    default="N/A"
)

as_int() πŸ”—

Return each value in this series rounded down to the nearest integer.

as_float() πŸ”—

Return each value in this series as a float (e.g. 10 becomes 10.0).

count_distinct_for_patient() πŸ”—

Return an integer patient series counting the number of distinct values for each patient in the series (ignoring any NULL values).

Note that if a patient has no values at all in the series the result will be zero rather than NULL.

Example usage:

medications.dmd_code.count_distinct_for_patient()

minimum_for_patient() πŸ”—

Return the minimum value in the series for each patient (or NULL if the patient has no values).

Example usage:

clinical_events.where(...).numeric_value.minimum_for_patient()

maximum_for_patient() πŸ”—

Return the maximum value in the series for each patient (or NULL if the patient has no values).

Example usage:

clinical_events.where(...).numeric_value.maximum_for_patient()

sum_for_patient() πŸ”—

Return the sum of all values in the series for each patient.

mean_for_patient() πŸ”—

Return the arithmetic mean of any non-NULL values in the series for each patient.

class DatePatientSeries() πŸ”—

One row per patient series of type date

self == other πŸ”—

Return a boolean series comparing each value in this series with its corresponding value in other.

Note that the result of comparing anything with NULL (including NULL itself) is NULL.

Example usage:

patients.sex == "female"

self != other πŸ”—

Return the inverse of == above.

Note that the same point regarding NULL applies here.

Example usage:

patients.sex != "unknown"

self > other πŸ”—

Return a boolean series which is True for each value in this series that is strictly less than its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_underage = patients.age_on("2020-01-01") < 18

self >= other πŸ”—

Return a boolean series which is True for each value in this series that is less than or equal to its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_underage = patients.age_on("2020-01-01") <= 17

self <= other πŸ”—

Return a boolean series which is True for each value in this series that is greater than or equal to its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_adult = patients.age_on("2020-01-01") >= 18

self < other πŸ”—

Return a boolean series which is True for each value in this series that is strictly greater than its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_adult = patients.age_on("2020-01-01") > 17

self - other πŸ”—

Return a series giving the difference between each date in this series and other (see DateDifference).

Example usage:

age_months = (date("2020-01-01") - patients.date_of_birth).months

year πŸ”—

Return an integer series giving the year of each date in this series.

month πŸ”—

Return an integer series giving the month (1-12) of each date in this series.

day πŸ”—

Return an integer series giving the day of the month (1-31) of each date in this series.

is_null() πŸ”—

Return a boolean series which is True for each NULL value in this series and False for each non-NULL value.

Example usage:

patients.date_of_death.is_null()

is_not_null() πŸ”—

Return the inverse of is_null() above.

Example usage:

patients.date_of_death.is_not_null()

when_null_then(other) πŸ”—

Replace any NULL value in this series with the corresponding value in other.

Note that other must be of the same type as this series.

Example usage:

(patients.date_of_death < "2020-01-01").when_null_then(False)

is_in(other) πŸ”—

Return a boolean series which is True for each value in this series which is contained in other.

See how to combine is_in with a codelist in the how-to guide.

Example usage:

medications.dmd_code.is_in(["39113311000001107", "39113611000001102"])

other accepts any of the standard "container" types (tuple, list, set, frozenset, or dict) or another event series.

is_not_in(other) πŸ”—

Return the inverse of is_in() above.

map_values(mapping, default=None) πŸ”—

Return a new series with mapping applied to each value. mapping should be a dictionary mapping one set of values to another.

Example usage:

school_year = patients.age_on("2020-09-01").map_values(
    {13: "Year 9", 14: "Year 10", 15: "Year 11"},
    default="N/A"
)

to_first_of_year() πŸ”—

Return a date series with each date in this series replaced by the date of the first day in its corresponding calendar year.

Example usage:

patients.date_of_death.to_first_of_year()

to_first_of_month() πŸ”—

Return a date series with each date in this series replaced by the date of the first day in its corresponding calendar month.

Example usage:

patients.date_of_death.to_first_of_month()

is_before(other) πŸ”—

Return a boolean series which is True for each date in this series that is strictly earlier than its corresponding date in other and False otherwise (or NULL if either value is NULL).

Example usage:

medications.where(medications.date.is_before("2020-04-01"))

is_on_or_before(other) πŸ”—

Return a boolean series which is True for each date in this series that is earlier than or the same as its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

medications.where(medications.date.is_on_or_before("2020-03-31"))

is_after(other) πŸ”—

Return a boolean series which is True for each date in this series that is strictly later than its corresponding date in other and False otherwise (or NULL if either value is NULL).

Example usage:

medications.where(medications.date.is_after("2020-03-31"))

is_on_or_after(other) πŸ”—

Return a boolean series which is True for each date in this series that is later than or the same as its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

medications.where(medications.date.is_on_or_after("2020-04-01"))

is_between_but_not_on(start, end) πŸ”—

Return a boolean series which is True for each date in this series which is strictly between (i.e. not equal to) the corresponding dates in start and end, and False otherwise.

Example usage:

medications.where(medications.date.is_between_but_not_on("2020-03-31", "2021-04-01"))
For each trio of dates being compared, if any date is NULL the result is NULL.

is_on_or_between(start, end) πŸ”—

Return a boolean series which is True for each date in this series which is between or the same as the corresponding dates in start and end, and False otherwise.

Example usage:

medications.where(medications.date.is_on_or_between("2020-04-01", "2021-03-31"))
For each trio of dates being compared, if any date is NULL the result is NULL.

is_during(interval) πŸ”—

The same as is_on_or_between() above, but allows supplying a start/end date pair as single argument.

Example usage:

study_period = ("2020-04-01", "2021-03-31")
medications.where(medications.date.is_during(study_period))

Also see the docs on using is_during with the INTERVAL placeholder.

class DateEventSeries() πŸ”—

Multiple rows per patient series of type date

self == other πŸ”—

Return a boolean series comparing each value in this series with its corresponding value in other.

Note that the result of comparing anything with NULL (including NULL itself) is NULL.

Example usage:

patients.sex == "female"

self != other πŸ”—

Return the inverse of == above.

Note that the same point regarding NULL applies here.

Example usage:

patients.sex != "unknown"

self > other πŸ”—

Return a boolean series which is True for each value in this series that is strictly less than its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_underage = patients.age_on("2020-01-01") < 18

self >= other πŸ”—

Return a boolean series which is True for each value in this series that is less than or equal to its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_underage = patients.age_on("2020-01-01") <= 17

self <= other πŸ”—

Return a boolean series which is True for each value in this series that is greater than or equal to its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_adult = patients.age_on("2020-01-01") >= 18

self < other πŸ”—

Return a boolean series which is True for each value in this series that is strictly greater than its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

is_adult = patients.age_on("2020-01-01") > 17

self - other πŸ”—

Return a series giving the difference between each date in this series and other (see DateDifference).

Example usage:

age_months = (date("2020-01-01") - patients.date_of_birth).months

year πŸ”—

Return an integer series giving the year of each date in this series.

month πŸ”—

Return an integer series giving the month (1-12) of each date in this series.

day πŸ”—

Return an integer series giving the day of the month (1-31) of each date in this series.

is_null() πŸ”—

Return a boolean series which is True for each NULL value in this series and False for each non-NULL value.

Example usage:

patients.date_of_death.is_null()

is_not_null() πŸ”—

Return the inverse of is_null() above.

Example usage:

patients.date_of_death.is_not_null()

when_null_then(other) πŸ”—

Replace any NULL value in this series with the corresponding value in other.

Note that other must be of the same type as this series.

Example usage:

(patients.date_of_death < "2020-01-01").when_null_then(False)

is_in(other) πŸ”—

Return a boolean series which is True for each value in this series which is contained in other.

See how to combine is_in with a codelist in the how-to guide.

Example usage:

medications.dmd_code.is_in(["39113311000001107", "39113611000001102"])

other accepts any of the standard "container" types (tuple, list, set, frozenset, or dict) or another event series.

is_not_in(other) πŸ”—

Return the inverse of is_in() above.

map_values(mapping, default=None) πŸ”—

Return a new series with mapping applied to each value. mapping should be a dictionary mapping one set of values to another.

Example usage:

school_year = patients.age_on("2020-09-01").map_values(
    {13: "Year 9", 14: "Year 10", 15: "Year 11"},
    default="N/A"
)

to_first_of_year() πŸ”—

Return a date series with each date in this series replaced by the date of the first day in its corresponding calendar year.

Example usage:

patients.date_of_death.to_first_of_year()

to_first_of_month() πŸ”—

Return a date series with each date in this series replaced by the date of the first day in its corresponding calendar month.

Example usage:

patients.date_of_death.to_first_of_month()

is_before(other) πŸ”—

Return a boolean series which is True for each date in this series that is strictly earlier than its corresponding date in other and False otherwise (or NULL if either value is NULL).

Example usage:

medications.where(medications.date.is_before("2020-04-01"))

is_on_or_before(other) πŸ”—

Return a boolean series which is True for each date in this series that is earlier than or the same as its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

medications.where(medications.date.is_on_or_before("2020-03-31"))

is_after(other) πŸ”—

Return a boolean series which is True for each date in this series that is strictly later than its corresponding date in other and False otherwise (or NULL if either value is NULL).

Example usage:

medications.where(medications.date.is_after("2020-03-31"))

is_on_or_after(other) πŸ”—

Return a boolean series which is True for each date in this series that is later than or the same as its corresponding value in other and False otherwise (or NULL if either value is NULL).

Example usage:

medications.where(medications.date.is_on_or_after("2020-04-01"))

is_between_but_not_on(start, end) πŸ”—

Return a boolean series which is True for each date in this series which is strictly between (i.e. not equal to) the corresponding dates in start and end, and False otherwise.

Example usage:

medications.where(medications.date.is_between_but_not_on("2020-03-31", "2021-04-01"))
For each trio of dates being compared, if any date is NULL the result is NULL.

is_on_or_between(start, end) πŸ”—

Return a boolean series which is True for each date in this series which is between or the same as the corresponding dates in start and end, and False otherwise.

Example usage:

medications.where(medications.date.is_on_or_between("2020-04-01", "2021-03-31"))
For each trio of dates being compared, if any date is NULL the result is NULL.

is_during(interval) πŸ”—

The same as is_on_or_between() above, but allows supplying a start/end date pair as single argument.

Example usage:

study_period = ("2020-04-01", "2021-03-31")
medications.where(medications.date.is_during(study_period))

Also see the docs on using is_during with the INTERVAL placeholder.

count_distinct_for_patient() πŸ”—

Return an integer patient series counting the number of distinct values for each patient in the series (ignoring any NULL values).

Note that if a patient has no values at all in the series the result will be zero rather than NULL.

Example usage:

medications.dmd_code.count_distinct_for_patient()

minimum_for_patient() πŸ”—

Return the minimum value in the series for each patient (or NULL if the patient has no values).

Example usage:

clinical_events.where(...).numeric_value.minimum_for_patient()

maximum_for_patient() πŸ”—

Return the maximum value in the series for each patient (or NULL if the patient has no values).

Example usage:

clinical_events.where(...).numeric_value.maximum_for_patient()

count_episodes_for_patient(maximum_gap) πŸ”—

Counts the number of "episodes" for each patient where dates which are no more than maximum_gap apart are considered part of the same episode. The maximum_gap duration can be specified in days() or weeks().

For example, suppose a patient has the following sequence of events:

Event ID Date
A 2020-01-01
B 2020-01-04
C 2020-01-06
D 2020-01-10
E 2020-01-12

And suppose we count the episodes here using a maximum gap of three days:

.count_episodes_for_patient(days(3))

We will get an episode count of two: events A, B and C are considered as one episode and events D and E as another.

Note that events A and C are considered part of the same episode even though they are more than three days apart because event B is no more than three days apart from both of them. That is, the clock restarts with each new event in an episode rather than running from the first event in an episode.

class CodePatientSeries() πŸ”—

One row per patient series of type code

self == other πŸ”—

Return a boolean series comparing each value in this series with its corresponding value in other.

Note that the result of comparing anything with NULL (including NULL itself) is NULL.

Example usage:

patients.sex == "female"

self != other πŸ”—

Return the inverse of == above.

Note that the same point regarding NULL applies here.

Example usage:

patients.sex != "unknown"

is_null() πŸ”—

Return a boolean series which is True for each NULL value in this series and False for each non-NULL value.

Example usage:

patients.date_of_death.is_null()

is_not_null() πŸ”—

Return the inverse of is_null() above.

Example usage:

patients.date_of_death.is_not_null()

when_null_then(other) πŸ”—

Replace any NULL value in this series with the corresponding value in other.

Note that other must be of the same type as this series.

Example usage:

(patients.date_of_death < "2020-01-01").when_null_then(False)

is_in(other) πŸ”—

Return a boolean series which is True for each value in this series which is contained in other.

See how to combine is_in with a codelist in the how-to guide.

Example usage:

medications.dmd_code.is_in(["39113311000001107", "39113611000001102"])

other accepts any of the standard "container" types (tuple, list, set, frozenset, or dict) or another event series.

is_not_in(other) πŸ”—

Return the inverse of is_in() above.

map_values(mapping, default=None) πŸ”—

Return a new series with mapping applied to each value. mapping should be a dictionary mapping one set of values to another.

Example usage:

school_year = patients.age_on("2020-09-01").map_values(
    {13: "Year 9", 14: "Year 10", 15: "Year 11"},
    default="N/A"
)

to_category(categorisation, default=None) πŸ”—

An alias for map_values which makes the intention clearer when working with codelists.

For more detail see codelist_from_csv() and the how-to guide.

class CodeEventSeries() πŸ”—

Multiple rows per patient series of type code

self == other πŸ”—

Return a boolean series comparing each value in this series with its corresponding value in other.

Note that the result of comparing anything with NULL (including NULL itself) is NULL.

Example usage:

patients.sex == "female"

self != other πŸ”—

Return the inverse of == above.

Note that the same point regarding NULL applies here.

Example usage:

patients.sex != "unknown"

is_null() πŸ”—

Return a boolean series which is True for each NULL value in this series and False for each non-NULL value.

Example usage:

patients.date_of_death.is_null()

is_not_null() πŸ”—

Return the inverse of is_null() above.

Example usage:

patients.date_of_death.is_not_null()

when_null_then(other) πŸ”—

Replace any NULL value in this series with the corresponding value in other.

Note that other must be of the same type as this series.

Example usage:

(patients.date_of_death < "2020-01-01").when_null_then(False)

is_in(other) πŸ”—

Return a boolean series which is True for each value in this series which is contained in other.

See how to combine is_in with a codelist in the how-to guide.

Example usage:

medications.dmd_code.is_in(["39113311000001107", "39113611000001102"])

other accepts any of the standard "container" types (tuple, list, set, frozenset, or dict) or another event series.

is_not_in(other) πŸ”—

Return the inverse of is_in() above.

map_values(mapping, default=None) πŸ”—

Return a new series with mapping applied to each value. mapping should be a dictionary mapping one set of values to another.

Example usage:

school_year = patients.age_on("2020-09-01").map_values(
    {13: "Year 9", 14: "Year 10", 15: "Year 11"},
    default="N/A"
)

to_category(categorisation, default=None) πŸ”—

An alias for map_values which makes the intention clearer when working with codelists.

For more detail see codelist_from_csv() and the how-to guide.

count_distinct_for_patient() πŸ”—

Return an integer patient series counting the number of distinct values for each patient in the series (ignoring any NULL values).

Note that if a patient has no values at all in the series the result will be zero rather than NULL.

Example usage:

medications.dmd_code.count_distinct_for_patient()

class MultiCodeStringPatientSeries() πŸ”—

One row per patient series of type multi code string

self == other πŸ”—

This operation is not allowed because it is unlikely you would want to match the values in this field with an exact string e.g.

apcs.all_diagnoses == "||I302, K201, J180 || I302, K200, M920"
Instead you should use the contains or contains_any_of methods.

self != other πŸ”—

See above.

is_null() πŸ”—

Return a boolean series which is True for each NULL value in this series and False for each non-NULL value.

Example usage:

patients.date_of_death.is_null()

is_not_null() πŸ”—

Return the inverse of is_null() above.

Example usage:

patients.date_of_death.is_not_null()

when_null_then(other) πŸ”—

Replace any NULL value in this series with the corresponding value in other.

Note that other must be of the same type as this series.

Example usage:

(patients.date_of_death < "2020-01-01").when_null_then(False)

is_in(other) πŸ”—

This operation is not allowed. To check for the presence of any codes in a codelist, please use the contains_any_of(codelist) method instead.

is_not_in(other) πŸ”—

This operation is not allowed. To check for the absence of all codes in a codelist, from a column called column, please use ~column.contains_any_of(codelist). NB the contains_any_of(codelist) will provide any records that contain any of the codes, which is then negated with the ~ operator.

map_values(mapping, default=None) πŸ”—

Return a new series with mapping applied to each value. mapping should be a dictionary mapping one set of values to another.

Example usage:

school_year = patients.age_on("2020-09-01").map_values(
    {13: "Year 9", 14: "Year 10", 15: "Year 11"},
    default="N/A"
)

contains(code) πŸ”—

Check if the multi code field contains a specific code string and return the result as a boolean series. code can either be a string (and prefix matching works so e.g. "N17" in ICD-10 would match all acute renal failure), or a clinical code.

Example usages:

all_diagnoses.contains("N17")
all_diagnoses.contains(ICD10Code("N170"))

contains_any_of(codelist) πŸ”—

Check if any of the codes in codelist occur in the multi code field and return the result as a boolean series. As with the contains(code) method, the codelist can be a mixture of clinical codes and string prefixes, as seen in the example below.

Example usage:

all_diagnoses.contains([ICD10Code("N170"), "N17"])

class MultiCodeStringEventSeries() πŸ”—

Multiple rows per patient series of type multi code string

self == other πŸ”—

This operation is not allowed because it is unlikely you would want to match the values in this field with an exact string e.g.

apcs.all_diagnoses == "||I302, K201, J180 || I302, K200, M920"
Instead you should use the contains or contains_any_of methods.

self != other πŸ”—

See above.

is_null() πŸ”—

Return a boolean series which is True for each NULL value in this series and False for each non-NULL value.

Example usage:

patients.date_of_death.is_null()

is_not_null() πŸ”—

Return the inverse of is_null() above.

Example usage:

patients.date_of_death.is_not_null()

when_null_then(other) πŸ”—

Replace any NULL value in this series with the corresponding value in other.

Note that other must be of the same type as this series.

Example usage:

(patients.date_of_death < "2020-01-01").when_null_then(False)

is_in(other) πŸ”—

This operation is not allowed. To check for the presence of any codes in a codelist, please use the contains_any_of(codelist) method instead.

is_not_in(other) πŸ”—

This operation is not allowed. To check for the absence of all codes in a codelist, from a column called column, please use ~column.contains_any_of(codelist). NB the contains_any_of(codelist) will provide any records that contain any of the codes, which is then negated with the ~ operator.

map_values(mapping, default=None) πŸ”—

Return a new series with mapping applied to each value. mapping should be a dictionary mapping one set of values to another.

Example usage:

school_year = patients.age_on("2020-09-01").map_values(
    {13: "Year 9", 14: "Year 10", 15: "Year 11"},
    default="N/A"
)

contains(code) πŸ”—

Check if the multi code field contains a specific code string and return the result as a boolean series. code can either be a string (and prefix matching works so e.g. "N17" in ICD-10 would match all acute renal failure), or a clinical code.

Example usages:

all_diagnoses.contains("N17")
all_diagnoses.contains(ICD10Code("N170"))

contains_any_of(codelist) πŸ”—

Check if any of the codes in codelist occur in the multi code field and return the result as a boolean series. As with the contains(code) method, the codelist can be a mixture of clinical codes and string prefixes, as seen in the example below.

Example usage:

all_diagnoses.contains([ICD10Code("N170"), "N17"])

count_distinct_for_patient() πŸ”—

Return an integer patient series counting the number of distinct values for each patient in the series (ignoring any NULL values).

Note that if a patient has no values at all in the series the result will be zero rather than NULL.

Example usage:

medications.dmd_code.count_distinct_for_patient()


Date ArithmeticπŸ”—

ehrQL supports adding and subtracting durations from dates e.g. date_of_admission + days(10) or date_of_discharge - weeks(6). It also supports finding the difference between dates e.g.

days_in_hospital = (date_of_discharge - date_of_admission).days

When working with dates you should generally prefer using days or weeks as units, rather than months or years, unless there's a specific reason you care about the calendar. Adding or subtracting days or weeks to a date is a simple, unambiguous process; adding years or months introduces ambiguities and complexities as neither unit is a consistent length (see section below for details). So, for example, unless there is specific epidemiological significance to whether an event happened exactly three calendar months ago, it is better to say "90 days" rather than "3 months".

Additionally, some dates in the patient data (e.g. dates of birth) have been rounded to the first of the month for privacy purposes. It's pointless applying precise calendar arithmetic to such dates.

Tip: it can be clearer to readers to write time periods as a product rather than just a number e.g. it's easier to see that days(5 * 365) is approximately five years than it is with days(1825).

Ambiguous DatesπŸ”—

Adding years or months to a date sometimes has a clear answer e.g. 1 January 2001 is exactly one calendar year later than 1 January 2000, and 1 February is exactly one calendar month later. But not all cases are clear, for instance: which day is exactly one year after 29 February 2000? There is no 29 February 2001, so should it be 28 February or 1 March? And which day is exactly one calendar month after 31 August? There is no 31 September, so should it be 30 September or 1 October? Different databases give different answers here.

ehrQL takes the approach that we consistently round up to the next date. That is, whenever the naΓ―ve calculation takes us to a date which doesn't exist (like 29 February 2001 or 31 September) we always take the next largest date (1 March 2001, or 1 October).

As different databases take different approaches here, ehrQL has to go to some lengths to ensure that we get the same results on every database we support. This introduces complexity into the generated SQL which may have performance costs in some circumstances. This is another reason to prefer unambiguous units of days and weeks where possible.

class DateDifference(lhs, rhs) πŸ”—

Represents the difference between two dates or date series (i.e. it is what you get when you perform subtractions on DatePatientSeries or DateEventSeries).

days πŸ”—

The value of the date difference in days (can be positive or negative).

weeks πŸ”—

The value of the date difference in whole weeks (can be positive or negative).

months πŸ”—

The value of the date difference in whole calendar months (can be positive or negative).

years πŸ”—

The value of the date difference in whole calendar years (can be positive or negative).

class days(value) πŸ”—

Represents a duration of time specified in days.

Example usage:

last_medication_date = medications.sort_by(medications.date).last_for_patient().date
start_date = last_medication_date - days(90)
end_date = last_medication_date + days(90)

self == other πŸ”—

Return True if other has the same value and units, and False otherwise.

Hence, the result of weeks(1) == days(7) will be False.

self != other πŸ”—

Return the inverse of == above.

self + other πŸ”—

If other is a date or date series, add this duration to other to produce a new date.

If other is another duration with the same units, add the two durations together to produce a new duration.

self - other πŸ”—

Subtract other from this duration. other must be a duration in the same units.

- self πŸ”—

Invert this duration, i.e. count the duration backwards in time if it was originally forwards, and vice versa.

starting_on(date) πŸ”—

Return a list of time intervals covering the duration starting on date. Each interval lasts one unit.

Example usage:

weeks(3).starting_on("2000-01-01")
The above would return:
[
    (date(2000, 1, 1), date(2000, 1, 7)),
    (date(2000, 1, 8), date(2000, 1, 14)),
    (date(2000, 1, 15), date(2000, 1, 21)),
]

Useful for generating the intervals arguments to Measures.

ending_on(date) πŸ”—

Return a list of time intervals covering the duration ending on date. Each interval lasts one unit.

Example usage:

weeks(3).ending_on("2000-01-21")
The above would return:
[
    (date(2000, 1, 1), date(2000, 1, 7)),
    (date(2000, 1, 8), date(2000, 1, 14)),
    (date(2000, 1, 15), date(2000, 1, 21)),
]

Useful for generating the intervals arguments to Measures.

class months(value) πŸ”—

Represents a duration of time specified in calendar months.

Example usage:

last_medication_date = medications.sort_by(medications.date).last_for_patient().date
start_date = last_medication_date - months(3)
end_date = last_medication_date + months(3)

Consider using days() or weeks() instead - see the section on Ambiguous Dates for more.

self == other πŸ”—

Return True if other has the same value and units, and False otherwise.

Hence, the result of weeks(1) == days(7) will be False.

self != other πŸ”—

Return the inverse of == above.

self + other πŸ”—

If other is a date or date series, add this duration to other to produce a new date.

If other is another duration with the same units, add the two durations together to produce a new duration.

self - other πŸ”—

Subtract other from this duration. other must be a duration in the same units.

- self πŸ”—

Invert this duration, i.e. count the duration backwards in time if it was originally forwards, and vice versa.

starting_on(date) πŸ”—

Return a list of time intervals covering the duration starting on date. Each interval lasts one unit.

Example usage:

weeks(3).starting_on("2000-01-01")
The above would return:
[
    (date(2000, 1, 1), date(2000, 1, 7)),
    (date(2000, 1, 8), date(2000, 1, 14)),
    (date(2000, 1, 15), date(2000, 1, 21)),
]

Useful for generating the intervals arguments to Measures.

ending_on(date) πŸ”—

Return a list of time intervals covering the duration ending on date. Each interval lasts one unit.

Example usage:

weeks(3).ending_on("2000-01-21")
The above would return:
[
    (date(2000, 1, 1), date(2000, 1, 7)),
    (date(2000, 1, 8), date(2000, 1, 14)),
    (date(2000, 1, 15), date(2000, 1, 21)),
]

Useful for generating the intervals arguments to Measures.

class weeks(value) πŸ”—

Represents a duration of time specified in weeks.

Example usage:

last_medication_date = medications.sort_by(medications.date).last_for_patient().date
start_date = last_medication_date - weeks(12)
end_date = last_medication_date + weeks(12)

self == other πŸ”—

Return True if other has the same value and units, and False otherwise.

Hence, the result of weeks(1) == days(7) will be False.

self != other πŸ”—

Return the inverse of == above.

self + other πŸ”—

If other is a date or date series, add this duration to other to produce a new date.

If other is another duration with the same units, add the two durations together to produce a new duration.

self - other πŸ”—

Subtract other from this duration. other must be a duration in the same units.

- self πŸ”—

Invert this duration, i.e. count the duration backwards in time if it was originally forwards, and vice versa.

starting_on(date) πŸ”—

Return a list of time intervals covering the duration starting on date. Each interval lasts one unit.

Example usage:

weeks(3).starting_on("2000-01-01")
The above would return:
[
    (date(2000, 1, 1), date(2000, 1, 7)),
    (date(2000, 1, 8), date(2000, 1, 14)),
    (date(2000, 1, 15), date(2000, 1, 21)),
]

Useful for generating the intervals arguments to Measures.

ending_on(date) πŸ”—

Return a list of time intervals covering the duration ending on date. Each interval lasts one unit.

Example usage:

weeks(3).ending_on("2000-01-21")
The above would return:
[
    (date(2000, 1, 1), date(2000, 1, 7)),
    (date(2000, 1, 8), date(2000, 1, 14)),
    (date(2000, 1, 15), date(2000, 1, 21)),
]

Useful for generating the intervals arguments to Measures.

class years(value) πŸ”—

Represents a duration of time specified in calendar years.

Example usage:

last_medication_date = medications.sort_by(medications.date).last_for_patient().date
start_date = last_medication_date - years(1)
end_date = last_medication_date + years(1)

Consider using days() or weeks() instead - see the section on Ambiguous Dates for more.

self == other πŸ”—

Return True if other has the same value and units, and False otherwise.

Hence, the result of weeks(1) == days(7) will be False.

self != other πŸ”—

Return the inverse of == above.

self + other πŸ”—

If other is a date or date series, add this duration to other to produce a new date.

If other is another duration with the same units, add the two durations together to produce a new duration.

self - other πŸ”—

Subtract other from this duration. other must be a duration in the same units.

- self πŸ”—

Invert this duration, i.e. count the duration backwards in time if it was originally forwards, and vice versa.

starting_on(date) πŸ”—

Return a list of time intervals covering the duration starting on date. Each interval lasts one unit.

Example usage:

weeks(3).starting_on("2000-01-01")
The above would return:
[
    (date(2000, 1, 1), date(2000, 1, 7)),
    (date(2000, 1, 8), date(2000, 1, 14)),
    (date(2000, 1, 15), date(2000, 1, 21)),
]

Useful for generating the intervals arguments to Measures.

ending_on(date) πŸ”—

Return a list of time intervals covering the duration ending on date. Each interval lasts one unit.

Example usage:

weeks(3).ending_on("2000-01-21")
The above would return:
[
    (date(2000, 1, 1), date(2000, 1, 7)),
    (date(2000, 1, 8), date(2000, 1, 14)),
    (date(2000, 1, 15), date(2000, 1, 21)),
]

Useful for generating the intervals arguments to Measures.


CodelistsπŸ”—

codelist_from_csv(filename, column, category_column=None) πŸ”—

Read a codelist from a CSV file as either a list or a dictionary (for categorised codelists).

filename
Path to the file on disk, relative to the root of your repository. (Remember to use UNIX/style/forward-slashes not Windows\style\backslashes.)

column
Name of the column in the CSV file which contains the codes.

category_column
Optional name of a column in the CSV file which contains categories to which each code should be mapped. If this argument is passed then the resulting codelist will be a dictionary mapping each code to its corresponding category. This can be passed to the to_category() method to map a series of codes to a series of categories.

For more detail see the how-to guide.


FunctionsπŸ”—

case(*when_thens, otherwise=None) πŸ”—

Take a sequence of condition-values of the form:

when(condition).then(value)

And evaluate them in order, returning the value of the first condition which evaluates True. If no condition matches, return the otherwise value (or NULL if no otherwise value is specified).

Example usage:

category = case(
    when(size < 10).then("small"),
    when(size < 20).then("medium"),
    when(size >= 20).then("large"),
    otherwise="unknown",
)

Note that because the conditions are evaluated in order we don't need the condition for "medium" to specify (size >= 10) & (size < 20) because by the time the condition for "medium" is being evaluated we already know the condition for "small" is False.

A simpler form is available when there is a single condition. This example:

category = case(
    when(size < 15).then("small"),
    otherwise="large",
)

can be rewritten as:

category = when(size < 15).then("small").otherwise("large")

maximum_of(value, other_value, *other_values) πŸ”—

Return the maximum value of a collection of Series or Values, disregarding NULLs.

Example usage:

latest_event_date = maximum_of(event_series_1.date, event_series_2.date, "2001-01-01")

minimum_of(value, other_value, *other_values) πŸ”—

Return the minimum value of a collection of Series or Values, disregarding NULLs.

Example usage:

ealiest_event_date = minimum_of(event_series_1.date, event_series_2.date, "2001-01-01")


MeasuresπŸ”—

Measures are used for calculating the ratio of one quantity to another as it varies over time, and broken down by different demographic (or other) groupings.

create_measures() πŸ”—

A measure definition file must define a collection of measures called measures.

measures = create_measures()

Add measures to the collection using define_measure:

measures.define_measure(
    name="adult_proportion",
    numerator=patients.age_on(INTERVAL.start_date) >=18,
    denominator=patients.exists_for_patient()
)

class Measures() πŸ”—

To create a collection of measures use the create_measures function.

define_measure(name, numerator=None, denominator=None, group_by=None, intervals=None) πŸ”—

Add a measure to the collection of measures to be generated.

name
The name of the measure, as a string. Only used to identify the measure in the output. Must contain only alphanumeric and underscore characters and must start with a letter.

numerator
The numerator definition, which must be a patient series but can be either boolean or integer.

denominator
The denominator definition, which must be a patient series but can be either boolean or integer.

group_by
Optional groupings to break down the results by. If supplied, must be a dictionary of the form:

{
    "group_name": group_definition,
    ...
}

  • each group_name becomes a column in the output. It must contain only alphanumeric and underscore characters and must start with a letter. It also must not clash with any reserved column names like "numerator" or "ratio".

  • each group_definition must be a categorical patient series (i.e. a patient series which takes only a fixed set of values).

intervals
A list of start/end date pairs over which to evaluate the measures. These can be most conveniently generated using the starting_on()/ending_on() methods on years, months, and weeks e.g.

intervals = months(12).starting_on("2020-01-01")

The numerator, denominator and intervals arguments can be omitted if default values for them have been set using define_defaults().

define_defaults(numerator=None, denominator=None, group_by=None, intervals=None) πŸ”—

Define default values for a collection of measures. Useful to reduce repetition when defining several measures which share common arguments.

Example usage:

measures.define_defaults(
    intervals=months(6).starting_on("2020-01-01"),
)

Note that you can only define a single set of defaults and attempting to call this method more than once is an error.

configure_dummy_data(population_size=10, legacy=False, timeout=60, additional_population_constraint=None) πŸ”—

Configure the dummy data to be generated.

population_size
Maximum number of patients to generate.

Note that you may get fewer patients than this if the generator runs out of time – see timeout below.

legacy
Use legacy dummy data.

timeout
Maximum time in seconds to spend generating dummy data.

additional_population_constraint
An additional ehrQL query that can be used to constrain the population that will be selected for dummy data. This is incompatible with legacy mode.

For example, if you wanted to ensure that two dates appear in a particular order in your dummy data, you could add additional_population_constraint = dataset.first_date < dataset.second_date.

You can also combine constraints with & as normal in ehrQL. E.g. additional_population_constraint = patients.sex.is_in(['male', 'female']) & ( patients.age_on(some_date) < 80) would give you dummy data consisting of only men and women who were under the age of 80 on some particular date.

Example usage:

measures.configure_dummy_data(population_size=10000)

configure_disclosure_control(enabled=True) πŸ”—

Configure disclosure control.

By default, numerators and denominators are subject to disclosure control. First, values less than or equal to seven are replaced with zero (suppressed); then, values are rounded to the nearest five.

To disable disclosure control:

measures.configure_disclosure_control(enabled=False)

For more information about disclosure control in OpenSAFELY, please see the "Updated disclosure control guidance" page.

INTERVAL πŸ”—

This is a placeholder value to be used when defining numerator, denominator and group_by columns in a measure. This allows these definitions to be written once and then be automatically evaluated over multiple different intervals. Can be used just like any pair of dates in ehrQL.

Example usage:

clinical_events.date.is_during(INTERVAL)

start_date πŸ”—

Placeholder for the start date (inclusive) of the interval. Can be used like any other date.

Example usage:

clinical_events.date.is_on_or_after(INTERVAL.start_date)

end_date πŸ”—

Placeholder for the end date (inclusive) of the interval. Can be used like any other date.

Example usage:

clinical_events.date.is_on_or_before(INTERVAL.end_date)