Study design and setting

A cohort study was conducted using routinely collected, individual-level, anonymised, population-scale linked data held within the Secure Anonymised Information Linkage (SAIL) Databank. Data sources included general practice (GP) records, hospital admissions, national community child health, maternal indicators, and COVID-19 vaccination data. All women recorded as being pregnant on or after 13th April 2021, aged 18 years or older, and eligible for COVID-19 vaccination were identified and linked to COVID-19 vaccination data up to and including 31st December 2021. For women whose pregnancy began on or after the study start date, time to vaccination was measured from the pregnancy start date. For women whose pregnancy began before the study start date, when vaccination was not yet accessible to them, time was measured from the study start date, which corresponds to when vaccination became accessible. For women who received their first dose before pregnancy, the second dose was treated as the first vaccination received during pregnancy when measuring time to vaccination.
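The rules for the time origin and the counted dose can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's SQL or SPSS implementation: the function names and the `censor_date` argument (standing in for the earliest of delivery, death, or moving out of Wales) are hypothetical, and any dose dated before the pregnancy start is assumed to be the pre-pregnancy dose described above.

```python
from datetime import date
from typing import List, Optional, Tuple

STUDY_START = date(2021, 4, 13)  # phase 2 of the vaccination programme
STUDY_END = date(2021, 12, 31)   # last date of linked vaccination data


def time_origin(pregnancy_start: date) -> date:
    """Start of follow-up: the pregnancy start date, or the study start
    date if the pregnancy began before vaccination was accessible."""
    return max(pregnancy_start, STUDY_START)


def first_dose_in_pregnancy(dose_dates: List[date], pregnancy_start: date) -> Optional[date]:
    """Earliest dose on or after the pregnancy start date.

    A first dose given before pregnancy is skipped, so for those women
    the second dose is counted as the first vaccination in pregnancy."""
    in_pregnancy = sorted(d for d in dose_dates if d >= pregnancy_start)
    return in_pregnancy[0] if in_pregnancy else None


def follow_up(pregnancy_start: date, censor_date: date, dose_dates: List[date]) -> Tuple[int, bool]:
    """Return (days of follow-up, vaccinated?) for one pregnancy.

    censor_date is a hypothetical input: the earliest of delivery, death,
    or moving out of Wales. Follow-up never extends past STUDY_END."""
    origin = time_origin(pregnancy_start)
    end = min(censor_date, STUDY_END)
    dose = first_dose_in_pregnancy(dose_dates, pregnancy_start)
    if dose is not None and dose <= end:
        return (dose - origin).days, True
    return (end - origin).days, False
```

For example, a pregnancy starting on 1 January 2021 with a dose on 13 May 2021 gives `(30, True)`, because time runs from 13 April 2021 rather than from the pregnancy start.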

Data sources and linkage

Analysis was undertaken using anonymised, individual-level, population-scale linked routinely collected data available in the SAIL Databank [17, 18], which anonymously links a wide range of person-based data using a unique personal identifier. Primary care data from the Wales Longitudinal General Practice (WLGP) dataset were linked with secondary care data on inpatient admissions from the Patient Episode Database for Wales (PEDW) and on outpatient attendances from the Outpatient Database for Wales (OPDW); with pregnancy and maternity data from the National Community Child Health Database (NCCHD) and Maternal Indicators (MIDS); and with vaccination data from the COVID-19 Vaccination Dataset (CVVD) [5]. The primary care data use Read codes, which are predominantly 5-character codes covering diagnoses, medications, and processes of care. The secondary care data use International Classification of Diseases, 10th revision (ICD-10) codes for diagnoses and OPCS Classification of Interventions and Procedures version 4 (OPCS-4) codes for surgical interventions. The NCCHD contains information on birth registration, monitoring of child health examinations, and immunisations. The MIDS data contain information on the woman at initial assessment and on the mother and baby (or babies) for all births [5]. In addition, the Welsh Demographic Service Dataset (WDSD) was linked to obtain each woman's Lower-layer Super Output Area (LSOA, 2011 version), which was used to assign area-level deprivation. Specifically, the Welsh Index of Multiple Deprivation (WIMD) 2019 was used as a proxy for socioeconomic status [5]. These records were linked at the individual level for all women known to be pregnant in Wales between 13th April 2021 and 31st December 2021. Linkage quality has been assessed and reported as 99.9% for WLGP records and 99.3% for PEDW records [19].
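Individual-level linkage can be pictured as grouping records from each dataset under the shared pseudonymised identifier. The sketch below is hypothetical: the field name `alf` is an assumption standing in for SAIL's anonymised linking field, the dataset labels are only dictionary keys, and in practice the linkage is performed in SQL inside SAIL rather than in Python.

```python
from collections import defaultdict
from typing import Any, Dict, List

Record = Dict[str, Any]


def link_by_id(sources: Dict[str, List[Record]], key: str = "alf") -> Dict[Any, Dict[str, List[Record]]]:
    """Group records from several datasets under one pseudonymised person ID.

    `key` is an assumed field name for the linking identifier. Records
    without an identifier cannot be linked and are dropped."""
    linked: Dict[Any, Dict[str, List[Record]]] = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for record in records:
            person_id = record.get(key)
            if person_id is None:
                continue
            linked[person_id][source_name].append(record)
    # Convert nested defaultdicts to plain dicts so later lookups do not create entries.
    return {pid: dict(per_source) for pid, per_source in linked.items()}
```

A call such as `link_by_id({"WLGP": [...], "PEDW": [...], "MIDS": [...]})` returns, for each person, the records contributed by each source.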

Study population and key dates

Pregnant women were identified as any woman with a pregnancy code in the WLGP data or a pregnancy-related hospital admission in PEDW. Additionally, mothers recorded in the NCCHD or MIDS data with both the baby's birth date (referred to as the pregnancy end date) and gestational age at birth available were identified. For women who gave birth during the study period, the baby's birth date and gestational age allowed the pregnancy start date to be determined [5]. Data collected included vaccination data, maternal age, ethnic group, WIMD 2019, smoking status, depression, diabetes, asthma, and cardiovascular disease, including myocardial infarction, cerebral infarction, and stroke not specified as haemorrhage or infarction. The WIMD 2019 is the official measure of relative deprivation for small areas in Wales. It combines eight domains of deprivation (income, employment, health, education, access to services, housing, community safety, and physical environment), each compiled from a range of indicators, into a single score and is widely used to measure deprivation in Wales [5]. Ethnic groups are categorised in SAIL as White, Asian, Mixed, Black, and Other. Smoking status is categorised in SAIL as Current Smoker, Former Smoker, Never Smoker, and Unknown [20]. Where a woman had multiple recorded smoking statuses, the most recent status recorded during or before pregnancy was selected.
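The rule for choosing among multiple smoking records can be sketched as follows. This is an illustration under assumptions, not the study's code: records are taken to be `(date, status)` pairs, the cut-off `pregnancy_end` is assumed to be the delivery date (or the study end date for ongoing pregnancies), and the function name is hypothetical.

```python
from datetime import date
from typing import List, Tuple

SMOKING_CATEGORIES = {"Current Smoker", "Former Smoker", "Never Smoker", "Unknown"}


def smoking_status(records: List[Tuple[date, str]], pregnancy_end: date) -> str:
    """Most recent smoking status recorded on or before pregnancy_end.

    records holds (record date, status) pairs. Records after the end of
    pregnancy are ignored; if nothing qualifies the status is "Unknown"."""
    eligible = [(d, s) for d, s in records if d <= pregnancy_end and s in SMOKING_CATEGORIES]
    if not eligible:
        return "Unknown"
    # max() on the date picks the latest record; on a tie the first one listed wins
    return max(eligible, key=lambda r: r[0])[1]
```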

The study start date of 13th April 2021 was selected because phase 2 of the vaccination programme, which offered vaccination to people aged 40 to 49, 30 to 39, and 18 to 29 years, began on this date. Women were included if they were alive, aged 18 years or older, known to be pregnant on the first day of follow-up, and either unvaccinated or had received one vaccine dose before pregnancy. Pregnancies identified later in the study period were followed until delivery. Women were excluded if they were fully vaccinated (i.e., had received two doses) before pregnancy, if their pregnancy start date could not be determined because neither gestational age nor initial assessment dates were available in their records, or if the pregnancy ended in miscarriage or stillbirth. Terminations could not be accounted for, because within SAIL they are classed as sensitive data and are not currently accessible for research purposes.
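The inclusion and exclusion criteria can be expressed as a single filter. The sketch below is hypothetical: the `Pregnancy` fields and the outcome labels are assumptions chosen for illustration, and the check that the woman was pregnant on or after the study start is approximated by an end date that is missing (ongoing) or not earlier than 13 April 2021.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

STUDY_START = date(2021, 4, 13)
EXCLUDED_OUTCOMES = {"miscarriage", "stillbirth"}  # hypothetical outcome labels


@dataclass
class Pregnancy:
    age: int                    # maternal age in years
    alive: bool
    start_date: Optional[date]  # None if neither gestational age nor initial assessment is recorded
    end_date: Optional[date]    # None while the pregnancy is ongoing
    doses_before_pregnancy: int
    outcome: str                # e.g. "live birth", "ongoing", "miscarriage", "stillbirth"


def is_eligible(p: Pregnancy) -> bool:
    """Apply the inclusion and exclusion criteria to one pregnancy."""
    if p.start_date is None:  # start date cannot be determined
        return False
    if p.end_date is not None and p.end_date < STUDY_START:
        return False  # not pregnant on or after the study start
    if p.doses_before_pregnancy >= 2:  # fully vaccinated before pregnancy
        return False
    if p.outcome in EXCLUDED_OUTCOMES:
        return False
    return p.alive and p.age >= 18
```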

Calculating pregnancy start date

Pregnancy start dates were calculated from the following sources:

For pregnancies identified from the NCCHD and MIDS data, the pregnancy start date (i.e., the date of the last menstrual period) was calculated by subtracting the gestational age at birth (in weeks) from the week of birth, both of which are available in these data sources. Where gestational age was missing, a value of 40 weeks was applied, because the majority of those with missing data (92%) had birth weights suggestive of full-term infants. Pregnancies identified from both data sources were matched and duplicate records were removed.
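The start-date calculation and the deduplication across NCCHD and MIDS can be sketched as below. Assumptions, labelled as such: a day-level birth date is used in place of the week of birth, records are dictionaries with hypothetical keys `mother_id`, `birth_date`, and `gestation_weeks`, and duplicates are matched on mother and birth date, keeping a record with a recorded gestational age where one exists.

```python
from datetime import date, timedelta
from typing import Any, Dict, Iterable, Optional, Tuple

DEFAULT_GESTATION_WEEKS = 40  # applied when gestational age at birth is missing


def start_from_birth(birth_date: date, gestation_weeks: Optional[int]) -> date:
    """Pregnancy start (last menstrual period) = birth date minus gestational age."""
    weeks = DEFAULT_GESTATION_WEEKS if gestation_weeks is None else gestation_weeks
    return birth_date - timedelta(weeks=weeks)


def merge_birth_sources(*sources: Iterable[Dict[str, Any]]) -> Dict[Tuple[Any, date], date]:
    """Deduplicate NCCHD and MIDS birth records and compute start dates.

    Records are matched on (mother_id, birth_date), so babies from the same
    multiple birth collapse into one pregnancy. The first record seen is kept
    unless a later duplicate has a gestational age and the kept one does not."""
    merged: Dict[Tuple[Any, date], Dict[str, Any]] = {}
    for records in sources:
        for rec in records:
            key = (rec["mother_id"], rec["birth_date"])
            kept = merged.get(key)
            if kept is None or (kept["gestation_weeks"] is None and rec["gestation_weeks"] is not None):
                merged[key] = rec
    return {key: start_from_birth(rec["birth_date"], rec["gestation_weeks"]) for key, rec in merged.items()}
```

For example, a birth on 1 October 2021 at 39 weeks gives a start date of 1 January 2021; with gestational age missing, the 40-week default gives 25 December 2020.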

For pregnancies identified from the WLGP data, all women with a pregnancy code and an event date within the study period were extracted (Supplementary Table 1: Read codes (v2)). For those identified from the hospital admissions data (PEDW), all women with a pregnancy diagnosis code and an attendance date within the study period were also extracted (Supplementary Table 2: ICD-10 codes). Cases identified from the WLGP and PEDW were each matched against those identified from the NCCHD and MIDS data so that only women who were still pregnant were retained. The WLGP and PEDW cases were then matched to each other to remove duplicates and linked to the initial assessment data items in the MIDS data, which provide the initial assessment date and the gestational age in weeks needed to calculate the pregnancy start date. Where multiple records were found for a woman, only the first record occurring within the study period was selected. The pregnancy start date for every successfully linked case was then calculated by subtracting the gestational age from the initial assessment date.
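The steps for ongoing pregnancies can be sketched as follows, under simplifying assumptions: events from WLGP and PEDW are pooled as `(woman_id, event_date)` pairs, any woman with a birth record is treated as already delivered (which would wrongly drop a woman pregnant again after an earlier 2021 birth), and MIDS initial assessments are reduced to one hypothetical `(assessment_date, gestation_weeks)` pair per woman.

```python
from datetime import date, timedelta
from typing import Dict, List, Set, Tuple

STUDY_START = date(2021, 4, 13)
STUDY_END = date(2021, 12, 31)


def ongoing_pregnancy_starts(
    events: List[Tuple[int, date]],            # (woman_id, event date) from WLGP and PEDW combined
    delivered: Set[int],                       # woman_ids found in NCCHD/MIDS birth records
    assessments: Dict[int, Tuple[date, int]],  # woman_id -> (initial assessment date, gestational weeks)
) -> Dict[int, date]:
    """Pregnancy start date = initial assessment date minus gestational age."""
    first_event: Dict[int, date] = {}
    for woman, when in events:
        # keep only in-window events for women still pregnant (no birth record)
        if not (STUDY_START <= when <= STUDY_END) or woman in delivered:
            continue
        # deduplicate: retain the first occurring record per woman
        if woman not in first_event or when < first_event[woman]:
            first_event[woman] = when
    starts: Dict[int, date] = {}
    for woman in first_event:
        if woman in assessments:  # cases not linked to MIDS cannot be dated
            assessed, weeks = assessments[woman]
            starts[woman] = assessed - timedelta(weeks=weeks)
    return starts
```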

Multimorbidity in pregnancy

Multimorbidity was defined as the presence of two or more long-term health conditions, which can include defined physical and mental health conditions [8]. Long-term health conditions are those that generally last a year or longer and have a significant impact on a person’s life [9]. Four long-term health conditions (depression, diabetes, asthma, and cardiovascular disease) were selected on the basis of their (1) prevalence, (2) potential to affect vaccine uptake, and (3) availability in the study datasets. These conditions were combined into a binary multimorbidity variable: Multimorbid, comprising women with two or more of the four conditions, and Non-multimorbid, comprising women with none or only one of them. Read codes for depression, diabetes, asthma, and cardiovascular disease are listed in Supplementary Tables 3 to 6, respectively, and the corresponding ICD-10 codes in Supplementary Tables 7 to 10, respectively.
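The derivation of the multimorbidity variable can be sketched in two steps: flag each condition when any of its codes (Read or ICD-10) appears in a woman's records, then count the flagged conditions. The function names are hypothetical, and the code lists are passed in rather than reproduced, since the real lists are in the supplementary tables.

```python
from typing import Dict, Iterable, Set

CONDITIONS = ("depression", "diabetes", "asthma", "cardiovascular disease")


def conditions_from_codes(patient_codes: Iterable[str], code_lists: Dict[str, Set[str]]) -> Set[str]:
    """Conditions for which the patient has at least one listed code.

    code_lists maps each condition to its Read and ICD-10 codes combined
    (supplied by the caller; no real code lists are reproduced here)."""
    codes = set(patient_codes)
    return {condition for condition, listed in code_lists.items() if codes & listed}


def multimorbidity(conditions_present: Iterable[str]) -> str:
    """'Multimorbid' for two or more of the four conditions, else 'Non-multimorbid'."""
    count = len(set(conditions_present) & set(CONDITIONS))
    return "Multimorbid" if count >= 2 else "Non-multimorbid"
```

Counting distinct conditions (a set) rather than codes means that several asthma codes still contribute only one condition.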

Statistical analysis

Kaplan-Meier survival analysis was used to examine time to vaccination by depression, diabetes, asthma, and cardiovascular disease separately, by multimorbidity, and by smoking status, with follow-up censored at delivery, death, or moving out of Wales while pregnant. The log-rank test was used to determine whether the survival distributions of time to vaccine uptake differed between groups for each condition, for multimorbidity, and for smoking status. Differences in median time to vaccination (MD) were reported with 95% confidence intervals, and significance was accepted at p < 0.05. Multivariable Cox proportional hazards regression models were used to examine the impact of each of depression, diabetes, asthma, and cardiovascular disease on vaccine uptake separately, adjusting for age group, ethnic group, area deprivation, and smoking status. A further model examined the impact of multimorbidity, age group, ethnic group, area deprivation, and smoking status on vaccine uptake. Hazard ratios (HR) were reported with 95% confidence intervals, and significance was accepted at p < 0.05. Internal validation by bootstrapping was conducted to assess model performance, reporting bootstrapped beta coefficients (B), standard errors, 95% confidence intervals, and significance accepted at p < 0.05. The reference groups were women without multimorbidity, never smokers, those aged 25–29, the White ethnic group, and those living in the most affluent areas. Data handling and preparation for the descriptive statistics, survival analysis, and Cox proportional hazards modelling were performed in an IBM DB2 SQL database within the SAIL Databank using Eclipse. Final data preparation specific to these analyses, such as setting the reference groups, was performed in IBM SPSS Statistics 28, in which the descriptive statistics, survival analyses, and Cox regression analyses were also performed.
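The analyses themselves were run in SPSS. Purely as an illustration of the techniques named above, the Kaplan-Meier estimator, the median time to vaccination, and the two-group log-rank test can be sketched in Python. The function names are hypothetical; vaccination is the event (1) and delivery, death, or moving out of Wales is censoring (0). The sketch omits confidence intervals, Cox regression, and bootstrapping.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (time, S(t)) pairs at each distinct event time.

    Censored women remain in the risk set at their own censoring time."""
    vaccinated_at = Counter(t for t, e in zip(times, events) if e)
    leaving_at = Counter(times)  # vaccinated plus censored at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving_at):
        d = vaccinated_at.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving_at[t]
    return curve


def km_median(times: Sequence[float], events: Sequence[int]) -> Optional[float]:
    """Median time to vaccination: first time S(t) falls to 0.5 or below."""
    for t, s in kaplan_meier(times, events):
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


def log_rank(times_a, events_a, times_b, events_b) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value on 1 df)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
           [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n_a = sum(1 for s, _, g in data if g == 0 and s >= t)  # group A at risk
        n = sum(1 for s, _, _ in data if s >= t)                # total at risk
        d_a = sum(1 for s, e, g in data if g == 0 and s == t and e)
        d = sum(1 for s, e, _ in data if s == t and e)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance, allowing for tied event times
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

The per-time loops are quadratic in the number of women, which is fine for a sketch but not for population-scale data.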