vignettes/load-data.Rmd
load-data.Rmd
This vignette demonstrates how to clean and validate hospital records and load them into a data warehouse.
Ramses is provided simulated datasets consisting of records of 355 hospitalisations for 199 patients, complete with prescriptions, diagnoses and care episodes records.
First, packages must be loaded.
Second, a relational database server is needed.
For the purposes of this vignette, one may use on a small DuckDB database on the local hard drive. This database will behave in a similar way as a dedicated relational database server. This method should only be used for training with simulated data as it is not a secure method of handling actual patient data.
ramses_db <- connect_local_database("ramses-db.duckdb")
#> DuckDB database created in
#> ramses-db.duckdb
#> Please do not use real patient data.
Alternatively, one may use a dedicated PostgreSQL server (other SQL databases are not currently supported). In this case, we use the DBI and RPostgres packages to initiate the connection:
ramses_db <- DBI::dbConnect(
RPostgres::Postgres(),
host = 'example_server_name' ,
dbname = 'example_database_name',
user = 'username',
password = 'password',
port = 5432,
# Do not forget to specify a time zone to correctly collect datetime records
timezone = "Europe/London")
# Specify the name of the schema Ramses should use on the database
# (if not, Ramses will use 'public' by default)
DBI::dbExecute(ramses_db, "SET search_path = ramses_schema, public;")
Whatever method is used, the database connection information is
stored in an object (here ramses_db
) which must be provided
to most functions in the Ramses package.
We can now begin preparing records and loading them into the database. This involves three steps:
validate_inpatient_episodes()
) to verify that all data
conform to Ramses data validation constraintsWe begin with prescription records (you may choose to start with other records).
str(drug_prescriptions)
#> 'data.frame': 367 obs. of 12 variables:
#> $ patient_id : chr "5124578766" "4874231672" "6292626973" "6292626973" ...
#> $ prescription_id : chr "66cac1c5eab88d72c8b7687966357f5b" "8ccd67f4730b62ceafb8bcb27996c10c" "72cf4b592b0f4e2143b4bb9d7c569c97" "806c86b55cf50505a20b722f081c4075" ...
#> $ rxsummary : chr "Piperacillin / Tazobactam IVI 4.5 g TDS" "Ciprofloxacin ORAL 500 mg BD" "Flucloxacillin ORAL 500 mg 6H" "Metronidazole ORAL 400 mg TDS" ...
#> $ authoring_date : POSIXct, format: "2015-08-04 13:07:16" "2017-07-06 09:49:31" ...
#> $ prescription_start: POSIXct, format: "2015-08-04 14:45:16" "2017-07-06 10:26:31" ...
#> $ prescription_end : POSIXct, format: "2015-08-07 14:45:16" "2017-07-07 22:26:31" ...
#> $ tr_DESC : chr "Piperacillin / Tazobactam" "Ciprofloxacin" "Flucloxacillin" "Metronidazole" ...
#> $ route : chr "IV" "ORAL" "ORAL" "ORAL" ...
#> $ dose : num 4.5 500 500 400 4.5 500 4.5 4.5 600 2 ...
#> $ units : chr "g" "mg" "mg" "mg" ...
#> $ frequency : chr "TDS" "BD" "6H" "TDS" ...
#> $ duration : num 3 1.5 4 2 4 3 3 2 1 2 ...
Prescriptions records must conform to data validation requirements
listed in the documentation for
validate_prescriptions()
We first map drug concepts using the as.ab()
function
provided by the AMR
package. This function takes a character variable as input and
returns a three-letter code identifying antifungals and antibacterials.
These give access to the AMR package’s wide range of analytical methods.
Although the AMR package does not currently provide equivalent mapping
for antiparasitic and antiviral drugs, these can be mapped to any
substance-level terminology of your choice. RxNorm Ingredients
and SNOMED CT UK Edition Virtual
Therapeutic Moieties are both good options.
# attempting to map drug name using AMR package
drug_prescriptions$drug_code <- AMR::as.ab(drug_prescriptions$tr_DESC)
Some manual editing may be required when drugs are not successfully mapped, as shown in the example below.
# editing drug names
drug_prescriptions$drug_code <- gsub("Vancomycin protocol",
"Vancomycin",
drug_prescriptions$tr_DESC)
# mapping drug name using AMR package
drug_prescriptions$drug_code <- AMR::as.ab(drug_prescriptions$drug_code)
drug_prescriptions$drug_name <- AMR::ab_name(drug_prescriptions$drug_code)
We also map the route of administration to the World Health Organisation Anatomical Therapeutic Chemical Classification System (ATC) codes:
# recoding route of administration
drug_prescriptions <- mutate(drug_prescriptions,
ATC_route =
case_when(
route %in% c(NULL) ~ "Implant",
route %in% c("NEB", "INHAL") ~ "Inhal",
route %in% c("TOP", "EYE", "EYEL", "EYER", "EYEB",
"EAR", "EARL", "EARR", "EARB") ~ "Instill",
route %in% c("NASAL", "NOST", "NOSTL", "NOSTR", "NOSTB") ~ "N",
route %in% c("ORAL", "NAS", "PEG") ~ "O",
route %in% c("IV", "IVB", "IVI", "IMI", "IT", "IVT") ~ "P",
route %in% c("PR") ~ "R",
route %in% c("BUCC", "SB", "OROM", "SUBL") ~ "SL",
route %in% c("SC", "ID") ~ "TD",
route %in% c("PV") ~ "V",
TRUE ~ NA_character_
))
Using the AMR package, it is possible to infer important properties of antibacterial/antifungal drugs:
# extracting the ATC code (eg J01CR02) group (eg Beta-lactam antibacterials, penicillins)
drug_prescriptions$ATC_code <- AMR::ab_atc(drug_prescriptions$drug_code, only_first = TRUE)
drug_prescriptions$ATC_group <- AMR::ab_atc_group1(drug_prescriptions$drug_code)
# removing antifungal drugs, if applicable
drug_prescriptions <- filter(drug_prescriptions,
ATC_group != "Antimycotics for systemic use")
Working out the prescription DDDs can be complex in the case of compound medications. The ATC provides reference DDDs with respect to the main substance in a given medication. For example:
In order to obtain the correct DDDs for compound medications, it is recommended to map data to a suitable dictionary of medicines such as the RxNorm or the SNOMED CT UK Drug Extension and extract the corresponding ATC codes and DDDs.
In this example, however, we use a look-up table for the strength of compound medications.
# prepare DDD extraction
compound_strength_lookup <- data.frame(list(
drug_code = c("AMC", "AMC", "TZP", "SMX"),
route = c("oral", "oral", "oral", "oral"),
dose = c(625, 1.2, 4.5, 480),
units = c("mg", "g", "g", "mg"),
strength = c(500, 1, 4, 400),
basis_of_strength = c("AMX", "AMX", "PIP", "SMX")
), stringsAsFactors = F)
drug_prescriptions <- merge(drug_prescriptions,
compound_strength_lookup,
all.x = T)
drug_prescriptions <- drug_prescriptions %>%
mutate(strength = if_else(is.na(strength), dose, strength),
basis_of_strength = if_else(is.na(basis_of_strength),
as.character(drug_code),
basis_of_strength))
Ramses provides another look-up table
Ramses::reference_drug_frequency
to help convert the
frequency
character variable into a numeric variable
indicating the daily frequency of administration.
drug_prescriptions <- merge(drug_prescriptions,
reference_drug_frequency, by = "frequency", all.x = T)
We can now compute the prescription DDD based on the ATC reference DDDs and the daily frequency of administration:
drug_prescriptions <- drug_prescriptions %>%
mutate(daily_dose = strength * daily_frequency) %>%
mutate(DDD = compute_DDDs(
ATC_code = AMR::ab_atc(basis_of_strength, only_first = TRUE),
ATC_administration = ATC_route,
dose = daily_dose,
unit = units)) %>%
mutate(duration_days = if_else(
daily_frequency == -1,
"one-off",
if_else(
round(difftime(prescription_end,
prescription_start,
units = "days")) == 1,
"1 day",
paste(round(difftime(prescription_end,
prescription_start,
units = "days")),
"days")
)
))
drug_prescriptions <- drug_prescriptions %>%
transmute(patient_id,
prescription_id,
# prescription summary for displaying in interfaces
prescription_text = paste0(
drug_name, " ",
route, " ",
dose, units, " ",
duration_days
),
drug_code,
drug_name = drug_name,
drug_display_name = drug_name,
# in this example dataset, only antibacterial drugs are included
antiinfective_type = "antibacterial",
ATC_code,
ATC_group,
ATC_route,
authoring_date,
prescription_start,
prescription_end,
prescription_status = "completed",
prescription_context = "inpatient",
dose,
unit = units,
route,
frequency,
daily_frequency,
DDD)
An identical process is followed for drug administration, with the difference that DDDs refer to the dose of a single administration, and thus do not involve a frequency variable.
drug_administrations$drug_code <- gsub("Vancomycin protocol",
"Vancomycin",
drug_administrations$tr_DESC)
drug_administrations$drug_code <- AMR::as.ab(drug_administrations$drug_code)
drug_administrations$drug_name <- AMR::ab_name(drug_administrations$drug_code)
drug_administrations$drug_group <- AMR::ab_group(drug_administrations$drug_code)
# recoding route of administration
drug_administrations <- mutate(drug_administrations,
ATC_route =
case_when(
route %in% c(NULL) ~ "Implant",
route %in% c("NEB", "INHAL") ~ "Inhal",
route %in% c("TOP", "EYE", "EYEL", "EYER", "EYEB",
"EAR", "EARL", "EARR", "EARB") ~ "Instill",
route %in% c("NASAL", "NOST", "NOSTL", "NOSTR", "NOSTB") ~ "N",
route %in% c("ORAL", "NAS", "PEG") ~ "O",
route %in% c("IV", "IVB", "IVI", "IMI", "IT", "IVT") ~ "P",
route %in% c("PR") ~ "R",
route %in% c("BUCC", "SB", "OROM", "SUBL") ~ "SL",
route %in% c("SC", "ID") ~ "TD",
route %in% c("PV") ~ "V",
TRUE ~ "NA_character_"
))
drug_administrations$ATC_code <- AMR::ab_atc(drug_administrations$drug_code, only_first = TRUE)
drug_administrations <- filter(drug_administrations,
drug_group != "Antifungals/antimycotics")
drug_administrations <- merge(drug_administrations, compound_strength_lookup, all.x = T)
drug_administrations <- drug_administrations %>%
mutate(strength = if_else(is.na(strength), dose, strength),
basis_of_strength = if_else(is.na(basis_of_strength),
as.character(drug_code),
basis_of_strength))
drug_administrations <- drug_administrations %>%
mutate(DDD = compute_DDDs(
ATC_code = AMR::ab_atc(basis_of_strength, only_first = TRUE),
ATC_administration = ATC_route,
dose = dose,
unit = units
))
# Create an identifier
# Note: An alternative is encryption using openssl::sha256()
drug_administrations <- drug_administrations %>%
group_by(patient_id, drug_code, route,
dose, units, administration_date) %>%
mutate(administration_id = dplyr::cur_group_id()) %>%
ungroup()
drug_administrations <- drug_administrations %>%
transmute(
patient_id,
administration_id = as.character(administration_id),
prescription_id,
administration_text = paste0(
drug_name, " ", route, " ", dose, units),
drug_code,
drug_name,
drug_display_name = drug_name,
drug_group,
# in this example dataset, only antibacterial drugs are included
antiinfective_type = "antibacterial",
ATC_code,
ATC_route,
dose,
unit = units,
route,
administration_date,
administration_status = "completed",
DDD
)
This data frame can now be validated, before being loaded into the database.
validate_administrations(drug_administrations)
#> NULL
load_medications(conn = ramses_db,
prescriptions = drug_prescriptions,
administrations = drug_administrations,
overwrite = TRUE,
silent = FALSE)
#> Transitive closure of therapy episodes beginning...
#> Registered S3 method overwritten by 'igraph':
#> method from
#> plot.sir AMR
#> Transitive closure of therapy combinations beginning...
We now turn to records of patient demographics, admissions,
discharges and transfers. Ramses is designed around the notion of
hospital encounters identified by variable
encounter_id
, which correspond to the window from the time
a patient is admitted to the time a patient is discharged.
In addition, Ramses use subdivisions of hospital encounters called episodes of care, which designate a continuous period of inpatient care under the responsibility of a department, specialty, or clinical team. This corresponds to the ‘finished consultant episode’ in the UK NHS Admitted Patient Care data model (Herbert et al. 2017). All hospital encounters consist of at least one episode of care. Episodes of care are used to calculate bed-days as well as rates of infection, prescribing and other stewardship metrics for individual specialties and teams of clinicians.
Hospital encounters and episodes of care are recorded in a single table in which every row corresponds to one episode of care.
str(inpatient_episodes)
#> 'data.frame': 1082 obs. of 11 variables:
#> $ patient_id : chr "6145252493" "5124578766" "5124578766" "5124578766" ...
#> $ encounter_id : chr "5458286195" "1275624266" "1275624266" "1275624266" ...
#> $ admission_method : chr "1" "2" "2" "2" ...
#> $ admission_date : POSIXct, format: "2015-09-24 09:13:22" "2015-08-04 11:27:16" ...
#> $ discharge_date : POSIXct, format: "2015-09-24 19:16:41" "2015-08-12 12:54:48" ...
#> $ episode_number : int 1 1 2 3 4 5 1 1 2 1 ...
#> $ last_episode_in_encounter: chr "1" "2" "2" "2" ...
#> $ episode_start : POSIXct, format: "2015-09-24 09:13:22" "2015-08-04 11:27:16" ...
#> $ episode_end : POSIXct, format: "2015-09-24 19:16:41" "2015-08-06 11:49:09" ...
#> $ consultant_code : chr "C1000003" "C3400039" "C1000010" "C3400036" ...
#> $ main_specialty_code : chr "100" "340" "100" "340" ...
Dates of admissions and episode start/end must be valid:
validate_inpatient_episodes(patients = patients,
episodes = inpatient_episodes)
#> [1] TRUE
Optionally, Ramses can use data on ward movements. These records must be validated together with episodes of care
validate_inpatient_episodes(patients = patients,
episodes = inpatient_episodes,
wards = inpatient_wards)
#> [1] TRUE
The data can now be loaded into the database.
load_inpatient_episodes(conn = ramses_db,
patients_data = patients,
episodes_data = inpatient_episodes,
wards_data = inpatient_wards)
In order to exploit clinical diagnoses coded by doctors during hospital admissions, Ramses relies on a range of look up tables which must be built to suit the precise ICD-10 version used by the data. Ramses uses three types of look up tables:
This example uses the Ramses::inpatient_diagnoses
example dataset. First, we download the ICD-10-CM reference look up
table
icd10cm <- download_icd10cm()
str(icd10cm)
#> 'data.frame': 72184 obs. of 6 variables:
#> $ icd_code : chr "A000" "A001" "A009" "A0100" ...
#> $ icd_display : chr "A00.0" "A00.1" "A00.9" "A01.00" ...
#> $ icd_description : chr "Cholera due to Vibrio cholerae 01, biovar cholerae" "Cholera due to Vibrio cholerae 01, biovar eltor" "Cholera, unspecified" "Typhoid fever, unspecified" ...
#> $ category_code : chr "A00" "A00" "A00" "A01" ...
#> $ category_description: chr "Cholera" "Cholera" "Cholera" "Typhoid and paratyphoid fevers" ...
#> $ edition : chr "ICD-10-CM 2020" "ICD-10-CM 2020" "ICD-10-CM 2020" "ICD-10-CM 2020" ...
We verify that all ICD-10 codes in the dataset can be mapped to the
reference table. The first validation fails, because some of the
icd_code
in inpatient_diagnoses
are missing
(all diagnoses that are not infections). These are removed before
validation is attempted again. In the present example, the dataset is
simulated using ICD-10 5th Edition, rather than ICD-10-CM. The match is
therefore not complete on the second validation attempt. Although the
validation produces a warning, it passes (returns TRUE
).
This means it will not prevent loading the data into the warehouse.
validate_inpatient_diagnoses(inpatient_diagnoses, icd10cm)
#> Warning in .validate_variable_no_missing(data = diagnoses_data, vectorname =
#> diagnoses_data_schema[diagnoses_data_schema[["must_be_nonmissing"]], : The
#> following variables contain missing data: `icd_code`
#> Warning: some `icd_code` values in `diagnoses_data` do not match any `icd_code`
#> in `diagnoses_lookup`
#> [1] FALSE
inpatient_diagnoses <- filter(inpatient_diagnoses, !is.na(icd_code))
validate_inpatient_diagnoses(inpatient_diagnoses, icd10cm)
#> Warning: some `icd_code` values in `diagnoses_data` do not match any `icd_code`
#> in `diagnoses_lookup`
#> [1] TRUE
Next, we classify the reference look-up table to other classifications:
Ramses::antibiotic_icd_indications
)Although this is done automatically when loading diagnoses into the warehouse, this step can be useful to first check that the ICD-10 reference table is successfully mapped to other classifications.
icd_infection_lkup <- icd10cm %>%
select(icd_code) %>%
map_infections_abx_indications(df = ., icd_column = "icd_code")
comorbidity_lkup <- icd10cm %>%
select(icd_code) %>%
map_charlson_comorbidities(df = ., icd_column = "icd_code")
ccsr_lkup <- icd10cm %>%
select(icd_code) %>%
map_ICD10_CCSR(df = ., icd_column = "icd_code")
ccs_lkup <- icd10cm %>%
select(icd_code) %>%
map_ICD10_CCS(df = ., icd_column = "icd_code")
head(icd_infection_lkup)
#> icd_code infection_group1_code infection_group1_label
#> 1 A000 p02 Gastrointestinal infections
#> 2 A001 p02 Gastrointestinal infections
#> 3 A009 p02 Gastrointestinal infections
#> 4 A0100 p02 Gastrointestinal infections
#> 5 A0101 p02 Gastrointestinal infections
#> 6 A0102 p02 Gastrointestinal infections
#> infection_group2_label infection_group2_code antibiotics_indicated
#> 1 Intestinal infectious diseases p0201 potentially
#> 2 Intestinal infectious diseases p0201 potentially
#> 3 Intestinal infectious diseases p0201 potentially
#> 4 Intestinal infectious diseases p0201 potentially
#> 5 Intestinal infectious diseases p0201 potentially
#> 6 Intestinal infectious diseases p0201 potentially
Data may now be loaded into the database. Diagnoses and their reference look-up are loaded in one step.
load_inpatient_diagnoses(conn = ramses_db,
diagnoses_data = inpatient_diagnoses,
diagnoses_lookup = icd10cm,
overwrite = TRUE)
#> Warning: some `icd_code` values in `diagnoses_data` do not match any `icd_code`
#> in `diagnoses_lookup`
A small sample of records of microbial isolates is provided along with susceptibility testing results:
str(Ramses::inpatient_microbiology)
#> 'data.frame': 38 obs. of 10 variables:
#> $ patient_id : chr "99999999999" "99999999999" "99999999999" "99999999999" ...
#> $ specimen_id : chr "CCC2" "CCC1" "BBB8" "BBB7" ...
#> $ status : chr NA NA NA NA ...
#> $ specimen_type_display: chr "MRSA Screen" "Blood Culture" "MRSA Screen" "Faeces" ...
#> $ specimen_datetime : POSIXct, format: "2017-02-14" "2017-02-13" ...
#> $ organism_display_name: chr "No growth" "No growth" "No growth" "No growth" ...
#> $ isolate_id : chr "2300967" "2365356" "2284807" "2392702" ...
#> $ agent_display_name : chr NA NA NA NA ...
#> $ rsi_code : chr NA NA NA NA ...
#> $ isolation_datetime : POSIXct, format: "2017-02-17" "2017-02-16" ...
Before validating and loading these records into the database, some transformations will be required to create three separate data frames:
specimens
contains one row per microbial sample sent to
the laboratoryisolates
contains zero, one, or multiple rows for every
specimen – as many rows as the number of microorganisms isolated from
the specimen, whatever the technique (eg mass spectrometry, PCR,
bacterial culture)susceptibility
contains results of susceptibility
testing: one row per substance tested.For this, we create a list of data frames named
microbiology_data
, and commence with cleaning the raw data.
We employ the AMR
package to code microorganism species names (AMR::as.mo()
)
and antimicrobial names (AMR::as.ab()
).
microbiology_data <- list()
# Start by cleaning the raw data
microbiology_data$raw <- inpatient_microbiology
microbiology_data$raw <- microbiology_data$raw %>%
mutate(
organism_code = AMR::as.mo(if_else(
organism_display_name == "No growth",
NA_character_,
organism_display_name)),
agent_code = AMR::as.ab(agent_display_name)) %>%
mutate(organism_name = AMR::mo_name(organism_code),
agent_name = AMR::ab_name(agent_code))
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `agent_code = AMR::as.ab(agent_display_name)`.
#> Caused by warning:
#> ! in as.ab(): these values could not be coerced to a valid antimicrobial
#> ID: "".
As for specimen types (eg bood, bronchoalveolar lavage fluid, etc.),
they must be coded using SNOMED
CT specimen concepts. A complete list of concepts from the SNOMED CT
International Edition can be found in
Ramses::reference_specimen_type
.
Susceptibility testing results in the example dataset are already
interpreted as Resistant/Increased exposure/Susceptible (RSI). Where
needed, minimum inhibitory concentrations and disk diffusion diameters
can be interpreted as RSI using the as.sir()
function
provided by the AMR package.
If possible, use the most specific concept available (eg
309176002 | Bronchial brushings sample (specimen) |
) over a
more generic term (eg
258415003 | Biopsy sample (specimen) |
). Specific concepts
can always be reclassified to higher-level concepts using SNOMED CT
inference (in this case, not just
258415003 | Biopsy sample (specimen) |
, but also
258603007 | Respiratory sample (specimen) |
). The snomedizer package can be used to
query SNOMED CT concept parents and descendants.
microbiology_data$raw <- microbiology_data$raw %>%
mutate(specimen_type_code = case_when(
specimen_type_display == "Blood Culture" ~
"446131002", # Blood specimen obtained for blood culture
specimen_type_display == "Faeces" ~
"119339001", # Stool specimen
specimen_type_display == "MRSA Screen" ~
"697989009", # Anterior nares swab
specimen_type_display == "Urine" ~
"122575003", # Urine specimen
TRUE ~ NA_character_
)) %>%
left_join(transmute(reference_specimen_type,
specimen_type_code = conceptId,
specimen_type_name = pt_term))
#> Joining with `by = join_by(specimen_type_code)`
The three data frames can now be created.
microbiology_data$specimens <- microbiology_data$raw %>%
transmute(specimen_id,
patient_id,
status = "available",
specimen_datetime,
specimen_type_code,
specimen_type_name,
specimen_type_display) %>%
distinct() # Removing duplicates created by multiple isolates and susceptibility testing
microbiology_data$isolates <- microbiology_data$raw %>%
transmute(isolate_id,
specimen_id,
patient_id,
organism_code,
organism_name,
organism_display_name,
isolation_datetime) %>%
distinct() # Removing duplicates created by susceptibility testing
microbiology_data$susceptibilities <- microbiology_data$raw %>%
filter(!is.na(organism_code)) %>% # Remove no growth
transmute(isolate_id,
specimen_id,
patient_id,
organism_code,
organism_name,
organism_display_name,
agent_code,
agent_name,
agent_display_name,
rsi_code,
concept_code = NA_character_) %>%
distinct()
An optional step is to classify isolates by multidrug resistance
phenotype (Magiorakos et al. 2012). The
AMR package function AMR::mdro()
provides classification
based on:
In this vignette, we adopt intrinsic resistances as defined by EUCAST guidelines (European Committee on Antimicrobial Susceptibility Testing 2020). The tidyr package is used to pivot susceptibility test results into wide format (antimicrobials arranged as columns).
mdr_classification <- microbiology_data$susceptibilities %>%
select(isolate_id,
organism_code,
agent_code,
rsi_code) %>%
tidyr::pivot_wider(id_cols = c("isolate_id", "organism_code"),
names_from = agent_code,
values_from = rsi_code)
mdr_classification$multidrug_resistance <-
as.character(AMR::mdro(
x = mdr_classification,
guideline = "EUCAST",
col_mo = "organism_code"))
# Merging back into the isolates dataset
microbiology_data$isolates <- left_join(
microbiology_data$isolates,
select(mdr_classification, isolate_id, multidrug_resistance)
)
#> Joining with `by = join_by(isolate_id)`
The data may now be validated, and loaded.
validate_microbiology(specimens = microbiology_data$specimens,
isolates = microbiology_data$isolates,
susceptibilities = microbiology_data$susceptibilities)
#> [1] TRUE
load_inpatient_microbiology(
conn = ramses_db,
specimens = microbiology_data$specimens,
isolates = microbiology_data$isolates,
susceptibilities = microbiology_data$susceptibilities,
overwrite = TRUE
)
Finally, we import all other observation and test results, from basic vitals such as respiration rate and blood pressure, to urine antigen test results. Analysis of such data requires the use of standard test vocabularies.
The international standard code system for laboratory data is LOINC. A sample of important codes may be
found in dataset Ramses::reference_loinc
. Other observable
entity concepts or evaluation procedure concepts may be imported from
SNOMED CT.
data(inpatient_investigations)
str(inpatient_investigations)
#> 'data.frame': 1428 obs. of 13 variables:
#> $ observation_id : chr "1427" "1431" "1430" "1432" ...
#> $ patient_id : chr "99999999999" "99999999999" "99999999999" "99999999999" ...
#> $ encounter_id : chr "9278078393" "9278078393" "9278078393" "9278078393" ...
#> $ status : chr "final" "final" "final" "final" ...
#> $ request_datetime : POSIXct, format: "2016-03-30 13:51:13" "2016-03-30 13:51:13" ...
#> $ observation_datetime : POSIXct, format: "2016-03-30 13:51:45" "2016-03-30 13:51:45" ...
#> $ observation_value_text : chr "" "" "" "" ...
#> $ observation_value_numeric: num 35.7 154 2 17 95 108 4 36.6 113 0 ...
#> $ observation_name : chr "Body temperature" "Systolic blood pressure" "SEWS score" "Respiratory rate" ...
#> $ observation_display : chr "Temperature" "Systolic BP" "SEWS score" "Resps" ...
#> $ observation_code_system : chr "http://loinc.org" "http://loinc.org" "http://snomed.info/sct" "http://loinc.org" ...
#> $ observation_code : chr "8310-5" "8480-6" "1104051000000101" "9279-1" ...
#> $ observation_unit : chr "degree_Celsius" "mm_Hg" "" "breaths/min" ...
One particular set of validation constraints apply:
units
package in R.observation_code
must have one and only one
observation_unit
to pass validation. If observation records
come in a variety of units, they must first be converted to a common
unit.Many important units in medicine are not listed in the UCUM. For instance:
By default, these three custom units are passed to the
custom_units
argument of
validate_investigations()
. Any other custom unit in the
dataset must be handled with this argument.
validate_investigations(inpatient_investigations,
custom_units = c("breaths", "beats", "U"))
#> [1] TRUE
Data may now be loaded into the database.
load_inpatient_investigations(
conn = ramses_db,
investigations_data = inpatient_investigations,
overwrite = TRUE
)
There is one final step to create ancillary tables:
# Create bridge tables
bridge_tables(conn = ramses_db, overwrite = TRUE)
#> [1] TRUE
# Create reference AWaRe table (see ?reference_aware)
dplyr::copy_to(dest = ramses_db,
df = filter(reference_aware, version == "England" & year == "2019"),
name = "reference_aware",
temporary = FALSE,
overwrite = TRUE)
The database is now ready.
The main vignette (getting started) demonstrates how to use it.
Always remember close database connections when you are finished.
DBI::dbDisconnect(ramses_db, shutdown = TRUE)