Main Idea
The main idea of admiral is that an ADaM dataset is built by a sequence of derivations. Each derivation adds one or more variables or parameters to the processed dataset. This modular approach makes it easy to adjust code by adding, removing, or modifying derivations. Each derivation is a function call. Consider for example the following script which creates a (very simple) ADSL dataset.
Load Packages and Example Datasets
First, we will load our packages and example datasets to help with our ADSL creation. The dplyr and lubridate packages are tidyverse packages and are used heavily throughout this script. The admiral package also leverages the {admiral.test} package for example SDTM datasets, which are from the CDISC Pilot Study.
library(dplyr, warn.conflicts = FALSE)
library(lubridate)
library(admiral)
library(admiral.test)
# Read in SDTM datasets
data("admiral_dm")
data("admiral_ds")
data("admiral_ex")
dm <- convert_blanks_to_na(admiral_dm)
ds <- convert_blanks_to_na(admiral_ds)
ex <- convert_blanks_to_na(admiral_ex)
Derive Treatment Variables (TRT0xP, TRT0xA)
The mapping of the treatment variables is left to the ADaM programmer. An example mapping may be:
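The mapping depends on the study, so what follows is only a minimal sketch. It assumes that the planned treatment can be taken from ARM and the actual treatment from ACTARM. The data frame dm_example is a made-up stand-in for the SDTM DM dataset so that the snippet runs on its own.

```r
library(dplyr, warn.conflicts = FALSE)

# Made-up stand-in for the SDTM DM dataset (two subjects only)
dm_example <- tibble(
  STUDYID = "STUDY01",
  USUBJID = c("STUDY01-001", "STUDY01-002"),
  ARM = c("Placebo", "Drug X"),
  ACTARM = c("Placebo", "Drug X")
)

# Assumed mapping: planned treatment from ARM, actual treatment from ACTARM
adsl_example <- dm_example %>%
  mutate(TRT01P = ARM, TRT01A = ACTARM)
```

In the script itself, the same mutate() call would be applied to dm to create the adsl dataset that the later steps extend.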
Derive/Impute Numeric Treatment Date/Time and Flag Variables (TRTSDTM, TRTEDTM, TRTSTMF, TRTETMF)
The function derive_vars_dtm() can be used to convert the DTC variables from EX to numeric datetime variables and impute missing components. The function call returns the original data frame with the columns EXSTDTM and EXENDTM and the corresponding time imputation flag variables EXSTTMF and EXENTMF added to the end of the data frame. Exposure observations with an incomplete date are ignored. For the end datetime, we impute a missing time as 23:59:59 using time_imputation = "last". The required imputation flags are determined automatically by the function. Here only time imputation flags are derived because time is imputed but date is not imputed.
Don’t be intimidated by the number of arguments! We try to make our arguments self-explanatory, e.g. new_vars_prefix = "EXST" places EXST at the start of the --DTM variable name and time_imputation = "last" appends 23:59:59. However, it is not always possible to make every argument self-explanatory. The reference documentation for derive_vars_dtm() describes each argument in more detail.
# Derive treatment variables
## Impute time of exposure dates (creates numeric datetime and time imputation flag variables)
ex_ext <- ex %>%
  derive_vars_dtm(
    dtc = EXSTDTC,
    new_vars_prefix = "EXST"
  ) %>%
  derive_vars_dtm(
    dtc = EXENDTC,
    new_vars_prefix = "EXEN",
    time_imputation = "last"
  )
## Derive variables for first/last treatment date and time imputation flags
adsl <- adsl %>%
  derive_vars_merged(
    dataset_add = ex_ext,
    filter_add = !is.na(EXSTDTM),
    new_vars = exprs(TRTSDTM = EXSTDTM, TRTSTMF = EXSTTMF),
    order = exprs(EXSTDTM, EXSEQ),
    mode = "first",
    by_vars = exprs(STUDYID, USUBJID)
  ) %>%
  derive_vars_merged(
    dataset_add = ex_ext,
    filter_add = !is.na(EXENDTM),
    new_vars = exprs(TRTEDTM = EXENDTM, TRTETMF = EXENTMF),
    order = exprs(EXENDTM, EXSEQ),
    mode = "last",
    by_vars = exprs(STUDYID, USUBJID)
  )
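To make the roles of filter_add, order, mode, and new_vars more concrete, here is a rough plain-dplyr sketch of what selecting the first exposure record per subject and merging it onto ADSL amounts to. It is an illustration under assumptions, not how derive_vars_merged() is implemented, and the two small data frames are invented for the example.

```r
library(dplyr, warn.conflicts = FALSE)

# Invented example data: three subjects, exposure records for two of them
adsl_example <- tibble(USUBJID = c("001", "002", "003"))
ex_example <- tibble(
  USUBJID = c("001", "001", "002"),
  EXSEQ = c(1, 2, 1),
  EXSTDTM = as.POSIXct(
    c("2014-01-02 00:00:00", "2014-01-16 00:00:00", NA),
    tz = "UTC"
  )
)

first_dose <- ex_example %>%
  filter(!is.na(EXSTDTM)) %>%           # plays the role of filter_add
  arrange(USUBJID, EXSTDTM, EXSEQ) %>%  # plays the role of order within by_vars
  group_by(USUBJID) %>%
  slice(1) %>%                          # plays the role of mode = "first"
  ungroup() %>%
  select(USUBJID, TRTSDTM = EXSTDTM)    # plays the role of new_vars

# Subjects without a usable exposure record keep a missing TRTSDTM
adsl_example <- left_join(adsl_example, first_dose, by = "USUBJID")
```

Subject 002 has only a record with a missing datetime and subject 003 has no record at all, so both end up with a missing TRTSDTM while all three ADSL rows are kept.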
Derive Date Variables from Date/Time Variables (TRTSDT, TRTEDT)
The datetime variables returned can be converted to dates using the derive_vars_dtm_to_dt() function.
adsl <- adsl %>%
  derive_vars_dtm_to_dt(source_vars = exprs(TRTSDTM, TRTEDTM))
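Conceptually, this step drops the time part of each datetime. A base R sketch of the idea on a single invented value, assuming the datetime is stored in UTC, could look like this:

```r
# Invented datetime value, stored in UTC
trtsdtm <- as.POSIXct("2014-01-02 23:59:59", tz = "UTC")

# Keep only the calendar date; the time zone is passed explicitly
trtsdt <- as.Date(trtsdtm, tz = "UTC")
```

The result is a Date value for 2 January 2014, which is the kind of value the date variables hold.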
Derive Treatment Duration (TRTDURD)
Now that TRTSDT and TRTEDT are derived, the function derive_var_trtdurd() can be used to calculate the treatment duration (TRTDURD). Notice the lack of inputs: the function defaults are set to TRTSDT and TRTEDT. The reference documentation for derive_var_trtdurd() lists the default arguments.
adsl <- adsl %>%
  derive_var_trtdurd()
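As a worked example of the arithmetic, the sketch below computes a duration in days by hand. It assumes the common convention that both the first and the last treatment day count, i.e. end date minus start date plus one. The dates are invented.

```r
# Invented treatment start and end dates
trtsdt <- as.Date("2014-01-02")
trtedt <- as.Date("2014-01-15")

# Duration in days, counting both the first and the last day
trtdurd <- as.numeric(trtedt - trtsdt) + 1
trtdurd
#> [1] 14
```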
Amazing! With one dplyr function and four admiral functions we successfully created nine new variables for our ADSL dataset. Let’s take a look at all our newly derived variables.
Note: We only display variables that were derived. A user running this code will have additional adsl variables displayed.
Derivations
The most important functions in admiral are the derivations. Derivations add variables or observations to the input dataset. Existing variables and observations of the input dataset are not changed.
Derivation functions start with derive_. The first parameter of these functions expects the input dataset. This allows us to string together derivations using the %>% operator.
Functions which derive a dedicated variable start with derive_var_ followed by the variable name, e.g., derive_var_trtdurd() derives the TRTDURD variable.
Functions which can derive multiple variables start with derive_vars_ followed by a name describing the variables, e.g., derive_vars_dtm() can derive both the TRTSDTM and TRTSTMF variables.
Functions which derive a dedicated parameter start with derive_param_ followed by the parameter name, e.g., derive_param_bmi() derives the BMI parameter.
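The convention that a derivation takes the dataset as its first argument and returns it with new variables added is what makes chaining possible. The sketch below illustrates this with two hypothetical functions written for this example; they are not part of admiral, and the data is invented.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical derivation (not an admiral function): adds an age group variable
derive_var_agegr1_example <- function(dataset) {
  dataset %>%
    mutate(AGEGR1 = if_else(AGE < 65, "<65", ">=65"))
}

# Hypothetical derivation (not an admiral function): adds the age unit
derive_var_ageu_example <- function(dataset) {
  dataset %>%
    mutate(AGEU = "YEARS")
}

# Because each function takes and returns a dataset, calls can be chained
adsl_example <- tibble(USUBJID = c("001", "002"), AGE = c(54, 71)) %>%
  derive_var_agegr1_example() %>%
  derive_var_ageu_example()
```

Removing or reordering one of the calls leaves the rest of the pipeline untouched, which is the modularity described at the start of this document.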
Input and Output
It is expected that the input dataset is not grouped. Otherwise an error is issued.
The output dataset is ungrouped. The observations are not ordered in a dedicated way. In particular, the order of the observations of the input dataset may not be preserved.
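If a dataset has been grouped earlier in a script, it can be ungrouped with dplyr before it is passed to a derivation. The following sketch uses an invented data frame and only dplyr functions:

```r
library(dplyr, warn.conflicts = FALSE)

# Invented example data
vs_example <- tibble(
  USUBJID = c("001", "001", "002"),
  AVAL = c(36.6, 37.1, 36.9)
)

grouped <- vs_example %>% group_by(USUBJID)
is_grouped_df(grouped)
#> [1] TRUE

# Remove the grouping before calling a derivation
ungrouped <- ungroup(grouped)
is_grouped_df(ungrouped)
#> [1] FALSE
```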
Computations
Computations expect vectors as input and return a vector. Usually these computation functions cannot be used with %>%. They can be used in expressions, like convert_dtc_to_dt() in the derivation of FINLABDT in the example below:
# Derive final lab visit date
ds_final_lab_visit <- ds %>%
  filter(DSDECOD == "FINAL LAB VISIT") %>%
  transmute(USUBJID, FINLABDT = convert_dtc_to_dt(DSSTDTC))
# Derive treatment variables
adsl <- dm %>%
  # Merge on final lab visit date
  derive_vars_merged(
    dataset_add = ds_final_lab_visit,
    by_vars = exprs(USUBJID)
  )
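The vector-in, vector-out shape of a computation can be illustrated with a small hypothetical helper. The function dtc_to_date_example() below is written for this example only; it is a simplification and not how convert_dtc_to_dt() works internally. It assumes that only complete dates should be converted and that partial dates become NA.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical computation (not an admiral function):
# takes a character vector and returns a Date vector of the same length
dtc_to_date_example <- function(dtc) {
  as.Date(substr(dtc, 1, 10), format = "%Y-%m-%d")
}

# Invented example data: one complete and one partial date
ds_example <- tibble(
  USUBJID = c("001", "002"),
  DSSTDTC = c("2014-07-02T11:30", "2014-08")
)

# A computation is used inside an expression, here within mutate()
ds_example <- ds_example %>%
  mutate(FINLABDT = dtc_to_date_example(DSSTDTC))
```

The helper knows nothing about data frames, which is why it is called inside mutate() or transmute() rather than placed directly in a pipe.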
Parameters
For parameters which expect variable names or expressions of variable names, symbols or expressions must be specified rather than strings.
For parameters which expect a single variable name, the name can be specified as a bare symbol, i.e. without quotation marks and without wrapping it in a quoting function, e.g.
new_var = TEMPBL
For parameters which expect one or more variable names, a list of symbols is expected, e.g.
by_vars = exprs(PARAMCD, AVISIT)
For parameters which expect a single expression, the expression needs to be passed “as is”, e.g.
filter = PARAMCD == "TEMP"
For parameters which expect one or more expressions, a list of expressions is expected, e.g.
order = exprs(AVISIT, desc(AESEV))
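To see what such lists contain, the sketch below builds them with exprs() from the rlang package and inspects the elements. Loading rlang directly is a choice made for this example so that it runs without admiral.

```r
library(rlang)

# A list of symbols, as expected by arguments such as by_vars
by_vars <- exprs(PARAMCD, AVISIT)
is.symbol(by_vars[[1]])
#> [1] TRUE

# A list mixing a symbol and a call, as expected by arguments such as order
order_vars <- exprs(AVISIT, desc(AESEV))
is.call(order_vars[[2]])
#> [1] TRUE
```

Nothing is evaluated when the list is created, so the variables do not need to exist at that point; they are looked up in the dataset later by the function that receives the list.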
Handling of Missing Values
When using the haven package to read SAS datasets into R, SAS-style character missing values, i.e. "", are not converted into proper R NA values. Rather, they are kept as is. This is problematic for any downstream data processing, as R handles "" just like any other string. Thus, before any data manipulation is performed, SAS blanks should be converted to R NAs using admiral’s convert_blanks_to_na() function, e.g.
dm <- haven::read_sas("dm.sas7bdat") %>%
  convert_blanks_to_na()
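The effect on a single character vector can be sketched in base R. This is only an illustration of the idea with invented values, not the implementation of convert_blanks_to_na():

```r
# Invented character vector with a SAS-style blank
arm <- c("Placebo", "", "Drug X")

# The blank is an ordinary string, so R does not treat it as missing
is.na(arm)
#> [1] FALSE FALSE FALSE

# Replace blanks by NA
arm <- replace(arm, which(arm == ""), NA_character_)
is.na(arm)
#> [1] FALSE  TRUE FALSE
```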
Note that a comparison operator such as == or != applied to an NA value returns NA rather than TRUE or FALSE.
visits <- c("Baseline", NA, "Screening", "Week 1 Day 7")
visits != "Baseline"
#> [1] FALSE NA TRUE TRUE
To test for missing values, use is.na(), which returns TRUE if the input is NA.
is.na(visits)
#> [1] FALSE TRUE FALSE FALSE
Thus, to filter all visits which are not "Baseline", the following condition would need to be used.
visits != "Baseline" | is.na(visits)
#> [1] FALSE TRUE TRUE TRUE
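The same point matters when filtering a dataset, because dplyr's filter() drops rows for which the condition evaluates to NA. A small sketch with invented data:

```r
library(dplyr, warn.conflicts = FALSE)

# Invented example data with one missing visit
adlb_example <- tibble(
  USUBJID = c("001", "002", "003", "004"),
  VISIT = c("Baseline", NA, "Screening", "Week 1 Day 7")
)

# The row with the missing visit is silently dropped
dropped <- adlb_example %>%
  filter(VISIT != "Baseline")
nrow(dropped)
#> [1] 2

# Handling NA explicitly keeps the row with the missing visit
kept <- adlb_example %>%
  filter(VISIT != "Baseline" | is.na(VISIT))
nrow(kept)
#> [1] 3
```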
Also note that most aggregation functions, like mean() or max(), also return NA if any element of the input vector is missing.
To avoid this behavior one has to explicitly set na.rm = TRUE.
This is very important to keep in mind when using admiral’s aggregation functions such as derive_summary_records().
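A short base R example with invented values shows the difference:

```r
# Invented analysis values with one missing element
aval <- c(1, NA, 3)

mean(aval)
#> [1] NA

# Missing values are removed before aggregating
mean(aval, na.rm = TRUE)
#> [1] 2
max(aval, na.rm = TRUE)
#> [1] 3
```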
Validation
All functions are reviewed and tested to ensure that they work as described in the documentation. They are not validated yet.
Although admiral follows CDISC standards, it does not claim that the dataset resulting from calling admiral functions is ADaM compliant. This has to be ensured by the user.
Starting a Script
For the ADaM data structures, an overview of the flow and example function calls for the most common steps are provided by the following vignettes:
admiral also provides template R scripts as a starting point. They can be created by calling use_ad_template(), e.g.,
use_ad_template(
  adam_name = "adsl",
  save_path = "./ad_adsl.R"
)
A list of all available templates can be obtained by list_all_templates():
list_all_templates()
#> Existing ADaM templates in package 'admiral':
#> • ADAE
#> • ADCM
#> • ADEG
#> • ADEX
#> • ADLB
#> • ADLBHY
#> • ADMH
#> • ADPC
#> • ADPP
#> • ADSL
#> • ADVS
Support
Support is provided via the admiral Slack channel.