It is not uncommon for an analysis to require deriving an analysis value
(AVAL) from multiple records. The ADaM basic dataset structure variable
DTYPE is available to indicate when a new derived record has been added
to a dataset.
Usage
get_summary_records(
  dataset,
  by_vars,
  filter = NULL,
  analysis_var,
  summary_fun,
  set_values_to = NULL
)
Arguments
- dataset
A data frame.
- by_vars
Variables to consider for generation of groupwise summary records. Providing the names of variables in exprs() will create a groupwise summary and generate summary records for the specified groups.
- filter
Filter condition as a logical expression to apply during summary calculation. By default, filtering expressions are computed within by_vars, as this helps when an aggregating, lagging, or ranking function is involved.
For example,
filter = (AVAL > mean(AVAL, na.rm = TRUE)) keeps all AVAL values greater than the mean of AVAL within by_vars.
filter = (dplyr::n() > 2) keeps the by_vars groups that contain more than 2 records.
- analysis_var
Analysis variable.
- summary_fun
Function that takes analysis_var as input and performs the calculation. This can include built-in functions as well as user-defined functions, for example mean or function(x) mean(x, na.rm = TRUE).
- set_values_to
A list of variable name-value pairs. Use this argument to set variables to specified values on the newly derived records.
The LHS refers to a variable.
The RHS refers to the value to set for that variable. This can be a string, a symbol, a numeric value, or NA, e.g. exprs(PARAMCD = "TDOSE", PARCAT1 = "OVERALL"). More general expressions are not allowed.
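To illustrate what "computed within by_vars" means for a filter condition, here is a minimal sketch in plain dplyr. It is not the package's implementation, and the `toy` data and object names are hypothetical. It contrasts a group-wise comparison against the mean with an ungrouped one.

```r
library(dplyr, warn.conflicts = FALSE)
library(tibble)

# Hypothetical toy data: two subjects with three AVAL values each
toy <- tribble(
  ~USUBJID, ~AVAL,
  "S1",        10,
  "S1",        20,
  "S1",        60,
  "S2",         1,
  "S2",         2,
  "S2",         3
)

# Group-wise: each AVAL is compared with the mean of its own subject
within_groups <- toy %>%
  group_by(USUBJID) %>%
  filter(AVAL > mean(AVAL, na.rm = TRUE)) %>%
  ungroup()

# Ungrouped: each AVAL is compared with the overall mean of all six values
overall <- toy %>%
  filter(AVAL > mean(AVAL, na.rm = TRUE))
```

With grouping, one record per subject survives (60 for S1, 3 for S2). Without grouping, both surviving records (20 and 60) belong to S1, because the overall mean is 16.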
Details
This function only creates derived observations and does not append them
to the original dataset observations. If you would like to do this instead,
see the derive_summary_records() function.
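The difference between returning only the derived records and appending them can be sketched with plain dplyr. This is an illustration under stated assumptions, not how admiral implements either function; `advs_toy`, `summary_only`, and `appended` are hypothetical names.

```r
library(dplyr, warn.conflicts = FALSE)
library(tibble)

# Hypothetical toy data: three weight records for one subject
advs_toy <- tribble(
  ~USUBJID, ~PARAM,   ~AVAL,
  "S1",     "Weight",    99,
  "S1",     "Weight",   101,
  "S1",     "Weight",   100
)

# Derived records only: one summary row per by group
summary_only <- advs_toy %>%
  group_by(USUBJID, PARAM) %>%
  summarise(AVAL = mean(AVAL, na.rm = TRUE), .groups = "drop") %>%
  mutate(DTYPE = "AVERAGE")

# Original observations plus the derived records;
# DTYPE is NA on the original rows
appended <- bind_rows(advs_toy, summary_only)
```

Here `summary_only` has a single row, while `appended` has the three source rows followed by the derived one.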
See also
General Derivation Functions for all ADaMs that return variables appended to the dataset:
derive_var_extreme_flag(), derive_var_joined_exist_flag(), derive_var_last_dose_amt(), derive_var_last_dose_date(), derive_var_last_dose_grp(), derive_var_merged_cat(), derive_var_merged_character(), derive_var_merged_exist_flag(), derive_var_merged_summary(), derive_var_obs_number(), derive_var_relative_flag(), derive_vars_joined(), derive_vars_last_dose(), derive_vars_merged_lookup(), derive_vars_merged(), derive_vars_transposed()
Examples
library(admiral) # provides get_summary_records() and exprs()
library(tibble)
library(dplyr, warn.conflicts = FALSE)
adeg <- tribble(
~USUBJID, ~EGSEQ, ~PARAM, ~AVISIT, ~EGDTC, ~AVAL, ~TRTA,
"XYZ-1001", 1, "QTcF Int. (msec)", "Baseline", "2016-02-24T07:50", 385, "",
"XYZ-1001", 2, "QTcF Int. (msec)", "Baseline", "2016-02-24T07:52", 399, "",
"XYZ-1001", 3, "QTcF Int. (msec)", "Baseline", "2016-02-24T07:56", 396, "",
"XYZ-1001", 4, "QTcF Int. (msec)", "Visit 2", "2016-03-08T09:45", 384, "Placebo",
"XYZ-1001", 5, "QTcF Int. (msec)", "Visit 2", "2016-03-08T09:48", 393, "Placebo",
"XYZ-1001", 6, "QTcF Int. (msec)", "Visit 2", "2016-03-08T09:51", 388, "Placebo",
"XYZ-1001", 7, "QTcF Int. (msec)", "Visit 3", "2016-03-22T10:45", 385, "Placebo",
"XYZ-1001", 8, "QTcF Int. (msec)", "Visit 3", "2016-03-22T10:48", 394, "Placebo",
"XYZ-1001", 9, "QTcF Int. (msec)", "Visit 3", "2016-03-22T10:51", 402, "Placebo",
"XYZ-1002", 1, "QTcF Int. (msec)", "Baseline", "2016-02-22T07:58", 399, "",
"XYZ-1002", 2, "QTcF Int. (msec)", "Baseline", "2016-02-22T07:58", 410, "",
"XYZ-1002", 3, "QTcF Int. (msec)", "Baseline", "2016-02-22T08:01", 392, "",
"XYZ-1002", 4, "QTcF Int. (msec)", "Visit 2", "2016-03-06T09:50", 401, "Active 20mg",
"XYZ-1002", 5, "QTcF Int. (msec)", "Visit 2", "2016-03-06T09:53", 407, "Active 20mg",
"XYZ-1002", 6, "QTcF Int. (msec)", "Visit 2", "2016-03-06T09:56", 400, "Active 20mg",
"XYZ-1002", 7, "QTcF Int. (msec)", "Visit 3", "2016-03-24T10:50", 412, "Active 20mg",
"XYZ-1002", 8, "QTcF Int. (msec)", "Visit 3", "2016-03-24T10:53", 414, "Active 20mg",
"XYZ-1002", 9, "QTcF Int. (msec)", "Visit 3", "2016-03-24T10:56", 402, "Active 20mg",
)
# Summarize the average of the triplicate ECG interval values (AVAL)
get_summary_records(
  adeg,
  by_vars = exprs(USUBJID, PARAM, AVISIT),
  analysis_var = AVAL,
  summary_fun = function(x) mean(x, na.rm = TRUE),
  set_values_to = exprs(DTYPE = "AVERAGE")
)
#> # A tibble: 6 x 5
#> USUBJID PARAM AVISIT AVAL DTYPE
#> <chr> <chr> <chr> <dbl> <chr>
#> 1 XYZ-1001 QTcF Int. (msec) Baseline 393. AVERAGE
#> 2 XYZ-1001 QTcF Int. (msec) Visit 2 388. AVERAGE
#> 3 XYZ-1001 QTcF Int. (msec) Visit 3 394. AVERAGE
#> 4 XYZ-1002 QTcF Int. (msec) Baseline 400. AVERAGE
#> 5 XYZ-1002 QTcF Int. (msec) Visit 2 403. AVERAGE
#> 6 XYZ-1002 QTcF Int. (msec) Visit 3 409. AVERAGE
advs <- tribble(
~USUBJID, ~VSSEQ, ~PARAM, ~AVAL, ~VSSTRESU, ~VISIT, ~VSDTC,
"XYZ-001-001", 1164, "Weight", 99, "kg", "Screening", "2018-03-19",
"XYZ-001-001", 1165, "Weight", 101, "kg", "Run-In", "2018-03-26",
"XYZ-001-001", 1166, "Weight", 100, "kg", "Baseline", "2018-04-16",
"XYZ-001-001", 1167, "Weight", 94, "kg", "Week 24", "2018-09-30",
"XYZ-001-001", 1168, "Weight", 92, "kg", "Week 48", "2019-03-17",
"XYZ-001-001", 1169, "Weight", 95, "kg", "Week 52", "2019-04-14",
)
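# Set new values to any variable. Here, `DTYPE = MAXIMUM` refers to `max()` records
# and `DTYPE = AVERAGE` refers to `mean()` records. Note that the second call
# summarizes the output of the first, so the mean is taken over the single
# MAXIMUM record, which is why AVAL is 101 in the result.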
get_summary_records(
  advs,
  by_vars = exprs(USUBJID, PARAM),
  analysis_var = AVAL,
  summary_fun = max,
  set_values_to = exprs(DTYPE = "MAXIMUM")
) %>%
  get_summary_records(
    by_vars = exprs(USUBJID, PARAM),
    analysis_var = AVAL,
    summary_fun = mean,
    set_values_to = exprs(DTYPE = "AVERAGE")
  )
#> # A tibble: 1 x 4
#> USUBJID PARAM AVAL DTYPE
#> <chr> <chr> <dbl> <chr>
#> 1 XYZ-001-001 Weight 101 AVERAGE
# Sample ADEG dataset with triplicate records only for AVISIT = 'Baseline'
adeg <- tribble(
~USUBJID, ~EGSEQ, ~PARAM, ~AVISIT, ~EGDTC, ~AVAL, ~TRTA,
"XYZ-1001", 1, "QTcF Int. (msec)", "Baseline", "2016-02-24T07:50", 385, "",
"XYZ-1001", 2, "QTcF Int. (msec)", "Baseline", "2016-02-24T07:52", 399, "",
"XYZ-1001", 3, "QTcF Int. (msec)", "Baseline", "2016-02-24T07:56", 396, "",
"XYZ-1001", 4, "QTcF Int. (msec)", "Visit 2", "2016-03-08T09:48", 393, "Placebo",
"XYZ-1001", 5, "QTcF Int. (msec)", "Visit 2", "2016-03-08T09:51", 388, "Placebo",
"XYZ-1001", 6, "QTcF Int. (msec)", "Visit 3", "2016-03-22T10:48", 394, "Placebo",
"XYZ-1001", 7, "QTcF Int. (msec)", "Visit 3", "2016-03-22T10:51", 402, "Placebo",
"XYZ-1002", 1, "QTcF Int. (msec)", "Baseline", "2016-02-22T07:58", 399, "",
"XYZ-1002", 2, "QTcF Int. (msec)", "Baseline", "2016-02-22T07:58", 410, "",
"XYZ-1002", 3, "QTcF Int. (msec)", "Baseline", "2016-02-22T08:01", 392, "",
"XYZ-1002", 4, "QTcF Int. (msec)", "Visit 2", "2016-03-06T09:53", 407, "Active 20mg",
"XYZ-1002", 5, "QTcF Int. (msec)", "Visit 2", "2016-03-06T09:56", 400, "Active 20mg",
"XYZ-1002", 6, "QTcF Int. (msec)", "Visit 3", "2016-03-24T10:53", 414, "Active 20mg",
"XYZ-1002", 7, "QTcF Int. (msec)", "Visit 3", "2016-03-24T10:56", 402, "Active 20mg",
)
# Compute the average of AVAL only if there are more than 2 records within the
# by group
get_summary_records(
  adeg,
  by_vars = exprs(USUBJID, PARAM, AVISIT),
  filter = n() > 2,
  analysis_var = AVAL,
  summary_fun = function(x) mean(x, na.rm = TRUE),
  set_values_to = exprs(DTYPE = "AVERAGE")
)
#> # A tibble: 2 x 5
#> USUBJID PARAM AVISIT AVAL DTYPE
#> <chr> <chr> <chr> <dbl> <chr>
#> 1 XYZ-1001 QTcF Int. (msec) Baseline 393. AVERAGE
#> 2 XYZ-1002 QTcF Int. (msec) Baseline 400. AVERAGE
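In the weight example, piping one call into the next summarizes the summary. If separate MAXIMUM and AVERAGE records from the same source data are wanted, one possible approach is to call the function twice on the source and stack the results. The sketch below assumes the installed admiral version provides the signature shown under Usage. `advs_wt` is a hypothetical trimmed copy of the weight data (same six AVAL values), and `max_rec`, `avg_rec`, and `both` are illustrative names.

```r
library(admiral)
library(tibble)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical trimmed copy of the weight data (same six AVAL values)
advs_wt <- tribble(
  ~USUBJID,      ~PARAM,   ~AVAL,
  "XYZ-001-001", "Weight",    99,
  "XYZ-001-001", "Weight",   101,
  "XYZ-001-001", "Weight",   100,
  "XYZ-001-001", "Weight",    94,
  "XYZ-001-001", "Weight",    92,
  "XYZ-001-001", "Weight",    95
)

# Maximum over the source records
max_rec <- get_summary_records(
  advs_wt,
  by_vars = exprs(USUBJID, PARAM),
  analysis_var = AVAL,
  summary_fun = max,
  set_values_to = exprs(DTYPE = "MAXIMUM")
)

# Mean over the same source records (not over max_rec)
avg_rec <- get_summary_records(
  advs_wt,
  by_vars = exprs(USUBJID, PARAM),
  analysis_var = AVAL,
  summary_fun = mean,
  set_values_to = exprs(DTYPE = "AVERAGE")
)

# Stack the two derived records into one data frame
both <- bind_rows(max_rec, avg_rec)
```

Because both calls read `advs_wt`, the AVERAGE record reflects the mean of all six weights (581 / 6) instead of the mean of a single MAXIMUM record.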