Merge a categorization variable from an additional dataset into the input dataset. The observations to merge can be selected by a condition and/or by selecting the first or last observation for each by group.
Usage
derive_var_merged_cat(
  dataset,
  dataset_add,
  by_vars,
  order = NULL,
  new_var,
  source_var,
  cat_fun,
  filter_add = NULL,
  mode = NULL,
  missing_value = NA_character_
)
Arguments
- dataset
Input dataset
The variables specified by the by_vars argument are expected.
- dataset_add
Additional dataset
The variables specified by the by_vars, the source_var, and the order arguments are expected.
- by_vars
Grouping variables
The input dataset and the selected observations from the additional dataset are merged by the specified by variables. The by variables must be a unique key of the selected observations. Variables from the additional dataset can be renamed by naming the element, i.e., by_vars = exprs(<name in input dataset> = <name in additional dataset>), similar to the dplyr joins.
Permitted Values: list of variables created by exprs()
- order
Sort order
If the argument is set to a non-null value, for each by group the first or last observation from the additional dataset is selected with respect to the specified order.
Default: NULL
Permitted Values: list of variables or desc(<variable>) function calls created by exprs(), e.g., exprs(ADT, desc(AVAL)), or NULL
- new_var
New variable
The specified variable is added to the additional dataset and set to the categorized values, i.e., cat_fun(<source variable>).
- source_var
Source variable
- cat_fun
Categorization function
A function must be specified for this argument which expects the values of the source variable as input and returns the categorized values.
- filter_add
Filter for additional dataset (dataset_add)
Only observations fulfilling the specified condition are taken into account for merging. If the argument is not specified, all observations are considered.
Default: NULL
Permitted Values: a condition
- mode
Selection mode
Determines if the first or last observation is selected. If the order argument is specified, mode must be non-null.
If the order argument is not specified, the mode argument is ignored.
Default: NULL
Permitted Values: "first", "last", NULL
- missing_value
Values used for missing information
The new variable is set to the specified value for all by groups without observations in the additional dataset.
Default: NA_character_
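The by_vars description above compares renaming to dplyr joins. As a rough analogy only, the sketch below shows how a plain dplyr join matches a key that is named differently in the two datasets. It is not the function's implementation, and the datasets adsl and lookup and the variables SUBJID and WGTCAT are hypothetical.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data: the subject identifier has a different name
# in the input dataset (USUBJID) and the additional dataset (SUBJID).
adsl <- tibble(USUBJID = c("01", "02"))
lookup <- tibble(SUBJID = c("01", "02"), WGTCAT = c("low", "high"))

# Left-hand name: input dataset; right-hand name: additional dataset.
# The result keeps the name from the input dataset.
merged <- left_join(adsl, lookup, by = c("USUBJID" = "SUBJID"))
```

With by_vars, the same pairing would be written as exprs(USUBJID = SUBJID).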
Value
The output dataset contains all observations and variables of the input dataset and additionally the variable specified for new_var, derived from the additional dataset (dataset_add).
Details
The additional dataset is restricted to the observations matching the filter_add condition.
The categorization variable is added to the additional dataset.
If order is specified, for each by group the first or last observation (depending on mode) is selected.
The categorization variable is merged to the input dataset.
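The four steps above can be sketched with plain dplyr. This is an illustration under stated assumptions, not the function's actual implementation: the toy datasets dm and vs are hypothetical, and the final coalesce() is a simplification that would also overwrite an NA returned by the categorization function for a matched subject.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data, not taken from admiral.test
dm <- tibble(USUBJID = c("01", "02", "03"))
vs <- tibble(
  USUBJID  = c("01", "01", "02", "02"),
  VSTESTCD = c("WEIGHT", "WEIGHT", "WEIGHT", "HEIGHT"),
  VSSEQ    = c(1, 2, 1, 2),
  VSSTRESN = c(45, 95, 70, 180)
)

# Categorization function (plays the role of cat_fun)
wgt_cat <- function(wgt) {
  case_when(
    wgt < 50 ~ "low",
    wgt > 90 ~ "high",
    TRUE ~ "normal"
  )
}

selected <- vs %>%
  # Step 1: restrict the additional dataset (filter_add)
  filter(VSTESTCD == "WEIGHT") %>%
  # Step 2: add the categorization variable (new_var = cat_fun(source_var))
  mutate(WGTBLCAT = wgt_cat(VSSTRESN)) %>%
  # Step 3: keep the last observation per by group (order, mode = "last")
  group_by(USUBJID) %>%
  arrange(VSSEQ, .by_group = TRUE) %>%
  slice_tail(n = 1) %>%
  ungroup() %>%
  select(USUBJID, WGTBLCAT)

# Step 4: merge to the input dataset; subjects without a selected
# observation receive the missing value (here "MISSING")
result <- dm %>%
  left_join(selected, by = "USUBJID") %>%
  mutate(WGTBLCAT = coalesce(WGTBLCAT, "MISSING"))
```

Subject "03" has no weight record in the toy data, so it is the one that receives the missing value.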
See also
General Derivation Functions for all ADaMs that return a variable appended to the dataset:
derive_var_extreme_flag(), derive_var_joined_exist_flag(), derive_var_last_dose_amt(), derive_var_last_dose_date(), derive_var_last_dose_grp(), derive_var_merged_character(), derive_var_merged_exist_flag(), derive_var_merged_summary(), derive_var_obs_number(), derive_var_relative_flag(), derive_vars_joined(), derive_vars_last_dose(), derive_vars_merged_lookup(), derive_vars_merged(), derive_vars_transposed(), get_summary_records()
Examples
library(admiral) # provides derive_var_merged_cat()
library(admiral.test)
library(dplyr, warn.conflicts = FALSE)
data("admiral_dm")
data("admiral_vs")

# Categorize weight into "low", "high", or "normal"
wgt_cat <- function(wgt) {
  case_when(
    wgt < 50 ~ "low",
    wgt > 90 ~ "high",
    TRUE ~ "normal"
  )
}

# Derive the category from the last screening weight of each subject
derive_var_merged_cat(
  admiral_dm,
  dataset_add = admiral_vs,
  by_vars = exprs(STUDYID, USUBJID),
  order = exprs(VSDTC, VSSEQ),
  filter_add = VSTESTCD == "WEIGHT" & substr(VISIT, 1, 9) == "SCREENING",
  new_var = WGTBLCAT,
  source_var = VSSTRESN,
  cat_fun = wgt_cat,
  mode = "last"
) %>%
  select(STUDYID, USUBJID, AGE, AGEU, WGTBLCAT)
#> # A tibble: 306 x 5
#>    STUDYID      USUBJID       AGE AGEU  WGTBLCAT
#>    <chr>        <chr>       <dbl> <chr> <chr>
#>  1 CDISCPILOT01 01-701-1015    63 YEARS normal
#>  2 CDISCPILOT01 01-701-1023    64 YEARS normal
#>  3 CDISCPILOT01 01-701-1028    71 YEARS high
#>  4 CDISCPILOT01 01-701-1033    74 YEARS normal
#>  5 CDISCPILOT01 01-701-1034    77 YEARS normal
#>  6 CDISCPILOT01 01-701-1047    85 YEARS normal
#>  7 CDISCPILOT01 01-701-1057    59 YEARS NA
#>  8 CDISCPILOT01 01-701-1097    68 YEARS normal
#>  9 CDISCPILOT01 01-701-1111    81 YEARS normal
#> 10 CDISCPILOT01 01-701-1115    84 YEARS normal
#> # … with 296 more rows

# Defining a value for missing VS data
derive_var_merged_cat(
  admiral_dm,
  dataset_add = admiral_vs,
  by_vars = exprs(STUDYID, USUBJID),
  order = exprs(VSDTC, VSSEQ),
  filter_add = VSTESTCD == "WEIGHT" & substr(VISIT, 1, 9) == "SCREENING",
  new_var = WGTBLCAT,
  source_var = VSSTRESN,
  cat_fun = wgt_cat,
  mode = "last",
  missing_value = "MISSING"
) %>%
  select(STUDYID, USUBJID, AGE, AGEU, WGTBLCAT)
#> # A tibble: 306 x 5
#>    STUDYID      USUBJID       AGE AGEU  WGTBLCAT
#>    <chr>        <chr>       <dbl> <chr> <chr>
#>  1 CDISCPILOT01 01-701-1015    63 YEARS normal
#>  2 CDISCPILOT01 01-701-1023    64 YEARS normal
#>  3 CDISCPILOT01 01-701-1028    71 YEARS high
#>  4 CDISCPILOT01 01-701-1033    74 YEARS normal
#>  5 CDISCPILOT01 01-701-1034    77 YEARS normal
#>  6 CDISCPILOT01 01-701-1047    85 YEARS normal
#>  7 CDISCPILOT01 01-701-1057    59 YEARS MISSING
#>  8 CDISCPILOT01 01-701-1097    68 YEARS normal
#>  9 CDISCPILOT01 01-701-1111    81 YEARS normal
#> 10 CDISCPILOT01 01-701-1115    84 YEARS normal
#> # … with 296 more rows