Add New Variable(s) to the Input Dataset Based on Variables from Another Dataset
Source:R/derive_merged.R
derive_vars_merged.Rd
Add new variable(s) to the input dataset based on variables from another
dataset. The observations to merge can be selected by a condition
(filter_add
argument) and/or selecting the first or last observation for
each by group (order
and mode
argument).
Usage
derive_vars_merged(
dataset,
dataset_add,
by_vars,
order = NULL,
new_vars = NULL,
mode = NULL,
filter_add = NULL,
match_flag = NULL,
check_type = "warning",
duplicate_msg = NULL
)
Arguments
- dataset
Input dataset
The variables specified by the
by_vars
argument are expected.- dataset_add
Additional dataset
The variables specified by the
by_vars
, thenew_vars
, and theorder
argument are expected.- by_vars
Grouping variables
The input dataset and the selected observations from the additional dataset are merged by the specified by variables. The by variables must be a unique key of the selected observations. Variables from the additional dataset can be renamed by naming the element, i.e.,
by_vars = exprs(<name in input dataset> = <name in additional dataset>)
, similar to the dplyr joins.Permitted Values: list of variables created by
exprs()
- order
Sort order
If the argument is set to a non-null value, for each by group the first or last observation from the additional dataset is selected with respect to the specified order.
Default:
NULL
Permitted Values: list of variables or
desc(<variable>)
function calls created byexprs()
, e.g.,exprs(ADT, desc(AVAL))
orNULL
- new_vars
Variables to add
The specified variables from the additional dataset are added to the output dataset. Variables can be renamed by naming the element, i.e.,
new_vars = exprs(<new name> = <old name>)
.For example
new_vars = exprs(var1, var2)
adds variablesvar1
andvar2
fromdataset_add
to the input dataset.And
new_vars = exprs(var1, new_var2 = old_var2)
takesvar1
andold_var2
fromdataset_add
and adds them to the input dataset renamingold_var2
tonew_var2
.If the argument is not specified or set to
NULL
, all variables from the additional dataset (dataset_add
) are added.Default:
NULL
Permitted Values: list of variables created by
exprs()
- mode
Selection mode
Determines if the first or last observation is selected. If the
order
argument is specified,mode
must be non-null.If the
order
argument is not specified, themode
argument is ignored.Default:
NULL
Permitted Values:
"first"
,"last"
,NULL
- filter_add
Filter for additional dataset (
dataset_add
)Only observations fulfilling the specified condition are taken into account for merging. If the argument is not specified, all observations are considered.
Default:
NULL
Permitted Values: a condition
- match_flag
Match flag
If the argument is specified (e.g.,
match_flag = FLAG
), the specified variable (e.g.,FLAG
) is added to the input dataset. This variable will beTRUE
for all selected records fromdataset_add
which are merged into the input dataset, andNA
otherwise.Default:
NULL
Permitted Values: Variable name
- check_type
Check uniqueness?
If
"warning"
or"error"
is specified, the specified message is issued if the observations of the (restricted) additional dataset are not unique with respect to the by variables and the order.Default:
"warning"
Permitted Values:
"none"
,"warning"
,"error"
- duplicate_msg
Message of unique check
If the uniqueness check fails, the specified message is displayed.
Default:
paste("Dataset `dataset_add` contains duplicate records with respect to", enumerate(vars2chr(by_vars)))
Value
The output dataset contains all observations and variables of the
input dataset and additionally the variables specified for new_vars
from
the additional dataset (dataset_add
).
Details
The records from the additional dataset (
dataset_add
) are restricted to those matching thefilter_add
condition.If
order
is specified, for each by group the first or last observation (depending onmode
) is selected.The variables specified for
new_vars
are renamed (if requested) and merged to the input dataset usingleft_join()
. I.e., the output dataset contains all observations from the input dataset. For observations without a matching observation in the additional dataset the new variables are set toNA
. Observations in the additional dataset which have no matching observation in the input dataset are ignored.
See also
General Derivation Functions for all ADaMs that returns variable appended to dataset:
derive_var_extreme_flag()
,
derive_var_joined_exist_flag()
,
derive_var_last_dose_amt()
,
derive_var_last_dose_date()
,
derive_var_last_dose_grp()
,
derive_var_merged_cat()
,
derive_var_merged_character()
,
derive_var_merged_exist_flag()
,
derive_var_merged_summary()
,
derive_var_obs_number()
,
derive_var_relative_flag()
,
derive_vars_joined()
,
derive_vars_last_dose()
,
derive_vars_merged_lookup()
,
derive_vars_transposed()
,
get_summary_records()
Examples
library(admiral.test)
library(dplyr, warn.conflicts = FALSE)
data("admiral_vs")
data("admiral_dm")
# Merging all dm variables to vs
derive_vars_merged(
admiral_vs,
dataset_add = select(admiral_dm, -DOMAIN),
by_vars = exprs(STUDYID, USUBJID)
) %>%
select(STUDYID, USUBJID, VSTESTCD, VISIT, VSTPT, VSSTRESN, AGE, AGEU)
#> # A tibble: 29,643 x 8
#> STUDYID USUBJID VSTESTCD VISIT VSTPT VSSTRESN AGE AGEU
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr>
#> 1 CDISCPILO… 01-701-1… DIABP SCREENING… AFTER LYING DO… 64 63 YEARS
#> 2 CDISCPILO… 01-701-1… DIABP SCREENING… AFTER STANDING… 83 63 YEARS
#> 3 CDISCPILO… 01-701-1… DIABP SCREENING… AFTER STANDING… 57 63 YEARS
#> 4 CDISCPILO… 01-701-1… DIABP SCREENING… AFTER LYING DO… 68 63 YEARS
#> 5 CDISCPILO… 01-701-1… DIABP SCREENING… AFTER STANDING… 59 63 YEARS
#> 6 CDISCPILO… 01-701-1… DIABP SCREENING… AFTER STANDING… 71 63 YEARS
#> 7 CDISCPILO… 01-701-1… DIABP BASELINE AFTER LYING DO… 56 63 YEARS
#> 8 CDISCPILO… 01-701-1… DIABP BASELINE AFTER STANDING… 51 63 YEARS
#> 9 CDISCPILO… 01-701-1… DIABP BASELINE AFTER STANDING… 61 63 YEARS
#> 10 CDISCPILO… 01-701-1… DIABP AMBUL ECG… AFTER LYING DO… 67 63 YEARS
#> # … with 29,633 more rows
# Merge last weight to adsl
data("admiral_adsl")
derive_vars_merged(
admiral_adsl,
dataset_add = admiral_vs,
by_vars = exprs(STUDYID, USUBJID),
order = exprs(VSDTC),
mode = "last",
new_vars = exprs(LASTWGT = VSSTRESN, LASTWGTU = VSSTRESU),
filter_add = VSTESTCD == "WEIGHT",
match_flag = vsdatafl
) %>%
select(STUDYID, USUBJID, AGE, AGEU, LASTWGT, LASTWGTU, vsdatafl)
#> # A tibble: 306 x 7
#> STUDYID USUBJID AGE AGEU LASTWGT LASTWGTU vsdatafl
#> <chr> <chr> <dbl> <chr> <dbl> <chr> <lgl>
#> 1 CDISCPILOT01 01-701-1015 63 YEARS 53.5 kg TRUE
#> 2 CDISCPILOT01 01-701-1023 64 YEARS 80.3 kg TRUE
#> 3 CDISCPILOT01 01-701-1028 71 YEARS 99.8 kg TRUE
#> 4 CDISCPILOT01 01-701-1033 74 YEARS 88.4 kg TRUE
#> 5 CDISCPILOT01 01-701-1034 77 YEARS 64.0 kg TRUE
#> 6 CDISCPILOT01 01-701-1047 85 YEARS 67.1 kg TRUE
#> 7 CDISCPILOT01 01-701-1057 59 YEARS NA NA NA
#> 8 CDISCPILOT01 01-701-1097 68 YEARS 78.5 kg TRUE
#> 9 CDISCPILOT01 01-701-1111 81 YEARS 60.8 kg TRUE
#> 10 CDISCPILOT01 01-701-1115 84 YEARS 78.5 kg TRUE
#> # … with 296 more rows
# Derive treatment start datetime (TRTSDTM)
data(admiral_ex)
## Impute exposure start date to first date/time
ex_ext <- derive_vars_dtm(
admiral_ex,
dtc = EXSTDTC,
new_vars_prefix = "EXST",
highest_imputation = "M",
)
## Add first exposure datetime and imputation flags to adsl
derive_vars_merged(
select(admiral_dm, STUDYID, USUBJID),
dataset_add = ex_ext,
by_vars = exprs(STUDYID, USUBJID),
new_vars = exprs(TRTSDTM = EXSTDTM, TRTSDTF = EXSTDTF, TRTSTMF = EXSTTMF),
order = exprs(EXSTDTM),
mode = "first"
)
#> # A tibble: 306 x 5
#> STUDYID USUBJID TRTSDTM TRTSDTF TRTSTMF
#> <chr> <chr> <dttm> <chr> <chr>
#> 1 CDISCPILOT01 01-701-1015 2014-01-02 00:00:00 NA H
#> 2 CDISCPILOT01 01-701-1023 2012-08-05 00:00:00 NA H
#> 3 CDISCPILOT01 01-701-1028 2013-07-19 00:00:00 NA H
#> 4 CDISCPILOT01 01-701-1033 2014-03-18 00:00:00 NA H
#> 5 CDISCPILOT01 01-701-1034 2014-07-01 00:00:00 NA H
#> 6 CDISCPILOT01 01-701-1047 2013-02-12 00:00:00 NA H
#> 7 CDISCPILOT01 01-701-1057 NA NA NA
#> 8 CDISCPILOT01 01-701-1097 2014-01-01 00:00:00 NA H
#> 9 CDISCPILOT01 01-701-1111 2012-09-07 00:00:00 NA H
#> 10 CDISCPILOT01 01-701-1115 2012-11-30 00:00:00 NA H
#> # … with 296 more rows
# Derive treatment start datetime (TRTSDTM)
data(admiral_ex)
## Impute exposure start date to first date/time
ex_ext <- derive_vars_dtm(
admiral_ex,
dtc = EXSTDTC,
new_vars_prefix = "EXST",
highest_imputation = "M",
)
## Add first exposure datetime and imputation flags to adsl
derive_vars_merged(
select(admiral_dm, STUDYID, USUBJID),
dataset_add = ex_ext,
filter_add = !is.na(EXSTDTM),
by_vars = exprs(STUDYID, USUBJID),
new_vars = exprs(TRTSDTM = EXSTDTM, TRTSDTF = EXSTDTF, TRTSTMF = EXSTTMF),
order = exprs(EXSTDTM),
mode = "first"
)
#> # A tibble: 306 x 5
#> STUDYID USUBJID TRTSDTM TRTSDTF TRTSTMF
#> <chr> <chr> <dttm> <chr> <chr>
#> 1 CDISCPILOT01 01-701-1015 2014-01-02 00:00:00 NA H
#> 2 CDISCPILOT01 01-701-1023 2012-08-05 00:00:00 NA H
#> 3 CDISCPILOT01 01-701-1028 2013-07-19 00:00:00 NA H
#> 4 CDISCPILOT01 01-701-1033 2014-03-18 00:00:00 NA H
#> 5 CDISCPILOT01 01-701-1034 2014-07-01 00:00:00 NA H
#> 6 CDISCPILOT01 01-701-1047 2013-02-12 00:00:00 NA H
#> 7 CDISCPILOT01 01-701-1057 NA NA NA
#> 8 CDISCPILOT01 01-701-1097 2014-01-01 00:00:00 NA H
#> 9 CDISCPILOT01 01-701-1111 2012-09-07 00:00:00 NA H
#> 10 CDISCPILOT01 01-701-1115 2012-11-30 00:00:00 NA H
#> # … with 296 more rows
# Derive treatment end datetime (TRTEDTM)
## Impute exposure end datetime to last time, no date imputation
ex_ext <- derive_vars_dtm(
admiral_ex,
dtc = EXENDTC,
new_vars_prefix = "EXEN",
time_imputation = "last",
)
## Add last exposure datetime and imputation flag to adsl
derive_vars_merged(
select(admiral_dm, STUDYID, USUBJID),
dataset_add = ex_ext,
filter_add = !is.na(EXENDTM),
by_vars = exprs(STUDYID, USUBJID),
new_vars = exprs(TRTEDTM = EXENDTM, TRTETMF = EXENTMF),
order = exprs(EXENDTM),
mode = "last"
)
#> # A tibble: 306 x 4
#> STUDYID USUBJID TRTEDTM TRTETMF
#> <chr> <chr> <dttm> <chr>
#> 1 CDISCPILOT01 01-701-1015 2014-07-02 23:59:59 H
#> 2 CDISCPILOT01 01-701-1023 2012-09-01 23:59:59 H
#> 3 CDISCPILOT01 01-701-1028 2014-01-14 23:59:59 H
#> 4 CDISCPILOT01 01-701-1033 2014-03-31 23:59:59 H
#> 5 CDISCPILOT01 01-701-1034 2014-12-30 23:59:59 H
#> 6 CDISCPILOT01 01-701-1047 2013-03-09 23:59:59 H
#> 7 CDISCPILOT01 01-701-1057 NA NA
#> 8 CDISCPILOT01 01-701-1097 2014-07-09 23:59:59 H
#> 9 CDISCPILOT01 01-701-1111 2012-09-16 23:59:59 H
#> 10 CDISCPILOT01 01-701-1115 2013-01-23 23:59:59 H
#> # … with 296 more rows