Add New Variable(s) to the Input Dataset Based on Variables from Another Dataset

Add new variable(s) to the input dataset based on variables from another dataset. The observations to merge can be selected by a condition (filter_add argument) and/or selecting the first or last observation for each by group (order and mode argument).

Usage

derive_vars_merged(
  dataset,
  dataset_add,
  by_vars,
  order = NULL,
  new_vars = NULL,
  mode = NULL,
  filter_add = NULL,
  match_flag = NULL,
  check_type = "warning",
  duplicate_msg = NULL
)

Arguments

dataset

Input dataset

The variables specified by the by_vars argument are expected.

dataset_add

Additional dataset

The variables specified by the by_vars, the new_vars, and the order argument are expected.

by_vars

Grouping variables

The input dataset and the selected observations from the additional dataset are merged by the specified by variables. The by variables must be a unique key of the selected observations. Variables from the additional dataset can be renamed by naming the element, i.e., by_vars = exprs(<name in input dataset> = <name in additional dataset>), similar to the dplyr joins.

Permitted Values: list of variables created by exprs()

order

Sort order

If the argument is set to a non-null value, for each by group the first or last observation from the additional dataset is selected with respect to the specified order.

Default: NULL

Permitted Values: list of variables or desc(<variable>) function calls created by exprs(), e.g., exprs(ADT, desc(AVAL)) or NULL

new_vars

Variables to add

The specified variables from the additional dataset are added to the output dataset. Variables can be renamed by naming the element, i.e., new_vars = exprs(<new name> = <old name>).

For example new_vars = exprs(var1, var2) adds variables var1 and var2 from dataset_add to the input dataset.

And new_vars = exprs(var1, new_var2 = old_var2) takes var1 and old_var2 from dataset_add and adds them to the input dataset renaming old_var2 to new_var2.

If the argument is not specified or set to NULL, all variables from the additional dataset (dataset_add) are added.

Default: NULL

Permitted Values: list of variables created by exprs()

mode

Selection mode

Determines if the first or last observation is selected. If the order argument is specified, mode must be non-null.

If the order argument is not specified, the mode argument is ignored.

Default: NULL

Permitted Values: "first", "last", NULL

filter_add

Filter for additional dataset (dataset_add)

Only observations fulfilling the specified condition are taken into account for merging. If the argument is not specified, all observations are considered.

Default: NULL

Permitted Values: a condition

match_flag

Match flag

If the argument is specified (e.g., match_flag = FLAG), the specified variable (e.g., FLAG) is added to the input dataset. This variable will be TRUE for all selected records from dataset_add which are merged into the input dataset, and NA otherwise.

Default: NULL

Permitted Values: Variable name

check_type

Check uniqueness?

If "warning" or "error" is specified, the specified message is issued if the observations of the (restricted) additional dataset are not unique with respect to the by variables and the order.

Default: "warning"

Permitted Values: "none", "warning", "error"

duplicate_msg

Message of unique check

If the uniqueness check fails, the specified message is displayed.

Default:

paste("Dataset `dataset_add` contains duplicate records with respect to",
      enumerate(vars2chr(by_vars)))

Value

The output dataset contains all observations and variables of the input dataset and additionally the variables specified for new_vars from the additional dataset (dataset_add).

Details

The records from the additional dataset (dataset_add) are restricted to those matching the filter_add condition.
If order is specified, for each by group the first or last observation (depending on mode) is selected.
The variables specified for new_vars are renamed (if requested) and merged to the input dataset using left_join(). I.e., the output dataset contains all observations from the input dataset. For observations without a matching observation in the additional dataset the new variables are set to NA. Observations in the additional dataset which have no matching observation in the input dataset are ignored.

Examples

library(admiral.test)
library(dplyr, warn.conflicts = FALSE)
data("admiral_vs")
data("admiral_dm")

# Merging all dm variables to vs
derive_vars_merged(
  admiral_vs,
  dataset_add = select(admiral_dm, -DOMAIN),
  by_vars = exprs(STUDYID, USUBJID)
) %>%
  select(STUDYID, USUBJID, VSTESTCD, VISIT, VSTPT, VSSTRESN, AGE, AGEU)
#> # A tibble: 29,643 x 8
#>    STUDYID    USUBJID   VSTESTCD VISIT      VSTPT           VSSTRESN   AGE AGEU 
#>    <chr>      <chr>     <chr>    <chr>      <chr>              <dbl> <dbl> <chr>
#>  1 CDISCPILO… 01-701-1… DIABP    SCREENING… AFTER LYING DO…       64    63 YEARS
#>  2 CDISCPILO… 01-701-1… DIABP    SCREENING… AFTER STANDING…       83    63 YEARS
#>  3 CDISCPILO… 01-701-1… DIABP    SCREENING… AFTER STANDING…       57    63 YEARS
#>  4 CDISCPILO… 01-701-1… DIABP    SCREENING… AFTER LYING DO…       68    63 YEARS
#>  5 CDISCPILO… 01-701-1… DIABP    SCREENING… AFTER STANDING…       59    63 YEARS
#>  6 CDISCPILO… 01-701-1… DIABP    SCREENING… AFTER STANDING…       71    63 YEARS
#>  7 CDISCPILO… 01-701-1… DIABP    BASELINE   AFTER LYING DO…       56    63 YEARS
#>  8 CDISCPILO… 01-701-1… DIABP    BASELINE   AFTER STANDING…       51    63 YEARS
#>  9 CDISCPILO… 01-701-1… DIABP    BASELINE   AFTER STANDING…       61    63 YEARS
#> 10 CDISCPILO… 01-701-1… DIABP    AMBUL ECG… AFTER LYING DO…       67    63 YEARS
#> # … with 29,633 more rows

# Merge last weight to adsl
data("admiral_adsl")
derive_vars_merged(
  admiral_adsl,
  dataset_add = admiral_vs,
  by_vars = exprs(STUDYID, USUBJID),
  order = exprs(VSDTC),
  mode = "last",
  new_vars = exprs(LASTWGT = VSSTRESN, LASTWGTU = VSSTRESU),
  filter_add = VSTESTCD == "WEIGHT",
  match_flag = vsdatafl
) %>%
  select(STUDYID, USUBJID, AGE, AGEU, LASTWGT, LASTWGTU, vsdatafl)
#> # A tibble: 306 x 7
#>    STUDYID      USUBJID       AGE AGEU  LASTWGT LASTWGTU vsdatafl
#>    <chr>        <chr>       <dbl> <chr>   <dbl> <chr>    <lgl>   
#>  1 CDISCPILOT01 01-701-1015    63 YEARS    53.5 kg       TRUE    
#>  2 CDISCPILOT01 01-701-1023    64 YEARS    80.3 kg       TRUE    
#>  3 CDISCPILOT01 01-701-1028    71 YEARS    99.8 kg       TRUE    
#>  4 CDISCPILOT01 01-701-1033    74 YEARS    88.4 kg       TRUE    
#>  5 CDISCPILOT01 01-701-1034    77 YEARS    64.0 kg       TRUE    
#>  6 CDISCPILOT01 01-701-1047    85 YEARS    67.1 kg       TRUE    
#>  7 CDISCPILOT01 01-701-1057    59 YEARS    NA   NA       NA      
#>  8 CDISCPILOT01 01-701-1097    68 YEARS    78.5 kg       TRUE    
#>  9 CDISCPILOT01 01-701-1111    81 YEARS    60.8 kg       TRUE    
#> 10 CDISCPILOT01 01-701-1115    84 YEARS    78.5 kg       TRUE    
#> # … with 296 more rows

# Derive treatment start datetime (TRTSDTM)
data(admiral_ex)

## Impute exposure start date to first date/time
ex_ext <- derive_vars_dtm(
  admiral_ex,
  dtc = EXSTDTC,
  new_vars_prefix = "EXST",
  highest_imputation = "M",
)

## Add first exposure datetime and imputation flags to adsl
derive_vars_merged(
  select(admiral_dm, STUDYID, USUBJID),
  dataset_add = ex_ext,
  by_vars = exprs(STUDYID, USUBJID),
  new_vars = exprs(TRTSDTM = EXSTDTM, TRTSDTF = EXSTDTF, TRTSTMF = EXSTTMF),
  order = exprs(EXSTDTM),
  mode = "first"
)
#> # A tibble: 306 x 5
#>    STUDYID      USUBJID     TRTSDTM             TRTSDTF TRTSTMF
#>    <chr>        <chr>       <dttm>              <chr>   <chr>  
#>  1 CDISCPILOT01 01-701-1015 2014-01-02 00:00:00 NA      H      
#>  2 CDISCPILOT01 01-701-1023 2012-08-05 00:00:00 NA      H      
#>  3 CDISCPILOT01 01-701-1028 2013-07-19 00:00:00 NA      H      
#>  4 CDISCPILOT01 01-701-1033 2014-03-18 00:00:00 NA      H      
#>  5 CDISCPILOT01 01-701-1034 2014-07-01 00:00:00 NA      H      
#>  6 CDISCPILOT01 01-701-1047 2013-02-12 00:00:00 NA      H      
#>  7 CDISCPILOT01 01-701-1057 NA                  NA      NA     
#>  8 CDISCPILOT01 01-701-1097 2014-01-01 00:00:00 NA      H      
#>  9 CDISCPILOT01 01-701-1111 2012-09-07 00:00:00 NA      H      
#> 10 CDISCPILOT01 01-701-1115 2012-11-30 00:00:00 NA      H      
#> # … with 296 more rows

# Derive treatment start datetime (TRTSDTM)
data(admiral_ex)

## Impute exposure start date to first date/time
ex_ext <- derive_vars_dtm(
  admiral_ex,
  dtc = EXSTDTC,
  new_vars_prefix = "EXST",
  highest_imputation = "M",
)

## Add first exposure datetime and imputation flags to adsl
derive_vars_merged(
  select(admiral_dm, STUDYID, USUBJID),
  dataset_add = ex_ext,
  filter_add = !is.na(EXSTDTM),
  by_vars = exprs(STUDYID, USUBJID),
  new_vars = exprs(TRTSDTM = EXSTDTM, TRTSDTF = EXSTDTF, TRTSTMF = EXSTTMF),
  order = exprs(EXSTDTM),
  mode = "first"
)
#> # A tibble: 306 x 5
#>    STUDYID      USUBJID     TRTSDTM             TRTSDTF TRTSTMF
#>    <chr>        <chr>       <dttm>              <chr>   <chr>  
#>  1 CDISCPILOT01 01-701-1015 2014-01-02 00:00:00 NA      H      
#>  2 CDISCPILOT01 01-701-1023 2012-08-05 00:00:00 NA      H      
#>  3 CDISCPILOT01 01-701-1028 2013-07-19 00:00:00 NA      H      
#>  4 CDISCPILOT01 01-701-1033 2014-03-18 00:00:00 NA      H      
#>  5 CDISCPILOT01 01-701-1034 2014-07-01 00:00:00 NA      H      
#>  6 CDISCPILOT01 01-701-1047 2013-02-12 00:00:00 NA      H      
#>  7 CDISCPILOT01 01-701-1057 NA                  NA      NA     
#>  8 CDISCPILOT01 01-701-1097 2014-01-01 00:00:00 NA      H      
#>  9 CDISCPILOT01 01-701-1111 2012-09-07 00:00:00 NA      H      
#> 10 CDISCPILOT01 01-701-1115 2012-11-30 00:00:00 NA      H      
#> # … with 296 more rows

# Derive treatment end datetime (TRTEDTM)
## Impute exposure end datetime to last time, no date imputation
ex_ext <- derive_vars_dtm(
  admiral_ex,
  dtc = EXENDTC,
  new_vars_prefix = "EXEN",
  time_imputation = "last",
)

## Add last exposure datetime and imputation flag to adsl
derive_vars_merged(
  select(admiral_dm, STUDYID, USUBJID),
  dataset_add = ex_ext,
  filter_add = !is.na(EXENDTM),
  by_vars = exprs(STUDYID, USUBJID),
  new_vars = exprs(TRTEDTM = EXENDTM, TRTETMF = EXENTMF),
  order = exprs(EXENDTM),
  mode = "last"
)
#> # A tibble: 306 x 4
#>    STUDYID      USUBJID     TRTEDTM             TRTETMF
#>    <chr>        <chr>       <dttm>              <chr>  
#>  1 CDISCPILOT01 01-701-1015 2014-07-02 23:59:59 H      
#>  2 CDISCPILOT01 01-701-1023 2012-09-01 23:59:59 H      
#>  3 CDISCPILOT01 01-701-1028 2014-01-14 23:59:59 H      
#>  4 CDISCPILOT01 01-701-1033 2014-03-31 23:59:59 H      
#>  5 CDISCPILOT01 01-701-1034 2014-12-30 23:59:59 H      
#>  6 CDISCPILOT01 01-701-1047 2013-03-09 23:59:59 H      
#>  7 CDISCPILOT01 01-701-1057 NA                  NA     
#>  8 CDISCPILOT01 01-701-1097 2014-07-09 23:59:59 H      
#>  9 CDISCPILOT01 01-701-1111 2012-09-16 23:59:59 H      
#> 10 CDISCPILOT01 01-701-1115 2013-01-23 23:59:59 H      
#> # … with 296 more rows

Add New Variable(s) to the Input Dataset Based on Variables from Another Dataset

Usage

Arguments

Value

Details

See also

Examples