
Merge a categorization variable from an additional dataset into the input dataset. The observations to merge can be selected by a condition and/or by taking the first or last observation within each by group.

Usage

derive_var_merged_cat(
  dataset,
  dataset_add,
  by_vars,
  order = NULL,
  new_var,
  source_var,
  cat_fun,
  filter_add = NULL,
  mode = NULL,
  missing_value = NA_character_
)

Arguments

dataset

Input dataset

The variables specified by the by_vars argument are expected.

dataset_add

Additional dataset

The variables specified by the by_vars, source_var, and order arguments are expected.

by_vars

Grouping variables

The input dataset and the selected observations from the additional dataset are merged by the specified by variables. The by variables must be a unique key of the selected observations. Variables from the additional dataset can be renamed by naming the element, i.e., by_vars = exprs(<name in input dataset> = <name in additional dataset>), similar to the dplyr joins.

Permitted Values: list of variables created by exprs()
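
The renaming convention mirrors a named `by` in a dplyr join. The following is only a sketch of that analogy with plain dplyr, not the function's internals; the toy datasets `dm` and `vs` and the column `SUBJID` are hypothetical. With this function the corresponding specification would be by_vars = exprs(USUBJID = SUBJID).

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data: the subject identifier has a different name in each dataset
dm <- tibble(USUBJID = c("01", "02", "03"), AGE = c(63, 64, 71))
vs <- tibble(SUBJID = c("01", "02"), WGTCAT = c("normal", "high"))

# Named element: <name in input dataset> = <name in additional dataset>
merged <- left_join(dm, vs, by = c("USUBJID" = "SUBJID"))

# The result keeps the key name of the input dataset (USUBJID);
# subject "03" has no match, so its WGTCAT is NA
merged
```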

order

Sort order

If the argument is set to a non-null value, for each by group the first or last observation from the additional dataset is selected with respect to the specified order.

Default: NULL

Permitted Values: list of variables or desc(<variable>) function calls created by exprs(), e.g., exprs(ADT, desc(AVAL)) or NULL

new_var

New variable

The specified variable is added to the additional dataset and set to the categorized values, i.e., cat_fun(<source variable>).

source_var

Source variable

The values of this variable in the additional dataset are passed to cat_fun.

cat_fun

Categorization function

A function must be specified for this argument which expects the values of the source variable as input and returns the categorized values.
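
As an illustration, a categorization function takes a vector and returns a vector of the same length. The sketch below uses a hypothetical name, `wgt_cat_na`, and adds an explicit branch for missing input, because a catch-all `TRUE ~ ...` branch in `case_when()` would otherwise also assign a category to missing values.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical categorization function: numeric vector in, character vector out
wgt_cat_na <- function(wgt) {
  case_when(
    is.na(wgt) ~ NA_character_, # keep missing weights missing
    wgt < 50 ~ "low",
    wgt > 90 ~ "high",
    TRUE ~ "normal"
  )
}

# Yields "low", "normal", "high", NA
cats <- wgt_cat_na(c(45, 70, 95, NA))
cats
```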

filter_add

Filter for additional dataset (dataset_add)

Only observations fulfilling the specified condition are taken into account for merging. If the argument is not specified, all observations are considered.

Default: NULL

Permitted Values: a condition

mode

Selection mode

Determines if the first or last observation is selected. If the order argument is specified, mode must be non-null.

If the order argument is not specified, the mode argument is ignored.

Default: NULL

Permitted Values: "first", "last", NULL
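
How order and mode interact can be sketched with plain dplyr: sort within each by group, then keep the first or the last row. This is an approximation for illustration, not the function's actual implementation; the helper `pick()` and the toy data are hypothetical.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy vital signs data
vs <- tibble(
  USUBJID = c("01", "01", "01", "02", "02"),
  VSDTC = c("2020-01-01", "2020-01-08", "2020-01-15", "2020-02-01", "2020-02-03"),
  VSSTRESN = c(70, 72, 71, 95, 93)
)

# Sort by the order variable within each by group, then keep one row per group
pick <- function(data, mode) {
  sorted <- data %>%
    arrange(USUBJID, VSDTC) %>%
    group_by(USUBJID)
  picked <- if (mode == "first") {
    slice_head(sorted, n = 1)
  } else {
    slice_tail(sorted, n = 1)
  }
  ungroup(picked)
}

first_obs <- pick(vs, "first") # earliest record per subject
last_obs <- pick(vs, "last")   # latest record per subject
```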

missing_value

Values used for missing information

The new variable is set to the specified value for all by groups without observations in the additional dataset.

Default: NA_character_

Value

The output dataset contains all observations and variables of the input dataset and additionally the variable specified for new_var derived from the additional dataset (dataset_add).

Details

  1. The additional dataset is restricted to the observations matching the filter_add condition.

  2. The categorization variable is added to the additional dataset.

  3. If order is specified, for each by group the first or last observation (depending on mode) is selected.

  4. The categorization variable is merged to the input dataset.
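
The four steps above can be approximated with plain dplyr. This is a rough sketch under assumptions, not the function's source: the toy datasets are hypothetical, the helper column `matched` is introduced only to tell "no observation in the additional dataset" apart from a category that is itself NA, and "MISSING" stands in for missing_value.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data
dm <- tibble(USUBJID = c("01", "02", "03"))
vs <- tibble(
  USUBJID = c("01", "01", "02", "02"),
  VSTESTCD = c("WEIGHT", "WEIGHT", "WEIGHT", "HEIGHT"),
  VSSEQ = c(1, 2, 1, 2),
  VSSTRESN = c(48, 52, 95, 180)
)

wgt_cat <- function(wgt) {
  case_when(wgt < 50 ~ "low", wgt > 90 ~ "high", TRUE ~ "normal")
}

selected <- vs %>%
  filter(VSTESTCD == "WEIGHT") %>%         # 1. restrict (filter_add)
  mutate(WGTBLCAT = wgt_cat(VSSTRESN)) %>% # 2. add the categorization variable
  arrange(USUBJID, VSSEQ) %>%              # 3. order within by group ...
  group_by(USUBJID) %>%
  slice_tail(n = 1) %>%                    #    ... and keep the last (mode = "last")
  ungroup() %>%
  transmute(USUBJID, WGTBLCAT, matched = TRUE)

result <- dm %>%
  left_join(selected, by = "USUBJID") %>%  # 4. merge to the input dataset
  # by groups without selected observations get the missing value
  mutate(WGTBLCAT = if_else(is.na(matched), "MISSING", WGTBLCAT)) %>%
  select(-matched)

result
```

Subject "03" has no weight record in the toy data, so it receives "MISSING"; the other two subjects get the category of their last weight record.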

Examples

library(admiral)
library(admiral.test)
library(dplyr, warn.conflicts = FALSE)
data("admiral_dm")
data("admiral_vs")

# Categorize weight; the catch-all branch also maps missing weights to "normal"
wgt_cat <- function(wgt) {
  case_when(
    wgt < 50 ~ "low",
    wgt > 90 ~ "high",
    TRUE ~ "normal"
  )
}

derive_var_merged_cat(
  admiral_dm,
  dataset_add = admiral_vs,
  by_vars = exprs(STUDYID, USUBJID),
  order = exprs(VSDTC, VSSEQ),
  filter_add = VSTESTCD == "WEIGHT" & substr(VISIT, 1, 9) == "SCREENING",
  new_var = WGTBLCAT,
  source_var = VSSTRESN,
  cat_fun = wgt_cat,
  mode = "last"
) %>%
  select(STUDYID, USUBJID, AGE, AGEU, WGTBLCAT)
#> # A tibble: 306 x 5
#>    STUDYID      USUBJID       AGE AGEU  WGTBLCAT
#>    <chr>        <chr>       <dbl> <chr> <chr>   
#>  1 CDISCPILOT01 01-701-1015    63 YEARS normal  
#>  2 CDISCPILOT01 01-701-1023    64 YEARS normal  
#>  3 CDISCPILOT01 01-701-1028    71 YEARS high    
#>  4 CDISCPILOT01 01-701-1033    74 YEARS normal  
#>  5 CDISCPILOT01 01-701-1034    77 YEARS normal  
#>  6 CDISCPILOT01 01-701-1047    85 YEARS normal  
#>  7 CDISCPILOT01 01-701-1057    59 YEARS NA      
#>  8 CDISCPILOT01 01-701-1097    68 YEARS normal  
#>  9 CDISCPILOT01 01-701-1111    81 YEARS normal  
#> 10 CDISCPILOT01 01-701-1115    84 YEARS normal  
#> # … with 296 more rows

# Define a value for subjects without matching VS data
derive_var_merged_cat(
  admiral_dm,
  dataset_add = admiral_vs,
  by_vars = exprs(STUDYID, USUBJID),
  order = exprs(VSDTC, VSSEQ),
  filter_add = VSTESTCD == "WEIGHT" & substr(VISIT, 1, 9) == "SCREENING",
  new_var = WGTBLCAT,
  source_var = VSSTRESN,
  cat_fun = wgt_cat,
  mode = "last",
  missing_value = "MISSING"
) %>%
  select(STUDYID, USUBJID, AGE, AGEU, WGTBLCAT)
#> # A tibble: 306 x 5
#>    STUDYID      USUBJID       AGE AGEU  WGTBLCAT
#>    <chr>        <chr>       <dbl> <chr> <chr>   
#>  1 CDISCPILOT01 01-701-1015    63 YEARS normal  
#>  2 CDISCPILOT01 01-701-1023    64 YEARS normal  
#>  3 CDISCPILOT01 01-701-1028    71 YEARS high    
#>  4 CDISCPILOT01 01-701-1033    74 YEARS normal  
#>  5 CDISCPILOT01 01-701-1034    77 YEARS normal  
#>  6 CDISCPILOT01 01-701-1047    85 YEARS normal  
#>  7 CDISCPILOT01 01-701-1057    59 YEARS MISSING 
#>  8 CDISCPILOT01 01-701-1097    68 YEARS normal  
#>  9 CDISCPILOT01 01-701-1111    81 YEARS normal  
#> 10 CDISCPILOT01 01-701-1115    84 YEARS normal  
#> # … with 296 more rows