Skip to contents

Merge a summary variable from a dataset to the input dataset.

Usage

derive_var_merged_summary(
  dataset,
  dataset_add,
  by_vars,
  new_var,
  filter_add = NULL,
  analysis_var,
  summary_fun
)

Arguments

dataset

Input dataset

The variables specified by the by_vars argument are expected.

dataset_add

Additional dataset

The variables specified by the by_vars and the analysis_var arguments are expected.

by_vars

Grouping variables

The values of analysis_var are summarized by the specified variables. The summarized values are merged to the input dataset (dataset) by the specified by variables.

Permitted Values: list of variables created by exprs()

new_var

Variable to add

The specified variable is added to the input dataset (dataset) and set to the summarized values.

filter_add

Filter for additional dataset (dataset_add)

Only observations fulfilling the specified condition are taken into account for summarizing. If the argument is not specified, all observations are considered.

Permitted Values: a condition

analysis_var

Analysis variable

The values of the specified variable are summarized by the function specified for summary_fun.

summary_fun

Summary function

The specified function that takes as input analysis_var and performs the calculation. This can include built-in functions as well as user defined functions, for example mean or function(x) mean(x, na.rm = TRUE).

Value

The output dataset contains all observations and variables of the input dataset and additionally the variable specified for new_var.

Details

  1. The records from the additional dataset (dataset_add) are restricted to those matching the filter_add condition.

  2. The values of the analysis variable (analysis_var) are summarized by the summary function (summary_fun) for each by group (by_vars) in the additional dataset (dataset_add).

  3. The summarized values are merged to the input dataset as a new variable (new_var). For observations without a matching observation in the additional dataset the new variable is set to NA. Observations in the additional dataset which have no matching observation in the input dataset are ignored.

Examples

library(tibble)

# Add a variable for the mean of AVAL within each visit
adbds <- tribble(
  ~USUBJID, ~AVISIT,  ~ASEQ, ~AVAL,
  "1",      "WEEK 1",     1,    10,
  "1",      "WEEK 1",     2,    NA,
  "1",      "WEEK 2",     3,    NA,
  "1",      "WEEK 3",     4,    42,
  "1",      "WEEK 4",     5,    12,
  "1",      "WEEK 4",     6,    12,
  "1",      "WEEK 4",     7,    15,
  "2",      "WEEK 1",     1,    21,
  "2",      "WEEK 4",     2,    22
)

derive_var_merged_summary(
  adbds,
  dataset_add = adbds,
  by_vars = exprs(USUBJID, AVISIT),
  new_var = MEANVIS,
  analysis_var = AVAL,
  summary_fun = function(x) mean(x, na.rm = TRUE)
)
#> # A tibble: 9 x 5
#>   USUBJID AVISIT  ASEQ  AVAL MEANVIS
#>   <chr>   <chr>  <dbl> <dbl>   <dbl>
#> 1 1       WEEK 1     1    10      10
#> 2 1       WEEK 1     2    NA      10
#> 3 1       WEEK 2     3    NA     NaN
#> 4 1       WEEK 3     4    42      42
#> 5 1       WEEK 4     5    12      13
#> 6 1       WEEK 4     6    12      13
#> 7 1       WEEK 4     7    15      13
#> 8 2       WEEK 1     1    21      21
#> 9 2       WEEK 4     2    22      22

# Add a variable listing the lesion ids at baseline
adsl <- tribble(
  ~USUBJID,
  "1",
  "2",
  "3"
)

adtr <- tribble(
  ~USUBJID, ~AVISIT,    ~LESIONID,
  "1",      "BASELINE", "INV-T1",
  "1",      "BASELINE", "INV-T2",
  "1",      "BASELINE", "INV-T3",
  "1",      "BASELINE", "INV-T4",
  "1",      "WEEK 1",   "INV-T1",
  "1",      "WEEK 1",   "INV-T2",
  "1",      "WEEK 1",   "INV-T4",
  "2",      "BASELINE", "INV-T1",
  "2",      "BASELINE", "INV-T2",
  "2",      "BASELINE", "INV-T3",
  "2",      "WEEK 1",   "INV-T1",
  "2",      "WEEK 1",   "INV-N1"
)

derive_var_merged_summary(
  adsl,
  dataset_add = adtr,
  by_vars = exprs(USUBJID),
  filter_add = AVISIT == "BASELINE",
  new_var = LESIONSBL,
  analysis_var = LESIONID,
  summary_fun = function(x) paste(x, collapse = ", ")
)
#> # A tibble: 3 x 2
#>   USUBJID LESIONSBL                     
#>   <chr>   <chr>                         
#> 1 1       INV-T1, INV-T2, INV-T3, INV-T4
#> 2 2       INV-T1, INV-T2, INV-T3        
#> 3 3       NA