
Add Variables from an Additional Dataset Based on Conditions from Both Datasets
Source:R/derive_joined.R
derive_vars_joined.RdThe function adds variables from an additional dataset to the input dataset. The selection of the observations from the additional dataset can depend on variables from both datasets. For example, add the lowest value (nadir) before the current observation.
Usage
derive_vars_joined(
dataset,
dataset_add,
by_vars = NULL,
order = NULL,
new_vars = NULL,
join_vars = NULL,
filter_add = NULL,
filter_join = NULL,
mode = NULL,
check_type = "warning"
)Arguments
- dataset
Input dataset
The variables specified by
by_varsare expected.- dataset_add
Additional dataset
The variables specified by the
by_vars, thenew_vars, thejoin_vars, and theorderargument are expected.- by_vars
Grouping variables
The two datasets are joined by the specified variables. Variables from the additional dataset can be renamed by naming the element, i.e.,
by_vars = exprs(<name in input dataset> = <name in additional dataset>).Permitted Values: list of variables created by
exprs()- order
Sort order
If the argument is set to a non-null value, for each observation of the input dataset the first or last observation from the joined dataset is selected with respect to the specified order. The specified variables are expected in the additional dataset (
dataset_add). If a variable is available in bothdatasetanddataset_add, the one fromdataset_addis used for the sorting.Permitted Values: list of variables or
desc(<variable>)function calls created byexprs(), e.g.,exprs(ADT, desc(AVAL))orNULL- new_vars
Variables to add
The specified variables from the additional dataset are added to the output dataset. Variables can be renamed by naming the element, i.e.,
new_vars = exprs(<new name> = <old name>).For example
new_vars = exprs(var1, var2)adds variablesvar1andvar2fromdataset_addto the input dataset.And
new_vars = exprs(var1, new_var2 = old_var2)takesvar1andold_var2fromdataset_addand adds them to the input dataset renamingold_var2tonew_var2.If the argument is not specified or set to
NULL, all variables from the additional dataset (dataset_add) are added.Permitted Values: list of variables created by
exprs()- join_vars
Variables to use from additional dataset
Any extra variables required from the additional dataset for
filter_joinshould be specified for this argument. Variables specified fornew_varsdo not need to be repeated forjoin_vars. If a specified variable exists in both the input dataset and the additional dataset, the suffix ".join" is added to the variable from the additional dataset.The variables are not included in the output dataset.
Permitted Values: list of variables created by
exprs()- filter_add
Filter for additional dataset (
dataset_add)Only observations from
dataset_addfulfilling the specified condition are joined to the input dataset. If the argument is not specified, all observations are joined.Permitted Values: a condition
- filter_join
Filter for the joined dataset
The specified condition is applied to the joined dataset. Therefore variables from both datasets
datasetanddataset_addcan be used.Permitted Values: a condition
- mode
Selection mode
Determines if the first or last observation is selected. If the
orderargument is specified,modemust be non-null.If the
orderargument is not specified, themodeargument is ignored.Permitted Values:
"first","last",NULL- check_type
Check uniqueness?
If
"warning"or"error"is specified, the specified message is issued if the observations of the (restricted) joined dataset are not unique with respect to the by variables and the order.This argument is ignored if
orderis not specified. In this case an error is issued independent ofcheck_typeif the restricted joined dataset contains more than one observation for any of the observations of the input dataset.Permitted Values:
"none","warning","error"
Value
The output dataset contains all observations and variables of the
input dataset and additionally the variables specified for new_vars from
the additional dataset (dataset_add).
Details
The records from the additional dataset (
dataset_add) are restricted to those matching thefilter_addcondition.The input dataset and the (restricted) additional dataset are left joined by the grouping variables (
by_vars). If no grouping variables are specified, a full join is performed.The joined dataset is restricted by the
filter_joincondition.If
orderis specified, for each observation of the input dataset the first or last observation (depending onmode) is selected.The variables specified for
new_varsare renamed (if requested) and merged to the input dataset. I.e., the output dataset contains all observations from the input dataset. For observations without a matching observation in the joined dataset the new variables are set toNA. Observations in the additional dataset which have no matching observation in the input dataset are ignored.
See also
General Derivation Functions for all ADaMs that returns variable appended to dataset:
derive_var_extreme_flag(),
derive_var_joined_exist_flag(),
derive_var_last_dose_amt(),
derive_var_last_dose_date(),
derive_var_last_dose_grp(),
derive_var_merged_cat(),
derive_var_merged_character(),
derive_var_merged_exist_flag(),
derive_var_merged_summary(),
derive_var_obs_number(),
derive_var_relative_flag(),
derive_vars_last_dose(),
derive_vars_merged_lookup(),
derive_vars_merged(),
derive_vars_transposed(),
get_summary_records()
Examples
library(tibble)
library(lubridate)
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
# Add AVISIT (based on time windows), AWLO, and AWHI
adbds <- tribble(
~USUBJID, ~ADY,
"1", -33,
"1", -2,
"1", 3,
"1", 24,
"2", NA,
)
windows <- tribble(
~AVISIT, ~AWLO, ~AWHI,
"BASELINE", -30, 1,
"WEEK 1", 2, 7,
"WEEK 2", 8, 15,
"WEEK 3", 16, 22,
"WEEK 4", 23, 30
)
derive_vars_joined(
adbds,
dataset_add = windows,
filter_join = AWLO <= ADY & ADY <= AWHI
)
#> # A tibble: 5 x 5
#> USUBJID ADY AVISIT AWLO AWHI
#> <chr> <dbl> <chr> <dbl> <dbl>
#> 1 1 -33 NA NA NA
#> 2 1 -2 BASELINE -30 1
#> 3 1 3 WEEK 1 2 7
#> 4 1 24 WEEK 4 23 30
#> 5 2 NA NA NA NA
# derive the nadir after baseline and before the current observation
adbds <- tribble(
~USUBJID, ~ADY, ~AVAL,
"1", -7, 10,
"1", 1, 12,
"1", 8, 11,
"1", 15, 9,
"1", 20, 14,
"1", 24, 12,
"2", 13, 8
)
derive_vars_joined(
adbds,
dataset_add = adbds,
by_vars = exprs(USUBJID),
order = exprs(AVAL),
new_vars = exprs(NADIR = AVAL),
join_vars = exprs(ADY),
filter_add = ADY > 0,
filter_join = ADY.join < ADY,
mode = "first",
check_type = "none"
)
#> # A tibble: 7 x 4
#> USUBJID ADY AVAL NADIR
#> <chr> <dbl> <dbl> <dbl>
#> 1 1 -7 10 NA
#> 2 1 1 12 NA
#> 3 1 8 11 12
#> 4 1 15 9 11
#> 5 1 20 14 9
#> 6 1 24 12 9
#> 7 2 13 8 NA
# add highest hemoglobin value within two weeks before AE,
# take earliest if more than one
adae <- tribble(
~USUBJID, ~ASTDY,
"1", 3,
"1", 22,
"2", 2
)
adlb <- tribble(
~USUBJID, ~PARAMCD, ~ADY, ~AVAL,
"1", "HGB", 1, 8.5,
"1", "HGB", 3, 7.9,
"1", "HGB", 5, 8.9,
"1", "HGB", 8, 8.0,
"1", "HGB", 9, 8.0,
"1", "HGB", 16, 7.4,
"1", "HGB", 24, 8.1,
"1", "ALB", 1, 42,
)
derive_vars_joined(
adae,
dataset_add = adlb,
by_vars = exprs(USUBJID),
order = exprs(AVAL, desc(ADY)),
new_vars = exprs(HGB_MAX = AVAL, HGB_DY = ADY),
filter_add = PARAMCD == "HGB",
filter_join = ASTDY - 14 <= ADY & ADY <= ASTDY,
mode = "last"
)
#> # A tibble: 3 x 4
#> USUBJID ASTDY HGB_MAX HGB_DY
#> <chr> <dbl> <dbl> <dbl>
#> 1 1 3 8.5 1
#> 2 1 22 8 8
#> 3 2 2 NA NA
# Add APERIOD, APERIODC based on ADSL
adsl <- tribble(
~USUBJID, ~AP01SDT, ~AP01EDT, ~AP02SDT, ~AP02EDT,
"1", "2021-01-04", "2021-02-06", "2021-02-07", "2021-03-07",
"2", "2021-02-02", "2021-03-02", "2021-03-03", "2021-04-01"
) %>%
mutate(across(ends_with("DT"), ymd)) %>%
mutate(STUDYID = "xyz")
period_ref <- create_period_dataset(
adsl,
new_vars = exprs(APERSDT = APxxSDT, APEREDT = APxxEDT)
)
period_ref
#> # A tibble: 4 x 5
#> STUDYID USUBJID APERIOD APERSDT APEREDT
#> <chr> <chr> <int> <date> <date>
#> 1 xyz 1 1 2021-01-04 2021-02-06
#> 2 xyz 1 2 2021-02-07 2021-03-07
#> 3 xyz 2 1 2021-02-02 2021-03-02
#> 4 xyz 2 2 2021-03-03 2021-04-01
adae <- tribble(
~USUBJID, ~ASTDT,
"1", "2021-01-01",
"1", "2021-01-05",
"1", "2021-02-05",
"1", "2021-03-05",
"1", "2021-04-05",
"2", "2021-02-15",
) %>%
mutate(
ASTDT = ymd(ASTDT),
STUDYID = "xyz"
)
derive_vars_joined(
adae,
dataset_add = period_ref,
by_vars = exprs(STUDYID, USUBJID),
join_vars = exprs(APERSDT, APEREDT),
filter_join = APERSDT <= ASTDT & ASTDT <= APEREDT
)
#> # A tibble: 6 x 6
#> USUBJID ASTDT STUDYID APERIOD APERSDT APEREDT
#> <chr> <date> <chr> <int> <date> <date>
#> 1 1 2021-01-01 xyz NA NA NA
#> 2 1 2021-01-05 xyz 1 2021-01-04 2021-02-06
#> 3 1 2021-02-05 xyz 1 2021-01-04 2021-02-06
#> 4 1 2021-03-05 xyz 2 2021-02-07 2021-03-07
#> 5 1 2021-04-05 xyz NA NA NA
#> 6 2 2021-02-15 xyz 1 2021-02-02 2021-03-02