Derive Query Variables

Usage

derive_vars_query(dataset, dataset_queries)

Arguments

dataset

Input dataset.

dataset_queries

A dataset containing required columns VAR_PREFIX, QUERY_NAME, TERM_LEVEL, TERM_NAME, TERM_ID, and optional columns QUERY_ID, QUERY_SCOPE, QUERY_SCOPE_NUM.

The content of the dataset will be verified by assert_valid_queries().

create_query_data() can be used to create the dataset.
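As a rough illustration of the column requirement only, the sketch below lists which required columns a candidate query dataset lacks. The helper missing_required() and the sample data are hypothetical and are not part of the package; assert_valid_queries() performs the actual validation and may check more than this.

```r
# Required columns as listed in the argument description above.
required_cols <- c("VAR_PREFIX", "QUERY_NAME", "TERM_LEVEL", "TERM_NAME", "TERM_ID")

# Hypothetical helper: returns the names of required columns absent from `q`.
missing_required <- function(q) {
  setdiff(required_cols, names(q))
}

# Made-up dataset that lacks three of the required columns.
incomplete <- data.frame(
  VAR_PREFIX = "CQ01",
  QUERY_NAME = "Demo query",
  stringsAsFactors = FALSE
)

missing_required(incomplete)
```

An empty result means all required columns are present; it says nothing about whether their contents are valid.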

Value

The input dataset with query variables derived.

Details

This function can be used to derive CDISC variables such as SMQzzNAM, SMQzzCD, SMQzzSC, SMQzzSCN, and CQzzNAM in ADAE and ADMH, and variables such as SDGzzNAM, SDGzzCD, and SDGzzSC in ADCM. For an example of its use, see the OCCDS vignette.

A query dataset is expected as an input to this function. See the Queries Dataset Documentation vignette for descriptions, or call data("queries") for an example of a query dataset.

One "NAM" variable is created for each unique value of VAR_PREFIX. In addition, for each unique VAR_PREFIX, the corresponding "CD" variable is created if QUERY_ID is neither "" nor NA, the corresponding "SC" variable is created if QUERY_SCOPE is neither "" nor NA, and the corresponding "SCN" variable is created if QUERY_SCOPE_NUM is neither "" nor NA.
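The naming rule can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the data are made up, the helpers has_value() and expected_vars() are hypothetical, and a column is treated as populated for a prefix if at least one of its rows holds a value.

```r
# Made-up miniature query dataset; only the columns relevant to naming are shown.
queries_demo <- data.frame(
  VAR_PREFIX      = c("SMQ02", "SMQ02", "CQ01"),
  QUERY_NAME      = c("Query A", "Query A", "Query B"),
  QUERY_ID        = c(101L, 101L, NA),
  QUERY_SCOPE     = c("BROAD", "BROAD", NA),
  QUERY_SCOPE_NUM = c(1, 1, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if at least one value is neither NA nor "".
has_value <- function(x) any(!is.na(x) & as.character(x) != "")

# Hypothetical helper: names of the variables the rule would create.
expected_vars <- function(q) {
  out <- character(0)
  for (prefix in unique(q$VAR_PREFIX)) {
    rows <- q[q$VAR_PREFIX == prefix, ]
    out <- c(out, paste0(prefix, "NAM"))  # always created
    if (has_value(rows$QUERY_ID)) out <- c(out, paste0(prefix, "CD"))
    if (has_value(rows$QUERY_SCOPE)) out <- c(out, paste0(prefix, "SC"))
    if (has_value(rows$QUERY_SCOPE_NUM)) out <- c(out, paste0(prefix, "SCN"))
  }
  out
}

expected_vars(queries_demo)
```

Here SMQ02 yields all four variables, while CQ01, which has no ID or scope, yields only CQ01NAM.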

For each record in dataset, the "NAM" variable is set to QUERY_NAME if the value of TERM_NAME or TERM_ID in dataset_queries matches the value of the variable named by the respective TERM_LEVEL in dataset. In dataset_queries, TERM_NAME may be NA only when TERM_ID is not NA, and vice versa. The "CD", "SC", and "SCN" variables are derived in the same way from QUERY_ID, QUERY_SCOPE, and QUERY_SCOPE_NUM respectively, whenever these are not missing.
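The matching step can be sketched in base R as follows. This is a simplified illustration, not the package's implementation: the data are made up, derive_nam_sketch() is a hypothetical helper, only the "NAM" variable is derived, and terms are compared exactly (any case handling in the real function is not covered here).

```r
# Made-up analysis dataset.
adae_demo <- data.frame(
  USUBJID = c("01", "02", "03"),
  AEDECOD = c("Headache", "Nausea", "Rash"),
  AELLTCD = c(NA, 7L, NA),
  stringsAsFactors = FALSE
)

# Made-up query dataset: one term matched by name, one matched by ID.
queries_demo <- data.frame(
  VAR_PREFIX = c("CQ01", "CQ01"),
  QUERY_NAME = c("Demo query", "Demo query"),
  TERM_LEVEL = c("AEDECOD", "AELLTCD"),
  TERM_NAME  = c("Headache", NA),
  TERM_ID    = c(NA, 7L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: derives only the "NAM" variables.
derive_nam_sketch <- function(dataset, q) {
  for (i in seq_len(nrow(q))) {
    nam <- paste0(q$VAR_PREFIX[i], "NAM")
    if (is.null(dataset[[nam]])) dataset[[nam]] <- NA_character_
    # Use TERM_NAME when present, otherwise fall back to TERM_ID.
    term <- if (!is.na(q$TERM_NAME[i])) q$TERM_NAME[i] else q$TERM_ID[i]
    # Compare against the dataset variable named by TERM_LEVEL.
    level <- dataset[[q$TERM_LEVEL[i]]]
    hit <- !is.na(level) & level == term
    dataset[[nam]][hit] <- q$QUERY_NAME[i]
  }
  dataset
}

result <- derive_nam_sketch(adae_demo, queries_demo)
result$CQ01NAM
```

Subject "01" matches on AEDECOD and subject "02" on AELLTCD, so both receive the query name; subject "03" matches neither term and keeps NA.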

Examples

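library(tibble)
# Assumes the admiral package, which provides derive_vars_query() and the
# example `queries` dataset, is installed.
library(admiral)
data("queries")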
adae <- tribble(
  ~USUBJID, ~ASTDTM, ~AETERM, ~AESEQ, ~AEDECOD, ~AELLT, ~AELLTCD,
  "01", "2020-06-02 23:59:59", "ALANINE AMINOTRANSFERASE ABNORMAL",
  3, "Alanine aminotransferase abnormal", NA_character_, NA_integer_,
  "02", "2020-06-05 23:59:59", "BASEDOW'S DISEASE",
  5, "Basedow's disease", NA_character_, 1L,
  "03", "2020-06-07 23:59:59", "SOME TERM",
  2, "Some query", "Some term", NA_integer_,
  "05", "2020-06-09 23:59:59", "ALVEOLAR PROTEINOSIS",
  7, "Alveolar proteinosis", NA_character_, NA_integer_
)
derive_vars_query(adae, queries)
#> # A tibble: 4 x 24
#>   USUBJID ASTDTM  AETERM   AESEQ AEDECOD  AELLT AELLTCD SMQ02NAM SMQ02CD SMQ02SC
#>   <chr>   <chr>   <chr>    <dbl> <chr>    <chr>   <int> <chr>      <int> <chr>  
#> 1 01      2020-0… ALANINE…     3 Alanine… NA         NA NA            NA NA     
#> 2 02      2020-0… BASEDOW…     5 Basedow… NA          1 NA            NA NA     
#> 3 03      2020-0… SOME TE…     2 Some qu… Some…      NA NA            NA NA     
#> 4 05      2020-0… ALVEOLA…     7 Alveola… NA         NA NA            NA NA     
#> # … with 14 more variables: SMQ02SCN <dbl>, SMQ03NAM <chr>, SMQ03CD <int>,
#> #   SMQ03SC <chr>, SMQ03SCN <dbl>, SMQ05NAM <chr>, SMQ05CD <int>,
#> #   SMQ05SC <chr>, SMQ05SCN <dbl>, CQ01NAM <chr>, CQ04NAM <chr>, CQ04CD <int>,
#> #   CQ06NAM <chr>, CQ06CD <int>