Derive/Impute a Datetime from a Date Character Vector
Source: R/derive_date_vars.R
Derive a datetime object ('--DTM') from a date character vector ('--DTC'). The date and time can be imputed (see the date_imputation/time_imputation arguments), and the date/time imputation flags ('--DTF', '--TMF') can be added.
Usage
derive_vars_dtm(
dataset,
new_vars_prefix,
dtc,
highest_imputation = "h",
date_imputation = "first",
time_imputation = "first",
flag_imputation = "auto",
min_dates = NULL,
max_dates = NULL,
preserve = FALSE,
ignore_seconds_flag = FALSE
)
Arguments
- dataset
Input dataset
The date character vector (dtc) must be present.
- new_vars_prefix
Prefix used for the output variable(s).
A character scalar is expected. For the datetime variable, "DTM" is appended to the specified prefix; for the date imputation flag, "DTF"; and for the time imputation flag, "TMF". For example, for new_vars_prefix = "AST" the variables ASTDTM, ASTDTF, and ASTTMF are created.
- dtc
The '--DTC' date to impute
A character date is expected in a format like yyyy-mm-dd or yyyy-mm-ddThh:mm:ss. Trailing components can be omitted, and "-" is a valid "missing" value for any component.
- highest_imputation
Highest imputation level
The highest_imputation argument controls which components of the DTC value are imputed if they are missing. All components up to the specified level are imputed.
If a component at a higher level than the highest imputation level is missing, NA_character_ is returned. For example, for highest_imputation = "D", "2020" results in NA_character_ because the month is missing.
If "n" is specified, no imputation is performed, i.e., if any component is missing, NA_character_ is returned.
If "Y" is specified, date_imputation should be "first" or "last", and min_dates or max_dates, respectively, should be specified. Otherwise, NA_character_ is returned if the year component is missing.
Default: "h"
Permitted Values: "Y" (year, highest level), "M" (month), "D" (day), "h" (hour), "m" (minute), "s" (second), "n" (none, lowest level)
- date_imputation
The value to impute the day/month when a date part is missing.
A character value is expected, either in the format "mm-dd" specifying month and day, e.g. "06-15" for the 15th of June, or as a keyword: "first", "mid", or "last" to impute to the first/mid/last day/month. The year cannot be specified this way; to impute the year, "first" or "last" can be used together with the min_dates or max_dates argument (see examples).
The argument is ignored if highest_imputation is less than "D".
Default: "first".
- time_imputation
The value to impute the time when a time part is missing.
A character value is expected, either in the format "hh:mm:ss" specifying hour, minute, and second, e.g. "00:00:00" for the start of the day, or as a keyword: "first" or "last" to impute to the start/end of a day.
The argument is ignored if highest_imputation = "n".
Default: "first".
- flag_imputation
Whether the date/time imputation flag(s) must also be derived.
If "auto" is specified, the date imputation flag is derived if the date_imputation argument is not null, and the time imputation flag is derived if the time_imputation argument is not null.
Default: "auto"
Permitted Values: "auto", "date", "time", "both", or "none"
- min_dates
Minimum dates
A list of dates is expected. It is ensured that the imputed date is not before any of the specified dates, e.g., that the imputed adverse event start date is not before the first treatment date. Only dates which are in the range of possible dates of the dtc value are considered. The possible dates are defined by the missing parts of the dtc date (see example below). This ensures that the non-missing parts of the dtc date are not changed. A date or date-time object is expected. For example
impute_dtc_dtm("2020-11", min_dates = list(ymd_hms("2020-12-06T12:12:12"), ymd_hms("2020-11-11T11:11:11")), highest_imputation = "M")
returns "2020-11-11T11:11:11" because the possible dates for "2020-11" range from "2020-11-01T00:00:00" to "2020-11-30T23:59:59". Therefore "2020-12-06T12:12:12" is ignored. Returning "2020-12-06T12:12:12" would have changed the month although it is not missing (in the dtc date).
For date variables (not datetime) in the list, the time is imputed to "00:00:00". Specifying date variables makes sense only if the date is imputed. If only the time is imputed, date variables do not affect the result.
- max_dates
Maximum dates
A list of dates is expected. It is ensured that the imputed date is not after any of the specified dates, e.g., that the imputed date is not after the data cut-off date. Only dates which are in the range of possible dates are considered. A date or date-time object is expected.
For date variables (not datetime) in the list, the time is imputed to "23:59:59". Specifying date variables makes sense only if the date is imputed. If only the time is imputed, date variables do not affect the result.
- preserve
Preserve day if month is missing and day is present
For example, "2019---07" would return "2019-06-07" if preserve = TRUE (and date_imputation = "mid").
Permitted Values: TRUE, FALSE
Default: FALSE
- ignore_seconds_flag
The ADaM IG states that, for a given SDTM '--DTC' variable, if only hours and minutes are ever collected and seconds are imputed in '--DTM' as 00, then it is not necessary to set '--TMF' to 'S'. Set this argument to TRUE to drop the 'S' flag from '--TMF'.
A logical value
Default: FALSE
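The min_dates/max_dates rule (only bounds inside the range of datetimes the partial value could represent are considered) can be illustrated with a small base R sketch. It is not admiral's implementation: restrict_imputed_sketch() and utc() are hypothetical helpers, the sketch works on a single POSIXct value, and it assumes the caller already knows the start and end of the possible range (for instance, from imputing the partial value with "first" and with "last"). Date bounds are written with the times the arguments above describe ("00:00:00" for min_dates, "23:59:59" for max_dates).

```r
# Hypothetical helpers, NOT part of admiral. All datetimes are POSIXct in UTC.
restrict_imputed_sketch <- function(imputed, range_start, range_end,
                                    min_dates = list(), max_dates = list()) {
  # A bound is only used if it lies within the possible range of the partial
  # '--DTC' value, so non-missing components are never changed.
  usable <- function(d) !is.na(d) && d >= range_start && d <= range_end
  for (d in min_dates) {
    # The imputed value must not be before any usable minimum date
    if (usable(d) && d > imputed) imputed <- d
  }
  for (d in max_dates) {
    # The imputed value must not be after any usable maximum date
    if (usable(d) && d < imputed) imputed <- d
  }
  imputed
}

utc <- function(x) as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# "2020-11" imputed with "first"; the possible range is all of November 2020
res <- restrict_imputed_sketch(
  imputed = utc("2020-11-01 00:00:00"),
  range_start = utc("2020-11-01 00:00:00"),
  range_end = utc("2020-11-30 23:59:59"),
  min_dates = list(utc("2020-12-06 12:12:12"), utc("2020-11-11 11:11:11"))
)
format(res, "%Y-%m-%dT%H:%M:%S")
# "2020-11-11T11:11:11": the December bound lies outside the range and is ignored

# "2020-12" imputed with "last"; death date and cut-off date as upper bounds
res_max <- restrict_imputed_sketch(
  imputed = utc("2020-12-31 23:59:59"),
  range_start = utc("2020-12-01 00:00:00"),
  range_end = utc("2020-12-31 23:59:59"),
  max_dates = list(utc("2020-12-06 23:59:59"), utc("2020-12-24 23:59:59"))
)
format(res_max, "%Y-%m-%dT%H:%M:%S")
# "2020-12-06T23:59:59"
```

Iterating over a list (rather than an atomic vector) keeps the POSIXct class of each bound inside the loop, which is why the bounds are passed as lists here.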
Value
The input dataset with the datetime '--DTM' (and the date/time imputation flags '--DTF', '--TMF') added.
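The flag values in the examples follow a simple pattern: '--DTF' records the highest missing date component ("Y", "M", or "D") and '--TMF' the highest missing time component ("H", "M", or "S"). The base R sketch below reproduces that pattern; imputation_flags_sketch() is hypothetical, not admiral's code. It assumes a truncated value (no "-" placeholders such as "2019---07") that could be imputed; in the examples, an empty value yields NA flags instead.

```r
# Hypothetical helper, NOT part of admiral. Handles only truncated ISO 8601
# values such as "2019-02" or "2019-07-18T15:25".
imputation_flags_sketch <- function(dtc, ignore_seconds_flag = FALSE) {
  # Character positions of year, month, day, hour, minute, second
  parts <- substring(dtc, c(1, 6, 9, 12, 15, 18), c(4, 7, 10, 13, 16, 19))
  missing <- nchar(parts) == 0
  # Flag = code of the highest-level missing component, NA if none is missing
  first_missing <- function(is_missing, codes) {
    if (any(is_missing)) codes[which(is_missing)[1]] else NA_character_
  }
  dtf <- first_missing(missing[1:3], c("Y", "M", "D"))
  tmf <- first_missing(missing[4:6], c("H", "M", "S"))
  # With ignore_seconds_flag = TRUE, imputing only the seconds is not flagged
  if (ignore_seconds_flag && identical(tmf, "S")) tmf <- NA_character_
  c(DTF = dtf, TMF = tmf)
}

imputation_flags_sketch("2019-02")           # DTF "D", TMF "H"
imputation_flags_sketch("2019")              # DTF "M", TMF "H"
imputation_flags_sketch("2019-07-18T15:25")  # DTF NA,  TMF "S"
imputation_flags_sketch("2019-07-18T15:25", ignore_seconds_flag = TRUE)  # both NA
```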
Details
In admiral, users cannot pick an arbitrary single part of the date/time to impute; imputation is only possible up to a highest level. For example, you cannot choose to impute months but not days.
The presence of a '--DTF' variable is checked, and the variable is not derived if it already exists in the input dataset. However, if '--TMF' already exists in the input dataset, a warning is issued and '--TMF' will be overwritten.
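The "impute up to a highest level" rule, combined with the "first"/"last" keywords of date_imputation and time_imputation, can be sketched in base R. impute_dtc_sketch() is hypothetical and is not admiral's implementation: it returns a character value rather than a datetime, handles only truncated values (no "-" placeholders), supports only the "first" and "last" keywords, never imputes the year, and ignores min_dates, max_dates, and preserve.

```r
# Hypothetical helper, NOT part of admiral.
impute_dtc_sketch <- function(dtc, highest_imputation = "h",
                              date_imputation = c("first", "last"),
                              time_imputation = c("first", "last")) {
  date_imputation <- match.arg(date_imputation)
  time_imputation <- match.arg(time_imputation)
  component_levels <- c("Y", "M", "D", "h", "m", "s")
  # Character positions of year, month, day, hour, minute, second
  parts <- substring(dtc, c(1, 6, 9, 12, 15, 18), c(4, 7, 10, 13, 16, 19))
  missing <- nchar(parts) == 0
  # Index of the highest imputation level; "n" allows no imputation at all
  level_index <- if (highest_imputation == "n") 7 else
    match(highest_imputation, component_levels)
  # A missing component above the highest imputation level gives NA;
  # the year is never imputed in this sketch
  if (missing[1] || (any(missing) && which(missing)[1] < level_index)) {
    return(NA_character_)
  }
  date_fill <- if (date_imputation == "first") c("01", "01") else c("12", NA)
  time_fill <- if (time_imputation == "first") c("00", "00", "00") else c("23", "59", "59")
  fill <- c(NA, date_fill, time_fill)
  parts[missing] <- fill[missing]
  if (missing[3] && date_imputation == "last") {
    # Last day of the month = first day of the next month minus one day
    month_start <- as.Date(paste(parts[1], parts[2], "01", sep = "-"))
    parts[3] <- format(seq(month_start, by = "month", length.out = 2)[2] - 1, "%d")
  }
  sprintf("%s-%s-%sT%s:%s:%s", parts[1], parts[2], parts[3], parts[4], parts[5], parts[6])
}

impute_dtc_sketch("2019-02", highest_imputation = "M")
# "2019-02-01T00:00:00"
impute_dtc_sketch("2019-02", highest_imputation = "M",
                  date_imputation = "last", time_imputation = "last")
# "2019-02-28T23:59:59"
impute_dtc_sketch("2019", highest_imputation = "D")
# NA: the month is missing, but only the day and lower levels may be imputed
```

Because the levels are strictly ordered, a single index comparison is enough to decide whether a value can be imputed, which mirrors why admiral offers one highest_imputation level instead of per-component switches.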
See also
Date/time derivation functions that return variables appended to the dataset:
derive_var_trtdurd(), derive_vars_dtm_to_dt(), derive_vars_dtm_to_tm(), derive_vars_dt(), derive_vars_duration(), derive_vars_dy()
Examples
library(admiral)
library(tibble)
library(lubridate)
mhdt <- tribble(
~MHSTDTC,
"2019-07-18T15:25:40",
"2019-07-18T15:25",
"2019-07-18",
"2019-02",
"2019",
"2019---07",
""
)
derive_vars_dtm(
mhdt,
new_vars_prefix = "AST",
dtc = MHSTDTC,
highest_imputation = "M"
)
#> # A tibble: 7 x 4
#>   MHSTDTC               ASTDTM              ASTDTF ASTTMF
#>   <chr>                 <dttm>              <chr>  <chr>
#> 1 "2019-07-18T15:25:40" 2019-07-18 15:25:40 NA     NA
#> 2 "2019-07-18T15:25"    2019-07-18 15:25:00 NA     S
#> 3 "2019-07-18"          2019-07-18 00:00:00 NA     H
#> 4 "2019-02"             2019-02-01 00:00:00 D      H
#> 5 "2019"                2019-01-01 00:00:00 M      H
#> 6 "2019---07"           2019-01-01 00:00:00 M      H
#> 7 ""                    NA                  NA     NA
# Impute AE end date to the last date and ensure that the imputed date is not
# after the death or data cut off date
adae <- tribble(
~AEENDTC, ~DTHDT, ~DCUTDT,
"2020-12", ymd("2020-12-06"), ymd("2020-12-24"),
"2020-11", ymd("2020-12-06"), ymd("2020-12-24")
)
derive_vars_dtm(
adae,
dtc = AEENDTC,
new_vars_prefix = "AEN",
highest_imputation = "M",
date_imputation = "last",
time_imputation = "last",
max_dates = exprs(DTHDT, DCUTDT)
)
#> # A tibble: 2 x 6
#> AEENDTC DTHDT DCUTDT AENDTM AENDTF AENTMF
#> <chr> <date> <date> <dttm> <chr> <chr>
#> 1 2020-12 2020-12-06 2020-12-24 2020-12-06 23:59:59 D H
#> 2 2020-11 2020-12-06 2020-12-24 2020-11-30 23:59:59 D H
# Seconds have been removed from the input dataset. The function now uses
# ignore_seconds_flag to remove the 'S' from the --TMF variable.
mhdt <- tribble(
~MHSTDTC,
"2019-07-18T15:25",
"2019-07-18T15:25",
"2019-07-18",
"2019-02",
"2019",
"2019---07",
""
)
derive_vars_dtm(
mhdt,
new_vars_prefix = "AST",
dtc = MHSTDTC,
highest_imputation = "M",
ignore_seconds_flag = TRUE
)
#> # A tibble: 7 x 4
#>   MHSTDTC            ASTDTM              ASTDTF ASTTMF
#>   <chr>              <dttm>              <chr>  <chr>
#> 1 "2019-07-18T15:25" 2019-07-18 15:25:00 NA     NA
#> 2 "2019-07-18T15:25" 2019-07-18 15:25:00 NA     NA
#> 3 "2019-07-18"       2019-07-18 00:00:00 NA     H
#> 4 "2019-02"          2019-02-01 00:00:00 D      H
#> 5 "2019"             2019-01-01 00:00:00 M      H
#> 6 "2019---07"        2019-01-01 00:00:00 M      H
#> 7 ""                 NA                  NA     NA
# A user imputing dates as middle month/day, i.e. date_imputation = "mid", can
# use the preserve argument to "preserve" partial dates. For example, "2019---07"
# will be displayed as "2019-06-07" rather than "2019-06-30" with preserve = TRUE.
derive_vars_dtm(
mhdt,
new_vars_prefix = "AST",
dtc = MHSTDTC,
highest_imputation = "M",
date_imputation = "mid",
preserve = TRUE
)
#> # A tibble: 7 x 4
#>   MHSTDTC            ASTDTM              ASTDTF ASTTMF
#>   <chr>              <dttm>              <chr>  <chr>
#> 1 "2019-07-18T15:25" 2019-07-18 15:25:00 NA     S
#> 2 "2019-07-18T15:25" 2019-07-18 15:25:00 NA     S
#> 3 "2019-07-18"       2019-07-18 00:00:00 NA     H
#> 4 "2019-02"          2019-02-15 00:00:00 D      H
#> 5 "2019"             2019-06-30 00:00:00 M      H
#> 6 "2019---07"        2019-06-07 00:00:00 M      H
#> 7 ""                 NA                  NA     NA
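The month/day choices made with date_imputation = "mid" in the last example, and the effect of preserve, can be summarised in a small base R sketch. mid_month_day_sketch() is hypothetical, not admiral's code: the values it uses ("06" for a missing month, "15" for a missing day, "30" when both are missing) are read off the example output above, the rule that a day is dropped when the month is missing and preserve = FALSE follows the "2019---07" row of the first example, and it works on already-split month and day components rather than on a '--DTC' string.

```r
# Hypothetical helper, NOT part of admiral. Month and day are two-digit
# strings, NA meaning missing; date_imputation = "mid" is assumed.
mid_month_day_sketch <- function(month, day, preserve = FALSE) {
  # Without preserve, a day is not used when the month is missing
  if (is.na(month) && !preserve) day <- NA_character_
  imputed_month <- if (is.na(month)) "06" else month
  imputed_day <- if (!is.na(day)) day else if (is.na(month)) "30" else "15"
  paste(imputed_month, imputed_day, sep = "-")
}

mid_month_day_sketch("02", NA)                    # "02-15", as for "2019-02"
mid_month_day_sketch(NA, NA)                      # "06-30", as for "2019"
mid_month_day_sketch(NA, "07", preserve = TRUE)   # "06-07", as for "2019---07"
mid_month_day_sketch(NA, "07", preserve = FALSE)  # "06-30": the day is dropped
```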