Each auto-generated ARD program (one per output) follows a logical structure linked to the ARS model. Each script contains code for all the analyses related to the output and follows the same code pattern for each analysis (except the first analysis, which by convention handles the “big N” calculation). An analysis-level ARD is generated for each analysis, and at the end of the program all analysis-level ARDs are appended to create one output-level ARD. Keep in mind that each of these code sections is auto-populated with ARS metadata. This can be visualized as follows:
# Section 1: Program header
# Section 2: Load libraries
# Section 3: Load ADaM datasets
# Section 4a (first Analysis): Code to calculate results as an ARD
# Section 4b (subsequent Analyses): Code to calculate results as an ARD
# Section 5: Append Analysis-level ARDs
Section 3 (“Load ADaM datasets”) reads each ADaM dataset referenced
by the output. siera supports two on-disk formats and chooses the reader
from each file’s extension: CSV (.csv) files are read with
readr::read_csv(), and SAS transport (.xpt)
files with haven::read_xpt(). This mirrors how the ARS
input format is inferred from .json vs .xlsx,
so no extra argument is needed - just point readARS() at a
folder of .csv or .xpt ADaMs. Reading
.xpt datasets requires the haven package to be
installed. The file lookup is case-insensitive, so the lower-case file
names typical of regulatory submissions (e.g. adsl.xpt) are
matched to the upper-case dataset names in the ARS metadata
(ADSL). When both a .csv and a
.xpt exist for the same dataset, the .csv is
used.
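The lookup rules above (extension decides the reader, names match case-insensitively, .csv wins over .xpt) can be sketched in base R. This is an illustration only: `resolve_adam_file()` is a hypothetical helper, not a siera function, and siera's internal implementation may differ.

```r
# Hypothetical helper (not part of siera): resolve an ADaM dataset name to a
# file on disk and name the reader that would be used for it.
resolve_adam_file <- function(folder, dataset) {
  files <- list.files(folder)
  stem <- tolower(tools::file_path_sans_ext(files))
  ext <- tolower(tools::file_ext(files))

  # case-insensitive match on the dataset name, restricted to supported formats
  hits <- files[stem == tolower(dataset) & ext %in% c("csv", "xpt")]
  if (length(hits) == 0) {
    stop("No .csv or .xpt file found for dataset ", dataset)
  }

  # prefer .csv when both formats are present
  hit_ext <- tolower(tools::file_ext(hits))
  chosen <- hits[order(hit_ext != "csv")][1]

  reader <- if (tolower(tools::file_ext(chosen)) == "csv") {
    "readr::read_csv"
  } else {
    "haven::read_xpt"
  }
  list(path = file.path(folder, chosen), reader = reader)
}

# usage on a throwaway folder containing empty placeholder files
adam_dir <- file.path(tempdir(), "adam_demo")
dir.create(adam_dir, showWarnings = FALSE)
file.create(file.path(adam_dir, c("adsl.xpt", "adae.csv", "adae.xpt")))

adsl_file <- resolve_adam_file(adam_dir, "ADSL") # only adsl.xpt exists
adae_file <- resolve_adam_file(adam_dir, "ADAE") # both formats exist
```

The helper returns the reader as a string rather than calling it, so the sketch runs without readr or haven installed.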
Each analysis related to the output follows a logical structure based on the ARS model to create an analysis-level ARD. This structure is as follows:
The first step applies the Analysis Set assigned to the output (e.g. Safety Population) to the ADaM dataset(s). Where the “big N” count is based on a different dataset (such as ADSL) from the main ADaM dataset (e.g. ADAE), two separate datasets are created for use in subsequent analyses. Example:
# columns shared by both datasets, excluding the merge key
overlap <- intersect(names(ADSL), names(ADAE))
overlapfin <- setdiff(overlap, "USUBJID")
# analysis dataset: ADAE records for subjects in the Analysis Set
df_pop <- dplyr::filter(
  ADSL,
  SAFFL == "Y"
) |>
  merge(ADAE |> dplyr::select(-dplyr::all_of(overlapfin)),
    by = "USUBJID",
    all = FALSE
  )
# "big N" dataset: all subjects in the Analysis Set
df_poptot <- dplyr::filter(
  ADSL,
  SAFFL == "Y"
)
Note: this is only done once, for the first Analysis. Subsequent analyses reuse the resulting dataset(s), since they remain the same for the remainder of the program’s analyses.
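To make the difference between the two datasets concrete, here is a minimal sketch on invented toy data. It uses base R in place of dplyr so it runs without extra packages. The column values are made up for illustration and do not come from any study.

```r
# toy data (invented for illustration only)
ADSL <- data.frame(
  USUBJID = c("01", "02", "03"),
  SAFFL = c("Y", "Y", "N"),
  TRT01A = c("Placebo", "Drug", "Drug")
)
ADAE <- data.frame(
  USUBJID = c("01", "01", "03"),
  TRT01A = c("Placebo", "Placebo", "Drug"),
  AEDECOD = c("HEADACHE", "NAUSEA", "RASH")
)

# columns present in both datasets, apart from the merge key
overlapfin <- setdiff(intersect(names(ADSL), names(ADAE)), "USUBJID")

# denominator ("big N") dataset: every subject in the analysis set
df_poptot <- ADSL[ADSL$SAFFL == "Y", ]

# analysis dataset: adverse event records of subjects in the analysis set;
# shared columns are dropped from ADAE first so the merge adds no .x/.y suffixes
df_pop <- merge(
  df_poptot,
  ADAE[, setdiff(names(ADAE), overlapfin), drop = FALSE],
  by = "USUBJID",
  all = FALSE
)

nrow(df_poptot) # subjects in the analysis set, with or without events
nrow(df_pop)    # event records belonging to those subjects
names(df_pop)   # TRT01A appears once
```

Subject "02" has no adverse events, so it is absent from `df_pop` but still counted in `df_poptot`, which is why the denominator is kept as a separate dataset.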
Based on the resulting dataset from step 1, further data subsetting relevant to the current analysis is applied (e.g. filtering for serious, treatment-related Adverse Events). If the analysis requires no data subsetting, the previous dataset is simply assigned to a new name, with no ‘filter’ statement. By convention, the name of the data frame created in this step starts with “df2”, followed by the AnalysisId.
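A minimal sketch of both cases, written in base R on invented data. The analysis Ids (`An_Demo_SerRel`, `An_Demo_AnyAE`) and the `AEREL` values are hypothetical, and the generated scripts use `dplyr::filter()` with the conditions defined in the ARS metadata.

```r
# toy population dataset standing in for the step 1 result (invented data)
df_pop <- data.frame(
  USUBJID = c("01", "01", "02"),
  AESER = c("Y", "N", "Y"),
  AEREL = c("RELATED", "NONE", "NONE")
)

# analysis with a data subset: keep serious, treatment-related events
df2_An_Demo_SerRel <- df_pop[df_pop$AESER == "Y" & df_pop$AEREL == "RELATED", ]

# analysis without a data subset: plain assignment, no filter
df2_An_Demo_AnyAE <- df_pop
```

Both analyses end up with a `df2_<AnalysisId>` object, so the next step can always refer to the same naming pattern.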
This step takes the subsetted dataset, and applies the required
AnalysisMethod (e.g. counting subjects by treatment and a group, like
RACE). As explained in the vignette for using
cards and cardx, functions from these
packages are applied to handle the statistical operations for the
analysis. Typically, there would be some pre-work done on the dataset
before passing it to a cards or cardx
function. When the function is applied, the result is an analysis-level
ARD. At the end of this step, record-level metadata from the ARS model
is also merged to the ARD, to ensure the ability to trace each result
back to ARS metadata. See example below:
# intermediate step: Prepare Denominator Dataset for `cards` function
denom_dataset <- df2_An01_05_SAF_Summ_ByTrt |>
  dplyr::select(TRT01A)
# intermediate step: Prepare input dataset for `cards` function
in_data <- df2_An03_05_Race_Summ_ByTrt |>
  dplyr::distinct(TRT01A, RACE, USUBJID) |>
  dplyr::mutate(dummy = "dummyvar")
# calculate subject counts and % (based on big N) grouped by treatment and race
df3_An03_05_Race_Summ_ByTrt <- cards::ard_tabulate(
  data = in_data,
  by = c("TRT01A", "RACE"),
  variables = "dummy",
  denominator = denom_dataset
)
# select relevant statistics as defined by the Method, and assign operation Ids
df3_An03_05_Race_Summ_ByTrt <- df3_An03_05_Race_Summ_ByTrt |>
  dplyr::filter(stat_name %in% c("n", "p")) |>
  dplyr::mutate(operationid = dplyr::case_when(
    stat_name == "n" ~ "Mth01_1_n",
    stat_name == "p" ~ "Mth01_2_pct"
  ))
# stamp CDISC ARD traceability metadata so each result traces back to the ARS definition
df3_An03_05_Race_Summ_ByTrt <- df3_An03_05_Race_Summ_ByTrt |>
  dplyr::mutate(
    AnalysisId = "An03_05_Race_Summ_ByTrt",
    MethodId = "Mth01",
    OutputId = "Out14-1-1",
    # the grouping this column belongs to (here: the treatment-arm grouping)
    group1_groupingId = "AnlsGrouping_01_Trt",
    # for pre-defined groups, map each observed level to its ARS group id
    group1_groupId = dplyr::case_when(
      as.character(group1_level) == "Placebo" ~ "AnlsGrouping_01_Trt_1",
      as.character(group1_level) == "Xanomeline Low Dose" ~ "AnlsGrouping_01_Trt_2",
      as.character(group1_level) == "Xanomeline High Dose" ~ "AnlsGrouping_01_Trt_3"
    )
  )
This example uses the current cards function name
ard_tabulate() (formerly ard_categorical());
see the using cards and
cardx vignette for the full list of renames.
The final mutate() is where siera stamps the
CDISC ARD traceability columns onto every row. For each
grouping applied to the analysis you will see a
group[n]_groupingId (which grouping the column belongs to)
plus one of:
group[n]_groupId - for pre-defined groups (groups listed in the metadata, like treatment arms), mapped from group[n]_level via case_when(), as above; or
group[n]_groupValue - for data-driven groupings (dataDriven: true, where categories such as cause of death are discovered from the ADaM data at run time), capturing the discovered value directly.
See the Concepts and conventions vignette for the full picture of these traceability columns.
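The contrast between the two column types can be sketched on a small invented ARD-like data frame. This is an assumption-laden illustration in base R: the `group2` grouping id `AnlsGrouping_Demo_DthCause` is hypothetical, the data are made up, and the code siera actually generates for data-driven groupings may look different.

```r
# toy ARD rows (invented): group1 = treatment arm, group2 = cause of death
ard <- data.frame(
  group1_level = c("Placebo", "Placebo", "Xanomeline Low Dose"),
  group2_level = c("Cardiac arrest", "Sepsis", "Sepsis"),
  stat = c(1, 2, 1)
)

# pre-defined grouping: look up the ARS group id for each known level
trt_ids <- c(
  "Placebo" = "AnlsGrouping_01_Trt_1",
  "Xanomeline Low Dose" = "AnlsGrouping_01_Trt_2",
  "Xanomeline High Dose" = "AnlsGrouping_01_Trt_3"
)
ard$group1_groupingId <- "AnlsGrouping_01_Trt"
ard$group1_groupId <- unname(trt_ids[as.character(ard$group1_level)])

# data-driven grouping: no group ids exist up front, so keep the observed value
ard$group2_groupingId <- "AnlsGrouping_Demo_DthCause" # hypothetical id
ard$group2_groupValue <- as.character(ard$group2_level)
```

A pre-defined grouping needs a lookup because its ids are fixed in the metadata, while a data-driven grouping can only record what the data contained at run time.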
The above process repeats for each Analysis, although the code for each step varies as defined in the ARS metadata for that Analysis. Once each analysis-level ARD has been created, these ARDs are all appended to create the output-level ARD. See example below:
# combine analyses to create ARD ----
ARD <- dplyr::bind_rows(
  df3_An01_05_SAF_Summ_ByTrt,
  df3_An03_01_Age_Summ_ByTrt,
  df3_An03_01_Age_Comp_ByTrt,
  df3_An03_02_AgeGrp_Summ_ByTrt,
  df3_An03_02_AgeGrp_Comp_ByTrt,
  df3_An03_03_Sex_Summ_ByTrt,
  df3_An03_03_Sex_Comp_ByTrt,
  df3_An03_04_Ethnic_Summ_ByTrt,
  df3_An03_04_Ethnic_Comp_ByTrt,
  df3_An03_05_Race_Summ_ByTrt,
  df3_An03_05_Race_Comp_ByTrt,
  df3_An03_06_Height_Summ_ByTrt,
  df3_An03_06_Height_Comp_ByTrt
)
The same pattern scales without special handling on your part:
group[n]_*
set of columns to the ARD.
Examples of such ARD scripts are shipped with this package. Below are such examples, for
Access these with the below functions:
# see location of script:
ARD_script_example("ARD_Out14-1-1.R")
ARD_script_example("ARD_Out14-3-1-1.R")
# open script to inspect:
file.edit(ARD_script_example("ARD_Out14-1-1.R"))
file.edit(ARD_script_example("ARD_Out14-3-1-1.R"))
# run script locally:
source(ARD_script_example("ARD_Out14-1-1.R"))
source(ARD_script_example("ARD_Out14-3-1-1.R"))
This ARD can be used in various ways downstream. Read more about this in the vignette on utilising ARDs.
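As a quick first check after running a script, the traceability columns can be used to pull the results of a single analysis out of the output-level ARD. The sketch below uses an invented stand-in for `ARD` with a plain numeric `stat` column. That is a simplification, and the values are made up rather than taken from any shipped example.

```r
# invented stand-in for the output-level ARD produced by a script
ARD <- data.frame(
  AnalysisId = c(
    "An01_05_SAF_Summ_ByTrt",
    "An03_05_Race_Summ_ByTrt",
    "An03_05_Race_Summ_ByTrt"
  ),
  operationid = c("Mth01_1_n", "Mth01_1_n", "Mth01_2_pct"),
  stat = c(86, 78, 0.907)
)

# all results belonging to one analysis
race_results <- ARD[ARD$AnalysisId == "An03_05_Race_Summ_ByTrt", ]

# number of result rows contributed by each analysis
rows_per_analysis <- table(ARD$AnalysisId)
```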