ARD program structure

library(siera)

My ARD program has been auto-generated: What can I expect?

Each auto-generated ARD program (one per output) follows a logical structure linked to the ARS model. Each script contains code for all the analyses related to the output, and follows the same code pattern for each analysis (except the first analysis, which by convention handles the “big N” calculation). An analysis-level ARD is generated for each analysis, and at the end of the program all of these analysis-level ARDs are appended to create one output-level ARD. Keep in mind that each of these code sections is auto-populated with ARS metadata. The structure can be visualized as follows:

# Section 1: Program header

# Section 2: Load libraries

# Section 3: Load ADaM datasets

# Section 4a (first Analysis): Code to calculate results as an ARD

# Section 4b (subsequent Analyses): Code to calculate results as an ARD

# Section 5: Append Analysis-level ARDs

Section 3 (“Load ADaM datasets”) reads each ADaM dataset referenced by the output. siera supports two on-disk formats and chooses the reader from each file’s extension: CSV (.csv) files are read with readr::read_csv(), and SAS transport (.xpt) files with haven::read_xpt(). This mirrors how the ARS input format is inferred from .json vs .xlsx, so no extra argument is needed - just point readARS() at a folder of .csv or .xpt ADaMs. Reading .xpt datasets requires the haven package to be installed. The file lookup is case-insensitive, so the lower-case file names typical of regulatory submissions (e.g. adsl.xpt) are matched to the upper-case dataset names in the ARS metadata (ADSL). When both a .csv and a .xpt exist for the same dataset, the .csv is used.
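The lookup rules described above can be sketched in base R as follows. This is an illustration only: find_adam_file() and read_adam() are hypothetical helpers, not siera functions, and siera's actual implementation may differ.

```r
# Hypothetical helper (not part of siera): locate the file for one ADaM
# dataset, matching case-insensitively and preferring .csv over .xpt.
find_adam_file <- function(adam_dir, dataset) {
  files <- list.files(adam_dir)
  stems <- tolower(tools::file_path_sans_ext(files))
  exts <- tolower(tools::file_ext(files))
  for (ext in c("csv", "xpt")) {
    hit <- files[stems == tolower(dataset) & exts == ext]
    if (length(hit) > 0) {
      return(file.path(adam_dir, hit[1]))
    }
  }
  stop("No .csv or .xpt file found for dataset ", dataset)
}

# Hypothetical reader dispatch on the file extension
# (only needs readr / haven when it is actually called)
read_adam <- function(path) {
  switch(tolower(tools::file_ext(path)),
    csv = readr::read_csv(path),
    xpt = haven::read_xpt(path),
    stop("Unsupported extension: ", path)
  )
}

# Demonstration with empty placeholder files in a temporary folder
adam_dir <- file.path(tempdir(), "adam_lookup_demo")
dir.create(adam_dir, showWarnings = FALSE)
file.create(file.path(adam_dir, c("adsl.xpt", "adsl.csv", "adae.xpt")))

adsl_file <- basename(find_adam_file(adam_dir, "ADSL"))
adae_file <- basename(find_adam_file(adam_dir, "ADAE"))
```

In this sketch the upper-case name ADSL resolves to the lower-case .csv file because both formats are present, while ADAE falls back to the .xpt file.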

Analysis-level code to calculate ARDs

Each analysis related to the output follows a logical structure based on the ARS model to create an analysis-level ARD. This structure is as follows:

Step 1: Apply “Analysis Set” to ADaM(s)

This step applies the Analysis Set assigned to the output (e.g. Safety Population) to the ADaM dataset(s). Where the “big N” count is based on a different dataset (e.g. ADSL) from the main ADaM (e.g. ADAE), two separate datasets are created for use in subsequent analyses. Example:

overlap <- intersect(names(ADSL), names(ADAE))
overlapfin <- setdiff(overlap, "USUBJID")

df_pop <- dplyr::filter(
  ADSL,
  SAFFL == "Y"
) |>
  merge(ADAE |> dplyr::select(-dplyr::all_of(overlapfin)),
    by = "USUBJID",
    all = FALSE
  )

df_poptot <- dplyr::filter(
  ADSL,
  SAFFL == "Y"
)

Note: this is only done once, for the first Analysis, and the result is reused by subsequent analyses, since the dataset(s) remain the same for the remainder of the program’s analyses.

Step 2: Apply “Data Subset”

Starting from the dataset produced in Step 1, any further subsetting relevant to the current analysis is applied (e.g. filtering for serious, treatment-related Adverse Events). If no data subsetting is required for the analysis, the previous dataset is simply assigned to the new name, with no ‘filter’ statement. By convention, the dataframe name in this step starts with “df2”, followed by the AnalysisId.

df2_An07_03_SerTEAE_Summ_ByTrt <- df_pop |>
  dplyr::filter(TRTEMFL == "Y" & AESER == "Y")

Step 3: Apply “Method”

This step takes the subsetted dataset and applies the required AnalysisMethod (e.g. counting subjects by treatment and a group, like RACE). As explained in the vignette for using cards and cardx, functions from these packages handle the statistical operations for the analysis. Typically, some pre-work is done on the dataset before it is passed to a cards or cardx function. The result of applying the function is an analysis-level ARD. At the end of this step, record-level metadata from the ARS model is also added to the ARD, so that each result can be traced back to the ARS metadata. See example below:

# intermediate step: Prepare Denominator Dataset for `cards` function
denom_dataset <- df2_An01_05_SAF_Summ_ByTrt |>
  dplyr::select(TRT01A)

# intermediate step: Prepare input dataset for `cards` function
in_data <- df2_An03_05_Race_Summ_ByTrt |>
  dplyr::distinct(TRT01A, RACE, USUBJID) |>
  dplyr::mutate(dummy = "dummyvar")

# calculate subject counts and % (based on big N) grouped by treatment and race
df3_An03_05_Race_Summ_ByTrt <- cards::ard_tabulate(
  data = in_data,
  by = c("TRT01A", "RACE"),
  variables = "dummy",
  denominator = denom_dataset
)

# select relevant statistics as defined by the Method, and assign operation Ids
df3_An03_05_Race_Summ_ByTrt <- df3_An03_05_Race_Summ_ByTrt |>
  dplyr::filter(stat_name %in% c("n", "p")) |>
  dplyr::mutate(operationid = dplyr::case_when(
    stat_name == "n" ~ "Mth01_1_n",
    stat_name == "p" ~ "Mth01_2_pct"
  ))

# stamp CDISC ARD traceability metadata so each result traces back to the ARS definition
df3_An03_05_Race_Summ_ByTrt <- df3_An03_05_Race_Summ_ByTrt |>
  dplyr::mutate(
    AnalysisId = "An03_05_Race_Summ_ByTrt",
    MethodId = "Mth01",
    OutputId = "Out14-1-1",
    # the grouping this column belongs to (here: the treatment-arm grouping)
    group1_groupingId = "AnlsGrouping_01_Trt",
    # for pre-defined groups, map each observed level to its ARS group id
    group1_groupId = dplyr::case_when(
      as.character(group1_level) == "Placebo"              ~ "AnlsGrouping_01_Trt_1",
      as.character(group1_level) == "Xanomeline Low Dose"  ~ "AnlsGrouping_01_Trt_2",
      as.character(group1_level) == "Xanomeline High Dose" ~ "AnlsGrouping_01_Trt_3"
    )
  )

This example uses the current cards function name ard_tabulate() (formerly ard_categorical()); see the using cards and cardx vignette for the full list of renames.

The final mutate() is where siera stamps the CDISC ARD traceability columns onto every row. For each grouping applied to the analysis you will see a group[n]_groupingId (which grouping the column belongs to) plus one of:

  • group[n]_groupId - for pre-defined groups (groups listed in the metadata, like treatment arms), mapped from group[n]_level via case_when(), as above; or
  • group[n]_groupValue - for data-driven groupings (dataDriven: true, where categories such as cause of death are discovered from the ADaM data at run time), capturing the discovered value directly.
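For the data-driven case, the stamping step might look roughly like the sketch below. It assumes dplyr is installed; the toy ARD, the DTHCAUS variable and the grouping id AnlsGrouping_XX_DthCause are all made up for illustration, and the exact code siera generates may differ.

```r
# Toy ARD rows standing in for the output of a cards function
# (assumed values; real cards ARDs store levels in list columns)
ard <- data.frame(
  group1 = "TRT01A",
  group1_level = c("Placebo", "Placebo"),
  group2 = "DTHCAUS",
  group2_level = c("ADVERSE EVENT", "DISEASE PROGRESSION"),
  stat_name = "n",
  stat = c(2, 1)
)

ard <- ard |>
  dplyr::mutate(
    # hypothetical grouping id for a data-driven grouping
    group2_groupingId = "AnlsGrouping_XX_DthCause",
    # data-driven: the value discovered in the data is recorded as-is,
    # so no case_when() mapping to pre-defined group ids is needed
    group2_groupValue = as.character(group2_level)
  )
```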

See the Concepts and conventions vignette for the full picture of these traceability columns.

Final steps

The above process repeats for each Analysis, although the code for each step will of course vary (as defined in the specific ARS metadata for each Analysis). Once each Analysis ARD has been created, these ARDs are all appended to create the output-level ARD. See example below:

# combine analyses to create ARD ----
ARD <- dplyr::bind_rows(
  df3_An01_05_SAF_Summ_ByTrt,
  df3_An03_01_Age_Summ_ByTrt,
  df3_An03_01_Age_Comp_ByTrt,
  df3_An03_02_AgeGrp_Summ_ByTrt,
  df3_An03_02_AgeGrp_Comp_ByTrt,
  df3_An03_03_Sex_Summ_ByTrt,
  df3_An03_03_Sex_Comp_ByTrt,
  df3_An03_04_Ethnic_Summ_ByTrt,
  df3_An03_04_Ethnic_Comp_ByTrt,
  df3_An03_05_Race_Summ_ByTrt,
  df3_An03_05_Race_Comp_ByTrt,
  df3_An03_06_Height_Summ_ByTrt,
  df3_An03_06_Height_Comp_ByTrt
)
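Analysis-level ARDs do not all need the same columns for this to work. The sketch below, which assumes dplyr is installed and uses made-up toy ARDs (ard_a, ard_b and their values are not from the package), shows how dplyr::bind_rows() stacks inputs with different grouping columns and fills the gaps with NA.

```r
# Two toy analysis-level ARDs (assumed values); the second has an extra grouping
ard_a <- data.frame(
  AnalysisId = "An_A",
  group1_level = c("Placebo", "Xanomeline Low Dose"),
  stat_name = "n",
  stat = c(10, 12)
)
ard_b <- data.frame(
  AnalysisId = "An_B",
  group1_level = "Placebo",
  group2_level = "WHITE",
  stat_name = "n",
  stat = 8
)

# rows are stacked in input order; columns missing from one input become NA
ARD_demo <- dplyr::bind_rows(ard_a, ard_b)

# count rows contributed by each analysis
table(ARD_demo$AnalysisId)
```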

Deeper tables and more groupings

The same pattern scales without special handling on your part:

  • Arbitrarily deep table hierarchies - the table stub (defined in mainListOfContents) can nest as deeply as your output requires; there is no longer a three-level limit.
  • More than three grouping factors - an analysis can be split by four or more groupings at once (e.g. Treatment x Age group x Sex x Region). Each grouping simply adds its own group[n]_* set of columns to the ARD.

Example

Examples of such ARD scripts are shipped with this package, for the following outputs:

  • Summary of Demographics: ARD_Out14-1-1.R
  • Overall Summary of Treatment-Emergent Adverse Events: ARD_Out14-3-1-1.R

Access these with the functions below:

# see location of script:
ARD_script_example("ARD_Out14-1-1.R")
ARD_script_example("ARD_Out14-3-1-1.R")
# open script to inspect:
file.edit(ARD_script_example("ARD_Out14-1-1.R"))
file.edit(ARD_script_example("ARD_Out14-3-1-1.R"))
# run script locally:
source(ARD_script_example("ARD_Out14-1-1.R"))
source(ARD_script_example("ARD_Out14-3-1-1.R"))

This ARD can be used in various ways downstream. Read more about this in the vignette on utilising ARDs.