This vignette steps back from the “how” (covered in the other vignettes) to explain the “why” behind siera: what an Analysis Results Dataset (ARD) is good for, how siera fits into the workflow, and the conventions you will see in the generated scripts and their output.
Traditionally, analysis results have lived inside static outputs - the numbers in an RTF or PDF table. An Analysis Results Dataset (ARD) instead stores those same results as machine-readable data, one row per result, with metadata describing exactly what each number is. Once results are data, several things become a lot easier:
- tfrmt and gtsummary format ARDs straight into submission-ready tables.

siera’s job is to get you to an ARD without writing the analysis code by hand: you supply ARS metadata, and siera writes the R that produces the ARD.
The flow is always the same:
1. ARS metadata goes into readARS(), which writes one R script per Output defined in the metadata.
2. Each script is run against your ADaM datasets, supplied as CSV (.csv) or SAS transport (.xpt) files, and the result is an ARD: one row per result, ready for downstream use.
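The first step of that flow can be sketched as follows. This is a minimal sketch rather than a definitive recipe: the argument names `ARS_path`, `output_path` and `adam_path` are an assumption based on the package documentation, not something stated in this vignette, and so is the idea that `ARS_example()` with no argument lists the bundled example files. Check `?readARS` before relying on either. The whole block is guarded so that it does nothing when siera is not installed.

```r
# Sketch only: skipped entirely if siera is not installed.
generated_scripts <- character(0)

if (requireNamespace("siera", quietly = TRUE)) {
  # Assumption: ARS_example() with no argument lists the bundled example files
  json_examples <- grep("\\.json$", siera::ARS_example(), value = TRUE)

  if (length(json_examples) > 0) {
    output_dir <- file.path(tempdir(), "ard_scripts")  # where the R scripts go
    adam_dir   <- tempdir()                            # where the ADaM files live
    dir.create(output_dir, showWarnings = FALSE)

    # Assumed argument names; readARS() writes one script per Output
    siera::readARS(
      ARS_path    = siera::ARS_example(json_examples[1]),
      output_path = output_dir,
      adam_path   = adam_dir
    )

    generated_scripts <- list.files(output_dir, pattern = "\\.[Rr]$")
  }
}

generated_scripts
```

Each file listed in `generated_scripts` is then an ordinary R script that can be sourced or run once the ADaM datasets are in `adam_dir`.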
siera reads each ADaM dataset according to its file extension
(.csv with readr::read_csv(),
.xpt with haven::read_xpt()), so no extra
argument is needed.

The statistical computation itself is performed by the cards
and cardx
packages, whose functions siera writes into the generated
scripts (see the vignette on using
cards and cardx).
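The extension-based reading described above amounts to a small dispatch on the file extension. The helper below is a hypothetical illustration of that idea, not siera's own code: `read_adam()` is an invented name, and only `readr::read_csv()` and `haven::read_xpt()` come from the text.

```r
# Hypothetical helper (not part of siera): pick a reader from the file extension.
read_adam <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(
    ext,
    csv = readr::read_csv(path, show_col_types = FALSE),
    xpt = haven::read_xpt(path),
    # Any other extension falls through to this default
    stop("Unsupported ADaM file extension: '", ext, "'", call. = FALSE)
  )
}

# Usage (needs readr): write a tiny CSV with invented values and read it back
if (requireNamespace("readr", quietly = TRUE)) {
  tmp <- tempfile(fileext = ".csv")
  write.csv(
    data.frame(USUBJID = c("S01", "S02"), SAFFL = c("Y", "Y")),
    tmp,
    row.names = FALSE
  )
  adsl <- read_adam(tmp)
}
```

Because the reader is chosen from the extension alone, the caller never has to say which format a dataset is in, which is the point the paragraph above makes about not needing an extra argument.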
ARS metadata describes a whole reporting event, but siera only needs seven sections to generate code. It is worth knowing what each one contributes:
| ARS section | What siera does with it |
|---|---|
| mainListOfContents | Links each Output to its analyses, and sets the row order and indentation of the table stub. |
| otherListsOfContents | Supplies Output-level metadata (the list of planned outputs). |
| analysisSets | Defines the population filter for the Output (e.g. Safety Population, SAFFL == "Y"). |
| dataSubsets | Adds row-level filters for individual analyses (e.g. serious, treatment-emergent AEs). |
| analysisGroupings | Defines the columns/subgroups results are split by (e.g. treatment arm), including data-driven groupings discovered at run time. |
| analyses | Ties everything together for one calculation: which method, population, subset and groupings apply. |
| methods | Describes the operations to perform, and carries the dynamic R code template siera fills in (inline, or referenced from an external method library - see the Using cards and cardx article). |
Each generated script is assembled from these pieces, and every result it produces carries identifiers back to them (see “Reading an ARD row” below).
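A quick way to see whether a piece of metadata has the seven sections in the table is to compare its top-level names against that list. The sketch below uses a toy list as a stand-in for parsed ARS metadata; `missing_sections()` is a hypothetical helper, not a siera function, and the toy contents are invented and deliberately incomplete.

```r
# The seven top-level ARS sections named in the table above
siera_sections <- c(
  "mainListOfContents", "otherListsOfContents", "analysisSets",
  "dataSubsets", "analysisGroupings", "analyses", "methods"
)

# Hypothetical helper: report which of those sections a parsed ARS list lacks
missing_sections <- function(ars) setdiff(siera_sections, names(ars))

# Toy stand-in for parsed ARS metadata (invented, deliberately incomplete)
toy_ars <- list(
  name               = "Toy reporting event",
  mainListOfContents = list(),
  analysisSets       = list(),
  analyses           = list(),
  methods            = list()
)

missing_sections(toy_ars)
# "otherListsOfContents" "dataSubsets" "analysisGroupings"
```

With a real ARS JSON file, the same check could be run on the list returned by a JSON parser such as `jsonlite::fromJSON()`.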
A method’s code template need not be written inline: siera
can also resolve it from an external reference document
(the ARS codeTemplate.documentRef mechanism), so an ARS
file can point at a shared, tested method library by id
rather than copy-pasting code. This builds on
referenceDocuments, which is otherwise outside the seven
sections above. See the Using cards and cardx article for how
to wire it up.
ARS metadata officially travels as JSON, but siera also
accepts an Excel (XLSX) representation of the same information.
The two are semantically equivalent -
readARS() produces the same generated scripts either way,
so you can choose whichever format fits your tooling. The examples
shipped with the package include both (see
ARS_example()).
A core promise of the ARD is traceability: every result can be traced back to the metadata that defines it. To make that possible, each row of a siera-generated ARD carries identifier columns alongside the statistic itself:
- AnalysisId - which analysis produced the row.
- operationid - which operation within the method (e.g. the n count vs. the %).
- group[n]_* columns:
  - group[n]_groupingId - the grouping the column belongs to (e.g. the treatment-arm grouping).
  - group[n]_groupId - for pre-defined groups (groups listed explicitly in the metadata), the identifier of the specific group.
  - group[n]_groupValue - for data-driven groupings (dataDriven: true, where the categories are discovered from the ADaM data at run time, e.g. cause of death or AE term), the actual value found in the data.

The distinction matters: a treatment-arm grouping is usually
pre-defined, so its rows carry group1_groupId; a grouping
such as “cause of death” is typically data-driven, so its rows carry
group1_groupValue with whatever categories appeared in the
data. A single ARD can contain both. The ARD program structure vignette
shows where these columns are stamped on in the generated code.
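To make the identifier columns concrete, here is a toy ARD built by hand. The identifier column names follow the description above; everything else is an assumption made for illustration: the `res` value column, all ids and numbers, and the use of `NA` for the identifier that does not apply to a row.

```r
# Toy ARD: identifier columns as described above; `res` and all values invented.
ard <- data.frame(
  AnalysisId        = c("An_Trt_N", "An_Trt_N", "An_Death", "An_Death"),
  operationid       = c("Op_n", "Op_n", "Op_n", "Op_n"),
  group1_groupingId = c("Grp_Trt", "Grp_Trt", "Grp_Cause", "Grp_Cause"),
  group1_groupId    = c("Trt_A", "Trt_B", NA, NA),      # pre-defined groups
  group1_groupValue = c(NA, NA, "Cardiac", "Other"),    # data-driven values
  res               = c(86, 84, 3, 1),
  stringsAsFactors  = FALSE
)

# Rows from a pre-defined grouping carry a group id ...
predefined <- ard[!is.na(ard$group1_groupId), ]
# ... rows from a data-driven grouping carry the value found in the data
data_driven <- ard[!is.na(ard$group1_groupValue), ]

# Trace one number back to its analysis, operation and group
trt_a_n <- subset(
  ard,
  AnalysisId == "An_Trt_N" & operationid == "Op_n" & group1_groupId == "Trt_A"
)$res
trt_a_n
```

Filtering on the identifier columns, rather than on row position or display labels, is what lets a single number be traced back to the analysis, operation and group that produced it.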