Runner

Purpose

Document CLI and YAML runner contracts for reproducible DSAMbayes runs.

Audience

  • Users operating DSAMbayes through scripts/dsambayes.R.
  • Engineers maintaining runner config and artefact contracts.

Pages

  • CLI Usage: commands, flags, exit codes, and error modes
  • Config Schema: YAML keys, defaults, and validation rules
  • Output Artefacts: staged folder layout, file semantics, and precedence rules

Subsections of Runner

CLI Usage

Purpose

Define the supported command-line interface for scripts/dsambayes.R, including required flags, optional flags, and execution semantics.

Prerequisites

Before using the CLI:

  • Complete Install and Setup.
  • Run commands from repository root.
  • Ensure DSAMbayes is installed and available in R_LIBS_USER.

Entry point

Rscript scripts/dsambayes.R <command> [flags]

The script supports these commands:

  • init
  • validate
  • run
  • help (or -h / --help)

Command summary

  • init: requires --out; optional --template, --overwrite. Writes a config template file.
  • validate: requires --config; optional --run-dir. Runs config and data checks only (dry_run = TRUE).
  • run: requires --config; optional --run-dir. Executes the full pipeline (dry_run = FALSE) and writes run artefacts.
  • help: no flags. Prints usage text and exits.

Flag reference

init

  • --out <path> (required): output path for the generated YAML file.
  • --template <name> (optional): template name. Default is blm.
    • Supported values in script: master, blm, re, cre, pooled, hierarchical.
    • hierarchical maps to the same template file as re.
  • --overwrite (optional flag): allow overwrite of an existing --out file.

validate

  • --config <path> (required): YAML config path.
  • --run-dir <path> (optional): explicit run directory path.

run

  • --config <path> (required): YAML config path.
  • --run-dir <path> (optional): explicit run directory path.

Usage examples

Show help

Rscript scripts/dsambayes.R --help

Expected outcome: usage panel is printed with command syntax and notes.

Create a new config from template

Rscript scripts/dsambayes.R init --template blm --out config/local_quickstart.yaml

Expected outcome: config/local_quickstart.yaml is created.

Validate only (dry-run behaviour)

R_LIBS_USER="$PWD/.Rlib" \
  Rscript scripts/dsambayes.R validate --config config/blm_timeseries.yaml

Expected outcome: validation completes without fitting Stan models.

Validate with explicit run directory

R_LIBS_USER="$PWD/.Rlib" \
  Rscript scripts/dsambayes.R validate \
    --config config/blm_timeseries.yaml \
    --run-dir results/quickstart_validate

Expected outcome: validation uses the provided run directory path when writing run metadata.

Execute full run

R_LIBS_USER="$PWD/.Rlib" \
  Rscript scripts/dsambayes.R run --config config/cre_geo_panel.yaml

Expected outcome: full modelling pipeline executes and artefacts are written under results/.

Execute full run with explicit run directory

R_LIBS_USER="$PWD/.Rlib" \
  Rscript scripts/dsambayes.R run \
    --config config/cre_geo_panel.yaml \
    --run-dir results/quickstart_run

Expected outcome: artefacts are written to results/quickstart_run (subject to overwrite rules in config).

Exit and error behaviour

  • Success exits with status 0.
  • CLI argument or runtime errors exit with status 2.
  • Typical hard failures include:
    • DSAMbayes not installed.
    • Missing required flags (--out or --config).
    • Unknown command.
    • Unknown argument format.

Operational notes

  • validate is the recommended pre-run gate. Use it before run whenever you change config or data.
  • run prints a run summary and suggested next-step artefacts at completion.
  • The CLI itself does not define model semantics. It delegates execution to DSAMbayes::run_from_yaml().

Config Schema

Purpose

This page documents the authored YAML contract used by:

  • scripts/dsambayes.R
  • DSAMbayes::run_from_yaml()
  • runme.R

The authored schema is schema_version: 2 only. Older formula-driven YAML files are intentionally rejected.

Processing order

The runner processes configs in this order:

  1. Parse YAML.
  2. Coerce YAML infinity tokens (.Inf, -.Inf).
  3. Apply v2 defaults.
  4. Resolve relative paths against the config file directory.
  5. Validate the authored v2 contract.
  6. Compile the authored config into the internal runner config.
  7. Apply managed holiday terms, then build the model and run.

Root sections

Key Required Purpose
schema_version yes Must be 2.
data yes Input data path, format, and date handling.
target yes Outcome column, KPI type, and response transform.
media yes Modeled media terms.
controls yes Non-media predictors, including manual trend/seasonality terms.
effects no Managed effects. In M1 this is holidays only.
model yes Model class and scaling options.
hierarchy conditional Required for model.type: re and model.type: cre.
pooling conditional Required for model.type: pooled.
priors no Default priors plus grouped or explicit overrides.
boundaries no Grouped or explicit parameter boundaries.
fit no MCMC or optimise settings.
diagnostics no Diagnostics, model selection, and time-series selection settings.
allocation no Budget optimisation settings.
outputs no Output paths and artefact toggles.
forecast no Reserved forecast toggle.

Unknown keys fail validation.

Minimal valid config

schema_version: 2

data:
  path: ../data/timeseries/demo_data_synthetic.csv
  format: csv
  date_var: date

target:
  column: revenue
  type: revenue
  transform: identity

media:
  - channel0_signal
  - channel1_signal

controls:
  - t_scaled
  - sin52_1
  - cos52_1

model:
  type: blm

Key differences from the retired schema

  • model.formula is no longer authored directly.
  • schema_version: 1 configs are rejected.
  • Trend and seasonality stay user-authored as ordinary columns under controls.
  • Managed time effects are limited to holidays under effects.holidays.
  • re and cre models use hierarchy, not cre.enabled flags.
  • pooled models use pooling, not pooling.enabled.

Section reference

schema_version

Key Type Rules
schema_version integer Must be 2.

data

Key Type Rules
data.path string Required. File must exist. Relative paths resolve from the config directory.
data.format string csv, rds, or long.
data.date_var string Required in M1.
data.date_format string or null Optional parser format for date columns.
data.na_action string omit or error.
data.long_id_col string or null Required when data.format: long.
data.long_variable_col string or null Required when data.format: long.
data.long_value_col string or null Required when data.format: long.
data.dictionary_path string or null Optional metadata CSV.
data.dictionary mapping Optional inline metadata keyed by term name.
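For example, a long-format input must supply all three long_* columns. A hedged sketch (paths and column names are illustrative, not part of the contract):

data:
  path: ../data/panel/demo_panel_long.csv
  format: long
  date_var: date
  long_id_col: geo
  long_variable_col: variable
  long_value_col: value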

target

Key Type Rules
target.column string Required response column.
target.type string revenue or subscriptions.
target.transform string identity or log.
target.offset_column string or null Supported only for model.type: blm in M1.
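For example, a log-response subscriptions target with an exposure offset (offset_column is supported only for model.type: blm in M1; the column name here is illustrative):

target:
  column: subscriptions
  type: subscriptions
  transform: log
  offset_column: active_weeks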

media and controls

  • media is a required list of modeled media terms.
  • controls is a required list, but it may be empty ([]).
  • A term may not appear in both lists.
  • Manual trend and seasonality terms belong in controls.

Compiled formula order is:

  1. generated holiday terms
  2. controls
  3. media
  4. generated CRE mean terms
  5. optional offset
  6. hierarchical random-effects term

effects.holidays

Managed holidays are optional and are the only managed effect in M1.

effects:
  holidays:
    enabled: true
    path: ../data/holidays.csv
    label_col: holiday
    country: gb
    country_col: country
    week_start: monday
    prefix: holiday_

Key Type Rules
effects.holidays.enabled boolean Enables holiday feature generation.
effects.holidays.path string Required when enabled. CSV or RDS.
effects.holidays.date_col string or null Optional calendar date column override.
effects.holidays.label_col string Holiday label column.
effects.holidays.country string or null Optional single-country filter.
effects.holidays.country_col string Calendar column used with country.
effects.holidays.date_format string or null Optional parser format for non-ISO dates.
effects.holidays.week_start string monday through sunday.
effects.holidays.timezone string Timezone used in parsing/alignment.
effects.holidays.prefix string Prefix for generated holiday columns.
effects.holidays.window_before integer Non-negative.
effects.holidays.window_after integer Non-negative.
effects.holidays.aggregation_rule string count or any.
effects.holidays.overlap_policy string count_all or dedupe_label_date.
effects.holidays.overwrite_existing boolean Replaces existing columns only when true.

Notes:

  • The data date column must be aligned to the configured weekly anchor.
  • Country filtering materialises a filtered calendar artefact before the compiled config is written.

model

Key Type Rules
model.name string Defaults to the config filename stem.
model.type string blm, re, cre, or pooled.
model.scale boolean Controls internal scaling before fit.
model.force_recompile boolean Forces Stan recompilation when true.

hierarchy

Required for model.type: re and model.type: cre.

Key Type Rules
hierarchy.group string Grouping column for panel models.
hierarchy.random_intercept boolean Include a `(1 | group)` random-intercept term when true.
hierarchy.random_slopes list of strings Optional subset of authored media and controls.
hierarchy.cre_variables list of strings Required and non-empty for model.type: cre.
hierarchy.cre_prefix string Prefix for generated CRE mean terms. Default cre_mean_.
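A hedged sketch of a hierarchy block for model.type: cre (group and variable names are illustrative):

model:
  type: cre

hierarchy:
  group: geo
  random_intercept: true
  random_slopes:
    - channel0_signal
  cre_variables:
    - channel0_signal
    - channel1_signal
  cre_prefix: cre_mean_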

pooling

Required for model.type: pooled.

Key Type Rules
pooling.grouping_vars list of strings Required and non-empty.
pooling.map_path string Required. CSV or RDS.
pooling.map_format string csv or rds.
pooling.min_waves integer or null Optional positive integer.
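A hedged sketch of a pooling block for model.type: pooled (paths and grouping variables are illustrative):

model:
  type: pooled

pooling:
  grouping_vars:
    - region
  map_path: ../data/pooling_map.csv
  map_format: csv
  min_waves: 8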

priors

Key Type Rules
priors.use_defaults boolean Must remain true in M1.
priors.likelihood mapping Optional Abacus-style alias for noise_sd.
priors.overrides list Explicit parameter-level overrides.

Grouped families are available when applicable:

  • intercept
  • media_beta
  • control_beta
  • holiday_beta
  • cre_beta
  • pooling_beta
  • random_effect_sd
  • noise_sd

Each grouped family accepts either the legacy DSAMbayes style:

family: normal    # or lognormal_ms where supported
mean: 0
sd: 0.5

or the more explicit Abacus-style alias:

distribution: Normal   # or HalfNormal / LogNormalMS where supported
mu: 0
sigma: 0.5

HalfNormal compiles to a zero-centered Normal prior plus an implied lower bound of 0 for the targeted parameters that are otherwise unconstrained. Parameters that are already positive by construction, such as noise_sd and hierarchical sd_*[...], do not receive an extra boundary row.

The residual-noise prior also accepts this alias:

priors:
  likelihood:
    sigma:
      distribution: HalfNormal
      sigma: 2
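Putting these pieces together, a priors section might look like the following sketch. It assumes the grouped families listed above sit directly under priors; treat the exact nesting as indicative, and the values as illustrative:

priors:
  use_defaults: true
  media_beta:
    distribution: HalfNormal
    sigma: 0.5
  control_beta:
    family: normal
    mean: 0
    sd: 0.5
  likelihood:
    sigma:
      distribution: HalfNormal
      sigma: 2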

boundaries

Boundary families mirror the grouped prior families and may also use explicit boundaries.overrides.

Each grouped or explicit boundary row uses:

lower: -Inf
upper: Inf
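For example, constraining media coefficients to be non-negative while leaving control coefficients unconstrained (assuming grouped families sit directly under boundaries, mirroring the priors layout):

boundaries:
  media_beta:
    lower: 0
    upper: Inf
  control_beta:
    lower: -Inf
    upper: Inf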

fit

Key Type Rules
fit.method string mcmc or optimise. Pooled runs require mcmc.
fit.seed numeric or null Optional scalar seed.
fit.optimise.* mapping Optimisation controls.
fit.mcmc.* mapping Stan sampling controls.
fit.mcmc.parameterization.positive_priors string centered or noncentered.
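A minimal fit sketch using only the keys documented above:

fit:
  method: mcmc
  seed: 123
  mcmc:
    parameterization:
      positive_priors: noncentered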

diagnostics

Retains the current runner surface for:

  • model_selection
  • time_series_selection
  • identifiability
  • publish-gate controls

Important M1 rule:

  • diagnostics.time_series_selection.enabled: true is not supported for pooled runs.
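A hedged sketch of a diagnostics block. The key names follow the flags referenced in the artefact tables on the Output Artefacts page; treat the exact nesting as indicative:

diagnostics:
  model_selection:
    enabled: true
  time_series_selection:
    enabled: true        # not supported for pooled runs
    save_pointwise: true
    save_png: true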

allocation

Retains the current runner surface for budget optimisation, with channel targeting based on authored media terms.

outputs

outputs.root_dir and outputs.run_dir behave as before, but the metadata contract now includes:

  • config.original.yaml
  • config.resolved.yaml
  • config.compiled.yaml

forecast

Reserved toggle. No authored schema changes in M1.

Examples in this repository

  • config/blm_timeseries.yaml — weekly time-series BLM example
  • config/cre_geo_panel.yaml — weekly geo-panel CRE example

Output Artefacts

Purpose

This page defines what the YAML runner writes, where files are written, and which config flags control each artefact.

Run directory and layout semantics

Run directory precedence:

  1. CLI --run-dir
  2. outputs.run_dir
  3. Timestamped folder under outputs.root_dir

Layout behaviour:

  • outputs.layout: staged (default) writes files under numbered stage folders.
  • outputs.layout: flat writes all files directly under the run directory.
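For example, pinning a run to a fixed directory with the default staged layout (paths are illustrative):

outputs:
  root_dir: ../results
  run_dir: ../results/quickstart_run
  layout: staged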

Stage folders used by the runner:

  • 00_run_metadata
  • 10_pre_run
  • 20_model_fit
  • 30_post_run
  • 40_diagnostics
  • 50_model_selection
  • 60_optimisation
  • 70_forecast (directory only, when forecast.enabled: true)

Command behaviour

validate

  • validate uses dry_run = TRUE.
  • If no run directory is resolved, no artefacts are written.
  • If a run directory is resolved (--run-dir or outputs.run_dir), config.original.yaml, config.resolved.yaml, and config.compiled.yaml are written.
  • If a managed holiday country filter is active and a run directory is resolved, holiday_calendar.filtered.csv is materialised under 10_pre_run/.
  • If a run directory is resolved and outputs.save_session_info_txt: true, session_info.txt is written.
  • If forecast is enabled and a run directory is materialised, the 70_forecast/ directory is created.

run

  • run writes the full artefact set subject to config toggles and runtime conditions.

Artefact contract by stage

00_run_metadata

File Controlled by Written when Notes
config.original.yaml always run dir materialised Raw YAML text from the input config.
config.resolved.yaml always run dir materialised Authored config after defaults, path resolution, and v2 schema validation.
config.compiled.yaml always run dir materialised Internal compiled runner config after the friendly YAML is translated into the downstream runtime shape.
session_info.txt outputs.save_session_info_txt flag is true Includes DSAMbayes version, schema version, model/fit metadata, and sessionInfo().

10_pre_run

File Controlled by Written when Notes
transform_assumptions.txt outputs.save_transform_assumptions_txt flag is true Written even if transform sensitivity scenarios are disabled.
transform_sensitivity_summary.csv outputs.save_transform_sensitivity_summary_csv sensitivity object exists with rows Requires transforms.sensitivity.enabled: true and successful scenario execution.
transform_sensitivity_parameters.csv outputs.save_transform_sensitivity_parameters_csv sensitivity object exists with rows Parameter means/SD by scenario.
dropped_groups.csv none groups dropped by pooling.min_waves filter Written only when sparse groups are excluded.
holiday_calendar.filtered.csv none managed holidays enabled with a country filter Materialised filtered holiday calendar consumed by config.compiled.yaml.
holiday_feature_manifest.csv none managed holidays enabled and features generated Documents generated holiday terms and active-week counts.
design_matrix_manifest.csv outputs.save_design_matrix_manifest_csv flag is true and manifest non-empty Per-term design metadata.
data_dictionary.csv outputs.save_data_dictionary_csv flag is true and dictionary table non-empty Merges inline YAML metadata and optional CSV dictionary metadata.
spec_summary.csv outputs.save_spec_summary_csv flag is true and table available Single-row model/spec summary.
vif_report.csv outputs.save_vif_report_csv flag is true and predictors available VIF diagnostics for non-intercept predictors.

20_model_fit

File Controlled by Written when Notes
model.rds outputs.save_model_rds flag is true Fitted model object.
posterior.rds outputs.save_posterior_rds flag is true and MCMC fit Raw posterior object for MCMC runs only.
fit_metrics_by_group.csv implicit fitted summary is computed Written when any of save_fitted_csv, save_fit_png, save_residuals_csv, save_diagnostics_png is true.
fit_timeseries.png outputs.save_fit_png flag is true and ggplot2 installed Observed vs fitted over time, with subtitle metrics including Classical R^2 (posterior mean) and monthly date labels when date is a true Date.
fit_scatter.png outputs.save_fit_png flag is true and ggplot2 installed Observed vs fitted scatter.
posterior_forest.png none posterior draws available and ggplot2 installed Posterior coefficient forest plot; skipped for optimise/MAP runs.
prior_posterior.png none posterior draws available, model has priors, and ggplot2 installed Prior-versus-posterior comparison plot; skipped for optimise/MAP runs.

30_post_run

File Controlled by Written when Notes
observed.csv outputs.save_observed_csv flag is true Observed response on model response scale.
observed_kpi.csv outputs.save_observed_csv flag is true and response scale is log KPI-scale observed values (exp) with conversion_method = point_exp.
fitted.csv outputs.save_fitted_csv flag is true Fitted summaries on model response scale.
fitted_kpi.csv outputs.save_fitted_csv flag is true and response scale is log KPI-scale fitted summaries (exp).
posterior_summary.csv outputs.save_posterior_summary_csv flag is true and MCMC fit Posterior summaries for coefficients and scalar diagnostics.

Implementation note:

  • decomp_predictor_impact.csv, decomp_predictor_impact.png, decomp_timeseries.csv, and decomp_timeseries.png are present in stage mapping and config flags, but are not currently invoked by write_run_artifacts() in the active pipeline.

40_diagnostics

File Controlled by Written when Notes
chain_diagnostics.txt outputs.save_chain_diagnostics_txt flag is true and MCMC fit Chain diagnostics text output.
diagnostics_report.csv outputs.save_diagnostics_report_csv flag is true and diagnostics object exists One row per diagnostic check.
diagnostics_summary.txt outputs.save_diagnostics_summary_txt flag is true and diagnostics object exists Counts by status and overall status.
artifact_status.csv none artifact status rows recorded by the runner Per-artifact status log for skipped/warn/error events.
residuals.csv outputs.save_residuals_csv flag is true and fitted summary is computed Residual table on response scale.
residuals_timeseries.png outputs.save_diagnostics_png flag is true and ggplot2 installed Residuals over time.
residuals_vs_fitted.png outputs.save_diagnostics_png flag is true and ggplot2 installed Residuals vs fitted.
residuals_hist.png outputs.save_diagnostics_png flag is true and ggplot2 installed Residual histogram.
residuals_acf.png outputs.save_diagnostics_png flag is true and ggplot2 installed Residual autocorrelation plot.
residual_diagnostics.csv none diagnostics residual checks available Ljung-Box / ACF check outputs.
residuals_latent.csv none diagnostics latent residuals available Latent residual series from diagnostics object.
residuals_latent_acf.png outputs.save_diagnostics_png latent residuals available and ggplot2 installed Latent residual ACF plot.
ppc.png none posterior predictive plot available and ggplot2 installed Posterior predictive check plot; skipped for optimise/MAP runs.
boundary_hits.csv none boundary-hit table available Boundary-hit rates per parameter.
boundary_hits.png outputs.save_diagnostics_png boundary-hit table available and ggplot2 installed Boundary-hit visualisation.
within_variation.csv none within-variation table available Within-variation diagnostics for hierarchical terms.
within_variation.png outputs.save_diagnostics_png within-variation table available and ggplot2 installed Within-variation visualisation.
predictor_risk_register.csv outputs.save_predictor_risk_register_csv flag is true and table non-empty Ranked risk register combining VIF, within-variation, boundary hits, and slow-moving flags.

50_model_selection

File Controlled by Written when Notes
loo_summary.csv outputs.save_model_selection_csv flag is true, diagnostics.model_selection.enabled: true, and diagnostics report exists May be full PSIS-LOO summary or a stub row with skip reason.
loo_pointwise.csv outputs.save_model_selection_pointwise_csv flag is true, diagnostics report exists, and pointwise PSIS-LOO is available Optional pointwise LOO diagnostics.
loo_pit.png none posterior predictive draws available and ggplot2 installed LOO-PIT calibration plot.
pareto_k.png outputs.save_diagnostics_png pointwise PSIS-LOO available and ggplot2 installed Pareto-k diagnostic plot.
elpd_influence.png outputs.save_diagnostics_png pointwise PSIS-LOO available and ggplot2 installed Pointwise ELPD influence plot.
tscv_folds.csv diagnostics.time_series_selection.enabled time-series selection enabled and folds produced Fold windows plus fold-level runtime/status metadata.
tscv_summary.csv diagnostics.time_series_selection.enabled time-series selection enabled Written for success, skipped, or error outcomes.
tscv_pointwise.csv diagnostics.time_series_selection.enabled + diagnostics.time_series_selection.save_pointwise enabled and pointwise rows available Optional pointwise holdout log predictive densities.
tscv_elpd_by_fold.png diagnostics.time_series_selection.save_png + outputs.save_diagnostics_png enabled and ggplot2 installed ELPD-by-fold chart.

60_optimisation

File Controlled by Written when Notes
optimisation_runs.csv none fit.method: optimise All optimisation starts, including objective value and return code when available.
optimisation_best.csv none fit.method: optimise The selected MAP optimum: highest optimiser objective when available, otherwise lowest RMSE.
budget_summary.csv outputs.save_allocator_csv allocation enabled and flag is true Scenario-level optimisation summary.
budget_allocation.csv outputs.save_allocator_csv allocation enabled and flag is true Recommended allocation by channel.
budget_diagnostics.csv outputs.save_allocator_csv allocation enabled and flag is true Candidate and objective diagnostics.
budget_response_curves.csv outputs.save_allocator_csv allocation enabled and flag is true Response-curve payload.
budget_response_points.csv outputs.save_allocator_csv allocation enabled and flag is true Key plotted points for response curves.
budget_roi_cpa.csv outputs.save_allocator_csv allocation enabled and flag is true ROI/CPA panel payload (depends on KPI type).
budget_impact.csv outputs.save_allocator_csv allocation enabled and flag is true Allocation impact payload.
budget_response_curves.png outputs.save_allocator_png allocation enabled, flag is true, and ggplot2 installed Response curves plot.
budget_roi_cpa.png outputs.save_allocator_png allocation enabled, flag is true, and ggplot2 installed ROI/CPA panel plot.
budget_impact.png outputs.save_allocator_png allocation enabled, flag is true, and ggplot2 installed Allocation impact plot.
budget_optimisation.json outputs.save_allocator_json allocation enabled, flag is true, and jsonlite installed Combined JSON payload (summary, allocation, diagnostics, plot_data).

70_forecast

Item Controlled by Written when Notes
70_forecast/ directory forecast.enabled flag is true Directory is created, but no forecast files are currently emitted by runner writers.

Response scale semantics (*_kpi.csv vs base files)

Base files (observed.csv, fitted.csv) are always on the model response scale:

  • identity response: KPI units
  • log response: log(KPI)

KPI-scale files are written only for log-response models:

  • observed_kpi.csv
  • fitted_kpi.csv

Conversion metadata:

  • observed_kpi.csv uses conversion_method = point_exp.
  • fitted_kpi.csv uses conversion_method = lognormal_mean by default for log-response fitted values.
  • fitted_kpi.csv uses conversion_method = point_exp only when the median back-transform is explicitly requested.
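The two conversion methods follow the standard lognormal back-transform. For a log-scale value with posterior mean mu and standard deviation sigma (assuming this is the correction the runner applies):

point_exp:       exp(mu)                 back-transformed point value (the KPI-scale median)
lognormal_mean:  exp(mu + sigma^2 / 2)   the KPI-scale mean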

Diagnostics status semantics

diagnostics_report.csv status values:

  • pass: check passed configured thresholds
  • warn: check breached warning threshold
  • fail: check breached fail threshold
  • skipped: check not applicable or intentionally skipped

Overall status logic:

  • fail if any check is fail
  • warn if no fails and at least one warn
  • pass otherwise

diagnostics_summary.txt reports:

  • overall_status
  • counts for pass, warn, fail, skipped

Quick verification commands

List produced files for a run:

latest_run="$(ls -td results/* | head -n 1)"
find "$latest_run" -type f | sort

Inspect key diagnostics files:

latest_run="$(ls -td results/* | head -n 1)"
head -n 20 "$latest_run/40_diagnostics/diagnostics_report.csv"
head -n 20 "$latest_run/40_diagnostics/diagnostics_summary.txt"