Time Components

Purpose

DSAMbayes provides managed time-component generation through the effects.holidays config section. When enabled, the runner deterministically generates holiday feature columns from a calendar file and appends them to the compiled model formula. This page defines the configuration contract, generation logic, naming conventions, and audit properties.

Overview

Time components in DSAMbayes cover:

  • Holidays — deterministic weekly indicator features derived from an external calendar file.
  • Trend and seasonality — specified directly in the model formula (e.g. t_scaled, sin52_1, cos52_1). These are not generated by the time-components system; they are user-supplied columns in the data.

The managed time-components system generates holiday features only. Trend and seasonality columns must already exist in the model data.

YAML configuration

effects:
  holidays:
    enabled: true
    path: ../data/holidays.csv
    date_col: null
    label_col: holiday
    country: gb
    country_col: country
    date_format: null
    week_start: monday
    timezone: UTC
    prefix: holiday_
    window_before: 0
    window_after: 0
    aggregation_rule: count
    overlap_policy: count_all
    overwrite_existing: false

Key definitions

Key                          Default    Description
holidays.enabled             false      Toggle for holiday feature generation
holidays.path                null       Path to the holiday calendar CSV/RDS (resolved relative to the config file)
holidays.date_col            null       Date column in the calendar; auto-detected from date, ds, or event_date
holidays.label_col           holiday    Column containing holiday event labels
holidays.country             null       Optional single-country filter
holidays.country_col         country    Calendar column used for country filtering
holidays.date_format         null       Date parse format; null assumes ISO 8601
holidays.week_start          monday     Day-of-week anchor for weekly aggregation
holidays.timezone            UTC        Timezone used when parsing POSIX date-time inputs
holidays.prefix              holiday_   Prefix prepended to generated feature column names
holidays.window_before       0          Days before each event date to include in the holiday window
holidays.window_after        0          Days after each event date to include in the holiday window
holidays.aggregation_rule    count      Weekly aggregation: count sums event-days per week; any produces a binary indicator
holidays.overlap_policy      count_all  Overlap handling: count_all counts every event-day; dedupe_label_date deduplicates per label and date
holidays.overwrite_existing  false      Whether existing columns with matching names are overwritten
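
As a rough illustration of how these defaults combine with user-supplied keys, the Python sketch below overlays a partial holidays block on the documented defaults and applies a few sanity checks. DSAMbayes is an R package and this is not its implementation: the helper name resolve_holiday_config and the specific checks (rejecting unknown keys, requiring path when enabled, non-negative integer windows) are assumptions made for the example.

```python
# Defaults copied from the key table; the helper and its checks are hypothetical.
HOLIDAY_DEFAULTS = {
    "enabled": False,
    "path": None,
    "date_col": None,
    "label_col": "holiday",
    "country": None,
    "country_col": "country",
    "date_format": None,
    "week_start": "monday",
    "timezone": "UTC",
    "prefix": "holiday_",
    "window_before": 0,
    "window_after": 0,
    "aggregation_rule": "count",
    "overlap_policy": "count_all",
    "overwrite_existing": False,
}

# Enumerated values named elsewhere on this page.
ALLOWED = {
    "week_start": {"monday", "tuesday", "wednesday", "thursday",
                   "friday", "saturday", "sunday"},
    "aggregation_rule": {"count", "any"},
    "overlap_policy": {"count_all", "dedupe_label_date"},
}


def resolve_holiday_config(user_cfg):
    """Overlay user keys on the documented defaults (illustrative only)."""
    unknown = set(user_cfg) - set(HOLIDAY_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown holidays keys: {sorted(unknown)}")
    cfg = {**HOLIDAY_DEFAULTS, **user_cfg}
    for key, allowed in ALLOWED.items():
        if cfg[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")
    for key in ("window_before", "window_after"):
        # Assumption: windows are whole, non-negative day counts.
        if not isinstance(cfg[key], int) or cfg[key] < 0:
            raise ValueError(f"{key} must be a non-negative integer")
    if cfg["enabled"] and cfg["path"] is None:
        # Assumption: an enabled block needs a calendar path.
        raise ValueError("path is required when enabled is true")
    return cfg


cfg = resolve_holiday_config(
    {"enabled": True, "path": "../data/holidays.csv", "country": "gb"}
)
print(cfg["week_start"], cfg["aggregation_rule"], cfg["country"])
```

Keys omitted from the user block fall back to the table defaults, so the example resolves week_start to monday and aggregation_rule to count.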

Calendar file contract

The holiday calendar is a CSV or RDS file (or a data frame) containing at minimum the following columns:

Column        Required  Content
Date column   Yes       Daily event dates (one row per event occurrence)
Label column  Yes       Human-readable event name (e.g. Christmas, Black Friday)

Date column detection

If date_col is null, the system tries column names in order: date, ds, event_date. If none is found, validation aborts.
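
The detection order can be sketched in a few lines of Python. This is an illustration of the rule described above, not the package's R code; detect_date_col is a hypothetical name, and the behaviour when an explicit date_col is missing from the calendar is an assumption.

```python
# Candidate names in the priority order stated above.
DATE_COL_CANDIDATES = ("date", "ds", "event_date")


def detect_date_col(columns, date_col=None):
    """Return the calendar's date column name, or raise if none is usable."""
    if date_col is not None:
        # Assumption: an explicit date_col must exist in the calendar.
        if date_col not in columns:
            raise ValueError(f"date_col '{date_col}' not found in calendar")
        return date_col
    for candidate in DATE_COL_CANDIDATES:
        if candidate in columns:
            return candidate
    raise ValueError(
        "no date column found; tried " + ", ".join(DATE_COL_CANDIDATES)
    )


print(detect_date_col(["ds", "holiday", "date"]))  # date
print(detect_date_col(["event_date", "holiday"]))  # event_date
```

Because candidates are tried in a fixed order, a calendar that contains both ds and date resolves to date regardless of column position.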

Label normalisation

Holiday labels are normalised to lowercase, alphanumeric-plus-underscore form via normalise_holiday_label(). For example:

  • Black Friday → black_friday
  • New Year's Day → new_year_s_day
  • Empty labels → unnamed

The generated feature column name is {prefix}{normalised_label}, e.g. holiday_black_friday.
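
A minimal Python sketch that reproduces the three examples above is shown below. It is not the source of normalise_holiday_label(); the exact rule used here (lowercase, collapse each run of non-alphanumeric characters to one underscore, trim leading and trailing underscores) is an assumption chosen to match those examples.

```python
import re


def normalise_label(label):
    """Lowercase and reduce a label to alphanumerics and underscores."""
    text = "" if label is None else str(label).lower()
    # Assumption: any run of other characters becomes a single underscore.
    text = re.sub(r"[^a-z0-9]+", "_", text).strip("_")
    return text or "unnamed"


def feature_name(label, prefix="holiday_"):
    """Build the generated column name as {prefix}{normalised_label}."""
    return f"{prefix}{normalise_label(label)}"


print(normalise_label("Black Friday"))    # black_friday
print(normalise_label("New Year's Day"))  # new_year_s_day
print(normalise_label(""))                # unnamed
print(feature_name("Black Friday"))       # holiday_black_friday
```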

Generation pipeline

The runner calls build_weekly_holiday_features() with the following steps:

  1. Parse and validate the calendar. validate_holiday_calendar() checks column presence, date parsing, and label completeness.

  2. Expand holiday windows. expand_holiday_windows() replicates each event row across every day in the inclusive range [event_date - window_before, event_date + window_after], with both windows measured in days.

  3. Align to weekly index. Each expanded event-day is mapped to its containing week using week_floor_date() with the configured week_start.

  4. Aggregate per week. Events are counted per week per feature. Under aggregation_rule: any, counts are collapsed to binary (0/1). Under overlap_policy: dedupe_label_date, duplicate label-date pairs within a week are removed before counting.

  5. Join to model data. The generated feature matrix is left-joined to the model data by the date column. Weeks with no events receive zero.

  6. Append to formula. Generated feature columns are appended as additive terms to the compiled population formula.
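
Steps 2 to 6 can be sketched end to end in Python using only the standard library. This is an illustration under stated assumptions, not the R implementation of build_weekly_holiday_features(): all function names are hypothetical, de-duplication is applied to expanded (date, feature) pairs, and the formula is treated as a plain string with made-up terms.

```python
from collections import Counter
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def weekly_holiday_counts(events, window_before=0, window_after=0,
                          week_start="monday", aggregation_rule="count",
                          overlap_policy="count_all"):
    """events: iterable of (date, feature_name) pairs.
    Returns {(week_start_date, feature_name): value}."""
    # Step 2: replicate each event across its inclusive day window.
    expanded = [
        (day + timedelta(days=k), name)
        for day, name in events
        for k in range(-window_before, window_after + 1)
    ]
    # Assumption: dedupe drops repeated (date, feature) pairs after expansion.
    if overlap_policy == "dedupe_label_date":
        expanded = sorted(set(expanded))
    # Steps 3 and 4: floor each day to its week anchor, then count.
    start = WEEKDAYS.index(week_start)
    counts = Counter(
        (day - timedelta(days=(day.weekday() - start) % 7), name)
        for day, name in expanded
    )
    if aggregation_rule == "any":
        return {key: 1 for key in counts}
    return dict(counts)


def join_features(model_weeks, counts, features):
    """Step 5: left join onto the model weeks; missing weeks get zero."""
    return {
        week: {f: counts.get((week, f), 0) for f in features}
        for week in model_weeks
    }


def append_terms(formula, features):
    """Step 6: append generated columns as additive terms."""
    return formula + "".join(f" + {f}" for f in features)


# Christmas 2024 falls on a Wednesday; a 7-day lead-in spans two weeks.
events = [(date(2024, 12, 25), "holiday_christmas")]
counts = weekly_holiday_counts(events, window_before=7)
weeks = [date(2024, 12, 9), date(2024, 12, 16), date(2024, 12, 23)]
table = join_features(weeks, counts, ["holiday_christmas"])
formula = append_terms("kpi ~ t_scaled + tv_spend", ["holiday_christmas"])

for week in weeks:
    print(week, table[week])  # values 0, then 5, then 3
print(formula)  # kpi ~ t_scaled + tv_spend + holiday_christmas
```

With window_before: 7 the window covers 18 to 25 December, so five event-days land in the week starting 16 December and three in the week starting 23 December. Under aggregation_rule: any both weeks would instead receive 1.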

Weekly anchoring

All weekly alignment uses week_floor_date(), which computes the most recent occurrence of week_start on or before each date. The model data’s date column must contain week-start-aligned dates; normalise_weekly_index() validates this and aborts if dates are not aligned.
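
The flooring rule and the alignment check can be illustrated with a short Python sketch. The names week_floor and check_weekly_alignment are hypothetical stand-ins for week_floor_date() and normalise_weekly_index(); the error message and return value are assumptions.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def week_floor(d, week_start="monday"):
    """Most recent occurrence of week_start on or before d."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    offset = (d.weekday() - WEEKDAYS.index(week_start)) % 7
    return d - timedelta(days=offset)


def check_weekly_alignment(dates, week_start="monday"):
    """Raise if any date is not already a week-start date."""
    misaligned = [d for d in dates if week_floor(d, week_start) != d]
    if misaligned:
        raise ValueError(
            f"dates not aligned to {week_start}: "
            + ", ".join(d.isoformat() for d in misaligned)
        )
    return sorted(dates)


christmas = date(2024, 12, 25)  # a Wednesday
print(week_floor(christmas, "monday"))     # 2024-12-23
print(week_floor(christmas, "sunday"))     # 2024-12-22
print(week_floor(christmas, "wednesday"))  # 2024-12-25
```

A date that already falls on week_start floors to itself, which is what makes the alignment check a simple equality test.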

Supported week-start values

monday, tuesday, wednesday, thursday, friday, saturday, sunday.

Timezone handling

  • Calendar dates are parsed using the configured timezone (default UTC).
  • If the calendar contains POSIXt values, they are coerced to Date in the configured timezone.
  • Character dates are parsed as ISO 8601 by default, or using date_format if specified.
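
The Python sketch below shows why the configured timezone matters when the calendar holds date-time values: the same instant can fall on different calendar dates. It is an illustration only. to_calendar_date is a hypothetical helper, the timezone is passed as a tzinfo object rather than a name (a named zone could be built with zoneinfo.ZoneInfo), and treating naive timestamps as already being in the configured zone is an assumption.

```python
from datetime import date, datetime, timedelta, timezone


def to_calendar_date(value, tz=timezone.utc, date_format=None):
    """Coerce a date-time, date, or string to a calendar date."""
    # datetime is a subclass of date, so it must be tested first.
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption: naive timestamps are already in the configured zone.
            value = value.replace(tzinfo=tz)
        return value.astimezone(tz).date()
    if isinstance(value, date):
        return value
    if date_format is None:
        return date.fromisoformat(value)  # ISO 8601, e.g. 2024-12-25
    return datetime.strptime(value, date_format).date()


stamp = datetime(2024, 12, 25, 23, 30, tzinfo=timezone.utc)
plus_11 = timezone(timedelta(hours=11))  # fixed offset, for illustration

print(to_calendar_date(stamp))                           # 2024-12-25
print(to_calendar_date(stamp, tz=plus_11))               # 2024-12-26
print(to_calendar_date("2024-12-25"))                    # 2024-12-25
print(to_calendar_date("25/12/2024", date_format="%d/%m/%Y"))  # 2024-12-25
```

A one-day shift like the second result can move an event into a different week when it sits next to a week boundary, so the calendar and the configured timezone should agree.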

Generated-term audit contract

Generated holiday terms are tracked for downstream diagnostics and reporting:

  • The list of generated term names is stored in model$.runner_time_components$generated_terms.
  • The identifiability gate in R/diagnostics_report.R uses this list to auto-detect baseline terms (via detect_baseline_terms()), so generated holiday terms are included in baseline-media correlation checks without requiring explicit configuration.

Feature naming collision

If two different holiday labels normalise to the same feature name, build_weekly_holiday_features() aborts with a collision error. Ensure calendar labels are distinct after normalisation.
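
A collision check of this kind can be sketched in Python as a reverse lookup from feature name to the labels that produced it. The helper name find_collisions and the example labels are hypothetical; the feature names are assumed to have been computed already.

```python
from collections import defaultdict


def find_collisions(feature_by_label):
    """Map each feature name shared by two or more labels to those labels."""
    labels_by_feature = defaultdict(list)
    for label, feature in feature_by_label.items():
        labels_by_feature[feature].append(label)
    return {
        feature: labels
        for feature, labels in labels_by_feature.items()
        if len(labels) > 1
    }


# Hypothetical calendar labels and their already-normalised feature names.
names = {
    "Boxing Day": "holiday_boxing_day",
    "boxing-day": "holiday_boxing_day",
    "Christmas": "holiday_christmas",
}
collisions = find_collisions(names)
print(collisions)  # {'holiday_boxing_day': ['Boxing Day', 'boxing-day']}
```

Running a check like this against a calendar before a model run identifies which labels need renaming.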

Interaction with existing data columns

  • If overwrite_existing: false (default), the runner aborts if any generated column name already exists in the data.
  • If overwrite_existing: true, existing columns with matching names are replaced by the generated features.
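
The two behaviours can be sketched in Python with columns held as a dict of name to values. This is an illustration of the rule above rather than the runner's code; attach_generated and the example column names are hypothetical.

```python
def attach_generated(data, generated, overwrite_existing=False):
    """Merge generated holiday columns into the model data columns."""
    clashes = sorted(set(data) & set(generated))
    if clashes and not overwrite_existing:
        raise ValueError(f"generated columns already exist: {clashes}")
    # Later keys win, so generated columns replace any existing ones.
    return {**data, **generated}


data = {"kpi": [10, 12], "holiday_christmas": [9, 9]}
generated = {"holiday_christmas": [0, 3]}

merged = attach_generated(data, generated, overwrite_existing=True)
print(merged["holiday_christmas"])  # [0, 3]
```

With the default overwrite_existing=False, the same call raises instead of silently replacing the user-supplied column.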

Practical guidance

  • Start with aggregation_rule: count to capture multi-day holiday effects (e.g. a holiday spanning two days in one week produces a count of 2).
  • Use window_before and window_after for events with known anticipation or lingering effects (e.g. window_before: 7 for pre-Christmas shopping).
  • Use aggregation_rule: any when you want binary holiday indicators regardless of how many event-days fall in a week.
  • Check generated terms in the resolved config (config.resolved.yaml) and posterior summary to confirm which holidays entered the model.

Cross-references