# Engagement Report Data Dictionary

This document provides a detailed mapping of each column in the generated engagement report to its source data, including the specific CSV file, fields used, and calculation method.

## Report Structure

The engagement report (`engagement_report_YYYYMMDD_HHMMSS.csv`) is generated in the project root directory and contains one row per participant with aggregated engagement metrics across all modules and activities. Input data files are located in the `data/` subdirectory.

---

## Column Definitions

### 1. Name
- **Description**: Participant username/identifier
- **Source File**: Derived from all source CSV files
- **Source Field**: `Name` or `User` field (varies by file)
- **Calculation**: Unique participant identifier collected across all data sources
- **Data Type**: String
- **Notes**: This is the primary key used to aggregate all metrics per participant
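The participant list could be assembled roughly as follows. This is a minimal sketch, not the script's code. It assumes each source file is read with Python's `csv.DictReader` and that the identifier column is `Name` or `User`, as listed above. The helper name `collect_participants` is hypothetical. The sketch also assumes Python's default string ordering (uppercase sorts before lowercase), which may differ from the script's own sort.

```python
import csv
import io
from typing import Iterable


def collect_participants(tables: Iterable[list[dict[str, str]]]) -> list[str]:
    """Return the sorted, de-duplicated participant identifiers across all tables.

    Hypothetical helper: each table is a list of csv.DictReader rows. The
    identifier is read from `Name` if present, otherwise from `User`.
    Surrounding whitespace is stripped; matching stays case-sensitive.
    """
    names: set[str] = set()
    for rows in tables:
        for row in rows:
            raw = row.get("Name") or row.get("User") or ""
            name = raw.strip()
            if name:
                names.add(name)
    return sorted(names)


# Two small in-memory stand-ins for check-in-data.csv and time-in-app.csv
checkins = list(csv.DictReader(io.StringIO("Name,Submitted Date\ndavid,2026-02-01\n anna ,2026-02-03\n")))
sessions = list(csv.DictReader(io.StringIO("User,start_time,end_time\ndavid,t0,t1\nbob,t2,t3\n")))
print(collect_participants([checkins, sessions]))  # ['anna', 'bob', 'david']
```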

---

### 2. PID
- **Description**: Participant ID (assigned by study team)
- **Source File**: Not currently populated from data
- **Source Field**: N/A
- **Calculation**: Currently left blank (empty string)
- **Data Type**: String
- **Notes**: Reserved for manual entry or future data integration

---

### 3. Onboard Date
- **Description**: Date participant was onboarded to the study
- **Source File**: Not currently populated from data
- **Source Field**: N/A
- **Calculation**: Currently left blank (empty string)
- **Data Type**: String (YYYYMMDD format expected)
- **Notes**: Reserved for manual entry or future data integration

---

### 4. check-in
- **Description**: Total number of check-in submissions by participant
- **Source File**: `data/check-in-data.csv`
- **Source Field**: `Name`
- **Calculation**: Count of rows where `Name` matches the participant
- **Data Type**: Integer
- **Verification**: Sum should equal total rows in check-in-data.csv (excluding header)
- **Notes**: Each row in check-in-data.csv represents one check-in submission
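A per-participant row count of this kind can be sketched with `collections.Counter`. The helper `count_rows` is hypothetical. In this sketch, rows with a blank identifier are skipped. If such rows exist in the source file, the "sum equals total rows" verification will not hold exactly.

```python
import csv
import io
from collections import Counter


def count_rows(rows, key: str) -> Counter:
    """Count one event per row for each participant (hypothetical helper).

    Rows whose identifier is blank after stripping are skipped.
    """
    counts: Counter = Counter()
    for row in rows:
        name = (row.get(key) or "").strip()
        if name:
            counts[name] += 1
    return counts


data = "Name,Submitted Date\ndavid,2026-02-01\ndavid,2026-02-02\nanna,2026-01-20\n"
rows = list(csv.DictReader(io.StringIO(data)))
counts = count_rows(rows, "Name")
print(counts["david"], counts["anna"], counts["bob"])  # 2 1 0
assert sum(counts.values()) == len(rows)  # the documented verification
```

The same helper would apply to `#times in app` by passing the `time-in-app.csv` rows with `key="User"`.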

---

### 5. CHECK-in Past 15 Days
- **Description**: Number of check-ins submitted within the past 15 days
- **Source File**: `data/check-in-data.csv`
- **Source Field**: `Name`, `Submitted Date`
- **Calculation**: Count of rows where:
  - `Name` matches the participant
  - `Submitted Date` is within 15 days of the date the script is run
  - `Submitted Date` parses as `YYYY-MM-DD` (rows in any other format are not counted)
- **Data Type**: Integer
- **Notes**: The script computes this value, but the output column is currently written as an empty string to match the example report format
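"Within 15 days" does not say whether the window boundaries are inclusive. The sketch below assumes an inclusive window `[today - 15 days, today]` and strict `YYYY-MM-DD` parsing via `datetime.strptime`. Both choices are assumptions, and `count_recent` is a hypothetical name. In the real script, `today` would presumably be the run date, for example `date.today()`.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def count_recent(rows, today: date, days: int = 15) -> Counter:
    """Count check-ins per participant with Submitted Date in [today - days, today].

    Assumption: both ends of the window are inclusive. Dates that are not
    strictly YYYY-MM-DD raise ValueError in strptime and are skipped.
    """
    cutoff = today - timedelta(days=days)
    counts: Counter = Counter()
    for row in rows:
        name = (row.get("Name") or "").strip()
        raw = (row.get("Submitted Date") or "").strip()
        try:
            submitted = datetime.strptime(raw, "%Y-%m-%d").date()
        except ValueError:
            continue  # not YYYY-MM-DD: ignored, per the Known Limitations
        if name and cutoff <= submitted <= today:
            counts[name] += 1
    return counts


rows = [
    {"Name": "david", "Submitted Date": "2026-02-01"},
    {"Name": "david", "Submitted Date": "2026-01-10"},
    {"Name": "anna", "Submitted Date": "02/03/2026"},
]
recent = count_recent(rows, today=date(2026, 2, 6))
print(recent["david"], recent["anna"])  # 1 0
```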

---

### 6. #times in app
- **Description**: Number of app sessions (times participant opened the app)
- **Source File**: `data/time-in-app.csv`
- **Source Field**: `User`
- **Calculation**: Count of rows where `User` matches the participant
- **Data Type**: Integer
- **Verification**: Sum should equal total rows in time-in-app.csv (excluding header)
- **Notes**: Each row represents one app session with `start_time` and `end_time`

---

## Intro Module Columns (Intro Welcome, Intro Selfcare, Intro Stim Use)

### 7. Intro Welcome
- **Description**: Number of times "Welcome to START" intro module was accessed
- **Source File**: `data/intro-status.csv`
- **Source Field**: `Name`, `type`, `number_of_times_reported`
- **Calculation**: Sum of `number_of_times_reported` where:
  - `Name` matches the participant
  - `type` = "Welcome to START"
- **Data Type**: Integer
- **Notes**: Multiple rows for the same participant and type are summed

### 8. Intro Selfcare
- **Description**: Number of times "Self Care" intro module was accessed
- **Source File**: `data/intro-status.csv`
- **Source Field**: `Name`, `type`, `number_of_times_reported`
- **Calculation**: Sum of `number_of_times_reported` where:
  - `Name` matches the participant
  - `type` = "Self Care"
- **Data Type**: Integer

### 9. Intro Stim Use
- **Description**: Number of times "Stimulant Use & HIV" intro module was accessed
- **Source File**: `data/intro-status.csv`
- **Source Field**: `Name`, `type`, `number_of_times_reported`
- **Calculation**: Sum of `number_of_times_reported` where:
  - `Name` matches the participant
  - `type` = "Stimulant Use & HIV"
- **Data Type**: Integer

**Verification for Intro Columns**: Sum of all three columns should equal the sum of all `number_of_times_reported` values in data/intro-status.csv
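The "sum `number_of_times_reported` per participant and `type`" rule shared by all status files can be sketched as shown below. The intro file is used as the example. The helpers `to_int`, `sum_by_type`, `intro_columns` and the `INTRO_COLUMNS` mapping are hypothetical. The sketch treats empty or non-numeric counts as `0`, as stated under Known Limitations, and assumes counts are whole numbers.

```python
import csv
import io
from collections import defaultdict

# Hypothetical mapping: output column -> `type` value in data/intro-status.csv
INTRO_COLUMNS = {
    "Intro Welcome": "Welcome to START",
    "Intro Selfcare": "Self Care",
    "Intro Stim Use": "Stimulant Use & HIV",
}


def to_int(value) -> int:
    """Empty, missing, or non-numeric values count as 0."""
    try:
        return int(str(value).strip())
    except ValueError:
        return 0


def sum_by_type(rows) -> dict[tuple[str, str], int]:
    """Sum number_of_times_reported for each (Name, type) pair."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for row in rows:
        name = (row.get("Name") or "").strip()
        if name:
            totals[(name, row.get("type") or "")] += to_int(row.get("number_of_times_reported"))
    return totals


def intro_columns(totals, name: str) -> dict[str, int]:
    """Build one participant's intro columns; absent pairs default to 0."""
    return {col: totals.get((name, kind), 0) for col, kind in INTRO_COLUMNS.items()}


data = (
    "Name,type,number_of_times_reported\n"
    "david,Welcome to START,2\n"
    "david,Welcome to START,1\n"
    "david,Self Care,\n"
    "anna,Stimulant Use & HIV,4\n"
)
rows = list(csv.DictReader(io.StringIO(data)))
totals = sum_by_type(rows)
print(intro_columns(totals, "david"))  # {'Intro Welcome': 3, 'Intro Selfcare': 0, 'Intro Stim Use': 0}

# Verification: column totals equal the file total when every `type` is mapped
assert sum(totals.values()) == sum(to_int(r["number_of_times_reported"]) for r in rows)
```

The same pattern carries over to the positive-events, mindfulness, reappraisal, values, and kindness files with their own column-to-`type` mappings.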

---

## Positive Events Module Columns (PosEvents GoodThings, PosEve Grat Journal, PosEve Meditation)

### 10. PosEvents GoodThings
- **Description**: Number of times "Good Things" activity was completed
- **Source File**: `data/positive-events-status.csv`
- **Source Field**: `Name`, `type`, `number_of_times_reported`
- **Calculation**: Sum of `number_of_times_reported` where:
  - `Name` matches the participant
  - `type` = "Good Things"
- **Data Type**: Integer

### 11. PosEve Grat Journal
- **Description**: Number of times "Gratitude Journaling" activity was completed
- **Source File**: `data/positive-events-status.csv`
- **Source Field**: `Name`, `type`, `number_of_times_reported`
- **Calculation**: Sum of `number_of_times_reported` where:
  - `Name` matches the participant
  - `type` = "Gratitude Journaling"
- **Data Type**: Integer

### 12. PosEve Meditation
- **Description**: Number of times "Good Things Meditation" was completed
- **Source File**: `data/positive-events-status.csv`
- **Source Field**: `Name`, `type`, `number_of_times_reported`
- **Calculation**: Sum of `number_of_times_reported` where:
  - `Name` matches the participant
  - `type` = "Good Things Meditation"
- **Data Type**: Integer

**Verification for Positive Events Columns**: Sum of all three columns should equal the sum of all `number_of_times_reported` values in data/positive-events-status.csv

---

## Mindfulness Module Columns (Mindful Informal Mindfulness, Mindful Self Compassion, Mindful Meditation)

### 13. Mindful Informal Mindfulness
- **Description**: Number of times "Informal Mindfulness" practice was completed
- **Source File**: `data/mindfulness-status.csv`
- **Source Field**: `Name`, `type`, `number_of_times_reported`
- **Calculation**: Sum of `number_of_times_reported` where:
  - `Name` matches the participant
  - `type` = "Informal Mindfulness"
- **Data Type**: Integer

### 14. Mindful Self Compassion
- **Description**: Number of times "Self Compassion" practice was completed
- **Source File**: `data/mindfulness-status.csv`
- **Source Field**: `Name`, `type`, `number_of_times_reported`
- **Calculation**: Sum of `number_of_times_reported` where:
  - `Name` matches the participant
  - `type` = "Self Compassion"
- **Data Type**: Integer

### 15. Mindful Meditation
- **Description**: Number of times "Meditation Mindfulness" was completed
- **Source File**: `data/mindfulness-status.csv`
- **Source Field**: `Name`, `type`, `number_of_times_reported`
- **Calculation**: Sum of `number_of_times_reported` where:
  - `Name` matches the participant
  - `type` = "Meditation Mindfulness"
- **Data Type**: Integer

**Verification for Mindfulness Columns**: Sum of all three columns should equal the sum of all `number_of_times_reported` values in data/mindfulness-status.csv

---

## Reappraisal Module Columns (Reappraisal Reappraisal, Reappraisal Meditation)

### 16. Reappraisal Reappraisal
- **Description**: Number of times "Reappraisal" exercise was completed
- **Source File**: `data/reappraisal-status.csv`
- **Source Field**: `Name`, `type`, `number_of_times_reported`
- **Calculation**: Sum of `number_of_times_reported` where:
  - `Name` matches the participant
  - `type` = "Reappraisal"
- **Data Type**: Integer

### 17. Reappraisal Meditation
- **Description**: Number of times "Meditation Reappraisal" was completed
- **Source File**: `data/reappraisal-status.csv`
- **Source Field**: `Name`, `type`, `number_of_times_reported`
- **Calculation**: Sum of `number_of_times_reported` where:
  - `Name` matches the participant
  - `type` = "Meditation Reappraisal"
- **Data Type**: Integer

**Verification for Reappraisal Columns**: Sum of both columns should equal the sum of all `number_of_times_reported` values in data/reappraisal-status.csv

---

## Values Module Columns (Values Intro, Values Values, Values Strengths, Values Goals, Values Meditation)

### 18. Values Intro
- **Description**: Number of times "Intro Values" module was accessed
- **Source File**: `data/values-status.csv`
- **Source Field**: `Name`, `type`, `number_of_times_reported`
- **Calculation**: Sum of `number_of_times_reported` where:
  - `Name` matches the participant
  - `type` = "Intro Values"
- **Data Type**: Integer

### 19. Values Values
- **Description**: Number of times "Values" activity was completed
- **Source File**: `data/values-status.csv`
- **Source Field**: `Name`, `type`, `number_of_times_reported`
- **Calculation**: Sum of `number_of_times_reported` where:
  - `Name` matches the participant
  - `type` = "Values"
- **Data Type**: Integer

### 20. Values Strengths
- **Description**: Number of times "Strengths" activity was completed
- **Source File**: `data/values-status.csv`
- **Source Field**: `Name`, `type`, `number_of_times_reported`
- **Calculation**: Sum of `number_of_times_reported` where:
  - `Name` matches the participant
  - `type` = "Strengths"
- **Data Type**: Integer

### 21. Values Goals
- **Description**: Number of times "Goals" activity was completed
- **Source File**: `data/values-status.csv`
- **Source Field**: `Name`, `type`, `number_of_times_reported`
- **Calculation**: Sum of `number_of_times_reported` where:
  - `Name` matches the participant
  - `type` = "Goals"
- **Data Type**: Integer

### 22. Values Meditation
- **Description**: Number of times "Meditation Values" was completed
- **Source File**: `data/values-status.csv`
- **Source Field**: `Name`, `type`, `number_of_times_reported`
- **Calculation**: Sum of `number_of_times_reported` where:
  - `Name` matches the participant
  - `type` = "Meditation Values"
- **Data Type**: Integer

**Verification for Values Columns**: Sum of all five columns should equal the sum of all `number_of_times_reported` values in data/values-status.csv

---

## Kindness Module Columns (Kindness Kindness, Kindness Meditation)

### 23. Kindness Kindness
- **Description**: Number of times "Kindness" practice was completed
- **Source File**: `data/kindness-status.csv`
- **Source Field**: `Name`, `type`, `number_of_times_reported`
- **Calculation**: Sum of `number_of_times_reported` where:
  - `Name` matches the participant
  - `type` = "Kindness"
- **Data Type**: Integer

### 24. Kindness Meditation
- **Description**: Number of times "Meditation Kindness" was completed
- **Source File**: `data/kindness-status.csv`
- **Source Field**: `Name`, `type`, `number_of_times_reported`
- **Calculation**: Sum of `number_of_times_reported` where:
  - `Name` matches the participant
  - `type` = "Meditation Kindness"
- **Data Type**: Integer

**Verification for Kindness Columns**: Sum of both columns should equal the sum of all `number_of_times_reported` values in data/kindness-status.csv
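The module column definitions above (columns 7 to 24) can be restated as one lookup table. A table like this also makes it easy to spot `type` values in a source file that no column reads, which would make the per-file verification fail. `MODULE_COLUMNS` and `unmapped_types` are hypothetical names, and this is not necessarily how the script is structured.

```python
# Hypothetical lookup table: output column -> (source file, `type` value)
MODULE_COLUMNS: dict[str, tuple[str, str]] = {
    "Intro Welcome": ("data/intro-status.csv", "Welcome to START"),
    "Intro Selfcare": ("data/intro-status.csv", "Self Care"),
    "Intro Stim Use": ("data/intro-status.csv", "Stimulant Use & HIV"),
    "PosEvents GoodThings": ("data/positive-events-status.csv", "Good Things"),
    "PosEve Grat Journal": ("data/positive-events-status.csv", "Gratitude Journaling"),
    "PosEve Meditation": ("data/positive-events-status.csv", "Good Things Meditation"),
    "Mindful Informal Mindfulness": ("data/mindfulness-status.csv", "Informal Mindfulness"),
    "Mindful Self Compassion": ("data/mindfulness-status.csv", "Self Compassion"),
    "Mindful Meditation": ("data/mindfulness-status.csv", "Meditation Mindfulness"),
    "Reappraisal Reappraisal": ("data/reappraisal-status.csv", "Reappraisal"),
    "Reappraisal Meditation": ("data/reappraisal-status.csv", "Meditation Reappraisal"),
    "Values Intro": ("data/values-status.csv", "Intro Values"),
    "Values Values": ("data/values-status.csv", "Values"),
    "Values Strengths": ("data/values-status.csv", "Strengths"),
    "Values Goals": ("data/values-status.csv", "Goals"),
    "Values Meditation": ("data/values-status.csv", "Meditation Values"),
    "Kindness Kindness": ("data/kindness-status.csv", "Kindness"),
    "Kindness Meditation": ("data/kindness-status.csv", "Meditation Kindness"),
}


def unmapped_types(path: str, types_in_file: set[str]) -> set[str]:
    """Return `type` values found in a file that no output column reads."""
    expected = {kind for source, kind in MODULE_COLUMNS.values() if source == path}
    return types_in_file - expected
```

For example, `unmapped_types("data/values-status.csv", {"Goals", "Other"})` returns `{"Other"}`. Here `"Other"` is a made-up value; any such row would be excluded from every column.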

---

### 25. Videos
- **Description**: Total number of videos watched by participant
- **Source File**: `data/videos-watched.csv`
- **Source Field**: `Name`, `number_of_times_reported`
- **Calculation**: Direct value from `number_of_times_reported` where `Name` matches the participant
- **Data Type**: Integer
- **Verification**: Sum should equal the sum of all `number_of_times_reported` values in data/videos-watched.csv
- **Notes**: This is the total number of video views across all videos; the source file contains one row per user
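Because `Videos` is taken directly from a file with one row per user, a reader could check that assumption while loading the file. The sketch below raises on a repeated `Name` instead of silently keeping one value. That behavior is a hypothetical design choice, not documented script behavior, and `videos_per_user` is a hypothetical name.

```python
from collections import Counter


def videos_per_user(rows: list[dict[str, str]]) -> dict[str, int]:
    """Read the per-user video count, checking the one-row-per-user assumption.

    `rows` must be a list (it is iterated twice). Empty or non-numeric counts
    become 0, matching the Known Limitations.
    """
    names = [(row.get("Name") or "").strip() for row in rows]
    dupes = sorted(n for n, c in Counter(names).items() if n and c > 1)
    if dupes:
        raise ValueError(f"expected one row per user, found duplicates: {dupes}")
    result: dict[str, int] = {}
    for name, row in zip(names, rows):
        if not name:
            continue
        try:
            result[name] = int((row.get("number_of_times_reported") or "").strip())
        except ValueError:
            result[name] = 0
    return result
```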

---

### 26. Reminders
- **Description**: Number of active reminders configured by participant
- **Source File**: `data/reminders.csv`
- **Source Field**: `User`, `Daily Reminder`, `Weekly Reminder`
- **Calculation**: Count of rows where:
  - `User` matches the participant
  - AND (`Daily Reminder` = "✔" OR `Weekly Reminder` = "✔")
- **Data Type**: Integer
- **Notes**: Only counts reminders that are actively enabled (marked with ✔ symbol)
- **Special Characters**: Uses the Unicode heavy check mark character ✔ (U+2714) to identify active reminders
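The reminder rule (count rows where either column holds ✔) might look like the following. This is a sketch with a hypothetical `count_active_reminders` helper. It assumes cell values are compared after stripping whitespace. With exact comparison, a ✔ followed by an emoji variation selector (U+FE0F) would not match, and neither would ✓ (U+2713). When reading the real file, it should be opened with a UTF-8 encoding so that the non-ASCII checkmark decodes correctly.

```python
CHECKMARK = "\u2714"  # ✔ HEAVY CHECK MARK; ✓ (U+2713) is intentionally not matched


def count_active_reminders(rows) -> dict[str, int]:
    """Count rows per User where Daily Reminder or Weekly Reminder is ✔."""
    counts: dict[str, int] = {}
    for row in rows:
        user = (row.get("User") or "").strip()
        daily = (row.get("Daily Reminder") or "").strip()
        weekly = (row.get("Weekly Reminder") or "").strip()
        if user and (daily == CHECKMARK or weekly == CHECKMARK):
            counts[user] = counts.get(user, 0) + 1
    return counts


rows = [
    {"User": "david", "Daily Reminder": "✔", "Weekly Reminder": ""},
    {"User": "david", "Daily Reminder": "", "Weekly Reminder": "✓"},
    {"User": "anna", "Daily Reminder": "✔", "Weekly Reminder": "✔"},
]
print(count_active_reminders(rows))  # {'david': 1, 'anna': 1}
```

A row with both columns checked still counts once, because the rule is an OR over the row, not a count of checkmarks.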

---

## Data Validation Checklist

Use this checklist to verify that the aggregated data matches the source files in the `data/` directory:

### Overall Validation
- [ ] Total unique participants in report matches count of unique `Name`/`User` values across all source files
- [ ] Each participant appears exactly once in the output report
- [ ] Participants are sorted alphabetically by `Name`
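The three overall checks can be automated against the rows of the generated `engagement_report_YYYYMMDD_HHMMSS.csv`, read with `csv.DictReader`. In this sketch, `check_report_rows` is hypothetical and `expected_names` would come from the source files. "Sorted alphabetically" is interpreted as Python's default string order, which is an assumption. A case-insensitive sort in the script would be flagged here.

```python
def check_report_rows(report_rows, expected_names) -> list[str]:
    """Return a list of problems found in the report's Name column (empty if OK)."""
    problems = []
    names = [row["Name"] for row in report_rows]
    if len(names) != len(set(names)):
        problems.append("some participants appear more than once")
    if set(names) != set(expected_names):
        problems.append("participant set differs from the source files")
    if names != sorted(names):
        problems.append("rows are not sorted by Name")
    return problems
```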

### Per-Column Validation

#### Simple Count Columns (one row = one count)
- [ ] **check-in**: Sum across all participants = total rows in data/check-in-data.csv
- [ ] **#times in app**: Sum across all participants = total rows in data/time-in-app.csv

#### Aggregated Count Columns (using number_of_times_reported)
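For each of the following source files, verify that the sum of the corresponding output columns across all participants equals the sum of `number_of_times_reported` in that file. For the status files, this assumes every row's `type` matches one of the documented values, since rows with any other `type` do not contribute to any column: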

- [ ] **data/intro-status.csv** → Sum of (Intro Welcome + Intro Selfcare + Intro Stim Use)
- [ ] **data/positive-events-status.csv** → Sum of (PosEvents GoodThings + PosEve Grat Journal + PosEve Meditation)
- [ ] **data/mindfulness-status.csv** → Sum of (Mindful Informal Mindfulness + Mindful Self Compassion + Mindful Meditation)
- [ ] **data/reappraisal-status.csv** → Sum of (Reappraisal Reappraisal + Reappraisal Meditation)
- [ ] **data/values-status.csv** → Sum of (Values Intro + Values Values + Values Strengths + Values Goals + Values Meditation)
- [ ] **data/kindness-status.csv** → Sum of (Kindness Kindness + Kindness Meditation)
- [ ] **data/videos-watched.csv** → Sum of Videos column
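The checklist above can be sketched as a loop over a file-to-columns table. `FILE_COLUMNS`, `read_rows`, `check_file_total`, and `verify_all` are hypothetical names. `read_rows` returns no rows for a missing file, which mirrors the documented "missing file shows 0" behavior. It opens files as `utf-8-sig` so that a leading byte-order mark, if present, does not end up in the first header name.

```python
import csv
from pathlib import Path

# Source file -> output columns whose values, summed over participants, should match it
FILE_COLUMNS = {
    "data/intro-status.csv": ["Intro Welcome", "Intro Selfcare", "Intro Stim Use"],
    "data/positive-events-status.csv": ["PosEvents GoodThings", "PosEve Grat Journal", "PosEve Meditation"],
    "data/mindfulness-status.csv": ["Mindful Informal Mindfulness", "Mindful Self Compassion", "Mindful Meditation"],
    "data/reappraisal-status.csv": ["Reappraisal Reappraisal", "Reappraisal Meditation"],
    "data/values-status.csv": ["Values Intro", "Values Values", "Values Strengths", "Values Goals", "Values Meditation"],
    "data/kindness-status.csv": ["Kindness Kindness", "Kindness Meditation"],
    "data/videos-watched.csv": ["Videos"],
}


def to_int(value) -> int:
    """Empty, missing, or non-numeric values count as 0."""
    try:
        return int(str(value).strip())
    except ValueError:
        return 0


def read_rows(path: str) -> list[dict[str, str]]:
    """Read a CSV into dict rows; a missing file yields no rows."""
    p = Path(path)
    if not p.exists():
        return []
    with p.open(newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def check_file_total(source_rows, report_rows, columns) -> bool:
    """True if the report's column totals equal the file's number_of_times_reported total."""
    source_total = sum(to_int(r.get("number_of_times_reported")) for r in source_rows)
    report_total = sum(to_int(r.get(c)) for r in report_rows for c in columns)
    return source_total == report_total


def verify_all(report_rows) -> dict[str, bool]:
    """Run check_file_total for every source file listed above."""
    return {
        path: check_file_total(read_rows(path), report_rows, columns)
        for path, columns in FILE_COLUMNS.items()
    }
```

Passing the report rows (for example `read_rows` on the generated report file) to `verify_all` yields one pass/fail flag per source file.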

#### Reminder Validation
- [ ] **Reminders**: Sum across all participants = count of rows in data/reminders.csv where `Daily Reminder` or `Weekly Reminder` = "✔"

### Sample Validation Query Examples

To manually verify data for a specific user (e.g., "david"):

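```bash
# These one-liners assume the participant identifier (`Name` or `User`) is the
# first column and that no field contains a quoted comma.

# Count check-ins for user "david"
grep -c "^david," data/check-in-data.csv

# Sum intro module completions for user "david"
# (finds the number_of_times_reported column by its header name; strips CRLF endings)
awk -F',' '{ sub(/\r$/, "") }
           NR == 1 { for (i = 1; i <= NF; i++) if ($i == "number_of_times_reported") col = i; next }
           $1 == "david" { sum += $col }
           END { print sum + 0 }' data/intro-status.csv

# Check videos watched for user "david" (assumes number_of_times_reported is column 2)
grep "^david," data/videos-watched.csv | cut -d',' -f2

# Count active reminders for user "david" (rows with a ✔ in either reminder column)
grep "^david," data/reminders.csv | grep -c "✔"
```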

---

## Known Limitations and Notes

1. **Case Sensitivity**: The script performs case-sensitive matching on the `Name`/`User` field. Ensure consistent capitalization across all source files.

2. **Whitespace**: Leading/trailing whitespace in `Name`/`User` fields is stripped during processing.

3. **Missing Files**: If a source CSV file is missing, all related columns will show `0` for all participants.

4. **Date Parsing**: The "CHECK-in Past 15 Days" calculation requires dates in `YYYY-MM-DD` format. Rows with dates in any other format are not counted.

5. **Empty Values**: If `number_of_times_reported` is empty or non-numeric, it is treated as `0`.

6. **Multiple Rows Per User**: For status files (intro, positive-events, mindfulness, etc.), multiple rows for the same user with the same type are summed together.

7. **Reminder Checkmarks**: The script specifically looks for the Unicode checkmark character (✔). Other symbols (✓, X, x) are not recognized as active reminders.
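Because matching is case-sensitive (item 1), identifiers such as `David` and `david` would produce two separate report rows. Before running the report, a quick diagnostic could list identifiers that differ only by case after whitespace stripping. This sketch only reports the collisions and merges nothing. `case_collisions` is a hypothetical helper, and `str.casefold` is used for the comparison.

```python
from collections import defaultdict


def case_collisions(names) -> dict[str, set[str]]:
    """Group identifiers that differ only by letter case (after stripping)."""
    groups: dict[str, set[str]] = defaultdict(set)
    for name in names:
        cleaned = name.strip()
        if cleaned:
            groups[cleaned.casefold()].add(cleaned)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}
```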

---

## File Version

- **Document Version**: 1.1
- **Last Updated**: February 6, 2026
- **Compatible with**: generate_engagement_report.py v1.1
- **Changes**: Updated all file paths to reflect new `data/` directory structure
