Add parse_array primitive for rule-level array deserialization

## Summary
Add a new `parse_array` primitive that converts serialized array values (primarily array-like strings read from CSV files) into typed Python lists before downstream operations such as `reduce`.

## Motivation
`Reduce` currently expects a list/tuple input. In file-based workflows (especially CSV), array-like values arrive as strings (e.g., `"[8,8,8,8,6]"`).

Today, users need custom pre-processing before harmonization. This is workable but not composable in rules and makes UI-driven pipelines harder to express. A dedicated parsing primitive keeps concerns explicit and reusable.

## Problem Statement
- Current `reduce` behavior is intentionally strict (`list`/`tuple` only).
- CSV and similar formats encode arrays as strings.
- Users need a rule-level way to parse arrays without changing `reduce` semantics.

## Proposal
Introduce a new primitive operation:

- `operation`: `parse_array`
- responsibility: parse scalar/list-like input into `list[Any]`
- intended chaining: `parse_array` -> `reduce`

Example chain:

```json
{
  "source": "week_hours",
  "target": "total_hours",
  "operations": [
    {
      "operation": "parse_array",
      "format": "json",
      "item_type": "integer",
      "strict": true
    },
    {
      "operation": "reduce",
      "reduction": "sum"
    }
  ]
}
```
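As a rough illustration of what this chain would compute, here is a minimal Python sketch. The function names `parse_array_json` and `reduce_sum` are hypothetical stand-ins, not the framework's actual classes, and the strict list/tuple check mirrors the `reduce` behavior described above.

```python
import json


def parse_array_json(value, item_type=int):
    """Hypothetical stand-in for parse_array with format='json'."""
    if isinstance(value, (list, tuple)):
        items = list(value)  # already a sequence: pass through as a list
    else:
        items = json.loads(value)
        if not isinstance(items, list):
            raise ValueError(f"expected a JSON array, got {value!r}")
    # Coerce every item to the requested type (here: integer).
    return [item_type(item) for item in items]


def reduce_sum(values):
    """Hypothetical stand-in for reduce with reduction='sum'."""
    if not isinstance(values, (list, tuple)):
        raise TypeError("reduce expects a list or tuple")
    return sum(values)


# The CSV cell arrives as a string; parse first, then reduce.
total_hours = reduce_sum(parse_array_json("[8,8,8,8,6]"))
print(total_hours)  # 38
```

Passing the raw string straight to `reduce_sum` raises `TypeError`, which is the gap the new primitive is meant to close.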

## Design Goals
1. Keep primitive responsibilities explicit (no implicit parsing inside `reduce`).
2. Make behavior deterministic and easy to reason about.
3. Provide strict-by-default error handling for data quality.
4. Support safe incremental extension for future parsing formats.

## Non-Goals (V1)
- General object parsing.
- Deep schema validation for nested arrays.
- Silent “best effort” coercion by default.

## V1 API / Serialization
Suggested serialized config:

```json
{
  "operation": "parse_array",
  "format": "json",
  "item_type": "auto",
  "strict": true
}
```

Optional keys:
- `default`: value returned when `strict=false` and parse fails (default: `null`)
- `allow_singleton`: when true, a scalar input is coerced and wrapped in a one-item list

Field semantics:
- `format`: parsing strategy (V1: `json` only)
- `item_type`: `auto | string | integer | float | boolean`
- `strict`: fail-fast behavior
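One way the serialized config could be validated on load is sketched below. The `ParseArrayConfig` class and its `from_serialization` method are hypothetical names for illustration. The defaults for `format` and `item_type` follow the suggested config above, and `allow_singleton=False` is an assumption, since the proposal does not state a default for it.

```python
from dataclasses import dataclass
from typing import Any

ALLOWED_FORMATS = {"json"}  # V1: json only
ALLOWED_ITEM_TYPES = {"auto", "string", "integer", "float", "boolean"}


@dataclass(frozen=True)
class ParseArrayConfig:
    """Hypothetical container for a serialized parse_array operation."""

    format: str = "json"
    item_type: str = "auto"
    strict: bool = True            # strict by default, per the design goals
    default: Any = None            # returned on non-strict parse failure
    allow_singleton: bool = False  # assumed default; not specified in the proposal

    def __post_init__(self):
        if self.format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported format: {self.format!r}")
        if self.item_type not in ALLOWED_ITEM_TYPES:
            raise ValueError(f"unsupported item_type: {self.item_type!r}")

    @classmethod
    def from_serialization(cls, payload: dict) -> "ParseArrayConfig":
        if payload.get("operation") != "parse_array":
            raise ValueError("payload is not a parse_array operation")
        # Unknown keys raise TypeError from the dataclass constructor.
        fields = {k: v for k, v in payload.items() if k != "operation"}
        return cls(**fields)


config = ParseArrayConfig.from_serialization(
    {"operation": "parse_array", "format": "json", "item_type": "auto", "strict": True}
)
```

Rejecting unknown formats and item types at load time keeps configuration errors separate from data errors raised later during parsing.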

## Input/Output Contract
Input handling:
- `list`/`tuple`: return list (idempotent)
- `str`: parse according to `format`
- other scalar types (when not wrapped via `allow_singleton`):
  - strict mode: error
  - non-strict mode: return `default`

Output:
- Always `list` (or `default` in non-strict failure path)

## Error Handling
- `strict=true`: raise a clear `ValueError` that includes the input value and the selected format.
- `strict=false`: return `default` and emit a warning or log message.
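The input/output contract and the two error paths could fit together roughly as follows. This is a sketch under stated assumptions, not the framework's implementation: the function signature and the logger name are hypothetical, item coercion and `allow_singleton` are left out for brevity, and warnings go through the standard `logging` module.

```python
import json
import logging

logger = logging.getLogger("parse_array_sketch")  # hypothetical logger name


def parse_array(value, *, format="json", strict=True, default=None):
    """Sketch of the proposed contract (item coercion omitted)."""
    try:
        if isinstance(value, (list, tuple)):
            return list(value)  # idempotent pass-through, normalized to list
        if not isinstance(value, str):
            raise ValueError("unsupported input type")
        if format != "json":
            raise ValueError("unsupported format")
        parsed = json.loads(value)  # json.JSONDecodeError subclasses ValueError
        if not isinstance(parsed, list):
            raise ValueError("JSON payload is not an array")
        return parsed
    except ValueError as exc:
        if strict:
            # Fail fast, naming both the offending value and the format.
            raise ValueError(
                f"parse_array failed for value {value!r} "
                f"with format {format!r}: {exc}"
            ) from exc
        logger.warning("parse_array fell back to default for %r: %s", value, exc)
        return default
```

With `strict=False`, malformed JSON, non-array JSON, and unsupported scalar inputs all return `default`, so callers see a single fallback path.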

## Why Not Put This in `Reduce`
Embedding string parsing in `reduce` would:
- conflate reduction and deserialization concerns,
- increase ambiguity (string syntax variants, malformed inputs),
- create inconsistent behavior relative to other primitives.

A dedicated parser primitive preserves composability and predictability.

## Extension Path
After V1 stabilizes, consider:
1. `format: delimiter` for values like `"8|8|8|8|6"` with configurable `delimiter`.
2. Optional `strip_items` behavior for string tokens.
3. Optional `python_literal` mode only if justified (higher complexity/risk).
4. Potential generic parsing family later (`parse_value`, `parse_object`) if needed.
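To make items 1 and 2 concrete, a possible `delimiter` format could look like the sketch below. The function name `parse_delimited` and its parameters are hypothetical; nothing here is part of V1.

```python
def parse_delimited(value: str, delimiter: str = "|", strip_items: bool = True) -> list:
    """Hypothetical sketch of a future format='delimiter' strategy."""
    if not delimiter:
        raise ValueError("delimiter must be a non-empty string")
    tokens = value.split(delimiter)
    if strip_items:
        # Remove surrounding whitespace from each token.
        tokens = [token.strip() for token in tokens]
    return tokens


print(parse_delimited("8|8|8|8|6"))  # ['8', '8', '8', '8', '6']
```

Tokens come back as strings, so `item_type` coercion would still apply afterwards. Note that `"".split("|")` returns `['']`, so the handling of empty input would need an explicit decision.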

## Suggested Implementation Areas
- New file: `src/harmonization_framework/primitives/parse_array.py`
- Register operation:
  - `src/harmonization_framework/primitives/vocabulary.py`
  - `src/harmonization_framework/primitives/__init__.py`
  - `src/harmonization_framework/harmonization_rule.py` (`from_serialization` dispatch)

## Test Plan
Add tests for:
1. JSON list parsing success.
2. Idempotent pass-through for list/tuple input.
3. Item coercion success/failure for each `item_type`.
4. Strict failure behavior.
5. Non-strict fallback behavior (`default`).
6. Invalid JSON / non-array JSON payloads.

## Open Questions
1. Should `parse_array` preserve tuple inputs as tuples in its output, or always normalize to `list`?
2. For `item_type=boolean`, what token vocabulary should be accepted (`true/false`, `1/0`, `yes/no`)?
3. Should warning output route through existing logging infrastructure rather than `print`?
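For question 2, an explicit token table matters because Python's built-in `bool()` treats any non-empty string as true, so `bool("false")` is `True`. The sketch below shows one possible answer. The accepted vocabulary, the case-insensitive matching, and the `coerce_boolean` name are all assumptions for discussion, not a decision.

```python
# Assumed vocabulary, for discussion only; matching is case-insensitive.
TRUE_TOKENS = {"true", "1", "yes"}
FALSE_TOKENS = {"false", "0", "no"}


def coerce_boolean(item):
    """Hypothetical item coercion for item_type='boolean'."""
    if isinstance(item, bool):
        return item
    token = str(item).strip().lower()
    if token in TRUE_TOKENS:
        return True
    if token in FALSE_TOKENS:
        return False
    # Reject anything outside the vocabulary instead of guessing.
    raise ValueError(f"cannot interpret {item!r} as boolean")


values = [coerce_boolean(item) for item in ["true", "No", 1, False]]
print(values)  # [True, False, True, False]
```

Whatever vocabulary is chosen, rejecting unknown tokens keeps the behavior consistent with the strict-by-default goal.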

