expressions: tree-general index deduction (Phase E) — inner-node general products #563
Open
evaleev wants to merge 3 commits into
…t target (Phase E) An index shared by the two children of a product is fused iff the node's target carries it, contracted otherwise; an index neither the sibling nor the target carries is consumed within the child subtree and not demanded of it. available_indices() (per-subtree leaf-annotation union, valid before init) supplies the up-pass; each child dictates the ORDER of its demand via preferred_layout() (canonical (fused, left-free, right-free) for products, pass-through elsewhere). Expressions consumed without a target (reductions) retain the bottom-up contraction convention.
…reaming re-permute A target that differs from the canonical (fused..., left-free..., right-free...) result layout cannot be folded into the batched tile op (BatchedContractReduce must be perm-free); evaluate canonically (Summa over a slab-replicated pmap) and re-permute to the target with a streaming UnaryEvalImpl. Honors the implicit-permute contract: when the consumer fuses the permutation into its own operation (transposed GEMM), only the tile ordinals/trange are remapped and contents stay canonical. Replaces the interleaved-target gate, enabling general products at inner expression-tree nodes and non-canonical root targets.
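The re-permute step can be pictured as a permutation from the canonical index order to the target order. The following is a minimal sketch only: `permutation_to_target`, `is_identity`, and `reorder_modes` are hypothetical helpers written for illustration and are not TiledArray's permutation API. It assumes both layouts list the same index labels.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Indices = std::vector<std::string>;

// Hypothetical helper: perm[i] is the position in `canonical` of the index
// that sits at position i of `target`.
std::vector<std::size_t> permutation_to_target(const Indices& canonical,
                                               const Indices& target) {
  assert(canonical.size() == target.size());
  std::vector<std::size_t> perm;
  perm.reserve(target.size());
  for (const auto& idx : target) {
    const auto it = std::find(canonical.begin(), canonical.end(), idx);
    assert(it != canonical.end());  // every target index must be produced
    perm.push_back(static_cast<std::size_t>(it - canonical.begin()));
  }
  return perm;
}

// True when the target already equals the canonical layout, i.e. no
// re-permute pass would be needed.
bool is_identity(const std::vector<std::size_t>& perm) {
  for (std::size_t i = 0; i < perm.size(); ++i)
    if (perm[i] != i) return false;
  return true;
}

// Reorders per-mode data (for example extents) from canonical to target order.
template <typename T>
std::vector<T> reorder_modes(const std::vector<std::size_t>& perm,
                             const std::vector<T>& canonical_values) {
  std::vector<T> out;
  out.reserve(perm.size());
  for (const std::size_t p : perm) out.push_back(canonical_values[p]);
  return out;
}
```

For a canonical layout `(r1, p, q)` and a target `(p, r1, q)`, the sketch yields the permutation `{1, 0, 2}`, so the canonical evaluation would be followed by a re-permute; an identity permutation means the target can be used as is.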
Stacked on #562. Completes the general-product story for arbitrary expression trees: general products (fused + contracted + free indices) no longer need to sit at the root of their own assignment.
What
- `MultEngine::init_indices(target)`: an index shared by a product's two children is fused iff the node's target carries it, and contracted at the node otherwise; an index that neither the sibling nor the target carries is consumed entirely within the child subtree and is dropped from its demand (pure intersection). The up-pass is the new `available_indices()` (per-subtree leaf-annotation union, valid before init); each child dictates the ORDER of its demand via `preferred_layout()` (canonical (fused, left-free, right-free) for products). Expressions consumed without a target (reductions) retain the bottom-up contraction convention and remain unsupported for inner general products.
- A target that differs from the canonical layout is evaluated canonically, and a streaming `UnaryEvalImpl` wrapper re-permutes to the consumer's layout. Honors TA's implicit-permute contract: when the consumer fuses the permutation into its transposed GEMM (`implicit_permute_outer`), only tile ordinals/trange are remapped and contents stay canonical. This also lifts the former root-level interleaved-target limitation.

With this, the THC reconstruction evaluates in ONE expression:
(left-deep tree; r1 is fused where the two x factors meet, contracted where z joins, and dropped from demands above), verified against the explicit-intermediates staging.
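The fused/contracted/dropped rule can be sketched as plain set logic. This is a minimal illustration under stated assumptions, not the `MultEngine` implementation: `classify`, `child_demand`, and `canonical_layout` are hypothetical helpers, and the index labels in the usage note are made up for the example rather than taken from the PR's expression.

```cpp
#include <algorithm>
#include <string>
#include <vector>

using Indices = std::vector<std::string>;

// Classification of the indices seen at one product node.
struct ProductIndices {
  Indices fused;       // shared by both children and carried by the target
  Indices contracted;  // shared by both children, absent from the target
  Indices left_free;   // left-only indices that the target carries
  Indices right_free;  // right-only indices that the target carries
};

static bool contains(const Indices& set, const std::string& idx) {
  return std::find(set.begin(), set.end(), idx) != set.end();
}

// Hypothetical helper: applies the stated rule to the indices available in
// the left and right subtrees, given the node's target.
ProductIndices classify(const Indices& left, const Indices& right,
                        const Indices& target) {
  ProductIndices out;
  for (const auto& idx : left) {
    const bool shared = contains(right, idx);
    const bool wanted = contains(target, idx);
    if (shared && wanted) out.fused.push_back(idx);
    else if (shared) out.contracted.push_back(idx);
    else if (wanted) out.left_free.push_back(idx);
    // otherwise: consumed inside the left subtree, not demanded of it
  }
  for (const auto& idx : right)
    if (!contains(left, idx) && contains(target, idx))
      out.right_free.push_back(idx);
  return out;
}

// Demand placed on one child: its own indices, in its own preferred order,
// restricted to those the sibling or the target carries.
Indices child_demand(const Indices& child_layout, const Indices& sibling,
                     const Indices& target) {
  Indices demand;
  for (const auto& idx : child_layout)
    if (contains(sibling, idx) || contains(target, idx)) demand.push_back(idx);
  return demand;
}

// Canonical (fused, left-free, right-free) result layout of a product node.
Indices canonical_layout(const ProductIndices& p) {
  Indices layout = p.fused;
  layout.insert(layout.end(), p.left_free.begin(), p.left_free.end());
  layout.insert(layout.end(), p.right_free.begin(), p.right_free.end());
  return layout;
}
```

With the made-up labels `left = (p, r1)`, `right = (q, r1)` and a target that still carries `r1`, `classify` reports `r1` as fused and the canonical layout is `(r1, p, q)`. One level up, with `left = (r1, p, q)`, `right = (r1, r2)` and target `(p, q, r2)`, `r1` is contracted and no longer appears in the result.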
Tests
- The `expression_general_product_inner_node_gated` throw test is superseded (the expression now evaluates).
- … `assign_subblock_block_base1` failures), tot_contraction/permutations: green.

Out of scope (follow-ups)
- `ScalMultEngine` (scaled products) keeps depth-1 semantics for inner general products.