
Fix TMEM lane address for debug mode across all SM100 kernels #341

Open
yejunjin wants to merge 1 commit into deepseek-ai:main from yejunjin:fix-tmem-lane-check


@yejunjin


Set explicit TMEM lane bits (22:16) under `__CUDACC_DEBUG__` to satisfy the hardware's strict Warp Tensor Memory Access Check enabled by `-G`. In release mode the hardware auto-routes lanes, so the fix is guarded to avoid unnecessary overhead.
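For readers unfamiliar with the address layout, the following is a minimal sketch of the kind of bit manipulation described above, not the code from this PR. It assumes a 32-bit TMEM address with the lane index in the upper 16 bits and the column in the lower 16 bits, and it assumes each warp's lane base is `32 * (warp_idx % 4)`. The names `set_tmem_lane`, `tmem_addr_for_warp`, and `kDebugLaneCheck` are hypothetical; in real kernels these helpers would also carry `__device__` qualifiers.

```cpp
#include <cstdint>

// Assumed layout: lane index lives in the upper half of the 32-bit TMEM
// address; bits 22:16 are enough to name lanes 0..127.
constexpr uint32_t kLaneShift = 16;
constexpr uint32_t kLaneMask  = 0x7Fu;  // 7 bits -> bits 22:16 after shifting

// Hypothetical helper: overwrite bits 22:16 of `tmem_addr` with `lane_base`,
// leaving the column bits (15:0) untouched.
constexpr uint32_t set_tmem_lane(uint32_t tmem_addr, uint32_t lane_base) {
    return (tmem_addr & ~(kLaneMask << kLaneShift)) |
           ((lane_base & kLaneMask) << kLaneShift);
}

// True only when compiling device code with -G, which defines
// __CUDACC_DEBUG__; false in release builds and on a plain host compiler.
constexpr bool kDebugLaneCheck =
#ifdef __CUDACC_DEBUG__
    true;
#else
    false;
#endif

// Hypothetical wrapper: in debug builds, encode the warp's lane base
// (assumed to be 32 * (warp_idx % 4)) into the address; otherwise return
// the address unchanged so release builds pay nothing extra.
constexpr uint32_t tmem_addr_for_warp(uint32_t tmem_addr, uint32_t warp_idx) {
    return kDebugLaneCheck
               ? set_tmem_lane(tmem_addr, (warp_idx % 4u) * 32u)
               : tmem_addr;
}
```

Under these assumptions, warp 1 reading column `0x40` would use address `0x00200040` in a `-G` build and plain `0x00000040` otherwise. Keeping the guard in a single `constexpr` flag means the release path compiles down to the original address with no extra instructions.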


Co-authored-by: Cursor <cursoragent@cursor.com>