[azure-ai-textanalytics] recognize_pii_entities: per-category recognition criteria (e.g., BRCPFNumber checksum) are undocumented #47486

> **Note on test data used below:** Neither CPF value in this report is a real person's identifier.
> - `123.456.789-00` has the CPF format but is mathematically invalid: it fails the public Brazilian check-digit algorithm, which yields check digits `09`, not `00`, for the base digits `123456789`. It is a non-issuable placeholder commonly used as obvious dummy data.
> - `998.214.865-68` passes the check-digit algorithm but is not a real person's number. It is the exact value Microsoft ships in this SDK's official sample [`sample_recognize_pii_entities.py`](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/textanalytics/azure-ai-textanalytics/samples/sample_recognize_pii_entities.py) (`"...Brazilian CPF number 998.214.865-68"`). I reuse it here only to keep the repro identical to the official sample.

- **Package Name:** azure-ai-textanalytics
- **Package Version:** latest
- **Operating System:** Windows
- **Python Version:** 3.x

## Describe the issue

The docstring and reference docs for [`TextAnalyticsClient.recognize_pii_entities`](https://learn.microsoft.com/en-us/python/api/azure-ai-textanalytics/azure.ai.textanalytics.textanalyticsclient#azure-ai-textanalytics-textanalyticsclient-recognize-pii-entities) and the `categories_filter` keyword describe the **request shape** (which categories to filter for) but give no indication that **detection criteria differ per category**. Some categories appear to match on format alone, while others appear to apply additional validation (e.g., a checksum) that is not documented.

This is more of a clarification request than a bug.

## To Reproduce

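```python
import os

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Assumed environment variable names; substitute your own resource's values.
endpoint = os.environ["AZURE_LANGUAGE_ENDPOINT"]
key = os.environ["AZURE_LANGUAGE_KEY"]

client = TextAnalyticsClient(endpoint, AzureKeyCredential(key))

# (A) syntactically valid CPF format, invalid check digits -> NOT detected
res_a = client.recognize_pii_entities(
    ["Entre em contato pelo CPF 123.456.789-00"], language="pt"
)
# (B) checksum-valid CPF (same value used in this SDK's official PII sample) -> detected as BRCPFNumber
res_b = client.recognize_pii_entities(
    ["Entre em contato pelo CPF 998.214.865-68"], language="pt"
)

# Print the (text, category) pairs detected in each case
for label, res in (("A", res_a), ("B", res_b)):
    doc = res[0]
    print(label, [] if doc.is_error else [(e.text, e.category) for e in doc.entities])
```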

No `categories_filter` is passed in either case, so the default detection set applies. (A) returns no `BRCPFNumber`; (B) does.
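
For reference, the difference between (A) and (B) is consistent with the public CPF check-digit rule (two mod-11 digits). The sketch below is my own local implementation of that rule, not code from the service or the SDK, and I have not confirmed that the service applies exactly this check. The function names are hypothetical.

```python
import re


def cpf_check_digits(base: str) -> str:
    """Return the two check digits for a 9-digit CPF base (public mod-11 rule)."""
    digits = [int(c) for c in base]
    for _ in range(2):
        # Weights count down to 2: 10..2 for the first digit, 11..2 for the second.
        weights = range(len(digits) + 1, 1, -1)
        remainder = sum(d * w for d, w in zip(digits, weights)) % 11
        digits.append(0 if remainder < 2 else 11 - remainder)
    return f"{digits[9]}{digits[10]}"


def cpf_is_checksum_valid(cpf: str) -> bool:
    """Check a CPF string (with or without punctuation) against the mod-11 rule."""
    digits = re.sub(r"\D", "", cpf)
    # Repeated-digit values such as 111.111.111-11 pass the arithmetic
    # but are conventionally rejected.
    if len(digits) != 11 or len(set(digits)) == 1:
        return False
    return cpf_check_digits(digits[:9]) == digits[9:]


print(cpf_is_checksum_valid("123.456.789-00"))  # case (A): False
print(cpf_is_checksum_valid("998.214.865-68"))  # case (B): True
```

If the service does apply a check along these lines, that would explain the observed results, but only documentation from the service team can confirm it.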

## Expected behavior

Either:

1. The SDK reference / conceptual doc points out that recognition criteria vary per category and may include semantic validation beyond format (and points to a service-level page that lists per-category criteria), **or**
2. The service-level doc page ([Recognized PII and PHI entities](https://learn.microsoft.com/en-us/azure/ai-services/language-service/personally-identifiable-information/concepts/entity-categories)) is updated and the SDK references it.

> Note: I attempted to file the documentation-side report directly against `MicrosoftDocs/azure-docs`, but that repository now has GitHub Issues disabled (`has_issues: false`), so this SDK issue is the only public channel available for the report. Routing the service-side fix internally would be appreciated.

## Why this matters for SDK users

Without this signal, developers writing tests with placeholder PII (a common pattern) get silent false negatives and have to reverse-engineer the detection criteria empirically. This was the path I took to discover the behavior.
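
As an illustration of the workaround this currently forces on test authors, here is a minimal sketch of a helper that completes a 9-digit base with check digits from the public mod-11 rule. The helper name is hypothetical, and it assumes the checksum is the only extra criterion, which is exactly what the docs do not say. A generated value could coincide with an issued number, so reusing the value from the official sample remains the safer choice.

```python
def make_placeholder_cpf(base: str) -> str:
    """Format a 9-digit base as a CPF string with computed mod-11 check digits."""
    if len(base) != 9 or not base.isdigit():
        raise ValueError("base must be exactly 9 digits")
    digits = [int(c) for c in base]
    for _ in range(2):
        # Weights count down to 2: 10..2 for the first digit, 11..2 for the second.
        weights = range(len(digits) + 1, 1, -1)
        remainder = sum(d * w for d, w in zip(digits, weights)) % 11
        digits.append(0 if remainder < 2 else 11 - remainder)
    s = "".join(str(d) for d in digits)
    return f"{s[:3]}.{s[3:6]}.{s[6:9]}-{s[9:]}"


# The base of the official sample value reproduces that value.
print(make_placeholder_cpf("998214865"))  # 998.214.865-68
```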

