Document Processing API

An event-driven ASP.NET Core backend that accepts document-analysis jobs through a Web API, stores job state in PostgreSQL, and processes jobs asynchronously using RabbitMQ and a background worker.

Overview

This portfolio project is built to demonstrate employable .NET backend skills beyond basic CRUD APIs, including:

asynchronous job processing
worker-based background execution
message-driven architecture
explicit job lifecycle management
outbox pattern
PostgreSQL persistence with EF Core
fully containerised application and infrastructure
layered architecture
automated testing
GitHub Actions CI
idempotent consumer design

In the current version, the API accepts text input rather than real file uploads. A client submits a document-processing job; the API marks the job as Queued and stores it together with an outbox message. The outbox publisher periodically polls for unpublished messages, processes them in batches, and publishes them to RabbitMQ. A worker then processes the job asynchronously, and the client can query the job's status and results at any time.

What This Project Demonstrates

This project is designed to show practical backend skills that map to real systems:

ASP.NET Core Web API design
EF Core + PostgreSQL persistence
RabbitMQ message publishing and consumption
Background worker processing
explicit domain state transitions
polling-based async job tracking
outbox pattern
clean separation of concerns across layers
fully containerised local development with Docker Compose
integration, domain, and worker unit testing
GitHub Actions CI
dead-letter queue handling
consumer-side retry tracking
idempotent consumer

Tech Stack

.NET 10
ASP.NET Core Web API
Worker Service
EF Core 10
PostgreSQL
RabbitMQ
Docker / Docker Compose
xUnit
GitHub Actions

Related Repositories

Document Processing UI — React frontend

Solution Structure

src/
  DocumentProcessing.Api
  DocumentProcessing.Application
  DocumentProcessing.Domain
  DocumentProcessing.Infrastructure
  DocumentProcessing.Worker

tests/
  DocumentProcessing.Api.Tests
  DocumentProcessing.Domain.Tests
  DocumentProcessing.Worker.Tests
  DocumentProcessing.E2E.Tests

Layer Responsibilities

DocumentProcessing.Api HTTP endpoints, request/response contracts, JSON configuration, and API wiring.
DocumentProcessing.Application Use cases, DTOs, service abstractions, repository abstractions, and messaging abstractions.
DocumentProcessing.Domain Core business model, job lifecycle rules, and domain invariants.
DocumentProcessing.Infrastructure EF Core persistence, repository implementations, RabbitMQ publisher, and infrastructure registration.
DocumentProcessing.Worker Background consumer that reads RabbitMQ messages, loads jobs from the database, performs analysis, and updates job state.

Job Lifecycle

The core aggregate is DocumentJob.

A job starts in Pending, is moved to Queued when the job and outbox message are persisted, then moves to Processing when the worker begins analysis. From Processing, the job can become either Completed or Failed.

There is also a dispatch-failure path from Queued to Failed. It is used when the outbox publisher still cannot publish the job message to RabbitMQ once the maximum retry count has been exceeded. In that case, the job fails before worker processing begins.

Transition rules:

Pending -> Queued
Queued -> Processing
Processing -> Completed
Processing -> Failed
Queued -> Failed (outbox dispatch failure)

For v1, Completed and Failed are treated as terminal states.
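
The transition rules above can be expressed as a small lookup table. The following is a minimal sketch, not the project's actual DocumentJob code: the names DocumentJobSketch, CanTransitionTo, and TransitionTo are hypothetical, and only the status changes are modelled.

```csharp
using System;
using System.Collections.Generic;

// Usage: walk a job through the happy path, then probe a terminal state.
var job = new DocumentJobSketch();
job.TransitionTo(JobStatus.Queued);
job.TransitionTo(JobStatus.Processing);
job.TransitionTo(JobStatus.Completed);
Console.WriteLine(job.Status);                                // Completed
Console.WriteLine(job.CanTransitionTo(JobStatus.Processing)); // False

public enum JobStatus { Pending, Queued, Processing, Completed, Failed }

// Hypothetical stand-in for the DocumentJob aggregate.
public sealed class DocumentJobSketch
{
    // Allowed transitions, copied from the rules listed in this section.
    private static readonly Dictionary<JobStatus, JobStatus[]> Allowed = new()
    {
        [JobStatus.Pending] = new[] { JobStatus.Queued },
        [JobStatus.Queued] = new[] { JobStatus.Processing, JobStatus.Failed },
        [JobStatus.Processing] = new[] { JobStatus.Completed, JobStatus.Failed },
        [JobStatus.Completed] = Array.Empty<JobStatus>(), // terminal
        [JobStatus.Failed] = Array.Empty<JobStatus>(),    // terminal
    };

    public JobStatus Status { get; private set; } = JobStatus.Pending;

    public bool CanTransitionTo(JobStatus next) =>
        Array.IndexOf(Allowed[Status], next) >= 0;

    public void TransitionTo(JobStatus next)
    {
        if (!CanTransitionTo(next))
            throw new InvalidOperationException($"Cannot move job from {Status} to {next}.");
        Status = next;
    }
}
```

Keeping the rules in one table means an invalid transition fails loudly in the domain layer instead of silently corrupting job state.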

API Endpoints

Create a job

POST /api/jobs

Accepts text input and returns a queued job response.

Get a job by id

GET /api/jobs/{id}

Returns the current status and any available analysis results.

List jobs

GET /api/jobs

Returns jobs ordered by SubmittedAtUtc descending.

Example Request

{
  "inputText": "This is a test document.\nIt has multiple lines.\n"
}

Example Response After Submission

{
  "id": "c6147881-a3a7-41fb-97f7-f96d12e62e58",
  "status": "Queued",
  "inputText": "This is a test document.\nIt has multiple lines.\n",
  "submittedAtUtc": "2026-04-15T05:30:40.209483Z",
  "updatedAtUtc": "2026-04-15T05:30:40.283789Z",
  "completedAtUtc": null,
  "errorMessage": null,
  "wordCount": null,
  "characterCount": null,
  "lineCount": null,
  "keywordHits": null,
  "category": null,
  "summary": null
}

Example Response After Processing

{
  "id": "c6147881-a3a7-41fb-97f7-f96d12e62e58",
  "status": "Completed",
  "inputText": "This is a test document.\nIt has multiple lines.\n",
  "submittedAtUtc": "2026-04-15T05:30:40.209483Z",
  "updatedAtUtc": "2026-04-15T05:30:40.796823Z",
  "completedAtUtc": "2026-04-15T05:30:40.796823Z",
  "errorMessage": null,
  "wordCount": 9,
  "characterCount": 48,
  "lineCount": 2,
  "keywordHits": 0,
  "category": "General",
  "summary": "This is a test document.\nIt has multiple lines.\n"
}

Processing Flow

1. Client submits a document job to POST /api/jobs. The job's initial state is Pending.
2. Application marks the job as Queued.
3. Application persists the job and the outbox message atomically in a single transaction.
4. Background outbox publisher periodically polls for unpublished outbox messages, publishes them to RabbitMQ, and marks them as published.
5. Worker consumes the message from RabbitMQ.
6. Worker loads the job from PostgreSQL.
7. Worker marks the job as Processing. In the current version, this intermediate state may not be visible to the client during fast processing or retry scenarios.
8. Worker performs simple text analysis.
9. Worker marks the job as Completed or Failed.
10. Client retrieves job status using GET /api/jobs/{id} or GET /api/jobs.
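
From the client's side, the last step is a polling loop. The sketch below is illustrative only: JobPolling and WaitForTerminalStatusAsync are hypothetical names, and the status source is passed in as a delegate so the example runs without a server. In real use, that delegate would call GET /api/jobs/{id} and read the status field from the response.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Usage with a fake status source that reports Completed on the third poll.
var calls = 0;
var finalStatus = await JobPolling.WaitForTerminalStatusAsync(
    () => Task.FromResult(++calls < 3 ? "Processing" : "Completed"),
    pollInterval: TimeSpan.FromMilliseconds(10),
    timeout: TimeSpan.FromSeconds(5));
Console.WriteLine($"{finalStatus} after {calls} polls"); // Completed after 3 polls

public static class JobPolling
{
    // Polls until the job reaches a terminal state (Completed or Failed) or the timeout elapses.
    public static async Task<string> WaitForTerminalStatusAsync(
        Func<Task<string>> getStatus,
        TimeSpan pollInterval,
        TimeSpan timeout,
        CancellationToken cancellationToken = default)
    {
        var deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            var status = await getStatus();
            if (status is "Completed" or "Failed")
                return status;

            if (DateTime.UtcNow >= deadline)
                throw new TimeoutException($"Job still in '{status}' after {timeout}.");

            await Task.Delay(pollInterval, cancellationToken);
        }
    }
}
```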

Analysis Output

The worker currently produces:

word count
character count
line count
keyword hit count
default category
truncated summary

Notes

Line counting ignores trailing newline characters.
keywordHits is currently a placeholder implementation.
category is currently a simple default value.
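
To make the counting rules concrete, here is a minimal sketch of an analysis function that reproduces the numbers in the example response (9 words, 48 characters, 2 lines). It is not the worker's actual implementation: TextAnalysisSketch, AnalysisResult, and the 200-character summary limit are assumptions, and treating empty input as zero lines is also an assumption.

```csharp
using System;

var result = TextAnalysisSketch.Analyze("This is a test document.\nIt has multiple lines.\n");
Console.WriteLine($"{result.WordCount} {result.CharacterCount} {result.LineCount}"); // 9 48 2

public sealed record AnalysisResult(int WordCount, int CharacterCount, int LineCount, string Summary);

public static class TextAnalysisSketch
{
    // Assumed limit; the real summary length is not stated in this README.
    private const int MaxSummaryLength = 200;

    private static readonly char[] Whitespace = { ' ', '\t', '\r', '\n' };

    public static AnalysisResult Analyze(string input)
    {
        ArgumentNullException.ThrowIfNull(input);

        // Words are runs of non-whitespace characters.
        var wordCount = input.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries).Length;

        // Character count includes whitespace and newline characters.
        var characterCount = input.Length;

        // Trailing newlines are ignored so "a\nb\n" counts as two lines, not three.
        var trimmed = input.TrimEnd('\r', '\n');
        var lineCount = trimmed.Length == 0 ? 0 : trimmed.Split('\n').Length;

        // The summary is the input itself, cut off at the assumed limit.
        var summary = input.Length <= MaxSummaryLength
            ? input
            : input.Substring(0, MaxSummaryLength);

        return new AnalysisResult(wordCount, characterCount, lineCount, summary);
    }
}
```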

Running Locally

Prerequisites

Docker Desktop

Start the full stack

docker compose up -d

This starts all services:

PostgreSQL
RabbitMQ
API (available at http://localhost:8080)
Worker

Database migrations are applied automatically on API startup.

Submit a job

curl -X POST http://localhost:8080/api/jobs \
  -H "Content-Type: application/json" \
  -d '{"inputText": "Hello from the fully containerised stack!"}'

Check job status

curl http://localhost:8080/api/jobs/{id}

RabbitMQ Management UI

Available at http://localhost:15672 using the credentials configured in compose.yaml.

API Documentation

Interactive API documentation is available via Scalar at http://localhost:8080/scalar/v1 when running locally.

Running locally without Docker (development)

If you prefer to run the API and worker directly on the host for faster iteration (PostgreSQL and RabbitMQ still run in Docker):

1. Start infrastructure only

docker compose up -d postgres rabbitmq

2. Apply database migrations

dotnet ef database update \
  --project src/DocumentProcessing.Infrastructure \
  --startup-project src/DocumentProcessing.Api

3. Run the API and worker

# Run each command in its own terminal
dotnet run --project src/DocumentProcessing.Api
dotnet run --project src/DocumentProcessing.Worker

Testing

Domain tests

DocumentJob creation with valid input
valid transitions
invalid transitions
guard clauses
completion result mapping
OutboxMessage creation and validation rules
outbox publication/error rules

API integration tests

create job returns an accepted response
get job by id returns the persisted job
get job by id returns 404 when missing
list jobs returns jobs ordered by submission time

API integration tests require a running PostgreSQL instance.
Start infrastructure with docker compose up -d postgres rabbitmq before running the full test suite locally.
Domain and worker unit tests run without any infrastructure.

Worker unit tests

document analysis returns expected counts for single-line input
document analysis returns expected counts for multiline input
trailing newline does not create an extra counted line
long input truncates summary correctly

E2E tests

full job lifecycle from submission to completion
full job lifecycle from submission to failure (skipped — see Known Limitation: Consumer Retry Queue Pattern)

E2E tests require the full stack to be running via docker compose up.

Running the test suite

# Unit and integration tests
dotnet test --filter "Category!=E2E"

# E2E tests (requires docker compose up)
dotnet test --filter "Category=E2E"

Continuous Integration

GitHub Actions CI is configured for this repository and runs on every push.

The workflow:

restores dependencies
builds the solution
provisions PostgreSQL and RabbitMQ service containers
applies EF Core migrations
runs unit and integration tests

E2E tests are excluded from CI and are intended to be run against a locally running stack.

Important Design Notes

Outbox Pattern

Without the outbox pattern, a failure between saving the job and publishing the message could leave a job stranded in Queued state indefinitely.

With the outbox pattern, implemented here with a unit of work, the job and its outbox message are persisted in one atomic database transaction. The outbox publisher then periodically polls the outbox for unpublished messages and processes them in batches.

A partial index on outbox_messages covering only unpublished messages ensures the publisher query stays fast as the table grows.
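
The poll-and-publish loop can be sketched with an in-memory store standing in for PostgreSQL. Everything here is hypothetical (InMemoryStore, OutboxMessageSketch, OutboxPublisherSketch, and their members); the real project uses EF Core and RabbitMQ, and the maximum-retry handling that moves a job to Failed is not modelled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var store = new InMemoryStore();
store.SaveJobWithOutbox(jobId: Guid.NewGuid(), payload: "job-1");
store.SaveJobWithOutbox(jobId: Guid.NewGuid(), payload: "job-2");

var published = new List<string>();
Console.WriteLine(OutboxPublisherSketch.PublishPending(store, published.Add, batchSize: 10)); // 2
Console.WriteLine(OutboxPublisherSketch.PublishPending(store, published.Add, batchSize: 10)); // 0

public sealed class OutboxMessageSketch
{
    public Guid JobId { get; init; }
    public string Payload { get; init; } = "";
    public DateTime? PublishedAtUtc { get; set; }
    public int Attempts { get; set; }
}

public sealed class InMemoryStore
{
    public List<Guid> Jobs { get; } = new();
    public List<OutboxMessageSketch> Outbox { get; } = new();

    // Stands in for one database transaction: the job and its outbox row are written together.
    public void SaveJobWithOutbox(Guid jobId, string payload)
    {
        Jobs.Add(jobId);
        Outbox.Add(new OutboxMessageSketch { JobId = jobId, Payload = payload });
    }
}

public static class OutboxPublisherSketch
{
    // One polling pass: take a batch of unpublished messages and try to publish each.
    public static int PublishPending(InMemoryStore store, Action<string> publish, int batchSize)
    {
        var batch = store.Outbox
            .Where(m => m.PublishedAtUtc is null)
            .Take(batchSize)
            .ToList();

        var publishedCount = 0;
        foreach (var message in batch)
        {
            try
            {
                publish(message.Payload);
                // A crash between publish and this line would cause a re-publish on the
                // next pass, which is why delivery is at-least-once.
                message.PublishedAtUtc = DateTime.UtcNow;
                publishedCount++;
            }
            catch (Exception)
            {
                // The message stays unpublished and is picked up by the next pass.
                message.Attempts++;
            }
        }
        return publishedCount;
    }
}
```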

Message delivery semantics

This project uses the outbox pattern to provide at-least-once message delivery between the API and worker process. When a document job is created, the job state change and the corresponding outbox message are persisted in the same database transaction. A separate publisher then reads unpublished outbox messages and publishes them to RabbitMQ.

Because at-least-once delivery can result in the same message being delivered more than once, the worker is designed as an idempotent consumer. Before processing a message, the worker loads the current job state and only performs work when the job is in a valid processable state. Messages for jobs that are already Completed or Failed are acknowledged without reprocessing the job.
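
The idempotency check comes down to a decision based on the job's current status. The sketch below uses hypothetical names (IdempotentConsumerSketch, MessageDecision) and assumes that Queued is the only processable state; the real worker may treat Pending or Processing differently.

```csharp
using System;

Console.WriteLine(IdempotentConsumerSketch.Decide(JobStatus.Queued));    // Process
Console.WriteLine(IdempotentConsumerSketch.Decide(JobStatus.Completed)); // AckDuplicate
Console.WriteLine(IdempotentConsumerSketch.Decide(null));                // DeadLetter

public enum JobStatus { Pending, Queued, Processing, Completed, Failed }

public enum MessageDecision { Process, AckDuplicate, DeadLetter }

public static class IdempotentConsumerSketch
{
    // status is null when no job exists for the id carried by the message.
    public static MessageDecision Decide(JobStatus? status) => status switch
    {
        null => MessageDecision.DeadLetter,
        JobStatus.Queued => MessageDecision.Process,
        // Already finished: this is a duplicate delivery, so acknowledge and do nothing.
        JobStatus.Completed or JobStatus.Failed => MessageDecision.AckDuplicate,
        // Assumption: any other state cannot be processed safely.
        _ => MessageDecision.DeadLetter,
    };
}
```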

This project intentionally provides neither at-most-once nor exactly-once delivery. At-most-once delivery would risk losing jobs, and exactly-once delivery would require additional distributed coordination that is outside the scope of this portfolio project.

Dead-Letter Queue

A Dead Letter Exchange (DLX) and Dead Letter Queue (DLQ) are implemented for handling messages that cannot be processed safely.

The main queue, document-processing.jobs, is configured with a dead-letter exchange:

DLX: document-processing.dlx
DLQ: document-processing.jobs.dlq
Dead-letter routing key: document-processing-key

When the worker determines that a message should not be retried, it negatively acknowledges the message with requeue: false. RabbitMQ then dead-letters the message to document-processing.dlx, which routes it to document-processing.jobs.dlq using the configured binding. The DLX, DLQ, and bindings are declared in rabbitmq/definitions.json.

Messages are dead-lettered for cases such as:

Invalid or empty ProcessDocumentJobMessage
Malformed x-death header
Non-existent DocumentJob
DocumentJob is in a state that cannot be safely processed or acknowledged as a duplicate

Producer / API
    │
    │ publish job message
    ▼
document-processing.jobs
    │
    │ consumed by Worker
    ▼
DocumentJobConsumer
    │
    ├── success
    │   └── message is acknowledged and removed from queue
    │       BasicAckAsync
    │
    ├── transient failure
    │   └── message is negatively acknowledged and requeued
    │       BasicNackAsync(requeue: true)
    │
    └── non-retryable failure / retry limit exceeded
        │
        │ message is negatively acknowledged without requeue
        │ BasicNackAsync(requeue: false)
        │
        │ RabbitMQ dead-letters the message using:
        │   exchange:    document-processing.dlx
        │   routing key: document-processing-key
        ▼
document-processing.dlx
    │
    │ direct exchange binding:
    │   routing key: document-processing-key
    ▼
document-processing.jobs.dlq
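
The three branches in the diagram map a processing outcome to a broker action. The following is a sketch of that mapping under stated assumptions: AckPolicySketch, ProcessingOutcome, and BrokerAction are hypothetical names, and how the attempt number is tracked is left out. In the worker, the three actions correspond to BasicAckAsync, BasicNackAsync(requeue: true), and BasicNackAsync(requeue: false).

```csharp
using System;

Console.WriteLine(AckPolicySketch.Choose(ProcessingOutcome.Success, attempt: 1, maxAttempts: 3));          // Ack
Console.WriteLine(AckPolicySketch.Choose(ProcessingOutcome.TransientFailure, attempt: 1, maxAttempts: 3)); // NackRequeue
Console.WriteLine(AckPolicySketch.Choose(ProcessingOutcome.TransientFailure, attempt: 3, maxAttempts: 3)); // NackDeadLetter

public enum ProcessingOutcome { Success, TransientFailure, NonRetryableFailure }

public enum BrokerAction { Ack, NackRequeue, NackDeadLetter }

public static class AckPolicySketch
{
    public static BrokerAction Choose(ProcessingOutcome outcome, int attempt, int maxAttempts) =>
        outcome switch
        {
            ProcessingOutcome.Success => BrokerAction.Ack,
            // Transient failures are requeued only while attempts remain.
            ProcessingOutcome.TransientFailure when attempt < maxAttempts => BrokerAction.NackRequeue,
            // Non-retryable failures and exhausted retries go to the dead-letter exchange.
            _ => BrokerAction.NackDeadLetter,
        };
}
```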

Known Limitation: Consumer Retry Queue Pattern

Currently, transient processing failures are handled with BasicNackAsync(requeue: true), which returns the message immediately to the main queue. This is simple, but it has two drawbacks:

a repeatedly failing message may be retried immediately and consume worker capacity
because the message is requeued rather than dead-lettered, RabbitMQ’s x-death header is not incremented for those retry attempts

The worker already contains early support for reading RabbitMQ’s x-death header, but the current retry behaviour does not yet make full use of it because failed messages are requeued directly.

A more robust solution would use a dedicated retry exchange and retry queue with a TTL. Failed messages would be dead-lettered into the retry queue, wait for the TTL to expire, and then be routed back to the main queue. This would provide delayed retries and better broker-level retry tracking. This is planned as a future improvement.
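
With a retry queue in place, the worker could read the retry count from the x-death header, which RabbitMQ populates as a list of tables with fields such as queue, reason, and count. The sketch below is an assumption-laden illustration, not the worker's existing code: XDeathSketch and GetDeathCount are hypothetical, the queue name document-processing.jobs.retry is invented for the example, and a header of the wrong shape is treated as malformed.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

var headers = new Dictionary<string, object>
{
    ["x-death"] = new List<object>
    {
        new Dictionary<string, object>
        {
            // Hypothetical retry queue name; string fields may arrive as byte[] from the client.
            ["queue"] = Encoding.UTF8.GetBytes("document-processing.jobs.retry"),
            ["reason"] = "expired",
            ["count"] = 2L,
        },
    },
};
Console.WriteLine(XDeathSketch.GetDeathCount(headers, "document-processing.jobs.retry"));                         // 2
Console.WriteLine(XDeathSketch.GetDeathCount(new Dictionary<string, object>(), "document-processing.jobs.retry")); // 0

public static class XDeathSketch
{
    // Returns the x-death count of the first entry recorded for the given queue, or 0 if none.
    public static long GetDeathCount(IDictionary<string, object> headers, string queueName)
    {
        if (!headers.TryGetValue("x-death", out var raw))
            return 0; // the message has never been dead-lettered

        if (raw is not IEnumerable<object> entries)
            throw new FormatException("x-death header is not a list.");

        foreach (var entry in entries)
        {
            if (entry is IDictionary<string, object> table
                && table.TryGetValue("queue", out var queue)
                && AsString(queue) == queueName
                && table.TryGetValue("count", out var count)
                && count is long n)
            {
                return n;
            }
        }
        return 0;
    }

    // Header strings can be plain strings or UTF-8 byte arrays depending on the source.
    private static string AsString(object value) => value switch
    {
        string s => s,
        byte[] bytes => Encoding.UTF8.GetString(bytes),
        _ => string.Empty,
    };
}
```

A caller could compare the returned count with a maximum retry limit and dead-letter the message once the limit is reached.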

Auto-migration on startup

The API automatically applies pending EF Core migrations on startup. This ensures the database schema is always up to date when running via Docker Compose without any manual steps.

Current Limitations

This version intentionally keeps scope tight:

text input only
no real file upload yet
no retry endpoint yet
no authentication/authorization yet
no pagination/filtering for job listing yet
no advanced text analytics yet

Future Improvements

Possible next steps:

JWT authentication / authorization
retry support for failed jobs
real file upload
richer keyword analysis and categorisation
pagination and filtering for job queries
React frontend for job submission and status tracking
dedicated retry exchange and queue with a TTL
dead-lettering messages that exceed the maximum retry count

Why This Project Exists

I'm transitioning into .NET backend development from a senior integration / telecom engineering background. This project is intended to demonstrate backend skills that map to real production systems: asynchronous workflows, messaging, persistence, background processing, lifecycle management, and operational thinking.

License

MIT
