Merged
24 changes: 24 additions & 0 deletions ADOPTERS.md
@@ -0,0 +1,24 @@
# OpenUBA Adopters

This file lists organizations and individuals using OpenUBA in production or evaluation.

If you are using OpenUBA, please submit a pull request to add your organization to this list.

## Production Users

| Organization | Use Case | Since |
|-------------|----------|-------|
| Georgia Cyber Warfare Range (GACWR) | Security operations training and threat detection | 2019 |

## Evaluation / Development

| Organization | Use Case | Since |
|-------------|----------|-------|
| *(Your organization here)* | | |

## Adding Your Organization

To add your organization, submit a PR editing this file with:
- Organization name
- Brief use case description
- Year you started using OpenUBA
37 changes: 37 additions & 0 deletions CODE_OF_CONDUCT.md
@@ -0,0 +1,37 @@
# Contributor Covenant Code of Conduct

## Our Pledge

We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to a positive environment:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior:
* The use of sexualized language or imagery, and sexual attention or advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information without explicit permission
* Other conduct which could reasonably be considered inappropriate in a professional setting

## Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.

## Scope

This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at info@gacwr.org. All complaints will be reviewed and investigated promptly and fairly.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org), version 2.1, available at https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
122 changes: 122 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,122 @@
# Contributing to OpenUBA

Thank you for your interest in contributing to OpenUBA! This document provides guidelines and information for contributors.

## Developer Certificate of Origin (DCO)

All contributions to OpenUBA must be signed off under the [Developer Certificate of Origin (DCO)](https://developercertificate.org/). By signing off, you certify that you wrote the contribution or otherwise have the right to submit it under the project's license.

Sign off your commits by adding `Signed-off-by` to your commit message:

```bash
git commit -s -m "Your commit message"
```

Or manually add to your commit message:

```
Signed-off-by: Your Name <your.email@example.com>
```
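
As an illustration of what the sign-off requirement means in practice, the sketch below checks a commit message for a `Signed-off-by` trailer. It is not part of OpenUBA's tooling; the function name `has_signoff` and the regular expression are assumptions made for this example, and a real DCO check in CI may apply stricter rules (for instance, matching the trailer against the commit author).

```python
import re

# Matches a trailer line such as:
#   Signed-off-by: Your Name <your.email@example.com>
# The pattern is an illustrative assumption, not the project's official check.
SIGNOFF_RE = re.compile(
    r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$",
    re.MULTILINE,
)


def has_signoff(commit_message: str) -> bool:
    """Return True if the message contains at least one Signed-off-by line."""
    return SIGNOFF_RE.search(commit_message) is not None


if __name__ == "__main__":
    message = "Fix typo\n\nSigned-off-by: Your Name <your.email@example.com>"
    print(has_signoff(message))
```

Running `git commit -s` produces a message that passes this check; a message without the trailer does not.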

## Getting Started

1. **Fork the repository** on GitHub
2. **Clone your fork** locally:
   ```bash
   git clone https://github.com/YOUR_USERNAME/OpenUBA.git
   cd OpenUBA
   ```
3. **Set up development environment:**
   ```bash
   make dev-hybrid # Backend local + infra in Kind cluster
   ```
4. **Create a branch** for your changes:
   ```bash
   git checkout -b feature/your-feature-name
   ```

## Development Setup

### Prerequisites

- Python 3.11+
- Docker & Docker Compose
- Kind (Kubernetes in Docker)
- Node.js 18+ (for frontend)
- Make

### Backend

```bash
pip install -r requirements.txt
python -m uvicorn core.api:app --reload --port 8000
```

### Frontend

```bash
cd interface
npm install
npm run dev
```

### Running Tests

```bash
make test # All tests
make test-backend # Backend only
make test-models # Model pipeline tests
```

## What to Contribute

### Good First Issues

Look for issues labeled `good first issue` in the GitHub issue tracker.

### Areas of Interest

- **New ML models** for the Model Library
- **Data source integrations** (new loaders beyond ES and Spark)
- **Documentation** improvements
- **Test coverage** expansion
- **Frontend UX** improvements
- **Kubernetes operator** development
- **CNCF integration** (Falco, OpenTelemetry, Prometheus)

## Pull Request Process

1. Ensure your code follows the project's coding style
2. Update documentation if your changes affect user-facing behavior
3. Add tests for new functionality
4. Ensure all tests pass (`make test`)
5. Sign off all commits (DCO)
6. Submit a pull request against the `master` branch
7. Describe your changes clearly in the PR description
8. Link to any related issues

## Code Review

- All PRs require at least one maintainer review
- CI must pass before merging
- Maintainers may request changes or improvements

## Reporting Bugs

- Use GitHub Issues to report bugs
- Include: steps to reproduce, expected behavior, actual behavior, environment details
- Check existing issues before creating a new one

## Requesting Features

- Open a GitHub Issue with the `enhancement` label
- Describe the use case and expected behavior
- Be open to discussion about implementation approach

## Code of Conduct

All participants in the OpenUBA community are expected to follow the [Code of Conduct](CODE_OF_CONDUCT.md).

## License

By contributing to OpenUBA, you agree that your contributions will be licensed under the project's license.
40 changes: 40 additions & 0 deletions GOVERNANCE.md
@@ -0,0 +1,40 @@
# OpenUBA Governance

## Overview

OpenUBA is an open-source project governed by its maintainers. The project aims for transparent, community-driven decision-making.

## Roles

### Users
Anyone who uses OpenUBA. Users may provide feedback, report bugs, and request features via GitHub Issues.

### Contributors
Anyone who contributes code, documentation, tests, or other improvements via pull requests. Contributors must sign off commits under the DCO (see CONTRIBUTING.md).

### Maintainers
Individuals with merge access to the repository. Maintainers review PRs, triage issues, and make technical decisions. They are listed in MAINTAINERS.md.

### Project Lead
The project lead provides overall direction, resolves disputes, and represents the project externally. Currently: Jovonni Pharr (@Jovonni).

## Decision Making

- **Routine changes** (bug fixes, docs, tests): Single maintainer approval
- **Significant changes** (new features, architecture): Two maintainer approvals or project lead approval
- **Breaking changes** (API changes, major refactors): Must be discussed in a GitHub Issue first and approved by the project lead

## Becoming a Maintainer

1. Demonstrate sustained, quality contributions over 3+ months
2. Be nominated by an existing maintainer
3. Be approved by the project lead
4. Be added to MAINTAINERS.md

## Code of Conduct

All participants must follow the [Code of Conduct](CODE_OF_CONDUCT.md).

## Changes to Governance

Changes to this governance model require approval from the project lead and must be proposed via a GitHub Issue for community discussion.
17 changes: 17 additions & 0 deletions MAINTAINERS.md
@@ -0,0 +1,17 @@
# OpenUBA Maintainers

This file lists the maintainers of the OpenUBA project.

## Active Maintainers

| Name | GitHub | Affiliation | Role |
|------|--------|-------------|------|
| Jovonni Pharr | [@Jovonni](https://github.com/Jovonni) | GACWR | Project Lead |

## Emeritus Maintainers

None at this time.

## Becoming a Maintainer

New maintainers are nominated by existing maintainers based on sustained, high-quality contributions to the project. See [GOVERNANCE.md](GOVERNANCE.md) for the full process.
113 changes: 113 additions & 0 deletions ROADMAP.md
@@ -0,0 +1,113 @@
# OpenUBA Roadmap

## Vision

OpenUBA aims to be the standard open-source User & Entity Behavior Analytics (UEBA) platform for cloud-native security operations.

## Current State (v0.0.2)

### Core Engine
- FastAPI backend with REST API (24 routers under `core/api_routers/`)
- Containerized model execution sandbox (Docker-based, `docker/model-runner/runner.py`)
- Dual data pipelines: Elasticsearch + Apache Spark
- 5 ML runtime images: sklearn, pytorch, tensorflow, networkx, base
- Next.js frontend with model management UI (`interface/`, Next.js 14)
- PostgreSQL storage (single-instance Deployment in `k8s/postgres.yaml`)
- Local model library with install/train/infer lifecycle (10+ reference models in `core/model_library/`)
- Kind cluster development environment (`configs/local.yaml`, `make` targets)

### Kubernetes-Native Infrastructure
- Custom Resource Definitions: UBATraining, UBAInference, UBAPipeline, UBAWorkspace (group `openuba.io`)
- Kopf-based operator (`core/operator/`) with workspace, training, inference, and pipeline handlers
- Operator deployment, RBAC, and service accounts (`k8s/operator-deployment.yaml`, `k8s/operator-rbac.yaml`)
- Full K8s manifest set (19 yaml files): backend, frontend, Elasticsearch, Spark, Postgres, PostGraphile, operator, ingress, namespace, PV/PVCs, secrets

### Model Registry & Ecosystem
- Multi-backend model registry with adapter pattern (`core/registry/registry_service.py`)
- 11 functional adapters split into code registries (local FS, GitHub, OpenUBA Hub) and weights registries (local FS, HuggingFace, Kubeflow), plus 5 legacy generic adapters
- Registry service with unit tests (`core/tests/test_registry/`)
- SHA-256 hashing scaffold (`core/hash.py`) — not yet wired to install-time integrity verification
- **Community model marketplace — OpenUBA Hub LIVE at https://openuba.org** (Next.js Model Hub in sibling repo `openuba-model-hub`, CNAME `openuba.org`, static catalog of reference models). Backend client adapter ships in `core/registry/adapters/openuba_hub_adapter.py`.

### Scheduling & Async
- Model scheduler service (`core/services/model_scheduler.py`, APScheduler or K8s CronJob mode)
- Schedules API router (`core/api_routers/schedules.py`)
- Async inference and training endpoints; operator dispatches `UBAInference` / `UBATraining` CRs

### GraphQL
- PostGraphile deployment (`k8s/postgraphile-deployment.yaml`) plus dev-mode local PostGraphile bootstrap (`core/graphql/postgraphile.py`)
- GraphQL endpoint exposed (smoke test only at `core/tests/test_graphql.py` — does not exercise a real query; query coverage planned in Phase 1)

### Workspaces
- Jupyter notebook workspaces with hardware tiers and NodePort allocation (`core/services/workspace_service.py`)
- Workspace CRD + Kopf operator handler (`core/operator/workspace_handler.py`)
- Python SDK installable as `openuba` v0.0.2 (`sdk/src/openuba/`)

### Testing
- Comprehensive E2E test suite: 24 flow tests, 5,636 LOC across `core/tests/e2e/` covering anomalies, cases, dashboards, datasets, display, experiments, features, jobs, model lifecycle, pipelines, navigation, rules, visualizations, workspaces, and JupyterLab SDK

### Visual Rule Builder (Rule Canvas)
- ReactFlow-based drag-and-drop rule editor (664 LOC, `flow-canvas.tsx`)
- Custom node types for detection logic (290 LOC, `flow-nodes.tsx`)
- Palette with draggable condition/action nodes
- Rule save, test, severity configuration
- Integrated with GraphQL model queries

### LLM Investigation Assistant
- Omnipresent chat window accessible from any page (559 LOC, `chat-window.tsx`)
- Multi-provider support: Ollama, OpenAI, Claude, Gemini (538-LOC `chat_service.py` with per-provider streaming)
- SSE streaming with thinking-block parsing
- Context-aware: injects current page/entity context into prompts
- Backend chat API with SSE streaming (`interface/app/api/chat/route.ts` proxying to `core/api_routers/chat.py`)
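
For readers unfamiliar with SSE streaming, the sketch below shows how a single Server-Sent Events frame is laid out on the wire. It is an illustration of the SSE format only, not the code in `chat_service.py`; the helper name `format_sse` and the `token` event name are assumptions made for this example.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Frame a payload as one Server-Sent Events message.

    Each line of the payload gets its own "data:" field, and a blank line
    terminates the event, as the SSE wire format requires.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    # "token" is a hypothetical event name chosen for this example.
    print(format_sse("partial answer", event="token"), end="")
```

A streaming endpoint would emit one such frame per chunk of model output, with the response served as `text/event-stream`.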

### Governance Framework
- CNCF-shaped governance files shipped: `GOVERNANCE.md`, `MAINTAINERS.md`, `CONTRIBUTING.md` (DCO required), `CODE_OF_CONDUCT.md` (Contributor Covenant 2.1), `SECURITY.md`, `ADOPTERS.md` (PR #137). Substantive, non-boilerplate — see Phase 4 note on demonstrating use of the framework.

## Known Gaps in Current State

The following items appear in Current State but are not fully delivered. Each entry cites the relevant code:

- Postgres migration is incomplete — `core/model.py`, `core/user.py`, `core/process.py` still call `ReadJSONFileFS` / `WriteJSONFileFS` on `core/storage/*.json` (`users.json`, `models.json`, `model_sessions.json`, `scheme.json`). Repos (`core/repositories/`) + migration script (`core/db/migrations/migrate_from_files.py`) exist; cutover in progress.
- `/metrics` endpoint at `core/api_routers/data.py:201` emits domain JSON (Spark/ES counters), not Prometheus exposition format. No `opentelemetry-*` or `prometheus_client` in `requirements.txt`; no OpenTelemetry SDK init in `core/`.
- `core/hash.py` `HashLargeFile` references undefined `filename` / `hashlib` and is not wired into any install-time integrity flow. Scaffold only.
- `core/tests/test_graphql.py` is a 24-LOC smoke test (GETs `/`, checks for an `endpoints` key) — does not exercise a GraphQL query, schema introspection, or PostGraphile.
- `core/registry/adapters/openuba_hub_adapter.py` now defaults to `https://openuba.org`, but the live Hub serves a static Next.js catalog rather than the `/ml/` JSON contract the adapter expects — Hub-side JSON endpoint needs publishing.
- `k8s/postgres.yaml` is a vanilla single-replica Deployment + PVC, not CloudNativePG. HA Postgres moved to Phase 1.
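
To make the `core/hash.py` gap concrete, here is a minimal sketch of chunked SHA-256 hashing with a checksum comparison, using only the standard library. It is not the project's `HashLargeFile`; the names `sha256_of_file` and `verify_checksum` and the 1 MiB chunk size are assumptions made for this example.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large models need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with an expected hex string, ignoring case."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

An install path could call a function like `verify_checksum` before unpacking a model and refuse to proceed on a mismatch; how the expected digest is published is left open here.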

## Phase 1: Production Hardening (Q3 2026)

- [ ] Helm chart packaging and publishing to Artifact Hub (`k8s/` is raw manifests today; no `Chart.yaml`, no `helm/` directory, no `helm` Makefile target)
- [ ] Horizontal pod autoscaling for Spark workers (`k8s/spark-deployment.yaml` hardcodes `replicas: 1`; no `autoscaling/v2` resources anywhere)
- [ ] Multi-tenant isolation (namespace-per-tenant + `tenant_id` across tables / repositories / RBAC; today there is one `openuba` namespace and zero tenant-scoped code)
- [ ] Production-grade observability — Prometheus exposition-format `/metrics` + OpenTelemetry SDK self-instrumentation (current: domain-JSON metrics endpoint only)
- [ ] Complete the JSON-file → Postgres cutover (remove `ReadJSONFileFS` / `WriteJSONFileFS` callers in `core/model.py`, `core/user.py`, `core/process.py`; make migration mandatory on startup)
- [ ] Migrate Postgres deployment to CloudNativePG operator (HA `Cluster` CR, automated failover, scheduled backups) — moved here from Current State per audit
- [ ] GraphQL query-level test coverage (replace smoke test with real query / mutation / schema-introspection suite against PostGraphile)
- [ ] Wire SHA-256 model integrity check into install path (fix `core/hash.py:HashLargeFile`, gate `model_installer` on checksum)
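
The observability item above distinguishes domain JSON from the Prometheus exposition format. The sketch below renders counters in that text format by hand, purely to show its shape. The function name `render_prometheus` and the metric names are hypothetical, and a production implementation would more likely rely on the `prometheus_client` library than on hand-built strings.

```python
from typing import Dict, Tuple


def render_prometheus(counters: Dict[str, Tuple[str, float]]) -> str:
    """Render {metric_name: (help_text, value)} as Prometheus text format."""
    lines = []
    for name, (help_text, value) in sorted(counters.items()):
        # HELP text must escape backslashes and newlines.
        escaped = help_text.replace("\\", "\\\\").replace("\n", "\\n")
        lines.append(f"# HELP {name} {escaped}")
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Metric names below are illustrative assumptions, not OpenUBA's.
    print(
        render_prometheus(
            {
                "openuba_es_queries_total": ("Elasticsearch queries issued.", 12),
                "openuba_spark_jobs_total": ("Spark jobs submitted.", 3),
            }
        ),
        end="",
    )
```

A `/metrics` endpoint serving this output would typically use the content type `text/plain; version=0.0.4`, in contrast with the JSON body returned today.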

## Phase 2: CNCF Integration (Q4 2026)

- [ ] Falco integration — consume runtime security events as behavioral data source (no Falco consumer code today; aspirational mentions only)
- [ ] OpenTelemetry ingest — receive OTLP traces and logs as behavioral signals (distinct from Phase 1 emit; no OTLP receiver or collector config in repo)
- [ ] OPA / Kyverno policy trigger — output risk scores in OPA-input JSON shape with example `ClusterPolicy` wiring
- [ ] SPIFFE / SPIRE workload identity for inter-service authentication (current inter-service auth is JWT/bearer via `python-jose`)
- [ ] CNCF Landscape listing (PR to `cncf/landscape` under Security & Compliance)
- [ ] TAG Security presentation and feedback incorporation
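
As a rough sketch of the OPA item above, the example below wraps a risk score in the `{"input": ...}` envelope that OPA's REST Data API accepts in a request body. The field names (`entity`, `risk_score`, `model`), the 0 to 1 score range, and the function name `to_opa_input` are assumptions made for this example; the roadmap does not yet define the actual shape.

```python
import json


def to_opa_input(entity_id: str, risk_score: float, model: str) -> str:
    """Serialize a risk score as an OPA query body of the form {"input": {...}}."""
    # The 0..1 range is an assumption made for this sketch.
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    payload = {
        "input": {
            "entity": entity_id,       # hypothetical field name
            "risk_score": risk_score,  # hypothetical field name
            "model": model,            # hypothetical field name
        }
    }
    return json.dumps(payload, sort_keys=True)


if __name__ == "__main__":
    print(to_opa_input("user-1234", 0.87, "example-model"))
```

A policy could then reference values such as `input.risk_score` when deciding whether to allow or deny an action.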

## Phase 3: Community & Scale (Q1 2027)

- [ ] Performance benchmarks published (no `bench/`, `benchmarks/`, or `docs/performance/` today; README has no numeric throughput / latency / scale claims)
- [ ] Contributor diversity (multiple organizations) — **the longest pole**; see governance note in Phase 4. 12-month commit history shows one human author + dependabot; `MAINTAINERS.md` lists 1 maintainer.

*(The "Community model marketplace (OpenUBA Hub public instance)" item previously listed here has moved to Current State: the Hub is live at https://openuba.org. The remaining adapter work, publishing the Hub-side `/ml/` JSON endpoint that the adapter expects, is tracked in Known Gaps.)*

## Phase 4: Incubation Readiness (Q2 2027)

- [ ] Production deployments documented in `ADOPTERS.md` — recruit 3+ independent adopters (file exists with 1 entry: the project's host org)
- [ ] Independent security audit (engage e.g. OSTIF / Trail of Bits / CNCF-sponsored; publish report in `docs/audit/`)
- [ ] Comprehensive documentation review — docs site (no `mkdocs.yml` / Docusaurus / Sphinx today), API reference, operator runbook, tutorial series; reconcile Python version drift between `docs/ARCHITECTURE.md` (3.9) and `CONTRIBUTING.md` (3.11+)
- [ ] Governance maturity demonstration — see note below

### Note on governance maturity (longest pole)

The governance framework is **shipped**: `GOVERNANCE.md`, `MAINTAINERS.md`, `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, `SECURITY.md`, `ADOPTERS.md` (PR #137). What is missing is *demonstration of using it*: the 2-maintainer approval path defined in `GOVERNANCE.md` is currently inoperable with 1 maintainer, `ADOPTERS.md` has 1 entry (the host org), there are no public TSC meeting notes, and no governance-tagged decisions are on record. These gaps close only through sustained outreach to external contributors and adopters; that is, they bottleneck on Phase 3's contributor-diversity item, which is the steepest CNCF Sandbox → Incubation gate this project faces.