ImRoodyDev/grabit-engine

Grabit Engine


A simple, plugin-based engine for scraping media streams and subtitles. Load provider plugins from GitHub, local files, or directly in code — with health tracking, auto-updates, caching, and more built right in. Works in Node.js, browsers, React and React Native.


✨ Features

Core

  • 🔌 Plugin system — add or remove providers anytime
  • 🌍 Runs anywhere — Node.js, browsers, React Native
  • 🎯 Pick a provider — scrape from one specific provider by its scheme
  • Run in parallel — scrape from multiple providers at the same time
  • 🏁 Stop early — quit as soon as enough providers have responded
  • ⏱️ Timeouts — never wait forever for a slow provider

Reliability

  • 📊 Health tracking — see how each provider is doing (errors, successes)
  • 🔴 Auto-disable — bad providers get turned off on their own
  • 🔄 Auto-update — remote providers refresh themselves on a timer
  • 💾 Built-in cache — save results in memory so you don't repeat work
  • 🔁 Retries — automatically retry failed providers
  • Validation — checks that plugins are set up correctly before loading

📦 Installation

npm install grabit-engine

Optional: Puppeteer support (Node.js only)

npm install puppeteer-real-browser

Puppeteer is an optional peer dependency for providers that need headless browser automation.

Optional: base64 polyfill (React Native)

npm install base-64

React Native versions below 0.74 do not expose atob / btoa as globals. This library automatically polyfills them when it detects they are missing, using the base-64 package as an optional peer dependency.

If you are targeting React Native, install base-64 alongside this package. On Node.js and modern browsers the built-in atob / btoa are used and no extra package is needed.
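The detection described above can be pictured with a small sketch. This is not the library's actual code: ensureBase64Globals, Base64Impl and Base64Host are hypothetical names, and the helper only illustrates the "install only when missing" idea. In React Native you would presumably pass the encode / decode functions exported by the base-64 package.

```typescript
// Hypothetical sketch: install atob/btoa on a host object only when they are missing.
interface Base64Impl {
	encode(input: string): string;
	decode(input: string): string;
}

interface Base64Host {
	atob?: (data: string) => string;
	btoa?: (data: string) => string;
}

function ensureBase64Globals(impl: Base64Impl, host: Base64Host = globalThis as unknown as Base64Host): boolean {
	let patched = false;
	if (typeof host.atob !== "function") {
		// atob decodes base64 text, so it maps to the implementation's decode
		host.atob = (data) => impl.decode(data);
		patched = true;
	}
	if (typeof host.btoa !== "function") {
		// btoa encodes to base64 text, so it maps to the implementation's encode
		host.btoa = (data) => impl.encode(data);
		patched = true;
	}
	// true when at least one function had to be installed
	return patched;
}
```

Taking the host as a parameter keeps the helper easy to exercise without touching the real globals.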



🚀 Quick Start

import { ScrapePluginManager } from "grabit-engine";

// Create the manager with a registry source (simplest approach)
const manager = await ScrapePluginManager.create({
	source: {
		type: "registry",
		name: "my-providers",
		providers: {
			"my-provider": myProviderModule
		}
	},
	tmdbApiKeys: ["your-tmdb-api-key"]
});

// Scrape streams for a movie — minimal: only tmdbId is required!
// TMDB service auto-fills title, year, duration, imdbId, etc.
// Or provide full media data — TMDB only fills what's missing
const streams = await manager.getStreams({
	media: {
		type: "movie",
		tmdbId: "27205"
		// imdbId: "tt1375666"
		// title: "Inception",
		// duration: 148,
		// releaseYear: 2010,
	},
	targetLanguageISO: "en"
});

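// Scrape from a specific provider by scheme.
// `request` is assumed here to be a request object shaped like the one passed to getStreams above.
const targeted = await manager.getStreamsByScheme("my-provider", request);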


🔗 Provider Sources

The manager can load plugins from three places:

| Source | Runtime | Description | Auto-Update |
| --- | --- | --- | --- |
| github | All | Download providers from a GitHub repo | |
| local | All | Load providers from files on your machine | |
| registry | All | Pass provider modules directly in code; no file I/O needed | |

GitHub Source

const manager = await ScrapePluginManager.create({
	source: {
		type: "github",
		url: "https://github.com/your-org/your-providers",
		branch: "main",
		rootDir: "dist", // optional, subdirectory containing manifest.json and providers (default: repo root)
		token: process.env.GITHUB_TOKEN, // optional, for private repos
		// Required in browser / React Native:
		moduleResolver: async (scheme, sourceCode) => {
			const exports = {};
			const module = { exports };
			new Function("module", "exports", sourceCode)(module, exports);
			return (module.exports as any).default ?? module.exports;
		}
	}
});
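To see what the moduleResolver above does with the downloaded source text, here is a stand-alone sketch of the same evaluation step. evaluateCommonJs and the sample source string are made up for illustration; only the new Function(...) pattern comes from the example above. Note that runtimes which forbid dynamic code evaluation (for example a page with a strict Content Security Policy) will reject this approach.

```typescript
// Evaluates CommonJS-style source text and returns its default export,
// falling back to module.exports when there is no default.
function evaluateCommonJs(sourceCode: string): unknown {
	const exports: Record<string, unknown> = {};
	const module: { exports: Record<string, unknown> } = { exports };
	// The source sees `module` and `exports` as ordinary function parameters.
	new Function("module", "exports", sourceCode)(module, exports);
	return module.exports["default"] ?? module.exports;
}

// Hypothetical provider source, standing in for a fetched index.js
const sampleSource = 'exports.default = { scheme: "demo-provider" };';
const loadedModule = evaluateCommonJs(sampleSource) as { scheme: string };
console.log(loadedModule.scheme);
```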

Repository structure

Your GitHub repo must contain a manifest.json. By default it's expected at the repo root, but you can set rootDir to point to a subdirectory:

your-providers/              # rootDir not set (default: repo root)
├── manifest.json
└── providers/
    ├── example-provider/
    │   └── index.js
    └── another-provider/
        └── index.js

your-providers/              # rootDir: "dist"
├── dist/
│   ├── manifest.json
│   └── providers/
│       ├── example-provider/
│       │   └── index.js
│       └── another-provider/
│           └── index.js
└── src/
    └── ...

manifest.json

{
	"name": "my-providers",
	"author": "your-name",
	"providers": {
		"example-provider": {
			"name": "ExampleProvider",
			"version": "1.0.0",
			"active": true,
			"language": "en",
			"type": "media",
			"env": "universal",
			"supportedMediaTypes": ["movie", "serie"],
			"priority": 10,
			"dir": "providers"
		}
	}
}

Local Source

const manager = await ScrapePluginManager.create({
	source: {
		type: "local",
		manifest: require("./manifest.json"),
		rootDir: "./providers",
		resolve: (path) => require(path)
	}
});

Registry Source

import exampleProvider from "./providers/example-provider";

const manager = await ScrapePluginManager.create({
	source: {
		type: "registry",
		name: "my-providers",
		providers: {
			"example-provider": exampleProvider
		}
	}
});


🔧 Creating a Provider Plugin

The fastest way to create a new provider is with the built-in CLI:

npx create-provider my-cool-provider

You can specify the language(s) upfront with --lang. Pass a comma-separated list for multiple languages:

# Single language (default: "en")
npx create-provider my-cool-provider --lang fr

# Multiple languages
npx create-provider my-cool-provider --lang en,fr,es

If no scheme is provided, the CLI enters interactive mode and prompts you for it:

npx create-provider

Once your provider is ready, bundle it for distribution with npx bundle-provider. See Bundling Providers for all available flags (--src, --out, --dry-run, --clean).

The create-provider command generates a ready-to-edit folder:

providers/
└── my-cool-provider/
    ├── index.ts      ← entry point (exports the module)
    ├── config.ts     ← provider settings (URL, endpoints, etc.)
    ├── stream.ts     ← stream scraping logic
    └── subtitle.ts   ← subtitle scraping logic (optional)

You can also create the files by hand. Here's what each file looks like:

config.ts — Provider Configuration

import { ProviderConfig } from "grabit-engine";

export const config: ProviderConfig = {
	scheme: "example-provider",
	name: "ExampleProvider",
	language: "en", // or ["en", "fr"] for multi-language providers
	baseUrl: "https://example-streams.com",
	entries: {
		movie: { endpoint: "/embed/movie?tmdb={id:string}" },
		serie: { endpoint: "/embed/tv?tmdb={id:string}&season={season:1}&episode={episode:1}" }
	},
	mediaIds: ["tmdb", "imdb"]
};

stream.ts — Stream Handler

import { ScrapeRequester, InternalMediaSource, ProviderContext } from "grabit-engine";
import { Provider } from "grabit-engine/models/provider";
import { config } from "./config";

export async function getStreams(requester: ScrapeRequester, ctx: ProviderContext): Promise<InternalMediaSource[]> {
	const provider = Provider.create(config);
	const url = provider.createResourceURL(requester);

	ctx.log.info(`Fetching streams from ${url.href}`);

	const { $, response } = await ctx.cheerio.load(url, requester, ctx.xhr);
	const src = $("video > source").attr("src");

	if (!src) return [];

	return [
		{
			fileName: "video.mp4",
			format: "mp4",
			language: "en",
			playlist: src,
			xhr: { haveCorsPolicy: false, headers: {} }
		}
	];
}

subtitle.ts — Subtitle Handler

import { ScrapeRequester, InternalSubtitleSource, ProviderContext } from "grabit-engine";
import { Provider } from "grabit-engine/models/provider";
import { config } from "./config";

export async function getSubtitles(requester: ScrapeRequester, ctx: ProviderContext): Promise<InternalSubtitleSource[]> {
	const provider = Provider.create(config);
	const url = provider.createResourceURL(requester);

	ctx.log.info(`Fetching subtitles from ${url.href}`);

	const apiUrl = new URL(`/api/subtitles?id=${url.searchParams.get("tmdb")}`, url.origin);
	const response = await ctx.xhr.fetch(apiUrl, {}, requester);
	const data = await response.json();

	return data.map((sub: any) => ({
		fileName: "subtitles.srt",
		format: "srt" as const,
		language: sub.language,
		languageName: sub.languageName,
		url: sub.url,
		xhr: { haveCorsPolicy: false, headers: {} }
	}));
}

index.ts — Entry Point

import { defineProviderModule } from "grabit-engine/controllers/provider";
import { Provider } from "grabit-engine/models/provider";
import { config } from "./config";
import { getStreams } from "./stream";
import { getSubtitles } from "./subtitle";

const provider = Provider.create(config);

export default defineProviderModule(
	provider,
	{
		name: config.name,
		version: "1.0.0",
		active: true,
		env: "universal",
		type: "media",
		supportedMediaTypes: ["movie", "serie"],
		priority: 10,
		dir: "providers"
	},
	{ getStreams, getSubtitles }
);

Multi-Language Providers

The language field on both ProviderConfig and ProviderModuleManifest accepts a single string or an array of strings. This lets you declare that a provider serves content in multiple languages.

CLI

# Single language (default)
npx create-provider my-provider --lang en

# Multiple languages
npx create-provider my-provider --lang en,fr,es

Config

// Single language
export const config: ProviderConfig = {
	scheme: "single-lang",
	name: "SingleLang",
	language: "en"
	// ...
};

// Multi-language
export const config: ProviderConfig = {
	scheme: "multi-lang",
	name: "MultiLang",
	language: ["en", "fr", "es"]
	// ...
};

Manifest (manifest.json)

{
	"providers": {
		"my-provider": {
			"name": "MyProvider",
			"version": "1.0.0",
			"active": true,
			"language": ["en", "fr", "es"]
			// ...
		}
	}
}

When the manager sorts providers for a request, providers whose language field includes the requester's targetLanguageISO are prioritized higher.
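As a rough illustration of that ordering rule, the sketch below moves language-matching providers to the front while leaving everything else in its original order. It is not the manager's real sorting code: ProviderEntry, supportsLanguage and sortByLanguageMatch are hypothetical names, the case-insensitive comparison is an assumption, and how the priority field interacts with language matching is not modelled here.

```typescript
type LanguageField = string | string[];

interface ProviderEntry {
	scheme: string;
	language: LanguageField;
}

// Accepts both forms of the language field: a single string or an array.
function supportsLanguage(language: LanguageField, targetISO: string): boolean {
	const list = Array.isArray(language) ? language : [language];
	return list.some((code) => code.toLowerCase() === targetISO.toLowerCase());
}

// Providers that match the target language come first.
// Array.prototype.sort is stable, so ties keep their original relative order.
function sortByLanguageMatch(entries: ProviderEntry[], targetISO: string): ProviderEntry[] {
	return [...entries].sort(
		(a, b) => Number(supportsLanguage(b.language, targetISO)) - Number(supportsLanguage(a.language, targetISO))
	);
}

const sortedEntries = sortByLanguageMatch(
	[
		{ scheme: "french-only", language: "fr" },
		{ scheme: "multi-lang", language: ["en", "fr", "es"] },
		{ scheme: "english-only", language: "en" }
	],
	"en"
);
console.log(sortedEntries.map((entry) => entry.scheme));
// → ["multi-lang", "english-only", "french-only"]
```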



Bundling Providers

When providers are loaded from GitHub (via GithubService), each provider is fetched as a single index.js file and loaded via dynamic import() in an isolated temp directory. That directory has no node_modules and no sibling files — so relative imports (./config) and package imports (grabit-engine) would fail.

The bundler solves this by compiling each provider into a standalone, self-contained ES module with zero external imports.

Install esbuild

npm install --save-dev esbuild

Bundle all providers

npx bundle-provider

Bundle a specific provider

npx bundle-provider my-cool-provider

Folder structure

Providers can be organized flat or grouped inside subdirectories:

providers/
├── english/                    ← group folder (no index.ts)
│   ├── vidsrc/                 ← provider → scheme "english/vidsrc"
│   │   ├── index.ts
│   │   ├── config.ts
│   │   ├── stream.ts
│   │   └── subtitle.ts
│   └── another/                ← provider → scheme "english/another"
│       └── index.ts ...
├── loodvidrsc/                 ← provider → scheme "loodvidrsc"
│   ├── index.ts
│   └── ...
└── manifest.json

The bundler recursively walks the source directory. Folders with index.ts are providers; folders without are group organizers.
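The walk can be sketched over an in-memory directory tree. This is only a model of the rule stated above, not the bundler's source: DirNode and discoverSchemes are hypothetical, and the sketch assumes a provider folder is a leaf (it does not look for nested providers inside a folder that already has index.ts).

```typescript
// Minimal in-memory stand-in for a directory listing.
interface DirNode {
	name: string;
	files: string[];
	dirs: DirNode[];
}

// Returns scheme paths: folders with index.ts are providers,
// folders without it are treated as groups and walked recursively.
function discoverSchemes(root: DirNode, prefix = ""): string[] {
	const schemes: string[] = [];
	for (const dir of root.dirs) {
		const path = prefix ? `${prefix}/${dir.name}` : dir.name;
		if (dir.files.includes("index.ts")) {
			schemes.push(path);
		} else {
			schemes.push(...discoverSchemes(dir, path));
		}
	}
	return schemes;
}

// Mirrors the folder structure shown above.
const providersTree: DirNode = {
	name: "providers",
	files: ["manifest.json"],
	dirs: [
		{
			name: "english",
			files: [],
			dirs: [
				{ name: "vidsrc", files: ["index.ts", "config.ts", "stream.ts", "subtitle.ts"], dirs: [] },
				{ name: "another", files: ["index.ts"], dirs: [] }
			]
		},
		{ name: "loodvidrsc", files: ["index.ts"], dirs: [] }
	]
};

console.log(discoverSchemes(providersTree));
// → ["english/vidsrc", "english/another", "loodvidrsc"]
```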

For grouped providers, pass the full relative path:

npx bundle-provider english/vidsrc

Custom source & output directories

By default, providers are read from providers/ and bundles are written next to the source. You can change both:

# Custom source directory
npx bundle-provider --src ./my-providers

# Custom output directory (mirrors the folder structure)
npx bundle-provider --out ./dist/providers

# Both
npx bundle-provider --src ./my-providers --out ./dist/providers

With --out ./dist/providers, the output becomes:

dist/providers/
├── english/vidsrc/index.js     ← standalone bundle
├── loodvidrsc/index.js         ← standalone bundle
└── ...

What the bundle contains

Each bundled index.js inlines everything it needs:

  • Your provider's config, stream, and subtitle logic
  • Runtime code from grabit-engine (Provider, defineProviderModule, etc.)
  • Manifest data from manifest.json

Tree-shaking keeps bundles small (~5–15 KB). The output has zero import statements.

CLI reference

| Command | Description |
| --- | --- |
| npx bundle-provider | Bundle all providers |
| npx bundle-provider <scheme> | Bundle one provider (e.g. vidsrc or english/vidsrc) |
| npx bundle-provider --src <dir> | Custom source directory |
| npx bundle-provider --out <dir> | Custom output directory |
| npx bundle-provider --dry-run | Preview without writing |
| npx bundle-provider --clean | Remove all generated bundles |

Tip: After editing any provider source files, always re-bundle before pushing to GitHub.

See scripts/BUNDLING.md for the full bundling guide.



🧪 Testing Providers

Once you have written a provider, use the test-provider CLI tool to verify it scrapes correctly against real media data — without writing any test files or setting up a manager.

# Test a movie — minimal (TMDB fills title, year, duration, etc.)
npx test-provider --scheme my-provider --type movie --tmdb 27205

# Test a movie — full (all data provided, TMDB only fills gaps)
npx test-provider --scheme my-provider --type movie \
  --title "Inception" --year 2010 --tmdb 27205 --duration 148

# Test a series episode — minimal
npx test-provider --scheme my-provider --type serie \
  --tmdb 1396 --season 1 --episode 1

# Test a series episode — full
npx test-provider --scheme my-provider --type serie \
  --title "Breaking Bad" --year 2008 --tmdb 1396 \
  --season 1 --episode 1 --ep-tmdb 349232

# Test both streams and subtitles
npx test-provider --scheme my-provider --mode both --type movie --tmdb 27205

# Load media from a JSON file
npx test-provider --scheme my-provider --media-file ./test-media.json

The tool auto-bundles TypeScript source via esbuild if no pre-built index.js is present, fetches missing media data from TMDB automatically, runs the scrape with a configurable timeout, and prints a formatted report with a PASS / EMPTY / FAIL verdict.

See /TESTING.md for the full guide — all flags, output format, media file examples, and tips.



📖 API Reference

Full API documentation has been moved to API_REFERENCE.md for better readability.

It covers: ScrapePluginManager, ScrapeRequester, ProviderModuleManifest, ProviderMetrics & ProviderHealthReport, ProviderContext, ProviderFetchOptions, Media Input Types, Output Types, Provider Configuration, the Provider class, Error Classes, Utility Functions, and Services.



⚙️ Configuration

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| source | GithubSource \| LocalSource \| RegistrySource | | Required. Where to load your plugins from. |
| debug | boolean | false | Turn on detailed logging. |
| strict | boolean | false | Throw errors for bad plugins instead of just skipping them. |
| autoUpdateIntervalMinutes | number | 15 | How often to refresh remote providers (min: 5). |
| cache.enabled | boolean | false | Turn on result caching. |
| cache.TTL | number | 0 | How long to keep cached results (in ms). |
| cache.MODULE_TTL | number | 900000 | How long to keep loaded provider modules in cache (15 min). |
| cache.TMDB_TTL | number | 0 | How long to cache TMDB API responses (in ms). Helps avoid hitting the TMDB API too hard. Set to e.g. 3600000 (1 hour) to cache responses. |
| cache.maxEntries | number | 10000 | Maximum number of entries in the in-memory cache. Oldest entries are evicted when the limit is reached (LRU). |
| tmdbApiKeys | string[] | | Required. Array of TMDB API keys. A random key is selected for each request to distribute load. |
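To make the cache.TTL and cache.maxEntries options more concrete, here is a minimal LRU cache with per-entry expiry. It is an independent sketch and not the engine's cache: LruTtlCache is a hypothetical class, the clock is injectable purely for demonstration, and the TTL is applied literally (the sketch does not guess what a TTL of 0 means in the engine).

```typescript
interface CacheEntry<V> {
	value: V;
	expiresAt: number;
}

class LruTtlCache<V> {
	private readonly entries = new Map<string, CacheEntry<V>>();
	private readonly maxEntries: number;
	private readonly ttlMs: number;
	private readonly now: () => number;

	constructor(maxEntries: number, ttlMs: number, now: () => number = Date.now) {
		this.maxEntries = maxEntries;
		this.ttlMs = ttlMs;
		this.now = now;
	}

	get(key: string): V | undefined {
		const hit = this.entries.get(key);
		if (!hit) return undefined;
		if (hit.expiresAt <= this.now()) {
			this.entries.delete(key); // expired: drop it
			return undefined;
		}
		// Re-insert so the key becomes the most recently used
		// (a Map iterates in insertion order).
		this.entries.delete(key);
		this.entries.set(key, hit);
		return hit.value;
	}

	set(key: string, value: V): void {
		this.entries.delete(key);
		this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
		if (this.entries.size > this.maxEntries) {
			// The first key in iteration order is the least recently used.
			const oldest = this.entries.keys().next().value;
			if (oldest !== undefined) this.entries.delete(oldest);
		}
	}

	get size(): number {
		return this.entries.size;
	}
}
```

Relying on Map insertion order keeps the eviction logic short; a production cache would likely also sweep expired entries proactively.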

Scrape Configuration

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| scrapeConfig.concurrentOperations | number | 5 | How many providers can run at the same time. |
| scrapeConfig.maxAttempts | number | 1 | How many times to retry a failing provider. |
| scrapeConfig.operationTimeout | number | 15000 | Max time before giving up on a scrape (15 sec). |
| scrapeConfig.successQuorum | number | undefined | Stop once this many providers have succeeded. |
| scrapeConfig.errorThresholdRate | number | 0.7 | Error rate that triggers auto-disable (70%). |
| scrapeConfig.minOperationsForEvaluation | number | 10 | How many scrapes before checking if a provider is healthy. |
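The interplay of concurrentOperations, operationTimeout and successQuorum can be sketched with a small worker pool. This is an illustration under assumptions, not the engine's scheduler: runWithQuorum and withTimeout are hypothetical helpers, the timeout is applied per task, retries are left out, and tasks already in flight when the quorum is reached are allowed to finish (so the result may hold more than successQuorum items).

```typescript
interface QuorumOptions {
	concurrentOperations: number;
	operationTimeout: number; // milliseconds, applied to each task in this sketch
	successQuorum?: number;
}

type Task<T> = () => Promise<T>;

// Rejects if the promise does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
	return new Promise<T>((resolve, reject) => {
		const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
		promise.then(
			(value) => {
				clearTimeout(timer);
				resolve(value);
			},
			(error) => {
				clearTimeout(timer);
				reject(error);
			}
		);
	});
}

async function runWithQuorum<T>(tasks: Task<T>[], opts: QuorumOptions): Promise<T[]> {
	const results: T[] = [];
	let next = 0;
	const quorumReached = () => opts.successQuorum !== undefined && results.length >= opts.successQuorum;

	// Each worker pulls the next task until none are left or the quorum is met.
	async function worker(): Promise<void> {
		while (next < tasks.length && !quorumReached()) {
			const task = tasks[next++];
			try {
				results.push(await withTimeout(task(), opts.operationTimeout));
			} catch {
				// A failed or timed-out task is skipped.
			}
		}
	}

	const workerCount = Math.min(opts.concurrentOperations, tasks.length);
	await Promise.all(Array.from({ length: workerCount }, () => worker()));
	return results;
}
```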


📊 Metrics & Health Monitoring

The manager keeps track of how each provider is doing and can automatically turn off unhealthy ones:

// Raw metrics map
const metrics = manager.getMetrics();
for (const [scheme, m] of metrics) {
	console.log(`${scheme}: ${m.successes} ok, ${m.errors} err`);
}

// Detailed health report
const report = manager.getMetricsReport();
report.forEach((r) => {
	console.log(`${r.moduleName}: ${r.totalOperations} ops, ` + `${(r.errorRate * 100).toFixed(1)}% errors, ` + `active=${r.active}`);
});

Providers that fail too often (more than errorThresholdRate after minOperationsForEvaluation scrapes) get turned off and won't be used again until the manager is reloaded.
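Read literally, the rule above amounts to a two-step check. The sketch below expresses it as a pure function; shouldDisable and the two interfaces are hypothetical names, and the strict "greater than" comparison is an interpretation of "more than errorThresholdRate" rather than confirmed behaviour.

```typescript
interface ProviderCounters {
	successes: number;
	errors: number;
}

interface HealthThresholds {
	errorThresholdRate: number;
	minOperationsForEvaluation: number;
}

function shouldDisable(counters: ProviderCounters, thresholds: HealthThresholds): boolean {
	const total = counters.successes + counters.errors;
	// Too few scrapes so far: not enough data to judge the provider.
	if (total < thresholds.minOperationsForEvaluation) return false;
	return counters.errors / total > thresholds.errorThresholdRate;
}

// Using the documented defaults (0.7 and 10)
const defaultThresholds: HealthThresholds = { errorThresholdRate: 0.7, minOperationsForEvaluation: 10 };
console.log(shouldDisable({ successes: 2, errors: 8 }, defaultThresholds)); // true: 80% errors over 10 scrapes
console.log(shouldDisable({ successes: 0, errors: 9 }, defaultThresholds)); // false: only 9 scrapes so far
```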



📝 Examples

React Native with GitHub source

import { ScrapePluginManager } from "grabit-engine";

const manager = await ScrapePluginManager.create({
	source: {
		type: "github",
		url: "your-org/providers-repo",
		branch: "main",
		rootDir: "dist", // optional
		moduleResolver: async (_scheme, sourceCode) => {
			const exports: Record<string, unknown> = {};
			const module = { exports };
			new Function("module", "exports", sourceCode)(module, exports);
			return (module.exports as any).default ?? module.exports;
		}
	},
	tmdbApiKeys: ["your-tmdb-api-key"],
	scrapeConfig: {
		concurrentOperations: 3,
		successQuorum: 2,
		operationTimeout: 15000
	}
});

// Minimal request — just tmdbId, TMDB fills the rest
const streams = await manager.getStreams({
	media: { type: "movie", tmdbId: "27205" },
	targetLanguageISO: "en"
});

Node.js with local providers

import { ScrapePluginManager } from "grabit-engine";
import manifest from "./providers/manifest.json";

const manager = await ScrapePluginManager.create({
	source: {
		type: "local",
		manifest,
		rootDir: "./providers",
		resolve: (path) => require(path)
	},
	tmdbApiKeys: ["your-tmdb-api-key"],
	debug: true,
	cache: {
		enabled: true,
		TTL: 300_000,
		TMDB_TTL: 3_600_000, // Cache TMDB responses for 1 hour
		maxEntries: 5_000
	},
	scrapeConfig: {
		maxAttempts: 3,
		errorThresholdRate: 0.5
	}
});

Targeted scraping by scheme

// Only scrape from a specific provider
const streams = await manager.getStreamsByScheme("example-provider", request);
const subs = await manager.getSubtitlesByScheme("subtitle-provider", request);


⚛️ React Hook (useSources)

An optional React hook for declarative scraping inside React / React Native components. It requires react >= 17, declared as an optional peer dependency, so non-React consumers are unaffected.

npm install react   # if not already installed

Basic Usage

import { useSources } from "grabit-engine";

function StreamList() {
	const { mediaSources, subtitleSources, isLoading, isManagerReady, error, scrape, clearSources } = useSources({
		managerConfig: {
			source: {
				type: "registry",
				name: "my-providers",
				providers: {
					/* ... */
				}
			},
			tmdbApiKeys: ["your-tmdb-api-key"]
		},
		type: "both"
	});

	const handleScrape = () => {
		scrape({
			media: { type: "movie", tmdbId: "27205" },
			targetLanguageISO: "en"
		});
	};

	return (
		<div>
			<button onClick={handleScrape} disabled={!isManagerReady || isLoading}>
				{isLoading ? "Scraping…" : "Scrape"}
			</button>
			{error && <p>Error: {error.message}</p>}
			<h3>Media ({mediaSources.length})</h3>
			<ul>
				{mediaSources.map((s) => (
					<li key={`${s.scheme}-${s.providerName}-${s.fileName}`}>{s.fileName}</li>
				))}
			</ul>
			<h3>Subtitles ({subtitleSources.length})</h3>
			<ul>
				{subtitleSources.map((s) => (
					<li key={`${s.scheme}-${s.providerName}-${s.fileName}`}>{s.fileName}</li>
				))}
			</ul>
		</div>
	);
}

Continuous Mode

When continuous: true, calling scrape() ignores scrapeConfig.successQuorum and streams results per-provider as they arrive — the list grows live instead of waiting for all providers to finish.

const { mediaSources, isContinuousScraping, scrape, stopContinuousScraping } = useSources({
	managerConfig: {
		/* ... */
	},
	continuous: true,
	type: "media"
});

// Start scraping — results appear one by one
scrape({ media: { type: "serie", tmdbId: "1396", ep_tmdbId: "62085", season: 1, episode: 1 }, targetLanguageISO: "en" });

// Cancel early — already-collected sources are kept
stopContinuousScraping();
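Outside React, the "list grows live" behaviour can be pictured as a callback that receives a fresh snapshot whenever a provider finishes. The sketch below is a simplified model, not the hook's implementation: scrapeContinuously, ProviderRun and ContinuousHandle are hypothetical, all providers start at once (no queue or concurrency limit), and stop() simply ignores late results instead of cancelling network work.

```typescript
type ProviderRun<T> = () => Promise<T[]>;

interface ContinuousHandle {
	stop: () => void;
	done: Promise<void>;
}

function scrapeContinuously<T>(providers: ProviderRun<T>[], onSources: (all: T[]) => void): ContinuousHandle {
	let stopped = false;
	const collected: T[] = [];

	const done = Promise.all(
		providers.map(async (run) => {
			try {
				const batch = await run();
				if (stopped) return; // results arriving after stop() are dropped
				collected.push(...batch);
				onSources([...collected]); // publish a new snapshot, like a state update
			} catch {
				// One failing provider does not affect the others.
			}
		})
	).then(() => undefined);

	return {
		stop: () => {
			stopped = true;
		},
		done
	};
}
```

In a component, onSources would play the role of the state setter, and the sources collected before stop() stay available.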

Config (UseSourcesConfig)

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| managerConfig | ProviderManagerConfig | | Configuration for the ScrapePluginManager singleton. |
| continuous | boolean | false | Stream results per-provider as they arrive (ignores successQuorum). |
| type | "media" \| "subtitle" \| "both" | "both" | Which source category to fetch. |

Return Value (UseSourcesReturn)

| Property | Type | Description |
| --- | --- | --- |
| mediaSources | MediaSource[] | Collected media sources (de-duplicated). |
| subtitleSources | SubtitleSource[] | Collected subtitle sources (de-duplicated). |
| isLoading | boolean | true while manager is initialising or a scrape is in-flight. |
| isManagerReady | boolean | true once the manager singleton is created. |
| isContinuousScraping | boolean | true while a continuous scrape is still resolving providers. |
| error | ProcessError \| HttpError \| null | The last error from init or scraping. |
| scrape(requester) | (req: RawScrapeRequester) => Promise<void> | Start a scrape. Clears previous sources. |
| stopContinuousScraping() | () => void | Cancel in-flight continuous scrape. Keeps collected sources. |
| clearSources() | () => void | Clear all collected sources. |

Lifecycle

  • Mount — The manager singleton is created asynchronously.
  • scrape(requester) — Clears previous sources, then fetches. In continuous mode results stream in; in normal mode they arrive all at once.
  • New scrape() call — Cancels any in-flight operations, clears sources, starts fresh.
  • stopContinuousScraping() — Cancels remaining queued provider operations. Already-collected results are kept.
  • Unmount — All operations are cancelled and the manager is destroyed automatically.


🧪 Testing

# Run all tests
npm test

# Run specific test suites
npx jest tests/models/manager/ --verbose      # Manager unit tests
npx jest tests/models/sources/ --verbose      # Source integration tests

# With coverage
npx jest --coverage

📄 License

ISC © grabit-engine
