Core Concepts

Every LiteJoin pipeline follows the same data flow: sources ingest data into topics, joins and windows process it with SQL, and sinks deliver the results.
Sources → Topics (SQLite) → Joins / Windows → Sinks

Sources

A source is anything that produces data. LiteJoin supports three source types:
  • API — Polls any REST API on an interval, diffs the response, and emits only changes. Supports pagination, watermarks, and rate limiting.
  • HTTP — Opens an HTTP endpoint that accepts POST requests. External systems push events directly to LiteJoin.
  • Kafka — Consumes messages from one or more Kafka topics via a consumer group.
Each source writes messages to one or more topics.
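As a sketch, a polling API source might be declared like this; the field names (`url`, `interval`, `topic`) are illustrative assumptions, so check the configuration reference for your LiteJoin version for the exact schema:

```yaml
sources:
  - name: orders-api
    type: api
    url: https://example.com/api/orders   # hypothetical endpoint
    interval: 30s                          # polling interval (illustrative field name)
    topic: orders                          # topic the source writes to
```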

Topics

A topic is a named stream of messages stored in SQLite. Every message has three fields:
  • key — A unique identifier for the record (e.g., order_123)
  • payload — The full JSON data
  • timestamp — When the message was ingested (Unix epoch)
Topics are the tables you query in your SQL joins. When you write FROM orders, you’re querying the orders topic’s SQLite table.
LiteJoin stores topics in sharded SQLite databases using WAL mode. Writes go through a single writer connection per shard; reads use a configurable pool of reader connections. Because WAL mode lets readers proceed without blocking the writer, this layout delivers high throughput with minimal contention.
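The storage model can be sketched with plain sqlite3. The table schema and upsert below are assumptions for illustration, not LiteJoin's actual DDL:

```python
import json
import sqlite3

# Open a database in WAL mode, as LiteJoin does per shard.
conn = sqlite3.connect(":memory:")  # a file path in practice; ":memory:" for the demo
conn.execute("PRAGMA journal_mode=WAL")

# Hypothetical topic table: one row per key, mirroring the three message fields.
conn.execute("""
    CREATE TABLE orders (
        key       TEXT PRIMARY KEY,
        payload   TEXT NOT NULL,
        timestamp INTEGER NOT NULL
    )
""")

# Upsert a message by key, as the writer does on ingest.
msg = {"key": "order_123",
       "payload": json.dumps({"amount": 45.00, "user_id": "u_456"}),
       "timestamp": 1740268800}
conn.execute(
    "INSERT INTO orders (key, payload, timestamp) VALUES (?, ?, ?) "
    "ON CONFLICT(key) DO UPDATE SET payload=excluded.payload, "
    "timestamp=excluded.timestamp",
    (msg["key"], msg["payload"], msg["timestamp"]),
)

row = conn.execute("SELECT key, payload FROM orders").fetchone()
print(row[0])  # order_123
```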

Joins

A join is a SQL query that combines data from multiple topics. LiteJoin evaluates joins reactively — every time new data is written to a topic referenced by a join, the join query re-executes and emits results to a sink.
joins:
  - name: order-user-join
    query: |
      SELECT
        o.key as order_id,
        o.payload as order_data,
        u.payload as user_data
      FROM orders o
      INNER JOIN users u
        ON json_extract(o.payload, '$.user_id') = u.key
      WHERE o.timestamp > (strftime('%s', 'now') - 3600)
    sink: webhook-out
Key properties:
  • Reactive execution — Joins fire on every write, not on a schedule.
  • Standard SQL — Use SELECT, JOIN, WHERE, GROUP BY, json_extract(), and other SQLite functions.
  • Time-bounded — Use timestamp filters in WHERE to limit joins to recent data and keep queries fast.
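The join above can be reproduced in plain sqlite3 to see what it returns. The table layout is the assumed key/payload/timestamp schema described under Topics, and `json_extract` requires SQLite built with the JSON functions (the default in modern builds):

```python
import json
import sqlite3
import time

conn = sqlite3.connect(":memory:")
for t in ("orders", "users"):
    conn.execute(f"CREATE TABLE {t} (key TEXT PRIMARY KEY, payload TEXT, timestamp INTEGER)")

now = int(time.time())
conn.execute("INSERT INTO users VALUES (?, ?, ?)",
             ("u_456", json.dumps({"name": "Ada"}), now))
conn.execute("INSERT INTO orders VALUES (?, ?, ?)",
             ("order_123", json.dumps({"amount": 45.0, "user_id": "u_456"}), now))

# The same query LiteJoin would re-execute on each write to orders or users.
rows = conn.execute("""
    SELECT o.key AS order_id, o.payload AS order_data, u.payload AS user_data
    FROM orders o
    INNER JOIN users u ON json_extract(o.payload, '$.user_id') = u.key
    WHERE o.timestamp > (strftime('%s', 'now') - 3600)
""").fetchall()
print(rows[0][0])  # order_123
```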

Windows

Windows group messages by time intervals for aggregation. LiteJoin supports three window types; the tumbling type is shown here.
Tumbling — Fixed, non-overlapping intervals. Each message belongs to exactly one window. Example: “Count orders every 5 minutes.”
windows:
  - name: order-count
    type: tumbling
    size: 5m
    query: |
      SELECT COUNT(*) as order_count, SUM(json_extract(payload, '$.amount')) as total
      FROM orders
    sink: dashboard
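Tumbling-window semantics can be illustrated with a GROUP BY on the bucketed timestamp. This is a sketch of the idea, not LiteJoin's internal implementation:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (key TEXT PRIMARY KEY, payload TEXT, timestamp INTEGER)")

# Three orders: two land in one 5-minute window, one in the next.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("o1", json.dumps({"amount": 10.0}), 1000),
    ("o2", json.dumps({"amount": 20.0}), 1100),
    ("o3", json.dumps({"amount": 5.0}), 1400),
])

# size: 5m -> 300-second buckets; integer division assigns each message to exactly one window.
windows = conn.execute("""
    SELECT (timestamp / 300) * 300 AS window_start,
           COUNT(*) AS order_count,
           SUM(json_extract(payload, '$.amount')) AS total
    FROM orders
    GROUP BY window_start
    ORDER BY window_start
""").fetchall()
print(windows)  # [(900, 2, 30.0), (1200, 1, 5.0)]
```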

Sinks

A sink is a destination for join and window results. LiteJoin supports four sink types:
  • HTTP — Sends results as JSON POST requests to a webhook URL.
  • Kafka — Produces results to a Kafka topic.
  • SSE — Streams results to browser clients via Server-Sent Events. Includes a /snapshot endpoint for initial state hydration.
  • SQLite — Writes results to a local SQLite database for querying by external applications.
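The webhook-out sink referenced by the join example could be declared with a sketch like this; the field names are illustrative assumptions, not a definitive schema:

```yaml
sinks:
  - name: webhook-out
    type: http
    url: https://example.com/hooks/orders   # hypothetical webhook URL
```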

Messages

Every piece of data flowing through LiteJoin is a Message:
{
  "topic": "orders",
  "key": "order_123",
  "payload": "{\"amount\": 45.00, \"user_id\": \"u_456\", \"status\": \"shipped\"}",
  "timestamp": 1740268800
}
For API sources, payloads include _change metadata describing what changed:
{
  "amount": 45.00,
  "status": "shipped",
  "_change": {
    "type": "updated",
    "fields": ["status"],
    "previous": { "status": "pending" },
    "current": { "status": "shipped" }
  }
}
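The _change metadata comes from diffing consecutive API responses. A minimal sketch of that diff for the "updated" case, for illustration only (the `diff_record` helper is hypothetical, not LiteJoin's API):

```python
def diff_record(previous, current):
    """Build _change metadata for an updated record (illustrative sketch)."""
    changed = [f for f in current if previous.get(f) != current.get(f)]
    return {
        "type": "updated",
        "fields": changed,
        "previous": {f: previous[f] for f in changed if f in previous},
        "current": {f: current[f] for f in changed},
    }

change = diff_record(
    {"amount": 45.00, "status": "pending"},
    {"amount": 45.00, "status": "shipped"},
)
print(change["fields"])  # ['status']
```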

Pipeline Lifecycle

  1. Sources poll APIs, consume Kafka, or accept HTTP pushes → messages enter the pipeline.
  2. Writer batches messages and writes them to the appropriate topic (SQLite table) via upsert.
  3. Joiner re-evaluates all joins whose referenced topics received new data → emits JoinResult to sinks.
  4. Windower evaluates window queries on their schedule → emits aggregated results to sinks.
  5. Sinks deliver results to external systems (webhooks, Kafka, SSE, SQLite).
  6. Retention cleaner periodically deletes data older than the configured TTL.
If at-least-once delivery is enabled, failed sink deliveries are captured in a dead-letter queue (DLQ) and retried automatically with exponential backoff.
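Exponential backoff means each retry waits roughly twice as long as the previous one. A sketch of such a retry schedule, assuming an illustrative 1-second base delay and 60-second cap (the real defaults are configuration-dependent):

```python
def backoff_delays(base=1.0, cap=60.0, retries=6):
    """Delays in seconds before each retry: base * 2^attempt, capped."""
    return [min(base * (2 ** attempt), cap) for attempt in range(retries)]

print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
```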