Core Concepts

Every LiteJoin pipeline follows the same data flow: sources ingest data into topics, joins and windows process it with SQL, and sinks deliver the results.
Sources → Topics (SQLite) → Joins / Windows → Sinks

Sources

A source is anything that produces data. LiteJoin supports three source types:
  • API — Polls any REST API on an interval, diffs the response, and emits only changes. Supports pagination, watermarks, and rate limiting.
  • HTTP — Opens an HTTP endpoint that accepts POST requests. External systems push events directly to LiteJoin.
  • Kafka — Consumes messages from one or more Kafka topics via a consumer group.
Each source writes messages to one or more topics.
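As a sketch, a polling API source might be declared like this; the field names (`url`, `interval`, `topic`) are illustrative assumptions, so check the configuration reference for your LiteJoin version for the exact schema:

```yaml
sources:
  - name: orders-api
    type: api
    url: https://example.com/api/orders   # hypothetical endpoint
    interval: 30s                          # polling interval (illustrative field name)
    topic: orders                          # topic the source writes to
```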

Topics

A topic is a named stream of messages stored in SQLite. Every message has three fields:
  • key — A unique identifier for the record (e.g., order_123)
  • payload — The full JSON data
  • timestamp — When the message was ingested (Unix epoch)
Topics are the tables you query in your SQL joins. When you write FROM orders, you’re querying the orders topic’s SQLite table.
LiteJoin stores topics in sharded SQLite databases using WAL mode. Writes go through a single writer connection per shard; reads use a configurable pool of reader connections. Because WAL mode lets readers proceed without blocking the writer, this layout delivers high throughput with minimal contention.
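The storage model can be sketched with plain sqlite3. The table schema and upsert below are assumptions for illustration, not LiteJoin's actual DDL:

```python
import json
import sqlite3

# Open a database in WAL mode, as LiteJoin does per shard.
conn = sqlite3.connect(":memory:")  # a file path in practice; ":memory:" for the demo
conn.execute("PRAGMA journal_mode=WAL")

# Hypothetical topic table: one row per key, mirroring the three message fields.
conn.execute("""
    CREATE TABLE orders (
        key       TEXT PRIMARY KEY,
        payload   TEXT NOT NULL,
        timestamp INTEGER NOT NULL
    )
""")

# Upsert a message by key, as the writer does on ingest.
msg = {"key": "order_123",
       "payload": json.dumps({"amount": 45.00, "user_id": "u_456"}),
       "timestamp": 1740268800}
conn.execute(
    "INSERT INTO orders (key, payload, timestamp) VALUES (?, ?, ?) "
    "ON CONFLICT(key) DO UPDATE SET payload=excluded.payload, "
    "timestamp=excluded.timestamp",
    (msg["key"], msg["payload"], msg["timestamp"]),
)

row = conn.execute("SELECT key, payload FROM orders").fetchone()
print(row[0])  # order_123
```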

Joins

A join is a SQL query that combines data from multiple topics. LiteJoin evaluates joins reactively — every time new data is written to a topic referenced by a join, the join query re-executes and emits results to a sink.
joins:
  - name: order-user-join
    query: |
      SELECT
        o.key as order_id,
        o.payload as order_data,
        u.payload as user_data
      FROM orders o
      INNER JOIN users u
        ON json_extract(o.payload, '$.user_id') = u.key
      WHERE o.timestamp > (strftime('%s', 'now') - 3600)
    sink: webhook-out
Key properties:
  • Reactive execution — Joins fire on every write, not on a schedule.
  • Standard SQL — Use SELECT, JOIN, WHERE, GROUP BY, json_extract(), and other SQLite functions.
  • Time-bounded — Use timestamp filters in WHERE to limit joins to recent data and keep queries fast.
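The join above can be reproduced in plain sqlite3 to see what it returns. The table layout is the assumed key/payload/timestamp schema described under Topics, and `json_extract` requires SQLite built with the JSON functions (the default in modern builds):

```python
import json
import sqlite3
import time

conn = sqlite3.connect(":memory:")
for t in ("orders", "users"):
    conn.execute(f"CREATE TABLE {t} (key TEXT PRIMARY KEY, payload TEXT, timestamp INTEGER)")

now = int(time.time())
conn.execute("INSERT INTO users VALUES (?, ?, ?)",
             ("u_456", json.dumps({"name": "Ada"}), now))
conn.execute("INSERT INTO orders VALUES (?, ?, ?)",
             ("order_123", json.dumps({"amount": 45.0, "user_id": "u_456"}), now))

# The same query LiteJoin would re-execute on each write to orders or users.
rows = conn.execute("""
    SELECT o.key AS order_id, o.payload AS order_data, u.payload AS user_data
    FROM orders o
    INNER JOIN users u ON json_extract(o.payload, '$.user_id') = u.key
    WHERE o.timestamp > (strftime('%s', 'now') - 3600)
""").fetchall()
print(rows[0][0])  # order_123
```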

Windows

Windows group messages by time intervals for aggregation. LiteJoin supports three window types; the tumbling type is shown here.
Tumbling — Fixed, non-overlapping intervals. Each message belongs to exactly one window. Example: “Count orders every 5 minutes.”
windows:
  - name: order-count
    type: tumbling
    size: 5m
    query: |
      SELECT COUNT(*) as order_count, SUM(json_extract(payload, '$.amount')) as total
      FROM orders
    sink: dashboard
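Tumbling-window semantics can be illustrated with a GROUP BY on the bucketed timestamp. This is a sketch of the idea, not LiteJoin's internal implementation:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (key TEXT PRIMARY KEY, payload TEXT, timestamp INTEGER)")

# Three orders: two land in one 5-minute window, one in the next.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("o1", json.dumps({"amount": 10.0}), 1000),
    ("o2", json.dumps({"amount": 20.0}), 1100),
    ("o3", json.dumps({"amount": 5.0}), 1400),
])

# size: 5m -> 300-second buckets; integer division assigns each message to exactly one window.
windows = conn.execute("""
    SELECT (timestamp / 300) * 300 AS window_start,
           COUNT(*) AS order_count,
           SUM(json_extract(payload, '$.amount')) AS total
    FROM orders
    GROUP BY window_start
    ORDER BY window_start
""").fetchall()
print(windows)  # [(900, 2, 30.0), (1200, 1, 5.0)]
```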

Sinks

A sink is a destination for join and window results. LiteJoin supports four sink types:
  • HTTP — Sends results as JSON POST requests to a webhook URL.
  • Kafka — Produces results to a Kafka topic.
  • SSE — Streams results to browser clients via Server-Sent Events. Includes a /snapshot endpoint for initial state hydration.
  • SQLite — Writes results to a local SQLite database for querying by external applications.
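The webhook-out sink referenced by the join example could be declared with a sketch like this; the field names are illustrative assumptions, not a definitive schema:

```yaml
sinks:
  - name: webhook-out
    type: http
    url: https://example.com/hooks/orders   # hypothetical webhook URL
```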

Messages

Every piece of data flowing through LiteJoin is a Message:
{
  "topic": "orders",
  "key": "order_123",
  "payload": "{\"amount\": 45.00, \"user_id\": \"u_456\", \"status\": \"shipped\"}",
  "timestamp": 1740268800
}
For API sources, payloads include _change metadata describing what changed:
{
  "amount": 45.00,
  "status": "shipped",
  "_change": {
    "type": "updated",
    "fields": ["status"],
    "previous": { "status": "pending" },
    "current": { "status": "shipped" }
  }
}
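The _change metadata comes from diffing consecutive API responses. A minimal sketch of that diff for the "updated" case, for illustration only (the `diff_record` helper is hypothetical, not LiteJoin's API):

```python
def diff_record(previous, current):
    """Build _change metadata for an updated record (illustrative sketch)."""
    changed = [f for f in current if previous.get(f) != current.get(f)]
    return {
        "type": "updated",
        "fields": changed,
        "previous": {f: previous[f] for f in changed if f in previous},
        "current": {f: current[f] for f in changed},
    }

change = diff_record(
    {"amount": 45.00, "status": "pending"},
    {"amount": 45.00, "status": "shipped"},
)
print(change["fields"])  # ['status']
```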

Pipeline Lifecycle

  1. Sources poll APIs, consume Kafka, or accept HTTP pushes → messages enter the pipeline.
  2. Writer batches messages and writes them to the appropriate topic (SQLite table) via upsert.
  3. Joiner re-evaluates all joins whose referenced topics received new data → emits JoinResult to sinks.
  4. Windower evaluates window queries on their schedule → emits aggregated results to sinks.
  5. Sinks deliver results to external systems (webhooks, Kafka, SSE, SQLite).
  6. Retention cleaner periodically deletes data older than the configured TTL.
If at-least-once delivery is enabled, failed sink deliveries are captured in a dead-letter queue (DLQ) and retried automatically with exponential backoff.
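Exponential backoff means each retry waits roughly twice as long as the previous one. A sketch of such a retry schedule, assuming an illustrative 1-second base delay and 60-second cap (the real defaults are configuration-dependent):

```python
def backoff_delays(base=1.0, cap=60.0, retries=6):
    """Delays in seconds before each retry: base * 2^attempt, capped."""
    return [min(base * (2 ** attempt), cap) for attempt in range(retries)]

print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
```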