Skip to main content

Joins

Joins are the core of LiteJoin. They combine data from multiple topics using standard SQL and emit results to sinks every time new data arrives.

How Joins Work

  1. Data arrives from a source and is written to a topic (SQLite table).
  2. The joiner checks which join queries reference that topic.
  3. Those join queries are re-evaluated against the current state of all referenced topics.
  4. Results are grouped by a result key and emitted to the configured sink.
Joins are reactive — they fire on every write, not on a schedule.

Configuration

joins:
  - name: order-user-join
    query: |
      SELECT
        o.key as order_id,
        json_extract(o.payload, '$.amount') as amount,
        json_extract(u.payload, '$.name') as customer_name,
        json_extract(u.payload, '$.email') as customer_email
      FROM orders o
      INNER JOIN users u
        ON json_extract(o.payload, '$.user_id') = u.key
      WHERE o.timestamp > (strftime('%s', 'now') - 3600)
    sink: webhook-out

Fields

FieldTypeRequiredDescription
namestringyesUnique name for the join.
querystringyesSQL query joining one or more topics.
sinkstringyesName of the sink to emit results to.
key_columnstringnoColumn to use as the result key for grouping. Defaults to the first column.
result_keystringnoAlias for the result key in the output.

Join Types

LiteJoin supports all standard SQL join types:

INNER JOIN

Returns only rows where both sides match:
SELECT o.key, o.payload, u.payload
FROM orders o
INNER JOIN users u ON json_extract(o.payload, '$.user_id') = u.key
WHERE o.timestamp > (strftime('%s', 'now') - 3600)

LEFT JOIN

Returns all rows from the left table, with NULL for unmatched right-side rows:
SELECT o.key, o.payload, u.payload
FROM orders o
LEFT JOIN users u ON json_extract(o.payload, '$.user_id') = u.key
WHERE o.timestamp > (strftime('%s', 'now') - 3600)

Multi-way Joins

Join three or more topics:
SELECT
  o.key as order_id,
  json_extract(o.payload, '$.amount') as amount,
  json_extract(u.payload, '$.name') as customer,
  json_extract(p.payload, '$.name') as product
FROM orders o
LEFT JOIN users u ON json_extract(o.payload, '$.user_id') = u.key
LEFT JOIN products p ON json_extract(o.payload, '$.product_id') = p.key
WHERE o.timestamp > (strftime('%s', 'now') - 3600)

Result Format

Join results are emitted as JoinResult objects:
{
  "query": "order-user-join",
  "key": "order_123",
  "rows": [
    {
      "order_id": "order_123",
      "amount": 45.00,
      "customer_name": "Alice",
      "customer_email": "alice@example.com"
    }
  ],
  "emit_at": "2026-02-22T10:00:00Z"
}

Best Practices

Always include a time-bounded WHERE clause. Without it, join queries scan the entire topic, degrading performance as data grows. Use timestamp > (strftime('%s', 'now') - N) to limit to recent data.
  1. Index your join keys. Use the key column for joins when possible — it’s indexed by default.
  2. Keep queries focused. Each join should produce one type of output. Use multiple joins for different result shapes.
  3. Use json_extract() for access. All application data lives in the payload JSON column.
  4. Test with Studio. Use LiteJoin Studio to iterate on queries with live data before deploying.

Example: Payment Enrichment

sources:
  - name: stripe_charges
    type: api
    topic: charges
    api:
      url: "https://api.stripe.com/v1/charges?limit=100"
      interval: 10s
      key_path: "id"
      response_path: "data"
      headers:
        Authorization: "Bearer ${STRIPE_SECRET_KEY}"

  - name: stripe_customers
    type: api
    topic: customers
    api:
      url: "https://api.stripe.com/v1/customers?limit=100"
      interval: 5m
      key_path: "id"
      response_path: "data"
      headers:
        Authorization: "Bearer ${STRIPE_SECRET_KEY}"

joins:
  - name: charge-customer-join
    query: |
      SELECT
        c.key as charge_id,
        json_extract(c.payload, '$.amount') as amount,
        json_extract(c.payload, '$.currency') as currency,
        json_extract(cust.payload, '$.email') as customer_email,
        json_extract(cust.payload, '$.name') as customer_name
      FROM charges c
      LEFT JOIN customers cust
        ON json_extract(c.payload, '$.customer') = cust.key
      WHERE c.timestamp > (strftime('%s', 'now') - 60)
    sink: webhook-out

sinks:
  - type: http
    name: webhook-out
    config:
      url: "http://localhost:9000/enriched-charges"