
Configuration Reference

LiteJoin is configured via a YAML file, typically named litejoin.yaml. This page documents every configuration field.

Top-Level Structure

sources: []      # Data ingestion sources
sinks: []        # Output destinations
storage: {}      # SQLite storage settings
joins: []        # Real-time join queries
windows: []      # Time-based aggregations
retention: {}    # Data retention policy
writer: {}       # Write batching settings
joiner: {}       # Join engine settings
windower: {}     # Window engine settings
delivery: {}     # Delivery guarantee settings

Sources

sources:
  - type: api | http | kafka
    name: "unique-name"
    topic: "topic-name"        # For api sources
    topics: ["topic1"]         # For http/kafka sources
    config: {}                 # Source-specific config
    api: {}                    # API source config (type: api only)

HTTP Source Config

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| config.addr | string | required | Listen address (e.g., :8080). |

Kafka Source Config

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| config.brokers | string | required | Comma-separated broker addresses. |
| config.group_id | string | required | Consumer group ID. |
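
As a sketch, a Kafka source consuming two topics from a local broker might look like the following (the source name, topic names, broker address, and group ID are placeholders):

```yaml
sources:
  - type: kafka
    name: orders-kafka              # placeholder name
    topics: ["orders", "refunds"]   # topics to consume
    config:
      brokers: "localhost:9092"     # comma-separated broker addresses
      group_id: "litejoin-orders"   # consumer group ID
```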

API Source Config

See API Source for the complete api: block reference.

Sinks

sinks:
  - type: http | kafka | sse | sqlite
    name: "unique-name"
    config: {}

HTTP Sink

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| config.url | string | required | Webhook URL. |
| config.timeout | string | 30s | Request timeout. |

Kafka Sink

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| config.brokers | string | required | Broker addresses. |
| config.topic | string | required | Target topic. |
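
A minimal Kafka sink sketch, with a placeholder sink name, broker address, and topic:

```yaml
sinks:
  - type: kafka
    name: enriched-out            # placeholder name
    config:
      brokers: "localhost:9092"   # placeholder broker address
      topic: "enriched-events"    # placeholder target topic
```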

SSE Sink

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| config.addr | string | required | Listen address. |

SQLite Sink

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| config.path | string | required | Database file path. |

Storage

storage:
  shard_count: 8
  data_dir: "./data"
  reader_pool_size: 4
  archive:
    enabled: false
    compaction_interval: 1m
    target_file_size: 128MB
    compression: snappy
    duckdb_memory_limit: 256MB
    duckdb_threads: 0
    local_retention: 168h
    cloud:
      enabled: false
      provider: s3
      bucket: ""
      prefix: ""
      region: ""
      upload_concurrency: 4
      upload_timeout: 5m

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| shard_count | int | 8 | Number of SQLite shards. |
| data_dir | string | ./data | Data directory. |
| reader_pool_size | int | 4 | Reader connections per shard. |
See Storage for archive configuration.

Joins

joins:
  - name: "join-name"
    query: |
      SELECT ...
    sink: "sink-name"
    key_column: "column"     # Optional
    result_key: "alias"      # Optional

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | yes | Unique join name. |
| query | string | yes | SQL query. |
| sink | string | yes | Target sink name. |
| key_column | string | no | Column for result grouping key. |
| result_key | string | no | Alias for result key in output. |
See Joins for examples.
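
As an illustrative sketch of the optional key_column and result_key fields, the join below groups results by an order_id column and emits the key under the alias order. The topic names, column names, and payload structure are hypothetical:

```yaml
joins:
  - name: order-enrichment
    query: |
      SELECT
        o.key AS order_id,
        o.payload AS order_data
      FROM orders o
      WHERE o.timestamp > (strftime('%s', 'now') - 60)
    sink: webhook
    key_column: order_id   # group results by this query column
    result_key: order      # alias for the key in the output
```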

Windows

windows:
  - name: "window-name"
    type: tumbling | sliding | session
    size: 5m               # tumbling, sliding
    slide: 1m              # sliding only
    gap: 30m               # session only
    topic: "topic-name"
    query: |
      SELECT ...
    sink: "sink-name"

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | yes | Unique window name. |
| type | string | yes | tumbling, sliding, or session. |
| size | duration | tumbling/sliding | Window size. |
| slide | duration | sliding | Slide interval (must be ≤ size). |
| gap | duration | session | Inactivity gap to close session. |
| topic | string | yes | Topic to aggregate. |
| query | string | yes | SQL aggregation query. |
| sink | string | yes | Target sink name. |
See Windows for examples.
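
A minimal tumbling-window sketch, counting events per minute (the window name, topic, sink, and aggregation query are placeholders):

```yaml
windows:
  - name: charges-per-minute   # placeholder name
    type: tumbling
    size: 1m                   # one window per minute
    topic: charges             # topic to aggregate
    query: |
      SELECT COUNT(*) AS charge_count
      FROM charges
    sink: dashboard
```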

Retention

retention:
  duration: 24h
  clean_interval: 1m

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| duration | duration | 24h | Delete data older than this. |
| clean_interval | duration | 1m | How often the cleaner runs. |

Writer

writer:
  flush_interval: 10ms
  batch_size: 1000

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| flush_interval | duration | 10ms | Time between batch flushes. |
| batch_size | int | 1000 | Max messages per batch. |

Delivery

delivery:
  guarantee: best_effort | at_least_once
  dlq:
    path: "./data/dlq.db"
    retry_interval: 30s
    max_retries: 0
    backoff_multiplier: 2.0
    max_backoff: 5m
    ttl: 72h
    max_size_mb: 500
    cleanup_interval: 1h
See Delivery Guarantees for details.

Environment Variables

All string values support ${ENV_VAR} expansion, resolved at startup:
sources:
  - name: stripe
    type: api
    api:
      url: "https://api.stripe.com/v1/charges"
      headers:
        Authorization: "Bearer ${STRIPE_SECRET_KEY}"
Never commit secrets directly in config files. Use environment variables for all sensitive values.

Complete Example

sources:
  - name: stripe_charges
    type: api
    topic: charges
    api:
      url: "https://api.stripe.com/v1/charges?limit=100"
      interval: 10s
      key_path: "id"
      response_path: "data"
      headers:
        Authorization: "Bearer ${STRIPE_SECRET_KEY}"
      watermark:
        strategy: cursor
        path: "data.@last.id"
        param: "starting_after"

  - name: http_events
    type: http
    topics: []
    config:
      addr: ":8080"

sinks:
  - type: sse
    name: dashboard
    config:
      addr: ":9100"

  - type: http
    name: webhook
    config:
      url: "http://localhost:9000/webhook"

storage:
  shard_count: 8
  data_dir: ./data

writer:
  flush_interval: 10ms
  batch_size: 1000

retention:
  duration: 24h
  clean_interval: 1m

joins:
  - name: charge-enrichment
    query: |
      SELECT
        c.key as charge_id,
        c.payload as charge_data
      FROM charges c
      WHERE c.timestamp > (strftime('%s', 'now') - 60)
    sink: dashboard

delivery:
  guarantee: at_least_once
  dlq:
    retry_interval: 30s
    max_backoff: 5m
    ttl: 72h