# Configuration Reference

LiteJoin is configured via a YAML file, typically named `litejoin.yaml`. This page documents every configuration field.
## Top-Level Structure

```yaml
sources: []   # Data ingestion sources
sinks: []     # Output destinations
storage: {}   # SQLite storage settings
joins: []     # Real-time join queries
windows: []   # Time-based aggregations
retention: {} # Data retention policy
writer: {}    # Write batching settings
joiner: {}    # Join engine settings
windower: {}  # Window engine settings
delivery: {}  # Delivery guarantee settings
```
Sources
sources:
- type: api | http | kafka
name: "unique-name"
topic: "topic-name" # For api sources
topics: ["topic1"] # For http/kafka sources
config: {} # Source-specific config
api: {} # API source config (type: api only)
### HTTP Source Config

| Field | Type | Default | Description |
|---|---|---|---|
| `config.addr` | string | required | Listen address (e.g., `:8080`). |
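A minimal HTTP source using this field might look like the following sketch (the source name and topic are illustrative, not part of the reference):

```yaml
sources:
  - type: http
    name: web-events    # illustrative name
    topics: ["clicks"]  # illustrative topic
    config:
      addr: ":8080"     # address the source listens on
```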
### Kafka Source Config

| Field | Type | Default | Description |
|---|---|---|---|
| `config.brokers` | string | required | Comma-separated broker addresses. |
| `config.group_id` | string | required | Consumer group ID. |
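Putting these fields together, a hypothetical Kafka source might look like this (broker addresses, group ID, and topic are illustrative):

```yaml
sources:
  - type: kafka
    name: orders-stream  # illustrative name
    topics: ["orders"]   # illustrative topic
    config:
      brokers: "kafka-1:9092,kafka-2:9092"  # comma-separated broker list
      group_id: "litejoin-orders"           # consumer group ID
```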
### API Source Config

See API Source for the complete `api:` block reference.
## Sinks

```yaml
sinks:
  - type: http | kafka | sse | sqlite
    name: "unique-name"
    config: {}
```
### HTTP Sink

| Field | Type | Default | Description |
|---|---|---|---|
| `config.url` | string | required | Webhook URL. |
| `config.timeout` | string | `30s` | Request timeout. |
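For example, a webhook sink that overrides the default timeout (the sink name and URL are illustrative):

```yaml
sinks:
  - type: http
    name: alerts-webhook  # illustrative name
    config:
      url: "http://localhost:9000/alerts"  # illustrative webhook URL
      timeout: 5s                          # override the 30s default
```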
### Kafka Sink

| Field | Type | Default | Description |
|---|---|---|---|
| `config.brokers` | string | required | Broker addresses. |
| `config.topic` | string | required | Target topic. |
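A sketch of a Kafka sink (broker address, topic, and name are illustrative):

```yaml
sinks:
  - type: kafka
    name: enriched-out   # illustrative name
    config:
      brokers: "kafka-1:9092"    # broker addresses
      topic: "enriched-events"   # target topic
```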
### SSE Sink

| Field | Type | Default | Description |
|---|---|---|---|
| `config.addr` | string | required | Listen address. |
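For instance (the sink name is illustrative):

```yaml
sinks:
  - type: sse
    name: live-feed  # illustrative name
    config:
      addr: ":9100"  # clients connect here for server-sent events
```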
### SQLite Sink

| Field | Type | Default | Description |
|---|---|---|---|
| `config.path` | string | required | Database file path. |
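For instance (the sink name and file path are illustrative):

```yaml
sinks:
  - type: sqlite
    name: audit-log            # illustrative name
    config:
      path: "./data/audit.db"  # illustrative database file path
```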
## Storage

```yaml
storage:
  shard_count: 8
  data_dir: "./data"
  reader_pool_size: 4
  archive:
    enabled: false
    compaction_interval: 1m
    target_file_size: 128MB
    compression: snappy
    duckdb_memory_limit: 256MB
    duckdb_threads: 0
    local_retention: 168h
    cloud:
      enabled: false
      provider: s3
      bucket: ""
      prefix: ""
      region: ""
      upload_concurrency: 4
      upload_timeout: 5m
```

| Field | Type | Default | Description |
|---|---|---|---|
| `shard_count` | int | `8` | Number of SQLite shards. |
| `data_dir` | string | `./data` | Data directory. |
| `reader_pool_size` | int | `4` | Reader connections per shard. |

See Storage for archive configuration.
## Joins

```yaml
joins:
  - name: "join-name"
    query: |
      SELECT ...
    sink: "sink-name"
    key_column: "column"  # Optional
    result_key: "alias"   # Optional
```

| Field | Type | Required | Description |
|---|---|---|---|
| `name` | string | yes | Unique join name. |
| `query` | string | yes | SQL query. |
| `sink` | string | yes | Target sink name. |
| `key_column` | string | no | Column for the result grouping key. |
| `result_key` | string | no | Alias for the result key in output. |

See Joins for examples.
## Windows

```yaml
windows:
  - name: "window-name"
    type: tumbling | sliding | session
    size: 5m   # tumbling, sliding
    slide: 1m  # sliding only
    gap: 30m   # session only
    topic: "topic-name"
    query: |
      SELECT ...
    sink: "sink-name"
```

| Field | Type | Required | Description |
|---|---|---|---|
| `name` | string | yes | Unique window name. |
| `type` | string | yes | `tumbling`, `sliding`, or `session`. |
| `size` | duration | tumbling/sliding | Window size. |
| `slide` | duration | sliding | Slide interval (must be ≤ `size`). |
| `gap` | duration | session | Inactivity gap to close session. |
| `topic` | string | yes | Topic to aggregate. |
| `query` | string | yes | SQL aggregation query. |
| `sink` | string | yes | Target sink name. |

See Windows for examples.
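As a quick illustration, a five-minute tumbling window counting messages on a topic might look like the following sketch (the window name, topic, query, and sink are hypothetical):

```yaml
windows:
  - name: orders-per-5m   # illustrative name
    type: tumbling
    size: 5m
    topic: orders         # illustrative topic
    query: |
      SELECT COUNT(*) AS order_count
      FROM orders
    sink: dashboard       # illustrative sink
```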
## Retention

```yaml
retention:
  duration: 24h
  clean_interval: 1m
```

| Field | Type | Default | Description |
|---|---|---|---|
| `duration` | duration | `24h` | Delete data older than this. |
| `clean_interval` | duration | `1m` | How often the cleaner runs. |
## Writer

```yaml
writer:
  flush_interval: 10ms
  batch_size: 1000
```

| Field | Type | Default | Description |
|---|---|---|---|
| `flush_interval` | duration | `10ms` | Time between batch flushes. |
| `batch_size` | int | `1000` | Max messages per batch. |
## Delivery

```yaml
delivery:
  guarantee: best_effort | at_least_once
  dlq:
    path: "./data/dlq.db"
    retry_interval: 30s
    max_retries: 0
    backoff_multiplier: 2.0
    max_backoff: 5m
    ttl: 72h
    max_size_mb: 500
    cleanup_interval: 1h
```

See Delivery Guarantees for details.
## Environment Variables

All string values support `${ENV_VAR}` expansion, resolved at startup:

```yaml
sources:
  - name: stripe
    type: api
    api:
      url: "https://api.stripe.com/v1/charges"
      headers:
        Authorization: "Bearer ${STRIPE_SECRET_KEY}"
```

Never commit secrets directly in config files; use environment variables for all sensitive values.
## Complete Example

```yaml
sources:
  - name: stripe_charges
    type: api
    topic: charges
    api:
      url: "https://api.stripe.com/v1/charges?limit=100"
      interval: 10s
      key_path: "id"
      response_path: "data"
      headers:
        Authorization: "Bearer ${STRIPE_SECRET_KEY}"
      watermark:
        strategy: cursor
        path: "data.@last.id"
        param: "starting_after"
  - name: http_events
    type: http
    topics: []
    config:
      addr: ":8080"

sinks:
  - type: sse
    name: dashboard
    config:
      addr: ":9100"
  - type: http
    name: webhook
    config:
      url: "http://localhost:9000/webhook"

storage:
  shard_count: 8
  data_dir: ./data

writer:
  flush_interval: 10ms
  batch_size: 1000

retention:
  duration: 24h
  clean_interval: 1m

joins:
  - name: charge-enrichment
    query: |
      SELECT
        c.key as charge_id,
        c.payload as charge_data
      FROM charges c
      WHERE c.timestamp > (strftime('%s', 'now') - 60)
    sink: dashboard

delivery:
  guarantee: at_least_once
  dlq:
    retry_interval: 30s
    max_backoff: 5m
    ttl: 72h
```