Delivery Guarantees
By default, LiteJoin operates under a best-effort delivery model. If a sink fails to accept a result (for example, an HTTP timeout or an unavailable Kafka broker), the result is logged and dropped.
When at-least-once delivery is enabled, failed deliveries are captured in a persistent dead-letter queue (DLQ) backed by SQLite and retried automatically with exponential backoff.
Enabling At-Least-Once Delivery
```yaml
delivery:
  guarantee: at_least_once
  dlq:
    path: "./data/dlq.db"
    retry_interval: 30s
    max_retries: 0        # 0 = unlimited
    max_backoff: 5m
    ttl: 72h
    max_size_mb: 500
```
How It Works
Sink fails → Enqueue to DLQ (SQLite) → Retry worker scans → Redeliver → Ack on success
- The joiner attempts to deliver a result via sink.Send().
- If delivery fails, the result is serialized to JSON and written to the DLQ database.
- A background RetryWorker scans the DLQ every retry_interval for entries whose next retry time has passed, and redelivers them.
- On success, the entry is deleted from the DLQ.
- On failure, the entry’s retry count is incremented and its next retry time is rescheduled with exponential backoff. When max_retries is nonzero and the entry reaches it, the entry is permanently dropped.
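The flow above can be sketched in Go. This is a minimal, in-memory illustration and not LiteJoin's implementation. The Result, Sink, FlakySink, Entry, and DLQ types and the Deliver and RetryDue functions are hypothetical. A slice stands in for the SQLite table, backoff is plain doubling capped at max_backoff, and jitter is omitted.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// Result is a hypothetical join result; LiteJoin's real type may differ.
type Result struct {
	Key    string `json:"key"`
	EmitAt int64  `json:"emit_at"`
}

// Sink mirrors the sink.Send() call described above; the signature is assumed.
type Sink interface{ Send(Result) error }

// FlakySink is a hypothetical sink that fails its first Failures sends.
type FlakySink struct {
	Failures  int
	Delivered []Result
}

func (s *FlakySink) Send(r Result) error {
	if s.Failures > 0 {
		s.Failures--
		return errors.New("sink unavailable")
	}
	s.Delivered = append(s.Delivered, r)
	return nil
}

// Entry is one DLQ row. LiteJoin persists these in SQLite; this sketch keeps them in memory.
type Entry struct {
	Payload   []byte // result serialized to JSON
	Retries   int
	NextRetry time.Time
}

type DLQ struct {
	Entries    []*Entry
	Base       time.Duration // delay before the first retry
	MaxBackoff time.Duration
	MaxRetries int // 0 = unlimited
}

// Deliver attempts one send and enqueues the JSON-encoded result if it fails.
func (q *DLQ) Deliver(s Sink, r Result, now time.Time) error {
	if s.Send(r) == nil {
		return nil
	}
	payload, err := json.Marshal(r)
	if err != nil {
		return err
	}
	q.Entries = append(q.Entries, &Entry{Payload: payload, NextRetry: now.Add(q.Base)})
	return nil
}

// RetryDue is one retry-worker scan: due entries are redelivered and deleted on
// success, or rescheduled with doubling backoff (no jitter) on failure.
func (q *DLQ) RetryDue(s Sink, now time.Time) {
	kept := q.Entries[:0]
	for _, e := range q.Entries {
		if now.Before(e.NextRetry) {
			kept = append(kept, e) // not due yet
			continue
		}
		var r Result
		if json.Unmarshal(e.Payload, &r) == nil && s.Send(r) == nil {
			continue // acked: the entry leaves the DLQ
		}
		e.Retries++
		if q.MaxRetries > 0 && e.Retries >= q.MaxRetries {
			continue // max retries reached: permanently dropped
		}
		delay := q.Base
		for i := 0; i < e.Retries && delay < q.MaxBackoff; i++ {
			delay *= 2
		}
		if delay > q.MaxBackoff {
			delay = q.MaxBackoff
		}
		e.NextRetry = now.Add(delay)
		kept = append(kept, e)
	}
	q.Entries = kept
}

func main() {
	t0 := time.Now()
	sink := &FlakySink{Failures: 2}
	q := &DLQ{Base: 30 * time.Second, MaxBackoff: 5 * time.Minute}
	_ = q.Deliver(sink, Result{Key: "order-1", EmitAt: 1}, t0) // fails: enqueued
	q.RetryDue(sink, t0.Add(30*time.Second))                    // retry 1 fails: next in 1m
	q.RetryDue(sink, t0.Add(90*time.Second))                    // retry 2 succeeds
	fmt.Println(len(q.Entries), len(sink.Delivered))            // 0 1
}
```

Deleting an entry only after a successful Send is what makes the sketch at-least-once rather than at-most-once. A crash between the send and the delete would cause a redelivery, never a loss.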
Configuration Reference
All fields below are nested under delivery:.
| Field | Type | Default | Description |
|---|---|---|---|
| guarantee | string | best_effort | "best_effort" or "at_least_once". |
| dlq.path | string | <data_dir>/dlq.db | Path to the DLQ SQLite database. |
| dlq.retry_interval | duration | 30s | How often the retry worker scans for due entries. |
| dlq.max_retries | int | 0 | Maximum retries before an entry permanently fails. 0 = unlimited. |
| dlq.backoff_multiplier | float | 2.0 | Factor applied to the backoff delay after each failed retry. |
| dlq.max_backoff | duration | 5m | Maximum backoff between retries. |
| dlq.ttl | duration | 72h | Entries older than this are evicted. 0 = no TTL. |
| dlq.max_size_mb | int | 500 | Maximum DLQ database file size, in MB. 0 = unlimited. |
| dlq.cleanup_interval | duration | 1h | How often the eviction policies run. |
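The following sketch shows one way the reference table could map onto Go types with the documented defaults. DeliveryConfig, DLQConfig, Defaults, and Validate are hypothetical names. The dataDir argument stands in for <data_dir>, and YAML decoding is left out so the example needs only the standard library.

```go
package main

import (
	"fmt"
	"path/filepath"
	"time"
)

// DeliveryConfig mirrors the delivery: block; these Go types are a hypothetical sketch.
type DeliveryConfig struct {
	Guarantee string // "best_effort" or "at_least_once"
	DLQ       DLQConfig
}

type DLQConfig struct {
	Path              string
	RetryInterval     time.Duration
	MaxRetries        int // 0 = unlimited
	BackoffMultiplier float64
	MaxBackoff        time.Duration
	TTL               time.Duration // 0 = no TTL
	MaxSizeMB         int           // 0 = unlimited
	CleanupInterval   time.Duration
}

// Defaults returns the documented defaults; dataDir stands in for <data_dir>.
func Defaults(dataDir string) DeliveryConfig {
	return DeliveryConfig{
		Guarantee: "best_effort",
		DLQ: DLQConfig{
			Path:              filepath.Join(dataDir, "dlq.db"),
			RetryInterval:     30 * time.Second,
			MaxRetries:        0,
			BackoffMultiplier: 2.0,
			MaxBackoff:        5 * time.Minute,
			TTL:               72 * time.Hour,
			MaxSizeMB:         500,
			CleanupInterval:   time.Hour,
		},
	}
}

// Validate rejects guarantee values other than the two documented ones.
func (c DeliveryConfig) Validate() error {
	switch c.Guarantee {
	case "best_effort", "at_least_once":
		return nil
	default:
		return fmt.Errorf("delivery.guarantee: unknown value %q", c.Guarantee)
	}
}

func main() {
	cfg := Defaults("./data")
	cfg.Guarantee = "at_least_once"
	// Duration strings such as "30s", "5m", and "72h" parse with time.ParseDuration.
	ttl, err := time.ParseDuration("72h")
	if err != nil {
		panic(err)
	}
	cfg.DLQ.TTL = ttl
	fmt.Println(cfg.Validate() == nil, cfg.DLQ.TTL, cfg.DLQ.MaxBackoff) // true 72h0m0s 5m0s
}
```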
Retry Backoff
Retries use exponential backoff with jitter:
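next_retry = now + min(base × backoff_multiplier^retry_count, max_backoff) + jitter
Here, retry_count is the number of retries already attempted (0 before the first retry) and base is the initial delay. With the default backoff_multiplier of 2.0, each delay is twice the previous one until it reaches max_backoff.
Example progression with a 30s base, the default multiplier, and max_backoff: 5m (jitter omitted). Delays are approximate because jitter is added and because due entries are only picked up on the retry worker's next scan: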
| Retry # | Approximate Delay |
|---|---|
| 1 | 30s |
| 2 | 1m |
| 3 | 2m |
| 4 | 4m |
| 5+ | 5m (capped) |
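The backoff formula can be written as a short Go sketch. Delay and Jitter are hypothetical helpers. The jitter distribution (uniform in [0, maxJitter)) is an assumption, because the docs only state that jitter is added. Running the sketch reproduces the progression table above.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"time"
)

// Delay computes min(base × multiplier^retryCount, maxBackoff), where retryCount
// is the number of retries already attempted (0 before retry #1).
func Delay(retryCount int, base, maxBackoff time.Duration, multiplier float64) time.Duration {
	d := float64(base) * math.Pow(multiplier, float64(retryCount))
	if d >= float64(maxBackoff) { // also catches +Inf for very large retry counts
		return maxBackoff
	}
	return time.Duration(d)
}

// Jitter adds a uniformly random extra delay in [0, maxJitter).
// The distribution is an assumption; the docs only say jitter is added.
func Jitter(d, maxJitter time.Duration, rng *rand.Rand) time.Duration {
	if maxJitter <= 0 {
		return d
	}
	return d + time.Duration(rng.Int63n(int64(maxJitter)))
}

func main() {
	base, maxBackoff := 30*time.Second, 5*time.Minute
	for retry := 1; retry <= 6; retry++ {
		fmt.Printf("retry %d: %v\n", retry, Delay(retry-1, base, maxBackoff, 2.0))
	}
	// Prints 30s, 1m0s, 2m0s, 4m0s, 5m0s, 5m0s.

	rng := rand.New(rand.NewSource(1))
	fmt.Println(Jitter(Delay(0, base, maxBackoff, 2.0), 5*time.Second, rng))
}
```

Computing the delay in float64 and comparing it against the cap before converting back avoids the integer overflow that repeated doubling of a time.Duration could hit after many retries.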
Eviction Policies
Two independent policies control DLQ growth. Both run every cleanup_interval:
- TTL-based: Entries older than ttl are deleted. Set ttl to 0 to disable this policy.
- Size-based: When the DLQ database file exceeds max_size_mb, the oldest entries are removed until the file drops below 90% of the limit. Set max_size_mb to 0 to disable this policy.
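This sketch implements both eviction policies over an in-memory list. The Row type, EvictTTL, and EvictSize are hypothetical. Summing per-row SizeBytes approximates the database file size, which a SQLite-backed DLQ would measure directly.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Row is a hypothetical DLQ row; SizeBytes approximates its share of the file.
type Row struct {
	ID        int
	CreatedAt time.Time
	SizeBytes int64
}

// EvictTTL drops rows older than ttl; ttl == 0 disables the policy.
func EvictTTL(rows []Row, now time.Time, ttl time.Duration) (kept []Row, evicted int) {
	if ttl == 0 {
		return rows, 0
	}
	for _, r := range rows {
		if now.Sub(r.CreatedAt) > ttl {
			evicted++
			continue
		}
		kept = append(kept, r)
	}
	return kept, evicted
}

// EvictSize removes the oldest rows once the total exceeds limit, stopping when
// the total drops below 90% of limit; limit == 0 disables the policy.
func EvictSize(rows []Row, limit int64) (kept []Row, evicted int) {
	var total int64
	for _, r := range rows {
		total += r.SizeBytes
	}
	if limit == 0 || total <= limit {
		return rows, 0
	}
	sorted := append([]Row(nil), rows...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].CreatedAt.Before(sorted[j].CreatedAt) })
	target := limit * 9 / 10
	for evicted < len(sorted) && total >= target {
		total -= sorted[evicted].SizeBytes
		evicted++
	}
	return sorted[evicted:], evicted
}

func main() {
	now := time.Now()
	rows := []Row{
		{1, now.Add(-80 * time.Hour), 400},
		{2, now.Add(-10 * time.Hour), 300},
		{3, now.Add(-5 * time.Hour), 300},
		{4, now.Add(-1 * time.Hour), 300},
	}
	rows, n := EvictTTL(rows, now, 72*time.Hour)
	fmt.Println("ttl evicted", n) // ttl evicted 1
	rows, n = EvictSize(rows, 800)
	fmt.Println("size evicted", n, "remaining", len(rows)) // size evicted 1 remaining 2
}
```

Evicting down to 90% rather than just under the limit leaves some headroom, so a single new enqueue does not immediately trigger another size sweep.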
Important Considerations
Ordering is not guaranteed. A retried result may arrive after results that were produced later. If order matters, sort by each result's emit_at timestamp downstream.
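Here is a minimal consumer-side sketch of reordering by emit_at. It assumes results expose emit_at as an integer timestamp; the Delivered type and OrderByEmitAt are hypothetical. It sorts one received batch. How long to wait for late retries before sorting is a separate trade-off.

```go
package main

import (
	"fmt"
	"sort"
)

// Delivered is a hypothetical consumer-side view of a result; EmitAt is
// assumed to be an integer timestamp copied from emit_at.
type Delivered struct {
	Key    string
	EmitAt int64
}

// OrderByEmitAt restores production order for a batch of received results,
// including ones that arrived late via DLQ retries. Ties keep arrival order.
func OrderByEmitAt(batch []Delivered) []Delivered {
	out := append([]Delivered(nil), batch...)
	sort.SliceStable(out, func(i, j int) bool { return out[i].EmitAt < out[j].EmitAt })
	return out
}

func main() {
	// "b" was produced first but reached the consumer last, after a retry.
	arrived := []Delivered{{"a", 2000}, {"c", 3000}, {"b", 1000}}
	var keys []string
	for _, r := range OrderByEmitAt(arrived) {
		keys = append(keys, r.Key)
	}
	fmt.Println(keys) // [b a c]
}
```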
Consumers must be idempotent. At-least-once delivery means duplicates are possible: if the original delivery succeeded but its acknowledgement was lost, LiteJoin retries and the consumer receives the same result twice. Design your consumers to handle duplicates safely.
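One common way to make a consumer idempotent is to record the ID of every applied result and skip repeats. This sketch assumes each result carries a stable unique ID, which these docs do not specify. The Result fields, IdempotentConsumer, and Handle are hypothetical.

```go
package main

import "fmt"

// Result carries a hypothetical stable ID; LiteJoin's actual result fields may differ.
type Result struct {
	ID    string
	Value int
}

// IdempotentConsumer applies each result ID at most once, so a redelivered
// duplicate from a DLQ retry has no further effect.
type IdempotentConsumer struct {
	seen  map[string]struct{}
	Total int
}

func NewIdempotentConsumer() *IdempotentConsumer {
	return &IdempotentConsumer{seen: make(map[string]struct{})}
}

// Handle applies r and returns true, or returns false if r.ID was already applied.
func (c *IdempotentConsumer) Handle(r Result) bool {
	if _, dup := c.seen[r.ID]; dup {
		return false
	}
	c.seen[r.ID] = struct{}{}
	c.Total += r.Value
	return true
}

func main() {
	c := NewIdempotentConsumer()
	// The second "r1" models a retry after a lost acknowledgement.
	for _, r := range []Result{{"r1", 10}, {"r2", 5}, {"r1", 10}} {
		c.Handle(r)
	}
	fmt.Println(c.Total) // 15
}
```

In production the seen set needs a bound. A DLQ entry can no longer be retried once dlq.ttl evicts it, so a deduplication window roughly as long as dlq.ttl should cover retried duplicates.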
Monitoring the DLQ
Key log messages:
| Event | Level | Message |
|---|---|---|
| Enqueue | INFO | dlq: enqueued result for sink "X" |
| Retry success | INFO | dlq: delivered entry after N retries |
| Retry failure | WARN | dlq: retry N failed for entry |
| Max retries exceeded | ERROR | dlq: entry exceeded max retries — permanently dropped |
| TTL eviction | INFO | dlq: evicted N entries past TTL |
| Size eviction | WARN | dlq: evicted N entries to stay under limit |
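This sketch turns the messages above into counters, for example to alert when entries are permanently dropped. It matches only substrings of the documented messages, because the surrounding log format (timestamps, fields) is not documented. dlqEvents, Classify, CountDLQEvents, and the event names are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// dlqEvents maps substrings of the documented log messages to event names.
var dlqEvents = []struct{ Event, Substr string }{
	{"enqueue", "dlq: enqueued result for sink"},
	{"retry_success", "dlq: delivered entry after"},
	{"retry_failure", "failed for entry"},
	{"max_retries_exceeded", "exceeded max retries"},
	{"ttl_eviction", "entries past TTL"},
	{"size_eviction", "entries to stay under limit"},
}

// Classify returns the DLQ event a log line reports, if any.
func Classify(line string) (string, bool) {
	for _, e := range dlqEvents {
		if strings.Contains(line, e.Substr) {
			return e.Event, true
		}
	}
	return "", false
}

// CountDLQEvents tallies DLQ events across all lines read from r.
func CountDLQEvents(r io.Reader) (map[string]int, error) {
	counts := make(map[string]int)
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		if ev, ok := Classify(sc.Text()); ok {
			counts[ev]++
		}
	}
	return counts, sc.Err()
}

func main() {
	logs := "INFO dlq: enqueued result for sink \"kafka\"\n" +
		"WARN dlq: retry 1 failed for entry\n" +
		"INFO dlq: delivered entry after 2 retries\n" +
		"ERROR dlq: entry exceeded max retries\n"
	counts, err := CountDLQEvents(strings.NewReader(logs))
	if err != nil {
		panic(err)
	}
	fmt.Println(counts) // map[enqueue:1 max_retries_exceeded:1 retry_failure:1 retry_success:1]
}
```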