
Delivery Guarantees

By default, LiteJoin operates under a best-effort delivery model. If a sink fails to accept a result (HTTP timeout, Kafka broker unavailable), the result is logged and dropped. When at-least-once delivery is enabled, failed deliveries are captured in a persistent dead-letter queue (DLQ) backed by SQLite and retried automatically with exponential backoff.

Enabling At-Least-Once Delivery

delivery:
  guarantee: at_least_once

  dlq:
    path: "./data/dlq.db"
    retry_interval: 30s
    max_retries: 0          # 0 = unlimited
    max_backoff: 5m
    ttl: 72h
    max_size_mb: 500

How It Works

Sink fails → Enqueue to DLQ (SQLite) → Retry worker scans → Redeliver → Ack on success
  1. The joiner attempts to deliver a result via sink.Send().
  2. If delivery fails, the result is serialized to JSON and written to the DLQ database.
  3. A background RetryWorker periodically scans for entries whose retry time has passed and attempts to redeliver each one.
  4. On success, the entry is deleted from the DLQ.
  5. On failure, the entry’s retry count and next retry time are updated with exponential backoff.
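The steps above can be sketched in a few lines of Python. This is a minimal illustration and not LiteJoin's implementation: the table name, the column names, and the `deliver` and `retry_due` helpers are all hypothetical, `sink.Send()` is modeled as a plain callable that raises on failure, a 30s base delay is assumed, and jitter is omitted so the timings stay deterministic.

```python
import json
import sqlite3

# Hypothetical schema: the real DLQ table layout is not documented here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS dlq (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    sink TEXT NOT NULL,
    payload TEXT NOT NULL,
    retry_count INTEGER NOT NULL DEFAULT 0,
    next_retry REAL NOT NULL
)
"""


def open_dlq(path=":memory:"):
    """Open (or create) the DLQ database and make sure the table exists."""
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def deliver(db, sink_name, send, result, now, base=30.0):
    """Steps 1-2: try the sink; on failure, serialize to JSON and enqueue."""
    try:
        send(result)
        return True
    except Exception:
        db.execute(
            "INSERT INTO dlq (sink, payload, next_retry) VALUES (?, ?, ?)",
            (sink_name, json.dumps(result), now + base),
        )
        db.commit()
        return False


def retry_due(db, send, now, base=30.0, max_backoff=300.0):
    """Steps 3-5: redeliver entries whose retry time has passed."""
    rows = db.execute(
        "SELECT id, payload, retry_count FROM dlq "
        "WHERE next_retry <= ? ORDER BY id",
        (now,),
    ).fetchall()
    for entry_id, payload, retry_count in rows:
        try:
            send(json.loads(payload))
        except Exception:
            # Step 5: bump the retry count and push next_retry out (no jitter).
            retry_count += 1
            delay = min(base * 2 ** retry_count, max_backoff)
            db.execute(
                "UPDATE dlq SET retry_count = ?, next_retry = ? WHERE id = ?",
                (retry_count, now + delay, entry_id),
            )
        else:
            # Step 4: acknowledge by deleting the entry.
            db.execute("DELETE FROM dlq WHERE id = ?", (entry_id,))
    db.commit()
```

Passing `now` explicitly, rather than reading the clock inside the helpers, keeps the sketch easy to test; a real worker would call `retry_due` on a timer at `retry_interval`.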

Configuration Reference

| Field | Type | Default | Description |
|---|---|---|---|
| guarantee | string | best_effort | "best_effort" or "at_least_once". |
| dlq.path | string | <data_dir>/dlq.db | Path to the DLQ SQLite database. |
| dlq.retry_interval | duration | 30s | How often the retry worker scans. |
| dlq.max_retries | int | 0 | Max retries before permanent failure. 0 = unlimited. |
| dlq.backoff_multiplier | float | 2.0 | Exponential backoff multiplier. |
| dlq.max_backoff | duration | 5m | Maximum backoff between retries. |
| dlq.ttl | duration | 72h | Entries older than this are evicted. 0 = no TTL. |
| dlq.max_size_mb | int | 500 | Max DLQ database file size. 0 = unlimited. |
| dlq.cleanup_interval | duration | 1h | How often eviction policies run. |

Retry Backoff

Retries use exponential backoff with jitter:
next_retry = now + min(base × 2^retry_count, max_backoff) + jitter
Example progression with max_backoff: 5m:
| Retry # | Approximate Delay |
|---|---|
| 1 | 30s |
| 2 | 1m |
| 3 | 2m |
| 4 | 4m |
| 5+ | 5m (capped) |
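The progression can be reproduced with a short Python sketch of the formula. It assumes a 30s base delay and a `retry_count` that starts at 0 for the first retry, which is what the example delays imply. The `backoff_delay` function and its `jitter_fraction` parameter are hypothetical: the source does not say how much jitter is applied, so it is disabled by default here.

```python
import random


def backoff_delay(retry_count, base=30.0, multiplier=2.0,
                  max_backoff=300.0, jitter_fraction=0.0, rng=random):
    """Seconds to wait before the next retry.

    Implements min(base * multiplier^retry_count, max_backoff) + jitter.
    jitter_fraction is an assumption: jitter is drawn uniformly from
    [0, jitter_fraction * capped_delay] and added after the cap.
    """
    capped = min(base * multiplier ** retry_count, max_backoff)
    jitter = rng.uniform(0, jitter_fraction * capped) if jitter_fraction > 0 else 0.0
    return capped + jitter


# Delays for the first six retries with jitter disabled.
delays = [backoff_delay(n) for n in range(6)]
print(delays)  # [30.0, 60.0, 120.0, 240.0, 300.0, 300.0]
```

Because the formula adds jitter after the cap, an individual delay can slightly exceed max_backoff when jitter is enabled.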

Eviction Policies

Two independent policies control DLQ growth:
  • TTL-based: Entries older than ttl are deleted on cleanup sweeps.
  • Size-based: When the DLQ exceeds max_size_mb, oldest entries are removed until the file drops below 90% of the limit.
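A rough Python model of one cleanup sweep, under simplifying assumptions: entries are held in memory with a per-entry byte size standing in for the SQLite file size, and the `Entry` class and `evict` function are hypothetical names. It shows the TTL pass followed by the size pass, which removes the oldest entries until the total falls below 90% of the limit.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    created_at: float  # seconds since some epoch
    size_bytes: int    # stand-in for the entry's share of the DLQ file


def evict(entries, now, ttl_seconds, max_size_bytes, low_water=0.9):
    """Return (kept, ttl_evicted, size_evicted) after one cleanup sweep.

    A ttl_seconds or max_size_bytes of 0 disables that policy.
    """
    kept = sorted(entries, key=lambda e: e.created_at)  # oldest first

    # TTL-based: drop entries older than the TTL.
    ttl_evicted = 0
    if ttl_seconds > 0:
        fresh = [e for e in kept if now - e.created_at <= ttl_seconds]
        ttl_evicted = len(kept) - len(fresh)
        kept = fresh

    # Size-based: only triggers above the limit, then drains oldest-first
    # until the total drops below low_water * limit.
    size_evicted = 0
    if max_size_bytes > 0:
        total = sum(e.size_bytes for e in kept)
        if total > max_size_bytes:
            target = low_water * max_size_bytes
            while kept and total >= target:
                total -= kept.pop(0).size_bytes
                size_evicted += 1

    return kept, ttl_evicted, size_evicted
```

Draining to 90% rather than stopping exactly at the limit leaves headroom, so the next few enqueues do not immediately trigger another size eviction.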

Important Considerations

Ordering is not guaranteed. Retried results may arrive after results produced later. Use the emit_at timestamp for ordering if needed.
Consumers must be idempotent. At-least-once delivery means duplicates are possible, for example when the original delivery succeeded but the response was lost, which triggers a retry. Design your consumers to handle duplicates.
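One common way to meet both requirements on the consumer side is to deduplicate on a stable key and sort by emit_at when order matters. The Python sketch below assumes each result carries a unique `result_id` field. That field name is hypothetical, so substitute whatever uniquely identifies a result in your pipeline. It also assumes emit_at values are directly comparable (numbers or consistently formatted timestamps).

```python
class IdempotentConsumer:
    """Ignores duplicate deliveries and exposes results in emit_at order."""

    def __init__(self):
        self.seen = set()    # keys of results already processed
        self.results = []    # accepted results, in arrival order

    def handle(self, result):
        """Process a result once; return False if it was a duplicate."""
        key = result["result_id"]  # hypothetical unique key
        if key in self.seen:
            return False
        self.seen.add(key)
        self.results.append(result)
        return True

    def ordered(self):
        """Accepted results sorted by their emit_at timestamp."""
        return sorted(self.results, key=lambda r: r["emit_at"])
```

In production the set of seen keys would normally live in durable storage (for example, a unique constraint in the consumer's database) so that deduplication survives restarts.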

Monitoring the DLQ

Key log messages:
| Event | Level | Message |
|---|---|---|
| Enqueue | INFO | dlq: enqueued result for sink "X" |
| Retry success | INFO | dlq: delivered entry after N retries |
| Retry failure | WARN | dlq: retry N failed for entry |
| Max retries exceeded | ERROR | dlq: entry exceeded max retries — permanently dropped |
| TTL eviction | INFO | dlq: evicted N entries past TTL |
| Size eviction | WARN | dlq: evicted N entries to stay under limit |
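For a quick health check, these messages can be tallied from a log file. The Python sketch below matches only the message text listed above; it makes no assumption about the rest of each log line (timestamp, level formatting), and the event names and the `tally` function are hypothetical labels chosen for this example.

```python
import re
from collections import Counter

# One pattern per documented message; N is matched as one or more digits.
PATTERNS = {
    "enqueue": re.compile(r"dlq: enqueued result for sink"),
    "retry_success": re.compile(r"dlq: delivered entry after \d+ retries"),
    "retry_failure": re.compile(r"dlq: retry \d+ failed for entry"),
    "max_retries_exceeded": re.compile(r"dlq: entry exceeded max retries"),
    "ttl_eviction": re.compile(r"dlq: evicted \d+ entries past TTL"),
    "size_eviction": re.compile(r"dlq: evicted \d+ entries to stay under limit"),
}


def tally(lines):
    """Count DLQ events in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for event, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[event] += 1
                break  # each line matches at most one event
    return counts
```

A steadily growing gap between enqueue and retry_success counts, or any max_retries_exceeded or size_eviction events, suggests a sink that has been failing for longer than the DLQ can absorb.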