API Source
The api source type polls any HTTP endpoint on a configurable interval, diffs the response against previously stored state in SQLite, and emits only the changes as standard pipeline messages. Any request/response API becomes a real-time event stream — no webhooks, no changes to the upstream system, no event infrastructure required.
Quick Start
Poll an OpenWeatherMap endpoint every 30 seconds and emit changes:
sources:
  - name: weather
    type: api
    topic: weather_london
    api:
      url: "https://api.openweathermap.org/data/2.5/weather?q=London&appid=${OWM_KEY}"
      interval: 30s
      key_path: "id"
Inside the api block, three fields are required: url, interval, and key_path. Every other api field has a default. The source-level fields type, name, and topic are also required.
Config Reference
Source-Level Fields
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | yes | Must be "api" |
| name | string | yes | Unique name for this source |
| topic | string | yes | Topic (table) to write events to |
api Block Fields
Core
| Field | Type | Default | Description |
|---|---|---|---|
| url | string | required | Full URL to poll. Supports ${ENV_VAR} expansion. |
| method | string | GET | HTTP method: GET, POST, or PUT. |
| interval | duration | required | How often to poll (e.g. 30s, 1m, 5m). |
| key_path | gjson path | required | Path within each record to the unique key. |
| response_path | gjson path | "" | Path to the array of records in the response. Empty means top-level array or single object. |
| change_detection | string | diff | "diff" (field-level) or "hash" (SHA-256). |
| detect_deletes | bool | false | Emit a deleted event when a record disappears between polls. |
| timeout | duration | 30s | HTTP request timeout per page. |
| initial_snapshot | bool | true | If false, suppresses emission on the first poll. |
| headers | map | {} | HTTP headers. Values support ${ENV_VAR} expansion. |
| body | string | "" | Request body for POST/PUT. |
| max_consecutive_failures | int | 5 | Failed cycles before exponential backoff. |
Watermark Block
Controls incremental polling — how the source avoids re-fetching data it has already seen.
| Field | Type | Default | Description |
|---|---|---|---|
| strategy | string | none | none, timestamp, cursor, or etag. |
| path | gjson path | "" | Path to extract the watermark value. |
| param | string | "" | Query parameter name to set on subsequent requests. |
| format | string | RFC3339 | Go time layout for timestamp strategy. |
| initial | string | "" | Seed watermark value for first run. |
| overlap | duration | 10s | Subtract from watermark to handle clock skew. |
Pagination Block
| Field | Type | Default | Description |
|---|---|---|---|
| strategy | string | none | none, link_header, cursor, or offset. |
| param | string | "" | Query parameter for cursor value. |
| path | gjson path | "" | Path to extract cursor from response. |
| has_more_path | gjson path | "" | Boolean path indicating more pages exist. |
| offset_param | string | offset | Query parameter for offset. |
| limit_param | string | limit | Query parameter for page size. |
| limit | int | — | Page size (required for offset strategy). |
| total_path | gjson path | "" | Path to total record count. |
| max_pages | int | 100 | Hard cap on pages per poll cycle. |
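The offset strategy can be pictured as a loop that advances the offset by the number of records received, stopping when a page comes back short, when the reported total is reached, or when the max_pages cap is hit. The sketch below is only an illustration of that idea, not LiteJoin's implementation: `fetch_all_offset`, `fake_fetch`, the `items` key, and the exact stop conditions are assumptions made for the example.

```python
from typing import Callable


def fetch_all_offset(fetch_page: Callable[[int, int], dict],
                     limit: int,
                     max_pages: int = 100) -> list:
    """Collect records page by page using offset/limit parameters."""
    records = []
    offset = 0
    for _ in range(max_pages):            # hard cap, in the spirit of max_pages
        page = fetch_page(offset, limit)  # e.g. GET ...?offset=0&limit=50
        items = page["items"]             # hypothetical key; stands in for response_path
        records.extend(items)
        offset += len(items)
        total = page.get("total")         # stands in for total_path
        if len(items) < limit:            # short (or empty) page: nothing left
            break
        if total is not None and offset >= total:
            break
    return records


# A fake 120-record API used only to exercise the loop.
DATA = [{"id": i} for i in range(120)]


def fake_fetch(offset: int, limit: int) -> dict:
    return {"items": DATA[offset:offset + limit], "total": len(DATA)}


all_records = fetch_all_offset(fake_fetch, limit=50)
print(len(all_records))  # 120
```

With a cap of two pages the same call returns only the first 100 records, which is why a cap that is too low for the data set can silently truncate a poll cycle.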
Watermark Strategies
none
Full-scan on every poll. Diffs against stored state. Simple and correct for any API.
Best for: small result sets, APIs with no filtering support.
watermark:
  strategy: none
timestamp
Tracks the maximum timestamp seen and filters subsequent requests.
Best for: APIs with since, updated_after, or modified_since parameters.
watermark:
  strategy: timestamp
  path: "updated_at"
  param: "updated_after"
  format: "2006-01-02T15:04:05Z07:00"
  initial: "2026-01-01T00:00:00Z"
  overlap: 10s
An overlap window (default 10s) handles clock skew. Duplicates are deduplicated by the storage layer’s upsert-by-key.
cursor
Stores an opaque cursor and passes it as a query parameter on the next poll.
Best for: Stripe, Slack, and cursor/keyset pagination APIs.
watermark:
  strategy: cursor
  path: "data.@last.id"
  param: "starting_after"
etag
Uses HTTP conditional requests (If-None-Match / If-Modified-Since). If the server returns 304, the poll is skipped entirely.
Best for: GitHub, CDN-backed APIs.
watermark:
  strategy: etag
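For the timestamp strategy, the overlap arithmetic amounts to taking the newest timestamp seen and subtracting the overlap before sending it as the filter value on the next poll. The following is a minimal sketch of that calculation under stated assumptions: `next_watermark` is a hypothetical helper, the sample records are invented, and the Python format string is used as a stand-in for the Go layout shown in the config.

```python
from datetime import datetime, timedelta, timezone

# Python stand-in for the Go layout "2006-01-02T15:04:05Z07:00".
LAYOUT = "%Y-%m-%dT%H:%M:%S%z"


def next_watermark(records, path="updated_at", overlap=timedelta(seconds=10)):
    """Return the value to send as the filter parameter on the next poll."""
    newest = max(datetime.strptime(r[path], LAYOUT) for r in records)
    rewound = (newest - overlap).astimezone(timezone.utc)
    return rewound.strftime("%Y-%m-%dT%H:%M:%SZ")


records = [
    {"id": 1, "updated_at": "2026-01-01T00:00:05Z"},
    {"id": 2, "updated_at": "2026-01-01T00:00:42Z"},
]
print(next_watermark(records))  # 2026-01-01T00:00:32Z
```

Rewinding by the overlap means the next poll may return records that were already seen, which is harmless as long as storage upserts by key.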
Change Detection Modes
diff (default)
Field-level comparison. Emits records annotated with _change metadata:
{
  "number": 42,
  "state": "closed",
  "_change": {
    "type": "updated",
    "fields": ["state"],
    "previous": { "state": "open" },
    "current": { "state": "closed" }
  }
}
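A field-level comparison of this kind can be sketched in a few lines: collect the keys whose values differ between the stored record and the new one, then attach them as _change metadata. This is an illustrative sketch only; `diff_record` is a hypothetical function, and how LiteJoin treats added or removed fields is not stated here, so the handling below (a missing field compares as None) is an assumption.

```python
def diff_record(previous: dict, current: dict):
    """Return `current` annotated with _change metadata, or None if unchanged."""
    changed = sorted(
        key for key in previous.keys() | current.keys()
        if previous.get(key) != current.get(key)  # missing fields compare as None
    )
    if not changed:
        return None  # nothing to emit
    return {
        **current,
        "_change": {
            "type": "updated",
            "fields": changed,
            "previous": {key: previous.get(key) for key in changed},
            "current": {key: current.get(key) for key in changed},
        },
    }


old = {"number": 42, "state": "open"}
new = {"number": 42, "state": "closed"}
print(diff_record(old, new))
```

Calling it with two identical records returns None, which corresponds to emitting nothing for that key.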
hash
SHA-256 hash of the full payload. If the hash changes, the record is emitted. No field-level metadata. Lower CPU overhead for large payloads.
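Hash-based detection can be sketched as: serialize the record, hash it with SHA-256, and compare against the hash stored for the same key. The example below is a sketch under assumptions, not the tool's code: the in-memory `stored` dict stands in for the SQLite state, and serializing with sorted keys is a choice made here so that key order alone does not register as a change.

```python
import hashlib
import json


def payload_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON serialization of the record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


stored = {}  # key -> last seen hash; stands in for persisted state


def changed(key, record: dict) -> bool:
    """Return True (and remember the new hash) if the record differs from the stored one."""
    digest = payload_hash(record)
    if stored.get(key) == digest:
        return False
    stored[key] = digest
    return True


print(changed(42, {"number": 42, "state": "open"}))    # True: first sighting
print(changed(42, {"state": "open", "number": 42}))    # False: same content
print(changed(42, {"number": 42, "state": "closed"}))  # True: state changed
```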
Cookbook Examples
GitHub Pull Requests
Poll open PRs with Link header pagination and ETag caching:
sources:
  - name: github_prs
    type: api
    topic: open_prs
    api:
      url: "https://api.github.com/repos/org/repo/pulls?state=open&per_page=100"
      interval: 5m
      key_path: "number"
      detect_deletes: true
      headers:
        Authorization: "Bearer ${GITHUB_TOKEN}"
      watermark:
        strategy: etag
      pagination:
        strategy: link_header
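Link header pagination relies on the server advertising the next page in a `Link` response header with `rel="next"`. The sketch below shows one simple way to pull that URL out; `next_page_url` is a hypothetical helper, the sample header value is invented, and the regular expression handles only the common single-rel form rather than the full header grammar.

```python
import re

# Matches entries of the form: <url>; rel="name"
_LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_page_url(link_header):
    """Return the rel="next" URL from a Link header, or None on the last page."""
    if not link_header:
        return None
    for url, rel in _LINK_RE.findall(link_header):
        if rel == "next":
            return url
    return None


header = (
    '<https://api.github.com/repositories/1/pulls?page=2>; rel="next", '
    '<https://api.github.com/repositories/1/pulls?page=5>; rel="last"'
)
print(next_page_url(header))  # https://api.github.com/repositories/1/pulls?page=2
```

A poll cycle would keep following the returned URL until the function yields None or the page cap is reached.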
Stripe Charges
Cursor-based pagination and watermark:
sources:
  - name: stripe_charges
    type: api
    topic: charges
    api:
      url: "https://api.stripe.com/v1/charges?limit=100"
      interval: 10s
      key_path: "id"
      response_path: "data"
      headers:
        Authorization: "Bearer ${STRIPE_SECRET_KEY}"
      watermark:
        strategy: cursor
        path: "data.@last.id"
        param: "starting_after"
      pagination:
        strategy: cursor
        param: "starting_after"
        path: "data.@last.id"
        has_more_path: "has_more"
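The cursor settings in this example describe a loop: read the records under data, stop when has_more is false, otherwise send the id of the last record as starting_after on the next request. The sketch below illustrates that loop against a fake in-memory API; `fetch_all_cursor`, `fake_fetch`, and the sample ids are assumptions for the example, not LiteJoin or Stripe code.

```python
def fetch_all_cursor(fetch_page, max_pages=100):
    """Follow a starting_after cursor until has_more is false or the cap is hit."""
    records, cursor = [], None
    for _ in range(max_pages):
        page = fetch_page(cursor)       # cursor is sent as starting_after
        data = page["data"]             # response_path: "data"
        records.extend(data)
        if not data or not page.get("has_more"):  # has_more_path: "has_more"
            break
        cursor = data[-1]["id"]         # path: "data.@last.id"
    return records


# Fake API with seven records, served three at a time.
CHARGES = [{"id": f"ch_{i}"} for i in range(7)]


def fake_fetch(starting_after=None, page_size=3):
    start = 0
    if starting_after is not None:
        ids = [c["id"] for c in CHARGES]
        start = ids.index(starting_after) + 1
    chunk = CHARGES[start:start + page_size]
    return {"data": chunk, "has_more": start + page_size < len(CHARGES)}


print([c["id"] for c in fetch_all_cursor(fake_fetch)])
```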
Jira Issues
Timestamp watermark with offset pagination:
sources:
  - name: jira_issues
    type: api
    topic: jira_issues
    api:
      url: "https://your-org.atlassian.net/rest/api/3/search?jql=project=ENG"
      interval: 2m
      key_path: "id"
      response_path: "issues"
      headers:
        Authorization: "Basic ${JIRA_API_TOKEN}"
      watermark:
        strategy: timestamp
        path: "fields.updated"
        param: "jql"
        format: "2006-01-02T15:04:05.000-0700"
        overlap: 30s
      pagination:
        strategy: offset
        offset_param: "startAt"
        limit_param: "maxResults"
        limit: 50
        total_path: "total"
Operational Guidance
Choosing a Poll Interval
| Use case | Suggested interval |
|---|---|
| Near-real-time tracking | 10s–30s |
| Operational data (orders, tickets) | 1m–5m |
| Reference data (users, products) | 5m–30m |
| Slow-changing config data | 1h or more |
Rate Limiting
LiteJoin handles rate limiting automatically:
- 429 Too Many Requests: retried after the delay given in the Retry-After header
- X-RateLimit-Remaining: 0 response header: pauses until reset
- 5xx errors: retried with exponential backoff (max 3 retries)
- 4xx errors (except 429): not retried (configuration problem)
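The retry rules above can be read as a small decision function: given a status code, the response headers, and the number of retries already made, return how long to wait or decide not to retry. The sketch below is illustrative only. `retry_delay` is a hypothetical helper, the 1-second backoff base is an assumption, only the delta-seconds form of Retry-After is parsed, and the X-RateLimit-Remaining pause is left out because the reset header it depends on is not named here.

```python
def retry_delay(status, headers, attempt, max_retries=3, base=1.0):
    """Return seconds to wait before retrying, or None if no retry should happen.

    `attempt` is the number of retries already made (0 for the first failure).
    """
    if status == 429:
        value = headers.get("Retry-After", "")
        # Only the integer-seconds form is handled; fall back to `base` otherwise.
        return float(value) if value.isdigit() else base
    if 500 <= status < 600:
        if attempt >= max_retries:
            return None                  # retries exhausted
        return base * (2 ** attempt)     # 1s, 2s, 4s with the assumed base
    return None                          # other 4xx: treat as a configuration problem


print(retry_delay(429, {"Retry-After": "7"}, attempt=0))  # 7.0
print(retry_delay(503, {}, attempt=2))                    # 4.0
print(retry_delay(404, {}, attempt=0))                    # None
```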
Environment Variables
Header values and URLs support ${ENV_VAR} expansion:
headers:
  Authorization: "Bearer ${STRIPE_SECRET_KEY}"
url: "https://api.example.com/data?api_key=${API_KEY}"
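Expansion of this kind is a plain string substitution: each ${NAME} placeholder is replaced by the value of the matching environment variable. The sketch below shows the idea; `expand_env` is a hypothetical helper, and raising an error for an unset variable is an assumption made here, since the behavior for missing variables is not described.

```python
import os
import re

_VAR_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env=None) -> str:
    """Replace every ${NAME} in `value` with env[NAME]."""
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1)
        if name not in env:
            # Assumed behavior: fail loudly rather than send an empty credential.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _VAR_RE.sub(replace, value)


print(expand_env("https://api.example.com/data?api_key=${API_KEY}",
                 {"API_KEY": "abc123"}))
# https://api.example.com/data?api_key=abc123
```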
gjson Paths
All path fields use gjson syntax:
| Pattern | Example | What it accesses |
|---|---|---|
| field | id | Top-level field |
| a.b | data.id | Nested field |
| #.field | #.id | Field from all array elements |
| data.@last.id | — | Last element’s id in data array |
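To make the first three patterns concrete, here is a deliberately tiny resolver for dotted paths over parsed JSON. It is not gjson and covers only a small subset (field names, numeric array indexes, and `#`); `get_path` and the sample response are hypothetical, and unlike gjson it raises an error for a missing key instead of returning an empty result.

```python
def get_path(doc, path: str):
    """Resolve a dotted path; '#' maps the rest of the path over a list."""
    if path == "":
        return doc
    head, _, rest = path.partition(".")
    if head == "#":
        if rest == "":
            return len(doc)                       # bare '#' is the array length
        return [get_path(item, rest) for item in doc]
    if isinstance(doc, list):
        return get_path(doc[int(head)], rest)     # numeric index into an array
    return get_path(doc[head], rest)              # object field


response = {"data": [{"id": "a"}, {"id": "b"}], "has_more": False}
print(get_path(response, "has_more"))   # False
print(get_path(response, "data.#.id"))  # ['a', 'b']
print(get_path(response, "data.1.id"))  # b
```

In the terms of this page, `data` would be the response_path and `id` the key_path applied to each record under it.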