
The client warehouse (ClickHouse + LocalStack)

A blissful-infra client can run a shared analytical warehouse that all its services and plugins read from and write to. ClickHouse provides the columnar query layer; LocalStack provides S3-compatible object storage. Together they form a real lakehouse pattern locally. ClickHouse can read parquet files directly from LocalStack S3 via the s3() table function.

This is the foundation for forecasting, analytics, anomaly detection, and anything else that needs structured analytical data shared across services.

See ADR-0008 for the rationale.

```sh
# Interactive: tick "ClickHouse" and "LocalStack" in the prompt
blissful-infra client create acme-corp

# Or manually edit ~/.blissful-infra/clients/<client>/blissful-infra.yaml:
# infrastructure:
#   clickhouse: true
#   localstack: true
```
```mermaid
flowchart LR
  subgraph client["Client (network: <client>_infra)"]
    ch[(ClickHouse warehouse<br/>port 8120+blockIndex)]
    ls[(LocalStack S3<br/>port 4570+blockIndex)]
    app1[Service: api]
    app2[Service: storefront]
    ch -->|s3() reads| ls
    app1 -->|writes events| ch
    app2 -->|writes events| ch
    app1 -->|reads forecasts| ch
  end
```

Both run on the client’s infra Docker network. Services reference them internally as clickhouse:8123 and localstack:4566.
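The host-facing ports follow the blockIndex scheme shown in the diagram (8120 + blockIndex for ClickHouse HTTP, 4570 + blockIndex for LocalStack). A minimal sketch of that arithmetic, assuming those base ports; the helper name is invented for illustration:

```python
# Hypothetical helper: host-mapped ports for a client's infra services.
# Base ports (8120 for ClickHouse HTTP, 4570 for LocalStack) come from
# the diagram above; blockIndex comes from `client list`.
CLICKHOUSE_BASE_PORT = 8120
LOCALSTACK_BASE_PORT = 4570

def host_ports(block_index: int) -> dict[str, int]:
    """Return the host-side ports for a client with the given blockIndex."""
    return {
        "clickhouse": CLICKHOUSE_BASE_PORT + block_index,
        "localstack": LOCALSTACK_BASE_PORT + block_index,
    }

print(host_ports(0))  # first client
print(host_ports(3))  # fourth client
```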

Init scripts at ~/.blissful-infra/clients/<client>/clickhouse/init/ run on first ClickHouse startup. The default script creates:

  • A warehouse database (the canonical one, plugins write here)
  • An example warehouse.events table (generic event store; plugins are free to create their own tables)
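Since warehouse.events is a generic event store, a plugin's write path can be as small as one row-building helper. A hedged sketch, assuming the (source, event_name, properties) columns used in the curl example later on this page, with properties JSON-encoded:

```python
import json

def make_event(source: str, event_name: str, properties: dict) -> tuple:
    """Build a row for the example warehouse.events table.

    The (source, event_name, properties) column layout mirrors the curl
    example on this page; JSON-string properties is an assumption.
    """
    return (source, event_name, json.dumps(properties))

row = make_event("api", "user_signup", {"plan": "free"})
print(row)

# To actually insert (requires the warehouse to be running):
# import clickhouse_connect
# ch = clickhouse_connect.get_client(host="clickhouse", port=8123, database="warehouse")
# ch.insert("events", [row], column_names=["source", "event_name", "properties"])
```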

LocalStack init at ~/.blissful-infra/clients/<client>/localstack/init/ creates a default <client>-data S3 bucket.

```kotlin
import java.sql.Connection
import java.sql.DriverManager
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration

// Connection details come from env vars injected by service compose.
@Configuration
class WarehouseConfig {
    @Bean
    fun clickhouse(): Connection =
        // jdbc:ch:// is resolved by the ClickHouse JDBC driver on the classpath
        DriverManager.getConnection("jdbc:ch://clickhouse:8123/warehouse")
}
```
```python
import clickhouse_connect
import boto3

# Warehouse query
ch = clickhouse_connect.get_client(host="clickhouse", port=8123, database="warehouse")
result = ch.query("SELECT count() FROM events WHERE source = 'api'")

# Read parquet from LocalStack S3 directly via ClickHouse
ch.query("""
    SELECT * FROM s3(
        'http://localstack:4566/<client>-data/raw/events.parquet',
        'NOSIGN',
        'Parquet'
    )
""")

# Write to LocalStack S3
s3 = boto3.client(
    "s3",
    endpoint_url="http://localstack:4566",
    aws_access_key_id="test",
    aws_secret_access_key="test",
)
s3.put_object(Bucket="<client>-data", Key="raw/events.parquet", Body=parquet_bytes)
```
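Partitioned keys like year=2026/month=05/… are easy to get subtly wrong by hand. A small helper for building them; the function name and zero-padded month are assumptions, not a platform convention:

```python
from datetime import date

def partition_key(prefix: str, d: date, filename: str = "events.parquet") -> str:
    """Build a Hive-style partitioned S3 key (prefix/year=YYYY/month=MM/file).

    The layout matches the lakehouse query example on this page.
    """
    return f"{prefix}/year={d.year}/month={d.month:02d}/{filename}"

print(partition_key("raw", date(2026, 5, 1)))
# raw/year=2026/month=05/events.parquet
```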
```sh
# Insert from CLI (port: 8120 + your client's blockIndex; check `client list`)
curl -X POST 'http://localhost:8120/?database=warehouse' \
  -d "INSERT INTO events (source, event_name, properties)
      VALUES ('curl-test', 'manual_event', '{\"source\":\"laptop\"}')"

# Query
curl 'http://localhost:8120/?query=SELECT+*+FROM+warehouse.events&database=warehouse'
```

Lakehouse pattern: ClickHouse reading parquet from S3


The two services on the same network make this trivial:

```sql
-- ClickHouse query that reads directly from LocalStack S3
SELECT
    event_name,
    count() AS n
FROM s3(
    'http://localstack:4566/acme-data/year=2026/month=05/events.parquet',
    'NOSIGN',
    'Parquet'
)
WHERE ts >= '2026-05-01'
GROUP BY event_name
ORDER BY n DESC;
```

The same query works against real S3 in production; only the URL changes.
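One way to keep that local-vs-production swap in a single place is to build the s3() URL from an endpoint setting. A sketch assuming path-style addressing (endpoint/bucket/key, as in the examples above); the function name is invented:

```python
def s3_table_url(bucket: str, key: str,
                 endpoint: str = "http://localstack:4566") -> str:
    """URL for ClickHouse's s3() table function.

    Defaults to the in-network LocalStack endpoint; pass a real object
    storage endpoint in production. Assumes path-style addressing.
    """
    return f"{endpoint.rstrip('/')}/{bucket}/{key}"

# Same query, different URL:
print(s3_table_url("acme-data", "year=2026/month=05/events.parquet"))
print(s3_table_url("acme-data", "year=2026/month=05/events.parquet",
                   endpoint="https://s3.amazonaws.com"))
```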

The ai-pipeline plugin (ADR-0010) connects to the client-level ClickHouse + MLflow on the shared infra network instead of co-deploying its own. When you run service add <client> <service> --plugins ai-pipeline, the dependency check requires both at the client level, and the prompt offers to enable them if they aren’t already.

| Connection | Value (in-network) |
| --- | --- |
| MLFLOW_TRACKING_URI | http://mlflow:5000 |
| CLICKHOUSE_HOST | clickhouse |
| CLICKHOUSE_DB | warehouse (matches the client init script) |
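On the consuming side, a plugin might resolve these with plain environment lookups. A minimal sketch, assuming the variable names above and using the in-network values as defaults:

```python
import os

def warehouse_settings(env=None) -> dict:
    """Resolve warehouse connection settings from the environment.

    Variable names match the table above; the defaults are the
    in-network values, so the code also runs with nothing injected.
    """
    env = os.environ if env is None else env
    return {
        "mlflow_uri": env.get("MLFLOW_TRACKING_URI", "http://mlflow:5000"),
        "clickhouse_host": env.get("CLICKHOUSE_HOST", "clickhouse"),
        "clickhouse_db": env.get("CLICKHOUSE_DB", "warehouse"),
    }

print(warehouse_settings({}))  # falls back to the in-network defaults
```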
  • Not a cross-client store. Each client has its own ClickHouse + LocalStack. Cross-client analytics isn’t a thing on this platform.
  • Not migration-managed. Plugins write to whatever tables they want; there’s no platform-level schema lifecycle. A future plugin contract ADR will introduce ownership and migration conventions.
  • Not backed up. Data lives in Docker volumes; client remove deletes it. For production, use real ClickHouse Cloud / managed object storage.

The same code that talks to the local clickhouse:8123 works against managed ClickHouse (ClickHouse Cloud, Tinybird) by changing the connection string. The same goes for LocalStack → real S3 / R2 / GCS: boto3 works unchanged when you swap endpoint_url.

The blissful-infra cloud-deploy adapter (when shipped) will handle this automatically per deploy.target.