What industries do you work with?

We work across a wide range of industries including finance, healthcare, e-commerce, logistics, and telecommunications. Our solutions are tailored to each client’s specific domain requirements and regulatory environment.

How long does a typical engagement take?

It depends on the scope. A focused observability deployment or automation workflow can be delivered in 4-6 weeks. Larger initiatives like full-scale LLM integration or platform builds typically run 2-4 months. We always start with a discovery phase to align on timelines.

Do you offer ongoing support after project delivery?

Yes. We offer flexible support and maintenance plans to ensure your systems stay healthy, updated, and optimized. We can also embed with your team on a part-time basis for continuous improvement.

Can you work with our existing tech stack?

Absolutely. We integrate with your current infrastructure and tools rather than forcing a rip-and-replace. Whether you’re on AWS, GCP, Azure, or on-prem, we adapt our approach to what works best for your environment.

What is your pricing model?

We offer both fixed-price project engagements and time-and-materials contracts depending on the nature of the work. Reach out through our contact form and we’ll provide a tailored estimate within 24 hours.

How do you handle data security and compliance?

Security is built into every engagement. We follow industry best practices for data handling, support GDPR and SOC 2 compliance requirements, and can work within your existing security policies and access controls.

ClickHouse for Real-Time Analytics — Schema Design, Materialized Views, and Cluster Setup

Why ClickHouse Outperforms OLTP Databases for Analytics

The fundamental mismatch between OLTP databases and analytics workloads comes down to storage layout. PostgreSQL, MySQL, and most relational databases store data row by row: all columns for a given row are colocated on disk. This is optimal for transactional queries that read or write a complete record — a user lookup, an order insert — but catastrophically inefficient for analytics queries that aggregate a single column across millions of rows. Reading SELECT sum(revenue) FROM events in a row-oriented store requires loading every byte of every row even though you only need one column.

ClickHouse stores data column by column. Each column is a separate file on disk, compressed independently. An aggregation query reads only the columns it references, achieving 10–100x lower I/O compared to row-oriented stores. The columnar layout also enables far more effective compression: a column of event types (e.g., “click”, “view”, “purchase”) compresses dramatically better than mixed-type rows. Typical ClickHouse datasets compress to 5–10% of their raw size.

On top of columnar storage, ClickHouse applies vectorized query execution: queries process data in batches of 8,192 rows (a “block”), applying CPU SIMD instructions across the entire batch simultaneously. Combined with a multithreaded query engine that uses all available CPU cores by default, ClickHouse can scan and aggregate billions of rows per second on commodity hardware. Production deployments regularly achieve sub-second query response on datasets with hundreds of billions of rows.

Note

ClickHouse is purpose-built for INSERT-heavy, read-intensive analytics workloads with infrequent updates. It is not a replacement for PostgreSQL in OLTP contexts. If your workload involves frequent single-row updates, point lookups, or complex multi-table OLTP transactions, PostgreSQL remains the correct tool. ClickHouse shines when your queries scan large time ranges, aggregate across many rows, and your write pattern is append-only or batch. For batch pipelines that transform historical data before landing in ClickHouse, see Apache Spark performance tuning — the two tools complement each other, with Spark handling batch ETL and ClickHouse serving aggregated results at interactive latency.

MergeTree Engine Family — The Core of Every ClickHouse Table

Every production ClickHouse table uses a variant of the MergeTreeengine family. MergeTree is ClickHouse's primary storage engine: data is written in sorted parts, which are then merged in the background — similar to LSM trees but with additional semantics for deduplication, aggregation, and replication. The ORDER BY clause defines the sorting key and determines which primary index ClickHouse builds for fast range lookups. The PARTITION BY clause splits data into physical partitions (typically by date) that can be dropped, detached, or attached atomically.

Choosing the right ORDER BY key is the single most impactful schema decision you will make. The primary index — a sparse index with one entry per 8,192-row granule — is built on the ORDER BY columns. Queries that filter on a prefix of the ORDER BY key skip entire granules and partitions, dramatically reducing the data scanned. Cardinality matters: low-cardinality columns (tenant_id, event_type) should come before high-cardinality ones (user_id, session_id) in the ORDER BY to maximize granule skipping for filtered aggregations.

-- schema_events.sql
-- Analytics event table using MergeTree
-- Designed for queries like:
--   SELECT event_type, count() FROM events
--   WHERE tenant_id = 'acme' AND toDate(timestamp) BETWEEN '2026-01-01' AND '2026-01-31'
--   GROUP BY event_type

CREATE TABLE events
(
    -- Partition by month for efficient time-range pruning and partition drops
    timestamp       DateTime64(3, 'UTC'),
    tenant_id       LowCardinality(String),   -- low cardinality: ~1000 tenants
    event_type      LowCardinality(String),   -- low cardinality: ~50 event types
    user_id         UUID,
    session_id      String,
    page_url        String,
    referrer        String,
    -- Flexible payload stored as JSON (ClickHouse 22.6+ JSON type, or String)
    properties      String,
    -- Numeric metrics
    duration_ms     UInt32,
    revenue_cents   UInt32 DEFAULT 0,
    -- Codec: ZSTD(3) for best compression on string columns;
    -- Delta + ZSTD for monotonically increasing timestamps
    INDEX idx_user_id user_id TYPE bloom_filter(0.01) GRANULARITY 4
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (tenant_id, event_type, toDate(timestamp), user_id)
-- TTL: auto-expire data older than 90 days
TTL toDate(timestamp) + INTERVAL 90 DAY DELETE
SETTINGS
    index_granularity = 8192,
    -- Merge parts more aggressively for high-insert workloads
    merge_max_block_size = 8192,
    -- Store deleted rows as tombstones until next merge (for GDPR deletes)
    allow_experimental_lightweight_delete = 1;


-- Projection: pre-aggregated daily revenue per tenant
-- ClickHouse picks this automatically for matching queries
ALTER TABLE events ADD PROJECTION proj_daily_revenue
(
    SELECT
        tenant_id,
        toDate(timestamp) AS date,
        sum(revenue_cents) AS total_revenue,
        count() AS event_count
    ORDER BY (tenant_id, date)
);

ALTER TABLE events MATERIALIZE PROJECTION proj_daily_revenue;

MergeTree Variants

The MergeTree family includes specialized variants for common analytics patterns. Choose the right engine at table creation — changing it requires a full table rebuild.

Engine	Merge behavior	Best for
MergeTree	Sorts and merges parts; no dedup	Raw event storage, immutable logs
ReplacingMergeTree	Keeps latest version by ORDER BY key	Mutable entities (user profiles, order state)
SummingMergeTree	Sums numeric columns during merge	Pre-aggregated counters (page views, clicks)
AggregatingMergeTree	Merges AggregateFunction states	Materialized views with complex aggregations
CollapsingMergeTree	Cancels pairs of rows with sign +1/-1	Streams of state changes (inventory, balances)

Note

LowCardinality(String)is one of the most impactful type choices available. It encodes values as a dictionary — internally storing an integer ID instead of the full string — which reduces storage and dramatically speeds up GROUP BY on low-cardinality columns. Use it for any string column with fewer than ~10,000 distinct values: tenant IDs, event types, country codes, status fields. The breakeven is roughly >50% compression gain vs. raw String storage.

Materialized Views — Pre-Aggregation, Kafka Ingestion, and AggregatingMergeTree

ClickHouse materialized views are triggers: they fire on every INSERT into the source table and write transformed or aggregated rows to a target table. Unlike PostgreSQL materialized views — which are snapshots that require manual refresh — ClickHouse materialized views are kept current in real-time as data arrives. This makes them the primary tool for pre-aggregating analytics results that would otherwise require scanning billions of rows at query time.

The recommended pattern is to combine a materialized view with an AggregatingMergeTree target table using partial aggregation states. Rather than storing the final aggregated value (which cannot be re-aggregated across parts), you store intermediate aggregation states using AggregateFunction types. These states are merged correctly during background merges, producing exact results even when parts are not yet fully merged. Query the result using -Merge combinator functions.

-- materialized_views.sql
-- Step 1: Target table using AggregatingMergeTree
-- Stores partial aggregation states — merged correctly across parts

CREATE TABLE events_hourly_agg
(
    tenant_id       LowCardinality(String),
    event_type      LowCardinality(String),
    hour            DateTime,
    -- AggregateFunction types store intermediate states, not final values
    total_events    AggregateFunction(count),
    unique_users    AggregateFunction(uniq, UUID),
    total_revenue   AggregateFunction(sum, UInt32),
    p95_duration    AggregateFunction(quantile(0.95), UInt32)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(hour)
ORDER BY (tenant_id, event_type, hour);


-- Step 2: Materialized view — fires on every INSERT to events
-- Writes partial aggregation states to events_hourly_agg

CREATE MATERIALIZED VIEW mv_events_hourly
TO events_hourly_agg
AS
SELECT
    tenant_id,
    event_type,
    toStartOfHour(timestamp)        AS hour,
    countState()                    AS total_events,
    uniqState(user_id)              AS unique_users,
    sumState(revenue_cents)         AS total_revenue,
    quantileState(0.95)(duration_ms) AS p95_duration
FROM events
GROUP BY tenant_id, event_type, hour;


-- Step 3: Query using -Merge combinators
-- ClickHouse merges partial states from all parts on the fly

SELECT
    tenant_id,
    event_type,
    hour,
    countMerge(total_events)             AS events,
    uniqMerge(unique_users)              AS unique_users,
    sumMerge(total_revenue) / 100.0      AS revenue_usd,
    quantileMerge(0.95)(p95_duration)    AS p95_duration_ms
FROM events_hourly_agg
WHERE tenant_id = 'acme'
  AND hour >= now() - INTERVAL 7 DAY
GROUP BY tenant_id, event_type, hour
ORDER BY hour DESC;

Kafka Engine — Real-Time Ingestion Pipeline

The Kafka table engine lets ClickHouse consume messages directly from Kafka topics without an external connector. The canonical pattern uses three objects: a Kafka engine table (the consumer), a landing MergeTree table (the durable store), and a materialized view that moves data between them. ClickHouse manages offsets using its internal consumer group, providing at-least-once delivery semantics. For teams routing events through a Flink streaming pipeline before persistence, ClickHouse also integrates natively as a JDBC sink — Flink pre-aggregates with SQL window functions and writes compact result rows directly to a MergeTree table.

-- kafka_ingestion.sql
-- Pattern: Kafka engine → Materialized view → Landing table

-- 1. Kafka consumer table — reads raw JSON messages from the topic
CREATE TABLE kafka_events_raw
(
    raw String  -- read as raw JSON; parse in the materialized view
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list       = 'kafka-broker:9092',
    kafka_topic_list        = 'analytics.events',
    kafka_group_name        = 'clickhouse-consumer-events',
    kafka_format            = 'JSONAsString',
    -- Tune for throughput: consume up to 10k messages per poll
    kafka_max_block_size    = 10000,
    kafka_skip_broken_messages = 100;


-- 2. Landing table — durable MergeTree storage
CREATE TABLE events_raw_landing
(
    timestamp   DateTime64(3, 'UTC') DEFAULT now64(),
    tenant_id   LowCardinality(String),
    event_type  LowCardinality(String),
    user_id     UUID,
    session_id  String,
    page_url    String,
    duration_ms UInt32,
    revenue_cents UInt32 DEFAULT 0,
    raw_json    String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (tenant_id, event_type, timestamp);


-- 3. Materialized view — parses JSON and inserts into landing table
-- Fires automatically for every batch consumed from Kafka

CREATE MATERIALIZED VIEW mv_kafka_to_landing
TO events_raw_landing
AS
SELECT
    parseDateTimeBestEffort(
        JSONExtractString(raw, 'timestamp')
    )                                               AS timestamp,
    JSONExtractString(raw, 'tenant_id')             AS tenant_id,
    JSONExtractString(raw, 'event_type')            AS event_type,
    toUUID(JSONExtractString(raw, 'user_id'))        AS user_id,
    JSONExtractString(raw, 'session_id')            AS session_id,
    JSONExtractString(raw, 'page_url')              AS page_url,
    JSONExtractUInt(raw, 'duration_ms')             AS duration_ms,
    JSONExtractUInt(raw, 'revenue_cents')           AS revenue_cents,
    raw                                             AS raw_json
FROM kafka_events_raw;

Note

ClickHouse Kafka consumers use a single-threaded consumer per table by default. For high-throughput topics, increase the number of parallel consumers by setting kafka_num_consumers equal to the number of Kafka partitions. Each consumer runs in its own thread and consumes from a dedicated partition assignment. Do not create multiple Kafka engine tables pointing to the same topic and consumer group — this will split messages across tables unpredictably.

Query Optimization — PREWHERE, Skip Indexes, and Projections

ClickHouse's query engine is fast by default, but three mechanisms — PREWHERE, skip indexes, and projections — can reduce scanned data by additional orders of magnitude for specific access patterns.

PREWHERE

PREWHERE is a ClickHouse-specific extension to WHERE. Filters in PREWHERE are evaluated before reading non-filtered columns — ClickHouse reads only the PREWHERE columns, builds a row mask of matching rows, then reads the remaining columns only for rows that pass. This is especially effective when a selective filter column is small (e.g., a flag or status) and the table has many wide columns. ClickHouse automatically moves eligible WHERE conditions to PREWHERE, but explicit PREWHERE gives you control when the optimizer's choice is suboptimal.

-- query_optimization.sql

-- PREWHERE example: filter on tenant_id first (small column),
-- then read the wide payload column only for matching rows
SELECT
    event_type,
    count()             AS events,
    uniq(user_id)       AS unique_users,
    avg(duration_ms)    AS avg_duration_ms
FROM events
PREWHERE tenant_id = 'acme'             -- evaluated first, minimal I/O
WHERE toDate(timestamp) >= today() - 7  -- further filtered after PREWHERE
GROUP BY event_type
ORDER BY events DESC
LIMIT 20;


-- Skip index: bloom filter on user_id
-- Skips granules that definitely don't contain the queried user_id
-- Useful for needle-in-haystack lookups on non-ORDER-BY columns
CREATE TABLE events_with_skip_idx
(
    timestamp   DateTime64(3, 'UTC'),
    tenant_id   LowCardinality(String),
    event_type  LowCardinality(String),
    user_id     UUID,
    -- bloom_filter: false positive rate 1%; GRANULARITY 4 = index one entry per 4 granules
    INDEX idx_user  user_id TYPE bloom_filter(0.01) GRANULARITY 4,
    -- minmax: skip granules where duration_ms is outside the queried range
    INDEX idx_dur   duration_ms TYPE minmax GRANULARITY 1,
    -- set: for columns with few distinct values per granule (e.g., country_code)
    INDEX idx_event event_type TYPE set(50) GRANULARITY 2
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (tenant_id, toDate(timestamp));


-- Using EXPLAIN to verify index usage and granule skipping
EXPLAIN indexes = 1
SELECT count()
FROM events
WHERE tenant_id = 'acme'
  AND user_id = '550e8400-e29b-41d4-a716-446655440000';
-- Look for "Granules: X/Y" — X granules read out of Y total.
-- A well-tuned query should read <5% of granules.

Projections

A projectionis a stored pre-aggregation embedded inside the same table. Unlike a materialized view, a projection shares the table's INSERT path — data written to the table is automatically projected without additional application changes. ClickHouse's query planner automatically picks projections when they satisfy a query's GROUP BY and ORDER BY requirements, reducing query latency from seconds to milliseconds for matching aggregation patterns.

-- projections.sql

-- Add a projection to the events table for fast daily revenue queries
ALTER TABLE events ADD PROJECTION proj_daily_revenue
(
    SELECT
        tenant_id,
        toDate(timestamp)   AS date,
        sum(revenue_cents)  AS total_revenue,
        count()             AS total_events,
        uniq(user_id)       AS unique_users
    ORDER BY (tenant_id, date)
);

-- Materialize the projection for existing data
ALTER TABLE events MATERIALIZE PROJECTION proj_daily_revenue
SETTINGS mutations_sync = 2;  -- wait for all replicas to apply

-- This query will now use the projection automatically (verify with EXPLAIN)
SELECT
    toDate(timestamp)       AS date,
    sum(revenue_cents)      AS revenue,
    count()                 AS events
FROM events
WHERE tenant_id = 'acme'
  AND toDate(timestamp) >= today() - 30
GROUP BY date
ORDER BY date;

Cluster Setup — ReplicatedMergeTree, Distributed Tables, and ClickHouse Keeper

ClickHouse achieves high availability through ReplicatedMergeTree, which synchronizes table parts across replica nodes using a coordination service. ClickHouse Keeper — a native re-implementation of ZooKeeper that ships with ClickHouse since version 22.4 — manages replication metadata, distributed DDL queues, and leader election without requiring a separate ZooKeeper cluster. For new deployments, always use ClickHouse Keeper over ZooKeeper.

Horizontal scaling uses the Distributed table engine: a virtual table that shards queries across multiple shards. Each shard is typically a pair of ReplicatedMergeTree replicas. INSERT into the Distributed table distributes rows across shards by the sharding key; SELECT queries fan out to all shards and merge results. The sharding key should distribute data evenly and align with your most common GROUP BY keys to minimize cross-shard data exchange.

-- config_keeper.xml
<!-- ClickHouse Keeper configuration — embed in /etc/clickhouse-server/config.d/ -->
<!-- Run 3 Keeper nodes for quorum-based HA -->

<clickhouse>
  <keeper_server>
    <tcp_port>9181</tcp_port>
    <!-- Unique server ID for each Keeper node (1, 2, 3) -->
    <server_id>1</server_id>
    <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
    <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
    <coordination_settings>
      <operation_timeout_ms>10000</operation_timeout_ms>
      <session_timeout_ms>30000</session_timeout_ms>
      <raft_logs_level>warning</raft_logs_level>
    </coordination_settings>
    <raft_configuration>
      <server>
        <id>1</id><hostname>ch-keeper-1</hostname><port>9234</port>
      </server>
      <server>
        <id>2</id><hostname>ch-keeper-2</hostname><port>9234</port>
      </server>
      <server>
        <id>3</id><hostname>ch-keeper-3</hostname><port>9234</port>
      </server>
    </raft_configuration>
  </keeper_server>
</clickhouse>

-- cluster_tables.sql
-- Create replicated tables with ON CLUSTER for automatic DDL propagation

-- ReplicatedMergeTree: uses Keeper for part synchronization
-- {shard} and {replica} macros are resolved per-node from config.xml
CREATE TABLE events ON CLUSTER 'analytics_cluster'
(
    timestamp       DateTime64(3, 'UTC'),
    tenant_id       LowCardinality(String),
    event_type      LowCardinality(String),
    user_id         UUID,
    revenue_cents   UInt32 DEFAULT 0
)
ENGINE = ReplicatedMergeTree(
    '/clickhouse/tables/{shard}/events',  -- Keeper path (unique per shard)
    '{replica}'                           -- replica name from macros
)
PARTITION BY toYYYYMM(timestamp)
ORDER BY (tenant_id, event_type, timestamp);


-- Distributed table: the query layer across all shards
-- Sits on top of the ReplicatedMergeTree tables on each shard
CREATE TABLE events_dist ON CLUSTER 'analytics_cluster'
(
    timestamp       DateTime64(3, 'UTC'),
    tenant_id       LowCardinality(String),
    event_type      LowCardinality(String),
    user_id         UUID,
    revenue_cents   UInt32 DEFAULT 0
)
ENGINE = Distributed(
    'analytics_cluster',   -- cluster name from config
    'default',             -- database
    'events',              -- local table name on each shard
    -- Sharding key: murmur hash of tenant_id distributes tenants evenly
    -- Use cityHash64(tenant_id) to co-locate all a tenant's data on one shard
    cityHash64(tenant_id)
);


-- cluster config fragment (config.d/cluster.xml)
-- 2 shards, 2 replicas each (total 4 ClickHouse nodes)
/*
<remote_servers>
  <analytics_cluster>
    <shard>
      <replica><host>ch-1-1</host><port>9000</port></replica>
      <replica><host>ch-1-2</host><port>9000</port></replica>
    </shard>
    <shard>
      <replica><host>ch-2-1</host><port>9000</port></replica>
      <replica><host>ch-2-2</host><port>9000</port></replica>
    </shard>
  </analytics_cluster>
</remote_servers>
*/

Note

Use ON CLUSTER for all DDL operations (CREATE TABLE, ALTER TABLE, DROP TABLE) in a cluster. Without it, the DDL only runs on the node you are connected to; the other nodes will not have the table. ClickHouse distributes ON CLUSTER DDL via the Keeper DDL queue — the operation is applied asynchronously on each node. Monitor the system.distributed_ddl_queue table to track DDL execution status across the cluster.

Python Integration — clickhouse-connect, Batch Inserts, and Monitoring

The official Python client is clickhouse-connect, which uses ClickHouse's native HTTP interface with binary column-oriented serialization. It supports both synchronous and async patterns, automatic compression, and integration with pandas and PyArrow. For high-throughput inserts, use the streaming insert API to send data in columnar batches rather than row by row.

# clickhouse_client.py
from __future__ import annotations

import os
from datetime import datetime, timezone
from typing import Any
import clickhouse_connect
from clickhouse_connect.driver import Client

CH_HOST = os.environ["CLICKHOUSE_HOST"]
CH_PORT = int(os.environ.get("CLICKHOUSE_PORT", "8123"))
CH_USER = os.environ["CLICKHOUSE_USER"]
CH_PASSWORD = os.environ["CLICKHOUSE_PASSWORD"]
CH_DATABASE = os.environ.get("CLICKHOUSE_DATABASE", "default")


def get_client() -> Client:
    """
    Create a clickhouse-connect client.
    connect_timeout: initial TCP handshake timeout
    send_receive_timeout: query execution timeout
    compress: enable LZ4 compression for HTTP transport
    """
    return clickhouse_connect.get_client(
        host=CH_HOST,
        port=CH_PORT,
        username=CH_USER,
        password=CH_PASSWORD,
        database=CH_DATABASE,
        connect_timeout=10,
        send_receive_timeout=120,
        compress=True,
    )


def insert_events(
    client: Client,
    events: list[dict[str, Any]],
    table: str = "events",
) -> None:
    """
    Batch insert events using columnar format.
    clickhouse-connect converts the list of dicts to columnar format internally.
    For maximum throughput, batch at least 10k rows per insert call.
    Aim for ~1MB–10MB per insert to stay within ClickHouse's recommended range.
    """
    if not events:
        return

    column_names = list(events[0].keys())
    data = [[row[col] for col in column_names] for row in events]

    client.insert(
        table=table,
        data=data,
        column_names=column_names,
        settings={
            # Async inserts buffer on the server side — useful for high-frequency
            # small inserts (e.g., per-request event tracking)
            "async_insert": 0,
            "wait_for_async_insert": 1,
        },
    )


def query_hourly_metrics(
    client: Client,
    tenant_id: str,
    days: int = 7,
) -> list[dict[str, Any]]:
    """
    Query pre-aggregated hourly metrics from the AggregatingMergeTree view.
    Returns list of dicts with hour, events, unique_users, revenue_usd, p95_duration_ms.
    """
    result = client.query(
        """
        SELECT
            hour,
            countMerge(total_events)              AS events,
            uniqMerge(unique_users)               AS unique_users,
            round(sumMerge(total_revenue) / 100, 2) AS revenue_usd,
            round(quantileMerge(0.95)(p95_duration), 0) AS p95_duration_ms
        FROM events_hourly_agg
        WHERE tenant_id = {tenant_id:String}
          AND hour >= now() - INTERVAL {days:UInt32} DAY
        GROUP BY hour
        ORDER BY hour DESC
        """,
        parameters={"tenant_id": tenant_id, "days": days},
    )
    return [dict(zip(result.column_names, row)) for row in result.result_rows]


def get_cluster_health(client: Client) -> list[dict[str, Any]]:
    """
    Query system tables to check replication lag and part counts.
    Critical for monitoring ReplicatedMergeTree cluster health.
    """
    result = client.query(
        """
        SELECT
            database,
            table,
            replica_name,
            queue_size,
            inserts_in_queue,
            merges_in_queue,
            log_pointer,
            log_max_index,
            log_max_index - log_pointer AS replication_lag
        FROM system.replicas
        WHERE replication_lag > 0
           OR queue_size > 100
        ORDER BY replication_lag DESC
        LIMIT 20
        """
    )
    return [dict(zip(result.column_names, row)) for row in result.result_rows]

Async Inserts for High-Frequency Small Batches

ClickHouse's async insert feature buffers small inserts server-side and flushes them as a single part when the buffer reaches a size or time threshold. This avoids the part explosion that results from thousands of tiny inserts creating thousands of small parts — a common performance killer for high-frequency event tracking workloads where each API request inserts one or a few rows.

# async_insert_config.sql
-- Enable async inserts at the user or session level
-- Server buffers inserts for up to 200ms or until 1MB accumulates
ALTER USER analytics_writer SETTINGS
    async_insert = 1,
    wait_for_async_insert = 0,           -- fire-and-forget (check via system.async_insert_log)
    async_insert_max_data_size = 1000000,  -- flush when buffer reaches 1MB
    async_insert_busy_timeout_ms = 200;    -- flush after 200ms regardless of size

-- Monitor async insert queue
SELECT
    database,
    table,
    format,
    status,
    rows,
    exception
FROM system.async_insert_log
WHERE event_time > now() - INTERVAL 1 HOUR
  AND status != 'Ok'
ORDER BY event_time DESC
LIMIT 50;

Production Patterns — TTL, Tiered Storage, and Monitoring

TTL-Based Data Lifecycle

Use TABLE-level TTL to automatically delete or move partitions as data ages. ClickHouse evaluates TTL during merges and can move old partitions to slower, cheaper storage tiers (e.g., from NVMe to S3 via storage policies) or delete them entirely. This eliminates manual partition management for time-series event data. Define TTL rules at table creation — adding TTL to an existing table requires a background mutation that rewrites all affected parts.

Part Monitoring and Merge Health

ClickHouse's write performance degrades when a table accumulates too many unmerged parts ('Too many parts' error fires at 300 parts per partition by default). Monitor system.parts and alert when active_parts > 150 for any partition. Common causes: ingesting too frequently in tiny batches, background merge threads falling behind, or mutations blocking merges. Increase background_pool_size for write-heavy tables and use async inserts to batch small writes server-side.

Query Profiling with system.query_log

Every query executed by ClickHouse is logged to system.query_log with full metrics: read_rows, read_bytes, memory_usage, query_duration_ms, and the query plan. Query this table to identify slow queries, high-memory queries, and queries reading excessive data. Set up a Grafana dashboard on system.query_log for p95 latency per user and query_kind, and alert on queries exceeding 10 seconds or reading more than 1 billion rows.

Prometheus Metrics via /metrics

ClickHouse exposes a Prometheus-compatible /metrics endpoint (port 8001 by default, configurable). Scrape it with Prometheus and dashboard the critical signals: ClickHouseMetrics_BackgroundPoolTask (background merge queue depth), ClickHouseProfileEvents_InsertedRows (insert throughput), ClickHouseMetrics_ReplicatedChecks (replication health), and ClickHouseAsyncMetrics_UncompressedCacheCells (cache utilization). Add to your standard infrastructure monitoring alongside node CPU and disk I/O.

-- monitoring_queries.sql
-- Key system table queries for production ClickHouse monitoring

-- 1. Parts health: alert if any partition has > 150 active parts
SELECT
    database,
    table,
    partition_id,
    count()             AS part_count,
    sum(rows)           AS total_rows,
    formatReadableSize(sum(bytes_on_disk)) AS disk_size
FROM system.parts
WHERE active
GROUP BY database, table, partition_id
HAVING part_count > 50
ORDER BY part_count DESC;


-- 2. Replication lag: alert if any replica is > 100 entries behind
SELECT
    database,
    table,
    replica_name,
    log_max_index - log_pointer AS lag,
    queue_size,
    last_queue_update
FROM system.replicas
WHERE log_max_index - log_pointer > 10
ORDER BY lag DESC;


-- 3. Slow queries in the last hour
SELECT
    query_start_time,
    query_duration_ms,
    formatReadableQuantity(read_rows)   AS read_rows,
    formatReadableSize(read_bytes)      AS read_bytes,
    formatReadableSize(memory_usage)    AS memory_usage,
    user,
    -- Truncate long queries for display
    substring(query, 1, 200)            AS query_preview
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_start_time >= now() - INTERVAL 1 HOUR
  AND query_duration_ms > 5000
ORDER BY query_duration_ms DESC
LIMIT 20;


-- 4. Insert throughput per table (last 10 minutes)
SELECT
    table,
    count()                                 AS insert_count,
    formatReadableQuantity(sum(read_rows))  AS total_rows_inserted,
    formatReadableSize(sum(result_bytes))   AS total_bytes
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_kind = 'Insert'
  AND query_start_time >= now() - INTERVAL 10 MINUTE
GROUP BY table
ORDER BY total_rows_inserted DESC;

ClickHouse for Real-Time Analytics — Schema Design, Materialized Views, and Cluster Setup

Why ClickHouse Outperforms OLTP Databases for Analytics

MergeTree Engine Family — The Core of Every ClickHouse Table

MergeTree Variants

Materialized Views — Pre-Aggregation, Kafka Ingestion, and AggregatingMergeTree

Kafka Engine — Real-Time Ingestion Pipeline

Query Optimization — PREWHERE, Skip Indexes, and Projections

PREWHERE

Projections

Cluster Setup — ReplicatedMergeTree, Distributed Tables, and ClickHouse Keeper

Python Integration — clickhouse-connect, Batch Inserts, and Monitoring

Async Inserts for High-Frequency Small Batches

Production Patterns — TTL, Tiered Storage, and Monitoring

TTL-Based Data Lifecycle

Part Monitoring and Merge Health

Query Profiling with system.query_log

Prometheus Metrics via /metrics

Further Reading

Running analytics at scale and hitting the limits of your current database?

Need help implementing this in production?

ClickHouse for Real-Time Analytics — Schema Design, Materialized Views, and Cluster Setup

Why ClickHouse Outperforms OLTP Databases for Analytics

MergeTree Engine Family — The Core of Every ClickHouse Table

MergeTree Variants

Materialized Views — Pre-Aggregation, Kafka Ingestion, and AggregatingMergeTree

Kafka Engine — Real-Time Ingestion Pipeline

Query Optimization — PREWHERE, Skip Indexes, and Projections

PREWHERE

Projections

Cluster Setup — ReplicatedMergeTree, Distributed Tables, and ClickHouse Keeper

Python Integration — clickhouse-connect, Batch Inserts, and Monitoring

Async Inserts for High-Frequency Small Batches

Production Patterns — TTL, Tiered Storage, and Monitoring

TTL-Based Data Lifecycle

Part Monitoring and Merge Health

Query Profiling with system.query_log

Prometheus Metrics via /metrics

Further Reading

Running analytics at scale and hitting the limits of your current database?

Related Articles

Need help implementing this in production?