What should storage.cql.max-connections-per-host be set to?

Size it to per-node native-transport capacity, not to a fixed number. Total sockets are roughly max-connections-per-host multiplied by node count, so raising it past node capacity exhausts backend transport threads and shows up as rising coordinator queue depth rather than a driver error. Keep a warm core-connections-per-host floor beneath it to avoid cold handshakes on burst.

How do I stop NoHostAvailableException during ingestion bursts?

It is usually pool exhaustion, not node death. Confirm nodes are UN with nodetool status, then raise storage.cql.max-connections-per-host toward node capacity, keep a warm core-connections baseline, and shorten connection-timeout so backpressure reaches the producer instead of queuing threads indefinitely.

Why do traversals miss vertices JanusGraph just committed?

The storage commit is synchronous but the mixed-index update is asynchronous, and a dropped or retried connection can reorder the index dispatch relative to the storage write. Set atomic-batch-mutate=true, read at LOCAL_QUORUM, and route recency-critical reads through a storage-backed id lookup until the index catches up.

Should the gremlin-python pool_size match the backend pool size?

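They should be sized together, but they are not the same pool. The gremlin-python pool_size caps WebSocket connections from the client to Gremlin Server, while storage.cql.max-connections-per-host caps CQL sockets from JanusGraph to each storage node. If the client pool is too small for the offered load you get driver-side queueing; if it is much larger than the server can service, the queue simply moves to the Gremlin Server workers and the CQL pool. The smallest of client pool_size, backend max-connections-per-host, and the Gremlin Server worker pool is the real concurrency limit.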

What causes a climbing reconnection-count on healthy nodes?

Half-open sockets. A firewall or load balancer silently drops idle TCP sessions the driver still believes are live, and the failure only surfaces on the next submit. Enable driver heartbeats, set the keepalive interval below the network idle-drop timeout, and lower core-connections-per-host so fewer sockets sit idle long enough to be culled.

Connection Pooling

Connection pooling in JanusGraph is not an optimization layer; it is a hard boundary for transactional consistency and index-synchronization throughput. Unmanaged TCP handshakes, session thrashing, and stale socket retention degrade commit ordering and trigger cascading consistency violations across the storage cluster. This guide sits under the JanusGraph Storage Backend Architecture & Configuration reference and narrows it to one subsystem: the pool of sockets that carries every mutation and traversal between the JanusGraph transaction engine and a CQL-based backend. The failure mode this page prevents is thread starvation: the state where producers block on connection acquisition, commits queue behind an undersized pool, and the index backend receives writes out of order. Everything below prioritizes bounded, warm, observable pools over the driver defaults, which are rarely sized for sustained ingestion.

The diagram below shows where pool sizing matters: client workers multiplex through a bounded connection pool to the Gremlin Server and storage backend.

Core Configuration & Consistency Tuning

Pool sizing must align with cluster topology, replication factor, and expected concurrency ceilings. The following janusgraph.properties baseline targets CQL-based backends. It assumes a three-node datacenter with local rack affinity, and it is the profile you tune from — not a value to copy blindly, because the correct maximum is a function of node count and per-node capacity.

properties

# TCP socket allocation per host
storage.cql.core-connections-per-host=4
storage.cql.max-connections-per-host=12

# Request multiplexing over each socket
storage.cql.max-requests-per-connection=1024

# Fail-fast acquisition
storage.cql.connection-timeout=5000

# Consistency routing
storage.cql.local-datacenter=dc1
storage.cql.read-consistency-level=LOCAL_QUORUM
storage.cql.write-consistency-level=LOCAL_QUORUM

Operational constraints, in the order they bite:

core-connections-per-host sets the warm floor. These sockets stay open regardless of load, absorbing traffic spikes without a cold TCP + TLS handshake in the request path. Set it too low and the first burst after an idle window pays connection latency on every mutation.
max-connections-per-host is a physical socket cap, not a request cap. The total number of sockets a JanusGraph instance opens is roughly max-connections-per-host × node_count. Raising it past node capacity pushes the backend into native-transport thread exhaustion; the symptom is rising coordinator queue depth, not a driver error.
max-requests-per-connection governs async frame multiplexing. The CQL protocol lets many in-flight requests share one socket. For ScyllaDB’s shard-per-core reactor model, values up to 4096 are appropriate; for Cassandra, 1024–2048 keeps per-socket latency variance bounded.
connection-timeout decides who feels backpressure. With a bounded timeout the driver fails fast and surfaces pressure to the producer; without one, threads queue on acquisition indefinitely and the JVM silently starves. Keep it low enough that a saturated pool becomes a visible error, not a hang.

Effective concurrency is bounded by the product of sockets and per-socket requests:

\[
\text{max in-flight} = \text{max-connections-per-host} \times \text{node count} \times \text{max-requests-per-connection}
\]
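As a worked instance of this bound, the sketch below plugs in the baseline values from the properties block above (12 connections per host, three nodes, 1024 requests per connection). The function name `pool_ceilings` is illustrative only; it is not part of JanusGraph or the CQL driver.

```python
def pool_ceilings(max_conn_per_host: int, node_count: int,
                  max_req_per_conn: int) -> tuple[int, int]:
    """Return (total sockets, max in-flight requests) for one JanusGraph instance."""
    total_sockets = max_conn_per_host * node_count
    max_in_flight = total_sockets * max_req_per_conn
    return total_sockets, max_in_flight


# Baseline profile from this page: 12 connections/host, 3 nodes, 1024 requests/connection.
sockets, in_flight = pool_ceilings(12, 3, 1024)
print(sockets, in_flight)  # 36 36864
```

The in-flight figure is an upper bound on what the pool will accept, not a throughput target; per-node native-transport capacity is usually reached well before it.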

These settings must be coordinated with JVM heap, OS ulimit -n file-descriptor ceilings, and Gremlin Server thread-pool sizing. The same topology decisions that drive Replication Strategies — replica count and datacenter layout — also set the ceiling for how many sockets a healthy pool should hold, so fix replication before you finalize pool maxima. When moving off Cassandra, the driver overrides described in the ScyllaDB Migration guide change these defaults, because Scylla’s per-shard model rewards fewer connections with far higher max-requests-per-connection.

Index Synchronization Window & Lag Metrics

Mixed-index synchronization to Elasticsearch or OpenSearch depends on strict commit ordering. When the pool exhausts sockets or drops a connection mid-transaction, the JanusGraph transaction manager may retry the storage write while the index backend has already queued a partial update. The result is a desynchronized graph: phantom vertices in search results, or edge properties that are missing during traversal even though the storage row is durable.

The sync window is the interval between a durable storage commit and the moment the mutation is visible through a has() predicate. Pool health directly widens or narrows it:

A saturated pool delays the storage commit itself, which delays the downstream index dispatch — lag compounds.
A dropped connection forces a retry, and a retried mutation can reach the IndexProvider queue after a later mutation on the same element, inverting apply order.
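One way to make the sync window observable is to time it directly: commit, then poll an index-backed query until the element appears. The helper below is a minimal sketch under that assumption. `measure_sync_window` and its `is_visible` callback are hypothetical names; the callback would wrap whatever index-backed has() lookup your pipeline already issues.

```python
import time
from typing import Callable, Optional


def measure_sync_window(
    is_visible: Callable[[], bool],
    timeout_s: float = 5.0,
    interval_s: float = 0.05,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll is_visible() after a commit.

    Returns the seconds elapsed until it first returned True,
    or None if timeout_s passed without the element becoming visible.
    """
    start = clock()
    while True:
        if is_visible():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

The clock and sleep functions are injectable so the helper can be exercised without real delays. Feeding the returned value into a histogram gives a direct view of how pool saturation stretches the window.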

Align pool behavior with consistency guarantees:

properties

storage.cql.read-consistency-level=LOCAL_QUORUM
storage.cql.write-consistency-level=LOCAL_ONE
storage.cql.atomic-batch-mutate=true

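LOCAL_ONE on writes minimizes pool pressure during bulk ingestion; data propagates asynchronously to the remaining replicas. The cost is that a write acknowledged by a single replica is not guaranteed to be visible to a LOCAL_QUORUM read until replication catches up.
LOCAL_QUORUM on reads guarantees visibility of recently committed data within the local datacenter only when the read and write replica sets overlap, which requires LOCAL_QUORUM on writes as well (as in the baseline profile above). Use the LOCAL_ONE write level for bulk loads that tolerate a short staleness window, not for recency-critical paths. These levels govern storage reads; they do not by themselves shorten mixed-index lag.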
atomic-batch-mutate=true forces multi-statement mutations to apply as a single unit, preventing partial composite-index updates when a socket drops mid-batch.
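The interaction between read and write levels comes down to replica overlap: a read is guaranteed to see the latest acknowledged write only when the write and read acknowledgment counts sum to more than the replication factor. The sketch below checks that condition, assuming a replication factor of 3 in the local datacenter (this page does not state the replication factor, so treat that as an assumption). The function names are illustrative.

```python
def quorum(rf: int) -> int:
    """Replicas required for a quorum at replication factor rf."""
    return rf // 2 + 1


def read_sees_latest_write(rf: int, write_acks: int, read_acks: int) -> bool:
    """True when every read replica set must intersect every write replica set."""
    return write_acks + read_acks > rf


rf = 3  # assumed local-datacenter replication factor
# LOCAL_ONE writes + LOCAL_QUORUM reads: 1 + 2 is not greater than 3.
print(read_sees_latest_write(rf, write_acks=1, read_acks=quorum(rf)))           # False
# LOCAL_QUORUM writes + LOCAL_QUORUM reads: 2 + 2 is greater than 3.
print(read_sees_latest_write(rf, write_acks=quorum(rf), read_acks=quorum(rf)))  # True
```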

Poll these lag metrics rather than assuming the window is small. Elasticsearch /_cat/thread_pool/write?v bulk queue depth and the JanusGraph index.[name].elasticsearch backpressure counters are the two signals that move first under pool saturation. The full acknowledgment-boundary trade-off — how long to wait for the index before a read is considered consistent — is covered in Eventual vs Strong Consistency, and the routing logic that decides which backing index answers a query lives in Mixed-Index Routing. For backend-specific propagation behavior, see the Elasticsearch integration and OpenSearch sync patterns guides.
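To turn the `/_cat/thread_pool/write?v` signal into numbers a monitor can alert on, the text response can be parsed into per-node counters. This is a minimal sketch that assumes the default column set (`node_name name active queue rejected`) and node names without spaces. The sample payload and the `parse_write_pool` name are made up for illustration, and fetching the response over HTTP is left to the caller.

```python
def parse_write_pool(cat_output: str) -> dict[str, dict[str, int]]:
    """Parse the text body of GET /_cat/thread_pool/write?v into per-node counters."""
    lines = [ln.split() for ln in cat_output.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    result: dict[str, dict[str, int]] = {}
    for row in rows:
        record = dict(zip(header, row))
        result[record["node_name"]] = {
            "active": int(record["active"]),
            "queue": int(record["queue"]),
            "rejected": int(record["rejected"]),
        }
    return result


# Illustrative payload, not captured from a real cluster.
SAMPLE = """
node_name name  active queue rejected
es-1      write      2     0        0
es-2      write      8   143       12
"""

stats = parse_write_pool(SAMPLE)
print(stats["es-2"]["queue"], stats["es-2"]["rejected"])  # 143 12
```

A rising `queue` with a non-zero `rejected` count while storage commits still succeed is the mixed-index backpressure pattern described in the diagnostics section.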

Python Integration Pattern

Python ingestion pipelines must manage connection lifecycle explicitly and implement idempotent retries. The following gremlin-python client wraps pool sizing and exponential backoff so transient network failures, pool exhaustion, and server-side timeouts do not corrupt graph state. The key discipline is that a retry must be safe to replay — pair it with mergeV/mergeE upsert semantics or an idempotency key, never a bare addV, or a retried mutation duplicates vertices.

python

from gremlin_python.driver.client import Client
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)
from concurrent.futures import ThreadPoolExecutor
import socket
import logging

logger = logging.getLogger(__name__)


class JanusGraphPoolClient:
    def __init__(self, host: str, port: int = 8182, pool_size: int = 12):
        # gremlin-python's Client takes a WebSocket URL. pool_size caps the
        # client-to-Gremlin-Server connection pool; size it together with the
        # server worker pool and storage.cql.max-connections-per-host.
        url = f"ws://{host}:{port}/gremlin"
        self.client = Client(url, "g", pool_size=pool_size, max_workers=pool_size)
        self.executor = ThreadPoolExecutor(max_workers=pool_size)

    @retry(
        retry=retry_if_exception_type((ConnectionError, socket.timeout, OSError)),
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        reraise=True,
    )
    def submit(self, query: str, bindings: dict | None = None):
        # ConnectionError/OSError trigger pool reconnection; swallowing them
        # here would leave the index-sync state silently desynchronized.
        try:
            result_set = self.client.submit(query, bindings or {})
            return result_set.all().result()
        except Exception as e:
            logger.error("query submission failed: %s", e)
            raise

    def upsert_vertex(self, key: str, label: str, props: dict):
        # Idempotent by construction: a replayed retry converges, it does not
        # duplicate. This is what makes the retry decorator safe.
        query = (
            "g.mergeV([(T.id): kid])"
            ".option(onCreate, [(T.label): lbl])"
            ".sideEffect(__.property('data', payload))"
        )
        return self.submit(query, {"kid": key, "lbl": label, "payload": props})

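    def submit_async(self, query: str, bindings: dict | None = None):
        # Run the retrying submit on the executor so a slow query does not
        # block the caller; returns a concurrent.futures.Future.
        return self.executor.submit(self.submit, query, bindings)

    def close(self):
        # Drain in-flight work before tearing down the connections it uses.
        self.executor.shutdown(wait=True)
        self.client.close()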

Implementation requirements:

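pool_size on the client should be sized together with storage.cql.max-connections-per-host and the Gremlin Server worker pool. The client pool carries WebSocket connections to Gremlin Server, not CQL sockets, so a client pool that is too small produces driver-side queueing, while one far larger than the server can service only moves the queue downstream.
tenacity applies exponential backoff, which spaces out reconnection attempts against a recovering backend node. Add jitter if many producers retry at once; without it, their retries stay synchronized.
ThreadPoolExecutor is there to run submissions off the caller's thread so a slow query cannot block the whole pipeline. It only helps if calls are actually routed through it, for example by submitting the client's submit method to the executor.
ConnectionError and OSError are retried explicitly because they signal a dead socket the driver must replace; silent handling here is exactly what corrupts index-sync ordering.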

For the full mapping from Python concurrency limits to JVM thread-pool boundaries, work through the JanusGraph Connection Pool Tuning Guide.

Connection Lifecycle & Pool Management

A pool is a state machine, not a static array of sockets. Each connection cycles through warm-idle, in-flight, half-open, and evicted states, and most production incidents trace back to a socket stuck in the wrong one.

Sizing rules:

Floor from steady state, ceiling from bursts. Set core-connections-per-host to cover typical concurrent traversals so the common path never handshakes; set max-connections-per-host to absorb peak burst without exceeding per-node native-transport capacity.
Match the three ceilings. Client pool_size, storage.cql.max-connections-per-host, and Gremlin Server threadPoolWorker must be reconciled — the smallest one is the real concurrency limit, and the others are wasted or misleading.
Budget file descriptors. Total sockets across all pools plus index-backend connections must stay under the OS ulimit -n; a descriptor exhaustion presents identically to pool exhaustion but is not fixed by resizing the pool.
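The last two sizing rules reduce to simple arithmetic that is worth checking at startup. The sketch below computes the effective concurrency limit as the minimum of the three ceilings and the file-descriptor headroom left after all pools are open. The function names and the sample figures for index connections and other descriptors are hypothetical; `resource.getrlimit` is Unix-only.

```python
def effective_concurrency(client_pool_size: int,
                          cql_max_connections_per_host: int,
                          server_workers: int) -> int:
    """The smallest of the three ceilings is the real concurrency limit."""
    return min(client_pool_size, cql_max_connections_per_host, server_workers)


def fd_headroom(cql_sockets: int, index_connections: int,
                other_fds: int, soft_limit: int) -> int:
    """Descriptors left under the soft limit; zero or negative means trouble."""
    return soft_limit - (cql_sockets + index_connections + other_fds)


def current_fd_soft_limit() -> int:
    """Soft RLIMIT_NOFILE for this process (the value `ulimit -n` reports)."""
    import resource  # Unix-only, imported lazily so the rest runs anywhere
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


print(effective_concurrency(12, 12, 8))  # 8
# 36 CQL sockets (12 x 3 nodes); 10 index connections and 200 other
# descriptors are placeholder figures for illustration.
print(fd_headroom(36, 10, 200, soft_limit=1024))  # 778
```

In a real deployment, pass `current_fd_soft_limit()` as `soft_limit` and alert when the headroom falls below a margin you are comfortable with.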

Idle timeout and health:

Firewalls and load balancers silently drop idle TCP sessions, leaving the driver holding a half-open socket that only reveals itself on the next submit. Keep core-connections-per-host modest and rely on connection-timeout plus driver heartbeats to evict dead sockets before they serve a query.
Enable driver-level keepalive/heartbeat so idle sockets are probed on a cadence shorter than the network’s idle-drop timeout.

Retry policy:

Retries belong at the application boundary (as in the client above), not buried in the driver, so that idempotency is provable per operation.
Cap attempts and use exponential backoff with jitter; unbounded retries against a saturated pool convert a transient spike into a sustained outage.
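A capped exponential backoff with jitter can be sketched in a few lines. This version uses the "full jitter" variant, where each delay is drawn uniformly between zero and the capped exponential ceiling. The defaults mirror the 2-second base and 10-second cap used in the tenacity decorator earlier on this page; the function name is illustrative.

```python
import random
from typing import Optional


def backoff_delays(waits: int, base_s: float = 2.0, cap_s: float = 10.0,
                   rng: Optional[random.Random] = None) -> list[float]:
    """Return one jittered delay per retry wait.

    The ceiling for wait i is min(cap_s, base_s * 2**i); the actual delay
    is uniform in [0, ceiling], so concurrent producers spread out.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(waits):
        ceiling = min(cap_s, base_s * (2 ** attempt))
        delays.append(rng.uniform(0.0, ceiling))
    return delays


# Ceilings for four waits are 2, 4, 8, 10 seconds; the drawn values vary per run.
print(backoff_delays(4))
```

Because the number of waits is fixed up front, the total retry time is bounded by the sum of the ceilings, which keeps a saturated pool from accumulating an unbounded retry backlog.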

Diagnostics & Operational Fallbacks

Expose pool metrics through the JMX beans published by the CQL driver and Gremlin Server, then triage against these symptom/diagnosis/resolution triplets.

Symptom: NoHostAvailableException bursts during ingestion, nodes still UN in nodetool status. Diagnosis: pool exhaustion, not node death — open-connections sits pinned at max-connections-per-host while in-flight-requests plateaus. Resolution: raise max-connections-per-host toward per-node capacity, keep a warm core-connections-per-host baseline, and shorten connection-timeout so backpressure reaches the producer instead of queuing.
Symptom: Traversals miss vertices that JanusGraph just committed. Diagnosis: widened index-sync window; a dropped or retried connection reordered the index dispatch relative to the storage commit. Resolution: set atomic-batch-mutate=true, read at LOCAL_QUORUM, and route recency-critical reads through a storage-backed id lookup instead of a has() predicate until the index catches up.
Symptom: Latency spikes correlate with JVM garbage-collection pauses. Diagnosis: a long GC cycle stalls connection acquisition; threads park on the pool and time out in clusters. Resolution: tune G1GC for shorter pauses, cap request timeouts to fail fast, and keep the pool small enough that a pause does not leave hundreds of threads blocked on acquisition.
Symptom: reconnection-count climbing steadily on otherwise healthy nodes. Diagnosis: half-open sockets — a firewall is dropping idle sessions the driver still believes are live. Resolution: enable driver heartbeats, lower the idle keepalive interval below the network drop timeout, and reduce core-connections-per-host so fewer sockets sit idle long enough to be culled.
Symptom: Elasticsearch/OpenSearch bulk queue rejections under write load. Diagnosis: mixed-index backpressure — /_cat/thread_pool/write queue depth is rising while storage commits succeed. Resolution: decouple graph commits from index updates with asynchronous indexing, tune index.search.elasticsearch.bulk-size, and throttle producers before the index queue overflows.

For quorum-calculation baselines behind the consistency settings above, see the upstream Apache Cassandra consistency levels reference, and align the storage-tier setup itself through Cassandra Backend Setup.

JanusGraph Connection Pool Tuning Guide — step-by-step mapping of Python concurrency limits to JVM thread-pool boundaries.
Cassandra Backend Setup — keyspace, consistency, and compaction tuning the pool sizing depends on.
Replication Strategies — replica count and datacenter layout that set the ceiling for pool maxima.
ScyllaDB Migration — driver overrides that change pool defaults for the shard-per-core model.
Eventual vs Strong Consistency — where to place the index-acknowledgment boundary the sync window depends on.