Why must I use NetworkTopologyStrategy instead of SimpleStrategy for JanusGraph replication?

SimpleStrategy places replicas by walking the ring without regard to rack or datacenter, which causes cross-datacenter read and write amplification and breaks local-quorum availability during a regional partition. NetworkTopologyStrategy honors explicit per-datacenter replica counts, so LOCAL_QUORUM writes stay within the local datacenter. It is mandatory for any multi-node production JanusGraph deployment.

What replication factor and consistency level should I run in production?

Use a local replication factor of 3 with LOCAL_QUORUM writes and LOCAL_ONE reads. RF=3 tolerates one node down at quorum while bounding write latency by the second-fastest replica. Avoid RF=2, which pays quorum cost while tolerating zero failures. Reserve EACH_QUORUM for bulk loads and schema migrations, never as a steady-state write default.

How do I keep replication-strategy-options and the keyspace DDL in sync?

Provision the keyspace with the exact per-datacenter replica counts first, then set storage.cql.replication-strategy-options to match byte-for-byte and point JanusGraph at the existing keyspace. Never let JanusGraph auto-create it, because auto-creation defaults to SimpleStrategy and makes the options cosmetic. Confirm alignment with DESCRIBE KEYSPACE after any topology change.

Why does NoHostAvailableException appear under load when every node is healthy?

It is usually connection-pool exhaustion, not node death. Every LOCAL_QUORUM write fans out to a majority of local replicas, so a pool sized for single-node throughput starves under quorum concurrency. Confirm nodes are UN via nodetool status, then raise storage.cql.max-connections-per-host toward node capacity, keep a warm core-connections baseline, and shorten connection-timeout so backpressure surfaces to the producer.

Replication Strategies

A replication strategy in a production JanusGraph deployment is an operational contract, not a theoretical exercise: it fixes query latency, decides which node failures are survivable, and sets the boundary inside which a committed write is guaranteed durable. Because JanusGraph owns no persistence of its own, every replication decision you make is really a decision about the underlying storage engine’s replication and consistency behavior, plus a second, looser decision about how far the search index is allowed to drift behind it. This guide sits under the JanusGraph Storage Backend Architecture & Configuration reference and narrows it to one subsystem: how vertex and edge mutations propagate to storage replicas, how the replica topology interacts with consistency levels, and how to keep the mixed index reconciled with the storage view. The failure mode this page exists to prevent is the quiet one: writes succeed, quorum is satisfied, and traversals still return stale or missing data because the topology, the consistency levels, and the index sync window were tuned in isolation instead of as one decision.

Figure: NetworkTopologyStrategy with asymmetric replication factors across two datacenters, a full-quorum primary and a lighter disaster-recovery region.

Core Configuration & Consistency Tuning

JanusGraph delegates replication entirely to its storage backend. The tuning surface that matters lives in two properties — storage.cql.replication-strategy-class and storage.cql.replication-strategy-options — plus the pair of consistency levels that decide how many of those replicas must acknowledge before an operation returns. For any multi-node deployment, NetworkTopologyStrategy is mandatory. SimpleStrategy places replicas by walking the ring without regard to rack or datacenter, which silently guarantees cross-DC write amplification and breaks local-quorum availability the moment a region partitions.

properties

# janusgraph-production.properties
storage.backend=cql
storage.hostname=10.0.1.10,10.0.1.11,10.0.1.12
storage.cql.keyspace=graph_data
storage.cql.local-datacenter=dc1

# Replication topology & consistency
storage.cql.replication-strategy-class=NetworkTopologyStrategy
storage.cql.replication-strategy-options=dc1,3,dc2,1
storage.cql.write-consistency-level=LOCAL_QUORUM
storage.cql.read-consistency-level=LOCAL_ONE
storage.cql.only-use-local-consistency-for-system-operations=true

# Index backend (async, out of the storage quorum path)
index.search.backend=elasticsearch
index.search.hostname=10.0.2.10,10.0.2.11
index.search.elasticsearch.client-only=true

Treat the following as hard operational constraints, not defaults you may drift away from:

Provision the keyspace before JanusGraph touches it. If you let JanusGraph auto-create the keyspace, it defaults to SimpleStrategy and your replication-strategy-options become cosmetic. Create the keyspace with NetworkTopologyStrategy and the exact per-DC replica counts first, then point JanusGraph at it. The DDL and provisioning walkthrough lives in Cassandra Backend Setup.
Keep the keyspace DDL and replication-strategy-options byte-for-byte identical. A mismatch — dc2,1 in the properties, dc2,2 in the live keyspace — produces replicas Cassandra will honor but JanusGraph’s schema assumptions will not, and read repair papers over the divergence until a node dies.
Pin local-datacenter explicitly. The DataStax 4.x driver refuses to route without an explicit local DC under its default load-balancing policy; omit it and JanusGraph throws at startup rather than degrading.
Keep system operations on local consistency. Set only-use-local-consistency-for-system-operations=true so ID-block allocation and schema locks stay on LOCAL_QUORUM instead of escalating to a global QUORUM that stalls every schema mutation on cross-DC round trips.
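
The byte-for-byte rule above can be checked mechanically. The sketch below is a minimal illustration, assuming the options string uses the flat dc,rf,dc,rf form shown in the properties file and that the live replication map has already been read from the cluster (for example by transcribing the output of DESCRIBE KEYSPACE). The helper names parse_replication_options, keyspace_ddl, and replication_mismatches are hypothetical and are not part of JanusGraph or any driver.

```python
from typing import Dict, Optional, Tuple


def parse_replication_options(options: str) -> Dict[str, int]:
    """Parse a flat 'dc1,3,dc2,1' string into {'dc1': 3, 'dc2': 1}."""
    parts = [p.strip() for p in options.split(",") if p.strip()]
    if len(parts) % 2 != 0:
        raise ValueError(f"expected dc,rf pairs, got {options!r}")
    return {parts[i]: int(parts[i + 1]) for i in range(0, len(parts), 2)}


def keyspace_ddl(keyspace: str, replication: Dict[str, int]) -> str:
    """Render the CREATE KEYSPACE statement that matches the parsed options."""
    dcs = ", ".join(f"'{dc}': {rf}" for dc, rf in replication.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {dcs}}};"
    )


def replication_mismatches(
    expected: Dict[str, int], live: Dict[str, int]
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return {dc: (expected_rf, live_rf)} for every datacenter that differs."""
    return {
        dc: (expected.get(dc), live.get(dc))
        for dc in sorted(set(expected) | set(live))
        if expected.get(dc) != live.get(dc)
    }


expected = parse_replication_options("dc1,3,dc2,1")
print(keyspace_ddl("graph_data", expected))
# Hypothetical live map showing the dc2 drift described above.
print(replication_mismatches(expected, {"dc1": 3, "dc2": 2}))
```

Generating the DDL from the same string that goes into the properties file is one way to keep the two from diverging; an empty mismatch map after a topology change is the signal that they still agree.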

The two consistency levels are where the topology becomes latency. The three choices you will actually weigh:

LOCAL_QUORUM (writes) — blocks until a majority of replicas within the local datacenter acknowledge. Survives a single-node failure with no cross-DC latency penalty. This is the correct write default.
LOCAL_ONE (reads) — routes to the nearest available replica and accepts eventual consistency for traversals. The correct read default when a briefly stale vertex does not corrupt business logic; escalate to LOCAL_QUORUM reads only where read-your-writes on storage is a hard requirement.
EACH_QUORUM (writes) — demands a quorum in every datacenter before returning. Reserve it for bulk loads and schema migrations where synchronous cross-DC durability is worth the tail latency; never run it as a steady-state default.

The quorum arithmetic is worth internalizing because it dictates your failure budget. A write at LOCAL_QUORUM with replication factor $RF$ in the local datacenter blocks until $\lfloor RF/2 \rfloor + 1$ replicas acknowledge. With $RF=3$ that is 2 replicas — you tolerate one node down, and latency is bounded by the second-fastest replica, not the slowest. Drop the local RF to 2 and quorum still needs 2 acknowledgments, so you tolerate zero failures while paying quorum cost: an RF of 2 is the worst of both worlds and should never appear in a production replication-strategy-options. The asymmetric dc1,3,dc2,1 topology above is deliberate — dc2 at RF=1 is a warm standby that receives replicas asynchronously without dragging every LOCAL_QUORUM write in dc1 into a cross-DC negotiation. The full multi-region routing model, including EACH_QUORUM bulk-load windows and rack-aware placement, is worked through in Configuring Multi-Datacenter Replication for Graph Data.
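
The arithmetic above is small enough to tabulate. The following is a minimal sketch of the floor(RF/2) + 1 rule and of how many acknowledgments each of the three consistency levels needs under the dc1,3,dc2,1 topology; the function names are illustrative only.

```python
from typing import Dict


def quorum(rf: int) -> int:
    """Replicas that must acknowledge: floor(RF / 2) + 1."""
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    return rf // 2 + 1


def tolerated_failures(rf: int) -> int:
    """Replicas that can be down while a quorum is still reachable."""
    return rf - quorum(rf)


def required_acks(level: str, replication: Dict[str, int], local_dc: str) -> Dict[str, int]:
    """Acknowledgments needed per datacenter for the three levels discussed above."""
    if level == "LOCAL_ONE":
        return {local_dc: 1}
    if level == "LOCAL_QUORUM":
        return {local_dc: quorum(replication[local_dc])}
    if level == "EACH_QUORUM":
        return {dc: quorum(rf) for dc, rf in replication.items()}
    raise ValueError(f"unsupported consistency level: {level}")


topology = {"dc1": 3, "dc2": 1}
for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerated failures={tolerated_failures(rf)}")
print(required_acks("LOCAL_QUORUM", topology, "dc1"))
print(required_acks("EACH_QUORUM", topology, "dc1"))
```

Note what the last line implies for the asymmetric topology: with dc2 at RF=1, an EACH_QUORUM write needs that single dc2 replica, so it tolerates no dc2 node loss for the affected token ranges. That is one more reason to keep EACH_QUORUM out of the steady-state path.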

If your throughput targets outgrow Cassandra’s coordinator model, the same replication-strategy-options semantics carry over to a CQL-compatible backend — the ScyllaDB migration guide details the driver overrides required to preserve these write semantics while bypassing Cassandra-specific RPC limits.

Index Synchronization & Drift Mitigation

Replication guarantees stop at the storage boundary. The mixed index — Elasticsearch or OpenSearch — does not participate in Cassandra’s consensus protocol at all; it is fed asynchronously from a separate mutation log after the storage commit has already returned. That decoupling opens a sync window: a vertex committed at LOCAL_QUORUM is durable and quorum-replicated, yet still invisible to a has() predicate until the index applies and refreshes the corresponding document. Under sustained ingestion this window stretches, and the symptom on-call sees is stale traversal results or intermittent IndexNotFoundException errors on a graph that is, by every storage metric, perfectly healthy. Where you place the acknowledgment boundary between storage durability and index visibility is the central trade-off of eventual vs strong consistency.

The length of the window is governed by three knobs, none of which replication factor influences:

Refresh interval. The default 1s Elasticsearch/OpenSearch refresh forces excessive segment-flush I/O under heavy ingestion. Set index.search.elasticsearch.create.ext.refresh_interval to 5s or 10s for batch pipelines and leave it low only where near-real-time search is a hard requirement. The same lever, tuned for OpenSearch clusters, is covered in OpenSearch Sync Patterns.
Client-only wiring. Set index.search.elasticsearch.client-only=true so JanusGraph never joins the search cluster as a data node. This keeps index-node lifecycle out of the graph’s failure domain and removes a whole class of split-brain risk during a partition.
Reconciliation cadence. The index is eventually consistent by construction; the only defense against unbounded drift is to measure it. Run periodic jobs that compare authoritative vertex counts in the CQL backend against index document counts and flag divergence beyond a threshold for targeted reindex rather than assuming the async worker kept pace.
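
The count comparison behind the reconciliation cadence can be sketched as a pure function. This is an illustration under stated assumptions: the two counts are fetched elsewhere (for instance a Gremlin count against storage and a document count against the index), and the 1% threshold is an arbitrary example value, not a recommendation from JanusGraph or Elasticsearch. DriftReport and measure_drift are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftReport:
    storage_count: int
    index_count: int
    missing: int        # documents the index has not caught up on
    surplus: int        # index documents with no storage counterpart
    ratio: float        # (missing + surplus) relative to the storage count
    needs_reindex: bool


def measure_drift(storage_count: int, index_count: int, threshold: float = 0.01) -> DriftReport:
    """Compare an authoritative storage count with an index document count."""
    if storage_count < 0 or index_count < 0:
        raise ValueError("counts must be non-negative")
    missing = max(storage_count - index_count, 0)
    surplus = max(index_count - storage_count, 0)
    if storage_count:
        ratio = (missing + surplus) / storage_count
    else:
        # Empty storage: any index document at all is pure drift.
        ratio = 1.0 if surplus else 0.0
    return DriftReport(storage_count, index_count, missing, surplus, ratio, ratio > threshold)


print(measure_drift(storage_count=1_000_000, index_count=999_200))
print(measure_drift(storage_count=1_000_000, index_count=940_000))
```

Matching counts do not prove the two sides hold the same elements, so treat this as a cheap first-pass alarm and follow a flagged divergence with a targeted per-vertex check.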

The reconciliation polling pattern is deliberately narrow: it confirms search visibility for a specific write, not storage durability. Do not force every commit to block on a synchronous refresh to close this window — that trades a rare stale read for a permanent latency tax on every mutation. Instead, gate only the reads that genuinely need read-your-writes behind a bounded poll, and let the bulk of ingestion ride the fast asynchronous path. Which query path a given traversal resolves through — storage-backed ID lookup versus mixed index — is itself a routing decision covered under mixed-index routing.
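
A bounded poll of the kind described above might look like the sketch below. It assumes the caller supplies a zero-argument probe that returns True once the write is visible through the index; wait_until_visible and its parameters are hypothetical names, and the timeout and interval defaults are placeholders to tune per workload.

```python
import time
from typing import Callable


def wait_until_visible(
    is_visible: Callable[[], bool],
    timeout_s: float = 5.0,
    interval_s: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_visible() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if is_visible():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # Give up; the caller decides how to degrade.
        # Never sleep past the deadline.
        sleep(min(interval_s, remaining))


# Stand-in probe that reports visibility on its third call.
attempts = {"n": 0}


def fake_probe() -> bool:
    attempts["n"] += 1
    return attempts["n"] >= 3


print(wait_until_visible(fake_probe, timeout_s=1.0, interval_s=0.01))
```

Only the read path that needs read-your-writes calls this; ingestion never waits on it, which keeps the cost of the sync window confined to the requests that actually care.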

Python Integration Pattern

Production reconciliation requires deterministic retry logic and explicit error boundaries that separate a transient network fault from a genuine logical gap. The orchestrator below verifies that a vertex present in the storage backend is also present in the search index, applies exponential backoff on transient transport failures, and forces an index refresh only when a document is confirmed missing — never speculatively, which would convert a reconciliation pass into an I/O storm.

python

import logging
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
from gremlin_python.driver import client
from elasticsearch import Elasticsearch, exceptions as es_exc

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
logger = logging.getLogger("graph_reconciler")


class GraphIndexReconciler:
    def __init__(self, gremlin_url: str, es_url: str, index_name: str = "janusgraph"):
        self.gremlin = client.Client(gremlin_url, "g")
        self.es = Elasticsearch([es_url], verify_certs=True, max_retries=3)
        self.index_name = index_name

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1.5, min=2, max=15),
        retry=retry_if_exception_type(
            (ConnectionError, es_exc.ConnectionError, es_exc.TransportError)
        ),
        reraise=True,  # surface the original exception instead of tenacity.RetryError
    )
    def reconcile_vertex(self, vertex_id: str) -> bool:
        """Verify a vertex exists in both storage and index. Force a refresh only
        when the storage-authoritative vertex is confirmed missing from the index."""
        # 1. Query the storage backend — the authoritative source of truth.
        try:
            # Pass the id as a binding rather than interpolating it into the script.
            traversal = self.gremlin.submit("g.V(vid).elementMap()", {"vid": vertex_id})
            result = traversal.all().result()
            if not result:
                # Not in storage: nothing to reconcile, do NOT touch the index.
                logger.warning(f"Vertex {vertex_id} missing in storage backend. Aborting.")
                return False
        except Exception as e:
            logger.error(f"Storage fetch failed for {vertex_id}: {e}")
            raise

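        # 2. Verify index presence for the confirmed-durable vertex.
        # Assumption: the index document id equals the vertex id string; adjust
        # the lookup if your deployment encodes document ids differently.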
        try:
            self.es.get(index=self.index_name, id=vertex_id)
            logger.info(f"Index verified for {vertex_id}")
            return True
        except es_exc.NotFoundError:
            logger.info(f"Index miss for {vertex_id}. Forcing segment refresh.")
            self.es.indices.refresh(index=self.index_name)
            return False
        except Exception as e:
            logger.error(f"Elasticsearch query failed for {vertex_id}: {e}")
            raise


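# Usage (both URLs are placeholders):
# reconciler = GraphIndexReconciler("ws://localhost:8182/gremlin", "https://localhost:9200")
# reconciler.reconcile_vertex("v-8a3f9c")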

Two disciplines make this safe to run against a live cluster. First, the storage query is authoritative and runs before any index call: a vertex absent from storage is never a reconciliation target, because writing it to the index would manufacture a phantom document that outlives its own data. Second, a NotFoundError is a logical result, not a fault, so it is caught inside the method and returns cleanly instead of reaching the @retry predicate, which is reserved for transport-level failures. Schedule this as a Kubernetes CronJob at a fixed interval and serialize reindex triggers across regions, because concurrent refresh calls fanning out to the same index during a drift spike compound the very I/O pressure that widened the window in the first place. Keep the property keys and index mappings this pipeline reads stable across deploys; changing a key’s index binding mid-flight is a schema evolution concern that belongs in a CI gate, not a reconciliation job.

Connection Lifecycle & Pool Management

Replication topology decides how many replicas a write must reach; the connection pool decides whether your client can reach them under load. In a multi-DC deployment the CQL driver holds a distinct pool per node, and LOCAL_QUORUM means every write stays in flight until a majority of local replicas acknowledge, so a pool sized for single-node throughput starves the moment quorum fan-out kicks in. Undersized pools surface as NoHostAvailableException and traversal timeouts that look identical to node failure until you read the pool-utilization metric.

Key pool parameters for the DataStax CQL driver under a replicated topology:

storage.cql.max-connections-per-host — the ceiling on physical sockets per node. Size it to peak concurrent writers per node plus roughly 20% headroom for retry and reconciliation traffic, and keep it under each node’s native_transport_max_threads so you do not convert a downstream slowdown into a coordinator-saturating thundering herd.
storage.cql.core-connections-per-host — the warm baseline. A pool that starts at zero pays a TLS-handshake tax on the first write of every burst; keep enough warm connections that a quorum fan-out never cold-starts.
storage.cql.connection-timeout — fail fast rather than queue. A short timeout surfaces backpressure to the producer during saturation; a long one hides pool exhaustion until it is total.

Two sizing rules keep the pool honest across replicas. Size to quorum concurrency, not average load — because every LOCAL_QUORUM write touches multiple replicas simultaneously, the pool must absorb concurrent-writers × local-quorum-fan-out at peak, not just the raw request rate. Bound the client idle timeout below the server’s — keep the driver’s idle reaping under Cassandra’s native_transport_idle_timeout so the client closes dead sockets first; a client reusing a server-closed socket surfaces as a spurious GremlinServerError that the retry policy will burn attempts on. The full pool sizing model, minimum/maximum tuning, and starvation symptoms live in Connection Pooling.
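
The sizing rules above reduce to a few lines of integer arithmetic. This is a rough sketch, assuming you already know peak concurrent writers per node and the server's native_transport_max_threads; the value 128 in the example is illustrative only, and the helper names are hypothetical.

```python
def quorum_concurrency(concurrent_writers: int, local_rf: int) -> int:
    """Replica operations in flight at peak: writers x local quorum fan-out."""
    local_quorum = local_rf // 2 + 1
    return concurrent_writers * local_quorum


def max_connections_per_host(
    peak_writers_per_node: int,
    native_transport_max_threads: int,
    headroom_pct: int = 20,
) -> int:
    """Peak writers plus headroom, kept strictly under the server thread ceiling."""
    if peak_writers_per_node < 1:
        raise ValueError("peak_writers_per_node must be at least 1")
    if native_transport_max_threads < 2:
        raise ValueError("native_transport_max_threads must be at least 2")
    # Integer ceiling of peak * (1 + headroom) avoids float rounding surprises.
    wanted = (peak_writers_per_node * (100 + headroom_pct) + 99) // 100
    return min(wanted, native_transport_max_threads - 1)


print(quorum_concurrency(concurrent_writers=40, local_rf=3))
print(max_connections_per_host(peak_writers_per_node=40, native_transport_max_threads=128))
print(max_connections_per_host(peak_writers_per_node=200, native_transport_max_threads=128))
```

When the cap in the last call kicks in, the pool cannot absorb the requested peak on that node; that is a capacity signal to add nodes or throttle producers rather than to raise the ceiling.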

Diagnostics & Operational Fallbacks

On a replicated backend the top failure modes all present as the same page — writes succeed, reads look wrong — so triage depends on reading the metric that separates a topology fault from a pool fault from an index fault. The table maps symptom to diagnosis to fix.

| Symptom | Diagnose | Resolve |
| --- | --- | --- |
| Cross-DC reads return stale or missing rows after failover | `DESCRIBE KEYSPACE` shows a per-DC replica count that differs from `replication-strategy-options` | Realign the keyspace DDL to the properties exactly, then `nodetool repair` the affected keyspace before re-enabling traffic |
| `NoHostAvailableException` under load, nodes healthy | `nodetool status` shows all replicas `UN` but driver pool utilization pegged at 100% | Pool exhaustion, not node death: raise `max-connections-per-host` toward node capacity and shorten `connection-timeout` per the Connection Pooling model |
| `has()` traversals miss vertices committed seconds ago | Vertex resolves via `g.V(id)` but not via `has()`; index document count trails storage count | Index sync window is stretching: raise `refresh_interval`, throttle the producer, or gate the read behind the reconciliation poll |
| Schema mutations hang on `apply` in a multi-DC keyspace | Global `QUORUM` on system operations forcing cross-DC round trips | Set `only-use-local-consistency-for-system-operations=true` so ID allocation and schema locks stay local |
| Write latency degrades cluster-wide, no node down | Steady-state writes running at `EACH_QUORUM` or `QUORUM` instead of `LOCAL_QUORUM` | Return the write consistency level to `LOCAL_QUORUM`; reserve `EACH_QUORUM` for bulk-load windows only |

When drift or divergence persists beyond what a producer throttle, a refresh-interval change, or a nodetool repair can close, run a REINDEX through the JanusGraph Management API during a maintenance window rather than dropping and rebuilding the index live. Validate replication behavior against the official Apache Cassandra documentation and the JanusGraph reference documentation, and keep continuous monitoring on three series that predict every row in the table above: per-DC replica health, driver pool utilization, and the storage-versus-index count delta. Accurate replication is the alignment of those three, not any single well-tuned property.
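
The three monitoring series named above can drive a first-pass triage. The sketch below is an illustration only: the triage function, its labels, and both thresholds (90% pool utilization, 1% count delta) are assumptions chosen for the example, and the inputs are expected to come from whatever metrics pipeline you already run.

```python
from typing import List


def triage(
    replicas_up: int,
    replicas_total: int,
    pool_utilization: float,
    index_delta_ratio: float,
    pool_limit: float = 0.90,
    delta_limit: float = 0.01,
) -> List[str]:
    """Classify a 'writes succeed, reads look wrong' page by fault domain."""
    findings: List[str] = []
    if replicas_up < replicas_total:
        findings.append("topology")  # a replica is down or unreachable
    if pool_utilization >= pool_limit:
        findings.append("pool")      # driver pool is at or near exhaustion
    if index_delta_ratio > delta_limit:
        findings.append("index")     # index trails storage beyond tolerance
    return findings or ["healthy"]


# All replicas up, pool pegged: a pool fault, not node death.
print(triage(replicas_up=3, replicas_total=3, pool_utilization=1.0, index_delta_ratio=0.0))
# Storage and pool fine, index trailing by 5%: an index fault.
print(triage(replicas_up=3, replicas_total=3, pool_utilization=0.4, index_delta_ratio=0.05))
```

More than one label can fire at once, which is the common case during an incident; the order of the returned list mirrors the order in which the faults are usually cheapest to rule out.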
