Optimizing ScyllaDB Read/Write Consistency for Graphs

This guide walks an on-call engineer through pinning ScyllaDB read/write consistency so multi-hop JanusGraph traversals return coherent adjacency lists instead of the phantom reads, dangling edges, and duplicate-vertex writes that LOCAL_ONE produces under concurrent mutation. It is the consistency-tuning procedure under ScyllaDB Migration; if you need the end-to-end backend-switch sequence or the shard-aware pool model behind these values, read that reference first, because everything below assumes the CQL adapter is already bound to a warm ScyllaDB cluster. The specific failure it prevents: a single replica acknowledging a write before its peers converge, so g.V().outE() returns an incomplete edge set and breaks shortest-path and connected-component algorithms mid-run.

Prerequisites

Confirm every item before you change a consistency level. Skipping the topology and driver checks is the most common cause of a config that passes a single-threaded smoke test and then fails on the first concurrent traversal load.

JanusGraph 0.6.x or 1.0.x with the CQL storage adapter (storage.backend=cql) already pointed at ScyllaDB. The legacy Thrift adapters were removed in 0.6.0, so cql is the only supported option on either release line.
ScyllaDB Open Source 5.x or Enterprise 2024.x with native_transport_port 9042 reachable from every JanusGraph node. Verify with nc -zv scylla-node-01 9042 before starting.
A defined replica topology. Know your datacenter name from nodetool status and the per-DC replication factor from DESCRIBE KEYSPACE in cqlsh. The LOCAL_QUORUM math below holds only if the keyspace uses NetworkTopologyStrategy, not SimpleStrategy; check it against your replication strategies.
gremlinpython and the ScyllaDB-compatible cassandra-driver on the operator host. The cassandra-driver supplies the datacenter-aware DCAwareRoundRobinPolicy used by the sync monitor in Step 2.
Write permission to edit janusgraph.properties and restart Gremlin Server, plus read access to system_traces and system_distributed on the ScyllaDB cluster.
A baseline P95/P99 latency capture from nodetool cfhistograms (nodetool cfstats reports only averages), taken before any change, so a consistency-induced regression is measurable rather than a guess.
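The LOCAL_QUORUM math referenced in the topology prerequisite is small enough to check by hand. The sketch below is illustrative only: the helper names (local_quorum, tolerated_failures, reads_see_latest_write) are hypothetical, and the replication factor is assumed to be the per-DC value from your keyspace definition.

```python
def local_quorum(replication_factor: int) -> int:
    """Replicas in the local DC that must acknowledge a LOCAL_QUORUM request."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Local replicas that can be down while LOCAL_QUORUM still succeeds."""
    return replication_factor - local_quorum(replication_factor)


def reads_see_latest_write(replication_factor: int, read_acks: int, write_acks: int) -> bool:
    """True when every read set overlaps every write set (R + W > RF)."""
    return read_acks + write_acks > replication_factor


# Print the quorum size and failure tolerance for a few common per-DC RFs.
for rf in (1, 2, 3, 5):
    q = local_quorum(rf)
    print(rf, q, tolerated_failures(rf), reads_see_latest_write(rf, q, q))
```

With a per-DC replication factor of 3, LOCAL_QUORUM needs two acknowledgments and tolerates one down replica. With a replication factor of 2 it needs both replicas, so a single node outage blocks every read and write at that level.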

Step 1 — Enforce strong consistency for transactional traversals

Default LOCAL_ONE read/write levels cause phantom reads during concurrent edge mutations: a single replica acknowledges before its peers converge, so a traversal reading a different replica sees a truncated adjacency list. Override the defaults with LOCAL_QUORUM for both reads and writes, force atomic batch mutations, and stop system operations from bypassing cluster consensus.

Apply these exact settings in janusgraph.properties:

properties

storage.backend=cql
storage.hostname=scylla-node-01,scylla-node-02,scylla-node-03
storage.cql.read-consistency-level=LOCAL_QUORUM
storage.cql.write-consistency-level=LOCAL_QUORUM
storage.cql.only-use-local-consistency-for-system-operations=false
storage.cql.atomic-batch-mutate=true
storage.cql.batch-statement-size=20

Operational constraints for the non-default values:

read/write-consistency-level=LOCAL_QUORUM requires a majority of local replicas to acknowledge, which closes the phantom-read window without the cross-datacenter latency of a global QUORUM. This is the same acknowledgment boundary you set for the Cassandra Backend Setup; keep it identical across the migration so a latency change is never ambiguous. The wider durability-versus-visibility trade-off is analyzed under eventual vs strong consistency.
atomic-batch-mutate=true forces ScyllaDB to use logged batches for multi-partition edge writes, preventing partial commits when a coordinator fails mid-mutation. It covers storage and composite indexes only — mixed indexes stay asynchronous (Step 2).
batch-statement-size=20 caps the batch payload so concurrent Gremlin mutations do not trip BatchTooLarge while still holding a transactional boundary.
only-use-local-consistency-for-system-operations=false stops ID allocation and schema locks from silently running at a weaker level than your data path during the consistency audit — set it back to true once tuning is confirmed if cross-DC schema stalls appear.

Restart Gremlin Server so the properties reload. ScyllaDB’s own consistency documentation lists coordinator behavior for each level if your replica count differs from three.
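A quick drift check on the file, run before the restart, catches a typo or a stale override that would otherwise only surface under load. This is a minimal sketch, not part of JanusGraph: parse_properties and find_drift are hypothetical helpers, and the parser assumes plain key=value lines rather than the full Java properties syntax.

```python
# Expected values copied from the Step 1 settings.
EXPECTED = {
    "storage.backend": "cql",
    "storage.cql.read-consistency-level": "LOCAL_QUORUM",
    "storage.cql.write-consistency-level": "LOCAL_QUORUM",
    "storage.cql.only-use-local-consistency-for-system-operations": "false",
    "storage.cql.atomic-batch-mutate": "true",
    "storage.cql.batch-statement-size": "20",
}


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def find_drift(text: str, expected: dict = EXPECTED) -> dict:
    """Return {key: (expected, actual)} for every missing or differing setting."""
    actual = parse_properties(text)
    return {k: (v, actual.get(k)) for k, v in expected.items() if actual.get(k) != v}


# Illustrative input: one wrong level and three settings missing entirely.
sample = """
storage.backend=cql
storage.cql.read-consistency-level=LOCAL_QUORUM
storage.cql.write-consistency-level=LOCAL_ONE
"""
for key, (want, got) in find_drift(sample).items():
    print(f"{key}: expected {want!r}, found {got!r}")
```

In practice you would pass the contents of your own janusgraph.properties to find_drift and treat any non-empty result as a reason not to restart yet.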

Step 2 — Decouple index sync from storage consistency

Raising storage consistency raises write latency, which delays asynchronous mixed-index (Elasticsearch/OpenSearch) synchronization. Pipelines that resolve has() predicates through the search tier then return stale results and create duplicate vertices. Keep LOCAL_QUORUM on storage but let index updates proceed asynchronously, and apply backpressure when the divergence exceeds a threshold. Reference the JanusGraph Storage Backend Architecture & Configuration baseline to confirm the mixed-index backend is bound to the CQL layer, and the mixed-index routing rules for where reads should resolve while the index catches up.

This monitor polls ScyllaDB’s materialized-view build status as an index-sync signal, widens the traversal timeout while views rebuild, and flips a backpressure flag the application layer reads before routing:

python

import logging
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy
from cassandra.query import SimpleStatement

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")


class IndexSyncMonitor:
    def __init__(self, contact_points, keyspace="janusgraph", max_lag_ms=5000):
        self.cluster = Cluster(
            contact_points=contact_points,
            load_balancing_policy=DCAwareRoundRobinPolicy(local_dc="DC1"),
        )
        self.session = self.cluster.connect(keyspace)
        self.max_lag_ms = max_lag_ms
        self.backpressure_active = False
        self.traversal_timeout_ms = 30000

    def poll_sync_state(self):
        """Poll ScyllaDB system tables for index (materialized-view) build state."""
        try:
            # Any view not in 'SUCCESS' means the index is still catching up
            # with the storage layer.
            stmt = SimpleStatement(
                "SELECT view_name, status FROM system_distributed.view_build_status",
                consistency_level=ConsistencyLevel.LOCAL_ONE,
            )
            pending = [r.view_name for r in self.session.execute(stmt)
                       if r.status != "SUCCESS"]

            if pending and not self.backpressure_active:
                self._activate_backpressure(pending)
            elif not pending and self.backpressure_active:
                self._deactivate_backpressure()

            return pending
        except Exception as exc:
            logging.error("Sync state poll failed: %s", exc)
            return None

    def _activate_backpressure(self, pending):
        self.backpressure_active = True
        self.traversal_timeout_ms = 60000  # widen timeout while indexes rebuild
        logging.warning("Backpressure activated. Views still building: %s. "
                        "Traversal timeout set to %dms", pending, self.traversal_timeout_ms)

    def _deactivate_backpressure(self):
        self.backpressure_active = False
        self.traversal_timeout_ms = 30000
        logging.info("Backpressure deactivated. Restoring standard traversal timeout.")

    def close(self):
        self.cluster.shutdown()

While backpressure_active is True, route index.search queries to storage-backed traversals (g.V().has('name', 'x').out()) and disable mixed-index lookups at the application layer until poll_sync_state() returns an empty list. Wrap the flag in a circuit breaker that fails fast if sync stays active for more than 60 seconds continuously — a longer stall is an index-repair problem, covered in Resolving OpenSearch Index Drift in Production, not a backpressure problem.
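The 60-second circuit breaker around the flag can be sketched as a small timer that resets whenever backpressure clears. This is one possible shape under stated assumptions: BackpressureBreaker and its method names are hypothetical, the clock is injectable so the logic can be tested without waiting, and the caller is expected to feed it the flag after each poll.

```python
import time


class BackpressureBreaker:
    """Trips once backpressure has been continuously active past a limit."""

    def __init__(self, max_active_s=60.0, clock=time.monotonic):
        self.max_active_s = max_active_s
        self.clock = clock
        self._active_since = None  # clock reading when backpressure began

    def observe(self, backpressure_active: bool) -> None:
        """Record the latest flag value, e.g. after each poll of the monitor."""
        if not backpressure_active:
            self._active_since = None  # stall ended, so reset the window
        elif self._active_since is None:
            self._active_since = self.clock()

    @property
    def tripped(self) -> bool:
        """True when backpressure has been active longer than the limit."""
        if self._active_since is None:
            return False
        return self.clock() - self._active_since > self.max_active_s

    def route(self) -> str:
        """Pick the read path: storage while backpressure is active."""
        return "storage" if self._active_since is not None else "mixed-index"

    def raise_if_tripped(self) -> None:
        """Fail fast so a long stall is escalated as an index-repair problem."""
        if self.tripped:
            raise RuntimeError("index sync stalled past %.0fs" % self.max_active_s)
```

A polling loop would call observe(monitor.backpressure_active) after each poll_sync_state(), use route() to choose between mixed-index and storage-backed reads, and call raise_if_tripped() to hand a stall off to the index-repair procedure.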

Step 3 — Reproduce phantom reads under load

Before trusting the new levels, prove the phantom-read window is actually closed. Use gremlinpython to execute a write and an immediate read-back; a correctly tuned cluster satisfies the assertion every time, while a LOCAL_ONE regression fails it intermittently under concurrency.

python

from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")
g = traversal().with_remote(conn)

# Write, then immediately read back through a (potentially different) replica.
g.addV("person").property("id", "100").next()
result = g.V().has("id", "100").count().next()
assert result == 1, "Phantom read detected: consistency misalignment"

conn.close()

Run this from several producers in parallel to exercise the coordinator under contention — a single-threaded run cannot surface a replica-convergence race. If the assertion holds across concurrent runs, LOCAL_QUORUM is enforcing read-your-writes on the storage tier.
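One way to drive the check from several producers is a thread pool that runs the write-then-read pair with a unique key per iteration. This is a sketch, not a load-testing tool: run_concurrent_checks and gremlin_write_then_read are hypothetical names, the URL mirrors the example above, and opening a connection per check is chosen for simplicity over throughput. It assumes your gremlinpython version tolerates connections created from worker threads.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


def run_concurrent_checks(check, workers=8, iterations=50):
    """Run check(key) from several threads; return keys whose read-back failed."""

    def producer(_worker_index):
        failures = []
        for _ in range(iterations):
            key = str(uuid.uuid4())  # unique per write so counts never collide
            if not check(key):
                failures.append(key)
        return failures

    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = pool.map(producer, range(workers))
        return [key for batch in batches for key in batch]


def gremlin_write_then_read(key, url="ws://localhost:8182/gremlin"):
    """Write a vertex and read it straight back; True when exactly one is visible."""
    # Imported lazily so the harness above can be exercised without a server.
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.anonymous_traversal import traversal

    conn = DriverRemoteConnection(url, "g")
    try:
        g = traversal().with_remote(conn)
        g.addV("person").property("id", key).next()
        return g.V().has("id", key).count().next() == 1
    finally:
        conn.close()
```

Against a live Gremlin Server, run_concurrent_checks(gremlin_write_then_read) should return an empty list when LOCAL_QUORUM is in effect. Any returned keys identify writes whose immediate read-back did not see exactly one vertex.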

Verification commands

Confirm each step landed independently; do not treat a passing assertion as proof the whole path is coherent.

bash

# Step 1: confirm the coordinator applied LOCAL_QUORUM, not a stale default.
# Trace a query, then read its consistency level back from system_traces;
# it appears under the 'consistency_level' key of the parameters map.
cqlsh -e "SELECT parameters, duration FROM system_traces.sessions \
  WHERE session_id = <trace_id>;"

# Step 1: confirm atomic (logged) batches are executing during concurrent writes.
tail -f /var/log/scylla/scylla.log | grep -i "logged batch"

# Read vs write latency per table; a healthy ratio is roughly balanced.
nodetool cfstats janusgraph

python

# Step 2: the monitor reports zero pending views when the index is caught up.
monitor = IndexSyncMonitor(["scylla-node-01", "scylla-node-02", "scylla-node-03"])
assert monitor.poll_sync_state() == [], "Mixed index still building; hold reads"
monitor.close()

The UNLOGGED batch warnings must be absent from the ScyllaDB log during concurrent Gremlin mutations — their presence means atomic-batch-mutate did not take effect. Watch nodetool tpstats for pending write tasks trending to zero before declaring the run healthy.

Explicit fallback procedures

Each fallback maps to the step most likely to produce it. Run the diagnosis first; do not revert to LOCAL_ONE blindly — that reopens the phantom-read window you just closed.

Step 1 — ReadTimeout or WriteTimeout under peak ingestion. Do not drop consistency. Retry failed mutations up to 3 times with exponential backoff (100 ms, 250 ms, 500 ms). If timeouts persist, route the transaction to a low-priority queue and drain it via cross-DC QUORUM only after the backlog clears; watch nodetool tpstats pending mutations to confirm clearance before resuming standard routing.
Step 1 — Read Latency exceeds Write Latency by more than 3x in nodetool cfstats. Raise storage.cql.request-timeout in janusgraph.properties from 12000 to 30000 and restart. If phantom reads still appear, enable request tracing in ScyllaDB and ship traces to a central sink for coordinator-level analysis.
Step 2 — backpressure stays active beyond 60 seconds. Trip the circuit breaker, disable mixed-index routing entirely, and serve reads from storage-backed traversals. Trigger a full mixed-index reindex through the JanusGraph Management API during a maintenance window rather than waiting on the async queue.
Step 3 — the assertion fails intermittently. Confirm no replica is down with nodetool status (all nodes UN), verify the keyspace is NetworkTopologyStrategy with RF ≥ 3, and confirm LOCAL_QUORUM is actually loaded via the system_traces query above — an intermittent failure with a correct config almost always means a replica flapped mid-test.
Whole path — transient coordinator failures corrupt traversal state. Run a tiered routing policy: the primary path uses LOCAL_QUORUM writes with atomic batching and mixed-index queries enabled; on UnavailableException or ReadTimeoutException switch to QUORUM, disable mixed-index routing, and buffer mutations to a disk-backed queue; once nodetool status shows all replicas UN and system_distributed shows zero pending index builds, flush the buffer and restore LOCAL_QUORUM.
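The Step 1 retry-and-queue fallback above can be sketched as a wrapper around any mutation callable. Everything here is an assumption made for illustration: write_with_fallback and drain are hypothetical names, the retryable exception defaults to the built-in TimeoutError rather than a specific driver exception, and the queue is an in-memory deque standing in for whatever durable low-priority queue you use. Replaying at QUORUM is left to the queued callable.

```python
import time
from collections import deque

BACKOFF_S = (0.1, 0.25, 0.5)  # the 100 ms, 250 ms, 500 ms schedule from Step 1


def write_with_fallback(mutate, low_priority_queue, sleep=time.sleep,
                        retryable=(TimeoutError,)):
    """Run mutate(); retry on timeout per BACKOFF_S; queue it if retries run out.

    Returns True when the mutation committed, False when it was queued.
    """
    delays = iter(BACKOFF_S)
    while True:
        try:
            mutate()
            return True
        except retryable:
            delay = next(delays, None)
            if delay is None:
                # All three retries failed: park it instead of lowering consistency.
                low_priority_queue.append(mutate)
                return False
            sleep(delay)


def drain(low_priority_queue, backlog_clear):
    """Replay queued mutations only while backlog_clear() reports True."""
    replayed = 0
    while low_priority_queue and backlog_clear():
        low_priority_queue.popleft()()
        replayed += 1
    return replayed
```

The backlog_clear callable is where a check of nodetool tpstats pending mutations would plug in, so the queue drains only after the cluster has caught up.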

FAQ

Why LOCAL_QUORUM instead of QUORUM for graph traversals? QUORUM requires a replica majority across all datacenters, so every read and write pays cross-DC round-trip latency and stalls when a remote DC is degraded. LOCAL_QUORUM keeps the majority inside the coordinator’s datacenter, which closes the phantom-read window for a graph that ingests in one region while still replicating asynchronously to others.

Does atomic-batch-mutate=true make my Elasticsearch/OpenSearch documents consistent? No. It only makes storage mutations and composite (storage-backed) index updates atomic within ScyllaDB. Mixed indexes dispatch asynchronously after the storage commit, which is exactly why the Step 2 monitor and backpressure logic exist.

If a phantom read appears after enabling LOCAL_QUORUM, should I raise consistency further? No. A phantom read with LOCAL_QUORUM set means a replica flapped or the level is not actually loaded — verify with the system_traces query and nodetool status before touching consistency. Escalating to QUORUM only adds cross-DC latency without fixing a downed replica.

What if LOCAL_QUORUM writes time out during a bulk load? Timeouts under load are a throttling signal, not a reason to weaken durability. Apply the tiered retry-and-queue fallback, widen request-timeout, and let the low-priority queue drain at QUORUM — reverting to LOCAL_ONE would trade a transient timeout for permanent phantom reads.

Up a level: ScyllaDB Migration — the parent reference for the full Cassandra-to-ScyllaDB backend switch this tuning sits inside.
How to Configure Cassandra for JanusGraph Storage — the source-backend procedure whose consistency and pool values you carry into ScyllaDB unchanged.
Eventual vs Strong Consistency Tradeoffs in JanusGraph — where to place the acknowledgment boundary between storage durability and search visibility.
Resolving OpenSearch Index Drift in Production — the repair loop for the async index side that Step 2 backpressures against.
Configuring Mixed-Index Fallback Chains — how to route reads to storage-backed traversals while backpressure is active.