How to Configure Cassandra for JanusGraph Storage
Configuring Cassandra as the Apache JanusGraph Storage Backend & Index Synchronization layer requires strict alignment of replication factors, connection pool sizing, and consistency models. Misconfigured pools or mismatched consistency levels trigger StorageException cascades and index drift during high-throughput ingestion. This guide provides production-ready CQL provisioning, exact janusgraph-cql.properties mappings, and a Python pipeline template with explicit fallback procedures for index parity.
1. Cassandra Keyspace & Replication Baseline
JanusGraph requires a pre-provisioned keyspace with explicit datacenter routing. Relying on auto-creation bypasses production validation checks and introduces unpredictable compaction behavior. Execute the following CQL to establish the storage foundation:
CREATE KEYSPACE janusgraph_graph
WITH REPLICATION = {
'class': 'NetworkTopologyStrategy',
'dc1': 3,
'dc2': 1
} AND DURABLE_WRITES = true;
USE janusgraph_graph;
-- Verify partitioner and compaction strategy defaults
DESCRIBE KEYSPACE janusgraph_graph;
DURABLE_WRITES = true prevents commit log truncation during unclean Cassandra shutdowns, which directly causes index drift. Align the replication factor with your physical topology. For multi-region deployments, consult the Cassandra Backend Setup reference to map local datacenter hints and eliminate cross-DC latency during bulk vertex commits. Reference the official Apache Cassandra CQL Reference for syntax validation on older cluster versions.
Fallback: If DESCRIBE KEYSPACE returns SimpleStrategy or mismatched RF values, drop and recreate the keyspace. Do not use ALTER KEYSPACE on active JanusGraph instances, as it triggers full cluster re-tokenization and temporary UnavailableException spikes.
2. JanusGraph Storage Properties
The janusgraph-cql.properties file must explicitly declare consistency levels, connection limits, and timeout thresholds. Default DataStax driver settings exhaust connection pools under concurrent Gremlin traversal loads.
# Core Storage Mapping
storage.backend=cql
storage.hostname=10.0.1.10,10.0.1.11,10.0.1.12
storage.cql.keyspace=janusgraph_graph
storage.cql.local-datacenter=dc1
# Consistency & Durability
storage.cql.read-consistency-level=QUORUM
storage.cql.write-consistency-level=QUORUM
storage.cql.replication-strategy=NetworkTopologyStrategy
storage.cql.replication-factor={"dc1": 3, "dc2": 1}
# Connection Pool & Timeouts
storage.cql.max-connections-per-host=32
storage.cql.core-connections-per-host=8
storage.cql.connection-timeout=5000
storage.cql.read-timeout=12000
storage.cql.write-timeout=12000
storage.cql.request-timeout=15000
# Index & Schema Sync
storage.cql.atomic-batch-mutate=true
ids.block-size=100000
Operational constraints:
max-connections-per-hostmust equal or exceed your Gremlin Server thread pool size (gremlinserver.gremlinPool). Mismatched values causeConnectionPoolExhaustedException.atomic-batch-mutate=trueforces vertex/edge mutations and secondary index updates into a single CassandraUNLOGGED BATCH. This prevents orphaned index entries during partial failures.ids.block-sizecontrols ID pre-allocation. Increase to500000for ingestion rates >10k vertices/sec, but monitor JanusGraph heap usage to prevent OOM kills.
Review the JanusGraph Storage Backend Architecture & Configuration documentation for driver-level connection pooling overrides when deploying behind service meshes.
Fallback: If connection timeouts persist, reduce max-connections-per-host to 16 and increase storage.cql.request-timeout to 30000. Verify Cassandra rpc_address and native_transport_port are reachable via nc -zv <host> 9042.
3. Python Pipeline Integration & Index Synchronization
Python ingestion pipelines must manage transaction boundaries explicitly to prevent partial commits and index desynchronization. The following gremlinpython implementation handles batch ingestion and enforces transactional isolation.
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
import logging
import time
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
class JanusGraphPipeline:
def __init__(self, ws_url="ws://gremlin-server:8182/gremlin"):
self.connection = DriverRemoteConnection(ws_url, "g")
self.g = traversal().withRemote(self.connection)
self.batch_size = 5000
self.committed = 0
def ingest_batch(self, vertices):
# Explicit transaction boundary ensures atomicity. begin() spawns the
# transaction-bound traversal source; mutations must run on it.
tx = self.g.tx()
gtx = tx.begin()
try:
for v_data in vertices:
gtx.addV("entity").property("id", v_data["id"]).next()
tx.commit()
self.committed += len(vertices)
logger.info(f"Committed {len(vertices)} vertices. Total: {self.committed}")
except Exception as e:
logger.error(f"Batch failed: {e}")
tx.rollback() # explicit rollback; log for retry queue
raise
def close(self):
self.connection.close()
Index Synchronization Protocol: JanusGraph relies on Cassandra’s write-ahead log and background compaction to materialize mixed indexes. If query latency degrades or index queries return stale results:
- Verify index status via Gremlin Console:
mgmt = graph.openManagement(); mgmt.printIndexes(). - If status is
INSTALLEDorREGISTERED, runmgmt.updateIndex(mgmt.getGraphIndex("myIndex"), SchemaAction.REINDEX).get(). - Monitor Cassandra
nodetool compactionstatsuntil pending tasks drop to zero.
Fallback: If REINDEX stalls due to tombstone accumulation, execute nodetool repair -pr janusgraph_graph followed by a manual ALTER TABLE janusgraph_graph.edgestore WITH compaction = {'class': 'LeveledCompactionStrategy'}; to force SSTable reorganization. Consult the Gremlin Python Driver Reference for connection lifecycle management in long-running workers.
4. Diagnostics & Operational Fallbacks
Production deployments require immediate triage paths for storage exceptions and connection exhaustion.
Symptom: org.janusgraph.core.JanusGraphException: Could not execute query due to StorageException
Diagnosis: Run cqlsh -e "SELECT * FROM system.local;" on each Cassandra node. Check system_traces for WriteTimeoutException or UnavailableException.
Resolution:
- Verify network partition:
nodetool statusmust showUNfor all nodes. - Check Cassandra commitlog disk space:
df -h /var/lib/cassandra/commitlog. Clear if >85% utilized. - Temporarily lower consistency:
storage.cql.write-consistency-level=LOCAL_QUORUMuntil cluster stabilizes.
Symptom: ConnectionPoolExhaustedException or NoHostAvailableException
Diagnosis: Inspect JanusGraph logs for PoolState metrics. Cross-reference with Cassandra nodetool tpstats for MutationStage queue depth.
Resolution:
- Increase
storage.cql.max-connections-per-hostincrementally by+8. - Enable driver-level retry policies:
storage.cql.retry-attempts=3. - If queues remain backed up, throttle ingestion rate at the Python pipeline level using exponential backoff.
Explicit Fallback for Index Parity Loss: When Cassandra and JanusGraph indexes diverge beyond repair:
- Disable mixed index queries:
graph.getConfiguration().setProperty("index.search.backend", "none"). - Export graph to GraphSON:
graph.io(IoCore.graphson()).writer().create().writeGraph("backup.graphson"). - Truncate JanusGraph tables:
TRUNCATE janusgraph_graph.edgestore; TRUNCATE janusgraph_graph.graphindex;. - Re-import and rebuild indexes via OLAP Spark jobs or sequential
REINDEXcommands.