Property Indexing Rules

Property indexing rules decide which JanusGraph property mutations become queryable, how fast they become queryable, and whether a predicate on that property routes to an index or degrades into a full graph scan. The failure this page exists to prevent is the quiet kind: a property that was fast in staging silently full-scanning in production because its index never reached ENABLED, an index document that trails the storage commit because the async dispatch queue is backed up, or an ingestion pipeline that keeps writing undeclared property keys until the search cluster mapping drifts out of alignment with the graph. These are not clean errors — they surface as latency regressions and stale reads. This page scopes the indexing surface within the parent Graph Schema Validation & Modeling Strategies subsystem, and covers index-type selection, cardinality and data-type constraints, the synchronization window routed reads must respect, gremlin-python ingestion with explicit transaction boundaries, and the diagnostics an on-call engineer runs when an index and its storage backend disagree.

The diagram below shows how a property mutation flows from the storage commit to a queryable index, and where drift originates.

Index Types & Selection Rules

JanusGraph offers two index families, and the rule that governs almost every routing outcome is choosing the right one for the predicate you intend to serve. A composite index is stored natively in the storage backend and written inside the same mutation as the data it indexes, so it is transactionally consistent — it can never disagree with the graph. It answers equality predicates on one or more exactly-specified keys and nothing else. A mixed index pushes property mappings to an external search cluster and is updated asynchronously, so it is eventually consistent, but it answers the predicates a composite index cannot: full-text (textContains), range and inequality, geospatial, and multi-property ad-hoc combinations. The relationship between these two guarantees, and how the query optimizer picks between them at execution time, is covered in depth under Mixed Index Routing.

The selection rules that keep predicates off the full-scan path:

Equality on a fixed key set → composite index. If every query against a property specifies an exact value, a composite index gives you transactional consistency for free and never touches the search cluster. Prefer it whenever the access pattern allows.
Range, full-text, geo, or ad-hoc → mixed index. Any predicate that cannot be expressed as exact equality must route to a mixed index. Binding it to a composite index will not help — the optimizer will fall through to a scan.
Uniqueness → composite index with unique(). Enforce a uniqueness constraint at the composite index only; a mixed index cannot enforce uniqueness because its update is not part of the storage transaction.
Never rely on an implicit index. A predicate on an unindexed key silently full-scans. Set query.force-index=true so the traversal throws instead of scanning, turning a latency regression into a loud, catchable error.
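
The selection rules above can be sketched as a small decision function. This is a minimal illustration, not part of JanusGraph: the predicate labels ("eq", "range", "text", "geo", "adhoc") and the function name are hypothetical, chosen only to mirror the four rules.

```python
from enum import Enum


class IndexFamily(Enum):
    COMPOSITE = "composite"
    MIXED = "mixed"


# Hypothetical predicate labels. A composite index serves exact equality only.
EQUALITY = {"eq"}
# Predicate kinds that must route to the external search cluster.
SEARCH_ONLY = {"range", "text", "geo", "adhoc"}


def choose_index_families(predicate_kinds, unique=False):
    """Return the set of index families a property key needs."""
    kinds = set(predicate_kinds)
    unknown = kinds - EQUALITY - SEARCH_ONLY
    if unknown:
        raise ValueError(f"unknown predicate kinds: {sorted(unknown)}")

    families = set()
    # Equality and uniqueness both belong on the composite index.
    if kinds & EQUALITY or unique:
        families.add(IndexFamily.COMPOSITE)
    # Anything that is not exact equality needs a mixed index.
    if kinds & SEARCH_ONLY:
        families.add(IndexFamily.MIXED)
    if not families:
        # No index at all means a full scan, or an exception under force-index.
        raise ValueError("no indexable predicate declared for this key")
    return families
```

A key queried by full text that must also be unique comes back with both families, which matches the rule that uniqueness is enforced at the composite index only.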

Which property keys are eligible to back a mixed index at all is bounded upstream by your type registry — the cardinality and mapping decisions made during Vertex and Edge Validation determine whether a key can carry a routed predicate before you ever define the index.

Core Configuration & Consistency Tuning

Indexing behavior is governed by janusgraph.properties. The baseline below balances durability on the write path against query freshness on the read path for a production ingestion load. Every non-default value carries an indexing or consistency consequence; the numbered constraints after the block are the ones that change indexing outcomes.

properties

# Storage backend: Cassandra / ScyllaDB via the CQL driver
storage.backend=cql
storage.hostname=10.0.1.10,10.0.1.11,10.0.1.12
storage.cql.keyspace=graph_prod
storage.cql.write-consistency-level=QUORUM
storage.cql.read-consistency-level=LOCAL_QUORUM

# Mixed index backend: Elasticsearch / OpenSearch
index.search.backend=elasticsearch
index.search.hostname=10.0.2.20,10.0.2.21,10.0.2.22
index.search.port=9200
index.search.elasticsearch.client-only=true
index.search.elasticsearch.create.ext.refresh_interval=5s
index.search.elasticsearch.create.ext.number_of_replicas=1
index.search.elasticsearch.create.ext.number_of_shards=6

# Bulk dispatch & retry bounds
index.search.elasticsearch.bulk-refresh=false
index.search.elasticsearch.max-retry-time=300000

# Index sync & consistency guardrails
query.force-index=true
schema.default=none
cache.db-cache=true
cache.db-cache-clean-wait=20
cache.db-cache-time=180000
cache.db-cache-size=0.5

Operational constraints that govern indexing:

storage.cql.write-consistency-level=QUORUM ensures a property mutation survives node failure before it is handed to the async index pipeline. Downgrading to ONE during a bulk load is acceptable only if you follow it with a controlled REINDEX, because an index built from an under-replicated storage view will be silently incomplete.
create.ext.refresh_interval=5s caps the refresh component of the visibility window on newly indexed documents; queue and bulk-indexing latency add to it. Values below 2s under high write throughput trigger excessive segment merges and degrade indexing throughput without giving you transactional read-after-write. That guarantee comes from bulk-refresh, not the refresh interval. 5s is the operational baseline.
create.ext.number_of_shards=6 fixes the shard count at index-creation time and is immutable afterward; changing it later requires a full reindex. Mirror it to your storage partition topology so routed writes spread the same way the storage layer already balances them.
query.force-index=true blocks accidental full-graph scans when a query bypasses an indexed property. This guardrail converts an unbounded storage scan under high concurrency into an immediate exception.
schema.default=none rejects writes containing undeclared property keys or mismatched data types, preventing the mapping drift that corrupts a mixed index at ingestion. Never run production with an implicit default schema.
bulk-refresh=false disables a synchronous refresh per write batch so ingestion does not thrash the segment-merge pipeline; flip it to wait_for only on the narrow set of writes that must be immediately visible to a following routed read.
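
A pre-deploy check can catch a config that drops one of these guardrails. The sketch below is an assumption about how such a lint might look, using only the keys and thresholds named above; the function names are hypothetical and it only understands refresh intervals written in whole seconds.

```python
def parse_properties(text):
    """Parse key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def lint_indexing_config(props):
    """Return a list of guardrail violations; an empty list means clean."""
    problems = []
    if props.get("query.force-index") != "true":
        problems.append("query.force-index must be true")
    if props.get("schema.default") != "none":
        problems.append("schema.default must be none")
    if props.get("index.search.elasticsearch.bulk-refresh", "false") != "false":
        problems.append("bulk-refresh should stay false globally")
    interval = props.get(
        "index.search.elasticsearch.create.ext.refresh_interval", ""
    )
    if interval.endswith("s") and interval[:-1].isdigit():
        if int(interval[:-1]) < 2:
            problems.append("refresh_interval below 2s")
    else:
        problems.append("refresh_interval missing or not in whole seconds")
    return problems
```

Running it in CI against the rendered janusgraph.properties keeps a silent guardrail removal from reaching production.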

The storage-side consistency choices here depend on your cluster layout. QUORUM for writes and LOCAL_QUORUM for reads is the cross-datacenter baseline; the topology-specific tuning is in Cassandra Backend Setup, and the partition layout that shard alignment must follow is settled by your Replication Strategies. Review the upstream Apache Cassandra consistency levels reference before deviating from these values.

Index Synchronization Protocol

Indexing rules have to be reasoned about together with synchronization because JanusGraph deliberately decouples storage consistency from index visibility. A mutation commits to storage at QUORUM and returns to the caller immediately. The composite index, written in the same mutation, is queryable at once. The mixed index document is dispatched asynchronously through the IndexProvider interface and becomes searchable only after a propagation window elapses. A predicate routed to the mixed index inside that window reads a stale view.

The visibility window is the sum of three intervals:

t_{visible} = t_{queue} + t_{bulk} + t_{refresh}

where t_queue is time spent in the async dispatch queue, t_bulk is bulk-transport plus indexing latency on the search cluster, and t_refresh is bounded by create.ext.refresh_interval. Any read that assumes read-after-write on a mixed index will intermittently fail whenever t_visible exceeds the gap between a write and the read that depends on it. The deeper analysis of where to place the acknowledgment boundary lives in Eventual vs Strong Consistency; this page assumes you have chosen a boundary and are tuning indexing to respect it.
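
As a worked instance of the sum, the sketch below plugs in illustrative numbers. Only the 5 s refresh bound comes from the baseline config; the queue and bulk figures are hypothetical.

```python
def visibility_window(t_queue, t_bulk, t_refresh):
    """t_visible = t_queue + t_bulk + t_refresh, all in seconds."""
    return t_queue + t_bulk + t_refresh


def read_is_safe(write_to_read_gap, t_queue, t_bulk, t_refresh):
    """True when the dependent read starts after the document is searchable."""
    return write_to_read_gap >= visibility_window(t_queue, t_bulk, t_refresh)


# Hypothetical load: 0.5 s queued, 1.5 s bulk transport plus indexing,
# and the worst-case 5 s refresh interval from the baseline config.
worst_case = visibility_window(0.5, 1.5, 5.0)
print(worst_case)                         # 7.0
print(read_is_safe(3.0, 0.5, 1.5, 5.0))   # False
```

A read issued 3 s after the write falls inside the 7 s window, so under these assumed numbers it would see a stale view.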

Two patterns keep indexed reads correct without serializing all ingestion:

Selective wait_for. For the specific writes whose result must be immediately visible to a following indexed query, issue the mutation with bulk-refresh=wait_for semantics so the commit blocks until the document is searchable. Applying it globally serializes throughput behind refresh; scope it to the read-your-writes path only.
Lag-gated polling. For reconciliation pipelines, poll a lag metric rather than sleeping a fixed interval. Track the index write-queue depth and the indexing-latency trend, and only route the reconciliation read once both are within threshold.
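
Lag-gated polling might look like the following sketch. The two metric readers are caller-supplied callables and entirely hypothetical; how you obtain queue depth and indexing latency depends on your monitoring stack.

```python
import time


def wait_for_index_lag(read_queue_depth, read_indexing_latency_ms,
                       max_queue_depth, max_latency_ms,
                       timeout_s=60.0, poll_interval_s=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Block until both lag signals are within threshold, or time out.

    Returns True if the gate opened, False if the deadline passed first.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if (read_queue_depth() <= max_queue_depth
                and read_indexing_latency_ms() <= max_latency_ms):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

The reconciliation read runs only when the function returns True; a False return is a signal to alert rather than to read anyway.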

The lag signals to watch, and what a rising value means:

IndexProvider queue size (JMX, org.janusgraph.diskstorage.indexing.IndexProvider) — a monotonically rising queue is producer backpressure; indexed reads are trailing further behind with every batch.
Search cluster /_cat/thread_pool/write?v — non-zero rejections mean the producer is outrunning the search cluster and routed writes are being dropped into the retry loop.
/_nodes/stats/indices/indexing latency — the leading indicator that t_bulk is growing and the visibility window is widening.
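
The rejection check can be automated by parsing the `_cat/thread_pool/write?v` text. The sketch below assumes the response has a header row containing `node_name` and `rejected` columns and that node names contain no spaces; the sample text is invented for illustration and is not real cluster output.

```python
def write_rejections(cat_output):
    """Map node name -> rejected count from `_cat/thread_pool/write?v` text."""
    lines = [ln.split() for ln in cat_output.strip().splitlines() if ln.strip()]
    if not lines:
        return {}
    header, rows = lines[0], lines[1:]
    # Locate columns by name so extra or reordered columns do not matter.
    node_col = header.index("node_name")
    rejected_col = header.index("rejected")
    return {row[node_col]: int(row[rejected_col]) for row in rows}


def nodes_rejecting_writes(cat_output):
    """Nodes whose write pool has rejected at least one request."""
    return sorted(n for n, r in write_rejections(cat_output).items() if r > 0)


# Hypothetical sample, shaped like the cat API's tabular output.
SAMPLE = """
node_name name  active queue rejected
es-1      write      2     0        0
es-2      write      8   200       17
"""
print(nodes_rejecting_writes(SAMPLE))  # ['es-2']
```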

When sync stalls, run ManagementSystem.updateIndex(index, SchemaAction.REINDEX) during a maintenance window rather than waiting on the queue to drain. See the upstream Elasticsearch Refresh API for tuning the segment lifecycle that governs t_refresh.

Python Integration Pattern

A production ingestion pipeline must treat an indexed write as fallible: the search backend can be mid-relocation, a mutation can partially commit, and the async queue can apply backpressure. The gremlin-python pattern below wraps an idempotent upsert with an explicit transaction boundary, bounded exponential backoff, and deterministic error handling so a transient index failure retries safely instead of double-writing or hard-failing the batch.

python

import logging
from tenacity import (
    retry,
    retry_if_exception,
    stop_after_attempt,
    wait_exponential,
)
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.driver.protocol import GremlinServerError
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import T

logger = logging.getLogger(__name__)


def _is_transient(exc):
    # Connection drops and timeouts are retryable. A server error is retryable
    # unless it reports a schema violation, which fails identically on every
    # attempt. Assumption: the server message names the exception class.
    if isinstance(exc, GremlinServerError):
        return "SchemaViolationException" not in str(exc)
    return isinstance(exc, (ConnectionError, TimeoutError))


@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10),
    retry=retry_if_exception(_is_transient),
    reraise=True,
)
def ingest_vertex_with_indexed_property(conn_url, vertex_id, key, value):
    # One connection per call keeps the example self-contained; in a batch,
    # construct the connection once per worker and reuse it.
    conn = DriverRemoteConnection(conn_url, "g")
    try:
        g = traversal().with_remote(conn)

        # Idempotent upsert: create the vertex if it is missing, then set the
        # indexed property. Safe to retry because coalesce() never double-adds.
        # Assumes the graph accepts caller-supplied vertex IDs
        # (graph.set-vertex-id).
        result = (
            g.V(vertex_id)
            .fold()
            .coalesce(
                __.unfold(),
                __.addV("entity").property(T.id, vertex_id),
            )
            .property(key, value)
            .next()
        )

        logger.info("committed vertex %s with indexed property %s", vertex_id, key)
        return result
    except GremlinServerError as exc:
        # Log and re-raise. _is_transient decides whether tenacity retries, so
        # a mapping/schema violation surfaces immediately instead of looping.
        logger.error("gremlin execution failed for %s: %s", vertex_id, exc)
        raise
    finally:
        conn.close()

Pipeline design rules that keep the index consistent with storage:

Make writes idempotent. coalesce(unfold(), addV(...)) is safe to retry because it never double-adds a vertex, so an exponential-backoff retry after a transient index failure cannot corrupt cardinality.
Force a deterministic flush. A gremlin-python traversal is lazy: nothing is submitted until a terminal step runs, and Gremlin Server then commits that traversal as its own transaction. End each mutation with .next() or .iterate() so the transaction boundary is explicit and the index dispatch is triggered predictably.
Cap retries and dead-letter the rest. Use tenacity for bounded backoff on transient GremlinServerError responses and connection drops. Never loop forever: a SchemaViolationException is not transient, so exclude it from the retry predicate and route it to a dead-letter queue for manual reconciliation rather than retrying it.
Validate before submission. Reject malformed property types and cardinality mismatches at the application layer. Vertex and Edge Validation prevents a bad payload from reaching the index serializer, where the same error is far harder to trace.
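
Application-layer validation and dead-lettering can be combined in one pass over a batch. The sketch below is illustrative only: the registry contents, the cardinality labels, and the row shape (vertex_id, key, value) are hypothetical stand-ins for whatever your type registry declares.

```python
from dataclasses import dataclass, field

# Hypothetical registry: property key -> (Python type, cardinality).
REGISTRY = {
    "name": (str, "SINGLE"),
    "score": (float, "SINGLE"),
    "tags": (str, "SET"),
}


def validate_property(key, value, registry=REGISTRY):
    """Return None if the write is acceptable, else a rejection reason."""
    if key not in registry:
        return f"undeclared property key: {key}"
    expected_type, cardinality = registry[key]
    is_collection = isinstance(value, (list, set, tuple))
    if cardinality == "SINGLE" and is_collection:
        return f"{key} is SINGLE but received a collection"
    values = value if is_collection else [value]
    for v in values:
        if not isinstance(v, expected_type):
            return f"{key} expects {expected_type.__name__}, got {type(v).__name__}"
    return None


@dataclass
class Batch:
    accepted: list = field(default_factory=list)
    dead_letter: list = field(default_factory=list)


def partition_batch(rows, registry=REGISTRY):
    """Split (vertex_id, key, value) rows into submittable and dead-lettered."""
    batch = Batch()
    for vertex_id, key, value in rows:
        reason = validate_property(key, value, registry)
        if reason is None:
            batch.accepted.append((vertex_id, key, value))
        else:
            # Keep the reason with the row so reconciliation has context.
            batch.dead_letter.append((vertex_id, key, value, reason))
    return batch
```

Only `batch.accepted` is handed to the ingestion function; `batch.dead_letter` goes to the reconciliation queue without ever reaching the index serializer.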

Connection Lifecycle & Pool Management

Index-lag symptoms and connection-starvation symptoms look identical from the traversal side — both surface as a TimeoutException on a query that used to be fast — so pool sizing has to be correct before you can trust any indexing diagnosis. gremlin-python opens a connection pool per DriverRemoteConnection; each connection multiplexes requests up to a per-connection concurrency limit, and an undersized pool queues traversals client-side while the storage and index backends sit idle.

Sizing and lifecycle rules for an ingestion pipeline:

Size the pool to backend concurrency, not to worker count. Set pool_size near the number of concurrent in-flight traversals the storage cluster can absorb at QUORUM, and cap max_workers per connection so a single connection does not head-of-line-block the batch. Oversizing the pool moves the bottleneck into the storage coordinator; undersizing it starves ingestion while the backend is healthy.
Bound idle connections. Long-lived idle connections accumulate against the Gremlin Server and against the CQL driver behind it. Set an idle timeout so the pool reclaims connections between batches rather than holding sockets open across an entire ingestion run.
Reuse the pool across the batch. Opening a DriverRemoteConnection per vertex, as a naive loop does, thrashes TCP and TLS setup and multiplies t_queue. Construct the pool once per worker and close it in a finally block at batch end.
Match the retry policy to the pool. An aggressive retry against an already-saturated pool amplifies the outage. Bound the retry window below the pool’s idle timeout so a failed flush does not queue behind an unbounded retry loop and stall the async dispatch queue.
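
Reusing one pool per batch and closing it deterministically can be sketched with a small helper. The two hooks are hypothetical and supplied by the caller, which keeps the helper independent of any particular driver.

```python
from contextlib import closing


def ingest_batch(open_connection, write_row, rows):
    """Open one connection for the whole batch and always close it.

    open_connection: zero-argument callable returning an object with close().
    write_row: callable(conn, row) that performs one mutation.
    Returns the number of rows written.
    """
    written = 0
    # closing() calls conn.close() on normal exit and on any exception.
    with closing(open_connection()) as conn:
        for row in rows:
            write_row(conn, row)
            written += 1
    return written
```

With gremlin-python, `open_connection` could be something like `lambda: DriverRemoteConnection(url, "g", pool_size=8, max_workers=4)`; the numbers here are placeholders, and the right values come from the sizing rules above rather than from this example.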

The full sizing model — how pool depth, per-connection concurrency, and storage coordinator threads interact — is in Connection Pooling; settle it before tuning indexing, because a starved pool will masquerade as index lag through every diagnostic below.

Schema Evolution & Collision Management

Indexing rules change as graph topology evolves, and an uncoordinated change is a primary source of index divergence. Adding a mixed index to an existing property requires a full reindex; dropping an index without draining the mutation queue leaves orphaned documents in the search cluster; two pipelines registering the same index name against different mappings will collide at commit time.

Gate every index change through Schema Evolution and CI Gating so a deployment cannot alter index cardinality, data type, or tokenizer configuration without a documented migration path and a reference state to diff against.
When concurrent pipelines register identical index names or overlapping property mappings, detect and resolve the conflict deterministically — the atomic detection-and-remediation procedure is in Automating Property Index Collision Resolution.
Commit the index definition with mgmt.commit() first, then verify registration from a fresh management transaction with mgmt.getGraphIndex(indexName).getIndexStatus(propertyKey); the status only advances once the definition is committed. The status must equal SchemaStatus.ENABLED before you route production queries to the index, because an index stuck at INSTALLED or REGISTERED silently falls back to a full scan.
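
The ENABLED gate can be expressed as a small predicate on the client side. This sketch assumes you have already read the per-key status strings from the Management API through whatever bridge your deployment uses; the function names and the dictionary shape are hypothetical.

```python
ROUTABLE = "ENABLED"


def index_is_routable(status_by_key):
    """True only when every indexed property key reports ENABLED.

    status_by_key: dict of property key -> status string. An empty dict is
    treated as not routable, since nothing has been verified.
    """
    return bool(status_by_key) and all(
        status == ROUTABLE for status in status_by_key.values()
    )


def blocking_keys(status_by_key):
    """Keys that keep the index off the routed path, with their status."""
    return {k: s for k, s in status_by_key.items() if s != ROUTABLE}
```

A deployment step can refuse to switch traffic while `blocking_keys` is non-empty, and log its contents to show which key is stuck at INSTALLED or REGISTERED.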

Diagnostics & Operational Fallbacks

When an index and its storage backend disagree, work from symptom to diagnosis command to resolution rather than reindexing blindly. The table below covers the failure modes you will actually page on.

Symptom	Diagnose	Resolve
Equality query fast in staging, full-scanning in production	`.profile()` shows a scan step, not an index step; `mgmt.printIndexes()` shows the key not `ENABLED`	Build/enable the composite index for the key; keep `query.force-index=true` so future gaps throw instead of scanning
Recent writes missing from a full-text query	`/_nodes/stats/indices/indexing` latency rising; `IndexProvider` JMX queue climbing	Visibility window stretching under load — throttle the producer, or scope `bulk-refresh=wait_for` to the read-your-writes path only
Full-text predicate matches nothing after a backend switch	Documents exist in `/_cat/indices?v` but the analyzer/mapping differs from what JanusGraph expects	Mapping drift — realign the field mapping/analyzer, then `REINDEX` via the Management API; verify on both backends if mid-migration
`EsRejectedExecutionException` on indexed writes	`/_cat/thread_pool/write?v` shows non-zero rejections; bulk payload near/over 5 MB	Lower the bulk size; add producer-side backpressure; scale index write threads before retrying
Ingestion traversal times out during bursts	`nodetool tpstats` clean but the driver throws `TimeoutException`; pool utilization at 100%	Starved client pool, not index lag — resize per the Connection Pooling model and cap batch concurrency
`SchemaViolationException` at commit	Two pipelines registered the same index name with different cardinality/type	Halt the colliding pipeline; resolve deterministically before retrying, do not force-commit over it

For ScyllaDB-backed clusters, run nodetool repair before any index rebuild so the index is not populated from an under-replicated storage view — the read/write consistency benchmarks that bound how tight the visibility window can safely go are in ScyllaDB Migration. The Elasticsearch and OpenSearch transport specifics behind these commands are in Elasticsearch Integration and OpenSearch Sync Patterns.

Frequently Asked Questions

When should I use a composite index instead of a mixed index? Use a composite index whenever every query against the property is exact equality, or when you need a uniqueness constraint. A composite index is written inside the same storage mutation as the data, so it is transactionally consistent and never trails the graph. Reach for a mixed index only when the predicate is range, full-text, geospatial, or an ad-hoc multi-property combination that a composite index cannot express.

Why is my newly created index not returning results? Almost always because it never reached ENABLED. A freshly registered index moves through INSTALLED and REGISTERED before it serves traffic, and adding an index to a property that already has data requires a REINDEX to backfill existing rows. Confirm the status with mgmt.getGraphIndex(name).getIndexStatus(key) and run ManagementSystem.updateIndex(index, SchemaAction.REINDEX) if the backfill has not completed.

Does raising storage consistency to QUORUM fix stale reads from a mixed index? No. Storage consistency governs durability and read repair inside the storage cluster only. Mixed-index visibility is a separate downstream concern bounded by t_queue + t_bulk + t_refresh. Raising the storage level increases write latency without shrinking the index visibility window. Use selective bulk-refresh=wait_for for the specific reads that need read-after-write.

Should I lower refresh_interval to get read-after-write on indexed queries? No. A lower refresh interval reduces t_refresh but adds segment-merge I/O and never gives transactional read-after-write. Below 2s under load it degrades indexing throughput. Immediate visibility comes from bulk-refresh=wait_for scoped to the writes that need it, not from the global refresh interval.

How do I stop unindexed queries from silently full-scanning? Set query.force-index=true. It makes any traversal whose predicate cannot be answered by an index throw an exception instead of scanning the whole vertex or edge space. Combined with schema.default=none, it turns two of the most common silent-degradation modes into loud, catchable errors at ingestion and query time.

Up a level: Graph Schema Validation & Modeling Strategies — the subsystem this indexing surface sits inside.
Automating Property Index Collision Resolution — deterministic detection and remediation when concurrent pipelines register conflicting index definitions.
Vertex and Edge Validation — the type and cardinality enforcement that decides which keys can back an index.
Schema Evolution and CI Gating — gating index changes so cardinality, type, or tokenizer edits cannot ship unversioned.
Mixed Index Routing — how the optimizer selects a backend for a predicate at execution time.
Connection Pooling — the client sizing model that separates pool starvation from index lag.