Clustering

Clustering enables a multi-node deployment against a shared repository. The content (database) and blob storage are shared, while each node keeps its own search index and temporary area. Nodes coordinate through database leases (locks) and a journal that replays transactions across nodes.

Configuration

Repository

# <repository>/etc/repository.yml
cluster:
  enabled: true
  nodeId: node-1   # optional; unique per node

Override these settings per node with the environment variable CMS_CLUSTER_NODE_ID or the framework properties org.mintjams.jcr.cluster.nodeId and org.mintjams.jcr.cluster.enabled. If nodeId is omitted, the host name is used (or a random id).
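The fallback chain for the node id can be pictured with the following minimal sketch. The class NodeIdResolver and the precedence between the environment variable and the framework property are assumptions made for illustration; the documentation above only states that both override the file and that the host name or a random id is used when nothing is configured.

```java
import java.util.UUID;

/** Hypothetical helper; the precedence order below is an assumption, not documented behavior. */
final class NodeIdResolver {

    /** Returns the first non-blank candidate, or a random id if none is set. */
    static String resolve(String envValue, String frameworkProperty,
                          String configuredNodeId, String hostName) {
        // Assumed order: env var, framework property, repository.yml value, host name.
        String[] candidates = {envValue, frameworkProperty, configuredNodeId, hostName};
        for (String candidate : candidates) {
            if (candidate != null && !candidate.isBlank()) {
                return candidate.trim();
            }
        }
        // Nothing usable was supplied, so fall back to a random id.
        return UUID.randomUUID().toString();
    }
}
```

A caller would pass System.getenv("CMS_CLUSTER_NODE_ID") as the first argument and the value read from repository.yml as the third. Whichever source is used, the resulting id must be unique per node.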

Workspace (shared storage)

# <workspace>/etc/jcr/jcr.yml
datasource:
  jdbcURL: jdbc:postgresql://db:5432/jcr_${workspace.name}
  username: jcr
  password: secret
  driverClassName: org.postgresql.Driver
blobstore:
  type: fs
  directory: /mnt/shared/cms/blobs/${workspace.name}
search:
  indexPath: /var/lib/cms/search/${workspace.name}   # node-local fast storage

Variables such as ${repository.home}, ${workspace.name} and ${cluster.nodeId} are substituted. The search index is kept per node and rebuilt automatically from content if empty.
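As an illustration of how ${...} placeholders expand, here is a minimal sketch of such a substitution. The class ConfigVariables is hypothetical, and leaving unknown variables untouched is an assumption; the actual loader may behave differently.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of ${name} substitution; not the product's loader. */
final class ConfigVariables {

    // Matches ${...} and captures the variable name between the braces.
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]+)\\}");

    static String substitute(String text, Map<String, String> values) {
        return VARIABLE.matcher(text).replaceAll(match -> {
            String value = values.get(match.group(1));
            // Assumption: a variable with no known value is left as written.
            return Matcher.quoteReplacement(value != null ? value : match.group());
        });
    }
}
```

With a workspace named web, the blobstore directory from the example above would expand from /mnt/shared/cms/blobs/${workspace.name} to /mnt/shared/cms/blobs/web.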

Where persistent state lives

State	Standalone (default)	Clustered
Content, ACLs, journal	embedded H2	shared DB (e.g. PostgreSQL), one DB per workspace
Blobs (binaries)	local files	shared storage (NFS, etc.)
Full-text search index	local	node-local

Files that must be identical on every node

The following "identity files" must be identical across all nodes. They are auto-generated on first boot, so do not let the second and later nodes regenerate them; copy them from the first node instead:

secrets/secret-key.yml (encryption key for stored secrets)
etc/boot.id (repository identifier; used to derive keys for masked values)
etc/idp-keystore.p12 / etc/sp-keystore.p12 (SAML keys)
etc/idp.yml / etc/saml2.yml

The recommended approach is to put the repository directory on shared storage, so that etc/ and secrets/ are shared automatically. Because the temporary directory (tmp/) is wiped at startup, each clustered node automatically uses its own tmp/nodes/<nodeId> subdirectory; a node's temporary directory must not be shared with another node.

Journal & coordination

Every transaction is recorded in a journal, and each node runs a poller every 2 seconds that replays the transactions written by other nodes. This makes cache invalidation, index updates and OSGi events (Camel route redeployment, CMS events, SSE/GraphQL subscriptions) cluster-aware.
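One poll cycle can be sketched roughly as follows. This is a conceptual model only: the class JournalReplay, the Entry fields and the revision cursor are assumptions made for illustration and do not describe the actual journal schema. The sketch also assumes the journal is read in ascending revision order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one journal poll cycle; names and fields are assumptions. */
final class JournalReplay {

    /** Assumed entry shape: an increasing revision plus the node that wrote it. */
    static final class Entry {
        final long revision;
        final String originNodeId;
        final String payload;

        Entry(long revision, String originNodeId, String payload) {
            this.revision = revision;
            this.originNodeId = originNodeId;
            this.payload = payload;
        }
    }

    private final String localNodeId;
    private long lastSeenRevision;

    JournalReplay(String localNodeId, long lastSeenRevision) {
        this.localNodeId = localNodeId;
        this.lastSeenRevision = lastSeenRevision;
    }

    /** Returns the entries to replay locally and advances the cursor past everything read. */
    List<Entry> poll(List<Entry> journal) {
        List<Entry> toReplay = new ArrayList<>();
        for (Entry entry : journal) {
            if (entry.revision <= lastSeenRevision) {
                continue; // already handled in an earlier cycle
            }
            lastSeenRevision = entry.revision;
            // The local node already applied its own transactions, so only foreign ones are replayed.
            if (!entry.originNodeId.equals(localNodeId)) {
                toReplay.add(entry);
            }
        }
        return toReplay;
    }
}
```

Under this model a change made on one node becomes visible to the others within roughly one polling interval.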

Coordination tables are created automatically:

jcr_cluster_nodes — node registry; refreshes last_heartbeat every 30s
jcr_cluster_locks — lease locks (with TTL, so a crash never blocks indefinitely)
jcr_cluster_signals — a signal bus for short-lived control notifications

Single-node work — workspace startup, blob cleanup, content deployment — is serialized with leases.
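The idea behind a TTL lease can be illustrated with an in-memory sketch. The class LeaseTable and its method names are hypothetical, and the real jcr_cluster_locks table lives in the database with a schema not shown here; the sketch only demonstrates why an expired lease never blocks other nodes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of lease locks with a TTL; not the actual table logic. */
final class LeaseTable {

    private static final class Lease {
        final String owner;
        final long expiresAtMillis;

        Lease(String owner, long expiresAtMillis) {
            this.owner = owner;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Lease> leases = new HashMap<>();

    /** Grants the lease if it is free, expired, or already held by the same owner (renewal). */
    synchronized boolean tryAcquire(String name, String owner, long nowMillis, long ttlMillis) {
        Lease current = leases.get(name);
        boolean available = current == null
                || current.expiresAtMillis <= nowMillis
                || current.owner.equals(owner);
        if (!available) {
            return false;
        }
        leases.put(name, new Lease(owner, nowMillis + ttlMillis));
        return true;
    }

    /** Releases the lease only when the caller is the current owner. */
    synchronized void release(String name, String owner) {
        Lease current = leases.get(name);
        if (current != null && current.owner.equals(owner)) {
            leases.remove(name);
        }
    }
}
```

If node-1 takes a lease and then crashes, node-2 is refused until the TTL elapses and succeeds afterwards, which is the property the TTL is meant to guarantee.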

Procedure (overview)

Provision a PostgreSQL database per workspace for JCR (and one for BPM if used)
Install the PostgreSQL JDBC driver bundle into Felix
Put the repository directory on shared storage (at minimum, share blobstore.directory across nodes)
Configure each workspace's jcr.yml#datasource (and bpm.yml#jdbcURL if needed) identically on all nodes
Share the identity files across nodes (on first boot, start a single node alone)
Set cluster.enabled to true and give each node a unique nodeId. Keep node clocks NTP-synchronized
Place the nodes behind a load balancer (sticky sessions recommended)

Single-execution task guards

To ensure that at most one execution of a scheduled task runs at a time, with no overlap on the same node or across cluster nodes, take a session-scoped JCR lock on a lock resource.

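// Arguments, in order: presumably isDeep (false), session-scoped (true), timeout in seconds (600).
def lock = repositorySession.getResource("/var/locks/nightly-report")
        .tryLock(false, true, 600)
// A null result means another execution already holds the lock, so this run is skipped.
if (lock != null) {
    try {
        // ... task body: at most one execution at a time ...
    } finally {
        // Release explicitly; the session end or the timeout is only a safety net.
        lock.unlock()
    }
}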

Lock state lives in the workspace database, so the same code runs unchanged in standalone and clustered deployments. The lock is released automatically when the session ends, and the timeout (seconds) bounds how long a crashed node can keep it. For cluster information, cluster.isClusterEnabled(), cluster.nodeId and cluster.listMembers() are available.

Monitoring

Use the GraphQL cluster query (admin), or the Cluster card in the Dashboard Operations section, to review each node's heartbeat (liveness). A node that has been silent for three heartbeat intervals (~90s) is logged with a warning.
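The 90-second figure follows from three missed 30-second heartbeats. A minimal sketch of that check is shown below; the class HeartbeatCheck is hypothetical, and treating exactly 90 seconds as already silent is an assumption.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical liveness check derived from the documented intervals. */
final class HeartbeatCheck {

    static final Duration HEARTBEAT_INTERVAL = Duration.ofSeconds(30);
    static final int MISSED_INTERVALS = 3;

    /** True when the last heartbeat is at least three intervals (90s) old. */
    static boolean isSilent(Instant lastHeartbeat, Instant now) {
        Duration threshold = HEARTBEAT_INTERVAL.multipliedBy(MISSED_INTERVALS);
        return Duration.between(lastHeartbeat, now).compareTo(threshold) >= 0;
    }
}
```

Because the comparison uses timestamps, it is only meaningful when node clocks are synchronized.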

Cautions

Clock skew between nodes breaks the 10-second stability window, so NTP synchronization is required.
External databases and blob storage are not auto-managed. Cleaning up the DB/blobs after deleting a workspace, and clearing the DB before recreating one of the same name, are manual steps.
The search index is per-node and not replicated (it rebuilds automatically when empty).