CockroachDB

Usage Notes

⚠️

SpiceDB's Watch API requires CockroachDB's Experimental Changefeed to be enabled.

Recommended for multi-region deployments, with configurable region awareness
Enables horizontal scalability by adding more SpiceDB and CockroachDB instances
Resiliency to individual CockroachDB instance failures
Queries and data are balanced across the CockroachDB cluster
Setup and operational complexity of running CockroachDB

Developer Notes

Code can be found here
Documentation can be found here
Implemented using pgx for a SQL driver and connection pooling
Has a native changefeed
Stores migration revisions using the same strategy as Alembic

Configuration

Required Parameters

Parameter	Description	Example
`datastore-engine`	the datastore engine	`--datastore-engine=cockroachdb`
`datastore-conn-uri`	connection string used to connect to CRDB	`--datastore-conn-uri="postgres://user:password@localhost:26257/spicedb?sslmode=disable"`
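As a quick sanity check before handing a connection string to SpiceDB, the URI can be taken apart with a standard URL parser. The sketch below runs Python's `urllib.parse` on the example value from the table. The helper name `describe_conn_uri` is made up for illustration and is not part of SpiceDB.

```python
from urllib.parse import urlsplit, parse_qs


def describe_conn_uri(uri: str) -> dict:
    """Split a CockroachDB connection URI into its parts (illustrative helper)."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        # The path component carries the database name, e.g. "/spicedb".
        "database": parts.path.lstrip("/"),
        # Query parameters such as sslmode; keep the first value of each.
        "options": {k: v[0] for k, v in parse_qs(parts.query).items()},
    }


info = describe_conn_uri(
    "postgres://user:password@localhost:26257/spicedb?sslmode=disable"
)
print(info["host"], info["port"], info["database"], info["options"])
```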

Optional Parameters

Parameter	Description	Example
`datastore-max-tx-retries`	Maximum number of times to retry a query before raising an error	`--datastore-max-tx-retries=50`
`datastore-tx-overlap-strategy`	The overlap strategy to prevent New Enemy on CRDB (see below)	`--datastore-tx-overlap-strategy=static`
`datastore-tx-overlap-key`	The key to use for the overlap strategy (see below)	`--datastore-tx-overlap-key="foo"`
`datastore-conn-pool-read-max-idletime`	Maximum amount of time a connection can idle in a remote datastore's connection pool (default 30m0s)	`--datastore-conn-pool-read-max-idletime=30m0s`
`datastore-conn-pool-read-max-lifetime`	Maximum amount of time a connection can live in a remote datastore's connection pool (default 30m0s)	`--datastore-conn-pool-read-max-lifetime=30m0s`
`datastore-conn-pool-read-max-lifetime-jitter`	Waits rand(0, jitter) after a connection is open for max lifetime to actually close the connection	`--datastore-conn-pool-read-max-lifetime-jitter=6m`
`datastore-conn-pool-read-max-open`	Number of concurrent connections open in a remote datastore's connection pool (default 20)	`--datastore-conn-pool-read-max-open=20`
`datastore-conn-pool-read-min-open`	Minimum number of concurrent connections open in a remote datastore's connection pool (default 20)	`--datastore-conn-pool-read-min-open=20`
`datastore-conn-pool-write-healthcheck-interval`	Amount of time between connection health checks in a remote datastore's connection pool (default 30s)	`--datastore-conn-pool-write-healthcheck-interval=30s`
`datastore-conn-pool-write-max-idletime`	Maximum amount of time a connection can idle in a remote datastore's connection pool (default 30m0s)	`--datastore-conn-pool-write-max-idletime=30m0s`
`datastore-conn-pool-write-max-lifetime`	Maximum amount of time a connection can live in a remote datastore's connection pool (default 30m0s)	`--datastore-conn-pool-write-max-lifetime=30m0s`
`datastore-conn-pool-write-max-lifetime-jitter`	Waits rand(0, jitter) after a connection is open for max lifetime to actually close the connection	`--datastore-conn-pool-write-max-lifetime-jitter=6m`
`datastore-conn-pool-write-max-open`	Number of concurrent connections open in a remote datastore's connection pool (default 10)	`--datastore-conn-pool-write-max-open=10`
`datastore-conn-pool-write-min-open`	Minimum number of concurrent connections open in a remote datastore's connection pool (default 10)	`--datastore-conn-pool-write-min-open=10`
`datastore-query-split-size`	The (estimated) query size at which to split a query into multiple queries	`--datastore-query-split-size=5kb`
`datastore-gc-window`	Sets the window outside of which overwritten relationships are no longer accessible	`--datastore-gc-window=1s`
`datastore-revision-fuzzing-duration`	Sets a fuzzing window on all zookies/zedtokens	`--datastore-revision-fuzzing-duration=50ms`
`datastore-readonly`	Places the datastore into readonly mode	`--datastore-readonly=true`
`datastore-follower-read-delay-duration`	Amount of time to subtract from non-sync revision timestamps to ensure follower reads	`--datastore-follower-read-delay-duration=4.8s`
`datastore-relationship-integrity-enabled`	Enables relationship integrity checks, only supported on CRDB	`--datastore-relationship-integrity-enabled=false`
`datastore-relationship-integrity-current-key-id`	Current key id for relationship integrity checks	`--datastore-relationship-integrity-current-key-id="foo"`
`datastore-relationship-integrity-current-key-filename`	Current key filename for relationship integrity checks	`--datastore-relationship-integrity-current-key-filename="foo"`
`datastore-relationship-integrity-expired-keys`	Config for expired keys for relationship integrity checks	`--datastore-relationship-integrity-expired-keys="foo"`
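One way to keep a long flag list readable is to assemble the command line from a mapping. The sketch below only formats flags that appear in the tables above. `build_serve_args` is a hypothetical helper, the values are the table's examples rather than recommendations, and a real deployment will need other `spicedb serve` flags that are outside this section.

```python
def build_serve_args(flags: dict) -> list:
    """Turn {"datastore-engine": "cockroachdb", ...} into an argv list."""
    args = ["spicedb", "serve"]
    for name, value in flags.items():
        # Booleans are rendered as true/false, matching the table's examples.
        if isinstance(value, bool):
            value = "true" if value else "false"
        args.append(f"--{name}={value}")
    return args


args = build_serve_args({
    "datastore-engine": "cockroachdb",
    "datastore-conn-uri": "postgres://user:password@localhost:26257/spicedb?sslmode=disable",
    "datastore-tx-overlap-strategy": "static",
    "datastore-max-tx-retries": 50,
    "datastore-readonly": False,
})
print(" ".join(args))
```

The list form can be passed directly to a process launcher without shell quoting. If the joined string is pasted into a shell instead, the connection URI should be quoted as in the table.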

Understanding the New Enemy Problem with CockroachDB

CockroachDB is a Spanner-like datastore supporting global, immediate consistency, with the mantra "no stale reads." The CockroachDB implementation should be used when your SpiceDB service runs in multiple geographic regions and Google's Cloud Spanner is unavailable (e.g. on AWS, Azure, or bare metal).

In order to prevent the New Enemy Problem, we need to make related transactions overlap. We do this by choosing a common database key and writing to that key with all relationships that may overlap. This tradeoff is cataloged in our blog post "The One Crucial Difference Between Spanner and CockroachDB", and means we are trading off write throughput for consistency.

Overlap Strategies

CockroachDB datastore users that are willing to rely on more subtle guarantees to mitigate the New Enemy Problem can configure the overlap strategy with the flag --datastore-tx-overlap-strategy. The available strategies are:

Strategy	Description
`static` (default)	All writes overlap to protect against the New Enemy Problem at the cost of write throughput
`prefix`	Only writes that contain objects with the same prefix overlap (e.g. `tenant1/user` and `tenant2/user` can be written concurrently)
`request`	Only writes with the same `io.spicedb.requestoverlapkey` gRPC request header overlap, enabling applications to decide on the fly which writes have causal dependencies. Writes without any header act the same as `insecure`.
`insecure`	No writes overlap, providing the best write throughput, does not protect against the New Enemy Problem

Depending on your application, insecure may be acceptable, and it avoids the performance cost associated with the static and prefix options. If the New Enemy Problem is not a concern for your application, consider using the insecure strategy.
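To make the four strategies concrete, here is a toy model of which overlap key a write would touch under each one. It sketches the behaviour described in the table and is not SpiceDB's implementation. The function names, the way the prefix is derived (everything before the first `/` of an object type), and the `"default"` static key are all assumptions. In this model, two writes are forced to overlap when their key sets intersect.

```python
from typing import Optional, Set


def overlap_keys(strategy: str, object_types: Set[str],
                 static_key: str = "default",
                 request_key: Optional[str] = None) -> Set[str]:
    """Return the set of overlap keys a write would touch (toy model)."""
    if strategy == "static":
        # Every write touches the same key, so all writes overlap.
        return {static_key}
    if strategy == "prefix":
        # Assumed rule: the prefix is the text before the first "/".
        return {t.split("/", 1)[0] for t in object_types if "/" in t}
    if strategy == "request":
        # Key comes from the request header; no header behaves like insecure.
        return {request_key} if request_key else set()
    if strategy == "insecure":
        return set()
    raise ValueError(f"unknown strategy: {strategy}")


def writes_overlap(keys_a: Set[str], keys_b: Set[str]) -> bool:
    """Two writes overlap when they share at least one overlap key."""
    return bool(keys_a & keys_b)


tenant1 = overlap_keys("prefix", {"tenant1/user"})
tenant2 = overlap_keys("prefix", {"tenant2/user"})
print(writes_overlap(tenant1, tenant2))
```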

When is `insecure` overlap a problem?

Using insecure overlap strategy for SpiceDB with CockroachDB means that it is possible that timestamps for two subsequent writes will be out of order. When this happens, it's possible for the New Enemy Problem to occur.

Let's look at how likely this is, and what the impact might actually be for your workload.

When can timestamps be reversed?

Before we look at how this can impact an application, let's first understand when and how timestamps can be reversed in the first place.

When two writes are made in short succession against CockroachDB
And those two writes hit two different gateway nodes
And the CRDB gateway node clocks have a delta D
And the writes touch disjoint sets of relationships
And those two writes are sent within the time delta D between the gateway nodes
And the writes land in ranges whose followers are disjoint sets of nodes
And other independent cockroach processes (heartbeats, etc) haven't coincidentally synced the gateway node clocks during the writes.

Then it's possible that the second write will be assigned a timestamp earlier than the first write. In the next section we'll look at whether that matters for your application, but for now let's look at what makes the above conditions more or less likely:

Clock skew: A larger clock skew gives a bigger window in which timestamps can be reversed - note that CRDB enforces a max offset between clocks, and getting within some fraction of that max offset will kick the node from the cluster.
Network congestion: or anything that interferes with node heartbeating - this increases the length of time that clocks can be desynchronized before CockroachDB notices and syncs them back up.
Cluster size: When there are many nodes, it is more likely that a write to one range will not have follower nodes that overlap with the followers of a write to another range, and it also makes it more likely that the two writes will have different gateway nodes. On the other side, a 3 node cluster with replicas: 3 means that all writes will sync clocks on all nodes.
Write rate: If the write rate is high, it's more likely that two writes will hit the conditions to have reversed timestamps. If writes only happen once every max offset period for the cluster, it's impossible for their timestamps to be reversed.

The likelihood of a timestamp reversal is dependent on the cockroach cluster and the application's usage patterns.
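The conditions above can be illustrated with a deliberately simplified model: each gateway node stamps a write with its own clock, and the two clocks differ by D. This leaves out most of what CockroachDB actually does (no clock syncing, no overlapping ranges) and only shows the arithmetic behind a reversal. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gateway:
    skew_ms: float  # offset of this node's clock from true time

    def timestamp(self, true_time_ms: float) -> float:
        """Timestamp this node would assign to a write arriving at true_time_ms."""
        return true_time_ms + self.skew_ms


def reversed_timestamps(first: Gateway, second: Gateway,
                        gap_ms: float, start_ms: float = 1_000.0) -> bool:
    """True if a write sent gap_ms after the first gets an earlier timestamp."""
    ts_first = first.timestamp(start_ms)
    ts_second = second.timestamp(start_ms + gap_ms)
    return ts_second < ts_first


fast = Gateway(skew_ms=100.0)    # clock runs 100ms ahead
slow = Gateway(skew_ms=-100.0)   # clock runs 100ms behind, so D = 200ms

print(reversed_timestamps(fast, slow, gap_ms=50.0))    # inside the delta D
print(reversed_timestamps(fast, slow, gap_ms=250.0))   # outside the delta D
```

In this model a reversal needs both different gateways and a gap smaller than D, which mirrors the clock-skew and write-rate factors listed above.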

When does a timestamp reversal matter?

Now we know when timestamps could be reversed. But when does that matter to your application?

The TL;DR is: only when you care about the New Enemy Problem.

Let's take a look at a couple of examples of how reversed timestamps may be an issue for an application storing permissions in SpiceDB.

Neglecting ACL Update Order

Two separate WriteRelationships calls come in:

A: Alice removes Bob from the shared folder
B: Alice adds a new document not-for-bob.txt to the shared folder

The normal case is that the timestamp for A < the timestamp for B.

But if those two writes hit the conditions for a timestamp reversal, then B < A.

From Alice's perspective, there should be no time at which Bob can ever see not-for-bob.txt. She performed the first write, got a response, and then performed the second write.

But this isn't true when using MinimizeLatency or AtLeastAsFresh consistency. If Bob later performs a Check request for the not-for-bob.txt document, it's possible that SpiceDB will pick an evaluation timestamp T such that B < T < A, so that the document is in the folder and Bob is still allowed to see the contents of the folder.

Note that this is only possible if A - T < quantization window: the check has to happen soon enough after the write for A that it's possible that SpiceDB picks a timestamp in between them. The default quantization window is 5s.
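The window described above can be written down as a predicate. In this sketch `ts_a` and `ts_b` are the timestamps assigned to writes A and B, and `t` is the evaluation timestamp picked for Bob's Check. The names are illustrative, and timestamps are plain numbers in seconds.

```python
def exposes_new_enemy(ts_a: float, ts_b: float, t: float) -> bool:
    """True when a check at t sees write B (the new document) but not
    write A (Bob's removal), i.e. when B < T < A."""
    return ts_b < t < ts_a


# Normal order: A is stamped before B, so no evaluation time exposes the document.
normal = [exposes_new_enemy(ts_a=10.0, ts_b=11.0, t=t) for t in (9.5, 10.5, 11.5)]

# Reversed order: B is stamped before A, leaving a gap in which T can land.
reversed_case = [exposes_new_enemy(ts_a=11.0, ts_b=10.8, t=t) for t in (10.5, 10.9, 11.5)]

print(normal, reversed_case)
```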

Application Mitigations for ACL Update Order

This could be mitigated in your application by:

Not caring about the problem
Not allowing the write from B within the max_offset time of the CRDB cluster (or the quantization window).
Not allowing a Check on a resource within max_offset of its ACL modification (or the quantization window).
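The second mitigation can be sketched as a small gate that spaces out dependent writes. Everything here is an assumption for illustration: the class name, the 0.5s placeholder (substitute your cluster's configured max offset or the quantization window), and the idea of tracking the last write time in application memory.

```python
import time


class DependentWriteGate:
    """Tracks the last ACL write and reports how long to hold the next one."""

    def __init__(self, max_offset_s: float = 0.5, clock=time.monotonic):
        self.max_offset_s = max_offset_s  # placeholder; use your cluster's value
        self.clock = clock                # injectable so the gate can be tested
        self.last_write = None

    def seconds_to_wait(self) -> float:
        """Remaining time before a dependent write is outside the window."""
        if self.last_write is None:
            return 0.0
        elapsed = self.clock() - self.last_write
        return max(0.0, self.max_offset_s - elapsed)

    def record_write(self) -> None:
        """Call after a write has been acknowledged."""
        self.last_write = self.clock()


# Demonstration with a fake clock so the example is deterministic.
now = [100.0]
gate = DependentWriteGate(max_offset_s=0.5, clock=lambda: now[0])
gate.record_write()          # write A acknowledged at t=100.0
now[0] = 100.2               # write B arrives 0.2s later
remaining = gate.seconds_to_wait()
```

An application would sleep for `seconds_to_wait()` before issuing write B, then call `record_write()` once B is acknowledged. The same gate could be consulted before a Check to implement the third mitigation.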

Mis-apply Old ACLs to New Content

Two separate API calls come in:

A: Alice removes Bob as a viewer of document secret
B: Alice does a FullyConsistent Check request to get a ZedToken
C: Alice stores that ZedToken (timestamp B) with the document secret when she updates it to say Bob is a fool.

Same as before, the normal case is that the timestamp for A < the timestamp for B, but if the two writes hit the conditions for a timestamp reversal, then B < A.

Bob later tries to read the document. The application performs an AtLeastAsFresh Check for Bob to access the document secret using the stored ZedToken (which is timestamp B).

It's possible that SpiceDB will pick an evaluation timestamp T such that B < T < A, so that Bob is allowed to read the newest contents of the document, and discover that Alice thinks he is a fool.

Same as before, this is only possible if A - T < quantization window: Bob's check has to happen soon enough after the write for A that it's possible that SpiceDB picks a timestamp in between A and B, and the default quantization window is 5s.

Application Mitigations for Misapplying Old ACLs

This could be mitigated in your application by:

Not caring about the problem
Waiting for max_offset (or the quantization window) before doing the fully-consistent check.

When does a timestamp reversal not matter?

There are also some cases when there is no New Enemy Problem even if there are reversed timestamps.

Non-sensitive domain

Not all authorization problems have a version of the New Enemy Problem, which relies on there being some meaningful consequence of hitting an incorrect ACL during the small window of time where it's possible.

If the worst thing that happens from out-of-order ACL updates is that some users briefly see some non-sensitive data, or that a user retains access to something that they already had access to for a few extra seconds, then even though there could still effectively be a "New Enemy Problem," it's not a meaningful problem to worry about.

Disjoint SpiceDB Graphs

The examples of the New Enemy Problem above rely on the out-of-order ACLs being part of the same permission graph. But not all ACLs are part of the same graph, for example:

definition user {}
 
definition blog {
    relation author: user
    permission edit = author
}
 
definition video {
    relation editor: user
    permission change_tags = editor
}

A: Alice is added as an author of the Blog entry new-enemy
B: Bob is removed from the editors of the spicedb.mp4 video

If these writes are given reversed timestamps, it is possible that the ACLs will be applied out-of-order, and this would normally be a New Enemy Problem. But the ACLs themselves aren't shared between any permission computations, and so there is no actual consequence to reversed timestamps.

Garbage Collection Window

As of February 2023, the default garbage collection window has changed to 1.25 hours for CockroachDB Serverless and 4 hours for CockroachDB Dedicated.

SpiceDB warns if the garbage collection window as configured in CockroachDB is smaller than the SpiceDB configuration.

If you need a longer time window for the Watch API or querying at exact snapshots, you can adjust the value in CockroachDB:

ALTER ZONE default CONFIGURE ZONE USING gc.ttlseconds = 90000;
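`gc.ttlseconds` is expressed in seconds, so the value in the statement above corresponds to 25 hours. A small helper can do the conversion and check that CockroachDB's window is at least as long as the window SpiceDB is configured with through `--datastore-gc-window`. Both function names are hypothetical and only illustrate the arithmetic.

```python
from datetime import timedelta


def ttl_seconds(window: timedelta) -> int:
    """Convert a desired GC window into a gc.ttlseconds value."""
    return int(window.total_seconds())


def crdb_window_is_sufficient(crdb_ttl_seconds: int,
                              spicedb_gc_window: timedelta) -> bool:
    """True when CockroachDB keeps history at least as long as SpiceDB expects."""
    return crdb_ttl_seconds >= ttl_seconds(spicedb_gc_window)


# The value used in the ALTER ZONE statement above.
example_ttl = ttl_seconds(timedelta(hours=25))

# The two defaults mentioned in this section, in seconds.
serverless_default = ttl_seconds(timedelta(hours=1.25))
dedicated_default = ttl_seconds(timedelta(hours=4))

print(example_ttl, serverless_default, dedicated_default)
```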

Relationship Integrity

Relationship Integrity is a new experimental feature in SpiceDB that ensures that data written into the supported backing datastores (currently: only CockroachDB) is validated as having been written by SpiceDB itself.

What does relationship integrity ensure?

Relationship integrity primarily ensures that all relationships written into the backing datastore were written via a trusted instance of SpiceDB or that the caller has access to the key(s) necessary to write those relationships. It ensures that if someone gains access to the underlying datastore, they cannot simply write new relationships of their own invention.

What does relationship integrity not ensure?

Since the relationship integrity feature signs each individual relationship, it does not ensure that removal of relationships is by a trusted party. Schema is also currently unverified, so an untrusted party could change it as well. Support for schema changes will likely come in a future version.
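The idea of signing each relationship can be illustrated with a plain HMAC. This is a sketch only: this section does not describe which bytes SpiceDB signs, which hash it uses, or how the signature is stored, so the serialization, the choice of SHA-256, and the function names below are all assumptions.

```python
import hashlib
import hmac


def sign_relationship(key: bytes, key_id: str, relationship: str) -> str:
    """Compute an HMAC over an assumed "key_id|relationship" serialization."""
    message = f"{key_id}|{relationship}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_relationship(key: bytes, key_id: str,
                        relationship: str, signature: str) -> bool:
    """Recompute the HMAC and compare it in constant time."""
    expected = sign_relationship(key, key_id, relationship)
    return hmac.compare_digest(expected, signature)


key = b"example-key-not-for-production"
rel = "document:secret#viewer@user:bob"
sig = sign_relationship(key, "somekeyid", rel)

# A row written by someone without the key fails verification.
forged_ok = verify_relationship(b"attacker-key", "somekeyid", rel, sig)
genuine_ok = verify_relationship(key, "somekeyid", rel, sig)
print(genuine_ok, forged_ok)
```

Because the signature lives with each relationship, a deleted row simply leaves nothing to verify, which is why removals are not covered by this kind of scheme.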

Setting up relationship integrity

To run with relationship integrity, new flags must be given to SpiceDB:

spicedb serve ...existing flags... \
  --datastore-relationship-integrity-enabled \
  --datastore-relationship-integrity-current-key-id="somekeyid" \
  --datastore-relationship-integrity-current-key-filename="some.key"

Place the generated key contents (which must be suitable for use as an HMAC key) in some.key.

Deployment Process

Start with a clean datastore for SpiceDB. At this time, migrating an existing SpiceDB installation is not supported.
Run the standard migrate command but with relationship integrity flags included.
Run SpiceDB with the relationship integrity flags included.
