Anestis' notes

Using the Cassandra Operator with OpenShift 4.x

I’ve been running Cassandra on OpenShift for some time now through a home-brewed StatefulSet, but it’s a tedious process and prone to failures. Naturally, one of the first things I tried with OpenShift 4.x was to use the Instaclustr Cassandra Operator to automate the provisioning of Cassandra clusters.

Deployment

Just follow the operator's instructions for how to Deploy the Operator. First, apply the custom resource definitions:

❯ oc apply -f deploy/crds.yaml

Then the operator itself:

❯ oc apply -f deploy/bundle.yaml

After the operator pod starts up, I tried to spin up a small 3-node Cassandra cluster using the following YAML:

apiVersion: cassandraoperator.instaclustr.com/v1alpha1
kind: CassandraDataCenter
metadata:
  name: test-cluster-dc1
  labels:
    app: cassandra
    datacenter: dc1
    cluster: test-cluster
spec:
  prometheusSupport: true
  optimizeKernelParams: false
  serviceAccountName: cassandra
  nodes: 3
  cassandraImage: "gcr.io/cassandra-operator/cassandra-3.11.6:latest"
  sidecarImage: "gcr.io/cassandra-operator/cassandra-sidecar:latest"
  imagePullPolicy: Always
  imagePullSecrets:
    - name: regcred
  resources:
    limits:
      memory: 2Gi
    requests:
      memory: 2Gi
  sidecarResources:
    limits:
      memory: 512Mi
    requests:
      memory: 512Mi
  dataVolumeClaimSpec:
    accessModes:
      - ReadWriteOnce
    storageClassName: thick
    resources:
      requests:
        storage: 5Gi
  cassandraAuth:
    authenticator: PasswordAuthenticator
    authorizer: CassandraAuthorizer
    roleManager: CassandraRoleManager
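
With the manifest saved locally (I'll assume a file name of datacenter.yaml here), creating the cluster is a single apply:

```shell
# Create the CassandraDataCenter resource; the operator reacts by
# provisioning a StatefulSet, services and PVCs for the rack.
oc apply -f datacenter.yaml

# Watch the pods come up; each pod should eventually report 2/2
# (the cassandra container plus the sidecar).
oc get pod -w
```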

Easy right?

❯ oc get pod
NAME                                 READY   STATUS             RESTARTS   AGE
cassandra-operator-55d759bcd-6h5sm   1/1     Running            0          47s
cassandra-test-cluster-dc1-rack1-0   1/2     CrashLoopBackOff   1          22s

CrashLoopBackOff. Let’s see if we can get any meaningful error from the logs:

❯ oc logs cassandra-test-cluster-dc1-rack1-0 -c cassandra
+ '[' unset == true ']'
+ exec /bin/bash -xue /usr/bin/entry-point /tmp/operator-config /tmp/cassandra-rack-config
+ for config_directory in "$@"
+ cd /tmp/operator-config
+ find -L . -name '..*' -prune -o '(' -type f -print0 ')'
+ cpio -pmdLv0 /etc/cassandra
cpio: /etc/cassandra/./cassandra.yaml.d/001-operator-overrides.yaml: Cannot open: Permission denied
cpio: /etc/cassandra/./jvm.options.d/001-jvm-memory-gc.options: Cannot open: Permission denied
cpio: /etc/cassandra/./cassandra-env.sh.d/001-cassandra-exporter.sh: Cannot open: Permission denied
0 blocks

Well, dammit. Permission denied. Aren’t containers supposed to be solving these problems? Let’s work the problem a little further:

❯ oc debug pod/cassandra-test-cluster-dc1-rack1-0
Defaulting container name to cassandra.
Use 'oc describe pod/cassandra-test-cluster-dc1-rack1-0-debug -n akka-ledger' to see all of the containers in this pod.

Starting pod/cassandra-test-cluster-dc1-rack1-0-debug ...
Pod IP: 10.128.4.38
If you don't see a command prompt, try pressing enter.

$ whoami
1000690000
$ cat > /etc/cassandra/hello_world
/bin/sh: 3: cannot create /etc/cassandra/hello_world: Permission denied
$ ls -l /etc/| grep cassandra
drwxr-xr-x. 1 cassandra cassandra   267 Jul 30 14:14 cassandra
$ grep cassandra /etc/passwd
cassandra:x:999:999::/home/cassandra:

So there is the issue. User cassandra (uid 999) owns the /etc/cassandra folder, but when the image runs on OpenShift the pod is assigned uid 1000690000. Getting past this issue requires an understanding of OpenShift Security Context Constraints (SCCs) and how they can be adjusted.
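
As a sanity check, OpenShift records which SCC admitted a pod in an annotation, and the namespace annotations show where the uid came from:

```shell
# Which SCC was used to admit the pod
oc get pod cassandra-test-cluster-dc1-rack1-0 \
  -o jsonpath='{.metadata.annotations.openshift\.io/scc}{"\n"}'

# The uid range assigned to the namespace
oc get namespace akka-ledger \
  -o jsonpath='{.metadata.annotations.openshift\.io/sa\.scc\.uid-range}{"\n"}'
```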

Understanding Service Accounts and SCCs

Security Context Constraints, or SCCs, are the means by which OpenShift administrators control the permissions of pods.

What we need is to grant the operator the ability to spawn cassandra pods using the uid that is baked into the images, instead of a randomly assigned one from the range allotted to the namespace where the operator is running. For example, here is the namespace configuration I am using:

❯ oc describe namespace/akka-ledger
Name:         akka-ledger
Labels:       <none>
Annotations:  openshift.io/description:
              openshift.io/display-name:
              openshift.io/requester: AGeorgiadis
              openshift.io/sa.scc.mcs: s0:c26,c20
              openshift.io/sa.scc.supplemental-groups: 1000690000/10000
              openshift.io/sa.scc.uid-range: 1000690000/10000
Status:       Active

No resource quota.

No LimitRange resource.

The openshift.io/sa.scc.uid-range annotation defines the range of uids that will be assigned, in the form <start>/<size>, hence the value of 1000690000 that I got.

So what can we do? It turns out one of the SCCs defined out of the box on OpenShift is anyuid, which allows pods to override the uid range defined in the namespace, so the cassandra server can run as uid 999 in the pod. This is an OpenShift-specific issue: the same operator works on a stock Kubernetes installation without any error. We can modify the cassandra Role that has been created and add the following rule to allow use of the anyuid SCC:

- apiGroups:
  - security.openshift.io 
  resourceNames:
  - anyuid
  resources:
  - securitycontextconstraints 
  verbs: 
  - use

This role is bound to the cassandra service account, which is declared in the pod spec of the StatefulSet that the operator creates to provision the cassandra server pods.
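
Instead of editing the Role by hand, the same grant can be expressed with the oc admin helper (assuming the service account is named cassandra and lives in the akka-ledger namespace):

```shell
# Allow pods running under the cassandra service account
# to be admitted by the anyuid SCC.
oc adm policy add-scc-to-user anyuid -z cassandra -n akka-ledger
```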

After that, let’s monitor the logs again to verify that the pod starts with the correct uid:

[...]
WARN  [main] StartupChecks.java:332 Directory /var/lib/cassandra/data doesn't exist
ERROR [main] CassandraDaemon.java:775 Has no permission to create directory /var/lib/cassandra/data

So the process now successfully updated the /etc/cassandra folder, but hit a new error a few steps later. Again, let’s use debug to understand the problem:

❯ oc debug pod/cassandra-test-cluster-dc1-rack1-0
Defaulting container name to cassandra.
Use 'oc describe pod/cassandra-test-cluster-dc1-rack1-0-debug -n akka-ledger' to see all of the containers in this pod.

Starting pod/cassandra-test-cluster-dc1-rack1-0-debug ...

Pod IP: 10.128.4.41
If you don't see a command prompt, try pressing enter.

$ whoami
cassandra
$ ls -la /var/lib/cassandra
total 0
drwxr-xr-x. 2 root root  6 Aug  5 14:31 .
drwxr-xr-x. 1 root root 57 Jul 30 14:14 ..

So the pod has been configured with the proper uid. Unfortunately, the permissions set when the PV was mounted only allow root to make modifications to the filesystem. In order to resolve this, the operator would have to specify fsGroup: 999 on the StatefulSet that is used to provision the pods (see e.g. Set the security context for a Pod in the Kubernetes documentation). Can we achieve the same thing without requiring changes to the operator code?
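
For reference, if we could patch the operator, the fix would be a pod-level security context in the StatefulSet's pod template, roughly like this sketch:

```yaml
# Sketch only: what the operator would need to emit in the pod template.
spec:
  securityContext:
    fsGroup: 999   # mounted volumes get gid 999 and become group-writable
```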

Turns out, we can. Just create a new SCC that presets fsGroup to a suitable range, for example the following one:

allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: false
allowPrivilegedContainer: false
allowedCapabilities: null
apiVersion: security.openshift.io/v1
defaultAddCapabilities: null
fsGroup:
  range:
  - min: 999
    max: 1000
  type: MustRunAs
groups: []
kind: SecurityContextConstraints
metadata:
  name: cassandra
priority: 10
readOnlyRootFilesystem: false
requiredDropCapabilities:
- MKNOD
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
users: []
volumes:
- configMap
- downwardAPI
- emptyDir
- persistentVolumeClaim
- projected
- secret

This is the same as the anyuid SCC, except that the fsGroup strategy is MustRunAs with a range of 999-1000, in place of the type: RunAsAny policy. Let’s edit the cassandra Role again, replace anyuid with cassandra in the resourceNames list, and recreate the CassandraDataCenter resource.
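
Putting it together (assuming the SCC above is saved as cassandra-scc.yaml and the datacenter manifest as datacenter.yaml; both file names are just my choice):

```shell
# Create the custom SCC (requires cluster-admin)
oc apply -f cassandra-scc.yaml

# Point the cassandra Role at the new SCC, then recreate the datacenter
oc edit role cassandra   # replace anyuid with cassandra in resourceNames
oc delete -f datacenter.yaml
oc apply -f datacenter.yaml
```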

After a while all 3 pods have been created correctly:

❯ oc get pod
NAME                                 READY   STATUS              RESTARTS   AGE
cassandra-operator-55d759bcd-8vtwq   1/1     Running             0          23m
cassandra-test-cluster-dc1-rack1-0   2/2     Running             0          3m43s
cassandra-test-cluster-dc1-rack1-1   2/2     Running             0          2m11s
cassandra-test-cluster-dc1-rack1-2   2/2     Running             0          27s

Nodetool status also reports a properly formed cluster:

❯ oc exec cassandra-test-cluster-dc1-rack1-0 -c cassandra -- nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.129.4.55  85.2 KiB   256          69.8%             2495ad71-384a-43fe-996f-65d245c15fe4  rack1
UN  10.128.4.42  84.79 KiB  256          65.4%             84011ee3-0dc5-482e-90ab-d06cf0844155  rack1
UN  10.129.2.13  15.5 KiB   256          64.8%             12bb34c0-1ac5-49dc-971d-9c0bc1df7a81  rack1

We can now follow the rest of the guide.
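
Since the datacenter was created with PasswordAuthenticator, a quick smoke test is logging in with cqlsh; the stock Cassandra superuser credentials are cassandra/cassandra unless they have been changed:

```shell
oc exec -it cassandra-test-cluster-dc1-rack1-0 -c cassandra -- \
  cqlsh -u cassandra -p cassandra -e "DESCRIBE KEYSPACES"
```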

TODO

  • Enable optimizeKernelParams