Container Storage Interface Driver
Goal
In the end, OpenShift installation should provide a cluster that is usable out of the box. Users can create PVCs or StatefulSets and get decent storage for them.
- Install one CSI driver, the one most useful for the platform. Usually it's the one that provides block volumes (AWS EBS, GCP PD, Azure Disk, OpenStack Cinder, ...).
- Provide one default StorageClass that is good for generic usage. Not too expensive, but not too slow either.
- Each cluster is different; don't try to anticipate application needs. Leave creation of additional StorageClasses to the cluster administrators - they know best what they need.
- If the platform provides more storage backends, provide CSI drivers for them via OLM. This is out of scope for this guide, but library-go can be used for it too - see the AWS EFS CSI driver operator. It even supports un-installation of the CSI driver.
Overview
Installation workflow
This guide uses GCP and its Persistent Disk (PD) CSI driver as an example, but the process works exactly the same on other platforms. During cluster installation, the following things happen:
1. `cluster-version-operator` (CVO) starts `cluster-storage-operator` (CSO) in namespace `openshift-cluster-storage-operator`. CVO does it by blindly applying YAML files in the `manifests/` directory of the operator image.
   - Notice that manifests in the `cluster-storage-operator` image will have the correct references to all images of all CSI drivers, CSI driver operators, CSI sidecars and so on. We have automation that replaces the image names from `quay.io/openshift` to `registry.redhat.io` during the release build. The operator gets them as env. variables.
2. `cluster-storage-operator` sees that it runs on platform GCP, starts the GCP PD CSI driver operator in namespace `openshift-cluster-csi-drivers` and creates a `ClusterCSIDriver` CR for it.
   - `cluster-storage-operator` passes the correct images that the CSI driver operator should use as env. variables of the operator Deployment.
3. The GCP PD CSI driver operator observes its `ClusterCSIDriver` CR and installs the GCP PD CSI driver. The operator reports the status of the installation to the `ClusterCSIDriver` CR status.
   - The GCP PD CSI driver operator gets all images that it should use for the driver + sidecars as env. vars.
4. `cluster-storage-operator` observes the `ClusterCSIDriver` CR status and reports its status to `cluster-version-operator` using the ClusterOperator CR.
5. `cluster-version-operator` reports installation/upgrade status to the `ClusterVersion` CR.
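The exact mechanics of the image hand-off are simple: every image reference the CSI driver operator needs arrives as an environment variable on its Deployment. Below is a minimal sketch - not taken from any existing operator, and the variable names are only illustrative - of how an operator might read and validate those variables at startup:

```go
// Hypothetical sketch: reading image references that cluster-storage-operator
// injected into the CSI driver operator Deployment as environment variables.
// The variable names are illustrative; check the real Deployment asset for
// the exact names used by your operator.
package main

import (
	"fmt"
	"os"
)

func imageFromEnv(name string) (string, error) {
	value := os.Getenv(name)
	if value == "" {
		return "", fmt.Errorf("required environment variable %s is not set", name)
	}
	return value, nil
}

func main() {
	// The release build rewrites these to registry.redhat.io references,
	// so the operator never hard-codes any image name.
	for _, name := range []string{"DRIVER_IMAGE", "PROVISIONER_IMAGE", "ATTACHER_IMAGE"} {
		image, err := imageFromEnv(name)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("%s = %s\n", name, image)
	}
}
```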
Upgrade
Upgrade works in exactly the same way as installation, except that cluster-storage-operator, the CSI driver operator and the CSI driver already exist, i.e. their Deployments / DaemonSets are updated instead of created. Updated image names are propagated exactly as during installation - as env. variables from CVO through CSO to the CSI driver operator(s).
Cluster Un-installation
Nothing is needed from the driver or its operator during cluster un-installation. However, `openshift-install destroy cluster` itself should destroy all volumes that were dynamically provisioned in the cluster. In other words:
- The CSI driver must tag / label all volumes created during the cluster lifetime with the cluster ID. Most CSI drivers have a special argument to add tags to all created volumes, and the CSI driver operator can fill it in easily. See the GCP PD CSI driver operator, which sets the tag (= label in GCP jargon) `kubernetes-io-cluster-${CLUSTER_ID}=owned`.
- `openshift-install destroy cluster` must list all volumes that are tagged with the cluster ID and delete them. See the code for GCP PD.
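As an illustration of the first point, the sketch below shows how an operator could compose that tag and pass it to the driver. The `--extra-labels` flag name and the literal infrastructure ID are placeholders - check which flag your driver actually supports and read the real ID from the Infrastructure CR:

```go
// Illustrative sketch only: injecting a cluster-owned tag into the driver's
// command line so that every volume the driver creates is labeled with the
// cluster ID. The flag name --extra-labels is a placeholder.
package main

import "fmt"

// clusterOwnedTag builds the tag/label that `openshift-install destroy
// cluster` uses to find and delete leftover volumes.
func clusterOwnedTag(clusterID string) string {
	return fmt.Sprintf("kubernetes-io-cluster-%s=owned", clusterID)
}

func main() {
	// The infrastructure ID normally comes from the Infrastructure CR
	// (infrastructures.config.openshift.io/cluster), not a literal.
	infraID := "mycluster-abc123"

	args := []string{
		"--endpoint=unix:///csi/csi.sock",
		"--extra-labels=" + clusterOwnedTag(infraID), // hypothetical driver flag
	}
	fmt.Println(args)
}
```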
ClusterCSIDriver
ClusterCSIDriver is a Custom Resource. Its CRD is created during cluster installation. Each CSI driver operator uses its own dedicated CR name. For example, GCP PD CSI driver operator uses ClusterCSIDriver instance named pd.csi.storage.gke.io. See openshift/api repository for details about the available fields in the CR and for allowed CSI drivers.
The ClusterCSIDriver is heavily based on ClusterOperator CR - it has the same Spec and Status.
ClusterCSIDriver does not provide any configuration of the operator or the driver, except for the log level of both the operator and its operand (= the CSI driver). A CSI driver operator should install the CSI driver with default parameters that suit OpenShift best. If the operator needs them, it can get parameters from install-config or other OCP components by reading the corresponding API objects - most of the cluster is up and running by the time the CSI driver operator starts.
Using `ClusterCSIDriver.Status`, the CSI driver operator reports the status of the CSI driver "up the chain" to cluster-storage-operator. Pay special attention to `ClusterCSIDriver.Status.Conditions` - cluster-storage-operator transfers them directly into the overall status of storage in the ClusterOperator CR.
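For example, a quick way to see what the operator reports is to read those conditions directly. A minimal sketch, assuming the openshift/client-go operator clientset and the GCP PD CR name:

```go
// Minimal sketch: reading the conditions a CSI driver operator reports in
// its ClusterCSIDriver CR. Client construction and error handling are
// reduced to the bare minimum.
package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"

	operatorclient "github.com/openshift/client-go/operator/clientset/versioned"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client := operatorclient.NewForConfigOrDie(cfg)

	// GCP PD uses the CR name pd.csi.storage.gke.io; each driver has its own.
	cr, err := client.OperatorV1().ClusterCSIDrivers().Get(
		context.TODO(), "pd.csi.storage.gke.io", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// These are the conditions that cluster-storage-operator rolls up into
	// the "storage" ClusterOperator status.
	for _, cond := range cr.Status.Conditions {
		fmt.Printf("%s=%s: %s\n", cond.Type, cond.Status, cond.Message)
	}
}
```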
openshift/library-go
openshift/library-go is a collection of functions that allow us to build OpenShift operators easily. Most of the functionality of a CSI driver operator is already available there.
In the ideal case, a CSI driver operator just provides yaml files of the CSI driver (Deployment, DaemonSet, ServiceAccount, Role, ClusterRole, RoleBindings, ClusterRoleBindings, ...) and initializes + starts library-go Controllers that handle the rest.
There is already a lot of internal OpenShift knowledge in these controllers, for example:
* Inject cluster-wide HTTP proxy to driver pods.
* Inject custom CA bundles to the driver pods.
* Put the controller Pods on the master nodes (if they're available).
* Scale down the number of controller Pods on single-node clusters.
* Configure leader-election of the CSI sidecar appropriately for the platform.
* (Optionally) restart CSI driver pods when cloud credentials change, so the CSI driver does not need to reload the credentials on its own.
* Replace image name placeholders with the current images for an OCP release.
* Propagate log level changes.
* Create a default storage class.
* "Stomp" over any user changes in the driver Deployment / DaemonSets - the operator knows better how to run the CSI driver.
* Report its status correctly via ClusterCSIDriver.Status.Conditions.
* Etc.
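To give a flavour of what these controllers do under the hood, the snippet below shows the kind of mutation they apply - here, injecting the cluster-wide proxy settings into every driver container. This is an illustration only, not the actual library-go code:

```go
// Illustration (not the actual library-go code) of the kind of mutation
// these controllers perform on the operator's assets.
package hooks

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// injectProxyEnv adds HTTP_PROXY/HTTPS_PROXY/NO_PROXY env vars, typically
// taken from the cluster-wide proxies.config.openshift.io/cluster object,
// to every container of the CSI driver Deployment.
func injectProxyEnv(deployment *appsv1.Deployment, httpProxy, httpsProxy, noProxy string) {
	proxyEnv := []corev1.EnvVar{
		{Name: "HTTP_PROXY", Value: httpProxy},
		{Name: "HTTPS_PROXY", Value: httpsProxy},
		{Name: "NO_PROXY", Value: noProxy},
	}
	containers := deployment.Spec.Template.Spec.Containers
	for i := range containers {
		containers[i].Env = append(containers[i].Env, proxyEnv...)
	}
}
```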
CSI sidecars
OpenShift already ships CSI sidecars, usually with the version that corresponds to the Kubernetes version in each OCP release. These sidecars MUST be used by the CSI driver. A CSI driver operator gets their image names (SHAs) as env. variables.
Credentials
The CSI driver should be able to consume Secrets provided by the cloud-credentials-operator (CCO). Technically it's possible for a CSI driver operator to translate the Secrets from CCO into a format understood by the CSI driver, but you will save yourself some sweat and tears if you avoid doing so.
A CredentialsRequest for the CSI driver must be present in the cluster-storage-operator `manifests/` directory, so that it is available during cluster installation and when CredentialsRequests are extracted during installation.
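In the driver assets this is plain YAML; expressed as Go structs purely for illustration, consuming the CCO-provisioned Secret usually boils down to a Secret volume and mount like the sketch below (the Secret name and mount path are examples, defined by your CredentialsRequest and driver):

```go
// Hypothetical sketch: mounting the Secret provisioned by the
// cloud-credentials-operator into the CSI driver controller Pod.
// The Secret name and mount path are examples only.
package assets

import corev1 "k8s.io/api/core/v1"

func cloudCredentialsVolume() (corev1.Volume, corev1.VolumeMount) {
	volume := corev1.Volume{
		Name: "cloud-credentials",
		VolumeSource: corev1.VolumeSource{
			Secret: &corev1.SecretVolumeSource{
				SecretName: "gcp-pd-cloud-credentials",
			},
		},
	}
	mount := corev1.VolumeMount{
		Name:      "cloud-credentials",
		MountPath: "/etc/cloud-credentials",
		ReadOnly:  true,
	}
	return volume, mount
}
```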
Metrics
In OpenShift, we provide metrics through HTTPS. Since most CSI drivers and all CSI sidecars expose metrics as plain HTTP endpoints, we add kube-rbac-proxy containers to the driver Pods. They provide a proxy that listens on the Pod's public IP address and proxies HTTPS requests from Prometheus to the HTTP metric endpoints of the driver / sidecars. The HTTP endpoint is available only on loopback and is not exposed outside of the driver Pod.
If possible, get the CSI driver working without metrics first and add them later - they're tricky to set up correctly. In general, follow the GCP PD CSI driver operator example for how the metric ports are exposed in the Pods and how their scraping is configured in the ServiceMonitor CR.
OpenShift handles the TLS side of things, i.e. it provides a Secret with a TLS key + certificate that the kube-rbac-proxy sidecars can consume via Secret volumes.
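For orientation, a kube-rbac-proxy sidecar for one CSI sidecar's metrics typically looks roughly like the following (shown as Go structs instead of the YAML asset; ports, paths and the Secret name are illustrative):

```go
// Rough sketch of a kube-rbac-proxy sidecar that terminates HTTPS on the
// Pod IP and forwards scrapes to a CSI sidecar's loopback-only HTTP
// metrics endpoint. Port numbers and volume/Secret names are illustrative.
package assets

import corev1 "k8s.io/api/core/v1"

func metricsProxyContainer(image string) corev1.Container {
	return corev1.Container{
		Name:  "provisioner-kube-rbac-proxy",
		Image: image,
		Args: []string{
			"--secure-listen-address=0.0.0.0:9203",     // exposed to Prometheus
			"--upstream=http://127.0.0.1:8203/",        // sidecar's loopback metrics endpoint
			"--tls-cert-file=/etc/tls/private/tls.crt", // from the serving-cert Secret
			"--tls-private-key-file=/etc/tls/private/tls.key",
			"--logtostderr=true",
		},
		Ports: []corev1.ContainerPort{{Name: "provisioner-m", ContainerPort: 9203}},
		VolumeMounts: []corev1.VolumeMount{
			{Name: "metrics-serving-cert", MountPath: "/etc/tls/private"},
		},
	}
}
```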
CI
If you do not do so already, test your CSI driver in vanilla Kubernetes frequently using the Kubernetes storage tests. Neither OpenShift nor an operator is needed here, and all your Kubernetes customers will benefit from it.
OpenShift packages the same tests as the openshift-tests binary / container image. The binary just wraps the tests with proper OpenShift privileges, but otherwise they're the same tests as upstream and consume the same manifest.yaml file. To run the tests:
- Place `manifest.yaml` and `kubeconfig` in a single directory, e.g. `data/`.
- Run `openshift-tests` from the `quay.io/openshift/origin-tests` image:

```
$ podman run -v `pwd`:/data:z --rm -it quay.io/openshift/origin-tests:latest sh -c "KUBECONFIG=/data/kubeconfig TEST_CSI_DRIVER_FILES=/data/manifest.yaml /usr/bin/openshift-tests run openshift/csi --junit-dir /data/results"
```
In the end, a CI job will run the same tests. CI for a CSI driver is tricky to set up and probably should be done by Red Hat.
Deliverables
CSI driver
Fork the CSI driver under github.com/openshift and integrate it with Prow and Tide as described elsewhere in this guide.
Make sure that the CSI driver image is built in Red Hat's CI.
openshift/api changes
Add your CSI driver name to the allowed ClusterCSIDriver names here and here. You must update cluster-storage-operator to the new openshift/api version to get the ClusterCSIDriver CRD updated during cluster installation / update. A simple `go get -u github.com/openshift/api@master` is enough.
CSI driver operator
The CSI driver must be installed via an operator, and the operator must be based on library-go. As listed above, library-go has a lot of OpenShift knowledge baked in. You cannot use an operator previously developed in-house that is installed via OLM!
- Ask your Red Hat representative to create a new empty repo in `github.com/openshift`. Take the GCP PD CSI driver operator as a base and copy it there. Replace traces of "GCP PD" and "gcp-pd" in the whole repo with your CSI driver name. If you're lucky, you may be done!
- The only useful code is in `pkg/operator/starter.go`, and it basically only instantiates + starts CSI driver controllers from library-go.
- The most important thing is the `assets/` directory. It contains YAML files of all objects that the CSI driver needs.
- Check the `${}` "variables" in the files there - the operator will replace e.g. `${LOG_LEVEL}` with `"2"`, `${DRIVER_IMAGE}` with the driver image name, etc. (see the sketch after this list).
- Check the `kube-rbac-proxy` containers and how they provide HTTPS endpoints for metrics of each CSI sidecar.
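The substitution itself is nothing magical - library-go performs it for you when applying the assets - but conceptually it amounts to something like this sketch (the env. variable names are examples):

```go
// A minimal sketch of what the placeholder substitution amounts to; in a
// real operator library-go does this when it applies the assets.
package assets

import (
	"os"
	"strings"
)

// replacePlaceholders expands ${...} markers in an asset read from the
// assets/ directory. The env. variable names are examples.
func replacePlaceholders(asset string) string {
	replacer := strings.NewReplacer(
		"${DRIVER_IMAGE}", os.Getenv("DRIVER_IMAGE"),
		"${PROVISIONER_IMAGE}", os.Getenv("PROVISIONER_IMAGE"),
		"${LOG_LEVEL}", "2",
	)
	return replacer.Replace(asset)
}
```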
CSI driver deployment
As a result, the operator must deploy the CSI driver this way:
- The CSI driver runs in namespace `openshift-cluster-csi-drivers`.
- OpenShift can be deployed by various tools and in different modes.
  - The most typical mode is a cluster with 3 dedicated master nodes and a number of worker nodes. In this mode a CSI driver runs as:
    - A `Deployment` with 2 replicas of CSI controller Pods (driver + external-provisioner, external-attacher, external-resizer, external-snapshotter, livenessprobe and a number of kube-rbac-proxies). These Pods run on master nodes and have anti-affinity towards each other (i.e. they run on different master nodes).
    - A `DaemonSet` that matches all nodes (incl. masters) with the driver + node-driver-registrar + livenessprobe.
      - The DaemonSet has a `RollingUpdate` update strategy and tolerates 10% of missing pods (to be able to update 10% of pods at the same time, speeding up updates of large clusters).
    - All containers (both in the `DaemonSet` and `Deployment`) have CPU and memory requests filled in.
  - In a single-node cluster, the `Deployment` with the controller pods has 1 replica.
  - In a cluster without dedicated master nodes, the `Deployment` has no node selector.
Most of the above is automatically configured and managed by our common library-go code!
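For orientation only, this is roughly what the topology described above looks like when written down as Go structs instead of the YAML assets the operator ships (labels and selectors are placeholders):

```go
// For orientation only: the controller Deployment / node DaemonSet topology
// described above, expressed as Go structs. Label values are placeholders.
package assets

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func controllerTopology() (appsv1.DeploymentSpec, appsv1.DaemonSetUpdateStrategy) {
	replicas := int32(2) // 1 on single-node clusters
	maxUnavailable := intstr.FromString("10%")

	deploymentSpec := appsv1.DeploymentSpec{
		Replicas: &replicas,
		Template: corev1.PodTemplateSpec{
			Spec: corev1.PodSpec{
				// Run on masters when dedicated masters exist.
				NodeSelector: map[string]string{"node-role.kubernetes.io/master": ""},
				Affinity: &corev1.Affinity{
					PodAntiAffinity: &corev1.PodAntiAffinity{
						RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
							TopologyKey: "kubernetes.io/hostname",
							LabelSelector: &metav1.LabelSelector{
								MatchLabels: map[string]string{"app": "my-csi-driver-controller"},
							},
						}},
					},
				},
			},
		},
	}

	// The node DaemonSet tolerates 10% of pods being unavailable during updates.
	updateStrategy := appsv1.DaemonSetUpdateStrategy{
		Type: appsv1.RollingUpdateDaemonSetStrategyType,
		RollingUpdate: &appsv1.RollingUpdateDaemonSet{
			MaxUnavailable: &maxUnavailable,
		},
	}
	return deploymentSpec, updateStrategy
}
```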
CI
The operator repository must contain a YAML manifest for e2e tests in the [test/e2e directory](https://github.com/openshift/gcp-pd-csi-driver-operator/tree/master/test/e2e) and a Dockerfile.test for it. In our CI we can distribute "stuff" like the test manifest only as container images, hence we build a container with the manifest in CI. `FROM: src` will then use the same image as we built for the driver sources, as checked out from GitHub.
cluster-storage-operator
- Add a `CredentialsRequest` for the CSI driver to the [`manifests/` directory of CSO](https://github.com/openshift/cluster-storage-operator/tree/master/manifests). Follow existing examples there.
- Add complete assets of the CSI driver operator to `assets/csidriveroperators/<platform>`. It must contain all RBACs that the operator needs (incl. those that the operator will grant to the driver and all CSI sidecars!), the `ServiceAccount` and the `Deployment` of the CSI driver operator. Check GCE PD as an example.
- Teach CSO how / when to start the CSI driver operator via a little glue code.
  - Write `Get<driver name>OperatorConfig()`, which returns a structure describing how to start the operator:
    - Replacement strings for `${OPERATOR_IMAGE}` and `${DRIVER_IMAGE}` in the operator Deployment.
    - The platform on which the operator should start.
    - What static assets to create when the operator should run.
    - The Deployment asset.
    - The ClusterCSIDriver CR to create.
  - Initialize the list of supported CSI driver operators at operator startup.

CSO will then handle the rest - if the platform where OCP runs corresponds to the platform that the CSI driver operator declared, CSO will start the operator and create the ClusterCSIDriver CR.
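The shape of that glue code is roughly the following. Note this is a purely illustrative sketch - the real structure and its field names live in cluster-storage-operator, so copy from an existing driver there rather than from this snippet:

```go
// Purely illustrative: a hypothetical config structure describing what CSO
// needs to know to start one CSI driver operator. Field and asset names
// are made up for this example.
package csioperatorclient

import "os"

type CSIOperatorConfig struct {
	// Name of the ClusterCSIDriver CR, e.g. "pd.csi.storage.gke.io".
	CSIDriverName string
	// Platform on which CSO should start this operator.
	Platform string
	// ${OPERATOR_IMAGE} / ${DRIVER_IMAGE} replacements for the operator Deployment.
	ImageReplacements map[string]string
	// Static assets (RBAC, ServiceAccount, ...) to create before the operator runs.
	StaticAssets []string
	// Asset with the operator Deployment itself.
	DeploymentAsset string
	// Asset with the ClusterCSIDriver CR to create.
	CRAsset string
}

// GetMyDriverOperatorConfig is the kind of glue function described above.
func GetMyDriverOperatorConfig() CSIOperatorConfig {
	return CSIOperatorConfig{
		CSIDriverName: "mydriver.csi.example.com",
		Platform:      "MyPlatform",
		ImageReplacements: map[string]string{
			"${OPERATOR_IMAGE}": os.Getenv("MY_DRIVER_OPERATOR_IMAGE"),
			"${DRIVER_IMAGE}":   os.Getenv("MY_DRIVER_IMAGE"),
		},
		StaticAssets:    []string{"csidriveroperators/mydriver/rbac.yaml", "csidriveroperators/mydriver/sa.yaml"},
		DeploymentAsset: "csidriveroperators/mydriver/deployment.yaml",
		CRAsset:         "csidriveroperators/mydriver/cr.yaml",
	}
}
```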
CSI driver operator deployment
CSO must deploy the CSI driver operator as a Deployment with 1 replica that runs on the master nodes, if they’re available.
Development
- It’s possible to start the CSI driver operator on the command line, see GCP PD example.
- It’s possible to start the cluster-storage-operator on the command line. It's currently not documented, but provide expected env. variables + start the operator in the same way as the GCP PD CSI driver operator above.
Troubleshooting
Both the driver + CSI driver operator API objects and logs will be available in must-gather collected by `oc adm must-gather` out of the box, no need to configure anything.
CSO
- CSO reports its status in the ClusterOperator CR:
  - `oc get clusteroperator storage`
  - `oc get clusteroperator storage -o yaml`
- In addition, CSO reports more detailed status in the Storage CR:
  - `oc get storage -o yaml`
  - Especially check its conditions.
- You can bump the CSO log level in the Storage CR:
  - `oc edit storage`
  - Note that the `logLevel` does not propagate to CSI driver operators!
- You can "pause" the operator by setting `managementState: Unmanaged`, for example to test manual changes to the CSI driver operator `Deployment`. Otherwise, the operator will overwrite any of your changes.
CSI driver operator
- The CSI driver operator reports its status in the `ClusterCSIDriver` CR:
  - `oc get clustercsidriver -o yaml`
- You can bump the CSI driver operator log level in the ClusterCSIDriver CR:
  - `oc edit clustercsidriver`
  - Note that `logLevel` does propagate to the CSI driver!
- You can "pause" the operator by setting `managementState: Unmanaged`, for example to test manual changes to the driver `Deployment` / `DaemonSet`. Otherwise, the operator will overwrite any of your changes.