diff --git a/SUMMARY.md b/SUMMARY.md
index 36c44b2..ee4321b 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -186,3 +186,10 @@
 * [Developing a Client](nats-protocol/nats-protocol/nats-client-dev.md)
 * [NATS Cluster Protocol](nats-protocol/nats-server-protocol.md)
+## NATS on Kubernetes
+
+* [Introduction](nats-kubernetes/README.md)
+* [NATS Streaming Cluster with FT Mode](nats-kubernetes/stan-ft-k8s-aws.md)
+* [NATS + Prometheus Operator](nats-kubernetes/prometheus-and-nats-operator.md)
+* [NATS + Cert Manager](nats-kubernetes/nats-cluster-and-cert-manager.md)
+* [Securing a NATS Cluster with cfssl](nats-kubernetes/operator-tls-setup-with-cfssl.md)
diff --git a/nats-kubernetes/README.md b/nats-kubernetes/README.md
new file mode 100644
index 0000000..fc0b520
--- /dev/null
+++ b/nats-kubernetes/README.md
@@ -0,0 +1,58 @@
+# NATS on Kubernetes
+
+In this section you can find several examples of how to deploy NATS, NATS Streaming
+and other tools from the NATS ecosystem on Kubernetes.
+
+ * [Getting Started](README.md#getting-started)
+ * [Creating a NATS Streaming Cluster in k8s with FT mode](stan-ft-k8s-aws.md)
+ * [NATS + Prometheus Operator](prometheus-and-nats-operator.md)
+ * [NATS + Cert Manager in k8s](nats-cluster-and-cert-manager.md)
+ * [Securing a NATS Cluster using cfssl](operator-tls-setup-with-cfssl.md)
+
+
+## Running NATS on K8S
+
+### Getting started
+
+The fastest and easiest way to get started is with just one shell command:
+
+```sh
+curl -sSL https://nats-io.github.io/k8s/setup.sh | sh
+```
+
+*In case you don't have a cluster already, you can find some notes on how to create a small cluster using one of the hosted Kubernetes providers [here](https://github.com/nats-io/k8s/blob/master/docs/create-k8s-cluster.md).*
+
+This will run a `nats-setup` container with the [required policy](https://github.com/nats-io/k8s/blob/master/setup/bootstrap-policy.yml)
+and deploy a NATS cluster on Kubernetes with external access, TLS and
+decentralized authorization.
+
+[![asciicast](https://asciinema.org/a/282135.svg)](https://asciinema.org/a/282135)
+
+By default, the installer will deploy the [Prometheus Operator](https://github.com/coreos/prometheus-operator) and the
+[Cert Manager](https://github.com/jetstack/cert-manager) for metrics and TLS support, and the NATS instances will
+also bind the 4222 host port for external access.
+
+You can customize the installer to run without TLS or without auth
+for a simpler setup, as follows:
+
+```sh
+# Disable TLS
+curl -sSL https://nats-io.github.io/k8s/setup.sh | sh -s -- --without-tls
+
+# Disable Auth and TLS (also disables NATS Surveyor and NATS Streaming)
+curl -sSL https://nats-io.github.io/k8s/setup.sh | sh -s -- --without-tls --without-auth
+```
+
+**Note**: Since [NATS Streaming](https://github.com/nats-io/nats-streaming-server) will be running as a [leafnode](https://github.com/nats-io/docs/tree/master/leafnodes) to NATS
+(under the STAN account) and [NATS Surveyor](https://github.com/nats-io/nats-surveyor)
+requires the [system account](../nats-server/nats_admin/sys_accounts) to monitor events, disabling auth also means that NATS Streaming and NATS Surveyor-based monitoring will be disabled.
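+
+To check what was deployed and try a first connection, you can list the pods and
+port-forward the client port of one of the NATS servers (a minimal sanity check; the
+`nats-0` pod name below assumes the default cluster name used by the installer and may
+differ in your setup):
+
+```sh
+# List the components created by the installer
+kubectl get pods
+
+# Forward the NATS client port to your machine
+# (assumes one of the pods is named nats-0)
+kubectl port-forward nats-0 4222:4222
+
+# Any NATS client can now connect to nats://127.0.0.1:4222
+```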
+
+The monitoring dashboard set up by NATS Surveyor can be accessed through a port-forward:
+
+    kubectl port-forward deployments/nats-surveyor-grafana 3000:3000
+
+Next, open the following URL in your browser:
+
+    http://127.0.0.1:3000/d/nats/nats-surveyor?refresh=5s&orgId=1
+
+![surveyor](https://user-images.githubusercontent.com/26195/69106844-79fdd480-0a24-11ea-8e0c-213f251fad90.gif)
diff --git a/nats-kubernetes/nats-cluster-and-cert-manager.md b/nats-kubernetes/nats-cluster-and-cert-manager.md
new file mode 100644
index 0000000..0823d94
--- /dev/null
+++ b/nats-kubernetes/nats-cluster-and-cert-manager.md
@@ -0,0 +1,166 @@
+# NATS + Cert Manager in Kubernetes
+
+First, install cert-manager (v0.7 in this example) into its own namespace:
+
+```sh
+kubectl create namespace cert-manager
+kubectl label namespace cert-manager certmanager.k8s.io/disable-validation=true
+kubectl apply -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.7/deploy/manifests/cert-manager.yaml
+```
+
+Then create a self-signed `ClusterIssuer` that will be used to bootstrap a CA for the NATS cluster:
+
+```yaml
+apiVersion: certmanager.k8s.io/v1alpha1
+kind: ClusterIssuer
+metadata:
+  name: selfsigning
+spec:
+  selfSigned: {}
+```
+
+```text
+clusterissuer.certmanager.k8s.io/selfsigning unchanged
+```
+
+Use it to issue a CA certificate, which is stored in the `nats-ca` secret:
+
+```yaml
+apiVersion: certmanager.k8s.io/v1alpha1
+kind: Certificate
+metadata:
+  name: nats-ca
+spec:
+  secretName: nats-ca
+  duration: 8736h # 1 year
+  renewBefore: 240h # 10 days
+  issuerRef:
+    name: selfsigning
+    kind: ClusterIssuer
+  commonName: nats-ca
+  organization:
+  - Your organization
+  isCA: true
+```
+
+```text
+certificate.certmanager.k8s.io/nats-ca configured
+```
+
+Next, create an `Issuer` backed by that CA, which will sign the server and route certificates:
+
+```yaml
+apiVersion: certmanager.k8s.io/v1alpha1
+kind: Issuer
+metadata:
+  name: nats-ca
+spec:
+  ca:
+    secretName: nats-ca
+```
+
+```text
+issuer.certmanager.k8s.io/nats-ca created
+```
+
+Issue the certificate used to secure the NATS client connections, covering the DNS names of the `nats` service:
+
+```yaml
+apiVersion: certmanager.k8s.io/v1alpha1
+kind: Certificate
+metadata:
+  name: nats-server-tls
+spec:
+  secretName: nats-server-tls
+  duration: 2160h # 90 days
+  renewBefore: 240h # 10 days
+  issuerRef:
+    name: nats-ca
+    kind: Issuer
+  organization:
+  - Your organization
+  commonName: nats.default.svc.cluster.local
+  dnsNames:
+  - nats.default.svc
+```
+
+```text
+certificate.certmanager.k8s.io/nats-server-tls created
+```
+
+Issue the certificate used to secure the cluster routes, using a wildcard that matches the pods behind the `nats-mgmt` headless service:
+
+```yaml
+apiVersion: certmanager.k8s.io/v1alpha1
+kind: Certificate
+metadata:
+  name: nats-routes-tls
+spec:
+  secretName: nats-routes-tls
+  duration: 2160h # 90 days
+  renewBefore: 240h # 10 days
+  issuerRef:
+    name: nats-ca
+    kind: Issuer
+  organization:
+  - Your organization
+  commonName: "*.nats-mgmt.default.svc.cluster.local"
+  dnsNames:
+  - "*.nats-mgmt.default.svc"
+```
+
+```text
+certificate.certmanager.k8s.io/nats-routes-tls configured
+```
+
+Finally, create a `NatsCluster` that references the secrets generated by cert-manager (each secret stores the CA, key and certificate under `ca.crt`, `tls.key` and `tls.crt`):
+
+```yaml
+apiVersion: "nats.io/v1alpha2"
+kind: "NatsCluster"
+metadata:
+  name: "nats"
+spec:
+  # Number of nodes in the cluster
+  size: 3
+  version: "1.4.1"
+
+  tls:
+    # Certificates to secure the NATS client connections:
+    serverSecret: "nats-server-tls"
+
+    # Name of the CA in serverSecret
+    serverSecretCAFileName: "ca.crt"
+
+    # Name of the key in serverSecret
+    serverSecretKeyFileName: "tls.key"
+
+    # Name of the certificate in serverSecret
+    serverSecretCertFileName: "tls.crt"
+
+    # Certificates to secure the routes.
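+    # (the nats-routes-tls certificate issued above for "*.nats-mgmt.default.svc",
+    # used for the TLS handshake between the route connections)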
+    routesSecret: "nats-routes-tls"
+
+    # Name of the CA in routesSecret
+    routesSecretCAFileName: "ca.crt"
+
+    # Name of the key in routesSecret
+    routesSecretKeyFileName: "tls.key"
+
+    # Name of the certificate in routesSecret
+    routesSecretCertFileName: "tls.crt"
+```
+
+```text
+natscluster.nats.io/nats created
+```
+
+```sh
+kubectl get pods -o wide
+```
+
+```text
+NAME      READY     STATUS    RESTARTS   AGE       IP            NODE       NOMINATED NODE
+nats-1    1/1       Running   0          4s        172.17.0.8    minikube
+nats-2    1/1       Running   0          3s        172.17.0.9    minikube
+nats-3    1/1       Running   0          2s        172.17.0.10   minikube
+```
+
+```sh
+kubectl logs nats-1
+```
+
+```text
+[1] 2019/05/08 22:35:11.192781 [INF] Starting nats-server version 1.4.1
+[1] 2019/05/08 22:35:11.192819 [INF] Git commit [3e64f0b]
+[1] 2019/05/08 22:35:11.192952 [INF] Starting http monitor on 0.0.0.0:8222
+[1] 2019/05/08 22:35:11.192981 [INF] Listening for client connections on 0.0.0.0:4222
+[1] 2019/05/08 22:35:11.192987 [INF] TLS required for client connections
+[1] 2019/05/08 22:35:11.192989 [INF] Server is ready
+[1] 2019/05/08 22:35:11.193123 [INF] Listening for route connections on 0.0.0.0:6222
+[1] 2019/05/08 22:35:12.487758 [INF] 172.17.0.9:49444 - rid:1 - Route connection created
+[1] 2019/05/08 22:35:13.450067 [INF] 172.17.0.10:46286 - rid:2 - Route connection created
+```
\ No newline at end of file
diff --git a/nats-kubernetes/operator-tls-setup-with-cfssl.md b/nats-kubernetes/operator-tls-setup-with-cfssl.md
new file mode 100755
index 0000000..e79c593
--- /dev/null
+++ b/nats-kubernetes/operator-tls-setup-with-cfssl.md
@@ -0,0 +1,422 @@
+# Secure NATS Cluster in Kubernetes using the NATS Operator
+
+## Features
+
+- Clients TLS setup
+- TLS-based auth certs via secret
+  + Reloading supported by just updating the secret
+- Routes TLS setup
+- Advertising public IP per NATS server for external access
+
+### Creating the Certificates
+
+#### Generating the Root CA Certs
+
+Define the CSR for the Root CA (`certs/ca-csr.json`):
+
+```js
+{
+  "CN": "nats.io",
+  "key": {
+    "algo": "rsa",
+    "size": 2048
+  },
+  "names": [
+    {
+      "OU": "nats.io"
+    }
+  ]
+}
+```
+
+```sh
+(
+  cd certs
+
+  # CA certs
+  cfssl gencert -initca ca-csr.json | cfssljson -bare ca -
+)
+```
+
+Set up the profiles for the Root CA (`certs/ca-config.json`); we will have 3 main profiles: one
+for the clients connecting, one for the servers, and another one for
+the full mesh routing connections between the servers.
+
+```js
+{
+  "signing": {
+    "default": {
+      "expiry": "43800h"
+    },
+    "profiles": {
+      "server": {
+        "expiry": "43800h",
+        "usages": [
+          "signing",
+          "key encipherment",
+          "server auth",
+          "client auth"
+        ]
+      },
+      "client": {
+        "expiry": "43800h",
+        "usages": [
+          "signing",
+          "key encipherment",
+          "client auth"
+        ]
+      },
+      "route": {
+        "expiry": "43800h",
+        "usages": [
+          "signing",
+          "key encipherment",
+          "server auth",
+          "client auth"
+        ]
+      }
+    }
+  }
+}
+```
+
+#### Generating the NATS server certs
+
+First we generate the certificates for the server.
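+
+After running the `cfssl gencert` command that follows the CSR below, you can
+sanity-check that the generated certificate includes all of the expected DNS names.
+A quick way to do so, assuming the files are written into the `certs` directory, is
+with OpenSSL:
+
+```sh
+openssl x509 -in certs/server.pem -noout -text | grep -A1 "Subject Alternative Name"
+```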
+ +```js +{ + "CN": "nats.io", + "hosts": [ + "localhost", + "*.nats-cluster.default.svc", + "*.nats-cluster-mgmt.default.svc", + "nats-cluster", + "nats-cluster-mgmt", + "nats-cluster.default.svc", + "nats-cluster-mgmt.default.svc", + "nats-cluster.default.svc.cluster.local", + "nats-cluster-mgmt.default.svc.cluster.local", + "*.nats-cluster.default.svc.cluster.local", + "*.nats-cluster-mgmt.default.svc.cluster.local" + ], + "key": { + "algo": "rsa", + "size": 2048 + }, + "names": [ + { + "OU": "Operator" + } + ] +} +``` + +```sh +( + # Generating the peer certificates + cd certs + cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=server server.json | cfssljson -bare server +) +``` + +#### Generating the NATS server routes certs + +We will also be setting up TLS for the full mesh routes. + +```js +{ + "CN": "nats.io", + "hosts": [ + "localhost", + "*.nats-cluster.default.svc", + "*.nats-cluster-mgmt.default.svc", + "nats-cluster", + "nats-cluster-mgmt", + "nats-cluster.default.svc", + "nats-cluster-mgmt.default.svc", + "nats-cluster.default.svc.cluster.local", + "nats-cluster-mgmt.default.svc.cluster.local", + "*.nats-cluster.default.svc.cluster.local", + "*.nats-cluster-mgmt.default.svc.cluster.local" + ], + "key": { + "algo": "rsa", + "size": 2048 + }, + "names": [ + { + "OU": "Operator" + } + ] +} +``` + +```sh +# Generating the peer certificates +( + cd certs + cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=route route.json | cfssljson -bare route +) +``` + +#### Generating the certs for the clients (CNCF && ACME) + +```js +{ + "CN": "nats.io", + "hosts": [""], + "key": { + "algo": "rsa", + "size": 2048 + }, + "names": [ + { + "OU": "CNCF" + } + ] +} +``` + +```sh +( + cd certs + # Generating NATS client certs + cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=client client.json | cfssljson -bare client +) +``` + +#### Kubectl create + +```sh :results output +cd certs +kubectl create secret generic nats-tls-example --from-file=ca.pem --from-file=server-key.pem --from-file=server.pem +kubectl create secret generic nats-tls-routes-example --from-file=ca.pem --from-file=route-key.pem --from-file=route.pem +kubectl create secret generic nats-tls-client-example --from-file=ca.pem --from-file=client-key.pem --from-file=client.pem +``` + +### Create the Auth secret + +```js +{ + "users": [ + { "username": "CN=nats.io,OU=ACME" }, + { "username": "CN=nats.io,OU=CNCF", + "permissions": { + "publish": ["hello.*"], + "subscribe": ["hello.world"] + } + } + ], + "default_permissions": { + "publish": ["SANDBOX.*"], + "subscribe": ["PUBLIC.>"] + } +} +``` + +```sh +kubectl create secret generic nats-tls-users --from-file=users.json +``` + +### Create a cluster with TLS + +```sh +echo ' +apiVersion: "nats.io/v1alpha2" +kind: "NatsCluster" +metadata: + name: "nats-cluster" +spec: + size: 3 + + # Using custom edge nats server image for TLS verify and map support. + serverImage: "wallyqs/nats-server" + version: "edge-2.0.0-RC5" + + tls: + enableHttps: true + + # Certificates to secure the NATS client connections: + serverSecret: "nats-tls-example" + + # Certificates to secure the routes. 
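+    # (this is the nats-tls-routes-example secret created earlier from
+    # ca.pem, route.pem and route-key.pem)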
+ routesSecret: "nats-tls-routes-example" + + auth: + tlsVerifyAndMap: true + clientsAuthSecret: "nats-tls-users" + + # How long to wait for authentication + clientsAuthTimeout: 5 + + pod: + # To be able to reload the secret changes + enableConfigReload: true + reloaderImage: connecteverything/nats-server-config-reloader + + # Bind the port 4222 as the host port to allow external access. + enableClientsHostPort: true + + # Initializer container that resolves the external IP from the + # container where it is running. + advertiseExternalIP: true + + # Image of container that resolves external IP from K8S API + bootconfigImage: "wallyqs/nats-boot-config" + bootconfigImageTag: "0.5.0" + + # Service account required to be able to find the external IP + template: + spec: + serviceAccountName: "nats-server" +' | kubectl apply -f - +``` + +### Create APP using certs + +#### Adding a new pod which uses the certificates + +Development + +```go +package main + +import ( + "encoding/json" + "flag" + "fmt" + "log" + "time" + + "github.com/nats-io/go-nats" + "github.com/nats-io/nuid" +) + +func main() { + var ( + serverList string + rootCACertFile string + clientCertFile string + clientKeyFile string + ) + flag.StringVar(&serverList, "s", "tls://nats-1.nats-cluster.default.svc:4222", "List of NATS of servers available") + flag.StringVar(&rootCACertFile, "cacert", "./certs/ca.pem", "Root CA Certificate File") + flag.StringVar(&clientCertFile, "cert", "./certs/client.pem", "Client Certificate File") + flag.StringVar(&clientKeyFile, "key", "./certs/client-key.pem", "Client Private key") + flag.Parse() + + log.Println("NATS endpoint:", serverList) + log.Println("Root CA:", rootCACertFile) + log.Println("Client Cert:", clientCertFile) + log.Println("Client Key:", clientKeyFile) + + // Connect options + rootCA := nats.RootCAs(rootCACertFile) + clientCert := nats.ClientCert(clientCertFile, clientKeyFile) + alwaysReconnect := nats.MaxReconnects(-1) + + var nc *nats.Conn + var err error + for { + nc, err = nats.Connect(serverList, rootCA, clientCert, alwaysReconnect) + if err != nil { + log.Printf("Error while connecting to NATS, backing off for a sec... (error: %s)", err) + time.Sleep(1 * time.Second) + continue + } + break + } + + nc.Subscribe("discovery.*.status", func(m *nats.Msg) { + log.Printf("[Received on %q] %s", m.Subject, string(m.Data)) + }) + + discoverySubject := fmt.Sprintf("discovery.%s.status", nuid.Next()) + info := struct { + InMsgs uint64 `json:"in_msgs"` + OutMsgs uint64 `json:"out_msgs"` + Reconnects uint64 `json:"reconnects"` + CurrentServer string `json:"current_server"` + Servers []string `json:"servers"` + }{} + + for range time.NewTicker(1 * time.Second).C { + stats := nc.Stats() + info.InMsgs = stats.InMsgs + info.OutMsgs = stats.OutMsgs + info.Reconnects = stats.Reconnects + info.CurrentServer = nc.ConnectedUrl() + info.Servers = nc.Servers() + payload, err := json.Marshal(info) + if err != nil { + log.Printf("Error marshalling data: %s", err) + } + err = nc.Publish(discoverySubject, payload) + if err != nil { + log.Printf("Error during publishing: %s", err) + } + nc.Flush() + } +} +``` + +```text +FROM golang:1.11.0-alpine3.8 AS builder +COPY . 
/go/src/github.com/nats-io/nats-kubernetes/examples/nats-cluster-routes-tls/app +WORKDIR /go/src/github.com/nats-io/nats-kubernetes/examples/nats-cluster-routes-tls/app +RUN apk add --update git +RUN go get -u github.com/nats-io/go-nats +RUN go get -u github.com/nats-io/nuid +RUN CGO_ENABLED=0 go build -o nats-client-app -v -a ./client.go + +FROM scratch +COPY --from=builder /go/src/github.com/nats-io/nats-kubernetes/examples/nats-cluster-routes-tls/app/nats-client-app /nats-client-app +ENTRYPOINT ["/nats-client-app"] +``` + +```sh +docker build . -t wallyqs/nats-client-app +docker run wallyqs/nats-client-app +docker push wallyqs/nats-client-app +``` +Pod spec + +```sh :results output +echo ' +apiVersion: apps/v1beta2 +kind: Deployment + +# The name of the deployment +metadata: + name: nats-client-app + +spec: + # This selector has to match the template.metadata.labels section + # which is below in the PodSpec + selector: + matchLabels: + name: nats-client-app + + # Number of instances + replicas: 1 + + # PodSpec + template: + metadata: + labels: + name: nats-client-app + spec: + volumes: + - name: "client-tls-certs" + secret: + secretName: "nats-tls-client-example" + containers: + - name: nats-client-app + command: ["/nats-client-app", "-s", "tls://nats-cluster.default.svc:4222", "-cacert", '/etc/nats-client-tls-certs/ca.pem', '-cert', '/etc/nats-client-tls-certs/client.pem', '-key', '/etc/nats-client-tls-certs/client-key.pem'] + image: wallyqs/nats-client-app:latest + imagePullPolicy: Always + volumeMounts: + - name: "client-tls-certs" + mountPath: "/etc/nats-client-tls-certs/" +' | kubectl apply -f - +``` + diff --git a/nats-kubernetes/prometheus-and-nats-operator.md b/nats-kubernetes/prometheus-and-nats-operator.md new file mode 100755 index 0000000..8125e69 --- /dev/null +++ b/nats-kubernetes/prometheus-and-nats-operator.md @@ -0,0 +1,297 @@ + +# Prometheus Operator + NATS Operator + +## Installing the Operators + +Install the NATS Operator: + +``` sh +$ kubectl apply -f https://raw.githubusercontent.com/nats-io/nats-operator/master/deploy/00-prereqs.yaml +$ kubectl apply -f https://raw.githubusercontent.com/nats-io/nats-operator/master/deploy/10-deployment.yaml +``` + +Install the Prometheus Operator along with its RBAC definition (prometheus-operator service account): + +```yaml +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + labels: + app.kubernetes.io/component: controller + app.kubernetes.io/name: prometheus-operator + app.kubernetes.io/version: v0.30.0 + name: prometheus-operator +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: prometheus-operator +subjects: +- kind: ServiceAccount + name: prometheus-operator + namespace: default +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + labels: + app.kubernetes.io/component: controller + app.kubernetes.io/name: prometheus-operator + app.kubernetes.io/version: v0.30.0 + name: prometheus-operator +rules: +- apiGroups: + - apiextensions.k8s.io + resources: + - customresourcedefinitions + verbs: + - '*' +- apiGroups: + - monitoring.coreos.com + resources: + - alertmanagers + - prometheuses + - prometheuses/finalizers + - alertmanagers/finalizers + - servicemonitors + - podmonitors + - prometheusrules + verbs: + - '*' +- apiGroups: + - apps + resources: + - statefulsets + verbs: + - '*' +- apiGroups: + - "" + resources: + - configmaps + - secrets + verbs: + - '*' +- apiGroups: + - "" + resources: + - pods + verbs: + - list + - delete +- apiGroups: + - "" 
+ resources: + - services + - services/finalizers + - endpoints + verbs: + - get + - create + - update + - delete +- apiGroups: + - "" + resources: + - nodes + verbs: + - list + - watch +- apiGroups: + - "" + resources: + - namespaces + verbs: + - get + - list + - watch +--- +apiVersion: apps/v1beta2 +kind: Deployment +metadata: + labels: + app.kubernetes.io/component: controller + app.kubernetes.io/name: prometheus-operator + app.kubernetes.io/version: v0.30.0 + name: prometheus-operator + namespace: default +spec: + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/component: controller + app.kubernetes.io/name: prometheus-operator + template: + metadata: + labels: + app.kubernetes.io/component: controller + app.kubernetes.io/name: prometheus-operator + app.kubernetes.io/version: v0.30.0 + spec: + containers: + - args: + - --kubelet-service=kube-system/kubelet + - --logtostderr=true + - --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1 + - --prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:v0.30.0 + image: quay.io/coreos/prometheus-operator:v0.30.0 + name: prometheus-operator + ports: + - containerPort: 8080 + name: http + resources: + limits: + cpu: 200m + memory: 200Mi + requests: + cpu: 100m + memory: 100Mi + securityContext: + allowPrivilegeEscalation: false + readOnlyRootFilesystem: true + nodeSelector: + beta.kubernetes.io/os: linux + securityContext: + runAsNonRoot: true + runAsUser: 65534 + serviceAccountName: prometheus-operator +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + labels: + app.kubernetes.io/component: controller + app.kubernetes.io/name: prometheus-operator + app.kubernetes.io/version: v0.30.0 + name: prometheus-operator + namespace: default +--- +apiVersion: v1 +kind: Service +metadata: + labels: + app.kubernetes.io/component: controller + app.kubernetes.io/name: prometheus-operator + app.kubernetes.io/version: v0.30.0 + name: prometheus-operator + namespace: default +spec: + clusterIP: None + ports: + - name: http + port: 8080 + targetPort: http + selector: + app.kubernetes.io/component: controller + app.kubernetes.io/name: prometheus-operator +``` + +## Create a NATS Cluster Instance + +```yaml +apiVersion: "nats.io/v1alpha2" +kind: "NatsCluster" +metadata: + name: "nats-cluster" +spec: + size: 3 + version: "1.4.1" + pod: + enableMetrics: true + metricsImage: "synadia/prometheus-nats-exporter" + metricsImageTag: "0.3.0" +``` + +## Create a Prometheus instance + +### Create RBAC for the Prometheus instance + +```yaml +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: prometheus +--- +apiVersion: rbac.authorization.k8s.io/v1beta1 +kind: ClusterRole +metadata: + name: prometheus +rules: +- apiGroups: [""] + resources: + - nodes + - services + - endpoints + - pods + verbs: ["get", "list", "watch"] +- apiGroups: [""] + resources: + - configmaps + verbs: ["get"] +- nonResourceURLs: ["/metrics"] + verbs: ["get"] +--- +apiVersion: rbac.authorization.k8s.io/v1beta1 +kind: ClusterRoleBinding +metadata: + name: prometheus +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: prometheus +subjects: +- kind: ServiceAccount + name: prometheus + namespace: default +``` + +### Create the Prometheus instance + +```yaml +--- +apiVersion: monitoring.coreos.com/v1 +kind: Prometheus +metadata: + name: prometheus +spec: + serviceAccountName: prometheus + serviceMonitorSelector: + matchLabels: + app: nats + nats_cluster: nats-cluster + resources: + requests: + memory: 400Mi + enableAdminAPI: true +``` + 
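+
+The manifests above can be applied with `kubectl`; for example, assuming you saved the
+RBAC definitions and the Prometheus instance as `prometheus-rbac.yaml` and
+`prometheus.yaml` (hypothetical file names):
+
+```sh
+kubectl apply -f prometheus-rbac.yaml
+kubectl apply -f prometheus.yaml
+
+# After a moment the operator creates the Prometheus pod,
+# named prometheus-prometheus-0 in this example
+kubectl get pods
+```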
+## Create the ServiceMonitor for the NATS cluster
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  name: nats-cluster
+  labels:
+    app: nats
+    nats_cluster: nats-cluster
+spec:
+  selector:
+    matchLabels:
+      app: nats
+      nats_cluster: nats-cluster
+  endpoints:
+  - port: metrics
+```
+
+## Confirm
+
+```sh
+kubectl port-forward prometheus-prometheus-0 9090:9090
+```
+
+You should now be able to browse the metrics collected from the NATS cluster in the
+Prometheus UI at `http://127.0.0.1:9090`.
diff --git a/nats-kubernetes/stan-ft-k8s-aws.md b/nats-kubernetes/stan-ft-k8s-aws.md
new file mode 100644
index 0000000..4d4c666
--- /dev/null
+++ b/nats-kubernetes/stan-ft-k8s-aws.md
@@ -0,0 +1,390 @@
+# Creating a NATS Streaming cluster in K8S with FT mode
+
+## Preparation
+
+First, we need a Kubernetes cluster with a provider that offers a
+service with a `ReadWriteMany` filesystem available. In this short guide,
+we will create the cluster on AWS and then use EFS for the filesystem:
+
+```sh
+# Create a 3-node Kubernetes cluster
+eksctl create cluster --name nats-eks-cluster \
+  --nodes 3 \
+  --node-type=t3.large \
+  --region=us-east-2
+
+# Get the credentials for your cluster
+eksctl utils write-kubeconfig --name nats-eks-cluster --region us-east-2
+```
+
+For the FT mode to work, we will need to create an EFS volume which
+can be shared by more than one pod. Go into the [AWS console](https://us-east-2.console.aws.amazon.com/efs/home?region=us-east-2#/wizard/1), create one, and make sure that it is in a security group to which the k8s nodes have access. For clusters created via eksctl, this will be a security group named `ClusterSharedNodeSecurityGroup`.
+
+### Creating the EFS provisioner
+
+Confirm the FileSystemId and the DNS name of the volume; we will use those values to create an EFS provisioner controller within the K8S cluster:
+
+```yaml
+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: efs-provisioner
+---
+kind: ClusterRole
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+  name: efs-provisioner-runner
+rules:
+  - apiGroups: [""]
+    resources: ["persistentvolumes"]
+    verbs: ["get", "list", "watch", "create", "delete"]
+  - apiGroups: [""]
+    resources: ["persistentvolumeclaims"]
+    verbs: ["get", "list", "watch", "update"]
+  - apiGroups: ["storage.k8s.io"]
+    resources: ["storageclasses"]
+    verbs: ["get", "list", "watch"]
+  - apiGroups: [""]
+    resources: ["events"]
+    verbs: ["create", "update", "patch"]
+---
+kind: ClusterRoleBinding
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+  name: run-efs-provisioner
+subjects:
+  - kind: ServiceAccount
+    name: efs-provisioner
+    # replace with namespace where provisioner is deployed
+    namespace: default
+roleRef:
+  kind: ClusterRole
+  name: efs-provisioner-runner
+  apiGroup: rbac.authorization.k8s.io
+---
+kind: Role
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+  name: leader-locking-efs-provisioner
+rules:
+  - apiGroups: [""]
+    resources: ["endpoints"]
+    verbs: ["get", "list", "watch", "create", "update", "patch"]
+---
+kind: RoleBinding
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+  name: leader-locking-efs-provisioner
+subjects:
+  - kind: ServiceAccount
+    name: efs-provisioner
+    # replace with namespace where provisioner is deployed
+    namespace: default
+roleRef:
+  kind: Role
+  name: leader-locking-efs-provisioner
+  apiGroup: rbac.authorization.k8s.io
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: efs-provisioner
+data:
+ 
file.system.id: fs-c22a24bb + aws.region: us-east-2 + provisioner.name: synadia.com/aws-efs + dns.name: "" +--- +kind: Deployment +apiVersion: extensions/v1beta1 +metadata: + name: efs-provisioner +spec: + replicas: 1 + strategy: + type: Recreate + template: + metadata: + labels: + app: efs-provisioner + spec: + serviceAccountName: efs-provisioner + containers: + - name: efs-provisioner + image: quay.io/external_storage/efs-provisioner:latest + env: + - name: FILE_SYSTEM_ID + valueFrom: + configMapKeyRef: + name: efs-provisioner + key: file.system.id + - name: AWS_REGION + valueFrom: + configMapKeyRef: + name: efs-provisioner + key: aws.region + - name: DNS_NAME + valueFrom: + configMapKeyRef: + name: efs-provisioner + key: dns.name + + - name: PROVISIONER_NAME + valueFrom: + configMapKeyRef: + name: efs-provisioner + key: provisioner.name + volumeMounts: + - name: pv-volume + mountPath: /efs + volumes: + - name: pv-volume + nfs: + server: fs-c22a24bb.efs.us-east-2.amazonaws.com + path: / +--- +kind: StorageClass +apiVersion: storage.k8s.io/v1 +metadata: + name: aws-efs +provisioner: synadia.com/aws-efs +--- +kind: PersistentVolumeClaim +apiVersion: v1 +metadata: + name: efs + annotations: + volume.beta.kubernetes.io/storage-class: "aws-efs" +spec: + accessModes: + - ReadWriteMany + resources: + requests: + storage: 1Mi +``` + +Result of deploying the manifest: + +```sh +serviceaccount/efs-provisioner created +clusterrole.rbac.authorization.k8s.io/efs-provisioner-runner created +clusterrolebinding.rbac.authorization.k8s.io/run-efs-provisioner created +role.rbac.authorization.k8s.io/leader-locking-efs-provisioner created +rolebinding.rbac.authorization.k8s.io/leader-locking-efs-provisioner created +configmap/efs-provisioner created +deployment.extensions/efs-provisioner created +storageclass.storage.k8s.io/aws-efs created +persistentvolumeclaim/efs created +``` + +### Setting up the NATS Streaming cluster + +Now create a NATS Streaming cluster with FT mode enabled and using NATS embedded mode +that is mounting the EFS volume: + +```yaml +--- +apiVersion: v1 +kind: Service +metadata: + name: stan + labels: + app: stan +spec: + selector: + app: stan + clusterIP: None + ports: + - name: client + port: 4222 + - name: cluster + port: 6222 + - name: monitor + port: 8222 + - name: metrics + port: 7777 +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: stan-config +data: + stan.conf: | + http: 8222 + + cluster { + port: 6222 + routes [ + nats://stan:6222 + ] + cluster_advertise: $CLUSTER_ADVERTISE + connect_retries: 10 + } + streaming { + id: test-cluster + store: file + dir: /data/stan/store + ft_group_name: "test-cluster" + file_options { + buffer_size: 32mb + sync_on_flush: false + slice_max_bytes: 512mb + parallel_recovery: 64 + } + store_limits { + max_channels: 10 + max_msgs: 0 + max_bytes: 256gb + max_age: 1h + max_subs: 128 + } + } +--- +apiVersion: apps/v1 +kind: StatefulSet +metadata: + name: stan + labels: + app: stan +spec: + selector: + matchLabels: + app: stan + serviceName: stan + replicas: 3 + volumeClaimTemplates: + - metadata: + name: efs + annotations: + volume.beta.kubernetes.io/storage-class: "aws-efs" + spec: + accessModes: [ "ReadWriteMany" ] + resources: + requests: + storage: 1Mi + template: + metadata: + labels: + app: stan + spec: + # STAN Server + terminationGracePeriodSeconds: 30 + + containers: + - name: stan + image: nats-streaming:latest + + ports: + # In case of NATS embedded mode expose these ports + - containerPort: 4222 + name: client + - containerPort: 
6222 + name: cluster + - containerPort: 8222 + name: monitor + args: + - "-sc" + - "/etc/stan-config/stan.conf" + + # Required to be able to define an environment variable + # that refers to other environment variables. This env var + # is later used as part of the configuration file. + env: + - name: POD_NAME + valueFrom: + fieldRef: + fieldPath: metadata.name + - name: POD_NAMESPACE + valueFrom: + fieldRef: + fieldPath: metadata.namespace + - name: CLUSTER_ADVERTISE + value: $(POD_NAME).stan.$(POD_NAMESPACE).svc + volumeMounts: + - name: config-volume + mountPath: /etc/stan-config + - name: efs + mountPath: /data/stan + resources: + requests: + cpu: 0 + livenessProbe: + httpGet: + path: / + port: 8222 + initialDelaySeconds: 10 + timeoutSeconds: 5 + - name: metrics + image: synadia/prometheus-nats-exporter:0.5.0 + args: + - -connz + - -routez + - -subz + - -varz + - -channelz + - -serverz + - http://localhost:8222 + ports: + - containerPort: 7777 + name: metrics + volumes: + - name: config-volume + configMap: + name: stan-config +``` + +Your cluster now will look something like this: + +``` +kubectl get pods +NAME READY STATUS RESTARTS AGE +efs-provisioner-6b7866dd4-4k5wx 1/1 Running 0 21m +stan-0 2/2 Running 0 6m35s +stan-1 2/2 Running 0 4m56s +stan-2 2/2 Running 0 4m42s +``` + +If everything was setup properly, one of the servers will be the active node. + +``` +$ kubectl logs stan-0 -c stan +[1] 2019/12/04 20:40:40.429359 [INF] STREAM: Starting nats-streaming-server[test-cluster] version 0.16.2 +[1] 2019/12/04 20:40:40.429385 [INF] STREAM: ServerID: 7j3t3Ii7e2tifWqanYKwFX +[1] 2019/12/04 20:40:40.429389 [INF] STREAM: Go version: go1.11.13 +[1] 2019/12/04 20:40:40.429392 [INF] STREAM: Git commit: [910d6e1] +[1] 2019/12/04 20:40:40.454212 [INF] Starting nats-server version 2.0.4 +[1] 2019/12/04 20:40:40.454360 [INF] Git commit [c8ca58e] +[1] 2019/12/04 20:40:40.454522 [INF] Starting http monitor on 0.0.0.0:8222 +[1] 2019/12/04 20:40:40.454830 [INF] Listening for client connections on 0.0.0.0:4222 +[1] 2019/12/04 20:40:40.454841 [INF] Server id is NB3A5RSGABLJP3WUYG6VYA36ZGE7MP5GVQIQVRG6WUYSRJA7B2NNMW57 +[1] 2019/12/04 20:40:40.454844 [INF] Server is ready +[1] 2019/12/04 20:40:40.456360 [INF] Listening for route connections on 0.0.0.0:6222 +[1] 2019/12/04 20:40:40.481927 [INF] STREAM: Starting in standby mode +[1] 2019/12/04 20:40:40.488193 [ERR] Error trying to connect to route (attempt 1): dial tcp: lookup stan on 10.100.0.10:53: no such host +[1] 2019/12/04 20:40:41.489688 [INF] 192.168.52.76:40992 - rid:6 - Route connection created +[1] 2019/12/04 20:40:41.489788 [INF] 192.168.52.76:40992 - rid:6 - Router connection closed +[1] 2019/12/04 20:40:41.489695 [INF] 192.168.52.76:6222 - rid:5 - Route connection created +[1] 2019/12/04 20:40:41.489955 [INF] 192.168.52.76:6222 - rid:5 - Router connection closed +[1] 2019/12/04 20:40:41.634944 [INF] STREAM: Server is active +[1] 2019/12/04 20:40:41.634976 [INF] STREAM: Recovering the state... 
+[1] 2019/12/04 20:40:41.655526 [INF] STREAM: No recovered state +[1] 2019/12/04 20:40:41.671435 [INF] STREAM: Message store is FILE +[1] 2019/12/04 20:40:41.671448 [INF] STREAM: Store location: /data/stan/store +[1] 2019/12/04 20:40:41.671524 [INF] STREAM: ---------- Store Limits ---------- +[1] 2019/12/04 20:40:41.671527 [INF] STREAM: Channels: 10 +[1] 2019/12/04 20:40:41.671529 [INF] STREAM: --------- Channels Limits -------- +[1] 2019/12/04 20:40:41.671531 [INF] STREAM: Subscriptions: 128 +[1] 2019/12/04 20:40:41.671533 [INF] STREAM: Messages : unlimited +[1] 2019/12/04 20:40:41.671535 [INF] STREAM: Bytes : 256.00 GB +[1] 2019/12/04 20:40:41.671537 [INF] STREAM: Age : 1h0m0s +[1] 2019/12/04 20:40:41.671539 [INF] STREAM: Inactivity : unlimited * +[1] 2019/12/04 20:40:41.671541 [INF] STREAM: ---------------------------------- +[1] 2019/12/04 20:40:41.671546 [INF] STREAM: Streaming Server is ready +```
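+
+To exercise the FT mode, you can delete the pod that is currently active and confirm
+that one of the standby servers takes over (a quick check, assuming `stan-0` is the
+active node as in the logs above):
+
+```sh
+# Remove the currently active server
+kubectl delete pod stan-0
+
+# One of the standby servers should now report becoming active
+kubectl logs stan-1 -c stan | grep "STREAM: Server is active"
+kubectl logs stan-2 -c stan | grep "STREAM: Server is active"
+```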