mirror of https://github.com/taigrr/nats.docs synced 2025-01-18 04:03:23 -08:00

GitBook: [master] 326 pages and 16 assets modified

This commit is contained in:
Ginger Collison
2019-10-04 17:48:52 +00:00
committed by gitbook-bot
parent 8b7ba5c3bb
commit fb0d5c8355
203 changed files with 4640 additions and 3107 deletions

# Managing A NATS Server
Managing a NATS server is simple; typical lifecycle operations include:
* [Sending signals](signals.md) to a server to reload a configuration or rotate log files
* [Upgrading](upgrading_cluster.md) a server \(or cluster\)
* Understanding [slow consumers](slow_consumers.md)

# Signals
On Unix systems, the NATS server responds to the following signals:
| Signal | Result |
| :--- | :--- |
| `SIGKILL` | Kills the process immediately |
| `SIGINT` | Stops the server gracefully |
| `SIGUSR1` | Reopens the log file for log rotation |
| `SIGHUP` | Reloads server configuration file |
| `SIGUSR2` | Stops the server after evicting all clients \(lame duck mode\) |
The `nats-server` binary can be used to send these signals to running NATS servers using the `--signal` \(or `-sl`\) flag:
```bash
# Quit the server
nats-server --signal quit
# Stop the server
nats-server --signal stop
# Reopen log file for log rotation
nats-server --signal reopen
# Reload server configuration
nats-server --signal reload
# Stop the server after evicting all clients (lame duck mode)
nats-server --signal ldm
```
If there are multiple `nats-server` processes running, or if `pgrep` isn't available, you must either specify a PID or the absolute path to a PID file:
```bash
nats-server --signal stop=<pid>
```
```bash
nats-server --signal stop=/path/to/pidfile
```
See the [Windows Service](../running/windows_srv.md) section for information on signaling the NATS server on Windows.

# Slow Consumers
To support resiliency and high availability, NATS provides built-in mechanisms to automatically prune the registered listener interest graph used to keep track of subscribers, including slow consumers and lazy listeners. If a client does not process messages quickly enough, the NATS server cuts it off. Likewise, if a subscriber does not respond to ping requests from the server within the [ping-pong interval](../../nats-protocol/nats-protocol/#PINGPONG), the client is cut off \(disconnected\) and will need reconnect logic to rejoin the server.
In core NATS, consumers that cannot keep up are handled differently from many other messaging systems: NATS favors the approach of protecting the system as a whole over accommodating a particular consumer to ensure message delivery.
**What is a slow consumer?**
A slow consumer is a subscriber that cannot keep up with the message flow delivered from the NATS server. This is a common case in distributed systems because it is often easier to generate data than it is to process it. When consumers cannot process data fast enough, back pressure is applied to the rest of the system. NATS has mechanisms to reduce this back pressure.
NATS identifies slow consumers in the client or the server, providing notification through registered callbacks, log messages, and statistics in the server's monitoring endpoints.
**What happens to slow consumers?**
When detected at the client, the application is notified and messages are dropped to allow the consumer to continue and reduce potential back pressure. When detected in the server, the server will close the connection to the slow consumer to protect itself and the integrity of the messaging system.
## Slow consumers identified in the client
A [client can detect it is a slow consumer](../../developing-with-nats/intro-5/slow.md) on a local connection and notify the application through use of the asynchronous error callback. It is better to catch a slow consumer locally in the client rather than to allow the server to detect this condition. This example demonstrates how to define and register an asynchronous error handler that will handle slow consumer errors.
```go
func natsErrHandler(nc *nats.Conn, sub *nats.Subscription, natsErr error) {
	fmt.Printf("error: %v\n", natsErr)
	if natsErr == nats.ErrSlowConsumer {
		pendingMsgs, _, err := sub.Pending()
		if err != nil {
			fmt.Printf("couldn't get pending messages: %v", err)
			return
		}
		fmt.Printf("Falling behind with %d pending messages on subject %q.\n",
			pendingMsgs, sub.Subject)
		// Log error, notify operations...
	}
	// check for other errors
}

// Set the error handler when creating a connection.
nc, err := nats.Connect("nats://localhost:4222",
	nats.ErrorHandler(natsErrHandler))
```
With this example code and default settings, a slow consumer error would generate output something like this:
```bash
error: nats: slow consumer, messages dropped
Falling behind with 65536 pending messages on subject "foo".
```
Note that if you are using a synchronous subscriber, `Subscription.NextMsg(timeout time.Duration)` will also return an error indicating there was a slow consumer and messages have been dropped.
## Slow consumers identified by the server
When a client does not process messages fast enough, the server will buffer messages in the outbound connection to the client. When this happens and the server cannot write data fast enough to the client, in order to protect itself, it will designate a subscriber as a "slow consumer" and may drop the associated connection.
When the server initiates a slow consumer error, you'll see the following in the server output:
```bash
[54083] 2017/09/28 14:45:18.001357 [INF] ::1:63283 - cid:7 - Slow Consumer Detected
```
The server will also keep count of the number of slow consumer errors encountered, available through the monitoring `varz` endpoint in the `slow_consumers` field.
## Handling slow consumers
Apart from using [NATS streaming](../../nats-streaming-concepts/intro.md) or optimizing your consuming application, there are a few options available: scale, meter, or tune NATS to your environment.
**Scaling with queue subscribers**
This is ideal if you do not rely on message order. Ensure your NATS subscription belongs to a [queue group](../../concepts/queue.md), then scale as required by creating more instances of your service or application. This is a great approach for microservices: each instance of your microservice receives a portion of the messages to process, and to scale you simply add more instances of your service. No code changes, configuration changes, or downtime whatsoever.
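The load-balancing property of a queue group can be sketched with plain Go channels \(no NATS client involved; this only illustrates the each-message-goes-to-exactly-one-member semantics\):

```go
package main

import (
	"fmt"
	"sync"
)

// runQueueGroup delivers each message on a shared channel to exactly one
// of the workers, the same delivery property a NATS queue group gives N
// instances of a service. It returns how many messages each worker saw.
func runQueueGroup(workers, messages int) []int {
	msgs := make(chan int)
	counts := make([]int, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for range msgs {
				counts[w]++ // each goroutine writes only its own slot
			}
		}(w)
	}
	for i := 0; i < messages; i++ {
		msgs <- i
	}
	close(msgs)
	wg.Wait()
	return counts
}

func main() {
	// 100 messages split across 4 "instances"; the counts sum to 100.
	fmt.Println(runQueueGroup(4, 100))
}
```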
**Create a subject namespace that can scale**
You can distribute work further through the subject namespace, with some forethought in design. This approach is useful if you need to preserve message order. The general idea is to publish to a deep subject namespace, and consume with wildcard subscriptions while giving yourself room to expand and distribute work in the future.
For a simple example, if you have a service that receives telemetry data from IoT devices located throughout a city, you can publish to a subject namespace like `Sensors.North`, `Sensors.South`, `Sensors.East` and `Sensors.West`. Initially, you'll subscribe to `Sensors.>` to process everything in one consumer. As your enterprise grows and data rates exceed what one consumer can handle, you can replace your single consumer with four consuming applications to subscribe to each subject representing a smaller segment of your data. Note that your publishing applications remain untouched.
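Building such subjects is just string assembly. A sketch in Go; the region token matches the namespace above, while the trailing sensor ID token is an illustrative extension that leaves further room to split the namespace later:

```go
package main

import "fmt"

// subjectFor builds the publish subject for a sensor. Regions mirror the
// example namespace (Sensors.North, Sensors.South, ...); the sensor ID
// token is a hypothetical extension for future sharding.
func subjectFor(region, sensorID string) string {
	return fmt.Sprintf("Sensors.%s.%s", region, sensorID)
}

func main() {
	// A consumer for the northern segment would subscribe to "Sensors.North.>".
	fmt.Println(subjectFor("North", "thermostat-17")) // Sensors.North.thermostat-17
}
```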
**Meter the publisher**
A less favorable option may be to meter the publisher. There are several ways to do this, ranging from simply slowing down your publisher to a more complex approach that periodically issues a blocking request/reply to match subscriber rates.
**Tune NATS through configuration**
The NATS server can be tuned to determine how much data can be buffered before a consumer is considered slow, and some officially supported clients allow buffer sizes to be adjusted. Decreasing buffer sizes will let you identify slow consumers more quickly. Increasing buffer sizes is not typically recommended unless you are handling temporary bursts of data. Often, increasing buffer capacity will only _postpone_ slow consumer problems.
### Server Configuration
The NATS server has a write deadline it uses to write to a connection. When this write deadline is exceeded, a client is considered to have a slow consumer. If you are encountering slow consumer errors in the server, you can increase the write deadline to buffer more data.
The `write_deadline` configuration option in the NATS server configuration file will tune this:
```text
write_deadline: 2s
```
Tuning this parameter is ideal when you have bursts of data to accommodate. _**Be sure you are not just postponing a slow consumer error.**_
### Client Configuration
Most officially supported clients have an internal buffer of pending messages and will notify your application through an asynchronous error callback if a local subscription is not catching up. Receiving an error locally does not necessarily mean that the server will have identified a subscription as a slow consumer.
This buffer can be configured through setting the pending limits after a subscription has been created:
```go
if err := sub.SetPendingLimits(1024*500, 1024*5000); err != nil {
	log.Fatalf("Unable to set pending limits: %v", err)
}
```
The default subscriber pending message limit is `65536` messages, and the default pending byte limit is `65536*1024` bytes \(64 MB\).
If the client reaches either internal limit, it will drop messages and continue to process new ones. This is aligned with core NATS's at-most-once delivery. It is up to your application to detect the missed messages and recover from this condition.
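Detection can be as simple as a publisher-assigned sequence number checked on receipt. This is an application-level convention, not something core NATS provides; a sketch:

```go
package main

import "fmt"

// gap reports how many messages were missed between the last sequence a
// subscriber processed and the next one it receives, assuming publishers
// embed a monotonically increasing sequence number in every message
// (an application-level convention layered on top of core NATS).
func gap(lastSeen, next uint64) uint64 {
	if next <= lastSeen+1 {
		return 0
	}
	return next - lastSeen - 1
}

func main() {
	// Saw sequence 41, then 45 arrives: three messages were dropped.
	fmt.Println(gap(41, 45)) // 3
}
```

On a detected gap, an application would typically re-request the missing range from the publisher or a cache, or fall back to a durable option such as NATS Streaming.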

# System Accounts
NATS servers leverage [Account](../../configuration/securing_nats/auth_intro/jwt_auth.md) support and generate events such as:
* account connect/disconnect
* authentication errors
* server shutdown
* server stat summary
In addition, the server supports a limited number of requests that can be used to query for account connections, server stat summaries, and to ping servers in the cluster.
These events are only accepted and visible to _system account_ users.
## The System Events Tutorial
You can learn more about System Accounts in the [Tutorial](sys_accounts.md).

# Configuration
The following is a short tutorial on how you can activate a system account to:
* receive periodic updates from the server
* send requests to the server
* send an account update to the server
## Events and Services
The system account publishes messages under well known subject patterns.
Server initiated events:
* `$SYS.ACCOUNT.<id>.CONNECT` \(client connects\)
* `$SYS.ACCOUNT.<id>.DISCONNECT` \(client disconnects\)
* `$SYS.SERVER.ACCOUNT.<id>.CONNS` \(connections for an account changed\)
* `$SYS.SERVER.<id>.CLIENT.AUTH.ERR` \(authentication error\)
* `$SYS.ACCOUNT.<id>.LEAFNODE.CONNECT` \(leaf node connects\)
* `$SYS.ACCOUNT.<id>.LEAFNODE.DISCONNECT` \(leaf node disconnects\)
* `$SYS.SERVER.<id>.STATSZ` \(stats summary\)
In addition, other tools with system account privileges can initiate requests:
* `$SYS.REQ.SERVER.<id>.STATSZ` \(request server stat summary\)
* `$SYS.REQ.SERVER.PING` \(discover servers - will return multiple messages\)
Servers like `nats-account-server` publish system account messages when a claim is updated, the nats-server listens for them, and updates its account information accordingly:
* `$SYS.ACCOUNT.<id>.CLAIMS.UPDATE`
With these few messages you can build surprisingly useful monitoring tools:
* health/load of your servers
* client connects/disconnects
* account connections
* authentication errors
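A monitoring tool typically subscribes with a wildcard and then dispatches on the incoming subject. A token-wise matcher consistent with NATS wildcard rules \(`*` matches exactly one token, `>` matches one or more trailing tokens\) can be sketched as:

```go
package main

import (
	"fmt"
	"strings"
)

// subjectMatches reports whether subject matches pattern under NATS
// wildcard rules: tokens are dot-separated, "*" matches any single
// token, and ">" matches one or more remaining tokens.
func subjectMatches(pattern, subject string) bool {
	pt := strings.Split(pattern, ".")
	st := strings.Split(subject, ".")
	for i, p := range pt {
		if p == ">" {
			return i < len(st)
		}
		if i >= len(st) || (p != "*" && p != st[i]) {
			return false
		}
	}
	return len(pt) == len(st)
}

func main() {
	// Dispatch a connect event for any account to a connect handler.
	fmt.Println(subjectMatches("$SYS.ACCOUNT.*.CONNECT", "$SYS.ACCOUNT.AD1234.CONNECT")) // true
}
```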
## Enabling System Events
To enable and access system events, you'll have to:
* Create an Operator, Account and User
* Run a NATS Account Server \(or Memory Resolver\)
### Create an Operator, Account, User
Let's create an operator, system account and system account user:
```text
# Create an operator if you don't have one
> nsc add operator -n SAOP
Generated operator key - private key stored "~/.nkeys/SAOP/SAOP.nk"
Success! - added operator "SAOP"
# Add the system account
> nsc add account -n SYS
Generated account key - private key stored "~/.nkeys/SAOP/accounts/SYS/SYS.nk"
Success! - added account "SYS"
# Add a system account user
> nsc add user -n SYSU
Generated user key - private key stored "~/.nkeys/SAOP/accounts/SYS/users/SYSU.nk"
Generated user creds file "~/.nkeys/SAOP/accounts/SYS/users/SYSU.creds"
Success! - added user "SYSU" to "SYS"
```
By default, the operator JWT can be found in `~/.nsc/nats/<operator_name>/<operator_name>.jwt`.
### NATS Account Server
To vend the credentials to the nats-server, we'll use a [nats-account-server](../../../nats-tools/nas/). Let's start a nats-account-server to serve the JWT credentials:
```text
> nats-account-server -nsc ~/.nsc/nats/SAOP
```
The server will by default vend JWT configurations on an endpoint at: `http(s)://<server_url>/jwt/v1/accounts/`.
### NATS Server Configuration
The server configuration will need:
* The operator JWT \(`~/.nsc/nats/<operator_name>/<operator_name>.jwt`\)
* The URL where the server can resolve accounts \(`http://localhost:9090/jwt/v1/accounts/`\)
* The public key of the `system_account`
The only thing we don't have handy is the public key for the system account. We can get it easily enough:
```text
> nsc list accounts -W
╭─────────────────────────────────────────────────────────────────╮
│ Accounts │
├──────┬──────────────────────────────────────────────────────────┤
│ Name │ Public Key │
├──────┼──────────────────────────────────────────────────────────┤
│ SYS │ ADWJVSUSEVC2GHL5GRATN2LOEOQOY2E6Z2VXNU3JEIK6BDGPWNIW3AXF │
╰──────┴──────────────────────────────────────────────────────────╯
```
Because the server supports additional resolver implementations, you need to wrap the resolver URL like: `URL(<url>)`.
Let's create a server config with the following contents and save it as `server.conf`:
```text
operator: /Users/synadia/.nsc/nats/SAOP/SAOP.jwt
system_account: ADWJVSUSEVC2GHL5GRATN2LOEOQOY2E6Z2VXNU3JEIK6BDGPWNIW3AXF
resolver: URL(http://localhost:9090/jwt/v1/accounts/)
```
Let's start the nats-server:
```text
> nats-server -c server.conf
```
## Inspecting Server Events
Let's add a subscriber for all the events published by the system account:
```text
> nats-sub -creds ~/.nkeys/SAOP/accounts/SYS/users/SYSU.creds ">"
```
Very quickly we'll start seeing messages published by the NATS server. As you'd expect, the messages are just JSON, so they can easily be inspected even with a simple `nats-sub`.
To generate a client connect and disconnect event, publish a message:
```text
> nats-pub -creds ~/.nkeys/SAOP/accounts/SYS/users/SYSU.creds foo bar
```
The subscriber will print the connect and disconnect events:
```text
{
  "server": {
    "host": "0.0.0.0",
    "id": "NBTGVY3OKDKEAJPUXRHZLKBCRH3LWCKZ6ZXTAJRS2RMYN3PMDRMUZWPR",
    "ver": "2.0.0-RC5",
    "seq": 32,
    "time": "2019-05-03T14:53:15.455266-05:00"
  },
  "acc": "ADWJVSUSEVC2GHL5GRATN2LOEOQOY2E6Z2VXNU3JEIK6BDGPWNIW3AXF",
  "conns": 1,
  "total_conns": 1
}
{
  "server": {
    "host": "0.0.0.0",
    "id": "NBTGVY3OKDKEAJPUXRHZLKBCRH3LWCKZ6ZXTAJRS2RMYN3PMDRMUZWPR",
    "ver": "2.0.0-RC5",
    "seq": 33,
    "time": "2019-05-03T14:53:15.455304-05:00"
  },
  "client": {
    "start": "2019-05-03T14:53:15.453824-05:00",
    "host": "127.0.0.1",
    "id": 6,
    "acc": "ADWJVSUSEVC2GHL5GRATN2LOEOQOY2E6Z2VXNU3JEIK6BDGPWNIW3AXF",
    "user": "UACPEXCAZEYWZK4O52MEGWGK4BH3OSGYM3P3C3F3LF2NGNZUS24IVG36",
    "name": "NATS Sample Publisher",
    "lang": "go",
    "ver": "1.7.0",
    "stop": "2019-05-03T14:53:15.45526-05:00"
  },
  "sent": {
    "msgs": 1,
    "bytes": 3
  },
  "received": {
    "msgs": 0,
    "bytes": 0
  },
  "reason": "Client Closed"
}
```
## `$SYS.REQ.SERVER.PING` - Discovering Servers
To discover servers in the cluster, and get a small health summary, publish a request to `$SYS.REQ.SERVER.PING`. Note that while the example below uses `nats-req`, it only prints the first answer to the request. You can easily modify the example to wait until no additional responses are received for a specific amount of time, thus allowing all responses to be collected.
```text
> nats-req -creds ~/.nkeys/SAOP/accounts/SYS/users/SYSU.creds \$SYS.REQ.SERVER.PING ""
Published [$SYS.REQ.SERVER.PING] : ''
Received [_INBOX.G5mbsf0k7l7nb4eWHa7GTT.omklmvnm] : '{
  "server": {
    "host": "0.0.0.0",
    "id": "NCZQDUX77OSSTGN2ESEOCP4X7GISMARX3H4DBGZBY34VLAI4TQEPK6P6",
    "ver": "2.0.0-RC9",
    "seq": 47,
    "time": "2019-05-02T14:02:46.402166-05:00"
  },
  "statsz": {
    "start": "2019-05-02T13:41:01.113179-05:00",
    "mem": 12922880,
    "cores": 20,
    "cpu": 0,
    "connections": 2,
    "total_connections": 2,
    "active_accounts": 1,
    "subscriptions": 10,
    "sent": {
      "msgs": 7,
      "bytes": 2761
    },
    "received": {
      "msgs": 0,
      "bytes": 0
    },
    "slow_consumers": 0
  }
}'
```
## `$SYS.REQ.SERVER.<id>.STATSZ` - Requesting Server Stats Summary
If you know the server id for a particular server \(such as from a response to `$SYS.REQ.SERVER.PING`\), you can query the specific server for its health information:
```text
nats-req -creds ~/.nkeys/SAOP/accounts/SYS/users/SYSU.creds \$SYS.REQ.SERVER.NC7AKPQRC6CIZGWRJOTVFIGVSL7VW7WXTQCTUJFNG7HTCMCKQTGE5PUL.STATSZ ""
Published [$SYS.REQ.SERVER.NC7AKPQRC6CIZGWRJOTVFIGVSL7VW7WXTQCTUJFNG7HTCMCKQTGE5PUL.STATSZ] : ''
Received [_INBOX.DQD44ugVt0O4Ur3pWIOOD1.WQOBevoq] : '{
  "server": {
    "host": "0.0.0.0",
    "id": "NC7AKPQRC6CIZGWRJOTVFIGVSL7VW7WXTQCTUJFNG7HTCMCKQTGE5PUL",
    "ver": "2.0.0-RC5",
    "seq": 25,
    "time": "2019-05-03T14:34:02.066077-05:00"
  },
  "statsz": {
    "start": "2019-05-03T14:32:19.969037-05:00",
    "mem": 11874304,
    "cores": 20,
    "cpu": 0,
    "connections": 2,
    "total_connections": 4,
    "active_accounts": 1,
    "subscriptions": 10,
    "sent": {
      "msgs": 26,
      "bytes": 9096
    },
    "received": {
      "msgs": 2,
      "bytes": 0
    },
    "slow_consumers": 0
  }
}'
```
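Since the payloads are plain JSON, a monitoring tool can map just the fields it needs. A sketch with `encoding/json`; the struct shape and the selection of fields are our own, matching the sample above:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statszMsg declares only the fields we care about; encoding/json
// silently ignores the rest of the payload.
type statszMsg struct {
	Server struct {
		ID  string `json:"id"`
		Ver string `json:"ver"`
	} `json:"server"`
	Statsz struct {
		Connections   int `json:"connections"`
		SlowConsumers int `json:"slow_consumers"`
	} `json:"statsz"`
}

func parseStatsz(payload []byte) (statszMsg, error) {
	var m statszMsg
	err := json.Unmarshal(payload, &m)
	return m, err
}

func main() {
	sample := []byte(`{"server":{"id":"NC7AK","ver":"2.0.0-RC5"},
	  "statsz":{"connections":2,"slow_consumers":0}}`)
	m, err := parseStatsz(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("server %s: %d connections, %d slow consumers\n",
		m.Server.ID, m.Statsz.Connections, m.Statsz.SlowConsumers)
}
```

A tool like this could alert whenever `slow_consumers` increases between successive STATSZ replies.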

# Upgrading a Cluster
The basic strategy for upgrading a cluster revolves around the server's ability to gossip cluster configuration to clients and other servers. When cluster configuration changes, clients become aware of new servers automatically. In the case of a disconnect, a client has a list of servers that joined the cluster in addition to the ones it knew about from its connection settings.
Note that since each server stores its own permission and authentication configuration, new servers added to a cluster should provide the same users and authorization to prevent clients from getting rejected or gaining unexpected privileges.
For purposes of describing the scenario, let's get some fingers on keyboards, and go through the motions. Let's consider a cluster of two servers: 'A' and 'B', and yes - clusters should be _three_ to _five_ servers, but for purposes of describing the behavior and cluster upgrade process, a cluster of two servers will suffice.
Let's build this cluster:
```bash
nats-server -D -p 4222 -cluster nats://localhost:6222 -routes nats://localhost:6222,nats://localhost:6333
```
The command above is starting nats-server with debug output enabled, listening for clients on port 4222, and accepting cluster connections on port 6222. The `-routes` option specifies a list of NATS URLs where the server will attempt to connect to other servers; these URLs point at the cluster ports of the peers.
Keen readers will notice a self-route. The NATS server will ignore the self-route, but it makes for a single consistent configuration for all servers.
Once the server starts, you will notice that it emits some warnings because it cannot connect to 'localhost:6333'. The error reads:
```text
Error trying to connect to route: dial tcp localhost:6333: connect: connection refused
```
Let's fix that, by starting the second server:
```bash
nats-server -D -p 4333 -cluster nats://localhost:6333 -routes nats://localhost:6222,nats://localhost:6333
```
The second server was started on port 4333 with its cluster port on 6333. Otherwise the same as 'A'.
Let's get one client, so we can observe it moving between servers as servers get removed:
```bash
nats-sub -s nats://localhost:4222 ">"
```
`nats-sub` is a subscriber sample included with all NATS clients. `nats-sub` subscribes to a subject and prints out any messages received. You can find the source code to the Go version of `nats-sub` [here](https://github.com/nats-io/nats.go/tree/master/examples). After starting the subscriber you should see a message on 'A' that a new client connected.
We have two servers and a client. Time to simulate our rolling upgrade. But wait, before we upgrade 'A', let's introduce a new server 'C'. Server 'C' will join the existing cluster while we perform the upgrade. Its sole purpose is to provide an additional place where clients can go other than 'A' and ensure we don't end up with a single server serving all the clients after the upgrade procedure. Clients will randomly select a server when connecting unless a special option is provided that disables that functionality \(usually called 'DontRandomize' or 'noRandomize'\). You can read more about ["Avoiding the Thundering Herd"](../../developing-with-nats/intro-1/random.md). Suffice it to say that clients redistribute themselves about evenly between all servers in the cluster. In our case 1/2 of the clients on 'A' will jump over to 'B' and the remaining half to 'C'.
Let's start our temporary server:
```bash
nats-server -D -p 4444 -cluster nats://localhost:6444 -routes nats://localhost:6222,nats://localhost:6333
```
After an instant or so, clients on 'A' learn of the new cluster member that joined. In our hands-on tutorial, `nats-sub` is now aware of 3 possible servers: 'A' \(specified when we started the tool\), plus 'B' and 'C', learned from the cluster gossip.
We invoke our admin powers and turn off 'A' by issuing a `CTRL+C` to the terminal on 'A' and observe that either 'B' or 'C' reports that a new client connected. That is our `nats-sub` client.
We perform the upgrade process, update the binary for 'A', and restart 'A':
```bash
nats-server -D -p 4222 -cluster nats://localhost:6222 -routes nats://localhost:6222,nats://localhost:6333
```
We move on to upgrade 'B'. Notice that clients from 'B' reconnect to 'A' and 'C'. We upgrade and restart 'B':
```bash
nats-server -D -p 4333 -cluster nats://localhost:6333 -routes nats://localhost:6222,nats://localhost:6333
```
If we had more servers, we would continue the stop, update, restart rotation as we did for 'A' and 'B'. After restarting the last server, we can go ahead and turn off 'C'. Any clients on 'C' will redistribute to our permanent cluster members.
## Seed Servers
In the examples above we started each nats-server specifying two cluster routes. It is also possible to let the server gossip protocol drive the discovery and reduce the amount of configuration. You could, for example, start A, B and C as follows:
### A - Seed Server
```bash
nats-server -D -p 4222 -cluster nats://localhost:6222
```
### B
```bash
nats-server -D -p 4333 -cluster nats://localhost:6333 -routes nats://localhost:6222
```
### C
```bash
nats-server -D -p 4444 -cluster nats://localhost:6444 -routes nats://localhost:6222
```
Once they connect to the 'seed server', they will learn about all the other servers and connect to each other, forming the full mesh.