Commit Graph

293 Commits

Author SHA1 Message Date
Ivan Kozlovic
d6fe9d4c2d [ADDED] Support for route S2 compression
The new field `compression` in the `cluster{}` block allows to
specify which compression mode to use between servers.

It can be simply specified as a boolean or a string for the
simple modes, or as an object for the "s2_auto" mode where
a list of RTT thresholds can be specified.

By default, if no compression field is specified, the server
will use the s2_auto mode with default RTT thresholds of
10ms, 50ms and 100ms for the "uncompressed", "fast", "better"
and "best" modes.

```
cluster {
..
  # Possible values are "disabled", "off", "enabled", "on",
  # "accept", "s2_fast", "s2_better", "s2_best" or "s2_auto"
  compression: s2_fast
}
```

To specify a different list of thresholds for the s2_auto,
here is how it would look like:
```
cluster {
..
  compression: {
    mode: s2_auto
    # This means that for RTT up to 5ms (included), then
    # the compression level will be "uncompressed", then
    # from 5ms+ to 15ms, the mode will switch to "s2_fast",
    # then from 15ms+ to 50ms, the level will switch to
    # "s2_better", and anything above 50ms will result
    # in the "s2_best" compression mode.
    rtt_thresholds: [5ms, 15ms, 50ms]
  }
}
```

Note that the "accept" mode means that a server will accept
compression from a remote and switch to that same compression
mode, but will otherwise not initiate compression. That is,
if 2 servers are configured with "accept", then compression
will actually be "off". If one of the server had say s2_fast
then they would both use this mode.

If a server has compression mode set (other than "off") but
connects to an older server, there will be no compression between
those 2 routes.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-27 17:59:25 -06:00
Derek Collison
4ebdb69daf Merge branch 'main' into dev 2023-04-26 11:34:37 -07:00
Derek Collison
cae91b8cad In single server mode healthz could mistake a snapshot staging directory during a restore as an account.
If the restore took a long time, stalled, or was aborted, would cause healthz to fail.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-24 22:14:04 -07:00
Derek Collison
8375ab5cde Merge branch 'main' into dev 2023-04-14 16:44:25 -07:00
Waldemar Quevedo
d12152c48f Add server name / remote server name to routez
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-04-14 12:47:00 -07:00
Derek Collison
ce0d8514be Merge branch 'main' into dev 2023-04-07 05:32:05 -07:00
Derek Collison
c16915bff4 For checking the health of jetstream, do not hold the lock as we traverse the streams and consumers.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-06 11:56:55 -07:00
Derek Collison
1ae51b23a9 [ADDED] Multiple routes and ability to have per-account routes (#4001)
New configuration fields:
```
cluster {
   ...
   pool_size: 5
   accounts: ["A", "B"]
}
```

The configuration `pool_size` in the example above means that this
server will create 5 routes to a remote server, assuming that that
server has the same `pool_size` setting.

Accounts (which are not part of the `accounts[]` configuration)
are assigned a specific route in this pool, and this will be the
same route on all servers in the cluster.

Accounts that are defined in the `accounts` field will each have
a dedicated route connection. This will allow suppression of the
account name in some of the route protocols, reducing bytes transmitted
which may increase performance.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-03 15:33:46 -07:00
Derek Collison
01a2c0472d Merge branch 'main' into dev 2023-04-03 15:33:12 -07:00
Derek Collison
59175c491f Fix for a datarace
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-03 14:46:57 -07:00
Ivan Kozlovic
105237cba8 [ADDED] Multiple routes and ability to have per-account routes
New configuration fields:
```
cluster {
   ...
   pool_size: 5
   accounts: ["A", "B"]
}
```

The configuration `pool_size` in the example above means that this
server will create 5 routes to a remote server, assuming that that
server has the same `pool_size` setting.

Accounts (which are not part of the `accounts[]` configuration)
are assigned a specific route in this pool, and this will be the
same route on all servers in the cluster.

Accounts that are defined in the `accounts` field will each have
a dedicated route connection. This will allow suppression of the
account name in some of the route protocols, reducing bytes transmitted
which may increase performance.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-03 09:32:25 -06:00
Derek Collison
b9e7b58f5c Merge branch 'main' into dev 2023-04-02 18:58:54 -07:00
Derek Collison
ff3f102cdd Fix for datarace in healthcheck
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 16:30:13 -07:00
Derek Collison
48a5d270b2 Merge branch 'main' into dev 2023-04-02 04:23:52 -07:00
Derek Collison
4b8229ee42 Do not hold js lock for health check, use healthy not current for meta
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 03:52:54 -07:00
Derek Collison
6dd0f5a660 Merge branch 'main' into dev 2023-03-19 11:55:45 -07:00
Derek Collison
027f2e42c8 Remove snapshot of cores and maxprocs
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-17 15:09:50 -07:00
Derek Collison
6507a913b3 Merge branch 'main' into dev 2023-03-01 05:05:41 -08:00
Jeremy Saenz
26f241cb62 Updated LEAFZ names to use remoteServer name/id and added is_spoke 2023-02-28 18:09:24 -08:00
Derek Collison
7bd7cda021 Merge branch 'main' into dev 2023-02-28 15:17:24 -08:00
Jeremy Saenz
9d4a603aaf Update LEAFZ to include leafnode server/connection name 2023-02-28 14:20:18 -08:00
Waldemar Quevedo
891064318f Add raft query parameter to /jsz to include raft group info
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-02-27 05:54:44 -08:00
Waldemar Quevedo
74b703549d Add raft query parameter to /jsz to include raft group info
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-02-27 05:42:11 -08:00
Derek Collison
7ce23c46ce Merge branch 'main' into dev 2023-02-21 08:34:08 -08:00
Neil Twigg
68961ffedd Refactor ipQueue to use generics, reduce allocations 2023-02-21 14:50:09 +00:00
Derek Collison
469116deae Merge branch 'main' into dev 2023-01-26 09:39:32 -08:00
Neil Twigg
83932b4be6 Don't mark a clustered stream as unhealthy if making forward progress, add TestJetStreamClusterCurrentVsHealth 2023-01-26 16:57:34 +00:00
Derek Collison
3c4e47d540 Merge branch 'main' into dev 2023-01-20 13:29:55 -08:00
Derek Collison
2aeb5e2c5a Update snapshots to numCores and maxProcs after maxrocs.Set()
Signed-off-by: Derek Collison <derek@nats.io>
2023-01-20 11:30:43 -08:00
Neil Twigg
68953678bb Add profilez server endpoint for retrieving pprof profiles 2023-01-11 16:09:09 +00:00
Derek Collison
713f632fa7 If a stream's meta was not properly written but the file existed, we could re-add the stream but a subsequent restart would lose the stream again.
Also added in healthz for single server systems to make sure all stream directories resulted in recovered streams.

Signed-off-by: Derek Collison <derek@nats.io>
2022-12-29 20:08:56 -08:00
Byron Ruth
1477b675ff Add back existing HealthzOptions.JSEnabled field
This fixes a backwards incompat change for library usage as well as
using the healthz NATS API which depends on the JSON payload.

Signed-off-by: Byron Ruth <byron@nats.io>
2022-12-26 08:45:22 -05:00
Byron Ruth
566d1adfa7 Fix /healthz?js-enabled=true behavior
When js-enabled is set to true, the condition was only checked if
the `getJetStream()` call returned `nil`. However, if it non-nil,
all remaining checks were executed, including assessing the health
of the assets (streams and consumers).

This change addresses two issues:

- Switch to use `js.isEnabled()` which will check whether the value
  is nil OR `js.disabled = true` which can occur if the subsystem
  is temporarily disabled (insufficient resources).
- Correctly exit the check after the assertion and before meta and
  asset checks are performed.

In addition, the option has been renamed to `js-enabled-only` to align
with the `js-server-only` naming. The previous `js-enabled` name still
works, but is mapped to this new option. A warning is emitted noting
the previous option is deprecated.

Fix #3703

Signed-off-by: Byron Ruth <b@devel.io>
2022-12-10 07:34:32 -05:00
Raymond
4d8964e57b Added stream created timestamp to stream detail 2022-11-17 13:59:58 +01:00
Ivan Kozlovic
170ff49837 [ADDED] JetStream: peer (the hash of server name) in statsz/jsz
A request to `$SYS.REQ.SERVER.PING.JSZ` would now return something
like this:
```
...
    "meta_cluster": {
      "name": "local",
      "leader": "A",
      "peer": "NUmM6cRx",
      "replicas": [
        {
          "name": "B",
          "current": true,
          "active": 690369000,
          "peer": "b2oh2L6w"
        },
        {
          "name": "Server name unknown at this time (peerID: jZ6RvVRH)",
          "current": false,
          "offline": true,
          "active": 0,
          "peer": "jZ6RvVRH"
        }
      ],
      "cluster_size": 3
    }
```
Note the "peer" field following the "leader" field that contains
the server name. The new field is the node ID, which is a hash of
the server name.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-09-16 15:31:37 -06:00
Waldemar Quevedo
46d73eddae js: add per account reserved mem/store bytes
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2022-09-06 16:43:10 -07:00
Ivan Kozlovic
03ac1f256f Update based on code review
- Change finger_prints to cert_sha256 and use hex.EncodeToString
- Add spki_sha256 for RawSubjectPublicKeyInfo with hex.EncodeToString

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-24 14:16:37 -06:00
Ivan Kozlovic
d2784589a0 Change json tag name to finger_prints
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-22 12:40:20 -06:00
Ivan Kozlovic
951b7c38f6 [ADDED] Monitoring: TLS Peer Certificates in Connz when auth is on
Add basic peer certificates information in /connz endpoint when
the "auth" option is provided.

Resolves #3317

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-22 11:48:49 -06:00
Ivan Kozlovic
5d3ee8ebf4 [FIXED] Gateway: possible panic if monitor endpoint inspected too soon
The monitoring http server is started early and the gateway setup
(when configured) may not be fully ready when the `/gatewayz`
endpoint is inspected and could cause a panic.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-17 13:30:58 -06:00
Ivan Kozlovic
a4bf4e87f6 Merge pull request #3326 from mfaizanse/health_endpoint_params
Added param options to /healthz endpoint
2022-08-09 08:49:22 -06:00
Muhammad Faizan
1634f33de7 Added param options to /healthz endpoint 2022-08-09 08:32:54 +02:00
Derek Collison
2120be6476 nit: Cap stats
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-05 07:52:23 -07:00
Matthias Hanel
d53d2d0484 [Added] account specific monitoring endpoint(s) (#3250)
Added http monitoring endpoint /accstatz
It responds with a list of statz for all accounts with local connections
the argument "unused=1" can be provided to get statz for all accounts
This endpoint is also exposed as nats request under:

This monitoring endpoint is exposed via the system account.
$SYS.REQ.ACCOUNT.*.STATZ
Each server will respond with connection statistics for the requested
account. The format of the data section is a list (size 1) identical to the event
$SYS.ACCOUNT.%s.SERVER.CONNS which is sent periodically as well as on
connect/disconnect. Unless requested by options, server without the account,
or server where the account has no local connections, will not respond.

A PING endpoint exists as well. The response format is identical to
$SYS.REQ.ACCOUNT.*.STATZ
(however the data section will contain more than one account, if they exist)
In addition to general filter options the request takes a list of accounts and
an argument to include accounts without local connections (disabled by default)
$SYS.REQ.ACCOUNT.PING.STATZ

Each account has a new system account import where the local subject
$SYS.REQ.ACCOUNT.PING.STATZ essentially responds as if
the importing account name was used for $SYS.REQ.ACCOUNT.*.STATZ

The only difference between requesting ACCOUNT.PING.STATZ from within
the system account and an account is that the later can only retrieve
statz for the account the client requests from.

Also exposed the monitoring /healthz via the system account under
$SYS.REQ.SERVER.*.HEALTHZ
$SYS.REQ.SERVER.PING.HEALTHZ
No dedicated options are available for these.
HEALTHZ also accept general filter options.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-07-12 21:50:32 +02:00
Derek Collison
e6479dafd2 Close leafnode connection when same cluster name detected
Signed-off-by: Derek Collison <derek@nats.io>
2022-06-30 15:34:22 -07:00
Ivan Kozlovic
5261d98781 [ADDED] Monitoring: Routez's individual route has now more info
Added Start, LastActivity, Uptime and Idle that we normally have
in a Connz for non route connections. This info can be useful
to determine if a route is recent, etc..

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-05-18 13:18:53 -06:00
Ivan Kozlovic
5050092468 [FIXED] JetStream: possible lock inversion
When updating usage, there is a lock inversion in that the jetStream
lock was acquired while under the stream's (mset) lock, which is
not correct. Also, updateUsage was locking the jsAccount lock, which
again, is not really correct since jsAccount contains streams, so
it should be jsAccount->stream, not the other way around.

Removed the locking of jetStream to check for clustered state since
js.clustered is immutable.

Replaced using jsAccount lock to update usage with a dedicated lock.

Originally moved all the update/limit fields in jsAccount to new
structure to make sure that I would see all code that is updating
or reading those fields, and also all functions so that I could
make sure that I use the new lock when calling these. Once that
works was done, and to reduce code changes, I put the fields back
into jsAccount (although I grouped them under the new usageMu mutex
field).

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-05-02 09:50:32 -06:00
Derek Collison
f702e279ab Fix for a consumer recovery issue.
Also update healthz to check all assets that are assigned, not just running.

Signed-off-by: Derek Collison <derek@nats.io>
2022-04-26 19:22:19 -07:00
Ivan Kozlovic
50c3986863 [FIXED] JetStream stream catchup issues
- A stream could become leader when it should not, causing
messages to be lost.
- A catchup could stall because the server sending data
could bail out of the runCatchup routine but still send
the EOF signal.
- Deadlock with monitoring of Jsz

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-12 16:05:12 -06:00
Ivan Kozlovic
9e6f965913 [ADDED] LeafNode min_version new option
If set, a server configured to accept leafnode connections will
reject a remote server whose version is below that value. Note
that servers prior to v2.8.0 are not sending their version
in the CONNECT protocol, which means that anything below 2.8.0
would be rejected.

Configuration example:
```
leafnodes {
    port: 7422
    min_version: 2.8.0
}
```
The option is a string and can have the "v" prefix:
```
min_version: "v2.9.1"
```
Note that although suffix such as `-beta` would be accepted,
only the major, minor and update are used for the version comparison.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-06 18:40:33 -06:00