Streams with many interior deletes were causing issues because the interior deletes were represented as a sorted []uint64.
This change introduces three subtypes of delete blocks: an AVL-backed bitmask tree, a run-length encoding, and the legacy format above.
We also take large interior deletes into account such that, on receiving a snapshot, we can skip things we already know about.
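As a rough illustration of the idea, the representations can sit behind a common interface and be selected based on the shape of the delete set. A sketch only; these names and layouts are assumptions, not the server's actual filestore types:
```go
package store

import "sort"

// A delete block answers membership queries over deleted sequences.
type deleteBlock interface {
	Has(seq uint64) bool
	NumDeleted() uint64
}

// Legacy format: a sorted []uint64 of deleted sequences. Simple, but a
// stream with millions of interior deletes makes this large and slow.
type sortedSeqs []uint64

func (s sortedSeqs) Has(seq uint64) bool {
	i := sort.Search(len(s), func(i int) bool { return s[i] >= seq })
	return i < len(s) && s[i] == seq
}

func (s sortedSeqs) NumDeleted() uint64 { return uint64(len(s)) }

// Run-length encoding: compact when deletes form long contiguous runs.
type runRange struct{ first, last uint64 }
type rleSeqs []runRange

func (r rleSeqs) Has(seq uint64) bool {
	for _, run := range r {
		if seq >= run.first && seq <= run.last {
			return true
		}
	}
	return false
}

func (r rleSeqs) NumDeleted() uint64 {
	var n uint64
	for _, run := range r {
		n += run.last - run.first + 1
	}
	return n
}

// The third subtype, an AVL-backed bitmask tree, keeps fixed-size bitmask
// nodes in a balanced tree and suits large, scattered delete sets.
```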
Signed-off-by: Derek Collison <derek@nats.io>
The new field `compression` in the `cluster{}` block allows
specifying which compression mode to use between servers.
It can be specified simply as a boolean or a string for the
simple modes, or as an object for the "s2_auto" mode, where
a list of RTT thresholds can be specified.
By default, if no compression field is specified, the server
will use the s2_auto mode with default RTT thresholds of
10ms, 50ms and 100ms for the "uncompressed", "fast", "better"
and "best" modes.
```
cluster {
..
# Possible values are "disabled", "off", "enabled", "on",
# "accept", "s2_fast", "s2_better", "s2_best" or "s2_auto"
compression: s2_fast
}
```
To specify a different list of thresholds for s2_auto,
here is how it would look:
```
cluster {
..
compression: {
mode: s2_auto
# This means that for RTT up to 5ms (inclusive), the
# compression level will be "uncompressed", then
# from 5ms+ to 15ms, the mode will switch to "s2_fast",
# then from 15ms+ to 50ms, the level will switch to
# "s2_better", and anything above 50ms will result
# in the "s2_best" compression mode.
rtt_thresholds: [5ms, 15ms, 50ms]
}
}
```
Note that the "accept" mode means that a server will accept
compression from a remote and switch to that same compression
mode, but will otherwise not initiate compression. That is,
if 2 servers are configured with "accept", then compression
will effectively be "off". If one of the servers had, say, "s2_fast",
then they would both use that mode.
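For instance (a sketch; only the compression lines are relevant, and the two blocks belong to two different server configs):
```
# Server A initiates compression at level "s2_fast":
cluster {
..
compression: s2_fast
}
# Server B is configured to accept; routes between A and B will
# then use "s2_fast". Had both servers used "accept", compression
# would effectively be "off":
cluster {
..
compression: accept
}
```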
If a server has a compression mode set (other than "off") but
connects to an older server, there will be no compression between
those 2 servers.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
New configuration fields:
```
cluster {
...
pool_size: 5
accounts: ["A", "B"]
}
```
The configuration `pool_size` in the example above means that this
server will create 5 routes to a remote server, assuming that
server has the same `pool_size` setting.
Accounts that are not part of the `accounts[]` configuration
are assigned a specific route in this pool, and this will be the
same route on all servers in the cluster.
Accounts that are defined in the `accounts` field will each have
a dedicated route connection. This allows suppression of the
account name in some of the route protocols, reducing the bytes
transmitted, which may increase performance.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Fixed one extraneous account update for $G. We sent the update for the addition before switching, but suppressed the change back to 0.
We now suppress all updates for $G, as was designed.
Signed-off-by: Derek Collison <derek@nats.io>
This adds the ability to augment or override the NATS auth system.
A server will send a signed request to $SYS.REQ.USER.AUTH on the specified account. The request will contain client information, all client options sent to the server, and optionally TLS information and client certificates.
The external auth service will respond with an empty message if not authorized, or a signed User JWT that the user will bind to.
The response can change the account the client will be bound to.
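Here is a minimal sketch of what such an external auth service could look like with the Go client; the connection details and the `authorize` helper are assumptions for illustration:
```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	// Connect as a user of the account that provides the auth service
	// (URL and credentials are placeholders).
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// The server sends a signed request carrying client information,
	// client options, and optionally TLS details and certificates.
	if _, err := nc.Subscribe("$SYS.REQ.USER.AUTH", func(m *nats.Msg) {
		jwt, ok := authorize(m.Data)
		if !ok {
			m.Respond(nil) // empty reply: not authorized
			return
		}
		m.Respond([]byte(jwt)) // signed User JWT the client will bind to
	}); err != nil {
		log.Fatal(err)
	}
	select {} // block forever serving requests
}

// authorize is a hypothetical stand-in for verifying the signed request
// and minting a signed User JWT (e.g. with the nats-io/jwt library).
func authorize(req []byte) (string, bool) { return "", false }
```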
Signed-off-by: Derek Collison <derek@nats.io>
This is only added if set by a user or account expiration claim.
It is represented as a duration until expiration rather than an absolute time, which would involve time-zone and clock-sync issues.
Signed-off-by: Derek Collison <derek@nats.io>
A request to `$SYS.REQ.SERVER.PING.JSZ` will now return something
like this:
```
...
"meta_cluster": {
"name": "local",
"leader": "A",
"peer": "NUmM6cRx",
"replicas": [
{
"name": "B",
"current": true,
"active": 690369000,
"peer": "b2oh2L6w"
},
{
"name": "Server name unknown at this time (peerID: jZ6RvVRH)",
"current": false,
"offline": true,
"active": 0,
"peer": "jZ6RvVRH"
}
],
"cluster_size": 3
}
```
Note the "peer" field following the "leader" field that contains
the server name. The new field is the node ID, which is a hash of
the server name.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
In some rare situations, it is possible that nodes are added
to the cluster but are not properly tracked, and are not shown as
offline when they exit the cluster.
Relates to #3258
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When a request for a system service like $SYS.REQ.ACCOUNT.*.CONNZ
is imported/exported, we ensured that the requesting account is identical
to the account referenced in the subject.
In #3250 this check was extended from CONNZ to all $SYS.REQ.ACCOUNT.*.*
requests.
In general, this check interferes with monitoring accounts that need
to query all other accounts, not just themselves.
The use case there is that account A sends a request with account B
in the subject; the check for equal accounts prevents this.
This change removes the check to support these use cases.
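For instance, a monitoring client connected to account A could now do the following (a sketch with the Go client; the account name "B", URL, and timeout are placeholders):
```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	// Connect as a user of monitoring account "A".
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// Account "A" requesting CONNZ for account "B": previously rejected
	// by the equal-account check, now allowed if the export permits it.
	resp, err := nc.Request("$SYS.REQ.ACCOUNT.B.CONNZ", nil, 2*time.Second)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s\n", resp.Data)
}
```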
Instead of the check, the default export now uses the exportAuth
tokenPos to ensure that the 4th token is the importer's account id.
This guarantees that an explicit import (done by a user) can only import
for the user's own account.
This change also ensures that an explicit export is not overwritten
by the system.
This is not a problem when the export is public.
Automatic imports set the account id correctly and do not use wildcards.
To cover cases where the export is private, automatically added imports
are not subject to a token check.
Signed-off-by: Matthias Hanel <mh@synadia.com>
While investigating a catchup bug, I noticed some contention on the server write lock.
Medium term we could have a separate lock; longer term, formal client support in the server will alleviate this.
Signed-off-by: Derek Collison <derek@nats.io>
Added HTTP monitoring endpoint /accstatz.
It responds with a list of statz for all accounts with local connections.
The argument "unused=1" can be provided to get statz for all accounts.
This endpoint is also exposed via the system account as a NATS request under:
$SYS.REQ.ACCOUNT.*.STATZ
Each server will respond with connection statistics for the requested
account. The format of the data section is a list (of size 1) identical to the event
$SYS.ACCOUNT.%s.SERVER.CONNS, which is sent periodically as well as on
connect/disconnect. Unless requested via options, a server without the account,
or one where the account has no local connections, will not respond.
A PING endpoint exists as well. The response format is identical to
$SYS.REQ.ACCOUNT.*.STATZ
(however, the data section will contain more than one account, if more exist).
In addition to the general filter options, the request takes a list of accounts and
an argument to include accounts without local connections (disabled by default):
$SYS.REQ.ACCOUNT.PING.STATZ
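Here is a minimal sketch of gathering the per-server replies with the Go client (connection details and the timeout are placeholders):
```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	// Connect as a system-account user (URL and credentials are placeholders).
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// Every server may reply, so gather responses on an inbox instead of
	// taking only the first reply that nc.Request() would give us.
	inbox := nats.NewInbox()
	sub, err := nc.SubscribeSync(inbox)
	if err != nil {
		log.Fatal(err)
	}
	if err := nc.PublishRequest("$SYS.REQ.ACCOUNT.PING.STATZ", inbox, nil); err != nil {
		log.Fatal(err)
	}
	for {
		msg, err := sub.NextMsg(time.Second) // stop once replies stop arriving
		if err != nil {
			break
		}
		fmt.Printf("%s\n", msg.Data)
	}
}
```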
Each account has a new system account import where the local subject
$SYS.REQ.ACCOUNT.PING.STATZ essentially responds as if
the importing account name was used for $SYS.REQ.ACCOUNT.*.STATZ.
The only difference between requesting ACCOUNT.PING.STATZ from within
the system account and from another account is that the latter can only
retrieve statz for the account the client requests from.
Also exposed the monitoring endpoint /healthz via the system account under:
$SYS.REQ.SERVER.*.HEALTHZ
$SYS.REQ.SERVER.PING.HEALTHZ
No dedicated options are available for these, but HEALTHZ accepts the
general filter options.
Signed-off-by: Matthias Hanel <mh@synadia.com>
In some situations, a server may report that a remote server is
detected as orphaned (and the node is marked as offline). This is
because the orphan detection relies on conns updates being received;
however, servers would suppress the update if an account does not
have any connections attached.
This PR ensures that the update is sent regardless, as long as the
account is JS configured (though not necessarily enabled at the moment).
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
* [ADD] Account-specific in/out msgs/bytes stats to CONNS
This subject $SYS.ACCOUNT.%s.SERVER.CONNS will now respond with
account-specific data stats for received and sent messages, as well as
the number of slow consumers for the account.
Signed-off-by: Matthias Hanel <mh@synadia.com>
The system will allow an update to a stream, and subsequently all attached consumers, to be placed in another cluster, either directly or via tag placement.
The meta layer will scale the underlying peerset appropriately to straddle the two clusters for both the stream and consumers, taking into account the consumer type.
Control will then pass to the current leaders of the assets, who will monitor the catchup status of the new peers.
(Note we can optimize this later to only traverse once across a GW for any given asset, but for now this is simpler.)
Once the original leaders have determined the assets are synced, they will pass leadership to a member of the new peerset.
Once the new leader has been elected, it will forward a request for the meta layer to shrink the peerset by removing the old peers.
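From a client's perspective, a move like this can be triggered by updating the stream's placement. A minimal sketch with the Go client, where the stream name "ORDERS" and the cluster/tag values are placeholders:
```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Fetch the current config, then move the stream (and its consumers)
	// by pointing its placement at another cluster, or at placement tags.
	info, err := js.StreamInfo("ORDERS")
	if err != nil {
		log.Fatal(err)
	}
	cfg := info.Config
	cfg.Placement = &nats.Placement{Cluster: "east"} // or Tags: []string{"ssd"}
	if _, err := js.UpdateStream(&cfg); err != nil {
		log.Fatal(err)
	}
}
```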
Signed-off-by: Derek Collison <derek@nats.io>
Got a data race:
```
==================
WARNING: DATA RACE
Write at 0x00c001c736b0 by goroutine 605:
runtime.mapassign_faststr()
/home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/map_faststr.go:202 +0x0
github.com/nats-io/nats-server/v2/server.(*Account).addServiceImport()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/accounts.go:1868 +0xb7b
github.com/nats-io/nats-server/v2/server.(*Account).AddServiceImportWithClaim()
...
Previous read at 0x00c001c736b0 by goroutine 301:
runtime.mapaccess2_faststr()
/home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/map_faststr.go:107 +0x0
github.com/nats-io/nats-server/v2/server.(*Server).registerSystemImports()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/events.go:1577 +0x284
github.com/nats-io/nats-server/v2/server.(*Server).updateAccountClaimsWithRefresh()
...
```
Also, removed a condition in gateway.go on how we were checking
if a subject was a service reply, which was causing a test to flap.
Finally, used AckSync() in a test (instead of m.Respond(nil)) to
prevent it from flapping.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Removed the warnings; instead there is a sync.Map where the queues are
registered/unregistered and can be inspected with an undocumented
monitor page.
Added the notion of "in progress", which is the number of messages
that have been pop()'ed. When recycle() is invoked, this count
goes down.
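An illustrative sketch of the pattern (the types and names here are assumptions, not the server's actual implementation):
```go
package main

import (
	"sync"
	"sync/atomic"
)

// Queues register themselves in a sync.Map so a monitor page can walk
// and inspect them.
var registry sync.Map // name -> *queue

type queue struct {
	sync.Mutex
	name       string
	pending    []any
	inProgress int64 // messages pop()'ed but not yet recycled
}

func newQueue(name string) *queue {
	q := &queue{name: name}
	registry.Store(name, q)
	return q
}

// pop hands all pending messages to the caller and counts them in progress.
func (q *queue) pop() []any {
	q.Lock()
	msgs := q.pending
	q.pending = nil
	q.Unlock()
	atomic.AddInt64(&q.inProgress, int64(len(msgs)))
	return msgs
}

// recycle is called once the caller is done; the in-progress count drops.
func (q *queue) recycle(msgs []any) {
	atomic.AddInt64(&q.inProgress, -int64(len(msgs)))
}

func (q *queue) unregister() { registry.Delete(q.name) }

func main() {
	q := newQueue("example")
	q.recycle(q.pop())
	q.unregister()
}
```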
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
We will only send if all peers in our group are >= 2.7.1, and we will check for updates.
When a consumer follower takes over, it will notify all pending requests that those requests are now invalid.
Signed-off-by: Derek Collison <derek@nats.io>