Commit Graph

236 Commits

Author SHA1 Message Date
Ivan Kozlovic
49907c4537 [FIXED] Configuration Reload: possible panic if done during Shutdown
If a configuration reload is issued as the server is being shutdown,
we could get 2 different panics. One due to JetStream if an account
is JetStream enabled, and one due to the send to a go channel that
has been closed.

```
panic: send on closed channel [recovered]
        panic: send on closed channel

goroutine 440 [running]:
testing.tRunner.func1.2({0x1038d58e0, 0x1039e1270})
        /usr/local/go/src/testing/testing.go:1545 +0x274
testing.tRunner.func1()
        /usr/local/go/src/testing/testing.go:1548 +0x448
panic({0x1038d58e0?, 0x1039e1270?})
        /usr/local/go/src/runtime/panic.go:920 +0x26c
github.com/nats-io/nats-server/v2/server.(*Server).reloadAuthorization(0xc00024fb00)
        /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1998 +0x788
github.com/nats-io/nats-server/v2/server.(*Server).applyOptions(0xc00024fb00, 0xc00021dc00, {0xc00038e4e0, 0x2, 0xc00021dc28?})
        /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1746 +0x2b8
github.com/nats-io/nats-server/v2/server.(*Server).reloadOptions(0xc000293500?, 0xc000118a80, 0xc000293500)
        /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1121 +0x178
github.com/nats-io/nats-server/v2/server.(*Server).ReloadOptions(0xc00024fb00, 0xc000293500)
        /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1060 +0x368
github.com/nats-io/nats-server/v2/server.(*Server).Reload(0xc00024fb00)
        /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/reload.go:995 +0x104
```

and

```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x0 pc=0x10077b224]

goroutine 8 [running]:
testing.tRunner.func1.2({0x101351640, 0x101b7d2a0})
	/usr/local/go/src/testing/testing.go:1545 +0x274
testing.tRunner.func1()
	/usr/local/go/src/testing/testing.go:1548 +0x448
panic({0x101351640?, 0x101b7d2a0?})
	/usr/local/go/src/runtime/panic.go:920 +0x26c
github.com/nats-io/nats-server/v2/server.(*Account).EnableJetStream(0xc00020fb80, 0xc000220240)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream.go:1045 +0xa4
github.com/nats-io/nats-server/v2/server.(*Server).configJetStream(0xc000226d80, 0xc00020fb80)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream.go:707 +0xdc
github.com/nats-io/nats-server/v2/server.(*Server).configAllJetStreamAccounts(0xc000226d80)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream.go:768 +0x2b0
github.com/nats-io/nats-server/v2/server.(*Server).enableJetStreamAccounts(0xc000226d80?)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream.go:637 +0x128
github.com/nats-io/nats-server/v2/server.(*Server).reloadAuthorization(0xc000226d80)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/reload.go:2039 +0x93c
github.com/nats-io/nats-server/v2/server.(*Server).applyOptions(0xc000226d80, 0xc000171c50, {0xc000074600, 0x2, 0xc000171c78?})
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1746 +0x2b8
github.com/nats-io/nats-server/v2/server.(*Server).reloadOptions(0xc000276000?, 0xc0000a6000, 0xc000276000)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1121 +0x178
github.com/nats-io/nats-server/v2/server.(*Server).ReloadOptions(0xc000226d80, 0xc000276000)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1060 +0x368
github.com/nats-io/nats-server/v2/server.(*Server).Reload(0xc000226d80)
	/Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/reload.go:995 +0x104
```

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-10-16 15:25:02 -06:00
Derek Collison
dba03dbc2f Optimizations to reduce contention for high connections in a JetStream enabled account with high API usage.
Several strategies which are listed below.

1. Checking a RaftNode to see if it is the leader now uses atomics.
2. Checking if we are the JetStream meta leader from the server now uses an atomic.
3. Accessing the JetStream context no longer requires a server lock, uses atomic.Pointer.
4. Filestore syncBlocks would hold msgBlock locks during sync, now does not.

Signed-off-by: Derek Collison <derek@nats.io>
2023-09-30 14:52:15 -07:00
Derek Collison
7ce47fd182 Move server running state to atomic to avoid contention at NRG layer.
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-25 11:18:15 -07:00
Derek Collison
4e9cd9aa36 Protect access to c.acc
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-17 10:01:24 -07:00
Derek Collison
f0e2765b44 Fixes for merge conflicts from main
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-21 15:55:31 -07:00
Derek Collison
fb8525b713 Merge branch 'main' into dev
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-21 15:55:00 -07:00
Neil Twigg
d720a6931c Use own subject for LDM event
Signed-off-by: Neil Twigg <neil@nats.io>
2023-08-21 22:03:26 +01:00
Neil Twigg
7cc5838a6d Send shutdown event on LDM so that R1 assets do not get assigned to the LDM node
Signed-off-by: Neil Twigg <neil@nats.io>
2023-08-21 21:29:01 +01:00
R.I.Pienaar
1d916ef9c7 Adds a missing json encoding tag
Signed-off-by: R.I.Pienaar <rip@devco.net>
2023-08-14 17:41:02 +03:00
Jean-Noël Moyne
fc41ab1a5a Adds LDM and KICK server $SYS requests
Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
2023-08-10 17:08:09 -07:00
Waldemar Quevedo
8b7dfe7d74 monitoring: track slow consumers per connection type
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-08-09 05:57:42 -07:00
Waldemar Quevedo
6b9008c1f4 Fixes to service imports on reload
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-08-05 18:21:01 -07:00
Todd Beets
209fcd70eb OCSP Peer Feature 2023-08-02 11:25:48 -07:00
Todd Beets
e51a42963a OCSP Peer Verification (#4258)
New security feature [ADR-38: OCSP Peer
Verification](https://github.com/nats-io/nats-architecture-and-design/pull/226/files#diff-575a9545de9d498a48d2889972b0cb57dbadebde3b4328b65ab02bb43f557935)
providing fine-grain certificate status check via OCSP verification; for
inbound NATS, MQTT, WebSocket, and Leaf client connections (mTLS) as
well as outbound Leaf connections to another NATS System.
2023-08-01 09:17:27 -07:00
Lev Brouk
a5aaf7811e PR feedback: improved error message, and cleaned up debugging from the test 2023-07-26 08:47:33 -07:00
Lev Brouk
21888d0c72 Merge branch 'dev' of github.com:nats-io/nats-server into lev-reload-by-message 2023-07-26 08:43:09 -07:00
Derek Collison
9d9d760af2 Merge branch 'main' into dev 2023-07-21 11:42:02 -07:00
Derek Collison
b68aed90d3 If we created lots of hashes, beyond server names, like for consumer or stream NRG groups etc, the maps would grow and not release memory.
In the benchmark on my machine, this added ~300ns per call, but I think that is ok for now vs the memory usage.

Signed-off-by: Derek Collison <derek@nats.io>
2023-07-20 15:11:28 -07:00
Todd Beets
99dc11551b OCSP Peer Verification 2023-07-19 12:14:21 -07:00
Lev Brouk
46a38929d9 Fixed tests 2023-07-19 12:06:49 -07:00
Lev Brouk
1ce772c1a0 Added: reload server config by sending it a message
gofmt

PR feedback: test that a regular account can not reload

PR feedback: set response.Data to nil on success
2023-07-19 12:04:19 -07:00
Derek Collison
4d7cd26956 Add in support for segmented binary stream snapshots.
Streams with many interior deletes was causing issues due to the fact that the interior deletes were represented as a sorted []uint64.
This approach introduces 3 sub types of delete blocks, avl bitmask tree, a run length encoding, and the legacy format above.
We also take into account large interior deletes such that on receiving a snapshot we can skip things we already know about.

Signed-off-by: Derek Collison <derek@nats.io>
2023-07-03 08:41:33 -07:00
Derek Collison
8a8c37231f Merge branch 'main' into dev 2023-06-10 20:56:42 -07:00
Derek Collison
11963e51fe Optimize statsz locking and only send if we know we have external interest.
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-10 20:25:05 -07:00
Derek Collison
4220502541 Merge branch 'main' into dev 2023-05-15 15:44:38 -07:00
Derek Collison
75d274a636 If a NATS system has multiple domains make sure to process those during a remote update before bailing.
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-13 18:36:42 -07:00
Ivan Kozlovic
95e4f2dfe1 Fixed accounts configuration reload
Issues could manifest with subscription interest not properly
propagated.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-05-03 14:35:06 -06:00
Ivan Kozlovic
d6fe9d4c2d [ADDED] Support for route S2 compression
The new field `compression` in the `cluster{}` block allows to
specify which compression mode to use between servers.

It can be simply specified as a boolean or a string for the
simple modes, or as an object for the "s2_auto" mode where
a list of RTT thresholds can be specified.

By default, if no compression field is specified, the server
will use the s2_auto mode with default RTT thresholds of
10ms, 50ms and 100ms for the "uncompressed", "fast", "better"
and "best" modes.

```
cluster {
..
  # Possible values are "disabled", "off", "enabled", "on",
  # "accept", "s2_fast", "s2_better", "s2_best" or "s2_auto"
  compression: s2_fast
}
```

To specify a different list of thresholds for the s2_auto,
here is how it would look like:
```
cluster {
..
  compression: {
    mode: s2_auto
    # This means that for RTT up to 5ms (included), then
    # the compression level will be "uncompressed", then
    # from 5ms+ to 15ms, the mode will switch to "s2_fast",
    # then from 15ms+ to 50ms, the level will switch to
    # "s2_better", and anything above 50ms will result
    # in the "s2_best" compression mode.
    rtt_thresholds: [5ms, 15ms, 50ms]
  }
}
```

Note that the "accept" mode means that a server will accept
compression from a remote and switch to that same compression
mode, but will otherwise not initiate compression. That is,
if 2 servers are configured with "accept", then compression
will actually be "off". If one of the server had say s2_fast
then they would both use this mode.

If a server has compression mode set (other than "off") but
connects to an older server, there will be no compression between
those 2 routes.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-27 17:59:25 -06:00
Derek Collison
dfeac4a214 Merge branch 'main' into dev 2023-04-09 19:31:01 -07:00
Derek Collison
b78ec39b1f Fix data race and simplify logic
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-08 15:04:30 -07:00
Derek Collison
87d7263026 Fix to new system internal callback format for new callbacks
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-04 15:21:34 -07:00
Derek Collison
03d5cf0871 Merge branch 'main' into dev 2023-04-04 15:14:41 -07:00
Derek Collison
b14a400df6 Fix for debugSubscribers and claims test
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-04 14:36:16 -07:00
Derek Collison
3551c7b2bf Convert over zReq to not have to call msgparts
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-04 14:23:50 -07:00
Derek Collison
c323ec3086 Non-inline system callbacks need hdr and msg already split due to client context to split
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-04 14:15:30 -07:00
Derek Collison
9f95e993e2 Do not inline inbound system messages to avoid blocking routes, gateways etc.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-04 13:53:21 -07:00
Ivan Kozlovic
fe5d6bede4 Fixed accounts configuration reload
Issues could manifest with subscription interest not properly
propagated.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-03 09:32:28 -06:00
Ivan Kozlovic
105237cba8 [ADDED] Multiple routes and ability to have per-account routes
New configuration fields:
```
cluster {
   ...
   pool_size: 5
   accounts: ["A", "B"]
}
```

The configuration `pool_size` in the example above means that this
server will create 5 routes to a remote server, assuming that that
server has the same `pool_size` setting.

Accounts (which are not part of the `accounts[]` configuration)
are assigned a specific route in this pool, and this will be the
same route on all servers in the cluster.

Accounts that are defined in the `accounts` field will each have
a dedicated route connection. This will allow suppression of the
account name in some of the route protocols, reducing bytes transmitted
which may increase performance.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-03 09:32:25 -06:00
Derek Collison
6dd0f5a660 Merge branch 'main' into dev 2023-03-19 11:55:45 -07:00
Derek Collison
027f2e42c8 Remove snapshot of cores and maxprocs
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-17 15:09:50 -07:00
Derek Collison
7ce23c46ce Merge branch 'main' into dev 2023-02-21 08:34:08 -08:00
Neil Twigg
68961ffedd Refactor ipQueue to use generics, reduce allocations 2023-02-21 14:50:09 +00:00
Piotr Piotrowski
6ed82376a6 [ADDED] Number subscriptions in account STATZ 2023-02-16 13:56:37 +01:00
Derek Collison
acad660540 Make sure connection events during auth callouts correct.
Fixed one extraneous account update for $G. We sent for the addition before switching but suppressed the change back to 0.
We now suppress all for $G as was designed.

Signed-off-by: Derek Collison <derek@nats.io>
2023-01-20 18:42:36 -08:00
Derek Collison
235ce2caed Merge branch 'main' into dev 2023-01-11 18:34:01 -08:00
peaaceChoi
038037381b Fix some typos in code comment 2023-01-12 10:31:32 +09:00
Neil Twigg
68953678bb Add profilez server endpoint for retrieving pprof profiles 2023-01-11 16:09:09 +00:00
Derek Collison
a63929c528 Add in account scoped auth error event. If external auth, supply reason from the callout service.
Signed-off-by: Derek Collison <derek@nats.io>
2023-01-02 17:18:45 -08:00
Derek Collison
2daf90493b Authentication and Authorization callouts for server configuration mode.
This adds the ability to augment or override the NATS auth system.

A server will send a signed request to $SYS.REQ.USER.AUTH on the specified account. The request will contain client information, all client options sent to the server, and optionally TLS information and client certificates.
The external auth service will respond with an empty message if not authorized, or a signed User JWT that the user will bind to.

The response can change the account the client will be bound to.

Signed-off-by: Derek Collison <derek@nats.io>
2022-12-28 10:32:45 -08:00
Derek Collison
8365fb3ef4 Add in expiration to user info.
This is only added if set by a user or account expiration claim.
It is represented as a duration til expiration vs absolute time which would involve time zone and clock sync issues.

Signed-off-by: Derek Collison <derek@nats.io>
2022-11-28 09:20:14 -08:00