Commit Graph

612 Commits

Author SHA1 Message Date
Derek Collison
dba03dbc2f Optimizations to reduce contention for high connections in a JetStream enabled account with high API usage.
Several strategies are used, as listed below.

1. Checking a RaftNode to see if it is the leader now uses atomics.
2. Checking if we are the JetStream meta leader from the server now uses an atomic.
3. Accessing the JetStream context no longer requires a server lock, uses atomic.Pointer.
4. Filestore syncBlocks would hold msgBlock locks during sync, now does not.
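
As a rough illustration of items 1 to 3, the sketch below shows lock-free reads with `atomic.Bool` and `atomic.Pointer` from Go's `sync/atomic` package. The `server` and `jetStream` types and their method names are hypothetical stand-ins, not the server's actual code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// jetStream is a hypothetical stand-in for the JetStream context.
type jetStream struct {
	name string
}

// server is a hypothetical, stripped-down server type.
type server struct {
	leader atomic.Bool               // updated on leadership changes
	js     atomic.Pointer[jetStream] // swapped in once JetStream is enabled
}

// isLeader reads the flag without taking any lock.
func (s *server) isLeader() bool { return s.leader.Load() }

// getJetStream returns the current context, or nil if none has been stored.
func (s *server) getJetStream() *jetStream { return s.js.Load() }

func main() {
	s := &server{}
	fmt.Println(s.isLeader(), s.getJetStream() == nil)

	s.leader.Store(true)
	s.js.Store(&jetStream{name: "js"})
	fmt.Println(s.isLeader(), s.getJetStream().name)
}
```

Readers on hot paths only perform an atomic load, so they no longer queue behind writers holding a mutex.
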

Signed-off-by: Derek Collison <derek@nats.io>
2023-09-30 14:52:15 -07:00
Neil Twigg
212d92ca7e Add more pprof labels to consumers, sources, mirrors
Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-29 19:12:47 +01:00
Derek Collison
2737c56352 Only setup auto no-auth for $G account iff no authorization block was defined.
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-28 13:51:45 -07:00
Derek Collison
c5b98f5c79 Make server shutdown an atomic and check inside unsubscribe to avoid unnecessary work.
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-26 17:53:58 -07:00
Ivan Kozlovic
a84ce61a93 [FIXED] Account resolver lock inversion
There was a lock inversion but low risk since it happened during
server initialization. Still fixed it and added the ordering
in locksordering.txt file.

Also fixed multiple lock inversions that were caused by tests.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-09-25 15:09:11 -06:00
Neil Twigg
11feadfe7b Add prof_block_rate option for enabling/configuring the block profile
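
A minimal sketch of what applying such an option might look like, using `runtime.SetBlockProfileRate` from the Go standard library. The helper name `applyBlockProfileRate` is hypothetical, and how the server actually wires `prof_block_rate` to the runtime is not shown in this log.

```go
package main

import (
	"fmt"
	"runtime"
)

// applyBlockProfileRate is a hypothetical helper. A rate of 0 or less turns
// block profiling off; a positive rate asks the runtime to sample roughly one
// blocking event per rate nanoseconds spent blocked (1 records every event).
// It reports whether block profiling ends up enabled.
func applyBlockProfileRate(rate int) bool {
	if rate <= 0 {
		runtime.SetBlockProfileRate(0)
		return false
	}
	runtime.SetBlockProfileRate(rate)
	return true
}

func main() {
	fmt.Println(applyBlockProfileRate(0))    // profiling stays off
	fmt.Println(applyBlockProfileRate(1000)) // profiling enabled
	applyBlockProfileRate(0)                 // turn it back off
}
```
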
Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-25 21:04:25 +01:00
Derek Collison
7ce47fd182 Move server running state to atomic to avoid contention at NRG layer.
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-25 11:18:15 -07:00
Neil Twigg
d4e8a44499 Set S2 writer concurrency to 1
Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-25 09:54:54 +01:00
Derek Collison
f95ef63ae1 In lameduck mode shutdown jetstream at start, do not leave running during connection drain.
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-24 14:42:59 -07:00
Derek Collison
e7e8a330d4 Allow sync intervals to be set and the ability to have all data writes synchronous.
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-04 11:05:13 -07:00
Waldemar Quevedo
b8200d1095 Fix data race when updating account
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-08-28 06:51:13 -07:00
Derek Collison
fb8525b713 Merge branch 'main' into dev
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-21 15:55:00 -07:00
Neil Twigg
7cc5838a6d Send shutdown event on LDM so that R1 assets do not get assigned to the LDM node
Signed-off-by: Neil Twigg <neil@nats.io>
2023-08-21 21:29:01 +01:00
Neil Twigg
19397a5683 Don't set block profile rate
Signed-off-by: Neil Twigg <neil@nats.io>
2023-08-16 17:00:07 +01:00
Jean-Noël Moyne
61a0555336 Call SetBlockProfileRate even if the profiling port is not set
Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
2023-08-14 10:58:20 -07:00
Jean-Noël Moyne
fc41ab1a5a Adds LDM and KICK server $SYS requests
Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
2023-08-10 17:08:09 -07:00
Waldemar Quevedo
8b7dfe7d74 monitoring: track slow consumers per connection type
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-08-09 05:57:42 -07:00
Waldemar Quevedo
6b9008c1f4 Fixes to service imports on reload
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-08-05 18:21:01 -07:00
Waldemar Quevedo
eecb8af997 Remove reload fix from main
This workaround will not work for v2.10 branch features

Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-08-04 16:57:39 -07:00
Derek Collison
8079495903 Merge branch 'main' into dev
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-04 10:15:35 -07:00
Derek Collison
9de5e3e64d OCSP backports and adds (#4362)
This PR backports the OCSP Peer feature option (as in 2.10 train) and
includes two fixes for the existing OCSP Staple feature.

OCSP Staple: 

1. Fixed and clarified how NATS Server determines its own Issuer CA when
obtaining and validating an OCSP Response for subsequent staple
2. Eliminated problematic assumption that all node peers are issued by
same CA when NATS Server validates ROUTE and GATEWAY peer nodes
3. Added OCSP Response effectivity checks on ROUTE and GATEWAY
peer-presented staple

Note for #3: The allowed host clock skew between node peers is set to
30 seconds. If the OCSP Response contains an empty assertion for
NextUpdate, NATS Server defaults to a 1-hour validity (after
ThisUpdate). It is recommended that the CA OCSP Responder assert
NextUpdate.
2023-08-02 18:10:24 -07:00
Todd Beets
209fcd70eb OCSP Peer Feature
2023-08-02 11:25:48 -07:00
Waldemar Quevedo
2b252469ca fix: add missing default service imports on reload
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-08-01 23:34:07 -07:00
Todd Beets
e51a42963a OCSP Peer Verification (#4258)
New security feature [ADR-38: OCSP Peer
Verification](https://github.com/nats-io/nats-architecture-and-design/pull/226/files#diff-575a9545de9d498a48d2889972b0cb57dbadebde3b4328b65ab02bb43f557935)
providing fine-grained certificate status checks via OCSP verification
for inbound NATS, MQTT, WebSocket, and Leaf client connections (mTLS), as
well as outbound Leaf connections to another NATS System.
2023-08-01 09:17:27 -07:00
Derek Collison
1e15061400 Cleanup for some staticcheck warnings
Signed-off-by: Derek Collison <derek@nats.io>
2023-07-21 19:17:54 -07:00
Derek Collison
c6c5358513 Merge branch 'main' into dev
Signed-off-by: Derek Collison <derek@nats.io>
2023-07-20 13:21:35 -07:00
Derek Collison
6c9fb6a938 [FIXED] Server reload with highly active accounts with service imports could cause panic or data loss (#4327)
When service imports were reloaded on active accounts with lots of
traffic the server could panic or lose data.

Signed-off-by: Derek Collison <derek@nats.io>
2023-07-20 13:19:17 -07:00
Derek Collison
7477ce8257 When service imports were reloaded on active accounts with lots of traffic the server could panic or lose data.
Signed-off-by: Derek Collison <derek@nats.io>
2023-07-20 12:20:50 -07:00
Caleb Lloyd
7993547bee Adjust in-process server info tls_required to tls_available
Signed-off-by: Caleb Lloyd <caleb@synadia.com>
2023-07-20 10:44:03 +01:00
Neil Twigg
ed9fafc796 Don't require TLS for in-process connection
Signed-off-by: Neil Twigg <neil@nats.io>
2023-07-20 10:43:58 +01:00
Todd Beets
99dc11551b OCSP Peer Verification
2023-07-19 12:14:21 -07:00
Derek Collison
4d7cd26956 Add in support for segmented binary stream snapshots.
Streams with many interior deletes were causing issues due to the fact that the interior deletes were represented as a sorted []uint64.
This approach introduces three subtypes of delete blocks: an AVL bitmask tree, a run-length encoding, and the legacy format above.
We also take into account large interior deletes such that on receiving a snapshot we can skip things we already know about.
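
To illustrate why a run-length encoding helps with large interior deletes, here is a minimal sketch that collapses a sorted list of deleted sequences into contiguous runs. The `run` type and `encodeRuns` function are hypothetical and do not reflect the actual snapshot wire format; the input is assumed to be sorted and free of duplicates.

```go
package main

import "fmt"

// run describes a contiguous range of deleted sequences:
// first, first+1, ..., first+num-1.
type run struct {
	first uint64
	num   uint64
}

// encodeRuns collapses a sorted, duplicate-free list of deleted sequences
// into runs. Each sequence either extends the last run or starts a new one.
func encodeRuns(deleted []uint64) []run {
	var runs []run
	for _, seq := range deleted {
		if n := len(runs); n > 0 && runs[n-1].first+runs[n-1].num == seq {
			runs[n-1].num++
			continue
		}
		runs = append(runs, run{first: seq, num: 1})
	}
	return runs
}

func main() {
	// Six deleted sequences collapse into three runs.
	fmt.Println(encodeRuns([]uint64{3, 4, 5, 9, 10, 20})) // prints [{3 3} {9 2} {20 1}]
}
```

A single gap of a million consecutive deletes costs one `run` here, versus a million entries in the sorted `[]uint64` form.
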

Signed-off-by: Derek Collison <derek@nats.io>
2023-07-03 08:41:33 -07:00
Neil Twigg
d2615b76f2 Annotate CPU and goroutine profiles with account/stream/consumer info
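
A minimal sketch of this kind of annotation using `pprof.Do` and `pprof.Labels` from Go's `runtime/pprof` package. The label keys and the helper names `runLabeled` and `labelSeen` are assumptions for illustration; the keys the server actually uses are not shown in this log.

```go
package main

import (
	"context"
	"fmt"
	"runtime/pprof"
)

// runLabeled runs fn with pprof labels attached to the calling goroutine, so
// CPU and goroutine profiles taken while fn runs can be grouped by asset.
func runLabeled(account, stream, consumer string, fn func(ctx context.Context)) {
	labels := pprof.Labels("account", account, "stream", stream, "consumer", consumer)
	pprof.Do(context.Background(), labels, fn)
}

// labelSeen returns the value of key as observed from inside the labeled call.
func labelSeen(account, stream, consumer, key string) string {
	var got string
	runLabeled(account, stream, consumer, func(ctx context.Context) {
		got, _ = pprof.Label(ctx, key)
	})
	return got
}

func main() {
	fmt.Println(labelSeen("ACME", "ORDERS", "worker", "stream"))
}
```
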
Signed-off-by: Neil Twigg <neil@nats.io>
2023-06-20 19:02:40 +01:00
Derek Collison
f342f6a758 Merge branch 'main' into dev
2023-06-05 14:13:18 -07:00
Artem Seleznev
27a8b96ee3 different panic fixes
Signed-off-by: Artem Seleznev <seleznyov.artyom@gmail.com>
2023-06-02 13:19:22 +03:00
Ivan Kozlovic
cf474d6333 Revert changes related to leafnode PING interval
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-05-16 13:49:00 -06:00
Ivan Kozlovic
67498af2dc [ADDED] LeafNode: Support for s2 compression
This is similar to PR #4115 but for LeafNodes.
Compression mode can be set on both sides (the accept side and in remotes).
```
leafnodes {
   port: 7422
   compression: s2_best
   remotes [
       {
         url: "nats://host2:7422"
         compression: s2_better
       }
   ]
}
```
Possible modes are similar to those for routes (described in PR #4115),
except that when not defined we default to `s2_auto`.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-05-15 17:42:39 -06:00
Derek Collison
3ff9aed192 Merge branch 'main' into dev
2023-05-12 21:04:51 -07:00
Derek Collison
c5eb46cb06 Make sure closed clients captures all user types and works with user filtering as well
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-12 15:05:40 -07:00
Derek Collison
4c26cbb3de Merge branch 'main' into dev
2023-05-12 12:38:20 -07:00
Waldemar Quevedo
286a1632ca Use monotonic time for measuring time internally
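
For context, a minimal sketch of monotonic interval measurement in Go: `time.Now` records a monotonic clock reading alongside the wall clock, and `time.Since` uses that reading when it is present, so the result is unaffected by wall-clock adjustments. The `elapsed` helper is hypothetical and not taken from the server.

```go
package main

import (
	"fmt"
	"time"
)

// elapsed measures how long fn takes to run.
func elapsed(fn func()) time.Duration {
	start := time.Now() // carries a monotonic clock reading
	fn()
	return time.Since(start) // computed from the monotonic reading
}

func main() {
	d := elapsed(func() { time.Sleep(10 * time.Millisecond) })
	fmt.Println(d >= 10*time.Millisecond)
}
```
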
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-05-12 08:27:46 -07:00
Derek Collison
4175e4ee9c Merge branch 'main' into dev
2023-05-06 09:55:34 -07:00
Derek Collison
80db7a22ab Optimizations for large single hub account leafnode fleets.
Added a leafnode lock to allow better traversal without copying of large leafnodes in a single hub account.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-05 13:14:49 -07:00
Ivan Kozlovic
8a4ead22bc Updates based on code review
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-05-03 16:14:51 -06:00
Ivan Kozlovic
95e4f2dfe1 Fixed accounts configuration reload
Issues could manifest with subscription interest not properly
propagated.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-05-03 14:35:06 -06:00
Ivan Kozlovic
840c264f45 Cleanup use of s.opts and fixed some lock (deadlock/inversion) issues
One should not access s.opts directly but instead use s.getOpts().
Also, server lock needs to be released when performing an account
lookup (since this may result in server lock being acquired).
A function was calling s.LookupAccount under the client lock, which
technically creates a lock inversion situation.
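
The two rules above can be sketched as follows. Everything here is a hypothetical reduction: the `server` and `options` types, the `accounts` map, and `portIfKnown` are illustrative and are not the server's real definitions. Only the `getOpts` accessor pattern and the "release the lock before the lookup" ordering are taken from the description.

```go
package main

import (
	"fmt"
	"sync"
)

type options struct{ port int }

// server is a hypothetical, stripped-down server type.
type server struct {
	mu       sync.RWMutex
	opts     *options
	accounts map[string]struct{}
}

// getOpts is the only sanctioned way to read the options pointer.
func (s *server) getOpts() *options {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.opts
}

func (s *server) setOpts(o *options) {
	s.mu.Lock()
	s.opts = o
	s.mu.Unlock()
}

// lookupAccount acquires the server lock itself, so callers must not hold it.
func (s *server) lookupAccount(name string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	_, ok := s.accounts[name]
	return ok
}

// portIfKnown reads what it needs under the lock, releases the lock, and only
// then performs the account lookup.
func (s *server) portIfKnown(name string) (int, bool) {
	s.mu.RLock()
	port := s.opts.port
	s.mu.RUnlock() // released before the lookup re-acquires it
	return port, s.lookupAccount(name)
}

func main() {
	s := &server{accounts: map[string]struct{}{"A": {}}}
	s.setOpts(&options{port: 4222})
	fmt.Println(s.getOpts().port)
	fmt.Println(s.portIfKnown("A"))
}
```
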

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-05-03 14:09:02 -06:00
Derek Collison
0321eb6484 Merge branch 'main' into dev
2023-04-29 19:52:57 -07:00
Ivan Kozlovic
349f01e86a Change the absence of compression setting to default to "accept"
In that mode, a server accepts and will switch to the same compression
level as the remote (if one is set) but will not initiate compression.
So if all servers in a cluster do not have compression setting set,
it defaults to "accept" which means that compression is "off".

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-28 15:33:17 -06:00
Ivan Kozlovic
5b8c9ee364 Changes based on code review
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-28 14:34:32 -06:00
Ivan Kozlovic
d6fe9d4c2d [ADDED] Support for route S2 compression
The new field `compression` in the `cluster{}` block lets you
specify which compression mode to use between servers.

It can be simply specified as a boolean or a string for the
simple modes, or as an object for the "s2_auto" mode where
a list of RTT thresholds can be specified.

By default, if no compression field is specified, the server
will use the s2_auto mode with default RTT thresholds of
10ms, 50ms and 100ms, which mark the boundaries between the
"uncompressed", "fast", "better" and "best" modes.

```
cluster {
..
  # Possible values are "disabled", "off", "enabled", "on",
  # "accept", "s2_fast", "s2_better", "s2_best" or "s2_auto"
  compression: s2_fast
}
```

To specify a different list of thresholds for the s2_auto mode,
here is what it would look like:
```
cluster {
..
  compression: {
    mode: s2_auto
    # This means that for RTT up to 5ms (included), then
    # the compression level will be "uncompressed", then
    # from 5ms+ to 15ms, the mode will switch to "s2_fast",
    # then from 15ms+ to 50ms, the level will switch to
    # "s2_better", and anything above 50ms will result
    # in the "s2_best" compression mode.
    rtt_thresholds: [5ms, 15ms, 50ms]
  }
}
```

Note that the "accept" mode means that a server will accept
compression from a remote and switch to that same compression
mode, but will otherwise not initiate compression. That is,
if 2 servers are configured with "accept", then compression
will actually be "off". If one of the servers had, say, s2_fast,
then they would both use this mode.

If a server has compression mode set (other than "off") but
connects to an older server, there will be no compression between
those 2 routes.
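
The negotiation rules in the last two paragraphs can be sketched as below. This is an illustration under stated assumptions, not the server's code: the function name `negotiate` is hypothetical, an empty `remote` string is used to model an older server that does not support compression, and the cases where one side is "off" or where both sides name different explicit modes are not covered by the description above, so their handling here is assumed.

```go
package main

import "fmt"

// negotiate returns the compression mode used between two servers given each
// side's configured mode. remote == "" models an older server (assumption).
func negotiate(local, remote string) string {
	switch {
	case remote == "" || local == "off" || remote == "off":
		// Older peer, or compression turned off on either side (the
		// "off" handling is an assumption).
		return "off"
	case local == "accept" && remote == "accept":
		// Neither side initiates, so compression is effectively off.
		return "off"
	case local == "accept":
		// Follow whatever mode the remote initiates.
		return remote
	default:
		// Remote is "accept", or both sides name a mode. The second case
		// is not described above; keeping the local mode is an assumption.
		return local
	}
}

func main() {
	fmt.Println(negotiate("accept", "accept"))
	fmt.Println(negotiate("accept", "s2_fast"))
	fmt.Println(negotiate("s2_best", ""))
}
```
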

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-27 17:59:25 -06:00