
8362 Commits

Author SHA1 Message Date
Derek Collison
dbe700d192 Bump to 2.10.0-RC.14
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-03 16:11:30 -07:00
Derek Collison
3f1afb4ca2 [IMPROVED] Bumped inflight updates to 16 and move one lock to rlock. (#4621)
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-03 16:10:59 -07:00
Derek Collison
21e272360d [IMPROVED] Memory growth on compressed websocket connections. (#4620)
Holding onto the compressor and not recycling the internal byte slice
could cause havoc with GC.

This needs to be improved but this at least should allow the GC to
clean up more effectively.

Signed-off-by: Derek Collison <derek@nats.io>
2023-10-03 15:37:01 -07:00
Derek Collison
2d21bc7008 Fix datarace
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-03 15:35:20 -07:00
Derek Collison
1ccc6dbf30 Bumped inflight updates to 16 and move one lock to rlock.
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-03 15:01:34 -07:00
Derek Collison
2f1a384bcb Holding onto the compressor and not recycling the internal byte slice was causing havoc with GC.
This needs to be improved but this at least should allow the GC to clean up more effectively.

Signed-off-by: Derek Collison <derek@nats.io>
2023-10-03 14:39:00 -07:00
Derek Collison
195227edfd Bump to 2.10.0-RC.12
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-02 09:53:30 -07:00
Derek Collison
e42b8ce02a [IMPROVED] Optimize locking for consumer info API (#4615)
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-02 09:52:58 -07:00
Derek Collison
e4ca15c2c3 Optimize locking for consumer info
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-02 09:22:44 -07:00
Derek Collison
4165f869d2 Bump to 2.10.2-RC.11
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-01 08:18:28 -07:00
Derek Collison
00839280fb [IMPROVED] Reduce contention for high connections in a JetStream enabled account with high API usage. (#4613)
Several strategies are used which are listed below.

1. Checking a RaftNode to see if it is the leader now uses atomics.
2. Checking if we are the JetStream meta leader from the server now uses
an atomic.
3. Accessing the JetStream context no longer requires a server lock,
uses atomic.Pointer.
4. Filestore syncBlocks would hold msgBlock locks during sync, now does
not.

Signed-off-by: Derek Collison <derek@nats.io>
2023-10-01 08:17:15 -07:00
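Items 1 to 3 above replace lock-protected reads with atomic loads. A minimal sketch of that pattern, assuming Go 1.19+ `atomic.Bool` and `atomic.Pointer`; the `server`, `jetStream` and `raftNode` types here are hypothetical stand-ins, not nats-server's structs.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// jetStream stands in for the server's JetStream context (hypothetical).
type jetStream struct{ domain string }

// server shows the pattern only; it is not nats-server's struct.
type server struct {
	js     atomic.Pointer[jetStream] // readers never take a server lock
	isMeta atomic.Bool               // "am I the JetStream meta leader?"
}

func (s *server) getJetStream() *jetStream { return s.js.Load() }
func (s *server) isMetaLeader() bool       { return s.isMeta.Load() }

// raftNode keeps its lock for state transitions, but Leader() is a plain
// atomic load, so hot paths can ask "am I leader?" without contention.
type raftNode struct {
	mu     sync.Mutex
	term   uint64
	leader atomic.Bool
}

func (n *raftNode) becomeLeader(term uint64) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.term = term
	n.leader.Store(true)
}

func (n *raftNode) Leader() bool { return n.leader.Load() }

func main() {
	s := &server{}
	s.js.Store(&jetStream{domain: "hub"})
	n := &raftNode{}

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() { // concurrent readers: no locks taken
			defer wg.Done()
			_ = s.getJetStream()
			_ = s.isMetaLeader()
			_ = n.Leader()
		}()
	}
	s.isMeta.Store(true)
	n.becomeLeader(1)
	wg.Wait()
	fmt.Println(s.getJetStream().domain, s.isMetaLeader(), n.Leader())
}
```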
Derek Collison
dba03dbc2f Optimizations to reduce contention for high connections in a JetStream enabled account with high API usage.
Several strategies are used, which are listed below.

1. Checking a RaftNode to see if it is the leader now uses atomics.
2. Checking if we are the JetStream meta leader from the server now uses an atomic.
3. Accessing the JetStream context no longer requires a server lock, uses atomic.Pointer.
4. Filestore syncBlocks would hold msgBlock locks during sync, now does not.

Signed-off-by: Derek Collison <derek@nats.io>
2023-09-30 14:52:15 -07:00
Derek Collison
6eee1f736b Fix consumer info if consumer was closed (#4610)
Co-authored-by: Derek Collison <derek@nats.io>
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-29 13:47:55 -07:00
Tomasz Pietrek
1f4b986125 Fix consumer info if consumer was closed
Co-authored-by: Derek Collison <derek@nats.io>
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-09-29 21:40:55 +02:00
Neil
15b46117af Add more pprof labels to consumers, sources, mirrors (#4609)
Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-29 19:38:38 +01:00
Neil Twigg
212d92ca7e Add more pprof labels to consumers, sources, mirrors
Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-29 19:12:47 +01:00
Derek Collison
720ac605a2 Bump to 2.10.0-RC.10
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-28 14:43:08 -07:00
Derek Collison
c9fa001ebf [IMPROVED] Add in additional warning when subject skew detected (#4606)
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-28 14:42:29 -07:00
Derek Collison
fa5b7afcb6 [FIXED] Do not bypass authorization blocks when turning on $SYS account access (#4605)
Only setup auto no-auth for $G account iff no authorization block was
defined.

Signed-off-by: Derek Collison <derek@nats.io>

Resolves #4535
2023-09-28 14:17:24 -07:00
Derek Collison
cb74f3f26e Add in additional warning when subject skew detected
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-28 14:16:27 -07:00
Derek Collison
2737c56352 Only setup auto no-auth for $G account iff no authorization block was defined.
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-28 13:51:45 -07:00
Derek Collison
3d5564bbb1 [FIXED] Flapping TestMQTTLockedSession (#4604)
A test-only fix.

I cannot reproduce the flapping behavior, but did see a race during
debugging suggesting that the CONNACK is delivered to the test before
`mqttProcessConnect` finishes and releases the record.
2023-09-28 13:16:46 -07:00
Lev Brouk
214711654e PR feedback: use checkFor
2023-09-28 12:42:18 -07:00
Lev Brouk
a05d4416ef PR feedback: nit
2023-09-28 12:02:35 -07:00
Phil Pennock
259e904401 Merge systemd: use SIGUSR2 for shutdown, for LDM (#4603)
2023-09-28 14:26:09 -04:00
Derek Collison
783edaa36d [FIXED] Race condition in some leader failover scenarios leading to messages being potentially sourced more than once. (#4592)
- [X] Tests added
- [X] Branch rebased on top of current main (`git pull --rebase origin main`)
- [X] Changes squashed to a single commit (described
[here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
- [x] Build is green in Travis CI
- [X] You have certified that the contribution is your original work and
that you license the work to the project under the [Apache 2
license](https://github.com/nats-io/nats-server/blob/main/LICENSE)

### Changes proposed in this pull request:

Fixes a race condition in some leader failover scenarios leading to
messages being potentially sourced more than once.

In some failure scenarios, the current leader of a stream that sources
from other stream(s) is shut down while messages are being published to
the stream(s) being sourced. This can lead to `setLeader(true)` being
called on the new leader of the sourcing stream before all the messages
sourced by the previous leader have been completely processed. When the
new leader then does its reverse scan from the last message in its view
of the stream, to determine the sequence number at which to start the
consumer on the stream being sourced from, the last message(s) sourced
by the previous leader get sourced again, so some messages end up being
sourced more than once.

The existing `TestNoRaceJetStreamSuperClusterSources` test would
sidestep the issue by relying on the deduplication window in the
sourcing stream. Without deduplication the test is a flapper.

This avoids the race condition by adding a small delay before scanning
for the last message(s) that were sourced and starting the sources'
consumer(s). Now the test (without the deduplication window) never
fails due to more messages than expected being received in the
sourcing stream.

(Also adds a guard to give up if `setupSourceConsumers()` is called when
we are no longer the leader for the stream. That check was already
present in `setupMirrorConsumer()`, so it is assumed to have been
forgotten for `setupSourceConsumers()`.)
2023-09-28 11:22:20 -07:00
Lev Brouk
4b59efd6e7 [FIXED] Flapping TestMQTTLockedSession
2023-09-28 11:13:48 -07:00
Jean-Noël Moyne
71f96881ab [FIXED] Race condition in some leader failover scenarios leading to messages being potentially sourced more than once.
- In some failure scenarios, the current leader of a stream that sources from other stream(s) is shut down while messages are being published to the stream(s) being sourced. This can lead to `setLeader(true)` being called on the new leader of the sourcing stream before all the messages sourced by the previous leader have been completely processed. When the new leader then does its reverse scan from the last message in its view of the stream, to determine the sequence number at which to start the consumer on the stream being sourced from, the last message(s) sourced by the previous leader get sourced again, so some messages end up being sourced more than once.

The existing `TestNoRaceJetStreamSuperClusterSources` test would sidestep the issue by relying on the deduplication window in the sourcing stream. Without deduplication the test is a flapper.

This avoids the race condition by adding a small delay before scanning for the last message(s) that were sourced and starting the sources' consumer(s). Now the test (without the deduplication window) never fails due to more messages than expected being received in the sourcing stream.

- Fix test TestJetStreamWorkQueueSourceRestart, which expects the sourcing stream to get all of the expected messages right away, by adding a small sleep before checking the number of messages pending on the consumer for that stream.

Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
2023-09-28 10:50:54 -07:00
Derek Collison
f5803ef20e [FIXED] Routes: Pinned Accounts connect/reconnect in some cases (#4602)
The issue is with a server that has a route for a given account but
connects to a server that does not support it. The creation of the route
for this account will fail - as expected - and the server will stop
trying to create the route for this account. But it needs to retry to
create this route if it were to reconnect to that same URL in case the
server (or its config) is updated to support a route for this account.

There was also an issue even with 2.10.0 servers in some gossip
situations. Namely, if server B solicits connections to A (but not
vice versa) and A solicits connections to C (but not vice versa),
connections for pinned accounts would not be created.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-09-28 10:47:58 -07:00
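The retry behavior described above amounts to remembering a rejection per URL only until that URL is reconnected to. A minimal sketch under that reading; `routeState` and its methods are hypothetical bookkeeping, not nats-server's route code.

```go
package main

import "fmt"

// routeState is hypothetical: it remembers which pinned-account routes a
// remote URL rejected, but only until that URL is reconnected to.
type routeState struct {
	rejected map[string]map[string]bool // url -> account -> rejected
}

func newRouteState() *routeState {
	return &routeState{rejected: make(map[string]map[string]bool)}
}

// markRejected records that the server at url did not accept a route for account.
func (r *routeState) markRejected(url, account string) {
	if r.rejected[url] == nil {
		r.rejected[url] = make(map[string]bool)
	}
	r.rejected[url][account] = true
}

// onReconnect forgets earlier rejections, since the remote server (or its
// config) may have been updated to support the pinned account.
func (r *routeState) onReconnect(url string) {
	delete(r.rejected, url)
}

// pinnedToCreate lists the pinned accounts worth attempting for url.
func (r *routeState) pinnedToCreate(url string, pinned []string) []string {
	var out []string
	for _, acc := range pinned {
		if !r.rejected[url][acc] { // a nil inner map reads as false
			out = append(out, acc)
		}
	}
	return out
}

func main() {
	r := newRouteState()
	pinned := []string{"ACC1", "ACC2"}
	r.markRejected("nats-route://b:6222", "ACC1")
	fmt.Println(r.pinnedToCreate("nats-route://b:6222", pinned))
	r.onReconnect("nats-route://b:6222")
	fmt.Println(r.pinnedToCreate("nats-route://b:6222", pinned))
}
```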
Phil Pennock
47db17a4c8 systemd: use SIGUSR2 for shutdown, for LDM
2023-09-28 13:16:48 -04:00
Ivan Kozlovic
1eb08505d4 [FIXED] Routes: Pinned Accounts connect/reconnect in some cases
The issue is with a server that has a route for a given account
but connects to a server that does not support it. The creation
of the route for this account will fail - as expected - and the
server will stop trying to create the route for this account.
But it needs to retry to create this route if it were to reconnect
to that same URL in case the server (or its config) is updated
to support a route for this account.

There was also an issue even with 2.10.0 servers in some gossip
situations. Namely, if server B solicits connections to A
(but not vice versa) and A solicits connections to C (but
not vice versa), connections for pinned accounts
would not be created.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-09-28 10:46:32 -06:00
Derek Collison
9c96576066 Bump to 2.10.2-RC.9
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-27 20:49:55 -07:00
Derek Collison
89c2c844a2 [IMPROVED] Additional markers for dirty state (#4601)
Under certain circumstances we could delay recovery if the state file
pointed to an absent msg block.
Found additional places to mark dirty and optionally kick the flusher.

Signed-off-by: Derek Collison <derek@nats.io>
2023-09-27 20:48:32 -07:00
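"Mark dirty and optionally kick the flusher" is commonly done with a counter under a lock plus a one-slot channel that coalesces wake-ups. A sketch assuming that design; `stateTracker`, `markDirty` and `flushIfDirty` are hypothetical names, not the filestore's API.

```go
package main

import (
	"fmt"
	"sync"
)

// stateTracker is hypothetical: writers mark the persisted state dirty and
// can optionally wake a background flusher that rewrites the state file.
type stateTracker struct {
	mu    sync.Mutex
	dirty int
	kick  chan struct{} // capacity 1, so wake-ups coalesce
}

func newStateTracker() *stateTracker {
	return &stateTracker{kick: make(chan struct{}, 1)}
}

// markDirty records a change and, if asked, nudges the flusher without blocking.
func (s *stateTracker) markDirty(kickFlusher bool) {
	s.mu.Lock()
	s.dirty++
	s.mu.Unlock()
	if kickFlusher {
		select {
		case s.kick <- struct{}{}:
		default: // a wake-up is already pending
		}
	}
}

// flushIfDirty calls write only if something changed since the last flush.
func (s *stateTracker) flushIfDirty(write func()) bool {
	s.mu.Lock()
	d := s.dirty
	s.dirty = 0
	s.mu.Unlock()
	if d == 0 {
		return false
	}
	write()
	return true
}

func main() {
	s := newStateTracker()
	done := make(chan struct{})
	go func() { // background flusher
		defer close(done)
		<-s.kick
		s.flushIfDirty(func() { fmt.Println("state file written") })
	}()
	s.markDirty(false)
	s.markDirty(true)
	<-done
}
```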
Derek Collison
b0743ec059 Additional markers for dirty state
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-27 20:32:17 -07:00
Derek Collison
4c368876d8 [IMPROVED] Concurrent stream creation of the same stream could return not found (#4600)
If we cannot find the stream but do have its stream assignment, a
concurrent creation that has not been processed yet is a distinct
possibility, since assignments are not processed inline. So we wait to
see if the stream appears.

Also fixes TestJetStreamClusterParallelStreamCreation, which could
flap.

Signed-off-by: Derek Collison <derek@nats.io>
2023-09-27 19:55:52 -07:00
Derek Collison
a7ca71017b When under load, concurrent stream creation of the same stream could return stream not found, which is odd.
If we cannot find the stream but do have its stream assignment, a concurrent creation that has not been processed yet is a distinct possibility, since assignments are not processed inline. So we wait to see if the stream appears.

Also fixes TestJetStreamClusterParallelStreamCreation, which could flap.

Signed-off-by: Derek Collison <derek@nats.io>
2023-09-27 18:05:43 -07:00
Derek Collison
bc012d78c9 [IMPROVED] Add in warnings for filestore recover state if happy path fails. (#4599)
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-27 16:53:29 -07:00
Derek Collison
aeef0eff53 Add in warnings for filestore recover state if happy path fails.
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-27 16:22:15 -07:00
Derek Collison
c6b26ab5d0 Miscellaneous JetStream benchmark improvements (#4595)
- [ ] Link to issue, e.g. `Resolves #NNN`
- [ ] Documentation added (if applicable)
- [ ] Tests added
- [x] Branch rebased on top of current main (`git pull --rebase origin main`)
- [ ] Changes squashed to a single commit (described
[here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
- [ ] Build is green in Travis CI
- [x] You have certified that the contribution is your original work and
that you license the work to the project under the [Apache 2
license](https://github.com/nats-io/nats-server/blob/main/LICENSE)

Resolves #

### Changes proposed in this pull request:

Miscellaneous fixes and improvements to server JetStream benchmarks.
Reviewers: note that the PR is broken down into 5 commits, each of which
is trivial to review individually, but they can definitely be squashed
before merging for easier cherry-picking.
2023-09-27 14:16:12 -07:00
Derek Collison
46c417f4c9 Bump to 2.10.0-RC.8
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-27 12:08:45 -07:00
Derek Collison
fc5bccd2ca Updated Go client (#4597)
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-27 12:08:18 -07:00
Derek Collison
cbc490ab56 Don't take sublist write lock in match if sublist cache disabled (#4594)
We may be creating unnecessary lock contention on the sublist when the
cache is disabled by taking the write lock anyway.

Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-27 10:19:14 -07:00
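The idea is that a lookup with caching disabled writes nothing, so a read lock suffices and concurrent matches do not serialize. A sketch under that assumption; this `sublist` is a hypothetical, exact-subject-only simplification (no wildcards, no cache invalidation), not the server's data structure.

```go
package main

import (
	"fmt"
	"sync"
)

// sublist is a hypothetical, heavily simplified subject index: exact subjects
// only, with an optional result cache (nil when caching is disabled).
type sublist struct {
	sync.RWMutex
	subs  map[string][]string
	cache map[string][]string
}

// Match takes only the read lock when the cache is disabled, since nothing is
// written. The write lock is taken only to populate an enabled cache.
func (s *sublist) Match(subject string) []string {
	s.RLock()
	cacheOn := s.cache != nil
	if cacheOn {
		if r, ok := s.cache[subject]; ok {
			s.RUnlock()
			return r
		}
	}
	r := s.subs[subject]
	s.RUnlock()
	if !cacheOn {
		return r
	}
	s.Lock()
	if s.cache != nil { // cache may have been disabled meanwhile
		s.cache[subject] = r
	}
	s.Unlock()
	return r
}

func main() {
	s := &sublist{subs: map[string][]string{"orders.new": {"client-1", "client-2"}}}
	fmt.Println(s.Match("orders.new")) // cache disabled: read lock only
	s.cache = map[string][]string{}
	fmt.Println(s.Match("orders.new"), len(s.cache))
}
```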
Derek Collison
b3f5bac31a Update for Go client
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-27 09:55:38 -07:00
Marco Primi
d31236cea2 Refactor cluster creation for JS benchmarks
2023-09-27 09:26:11 -07:00
Marco Primi
be106d1ee5 Remove artificial limit on minimum number of operations
2023-09-27 09:26:11 -07:00
Marco Primi
c5698a9435 Cleanup unnecessary calls to setBytes in JS benchmarks
2023-09-27 09:26:11 -07:00
Marco Primi
e108096601 Improve JS asynchronous publish benchmark
Simplify logic and make sure no more than `asyncWindow` messages are 
ever in-flight
2023-09-27 09:26:11 -07:00
Marco Primi
03aa44dc3d Improve setup of JS Consume benchmark
Handle error condition during stream setup that was resulting in failed 
runs.
2023-09-27 09:26:11 -07:00
Neil Twigg
02d48ddd00 Don't take sublist write lock in match if sublist cache disabled
Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-27 16:33:58 +01:00
Derek Collison
4c17eeb79e [IMPROVED] ServiceImport Reply Optimizations (#4591)
We added some small performance tweaks to the func
checkForReverseEntries. In addition, we moved the shutdown bool for the
server to an atomic so we can efficiently check it when doing unsubs.
If the server is going away there is really no need, since the other side
will do its own thing when the connection goes away. And finally we do
not have to range over the account rrMap if the subscription going away
is a reserved reply.

Signed-off-by: Derek Collison <derek@nats.io>
2023-09-27 08:07:56 -07:00
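The two shortcuts above (skip work when shutting down, avoid ranging over `rrMap` for reserved replies) can be sketched as follows. Everything here is an assumption for illustration: the `_R_.` prefix, the `rrMap` shape (mapped reply to original reply), and the scan used for non-reserved replies are hypothetical simplifications, not nats-server's `checkForReverseEntries`.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"sync/atomic"
)

// replyPrefix marks server-generated ("reserved") reply subjects; the exact
// value is an assumption for this sketch.
const replyPrefix = "_R_."

// account keeps a simplified reverse-reply map: mapped reply -> original reply.
type account struct {
	mu    sync.Mutex
	rrMap map[string]string
}

type server struct {
	shutdown atomic.Bool // checked on each unsub without taking a server lock
}

// checkForReverseEntries cleans up reverse-reply state for reply. It skips all
// work during shutdown, and for a reserved reply deletes its key directly
// rather than ranging over the whole map.
func (s *server) checkForReverseEntries(a *account, reply string) {
	if s.shutdown.Load() {
		return // the other side cleans up when the connection goes away
	}
	a.mu.Lock()
	defer a.mu.Unlock()
	if strings.HasPrefix(reply, replyPrefix) {
		delete(a.rrMap, reply)
		return
	}
	// Non-reserved reply: find mappings that point back to it (O(n)).
	for k, orig := range a.rrMap {
		if orig == reply {
			delete(a.rrMap, k)
		}
	}
}

func main() {
	s := &server{}
	a := &account{rrMap: map[string]string{"_R_.abc": "inbox.1", "_R_.def": "inbox.2"}}
	s.checkForReverseEntries(a, "_R_.abc")
	s.checkForReverseEntries(a, "inbox.2")
	fmt.Println(len(a.rrMap))
}
```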