Commit Graph

228 Commits

Author SHA1 Message Date
Derek Collison
dd646f6b71 Set initial min on dmap caused subtle bugs with dmap. Some minor cleanup.
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-06 09:42:09 -07:00
Derek Collison
783edaa36d [FIXED] Race condition in some leader failover scenarios leading to messages being potentially sourced more than once. (#4592)
- [X] Tests added
- [X] Branch rebased on top of current main (`git pull --rebase origin
main`)
- [X] Changes squashed to a single commit (described
[here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
- [x] Build is green in Travis CI
- [X] You have certified that the contribution is your original work and
that you license the work to the project under the [Apache 2
license](https://github.com/nats-io/nats-server/blob/main/LICENSE)

### Changes proposed in this pull request:

Fixes a race condition in some leader failover scenarios leading to
messages being potentially sourced more than once.

In some failure scenarios, the current leader of a stream sourcing
from other stream(s) gets shut down while publications are happening on
the stream(s) being sourced. This leads to `setLeader(true)` being called
on the new leader for the sourcing stream before all the messages
sourced by the previous leader have been completely processed. When
the new leader then does its reverse scan from the last message in its
view of the stream, in order to know what sequence number to start the
consumer for the stream being sourced from, the last message(s) sourced
by the previous leader get sourced again, leading to some messages
being sourced more than once.

The existing `TestNoRaceJetStreamSuperClusterSources` test would
sidestep the issue by relying on the deduplication window in the
sourcing stream. Without deduplication the test is a flapper.

This avoids the race condition by adding a small delay before scanning
for the last message(s) having been sourced and starting the sources'
consumer(s). Now the test (without using the deduplication window) never
fails because more messages than expected have been received in the
sourcing stream.

(Also adds a guard to give up if `setupSourceConsumers()` is called and
we are no longer the leader for the stream. That check was already
present in `setupMirrorConsumer()`, so it was presumably forgotten for
`setupSourceConsumers()`.)
2023-09-28 11:22:20 -07:00
Jean-Noël Moyne
71f96881ab [FIXED] Race condition in some leader failover scenarios leading to messages being potentially sourced more than once.
- In some failure scenarios, the current leader of a stream sourcing from other stream(s) gets shut down while publications are happening on the stream(s) being sourced. This leads to `setLeader(true)` being called on the new leader for the sourcing stream before all the messages sourced by the previous leader have been completely processed. When the new leader then does its reverse scan from the last message in its view of the stream, in order to know what sequence number to start the consumer for the stream being sourced from, the last message(s) sourced by the previous leader get sourced again, leading to some messages being sourced more than once.

The existing `TestNoRaceJetStreamSuperClusterSources` test would sidestep the issue by relying on the deduplication window in the sourcing stream. Without deduplication the test is a flapper.

This avoids the race condition by adding a small delay before scanning for the last message(s) having been sourced and starting the sources' consumer(s). Now the test (without using the deduplication window) never fails because more messages than expected have been received in the sourcing stream.

- Fix test TestJetStreamWorkQueueSourceRestart that expects the sourcing stream to get all of the expected messages right away by adding a small sleep before checking the number of messages pending on the consumer for that stream.

Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
2023-09-28 10:50:54 -07:00
Neil Twigg
52b88fd94e Fix TestNoRaceJetStreamStreamInfoSubjectDetailsLimits for changes in nats.go
Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-27 11:19:13 +01:00
Derek Collison
58b5fc4abf Fix for a flapper
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-12 17:19:30 -07:00
Derek Collison
970dfab52f Need a flush to make sure INFO processed
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-11 20:39:50 -07:00
Derek Collison
e4867455c2 Fix TestNoRaceJetStreamSparseConsumers (#4503)
This is the same fix as in
https://github.com/nats-io/nats-server/pull/4500

This means it adds `StallWait` to `PublishAsync`, as that seems to be
the only reason why this test fails.

Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-09-08 10:01:24 -07:00
Tomasz Pietrek
7b54a1e6a1 Fix TestNoRaceJetStreamSparseConsumers
This is the same fix as in https://github.com/nats-io/nats-server/pull/4500

This means it adds `StallWait` to `PublishAsync`, as that seems to be the only
reason why this test fails.

Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-09-08 11:34:16 +02:00
Tomasz Pietrek
d07e8eb210 Fix TestNoRaceJetStreamInterestStreamCheckInterestRaceBug
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-09-07 19:14:31 +02:00
Neil Twigg
8de83bc2ef Use TempDir in more tests
Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-04 16:54:36 +01:00
Derek Collison
49c30b6d2f Merge branch 'main' into dev
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-31 15:52:00 -07:00
Derek Collison
afb052651a Sending too fast to have replicas be caught up enough to register direct subs
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-31 15:16:19 -07:00
Derek Collison
a45281d51f Added check to test
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-31 14:00:14 -07:00
Derek Collison
adef8281a2 Updates to the way meta indexing is handled for filestore.
Historically we kept indexing information, either by sequence or by subject, as a per msg block operation. These were the "*.idx" and "*.fss" indexing files. When streams became very large this could have an impact on recovery time. Also, for encryption the fast path for determining if the indexing was current would require loading and decrypting the complete block.

This design moves to a more traditional WAL and snapshot approach. The snapshots for the complete stream, including summary information, global per subject information maps (PSIM) and per msg block details including summary and dmap, are processed asynchronously. The snapshot includes the msg block and hash for the last record considered in the snapshot. On recovery the snapshot is read and processed and any additional records past the point of the snapshot itself are processed. To this end, any removal of a message has to be expressed as a delete tombstone that is always added to the fs.lmb file. These are processed on recovery and our indexing layer knows to skip them.

Changing to this method drastically improves startup and recovery times, and has simplified the code. Some normal performance benefits have been seen as well.

Signed-off-by: Derek Collison <derek@nats.io>
2023-08-30 16:12:45 -07:00
Waldemar Quevedo
3cec8dc451 test: fix TestNoRaceJetStreamMemstoreWithLargeInteriorDeletes flake
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-08-09 13:33:48 -07:00
Waldemar Quevedo
b081f8c2ea test: update TestNoRaceJetStreamServiceImportAccountSwapIssue flake
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-08-08 01:07:19 -07:00
Derek Collison
75e1171bdd No longer compacting multiple blocks, so remove test check
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-05 13:20:38 -07:00
Derek Collison
1f00d0e3f2 Track deleted with single avl.SeqSet dmap for now vs old method.
Size of encoding may be a bit bigger than we wanted, but still way better than old method and very fast.

Signed-off-by: Derek Collison <derek@nats.io>
2023-08-05 12:32:29 -07:00
Derek Collison
d27c44e6cd Fix another test for more efficient deleteBlocks
Signed-off-by: Derek Collison <derek@nats.io>
2023-07-30 12:02:49 -07:00
Derek Collison
cb9f8c0bf4 Fix to test for more efficient deleteBlocks
Signed-off-by: Derek Collison <derek@nats.io>
2023-07-30 11:53:18 -07:00
Neil
b22cdf18fe Add support for re-encrypting streams with new key (#4296)
This adds a new `prev_key` field to the configuration file to allow
transitioning from one encryption key to another.

Signed-off-by: Neil Twigg <neil@nats.io>
2023-07-27 10:10:08 +01:00
Derek Collison
9a8f846dbb Merge branch 'main' into dev 2023-07-26 22:22:34 -07:00
R.I.Pienaar
60e67ff9a5 Report correct consumer count in paged list response
Previously the Total in paged responses would always equal the
size of the first response; this would stall paged clients after
the first page.

Now the total is set correctly so paging continues. The test is
also improved to verify these aspects of the report.

Signed-off-by: R.I.Pienaar <rip@devco.net>
2023-07-27 07:52:24 +03:00
Neil Twigg
3df08c3f89 Add support for re-encrypting streams with new key
Signed-off-by: Neil Twigg <neil@nats.io>
2023-07-26 14:04:28 +01:00
Derek Collison
ecf0fff411 Merge branch 'main' into dev 2023-07-17 10:41:51 -07:00
Neil Twigg
2527e11304 Increase threshold in TestNoRaceJetStreamSlowFilteredInititalPendingAndFirstMsg
Signed-off-by: Neil Twigg <neil@nats.io>
2023-07-14 17:05:26 +01:00
Neil Twigg
1527000d1f Use crypto/rand.Read instead of math/rand.Read
As of Go 1.20, `math/rand.Read` is deprecated. In addition to that, it also
isn't recommended for use in combination with anything cryptographic.

I haven't replaced all `math/rand` with `crypto/rand` imports because there
are still some legitimate uses for the `math/rand` package in some places.

Signed-off-by: Neil Twigg <neil@nats.io>
2023-07-13 12:04:58 +01:00
Derek Collison
4d7cd26956 Add in support for segmented binary stream snapshots.
Streams with many interior deletes were causing issues due to the fact that the interior deletes were represented as a sorted []uint64.
This approach introduces 3 subtypes of delete blocks: an AVL bitmask tree, a run-length encoding, and the legacy format above.
We also take into account large interior deletes such that on receiving a snapshot we can skip things we already know about.

Signed-off-by: Derek Collison <derek@nats.io>
2023-07-03 08:41:33 -07:00
Derek Collison
3501ca3c1f Merge branch 'main' into dev 2023-06-15 17:49:19 -07:00
Derek Collison
087a28a13e When creating replicated mirrors where the source stream had a very large starting sequence number, the server would use excessive CPU and memory.
This is due to the mirroring functionality trying to skip messages when it detects a gap. In a replicated stream this puts excessive stress on the raft system.
This step is not needed at all if the mirror stream has no messages; we can simply jump ahead.

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-15 17:20:15 -07:00
Derek Collison
a1f03513d8 Merge branch 'main' into dev 2023-06-09 09:29:13 -07:00
Derek Collison
9eeffbcf56 Fix performance issues with checkAckFloor.
Bail early if new consumer, meaning stream sequence floor is 0.
Decide which linear space to scan.
Do no work if there is nothing pending and we just need to adjust, which we do at the end.

Also realized some tests were named wrong and were not being run, or were in wrong file.

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-08 18:45:03 -07:00
Derek Collison
2f2440f270 Merge branch 'main' into dev 2023-05-09 20:11:53 -07:00
Neil Twigg
d7ae2cbb5f Backport #4120 to main
Signed-off-by: Neil Twigg <neil@nats.io>
2023-05-09 11:24:35 +01:00
Ivan Kozlovic
95e4f2dfe1 Fixed accounts configuration reload
Issues could manifest with subscription interest not properly
propagated.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-05-03 14:35:06 -06:00
Derek Collison
e158c46884 Merge branch 'main' into dev 2023-04-30 17:37:47 -07:00
Derek Collison
c15cc0054a When a fleet of leafnodes are isolated (not routed but using same cluster) we could do better at optimizing how we update the other leafnodes.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-30 17:08:16 -07:00
Ivan Kozlovic
70af04a63f Other flappers.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-28 11:22:04 -06:00
Ivan Kozlovic
73ed55ae5b Fixed flapper
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-28 10:55:32 -06:00
Derek Collison
4ebdb69daf Merge branch 'main' into dev 2023-04-26 11:34:37 -07:00
Derek Collison
3340179b97 Fix flapper
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-24 22:22:27 -07:00
Derek Collison
1f6aa94405 SequenceSet is an AVL tree with variable bitmask nodes to contain large delete maps for streams.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-17 20:17:03 -07:00
Derek Collison
dfeac4a214 Merge branch 'main' into dev 2023-04-09 19:31:01 -07:00
Derek Collison
aee73a9c77 Fix flapping test
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-08 21:58:54 -07:00
Derek Collison
ffc49b8f86 Fix flapping test and data race in test
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-08 08:13:31 -07:00
Derek Collison
c5e19e19e7 Merge branch 'main' into dev 2023-04-03 21:22:53 -07:00
Derek Collison
07b34f707f Make sure to never process next message requests inline
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-03 20:50:01 -07:00
Ivan Kozlovic
fe5d6bede4 Fixed accounts configuration reload
Issues could manifest with subscription interest not properly
propagated.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-03 09:32:28 -06:00
Ivan Kozlovic
105237cba8 [ADDED] Multiple routes and ability to have per-account routes
New configuration fields:
```
cluster {
   ...
   pool_size: 5
   accounts: ["A", "B"]
}
```

The configuration `pool_size` in the example above means that this
server will create 5 routes to a remote server, assuming that that
server has the same `pool_size` setting.

Accounts (which are not part of the `accounts[]` configuration)
are assigned a specific route in this pool, and this will be the
same route on all servers in the cluster.

Accounts that are defined in the `accounts` field will each have
a dedicated route connection. This will allow suppression of the
account name in some of the route protocols, reducing bytes transmitted
which may increase performance.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-03 09:32:25 -06:00
Derek Collison
94278e731a More tweaks to test due to slow network proxy being more accurate
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 19:57:34 -07:00