nats-server

mirror of https://github.com/gogrlx/nats-server.git synced 2026-04-02 03:38:42 -07:00

Author	SHA1	Message	Date
Derek Collison	dd646f6b71	Set initial min on dmap caused subtle bugs with dmap. Some minor cleanup. Signed-off-by: Derek Collison <derek@nats.io>	2023-10-06 09:42:09 -07:00
Derek Collison	783edaa36d	[FIXED] Race condition in some leader failover scenarios leading to messages being potentially sourced more than once. (#4592) - [X] Tests added - [X] Branch rebased on top of current main (`git pull --rebase origin main`) - [X] Changes squashed to a single commit (described [here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html)) - [x] Build is green in Travis CI - [X] You have certified that the contribution is your original work and that you license the work to the project under the [Apache 2 license](https://github.com/nats-io/nats-server/blob/main/LICENSE) ### Changes proposed in this pull request: Fixes a race condition in some leader failover scenarios leading to messages being potentially sourced more than once. In some failure scenarios, the current leader of a stream sourcing from other stream(s) is shut down while publications are happening on the stream(s) being sourced. `setLeader(true)` can then be called on the new leader for the sourcing stream before all the messages sourced by the previous leader have been completely processed. When the new leader does its reverse scan from the last message in its view of the stream, to determine the sequence number at which to start the consumer for the stream being sourced from, the last message(s) sourced by the previous leader get sourced again, so some messages are sourced more than once. The existing `TestNoRaceJetStreamSuperClusterSources` test would sidestep the issue by relying on the deduplication window in the sourcing stream. Without deduplication the test is a flapper. This avoids the race condition by adding a small delay before scanning for the last message(s) having been sourced and starting the sources' consumer(s). Now the test (without using the deduplication window) never fails because more messages than expected have been received in the sourcing stream. 
(Also adds a guard to give up if `setupSourceConsumers()` is called and we are no longer the leader for the stream (that check was already present in `setupMirrorConsumer()` so assuming it was forgotten for `setupSourceConsumers()`))	2023-09-28 11:22:20 -07:00
Jean-Noël Moyne	71f96881ab	[FIXED] Race condition in some leader failover scenarios leading to messages being potentially sourced more than once. - In some failure scenarios, the current leader of a stream sourcing from other stream(s) is shut down while publications are happening on the stream(s) being sourced. `setLeader(true)` can then be called on the new leader for the sourcing stream before all the messages sourced by the previous leader have been completely processed. When the new leader does its reverse scan from the last message in its view of the stream, to determine the sequence number at which to start the consumer for the stream being sourced from, the last message(s) sourced by the previous leader get sourced again, so some messages are sourced more than once. The existing `TestNoRaceJetStreamSuperClusterSources` test would sidestep the issue by relying on the deduplication window in the sourcing stream. Without deduplication the test is a flapper. This avoids the race condition by adding a small delay before scanning for the last message(s) having been sourced and starting the sources' consumer(s). Now the test (without using the deduplication window) never fails because more messages than expected have been received in the sourcing stream. - Fix test TestJetStreamWorkQueueSourceRestart, which expects the sourcing stream to get all of the expected messages right away, by adding a small sleep before checking the number of messages pending on the consumer for that stream. Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>	2023-09-28 10:50:54 -07:00
Neil Twigg	52b88fd94e	Fix `TestNoRaceJetStreamStreamInfoSubjectDetailsLimits` for changes in nats.go Signed-off-by: Neil Twigg <neil@nats.io>	2023-09-27 11:19:13 +01:00
Derek Collison	58b5fc4abf	Fix for a flapper Signed-off-by: Derek Collison <derek@nats.io>	2023-09-12 17:19:30 -07:00
Derek Collison	970dfab52f	Need a flush to make sure INFO processed Signed-off-by: Derek Collison <derek@nats.io>	2023-09-11 20:39:50 -07:00
Derek Collison	e4867455c2	Fix TestNoRaceJetStreamSparseConsumers (#4503) This is the same fix as in https://github.com/nats-io/nats-server/pull/4500 This means it adds `StallWait` to `PublishAsync`, as that seems to be the only reason why this test fails. Signed-off-by: Tomasz Pietrek <tomasz@nats.io>	2023-09-08 10:01:24 -07:00
Tomasz Pietrek	7b54a1e6a1	Fix TestNoRaceJetStreamSparseConsumers This is the same fix as in https://github.com/nats-io/nats-server/pull/4500 This means it adds `StallWait` to `PublishAsync`, as that seems to be the only reason why this test fails. Signed-off-by: Tomasz Pietrek <tomasz@nats.io>	2023-09-08 11:34:16 +02:00
Tomasz Pietrek	d07e8eb210	Fix TestNoRaceJetStreamInterestStreamCheckInterestRaceBug Signed-off-by: Tomasz Pietrek <tomasz@nats.io>	2023-09-07 19:14:31 +02:00
Neil Twigg	8de83bc2ef	Use `TempDir` in more tests Signed-off-by: Neil Twigg <neil@nats.io>	2023-09-04 16:54:36 +01:00
Derek Collison	49c30b6d2f	Merge branch 'main' into dev Signed-off-by: Derek Collison <derek@nats.io>	2023-08-31 15:52:00 -07:00
Derek Collison	afb052651a	Sending too fast to have replicas be caught up enough to register direct subs Signed-off-by: Derek Collison <derek@nats.io>	2023-08-31 15:16:19 -07:00
Derek Collison	a45281d51f	Added check to test Signed-off-by: Derek Collison <derek@nats.io>	2023-08-31 14:00:14 -07:00
Derek Collison	adef8281a2	Updates to the way meta indexing is handled for filestore. Historically we kept indexing information, either by sequence or by subject, as a per msg block operation. These were the ".idx" and ".fss" indexing files. When streams became very large this could have an impact on recovery time. Also, for encryption the fast path for determining if the indexing was current would require loading and decrypting the complete block. This design moves to a more traditional WAL and snapshot approach. The snapshots for the complete stream, including summary information, global per subject information maps (PSIM) and per msg block details including summary and dmap, are processed asynchronously. The snapshot includes the msg block and hash for the last record considered in the snapshot. On recovery the snapshot is read and processed and any additional records past the point of the snapshot itself are processed. To this end, any removal of a message has to be expressed as a delete tombstone that is always added to the fs.lmb file. These are processed on recovery and our indexing layer knows to skip them. Changing to this method drastically improves startup and recovery times, and has simplified the code. Some normal performance benefits have been seen as well. Signed-off-by: Derek Collison <derek@nats.io>	2023-08-30 16:12:45 -07:00
Waldemar Quevedo	3cec8dc451	test: fix TestNoRaceJetStreamMemstoreWithLargeInteriorDeletes flake Signed-off-by: Waldemar Quevedo <wally@nats.io>	2023-08-09 13:33:48 -07:00
Waldemar Quevedo	b081f8c2ea	test: update TestNoRaceJetStreamServiceImportAccountSwapIssue flake Signed-off-by: Waldemar Quevedo <wally@nats.io>	2023-08-08 01:07:19 -07:00
Derek Collison	75e1171bdd	No longer compacting multiple blocks, so remove test check Signed-off-by: Derek Collison <derek@nats.io>	2023-08-05 13:20:38 -07:00
Derek Collison	1f00d0e3f2	Track deleted with single avl.SeqSet dmap for now vs old method. Size of encoding may be a bit bigger than we wanted, but still way better than old method and very fast. Signed-off-by: Derek Collison <derek@nats.io>	2023-08-05 12:32:29 -07:00
Derek Collison	d27c44e6cd	Fix another test for more efficient deleteBlocks Signed-off-by: Derek Collison <derek@nats.io>	2023-07-30 12:02:49 -07:00
Derek Collison	cb9f8c0bf4	Fix to test for more efficient deleteBlocks Signed-off-by: Derek Collison <derek@nats.io>	2023-07-30 11:53:18 -07:00
Neil	b22cdf18fe	Add support for re-encrypting streams with new key (#4296) This adds a new `prev_key` field to the configuration file to allow transitioning from one encryption key to another. Signed-off-by: Neil Twigg <neil@nats.io>	2023-07-27 10:10:08 +01:00
Derek Collison	9a8f846dbb	Merge branch 'main' into dev	2023-07-26 22:22:34 -07:00
R.I.Pienaar	60e67ff9a5	Report correct consumer count in paged list response Previously the Total in paged responses would always equal the size of the first response, which would stall paged clients after the first page. Now correctly sets the total so paging continues, and improves the test to verify these aspects of the report. Signed-off-by: R.I.Pienaar <rip@devco.net>	2023-07-27 07:52:24 +03:00
Neil Twigg	3df08c3f89	Add support for re-encrypting streams with new key Signed-off-by: Neil Twigg <neil@nats.io>	2023-07-26 14:04:28 +01:00
Derek Collison	ecf0fff411	Merge branch 'main' into dev	2023-07-17 10:41:51 -07:00
Neil Twigg	2527e11304	Increase threshold in `TestNoRaceJetStreamSlowFilteredInititalPendingAndFirstMsg` Signed-off-by: Neil Twigg <neil@nats.io>	2023-07-14 17:05:26 +01:00
Neil Twigg	1527000d1f	Use `crypto/rand.Read` instead of `math/rand.Read` As of Go 1.20, `math/rand.Read` is deprecated. In addition to that, it also isn't recommended for use in combination with anything cryptographic. I haven't replaced all `math/rand` with `crypto/rand` imports because there are still some legitimate uses for the `math/rand` package in some places. Signed-off-by: Neil Twigg <neil@nats.io>	2023-07-13 12:04:58 +01:00
Derek Collison	4d7cd26956	Add in support for segmented binary stream snapshots. Streams with many interior deletes were causing issues due to the fact that the interior deletes were represented as a sorted []uint64. This approach introduces 3 sub types of delete blocks: an avl bitmask tree, a run length encoding, and the legacy format above. We also take into account large interior deletes such that on receiving a snapshot we can skip things we already know about. Signed-off-by: Derek Collison <derek@nats.io>	2023-07-03 08:41:33 -07:00
Derek Collison	3501ca3c1f	Merge branch 'main' into dev	2023-06-15 17:49:19 -07:00
Derek Collison	087a28a13e	When creating replicated mirrors where the source stream had a very large starting sequence number, the server would use excessive CPU and Memory. This is due to the mirroring functionality trying to skip messages when it detects a gap. In a replicated stream this puts excessive stress on the raft system. This step is not needed at all if the mirror stream has no messages, we can simply jump ahead. Signed-off-by: Derek Collison <derek@nats.io>	2023-06-15 17:20:15 -07:00
Derek Collison	a1f03513d8	Merge branch 'main' into dev	2023-06-09 09:29:13 -07:00
Derek Collison	9eeffbcf56	Fix performance issues with checkAckFloor. Bail early if new consumer, meaning stream sequence floor is 0. Decide which linear space to scan. Do no work if no pending and we just need to adjust which we do at the end. Also realized some tests were named wrong and were not being run, or were in wrong file. Signed-off-by: Derek Collison <derek@nats.io>	2023-06-08 18:45:03 -07:00
Derek Collison	2f2440f270	Merge branch 'main' into dev	2023-05-09 20:11:53 -07:00
Neil Twigg	d7ae2cbb5f	Backport #4120 to `main` Signed-off-by: Neil Twigg <neil@nats.io>	2023-05-09 11:24:35 +01:00
Ivan Kozlovic	95e4f2dfe1	Fixed accounts configuration reload Issues could manifest with subscription interest not properly propagated. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2023-05-03 14:35:06 -06:00
Derek Collison	e158c46884	Merge branch 'main' into dev	2023-04-30 17:37:47 -07:00
Derek Collison	c15cc0054a	When a fleet of leafnodes are isolated (not routed but using same cluster) we could do better at optimizing how we update the other leafnodes. Signed-off-by: Derek Collison <derek@nats.io>	2023-04-30 17:08:16 -07:00
Ivan Kozlovic	70af04a63f	Other flappers. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2023-04-28 11:22:04 -06:00
Ivan Kozlovic	73ed55ae5b	Fixed flapper Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2023-04-28 10:55:32 -06:00
Derek Collison	4ebdb69daf	Merge branch 'main' into dev	2023-04-26 11:34:37 -07:00
Derek Collison	3340179b97	Fix flapper Signed-off-by: Derek Collison <derek@nats.io>	2023-04-24 22:22:27 -07:00
Derek Collison	1f6aa94405	SequenceSet is an AVL tree with variable bitmask nodes to contain large delete maps for streams. Signed-off-by: Derek Collison <derek@nats.io>	2023-04-17 20:17:03 -07:00
Derek Collison	dfeac4a214	Merge branch 'main' into dev	2023-04-09 19:31:01 -07:00
Derek Collison	aee73a9c77	Fix flapping test Signed-off-by: Derek Collison <derek@nats.io>	2023-04-08 21:58:54 -07:00
Derek Collison	ffc49b8f86	Fix flapping test and data race in test Signed-off-by: Derek Collison <derek@nats.io>	2023-04-08 08:13:31 -07:00
Derek Collison	c5e19e19e7	Merge branch 'main' into dev	2023-04-03 21:22:53 -07:00
Derek Collison	07b34f707f	Make sure to never process next message requests inline Signed-off-by: Derek Collison <derek@nats.io>	2023-04-03 20:50:01 -07:00
Ivan Kozlovic	fe5d6bede4	Fixed accounts configuration reload Issues could manifest with subscription interest not properly propagated. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2023-04-03 09:32:28 -06:00
Ivan Kozlovic	105237cba8	[ADDED] Multiple routes and ability to have per-account routes New configuration fields: ``` cluster { ... pool_size: 5 accounts: ["A", "B"] } ``` The configuration `pool_size` in the example above means that this server will create 5 routes to a remote server, assuming that that server has the same `pool_size` setting. Accounts (which are not part of the `accounts[]` configuration) are assigned a specific route in this pool, and this will be the same route on all servers in the cluster. Accounts that are defined in the `accounts` field will each have a dedicated route connection. This will allow suppression of the account name in some of the route protocols, reducing bytes transmitted which may increase performance. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2023-04-03 09:32:25 -06:00
Derek Collison	94278e731a	More tweaks to test due to slow network proxy being more accurate Signed-off-by: Derek Collison <derek@nats.io>	2023-04-02 19:57:34 -07:00
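
Commit 4d7cd26956 above names run length encoding as one of the delete-block sub types that replace a sorted `[]uint64` of interior deletes. The sketch below illustrates the general idea only and is not the server's actual encoding or wire format; the names `run`, `encodeRuns`, and `decodeRuns` are hypothetical and do not come from nats-server.

```go
package main

import "fmt"

// run is a hypothetical type for this sketch: `count` consecutive
// deleted sequence numbers beginning at `first`.
type run struct {
	first uint64
	count uint64
}

// encodeRuns collapses a sorted, duplicate-free slice of deleted
// sequence numbers into runs of consecutive values.
func encodeRuns(deleted []uint64) []run {
	var runs []run
	for _, seq := range deleted {
		// Extend the last run when seq directly follows it.
		if n := len(runs); n > 0 && runs[n-1].first+runs[n-1].count == seq {
			runs[n-1].count++
			continue
		}
		runs = append(runs, run{first: seq, count: 1})
	}
	return runs
}

// decodeRuns expands runs back into the sorted slice form.
func decodeRuns(runs []run) []uint64 {
	var out []uint64
	for _, r := range runs {
		for i := uint64(0); i < r.count; i++ {
			out = append(out, r.first+i)
		}
	}
	return out
}

func main() {
	deleted := []uint64{3, 4, 5, 9, 11, 12}
	runs := encodeRuns(deleted)
	fmt.Println(runs)             // [{3 3} {9 1} {11 2}]
	fmt.Println(decodeRuns(runs)) // [3 4 5 9 11 12]
}
```

Long contiguous gaps shrink to a single pair under this scheme, while scattered single deletes gain nothing. That trade-off is presumably why the commit keeps more than one sub type.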


228 Commits