514 Commits

Author SHA1 Message Date
Derek Collison
783edaa36d [FIXED] Race condition in some leader failover scenarios leading to messages being potentially sourced more than once. (#4592)
- [X] Tests added
- [X] Branch rebased on top of current main (`git pull --rebase origin
main`)
- [X] Changes squashed to a single commit (described
[here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
 - [x] Build is green in Travis CI
- [X] You have certified that the contribution is your original work and
that you license the work to the project under the [Apache 2
license](https://github.com/nats-io/nats-server/blob/main/LICENSE)

### Changes proposed in this pull request:

Fixes a race condition in some leader failover scenarios leading to
messages being potentially sourced more than once.

In some failure scenarios, the current leader of a stream sourcing from
other stream(s) gets shut down while publications are happening on the
stream(s) being sourced. `setLeader(true)` can then be called on the new
leader for the sourcing stream before all the messages sourced by the
previous leader have been completely processed. When the new leader does
its reverse scan from the last message in its view of the stream, in
order to know what sequence number to start the consumer for the stream
being sourced from, it does not yet see those messages, so the last
message(s) sourced by the previous leader get sourced again, leading to
some messages being sourced more than once.

The existing `TestNoRaceJetStreamSuperClusterSources` test would
sidestep the issue by relying on the deduplication window in the
sourcing stream. Without deduplication the test is a flapper.

This avoids the race condition by adding a small delay before scanning
for the last message(s) having been sourced and starting the sources'
consumer(s). Now the test (without using the deduplication window) never
fails because more messages than expected have been received in the
sourcing stream.

(Also adds a guard to give up if `setupSourceConsumers()` is called and
we are no longer the leader for the stream. That check was already
present in `setupMirrorConsumer()`, so presumably it was forgotten for
`setupSourceConsumers()`.)
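
A minimal Go sketch of the reverse scan and the leader guard described above. It is not the server's code: `storedMsg`, `startSeqFor` and the simplified `setupSourceConsumers` signature are hypothetical stand-ins used only to show why scanning too early picks a start sequence that is too low.

```go
package main

import "fmt"

// storedMsg is a hypothetical, simplified view of a message in the sourcing
// stream: which upstream stream it came from and its sequence there.
type storedMsg struct {
	source    string
	sourceSeq uint64
}

// startSeqFor scans the local stream backwards and returns the upstream
// sequence at which a new source consumer should start for the given source.
func startSeqFor(msgs []storedMsg, source string) uint64 {
	for i := len(msgs) - 1; i >= 0; i-- {
		if msgs[i].source == source {
			return msgs[i].sourceSeq + 1
		}
	}
	return 1 // nothing sourced yet, start from the beginning
}

// setupSourceConsumers is a hypothetical stand-in: it gives up when the
// caller is no longer the leader, otherwise it returns the start sequence.
func setupSourceConsumers(isLeader bool, msgs []storedMsg, source string) (uint64, bool) {
	if !isLeader {
		return 0, false
	}
	return startSeqFor(msgs, source), true
}

func main() {
	// The previous leader sourced upstream sequences 1..3, but only 1 and 2
	// have been applied locally when the new leader scans too early.
	partial := []storedMsg{{"ORDERS", 1}, {"ORDERS", 2}}
	full := append(partial, storedMsg{"ORDERS", 3})

	early, _ := setupSourceConsumers(true, partial, "ORDERS")
	late, _ := setupSourceConsumers(true, full, "ORDERS")
	fmt.Println(early, late) // 3 4: the early scan would source sequence 3 again

	_, ok := setupSourceConsumers(false, full, "ORDERS")
	fmt.Println(ok) // false: not the leader, give up
}
```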
2023-09-28 11:22:20 -07:00
Jean-Noël Moyne
71f96881ab [FIXED] Race condition in some leader failover scenarios leading to messages being potentially sourced more than once.
- In some failure scenarios, the current leader of a stream sourcing from other stream(s) gets shut down while publications are happening on the stream(s) being sourced. `setLeader(true)` can then be called on the new leader for the sourcing stream before all the messages sourced by the previous leader have been completely processed. When the new leader does its reverse scan from the last message in its view of the stream, in order to know what sequence number to start the consumer for the stream being sourced from, it does not yet see those messages, so the last message(s) sourced by the previous leader get sourced again, leading to some messages being sourced more than once.

The existing `TestNoRaceJetStreamSuperClusterSources` test would sidestep the issue by relying on the deduplication window in the sourcing stream. Without deduplication the test is a flapper.

This avoids the race condition by adding a small delay before scanning for the last message(s) having been sourced and starting the sources' consumer(s). Now the test (without using the deduplication window) never fails because more messages than expected have been received in the sourcing stream.

- Fix test TestJetStreamWorkQueueSourceRestart that expects the sourcing stream to get all of the expected messages right away by adding a small sleep before checking the number of messages pending on the consumer for that stream.

Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
2023-09-28 10:50:54 -07:00
Derek Collison
aeef0eff53 Add in warnings for filestore recover state if happy path fails.
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-27 16:22:15 -07:00
Ivan Kozlovic
ca2a961fa7 [FIXED] JetStream: stream assignment data race
Two go routines could possibly execute the stream assignment at
the same time. A WaitGroup was used to prevent that, but an issue
caused the data race and possible concurrent execution.
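
As an illustration only, one common way to keep two goroutines from running the same assignment step concurrently is to hold a lock for the whole step. The `stream` type and `setAssignment` method below are hypothetical and do not reflect how the server implements the fix.

```go
package main

import (
	"fmt"
	"sync"
)

// stream is a hypothetical stand-in holding an assignment guarded by a mutex.
type stream struct {
	mu         sync.Mutex
	assignment string
	applied    int
}

// setAssignment applies an assignment while holding the lock, so two
// goroutines cannot execute the body at the same time.
func (s *stream) setAssignment(a string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.assignment = a
	s.applied++
}

func main() {
	s := &stream{}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.setAssignment(fmt.Sprintf("sa-%d", i))
		}(i)
	}
	wg.Wait() // all goroutines are done, so reading applied is safe here
	fmt.Println(s.applied) // 100
}
```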

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-09-25 16:11:09 -06:00
Jean-Noël Moyne
9fc2603263 Removes the single subject transform dest field from StreamSource
Co-authored-by: Jean-Noël Moyne <jnmoyne@gmail.com>
Co-authored-by: Neil Twigg <neil@nats.io>

Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-20 15:28:45 +01:00
Jean-Noël Moyne
40ce0a9d7e Use filter_subject when calling extended consumer create API
The server consumer creation code is picky and does not accept a request sent to the ExT subject if that request specifies the subject filter in the array (even if there is only one entry in the array).
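
A hedged Go sketch of the client-side adjustment this implies: when the filter array holds exactly one entry, move it to the single filter field before building the request. The `consumerConfig` struct and `normalizeFilters` helper are hypothetical; only the `filter_subject` and `filter_subjects` JSON field names are taken from the consumer configuration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// consumerConfig is a hypothetical, trimmed-down consumer configuration.
type consumerConfig struct {
	Name           string   `json:"name,omitempty"`
	FilterSubject  string   `json:"filter_subject,omitempty"`
	FilterSubjects []string `json:"filter_subjects,omitempty"`
}

// normalizeFilters moves a single-entry filter array into FilterSubject so
// the request carries the filter in the single-filter form.
func normalizeFilters(cfg consumerConfig) consumerConfig {
	if cfg.FilterSubject == "" && len(cfg.FilterSubjects) == 1 {
		cfg.FilterSubject = cfg.FilterSubjects[0]
		cfg.FilterSubjects = nil
	}
	return cfg
}

func main() {
	cfg := normalizeFilters(consumerConfig{
		Name:           "c1",
		FilterSubjects: []string{"orders.new"},
	})
	b, err := json.Marshal(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // {"name":"c1","filter_subject":"orders.new"}
}
```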

Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
Signed-off-by: Neil Twigg <neil@nats.io>

Co-authored-by: Jean-Noël Moyne <jnmoyne@gmail.com>
Co-authored-by: Neil Twigg <neil@nats.io>
2023-09-20 10:51:19 +01:00
Neil Twigg
ad63d702c4 Use new consumer create subject when single subject filter specified in SubjectFilters
Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-20 10:51:19 +01:00
Neil
ecbfac862c Ignore subject_transform_dest in stream sourcing (#4558)
This is a safer (fewer lines of code touched) alternative to #4557 for
now, which simply ignores the `subject_transform_dest` field in the API
and the stream assignments. We'll still look to merge the other PR to
clean up but will do so post-release when we have more time to test it.

Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-19 18:10:01 +01:00
Neil Twigg
887b92bfe2 Fix data race in setStreamAssignment by ensuring JS lock held
Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-19 17:42:25 +01:00
Neil Twigg
dff12e465e Ignore subject_transform_dest in stream sourcing
Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-19 11:35:26 +01:00
Neil Twigg
6f3f544841 Fix leaking timers in stream sources
Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-14 10:30:24 +01:00
Derek Collison
349158a349 Fix for datarace accessing mirror tr
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-11 21:12:15 -07:00
Derek Collison
7d041da3c8 Fix for datarace on clfs
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-10 11:07:27 -07:00
Derek Collison
11f0ea99a4 Merge branch 'main' into dev
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-06 13:33:47 -07:00
Waldemar Quevedo
e1574eca3e Revert "Enables 0s deduplication window duration when the stream has sources (#4476)"
This reverts commit db96238ad9.
2023-09-06 11:51:38 -07:00
Derek Collison
e7e8a330d4 Allow sync intervals to be set and the ability to have all data writes synchronous.
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-04 11:05:13 -07:00
Derek Collison
1bb4a71a4d Merge branch 'main' into dev
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-02 12:15:40 -07:00
Jean-Noël Moyne
db96238ad9 Enables 0s deduplication window duration when the stream has sources (#4476)
- [X] Link to issue, e.g. `Resolves #NNN`
- [X] Branch rebased on top of current main (`git pull --rebase origin
main`)
- [X] Changes squashed to a single commit (described
[here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
 - [X] Build is green in Travis CI
- [X] You have certified that the contribution is your original work and
that you license the work to the project under the [Apache 2
license](https://github.com/nats-io/nats-server/blob/main/LICENSE)

Resolves #4459

Allows the user to set the deduplication window duration to 0s when the
stream has sources defined. Remember that if the stream in question is
also listening on subjects as well as sourcing, the deduplication window
is the same for sourced and listened messages.

---------

Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
2023-09-01 12:47:14 -07:00
Derek Collison
ad380d48f2 Merge branch 'main' into dev
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-01 11:19:33 -07:00
Derek Collison
3a39786972 When we fail to deliver a message for a consumer, either through didNotDeliver() or LoadMsg() failure re-adjust delivered count and waitingRequest accounting.
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-01 08:48:28 -07:00
Neil Twigg
487f58f16e Consumers inherit limits for max_ack_pending and inactive_threshold from stream
Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-01 10:54:11 +01:00
Derek Collison
adef8281a2 Updates to the way meta indexing is handled for filestore.
Historically we kept indexing information, either by sequence or by subject, as a per msg block operation. These were the "*.idx" and "*.fss" indexing files. When streams became very large this could have an impact on recovery time. Also, for encryption the fast path for determining if the indexing was current would require loading and decrypting the complete block.

This design moves to a more traditional WAL and snapshot approach. The snapshots for the complete stream, including summary information, global per subject information maps (PSIM) and per msg block details including summary and dmap, are processed asynchronously. The snapshot includes the msg block and hash for the last record considered in the snapshot. On recovery the snapshot is read and processed, and any additional records past the point of the snapshot itself are processed. To this end, any removal of a message has to be expressed as a delete tombstone that is always added to the fs.lmb file. These are processed on recovery and our indexing layer knows to skip them.

Changing to this method drastically improves startup and recovery times, and has simplified the code. Some normal performance benefits have been seen as well.
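
The recovery flow described above can be sketched in Go as follows. This is an illustration under stated assumptions, not the filestore implementation: `record`, `snapshot` and `recoverState` are hypothetical, and the on-disk format, hashing and PSIM details are left out.

```go
package main

import "fmt"

// record is a hypothetical log entry written after the snapshot point:
// either a stored message or a delete tombstone for an earlier sequence.
type record struct {
	seq       uint64
	tombstone bool
}

// snapshot is a hypothetical summary of the stream at snapshot time.
type snapshot struct {
	lastSeq uint64
	live    map[uint64]bool // sequences still present in the snapshot
}

// recoverState rebuilds the set of live sequences from the snapshot plus
// the records written after it, applying tombstones as deletes.
func recoverState(snap snapshot, tail []record) map[uint64]bool {
	live := make(map[uint64]bool, len(snap.live))
	for seq := range snap.live {
		live[seq] = true
	}
	for _, r := range tail {
		if r.tombstone {
			delete(live, r.seq) // the removal is expressed as a tombstone
			continue
		}
		live[r.seq] = true
	}
	return live
}

func main() {
	snap := snapshot{lastSeq: 3, live: map[uint64]bool{1: true, 2: true, 3: true}}
	tail := []record{
		{seq: 4},
		{seq: 2, tombstone: true}, // message 2 removed after the snapshot
		{seq: 5},
	}
	live := recoverState(snap, tail)
	fmt.Println(len(live), live[2]) // 4 false
}
```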

Signed-off-by: Derek Collison <derek@nats.io>
2023-08-30 16:12:45 -07:00
Jean-Noël Moyne
003daf3db8 Fixes possible message duplication in sourcing streams if upgrading to 2.10 and then back down to 2.9
2.10 increases the number of space-separated fields in the sourcing message header from 2 to 4, but the current 2.9 code is too strict: it checks that the number of fields is exactly 2 rather than at least 2.
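
A small Go sketch of the tolerant check, assuming (for illustration only) that the first field is a stream name and the second a sequence number. `parseSourceHeader` is a hypothetical helper, not the server's parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSourceHeader reads the first two space-separated fields of a sourcing
// header and ignores any extra fields added by newer versions.
func parseSourceHeader(h string) (name string, seq uint64, ok bool) {
	fields := strings.Fields(h)
	if len(fields) < 2 { // "at least 2", not "exactly 2"
		return "", 0, false
	}
	seq, err := strconv.ParseUint(fields[1], 10, 64)
	if err != nil {
		return "", 0, false
	}
	return fields[0], seq, true
}

func main() {
	name, seq, ok := parseSourceHeader("ORDERS 42")
	fmt.Println(name, seq, ok) // ORDERS 42 true

	// A header with 4 fields is still accepted; the extras are ignored.
	name, seq, ok = parseSourceHeader("ORDERS 42 extra1 extra2")
	fmt.Println(name, seq, ok) // ORDERS 42 true
}
```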

Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
2023-08-30 15:27:26 -07:00
Jean-Noël Moyne
62f62d4071 Adds sfs to sourceInfo
Adds sfs to SourceInfo such that transforms with just a subject filter (and no transformation, meaning that the transform pointer in streamInfo is nil) can still be reflected in SourceInfo, which is important since the filtering is still happening, just no transformation as well.

Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
2023-08-19 12:26:42 -07:00
Jean-Noël Moyne
0cc43acb84 Fix Nats-Stream-Source header parsing when using multi-filter transforms
Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
2023-08-15 19:22:09 -07:00
Jean-Noël Moyne
c2d3ef1021 Fix potential out of range for stream source transform update.
Clean up an unneeded if statement: it's OK to call NewSubjectTransform with an empty destination (i.e. no transformation), as it will return nil.

Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
2023-08-15 16:35:19 -07:00
Jean-Noël Moyne
b839c53abc [ADDED] Full StreamSource (filters, transforms) functionality to stream mirror (#4354)
- [X] Tests added
- [X] Branch rebased on top of current main (`git pull --rebase origin
main`)
- [X] Changes squashed to a single commit (described
[here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
 - [x] Build is green in Travis CI
- [X] You have certified that the contribution is your original work and
that you license the work to the project under the [Apache 2
license](https://github.com/nats-io/nats-server/blob/main/LICENSE)

Follow up to #4276 extending to Mirror the full StreamSource
functionality.

---------

Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
2023-08-12 15:17:48 -07:00
Neil Twigg
3c9c124b94 When checking replica count when updating retention, make sure stream assignment is set first
Signed-off-by: Neil Twigg <neil@nats.io>
2023-08-11 14:15:49 +01:00
Neil Twigg
d7f76da597 Allow switching from limits-based to interest-based retention in stream update
Signed-off-by: Neil Twigg <neil@nats.io>
2023-08-09 11:46:49 +01:00
Derek Collison
8079495903 Merge branch 'main' into dev
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-04 10:15:35 -07:00
Derek Collison
cbe85c826a Also reset clseq to avoid immediate sequence mismatch
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-03 12:40:17 -07:00
Derek Collison
081140ee67 When taking over make sure to sync and reset clfs for clustered streams.
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-03 10:41:10 -07:00
Derek Collison
42752ec551 Merge branch 'main' into dev
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-01 21:46:54 -07:00
Derek Collison
5c8db89506 Make sure we do not drift on accounting.
Three issues were found and resolved.

1. Purge replays after recovery could execute full purge.
2. Callback was registered without lock, which could lead to skew.
3. Cluster reset could stop stream store and recreate it, which could lead to double accounting.

Signed-off-by: Derek Collison <derek@nats.io>
2023-08-01 18:35:20 -07:00
Jean-Noël Moyne
449b27535e [ADDED] Support for multi-filter in stream sources (#4276)
- [X] Tests added
- [X] Branch rebased on top of current main (`git pull --rebase origin
main`)
- [X] Changes squashed to a single commit (described
[here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
 - [X] Build is green in Travis CI
- [X] You have certified that the contribution is your original work and
that you license the work to the project under the [Apache 2
license](https://github.com/nats-io/nats-server/blob/main/LICENSE)

### Changes proposed in this pull request:

Adds support for multi-filter (and associated transform destinations) to
stream sources

---------

Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
2023-08-01 10:50:11 -07:00
Neil Twigg
3c3ad47dd1 Prevent configuring first_seq on mirrors
Signed-off-by: Neil Twigg <neil@nats.io>
2023-07-28 17:08:11 +01:00
Neil
b22cdf18fe Add support for re-encrypting streams with new key (#4296)
This adds a new `prev_key` field to the configuration file to allow
transitioning from one encryption key to another.

Signed-off-by: Neil Twigg <neil@nats.io>
2023-07-27 10:10:08 +01:00
Derek Collison
9a8f846dbb Merge branch 'main' into dev 2023-07-26 22:22:34 -07:00
Neil Twigg
3df08c3f89 Add support for re-encrypting streams with new key
Signed-off-by: Neil Twigg <neil@nats.io>
2023-07-26 14:04:28 +01:00
Neil Twigg
9538a1895b Add first_seq to StreamConfig for file store
Signed-off-by: Neil Twigg <neil@nats.io>
2023-07-25 11:27:07 +01:00
Waldemar Quevedo
bbfeb2a887 Fix typo on internal function
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-07-22 20:40:26 -07:00
Derek Collison
e1a00a883c Fix bug that would race around check for last sequence per subject
Signed-off-by: Derek Collison <derek@nats.io>
2023-07-18 12:39:06 -07:00
Derek Collison
244dda809c Fix bug that would race around check for last sequence per subject
Signed-off-by: Derek Collison <derek@nats.io>
2023-07-18 11:29:59 -07:00
Jean-Noël Moyne
69e137c3d2 [FIXED] Stream config idempotency (#4292)
- [X] Branch rebased on top of current main (`git pull --rebase origin
main`)
- [X] Changes squashed to a single commit (described
[here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
 - [x] Build is green in Travis CI
- [X] You have certified that the contribution is your original work and
that you license the work to the project under the [Apache 2
license](https://github.com/nats-io/nats-server/blob/main/LICENSE)

Fixes a behavior where idempotency of re-defining the same stream more
than once (with the same attributes) was broken: the DeepEqual check
failed because the StreamSource struct received from the client app does
not have a value for the `iname` structure field (as it's internal), but
the StreamSource struct returned from `mset.config()` has it set.

Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
2023-07-07 11:32:09 -07:00
Neil Twigg
d2615b76f2 Annotate CPU and goroutine profiles with account/stream/consumer info
Signed-off-by: Neil Twigg <neil@nats.io>
2023-06-20 19:02:40 +01:00
Derek Collison
3501ca3c1f Merge branch 'main' into dev 2023-06-15 17:49:19 -07:00
Derek Collison
087a28a13e When creating replicated mirrors where the source stream had a very large starting sequence number, the server would use excessive CPU and Memory.
This is due to the mirroring functionality trying to skip messages when it detects a gap. In a replicated stream this puts excessive stress on the raft system.
This step is not needed at all if the mirror stream has no messages; in that case we can simply jump ahead.
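
A rough Go sketch of the difference between skipping message by message and jumping ahead. The `mirror` type and `alignTo` method are hypothetical; in the real server each skip on a replicated stream would go through raft, which is what makes a very large gap expensive.

```go
package main

import "fmt"

// mirror is a hypothetical stand-in tracking how many messages it holds,
// the next sequence it expects, and how many skip operations it performed.
type mirror struct {
	msgs    uint64
	nextSeq uint64
	skips   uint64
}

// alignTo brings the mirror up to the upstream sequence seq.
func (m *mirror) alignTo(seq uint64) {
	if m.msgs == 0 {
		m.nextSeq = seq // empty mirror: jump straight ahead, no skips
		return
	}
	for m.nextSeq < seq {
		m.nextSeq++ // one skip per missing sequence
		m.skips++
	}
}

func main() {
	empty := &mirror{nextSeq: 1}
	empty.alignTo(1000000)

	busy := &mirror{msgs: 5, nextSeq: 6}
	busy.alignTo(10)

	fmt.Println(empty.skips, busy.skips) // 0 4
}
```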

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-15 17:20:15 -07:00
Jean-Noël Moyne
7ff114162c Adds the same check for valid stream name for Mirror
Fix test using invalid stream names

Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
2023-06-08 07:49:47 -07:00
Jean-Noël Moyne
bd6c15d24e Adds a check that the stream name of a stream source is valid and associated new error if it isn't.
Addresses https://github.com/nats-io/nats-server/issues/4141

Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
2023-06-08 07:49:46 -07:00
Derek Collison
df5df3ce99 Panic fixes (#4214)
- [ ] Link to issue, e.g. `Resolves #NNN`
 - [ ] Documentation added (if applicable)
 - [ ] Tests added
- [ ] Branch rebased on top of current main (`git pull --rebase origin
main`)
- [ ] Changes squashed to a single commit (described
[here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
 - [x] Build is green in Travis CI
- [x] You have certified that the contribution is your original work and
that you license the work to the project under the [Apache 2
license](https://github.com/nats-io/nats-server/blob/main/LICENSE)

Resolves panics in the code.

### Changes proposed in this pull request:

 - This PR fixes some of the panics in the code
2023-06-05 13:02:05 -07:00