- [X] Tests added
- [X] Branch rebased on top of current main (`git pull --rebase origin
main`)
- [X] Changes squashed to a single commit (described
[here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
- [x] Build is green in Travis CI
- [X] You have certified that the contribution is your original work and
that you license the work to the project under the [Apache 2
license](https://github.com/nats-io/nats-server/blob/main/LICENSE)
### Changes proposed in this pull request:
Fixes a race condition in some leader failover scenarios leading to
messages being potentially sourced more than once.
In some failure scenarios, the current leader of a stream sourcing from
other stream(s) gets shut down while publications are still happening on
the stream(s) being sourced. `setLeader(true)` can then be called on the
new leader for the sourcing stream before all the messages sourced by
the previous leader have been completely processed. When the new leader
does its reverse scan from the last message in its view of the stream,
in order to know what sequence number to start the consumer for the
stream being sourced from, it does not yet see the last message(s)
sourced by the previous leader. Those messages get sourced again,
leading to some messages being sourced more than once.
The existing `TestNoRaceJetStreamSuperClusterSources` test would
sidestep the issue by relying on the deduplication window in the
sourcing stream. Without deduplication the test is a flapper.
This avoids the race condition by adding a small delay before scanning
for the last message(s) having been sourced and starting the sources'
consumer(s). With this change the test (without using the deduplication
window) no longer fails with more messages than expected in the
sourcing stream.
Also adds a guard to give up if `setupSourceConsumers()` is called and
we are no longer the leader for the stream. That check was already
present in `setupMirrorConsumer()`, so presumably it was forgotten for
`setupSourceConsumers()`.
- Fix test `TestJetStreamWorkQueueSourceRestart`, which expects the sourcing stream to get all of the expected messages right away, by adding a small sleep before checking the number of messages pending on the consumer for that stream.
Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
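The reverse scan and the leader guard described above can be illustrated with a minimal sketch. The `storedMsg` type and the `lastSourcedSeq` and `startSeq` helpers are hypothetical names made up for this example, not the server's actual code, and the small delay itself is left out. The sketch only shows why a scan over a not-yet-settled view picks a start sequence that is too low.

```go
package main

import "fmt"

// storedMsg is a hypothetical, simplified view of a message held by the
// sourcing stream: its own sequence, plus the origin stream and the
// sequence it had there.
type storedMsg struct {
	seq       uint64
	origin    string
	originSeq uint64
}

// lastSourcedSeq scans backwards from the newest message and returns the
// origin sequence of the most recent message sourced from origin, or 0 if
// there is none.
func lastSourcedSeq(msgs []storedMsg, origin string) uint64 {
	for i := len(msgs) - 1; i >= 0; i-- {
		if msgs[i].origin == origin {
			return msgs[i].originSeq
		}
	}
	return 0
}

// startSeq returns the origin sequence the source consumer should start
// from. It gives up (ok == false) when this node is no longer the leader.
func startSeq(isLeader bool, msgs []storedMsg, origin string) (seq uint64, ok bool) {
	if !isLeader {
		return 0, false
	}
	return lastSourcedSeq(msgs, origin) + 1, true
}

func main() {
	// View of the new leader before the previous leader's last message
	// has been applied.
	stale := []storedMsg{{1, "ORDERS", 1}, {2, "ORDERS", 2}}
	// Same stream once the in-flight message has been applied.
	settled := append(stale, storedMsg{3, "ORDERS", 3})

	early, _ := startSeq(true, stale, "ORDERS")
	late, _ := startSeq(true, settled, "ORDERS")
	// Scanning too early starts at origin sequence 3, which the previous
	// leader already sourced; scanning the settled view starts at 4.
	fmt.Println(early, late)
}
```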
The issue is with a server that has a route for a given account but
connects to a server that does not support it. The creation of the route
for this account will fail - as expected - and the server will stop
trying to create the route for this account. However, it needs to retry
creating this route when it reconnects to that same URL, in case the
server (or its config) has been updated to support a route for this
account.
There was also an issue even with 2.10.0 servers in some gossip
situations. Namely, when server B solicits connections to A (but not
vice-versa) and A solicits connections to C (but not vice-versa),
connections for pinned-accounts would not be created.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Under certain circumstances we could delay recovery if the state file
pointed to an absent msg block.
Found additional places to mark dirty and optionally kick the flusher.
Signed-off-by: Derek Collison <derek@nats.io>
Here we know that if we can't find the stream but have the stream
assignment, this is a distinct possibility. Since the assignment is not
processed inline, we wait to see if the stream appears.
Also fixes `TestJetStreamClusterParallelStreamCreation`, which could
flap.
Signed-off-by: Derek Collison <derek@nats.io>
- [ ] Link to issue, e.g. `Resolves #NNN`
- [ ] Documentation added (if applicable)
- [ ] Tests added
- [x] Branch rebased on top of current main (`git pull --rebase origin
main`)
- [ ] Changes squashed to a single commit (described
[here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
- [ ] Build is green in Travis CI
- [x] You have certified that the contribution is your original work and
that you license the work to the project under the [Apache 2
license](https://github.com/nats-io/nats-server/blob/main/LICENSE)
Resolves #
### Changes proposed in this pull request:
Miscellaneous fixes and improvements to server JetStream benchmarks.
Reviewers: note that the PR is broken down into 5 commits. Each one is
trivial to review individually, but they can definitely be squashed
before merging for easier cherry-picking.
We may be creating unnecessary lock contention on the sublist when the
cache is disabled by taking the write lock anyway.
Signed-off-by: Neil Twigg <neil@nats.io>
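As a rough illustration of the contention point, the sketch below takes only a read lock when there is no cache to update. The `matcher` type is a hypothetical, heavily simplified stand-in and not the server's sublist. It assumes the `cache` field is fixed at construction time, so it can be checked without holding the lock.

```go
package main

import (
	"fmt"
	"sync"
)

// matcher is a hypothetical, simplified subject matcher with an optional
// result cache. A nil cache means caching is disabled; the field is
// assumed never to change after construction.
type matcher struct {
	mu    sync.RWMutex
	subs  map[string][]string // subject -> subscriber names
	cache map[string][]string // nil when the cache is disabled
}

// match returns the subscribers for a subject.
func (m *matcher) match(subject string) []string {
	if m.cache == nil {
		// Nothing will be written, so a read lock is enough and
		// concurrent matches do not serialize on each other.
		m.mu.RLock()
		defer m.mu.RUnlock()
		return m.subs[subject]
	}
	// The cache may be filled below, so the write lock is required.
	m.mu.Lock()
	defer m.mu.Unlock()
	if r, ok := m.cache[subject]; ok {
		return r
	}
	r := m.subs[subject]
	m.cache[subject] = r
	return r
}

func main() {
	subs := map[string][]string{"orders.new": {"billing", "audit"}}
	noCache := &matcher{subs: subs}
	cached := &matcher{subs: subs, cache: map[string][]string{}}
	fmt.Println(noCache.match("orders.new"), cached.match("orders.new"))
}
```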
We added some small performance tweaks to the func
checkForReverseEntries. In addition, we moved the shutdown bool for the
server to an atomic so we can efficiently check it when doing unsubs.
If the server is going away there is really no need to do them, since
the other side will do its own thing when the connection goes away. And
finally, we do not have to range over the account rrMap if the
subscription going away is a reserved reply.
Signed-off-by: Derek Collison <derek@nats.io>
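The atomic shutdown check can be sketched as follows. The `server` type, its fields, and the `unsubscribe` method are hypothetical names for illustration, not the server's real API. `atomic.Bool` and `atomic.Int64` come from the standard library and require Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// server is a hypothetical, minimal stand-in for this example.
type server struct {
	shutdown  atomic.Bool  // set once when the server starts going away
	processed atomic.Int64 // counts unsubs that were actually processed
}

// unsubscribe skips the work entirely when the server is shutting down,
// without taking the server lock to find that out.
func (s *server) unsubscribe(sid string) bool {
	if s.shutdown.Load() {
		return false
	}
	// Placeholder for the real unsubscribe work.
	s.processed.Add(1)
	return true
}

func main() {
	s := &server{}
	fmt.Println(s.unsubscribe("1")) // processed while running
	s.shutdown.Store(true)
	fmt.Println(s.unsubscribe("2")) // skipped during shutdown
}
```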
There are changes in recent versions of nats.go that seemingly increase
the size of the stream info and cause this test to fail consistently
with `norace_test.go:4259: require no error, but got: nats: maximum
payload exceeded`. Fix the test to use larger limits and payloads so we
are not sensitive to this when nats.go is upgraded.
Signed-off-by: Neil Twigg <neil@nats.io>
- [ ] Link to issue, e.g. `Resolves #NNN`
- [ ] Documentation added (if applicable)
- [ ] Tests added
- [ ] Branch rebased on top of current main (`git pull --rebase origin
main`)
- [ ] Changes squashed to a single commit (described
[here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
- [ ] Build is green in Travis CI
- [ ] You have certified that the contribution is your original work and
that you license the work to the project under the [Apache 2
license](https://github.com/nats-io/nats-server/blob/main/LICENSE)
Resolves #
### Changes proposed in this pull request:
- Fixes links to the `nats-general` repository.
There was a lock inversion, but it was low risk since it happened during
server initialization. It is fixed anyway, and the ordering has been
added to the locksordering.txt file.
Also fixed multiple lock inversions that were caused by tests.
Signed-off-by: Ivan Kozlovic <ijkozlovic@gmail.com>
Two goroutines could possibly execute the stream assignment at the same
time. A WaitGroup was used to prevent that, but an issue caused the data
race and possible concurrent execution.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The new `prof_block_rate` configuration option allows the block profiler
to be enabled on demand after it was previously disabled in #4402. The
option is also reloadable so that it can be changed after startup.
Signed-off-by: Neil Twigg <neil@nats.io>
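A minimal sketch of how such an option might be applied, at startup and again on reload, is shown below. The `options` struct and `applyBlockProfileRate` helper are hypothetical names for this example. Only `runtime.SetBlockProfileRate` is a real standard library call, and a rate of zero or less turns block profiling off.

```go
package main

import (
	"fmt"
	"runtime"
)

// options is a hypothetical, trimmed-down options struct for this sketch.
type options struct {
	ProfBlockRate int // 0 leaves the block profiler disabled
}

// applyBlockProfileRate sets the runtime block profile rate from the
// options and returns the rate that was applied. It can be called again
// after a config reload to change the rate without restarting.
func applyBlockProfileRate(o options) int {
	rate := o.ProfBlockRate
	if rate < 0 {
		rate = 0 // treat negative values as "disabled"
	}
	runtime.SetBlockProfileRate(rate)
	return rate
}

func main() {
	fmt.Println(applyBlockProfileRate(options{}))                   // disabled at startup
	fmt.Println(applyBlockProfileRate(options{ProfBlockRate: 100})) // enabled on reload
	applyBlockProfileRate(options{})                                // disabled again
}
```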