This really was a cut/paste/typo error: the `else` should not have been there. It came up in my testing.
The effect was that when there was a pending `PUBREL` in JetStream and a matching client connected, we would sometimes attempt to deliver the `PUBREL` immediately on connect: `cpending` was already initialized, but the pubrel map was not (yet).
This replaces the PR template, which is a bit cumbersome, with a simpler notice that includes a template sign-off, and adds a new `CONTRIBUTING.md` document.
Signed-off-by: Neil Twigg <neil@nats.io>
Co-authored-by: Byron Ruth <byron@nats.io>
Under heavy load with max msgs per subject of 1, the dmap could be considered empty and its initial min reset, causing lookup misses that would lead to excess messages in a stream and longer restores.
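For illustration, a hypothetical, simplified model of this failure mode (not the actual filestore dmap code): deleted sequences tracked as a low-water mark plus a map of individual deletes, where resetting the mark while the map is momentarily empty makes lookups for older sequences report "not deleted".

```go
package main

import "fmt"

// delTracker is an illustrative stand-in for a per-block delete map:
// everything below min is considered deleted, plus individual deletes.
type delTracker struct {
	min     uint64              // all sequences below min are considered deleted
	deleted map[uint64]struct{} // individual deleted sequences at or above min
}

func (d *delTracker) isDeleted(seq uint64) bool {
	if seq < d.min {
		return true
	}
	_, ok := d.deleted[seq]
	return ok
}

func main() {
	d := &delTracker{min: 100, deleted: map[uint64]struct{}{}}
	fmt.Println(d.isDeleted(42)) // true: below the low-water mark
	// Buggy path: the tracker is considered empty and the mark is reset.
	d.min = 0
	fmt.Println(d.isDeleted(42)) // false: lookup miss, a removed message looks live
}
```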
Signed-off-by: Derek Collison <derek@nats.io>
Holding onto the compressor and not recycling the internal byte slice could cause havoc with the GC.
This needs further improvement, but it should at least allow the GC to clean up more effectively.
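For illustration, a minimal sketch of the general recycling pattern (using a `sync.Pool` of gzip writers as a stand-in; the actual compressor and filestore code differ): reuse the writer between calls instead of pinning its internal buffers alive.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"sync"
)

// Pool of reusable writers; Reset points each one at a fresh output buffer.
var gzPool = sync.Pool{
	New: func() any { return gzip.NewWriter(nil) },
}

func compress(data []byte) []byte {
	var buf bytes.Buffer
	zw := gzPool.Get().(*gzip.Writer)
	zw.Reset(&buf) // reuse the writer, not its previous output buffer
	zw.Write(data)
	zw.Close()
	gzPool.Put(zw) // return to the pool rather than holding onto it
	return buf.Bytes()
}

func main() {
	out := compress([]byte("hello hello hello"))
	fmt.Println(len(out), "compressed bytes")
}
```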
Signed-off-by: Derek Collison <derek@nats.io>
Several strategies are used, listed below (a sketch of the patterns follows the list).
1. Checking a RaftNode to see if it is the leader now uses atomics.
2. Checking if we are the JetStream meta leader from the server now uses an atomic.
3. Accessing the JetStream context no longer requires a server lock; it now uses an atomic.Pointer.
4. Filestore syncBlocks would hold msgBlock locks during sync; it no longer does.
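For illustration, a minimal sketch of the first three patterns (hypothetical types and field names, not the actual server code): an `atomic.Bool` for leader checks and an `atomic.Pointer` for a context that previously required taking the server lock.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type jetStream struct{ name string }

type server struct {
	leader atomic.Bool             // lock-free leader flag
	js     atomic.Pointer[jetStream] // lock-free access to the context
}

// isLeader reads the flag without any mutex.
func (s *server) isLeader() bool { return s.leader.Load() }

// getJetStream reads the context pointer without the server lock.
func (s *server) getJetStream() *jetStream { return s.js.Load() }

func main() {
	s := &server{}
	s.leader.Store(true)
	s.js.Store(&jetStream{name: "meta"})
	fmt.Println(s.isLeader(), s.getJetStream().name)
}
```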
Signed-off-by: Derek Collison <derek@nats.io>
A test-only fix.
I cannot reproduce the flapping behavior, but did see a race during
debugging suggesting that the CONNACK is delivered to the test before
`mqttProcessConnect` finishes and releases the record.
- [X] Tests added
- [X] Branch rebased on top of current main (`git pull --rebase origin
main`)
- [X] Changes squashed to a single commit (described
[here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
- [X] Build is green in Travis CI
- [X] You have certified that the contribution is your original work and
that you license the work to the project under the [Apache 2
license](https://github.com/nats-io/nats-server/blob/main/LICENSE)
### Changes proposed in this pull request:
Fixes a race condition in some leader failover scenarios that could lead to messages being sourced more than once.
In some failure scenarios, the current leader of a stream that sources from other stream(s) gets shut down while publications are happening on the stream(s) being sourced. This causes `setLeader(true)` to be called on the new leader for the sourcing stream before all the messages sourced by the previous leader have been fully processed. When the new leader then does its reverse scan from the last message in its view of the stream, in order to determine what sequence number to start the consumer for the stream being sourced from, the scan does not yet account for those messages, so the last message(s) sourced by the previous leader get sourced again, leading to some messages being sourced more than once.
The existing `TestNoRaceJetStreamSuperClusterSources` test would
sidestep the issue by relying on the deduplication window in the
sourcing stream. Without deduplication the test is a flapper.
This avoids the race condition by adding a small delay before scanning for the last message(s) sourced and starting the sources' consumer(s). With this change, the test (without using the deduplication window) no longer fails due to more messages than expected being received in the sourcing stream.
(Also adds a guard to give up if `setupSourceConsumers()` is called and we are no longer the leader for the stream; that check was already present in `setupMirrorConsumer()`, so it was presumably just missing from `setupSourceConsumers()`.)
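For illustration, a minimal, self-contained sketch of the approach (hypothetical names and duration, not the actual nats-server code): defer the scan/consumer setup briefly so messages sourced by the previous leader have settled, and re-check leadership before doing any work.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

type stream struct {
	leader atomic.Bool
}

func (s *stream) setupSourceConsumers() {
	// Guard: if we lost leadership in the meantime, do nothing.
	if !s.leader.Load() {
		return
	}
	fmt.Println("scanning for last sourced messages and starting source consumers")
}

func (s *stream) becameLeader() {
	s.leader.Store(true)
	// Small delay before scanning, so messages sourced by the previous
	// leader have been fully processed into our view of the stream.
	time.AfterFunc(250*time.Millisecond, s.setupSourceConsumers)
}

func main() {
	s := &stream{}
	s.becameLeader()
	time.Sleep(500 * time.Millisecond)
}
```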