When consumers were R1 and the same name was reused, server restarts
could try to clean up the old ones and affect the new ones. These changes
allow consumer names to be reused more reliably across server restarts.
Signed-off-by: Derek Collison <derek@nats.io>
Resolves panics in the code.
### Changes proposed in this pull request:
- This PR fixes several panics in the code:
1. When catching up, do not retry forever; if needed, reset the cluster state (see the sketch after this list).
2. When checking whether a stream is healthy, check for node drift.
3. When restarting a stream, make sure the current node is stopped.
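Below is a minimal sketch of the bounded catch-up idea from item 1, using stand-in names (`streamNode`, `catchupFromLeader`, `resetClusteredState`) rather than the actual nats-server identifiers:

```go
package main

import (
	"errors"
	"fmt"
)

type streamNode struct{ name string }

// catchupFromLeader stands in for one attempt to catch this replica up
// from the current leader.
func (n *streamNode) catchupFromLeader() error { return errors.New("leader unreachable") }

// resetClusteredState stands in for wiping local cluster state so the asset
// can be rebuilt from the leader instead of retrying forever.
func (n *streamNode) resetClusteredState() { fmt.Println("resetting clustered state for", n.name) }

func (n *streamNode) runCatchup() {
	const maxAttempts = 3
	for i := 0; i < maxAttempts; i++ {
		if err := n.catchupFromLeader(); err == nil {
			return // caught up
		}
	}
	// Do not try forever: after a bounded number of attempts, reset cluster
	// state rather than spinning (or panicking) on a bad catch-up loop.
	n.resetClusteredState()
}

func main() { (&streamNode{name: "ORDERS"}).runCatchup() }
```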
Signed-off-by: Derek Collison <derek@nats.io>
One should not access s.opts directly but instead use s.getOpts().
Also, the server lock needs to be released when performing an account
lookup, since the lookup may itself acquire the server lock.
A function was calling s.LookupAccount under the client lock, which
technically creates a lock inversion situation.
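A hedged sketch of both rules with simplified stand-in types; the real server structures and locking are more involved (and the reported inversion was against the client lock, which is not shown here):

```go
package main

import "sync"

type Options struct{ MaxPayload int32 }

type Server struct {
	mu   sync.Mutex
	opts *Options
}

// getOpts is the accessor to use instead of reading s.opts directly.
func (s *Server) getOpts() *Options {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.opts
}

// LookupAccount acquires the server lock internally, so callers must not hold it.
func (s *Server) LookupAccount(name string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return name
}

func (s *Server) processConnect(accName string) {
	// Read options through the accessor, never via s.opts.
	opts := s.getOpts()
	_ = opts.MaxPayload

	s.mu.Lock()
	// ... work that genuinely needs the server lock ...
	s.mu.Unlock() // release before the lookup: LookupAccount re-acquires the lock

	_ = s.LookupAccount(accName)
}

func main() {
	s := &Server{opts: &Options{MaxPayload: 1 << 20}}
	s.processConnect("A")
}
```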
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This is a simpler way to determine whether we need to consider a snapshot, and it requires much less time, CPU, and memory.
Signed-off-by: Derek Collison <derek@nats.io>
Fixes the test TestJetStreamClusterDeleteAndRestoreAndRestart, which would flap because we would not snapshot (the hash was the same) even though there were entries that would erase stream data.
Signed-off-by: Derek Collison <derek@nats.io>
Also sync other consumers when taking over as leader, but there is no need to process snapshots when we are in fact the leader.
Signed-off-by: Derek Collison <derek@nats.io>
Use the monitor check for streams as is done for consumers.
Make sure to stop the raft layer if exiting monitorConsumer early (see the sketch below).
Allow consumers to force a snapshot on leadership change.
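A small illustrative sketch of the monitorConsumer early-exit rule, with stand-in `raftNode` and `consumer` types rather than the real ones:

```go
package main

import "fmt"

// raftNode stands in for the consumer's raft layer; Stop shuts it down.
type raftNode struct{ name string }

func (n *raftNode) Stop() { fmt.Println("stopped raft node", n.name) }

type consumer struct {
	node    *raftNode
	deleted bool
}

// monitorConsumer illustrates the rule above: any early exit must stop the
// raft layer, otherwise the node keeps running with no monitor attached.
func (o *consumer) monitorConsumer() {
	if o.node == nil {
		return
	}
	if o.deleted {
		o.node.Stop() // early exit: do not leave the raft layer running
		return
	}
	// ... the normal monitoring loop would run here ...
}

func main() {
	(&consumer{node: &raftNode{name: "C1"}, deleted: true}).monitorConsumer()
}
```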
Signed-off-by: Derek Collison <derek@nats.io>
1. If we are retention-based, make sure our consumers are running before entering the monitorStream logic.
2. If we skip messages and are interest-based, make sure we check for preAck state.
3. On finalizing recovery for consumers, have them check against the interest-based stream.
4. Do not process ack state updates if the consumer is closed and shutting down.
5. When processing the final state for a stream after an upper-layer catchup, check all attached consumers for ack skew.
6. During catchup of stream messages, consult the preAck state and skip messages as needed (see the sketch after this list).
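A hedged sketch of the preAck bookkeeping referenced in items 2 and 6, assuming a simple map keyed by stream sequence; the type and method names are illustrative and the actual nats-server state is richer:

```go
package main

import "fmt"

// stream is a stand-in for an interest-based stream replica; preAcks maps a
// message sequence to the consumers that acked it before the message arrived.
type stream struct {
	preAcks map[uint64]map[string]struct{}
}

// registerPreAck records an ack that arrived ahead of its message.
func (s *stream) registerPreAck(seq uint64, consumer string) {
	if s.preAcks == nil {
		s.preAcks = make(map[uint64]map[string]struct{})
	}
	if s.preAcks[seq] == nil {
		s.preAcks[seq] = make(map[string]struct{})
	}
	s.preAcks[seq][consumer] = struct{}{}
}

// shouldStoreCatchupMsg reports whether a caught-up message still needs to be
// stored, or can be skipped because all interested consumers pre-acked it.
func (s *stream) shouldStoreCatchupMsg(seq uint64, numConsumers int) bool {
	if acks, ok := s.preAcks[seq]; ok && len(acks) >= numConsumers {
		delete(s.preAcks, seq) // consume the preAck state
		return false           // skip: interest already satisfied
	}
	return true
}

func main() {
	var s stream
	s.registerPreAck(22, "C1")
	fmt.Println("store seq 22?", s.shouldStoreCatchupMsg(22, 1)) // false, skipped
	fmt.Println("store seq 23?", s.shouldStoreCatchupMsg(23, 1)) // true
}
```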
Signed-off-by: Derek Collison <derek@nats.io>
In clusters with asymmetric network latency, if a node in an R3 setup was replicating both a consumer and its parent stream but was the leader of neither, and the path from the consumer leader was faster than the path from the stream leader, a replicated ack could arrive before the message itself.
In this case we used to forward a delete message request to the stream leader, which would then replicate it to all stream replicas, causing extra work that could lead to increased publish times for clients connected to the slow node.
Signed-off-by: Derek Collison <derek@nats.io>
This could lead to instability in the system.
The bug would manifest in replicated consumers: certain messages could be acked out of order, and the pending list would never go to zero.
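For illustration, a tiny stand-in for the invariant the fix restores, namely that pending tracking must drain to zero even when acks arrive out of order (types and method names here are hypothetical):

```go
package main

import "fmt"

// consumer tracks pending (delivered but not yet acked) message sequences.
type consumer struct {
	pending map[uint64]struct{}
}

func (o *consumer) deliver(seq uint64) {
	if o.pending == nil {
		o.pending = make(map[uint64]struct{})
	}
	o.pending[seq] = struct{}{}
}

// processAck removes the sequence from pending regardless of arrival order,
// so out-of-order acks still let the pending list drain to zero.
func (o *consumer) processAck(seq uint64) {
	delete(o.pending, seq)
}

func main() {
	var o consumer
	for seq := uint64(1); seq <= 3; seq++ {
		o.deliver(seq)
	}
	for _, seq := range []uint64{3, 1, 2} { // acks arrive out of order
		o.processAck(seq)
	}
	fmt.Println("pending after out-of-order acks:", len(o.pending)) // 0
}
```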
Signed-off-by: Derek Collison <derek@nats.io>