This is due to the mirroring functionality trying to skip messages when it detects a gap. In a replicated stream this puts excessive stress on the raft system.
This step is not needed at all if the mirror stream has no messages, we can simply jump ahead.
Signed-off-by: Derek Collison <derek@nats.io>
- [ ] Link to issue, e.g. `Resolves #NNN`
- [ ] Documentation added (if applicable)
- [ ] Tests added
- [ ] Branch rebased on top of current main (`git pull --rebase origin
main`)
- [ ] Changes squashed to a single commit (described
[here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
- [x] Build is green in Travis CI
- [x] You have certified that the contribution is your original work and
that you license the work to the project under the [Apache 2
license](https://github.com/nats-io/nats-server/blob/main/LICENSE)
Resolves panics in the code.
### Changes proposed in this pull request:
- This PR fixes some of the panics in the code
This can happen when we reset a stream internally and the stream had a prior snapshot.
Also make sure to always release resources back to the account regardless if the store is no longer present.
Signed-off-by: Derek Collison <derek@nats.io>
When adding or updating sources/mirrors, server was checking if the stream with
a given name exists to check for subject overlaps, among other things.
However, if sourced/mirrored stream was `External`, checks should
not be executed, as not only stream would never be found,
but also, if `External` stream had the same name as the sourcing stream,
the check would be wrongly performed against itself.
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
1. During ackMsg processing hold write lock to block concurrent access.
2. Check for presence of preAcks before and force removal if present.
3. Rework check for orphan msgs on startup to use checkStateForInterestStream().
Signed-off-by: Derek Collison <derek@nats.io>
Under asymmetric network latency based clusters, if a node in an R3 was replicating a consumer and the parent stream, but was the leader of neither, but the path from the stream leader was faster then the consumer leader a replicated ack could arrive before the message itself.
In this case we used to forward a delete message request to the stream leader which would then replicate that to all stream replicas, causing more work which could lead to increased publisher times on clients connected to the slow node.
Signed-off-by: Derek Collison <derek@nats.io>
We noticed this was being called alot in user environments.
When the consumer was filtered with a wilcard and the stream had a high cardinality of subjects and was falling behind this could take a substantial amount of time.
Signed-off-by: Derek Collison <derek@nats.io>
Until now, purge updated all consumers sequences
even if purge subject was only a subset of given consumer filter.
Because of that, even messages from not purged subjects were not fetched
or properly accounted for existing consumers.
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
We were snappshotting more then needed, so double check that we should be doing this at the stream and consumer level.
At the raft level, we should have always been compacting the WAL to last+1, so made that consistent. Also fixed bug that would not skip last if more items behind the snapshot.
Signed-off-by: Derek Collison <derek@nats.io>
When a stream had a large number of consumers on a server that were sparse, the signaling mechanism would do a linear scan to signal matching consumers. As usage patterns have continued to have more consumers that are filteres and sparse, meaning a message is destined for a single or small number of consumers.
This change moves selection to a sublist that tracks only active consumer leaders for selection, which optimizes selection of consumers to signal when the number of consumers is large.
Signed-off-by: Derek Collison <derek@nats.io>
First issue was applications not getting any response.
However, there was also a more serious issue that would create multiple raft groups for each concurrent request.
The servers would only run one stream monitor loop, however they would update the state to the new raft group's name, so on server restart the stream would be using a different raft group then existing servers.
Signed-off-by: Derek Collison <derek@nats.io>
A stream with a WorkQueue retention policy is supposed to allow
more than one consumer if they user filtered subjects, but those
subjects should not overlap.
There was an issue that if a new consumer had a filter subject
"wider" than an existing one, the error was not detected and
the new consumer was incorrectly accepted.
Resolves#3639
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>