1. During ackMsg processing, hold the write lock to block concurrent access.
2. Check for the presence of preAcks up front and force their removal if present.
3. Rework the check for orphan msgs on startup to use checkStateForInterestStream().
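A minimal sketch of the locking and preAck handling described above, assuming a simplified stream type; the field and method names (msgs, preAcks, ackMsg) are illustrative, not the server's actual code.
```go
package jsexample

import "sync"

// Illustrative types only; the server's real stream type differs.
type stream struct {
	mu      sync.RWMutex
	msgs    map[uint64][]byte   // stored messages keyed by sequence
	preAcks map[uint64]struct{} // acks that arrived before their message
}

// ackMsg holds the write lock for the entire operation so concurrent
// acks, purges, and arriving messages cannot interleave, and it forces
// removal of any preAck recorded for the same sequence.
func (s *stream) ackMsg(seq uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()

	if _, ok := s.preAcks[seq]; ok {
		delete(s.preAcks, seq)
	}
	delete(s.msgs, seq)
}
```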
Signed-off-by: Derek Collison <derek@nats.io>
Under clusters with asymmetric network latency, if a node in an R3 was replicating a consumer and the parent stream but was the leader of neither, and the path from the stream leader was faster than the path from the consumer leader, a replicated ack could arrive before the message itself.
In this case we used to forward a delete request to the stream leader, which would then replicate it to all stream replicas, causing extra work that could increase publish latencies for clients connected to the slow node.
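A hedged sketch of the approach, reusing the simplified stream type from the sketch above; recording a preAck locally rather than forwarding a delete is inferred from the description, and the exact mechanics in the server differ.
```go
// processReplicatedAck records a preAck when the message has not been
// replicated to this node yet, instead of forwarding a delete request
// to the stream leader for replication to every replica.
func (s *stream) processReplicatedAck(seq uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()

	if _, have := s.msgs[seq]; !have {
		if s.preAcks == nil {
			s.preAcks = make(map[uint64]struct{})
		}
		s.preAcks[seq] = struct{}{}
		return
	}
	delete(s.msgs, seq)
}

// processReplicatedMsg drops a late-arriving message locally if it was
// already acked, so the slow node does no extra replication work.
func (s *stream) processReplicatedMsg(seq uint64, msg []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()

	if _, acked := s.preAcks[seq]; acked {
		delete(s.preAcks, seq)
		return
	}
	s.msgs[seq] = msg
}
```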
Signed-off-by: Derek Collison <derek@nats.io>
We noticed this was being called a lot in user environments.
When the consumer was filtered with a wildcard, the stream had a high cardinality of subjects, and the consumer was falling behind, this could take a substantial amount of time.
Signed-off-by: Derek Collison <derek@nats.io>
Until now, purge updated all consumers' sequences
even if the purge subject was only a subset of a given consumer's filter.
Because of that, messages from subjects that were not purged were not fetched
or properly accounted for by existing consumers.
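A sketch of the corrected accounting under simplified assumptions: the helper below handles only basic '*' and trailing '>' wildcards, and the function names are illustrative rather than the server's.
```go
package jsexample

import "strings"

// purgeCoversFilter reports whether every subject matched by the
// consumer filter is also matched by the purge subject (simplified
// token matching with '*' and a trailing '>').
func purgeCoversFilter(purge, filter string) bool {
	pt, ft := strings.Split(purge, "."), strings.Split(filter, ".")
	for i, p := range pt {
		if p == ">" {
			return true // '>' covers everything from here on
		}
		if i >= len(ft) {
			return false
		}
		f := ft[i]
		if f == ">" {
			return false // only a purge '>' can cover a filter '>'
		}
		if p == "*" {
			continue // '*' covers any single token
		}
		if f == "*" || f != p {
			return false
		}
	}
	return len(pt) == len(ft)
}

// adjustConsumerAfterPurge only fast-forwards a consumer past the purge
// point when the purge removed everything the consumer can see. If the
// purge subject is just a subset of the filter, messages on other
// subjects must still be delivered and accounted for, so the consumer's
// sequence is left untouched.
func adjustConsumerAfterPurge(consumerSeq, newFirstSeq uint64, purgeSubject, filterSubject string) uint64 {
	if purgeCoversFilter(purgeSubject, filterSubject) && consumerSeq < newFirstSeq {
		return newFirstSeq
	}
	return consumerSeq
}
```
For example, a purge on `orders.eu.*` should not advance a consumer filtered on `orders.>`, while a purge on `orders.>` may.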
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
We were snapshotting more than needed, so double check that we should be doing this at the stream and consumer level.
At the raft level, we should have always been compacting the WAL to last+1, so this was made consistent. Also fixed a bug where we would not skip the last entry if there were more items behind the snapshot.
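A rough sketch of the intent, with assumed names (lastSnapApplied, compact): skip redundant snapshots and compact the log so its first retained entry is applied+1, which is the consistency the change aims for.
```go
// snapshotter is an illustrative stand-in for per-group snapshot state.
type snapshotter struct {
	lastSnapApplied uint64 // applied index captured by the previous snapshot
}

// maybeSnapshot skips the snapshot entirely when nothing new has been
// applied since the last one, and otherwise compacts the WAL so that
// the first retained entry becomes applied+1.
func (s *snapshotter) maybeSnapshot(applied uint64, compact func(firstKept uint64)) bool {
	if applied <= s.lastSnapApplied {
		return false // no new state; avoid a needless snapshot
	}
	s.lastSnapApplied = applied
	compact(applied + 1)
	return true
}
```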
Signed-off-by: Derek Collison <derek@nats.io>
When a stream had a large number of sparse consumers on a server, the signaling mechanism would do a linear scan to signal matching consumers. Usage patterns continue to trend toward more consumers that are filtered and sparse, meaning a given message is destined for only one or a small number of consumers.
This change moves selection to a sublist that tracks only active consumer leaders, which optimizes the selection of consumers to signal when the number of consumers is large.
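A simplified sketch of the idea: index only active consumer leaders by their filter subject so a new message wakes just the matching consumers. The real server uses its sublist, which also matches wildcard filters; the names here are illustrative.
```go
package jsexample

// signaler indexes wakeup callbacks for active consumer leaders by
// their literal filter subject.
type signaler struct {
	byFilter map[string][]func()
}

func newSignaler() *signaler {
	return &signaler{byFilter: make(map[string][]func())}
}

// registerLeader is called when a consumer becomes leader on this server.
func (s *signaler) registerLeader(filter string, wake func()) {
	s.byFilter[filter] = append(s.byFilter[filter], wake)
}

// signal wakes only the consumers whose filter matches the new message's
// subject, instead of linearly scanning every consumer on the stream.
func (s *signaler) signal(subject string) {
	for _, wake := range s.byFilter[subject] {
		wake()
	}
}
```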
Signed-off-by: Derek Collison <derek@nats.io>
The first issue was applications not getting any response.
However, there was also a more serious issue where each concurrent request would create its own raft group.
The servers would only run one stream monitor loop; however, they would update the state to the new raft group's name, so on server restart the stream would be using a different raft group than the existing servers.
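A hedged sketch of the de-duplication idea, with illustrative names; the real meta layer works differently, but the point is that concurrent create requests must resolve to a single raft group.
```go
package jsexample

import "sync"

// metaController is an illustrative stand-in for the meta layer state.
type metaController struct {
	mu          sync.Mutex
	assignments map[string]string // stream name -> raft group name
}

// createStream reuses an existing assignment when one is present, so
// concurrent create requests cannot each mint a raft group and later
// disagree with the single stream monitor loop about which group the
// stream uses after a restart.
func (m *metaController) createStream(name string, newGroup func() string) string {
	m.mu.Lock()
	defer m.mu.Unlock()
	if g, ok := m.assignments[name]; ok {
		return g
	}
	if m.assignments == nil {
		m.assignments = make(map[string]string)
	}
	g := newGroup()
	m.assignments[name] = g
	return g
}
```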
Signed-off-by: Derek Collison <derek@nats.io>
A stream with a WorkQueue retention policy is supposed to allow
more than one consumer if they use filtered subjects, but those
subjects should not overlap.
There was an issue where, if a new consumer had a filter subject
"wider" than an existing one, the error was not detected and
the new consumer was incorrectly accepted.
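A sketch of the overlap rule under simplified assumptions (basic '*' and trailing '>' handling only); the helper name is illustrative, not the server's.
```go
package jsexample

import "strings"

// subjectsOverlap reports whether two filter subjects can both match
// some common subject (simplified token matching).
func subjectsOverlap(a, b string) bool {
	at, bt := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(at) && i < len(bt); i++ {
		x, y := at[i], bt[i]
		if x == ">" || y == ">" {
			return true // full wildcard overlaps anything remaining
		}
		if x == "*" || y == "*" {
			continue // single-token wildcard overlaps any token here
		}
		if x != y {
			return false
		}
	}
	return len(at) == len(bt)
}
```
With a symmetric check like this, a new WorkQueue consumer on `orders.>` is rejected when one already exists on `orders.new`, regardless of which filter is the wider one.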
Resolves #3639
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
If the start-by-time is before what we remember during recovery, use that instead.
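A minimal sketch of the clamp, with illustrative names.
```go
package jsexample

import "time"

// effectiveStartTime clamps a requested start-by-time to the earliest
// time recovered from the store, since nothing older survived recovery.
func effectiveStartTime(requested, firstRecovered time.Time) time.Time {
	if requested.Before(firstRecovered) {
		return firstRecovered
	}
	return requested
}
```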
Resolves #3559
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
If a stream is created or updated with a negative replicas count,
an error is now returned. The same applies to consumers.
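A minimal illustration of the added validation; the actual error text and the place where the check runs are not shown here.
```go
package jsexample

import "errors"

// checkReplicas rejects negative replica counts for stream and consumer
// configurations (illustrative only).
func checkReplicas(replicas int) error {
	if replicas < 0 {
		return errors.New("replicas count cannot be negative")
	}
	return nil
}
```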
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
A request to `$SYS.REQ.SERVER.PING.JSZ` would now return something
like this:
```
...
"meta_cluster": {
  "name": "local",
  "leader": "A",
  "peer": "NUmM6cRx",
  "replicas": [
    {
      "name": "B",
      "current": true,
      "active": 690369000,
      "peer": "b2oh2L6w"
    },
    {
      "name": "Server name unknown at this time (peerID: jZ6RvVRH)",
      "current": false,
      "offline": true,
      "active": 0,
      "peer": "jZ6RvVRH"
    }
  ],
  "cluster_size": 3
}
```
Note the "peer" field following the "leader" field that contains
the server name. The new field is the node ID, which is a hash of
the server name.
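For reference, a minimal sketch of Go structs that can unmarshal just the fields shown in the example above with encoding/json; the server's full JSZ response contains more, and these illustrative types are not the server's own.
```go
package jsexample

// metaClusterInfo and peerInfo cover only the fields shown in the
// example response above.
type metaClusterInfo struct {
	Name        string     `json:"name"`
	Leader      string     `json:"leader"`
	Peer        string     `json:"peer"` // node ID, a hash of the server name (see note above)
	Replicas    []peerInfo `json:"replicas"`
	ClusterSize int        `json:"cluster_size"`
}

type peerInfo struct {
	Name    string `json:"name"`
	Current bool   `json:"current"`
	Offline bool   `json:"offline,omitempty"`
	Active  int64  `json:"active"` // time since last activity, in nanoseconds
	Peer    string `json:"peer"`   // node ID, a hash of the server name
}
```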
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The CLI will now be able to display the peer IDs in MetaGroupInfo
if it chooses to do so, and possibly help the user select the peer ID
from a list, with a new command to remove a peer by its ID instead of
by server name.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>