Use monitor check for streams like consumers.
Make sure to stop raft layer if exiting monitorConsumer early.
Allow consumers to force a snapshot on leadership change.
Signed-off-by: Derek Collison <derek@nats.io>
1. If we are retention based, make sure our consumers are running before entering into monitorStream logic.
2. If we skip messages and are interest based, make sure we check for a preAck state.
3. On finalization of recovery for consumers have them check against the interest based stream.
4. Do not process ack state updates if consumer is closed and shutting down.
5. When processing final state for a stream after upper layer catchup, check all attached consumers for ack skew.
6. During catchup of stream messages consult preAck state and skip messages as needed.
Signed-off-by: Derek Collison <derek@nats.io>
Under asymmetric network latency based clusters, if a node in an R3 was replicating a consumer and the parent stream, but was the leader of neither, but the path from the stream leader was faster then the consumer leader a replicated ack could arrive before the message itself.
In this case we used to forward a delete message request to the stream leader which would then replicate that to all stream replicas, causing more work which could lead to increased publisher times on clients connected to the slow node.
Signed-off-by: Derek Collison <derek@nats.io>
This could lead to instability in the system.
The bug would manifest in replicated consumers when certain messages could be acked out of order, and, the pending list would never go to zero.
Signed-off-by: Derek Collison <derek@nats.io>
We now check for orphaned streams or consumers in clustered mode after our metastate has recovered.
Do not warn on failures for installing raft snapshots if this is due to the node being closed.
During a stream update make sure to check to see if our group assignment has changed out from underneath of us.
Stream info should always delay if we are not the leader. Could cause duplicate responses when it should not.
Signed-off-by: Derek Collison <derek@nats.io>
This can help sync on restarts and improve ghost ephemerals. Also added more code to suppress respnses and API audits when we know we are recovering.
Signed-off-by: Derek Collison <derek@nats.io>
If there was a spurious error on restart, or possibly on an update, we could delete a consumer which was the incorrect behavior.
Signed-off-by: Derek Collison <derek@nats.io>
We were snappshotting more then needed, so double check that we should be doing this at the stream and consumer level.
At the raft level, we should have always been compacting the WAL to last+1, so made that consistent. Also fixed bug that would not skip last if more items behind the snapshot.
Signed-off-by: Derek Collison <derek@nats.io>
1. Only snapshot with minSnap time window like consumers and meta. Make it consistent for all to 5s.
2. Only snapshot at the end of processing all entries pending vs inside the loop.
3. Use fast state when calculating sync request, do not need deleted details there.
Signed-off-by: Derek Collison <derek@nats.io>