1. If reset ignore Applied() that are greater then our commit.
2. Improved StepDown() by placing at back of queue if preferred.
3. Improved handling of leadership transfer during StepDown().
4. Do not store EntryLeaderTransfer records on disk.
5. Remove un-needed processing of older terms.
6. If append entry has higher term, also inherit pterm.
7. Only inherit a candidate's term if we decide to vote for them.
Signed-off-by: Derek Collison <derek@nats.io>
Under asymmetric network latency based clusters, if a node in an R3 was replicating a consumer and the parent stream, but was the leader of neither, but the path from the stream leader was faster then the consumer leader a replicated ack could arrive before the message itself.
In this case we used to forward a delete message request to the stream leader which would then replicate that to all stream replicas, causing more work which could lead to increased publisher times on clients connected to the slow node.
Signed-off-by: Derek Collison <derek@nats.io>
Do not exit candidate state in place when stepping down, would cause double vote requests.
When truncating our WAL make sure to adjust commit and applied as needed.
On a miss where the index is less than ours, if we can not find the entry reset our state.
For a vote, if last processed term is higher than ours always agree if no vote has been cast.
If terms are equal make sure the requestor's index is at least as high as ours.
If we decide not to vote for someone, and we have not voted and we are a better fit, move forward with a campaign.
Signed-off-by: Derek Collison <derek@nats.io>
Due to bug, in rare circumstances could write an empty snapshot for aplied == 0. This would cause a spinning at the raft layer.
1. Allow Truncate() to also properly do a reset of the store when terms were only mismatch.
2. During testing fixed memstore truncate and also made sure per subject info was also cleaned up.
3. Then added fix to detect a bad snapshot on initialization and remove.
4. Do not allow snapshots for applied == 0.
Signed-off-by: Derek Collison <derek@nats.io>
We were snappshotting more then needed, so double check that we should be doing this at the stream and consumer level.
At the raft level, we should have always been compacting the WAL to last+1, so made that consistent. Also fixed bug that would not skip last if more items behind the snapshot.
Signed-off-by: Derek Collison <derek@nats.io>
The bug was when a timestamp for the pending state was exactly -1 which could happen based on timing of the redlivered pending items which would set pending.Timestamp into the future potentially and the timing on the encodeConsumerState call.
Minor fixes to raft.
Signed-off-by: Derek Collison <derek@nats.io>
- didRemove in applyMetaEntries() could be reset when processing
multiple entries
- change "no race" test names to include JetStream
- separate raft nodes leader stepdown and stop in server
shutdown process
- in InstallSnapshot, call wal.Compact() with lastIndex+1
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- Send snapshot only if leader
- When processing snapshot, start with a smaller inactivity interval
that will double up to 10sec or use 10sec directly once we get a
message. Reason for that is that it is possible that the request
for snapshot is sent while the leader has not yet setup the subscription
that receives the requests (or subscription has not fully reached the
cluster).
- Don't remember snapfile on err.
- Do not consider current if we have not had any activity.
- Stabilize stream scale up under active heavy publishing.
- Due to the publish pressure move the check for followers direct subs spinning up til after we stop publishing.
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
With inlining election timeout updates we double the lock contention and most likely introduced head of line issues for routes under heavy load.
Also slowing down heartbeats with so many assets being deployed in our user ecosystem, also moved the normal follower to candidate timing further out, similar to the lost quorum.
Note that the happy path transfer will still be very quick.
Signed-off-by: Derek Collison <derek@nats.io>
This would happen in situation where a node receives an append
entry with a term higher than the node's (current leader).
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- Wait of some sort of routing to be in place before starting
the raft run loop
- Remove use of lock in apiDispatch that was not necessary but
could have cause a route to block, causing memory growth, etc..
Unrelated rename of some tests so that they start with TestJetStream
and TestJetStreamCluster for cluster tests, fixed some flappers
and ensure that tests that change RAFT timeouts put them back
to default values on exit.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
During elected stepdown and transfer allow the new leader to take over before we stepdown.
We could receive a leader change, so make sure to also check migration state.
Signed-off-by: Derek Collison <derek@nats.io>