Commit Graph

223 Commits

Derek Collison
e5625b9d9b If a leader is asked for an item and we have no items left, make sure to also step down.
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-25 10:20:07 -07:00
Derek Collison
2669f77190 Make sure to reset election timer on catching up
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-24 19:58:08 -07:00
Neil Twigg
6d9955d212 Send peer state when adding peers
Signed-off-by: Neil Twigg <neil@nats.io>
2023-06-08 15:25:18 +01:00
Derek Collison
238282d974 Fix some data races detected in internal testing
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-03 13:58:15 -07:00
Derek Collison
8e825001d2 When we receive a catchup request for an item beyond our current state, we should stepdown.
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-17 17:30:35 -07:00
Derek Collison
a17357c6ae When doing leadership transfer stepdown as soon as we know we have sent the EntryLeaderTransfer entry.
Delaying could allow the old leader to send something that would cause the new leader to abandon its candidacy even though it would have received all the votes.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-10 12:27:33 -07:00
Derek Collison
717afae9ef When doing a leader transfer clear vote state on leader and when non-chosen peers receive the update
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-10 07:49:22 -07:00
Derek Collison
b9af0d0294 Only do no-leader stepdown on transfer after a delay if we are still the leader
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-09 17:19:14 -07:00
Derek Collison
ae73f7be55 Small raft improvements.
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-02 16:44:27 -07:00
Derek Collison
546dd0c9ab Make sure we can recover an underlying node being stopped.
Do not return healthy if the node is closed, and wait a bit longer for forward progress.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-29 07:42:23 -07:00
Derek Collison
3c964a12d7 Migration could be delayed due to transferring leadership while the new leader was still paused.
Also check more quickly, but slow down if the state we need is not there yet.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-25 18:58:49 -07:00
Waldemar Quevedo
d9cc8b0363 fix formatting of raft debug log
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-04-22 07:07:08 +02:00
Derek Collison
a5f5603645 Reset our WAL on edge conditions instead of trying to recover.
Also, if we are timing out and trying to become a candidate while doing a catchup, check whether we are stalled.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-15 12:23:44 -07:00
Derek Collison
66ca46e145 If we see another leader with same term we should step down
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-14 16:21:40 -07:00
Waldemar Quevedo
a4833d0889 Fix raft log debug reloading
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-04-13 14:57:04 -07:00
Derek Collison
808a2e8c90 On failure to send snapshot to follower, also reset, and on reset make sure to reset term
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-12 11:48:22 -07:00
Derek Collison
a92bb9fe61 Fix bad unlock which could cause crash
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-12 11:48:22 -07:00
Derek Collison
340fcc90bc Basic raft tests
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-12 11:48:22 -07:00
Derek Collison
80a57a3d51 Remove peers from string intern map
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-09 08:01:36 -07:00
Derek Collison
6fa55540a7 Better use of entryPool
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-09 07:48:31 -07:00
Derek Collison
35bb7c1737 Pool CommittedEntries as well with a ReturnToPool() that will also recycle the Entry. Needs to integrate with upper layers
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-08 11:34:10 -07:00
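The pooling pattern described above can be sketched with sync.Pool. The types and the shape of ReturnToPool here are illustrative stand-ins for the idea that recycling the container also recycles the entries it holds; they are not the actual nats-server definitions:

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical simplified types for illustration only.
type Entry struct {
	Data []byte
}

type CommittedEntry struct {
	Index   uint64
	Entries []*Entry
}

var entryPool = sync.Pool{New: func() any { return &Entry{} }}
var cEntryPool = sync.Pool{New: func() any { return &CommittedEntry{} }}

func newEntry(data []byte) *Entry {
	e := entryPool.Get().(*Entry)
	e.Data = data
	return e
}

// ReturnToPool recycles the CommittedEntry and also returns each Entry
// it holds, so one call releases the whole batch.
func (ce *CommittedEntry) ReturnToPool() {
	for _, e := range ce.Entries {
		e.Data = nil
		entryPool.Put(e)
	}
	ce.Entries, ce.Index = nil, 0
	cEntryPool.Put(ce)
}

func main() {
	ce := cEntryPool.Get().(*CommittedEntry)
	ce.Index = 42
	ce.Entries = append(ce.Entries, newEntry([]byte("op")))
	fmt.Println(len(ce.Entries)) // 1
	ce.ReturnToPool()
}
```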
Derek Collison
3be25fdedb Do not put an appendEntryResponse back in the pool if catching up until complete
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-07 10:30:06 -07:00
Derek Collison
2ff6f18ccd Use sync.Map for peers vs internal storage for appendEntryResponses
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-07 08:16:42 -07:00
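A sketch of tracking peer progress with sync.Map instead of a mutex-guarded map. The `peerState` type and `trackPeer` helper are hypothetical simplifications, not the server's actual code; sync.Map suits this created-once-per-peer, read-by-many pattern without holding a lock on the hot path:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// peerState tracks replication progress for one peer (simplified).
type peerState struct{ lastIndex atomic.Uint64 }

var peers sync.Map // peer name -> *peerState

// trackPeer records a peer's progress, creating state on first sight.
// The CAS loop only moves lastIndex forward, so stale updates are ignored.
func trackPeer(name string, index uint64) {
	v, _ := peers.LoadOrStore(name, &peerState{})
	ps := v.(*peerState)
	for {
		cur := ps.lastIndex.Load()
		if index <= cur || ps.lastIndex.CompareAndSwap(cur, index) {
			return
		}
	}
}

func main() {
	trackPeer("S1", 5)
	trackPeer("S2", 8)
	trackPeer("S1", 3) // stale update, ignored
	v, _ := peers.Load("S1")
	fmt.Println(v.(*peerState).lastIndex.Load()) // 5
}
```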
Derek Collison
1caa56a34f Use pools for appendEntries
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-07 07:38:19 -07:00
Derek Collison
3afdb99f75 Use pools for appendEntryResponses. Also use interior space for peer name from the wire
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-07 06:43:51 -07:00
Derek Collison
e76b0b9b96 Move the check for out of resources, which would want a read lock, out of inline processing
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-05 20:28:19 -07:00
Derek Collison
b806a8e7e7 Do not opt-out of normal processing for leadership transfers, but make sure they are only processed if explicitly new
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-03 14:46:55 -07:00
Derek Collison
58ca525b3b Process replicated ack regardless of store update. Delay but still stepdown
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 03:53:16 -07:00
Derek Collison
874b2b2e02 Hold the lock while checking health since we could update catchup state.
Do not stepdown right away when executing leadership transfer, wait for the commit.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 03:53:08 -07:00
Derek Collison
4646f4af5d Do not allow any JetStream leaders to be placed on a lameduck server
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 20:15:41 -07:00
Derek Collison
e274693490 On bad or corrupt message load during commit, reset WAL vs mark write error
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 14:07:14 -07:00
Derek Collison
35d1a7747a Snapshots of no length can hold state as well
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:44:04 -07:00
Derek Collison
182bf6cbae Bug fixes and general stability improvements.
1. If reset, ignore Applied() calls that are greater than our commit.
2. Improved StepDown() by placing at back of queue if preferred.
3. Improved handling of leadership transfer during StepDown().
4. Do not store EntryLeaderTransfer records on disk.
5. Remove unneeded processing of older terms.
6. If append entry has higher term, also inherit pterm.
7. Only inherit a candidate's term if we decide to vote for them.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:46 -07:00
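Rule 6 above (inheriting pterm along with a higher term) can be illustrated with a minimal sketch. The `node` and `appendEntry` types here are invented for the example and are not the actual nats-server definitions:

```go
package main

import "fmt"

// appendEntry carries the sender's term and the previous entry's term
// (pterm), simplified for illustration.
type appendEntry struct{ term, pterm uint64 }

type node struct{ term, pterm uint64 }

// processAppendEntry sketches rule 6: when an append entry carries a
// higher term, inherit its pterm along with the term. Lower or equal
// terms leave our state untouched.
func (n *node) processAppendEntry(ae appendEntry) {
	if ae.term > n.term {
		n.term, n.pterm = ae.term, ae.pterm
	}
}

func main() {
	n := &node{term: 3, pterm: 3}
	n.processAppendEntry(appendEntry{term: 5, pterm: 4})
	fmt.Println(n.term, n.pterm) // 5 4
}
```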
Derek Collison
ec89823e1c Only process out of resources condition from raft layer if err matches condition
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-23 08:13:22 -07:00
Derek Collison
ed9de4b0a1 Improved publisher performance under some instances of asymmetric network latency in clusters on interest-based streams.
In clusters with asymmetric network latency, if a node in an R3 was replicating a consumer and its parent stream but was the leader of neither, and the path from the stream leader was faster than the path from the consumer leader, a replicated ack could arrive before the message itself.

In this case we used to forward a delete message request to the stream leader which would then replicate that to all stream replicas, causing more work which could lead to increased publisher times on clients connected to the slow node.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-20 20:53:45 -07:00
Derek Collison
0c1301ec14 Fix for data race
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-19 10:52:52 -07:00
Derek Collison
531fadd3e2 Don't warn if error is node closed.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-15 16:45:33 -07:00
Derek Collison
2beca1a2a6 Partial cache errors are also not critical write errors
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-01 22:52:02 -08:00
Derek Collison
c586014477 General raft improvements under heavy corruption.
Do not exit candidate state in place when stepping down, as doing so would cause double vote requests.
When truncating our WAL, make sure to adjust commit and applied as needed.
On a miss where the index is less than ours, if we cannot find the entry, reset our state.
For a vote, if last processed term is higher than ours always agree if no vote has been cast.
If terms are equal make sure the requestor's index is at least as high as ours.
If we decide not to vote for someone, and we have not voted and we are a better fit, move forward with a campaign.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-01 22:06:50 -08:00
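The vote-granting rules from this commit can be condensed into a small sketch: agree when the candidate's last term is higher and we have not yet voted, and when terms are equal require the candidate's index to be at least as high as ours. The types and `grantVote` helper are hypothetical, not the server's actual implementation:

```go
package main

import "fmt"

const noVote = ""

// voteRequest is a simplified stand-in for a raft vote request.
type voteRequest struct {
	candidate string
	lastTerm  uint64
	lastIndex uint64
}

type node struct {
	term     uint64 // last processed term
	index    uint64 // last processed index
	votedFor string
}

// grantVote applies the commit's rules: never double-vote; a higher last
// term always wins; an equal term requires an index at least as high.
func (n *node) grantVote(vr voteRequest) bool {
	if n.votedFor != noVote {
		return false
	}
	if vr.lastTerm > n.term {
		return true
	}
	return vr.lastTerm == n.term && vr.lastIndex >= n.index
}

func main() {
	n := &node{term: 5, index: 10}
	fmt.Println(n.grantVote(voteRequest{"S2", 6, 3})) // higher term
	fmt.Println(n.grantVote(voteRequest{"S3", 5, 9})) // equal term, lower index
}
```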
Derek Collison
fa8afba68f Only warn on write errors if not closed, in case they linger under pressure and block on dios
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-27 18:56:55 -08:00
Derek Collison
2711460b7b Prevent benign spin between competing leaders with the same index but different term.
Remove lock from route processing for updating peer progress, already handled in trackPeer.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-27 11:21:33 -08:00
Derek Collison
4fa0ea32c3 [FIXED] If a truncate for a raft WAL failed we could spin.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-25 19:07:27 -08:00
Derek Collison
ea2bfad8ea Fixed bug where snapshot would not compact through applied. This meant a subsequent request for exactly applied would return only that entry, not the full state snapshot.
Fixed bug where we would not snapshot when we should.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-23 22:19:37 -08:00
Derek Collison
45859e6476 Make sure preferred peer for stepdown is healthy.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-23 13:06:13 -08:00
Neil Twigg
68961ffedd Refactor ipQueue to use generics, reduce allocations
2023-02-21 14:50:09 +00:00
Derek Collison
e028b7230a Need to compact WAL on snapshot to pindex+1
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-20 14:37:37 -08:00
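Compacting to pindex+1 means every entry the snapshot covers is dropped and the entry after it becomes the head of the log. A toy in-memory illustration, assuming a slice-backed log (the real WAL is file-backed):

```go
package main

import "fmt"

// wal is a toy in-memory log; first is the lowest retained absolute index.
type wal struct {
	first   uint64
	entries []string
}

// compact drops entries up to and including pindex, so the first retained
// entry is pindex+1 (the commit's "compact WAL on snapshot to pindex+1").
func (w *wal) compact(pindex uint64) {
	if pindex < w.first {
		return
	}
	n := pindex - w.first + 1
	if n > uint64(len(w.entries)) {
		n = uint64(len(w.entries))
	}
	w.entries = w.entries[n:]
	w.first = pindex + 1
}

func main() {
	w := &wal{first: 1, entries: []string{"a", "b", "c", "d", "e"}}
	w.compact(3)
	fmt.Println(w.first, w.entries) // 4 [d e]
}
```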
Derek Collison
9c02be2409 Various fixes for snapshots.
Due to a bug, in rare circumstances we could write an empty snapshot for applied == 0. This would cause spinning at the raft layer.

1. Allow Truncate() to also properly do a reset of the store when only the terms were mismatched.
2. During testing, fixed memstore truncate and made sure per-subject info was also cleaned up.
3. Added a fix to detect a bad snapshot on initialization and remove it.
4. Do not allow snapshots for applied == 0.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-04 13:46:06 -08:00
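Point 4 above can be shown as a simple guard at the start of the snapshot path. The `installSnapshot` function and error name are invented for this sketch:

```go
package main

import (
	"errors"
	"fmt"
)

var errNoSnapAvailable = errors.New("no snapshot available")

// installSnapshot sketches the guard from the commit: refuse to write a
// snapshot when nothing has been applied yet, since an empty snapshot for
// applied == 0 could cause the raft layer to spin.
func installSnapshot(applied uint64, data []byte) error {
	if applied == 0 {
		return errNoSnapAvailable
	}
	// ... write the snapshot and compact the WAL up to applied ...
	_ = data
	return nil
}

func main() {
	fmt.Println(installSnapshot(0, nil))         // refused
	fmt.Println(installSnapshot(100, []byte{1})) // accepted
}
```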
Derek Collison
e9a983c802 Do not let !NeedSnapshot() avoid snapshots and compaction.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-01 22:05:25 -07:00
Derek Collison
6058056e3b Minor fixes and optimizations for snapshots.
We were snapshotting more than needed, so double-check that we should be doing this at the stream and consumer level.
At the raft level, we should have always been compacting the WAL to last+1, so made that consistent. Also fixed a bug that would not skip the last entry if more items were behind the snapshot.

Signed-off-by: Derek Collison <derek@nats.io>
2023-01-30 17:54:18 -08:00
Derek Collison
bf49f23bb1 Only hold on to so many pending in memory, will fetch from WAL
Signed-off-by: Derek Collison <derek@nats.io>
2023-01-28 11:34:55 -08:00