
197 Commits

Author SHA1 Message Date
Derek Collison
b806a8e7e7 Do not opt out of normal processing for leadership transfers, but make sure they are only processed if explicitly new
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-03 14:46:55 -07:00
Derek Collison
58ca525b3b Process replicated ack regardless of store update. Delay, but still step down
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 03:53:16 -07:00
Derek Collison
874b2b2e02 Hold the lock while checking health since we could update catchup state.
Do not step down right away when executing leadership transfer, wait for the commit.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 03:53:08 -07:00
Derek Collison
4646f4af5d Do not allow any JetStream leaders to be placed on a lameduck server
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 20:15:41 -07:00
Derek Collison
e274693490 On bad or corrupt message load during commit, reset WAL vs mark write error
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 14:07:14 -07:00
Derek Collison
35d1a7747a Snapshots of no length can hold state as well
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:44:04 -07:00
Derek Collison
182bf6cbae Bug fixes and general stability improvements.
1. If reset, ignore Applied() values that are greater than our commit.
2. Improved StepDown() by placing at back of queue if preferred.
3. Improved handling of leadership transfer during StepDown().
4. Do not store EntryLeaderTransfer records on disk.
5. Remove unneeded processing of older terms.
6. If append entry has higher term, also inherit pterm.
7. Only inherit a candidate's term if we decide to vote for them.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:46 -07:00
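A minimal Go sketch of item 1 above, assuming a simplified node that tracks only commit and applied. The raftNode type and the boolean return of Applied are hypothetical and do not mirror the actual nats-server signature; the point is only that an applied index beyond our commit is ignored after a reset.

```go
package main

// raftNode is a hypothetical, heavily simplified stand-in for a Raft node;
// only the fields needed to illustrate the rule are modeled.
type raftNode struct {
	commit  uint64 // highest index known to be committed
	applied uint64 // highest index the upper layer has applied
}

// Applied records that the upper layer has applied entries up to index.
// After a WAL reset our commit can be behind what the upper layer reports,
// so an index greater than commit is ignored instead of being accepted.
// The bool return is an illustrative assumption: false means "ignored".
func (n *raftNode) Applied(index uint64) bool {
	if index > n.commit {
		return false
	}
	if index > n.applied {
		n.applied = index
	}
	return true
}
```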
Derek Collison
ec89823e1c Only process out of resources condition from raft layer if err matches condition
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-23 08:13:22 -07:00
Derek Collison
ed9de4b0a1 Improved publisher performance on interest-based streams in some clusters with asymmetric network latency.
In clusters with asymmetric network latency, if a node in an R3 was replicating a consumer and its parent stream but was the leader of neither, and the path from the stream leader was faster than the path from the consumer leader, a replicated ack could arrive before the message itself.

In this case we used to forward a delete message request to the stream leader, which would then replicate it to all stream replicas. This extra work could increase publish times for clients connected to the slow node.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-20 20:53:45 -07:00
Derek Collison
0c1301ec14 Fix for data race
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-19 10:52:52 -07:00
Derek Collison
531fadd3e2 Don't warn if error is node closed.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-15 16:45:33 -07:00
Derek Collison
2beca1a2a6 Partial cache errors are also not critical write errors
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-01 22:52:02 -08:00
Derek Collison
c586014477 General raft improvements under heavy corruption.
Do not exit candidate state in place when stepping down, since that would cause double vote requests.
When truncating our WAL, make sure to adjust commit and applied as needed.
On a miss where the index is less than ours, if we cannot find the entry, reset our state.
For a vote, if the last processed term is higher than ours, always agree if no vote has been cast.
If terms are equal, make sure the requestor's index is at least as high as ours.
If we decide not to vote for someone, have not voted yet, and are a better fit, move forward with a campaign.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-01 22:06:50 -08:00
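The two vote rules in the message above can be sketched in Go as follows. This is a hypothetical, simplified voter: voteRequest, voter and grantVote are illustrative names, and term bookkeeping such as resetting votedFor when a new election term starts is deliberately left out.

```go
package main

// voteRequest is a hypothetical, simplified vote request carrying the
// candidate's last processed term and index.
type voteRequest struct {
	candidate string
	lastTerm  uint64
	lastIndex uint64
}

// voter holds the local state consulted when deciding on a vote.
type voter struct {
	lastTerm  uint64 // term of our last processed entry
	lastIndex uint64 // index of our last processed entry
	votedFor  string // empty if no vote has been cast
}

// grantVote agrees if we have not voted for someone else and either the
// candidate's last term is higher than ours, or the terms are equal and the
// candidate's index is at least as high as ours.
func (v *voter) grantVote(req voteRequest) bool {
	if v.votedFor != "" && v.votedFor != req.candidate {
		return false // already voted for a different candidate
	}
	newerTerm := req.lastTerm > v.lastTerm
	sameTermAsFar := req.lastTerm == v.lastTerm && req.lastIndex >= v.lastIndex
	if !newerTerm && !sameTermAsFar {
		return false // candidate's log is behind ours
	}
	v.votedFor = req.candidate
	return true
}
```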
Derek Collison
fa8afba68f Only warn on write errors if not closed, in case they linger under pressure and block on dios
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-27 18:56:55 -08:00
Derek Collison
2711460b7b Prevent benign spin between competing leaders with the same index but different term.
Remove lock from route processing for updating peers' progress, already handled in trackPeer.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-27 11:21:33 -08:00
Derek Collison
4fa0ea32c3 [FIXED] If a truncate for a raft WAL failed we could spin.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-25 19:07:27 -08:00
Derek Collison
ea2bfad8ea Fixed bug where a snapshot would not compact through applied. This meant a subsequent request for exactly applied would return only that entry, not the full state snapshot.
Fixed bug where we would not snapshot when we should.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-23 22:19:37 -08:00
Derek Collison
45859e6476 Make sure preferred peer for stepdown is healthy.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-23 13:06:13 -08:00
Neil Twigg
68961ffedd Refactor ipQueue to use generics, reduce allocations
2023-02-21 14:50:09 +00:00
Derek Collison
e028b7230a Need to compact wal on snapshot to pindex+1
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-20 14:37:37 -08:00
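A toy Go illustration of why compaction on snapshot targets pindex+1, assuming, as the "compact to pindex+1" wording implies, that Compact(seq) removes every entry below seq and keeps seq onward. memWAL and installSnapshot are hypothetical names for this sketch, not the filestore API.

```go
package main

// memWAL is a hypothetical in-memory log used only to illustrate compaction;
// entries[i] holds the entry with sequence first+i.
type memWAL struct {
	first   uint64
	entries []string
}

// Compact removes every entry with a sequence below seq, so seq becomes the
// new first sequence (or the log becomes empty).
func (w *memWAL) Compact(seq uint64) {
	if seq <= w.first {
		return
	}
	drop := seq - w.first
	if drop > uint64(len(w.entries)) {
		drop = uint64(len(w.entries))
	}
	w.entries = w.entries[drop:]
	w.first = seq
}

// installSnapshot compacts through pindex, the last index covered by the
// snapshot, by compacting to pindex+1. Compacting only to pindex would leave
// the entry at pindex behind, so a later read of exactly pindex could return
// that single entry instead of the full snapshot state.
func (w *memWAL) installSnapshot(pindex uint64) {
	w.Compact(pindex + 1)
}
```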
Derek Collison
9c02be2409 Various fixes for snapshots.
Due to a bug, in rare circumstances we could write an empty snapshot for applied == 0. This would cause spinning at the raft layer.

1. Allow Truncate() to also properly reset the store when only the terms mismatched.
2. During testing, fixed memstore truncate and made sure per-subject info was also cleaned up.
3. Added a fix to detect and remove a bad snapshot on initialization.
4. Do not allow snapshots for applied == 0.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-04 13:46:06 -08:00
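A Go sketch of rules 3 and 4 above, assuming a minimal snapshot header; snapshot, newSnapshot, validOnLoad and errNothingApplied are hypothetical names. Zero-length data is still accepted here, in line with the later note that snapshots of no length can hold state as well.

```go
package main

import "errors"

// errNothingApplied is a hypothetical error used only in this sketch.
var errNothingApplied = errors.New("cannot snapshot: nothing applied yet")

// snapshot is a hypothetical, minimal snapshot header.
type snapshot struct {
	lastIndex uint64 // applied index the snapshot covers
	lastTerm  uint64
	data      []byte // may legitimately be empty
}

// newSnapshot refuses to build a snapshot when applied == 0, since a
// snapshot at index 0 carries no usable position.
func newSnapshot(applied, term uint64, data []byte) (*snapshot, error) {
	if applied == 0 {
		return nil, errNothingApplied
	}
	return &snapshot{lastIndex: applied, lastTerm: term, data: data}, nil
}

// validOnLoad reports whether a snapshot found at startup should be kept;
// one recorded at index 0 is treated as bad and would be removed.
func validOnLoad(s *snapshot) bool {
	return s != nil && s.lastIndex > 0
}
```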
Derek Collison
e9a983c802 Do not let !NeedSnapshot() avoid snapshots and compaction.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-01 22:05:25 -07:00
Derek Collison
6058056e3b Minor fixes and optimizations for snapshots.
We were snapshotting more than needed, so double check that we should be doing this at the stream and consumer level.
At the raft level, we should have always been compacting the WAL to last+1, so made that consistent. Also fixed a bug that would not skip last if there were more items behind the snapshot.

Signed-off-by: Derek Collison <derek@nats.io>
2023-01-30 17:54:18 -08:00
Derek Collison
bf49f23bb1 Only hold on to so many pending in memory, will fetch from WAL
Signed-off-by: Derek Collison <derek@nats.io>
2023-01-28 11:34:55 -08:00
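One way to bound in-memory pending entries while still serving older ones from the WAL, sketched in Go. The cap of 3, the map standing in for the on-disk WAL, and the pendingCache type are all assumptions for illustration; the commit does not state the actual limit or data structure.

```go
package main

// maxPendingInMemory is a hypothetical cap chosen for illustration.
const maxPendingInMemory = 3

// pendingCache keeps the most recent entries in memory up to a cap and falls
// back to a (here simulated) WAL lookup for anything no longer cached.
type pendingCache struct {
	mem   map[uint64]string // bounded in-memory cache
	order []uint64          // cached indexes, oldest first
	wal   map[uint64]string // stand-in for the on-disk WAL
	loads int               // number of WAL fetches, for illustration
}

func newPendingCache() *pendingCache {
	return &pendingCache{mem: map[uint64]string{}, wal: map[uint64]string{}}
}

// add writes the entry to the WAL and caches it, evicting the oldest cached
// entry once the cap is exceeded. Indexes are assumed to be strictly
// increasing, as in an append-only log.
func (c *pendingCache) add(index uint64, entry string) {
	c.wal[index] = entry
	c.mem[index] = entry
	c.order = append(c.order, index)
	if len(c.order) > maxPendingInMemory {
		oldest := c.order[0]
		c.order = c.order[1:]
		delete(c.mem, oldest)
	}
}

// get returns the entry from memory if present, otherwise from the WAL.
func (c *pendingCache) get(index uint64) (string, bool) {
	if e, ok := c.mem[index]; ok {
		return e, true
	}
	e, ok := c.wal[index]
	if ok {
		c.loads++
	}
	return e, ok
}
```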
Neil Twigg
83932b4be6 Don't mark a clustered stream as unhealthy if making forward progress, add TestJetStreamClusterCurrentVsHealth
2023-01-26 16:57:34 +00:00
Derek Collison
ad53d455f8 When migrating leaders off a server when the leafnode is not connected, also ensure leaders can not return until reconnected.
Signed-off-by: Derek Collison <derek@nats.io>
2023-01-05 08:02:50 -08:00
Todd Beets
47c87eb71c fix and test for clustered mem store asset no-quorum if leader restarted
2022-12-14 16:16:08 -08:00
Derek Collison
894115b82b Fix for server panic when consumer state was not decoded correctly.
The bug occurred when a timestamp for the pending state was exactly -1. This could happen depending on the timing of redelivered pending items, which could potentially set pending.Timestamp into the future, and the timing of the encodeConsumerState call.

Minor fixes to raft.

Signed-off-by: Derek Collison <derek@nats.io>
2022-12-06 14:16:20 -08:00
Derek Collison
3ac6052b32 Updated pae threshold and reporting modulo to not spam logs as much.
Signed-off-by: Derek Collison <derek@nats.io>
2022-11-11 16:08:58 -08:00
Derek Collison
98bf861a7a Updates to stream and consumer move logic.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-30 16:11:35 -07:00
Derek Collison
212adf5775 General improvements to clustered streams during server restart and KV/CAS scenarios.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-22 18:36:15 -07:00
Ivan Kozlovic
7de4497815 Install consumer snapshot on clean exit and few other fixes
- didRemove in applyMetaEntries() could be reset when processing
multiple entries
- change "no race" test names to include JetStream
- separate raft nodes leader stepdown and stop in server
shutdown process
- in InstallSnapshot, call wal.Compact() with lastIndex+1

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-16 17:05:49 -06:00
Ivan Kozlovic
3c9a7cc6e5 Move to Go 1.19, remove io/ioutil, fix data race and a flapper
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-05 09:55:37 -06:00
Ivan Kozlovic
37c923c28e Downgrade a RAFT warning to debug
This is related to PR #3307.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-02 18:06:39 -06:00
Derek Collison
5e98263de8 General stability improvements
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-29 16:02:31 -07:00
Derek Collison
27d87a68a4 Improvements to raft layer with snapshots on catchup.
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-29 09:01:03 -07:00
Matthias Hanel
04ffed48b0 fix peer tracking by removing peers before scaledown (#3289)
in doRemovePeerAsLeader the leader also records the removed peer in the removed set

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-07-26 22:01:03 +02:00
Ivan Kozlovic
1a6c5f1c90 [FIXED] JetStream: Some scaling up issues
- Send snapshot only if leader
- When processing snapshot, start with a smaller inactivity interval
  that will double up to 10sec or use 10sec directly once we get a
  message. Reason for that is that it is possible that the request
  for snapshot is sent while the leader has not yet setup the subscription
  that receives the requests (or subscription has not fully reached the
  cluster).
- Don't remember snapfile on err.
- Do not consider current if we have not had any activity.
- Stabilize stream scale up under active heavy publishing.
- Due to the publish pressure, move the check for followers' direct subs spinning up until after we stop publishing.

Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-07-25 18:44:18 -06:00
Matthias Hanel
51b6d5233f Fix raft issue where pindex of follower was off by 1 (#3277)
introduced by 57395bba02

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-07-21 00:51:26 +02:00
Derek Collison
e1c8f9fb55 This improves behavior when a server is under load or low on resources, such as FDs, and a user is trying to delete a stream with lots of consumers.
Signed-off-by: Derek Collison <derek@nats.io>
2022-06-04 16:49:17 -07:00
Derek Collison
ef3eea4d73 Speed up raft for tests
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-18 16:28:58 -07:00
Derek Collison
ccd2290355 With use cases bringing us more data I wanted to suggest these changes.
Inlining election timeout updates doubles the lock contention and has most likely introduced head-of-line issues for routes under heavy load.
Also slowed down heartbeats given so many assets being deployed in our user ecosystem, and moved the normal follower-to-candidate timing further out, similar to the lost-quorum timing.
Note that the happy path transfer will still be very quick.

Signed-off-by: Derek Collison <derek@nats.io>
2022-05-15 09:55:22 -07:00
Ivan Kozlovic
2ce1dc1561 [FIXED] JetStream: possible lockup due to a return prior to unlock
This would happen in a situation where a node that is the current
leader receives an append entry with a term higher than its own.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-05-10 17:11:57 -06:00
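A Go sketch of one common way to rule out this class of bug: pairing Lock with a deferred Unlock so that every return path releases the mutex. The node type and processAppendEntry body are hypothetical and simplified, and the actual fix in this commit may have been structured differently.

```go
package main

import "sync"

// node is a hypothetical stand-in with mutex-protected state.
type node struct {
	sync.Mutex
	term     uint64
	isLeader bool
}

// processAppendEntry locks once and defers the unlock, so the early return
// taken when a leader sees an append entry with a higher term can no longer
// leave the mutex held.
func (n *node) processAppendEntry(term uint64) {
	n.Lock()
	defer n.Unlock()
	if n.isLeader && term > n.term {
		// Higher term from another leader: step down and adopt the term.
		n.isLeader = false
		n.term = term
		return
	}
	// Normal append handling would go here.
}
```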
Derek Collison
6f54b032d6 Raft and cluster improvements.
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-03 15:20:46 -07:00
Ivan Kozlovic
2659b30113 [IMPROVED] JetStream: add file names for invalid checksums
On restart, we report when we find errors in checksums, but we
did not report the name of the file.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-18 13:35:08 -06:00
Derek Collison
4aaea8e4c4 Improvements to move semantics.
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-16 07:55:05 -07:00
Derek Collison
2a8b123706 Don't quickly declare lost quorum after scale up
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-15 13:28:34 -07:00
Ivan Kozlovic
4e7c72ab33 Update based on code review
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-14 11:00:33 -06:00
Ivan Kozlovic
bd61d51a1c [IMPROVED] JetStream: reduce unnecessary leader election
- Wait for some sort of routing to be in place before starting
the raft run loop
- Remove use of lock in apiDispatch that was not necessary but
could have caused a route to block, causing memory growth, etc.

Unrelated rename of some tests so that they start with TestJetStream
and TestJetStreamCluster for cluster tests, fixed some flappers
and ensure that tests that change RAFT timeouts put them back
to default values on exit.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-14 10:47:14 -06:00
Derek Collison
9748925f13 Improvements to stream and consumer move.
During an elected stepdown and transfer, allow the new leader to take over before we step down.
We could receive a leader change, so make sure to also check migration state.

Signed-off-by: Derek Collison <derek@nats.io>
2022-04-14 07:27:29 -07:00