Also, if we are timing out and trying to become a candidate while doing a catchup, check whether the catchup has stalled.
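A minimal sketch of the idea, with hypothetical names (node, catchup, switchToCandidate are illustrative, not the actual nats-server code):

    package main

    import (
        "fmt"
        "time"
    )

    type catchup struct {
        active       bool
        lastApplied  uint64    // progress marker at the previous timeout check
        lastActivity time.Time
    }

    type node struct {
        applied uint64
        catchup *catchup
    }

    // onElectionTimeout fires when we have not heard from a leader. If a
    // catchup is in flight we only campaign once it has stalled, i.e. it
    // made no progress since the last timeout; otherwise we let it finish.
    func (n *node) onElectionTimeout(stallWindow time.Duration) {
        if c := n.catchup; c != nil && c.active {
            stalled := n.applied == c.lastApplied &&
                time.Since(c.lastActivity) > stallWindow
            c.lastApplied = n.applied
            if !stalled {
                return // catchup is progressing, do not campaign yet
            }
            c.active = false // abandon the stalled catchup
        }
        n.switchToCandidate()
    }

    func (n *node) switchToCandidate() { fmt.Println("switching to candidate") }

    func main() {
        n := &node{applied: 10, catchup: &catchup{
            active:       true,
            lastApplied:  10,
            lastActivity: time.Now().Add(-5 * time.Second),
        }}
        n.onElectionTimeout(2 * time.Second) // stalled, so we campaign
    }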
Signed-off-by: Derek Collison <derek@nats.io>
1. If reset, ignore Applied() calls that are greater than our commit.
2. Improved StepDown() by placing the request at the back of the queue when a preferred leader is given.
3. Improved handling of leadership transfer during StepDown().
4. Do not store EntryLeaderTransfer records on disk.
5. Remove unneeded processing of older terms.
6. If an append entry has a higher term, also inherit its pterm.
7. Only inherit a candidate's term if we decide to vote for them.
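To illustrate item 7, a minimal sketch with hypothetical types (not the actual nats-server implementation):

    package main

    import "fmt"

    type voteRequest struct {
        term, lastTerm, lastIndex uint64
        candidate                 string
    }

    type node struct {
        term, pterm, pindex uint64
        vote                string // who we voted for this term, "" if none
    }

    // processVoteRequest adopts the candidate's term only when we actually
    // grant the vote; previously the term was inherited unconditionally.
    func (n *node) processVoteRequest(vr *voteRequest) bool {
        granted := vr.term >= n.term &&
            vr.lastTerm >= n.pterm &&
            vr.lastIndex >= n.pindex &&
            (n.vote == "" || n.vote == vr.candidate)
        if granted {
            n.term = vr.term // inherit the term only on a granted vote
            n.vote = vr.candidate
        }
        return granted
    }

    func main() {
        n := &node{term: 4, pterm: 4, pindex: 100}
        // Candidate with a shorter log: vote denied, our term is untouched.
        fmt.Println(n.processVoteRequest(&voteRequest{
            term: 5, lastTerm: 4, lastIndex: 99, candidate: "S2"}), n.term) // false 4
        // Candidate at least as up to date: vote granted, term inherited.
        fmt.Println(n.processVoteRequest(&voteRequest{
            term: 5, lastTerm: 4, lastIndex: 100, candidate: "S3"}), n.term) // true 5
    }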
Signed-off-by: Derek Collison <derek@nats.io>
Under clusters with asymmetric network latency, if a node in an R3 setup was replicating a consumer and its parent stream but was the leader of neither, and the path from the stream leader to the consumer leader was faster than the path to this node, a replicated ack could arrive before the message itself.
In this case we used to forward a delete message request to the stream leader, which would then replicate it to all stream replicas, causing extra work that could increase publish latency for clients connected to the slow node.
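A hedged sketch of one way to handle an ack that outruns its message; the follower type and pendingAcks bookkeeping are hypothetical, not the actual nats-server code:

    package main

    import "fmt"

    type follower struct {
        msgs        map[uint64]string // sequences we have stored locally
        pendingAcks map[uint64]bool   // acks seen before their message
    }

    // processReplicatedAck removes the message if we already have it;
    // otherwise it records the ack locally instead of forwarding a delete
    // request to the stream leader, which would fan work out to every
    // stream replica.
    func (f *follower) processReplicatedAck(seq uint64) {
        if _, ok := f.msgs[seq]; ok {
            delete(f.msgs, seq)
            return
        }
        f.pendingAcks[seq] = true
    }

    // storeMsg applies a replicated message, honoring any ack that arrived
    // ahead of it on the faster path.
    func (f *follower) storeMsg(seq uint64, msg string) {
        if f.pendingAcks[seq] {
            delete(f.pendingAcks, seq) // already acked, skip the store
            return
        }
        f.msgs[seq] = msg
    }

    func main() {
        f := &follower{msgs: map[uint64]string{}, pendingAcks: map[uint64]bool{}}
        f.processReplicatedAck(7) // ack arrives first on the fast path
        f.storeMsg(7, "payload")  // message arrives later and is dropped
        fmt.Println(len(f.msgs), len(f.pendingAcks)) // 0 0
    }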
Signed-off-by: Derek Collison <derek@nats.io>
Do not exit candidate state in place when stepping down; doing so would cause double vote requests.
When truncating our WAL, make sure to adjust commit and applied as needed.
On a miss where the index is less than ours, if we cannot find the entry, reset our state.
For a vote, if the last processed term is higher than ours, always agree if no vote has been cast.
If terms are equal, make sure the requestor's index is at least as high as ours.
If we decide not to vote for someone, have not voted ourselves, and are a better fit, move forward with our own campaign.
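A minimal sketch of these vote rules, with illustrative names (not the actual nats-server implementation):

    package main

    import "fmt"

    type voteRequest struct{ lastTerm, lastIndex uint64 }

    type node struct {
        pterm, pindex uint64
        voted         bool
    }

    // shouldGrantVote applies the two rules above: a strictly higher last
    // term wins outright, and on equal terms the requestor's index must be
    // at least as high as ours.
    func (n *node) shouldGrantVote(vr *voteRequest) bool {
        if n.voted {
            return false
        }
        if vr.lastTerm > n.pterm {
            return true
        }
        return vr.lastTerm == n.pterm && vr.lastIndex >= n.pindex
    }

    // onVoteRequest denies unfit candidates and, if we ourselves have not
    // voted and are a better fit, kicks off our own campaign.
    func (n *node) onVoteRequest(vr *voteRequest) {
        if n.shouldGrantVote(vr) {
            n.voted = true
            fmt.Println("vote granted")
            return
        }
        betterFit := n.pterm > vr.lastTerm ||
            (n.pterm == vr.lastTerm && n.pindex > vr.lastIndex)
        if !n.voted && betterFit {
            fmt.Println("campaigning ourselves")
        }
    }

    func main() {
        n := &node{pterm: 3, pindex: 50}
        n.onVoteRequest(&voteRequest{lastTerm: 3, lastIndex: 40}) // deny, campaign
        n.onVoteRequest(&voteRequest{lastTerm: 4, lastIndex: 10}) // grant
    }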
Signed-off-by: Derek Collison <derek@nats.io>
Due to a bug, in rare circumstances we could write an empty snapshot for applied == 0. This would cause spinning at the raft layer.
1. Allow Truncate() to also properly reset the store when only the terms were mismatched.
2. During testing, fixed memstore truncate and made sure per-subject info was also cleaned up.
3. Added a fix to detect a bad snapshot on initialization and remove it.
4. Do not allow snapshots for applied == 0.
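A hedged sketch of the guard in item 4 (errNoSnapAvailable and the surrounding types are illustrative, not the real API):

    package main

    import (
        "errors"
        "fmt"
    )

    var errNoSnapAvailable = errors.New("no snapshot available")

    type raftNode struct {
        applied uint64
    }

    // InstallSnapshot refuses to write a snapshot before anything has been
    // applied; an empty snapshot at applied == 0 could previously be
    // written and would cause spinning at the raft layer.
    func (n *raftNode) InstallSnapshot(data []byte) error {
        if n.applied == 0 {
            return errNoSnapAvailable
        }
        // ... write the snapshot and compact the WAL ...
        return nil
    }

    func main() {
        n := &raftNode{}
        fmt.Println(n.InstallSnapshot(nil)) // no snapshot available
        n.applied = 42
        fmt.Println(n.InstallSnapshot([]byte("state"))) // <nil>
    }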
Signed-off-by: Derek Collison <derek@nats.io>
We were snapshotting more than needed, so double check that we should be doing this at the stream and consumer level.
At the raft level, we should have always been compacting the WAL to last+1, so made that consistent. Also fixed a bug that would not skip the last entry if more items were behind the snapshot.
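A minimal sketch of why the compact sequence is last+1; the WAL type here is illustrative:

    package main

    import "fmt"

    type wal struct {
        first, last uint64 // sequence range currently held
    }

    // Compact drops all entries below seq, so keeping the snapshot's last
    // applied entry out of the log means compacting to last+1, not last.
    func (w *wal) Compact(seq uint64) {
        if seq > w.first {
            w.first = seq
        }
    }

    func main() {
        w := &wal{first: 1, last: 100}
        snapLast := uint64(80)  // highest index covered by the snapshot
        w.Compact(snapLast + 1) // compact to last+1
        fmt.Println(w.first)    // 81: entry 80 itself is no longer kept
    }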
Signed-off-by: Derek Collison <derek@nats.io>
The bug occurred when a timestamp for the pending state was exactly -1, which could happen based on the timing of redelivered pending items, which could potentially set pending.Timestamp into the future, combined with the timing of the encodeConsumerState call.
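A hedged illustration of the sentinel collision; the delta encoding below is simplified and hypothetical, not the real encodeConsumerState format:

    package main

    import (
        "fmt"
        "time"
    )

    const unsetTS = int64(-1) // sentinel meaning "no timestamp recorded"

    // encodeDelta stores a pending timestamp as a delta from a base time.
    // A redelivered item could be stamped just past the base, producing a
    // delta of exactly -1 that decoders would misread as the sentinel, so
    // we clamp it.
    func encodeDelta(base, ts int64) int64 {
        delta := base - ts
        if delta == unsetTS {
            delta = 0 // avoid colliding with the sentinel
        }
        return delta
    }

    func main() {
        base := time.Now().UnixNano()
        future := base + 1 // pending.Timestamp pushed just past the base
        fmt.Println(encodeDelta(base, future)) // 0, not the -1 sentinel
    }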
Minor fixes to raft.
Signed-off-by: Derek Collison <derek@nats.io>
- didRemove in applyMetaEntries() could be reset when processing
  multiple entries (see the sketch below)
- change "no race" test names to include JetStream
- separate raft nodes' leader stepdown and stop in the server
  shutdown process
- in InstallSnapshot, call wal.Compact() with lastIndex+1
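A minimal sketch of the didRemove flag bug from the first item above (illustrative loop, not the actual applyMetaEntries code):

    package main

    import "fmt"

    type entry struct{ remove bool }

    func applyEntries(entries []entry) (didRemove bool) {
        for _, e := range entries {
            // Buggy version: didRemove = e.remove lets a later
            // non-remove entry reset the flag; OR-ing preserves it.
            didRemove = didRemove || e.remove
        }
        return didRemove
    }

    func main() {
        fmt.Println(applyEntries([]entry{{remove: true}, {remove: false}})) // true
    }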
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>