4625 Commits

Author SHA1 Message Date
Derek Collison
ff3f102cdd Fix for datarace in healthcheck
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 16:30:13 -07:00
Derek Collison
d5ac4d283a Fix for flapping test, can return invalid sequence as well
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 16:18:23 -07:00
Derek Collison
64b22011dc Better use of LoadAndStore based on review feedback
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 14:50:22 -07:00
Derek Collison
1fb1efd748 Make sure to remove any inflight entries when done
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 14:41:49 -07:00
Derek Collison
e6447c982a Protect against concurrent creation of streams and consumers.
Also make sure we have exited monitoring routines when doing resets for both streams and consumers.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 14:29:52 -07:00
Derek Collison
f3cab83ccf Bump to 2.9.16-RC.3
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 04:24:21 -07:00
Derek Collison
58ca525b3b Process replicated ack regardless of store update. Delay but still stepdown
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 03:53:16 -07:00
Derek Collison
a8bd2793d5 Fix concurrent map bug on preAcks.
Use monitor check for streams like consumers.
Make sure to stop raft layer if exiting monitorConsumer early.
Allow consumers to force a snapshot on leadership change.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 03:53:11 -07:00
Derek Collison
874b2b2e02 Hold the lock while checking health since we could update catchup state.
Do not stepdown right away when executing leadership transfer, wait for the commit.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 03:53:08 -07:00
Derek Collison
b5358fa4b3 Wait for shutdown and sleep to let state build up
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 03:53:05 -07:00
Derek Collison
b752b8b30d Snapshot on clean shutdown if needed or interest based retention
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 03:53:03 -07:00
Derek Collison
e54019f87f All should be lowercase
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 03:53:01 -07:00
Derek Collison
872a9e7927 Add in monitor status similar to consumer
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 03:52:59 -07:00
Derek Collison
df4982948c Gate remove calls, disqualify delivered and ack updates quicker
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 03:52:56 -07:00
Derek Collison
4b8229ee42 Do not hold js lock for health check, use healthy not current for meta
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 03:52:54 -07:00
Derek Collison
e2839e9ec1 Fix for flapper
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 03:52:43 -07:00
Derek Collison
ad5bb366a0 Updates to preacks when multiple consumers are present but mutually exclusive (filtered).
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-31 10:43:28 -07:00
Derek Collison
c194047caf Bump to 2.9.16-RC.2
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-30 21:23:51 -07:00
Derek Collison
5e85889790 [IMPROVED] Improvements to preAcks. (#4006)
Better handling of multiple consumers so as to not delete messages too
early.
Better cleanup handling.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-30 21:08:34 -07:00
Derek Collison
8c0a45edf9 Make sure to lock on clearing if not removing.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-30 20:42:28 -07:00
Derek Collison
937ef0d2a6 Improvements to preAcks.
Better handling of multiple consumers so as to not delete too early.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-30 20:29:15 -07:00
Ivan Kozlovic
a4df4f8727 Fixed some tests
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-03-30 15:02:59 -06:00
Derek Collison
4646f4af5d Do not allow any JetStream leaders to be placed on a lameduck server
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 20:15:41 -07:00
Derek Collison
873ab0f6b9 Fix for flapping test
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 18:55:41 -07:00
Derek Collison
fbc90adf93 Bump to 2.9.16-RC.1
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 17:21:57 -07:00
Derek Collison
02702e4620 [IMPROVEMENT] General stability and bug fixes. (#3999)
This PR has general improvements and fixes to filestore, raft, and the
clustering layer.

Summary

1. Additional support for preAck handling for interest based streams
when replicated acks arrive before the message itself.
2. Better handling when checking state to determine whether to remove an
interest based message.
3. Improved StepDown() and leadership transfer handling after restarts.
4. Improved voting logic for high load systems.
5. Various improvements and fixes for filestore Compact(), which is used
heavily in the raft layer when updating snapshots and the raft wal.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 17:09:44 -07:00
Derek Collison
c546828359 Moved long running test to NoRace suite
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 16:56:04 -07:00
Derek Collison
ade0e9d295 Snapshot meta for this function to use in case it gets removed out from underneath us.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 16:51:17 -07:00
Derek Collison
9a714e7d7d Update based on review feedback
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 15:47:54 -07:00
Derek Collison
152b25c314 Update server/stream.go
Pre-allocate

Co-authored-by: Neil <neil@nats.io>
2023-03-29 15:29:51 -07:00
Derek Collison
c77872b519 Update server/jetstream_cluster.go
Pre-allocate

Co-authored-by: Neil <neil@nats.io>
2023-03-29 15:29:38 -07:00
Derek Collison
2b89fea9b0 Double check here if the jetstream cluster was shutdown when we released the lock
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 14:46:49 -07:00
Derek Collison
e274693490 On bad or corrupt message load during commit, reset WAL vs mark write error
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 14:07:14 -07:00
Derek Collison
6c3e64b83b Always make sure cluster and meta raft node available when needed
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 13:56:04 -07:00
Derek Collison
ddfa5cdfec Additional protection for bad state when rebuilding a message block
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:44:11 -07:00
Derek Collison
a9a4df859f Fix for flapping test
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:44:08 -07:00
Derek Collison
35d1a7747a Snapshots of no length can hold state as well
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:44:04 -07:00
Derek Collison
c4da37ecc7 Make sure consumer is valid and state was returned
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:44:01 -07:00
Derek Collison
e97ddcd14f Tweak tests due to changes, make test timeouts uniform.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:59 -07:00
Derek Collison
52fbac644c Since we no longer store leaderTransfers, which is proper, some tests were getting an advantage from that after server restart.
This change speeds up raft layer more to avoid timeouts.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:57 -07:00
Derek Collison
0d9f707b4b Additional tests to stress interest based streams with pull subscribers during rolling restarts.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:55 -07:00
Derek Collison
71af150448 General improvements to interest based stream processing when acks arrive before the actual msgs.
1. If we are retention based, make sure our consumers are running before entering into monitorStream logic.
2. If we skip messages and are interest based, make sure we check for a preAck state.
3. On finalization of recovery for consumers have them check against the interest based stream.
4. Do not process ack state updates if consumer is closed and shutting down.
5. When processing final state for a stream after upper layer catchup, check all attached consumers for ack skew.
6. During catchup of stream messages consult preAck state and skip messages as needed.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:53 -07:00
Derek Collison
5cabc365df General improvements around handling interest retention.
1. During ackMsg processing hold write lock to block concurrent access.
2. Check for presence of preAcks before and force removal if present.
3. Rework check for orphan msgs on startup to use checkStateForInterestStream().

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:51 -07:00
Derek Collison
e516c47a4b Improvements to consumers attached to an interest retention stream.
1. Do not process an ack if we are closed.
2. When checking for needing an ack for a given consumer, hold lock entire time.
3. During recovery and restarts we check if we need to replay acks to the parent stream.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:49 -07:00
Derek Collison
182bf6cbae Bug fixes and general stability improvements.
1. If reset, ignore Applied() that are greater than our commit.
2. Improved StepDown() by placing at back of queue if preferred.
3. Improved handling of leadership transfer during StepDown().
4. Do not store EntryLeaderTransfer records on disk.
5. Remove un-needed processing of older terms.
6. If append entry has higher term, also inherit pterm.
7. Only inherit a candidate's term if we decide to vote for them.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:46 -07:00
Derek Collison
6d4304146f Bug fixes and general stability improvements.
1. Fixed a bug that would process a removal of a message after the message block was closed.
2. Improved removal of non-existent message when we know the store is empty.
3. Improved last write index size tracking when opening the file descriptor after being closed.
4. Improved Compact() by not loading messages for last block twice.
5. Improved Compact() determination of calling purge by determining last sequence under write lock.
6. Improved Compact() by only compacting underlying message block if over certain size threshold.
7. Improved Compact() by writing the index file if needed while still holding lock, avoiding an unnecessary re-lock.
8. Improved Compact() by not calling out to upper layers on no messages being purged.
9. Fixed a bug in Compact() that would not delete members from a block's delete map.
10. Fixed a bug in reset() when a callback was not registered (raft logs) which avoided msg block cleanup.
11. Improved consumer store Update() call for when to avoid an outdated update.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:42 -07:00
Neil Twigg
8d5519356e Shut down RAFT groups when disabling JetStream
Signed-off-by: Neil Twigg <neil@nats.io>
2023-03-23 16:54:01 +00:00
Derek Collison
ec89823e1c Only process out of resources condition from raft layer if err matches condition
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-23 08:13:22 -07:00
Derek Collison
9ccd7abdf8 Test for preAcks
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-21 12:08:24 -07:00
Derek Collison
ed9de4b0a1 Improved publisher performance under some instances of asymmetric network latency clusters on interest based streams.
Under asymmetric network latency based clusters, if a node in an R3 was replicating a consumer and the parent stream, but was the leader of neither, but the path from the stream leader was faster than the consumer leader, a replicated ack could arrive before the message itself.

In this case we used to forward a delete message request to the stream leader which would then replicate that to all stream replicas, causing more work which could lead to increased publisher times on clients connected to the slow node.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-20 20:53:45 -07:00
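
Note on preAcks: several commits above deal with replicated acks that arrive before the message itself, and with tracking them per consumer so that a message is not deleted too early when multiple consumers are present. The idea can be illustrated with a minimal Go sketch. This is not the nats-server implementation: `preAckTracker`, `registerPreAck`, and `shouldSkip` are hypothetical names chosen for illustration, the sketch is not safe for concurrent use, and it assumes at least one interested consumer.

```go
package main

import "fmt"

// preAckTracker is a hypothetical, simplified illustration of pre-ack tracking.
// It is NOT the nats-server type and does no locking.
type preAckTracker struct {
	consumers map[string]struct{}            // consumers with interest in the stream
	preAcks   map[uint64]map[string]struct{} // seq -> consumers whose ack arrived early
}

func newPreAckTracker(consumers ...string) *preAckTracker {
	t := &preAckTracker{
		consumers: make(map[string]struct{}, len(consumers)),
		preAcks:   make(map[uint64]map[string]struct{}),
	}
	for _, c := range consumers {
		t.consumers[c] = struct{}{}
	}
	return t
}

// registerPreAck records an ack for seq that arrived before the message itself.
// Acks from unknown consumers are ignored.
func (t *preAckTracker) registerPreAck(seq uint64, consumer string) {
	if _, ok := t.consumers[consumer]; !ok {
		return
	}
	acked := t.preAcks[seq]
	if acked == nil {
		acked = make(map[string]struct{})
		t.preAcks[seq] = acked
	}
	acked[consumer] = struct{}{}
}

// shouldSkip reports whether every interested consumer has already acked seq.
// If so, the message does not need to be stored and the entry is cleared.
func (t *preAckTracker) shouldSkip(seq uint64) bool {
	acked := t.preAcks[seq]
	for c := range t.consumers {
		if _, ok := acked[c]; !ok {
			return false // at least one consumer still needs the message
		}
	}
	delete(t.preAcks, seq)
	return true
}

func main() {
	t := newPreAckTracker("c1", "c2")

	t.registerPreAck(10, "c1")
	fmt.Println(t.shouldSkip(10)) // only c1 has acked

	t.registerPreAck(10, "c2")
	fmt.Println(t.shouldSkip(10)) // both consumers have acked

	fmt.Println(t.shouldSkip(10)) // entry was cleared by the previous call
}
```

Keeping a per-sequence set of consumers, rather than a single flag, is what lets the sketch wait for all interested consumers before treating a message as removable.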