Commit Graph

4590 Commits

Author SHA1 Message Date
Derek Collison
ddfa5cdfec Additional protection for bad state when rebuilding a message block
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:44:11 -07:00
Derek Collison
a9a4df859f Fix for flapping test
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:44:08 -07:00
Derek Collison
35d1a7747a Snapshots of no length can hold state as well
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:44:04 -07:00
Derek Collison
c4da37ecc7 Make sure consumer is valid and state was returned
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:44:01 -07:00
Derek Collison
e97ddcd14f Tweak tests due to changes, make test timeouts uniform.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:59 -07:00
Derek Collison
52fbac644c Since we no longer store leaderTransfers, which is proper, some tests were getting an advantage from that after server restart.
This change speeds up raft layer more to avoid timeouts.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:57 -07:00
Derek Collison
0d9f707b4b Additional tests to stress interest based streams with pull subscribers during rolling restarts.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:55 -07:00
Derek Collison
71af150448 General improvements to interest based stream processing when acks arrive before the actual msgs.
1. If we are retention based, make sure our consumers are running before entering into monitorStream logic.
2. If we skip messages and are interest based, make sure we check for a preAck state.
3. On finalization of recovery for consumers have them check against the interest based stream.
4. Do not process ack state updates if consumer is closed and shutting down.
5. When processing final state for a stream after upper layer catchup, check all attached consumers for ack skew.
6. During catchup of stream messages consult preAck state and skip messages as needed.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:53 -07:00
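As an illustration of the preAck idea in items 2 and 6 above, here is a minimal sketch in Go. It is not the nats-server code: `preAckTracker`, `recordAck`, and `shouldSkip` are hypothetical names, and the sketch assumes a single consumer, so one early ack is enough to skip a message.

```go
package main

// preAckTracker is a hypothetical helper, not a nats-server type. It
// remembers acks that arrive for sequences the local replica has not
// stored yet. Assumption: only one consumer has interest in the stream.
type preAckTracker struct {
	pending map[uint64]struct{}
}

func newPreAckTracker() *preAckTracker {
	return &preAckTracker{pending: make(map[uint64]struct{})}
}

// recordAck is called when an ack arrives. lastSeq is the highest sequence
// stored locally. It reports whether the ack was held back as a pre-ack.
func (p *preAckTracker) recordAck(seq, lastSeq uint64) bool {
	if seq <= lastSeq {
		// The message is already stored, so this is a normal ack.
		return false
	}
	p.pending[seq] = struct{}{}
	return true
}

// shouldSkip is consulted when a message arrives or is replayed during
// catchup. If the sequence was pre-acked, the entry is cleared and the
// caller can skip storing the message.
func (p *preAckTracker) shouldSkip(seq uint64) bool {
	if _, ok := p.pending[seq]; !ok {
		return false
	}
	delete(p.pending, seq)
	return true
}
```

With several consumers, a real implementation would presumably have to track which consumers have acked each sequence before skipping it; that bookkeeping is left out here.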
Derek Collison
5cabc365df General improvements around handling interest retention.
1. During ackMsg processing hold write lock to block concurrent access.
2. Check for presence of preAcks before and force removal if present.
3. Rework check for orphan msgs on startup to use checkStateForInterestStream().

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:51 -07:00
Derek Collison
e516c47a4b Improvements to consumers attached to an interest retention stream.
1. Do not process an ack if we are closed.
2. When checking for needing an ack for a given consumer, hold lock entire time.
3. During recovery and restarts we check if we need to replay acks to the parent stream.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:49 -07:00
Derek Collison
182bf6cbae Bug fixes and general stability improvements.
1. If reset, ignore Applied() that are greater than our commit.
2. Improved StepDown() by placing at back of queue if preferred.
3. Improved handling of leadership transfer during StepDown().
4. Do not store EntryLeaderTransfer records on disk.
5. Remove un-needed processing of older terms.
6. If append entry has higher term, also inherit pterm.
7. Only inherit a candidate's term if we decide to vote for them.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:46 -07:00
Derek Collison
6d4304146f Bug fixes and general stability improvements.
1. Fixed a bug that would process a removal of a message after the message block was closed.
2. Improved removal of non-existent message when we know the store is empty.
3. Improved last write index size tracking when opening the file descriptor after being closed.
4. Improved Compact() by not loading messages for last block twice.
5. Improved Compact() determination of calling purge by determining last sequence under write lock.
6. Improved Compact() by only compacting underlying message block if over certain size threshold.
7. Improved Compact() by writing the index file if needed while still holding lock, avoiding an unnecessary re-lock.
8. Improved Compact() by not calling out to upper layers on no messages being purged.
9. Fixed a bug in Compact() that would not delete members from a block's delete map.
10. Fixed a bug in reset() when a callback was not registered (raft logs) which avoided msg block cleanup.
11. Improved consumer store Update() call for when to avoid an outdated update.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:42 -07:00
Derek Collison
ec89823e1c Only process out of resources condition from raft layer if err matches condition
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-23 08:13:22 -07:00
Derek Collison
9ccd7abdf8 Test for preAcks
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-21 12:08:24 -07:00
Derek Collison
ed9de4b0a1 Improved publisher performance under some instances of asymmetric network latency clusters on interest based streams.
Under asymmetric network latency based clusters, if a node in an R3 was replicating a consumer and the parent stream but was the leader of neither, and the path from the stream leader was faster than the consumer leader, a replicated ack could arrive before the message itself.

In this case we used to forward a delete message request to the stream leader which would then replicate that to all stream replicas, causing more work which could lead to increased publisher times on clients connected to the slow node.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-20 20:53:45 -07:00
Derek Collison
0c1301ec14 Fix for data race
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-19 10:52:52 -07:00
Derek Collison
5a16f98427 Fixed an off by one bug that under certain circumstances could cause large consumer replica states.
This could lead to instability in the system.

The bug would manifest in replicated consumers when certain messages could be acked out of order and the pending list would never go to zero.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-19 10:41:59 -07:00
Derek Collison
027f2e42c8 Remove snapshot of cores and maxprocs
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-17 15:09:50 -07:00
Derek Collison
f0e1585490 Fix flapping test
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-17 13:14:43 -07:00
Neil Twigg
4647e14b3e Don't recycle buffer more than once
2023-03-17 09:25:17 +00:00
Neil Twigg
9f99efad03 Use pooled buffer for flushing encrypted message blocks
2023-03-16 17:43:09 +00:00
Derek Collison
5bb6f167b9 Make sure to cleanup messages on a follower consumer for an interest based stream when the consumer leader sends a state snapshot.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-15 20:11:16 -07:00
Derek Collison
8dbfbbe577 Fix test
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-15 17:23:51 -07:00
Derek Collison
531fadd3e2 Don't warn if error is node closed.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-15 16:45:33 -07:00
Neil
c0784bc363 Merge pull request #3952 from nats-io/neil/fssdirty
Only mark fss dirty if a change is made
2023-03-15 09:25:11 +00:00
Waldemar Quevedo
da7a8b63bc Reword ocsp routes/gateways terminology to 'peers' instead
Add test for verify_and_map usage with ocsp

Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-03-14 17:01:42 -07:00
Waldemar Quevedo
f8914788f5 Fix leaf client connection failing in ocsp setup
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-03-14 14:07:18 -07:00
Derek Collison
5a1878b015 Fix for workqueue stream scaling up and not removing acked messages.
Make sure when scaling up streams that are workqueue or interest policy that consumers scale as well.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-13 17:13:49 -07:00
Neil Twigg
7105df5afc Don't use string builder in subjString (it is slow)
2023-03-13 11:56:05 +00:00
Neil Twigg
1ead6df6f1 Only mark fss dirty if a change is made
2023-03-10 12:53:29 +00:00
Derek Collison
062dec7f5e Added in error warning if stream or consumer delete fails.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-07 19:26:29 -05:00
Derek Collison
e0cbe503ed Do not hold jetstream lock cleaning up orphans.
Could potentially deadlock.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-07 06:42:53 -05:00
Tomasz Pietrek
df282a221c Fix Pull Consumer not sending request timeout
Server did check for timeouts in `processWaiting`,
but that needs to be also checked in `nextWaiting` in case of
tight timings, as `nextWaiting` can remove Pull Request based on
timeouts too.

Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-03-03 14:49:04 +01:00
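The timeout handling described in this commit can be sketched roughly as follows. This is an illustrative Go sketch, not the server's `nextWaiting`: the `pullRequest` and `waitQueue` types, the `next` method, and the use of plain integer timestamps are all assumptions. The point it shows is that a lookup that removes expired requests should hand them back, so the caller can still answer each one with a request timeout.

```go
package main

// pullRequest is a hypothetical stand-in for a waiting pull request.
type pullRequest struct {
	reply   string // reply subject to answer on
	expires int64  // expiry as a timestamp; 0 means no expiry (assumption)
}

// waitQueue is a hypothetical FIFO of waiting pull requests.
type waitQueue struct {
	reqs []pullRequest
}

// next returns the first request that has not expired at time now.
// Expired requests found on the way are removed from the queue and
// returned, so the caller can send each of them a timeout status
// instead of dropping them silently.
func (q *waitQueue) next(now int64) (req *pullRequest, expired []pullRequest) {
	for len(q.reqs) > 0 {
		head := q.reqs[0]
		q.reqs = q.reqs[1:]
		if head.expires != 0 && now >= head.expires {
			expired = append(expired, head)
			continue
		}
		return &head, expired
	}
	return nil, expired
}
```

A caller would loop over `expired` and publish a timeout response to each `reply` subject before using `req`.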
Waldemar Quevedo
8f1ca99fb7 Fix flaky test TestMonitorJsz/raftgroups
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-03-02 10:41:30 -08:00
Byron Ruth
ee4f1f85ba Bump 2.9.16-beta
Signed-off-by: Byron Ruth <byron@nats.io>
2023-03-02 12:14:58 -05:00
Byron Ruth
92b93af06a Release v2.9.15
Signed-off-by: Byron Ruth <byron@nats.io>
2023-03-02 11:56:38 -05:00
Derek Collison
c873cb38c0 Bump to 2.9.15-RC.4
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-01 23:20:26 -08:00
Derek Collison
2beca1a2a6 Partial cache errors are also not critical write errors
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-01 22:52:02 -08:00
Derek Collison
c07087c99d Do metasnapshots optionally on quit and leader change, do not force.
Do not require force snapshots for all consumer deletes.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-01 22:46:29 -08:00
Derek Collison
f358bf2687 General improvements to the JetStream clustering layer during meta corruption.
We now check for orphaned streams or consumers in clustered mode after our metastate has recovered.
Do not warn on failures for installing raft snapshots if this is due to the node being closed.
During a stream update make sure to check to see if our group assignment has changed out from underneath of us.
Stream info should always delay if we are not the leader. Could cause duplicate responses when it should not.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-01 22:08:13 -08:00
Derek Collison
c586014477 General raft improvements under heavy corruption.
Do not exit candidate state in place when stepping down, would cause double vote requests.
When truncating our WAL make sure to adjust commit and applied as needed.
On a miss where the index is less than ours, if we can not find the entry reset our state.
For a vote, if last processed term is higher than ours always agree if no vote has been cast.
If terms are equal make sure the requestor's index is at least as high as ours.
If we decide not to vote for someone, and we have not voted and we are a better fit, move forward with a campaign.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-01 22:06:50 -08:00
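One possible reading of the voting rules in this commit, sketched in Go. The types `voteRequest` and `localState` and the function `grantVote` are hypothetical and do not mirror the server's raft code. Two points are assumptions on my part: "last processed term is higher than ours" is read as the requestor's last term exceeding the local one, and a requestor with a lower last term is rejected, as in standard Raft.

```go
package main

// voteRequest is a hypothetical view of an incoming vote request.
type voteRequest struct {
	lastTerm  uint64 // term of the requestor's last entry
	lastIndex uint64 // index of the requestor's last entry
}

// localState is a hypothetical view of the receiving node.
type localState struct {
	lastTerm  uint64 // term of our last processed entry
	lastIndex uint64 // index of our last entry
	voted     bool   // whether we already cast a vote in this term
}

// grantVote reports whether the node would vote for the requestor.
func grantVote(s localState, r voteRequest) bool {
	if s.voted {
		// A vote has already been cast.
		return false
	}
	if r.lastTerm > s.lastTerm {
		// Higher last term: always agree if no vote has been cast.
		return true
	}
	if r.lastTerm == s.lastTerm {
		// Equal terms: the requestor's index must be at least ours.
		return r.lastIndex >= s.lastIndex
	}
	// Assumption: a lower last term is rejected, as in standard Raft.
	return false
}
```

The commit also describes starting a campaign when a node declines to vote and is itself a better fit; that step is not modeled here.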
Derek Collison
deddf8f094 Fix since we have two streams and order in slice not guaranteed
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-01 07:58:01 -08:00
Derek Collison
ebe08040e9 Attempt to fix flapper again
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-01 06:24:51 -08:00
Derek Collison
baca7bd751 Fix for test flapper
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-01 04:58:01 -08:00
Derek Collison
9f1580686a Revert behavior for JetStream published directly from client to be handled inline.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-01 04:35:52 -08:00
Derek Collison
8f7a88103b Merge pull request #3926 from nats-io/fix-3924
[FIXED] Fix for MQTT Spec 4.7.2-1 violation
2023-02-28 21:58:24 -08:00
Derek Collison
d9933b1f7a Fix for MQTT Spec 4.7.2-1
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-28 20:43:46 -08:00
Jeremy Saenz
26f241cb62 Updated LEAFZ names to use remoteServer name/id and added is_spoke
2023-02-28 18:09:24 -08:00
Derek Collison
95ed471866 Bump to 2.9.15-RC.3
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-28 14:56:56 -08:00
Derek Collison
321afe6aee Merge pull request #3923 from nats-io/JMS-LeafZNames
Update LEAFZ to include leafnode server/connection name
2023-02-28 14:54:43 -08:00