Commit Graph

437 Commits

Author SHA1 Message Date
Derek Collison
2f4677d29e Delay a bit longer if we are not the actual leader, helpful for very large stream reports to avoid possible dupes
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-12 12:36:47 -07:00
Derek Collison
3b9cf1e381 Needed to do more in separate go routine to avoid deadlock
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-08 18:43:58 -07:00
Derek Collison
35bb7c1737 Pool CommittedEntries as well with a ReturnToPool() that will also recycle the Entry. Needs to integrate with upper layers
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-08 11:34:10 -07:00
Derek Collison
d02d59534f Fix data race
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-07 07:18:30 -07:00
Derek Collison
c16915bff4 For checking the health of jetstream, do not hold the lock as we traverse the streams and consumers.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-06 11:56:55 -07:00
Neil Twigg
03a5a4deaf Possibly de-race sysRequest
Signed-off-by: Neil Twigg <neil@nats.io>
2023-04-04 10:30:59 +01:00
Derek Collison
b0c3cf0dbd Only apply consumer entries if not recovering
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-03 17:22:50 -07:00
Derek Collison
59175c491f Fix for a datarace
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-03 14:46:57 -07:00
Derek Collison
9dd727034a Make sure to not stop raft layer when we detect we are already running the monitor
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-03 14:46:47 -07:00
Derek Collison
ff3f102cdd Fix for datarace in healthcheck
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 16:30:13 -07:00
Derek Collison
e6447c982a Protect against concurrent creation of streams and consumers.
Also make sure we have exited monitoring routines when doing resets for both streams and consumers.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 14:29:52 -07:00
Derek Collison
58ca525b3b Process replicated ack regardless of store update. Delay but still stepdown
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 03:53:16 -07:00
Derek Collison
a8bd2793d5 Fix concurrent map bug on preAcks.
Use monitor check for streams like consumers.
Make sure to stop raft layer if exiting monitorConsumer early.
Allow consumers to force a snapshot on leadership change.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 03:53:11 -07:00
Derek Collison
ad5bb366a0 Updates to preacks when multiple consumers are present but mutually exclusive (filtered).
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-31 10:43:28 -07:00
Derek Collison
937ef0d2a6 Improvements to preAcks.
Better handling of multiple consumers so as to not delete too early.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-30 20:29:15 -07:00
Derek Collison
ade0e9d295 Snapshot meta for this function to use in case it gets removed out from underneath of us.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 16:51:17 -07:00
Derek Collison
c77872b519 Update server/jetstream_cluster.go
Pre-allocate

Co-authored-by: Neil <neil@nats.io>
2023-03-29 15:29:38 -07:00
Derek Collison
2b89fea9b0 Double check here if the jetstream cluster was shutdown when we released the lock
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 14:46:49 -07:00
Derek Collison
6c3e64b83b Always make sure cluster and meta raft node available when needed
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 13:56:04 -07:00
Derek Collison
71af150448 General improvements to interest based stream processing when acks arrive before the actual msgs.
1. If we are retention based, make sure our consumers are running before entering into monitorStream logic.
2. If we skip messages and are interest based, make sure we check for a preAck state.
3. On finalization of recovery for consumers have them check against the interest based stream.
4. Do not process ack state updates if consumer is closed and shutting down.
5. When processing final state for a stream after upper layer catchup, check all attached consumers for ack skew.
6. During catchup of stream messages consult preAck state and skip messages as needed.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:53 -07:00
Derek Collison
ed9de4b0a1 Improved publisher performance under some instances of asymmetric network latency clusters on interest based streams.
Under asymmetric network latency based clusters, if a node in an R3 was replicating a consumer and the parent stream but was the leader of neither, and the path from the stream leader was faster than the path from the consumer leader, a replicated ack could arrive before the message itself.

In this case we used to forward a delete message request to the stream leader which would then replicate that to all stream replicas, causing more work which could lead to increased publisher times on clients connected to the slow node.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-20 20:53:45 -07:00
Derek Collison
5a16f98427 Fixed an off-by-one bug that under certain circumstances could cause large consumer replica states.
This could lead to instability in the system.

The bug would manifest in replicated consumers when certain messages could be acked out of order, and the pending list would never go to zero.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-19 10:41:59 -07:00
Derek Collison
5bb6f167b9 Make sure to cleanup messages on a follower consumer for an interest based stream when the consumer leader sends a state snapshot.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-15 20:11:16 -07:00
Derek Collison
5a1878b015 Fix for workqueue stream scaling up and not removing acked messages.
Make sure when scaling up streams that are workqueue or interest policy that consumers scale as well.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-13 17:13:49 -07:00
Derek Collison
062dec7f5e Added in error warning if stream or consumer delete fails.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-07 19:26:29 -05:00
Derek Collison
e0cbe503ed Do not hold jetstream lock cleaning up orphans.
Could potentially deadlock.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-07 06:42:53 -05:00
Derek Collison
c07087c99d Do metasnapshots optionally on quit and leader change, do not force.
Do not require force snapshots for all consumer deletes.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-01 22:46:29 -08:00
Derek Collison
f358bf2687 General improvements to the JetStream clustering layer during meta corruption.
We now check for orphaned streams or consumers in clustered mode after our metastate has recovered.
Do not warn on failures for installing raft snapshots if this is due to the node being closed.
During a stream update make sure to check to see if our group assignment has changed out from underneath of us.
Stream info should always delay if we are not the leader. Otherwise it could cause duplicate responses when it should not.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-01 22:08:13 -08:00
Derek Collison
1956fa3e23 Signal a metasnapshot for consumer deletes as well
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-28 14:30:23 -08:00
Derek Collison
724160ebac Fix flapping tests
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-28 14:30:23 -08:00
Derek Collison
68cd312870 Be more conservative on defaultMaxTotalCatchupOutBytes, default to 64M
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-28 13:28:09 -08:00
Derek Collison
3807441fd7 Always process inbound messages in separate execution context.
Do not duplicate work on leader, sealed and clustered state.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-28 11:45:31 -08:00
Derek Collison
b19fe508c4 Do not block routes/gws on internal stream and consumer info requests
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-28 11:17:29 -08:00
Derek Collison
bee149b458 Only need server's rlock here.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-28 11:17:29 -08:00
Derek Collison
aad8aa6f21 Do not need lock to grab js here
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-27 18:56:55 -08:00
Derek Collison
576d31748f Sometimes do force meta snapshot
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-27 18:56:55 -08:00
Derek Collison
9721309601 Do not allow meta snapshot processing during recovery to override.
Make sure to process all stream updates during recovery through the ru structure.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-27 18:56:55 -08:00
Derek Collison
43916290df Make minimum snapshot time for all assets 10s.
Do not lock on clustered test for JetStream, not needed.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-27 11:20:37 -08:00
Derek Collison
4fa0ea32c3 [FIXED] If a truncate for a raft WAL failed we could spin.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-25 19:07:27 -08:00
Derek Collison
ea2bfad8ea Fixed bug where snapshot would not compact through applied. This meant a subsequent request for exactly applied would return only that entry, not the full state snapshot.
Fixed bug where we would not snapshot when we should.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-23 22:19:37 -08:00
Derek Collison
d347cb116a When becoming leader optionally send current snapshot to followers if caught up.
This can help sync on restarts and improve ghost ephemerals. Also added more code to suppress responses and API audits when we know we are recovering.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-23 10:30:36 -08:00
Neil Twigg
cfea34c80c Install snapshot and compact when WAL grows, even when no state changes occur
2023-02-22 20:00:57 +00:00
Neil Twigg
68961ffedd Refactor ipQueue to use generics, reduce allocations
2023-02-21 14:50:09 +00:00
Derek Collison
3c64d07691 Warn of consumer state update failures.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-20 17:28:11 -08:00
Derek Collison
6a62ac4560 Fix for merge conflict
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-18 11:12:15 -08:00
Derek Collison
6a4c61e1a3 Merge branch 'main' into bad-consumer-delete
2023-02-18 11:09:56 -08:00
Derek Collison
01fa89a0b4 Fix for deleting consumers on restarts and non-fatal update errors.
If there was a spurious error on restart, or possibly on an update, we could delete a consumer, which was incorrect behavior.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-18 09:46:52 -08:00
Derek Collison
efa3bcc49d Parallel consumer creation could drop responses (create and info) and could also run monitorConsumer twice.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-18 05:16:05 -08:00
Derek Collison
6a2063f5b3 Revert logic
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-06 22:14:37 +04:00
Derek Collison
e9a983c802 Do not let !NeedSnapshot() avoid snapshots and compaction.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-01 22:05:25 -07:00