Commit Graph

427 Commits

Author SHA1 Message Date
Derek Collison
c679f9d7f6 Added in detail info when failing to load in a message for a consumer.
E.g. `Unexpected partial cache error looking up message for consumer '$G > TEST > dlc'`

Signed-off-by: Derek Collison <derek@nats.io>
2023-09-01 09:06:29 -07:00
Derek Collison
3a39786972 When we fail to deliver a message for a consumer, either through didNotDeliver() or a LoadMsg() failure, re-adjust the delivered count and waitingRequest accounting.
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-01 08:48:28 -07:00
Derek Collison
48bf7ba151 When a consumer reached a max delivered condition, we did not properly synchronize the state such that on a restore or leader switch the ack pending could jump and be higher than max ack pending and block the consumer.
This propagates a delivered update and we updated the store state engine to do the right thing when the condition is reached.

Signed-off-by: Derek Collison <derek@nats.io>
2023-08-24 16:00:27 -07:00
Derek Collison
e5d208bf33 When moving streams, we could check too soon and be in a gap where the replica peer has not registered a catchup request.
This would cause us to incorrectly think the replica was caught up and drop our leadership, which would cancel any catchup requests.

Signed-off-by: Derek Collison <derek@nats.io>
2023-08-21 20:07:48 -07:00
Derek Collison
43314fd439 Fix for a bug that would allow old leaders of pull based durables to delete a consumer from an inactivity threshold.
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-21 14:53:09 -07:00
Neil Twigg
c437157c1f Recover in consumer assignment when asset already existed
Signed-off-by: Neil Twigg <neil@nats.io>
2023-08-17 23:22:10 +01:00
Neil Twigg
c0636d117f Tweak consumer replica scaling, add unit test for orphaned consumer subjects
Signed-off-by: Neil Twigg <neil@nats.io>
2023-08-17 15:27:29 +01:00
Derek Collison
9280a552b8 Don't error to server logs if message deleted
Signed-off-by: Derek Collison <derek@nats.io>
2023-07-20 14:07:35 -07:00
Derek Collison
9eeffbcf56 Fix performance issues with checkAckFloor.
Bail early if new consumer, meaning stream sequence floor is 0.
Decide which linear space to scan.
Do no work if no pending and we just need to adjust which we do at the end.

Also realized some tests were named wrong and were not being run, or were in wrong file.

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-08 18:45:03 -07:00
Derek Collison
4ac45ff6f3 When consumers were R1 and the same name was reused, server restarts could try to clean up old ones and affect the new ones.
These changes allow consumer name reuse more effectively during server restarts.

Signed-off-by: Derek Collison <derek@nats.io>
2023-06-05 12:48:18 -07:00
Derek Collison
27bbfb7a85 Only check ack floor if we are interest policy based.
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-02 11:04:00 -07:00
Derek Collison
80db7a22ab Optimizations for large single hub account leafnode fleets.
Added a leafnode lock to allow better traversal without copying of large leafnodes in a single hub account.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-05 13:14:49 -07:00
Derek Collison
db972048ce Detect when we are shutting down or if a consumer is already closed when removing a stream.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-29 11:18:10 -07:00
Derek Collison
fac5658966 If we fail to create a consumer, make sure to clean up any raft nodes in meta layer and to shutdown the consumer if created but we encountered an error.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-29 08:15:33 -07:00
Derek Collison
7f06d6f5a7 When Jsz() was asked for consumer details, it would report incorrect data if not a consumer leader.
This is due to the way state is maintained for leaders vs followers for consumers.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-26 15:03:15 -07:00
Derek Collison
da9a17fd68 Spelling
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-21 12:40:19 -07:00
Derek Collison
8b7c2d12aa Run a check for ack floor drift when taking over as a leader and the ack go routine is spun up.
Also periodically check. If all is normal, this will be very cheap.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-21 11:59:35 -07:00
Tomasz Pietrek
692f384f2d Fix consumer reply subject escaping
If the Consumer had a name containing `%`, it could result in
reply subject failing to format with `fmt.Sprintf`, as the `%`
was not properly escaped with `%%`.

Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-04-12 09:22:08 +02:00
Derek Collison
c6b2a97ef4 Use entry pool
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-08 19:58:43 -07:00
Derek Collison
ebe4f8957f Spelling based on review feedback
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-03 21:08:59 -07:00
Derek Collison
dcbefd5cc4 We can receive these on push consumers, so error if we do
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-03 21:07:08 -07:00
Derek Collison
07b34f707f Make sure to never process next message requests inline
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-03 20:50:01 -07:00
Derek Collison
e6447c982a Protect against concurrent creation of streams and consumers.
Also make sure we have exited monitoring routines when doing resets for both streams and consumers.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 14:29:52 -07:00
Derek Collison
b752b8b30d Snapshot on clean shutdown if needed or interest based retention
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-02 03:53:03 -07:00
Derek Collison
ad5bb366a0 Updates to preacks when multiple consumers are present but mutually exclusive (filtered).
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-31 10:43:28 -07:00
Derek Collison
9a714e7d7d Update based on review feedback
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 15:47:54 -07:00
Derek Collison
c4da37ecc7 Make sure consumer is valid and state was returned
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:44:01 -07:00
Derek Collison
e516c47a4b Improvements to consumers attached to an interest retention stream.
1. Do not process an ack if we are closed.
2. When checking for needing an ack for a given consumer, hold lock entire time.
3. During recovery and restarts we check if we need to replay acks to the parent stream.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-29 12:43:49 -07:00
Derek Collison
5bb6f167b9 Make sure to clean up messages on a follower consumer for an interest based stream when the consumer leader sends a state snapshot.
Signed-off-by: Derek Collison <derek@nats.io>
2023-03-15 20:11:16 -07:00
Derek Collison
5a1878b015 Fix for workqueue stream scaling up and not removing acked messages.
Make sure when scaling up streams that are workqueue or interest policy that consumers scale as well.

Signed-off-by: Derek Collison <derek@nats.io>
2023-03-13 17:13:49 -07:00
Tomasz Pietrek
df282a221c Fix Pull Consumer not sending request timeout
Server did check for timeouts in `processWaiting`,
but that needs to be also checked in `nextWaiting` in case of
tight timings, as `nextWaiting` can remove Pull Request based on
timeouts too.

Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-03-03 14:49:04 +01:00
Derek Collison
724160ebac Fix flapping tests
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-28 14:30:23 -08:00
Derek Collison
24cb570646 Do not lock on stream name for consumer write state error
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-28 13:24:18 -08:00
Derek Collison
d85bec2007 Do not block in place on warning, and only warn if consumer not closed
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-28 11:45:31 -08:00
Derek Collison
b19fe508c4 Do not block routes/gws on internal stream and consumer info requests
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-28 11:17:29 -08:00
Derek Collison
2642a8c03d Optimize locking for when under heavy loads.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-27 18:56:55 -08:00
Derek Collison
13167f46b9 Optimize some locking for when under heavy loads.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-27 18:56:55 -08:00
Derek Collison
daacbf5580 Added optimized store NumPending() call.
Optimized and fixed a bug in filestore filteredPending().
Optimized memstore FilteredState().

Added comprehensive tests for NumPending() and FilteredState().

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-25 17:26:26 -08:00
Neil Twigg
68961ffedd Refactor ipQueue to use generics, reduce allocations
2023-02-21 14:50:09 +00:00
Derek Collison
3c64d07691 Warn of consumer state update failures.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-20 17:28:11 -08:00
Derek Collison
d2179e0939 Make sure to also clean up pending if below our stream ack floor
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-20 11:56:37 -08:00
Derek Collison
b6149c51f0 Make sure to clean up redelivered state on purge.
Make sure to update ack floors on messages being expired out from underneath us.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-20 11:16:12 -08:00
Derek Collison
6c9a9fb45e Fixed bug that would lose ack pending state during partial stream purge.
General code cleanup to be more correct.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-19 14:21:53 -08:00
Derek Collison
06fd81d096 Fixed a bug where a named consumer under interest policy was spinning up inactive threshold timers in all replicas not just the leader.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-19 06:08:43 -08:00
Derek Collison
efa3bcc49d Parallel consumer creation could drop responses (create and info) and could also run monitorConsumer twice.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-18 05:16:05 -08:00
Derek Collison
11b0f214d0 Do not re-calculate NumPending on consumer info calls.
We noticed this was being called a lot in user environments.
When the consumer was filtered with a wildcard, the stream had a high cardinality of subjects, and the consumer was falling behind, this could take a substantial amount of time.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-16 16:30:14 -08:00
Derek Collison
b611e37e95 When updating a consumer filter subject, make sure locking order is correct and that our sublist is present.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-06 21:34:48 +04:00
Derek Collison
1252653c16 Merge pull request #3829 from nats-io/jarema/fix-message-after-update
Fix Consumer not getting messages after filter update
2023-01-30 19:59:32 -08:00
Derek Collison
6058056e3b Minor fixes and optimizations for snapshots.
We were snapshotting more than needed, so double check that we should be doing this at the stream and consumer level.
At the raft level, we should have always been compacting the WAL to last+1, so made that consistent. Also fixed a bug that would not skip last if there were more items behind the snapshot.

Signed-off-by: Derek Collison <derek@nats.io>
2023-01-30 17:54:18 -08:00
Tomasz Pietrek
836848ca64 Fix Consumer not getting messages after filter update
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-01-30 20:47:17 +01:00