Commit Graph

373 Commits

Author SHA1 Message Date
Derek Collison
786298469c Simplify locking for consumer info requests.
Signed-off-by: Derek Collison <derek@nats.io>
2022-12-14 18:52:15 -08:00
Derek Collison
c90fe9a2fa Improve performance and latency with large number of sparse consumers.
When a stream had a large number of consumers on a server that were sparse, the signaling mechanism would do a linear scan to signal matching consumers. As usage patterns have continued to have more consumers that are filteres and sparse, meaning a message is destined for a single or small number of consumers.

This change moves selection to a sublist that tracks only active consumer leaders for selection, which optimizes selection of consumers to signal when the number of consumers is large.

Signed-off-by: Derek Collison <derek@nats.io>
2022-12-13 15:25:55 -08:00
Derek Collison
549b77ca2d Ensure that ephemeral consumers that are deleted on startup properly are removed from the system.
Signed-off-by: Derek Collison <derek@nats.io>
2022-12-06 15:07:46 -08:00
Derek Collison
5e8c1993cb Server might crash if a pull consumer with inactivity threshold acks a msg then immediately deletes the consumer.
Signed-off-by: Derek Collison <derek@nats.io>
2022-11-21 15:22:23 -08:00
Ivan Kozlovic
49faba9e33 [FIXED] JetStream: WorkQueue not preventing overlapping consumers
A stream with a WorkQueue retention policy is supposed to allow
more than one consumer if they user filtered subjects, but those
subjects should not overlap.

There was an issue that if a new consumer had a filter subject
"wider" than an existing one, the error was not detected and
the new consumer was incorrectly accepted.

Resolves #3639

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-11-16 17:09:30 -07:00
Derek Collison
6cb7f68ef7 Updates based on feedback w/ Ivan
Signed-off-by: Derek Collison <derek@nats.io>
2022-11-16 11:53:46 -08:00
Derek Collison
aa57adbcd0 Use default here
Signed-off-by: Derek Collison <derek@nats.io>
2022-11-16 11:31:49 -08:00
Derek Collison
08c94096db Allow any type of activity to prolong auto cleanup of a consumer.
Signed-off-by: Derek Collison <derek@nats.io>
2022-11-15 17:25:18 -08:00
Derek Collison
b92ea86b80 When reading state, even on consumer init, should lock.
Signed-off-by: Derek Collison <derek@nats.io>
2022-11-15 07:40:44 -08:00
Derek Collison
36ef788112 When determing whether we need an ack, no need to copy since under consumer lock.
Signed-off-by: Derek Collison <derek@nats.io>
2022-11-14 11:47:31 -08:00
Derek Collison
47dd97e389 Fix logic bug that would prevent some messages from being deleted on an interest based stream.
Signed-off-by: Derek Collison <derek@nats.io>
2022-11-13 17:32:38 -08:00
Derek Collison
c6031382a1 Fix for #3499
When we deleted a consumer from an interest policy stream we would make sure to clean up any unacked messages.
However we only based start from the ack floor for the consumer and did not take into account the first sequence of the stream.

Signed-off-by: Derek Collison <derek@nats.io>
2022-11-05 13:56:45 -07:00
Derek Collison
72ff2edb5f Fix for #3603.
Signed-off-by: Derek Collison <derek@nats.io>
2022-11-03 12:46:41 -07:00
Ivan Kozlovic
abcfe2e7ac Add the pending msgs/bytes on 409 Shutdown
This is related to PR #3572 and PR #3576

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-10-27 13:59:21 -06:00
Tomasz Pietrek
f0219e1d95 Merge pull request #3572 from nats-io/jarema/add-pending-info-to-request-timeout
Added pending messages/bytes info to request statuses and errors
2022-10-26 21:20:20 +02:00
Tomasz Pietrek
ef764598ee Add pending messages/bytes info to request errors and statuses
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2022-10-26 20:02:11 +02:00
Derek Collison
15dc72db50 Removed migration of ephemerals, added proper signaling for pul consumers pending requests.
Signed-off-by: Derek Collison <derek@nats.io>
2022-10-25 14:35:20 -07:00
Ivan Kozlovic
39f31b0dbe [FIXED] JetStream: InactivityThreshold updates not always working
This is based of @neilalexander PR #3558.

It ensures that the timer is reset/canceled on configuration
update (by the leader only).

Fixed also the issue with a super-cluster where the delete timer
would always be reset at every gateway interval check.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-10-25 09:54:01 -06:00
Neil Alexander
ff23c217ea Add missing RUnlock in needAck 2022-10-14 11:54:06 +01:00
Ivan Kozlovic
3c7aa554f7 [FIXED] JetStream: return error on negative replicas count
If a stream is created or updated with a negative replicas count,
and error is now returned. Same for consumers.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-10-10 12:32:41 -06:00
Ivan Kozlovic
46aec649e4 [FIXED] JetStream: redeliveries for LastPerSubject delivery policy
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-09-29 14:00:18 -06:00
Ivan Kozlovic
08968287d5 [FIXED] JetStream: prevent panic on consumer assignment
It could be that while the routine processing the consumer assignment
runs the stream is being stopped, which would lead to a panic.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-09-26 13:11:35 -06:00
Ivan Kozlovic
170ff49837 [ADDED] JetStream: peer (the hash of server name) in statsz/jsz
A request to `$SYS.REQ.SERVER.PING.JSZ` would now return something
like this:
```
...
    "meta_cluster": {
      "name": "local",
      "leader": "A",
      "peer": "NUmM6cRx",
      "replicas": [
        {
          "name": "B",
          "current": true,
          "active": 690369000,
          "peer": "b2oh2L6w"
        },
        {
          "name": "Server name unknown at this time (peerID: jZ6RvVRH)",
          "current": false,
          "offline": true,
          "active": 0,
          "peer": "jZ6RvVRH"
        }
      ],
      "cluster_size": 3
    }
```
Note the "peer" field following the "leader" field that contains
the server name. The new field is the node ID, which is a hash of
the server name.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-09-16 15:31:37 -06:00
Ivan Kozlovic
ff0bda415b [FIXE] JetStream: Pull requests closed due to max_bytes were silent
If the client pull requests has a max_bytes value and the server
cannot deliver a single message (because size is too big), it
is sending a 409 to signal that to the client library. However,
if it sends at least a message then it would close the request
without notifying the client with a 409, which would cause the
client library to have to wait for its expiration/timeout.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-09-15 16:55:41 -06:00
Derek Collison
6c97733bb8 Optimize needAck.
Signed-off-by: Derek Collison <derek@nats.io>
2022-09-14 16:25:50 -07:00
Tomasz Pietrek
dbf7636e15 Add error if Consumer Durable and Name are not equal
This error will happen only if both Name and Durable are specified.

Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2022-09-14 20:31:18 +02:00
Derek Collison
dedf21d45d Fix for issue #3455
When hitting max ack pending from getNextMsg would remove one shots incorrectly.

Signed-off-by: Derek Collison <derek@nats.io>
2022-09-08 11:56:57 -07:00
Derek Collison
b32814d5fd Better accounting for max-bytes for pull consumers
Signed-off-by: Derek Collison <derek@nats.io>
2022-09-08 11:56:57 -07:00
Derek Collison
84b95be56e NoAck allowed now on pull consumers
Signed-off-by: Derek Collison <derek@nats.io>
2022-09-06 15:19:17 -07:00
Derek Collison
aa94a0bc0f New consumer create that allows elevation of stream and consumer names, and optional filter subject to the request subject.
Similar to changes in direct get allows proper security if needed for filter subject selection.

Signed-off-by: Derek Collison <derek@nats.io>
2022-08-30 09:29:38 -07:00
Derek Collison
d04763eb7d CAS operations improved, hold lock past store. Use separate lock for consumer list and storage updates.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-24 18:30:44 -07:00
Derek Collison
212adf5775 General improvements to clustered streams during server restart and KV/CAS scenarios.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-22 18:36:15 -07:00
Matthias Hanel
c02d1ad69e fix consumer subject validation on recovery (#3389)
This fixes an issue introduced in #3080
The consumer filter subject check was skipped on recovery.

The intent was to bypass the upstream stream subjects.
But it also filtered the downstream stream subject.
This became a problem when the downstream was itself an upstream.

Then during recover, the stream subject was not checked, which
lead to delivery of filtered messages that should never have been
delivered.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-22 14:30:00 -07:00
Ivan Kozlovic
7de4497815 Install consumer snapshot on clean exit and few other fixes
- didRemove in applyMetaEntries() could be reset when processing
multiple entries
- change "no race" test names to include JetStream
- separate raft nodes leader stepdown and stop in server
shutdown process
- in InstallSnapshot, call wal.Compact() with lastIndex+1

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-16 17:05:49 -06:00
Ivan Kozlovic
f0b098af92 [FIXED] JetStream: issue with max deliver and server/cluster restart
This is a regression introduced in v2.8.3. If a message reaches
the max redeliver count, it stops being delivered to the consumer.
However, after a server or cluster restart, those messages would
be redelivered again.

Resolves #3361

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-16 17:05:47 -06:00
Byron Ruth
095cfef9eb Add check and test to prevent updating consumer MaxWaiting
Signed-off-by: Byron Ruth <b@devel.io>
2022-07-29 15:05:00 -04:00
Matthias Hanel
6212087feb fix race by locking arround o.isLeader (#3291)
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-07-26 21:49:04 +02:00
Ivan Kozlovic
a3e62f000c Merge pull request #3265 from abegaj/fix-max-deliver-update
[FIXED] Maximum Deliveries update has no effect
2022-07-21 16:06:41 -06:00
Matthias Hanel
89b5e872ac Move and cancel fixes (#3270)
The Move/Cancel/Downscale mechanism did not take into account that
the consumer's replica count can be set independently.

This also alters peer selection to have the ability to skip 
unique tag prefix check for server that will be replaced.
Say you have 3 az, and want to add another server to az:1, 
in order to replace a server that is the same zone.
Without this change, uniqueTagPrefix check would filter 
the server to replace with and cause a failure.

The cancel move response could not be received due to 
the wrong account name.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-07-18 18:42:03 +02:00
Ardit Begaj
4d81b8e4a2 Make MaxDeliver update changes take effect 2022-07-14 15:38:59 +02:00
Derek Collison
16f788adce Consumer isFiltered could crash if the stream was a mirror with no direct subjects but it was in interest or workqueue retention mode.
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-07 03:35:41 +02:00
Matthias Hanel
70be4b77f9 fixes peer removal, simplifies move, more tests
Make sure when processing a peer removal that the stream assignment agrees.
When a new leader takes over it can resend a peer removal, and if the stream/consumer really was rescheduled we could remove by accident.

Also need to make sure that when we remove a stream we remove the node as part of the stream assignment.
If we didn't, if the same asset returned to this server we would not start up the monitoring loop.

Simplify migration logic in monitorStream, to be driven by leader only

Improved unit tests

Added failure when server not in peer list

Move command does not require server anymore

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-07-07 03:32:13 +02:00
Derek Collison
52baa95e2a When stalled for MaxAckPending, expire all pull requests that are one shot, meaning had at least 1 delivered message
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-04 19:14:49 -07:00
Derek Collison
1b580c67f3 Make pull consumers FIFO per message, not per request.
This effectively means that requests with batch > 1 will process a message and go to the end of the line.

Signed-off-by: Derek Collison <derek@nats.io>
2022-07-04 13:05:57 -07:00
Ivan Kozlovic
c519df7e0d [FIXED] Pull consumer may be incorrectly removed after InactiveThreshold
When creating a pull consumer with InactiveThreshold set, if the
application is doing pull requests with "no_wait" at regular interval
(lower than InactiveThreshold), the JS consumer should be considered
active and not deleted.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-06-29 10:15:09 -06:00
Derek Collison
abc5905aa9 Merge pull request #3221 from nats-io/direct
Made direct get from a stream part of the $JS.API hierarchy vs separate.
2022-06-28 09:59:44 -07:00
Derek Collison
b8ef9b19a0 Made direct get from a stream part of the $JS.API hierarchy vs separate.
Also for direct get and for pull requests, if we are not on a client connection check how long we have been away from the readloop.
If need be execute in a separate go routine.

Signed-off-by: Derek Collison <derek@nats.io>
2022-06-28 08:53:48 -07:00
Derek Collison
b0c4ec69ab Merge pull request #3216 from nats-io/update_consumer_filtered_subj
Allow consumer filter subjects to be updated
2022-06-28 08:42:08 -07:00
Derek Collison
e02016db26 Fix race
Signed-off-by: Derek Collison <derek@nats.io>
2022-06-27 16:05:03 -07:00
Derek Collison
9154fca7f1 Allow consumer filter subjects to be updated
Signed-off-by: Derek Collison <derek@nats.io>
2022-06-24 12:38:01 -07:00