When a stream had a large number of sparse consumers on a server, the signaling mechanism would do a linear scan to find matching consumers. Usage patterns have continued to shift toward more consumers that are filtered and sparse, meaning a message is destined for only one or a small number of consumers.
This change moves selection to a sublist that tracks only active consumer leaders, which optimizes the selection of consumers to signal when the number of consumers is large.
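To illustrate the idea, here is a minimal sketch with hypothetical types and names (not the actual server code): the stream keeps a smaller sublist of active consumer leaders and only scans that list when signaling.

```go
package jsexample

// Consumer is a simplified, hypothetical stand-in for a JetStream consumer.
type Consumer struct {
	FilterSubject string
}

// stream keeps the full consumer list plus a smaller sublist of active
// consumer leaders, so signaling does not have to scan every consumer.
type stream struct {
	consumers []*Consumer // all consumers
	sigs      []*Consumer // only active consumer leaders
}

// setLeader adds or removes a consumer from the signaling sublist.
func (s *stream) setLeader(c *Consumer, leader bool) {
	if leader {
		s.sigs = append(s.sigs, c)
		return
	}
	for i, sc := range s.sigs {
		if sc == c {
			s.sigs = append(s.sigs[:i], s.sigs[i+1:]...)
			return
		}
	}
}

// signal scans only the sublist instead of all consumers.
func (s *stream) signal(subject string, notify func(*Consumer)) {
	for _, c := range s.sigs {
		// Exact match only for brevity; the real server performs full
		// subject wildcard matching.
		if c.FilterSubject == "" || c.FilterSubject == subject {
			notify(c)
		}
	}
}
```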
Signed-off-by: Derek Collison <derek@nats.io>
A stream with a WorkQueue retention policy is supposed to allow
more than one consumer if they use filtered subjects, but those
subjects should not overlap.
There was an issue that if a new consumer had a filter subject
"wider" than an existing one, the error was not detected and
the new consumer was incorrectly accepted.
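To illustrate the overlap rule, here is a hypothetical sketch of a filter-overlap check (not the server's actual implementation): with an existing filter "foo.bar", a new consumer on the wider "foo.>" overlaps and must be rejected, while "foo.baz" is fine.

```go
package jsexample

import "strings"

// subjectsOverlap reports whether two NATS subject filters can match the
// same concrete subject ('*' matches one token, '>' matches the rest).
// Illustrative re-implementation, not the server's own code.
func subjectsOverlap(a, b string) bool {
	at, bt := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(at) && i < len(bt); i++ {
		if at[i] == ">" || bt[i] == ">" {
			return true // one filter is "wider" from here on
		}
		if at[i] != bt[i] && at[i] != "*" && bt[i] != "*" {
			return false
		}
	}
	return len(at) == len(bt)
}

// For a WorkQueue stream, a new consumer filter must not overlap any
// existing consumer's filter.
func filterAllowed(existing []string, newFilter string) bool {
	for _, f := range existing {
		if subjectsOverlap(f, newFilter) {
			return false
		}
	}
	return true
}
```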
Resolves #3639
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When we deleted a consumer from an interest policy stream, we would make sure to clean up any unacked messages.
However, we only started from the consumer's ack floor and did not take into account the first sequence of the stream.
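A minimal sketch of the fix with hypothetical names: the cleanup scan should begin at whichever is higher, the sequence after the consumer's ack floor or the stream's first sequence.

```go
package jsexample

// cleanupStart returns the sequence at which to start scanning for
// unacked messages when removing a consumer from an interest stream.
// Illustrative helper: never start below the stream's first sequence.
func cleanupStart(ackFloor, streamFirstSeq uint64) uint64 {
	start := ackFloor + 1
	if start < streamFirstSeq {
		start = streamFirstSeq
	}
	return start
}
```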
Signed-off-by: Derek Collison <derek@nats.io>
This is based on @neilalexander's PR #3558.
It ensures that the timer is reset/canceled on configuration
update (by the leader only).
It also fixes an issue with a super-cluster where the delete timer
would always be reset at every gateway interval check.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
If a stream is created or updated with a negative replicas count,
an error is now returned. The same applies to consumers.
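A minimal sketch of the kind of validation added (illustrative only, not the server's actual code):

```go
package jsexample

import "fmt"

// validateReplicas rejects a negative replicas count for a stream or
// consumer configuration.
func validateReplicas(replicas int) error {
	if replicas < 0 {
		return fmt.Errorf("replicas count cannot be negative")
	}
	return nil
}
```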
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
It could be that while the routine processing the consumer assignment
runs, the stream is being stopped, which would lead to a panic.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
A request to `$SYS.REQ.SERVER.PING.JSZ` now returns something
like this:
```
...
"meta_cluster": {
  "name": "local",
  "leader": "A",
  "peer": "NUmM6cRx",
  "replicas": [
    {
      "name": "B",
      "current": true,
      "active": 690369000,
      "peer": "b2oh2L6w"
    },
    {
      "name": "Server name unknown at this time (peerID: jZ6RvVRH)",
      "current": false,
      "offline": true,
      "active": 0,
      "peer": "jZ6RvVRH"
    }
  ],
  "cluster_size": 3
}
```
Note the "peer" field following the "leader" field that contains
the server name. The new field is the node ID, which is a hash of
the server name.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
If the client pull request has a max_bytes value and the server
cannot deliver a single message (because its size is too big), it
sends a 409 to signal that to the client library. However,
if it has already sent at least one message, it would close the request
without notifying the client with a 409, which would cause the
client library to have to wait for the request's expiration/timeout.
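For context, here is a hedged sketch of such a pull request using the Go client and the raw consumer "next message" API subject; the stream name ORDERS and consumer name pull-consumer are placeholders. A 409 arrives as a status header on a reply message.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		panic(err)
	}
	defer nc.Close()

	// JetStream pull request payload: ask for up to 10 messages but no
	// more than 1024 bytes in total; expires is in nanoseconds.
	req, _ := json.Marshal(map[string]interface{}{
		"batch":     10,
		"max_bytes": 1024,
		"expires":   int64(2 * time.Second),
	})

	inbox := nats.NewInbox()
	sub, _ := nc.SubscribeSync(inbox)

	nc.PublishRequest("$JS.API.CONSUMER.MSG.NEXT.ORDERS.pull-consumer", inbox, req)

	for {
		msg, err := sub.NextMsg(3 * time.Second)
		if err != nil {
			break
		}
		// A 409 status here signals the max_bytes limit was hit.
		if code := msg.Header.Get("Status"); code != "" {
			fmt.Println("status:", code, msg.Header.Get("Description"))
			break
		}
		fmt.Println("message:", string(msg.Data))
	}
}
```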
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This fixes an issue introduced in #3080.
The consumer filter subject check was skipped on recovery.
The intent was to bypass the upstream stream's subjects,
but it also bypassed the downstream stream's subject.
This became a problem when the downstream was itself an upstream:
then, during recovery, the stream subject was not checked, which
led to delivery of filtered messages that should never have been
delivered.
Signed-off-by: Matthias Hanel <mh@synadia.com>
- didRemove in applyMetaEntries() could be reset when processing
multiple entries
- change "no race" test names to include JetStream
- separate raft nodes leader stepdown and stop in server
shutdown process
- in InstallSnapshot, call wal.Compact() with lastIndex+1
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This is a regression introduced in v2.8.3. If a message reaches
the max redeliver count, it stops being delivered to the consumer.
However, after a server or cluster restart, those messages would
be redelivered again.
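The redeliver limit in question is the consumer's MaxDeliver setting; a minimal sketch with the Go client, with ORDERS and worker as placeholder names:

```go
package main

import (
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, _ := nats.Connect(nats.DefaultURL)
	defer nc.Close()
	js, _ := nc.JetStream()

	// A message that has been delivered MaxDeliver times without an ack
	// must not be delivered again, including after a server or cluster
	// restart.
	js.AddConsumer("ORDERS", &nats.ConsumerConfig{
		Durable:    "worker",
		AckPolicy:  nats.AckExplicitPolicy,
		AckWait:    30 * time.Second,
		MaxDeliver: 5,
	})
}
```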
Resolves #3361
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The Move/Cancel/Downscale mechanism did not take into account that
the consumer's replica count can be set independently.
This also alters peer selection so that the unique tag prefix check
can be skipped for the server that will be replaced.
Say you have 3 availability zones and want to add another server to az:1
in order to replace a server in the same zone.
Without this change, the uniqueTagPrefix check would filter out
the server being replaced and cause a failure.
The cancel move response could not be received due to
the wrong account name.
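The peer-selection change for the availability-zone example above can be pictured with this illustrative sketch (hypothetical helper, not the server's code): the server being replaced is excluded from the unique-tag comparison, so a replacement in the same az:1 zone is no longer rejected.

```go
package jsexample

import "strings"

// violatesUniqueTag reports whether a candidate server shares a "unique tag"
// value (e.g. "az:1") with an already selected peer. The peer being replaced
// is excluded from the check, otherwise adding a replacement server in the
// same zone would always fail.
func violatesUniqueTag(candidateTags []string, selected map[string][]string, uniqueTagPrefix, replacing string) bool {
	for peer, tags := range selected {
		if peer == replacing {
			continue // the outgoing server does not count against uniqueness
		}
		for _, t := range tags {
			if !strings.HasPrefix(t, uniqueTagPrefix) {
				continue
			}
			for _, ct := range candidateTags {
				if ct == t {
					return true
				}
			}
		}
	}
	return false
}
```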
Signed-off-by: Matthias Hanel <mh@synadia.com>
Make sure when processing a peer removal that the stream assignment agrees.
When a new leader takes over, it can resend a peer removal, and if the stream/consumer really was rescheduled, we could remove it by accident.
Also make sure that when we remove a stream, we remove the node as part of the stream assignment.
Otherwise, if the same asset returned to this server, we would not start the monitoring loop.
- Simplify migration logic in monitorStream, to be driven by the leader only
- Improved unit tests
- Added failure when server not in peer list
- Move command does not require server anymore
Signed-off-by: Matthias Hanel <mh@synadia.com>
This effectively means that requests with batch > 1 will process a message and go to the end of the line.
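Conceptually, the behavior looks like this sketch (hypothetical types, not the server's code): each waiting pull request receives one message and, if its batch is not yet complete, is re-queued at the back of the line.

```go
package jsexample

// waitingRequest is a hypothetical, simplified stand-in for a pending
// pull request waiting for messages.
type waitingRequest struct {
	reply   string
	pending int // messages still owed for this batch
}

// dispatch gives one message to the request at the head of the line and,
// if its batch is not yet complete, re-queues it at the end instead of
// letting it stay in front.
func dispatch(queue []*waitingRequest, deliver func(reply string)) []*waitingRequest {
	if len(queue) == 0 {
		return queue
	}
	wr := queue[0]
	queue = queue[1:]
	deliver(wr.reply)
	wr.pending--
	if wr.pending > 0 {
		queue = append(queue, wr) // go to the end of the line
	}
	return queue
}
```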
Signed-off-by: Derek Collison <derek@nats.io>
When creating a pull consumer with InactiveThreshold set, if the
application is doing pull requests with "no_wait" at regular intervals
(shorter than the InactiveThreshold), the JS consumer should be considered
active and not deleted.
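As an example of the scenario, a pull consumer with InactiveThreshold created with the Go client and polled on a short interval; ORDERS, poller, and orders.> are placeholder names, and the short MaxWait stands in for the "no_wait"-style polling:

```go
package main

import (
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, _ := nats.Connect(nats.DefaultURL)
	defer nc.Close()
	js, _ := nc.JetStream()

	// The consumer would normally be cleaned up after one minute without
	// activity.
	js.AddConsumer("ORDERS", &nats.ConsumerConfig{
		Durable:           "poller",
		AckPolicy:         nats.AckExplicitPolicy,
		InactiveThreshold: time.Minute,
	})

	sub, _ := js.PullSubscribe("orders.>", "poller", nats.BindStream("ORDERS"))

	// Polling more often than InactiveThreshold should keep the consumer
	// alive, even when the polls are short no-wait style requests.
	for i := 0; i < 10; i++ {
		msgs, _ := sub.Fetch(1, nats.MaxWait(100*time.Millisecond))
		for _, m := range msgs {
			m.Ack()
		}
		time.Sleep(10 * time.Second)
	}
}
```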
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Also, for direct get and for pull requests, if we are not on a client connection, check how long we have been away from the readloop.
If need be, execute in a separate goroutine.
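A rough sketch of the idea with hypothetical names (not the actual server code): measure how long processing has kept us away from the readloop and, past a threshold, continue in a separate goroutine.

```go
package jsexample

import "time"

// maxAwayFromReadloop is an arbitrary illustrative threshold.
const maxAwayFromReadloop = 500 * time.Microsecond

// maybeAsync runs fn inline if the connection has only briefly been away
// from its readloop (since readloopLeft), otherwise it hands fn off to its
// own goroutine so the readloop is not starved.
func maybeAsync(readloopLeft time.Time, fn func()) {
	if time.Since(readloopLeft) > maxAwayFromReadloop {
		go fn()
		return
	}
	fn()
}
```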
Signed-off-by: Derek Collison <derek@nats.io>