When a stream had a large number of consumers on a server, the signaling mechanism did a linear scan over all of them to find the matching consumers to signal. Usage patterns increasingly involve consumers that are filtered and sparse, meaning a message is destined for a single or small number of consumers.
This change moves selection to a sublist that tracks only active consumer leaders, which keeps signaling efficient when the number of consumers is large.
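A minimal sketch of the idea with illustrative names; the server's actual sublist is wildcard-aware, and an exact-match map stands in for it here:
```
package main

// Illustrative sketch only: real nats-server types differ.
type consumer struct {
    name  string
    sigCh chan struct{} // signaled when a matching message arrives
}

type stream struct {
    consumers map[string]*consumer   // all consumers; previously scanned linearly
    active    map[string][]*consumer // only active leaders, by filter subject
}

// signal wakes just the consumers whose filter matches the subject,
// instead of linearly scanning every consumer on the stream.
func (s *stream) signal(subject string) {
    for _, c := range s.active[subject] {
        select {
        case c.sigCh <- struct{}{}:
        default: // already signaled, nothing to do
        }
    }
}

func main() {}
```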
Signed-off-by: Derek Collison <derek@nats.io>
With the increased use of subject-based limits, failing to store a message
because limits are exceeded happens frequently and makes running the
server in debug mode quite noisy, so we now rate limit this log line
even in debug.
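A minimal sketch of such rate limiting, with illustrative names and interval:
```
package main

import (
    "log"
    "sync"
    "time"
)

// rateLimited emits a log line at most once per interval.
type rateLimited struct {
    mu       sync.Mutex
    last     time.Time
    interval time.Duration
}

func (r *rateLimited) Debugf(format string, args ...interface{}) {
    r.mu.Lock()
    now := time.Now()
    emit := now.Sub(r.last) >= r.interval
    if emit {
        r.last = now
    }
    r.mu.Unlock()
    if emit {
        log.Printf("[DBG] "+format, args...)
    }
}

func main() {
    rl := &rateLimited{interval: 2 * time.Second}
    for i := 0; i < 1000; i++ {
        // Only the first call in each 2s window is actually logged.
        rl.Debugf("maximum messages per subject exceeded")
    }
}
```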
Signed-off-by: R.I.Pienaar <rip@devco.net>
The first issue was applications not getting any response.
However, there was also a more serious issue that would create a new Raft group for each concurrent request.
The servers would only run one stream monitor loop, but they would update the stored state to the new Raft group's name, so on server restart the stream would be using a different Raft group than the existing servers.
Signed-off-by: Derek Collison <derek@nats.io>
A stream with a WorkQueue retention policy is supposed to allow
more than one consumer if they use filter subjects, but those
subjects must not overlap.
There was an issue where, if a new consumer had a filter subject
"wider" than an existing one, the overlap was not detected and
the new consumer was incorrectly accepted.
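For example, with the nats.go client (names and subjects are illustrative):
```
package main

import (
    "fmt"

    "github.com/nats-io/nats.go"
)

func main() {
    nc, _ := nats.Connect(nats.DefaultURL)
    defer nc.Close()
    js, _ := nc.JetStream()

    // A work-queue stream.
    js.AddStream(&nats.StreamConfig{
        Name:      "WQ",
        Subjects:  []string{"jobs.>"},
        Retention: nats.WorkQueuePolicy,
    })

    // Allowed: the filter subjects do not overlap.
    js.AddConsumer("WQ", &nats.ConsumerConfig{
        Durable: "A", FilterSubject: "jobs.a", AckPolicy: nats.AckExplicitPolicy,
    })
    js.AddConsumer("WQ", &nats.ConsumerConfig{
        Durable: "B", FilterSubject: "jobs.b", AckPolicy: nats.AckExplicitPolicy,
    })

    // Now rejected: "jobs.>" is wider than and overlaps the existing filters.
    _, err := js.AddConsumer("WQ", &nats.ConsumerConfig{
        Durable: "C", FilterSubject: "jobs.>", AckPolicy: nats.AckExplicitPolicy,
    })
    fmt.Println(err)
}
```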
Resolves #3639
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
If the start-by-time is before what we remember during recovery, use that instead.
Resolves #3559
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
If a stream is created or updated with a negative replicas count,
an error is now returned. The same applies to consumers.
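A quick illustration with the nats.go client (stream name is illustrative):
```
package main

import (
    "fmt"

    "github.com/nats-io/nats.go"
)

func main() {
    nc, _ := nats.Connect(nats.DefaultURL)
    defer nc.Close()
    js, _ := nc.JetStream()

    // A negative replicas count is now rejected.
    _, err := js.AddStream(&nats.StreamConfig{
        Name:     "BAD",
        Subjects: []string{"bad"},
        Replicas: -1,
    })
    fmt.Println(err) // an API error instead of silent acceptance
}
```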
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
A request to `$SYS.REQ.SERVER.PING.JSZ` will now return something like this:
```
...
"meta_cluster": {
  "name": "local",
  "leader": "A",
  "peer": "NUmM6cRx",
  "replicas": [
    {
      "name": "B",
      "current": true,
      "active": 690369000,
      "peer": "b2oh2L6w"
    },
    {
      "name": "Server name unknown at this time (peerID: jZ6RvVRH)",
      "current": false,
      "offline": true,
      "active": 0,
      "peer": "jZ6RvVRH"
    }
  ],
  "cluster_size": 3
}
```
Note the "peer" field following the "leader" field that contains
the server name. The new field is the node ID, which is a hash of
the server name.
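A minimal sketch of how such an ID can be derived, assuming a SHA-256 based scheme (the server's exact alphabet and length may differ):
```
package main

import (
    "crypto/sha256"
    "fmt"
)

const idDigits = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

// nodeID maps a server name to a short, stable identifier by hashing
// the name and re-encoding the first bytes into a printable alphabet.
func nodeID(name string) string {
    sum := sha256.Sum256([]byte(name))
    id := make([]byte, 8)
    for i := 0; i < 8; i++ {
        id[i] = idDigits[int(sum[i])%len(idDigits)]
    }
    return string(id)
}

func main() {
    fmt.Println(nodeID("B")) // stable 8-character ID for server "B"
}
```
Because the ID depends only on the server name, the "peer" values above stay stable across restarts.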
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The CLI will now be able to display the peer IDs in MetaGroupInfo
if it chooses to do so, and possibly help the user select a peer ID
from a list, with a new command to remove by peer ID instead of
by server name.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Code changes:
- Do not start the processMirrorMsgs and processSourceMsgs goroutines
if the server has been detected to be shut down. This would otherwise
leave some goroutines running at the end of some tests.
- Pass the fch and qch to the consumerFileStore's flushLoop, otherwise
in some tests this routine could be left running.
Test changes:
- Added missing deferred NATS connection close.
- Added missing deferred server shutdown.
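A minimal illustration of the pattern these test changes enforce, using helpers from the server's own test suite (the test itself is hypothetical):
```
func TestExample(t *testing.T) {
    s := RunBasicJetStreamServer()
    defer s.Shutdown() // a missing deferred shutdown left servers running

    nc, js := jsClientConnect(t, s)
    defer nc.Close() // a missing deferred close left connections open

    _, err := js.AddStream(&nats.StreamConfig{Name: "T"})
    require_NoError(t, err)
}
```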
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This fixes an issue introduced in #3080.
The consumer filter subject check was skipped on recovery.
The intent was to bypass the check against the upstream stream's subjects,
but it also skipped the check against the downstream stream's subject.
This became a problem when the downstream was itself an upstream:
during recovery, the stream subject was not checked, which
led to delivery of filtered messages that should never have been
delivered.
Signed-off-by: Matthias Hanel <mh@synadia.com>
If an origin stream contains 1M messages with subject foo followed
by 1M messages with subject bar, and the source consumer changes its
filter from foo to bar, it would then receive all the old messages
for subject bar.
This happens because that tail was filtered out and the respective
sequence numbers were never communicated to the consumer.
This is somewhat unexpected. It is also coincidental: had the last
message in the stream had subject foo, this wouldn't have happened.
Therefore, when completely changing the subject, say from foo to bar,
we now only receive messages published after the time the change was
made. However, if the old and new subjects overlap in any way, we go
by sequence number, meaning in those cases the outlined behavior
remains, so as not to induce artificial message loss for the part of
the subject space covered by both the old and new filter.
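A sketch of the resulting restart decision; subjectsOverlap, the policy type, and the literal-only matching are all simplifications (the server uses its wildcard-aware subject matching):
```
package main

import (
    "fmt"
    "time"
)

// Literal-only stand-in for the server's wildcard-aware overlap check.
func subjectsOverlap(a, b string) bool { return a == b }

// deliverPolicy captures where the re-created source consumer resumes.
type deliverPolicy struct {
    bySeq     bool
    startSeq  uint64
    startTime time.Time
}

// Resume by sequence when old and new filters overlap (no message loss
// for the shared subject space), by time otherwise (no surprise replay
// of the previously filtered tail).
func restartPolicy(oldF, newF string, lastSeq uint64, changedAt time.Time) deliverPolicy {
    if subjectsOverlap(oldF, newF) {
        return deliverPolicy{bySeq: true, startSeq: lastSeq + 1}
    }
    return deliverPolicy{startTime: changedAt}
}

func main() {
    fmt.Printf("%+v\n", restartPolicy("foo", "bar", 1000000, time.Now()))
}
```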
Signed-off-by: Matthias Hanel <mh@synadia.com>
In standalone mode, when attempting to create a stream whose
subjects overlap with an existing stream, the generic
stream create error code 10049 was returned instead of the more
accurate error code 10065 corresponding to subject overlap,
as was already the case in clustered mode.
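For illustration, checking for the more specific error code with a recent nats.go client (stream names are illustrative):
```
package main

import (
    "errors"
    "fmt"

    "github.com/nats-io/nats.go"
)

func main() {
    nc, _ := nats.Connect(nats.DefaultURL)
    defer nc.Close()
    js, _ := nc.JetStream()

    js.AddStream(&nats.StreamConfig{Name: "ONE", Subjects: []string{"foo"}})

    // Second stream with an overlapping subject.
    _, err := js.AddStream(&nats.StreamConfig{Name: "TWO", Subjects: []string{"foo"}})

    var apiErr *nats.APIError
    if errors.As(err, &apiErr) {
        // Standalone mode now reports 10065 (subject overlap)
        // rather than the generic 10049 create error.
        fmt.Println(apiErr.ErrorCode)
    }
}
```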
Resolves #3362
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
In a normal message get, the returned time format is RFC3339Nano,
which is what JSON marshaling uses. However, for the direct get we
had to pass a string to construct the header, and we were using
time.Time.String(), which uses a different layout. So use
time.Time.MarshalJSON() to be consistent with the non-direct
message get.
Libraries that already parse the non-RFC3339Nano time format
can be updated, since none should have been released yet (the
direct get feature itself is not in a released server).
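For reference, the two layouts side by side in Go:
```
package main

import (
    "fmt"
    "time"
)

func main() {
    t := time.Now().UTC()

    // time.Time.String() layout (not RFC3339Nano):
    // 2006-01-02 15:04:05.999999999 +0000 UTC
    fmt.Println(t.String())

    // Layout used by time.Time.MarshalJSON(), i.e. RFC3339Nano:
    // 2006-01-02T15:04:05.999999999Z
    fmt.Println(t.Format(time.RFC3339Nano))
}
```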
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
* Fix race between stream stop and monitorStream
monitorCluster stops the stream; when doing so, monitorStream
needs to be stopped as well to avoid miscounting of the store size.
In a test, the stop and reset of the store size happened first,
and was then followed by storing more messages via monitorStream.
Signed-off-by: Matthias Hanel <mh@synadia.com>
* Fixed consumer restart on source filter update
When a stream source's filter subject was updated, the internal
consumer was not re-created.
If the upstream stream contains a tail of previously filtered messages,
these will now be delivered.
Signed-off-by: Matthias Hanel <mh@synadia.com>
* Added check for source/mirror filter subjects
When the origin stream exists, the source/mirror filter subject
will be checked against the stream's subjects.
If there is no overlap, an error will be returned.
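For example, with the nats.go client (stream names and subjects are illustrative):
```
package main

import (
    "fmt"

    "github.com/nats-io/nats.go"
)

func main() {
    nc, _ := nats.Connect(nats.DefaultURL)
    defer nc.Close()
    js, _ := nc.JetStream()

    // The origin stream only carries "foo.>".
    js.AddStream(&nats.StreamConfig{Name: "ORIGIN", Subjects: []string{"foo.>"}})

    // Now rejected: the source filter "bar" cannot match anything
    // in the origin stream's subject space.
    _, err := js.AddStream(&nats.StreamConfig{
        Name: "S",
        Sources: []*nats.StreamSource{
            {Name: "ORIGIN", FilterSubject: "bar"},
        },
    })
    fmt.Println(err)
}
```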
Signed-off-by: Matthias Hanel <mh@synadia.com>
* Adding account purge operation
The new request is available on the system account.
The subject to send the request to is $JS.API.ACCOUNT.PURGE.*,
with the name of the account to purge in place of the wildcard.
Also added directory cleanup code so that servers do not
end up with empty stream directories, or account directories that
only contain streams.
Also added ACCOUNT to the leaf node domain rewrite table.
Addresses #3186 and #3306 by providing a way to
get rid of the streams of existing and non-existing accounts.
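For example, using the nats.go client from a system-account connection (account name and credentials path are illustrative):
```
package main

import (
    "fmt"
    "time"

    "github.com/nats-io/nats.go"
)

func main() {
    // Connect with system-account credentials.
    nc, _ := nats.Connect(nats.DefaultURL, nats.UserCredentials("sys.creds"))
    defer nc.Close()

    // Purge all JetStream state for account "ACC": the wildcard token
    // in $JS.API.ACCOUNT.PURGE.* is replaced by the account name.
    resp, err := nc.Request("$JS.API.ACCOUNT.PURGE.ACC", nil, 2*time.Second)
    if err == nil {
        fmt.Println(string(resp.Data))
    }
}
```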
Signed-off-by: Matthias Hanel <mh@synadia.com>
- Send snapshot only if leader.
- When processing a snapshot, start with a smaller inactivity interval
that doubles up to 10sec, or use 10sec directly once we get a message
(see the sketch after this list). The reason is that the request for
the snapshot may be sent while the leader has not yet set up the
subscription that receives the requests (or the subscription has not
fully propagated through the cluster).
- Don't remember the snapshot file on error.
- Do not consider ourselves current if we have not had any activity.
- Stabilize stream scale-up under heavy active publishing.
- Due to the publish pressure, move the check for followers' direct
subscriptions spinning up until after we stop publishing.
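A minimal sketch of that inactivity handling, with illustrative names and durations:
```
package main

import "time"

// waitForCatchUp reads catch-up traffic with a small initial inactivity
// interval that doubles up to a 10s cap; once a message arrives, the
// full interval is used directly.
func waitForCatchUp(msgCh <-chan struct{}, resendRequest func()) {
    // Start small: the snapshot request may go out before the leader's
    // subscription is ready, so retry quickly at first.
    wait := 250 * time.Millisecond
    const maxWait = 10 * time.Second
    for {
        select {
        case _, ok := <-msgCh:
            if !ok {
                return // catch-up stream closed: done
            }
            // Once a message arrives, use the full interval directly.
            wait = maxWait
        case <-time.After(wait):
            if wait >= maxWait {
                return // no activity for the full interval: give up
            }
            resendRequest() // first request may not have been received
            if wait *= 2; wait > maxWait {
                wait = maxWait
            }
        }
    }
}

func main() {}
```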
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The `JSStreamNameExistErr` description will now state that the
stream already exists with a different configuration, because that
is the error clients get when trying to add a stream with a
different configuration (otherwise this is a no-op and clients
don't get an error).
Since that error was also used in the case of a restore, a new error
is added that uses the same description prefix "stream name already
in use" but adds ", cannot restore" to indicate that the restore
failed because the stream already exists.
Resolves #3273
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
With the previous commit, this should no longer be possible, but we
had otherwise gotten a report of a panic when accessing the mirror
configuration. The following test was able to produce the panic:
```
s := RunBasicJetStreamServer()
if config := s.JetStreamConfig(); config != nil {
    defer removeDir(t, config.StoreDir)
}
defer s.Shutdown()

nc, js := jsClientConnect(t, s)
defer nc.Close()

_, err := js.AddStream(&nats.StreamConfig{Name: "SOURCE"})
require_NoError(t, err)

cfg := &nats.StreamConfig{
    Name:   "M",
    Mirror: &nats.StreamSource{Name: "SOURCE"},
}
_, err = js.AddStream(cfg)
require_NoError(t, err)

err = js.DeleteStream("SOURCE")
require_NoError(t, err)

cfg.Mirror = nil
_, err = js.UpdateStream(cfg)
require_NoError(t, err)

time.Sleep(5 * time.Second)
```
Again, now that we reject the mirror configuration update, the panic
should no longer happen, but this adds preventive code in case we
allow it in the future.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>