Three issues were found and resolved.
1. Purge replays after recovery could execute full purge.
2. Callback was registered without lock, which could lead to skew.
3. Cluster reset could stop stream store and recreate it, which could lead to double accounting.
Signed-off-by: Derek Collison <derek@nats.io>
- [X] Tests added
- [X] Branch rebased on top of current main (`git pull --rebase origin
main`)
- [X] Changes squashed to a single commit (described
[here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
- [X] Build is green in Travis CI
- [X] You have certified that the contribution is your original work and
that you license the work to the project under the [Apache 2
license](https://github.com/nats-io/nats-server/blob/main/LICENSE)
### Changes proposed in this pull request:
Adds support for multi-filter (and associated transform destinations) to
stream sources
---------
Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
This records the server time when info for streams and consumers are
created so that tools such as the nats cli can calculate time deltas for
last ack, last delivered and so forth in the context of the server
clock.
This will help aleviate problems with client devices experiencing clock
jitter that can show up in user interfaces as negative seconds since
last ack etc
This records the server time when info for streams and
consumers are created so that tools such as the nats cli
can calculate time deltas for last ack, last delivered and
so forth in the context of the server clock.
This will help aleviate problems with client devices experiencing
clock jitter that can show up in user interfaces as negative
seconds since last ack etc
Signed-off-by: R.I.Pienaar <rip@devco.net>
First issue was applications not getting any response.
However, there was also a more serious issue that would create multiple raft groups for each concurrent request.
The servers would only run one stream monitor loop, however they would update the state to the new raft group's name, so on server restart the stream would be using a different raft group then existing servers.
Signed-off-by: Derek Collison <derek@nats.io>
It is possible that a stream info request would be handled at a
time where the raft group would not yet be set/created, causing
a panic.
Resolves#3626 (at least the panic reports there)
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
A request to `$SYS.REQ.SERVER.PING.JSZ` would now return something
like this:
```
...
"meta_cluster": {
"name": "local",
"leader": "A",
"peer": "NUmM6cRx",
"replicas": [
{
"name": "B",
"current": true,
"active": 690369000,
"peer": "b2oh2L6w"
},
{
"name": "Server name unknown at this time (peerID: jZ6RvVRH)",
"current": false,
"offline": true,
"active": 0,
"peer": "jZ6RvVRH"
}
],
"cluster_size": 3
}
```
Note the "peer" field following the "leader" field that contains
the server name. The new field is the node ID, which is a hash of
the server name.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The CLI will now be able to display the peer IDs in MetaGroupInfo
if it choses to do so, and possibly help user select the peer ID
from a list with a new command to remove by peer ID instead of
by server name.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This can be helpful after a partial cluster restart since in that
case the server name may not be known. However "server report jetstream"
would report the peer ID that then can be used.
For instance here is the output after a cluster restart where server "C"
is not restarted.
```
nats -s nats://sys:pwd@localhost:4222 server report jetstream
...
╭────────────────────────────────────────────────────────────────────────────────────────────────╮
│ RAFT Meta Group Information │
├─────────────────────────────────────────────────────┬────────┬─────────┬────────┬────────┬─────┤
│ Name │ Leader │ Current │ Online │ Active │ Lag │
├─────────────────────────────────────────────────────┼────────┼─────────┼────────┼────────┼─────┤
│ A │ yes │ true │ true │ 0.00s │ 0 │
│ B │ │ true │ true │ 0.53s │ 0 │
│ Server name unknown at this time (peerID: jZ6RvVRH) │ │ false │ false │ 0.00s │ 0 │
╰─────────────────────────────────────────────────────┴────────┴─────────┴────────┴────────┴─────╯
```
With a change to the NATS CLI we could have something like:
```
nats -s nats://sys:pwd@localhost:4222 server raft peer-remove jZ6RvVRH --by_id
```
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
* changed format of JSClusterNoPeers error
This error was introduced in #3342 and reveals to much information
This change gets rid of cluster names and peer counts.
All other counts where changed to booleans,
which are only included in the output when the filter was hit.
In addition, the set of not matching tags is included.
Furthermore, the static error description in server/errors.json
is moved into selectPeerError
sample errors:
1) no suitable peers for placement, tags not matched ['cloud:GCP', 'country:US']"
2) no suitable peers for placement, insufficient storage
Signed-off-by: Matthias Hanel <mh@synadia.com>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Co-authored-by: Ivan Kozlovic <ivan@synadia.com>
This can happen if the move was initiated by the user.
A subsequent cancel resets the initial peer list.
The original peer list was picked on the old set of tags.
A cancel would then keep the new list of tags but reset
to the old peers. Thus tags and peers diverge.
The problem is that at the time of cancel, the old
placement tags can't be found anymore.
This fix causes cancel to remove the placement tags, if
the old peers do not satisfy the new placement tags.
Signed-off-by: Matthias Hanel <mh@synadia.com>
* better error when peer selection fails
It is pretty hard to diagnose what went wrong when not enough peers for
an operation where found. This change now returns counts of reasons why
peers where discarded.
Changed the error to JSClusterNoPeers as it seems more appropriate
of an error for that operation. Not having enough resources is one of
the conditions for a peer not being considered. But so is having a non
matching tag. Which is why JSClusterNoPeers seems more appropriate
In addition, JSClusterNoPeers was already used as error after one call
to selectPeerGroup already.
example:
no suitable peers for placement: peer selection cluster 'C' with 3 peers
offline: 0
excludeTag: 1
noTagMatch: 2
noSpace: 0
uniqueTag: 0
misc: 0
Examle for mqtt:
mid:12 - "mqtt" - unable to connect: create sessions stream for account "$G":
no suitable peers for placement: peer selection cluster 'MQTT' with 3 peers
offline: 0
excludeTag: 0
noTagMatch: 0
noSpace: 0
uniqueTag: 0
misc: 0
(10005)
Signed-off-by: Matthias Hanel <mh@synadia.com>
* review comment
Signed-off-by: Matthias Hanel <mh@synadia.com>
* Adding account purge operation
The new request is available for the system account.
The subject to send the request to is $JS.API.ACCOUNT.PURGE.*
With the name of the account to purge instead of the wildcard.
Also added directory cleanup code such that server do not
end up with empty streams directories and account dirs that
only contain streams
Also adding ACCOUNT to leaf node domain rewrite table
Addresses #3186 and #3306 by providing a way to
get rid of the streams for existing and non existing accounts
Signed-off-by: Matthias Hanel <mh@synadia.com>
The interest moved across the leafnode would be for the mapping, and not the actual qsub.
So when received if we did detect that we are mapped and do not have a queue filter present make sure to ignore.
This will allow queue subscriber processing on the local server that received the message from the leafnode.
Signed-off-by: Derek Collison <derek@nats.io>
If there are more stream names that the current limit of 1024, getting
the list of names would return them all instead of using pagination.
For "stream infos", the Total amount returned would be the API limit
instead of the actual number of streams.
Resolves https://github.com/nats-io/natscli/issues/541
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>