Several strategies are used which are listed below.
1. Checking a RaftNode to see if it is the leader now uses atomics.
2. Checking if we are the JetStream meta leader from the server now uses
an atomic.
3. Accessing the JetStream context no longer requires a server lock,
uses atomic.Pointer.
4. Filestore syncBlocks would hold msgBlock locks during sync, now does
not.
Signed-off-by: Derek Collison <derek@nats.io>
Several strategies which are listed below.
1. Checking a RaftNode to see if it is the leader now uses atomics.
2. Checking if we are the JetStream meta leader from the server now uses an atomic.
3. Accessing the JetStream context no longer requires a server lock, uses atomic.Pointer.
4. Filestore syncBlocks would hold msgBlock locks during sync, now does not.
Signed-off-by: Derek Collison <derek@nats.io>
Here we know that if we can't find the stream but have the stream assignment, this is a distinct possibility. So we wait, since not processed inline, to see if it appears.
Fixes TestJetStreamClusterParallelStreamCreation as well that could flap.
Signed-off-by: Derek Collison <derek@nats.io>
Three issues were found and resolved.
1. Purge replays after recovery could execute full purge.
2. Callback was registered without lock, which could lead to skew.
3. Cluster reset could stop stream store and recreate it, which could lead to double accounting.
Signed-off-by: Derek Collison <derek@nats.io>
- [X] Tests added
- [X] Branch rebased on top of current main (`git pull --rebase origin
main`)
- [X] Changes squashed to a single commit (described
[here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
- [X] Build is green in Travis CI
- [X] You have certified that the contribution is your original work and
that you license the work to the project under the [Apache 2
license](https://github.com/nats-io/nats-server/blob/main/LICENSE)
### Changes proposed in this pull request:
Adds support for multi-filter (and associated transform destinations) to
stream sources
---------
Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
This records the server time when info for streams and consumers are
created so that tools such as the nats cli can calculate time deltas for
last ack, last delivered and so forth in the context of the server
clock.
This will help aleviate problems with client devices experiencing clock
jitter that can show up in user interfaces as negative seconds since
last ack etc
This records the server time when info for streams and
consumers are created so that tools such as the nats cli
can calculate time deltas for last ack, last delivered and
so forth in the context of the server clock.
This will help aleviate problems with client devices experiencing
clock jitter that can show up in user interfaces as negative
seconds since last ack etc
Signed-off-by: R.I.Pienaar <rip@devco.net>
First issue was applications not getting any response.
However, there was also a more serious issue that would create multiple raft groups for each concurrent request.
The servers would only run one stream monitor loop, however they would update the state to the new raft group's name, so on server restart the stream would be using a different raft group then existing servers.
Signed-off-by: Derek Collison <derek@nats.io>
It is possible that a stream info request would be handled at a
time where the raft group would not yet be set/created, causing
a panic.
Resolves#3626 (at least the panic reports there)
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
A request to `$SYS.REQ.SERVER.PING.JSZ` would now return something
like this:
```
...
"meta_cluster": {
"name": "local",
"leader": "A",
"peer": "NUmM6cRx",
"replicas": [
{
"name": "B",
"current": true,
"active": 690369000,
"peer": "b2oh2L6w"
},
{
"name": "Server name unknown at this time (peerID: jZ6RvVRH)",
"current": false,
"offline": true,
"active": 0,
"peer": "jZ6RvVRH"
}
],
"cluster_size": 3
}
```
Note the "peer" field following the "leader" field that contains
the server name. The new field is the node ID, which is a hash of
the server name.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The CLI will now be able to display the peer IDs in MetaGroupInfo
if it choses to do so, and possibly help user select the peer ID
from a list with a new command to remove by peer ID instead of
by server name.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This can be helpful after a partial cluster restart since in that
case the server name may not be known. However "server report jetstream"
would report the peer ID that then can be used.
For instance here is the output after a cluster restart where server "C"
is not restarted.
```
nats -s nats://sys:pwd@localhost:4222 server report jetstream
...
╭────────────────────────────────────────────────────────────────────────────────────────────────╮
│ RAFT Meta Group Information │
├─────────────────────────────────────────────────────┬────────┬─────────┬────────┬────────┬─────┤
│ Name │ Leader │ Current │ Online │ Active │ Lag │
├─────────────────────────────────────────────────────┼────────┼─────────┼────────┼────────┼─────┤
│ A │ yes │ true │ true │ 0.00s │ 0 │
│ B │ │ true │ true │ 0.53s │ 0 │
│ Server name unknown at this time (peerID: jZ6RvVRH) │ │ false │ false │ 0.00s │ 0 │
╰─────────────────────────────────────────────────────┴────────┴─────────┴────────┴────────┴─────╯
```
With a change to the NATS CLI we could have something like:
```
nats -s nats://sys:pwd@localhost:4222 server raft peer-remove jZ6RvVRH --by_id
```
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
* changed format of JSClusterNoPeers error
This error was introduced in #3342 and reveals to much information
This change gets rid of cluster names and peer counts.
All other counts where changed to booleans,
which are only included in the output when the filter was hit.
In addition, the set of not matching tags is included.
Furthermore, the static error description in server/errors.json
is moved into selectPeerError
sample errors:
1) no suitable peers for placement, tags not matched ['cloud:GCP', 'country:US']"
2) no suitable peers for placement, insufficient storage
Signed-off-by: Matthias Hanel <mh@synadia.com>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Co-authored-by: Ivan Kozlovic <ivan@synadia.com>
This can happen if the move was initiated by the user.
A subsequent cancel resets the initial peer list.
The original peer list was picked on the old set of tags.
A cancel would then keep the new list of tags but reset
to the old peers. Thus tags and peers diverge.
The problem is that at the time of cancel, the old
placement tags can't be found anymore.
This fix causes cancel to remove the placement tags, if
the old peers do not satisfy the new placement tags.
Signed-off-by: Matthias Hanel <mh@synadia.com>
* better error when peer selection fails
It is pretty hard to diagnose what went wrong when not enough peers for
an operation where found. This change now returns counts of reasons why
peers where discarded.
Changed the error to JSClusterNoPeers as it seems more appropriate
of an error for that operation. Not having enough resources is one of
the conditions for a peer not being considered. But so is having a non
matching tag. Which is why JSClusterNoPeers seems more appropriate
In addition, JSClusterNoPeers was already used as error after one call
to selectPeerGroup already.
example:
no suitable peers for placement: peer selection cluster 'C' with 3 peers
offline: 0
excludeTag: 1
noTagMatch: 2
noSpace: 0
uniqueTag: 0
misc: 0
Examle for mqtt:
mid:12 - "mqtt" - unable to connect: create sessions stream for account "$G":
no suitable peers for placement: peer selection cluster 'MQTT' with 3 peers
offline: 0
excludeTag: 0
noTagMatch: 0
noSpace: 0
uniqueTag: 0
misc: 0
(10005)
Signed-off-by: Matthias Hanel <mh@synadia.com>
* review comment
Signed-off-by: Matthias Hanel <mh@synadia.com>