A degradation of around 5% was observed for large fan-out in
v2.9.0 compared to earlier releases. This is because we added
accounting of the in/out messages for the account, which results
in 4 atomic operations, 2 for in and 2 for out. However, it means
that for a fan-out of, say, 100 matching subscriptions, it is now
2 + 2 * 100 = 202.
This PR reworks how the stats accounting is done, which removes
the regression and even boosts the numbers a bit since we are
now doing the server stats update as an aggregate too.
There is still a degradation for queues and when there are no
subscriptions at all, which needs to be looked at.
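The difference between per-delivery and aggregated accounting can be sketched as follows. This is only an illustration of the idea, not the server's code: the `stats` type and both functions are hypothetical names, and the returned operation count exists solely to mirror the 2 + 2 * 100 arithmetic above.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// stats is a hypothetical stand-in for an account's message counters.
type stats struct {
	inMsgs, inBytes   int64
	outMsgs, outBytes int64
}

// deliverPerSub updates the outbound counters once per matching
// subscription and returns the number of atomic operations performed.
func deliverPerSub(s *stats, subs int, size int64) int {
	atomic.AddInt64(&s.inMsgs, 1)
	atomic.AddInt64(&s.inBytes, size)
	ops := 2
	for i := 0; i < subs; i++ {
		atomic.AddInt64(&s.outMsgs, 1)
		atomic.AddInt64(&s.outBytes, size)
		ops += 2
	}
	return ops
}

// deliverAggregated tallies the fan-out in local variables and
// publishes the totals with a single pair of atomic operations.
func deliverAggregated(s *stats, subs int, size int64) int {
	atomic.AddInt64(&s.inMsgs, 1)
	atomic.AddInt64(&s.inBytes, size)
	var msgs, bytes int64
	for i := 0; i < subs; i++ {
		msgs++
		bytes += size
	}
	atomic.AddInt64(&s.outMsgs, msgs)
	atomic.AddInt64(&s.outBytes, bytes)
	return 4
}

func main() {
	var a, b stats
	fmt.Println(deliverPerSub(&a, 100, 10))     // 202
	fmt.Println(deliverAggregated(&b, 100, 10)) // 4
	fmt.Println(a == b)                         // true: same totals either way
}
```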
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Updating a consumer configuration from say R3 to R1 would work
but no response was received by the client sending the request.
Resolves #3493
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The write cache may be pinned for longer than needed when creating
a new write block. This could be seen in some benchmark tests.
The old block cache would be kept for 5 more seconds, which, with
a fast rate of inserts, could start to show in some memory profiling.
This was a change introduced in https://github.com/nats-io/nats-server/pull/3351
and differs from the code in v2.8.4.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
It could be that while the routine processing the consumer assignment
runs, the stream is being stopped, which would lead to a panic.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Protocol errors print arguments that contain arbitrary []byte
and are possibly not formattable strings; use %q to escape them.
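As a small illustration of what `%q` does with arbitrary bytes (the helper name `formatProtoErr` and the message text are made up for this example and are not the server's actual error format):

```go
package main

import "fmt"

// formatProtoErr is a hypothetical helper. %q renders the bytes as a
// double-quoted Go string literal, escaping control and non-printable
// bytes so they cannot corrupt the log line.
func formatProtoErr(arg []byte) string {
	return fmt.Sprintf("protocol error, arg: %q", arg)
}

func main() {
	fmt.Println(formatProtoErr([]byte("PUB foo\x00\r\n")))
	// prints: protocol error, arg: "PUB foo\x00\r\n"
}
```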
Signed-off-by: Caleb Lloyd <caleb@synadia.com>
If the `no_auth_user` is set in the `websocket{}` block and a
server creates a leafnode connection using the websocket port,
and does not provide credentials, that no_auth_user should be
used, but was not.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When a block's subject meta state was swapped out and subsequently loaded back in with only one subject present, but other messages with different subjects were added later, a filtered get could return the wrong result.
Signed-off-by: Derek Collison <derek@nats.io>
A request to `$SYS.REQ.SERVER.PING.JSZ` would now return something
like this:
```
...
"meta_cluster": {
"name": "local",
"leader": "A",
"peer": "NUmM6cRx",
"replicas": [
{
"name": "B",
"current": true,
"active": 690369000,
"peer": "b2oh2L6w"
},
{
"name": "Server name unknown at this time (peerID: jZ6RvVRH)",
"current": false,
"offline": true,
"active": 0,
"peer": "jZ6RvVRH"
}
],
"cluster_size": 3
}
```
Note the new "peer" field following the "leader" field. The "leader"
field contains the server name; the new field is the node ID, which
is a hash of the server name.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This was discovered by new test TestJetStreamClusterRemovePeerByID.
I saw this on Travis and repeating the test locally with -count=10
I was able to reproduce. The issue is cc.meta being nil but accessing
cc.meta.ID() directly.
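The kind of guard this calls for can be sketched as below. The types here are simplified, hypothetical stand-ins (the real `cc.meta` is not reproduced), and the helper name `metaID` is invented for the example.

```go
package main

import "fmt"

// raftNode and cluster are hypothetical, simplified stand-ins.
type raftNode struct{ id string }

func (n *raftNode) ID() string { return n.id }

type cluster struct{ meta *raftNode }

// metaID returns the meta node ID, or false when the cluster or its
// meta node is not set, instead of dereferencing a nil value.
func (cc *cluster) metaID() (string, bool) {
	if cc == nil || cc.meta == nil {
		return "", false
	}
	return cc.meta.ID(), true
}

func main() {
	var cc cluster // meta is nil, as in the failing test run
	if _, ok := cc.metaID(); !ok {
		fmt.Println("meta not available")
	}
	cc.meta = &raftNode{id: "jZ6RvVRH"}
	id, _ := cc.metaID()
	fmt.Println(id)
}
```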
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
If a client pull request has a max_bytes value and the server
cannot deliver a single message (because its size is too big), the
server sends a 409 to signal that to the client library. However,
if it delivered at least one message, it would then close the request
without notifying the client with a 409, which would cause the
client library to have to wait for its expiration/timeout.
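The intended behavior can be modeled roughly as below. This is a toy model under stated assumptions, not the server's delivery loop: `servePull` is a hypothetical function, message sizes are plain integers, and the boolean merely stands for "a 409 should be sent to the client".

```go
package main

import "fmt"

// servePull is a hypothetical model of one pull request. It delivers
// pending messages while they fit in the max_bytes budget and reports
// whether the request ended because the next message did not fit, in
// which case the client should be told with a 409, regardless of how
// many messages were already delivered.
func servePull(pending []int, maxBytes int) (delivered int, send409 bool) {
	remaining := maxBytes
	for _, size := range pending {
		if size > remaining {
			return delivered, true
		}
		remaining -= size
		delivered++
	}
	return delivered, false
}

func main() {
	fmt.Println(servePull([]int{50}, 25))         // 0 true: nothing fits
	fmt.Println(servePull([]int{10, 10, 10}, 25)) // 2 true: the case fixed here
	fmt.Println(servePull([]int{10}, 25))         // 1 false: budget not exhausted
}
```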
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Since the second batch was already past the 5min mark and a bit
longer than the first batch, it is a good opportunity to add
this new test in a new file. Updated runTestsOnTravis and travis.yml
accordingly.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The CLI will now be able to display the peer IDs in MetaGroupInfo
if it chooses to do so, and possibly help the user select the peer ID
from a list with a new command to remove by peer ID instead of
by server name.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This can be helpful after a partial cluster restart since in that
case the server name may not be known. However "server report jetstream"
would report the peer ID that then can be used.
For instance here is the output after a cluster restart where server "C"
is not restarted.
```
nats -s nats://sys:pwd@localhost:4222 server report jetstream
...
╭────────────────────────────────────────────────────────────────────────────────────────────────╮
│ RAFT Meta Group Information │
├─────────────────────────────────────────────────────┬────────┬─────────┬────────┬────────┬─────┤
│ Name │ Leader │ Current │ Online │ Active │ Lag │
├─────────────────────────────────────────────────────┼────────┼─────────┼────────┼────────┼─────┤
│ A │ yes │ true │ true │ 0.00s │ 0 │
│ B │ │ true │ true │ 0.53s │ 0 │
│ Server name unknown at this time (peerID: jZ6RvVRH) │ │ false │ false │ 0.00s │ 0 │
╰─────────────────────────────────────────────────────┴────────┴─────────┴────────┴────────┴─────╯
```
With a change to the NATS CLI we could have something like:
```
nats -s nats://sys:pwd@localhost:4222 server raft peer-remove jZ6RvVRH --by_id
```
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
As per the release process, bumping the version to next update
with beta suffix once the release is out.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
* changed format of JSClusterNoPeers error
This error was introduced in #3342 and reveals too much information.
This change gets rid of cluster names and peer counts.
All other counts were changed to booleans,
which are only included in the output when the filter was hit.
In addition, the set of not matching tags is included.
Furthermore, the static error description in server/errors.json
is moved into selectPeerError
sample errors:
1) no suitable peers for placement, tags not matched ['cloud:GCP', 'country:US']
2) no suitable peers for placement, insufficient storage
Signed-off-by: Matthias Hanel <mh@synadia.com>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Co-authored-by: Ivan Kozlovic <ivan@synadia.com>
Code change:
- Do not start the processMirrorMsgs and processSourceMsgs goroutines
if the server has been detected to be shut down. This would otherwise
leave some goroutines running at the end of some tests.
- Pass the fch and qch to the consumerFileStore's flushLoop, otherwise
in some tests this routine could be left running.
Test changes:
- Added missing defer NATS connection close
- Added missing defer server shutdown
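One way to read the flushLoop change is that the loop receives its channels as arguments, so it always watches the channel that will actually be closed. The sketch below illustrates that pattern only; the signature, the `flush` callback, and the `stopsWhenClosed` helper are assumptions made for the example and do not reproduce the consumerFileStore code.

```go
package main

import (
	"fmt"
	"time"
)

// flushLoop is a hypothetical loop that flushes on fch and exits once
// qch is closed. Taking both channels as parameters means the loop
// does not depend on struct fields that may be reset during shutdown.
func flushLoop(fch, qch chan struct{}, flush func()) {
	for {
		select {
		case <-fch:
			flush()
		case <-qch:
			return
		}
	}
}

// stopsWhenClosed starts the loop, closes qch, and reports whether the
// goroutine exited within the timeout.
func stopsWhenClosed() bool {
	fch, qch := make(chan struct{}, 1), make(chan struct{})
	done := make(chan struct{})
	go func() {
		flushLoop(fch, qch, func() {})
		close(done)
	}()
	close(qch)
	select {
	case <-done:
		return true
	case <-time.After(2 * time.Second):
		return false
	}
}

func main() {
	fmt.Println(stopsWhenClosed())
}
```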
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This could happen for a stream with R>1 but with a durable that
has an override of R=1.
Fixed a test to make sure assets have an elected leader.
Also fixed a gateway test that would cause a data race.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>