Also added in healthz for single server systems to make sure all stream directories resulted in recovered streams.
Signed-off-by: Derek Collison <derek@nats.io>
This fixes a backwards incompat change for library usage as well as
using the healthz NATS API which depends on the JSON payload.
Signed-off-by: Byron Ruth <byron@nats.io>
When js-enabled is set to true, the condition was only checked if
the `getJetStream()` call returned `nil`. However, if it non-nil,
all remaining checks were executed, including assessing the health
of the assets (streams and consumers).
This change addresses two issues:
- Switch to use `js.isEnabled()` which will check whether the value
is nil OR `js.disabled = true` which can occur if the subsystem
is temporarily disabled (insufficient resources).
- Correctly exit the check after the assertion and before meta and
asset checks are performed.
In addition, the option has been renamed to `js-enabled-only` to align
with the `js-server-only` naming. The previous `js-enabled` name still
works, but is mapped to this new option. A warning is emitted noting
the previous option is deprecated.
Fix#3703
Signed-off-by: Byron Ruth <b@devel.io>
A request to `$SYS.REQ.SERVER.PING.JSZ` would now return something
like this:
```
...
"meta_cluster": {
"name": "local",
"leader": "A",
"peer": "NUmM6cRx",
"replicas": [
{
"name": "B",
"current": true,
"active": 690369000,
"peer": "b2oh2L6w"
},
{
"name": "Server name unknown at this time (peerID: jZ6RvVRH)",
"current": false,
"offline": true,
"active": 0,
"peer": "jZ6RvVRH"
}
],
"cluster_size": 3
}
```
Note the "peer" field following the "leader" field that contains
the server name. The new field is the node ID, which is a hash of
the server name.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- Change finger_prints to cert_sha256 and use hex.EncodeToString
- Add spki_sha256 for RawSubjectPublicKeyInfo with hex.EncodeToString
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Add basic peer certificates information in /connz endpoint when
the "auth" option is provided.
Resolves#3317
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The monitoring http server is started early and the gateway setup
(when configured) may not be fully ready when the `/gatewayz`
endpoint is inspected and could cause a panic.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Added http monitoring endpoint /accstatz
It responds with a list of statz for all accounts with local connections
the argument "unused=1" can be provided to get statz for all accounts
This endpoint is also exposed as nats request under:
This monitoring endpoint is exposed via the system account.
$SYS.REQ.ACCOUNT.*.STATZ
Each server will respond with connection statistics for the requested
account. The format of the data section is a list (size 1) identical to the event
$SYS.ACCOUNT.%s.SERVER.CONNS which is sent periodically as well as on
connect/disconnect. Unless requested by options, server without the account,
or server where the account has no local connections, will not respond.
A PING endpoint exists as well. The response format is identical to
$SYS.REQ.ACCOUNT.*.STATZ
(however the data section will contain more than one account, if they exist)
In addition to general filter options the request takes a list of accounts and
an argument to include accounts without local connections (disabled by default)
$SYS.REQ.ACCOUNT.PING.STATZ
Each account has a new system account import where the local subject
$SYS.REQ.ACCOUNT.PING.STATZ essentially responds as if
the importing account name was used for $SYS.REQ.ACCOUNT.*.STATZ
The only difference between requesting ACCOUNT.PING.STATZ from within
the system account and an account is that the later can only retrieve
statz for the account the client requests from.
Also exposed the monitoring /healthz via the system account under
$SYS.REQ.SERVER.*.HEALTHZ
$SYS.REQ.SERVER.PING.HEALTHZ
No dedicated options are available for these.
HEALTHZ also accept general filter options.
Signed-off-by: Matthias Hanel <mh@synadia.com>
Added Start, LastActivity, Uptime and Idle that we normally have
in a Connz for non route connections. This info can be useful
to determine if a route is recent, etc..
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When updating usage, there is a lock inversion in that the jetStream
lock was acquired while under the stream's (mset) lock, which is
not correct. Also, updateUsage was locking the jsAccount lock, which
again, is not really correct since jsAccount contains streams, so
it should be jsAccount->stream, not the other way around.
Removed the locking of jetStream to check for clustered state since
js.clustered is immutable.
Replaced using jsAccount lock to update usage with a dedicated lock.
Originally moved all the update/limit fields in jsAccount to new
structure to make sure that I would see all code that is updating
or reading those fields, and also all functions so that I could
make sure that I use the new lock when calling these. Once that
works was done, and to reduce code changes, I put the fields back
into jsAccount (although I grouped them under the new usageMu mutex
field).
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- A stream could become leader when it should not, causing
messages to be lost.
- A catchup could stall because the server sending data
could bail out of the runCatchup routine but still send
the EOF signal.
- Deadlock with monitoring of Jsz
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
If set, a server configured to accept leafnode connections will
reject a remote server whose version is below that value. Note
that servers prior to v2.8.0 are not sending their version
in the CONNECT protocol, which means that anything below 2.8.0
would be rejected.
Configuration example:
```
leafnodes {
port: 7422
min_version: 2.8.0
}
```
The option is a string and can have the "v" prefix:
```
min_version: "v2.9.1"
```
Note that although suffix such as `-beta` would be accepted,
only the major, minor and update are used for the version comparison.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Data race that has been seen:
```
Read at 0x00c00134bec0 by goroutine 159:
github.com/nats-io/nats-server/v2/server.(*client).msgHeaderForRouteOrLeaf()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:2935 +0x254
github.com/nats-io/nats-server/v2/server.(*client).processMsgResults()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4364 +0x2147
(...)
Previous write at 0x00c00134bec0 by goroutine 201:
github.com/nats-io/nats-server/v2/server.(*Server).addRoute()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:1475 +0xdb4
github.com/nats-io/nats-server/v2/server.(*client).processRouteInfo()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:641 +0x1704
```
Also fixed some flappers and removed use of `s.js.` since we have
already captured `js` in Jsz monitoring.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Kubernetes probes don't use nor log the reponse body of health
endpoints. This means that for some reason a nats node running in
Kubernetes becomes on a Not Ready state we won't have a way to know why
other than to manually access the cluster and call the /healthz endpoint
manually and see the error.
This change adds an error log so we can observe what is going wrong with
a nats node that is not ready.
Signed-off-by: Samuel Torres <samuel.torres@form3.tech>
The data from other nodes are usually wrong, this can be quite
confusing for users so we now only send it when we are the leader
Signed-off-by: R.I.Pienaar <rip@devco.net>
Removed the warnings, instead have a sync.Map where they are
registered/unregistered and can be inspected with an undocumented
monitor page.
Added the notion of "in progress" which is the number of messages
that have beend pop()'ed. When recycle() is invoked this count
goes down.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Gateway connection will be closed and error reported if a remote
has a name that is a duplicate of the local cluster.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
user and activation token did not honor the jwt value for all * on
connect.
activation token where not re evaluated when the export revoked a key.
In part this is a consistency measure so servers that already have an
account and servers that don't behave the same way.
in jwt activation token revocations are stored per export.
The server stored them per account, thus effectively merging
revocations. Now they are stored per export inside the server too.
fixes nats-io/nsc/issues/442
Signed-off-by: Matthias Hanel <mh@synadia.com>
Currently this code returns a 200 and { "status": "ok" } iff all configured ports are open
and if JetStream is configured and we have contact with the metaleader and the cluster and all streams are up to date.
Signed-off-by: Derek Collison <derek@nats.io>
This allows stream placement to overflow to adjacent clusters.
We also do more balanced placement based on resources (store or mem). We can continue to expand this as well.
We also introduce an account requirement that stream configs contain a MaxBytes value.
We now track account limits and server limits more distinctly, and do not reserver server resources based on account limits themselves.
Signed-off-by: Derek Collison <derek@nats.io>
Appears what happens is that the getPublicConsumers()
is called which produces a list of consumers and that
between the time the list is made and the Info() is
called the ephemeral was removed.
Signed-off-by: R.I.Pienaar <rip@devco.net>