Commit Graph

524 Commits

Author SHA1 Message Date
Matthias Hanel
d53d2d0484 [Added] account specific monitoring endpoint(s) (#3250)
Added the HTTP monitoring endpoint /accstatz.
It responds with a list of statz for all accounts with local connections.
The argument "unused=1" can be provided to get statz for all accounts.

This monitoring endpoint is also exposed as a NATS request via the system account under:
$SYS.REQ.ACCOUNT.*.STATZ
Each server will respond with connection statistics for the requested
account. The format of the data section is a list (size 1) identical to the event
$SYS.ACCOUNT.%s.SERVER.CONNS which is sent periodically as well as on
connect/disconnect. Unless requested by options, servers without the account,
or servers where the account has no local connections, will not respond.

A PING endpoint exists as well. The response format is identical to
$SYS.REQ.ACCOUNT.*.STATZ
(however the data section will contain more than one account, if they exist)
In addition to general filter options the request takes a list of accounts and
an argument to include accounts without local connections (disabled by default)
$SYS.REQ.ACCOUNT.PING.STATZ

Each account has a new system account import where the local subject
$SYS.REQ.ACCOUNT.PING.STATZ essentially responds as if
the importing account name was used for $SYS.REQ.ACCOUNT.*.STATZ

The only difference between requesting ACCOUNT.PING.STATZ from within
the system account and from a regular account is that the latter can only
retrieve statz for the account the client requests from.

Also exposed the monitoring /healthz via the system account under
$SYS.REQ.SERVER.*.HEALTHZ
$SYS.REQ.SERVER.PING.HEALTHZ
No dedicated options are available for these.
HEALTHZ also accepts general filter options.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-07-12 21:50:32 +02:00
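The subjects and the HTTP argument named in the commit message above can be assembled as in the following minimal sketch. The helper names are hypothetical, the base URL with port 8222 is only an assumed example address, and reading the `*` token of the HEALTHZ subject as a server ID is an assumption; the commit message only confirms that the `*` token of the STATZ subject stands for an account name.

```go
package main

import (
	"fmt"
	"net/url"
)

// Subjects quoted from the commit message.
const (
	accountPingStatzSubject  = "$SYS.REQ.ACCOUNT.PING.STATZ"
	serverPingHealthzSubject = "$SYS.REQ.SERVER.PING.HEALTHZ"
)

// accountStatzSubject (hypothetical helper) fills the wildcard token of
// $SYS.REQ.ACCOUNT.*.STATZ with an account name.
func accountStatzSubject(account string) string {
	return fmt.Sprintf("$SYS.REQ.ACCOUNT.%s.STATZ", account)
}

// serverHealthzSubject (hypothetical helper) fills the wildcard token of
// $SYS.REQ.SERVER.*.HEALTHZ. Treating the token as a server ID is an assumption.
func serverHealthzSubject(serverID string) string {
	return fmt.Sprintf("$SYS.REQ.SERVER.%s.HEALTHZ", serverID)
}

// accstatzURL (hypothetical helper) builds the HTTP monitoring URL.
// includeUnused adds the "unused=1" argument described in the commit message.
func accstatzURL(base string, includeUnused bool) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/accstatz"
	if includeUnused {
		q := url.Values{}
		q.Set("unused", "1")
		u.RawQuery = q.Encode()
	}
	return u.String(), nil
}

func main() {
	// "http://localhost:8222" is an assumed monitoring address.
	u, err := accstatzURL("http://localhost:8222", true)
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
	fmt.Println(accountStatzSubject("ACME"))
	fmt.Println(accountPingStatzSubject)
	fmt.Println(serverHealthzSubject("SERVER_ID"))
	fmt.Println(serverPingHealthzSubject)
}
```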
Ivan Kozlovic
5ed212ce04 Rework startupComplete gate from PR #2360
The "InProcess" change makes readyForConnections() possibly return
much faster than it used to, which could cause tests to fail.

Restore the original behavior, but in case of DontListen option
wait on the startupComplete gate.

Also fixed some missing checks for leafnode connections.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-06-28 17:36:39 -06:00
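The idea of a startup gate can be illustrated with a channel that is closed once startup has finished. This is a minimal sketch, not the server's code: the `server` type, `markStartupComplete`, and `waitForStartup` are hypothetical names, and only the notion of waiting on a `startupComplete` gate comes from the commit messages.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// server is a hypothetical stand-in holding only the startup gate.
type server struct {
	startupComplete chan struct{}
	once            sync.Once
}

func newServer() *server {
	return &server{startupComplete: make(chan struct{})}
}

// markStartupComplete opens the gate. sync.Once makes repeated calls safe,
// since closing a channel twice would panic.
func (s *server) markStartupComplete() {
	s.once.Do(func() { close(s.startupComplete) })
}

// waitForStartup blocks until the gate is open or the timeout elapses.
// It reports whether startup completed in time.
func (s *server) waitForStartup(d time.Duration) bool {
	select {
	case <-s.startupComplete:
		return true
	case <-time.After(d):
		return false
	}
}

func main() {
	s := newServer()
	go func() {
		time.Sleep(20 * time.Millisecond) // simulated startup work
		s.markStartupComplete()
	}()
	fmt.Println("ready:", s.waitForStartup(time.Second))
}
```

A closed channel releases every waiter at once and stays open for late callers, which is why it suits a one-shot gate better than a condition variable.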
Neil Alexander
558293e096 Fix the lock 2022-06-28 18:05:57 +01:00
Neil Alexander
cedf08a1b7 Commit properly 2022-06-28 17:03:42 +01:00
Neil Alexander
ff696f00d8 Remainder of time after waiting for startupComplete, add waitgroup done call after createClient 2022-06-28 17:03:42 +01:00
Neil Alexander
5b04c49df9 Re-add startupComplete channel, make readyForConnections wait for it 2022-06-28 17:03:42 +01:00
Neil Alexander
9190feb05f Review comments @kozlovic 2022-06-28 17:03:40 +01:00
Neil Alexander
90d7e007c0 Update comments (re. review) 2022-06-28 17:02:47 +01:00
Neil Alexander
e9abc5801e Add InProcessConn, DontListen 2022-06-28 17:02:47 +01:00
Derek Collison
92cd7821de Convert server mutex to RW.
Signed-off-by: Derek Collison <derek@nats.io>
2022-06-27 16:05:03 -07:00
Derek Collison
cc197771ec Allow compile and staticcheck to pass.
Signed-off-by: Derek Collison <derek@nats.io>
2022-06-24 09:17:12 -07:00
Ivan Kozlovic
4bf81420e2 [FIXED] Fast routed JetStream API requests were dropped
If a JS API request is received from a non client connection, it
was processed in its own go routine. To reduce the number of
such go routines, we were limiting the number of outstanding routines
to 4096. However, in some situations, it was possible to issue
many requests at the same time that would then cause those requests
to be dropped.

(An example was an MQTT benchmark tool that would create 5000
sessions, each with one QoS1 R1 consumer (with the use of consumer_replicas=1).
On abrupt exit of the tool, the consumers and their sessions needed
to be deleted. This would cause fast incoming delete consumer requests,
which would cause the original code to drop some of them.)

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-05-23 11:15:55 -06:00
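The failure mode described above, a hard cap on outstanding go routines that drops anything arriving during a burst, can be reproduced with a small counter-based limiter. This is a sketch of the dropping behavior only, under assumed names (`limiter`, `tryGo`); it does not show the server's code or the fix.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// limiter caps the number of concurrently running go routines.
// The int64 counter is the first field so it stays 64-bit aligned
// on 32-bit platforms.
type limiter struct {
	inflight int64
	max      int64
}

// tryGo runs f in its own go routine unless the cap is reached,
// in which case the request is dropped and false is returned.
func (l *limiter) tryGo(f func()) bool {
	if atomic.AddInt64(&l.inflight, 1) > l.max {
		atomic.AddInt64(&l.inflight, -1)
		return false
	}
	go func() {
		defer atomic.AddInt64(&l.inflight, -1)
		f()
	}()
	return true
}

func main() {
	l := &limiter{max: 2} // the commit message mentions a cap of 4096
	release := make(chan struct{})
	var wg sync.WaitGroup
	accepted, dropped := 0, 0
	// A burst of 5 requests arrives before any of them finishes.
	for i := 0; i < 5; i++ {
		wg.Add(1)
		if l.tryGo(func() { defer wg.Done(); <-release }) {
			accepted++
		} else {
			wg.Done()
			dropped++
		}
	}
	close(release)
	wg.Wait()
	fmt.Printf("accepted=%d dropped=%d\n", accepted, dropped)
}
```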
Ivan Kozlovic
3cdbba16cb Revert "[added] support for jwt operator option DisallowBearerToken" 2022-05-04 11:11:25 -06:00
Matthias Hanel
c9217bad33 review comments
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-29 20:00:37 -04:00
Matthias Hanel
bd2883122e [added] support for jwt operator option DisallowBearerToken
I modified an existing data structure that held a similar attribute already.
Instead this data structure references the claim.

change 3 out of 3. Fixes #3084
corresponds to:
https://github.com/nats-io/jwt/pull/177
https://github.com/nats-io/nsc/pull/495

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-29 14:18:11 -04:00
Matthias Hanel
d520a27c36 [fixed] step down timing, consumer stream seqno, clear redelivery (#3079)
Step down timing for consumers or streams.
Signals loss of leadership and sleeps before stepping down.
This makes it less likely that messages are being processed during step
down.

When becoming leader, consumer stream seqno got reset,
even though the consumer existed already.

Proper cleanup of redelivery data structures and timer

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-27 03:32:08 -04:00
Ivan Kozlovic
50c3986863 [FIXED] JetStream stream catchup issues
- A stream could become leader when it should not, causing
messages to be lost.
- A catchup could stall because the server sending data
could bail out of the runCatchup routine but still send
the EOF signal.
- Deadlock with monitoring of Jsz

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-12 16:05:12 -06:00
Matthias Hanel
13e5ab10bd fix js nex interest check where leaf node masked gw subj propagation (#3016)
basically a gw subject propagation issue could be hidden behind a leaf
node.
also change error text when this was the case

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-11 14:04:09 -04:00
Ivan Kozlovic
366d217f44 Some changes based on review
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-01 17:55:33 -06:00
Ivan Kozlovic
19783a9f11 [CHANGED] Rate limit similar warnings
Some warnings, especially when dealing with JS limits that were
printed on a per-message basis, are now limited to ~1 per second
if the content of the warning is already found in a map.

This also applies to "client" warnings, but the client portion of the
warning is not taken into account, which helps reduce logging
for similar content coming from different clients.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-01 15:24:03 -06:00
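A map keyed by warning content is enough to sketch this kind of rate limiting. The `warnLimiter` type and its methods are hypothetical, and the sketch makes no claim about how the server stores or expires its entries.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// warnLimiter suppresses a warning if the same content was already
// emitted within the configured interval.
type warnLimiter struct {
	mu       sync.Mutex
	last     map[string]time.Time
	interval time.Duration
}

func newWarnLimiter(interval time.Duration) *warnLimiter {
	return &warnLimiter{last: make(map[string]time.Time), interval: interval}
}

// allow reports whether msg should be logged now and records the time if so.
// Entries are never evicted in this sketch, so the map grows with the number
// of distinct messages.
func (w *warnLimiter) allow(msg string) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	now := time.Now()
	if prev, ok := w.last[msg]; ok && now.Sub(prev) < w.interval {
		return false
	}
	w.last[msg] = now
	return true
}

func main() {
	w := newWarnLimiter(time.Second)
	for i := 0; i < 3; i++ {
		if w.allow("JetStream limit exceeded") {
			fmt.Println("warning logged")
		} else {
			fmt.Println("warning suppressed")
		}
	}
}
```

Keying on the message content alone, without any client-specific prefix, is what lets identical warnings from different clients collapse into one entry.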
Ivan Kozlovic
7bb7309f4c [FIXED] Monitoring: verify_and_map in tls{} config would break monitoring
This was introduced in v2.6.6. In order to solve a config reload
issue, we used tls.Config.GetConfigForClient which allowed the
TLS configuration to be "refreshed" with the latest. However, in
this case, the tls.Config.ClientAuth was not reset to tls.NoClientCert
which we need for monitoring port.

Resolves #2980

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-30 18:50:52 -06:00
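The interaction between `tls.Config.GetConfigForClient` and `ClientAuth` can be sketched as follows. The `tlsHolder` type and helper names are hypothetical; only `GetConfigForClient`, `ClientAuth`, and `tls.NoClientCert` are taken from the commit message. The callback hands out a clone of the latest configuration with client certificate verification switched off, which is what a monitoring port needs.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"sync"
)

// tlsHolder keeps the most recent TLS configuration, e.g. after a reload.
type tlsHolder struct {
	mu  sync.RWMutex
	cfg *tls.Config
}

func newHolder(cfg *tls.Config) *tlsHolder { return &tlsHolder{cfg: cfg} }

// current returns the configuration used for regular client connections.
func (h *tlsHolder) current() *tls.Config {
	h.mu.RLock()
	defer h.mu.RUnlock()
	return h.cfg
}

// monitoringConfig returns a config whose callback always serves a copy of
// the latest settings, with ClientAuth reset to tls.NoClientCert.
func (h *tlsHolder) monitoringConfig() *tls.Config {
	return &tls.Config{
		GetConfigForClient: func(*tls.ClientHelloInfo) (*tls.Config, error) {
			cfg := h.current().Clone() // never mutate the shared config
			cfg.ClientAuth = tls.NoClientCert
			return cfg, nil
		},
	}
}

// clientFacingConfig stands in for a tls{} block that verifies client
// certificates; real certificates are omitted from this sketch.
func clientFacingConfig() *tls.Config {
	return &tls.Config{ClientAuth: tls.RequireAndVerifyClientCert}
}

// requiresClientCert reports whether cfg asks clients for a certificate.
func requiresClientCert(cfg *tls.Config) bool {
	return cfg.ClientAuth != tls.NoClientCert
}

func main() {
	h := newHolder(clientFacingConfig())
	cfg, err := h.monitoringConfig().GetConfigForClient(nil)
	if err != nil {
		panic(err)
	}
	fmt.Println("client port requires cert:", requiresClientCert(h.current()))
	fmt.Println("monitoring requires cert:", requiresClientCert(cfg))
}
```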
Matthias Hanel
0c5f3688a7 [ADDED] Tiered limits and fix limit issues on updates (#2945)
* Adding tiered limits and fix limit issues on updates

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-28 20:47:54 -04:00
Ivan Kozlovic
91bdcc30cc [FIXED] Server version check
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-25 12:11:55 -06:00
Derek Collison
edcddfae58 Make at least work
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 19:12:31 -07:00
Derek Collison
1d38a73bcb Fix for version comparison
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 18:39:28 -07:00
Ivan Kozlovic
a23b1b73ef Merge pull request #2931 from nats-io/ipq_changes
Changes to IPQueues
2022-03-17 19:13:02 -06:00
Ivan Kozlovic
c3da392832 Changes to IPQueues
Removed the warnings, instead have a sync.Map where they are
registered/unregistered and can be inspected with an undocumented
monitor page.
Added the notion of "in progress" which is the number of messages
that have been pop()'ed. When recycle() is invoked this count
goes down.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-17 17:53:06 -06:00
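The "in progress" accounting described above can be sketched with a small generic queue. This is an illustration only: the `ipQueue` type below and its method signatures are assumptions and not the server's implementation; only the names pop() and recycle() and the meaning of the counter come from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// ipQueue is a simplified queue that tracks how many popped elements
// have not yet been recycled.
type ipQueue[T any] struct {
	mu         sync.Mutex
	elts       []T
	inProgress int
}

// push appends one element to the queue.
func (q *ipQueue[T]) push(v T) {
	q.mu.Lock()
	q.elts = append(q.elts, v)
	q.mu.Unlock()
}

// pop removes and returns all queued elements and counts them as in progress.
func (q *ipQueue[T]) pop() []T {
	q.mu.Lock()
	defer q.mu.Unlock()
	batch := q.elts
	q.elts = nil
	q.inProgress += len(batch)
	return batch
}

// recycle signals that a popped batch has been processed,
// lowering the in-progress count accordingly.
func (q *ipQueue[T]) recycle(batch []T) {
	q.mu.Lock()
	q.inProgress -= len(batch)
	q.mu.Unlock()
}

// pending returns the number of queued and in-progress elements.
func (q *ipQueue[T]) pending() (queued, inProgress int) {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.elts), q.inProgress
}

func main() {
	q := &ipQueue[string]{}
	q.push("a")
	q.push("b")
	batch := q.pop()
	fmt.Println(q.pending())
	q.recycle(batch)
	fmt.Println(q.pending())
}
```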
Derek Collison
fa098f1af0 Show version on main monitoring page with link to source
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-17 11:04:11 -07:00
Ivan Kozlovic
63c750e295 [CHANGED] Gateway: Detect duplicate names between clusters
Gateway connection will be closed and error reported if a remote
has a name that is a duplicate of the local cluster.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-15 15:00:13 -06:00
Matthias Hanel
9a2da9ed8c Adding denies $KV.>/$OBJ.> along leaf connections on differing domain (#2916)
* Adding denies $KV.>/$OBJ.> along leaf connections on differing domain

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-09 13:17:59 -05:00
Ivan Kozlovic
3538aea34e Merge pull request #2915 from nats-io/fix_atomic_unaligned
[FIXED] Panic when monitoring enabled on non 64bit architectures
2022-03-09 10:30:50 -07:00
Ivan Kozlovic
dde235a92e [FIXED] Panic when monitoring enabled on non 64bit architectures
This is due to an unaligned 64-bit atomic operation. Move the
field at top of structure with 64-bit aligned preceding fields.

Resolves #2011

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-09 09:29:29 -07:00
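The alignment rule behind this fix is documented in Go's `sync/atomic` package: on 32-bit platforms the caller must ensure 64-bit alignment of words accessed atomically, and the first word of an allocated struct can be relied upon to be aligned. A minimal sketch with a hypothetical `stats` struct:

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

// stats keeps its atomically accessed 64-bit counter as the first field,
// which guarantees 64-bit alignment even on 32-bit architectures.
type stats struct {
	requests int64 // accessed only through sync/atomic
	enabled  bool
	name     string
}

// inc atomically increments the counter and returns the new value.
func (s *stats) inc() int64 {
	return atomic.AddInt64(&s.requests, 1)
}

// requestsOffset returns the byte offset of the counter within the struct.
func requestsOffset() uintptr {
	var s stats
	return unsafe.Offsetof(s.requests)
}

func main() {
	s := &stats{name: "monitor"}
	s.inc()
	fmt.Println("count:", s.inc(), "offset:", requestsOffset())
}
```

Had `enabled` been declared before `requests`, the counter could land at a 4-byte offset on a 32-bit build, and the atomic operation would panic at run time.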
Ivan Kozlovic
85b3f8a7fd Gateways: data race when setting first ping timer
This was introduced when fixing #2881. The call to setFirstPingTimer
needed to be done under the client's lock.

Moved setFirstPingTimer from a server receiver to a client receiver.
The only reason it was a server receiver is because we need the
server options, but c.srv is always set when invoking this function,
so we will get the server from c.srv in that function now.

Related to #2881

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-04 19:55:07 -07:00
Derek Collison
ca1132a01d Allow stream placement by tags.
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-15 17:07:32 -08:00
Derek Collison
a0a2e32185 Remove dynamic account behaviors.
We used these in tests and for experimenting with sandboxed environments like the demo network.

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-04 13:32:18 -08:00
Derek Collison
6486cd8fc8 Added in /healthz endpoint for health and liveness probes in environments like k8s.
Currently this code returns a 200 and { "status": "ok" } iff all configured ports are open
and if JetStream is configured and we have contact with the metaleader and the cluster and all streams are up to date.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-24 19:30:10 -08:00
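A probe for this endpoint only needs to check for status 200 and the `{ "status": "ok" }` body stated above. The sketch below runs against a local fake server from `net/http/httptest` rather than a real NATS server; the function names are hypothetical, and the shape of an unhealthy response is not specified in the commit message, so anything other than 200 with "ok" is simply treated as not healthy.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// healthy reports whether a /healthz response signals a healthy server:
// HTTP 200 and a JSON body whose "status" field equals "ok".
func healthy(statusCode int, body []byte) bool {
	if statusCode != http.StatusOK {
		return false
	}
	var resp struct {
		Status string `json:"status"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return false
	}
	return resp.Status == "ok"
}

// probe fetches baseURL + "/healthz" and evaluates the response.
func probe(baseURL string) (bool, error) {
	resp, err := http.Get(baseURL + "/healthz")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, err
	}
	return healthy(resp.StatusCode, body), nil
}

func main() {
	// Fake monitoring endpoint standing in for a real server.
	fake := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{ "status": "ok" }`)
	}))
	defer fake.Close()

	ok, err := probe(fake.URL)
	fmt.Println("healthy:", ok, "err:", err)
}
```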
Derek Collison
89435d50b2 Merge pull request #2813 from nats-io/pull_consumer_2
Updates to Pull Consumers
2022-01-24 13:52:16 -08:00
Derek Collison
d962500827 Track reply subjects for pending pull requests across clustered consumers.
We will only send if all peers in our group are >= 2.7.1 and we will check for updates.
When a consumer follower takes over it will notify all pending requests that those requests are invalid now.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-21 16:31:59 -08:00
Jaime Piña
c82b583d7a Fix race condition in HTTP monitoring shutdown (#2805) 2022-01-20 15:29:17 -08:00
Ivan Kozlovic
29c40c874c Adding logger for IPQueue
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-01-13 13:14:00 -07:00
Ivan Kozlovic
92e8997506 Replaced system event queue
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-01-13 13:03:33 -07:00
Ivan Kozlovic
c9c603b7a0 Merge pull request #2573 from julius-welink/implement-rate-limiting
[ADDED] TLS connection rate limiter
2022-01-13 10:35:19 -07:00
Waldemar Quevedo
ce4e4b5d47 Start monitoring before JetStream
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2022-01-12 21:38:22 -08:00
Julius Žaromskis
a47e5e045c [ADDED] TLS connection rate limiter 2022-01-11 16:57:19 +02:00
Derek Collison
52da55c8c6 Implement overflow placement for JetStream streams.
This allows stream placement to overflow to adjacent clusters.
We also do more balanced placement based on resources (store or mem). We can continue to expand this as well.
We also introduce an account requirement that stream configs contain a MaxBytes value.

We now track account limits and server limits more distinctly, and do not reserve server resources based on account limits themselves.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-06 19:33:08 -08:00
Matthias Hanel
42ae3f5325 Merge pull request #2757 from nats-io/sys-acc-err
Fixed system account issue where the wrong struct got updated
2021-12-23 12:13:25 -05:00
Matthias Hanel
fe5f47f43b Fixed system account issue where the wrong struct got updated
s.fetchAccount should not be used for the system account,
as it creates a new struct

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-12-22 16:18:00 -05:00
Derek Collison
b43cb5b352 Added in ability to have account limits configured in server config.
Signed-off-by: Derek Collison <derek@nats.io>
2021-12-21 18:31:07 -08:00
Matthias Hanel
3e8b66286d Js leaf deny (#2693)
Along a leaf node connection, unless the system account is shared AND the JetStream domain name is identical, the default JetStream traffic (without a domain set) will be denied.

As a consequence, all clients that want to access a domain other than the one of the server they are connected to must specify a domain name.
Affected by this change are setups where a leaf node had no local JetStream OR the server the leaf node connected to had no local JetStream.
One of the two accounts that are connected via a leaf node remote must have no JetStream enabled.
The side that does not have JetStream enabled will lose JetStream access, and its clients must set `nats.Domain` manually.

For workarounds on how to restore the old behavior, look at:
https://github.com/nats-io/nats-server/pull/2693#issuecomment-996212582

New config values added:
`default_js_domain` is a mapping from account to domain, settable when JetStream is not enabled in an account.
`extension_hint` are hints for a non-clustered server to start in clustered mode (and be usable to extend)
`js_domain` is a way to set the JetStream domain to use for mqtt.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-12-16 16:53:20 -05:00
Ben Werthmann
d7eec1edd4 [CHANGED] Profiler: Start profile_port earlier
Enables use of pprof to investigate server startup.

Co-authored-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Ben Werthmann <ben@synadia.com>
2021-12-01 16:56:57 -05:00