Commit Graph

513 Commits

Author SHA1 Message Date
Ivan Kozlovic
4bf81420e2 [FIXED] Fast routed JetStream API requests were dropped
If a JS API request is received from a non client connection, it
was processed in its own go routine. To reduce the number of
such go routine, we were limiting the number of outstanding routines
to 4096. However, in some situations, it was possible to issue
many requests at the same time that would then cause those requests
to be dropped.

(an example was an MQTT benchmark tool that would create 5000
sessions, each with one QoS1 R1 consumer (with the use of consumer_replicas=1).
On abrupt exit of the tool, the consumers and their sessions needed
to be deleted. Since would cause fast incoming delete consumer requests
which would cause the original code to drop some of them)

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-05-23 11:15:55 -06:00
Ivan Kozlovic
3cdbba16cb Revert "[added] support for jwt operator option DisallowBearerToken" 2022-05-04 11:11:25 -06:00
Matthias Hanel
c9217bad33 review comments
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-29 20:00:37 -04:00
Matthias Hanel
bd2883122e [added] support for jwt operator option DisallowBearerToken
I modified an existing data structure that held a similar attribute already.
Instead this data structure references the claim.

change 3 out of 3. Fixes #3084
corresponds to:
https://github.com/nats-io/jwt/pull/177
https://github.com/nats-io/nsc/pull/495

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-29 14:18:11 -04:00
Matthias Hanel
d520a27c36 [fixed] step down timing, consumer stream seqno, clear redelivery (#3079)
Step down timing for consumers or streams.
Signals loss of leadership and sleeps before stepping down.
This makes it less likely that messages are being processed during step
down.

When becoming leader, consumer stream seqno got reset,
even though the consumer existed already.

Proper cleanup of redelivery data structures and timer

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-27 03:32:08 -04:00
Ivan Kozlovic
50c3986863 [FIXED] JetStream stream catchup issues
- A stream could become leader when it should not, causing
messages to be lost.
- A catchup could stall because the server sending data
could bail out of the runCatchup routine but still send
the EOF signal.
- Deadlock with monitoring of Jsz

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-12 16:05:12 -06:00
Matthias Hanel
13e5ab10bd fix js nex interest check where leaf node masked gw subj propagation (#3016)
basically a gw subject propagation issue could be hidden behind a leaf
node.
also change error text when this was the case

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-11 14:04:09 -04:00
Ivan Kozlovic
366d217f44 Some changes based on review
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-01 17:55:33 -06:00
Ivan Kozlovic
19783a9f11 [CHANGED] Rate limit similar warnings
Some warnings, especially when dealing with JS limits that were
printed on a per-message basis, are now limited to ~1 per second
if the content of the warning is already found in a map.

This is also for "client" warnings, but the client porting of the
warning is not taken into account so that helps with reducing logging
for similar content, but coming from different clients.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-01 15:24:03 -06:00
Ivan Kozlovic
7bb7309f4c [FIXED] Monitoring: verify_and_map in tls{} config would break monitoring
This was introduced in v2.6.6. In order to solve a config reload
issue, we used tls.Config.GetConfigForClient which allowed the
TLS configuration to be "refreshed" with the latest. However, in
this case, the tls.Config.ClientAuth was not reset to tls.NoClientCert
which we need for monitoring port.

Resolves #2980

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-30 18:50:52 -06:00
Matthias Hanel
0c5f3688a7 [ADDED] Tiered limits and fix limit issues on updates (#2945)
* Adding tiered limits and fix limit issues on updates

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-28 20:47:54 -04:00
Ivan Kozlovic
91bdcc30cc [FIXED] Server version check
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-25 12:11:55 -06:00
Derek Collison
edcddfae58 Make at least work
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 19:12:31 -07:00
Derek Collison
1d38a73bcb Fix for version comparison
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 18:39:28 -07:00
Ivan Kozlovic
a23b1b73ef Merge pull request #2931 from nats-io/ipq_changes
Changes to IPQueues
2022-03-17 19:13:02 -06:00
Ivan Kozlovic
c3da392832 Changes to IPQueues
Removed the warnings, instead have a sync.Map where they are
registered/unregistered and can be inspected with an undocumented
monitor page.
Added the notion of "in progress" which is the number of messages
that have beend pop()'ed. When recycle() is invoked this count
goes down.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-17 17:53:06 -06:00
Derek Collison
fa098f1af0 Show version on main monitoring page with link to source
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-17 11:04:11 -07:00
Ivan Kozlovic
63c750e295 [CHANGED] Gateway: Detect duplicate names between clusters
Gateway connection will be closed and error reported if a remote
has a name that is a duplicate of the local cluster.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-15 15:00:13 -06:00
Matthias Hanel
9a2da9ed8c Adding denies $KV.>/$OBJ.> along leaf connections on differing domain (#2916)
* Adding denies $KV.>/$OBJ.> along leaf connections on differing domain

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-09 13:17:59 -05:00
Ivan Kozlovic
3538aea34e Merge pull request #2915 from nats-io/fix_atomic_unaligned
[FIXED] Panic when monitoring enabled on non 64bit architectures
2022-03-09 10:30:50 -07:00
Ivan Kozlovic
dde235a92e [FIXED] Panic when monitoring enabled on non 64bit architectures
This is due to an unaligned 64-bit atomic operation. Move the
field at top of structure with 64-bit aligned preceding fields.

Resolves #2011

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-09 09:29:29 -07:00
Ivan Kozlovic
85b3f8a7fd Gateways: data race when setting first ping timer
This was introduced when fixing #2881. The call to setFirstPingTimer
needed to be done under the client's lock.

Moved setFirstPingTimer from a server receiver to a client receiver.
The only reason it was a server receiver is because we need the
server options, but c.srv is always set when invoking this function,
so we will get the server from c.srv in that function now.

Related to #2881

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-04 19:55:07 -07:00
Derek Collison
ca1132a01d Allow stream placement by tags.
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-15 17:07:32 -08:00
Derek Collison
a0a2e32185 Remove dynamic account behaviors.
We used these in tests and for experimenting with sandboxed environments like the demo network.

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-04 13:32:18 -08:00
Derek Collison
6486cd8fc8 Added in /healthz endpoint for health and liveness probes in environments like k8s.
Currently this code returns a 200 and { "status": "ok" } iff all configured ports are open
and if JetStream is configured and we have contact with the metaleader and the cluster and all streams are up to date.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-24 19:30:10 -08:00
Derek Collison
89435d50b2 Merge pull request #2813 from nats-io/pull_consumer_2
Updates to Pull Consumers
2022-01-24 13:52:16 -08:00
Derek Collison
d962500827 Track reply subjects for pending pull requests across clustered consumers.
We will only send if all peers in our group are >= 2.7.1 and we will check for updates.
When a consumer follower takes over it will notify all pending requests that those requests are invalid now.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-21 16:31:59 -08:00
Jaime Piña
c82b583d7a Fix race condition in HTTP monitoring shutdown (#2805) 2022-01-20 15:29:17 -08:00
Ivan Kozlovic
29c40c874c Adding logger for IPQueue
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-01-13 13:14:00 -07:00
Ivan Kozlovic
92e8997506 Replaced system event queue
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-01-13 13:03:33 -07:00
Ivan Kozlovic
c9c603b7a0 Merge pull request #2573 from julius-welink/implement-rate-limiting
[ADDED] TLS connection rate limiter
2022-01-13 10:35:19 -07:00
Waldemar Quevedo
ce4e4b5d47 Start monitoring before JetStream
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2022-01-12 21:38:22 -08:00
Julius Žaromskis
a47e5e045c [ADDED] TLS connection rate limiter 2022-01-11 16:57:19 +02:00
Derek Collison
52da55c8c6 Implement overflow placement for JetStream streams.
This allows stream placement to overflow to adjacent clusters.
We also do more balanced placement based on resources (store or mem). We can continue to expand this as well.
We also introduce an account requirement that stream configs contain a MaxBytes value.

We now track account limits and server limits more distinctly, and do not reserver server resources based on account limits themselves.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-06 19:33:08 -08:00
Matthias Hanel
42ae3f5325 Merge pull request #2757 from nats-io/sys-acc-err
Fixed system account issue where the wrong struct got updated
2021-12-23 12:13:25 -05:00
Matthias Hanel
fe5f47f43b Fixed system account issue where the wrong struct got updated
s.fetchAccount should not be used for the system account,
 as it creates a new struct

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-12-22 16:18:00 -05:00
Derek Collison
b43cb5b352 Added in ability to have account limits configured in server config.
Signed-off-by: Derek Collison <derek@nats.io>
2021-12-21 18:31:07 -08:00
Matthias Hanel
3e8b66286d Js leaf deny (#2693)
Along a leaf node connection, unless the system account is shared AND the JetStream domain name is identical, the default JetStream traffic (without a domain set) will be denied.

As a consequence, all clients that wants to access a domain that is not the one in the server they are connected to, a domain name must be specified.
Affected from this change are setups where: a leaf node had no local JetStream OR the server the leaf node connected to had no local JetStream. 
One of the two accounts that are connected via a leaf node remote, must have no JetStream enabled.
The side that does not have JetStream enabled, will loose JetStream access and it's clients must set `nats.Domain` manually.

For workarounds on how to restore the old behavior, look at:
https://github.com/nats-io/nats-server/pull/2693#issuecomment-996212582

New config values added:
`default_js_domain` is a mapping from account to domain, settable when JetStream is not enabled in an account.
`extension_hint` are hints for non clustered server to start in clustered mode (and be usable to extend)
`js_domain` is a way to set the JetStream domain to use for mqtt.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-12-16 16:53:20 -05:00
Ben Werthmann
d7eec1edd4 [CHANGED] Profiler: Start profile_port earlier
Enables use of pprof to investigate server startup.

Co-authored-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Ben Werthmann <ben@synadia.com>
2021-12-01 16:56:57 -05:00
Ivan Kozlovic
40c0f03153 [FIXED] Monitoring: tls configuration not updated on reload
When creating the http server, we need to provide a TLS configuration.
After a config reload, the new TLS config would not be reflected.

We had the same issue with Websocket and was fixed with the use
of tls.Config.GetConfigForClient API, which makes the TLS handshake
to ask for a TLS config. That fix for websocket was simply not applied
to the HTTPs monitoring case.

I have also fixed some flappers due to the use of localhost instead
of 127.0.0.1 (connections possibly would resolve to some IPv6 address
that the server would not accept, etc..)

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-11-30 10:18:46 -07:00
Derek Collison
476c264560 If we are in a simple mixed-mode setup with just global account and system account and clustered, allow pass through.
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-26 09:41:01 -07:00
Matthias Hanel
7f1833ab1a Adding counter for number of failed logons due to pinned accounts
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-08-23 18:56:56 -04:00
Derek Collison
84ff537e66 Make sure jwt claim update does not wipe system imports
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-17 10:03:30 -07:00
Derek Collison
10167b1bcf Added in ability for normal accounts to access scoped connz info.
Added in client kind and sub type for clients.
Added in ability to filter connections based on matching subject interest.

Signed-off-by: Derek Collison <derek@nats.io>
2021-08-13 10:19:12 -07:00
Ivan Kozlovic
2939814999 Report the limit using MAX_PAYLOAD_MAX_SIZE instead of hardcoded in warning
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-08-04 16:54:31 -06:00
Ivan Kozlovic
d1365b7412 Add warning if max_payload > 8MB
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-08-04 16:52:17 -06:00
Ivan Kozlovic
4865dc7ae3 [CHANGED] Check that max_payload is not greater than max_pending
This is related to PR #2407. Since the 64MB pending size is actually
configurable, we should fail only if max_payload is greater than
the configured max_pending. This is done in validateOptions() which
covers both config file and direct options in embedded cases.
The check in opts.go is reverted to max int32 since at this point
we don't know if/what max_pending will be, so we simply check
that it is not more than a int32.

For the next minor release, we could have another change that
imposes a lower limit to max_payload (regardless if max_pending
is higher).

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-08-04 16:33:21 -06:00
Derek Collison
154bc40718 Fix for reentrant read lock on a stream that once anyone else wanted the write lock would deadlock.
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-03 15:46:40 -07:00
Derek Collison
925a6fe6b2 Fix for #2388. Leafnodes with no JS can seamlessly access a HUB with JS.
This is the reverse of the early work to have LNs extend a non-JS cluster.
Also have mixed mode tests as well.

Signed-off-by: Derek Collison <derek@nats.io>
2021-08-01 14:57:47 -07:00
Matthias Hanel
a40ea298e5 [fixed] jetstream unique server name requirement across domains (#2378)
* [fixed] jetstream unique server name requirement across domains

including domain in server info
adding check for cluster name in duplicate leaf node connection check

This does not address non unique domains in the same domain, say within
super cluster.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-07-27 18:42:19 -04:00