5253 Commits

Author SHA1 Message Date
Derek Collison
1a37f0963a Avoid race condition
Signed-off-by: Derek Collison <derek@nats.io>
2021-12-29 08:26:10 -08:00
Derek Collison
bd495f3b18 Merge pull request #2764 from nats-io/issue-2742
Large number of ephemeral consumers could exhaust Go runtime's max OS threads.
2021-12-29 07:42:30 -08:00
Derek Collison
c5fbb63614 JetStream ephemeral consumers could create a situation where the server would exhaust the OS thread limit - default 10k.
Under certain situations a large number of consumers that are racing to update state or delete their stores during a delete
would start taking up OS threads due to blocking disk IO. When this happened and there were a bunch of goroutines becoming
runnable, the Go runtime would create extra OS threads to fill in the runnable pool and would exhaust the max thread setting.

This code places a channel as a simple semaphore to limit the number of disk IO blocking OS threads.

Signed-off-by: Derek Collison <derek@nats.io>
2021-12-29 07:05:34 -08:00
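
The "channel as a simple semaphore" idea from c5fbb63614 can be sketched as below. This is a minimal illustration, not the server's code: the names `ioSem` and `runLimited` and the limit of 4 are assumptions made for the example. A buffered channel's capacity caps how many goroutines may be inside the guarded (disk IO) section at once, so the Go runtime has less reason to spawn extra OS threads for blocked calls.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// ioSem is a hypothetical counting semaphore built from a buffered channel.
// Its capacity is the maximum number of concurrent holders.
type ioSem chan struct{}

func (s ioSem) acquire() { s <- struct{}{} } // blocks while the channel is full
func (s ioSem) release() { <-s }

// runLimited starts n goroutines but lets at most limit of them run task at
// the same time. It returns the highest concurrency that was observed.
func runLimited(n, limit int, task func()) int64 {
	sem := make(ioSem, limit)
	var wg sync.WaitGroup
	var cur, peak int64
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem.acquire()
			defer sem.release()
			c := atomic.AddInt64(&cur, 1)
			// Record the peak number of goroutines inside the guarded section.
			for {
				p := atomic.LoadInt64(&peak)
				if c <= p || atomic.CompareAndSwapInt64(&peak, p, c) {
					break
				}
			}
			task() // stands in for blocking disk IO
			atomic.AddInt64(&cur, -1)
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	peak := runLimited(100, 4, func() { time.Sleep(time.Millisecond) })
	fmt.Println("peak concurrent tasks:", peak) // never more than 4
}
```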
Derek Collison
36d34492cd Bump to 2.7.0-beta
Signed-off-by: Derek Collison <derek@nats.io>
2021-12-27 12:04:39 -08:00
Derek Collison
34555aecca Merge pull request #2761 from nats-io/fs_partial_err
Fix for when consumer would stop working due to errPartialCache returned from fileStore.
2021-12-27 12:03:31 -08:00
Derek Collison
b7c61cd0bf Stabilize filestore to eliminate sporadic errPartialCache errors under certain situations. Related to #2732
The filestore would release a msgBlock lock while trying to load a cache block if it thought it needed to flush pending data.
With async false, this should be very rare but was possible after careful inspection.

I constructed an artificial test with sleeps throughout the filestore code to reproduce.
It involved having 2 Go routines that were through and waiting on the last msg block, and another one that was writing.
After the write, but before we flushed after releasing the lock we would also artificially sleep.
This would lead to the second read seeing the cache load was already in progress and return no error.
If the load was for a sequence before the current write sequence, and async was false, the cache fseq would be higher than what was requested.
This would cause the errPartialCache to be returned.

Once returned to the consumer level in loopAndGather, it would exit that Go routine and the consumer would cease to function.

This change removed the unlock of a msgBlock to perform a flush, ensuring that two cacheLoads would not yield the errPartialCache.

I also updated the consumer in the case this does happen in the future to not exit the loopAndGather Go routine.

Signed-off-by: Derek Collison <derek@nats.io>
2021-12-27 09:54:02 -08:00
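
The consumer-side hardening described in b7c61cd0bf (keep the delivery loop alive if errPartialCache shows up) might look roughly like the sketch below. Everything here is hypothetical: `deliverAll`, the `fetch` callback and the bounded retry are illustration only, and the error value is a stand-in for the filestore's. The real loopAndGather is considerably more involved and may not retry at all.

```go
package main

import (
	"errors"
	"fmt"
)

// errPartialCache is a stand-in for the filestore error of the same name.
var errPartialCache = errors.New("partial cache")

// deliverAll is a hypothetical delivery loop over sequences 1..last.
// A transient errPartialCache is retried (up to maxRetries times per
// sequence) instead of ending the loop; any other error still stops it.
func deliverAll(fetch func(seq uint64) (string, error), last uint64, maxRetries int) ([]string, error) {
	var out []string
	for seq := uint64(1); seq <= last; seq++ {
		var msg string
		var err error
		for attempt := 0; attempt <= maxRetries; attempt++ {
			msg, err = fetch(seq)
			if !errors.Is(err, errPartialCache) {
				break
			}
		}
		if err != nil {
			return out, err
		}
		out = append(out, msg)
	}
	return out, nil
}

func main() {
	failOnce := true
	fetch := func(seq uint64) (string, error) {
		if seq == 2 && failOnce {
			failOnce = false
			return "", errPartialCache // simulated one-off cache miss
		}
		return fmt.Sprintf("msg-%d", seq), nil
	}
	msgs, err := deliverAll(fetch, 3, 1)
	fmt.Println(msgs, err)
}
```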
Matthias Hanel
42ae3f5325 Merge pull request #2757 from nats-io/sys-acc-err
Fixed system account issue where the wrong struct got updated
2021-12-23 12:13:25 -05:00
Derek Collison
89b94ae650 Improved selectMsgBlock with lots of messages. Also have fetchMsg return hint about clearing cache.
Signed-off-by: Derek Collison <derek@nats.io>
2021-12-22 17:45:12 -08:00
Matthias Hanel
fe5f47f43b Fixed system account issue where the wrong struct got updated
s.fetchAccount should not be used for the system account,
 as it creates a new struct

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-12-22 16:18:00 -05:00
Derek Collison
91042b399f Merge pull request #2755 from nats-io/acc_config_limits
Added in ability to have account limits configured in server config.
2021-12-21 19:50:38 -08:00
Derek Collison
3619241326 Merge pull request #2756 from nats-io/xacc_interest
Added test to show cross account interest for push consumers works.
2021-12-21 19:49:25 -08:00
Derek Collison
c4198d603c Added test to show cross account interest for push consumers works
Signed-off-by: Derek Collison <derek@nats.io>
2021-12-21 19:30:35 -08:00
Derek Collison
b43cb5b352 Added in ability to have account limits configured in server config.
Signed-off-by: Derek Collison <derek@nats.io>
2021-12-21 18:31:07 -08:00
Ivan Kozlovic
8203a083d6 Merge pull request #2753 from nats-io/fs_remove_last
[FIXED] JetStream: stream first/last sequence possibly reset
2021-12-21 09:15:55 -07:00
Ivan Kozlovic
7c3c9ef1ee [FIXED] JetStream: stream first/last sequence possibly reset
A low-level Filestore issue would cause a new block to be created
when the last block was empty, but the index for the new block
would not be forced to be written on disk.

The observed issue could be that with a stream with a WorkQueue
retention policy, its first/last sequence values could be reset
after a pull subscriber would have consumed all messages and
the server was restarted without a clean shutdown.
This would cause the pull subscriber to "stall" until enough
new messages are sent to reach a stream sequence that catches
up with the consumer's view of the stream first sequence prior
to the restart.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-20 19:08:08 -07:00
Derek Collison
0968a94265 Merge pull request #2752 from nats-io/memstore_lid
Memstore tracking of interior deletes improved.
2021-12-20 18:06:06 -08:00
Derek Collison
af4d7dbe52 Memory store tracked interior deletes for stream state, but under KV semantics this could be very large.
Actually faster to not track at all and generate on the fly. Saves lots of memory too.

When we update the stream state to include runs, etc will update this as well.

Signed-off-by: Derek Collison <derek@nats.io>
2021-12-20 17:37:16 -08:00
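
Generating interior deletes on the fly, as af4d7dbe52 describes, can be illustrated with a simplified store. The `memStore` type and its fields are assumptions for this sketch, not the server's memstore: the point is only that the gaps between the first and last sequence can be derived from the message map when needed, so no separate delete list has to be kept in memory.

```go
package main

import "fmt"

// memStore is a hypothetical, heavily simplified in-memory store:
// messages keyed by sequence, plus the first and last sequence.
type memStore struct {
	first, last uint64
	msgs        map[uint64][]byte
}

// interiorDeletes computes, on demand, the sequences in [first, last]
// that no longer hold a message. Nothing is tracked between calls.
func (ms *memStore) interiorDeletes() []uint64 {
	if len(ms.msgs) == 0 || ms.last < ms.first {
		return nil
	}
	var deleted []uint64
	for seq := ms.first; seq <= ms.last; seq++ {
		if _, ok := ms.msgs[seq]; !ok {
			deleted = append(deleted, seq)
		}
	}
	return deleted
}

func main() {
	ms := &memStore{
		first: 1,
		last:  6,
		msgs:  map[uint64][]byte{1: nil, 3: nil, 6: nil},
	}
	fmt.Println(ms.interiorDeletes()) // prints [2 4 5]
}
```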
Derek Collison
490acf5f29 Full stream state with interior delete details not needed by recipient of snapshot
Signed-off-by: Derek Collison <derek@nats.io>
2021-12-20 17:37:07 -08:00
Ivan Kozlovic
4d71296cd7 Merge pull request #2751 from nats-io/js_raft_apply_commit_error
[FIXED] JetStream: stream blocked recovering snapshot
2021-12-20 12:06:37 -07:00
Ivan Kozlovic
299b6b53eb [FIXED] JetStream: stream blocked recovering snapshot
If a node fell behind, when catching up with the rest of the
cluster, it is possible that a lot of append entries accumulate
and the server would print warnings such as:
```
[WRN] RAFT [jZ6RvVRH - S-R3F-CQw2ImK6] <some number> append entries pending
```
It would then continuously print the following warning:
```
AppendEntry failed to be placed on internal channel
```
When that happens, this node would always be shown as running the
same number of operations behind (using `nats s info`) if there are
no new messages added to the stream, or an increasing number of
operations if there is still activity.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-20 11:41:34 -07:00
Ivan Kozlovic
6810e48874 Merge pull request #2750 from nats-io/fix_2745
[FIXED] JetStream: interest across gateways
2021-12-20 09:45:50 -07:00
Ivan Kozlovic
3053039ff3 [FIXED] JetStream: interest across gateways
If the interest existed prior to the initial creation of the
consumer, the gateway "watcher" would not be started, which means
that interest moving across the super-cluster after that would
not be detected.

The watcher runs every second, and it is not clear whether this is
costly, so we may want to take a different approach of having a separate
interest change channel that would be specific to gateways. But this
means adding a new sublist where the interest would be registered,
and that sublist would need to be updated when processing GW RSub
and RUnsub?

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-16 17:20:16 -07:00
Matthias Hanel
3e8b66286d Js leaf deny (#2693)
Along a leaf node connection, unless the system account is shared AND the JetStream domain name is identical, the default JetStream traffic (without a domain set) will be denied.

As a consequence, all clients that want to access a domain other than the one in the server they are connected to must specify a domain name.
Affected by this change are setups where a leaf node had no local JetStream OR the server the leaf node connected to had no local JetStream.
One of the two accounts connected via a leaf node remote must have no JetStream enabled.
The side that does not have JetStream enabled will lose JetStream access and its clients must set `nats.Domain` manually.

For workarounds on how to restore the old behavior, look at:
https://github.com/nats-io/nats-server/pull/2693#issuecomment-996212582

New config values added:
`default_js_domain` is a mapping from account to domain, settable when JetStream is not enabled in an account.
`extension_hint` is a hint for a non-clustered server to start in clustered mode (and be usable to extend).
`js_domain` is a way to set the JetStream domain to use for mqtt.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-12-16 16:53:20 -05:00
Ivan Kozlovic
575bb4eee0 Merge pull request #2747 from nats-io/fix_tls_map_check
[FIXED] TLS map: panic for existing user but conn type not allowed
2021-12-15 12:15:32 -07:00
Ivan Kozlovic
8e5dff3e30 [FIXED] TLS map: panic for existing user but conn type not allowed
For TLS configuration with `verify_and_map` set to true, if a
connection connects and has a certificate with ID that matches
a user, but that user's `allowed_connection_types` is specified
and does not have the connection type in its list, then the
server will panic.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-15 10:09:18 -07:00
Ivan Kozlovic
ad4e14ffb0 Merge pull request #2744 from nats-io/fix_no_auth_check
[FIXED] Check for no_auth_user
2021-12-14 16:13:23 -07:00
Ivan Kozlovic
69525f3083 [FIXED] Check for no_auth_user
Check for a no_auth_user should be done only when no authentication
at all is provided by the user. This was not the case. For instance,
if the user provided a token, the server would still check for
no_auth_user if users are defined. It was not really an issue since
the admin cannot configure users AND token, but it is better for
the application to fail if providing a token that is actually not
being used. If the admin configures a no_auth_user, this should
be used only when no authentication is provided.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-14 10:00:54 -07:00
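
The rule stated in 69525f3083 (fall back to no_auth_user only when the client sent no authentication at all) reduces to a small predicate. The sketch below is a simplification under stated assumptions: `connectOpts` only models user, password and token, while a real server has further credential types, and `useNoAuthUser` is a hypothetical helper name.

```go
package main

import "fmt"

// connectOpts is a hypothetical subset of what a client sends on connect.
type connectOpts struct {
	User, Pass, Token string
}

// hasAuth reports whether the client supplied any credentials at all.
func (c connectOpts) hasAuth() bool {
	return c.User != "" || c.Pass != "" || c.Token != ""
}

// useNoAuthUser reports whether the configured fallback user should be
// applied: only when one is configured AND the client sent no credentials.
func useNoAuthUser(c connectOpts, noAuthUser string) bool {
	return noAuthUser != "" && !c.hasAuth()
}

func main() {
	// No credentials: the fallback user applies.
	fmt.Println(useNoAuthUser(connectOpts{}, "guest"))
	// A token was supplied: the connection must pass or fail on that token.
	fmt.Println(useNoAuthUser(connectOpts{Token: "s3cr3t"}, "guest"))
}
```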
R.I.Pienaar
de3e7cab50 Merge pull request #2743 from ripienaar/jsz_panic
fixes a nil panic on jsz
2021-12-13 18:11:41 +01:00
R.I.Pienaar
1146e66f30 fixes a nil panic on jsz
It appears that getPublicConsumers() is called, which
produces a list of consumers, and that between the time
the list is made and Info() is called the ephemeral
was removed.

Signed-off-by: R.I.Pienaar <rip@devco.net>
2021-12-13 11:51:33 +01:00
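
The race behind 1146e66f30 is a list-then-use pattern: an item can vanish between building the list and querying each entry. A minimal sketch of the defensive check follows. The `consumer`, `consumerInfo` and `collectInfos` names are made up for illustration and do not mirror the server's types.

```go
package main

import "fmt"

// consumerInfo and consumer are hypothetical stand-ins for the real types.
type consumerInfo struct{ Name string }

type consumer struct {
	name    string
	removed bool // set when an ephemeral consumer has gone away
}

// info returns nil once the consumer has been removed, mimicking an
// ephemeral that disappears after the list was built.
func (c *consumer) info() *consumerInfo {
	if c == nil || c.removed {
		return nil
	}
	return &consumerInfo{Name: c.name}
}

// collectInfos gathers info for each listed consumer and skips any that
// returned nil instead of dereferencing it.
func collectInfos(list []*consumer) []*consumerInfo {
	var out []*consumerInfo
	for _, c := range list {
		if ci := c.info(); ci != nil {
			out = append(out, ci)
		}
	}
	return out
}

func main() {
	list := []*consumer{{name: "durable"}, {name: "ephemeral", removed: true}}
	for _, ci := range collectInfos(list) {
		fmt.Println(ci.Name)
	}
}
```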
Matthias Hanel
628251d11d Merge pull request #2739 from nats-io/list-missing
Adding missing entry to stream/consumer list
2021-12-09 14:35:02 -05:00
Matthias Hanel
0ba2544c5a removed suffix from "missing" list
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-12-08 19:33:35 -05:00
Ivan Kozlovic
be066b7a21 Merge pull request #2738 from nats-io/fix_2720
[FIXED] JetStream: panic "could not decode consumer snapshot"
2021-12-08 17:16:51 -07:00
Matthias Hanel
dd735f4a18 Adding missing entry to stream/consumer list
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-12-08 18:44:40 -05:00
Ivan Kozlovic
1b8878138a [FIXED] JetStream: panic "could not decode consumer snapshot"
Resolves #2720

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-08 12:22:03 -07:00
Ivan Kozlovic
f55ee21941 Merge pull request #2735 from nats-io/mqtt_ws
[ADDED] MQTT: Support for Websocket
2021-12-07 09:09:27 -07:00
Ivan Kozlovic
2e07c3f614 [ADDED] MQTT: Support for Websocket
Clients will need to connect to the Websocket port and have `/mqtt`
as the URL path.

Resolves #2433

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-06 16:13:13 -07:00
Ivan Kozlovic
67c345270c Merge pull request #2734 from nats-io/fix_2514
[IMPROVED] Websocket: added client IP from X-Forwarded-For header
2021-12-06 16:11:17 -07:00
Ivan Kozlovic
833f823efb [IMPROVED] Websocket: added client IP from X-Forwarded-For header
This is for cases when there is a proxy (Nginx, HAProxy, etc..)
between the client and the NATS Server. If this header is present,
the first IP is the one of the originating client and will be
used as the host/IP in server's representation of the client host.

Resolves #2514

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-06 15:00:22 -07:00
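
Taking the originating client from X-Forwarded-For, as 833f823efb describes, comes down to reading the first entry of the comma-separated header value. The helper below is a sketch with an assumed name (`firstForwardedIP`), not the server's implementation. It also validates the entry with `net.ParseIP`, which is a choice made for this example.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"strings"
)

// firstForwardedIP returns the first address in an X-Forwarded-For header
// value, or "" if the value is empty or the first entry is not a valid IP.
func firstForwardedIP(headerValue string) string {
	first := strings.TrimSpace(strings.Split(headerValue, ",")[0])
	if net.ParseIP(first) == nil {
		return ""
	}
	return first
}

func main() {
	h := http.Header{}
	// A proxy chain: the originating client first, then each proxy.
	h.Set("X-Forwarded-For", "203.0.113.7, 10.0.0.1")
	fmt.Println(firstForwardedIP(h.Get("X-Forwarded-For"))) // prints 203.0.113.7
}
```

Because the header is set by whoever sits in front of the server, it is only as trustworthy as that proxy.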
Ivan Kozlovic
893b415434 Merge pull request #2727 from nats-io/update_crypto_rev
[UPDATED] golang.org/x/crypto dependency
2021-12-03 12:49:37 -07:00
Ivan Kozlovic
cbfa93e4a8 [UPDATED] golang.org/x/crypto dependency
They just released some fix that is not affecting the NATS Server
but could cause some security vulnerability reports.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-02 13:48:21 -07:00
Ivan Kozlovic
878afadcf0 Merge pull request #2726 from nats-io/release_2_6_6
Release v2.6.6
2021-12-02 12:14:53 -07:00
Ivan Kozlovic
f16e2f8f2a Release v2.6.6
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-02 11:44:56 -07:00
Matthias Hanel
cd3838aa14 Merge pull request #2725 from nats-io/consumer-list-err
Set incomplete error when cluster list fails
2021-12-02 13:20:41 -05:00
Matthias Hanel
aa25a2f600 Set incomplete error when cluster list fails
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-12-02 12:31:31 -05:00
Ivan Kozlovic
97c8214bbc Merge pull request #2724 from Shoothzj/update-docs-link
Update documentation link
2021-12-02 09:54:47 -07:00
Shoothzj
c585f808ab Update documentation link
2021-12-02 16:16:43 +08:00
Ivan Kozlovic
4464ba9bf3 Merge pull request #2723 from nats-io/early-profile-start
[CHANGED] Profiler: Start profile_port earlier
2021-12-01 15:22:05 -07:00
Ben Werthmann
d7eec1edd4 [CHANGED] Profiler: Start profile_port earlier
Enables use of pprof to investigate server startup.

Co-authored-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Ben Werthmann <ben@synadia.com>
2021-12-01 16:56:57 -05:00
Ivan Kozlovic
adf974d681 Merge pull request #2721 from nats-io/bad_stream_subjects
There were situations where invalid subjects could be assigned to streams.
2021-12-01 14:19:39 -07:00
Matthias Hanel
0dc695762d Merge pull request #2722 from nats-io/stream-list-to
Aligning timeout to be shorter than 5 second cli default
2021-12-01 16:04:59 -05:00