3428 Commits

Author SHA1 Message Date
Derek Collison
34555aecca Merge pull request #2761 from nats-io/fs_partial_err
Fix for when consumer would stop working due to errPartialCache returned from fileStore.
2021-12-27 12:03:31 -08:00
Derek Collison
b7c61cd0bf Stabilize filestore to eliminate sporadic errPartialCache errors under certain situations. Related to #2732
The filestore would release a msgBlock lock while trying to load a cache block if it thought it needed to flush pending data.
With async false, this should be very rare but was possible after careful inspection.

I constructed an artificial test with sleeps throughout the filestore code to reproduce.
It involved having 2 Go routines that were through and waiting on the last msg block, and another one that was writing.
After the write, but before we flushed after releasing the lock we would also artificially sleep.
This would lead to the second read seeing the cache load was already in progress and return no error.
If the load was for a sequence before the current write sequence, and async was false, the cache fseq would be higher than what was requested.
This would cause the errPartialCache to be returned.

Once returned to the consumer level in loopAndGather, it would exit that Go routine and the consumer would cease to function.

This change removed the unlock of a msgBlock to perform a flush, ensuring that two cacheLoads would not yield the errPartialCache.

I also updated the consumer in the case this does happen in the future to not exit the loopAndGather Go routine.

Signed-off-by: Derek Collison <derek@nats.io>
2021-12-27 09:54:02 -08:00
Matthias Hanel
42ae3f5325 Merge pull request #2757 from nats-io/sys-acc-err
Fixed system account issue where the wrong struct got updated
2021-12-23 12:13:25 -05:00
Derek Collison
89b94ae650 Improved selectMsgBlock with lots of messages. Also have fetchMsg return hint about clearing cache.
Signed-off-by: Derek Collison <derek@nats.io>
2021-12-22 17:45:12 -08:00
Matthias Hanel
fe5f47f43b Fixed system account issue where the wrong struct got updated
s.fetchAccount should not be used for the system account,
 as it creates a new struct

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-12-22 16:18:00 -05:00
Derek Collison
91042b399f Merge pull request #2755 from nats-io/acc_config_limits
Added in ability to have account limits configured in server config.
2021-12-21 19:50:38 -08:00
Derek Collison
c4198d603c Added test to show cross account interest for push consumers works
Signed-off-by: Derek Collison <derek@nats.io>
2021-12-21 19:30:35 -08:00
Derek Collison
b43cb5b352 Added in ability to have account limits configured in server config.
Signed-off-by: Derek Collison <derek@nats.io>
2021-12-21 18:31:07 -08:00
Ivan Kozlovic
7c3c9ef1ee [FIXED] JetStream: stream first/last sequence possibly reset
A low-level Filestore issue would cause a new block to be created
when the last block was empty, but the index for the new block
would not be forced to be written on disk.

The observed issue could be that with a stream with a WorkQueue
retention policy, its first/last sequence values could be reset
after a pull subscriber would have consumed all messages and
the server was restarted without a clean shutdown.
This would cause the pull subscriber to "stall" until enough
new messages are sent to reach a stream sequence that catches
up with the consumer's view of the stream first sequence prior
to the restart.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-20 19:08:08 -07:00
Derek Collison
af4d7dbe52 Memory store tracked interior deletes for stream state, but under KV semantics this could be very large.
Actually faster to not track at all and generate on the fly. Saves lots of memory too.

When we update the stream state to include runs, etc will update this as well.

Signed-off-by: Derek Collison <derek@nats.io>
2021-12-20 17:37:16 -08:00
Derek Collison
490acf5f29 Full stream state with interior delete details not needed by recipient of snapshot
Signed-off-by: Derek Collison <derek@nats.io>
2021-12-20 17:37:07 -08:00
Ivan Kozlovic
299b6b53eb [FIXED] JetStream: stream blocked recovering snapshot
If a node fell behind, when catching up with the rest of the
cluster, it is possible that a lot of append entries accumulate
and the server would print warnings such as:
```
[WRN] RAFT [jZ6RvVRH - S-R3F-CQw2ImK6] <some number> append entries pending
```
It would then continuously print the following warning:
```
AppendEntry failed to be placed on internal channel
```
When that happens, this node would always be shown as running the
same number of operations behind (using `nats s info`) if there are
no new messages added to the stream, or an increasing number of
operations if there is still activity.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-20 11:41:34 -07:00
Ivan Kozlovic
3053039ff3 [FIXED] JetStream: interest across gateways
If the interest existed prior to the initial creation of the
consumer, the gateway "watcher" would not be started, which means
that interest moving across the super-cluster after that would
not be detected.

The watcher runs every second and not sure if this is costly or
not, so we may want to go a different approach of having a separate
interest change channel that would be specific to gateways. But this
means adding a new sublist where the interest would be registered
and that sublist would need to be updated when processing GW RSub
and RUnsub?

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-16 17:20:16 -07:00
Matthias Hanel
3e8b66286d Js leaf deny (#2693)
Along a leaf node connection, unless the system account is shared AND the JetStream domain name is identical, the default JetStream traffic (without a domain set) will be denied.

As a consequence, all clients that want to access a domain other than the one of the server they are connected to must specify a domain name.
Affected by this change are setups where a leaf node had no local JetStream OR the server the leaf node connected to had no local JetStream.
One of the two accounts that are connected via a leaf node remote must have no JetStream enabled.
The side that does not have JetStream enabled will lose JetStream access, and its clients must set `nats.Domain` manually.

For workarounds on how to restore the old behavior, look at:
https://github.com/nats-io/nats-server/pull/2693#issuecomment-996212582

New config values added:
`default_js_domain` is a mapping from account to domain, settable when JetStream is not enabled in an account.
`extension_hint` are hints for a non-clustered server to start in clustered mode (and be usable to extend)
`js_domain` is a way to set the JetStream domain to use for mqtt.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-12-16 16:53:20 -05:00
Ivan Kozlovic
8e5dff3e30 [FIXED] TLS map: panic for existing user but conn type not allowed
For TLS configuration with `verify_and_map` set to true, if a
connection connects and has a certificate with ID that matches
a user, but that user's `allowed_connection_types` is specified
and does not have the connection type in its list, then the
server will panic.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-15 10:09:18 -07:00
Ivan Kozlovic
69525f3083 [FIXED] Check for no_auth_user
Check for a no_auth_user should be done only when no authentication
at all is provided by the user. This was not the case. For instance,
if the user provided a token, the server would still check for
no_auth_user if users are defined. It was not really an issue since
the admin cannot configure users AND token, but it is better for
the application to fail if providing a token that is actually not
being used. If the admin configures a no_auth_user, this should
be used only when no authentication is provided.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-14 10:00:54 -07:00
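The rule described in 69525f3083 (fall back to `no_auth_user` only when the client sent no authentication at all) can be sketched roughly as below. This is a minimal illustration, not the server's code: the `connectOpts` type, its fields, and both function names are hypothetical.

```go
package main

import "fmt"

// connectOpts is a hypothetical stand-in for the credentials a client may
// send when connecting. It is not the server's actual type.
type connectOpts struct {
	Username string
	Password string
	Token    string
	Nkey     string
	JWT      string
}

// hasCredentials reports whether the client supplied any form of authentication.
func (c connectOpts) hasCredentials() bool {
	return c.Username != "" || c.Password != "" || c.Token != "" ||
		c.Nkey != "" || c.JWT != ""
}

// useNoAuthUser reports whether the configured no_auth_user should be applied:
// only when one is configured AND the client provided nothing at all.
func useNoAuthUser(c connectOpts, noAuthUser string) bool {
	return noAuthUser != "" && !c.hasCredentials()
}

func main() {
	// No credentials: the fallback user applies.
	fmt.Println(useNoAuthUser(connectOpts{}, "guest"))
	// A token was sent: the fallback must not mask it, so auth fails normally.
	fmt.Println(useNoAuthUser(connectOpts{Token: "s3cr3t"}, "guest"))
}
```

With this shape, a client that sends an unused token is rejected instead of being silently mapped to the fallback user.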
R.I.Pienaar
1146e66f30 fixes a nil panic on jsz
Appears what happens is that the getPublicConsumers()
is called which produces a list of consumers and that
between the time the list is made and the Info() is
called the ephemeral was removed.

Signed-off-by: R.I.Pienaar <rip@devco.net>
2021-12-13 11:51:33 +01:00
Matthias Hanel
628251d11d Merge pull request #2739 from nats-io/list-missing
Adding missing entry to stream/consumer list
2021-12-09 14:35:02 -05:00
Matthias Hanel
0ba2544c5a removed suffix from "missing" list
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-12-08 19:33:35 -05:00
Matthias Hanel
dd735f4a18 Adding missing entry to stream/consumer list
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-12-08 18:44:40 -05:00
Ivan Kozlovic
1b8878138a [FIXED] JetStream: panic "could not decode consumer snapshot"
Resolves #2720

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-08 12:22:03 -07:00
Ivan Kozlovic
2e07c3f614 [ADDED] MQTT: Support for Websocket
Clients will need to connect to the Websocket port and have `/mqtt`
as the URL path.

Resolves #2433

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-06 16:13:13 -07:00
Ivan Kozlovic
833f823efb [IMPROVED] Websocket: added client IP from X-Forwarded-For header
This is for cases when there is a proxy (Nginx, HAProxy, etc..)
between the client and the NATS Server. If this header is present,
the first IP is the one of the originating client and will be
used as the host/IP in server's representation of the client host.

Resolves #2514

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-06 15:00:22 -07:00
Ivan Kozlovic
f16e2f8f2a Release v2.6.6
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-02 11:44:56 -07:00
Matthias Hanel
cd3838aa14 Merge pull request #2725 from nats-io/consumer-list-err
Set incomplete error when cluster list fails
2021-12-02 13:20:41 -05:00
Matthias Hanel
aa25a2f600 Set incomplete error when cluster list fails
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-12-02 12:31:31 -05:00
Ben Werthmann
d7eec1edd4 [CHANGED] Profiler: Start profile_port earlier
Enables use of pprof to investigate server startup.

Co-authored-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Ben Werthmann <ben@synadia.com>
2021-12-01 16:56:57 -05:00
Ivan Kozlovic
adf974d681 Merge pull request #2721 from nats-io/bad_stream_subjects
There were situations where invalid subjects could be assigned to streams.
2021-12-01 14:19:39 -07:00
Matthias Hanel
0dc695762d Merge pull request #2722 from nats-io/stream-list-to
Aligning timeout to be shorter than 5 second cli default
2021-12-01 16:04:59 -05:00
Derek Collison
6f5263e12d Add in a warning when detecting subjects on a mirror
Signed-off-by: Derek Collison <derek@nats.io>
2021-12-01 14:00:31 -07:00
Derek Collison
ca12a11be3 There were situations where invalid subjects could be assigned to streams.
This will patch them on the fly during recovery. Specifically subjects with leading or trailing spaces and mirror streams with any subjects at all.

Signed-off-by: Derek Collison <derek@nats.io>
2021-12-01 14:00:23 -07:00
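One plausible reading of the recovery-time patching described in ca12a11be3 is sketched below. The `streamConfig` type and `patchSubjects` function are hypothetical, and the specific repairs (trimming whitespace, dropping all subjects from a mirror) are assumptions about what "patch" means here.

```go
package main

import (
	"fmt"
	"strings"
)

// streamConfig is a hypothetical, trimmed-down stand-in for a stream's
// configuration. It is not the server's actual type.
type streamConfig struct {
	Subjects []string
	IsMirror bool
}

// patchSubjects repairs cfg in place and reports whether anything changed.
// Assumed repairs: mirrors carry no subjects at all, and subjects lose any
// leading or trailing whitespace.
func patchSubjects(cfg *streamConfig) bool {
	if cfg.IsMirror && len(cfg.Subjects) > 0 {
		cfg.Subjects = nil
		return true
	}
	changed := false
	for i, s := range cfg.Subjects {
		if t := strings.TrimSpace(s); t != s {
			cfg.Subjects[i] = t
			changed = true
		}
	}
	return changed
}

func main() {
	cfg := &streamConfig{Subjects: []string{" orders.* ", "events.>"}}
	fmt.Printf("%v %q\n", patchSubjects(cfg), cfg.Subjects)
}
```

Returning a "changed" flag lets the caller decide whether the repaired configuration needs to be written back.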
Matthias Hanel
39a710780e Aligning timeout to be shorter than 5 second cli default
Also align stream and consumer timeouts

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-12-01 15:44:06 -05:00
Ivan Kozlovic
1cf8b40304 Merge pull request #2719 from nats-io/js_mem_corruption
[FIXED] Corrupted headers receiving from consumer with meta-only
2021-12-01 13:42:47 -07:00
Ivan Kozlovic
9f30bf00e0 [FIXED] Corrupted headers receiving from consumer with meta-only
When a consumer is configured with "meta-only" option, and the
stream was backed by a memory store, a memory corruption could
happen causing the application to receive corrupted headers.

Also replaced most uses of `append(a[:0:0], a...)` for making
copies. That idiom was based on this wiki:
https://github.com/go101/go101/wiki/How-to-efficiently-clone-a-slice%3F

But since Go 1.15, it is actually faster to call make+copy instead.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-01 10:50:15 -07:00
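For reference, the two slice-cloning idioms mentioned in 9f30bf00e0 look roughly like this. The helper names are hypothetical, and the sketch only shows that both produce independent copies; it makes no performance claim of its own.

```go
package main

import "fmt"

// cloneAppend copies b using the append idiom mentioned in the commit message.
// The zero-capacity reslice forces append to allocate a fresh backing array.
func cloneAppend(b []byte) []byte {
	return append(b[:0:0], b...)
}

// cloneMakeCopy copies b with make+copy, the form the commit moves to.
func cloneMakeCopy(b []byte) []byte {
	if b == nil {
		return nil
	}
	c := make([]byte, len(b))
	copy(c, b)
	return c
}

func main() {
	hdr := []byte("NATS/1.0\r\n\r\n")
	a, c := cloneAppend(hdr), cloneMakeCopy(hdr)

	// Mutate the original; neither clone shares its backing array.
	hdr[0] = 'X'
	fmt.Println(string(a[:4]), string(c[:4])) // prints NATS NATS
}
```

Handing out a slice that still aliases a shared buffer is the kind of mistake that can surface as corrupted headers, which is why an explicit copy matters here.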
R.I.Pienaar
c025d25899 prevent stream update to add subjects to mirrors
Signed-off-by: R.I.Pienaar <rip@devco.net>
2021-12-01 18:12:49 +01:00
R.I.Pienaar
cf097bfab4 Merge pull request #2717 from ripienaar/stream_valid_subjects
Stream valid subjects
2021-12-01 17:43:41 +01:00
R.I.Pienaar
4f1bfa969f ensure streams have only valid interest subjects
Signed-off-by: R.I.Pienaar <rip@devco.net>
2021-12-01 17:03:28 +01:00
Matthias Hanel
581dfb27d0 hitting an account limit left an outgoing leaf node conn in bad state (#2715)
since no error was traced or the connection closed, subscriptions were
not forwarded

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-11-30 17:48:07 -05:00
Ivan Kozlovic
40c0f03153 [FIXED] Monitoring: tls configuration not updated on reload
When creating the http server, we need to provide a TLS configuration.
After a config reload, the new TLS config would not be reflected.

We had the same issue with Websocket, and it was fixed with the use
of the tls.Config.GetConfigForClient API, which makes the TLS handshake
ask for a TLS config. That fix for websocket was simply not applied
to the HTTPS monitoring case.

I have also fixed some flappers due to the use of localhost instead
of 127.0.0.1 (connections possibly would resolve to some IPv6 address
that the server would not accept, etc..)

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-11-30 10:18:46 -07:00
Derek Collison
529095be40 [FIXED #2708] Removing a source depending on timing could cause a server panic.
Signed-off-by: Derek Collison <derek@nats.io>
2021-11-29 12:48:08 -08:00
Derek Collison
e65f3d4a30 [FIXED #2706] - Only utilize full state with deleted details when really needed. Otherwise fast state will suffice.
Signed-off-by: Derek Collison <derek@nats.io>
2021-11-29 10:50:28 -08:00
Ivan Kozlovic
ede8124fb2 [FIXED/CHANGED] Add leafnode websocket connection type
This was missing since WEBSOCKET allowed connection type is really
used for client connections.
If one wants to limit a configured user to leafnode connections,
including if the connection is over websocket, but does not
want an application to connect over websocket using this user,
this would have been impossible to configure.

The JWT library has been updated to add LEAFNODE_WS and MQTT_WS for
future work.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-11-22 10:32:58 -07:00
Ivan Kozlovic
6fc4c76ed1 Release v2.6.5
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-11-19 10:39:58 -07:00
Derek Collison
72ad68fada [FIXED] Bug in memstore where setting max msgs per subject to 1 would not work properly.
Signed-off-by: Derek Collison <derek@nats.io>
2021-11-19 09:13:43 -08:00
Derek Collison
60c48356e9 Bump version
Signed-off-by: Derek Collison <derek@nats.io>
2021-11-18 15:10:59 -08:00
Derek Collison
98757253f9 Recreate client in case shutdown server was the one we were connected to
Signed-off-by: Derek Collison <derek@nats.io>
2021-11-18 14:50:22 -08:00
Derek Collison
6e78bf315e Use local variable that we got under the lock
Signed-off-by: Derek Collison <derek@nats.io>
2021-11-18 14:43:33 -08:00
Derek Collison
63c4c23cae Needed to undo since we already recorded
Signed-off-by: Derek Collison <derek@nats.io>
2021-11-18 14:09:52 -08:00
Derek Collison
49c5c873ca Better handling of stream mismatch scenarios.
1. When a snapshot did not yield actionable data, we were not setting new last sequence if we have to readjust based on snapshot. This could lead to spinning on stream reset for followers.
2. When a stream has lots of failures by design, like KV abstraction, if we cleared the clfs state we would endlessly spin trying to reset the stream.

Signed-off-by: Derek Collison <derek@nats.io>
2021-11-18 14:00:41 -08:00
Derek Collison
7e615a1de9 Handle skip msgs better, do not update mb stats, clear erased bit
Signed-off-by: Derek Collison <derek@nats.io>
2021-11-18 13:59:29 -08:00