Commit Graph

82 Commits

Author SHA1 Message Date
Phil Pennock
fc6df0fbbc Redact URLs before logging or returning in error (#2643)
* Redact URLs before logging or returning in error

This does not affect strings which failed to parse, and in such a scenario
there's a mix of "which evil" to accept; we can't sanely find what should be
redacted in those cases, so we leave them alone for debugging.

The JWT library returns some errors for Operator URLs, but it rejects URLs
which contain userinfo, so there can't be passwords in those and they're safe.

Fixes #2597

* Test the URL redaction auxiliary functions

* End-to-end tests for secrets in debug/trace

Create internal/testhelper and move DummyLogger there, so it can be used from
the test/ sub-dir too.

Let DummyLogger optionally accumulate all log messages, not just retain the
last-seen message.

Confirm no passwords logged by TestLeafNodeBasicAuthFailover.

Change TestNoPasswordsFromConnectTrace to check all trace messages, not just the
most recent.

Validate existing trace redaction in TestRouteToSelf.

* Test for password in solicited route reconnect debug
2021-10-27 12:44:59 -04:00
David Simner
31814aa169 Update test 2021-10-19 12:39:08 +02:00
Ivan Kozlovic
a025ce7472 Set defaultServerOptions port to -1 for random
Updated some tests based on this change but also missing defer
connection close or server shutdown.

Fixed how the OCSP run go routine would shutdown, which would
never complete because grWG was not decremented by this go routine
prior to invoking s.Shutdown()

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-09-02 14:22:56 -06:00
Derek Collison
476c264560 If we are in a simple mixed-mode setup with just global account and system account and clustered, allow pass through.
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-26 09:41:01 -07:00
Matthias Hanel
41a253dabb fix daisy chained leaf node subject propagation issue. (#2468)
fixes #2448 

initLeafNodeSmapAndSendSubs did not pick up enough local subscriptions.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-08-25 18:10:09 -04:00
Derek Collison
5523a337d3 Fix for test
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-25 11:45:07 -07:00
David Laban
1d5cc21c3c make error actionable when adding operator+leafnodes
There are many examples in the documentation for one half of this configuration or the other,
but none which configure a leafnode remote on an operator-authenticated cluster.

The error "operator mode requires account nkeys in remotes." is not very clear or actionable.
2021-08-18 18:07:53 +01:00
Ivan Kozlovic
d7a124baaf [FIXED] LeafNode with "wss://.." url was not always initiating TLS
If the remote did not have any TLS configuration, the URL scheme
"wss://" was not used as the indicating that the connection should
be attempted as a TLS connection, causing "invalid websocket connection"
in the server attempting to create the remote leafnode connection.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-08-15 12:39:10 -06:00
Derek Collison
944dd248c4 Fix for tests
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-14 17:39:51 -07:00
Matthias Hanel
d39822e1d0 added check assuring subscription made it to other end of leaf connection
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-08-12 12:57:50 -04:00
Matthias Hanel
db9fd45be2 [fixed] issue where js overwrote leafnode remotes permissions from creds
Fixes #2415. We did a set instead of merge.
changes in `jwt_test.go` are to make the `createUserWithLimit` usable by my new test.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-08-12 12:57:50 -04:00
Matthias Hanel
a40ea298e5 [fixed] jetstream unique server name requirement across domains (#2378)
* [fixed] jetstream unique server name requirement across domains

including domain in server info
adding check for cluster name in duplicate leaf node connection check

This does not address non unique domains in the same domain, say within
super cluster.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-07-27 18:42:19 -04:00
Matthias Hanel
9f6ba90d3e [fixing] leafnode missing retry and service export interest propagation (#2288)
* [fixing] leafnode missing retry and service export interest propagation

A missing account on initial connect attempt caused the leaf node
connection to never be established.

An account service import subscription was not propagated along leaf
node connections.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-06-17 19:10:05 -04:00
Ivan Kozlovic
c45f4f0353 [FIXED] LeafNode config reload failed without any change made
Issuing a configuration reload for a leafnode that has remotes
defined with remotes having more than 1 url could lead to a failure.
This is because we have introduced shuffling of remote urls but
that was done in the server's options object, which then would
cause the DeepEqual when diff'ing options to fail.
We move the suffling to the private list of urls.

The other issue was that the "old" remote option may not have
had a local account and it was not set to "$G", which could make
the DeepEqual fail.

Resolves #2273

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-06-09 16:12:39 -06:00
Derek Collison
50d5875aa3 Fix test
Signed-off-by: Derek Collison <derek@nats.io>
2021-05-06 18:46:32 -06:00
Ivan Kozlovic
f5eb8bef89 Fixed some tests to manually close account resolver
Those tests don't really start the server, so the account resolver's
internal expiration routine would be left running.
Doing an explicit close solves this issue.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-05-06 18:46:32 -06:00
Ivan Kozlovic
e2e3de9977 [FIXED] Message loop with cluster, leaf nodes and queue subs
In a setup with a cluster of servers to which 2 different leaf nodes
attach to, and queue subs are attached to one of the leaf, if the
leaf server is restarted and reconnects to another server in the
cluster, there was a risk for an infinite message loop between
some servers in the "hub" cluster.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-04-28 17:11:51 -06:00
Matthias Hanel
a67704e245 [fixed] crash when using nats-resolver without system account (#2162)
* [fixed] crash when using nats-resolver without system account

Fixes #2160
Will raise an error instead

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-04-26 20:50:56 -04:00
Jaime Piña
4d04f281fc Randomize leafnode route URLs and add option to disable 2021-04-23 14:59:15 -07:00
Ivan Kozlovic
1014041be3 [FIXED] Possible panic due to concurrent access to unlocked map
This could happen when a leafnode has permissions set and another
connection (client, etc..) is about to assign a message to the
leafnode while the leafnode itself is receiving messages and they
both check permissions at the same time.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-04-20 21:18:13 -06:00
Ivan Kozlovic
6e1205b660 Cleanup some tests + GetTLSConnectionState() race fix
Missing defers

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-04-15 11:37:43 -06:00
Ivan Kozlovic
c369a26c03 Fixed leafnode flapper
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-04-12 09:31:33 -06:00
Jaime Piña
27e9628c3a Run gofmt -s to simplify code 2021-04-09 15:18:06 -07:00
Ivan Kozlovic
452685b9b1 [FIXED] LeafNode: set first ping timer after receiving CONNECT
We were setting the ping timer in the accepting server as soon
as the leafnode connection is created, just after sending
the INFO and setting the auth timer.

Sending a PING too soon may cause the solicit side to process
this PING and send a PONG in response, possibly before sending
the CONNECT, which the accepting side would fail as an authentication
error, since first protocol is expected to be a CONNECT.

Since LeafNode always expect a CONNECT, we always set the auth
timer. So now on accept, instead of starting the ping timer just
after sending the INFO, we will delay setting this timer only
after receiving the CONNECT.

The auth timer will take care of a stale connection in the time
it takes to receives the CONNECT.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-04-08 14:36:39 -06:00
Jaime Piña
d929ee1348 Check errors when removing test directories and files
Currently in tests, we have calls to os.Remove and os.RemoveAll where we
don't check the returned error. This hides useful error messages when
tests fail to run, such as "too many open files".

This change checks for more filesystem related errors and calls t.Fatal
if there is an error.
2021-04-07 11:09:47 -07:00
Jaime Piña
e44275b963 Consolidate temporary test files and directories
Currently, temporary test files and directories are written in lots of
different paths within the OS's temp dir. This makes it hard to know
which files are from nats-server and which are unrelated. This in turn
makes it hard to clean up nats-server test files.
2021-04-06 10:42:55 -07:00
Ivan Kozlovic
21a9bfa1d8 [FIXED] Leafnode: incorrect loop detection in multi-cluster setup
If leafnodes from a cluster were to reconnect to a server in
a different cluster, it was possible for that server to send
to the leafnodes some their own subscriptions that could cause
an inproper loop detection error.

There was also a defect that would cause subscriptions over route
for leafnode subscriptions to be registered under the wrong key,
which would lead to those subscriptions not being properly removed
on route disconnect.

Finally, during route disconnect, the leafnodes map was not updated.
This PR fixes that too.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-04-05 16:49:37 -06:00
Matthias Hanel
9f753a2475 [fixed] issue where verify_and_map: true in leaf node config was not used (#2038)
* [fixed] issue where verify_and_map: true in leaf node config was not used

This broke the setup in such a way that any connect relying on this would have failed.
This also fixes an issue where specifying no account did not result in using $G.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-03-26 19:24:01 -04:00
Ivan Kozlovic
b17f38e356 [FIXED] Websocket: do not generate empty frames + LN corruption
- It was possible that when the server was sending frames to a
webbrowser, it would send empty frames. While technically not wrong,
prevent that from happening.
- Not copying enqueued buffers could cause corruption with LN+WS.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-03-26 16:17:46 -06:00
Ivan Kozlovic
eafc6b7a25 [fixed] LeafNode sending message using stream's import subject.
A publish on "a" becomes an LMSG on ">" which
is the stream import's subject. The subscriber on "a" on the other
side did not receive the message.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-02-19 00:11:41 -05:00
Ivan Kozlovic
ac0a1ee8fd Fixed compression http header request/response
The issue was introduced by PR #1858.

Key points:

- Sec-WebSocket-Extensions must contain approved headers, so moving
the "no-masking" private extension to its own header "Nats-No-Masking".

- The format of the permessage-deflate negotiation response became
invalid, I have fixed that.

- For leaf nodes, if `permessage-deflate` extension is not at all
present in the response, then simply disable compression, however
if it is present but there is no server/client no context take over,
then we have to fail the connection.

- A leafnode test was not setting the "NoMasking" option so the
test TestLeafNodeWSNoMaskingRejected was not capturing possible
error if negotiation failed.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-02-01 12:10:37 -07:00
Ivan Kozlovic
9587bf8cd4 Changed option to make masking the default and option to disable it
This will allow a better experience if there is a load balancer
in between and expects websocket frames to be masked.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-29 11:22:22 -07:00
Ivan Kozlovic
2b8c6e0124 Support for Websocket Leafnode connections
Added two options in the remote leaf node configuration

- compress, for websocket only at the moment
- ws_masking, to force remote leafnode connections to mask websocket
frames (default is no masking since it is communication between
server to server)

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-28 13:13:11 -07:00
Ivan Kozlovic
131be1cb33 Make TLS client/server handshake helpers function
This reduces code duplication

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-28 13:13:11 -07:00
Ivan Kozlovic
9716aa8b4c Merge pull request #1846 from nats-io/ln_save_tls_name
[FIXED] LeafNode: save hostname that may be used during TLS handshake
2021-01-26 14:51:11 -07:00
Ivan Kozlovic
af57f55738 Fixing some flappers (leafnode and mqtt)
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-26 14:23:49 -07:00
Ivan Kozlovic
6666f5aa43 [FIXED] LeafNode: save hostname that may be used during TLS handshake
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-26 12:10:57 -07:00
Ivan Kozlovic
0d78bce9cf Fixed some leafnode issues introduced from JS cluster work
Also fixed a flapper.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-15 12:00:34 -07:00
Derek Collison
f0cdf89c61 JetStream Clustering WIP
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-14 01:14:52 -08:00
Ivan Kozlovic
d5f255b98e Merge pull request #1771 from nats-io/gw_ln_tls_config_reload
[FIXED] Config reload for gateways/leaf remote TLS configurations
2020-12-12 10:51:52 -07:00
Ivan Kozlovic
2d2f85267b Add fix for TestLeafNodeLoop and others
Based on timing, it is possible that the first error is about
connection refused as opposed to "Loop detected". So use a dedicated
logger to notify only when expected error is found.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-12-11 18:15:49 -07:00
Ivan Kozlovic
fc1521636c [FIXED] Config reload for gateways/leaf remote TLS configurations
Presence of TLS config in any remote gateway or leafnode would
cause the config reload to fail (because TLS config internal
content may change which fails the DeepEqual check).

This PR excludes the TLS configs in such case to check for
changes in gateways and leafnodes.

Although GW and LN config reload is technically supported, this
PR updates the internal remotes' TLS configuration so that
changes/updates to TLS certificates would take effect after
a configuration reload.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-12-11 16:56:25 -07:00
Ivan Kozlovic
77aead807c Send LS- without origin to route
When cluster origin code was added, a server may send LS+ with
an origin cluster name in the protocol. Parsing code from a ROUTER
connection was adjusted to understand this LS+ protocol.
However, the server was also sending an LS- with origin but the
parsing code was not able to understand that. When the unsub was
for a queue subscription, this would cause the parser to error out
and close the route connection.

This PR sends an LS- without the origin in this case (so that tracing
makes sense in term of LS+/LS- sent to a route). The receiving side
then traces appropriate LS- but processes as a normal RS-.

Resolves #1751

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-11-30 13:31:32 -07:00
Ivan Kozlovic
88475014ef [FIXED] Split LMSG across routes
If a LeafNode message is sent across a route, and the message
does not fit in the buffer, the parser would incorrectly process
the "pub args" as if it was a ROUTED message, not a LEAF message.
This caused clonePubArg() to return an error that would cause
the parser to end with a protocol violation.

Keep track that we are processing an LMSG so that we can pass
that information to clonePubArg() and do proper parsing in split
scenario.

Resolves #1743

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-11-24 14:55:50 -07:00
Ivan Kozlovic
120b031ffd Merge pull request #1739 from nats-io/leaf-warning
[Added] account name checks for leaf nodes in operator mode
2020-11-24 12:35:31 -07:00
Matthias Hanel
a8390b7432 Incorporating comments and moving code
Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-11-23 23:27:44 -05:00
Matthias Hanel
e2e69b6daf [Added] account name checks for leaf nodes in operator mode
Rules out implausible ones.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-11-23 15:38:41 -05:00
Ivan Kozlovic
f155c75da7 [FIXED] LeafNode reject duplicate remote
There was a test to prevent an errorneous loop detection when a
remote would reconnect (due to a stale connection) while the accepting
side did not detect the bad connection yet.

However, this test was racy because the test was done prior to add
the connections to the map.

In the case of a misconfiguration where the remote creates 2 different
remote connections that end-up binding to the same account in the
accepting side, then it was possible that this would not be detected.
And when it was, the remote side would be unaware since the disconnect/
reconnect attempts would not show up if not running in debug mode.

This change makes sure that the detection is no longer racy and returns
an error to the remote so at least the log/console of the remote will
show the "duplicate connection" error messages.

Resolves #1730

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-11-23 13:28:18 -07:00
Ivan Kozlovic
55b0f8d855 [FIXED] LeafNode: duplicate queue messages in complex routing setup
Suppose a cluster of 2 servers, let's call them leaf1 and leaf2.
These servers are routed and have a leaf connection to another
server, let's call it srv1.
They share the same cluster name.

If a queue subscriber runs on srv1 and a queue subscriber on the
same subject/group name runs on leaf1, if a requestor runs on
leaf2, the request should reach only one of the 2 queue subs.

The defect was that sometimes both queue subs would receive the
message.

The added test checks that only one reply is ever received and
that the local "leaf" cluster is preferred.

Resolves #1722

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-11-18 11:23:08 -07:00
Ivan Kozlovic
2605ae71ed [FIXED] Prevent LeafNode loop detection on early reconnect
If the soliciting side detects the disconnect and attempts to
reconnect but the accepting side did not yet close the connection,
a "loop detected" error would be reported and the soliciting server
would not try to reconnect for 30 seconds.

Made a change so that the accepting server checks for existing
leafnode connection for the same server and same account, and if
it is found, close the "old" connection so it is replaced by
the "new" one.

Resolves #1606

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-09-22 16:58:36 -06:00