Commit Graph

141 Commits

Author SHA1 Message Date
Waldemar Quevedo
8b7dfe7d74 monitoring: track slow consumers per connection type
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-08-09 05:57:42 -07:00
Ivan Kozlovic
d6fe9d4c2d [ADDED] Support for route S2 compression
The new field `compression` in the `cluster{}` block allows to
specify which compression mode to use between servers.

It can be simply specified as a boolean or a string for the
simple modes, or as an object for the "s2_auto" mode where
a list of RTT thresholds can be specified.

By default, if no compression field is specified, the server
will use the s2_auto mode with default RTT thresholds of
10ms, 50ms and 100ms for the "uncompressed", "fast", "better"
and "best" modes.

```
cluster {
..
  # Possible values are "disabled", "off", "enabled", "on",
  # "accept", "s2_fast", "s2_better", "s2_best" or "s2_auto"
  compression: s2_fast
}
```

To specify a different list of thresholds for the s2_auto,
here is how it would look like:
```
cluster {
..
  compression: {
    mode: s2_auto
    # This means that for RTT up to 5ms (included), then
    # the compression level will be "uncompressed", then
    # from 5ms+ to 15ms, the mode will switch to "s2_fast",
    # then from 15ms+ to 50ms, the level will switch to
    # "s2_better", and anything above 50ms will result
    # in the "s2_best" compression mode.
    rtt_thresholds: [5ms, 15ms, 50ms]
  }
}
```

Note that the "accept" mode means that a server will accept
compression from a remote and switch to that same compression
mode, but will otherwise not initiate compression. That is,
if 2 servers are configured with "accept", then compression
will actually be "off". If one of the server had say s2_fast
then they would both use this mode.

If a server has compression mode set (other than "off") but
connects to an older server, there will be no compression between
those 2 routes.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-27 17:59:25 -06:00
Derek Collison
4ebdb69daf Merge branch 'main' into dev 2023-04-26 11:34:37 -07:00
Neil Twigg
2206f9e468 Re-add coalescing to outbound queues
Originally I thought there was a race condition happening here,
but it turns out it is safe after all and the race condition I
was seeing was due to other problems in the WebSocket code.

Signed-off-by: Neil Twigg <neil@nats.io>
2023-04-25 12:15:11 +01:00
Derek Collison
1de9a1cf3b Merge branch 'main' into dev 2023-04-21 14:09:35 -07:00
Neil Twigg
5f884349db Remove TestClientOutboundQueueCoalesce as no longer needed
Signed-off-by: Neil Twigg <neil@nats.io>
2023-04-21 15:40:49 +01:00
Ivan Kozlovic
fe5d6bede4 Fixed accounts configuration reload
Issues could manifest with subscription interest not properly
propagated.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-03 09:32:28 -06:00
Neil Twigg
f2bffec366 Refactor outbound queues, remove dynamic sizing, add buffer reuse
Also try to reduce flakiness of `TestClusterQueueSubs` and `TestCrossAccountServiceResponseTypes`
2023-03-15 09:37:40 +00:00
Neil Twigg
e4b6ba2f23 Refactor outbound queues, remove dynamic sizing, add buffer reuse
Also try to reduce flakiness of `TestClusterQueueSubs` and `TestCrossAccountServiceResponseTypes`
2023-01-09 09:35:22 +00:00
Derek Collison
3877ee2411 Merge branch 'main' into dev 2022-12-13 13:08:35 -08:00
Marco Primi
f8a030bc4a Use testing.TempDir() where possible
Refactor tests to use go built-in temporary directory utility for tests.

Also avoid binding to default port (which may be in use)
2022-12-12 13:18:44 -08:00
Derek Collison
d2f1b04d34 Add in user info requests to have connected users get info for bound account and permissions.
Signed-off-by: Derek Collison <derek@nats.io>
2022-11-27 18:16:02 -08:00
Derek Collison
06bab2c4de If no_auth_user is set, clear auth required for server info.
Signed-off-by: Derek Collison <derek@nats.io>
2022-11-21 20:26:54 -08:00
Derek Collison
5690059dac Reserve a system queue group
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-06 13:16:13 -07:00
Derek Collison
e6479dafd2 Close leafnode connection when same cluster name detected
Signed-off-by: Derek Collison <derek@nats.io>
2022-06-30 15:34:22 -07:00
Ivan Kozlovic
27d1a50b35 [FIXED] A slow consumer could cause the publisher to block
The server reads data from a client from a go routine. When receiving
messages, it checks for matching subscriptions, and if found, would
send those messages from the producer's readLoop.
A notion of "budget" was used to make sure the server does not spend
too much time sending to clients from the producer's readLoop, however,
regardless of how small the budget was, if one of the subscription's
connection TCP buffer was full, a TCP write would block for as long
as the defined write_deadline (which is now 10 seconds).

We are removing this behavior and therefore clients (like it was the
case for other type of connections) will now always notify the
subscriber's writeLoop that data is ready to be sent, but the send
will not occur in the producer's writeLoop.

Resolves #2679

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-11-09 17:22:15 -07:00
Phil Pennock
fc6df0fbbc Redact URLs before logging or returning in error (#2643)
* Redact URLs before logging or returning in error

This does not affect strings which failed to parse, and in such a scenario
there's a mix of "which evil" to accept; we can't sanely find what should be
redacted in those cases, so we leave them alone for debugging.

The JWT library returns some errors for Operator URLs, but it rejects URLs
which contain userinfo, so there can't be passwords in those and they're safe.

Fixes #2597

* Test the URL redaction auxiliary functions

* End-to-end tests for secrets in debug/trace

Create internal/testhelper and move DummyLogger there, so it can be used from
the test/ sub-dir too.

Let DummyLogger optionally accumulate all log messages, not just retain the
last-seen message.

Confirm no passwords logged by TestLeafNodeBasicAuthFailover.

Change TestNoPasswordsFromConnectTrace to check all trace messages, not just the
most recent.

Validate existing trace redaction in TestRouteToSelf.

* Test for password in solicited route reconnect debug
2021-10-27 12:44:59 -04:00
Ivan Kozlovic
a025ce7472 Set defaultServerOptions port to -1 for random
Updated some tests based on this change but also missing defer
connection close or server shutdown.

Fixed how the OCSP run go routine would shutdown, which would
never complete because grWG was not decremented by this go routine
prior to invoking s.Shutdown()

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-09-02 14:22:56 -06:00
Matthias Hanel
c68ffe5ad5 [adding] kind and client_type to account connect/disconnect events (#2351)
* [adding] kind and client_type to client info. specifically account connect/disconnect events

Kind is Client/Leafnode but can take the value of Router/Gateway/JetStream/Account/System in the future.
When kind is Client, then client_type is set to mqtt/websocket/nats
This fixes #2291

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-07-07 17:43:50 -04:00
Ivan Kozlovic
1d3cddfa7c [CHANGED] Reduce print for an account subs limit to every 2 sec
We could make it for all limits by having a map of error types
instead of applying just to max subs.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-06-22 11:00:41 -06:00
R.I.Pienaar
5e06e5e232 Export the clientOpts structure
This structure is used in ClientAuthentication, an interface
designed to let 3rd parties extend the authentication mechanisms
of the server

In order to allow those 3rd parties to create unit tests, mocks etc
we need to export this structure so it's accessible externally

Signed-off-by: R.I.Pienaar <rip@devco.net>
2021-05-07 15:51:31 +02:00
alexpantyukhin
84884a93b5 put typestring to map and add tests 2021-04-05 22:03:14 +04:00
Ivan Kozlovic
41fac39f8e Split createClient() into versions for normal, WS and MQTT clients.
This duplicate quite a bit of code, but reduces the conditionals
in the createClient() function.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-12-02 13:50:50 -07:00
Ivan Kozlovic
1dba6418ed [ADDED] MQTT Support
This PR introduces native support for MQTT clients. It requires use
of accounts with JetStream enabled. Since as of now clustering is
not available, MQTT will be limited to single instance.

Only QoS 0 and 1 are supported at the moment. MQTT clients can
exchange messages with NATS clients and vice-versa.

Since JetStream is required, accounts with JetStream enabled must
exist in order for an MQTT client to connect to the NATS Server.
The administrator can limit the users that can use MQTT with the
allowed_connection_types option in the user section. For instance:
```
accounts {
  mqtt {
    users [
      {user: all, password: pwd, allowed_connection_types: ["STANDARD", "WEBSOCKET", "MQTT"]}
      {user: mqtt_only, password: pwd, allowed_connection_types: "MQTT"}
    ]
    jetstream: enabled
  }
}
```
The "mqtt_only" can only be used for MQTT connections, which the user
"all" accepts standard, websocket and MQTT clients.

Here is what a configuration to enable MQTT looks like:
```
mqtt {
  # Specify a host and port to listen for websocket connections
  #
  # listen: "host:port"

  # It can also be configured with individual parameters,
  # namely host and port.
  #
  # host: "hostname"
  port: 1883

  # TLS configuration section
  #
  # tls {
  #  cert_file: "/path/to/cert.pem"
  #  key_file: "/path/to/key.pem"
  #  ca_file: "/path/to/ca.pem"
  #
  #  # Time allowed for the TLS handshake to complete
  #  timeout: 2.0
  #
  #  # Takes the user name from the certificate
  #  #
  #  # verify_an_map: true
  #}

  # Authentication override. Here are possible options.
  #
  # authorization {
  #   # Simple username/password
  #   #
  #   user: "some_user_name"
  #   password: "some_password"
  #
  #   # Token. The server will check the MQTT's password in the connect
  #   # protocol against this token.
  #   #
  #   # token: "some_token"
  #
  #   # Time allowed for the client to send the MQTT connect protocol
  #   # after the TCP connection is established.
  #   #
  #   timeout: 2.0
  #}

  # If an MQTT client connects and does not provide a username/password and
  # this option is set, the server will use this client (and therefore account).
  #
  # no_auth_user: "some_user_name"

  # This is the time after which the server will redeliver a QoS 1 message
  # sent to a subscription that has not acknowledged (PUBACK) the message.
  # The default is 30 seconds.
  #
  # ack_wait: "1m"

  # This limits the number of QoS1 messages sent to a session without receiving
  # acknowledgement (PUBACK) from that session. MQTT specification defines
  # a packet identifier as an unsigned int 16, which means that the maximum
  # value is 65535. The default value is 1024.
  #
  # max_ack_pending: 100
}
```

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-11-30 20:08:44 -07:00
Ivan Kozlovic
bea9fca24c Prevent panic when accepting TLS leafnode connections
This is an addition to PR #1652. I have simply added a check but
at this point in time there is no risk that connection is closed
this early.
I also renamed the small helper function and fixed a test that
had an improper `s.mu.Unlock()` in an error condition.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-10-21 14:53:03 -06:00
Ivan Kozlovic
0c804f5ffb Moving TestQueueAutoUnsubscribe to norace_test.go
This test has been found to cause TestAccountNATSResolverFetch to
fail on macOS. We did not find the exact reason yet, but it seem
that with `-race`, the queue auto-unsub test (that creates 2,000
queue subs and sends 1,000 messages) cause mem to grow to 256MB
(which we know -race is memory hungry) and that may be causing
interactions with the account resolver test.

For now, moving it to norace_test.go, which consumes much less
memory (25MB) and anyway is a better place since it would stress
better the "races" of having a queue sub being unsubscribed while
messages were inflight to this queue sub.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-09-29 18:06:16 -06:00
Matthias Hanel
3d2b65228a [Fixed] race condition where account conns timer was disabled too soon
The connection count sent and the connection count used to determine if
the timer should be disabled could differ.

Also fixed issues in unit test triggering this behavior.
It did not check if remote connections where set to 0 prior to doing
more tests.

Fixes #1613

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-09-24 18:49:32 -04:00
Ivan Kozlovic
12d84c646c Merge pull request #1535 from harrisa1/improveLogging
[CHANGED] add client provided info into server side client logs when available
2020-09-23 14:53:06 -06:00
Matthias Hanel
9d1526cbb8 Adding user jwt payload and subscriber limits
Addresses part of #1552

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-08-24 18:16:25 -04:00
Ivan Kozlovic
92970316ee [FIXED] Possible panic for TLS connections that are aborted early
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-08-18 16:38:35 -06:00
Andrew Harris
17930baa18 add client provided info into server side client logs when available 2020-07-24 15:40:39 -04:00
Ivan Kozlovic
2c8bf520d1 [FIXED] Connection name in log statement for some IPv6 addresses
If an IPv6 address contains some "%" characters, this was causing
the connection name in log statement to mess up the Sprintf formatting.
The solution is to escape those "%" characters.

Resolves #1505

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-07-08 10:17:14 -06:00
Ivan Kozlovic
27540ee255 Fixed some flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-07-03 11:30:48 -06:00
Ivan Kozlovic
342acb0fe7 Fixed async client tests
In some tests, async clients are used with a running server (which
was not original intent). This resulted in the client writeLoop
to be started twice.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-07-01 15:53:08 -06:00
Derek Collison
ddb4219f7a Allow support for a empty response message when no responders are present.
This will also set a response status of 503 with the new header support.

Signed-off-by: Derek Collison <derek@nats.io>
2020-06-15 10:10:21 -07:00
Derek Collison
5f130128ce Read INFO used bufio.Reader
Signed-off-by: Derek Collison <derek@nats.io>
2020-06-01 11:46:18 -07:00
Derek Collison
2bd7553c71 System Account on by default.
Most of the changes are to turn it off for tests that were watching subscriptions and such.

Signed-off-by: Derek Collison <derek@nats.io>
2020-05-29 17:56:45 -07:00
Ivan Kozlovic
b0e43b6aa9 Fix flappers
- TestResponsePermissions: ensure subscription for service is
registered by server before sending requests.
- TestReloadDoesNotWipeAccountsWithOperatorMode: wait for subject
propagation.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-05-28 13:41:02 -06:00
Ivan Kozlovic
dc0f688cbf [FIXED] LameDuckMode sends INFO to clients
Also send an INFO to routes so that the remotes can remove the
LDM's server client URLs and notify their own clients of this
change.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-05-21 12:15:20 -06:00
Ivan Kozlovic
9715848a8e [ADDED] Websocket support
Websocket support can be enabled with a new websocket
configuration block:

```
websocket {
    # Specify a host and port to listen for websocket connections
    # listen: "host:port"

    # It can also be configured with individual parameters,
    # namely host and port.
    # host: "hostname"
    # port: 4443

    # This will optionally specify what host:port for websocket
    # connections to be advertised in the cluster
    # advertise: "host:port"

    # TLS configuration is required
    tls {
      cert_file: "/path/to/cert.pem"
      key_file: "/path/to/key.pem"
    }

    # If same_origin is true, then the Origin header of the
    # client request must match the request's Host.
    # same_origin: true

    # This list specifies the only accepted values for
    # the client's request Origin header. The scheme,
    # host and port must match. By convention, the
    # absence of port for an http:// scheme will be 80,
    # and for https:// will be 443.
    # allowed_origins [
    #    "http://www.example.com"
    #    "https://www.other-example.com"
    # ]

    # This enables support for compressed websocket frames
    # in the server. For compression to be used, both server
    # and client have to support it.
    # compression: true

    # This is the total time allowed for the server to
    # read the client request and write the response back
    # to the client. This include the time needed for the
    # TLS handshake.
    # handshake_timeout: "2s"
}
```

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-05-20 11:14:39 -06:00
Derek Collison
f5ceab339a Server support for headers between routes
Signed-off-by: Derek Collison <derek@nats.io>
2020-05-19 14:33:06 -07:00
Derek Collison
55c77d1e4e Added support for delivery of HMSG and support for older clients
Signed-off-by: Derek Collison <derek@nats.io>
2020-05-19 14:33:06 -07:00
Derek Collison
d61f1f5d92 Add in support for client header bool in CONNECT and tests
Signed-off-by: Derek Collison <derek@nats.io>
2020-05-19 14:33:06 -07:00
Derek Collison
d51566881e First pass at headers awareness for server
Signed-off-by: Derek Collison <derek@nats.io>
2020-05-19 14:33:06 -07:00
Derek Collison
a7f1bca534 Additional service latency upgrades.
We now share more information about the responder and the requestor. The requestor information by default is not shared, but can be when declaring the import.

Also fixed bug for error handling on old request style requests that would always result on a 408 response.

Signed-off-by: Derek Collison <derek@nats.io>
2020-05-19 14:26:46 -07:00
Ivan Kozlovic
46f880bc52 [FIXED] Early closed connection may linger in the server
If the connection is marked as closed while sending the INFO, the
connection would not be removed from the internal map, which would
cause it to be shown in the monitoring list of opened connections.

Resolves #1384

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-05-08 12:01:15 -06:00
Ivan Kozlovic
be00ea96cf [IMPROVED] Added close reason in the connection close log statement
This gives the close reason directly in the log without having to
get that information from the monitoring endpoint. Here is an
example of a route closed due to the remote side not replying to
PINGs:

```
[INF] 127.0.0.1:53839 - rid:2 - Router connection closed: Stale Connection
```

Without this change, the log statement would have been:
```
[INF] 127.0.0.1:53839 - rid:2 - Router connection closed
```

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-04-15 15:36:54 -06:00
Ivan Kozlovic
6f3418687b Capture original length of the first slice and updated test
Changed test to make the previous code in flushOutbound fail.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-03-03 19:12:42 -07:00
Ivan Kozlovic
fd8539f15f [FIXED] Incorrect buffer reuse in case of partial connection write
Added a test that demonstrates the issue and a proposed fix.

Also decrement c.out.pb if closing due to max pending limit.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-03-03 16:50:03 -07:00
Matthias Hanel
a8e6af30a3 On client connect, send first ping after ping interval.
On connect message resend reset timer with setFirstPingTimer, so RTT can
be obtained quicker.

Disable short first ping in default server options for client_test.
In log_test prevent immediate scheduling by setting ping interval.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-03-02 20:10:15 -05:00