Commit Graph

149 Commits

Author SHA1 Message Date
Derek Collison
c16d361ead Merge branch 'main' into dev 2023-06-27 20:41:57 -07:00
Derek Collison
9805101953 Use an account protected method to check for service imports to avoid data race when reloading accounts.
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-27 19:52:43 -07:00
Derek Collison
8a8c37231f Merge branch 'main' into dev 2023-06-10 20:56:42 -07:00
Derek Collison
11963e51fe Optimize statsz locking and only send if we know we have external interest.
Signed-off-by: Derek Collison <derek@nats.io>
2023-06-10 20:25:05 -07:00
Derek Collison
4c26cbb3de Merge branch 'main' into dev 2023-05-12 12:38:20 -07:00
Waldemar Quevedo
286a1632ca Use monotonic time for measuring time internally
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-05-12 08:27:46 -07:00
Ivan Kozlovic
311e3feb5f Merge branch 'main' into dev 2023-05-03 17:38:40 -06:00
Waldemar Quevedo
938ffcba20 Fix race in reload and gateway sublist check
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-05-02 17:51:53 -07:00
Ivan Kozlovic
105237cba8 [ADDED] Multiple routes and ability to have per-account routes
New configuration fields:
```
cluster {
   ...
   pool_size: 5
   accounts: ["A", "B"]
}
```

The configuration `pool_size` in the example above means that this
server will create 5 routes to a remote server, assuming that that
server has the same `pool_size` setting.

Accounts (which are not part of the `accounts[]` configuration)
are assigned a specific route in this pool, and this will be the
same route on all servers in the cluster.

Accounts that are defined in the `accounts` field will each have
a dedicated route connection. This will allow suppression of the
account name in some of the route protocols, reducing bytes transmitted
which may increase performance.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-03 09:32:25 -06:00
peaaceChoi
038037381b Fix some typos in code comment 2023-01-12 10:31:32 +09:00
Ivan Kozlovic
8d9c57ad44 [IMPROVED] Fan-out performance
There was an observed degradation (around 5%) for large fan out in
v2.9.0 compared to earlier release. This is because we added
accounting of the in/out messages for the account, which result
in 4 atomic operations, 2 for in and 2 for out, however, it means
that for a fan-out of say 100 matching subscriptions, it is now
2 + 2 * 100 = 202.

This PR rework how the stats accounting is done which removes
the regression and even boost a bit the numbers since we are
doing the server stats update as an aggregate too.

There are still degradation for queues and no-sub at all that
need to be looked at.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-09-27 19:43:32 -06:00
Ivan Kozlovic
170ff49837 [ADDED] JetStream: peer (the hash of server name) in statsz/jsz
A request to `$SYS.REQ.SERVER.PING.JSZ` would now return something
like this:
```
...
    "meta_cluster": {
      "name": "local",
      "leader": "A",
      "peer": "NUmM6cRx",
      "replicas": [
        {
          "name": "B",
          "current": true,
          "active": 690369000,
          "peer": "b2oh2L6w"
        },
        {
          "name": "Server name unknown at this time (peerID: jZ6RvVRH)",
          "current": false,
          "offline": true,
          "active": 0,
          "peer": "jZ6RvVRH"
        }
      ],
      "cluster_size": 3
    }
```
Note the "peer" field following the "leader" field that contains
the server name. The new field is the node ID, which is a hash of
the server name.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-09-16 15:31:37 -06:00
Ivan Kozlovic
8d1fb4bc92 [FIXED] JetStream: possible routing issues through gateways
Internally jetstream may subscribe to some subject and then send
a request with a reply subject matching that subscription.
Due to interest propagation through a super cluster, it is possible
that the reply comes back to a node that is not yet aware of
the subscription interest which would cause the reply to be dropped.

Some code detects that the subscription is recent and "map" the
reply subject so that it can be routed back to the origin server.
However, this was done with the use of the connection object that
created the subscription, but at the time of the send, a different
internal "*client" object may be used which would then cause
the code to not be aware of the recent subscription and not do
the mapping.

This code was changed to scope at the account level instead of
connection.

A recent change in PR #3412 is no longer needed and was reverted
in favor of changes in this PR.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-31 14:18:28 -06:00
Derek Collison
98bf861a7a Updates to stream and consumer move logic.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-30 16:11:35 -07:00
Ivan Kozlovic
9d1e773e8f [FIXED] Gateway: system request/replies may not work properly
When a subscription is recently made, gateway code ensures that if
there is a reply subject, the reply is "mapped" or rewritten to allow
the reply to come back to the origin cluster, regardless of subscription
interest propagation.

The issue was that this uses a map with a `*client` as the key
but the pointer for SYSTEM clients would not always be the same,
which meant that the rewrite would not happen, causing possible
"loss" of replies.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-29 14:05:51 -06:00
Ivan Kozlovic
f6c4e5fcee [CHANGED] Gateway: Switch all accounts to interest-only mode
We are phasing out the optimistic-only mode. Servers accepting
inbound gateway connections will switch the accounts to interest-only
mode.

The servers with outbound gateway connection will check interest
and ignore the "optimistic" mode if it is known that the corresponding
inbound is going to switch the account to interest-only. This is
done using a boolean in the gateway INFO protocol.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-19 16:41:44 -06:00
Ivan Kozlovic
5d3ee8ebf4 [FIXED] Gateway: possible panic if monitor endpoint inspected too soon
The monitoring http server is started early and the gateway setup
(when configured) may not be fully ready when the `/gatewayz`
endpoint is inspected and could cause a panic.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-17 13:30:58 -06:00
Ivan Kozlovic
3c9a7cc6e5 Move to Go 1.19, remote io/util, fix data race and a flapper
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-05 09:55:37 -06:00
Ivan Kozlovic
98c1f0ecb2 Fixed some data race and some flappers
Got a data race:
```
==================
WARNING: DATA RACE
Write at 0x00c001c736b0 by goroutine 605:
  runtime.mapassign_faststr()
      /home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/map_faststr.go:202 +0x0
  github.com/nats-io/nats-server/v2/server.(*Account).addServiceImport()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/accounts.go:1868 +0xb7b
  github.com/nats-io/nats-server/v2/server.(*Account).AddServiceImportWithClaim()
...
Previous read at 0x00c001c736b0 by goroutine 301:
  runtime.mapaccess2_faststr()
      /home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/map_faststr.go:107 +0x0
  github.com/nats-io/nats-server/v2/server.(*Server).registerSystemImports()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/events.go:1577 +0x284
  github.com/nats-io/nats-server/v2/server.(*Server).updateAccountClaimsWithRefresh()
...
```

Also, remove some condition in gateway.go on how we were checking
if a subject was a serviec reply, which was causing a test to flap.

Finally, used AckSync() in a rest (instead of m.Respond(nil)) to
prevent it from flapping.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-29 19:02:41 -06:00
Ivan Kozlovic
63c750e295 [CHANGED] Gateway: Detect duplicate names between clusters
Gateway connection will be closed and error reported if a remote
has a name that is a duplicate of the local cluster.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-15 15:00:13 -06:00
Ivan Kozlovic
85b3f8a7fd Gateways: data race when setting first ping timer
This was introduced when fixing #2881. The call to setFirstPingTimer
needed to be done under the client's lock.

Moved setFirstPingTimer from a server receiver to a client receiver.
The only reason it was a server receiver is because we need the
server options, but c.srv is always set when invoking this function,
so we will get the server from c.srv in that function now.

Related to #2881

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-04 19:55:07 -07:00
Ivan Kozlovic
08d6aaa78f [FIXED] Gateway: connect could fail due to PING sent before CONNECT
When a gateway connection was created (either accepted or initiated)
the timer to fire the first PING was started at that time, which
means that for an outbound connection, if the INFO coming from
the other side was delayed, it was possible for the outbound to
send the PING protocol before the CONNECT, which would cause
the accepting side to close the connection due to a "parse" error
(since the CONNECT for an inbound is supposed to be the very
first protocol).

Also noticed that we were not setting the auth timer like we do
for the other type of connections. If authorization{timeout:<n>}
is not set, the default is 2 seconds.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-02-23 15:19:20 -07:00
Ivan Kozlovic
5fc9e0e1cc [FIXED] Gateway URLs gossip and /varz report issues
- When detecting duplicate route, it was possible that a server
would lose track of the peer's gateway URL, which would prevent
it from gossiping that URL to inbound gateway connections
- When a server has gateways enabled and has as a remote its
own gateway, the monitoring endpoint `/varz` would include it
but without the "urls" array.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-10-28 12:05:30 -06:00
Ivan Kozlovic
0bd38bd424 [FIXED] Monitoring: /varz gateway URLs not always updated
When servers leave a cluster and their gateway URLs was not in
the remote cluster's configuration, it is possible that their
gateway URL do not disappear from the list of URLs in the `/varz`
monitoring endpoint.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-10-26 13:11:06 -06:00
Matthias Hanel
1c508220d8 Review comment
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-10-19 18:03:59 -04:00
Matthias Hanel
c4a3a4c95e fix timer not being stopped prior to reset
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-10-19 16:56:20 -04:00
Derek Collison
f13fa767c2 Remove the swapping of accounts during processing of service imports.
When processing service imports we would swap out the accounts during processing.
With the addition of internal subscriptions and internal clients publishing in JetStream we had an issue with the wrong account being used.
This was specific to delyaed pull subscribers trying to unsubscribe due to max of 1 while other JetStream API calls were running concurrently.
2021-07-26 07:57:10 -07:00
Derek Collison
1270977322 When receiving a response across a gateway that has headers and a globally routed subject (_GR_) we were dropping header information.
Signed-off-by: Derek Collison <derek@nats.io>
2021-06-10 14:29:33 -07:00
Matthias Hanel
b1dee292e6 [changed] pinned certs to check the server connected to as well (#2247)
* [changed] pinned certs to check the server connected to as well

on reload clients with removed pinned certs will be disconnected.
The check happens only on tls handshake now.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-05-24 17:28:32 -04:00
Matthias Hanel
6f6f22e9a7 [added] pinned_cert option to tls block hex(sha256(spki)) (#2233)
* [added] pinned_cert option to tls block hex(sha256(spki))

When read form config, the values are automatically lower cased.
The check when seeing the values programmatically requires 
lower case to avoid having to alter the map at this point.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-05-20 17:00:09 -04:00
Ivan Kozlovic
2881e4a1f0 [FIXED] MQTT fixes and improvements
Some issues that have been fixed would manifest by timeouts on
connect, unexpected memory usage on high publish message rate.

Some details:
- Replies were not always GW routed properly because we were looking
at the wrong connection's rsubs
- GW routed replies would not be found because they were tracked
in the subscription's client object, which may not be the same used
to send the reply
- Increased the mqtt timeout to wait for JS replies since in some
tests it was sometimes taking more than the original 2 seconds
- Incoming gateway messages destined for an MQTT internal subscription
may have been rejected as a no interest if the account had service imports
- Don't use time.After(), instead create explicit timer so it can
be stopped when not timing out.
- Unnecessary copy of a slice since we were converting to a string anyway.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-05-04 20:48:14 -06:00
Jaime Piña
e12181cb83 Return not ready for connection reason
Currently, we use ReadyForConnections in server tests to wait for the
server to be ready. However, when this fails we don't get a clue about
why it failed.

This change adds a new unexported method called readyForConnections that
returns an error describing which check failed. The exported
ReadyForConnections version works exactly as before. The unexported
version gets used in internal tests only.
2021-04-20 11:45:08 -07:00
Ivan Kozlovic
56d0d9ec87 Do not propagate service import interest across GW and ROUTES
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-04-15 11:34:36 -06:00
Derek Collison
8eefff2b3b Make sure the jetstream accounts use the name as the key to the map.
This prevents possible double adds under reload or restart scenarios.

Signed-off-by: Derek Collison <derek@nats.io>
2021-03-18 17:29:26 -07:00
Ivan Kozlovic
cbcff97244 [CHANGED] Move Gateway interest-only mode switch from INF to DBG
Also fixed a test that would sometimes fail depending on timing.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-03-14 11:34:36 -06:00
Ivan Kozlovic
27f51d4028 Fix ephemeral consumer delete in single cluster
Also remove retry of sources/mirror in the setSourceConsumer() itself
when not getting a response.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-03-10 15:16:31 -07:00
Ivan Kozlovic
e7e756034a Switch Gateway JS accounts to interest-only mode + some other fixes
- Fixed the close of a TLS connection which starting Go 1.16
set the deadline to 5 seconds.

- Fixed an issue with setHeader that was causing these error messages
```
=== RUN   TestServiceImportReplyMatchCycleMultiHops
nats: message could not decode headers on connection [4] for subscription on "foo"
--- PASS: TestServiceImportReplyMatchCycleMultiHops (0.04s)
```

- Fixed names of tests in norace_test.go since they must start with
TestNoRace in order to make sure that we execute them in Travis:
```
go test -v -run=TestNoRace --failfast -p=1 ./...
```

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-03-03 19:15:28 -07:00
Matthias Hanel
c50ee2a1c6 [Changed] all times exposed will be computed in UTC (#1943)
This also applies to times that end up in that json.
Where applicable moved time.Now() to where it is used.
Moved calls to .UTC() to where time is created it that time is converted
later anyway.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-03-02 21:37:42 -05:00
Ivan Kozlovic
1652fe62ef Updates to when do snapshot
Remove panic on runAsLeader when not able to subscribe (which happens
on shutdown)
Gateway name access does not need lock since it is immutable. Will
prevent deadlocks in some situations.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-02-23 19:06:07 -07:00
Derek Collison
bb58d455f6 Revert switching to interest only mode
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-23 18:00:47 -08:00
Derek Collison
6d6a6c07ff Don't send empty subjects, always put system account in interest only
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-23 10:57:12 -08:00
Derek Collison
fa8a74ceb5 Allow placement directives for metacontroller stepdown to allow placement to new clusters.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-19 10:55:22 -08:00
Ivan Kozlovic
61bd1b8d86 MQTT clustering
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-02-19 08:50:00 -07:00
Ivan Kozlovic
8598de6dbe [FIXED] Gateway's implicit connection not using global user/pass
If a gateway is configured with an authorization block containing
username and password and accepts an unknown Gateway connection,
when initiating the outbound connection, it should use the
gateway authorization's user/pass information.

Resolves #1912

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-02-16 10:06:06 -07:00
Derek Collison
6d32c307ef Remove pretty indent for json.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-06 20:09:44 -08:00
Ivan Kozlovic
2b8c6e0124 Support for Websocket Leafnode connections
Added two options in the remote leaf node configuration

- compress, for websocket only at the moment
- ws_masking, to force remote leafnode connections to mask websocket
frames (default is no masking since it is communication between
server to server)

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-28 13:13:11 -07:00
Ivan Kozlovic
131be1cb33 Make TLS client/server handshake helpers function
This reduces code duplication

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-28 13:13:11 -07:00
Ivan Kozlovic
ef38abe75b Fixed gateway reply mapping following changes in JetStream clustering
Those changes are required to maintain backward compatibility.
Since the replies are "_G_.<gateway name hash>.<server ID hash>"
and the hash were 6 characters long, changing to 8 the hash function
would break things.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-15 17:32:04 -07:00
Derek Collison
f0cdf89c61 JetStream Clustering WIP
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-14 01:14:52 -08:00
Ivan Kozlovic
d24e9b75b3 Fixed GW implicit reconnection
PR #1412 had a fix for races during implicit GW reconnection.
However, the fix was a bit too simplistic in that it was checking
only if there was any inbound gateway to decide to try to reconnect
an implicit disconnected GW. We need to check the name, not only
presence of inbound GW connections.

Related to #1412

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-12-28 12:28:55 -07:00