nats-server

mirror of https://github.com/gogrlx/nats-server.git synced 2026-04-02 11:48:43 -07:00

Author	SHA1	Message	Date
Matthias Hanel	c50ee2a1c6	[Changed] all times exposed will be computed in UTC (#1943) This also applies to times that end up in that json. Where applicable moved time.Now() to where it is used. Moved calls to .UTC() to where time is created if that time is converted later anyway. Signed-off-by: Matthias Hanel <mh@synadia.com>	2021-03-02 21:37:42 -05:00
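A minimal sketch of what c50ee2a1c6 describes, converting a timestamp to UTC where it is created so that the JSON output carries no local offset. The `event` type and the helper names are hypothetical and not taken from the server code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// event is a hypothetical payload type used only for this illustration.
type event struct {
	Time time.Time `json:"time"`
}

// sampleLocalTime returns a fixed timestamp in a UTC-05:00 zone.
func sampleLocalTime() time.Time {
	return time.Date(2021, 3, 2, 21, 37, 42, 0, time.FixedZone("EST", -5*60*60))
}

// encodeEvent converts the timestamp to UTC at the point where the
// event is built, then marshals it to JSON.
func encodeEvent(t time.Time) (string, error) {
	b, err := json.Marshal(event{Time: t.UTC()})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encodeEvent(sampleLocalTime())
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"time":"2021-03-03T02:37:42Z"}
}
```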
Ivan Kozlovic	ac0a1ee8fd	Fixed compression http header request/response The issue was introduced by PR #1858. Key points: - Sec-WebSocket-Extensions must contain approved headers, so moving the "no-masking" private extension to its own header "Nats-No-Masking". - The format of the permessage-deflate negotiation response became invalid, I have fixed that. - For leaf nodes, if `permessage-deflate` extension is not at all present in the response, then simply disable compression, however if it is present but there is no server/client no context take over, then we have to fail the connection. - A leafnode test was not setting the "NoMasking" option so the test TestLeafNodeWSNoMaskingRejected was not capturing possible error if negotiation failed. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-02-01 12:10:37 -07:00
Ivan Kozlovic	9587bf8cd4	Changed option to make masking the default and option to disable it This will allow a better experience if there is a load balancer in between and expects websocket frames to be masked. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-01-29 11:22:22 -07:00
Ivan Kozlovic	2b8c6e0124	Support for Websocket Leafnode connections Added two options in the remote leaf node configuration - compress, for websocket only at the moment - ws_masking, to force remote leafnode connections to mask websocket frames (default is no masking since it is communication between server to server) Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-01-28 13:13:11 -07:00
Ivan Kozlovic	131be1cb33	Make TLS client/server handshake helpers function This reduces code duplication Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-01-28 13:13:11 -07:00
Ivan Kozlovic	6666f5aa43	[FIXED] LeafNode: save hostname that may be used during TLS handshake Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-01-26 12:10:57 -07:00
Ivan Kozlovic	c9bba7d1e3	Change back "server_name" to "name" for backward compatibility The LeafNode connect protocol's Name field had json tag "name" but was changed to "server_name" in the JetStream cluster branch. Changing it back to "name" to not have to deal with different places where to get the name from. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-01-15 14:00:21 -07:00
Ivan Kozlovic	0d78bce9cf	Fixed some leafnode issues introduced from JS cluster work Also fixed a flapper. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-01-15 12:00:34 -07:00
Derek Collison	f0cdf89c61	JetStream Clustering WIP Signed-off-by: Derek Collison <derek@nats.io>	2021-01-14 01:14:52 -08:00
Ivan Kozlovic	14aecb2202	Fixed headers support for inbound leafnode connection The server that solicits a LeafNode connection does not send an INFO, so the accepting side had no way to know if the remote supports headers or not. The solicit side will now send the headers support capability in the CONNECT protocol so that the receiving side can mark the inbound connection with headers support based on that and its own support for headers. Resolves #1781 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-12-21 11:53:24 -07:00
Ivan Kozlovic	fc1521636c	[FIXED] Config reload for gateways/leaf remote TLS configurations Presence of TLS config in any remote gateway or leafnode would cause the config reload to fail (because TLS config internal content may change which fails the DeepEqual check). This PR excludes the TLS configs in such case to check for changes in gateways and leafnodes. Although GW and LN config reload is technically supported, this PR updates the internal remotes' TLS configuration so that changes/updates to TLS certificates would take effect after a configuration reload. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-12-11 16:56:25 -07:00
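The comparison problem described in fc1521636c can be sketched as below: a plain `reflect.DeepEqual` reports a change whenever the TLS configs differ, so the check copies each remote and clears the TLS field before comparing. `remoteOpts`, `newRemote` and `remotesEqualIgnoringTLS` are hypothetical names for illustration, not the server's option types.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"reflect"
)

// remoteOpts is a hypothetical, reduced stand-in for a remote
// gateway or leafnode configuration entry.
type remoteOpts struct {
	URL       string
	TLSConfig *tls.Config
}

// newRemote builds a remote with its own TLS config.
func newRemote(url, serverName string) remoteOpts {
	return remoteOpts{URL: url, TLSConfig: &tls.Config{ServerName: serverName}}
}

// remotesEqualIgnoringTLS compares two remote lists after clearing the
// TLS config on copies, so TLS-only differences do not count as changes.
func remotesEqualIgnoringTLS(a, b []remoteOpts) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		ca, cb := a[i], b[i] // copies; the originals keep their TLS config
		ca.TLSConfig, cb.TLSConfig = nil, nil
		if !reflect.DeepEqual(ca, cb) {
			return false
		}
	}
	return true
}

func main() {
	old := []remoteOpts{newRemote("nats://hub:7422", "old.example.com")}
	cur := []remoteOpts{newRemote("nats://hub:7422", "new.example.com")}
	fmt.Println(reflect.DeepEqual(old, cur))       // false
	fmt.Println(remotesEqualIgnoringTLS(old, cur)) // true
}
```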
Ivan Kozlovic	406dc7ee56	Fixed data race on leafnode check for remote cluster A newly introduced test (TestLeafNodeTwoRemotesBindToSameAccount) had a server creating two remotes to the same server/account. This test quite often shows the data race:
```
go test -race -v -run=TestLeafNodeTwoRemotesBindToSameAccount ./server -count 100 --failfast
=== RUN TestLeafNodeTwoRemotesBindToSameAccount
==================
WARNING: DATA RACE
Write at 0x00c000168790 by goroutine 34:
  github.com/nats-io/nats-server/v2/server.(*client).processLeafNodeConnect()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/leafnode.go:1177 +0x314
  github.com/nats-io/nats-server/v2/server.(*client).processConnect()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/client.go:1719 +0x9e4
  github.com/nats-io/nats-server/v2/server.(*client).parse()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/parser.go:870 +0xf88
  github.com/nats-io/nats-server/v2/server.(*client).readLoop()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/client.go:1052 +0x7a5
  github.com/nats-io/nats-server/v2/server.(*Server).createLeafNode.func4()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/leafnode.go:872 +0x52

Previous read at 0x00c000168790 by goroutine 32:
  github.com/nats-io/nats-server/v2/server.(*client).remoteCluster()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/leafnode.go:1203 +0x42d
  github.com/nats-io/nats-server/v2/server.(*Server).updateLeafNodes()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/leafnode.go:1375 +0x2cf
  github.com/nats-io/nats-server/v2/server.(*client).processLeafSub()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/leafnode.go:1619 +0x858
  github.com/nats-io/nats-server/v2/server.(*client).parse()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/parser.go:624 +0x5031
  github.com/nats-io/nats-server/v2/server.(*client).readLoop()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/client.go:1052 +0x7a5
  github.com/nats-io/nats-server/v2/server.(*Server).createLeafNode.func4()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/leafnode.go:872 +0x52

Goroutine 34 (running) created at:
  github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/server.go:2627 +0xc7
  github.com/nats-io/nats-server/v2/server.(*Server).createLeafNode()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/leafnode.go:872 +0xf7a
  github.com/nats-io/nats-server/v2/server.(*Server).startLeafNodeAcceptLoop.func1()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/leafnode.go:474 +0x5e
  github.com/nats-io/nats-server/v2/server.(*Server).acceptConnections.func1()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/server.go:1784 +0x57

Goroutine 32 (running) created at:
  github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/server.go:2627 +0xc7
  github.com/nats-io/nats-server/v2/server.(*Server).createLeafNode()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/leafnode.go:872 +0xf7a
  github.com/nats-io/nats-server/v2/server.(*Server).startLeafNodeAcceptLoop.func1()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/leafnode.go:474 +0x5e
  github.com/nats-io/nats-server/v2/server.(*Server).acceptConnections.func1()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/server.go:1784 +0x57
==================
testing.go:965: race detected during execution of test
--- FAIL: TestLeafNodeTwoRemotesBindToSameAccount (0.05s)
```
This is because as soon as a LEAF is registered with the account, it is available in the account's lleafs map, even before the CONNECT for this connection is processed. If another LEAF connection is processing a LSUB, the code goes over all leaf connections for the account and may find the new connection that is in the process of connecting. The check accesses c.leaf.remoteCluster unlocked which is also set unlocked during the CONNECT. The fix is to have the set and check on that particular location using the client's lock. Ideally I believe that the connection should not have been in the account's lleafs, or at least not used until the CONNECT for this leaf connection is fully processed. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-11-24 15:42:30 -07:00
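The fix described in 406dc7ee56 (set and check the field under the connection's lock) can be sketched roughly as follows. `leafConn` and its methods are hypothetical stand-ins, not the server's `client` type.

```go
package main

import (
	"fmt"
	"sync"
)

// leafConn is a hypothetical stand-in for a leafnode connection.
type leafConn struct {
	mu            sync.Mutex
	remoteCluster string
}

// setRemoteCluster is what the CONNECT processing would call.
func (c *leafConn) setRemoteCluster(name string) {
	c.mu.Lock()
	c.remoteCluster = name
	c.mu.Unlock()
}

// isRemoteCluster is what another connection's LS+ processing would
// call while walking the account's leaf connections.
func (c *leafConn) isRemoteCluster(name string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.remoteCluster == name
}

func main() {
	c := &leafConn{}
	var wg sync.WaitGroup
	wg.Add(2)
	// Writer and reader run concurrently; both go through the lock,
	// so `go run -race` should report nothing here.
	go func() { defer wg.Done(); c.setRemoteCluster("A") }()
	go func() { defer wg.Done(); _ = c.isRemoteCluster("A") }()
	wg.Wait()
	fmt.Println(c.isRemoteCluster("A")) // true
}
```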
Ivan Kozlovic	120b031ffd	Merge pull request #1739 from nats-io/leaf-warning [Added] account name checks for leaf nodes in operator mode	2020-11-24 12:35:31 -07:00
Matthias Hanel	b0461e3921	Fixed comment Signed-off-by: Matthias Hanel <mh@synadia.com>	2020-11-24 12:47:41 -05:00
Matthias Hanel	a0dc9ea3e3	Reducing complexity of lookup Signed-off-by: Matthias Hanel <mh@synadia.com>	2020-11-24 12:31:44 -05:00
Matthias Hanel	a8390b7432	Incorporating comments and moving code Signed-off-by: Matthias Hanel <mh@synadia.com>	2020-11-23 23:27:44 -05:00
Ivan Kozlovic	f155c75da7	[FIXED] LeafNode reject duplicate remote There was a test to prevent an erroneous loop detection when a remote would reconnect (due to a stale connection) while the accepting side did not detect the bad connection yet. However, this test was racy because the test was done prior to adding the connections to the map. In the case of a misconfiguration where the remote creates 2 different remote connections that end up binding to the same account in the accepting side, then it was possible that this would not be detected. And when it was, the remote side would be unaware since the disconnect/reconnect attempts would not show up if not running in debug mode. This change makes sure that the detection is no longer racy and returns an error to the remote so at least the log/console of the remote will show the "duplicate connection" error messages. Resolves #1730 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-11-23 13:28:18 -07:00
Ivan Kozlovic	bea9fca24c	Prevent panic when accepting TLS leafnode connections This is an addition to PR #1652. I have simply added a check but at this point in time there is no risk that connection is closed this early. I also renamed the small helper function and fixed a test that had an improper `s.mu.Unlock()` in an error condition. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-10-21 14:53:03 -06:00
Ivan Kozlovic	9bd088e0b9	Make it a small function Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-10-19 10:50:20 -06:00
Ivan Kozlovic	3b8d00e046	[FIXED] Possible panic when server accepts TLS leafnode connection Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-10-19 10:29:32 -06:00
Ivan Kozlovic	2605ae71ed	[FIXED] Prevent LeafNode loop detection on early reconnect If the soliciting side detects the disconnect and attempts to reconnect but the accepting side did not yet close the connection, a "loop detected" error would be reported and the soliciting server would not try to reconnect for 30 seconds. Made a change so that the accepting server checks for existing leafnode connection for the same server and same account, and if it is found, close the "old" connection so it is replaced by the "new" one. Resolves #1606 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-09-22 16:58:36 -06:00
Ivan Kozlovic	2ad2bed170	[ADDED] Support for route hostname resolution We previously simply called DialTimeout() on a route's url when soliciting. If it resolved to the IP of the host, it would create a route to self, which the server detects, but then would not try again with other IPs that would have allowed it to form a cluster with other servers running on the other IPs. This PR keeps track of local IPs + cluster port and excludes them from the list of IPs returned by the LookupHost API. This even prevents solicitation of routes to self. Only non-local IPs will be tried. Resolves #1586 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-09-08 13:40:17 -06:00
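A rough sketch of the filtering step from 2ad2bed170: given the IPs a route hostname resolved to (for example from `net.LookupHost`), drop any `ip:port` pair that matches one of the server's own addresses. `routeCandidates` and the sample addresses are hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// routeCandidates returns the ip:port addresses worth dialing, skipping
// any that are in the set of local addresses (local IP + cluster port).
func routeCandidates(ips []string, port string, local map[string]struct{}) []string {
	var out []string
	for _, ip := range ips {
		addr := net.JoinHostPort(ip, port)
		if _, isLocal := local[addr]; isLocal {
			continue // would be a route to self
		}
		out = append(out, addr)
	}
	return out
}

func main() {
	// In real use the IPs would come from net.LookupHost(hostname).
	resolved := []string{"10.0.0.1", "10.0.0.2"}
	local := map[string]struct{}{"10.0.0.1:6222": {}}
	fmt.Println(routeCandidates(resolved, "6222", local)) // [10.0.0.2:6222]
}
```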
Phil Pennock	3c680eceb9	Inhibit Go's default TCP keepalive settings for NATS (#1562) Go 1.13 changed the semantics of the tuning parameters for TCP keepalives, including the default value. This affects all TCP listeners. The NATS protocol has its own L7 keepalive system (PING/PONG) and the Go defaults are not a good fit for some valid deployment scenarios, while Go doesn't directly expose a working API for tuning these. Rather than add a configuration knob and pull in another dependency (with portability issues) just disable TCP keepalives for all listeners used for speaking the NATS protocol. Change the tests so we test the same logic. Do not change HTTP monitoring, profiling, or the websocket API listeners. Change KeepAlive on client connections too.	2020-08-14 13:37:59 -04:00
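One way to disable TCP keepalives on a listener in Go, as 3c680eceb9 describes, is a `net.ListenConfig` with a negative `KeepAlive`. The sketch below assumes that approach; the helper name `listenNoKeepAlive` is hypothetical and may not match what the commit actually does.

```go
package main

import (
	"context"
	"fmt"
	"net"
)

// listenNoKeepAlive opens a TCP listener whose accepted connections do
// not get Go's default TCP keepalive; a negative KeepAlive disables it.
func listenNoKeepAlive(addr string) (net.Listener, error) {
	lc := net.ListenConfig{KeepAlive: -1}
	return lc.Listen(context.Background(), "tcp", addr)
}

func main() {
	l, err := listenNoKeepAlive("127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer l.Close()
	fmt.Println("listening on", l.Addr())
}
```

For outbound connections, `net.Dialer` has a `KeepAlive` field with the same meaning for negative values.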
Ivan Kozlovic	c620175353	Rework closeConnection() This change allows the removal of the connection and update of the server state to be done "in place" but still delay the flushing of and close of tcp connection to the writeLoop. With ref counting we ensure that the reconnect happens after the flushing but not before the state has been updated. Had to fix some places where we may have called closeConnection() from under the server lock since it now would deadlock for sure. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-07-31 15:30:17 -06:00
Ivan Kozlovic	96ccf91566	[FIXED] Possible deadlock with solicited leafnodes when cluster conflict We cannot call c.closeConnection() under the server lock because closeConnection() can invoke server lock in some cases. Created a test that should run without `-race` to reproduce the deadlock (which it does) but sometimes would fail because cluster would not be formed. This uncovered an issue with conflict resolution which test TestRouteClusterNameConflictBetweenStaticAndDynamic() can reproduce easily. The issue was that we were not updating a dynamic name with the remote if the remote was non dynamic. Resolves #1543 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-07-30 18:45:36 -06:00
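The deadlock in 96ccf91566 comes from calling a function that takes the server lock while already holding it. A common way around it, sketched here with hypothetical `server` and `conn` types, is to collect the connections under the lock and close them after releasing it.

```go
package main

import (
	"fmt"
	"sync"
)

// server and conn are hypothetical, reduced stand-ins.
type server struct {
	mu    sync.Mutex
	conns map[int]*conn
}

type conn struct {
	id  int
	srv *server
}

// newServer creates a server with n registered connections.
func newServer(n int) *server {
	s := &server{conns: make(map[int]*conn)}
	for i := 0; i < n; i++ {
		s.conns[i] = &conn{id: i, srv: s}
	}
	return s
}

// closeConnection takes the server lock to unregister itself, so it
// must never be called while the caller holds s.mu.
func (c *conn) closeConnection() {
	c.srv.mu.Lock()
	delete(c.srv.conns, c.id)
	c.srv.mu.Unlock()
}

// closeAll snapshots the connections under the lock, releases the lock,
// then closes each one. Returns how many connections were closed.
func (s *server) closeAll() int {
	s.mu.Lock()
	toClose := make([]*conn, 0, len(s.conns))
	for _, c := range s.conns {
		toClose = append(toClose, c)
	}
	s.mu.Unlock()

	for _, c := range toClose {
		c.closeConnection()
	}
	return len(toClose)
}

func main() {
	s := newServer(3)
	fmt.Println(s.closeAll(), len(s.conns)) // 3 0
}
```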
Ivan Kozlovic	b9764db478	Renamed gossipURLs type and moved its declaration to util.go Also made the add/remove/getAsStringSlice receiver for this type. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-07-16 11:22:58 -06:00
Ivan Kozlovic	9b0967a5d1	[FIXED] Handling of gossiped URLs If some servers in the cluster have the same connect URLs (due to the use of client advertise), then it would be possible to have a server send the connect_urls INFO update to clients with missing URLs. Resolves #1515 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-07-15 17:39:12 -06:00
Ivan Kozlovic	4d495104de	Fixed no_responders use of sendProtoNow() The call sendProtoNow() should not normally be used (only when setting up a connection when the writeLoop is not yet started and the server needs to send something before being able to start the writeLoop). Instead, code should use enqueueProto(). For this particular case though, use queueOutbound() directly and add to the producer's pcd map. Also fixed other places where we were using queueOutbound() + flushSignal() which is what enqueueProto is doing. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-07-09 17:55:14 -06:00
Ivan Kozlovic	9288283d90	Fixed accept loops that could leave connections opened This was discovered with the test TestLeafNodeWithGatewaysServerRestart that was sometimes failing. Investigation showed that when cluster B was shutdown, one of the servers on A that had a connection from B that just broke tried to reconnect (as part of reconnect retries of implicit gateways) to a server in B that was in the process of shutting down. The connection had been accepted but createGateway not called because the server's running boolean had been set to false as part of the shutdown. However, the connection was not closed so the server on A had a valid connection to a dead server from cluster B. When the B cluster (now single server) was restarted and a LeafNode connection connected to it, then the gateway from B to A was created, that server on A did not create outbound connection to that B server because it already had one (the zombie one). So this PR strengthens the starting of accept loops and also makes sure that if a connection (all types of connections) is not accepted because the server is shutting down, that connection is properly closed. Since all accept loops had almost the same code, made a generic function that accepts functions to call specific create connection functions. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-07-06 17:03:19 -06:00
Derek Collison	6c805eebc7	Properly support leafnode clusters. Leafnodes that formed clusters were partially supported. This adds proper support for origin cluster, subscription suppression and data message no echo for the origin cluster. Signed-off-by: Derek Collison <derek@nats.io>	2020-06-26 09:03:22 -07:00
Derek Collison	98f84bdbc8	Make sure to merge with local deny clauses Signed-off-by: Derek Collison <derek@nats.io>	2020-06-16 11:56:24 -07:00
Derek Collison	3541e3f0f9	Updated older tests for new functionality Signed-off-by: Derek Collison <derek@nats.io>	2020-06-16 10:56:39 -07:00
Derek Collison	ca4f03c1a6	Properly handle leafnode spoke permissions. When a leafnode would connect with credentials that had permissions the spoke did not have a way of knowing what those were. This could lead to being disconnected when sending subscriptions or messages to the hub which were not allowed. Signed-off-by: Derek Collison <derek@nats.io>	2020-06-16 08:33:09 -07:00
Ivan Kozlovic	61cccbce02	[FIXED] LeafNode solicit failure race could leave conn registered This was found due to a recent test that was flapping. The test was not checking the correct server for leafnode connection, but that uncovered the following bug: When a leafnode connection is solicited, the read/write loops are started. Then, the connection lock is released and several functions invoked to register the connection with an account and add to the connection leafs map. The problem is that the readloop (for instance) could get a read error and close the connection before the above said code executes, which would lead to a connection incorrectly registered. This could be fixed either by delaying the start of read/write loops after the registration is done, or like in this PR, check the connection close status after registration, and if closed, manually undoing the registration with account/leafs map. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-06-12 16:01:13 -06:00
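The approach chosen in 61cccbce02 (register first, then re-check the closed status and undo the registration if needed) can be sketched as below. `account`, `leaf` and `register` are hypothetical names. The sketch assumes that a close arriving after the check goes through a normal close path that removes the entry itself.

```go
package main

import (
	"fmt"
	"sync"
)

// account and leaf are hypothetical, reduced stand-ins.
type account struct {
	mu    sync.Mutex
	leafs map[uint64]*leaf
}

type leaf struct {
	mu     sync.Mutex
	id     uint64
	closed bool
}

func newAccount() *account { return &account{leafs: make(map[uint64]*leaf)} }

// close marks the connection closed, as a read loop error would.
func (l *leaf) close() {
	l.mu.Lock()
	l.closed = true
	l.mu.Unlock()
}

func (l *leaf) isClosed() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.closed
}

// register adds the leaf to the account, then checks whether the
// connection was closed in the meantime; if so it undoes the
// registration and reports false.
func (a *account) register(l *leaf) bool {
	a.mu.Lock()
	a.leafs[l.id] = l
	a.mu.Unlock()

	if l.isClosed() {
		a.mu.Lock()
		delete(a.leafs, l.id)
		a.mu.Unlock()
		return false
	}
	return true
}

func main() {
	a := newAccount()
	dead := &leaf{id: 1}
	dead.close() // simulates the read loop failing before registration
	fmt.Println(a.register(dead), len(a.leafs))         // false 0
	fmt.Println(a.register(&leaf{id: 2}), len(a.leafs)) // true 1
}
```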
Derek Collison	4dee03b587	Allow mixed TLS and non-TLS on same port Signed-off-by: Derek Collison <derek@nats.io>	2020-06-05 18:04:11 -07:00
Ivan Kozlovic	25bd5ca352	[FIXED] Unsubscribe may not be propagated through a leaf node There is a race between the processing of a subscription and the init/send of subscriptions when accepting a leaf node connection that may cause internally a subscription's subject to be counted many times, which would then prevent the send of an LS- when the subscription's interest goes away. Imagine this sequence of events, each side represents a "thread" of execution:
```
client readLoop                         leaf node readLoop
----------------------------------------------------------
recv SUB foo 1
sub added to account's sublist

                                        recv CONNECT
                                        auth, added to acc.

updateSmap
smap["foo"]++ -> 1
no LS+ because !allSubsSent

                                        init smap
                                        finds sub in acc sl
                                        smap["foo"]++ -> 2
                                        sends LS+ foo
                                        allSubsSent == true

recv UNSUB 1
updateSmap
smap["foo"]-- -> 1
no LS- because count != 0
----------------------------------------------------------
```
Equivalent result but with slightly different execution:
```
client readLoop                         leaf node readLoop
----------------------------------------------------------
recv SUB foo 1
sub added to account's sublist

                                        recv CONNECT
                                        auth, added to acc.
                                        init smap
                                        finds sub in acc sl
                                        smap["foo"]++ -> 1
                                        sends LS+ foo
                                        allSubsSent == true

updateSmap
smap["foo"]++ -> 2
no LS+ because count != 1

recv UNSUB 1
updateSmap
smap["foo"]-- -> 1
no LS- because count != 0
----------------------------------------------------------
```
The approach for the fix is to delay the creation of the smap until we actually initialize the map and send the subs on processing of the CONNECT. In the meantime, as soon as the LN connection is registered and available in updateSmap, we check whether smap is nil or not. If nil, we do nothing. In "init smap" we keep track of the subscriptions that have been added to smap. This map will be short lived, just enough to protect against the races above. In updateSmap, when smap is not nil, we need to check, if we are adding, that the subscription has not already been handled. The temporary subscription map will be ultimately emptied/set to nil with the use of a timer (if not emptied in place when processing smap updates). Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-06-05 10:07:15 -06:00
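A simplified, single-threaded sketch of the counting scheme described in 25bd5ca352. The names (`leafState`, `tsub`, `sent`) are hypothetical, and locking and the cleanup timer are left out. The point shown is that `updateSmap` does nothing while `smap` is nil and skips an add for a subscription that `initSmap` already counted.

```go
package main

import "fmt"

// sub is a hypothetical subscription record.
type sub struct{ subject string }

// leafState holds the per-connection interest map.
type leafState struct {
	smap map[string]int    // nil until initSmap runs
	tsub map[*sub]struct{} // subs already counted by initSmap
	sent []string          // protocol lines emitted, for illustration
}

// initSmap builds smap from the subs currently in the account and
// remembers which ones it counted.
func (l *leafState) initSmap(existing []*sub) {
	l.smap = make(map[string]int)
	l.tsub = make(map[*sub]struct{})
	for _, s := range existing {
		l.tsub[s] = struct{}{}
		l.smap[s.subject]++
		if l.smap[s.subject] == 1 {
			l.sent = append(l.sent, "LS+ "+s.subject)
		}
	}
}

// updateSmap applies +1 or -1 for a subscription.
func (l *leafState) updateSmap(s *sub, delta int) {
	if l.smap == nil {
		return // initSmap will pick this sub up later
	}
	if delta > 0 {
		if _, done := l.tsub[s]; done {
			delete(l.tsub, s) // already counted by initSmap
			return
		}
	}
	prev := l.smap[s.subject]
	n := prev + delta
	if n <= 0 {
		delete(l.smap, s.subject)
		if prev > 0 {
			l.sent = append(l.sent, "LS- "+s.subject)
		}
		return
	}
	l.smap[s.subject] = n
	if prev == 0 {
		l.sent = append(l.sent, "LS+ "+s.subject)
	}
}

// scenario replays one of the two interleavings from the commit message.
func scenario(updateBeforeInit bool) []string {
	l, s := &leafState{}, &sub{subject: "foo"}
	if updateBeforeInit {
		l.updateSmap(s, 1)
		l.initSmap([]*sub{s})
	} else {
		l.initSmap([]*sub{s})
		l.updateSmap(s, 1)
	}
	l.updateSmap(s, -1) // UNSUB
	return l.sent
}

func main() {
	fmt.Println(scenario(true))  // [LS+ foo LS- foo]
	fmt.Println(scenario(false)) // [LS+ foo LS- foo]
}
```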
Ivan Kozlovic	8f05bc5c46	[FIXED] Possible stall on shutdown with leafnode setup If a leafnode connection is accepted but the server is shutdown before the connection is fully registered, the shutdown would stall because read and write loop go routine would not be stopped. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-05-22 15:26:04 -06:00
Derek Collison	99d1e56aac	Don't send updates to leafnodes before all subs on init Signed-off-by: Derek Collison <derek@nats.io>	2020-05-19 14:33:56 -07:00
Derek Collison	915e3cd74e	Header support for Leafnodes Signed-off-by: Derek Collison <derek@nats.io>	2020-05-19 14:33:56 -07:00
Derek Collison	019c105ca7	Updates based on feedback, more tests, few bug fixes Signed-off-by: Derek Collison <derek@nats.io>	2020-05-19 14:33:06 -07:00
Derek Collison	f5ceab339a	Server support for headers between routes Signed-off-by: Derek Collison <derek@nats.io>	2020-05-19 14:33:06 -07:00
Derek Collison	ea5e5bd364	Services rewrite #2 This contains a rewrite to the services layer for exporting and importing. The code this merges to already had a first significant rewrite that moved from special interest processing to plain subscriptions. This code changes the prior version's dealing with reverse mapping which was based mostly on thresholds and manual pruning, with some sporadic timer usage. This version uses the jetstream branch's code that understands interest and failed deliveries. So this code is much more tuned to reacting to interest changes. It also removes thresholds and goes only by interest changes or expirations based around a new service export property, response thresholds. This allows a service provider to provide semantics on how long a response should take at a maximum. This commit also introduces formal support for service export streamed and chunked response types, which send an empty message to signify EOF. This commit also includes additions to the service latency tracking such that errors are now sent, not only successful interactions. We have added a Status field and an optional Error field to ServiceLatency. We support the following Status codes, these are directly from HTTP: 400 Bad Request (request did not have a reply subject); 408 Request Timeout (when system detects request interest went away, old request style to make dependable); 503 Service Unavailable (no service responders running); 504 Service Timeout (the new response threshold expired). Signed-off-by: Derek Collison <derek@nats.io>	2020-05-19 14:26:46 -07:00
Derek Collison	df774e44b0	Rework how service imports are handled to avoid performance hits Signed-off-by: Derek Collison <derek@nats.io>	2020-05-19 14:18:34 -07:00
Derek Collison	8d1f3cc7c2	Allow JetStream consumers to work across multi-server hops Signed-off-by: Derek Collison <derek@nats.io>	2020-05-19 14:16:03 -07:00
Derek Collison	685efc36df	Allow JS to work over leafnodes for streams Signed-off-by: Derek Collison <derek@nats.io>	2020-05-19 14:16:03 -07:00
Derek Collison	aff10aa16b	Fix for #1344 Signed-off-by: Derek Collison <derek@nats.io>	2020-04-14 09:26:35 -07:00
Derek Collison	dc55356096	Have events look at whether or not a leaf is a hub, regardless of solicit Signed-off-by: Derek Collison <derek@nats.io>	2020-04-13 15:25:21 -07:00
Derek Collison	6fa7f1ce82	Have hub role sent to accepting side and adapt to be a spoke Signed-off-by: Derek Collison <derek@nats.io>	2020-04-13 15:18:42 -07:00
Derek Collison	2b1fe8f261	Merge pull request #1337 from nats-io/service-account-leaf-test [FIXED] Service across accounts and leaf nodes	2020-04-10 17:38:07 -07:00
Derek Collison	ef85a1b836	Fix for #1336 Signed-off-by: Derek Collison <derek@nats.io>	2020-04-10 17:30:03 -07:00


113 Commits