nats-server

mirror of https://github.com/gogrlx/nats-server.git synced 2026-04-02 11:48:43 -07:00

Author	SHA1	Message	Date
Ivan Kozlovic	61cccbce02	[FIXED] LeafNode solicit failure race could leave conn registered This was found due to a recent test that was flapping. The test was not checking the correct server for leafnode connection, but that uncovered the following bug: When a leafnode connection is solicited, the read/write loops are started. Then, the connection lock is released and several functions invoked to register the connection with an account and add to the connection leafs map. The problem is that the readloop (for instance) could get a read error and close the connection before the above said code executes, which would lead to a connection incorrectly registered. This could be fixed either by delaying the start of read/write loops after the registration is done, or like in this PR, check the connection close status after registration, and if closed, manually undoing the registration with account/leafs map. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-06-12 16:01:13 -06:00
Derek Collison	4dee03b587	Allow mixed TLS and non-TLS on same port Signed-off-by: Derek Collison <derek@nats.io>	2020-06-05 18:04:11 -07:00
Ivan Kozlovic	25bd5ca352	[FIXED] Unsubscribe may not be propagated through a leaf node There is a race between the processing of a subscription and the init/send of subscriptions when accepting a leaf node connection that may cause internally a subscription's subject to be counted many times, which would then prevent the send of an LS- when the subscription's interest goes away. Imagine this sequence of events, each side represents a "thread" of execution: ``` client readLoop leaf node readLoop ---------------------------------------------------------- recv SUB foo 1 sub added to account's sublist recv CONNECT auth, added to acc. updateSmap smap["foo"]++ -> 1 no LS+ because !allSubsSent init smap finds sub in acc sl smap["foo"]++ -> 2 sends LS+ foo allSubsSent == true recv UNSUB 1 updateSmap smap["foo"]-- -> 1 no LS- because count != 0 ---------------------------------------------------------- ``` Equivalent result but with slightly different execution: ``` client readLoop leaf node readLoop ---------------------------------------------------------- recv SUB foo 1 sub added to account's sublist recv CONNECT auth, added to acc. init smap finds sub in acc sl smap["foo"]++ -> 1 sends LS+ foo allSubsSent == true updateSmap smap["foo"]++ -> 2 no LS+ because count != 1 recv UNSUB 1 updateSmap smap["foo"]-- -> 1 no LS- because count != 0 ---------------------------------------------------------- ``` The approach for the fix is to delay the creation of the smap until we actually initialize the map and send the subs on processing of the CONNECT. In the meantime, as soon as the LN connection is registered and available in updateSmap, we check whether smap is nil or not. If nil, we do nothing. In "init smap" we keep track of the subscriptions that have been added to smap. This map will be short lived, just enough to protect against races above. In updateSmap, when smap is not nil, we need to check, if we are adding, that the subscription has not already been handled. 
The temporary subscription map will be ultimately emptied/set to nil with the use of a timer (if not emptied in place when processing smap updates). Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-06-05 10:07:15 -06:00
Ivan Kozlovic	8f05bc5c46	[FIXED] Possible stall on shutdown with leafnode setup If a leafnode connection is accepted but the server is shutdown before the connection is fully registered, the shutdown would stall because read and write loop go routine would not be stopped. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-05-22 15:26:04 -06:00
Derek Collison	99d1e56aac	Don't send updates to leafnodes before all subs on init Signed-off-by: Derek Collison <derek@nats.io>	2020-05-19 14:33:56 -07:00
Derek Collison	915e3cd74e	Header support for Leafnodes Signed-off-by: Derek Collison <derek@nats.io>	2020-05-19 14:33:56 -07:00
Derek Collison	019c105ca7	Updates based on feedback, more tests, few bug fixes Signed-off-by: Derek Collison <derek@nats.io>	2020-05-19 14:33:06 -07:00
Derek Collison	f5ceab339a	Server support for headers between routes Signed-off-by: Derek Collison <derek@nats.io>	2020-05-19 14:33:06 -07:00
Derek Collison	ea5e5bd364	Services rewrite #2 This contains a rewrite to the services layer for exporting and importing. The code this merges to already had a first significant rewrite that moved from special interest processing to plain subscriptions. This code changes the prior version's dealing with reverse mapping which was based mostly on thresholds and manual pruning, with some sporadic timer usage. This version uses the jetstream branch's code that understands interest and failed deliveries. So this code is much more tuned to reacting to interest changes. It also removes thresholds and goes only by interest changes or expirations based around a new service export property, response thresholds. This allows a service provider to provide semantics on how long a response should take at a maximum. This commit also introduces formal support for service export streamed and chunked response types, which send an empty message to signify EOF. This commit also includes additions to the service latency tracking such that errors are now sent, not only successful interactions. We have added a Status field and an optional Error field to ServiceLatency. We support the following Status codes, these are directly from HTTP. 400 Bad Request (request did not have a reply subject) 408 Request Timeout (when system detects request interest went away, old request style to make dependable). 503 Service Unavailable (no service responders running) 504 Service Timeout (The new response threshold expired) Signed-off-by: Derek Collison <derek@nats.io>	2020-05-19 14:26:46 -07:00
Derek Collison	df774e44b0	Rework how service imports are handled to avoid performance hits Signed-off-by: Derek Collison <derek@nats.io>	2020-05-19 14:18:34 -07:00
Derek Collison	8d1f3cc7c2	Allow JetStream consumers to work across multi-server hops Signed-off-by: Derek Collison <derek@nats.io>	2020-05-19 14:16:03 -07:00
Derek Collison	685efc36df	Allow JS to work over leafnodes for streams Signed-off-by: Derek Collison <derek@nats.io>	2020-05-19 14:16:03 -07:00
Derek Collison	aff10aa16b	Fix for #1344 Signed-off-by: Derek Collison <derek@nats.io>	2020-04-14 09:26:35 -07:00
Derek Collison	dc55356096	Have events look at whether or not a leaf is a hub, regardless of solicit Signed-off-by: Derek Collison <derek@nats.io>	2020-04-13 15:25:21 -07:00
Derek Collison	6fa7f1ce82	Have hub role sent to accepting side and adapt to be a spoke Signed-off-by: Derek Collison <derek@nats.io>	2020-04-13 15:18:42 -07:00
Derek Collison	2b1fe8f261	Merge pull request #1337 from nats-io/service-account-leaf-test [FIXED] Service across accounts and leaf nodes	2020-04-10 17:38:07 -07:00
Derek Collison	ef85a1b836	Fix for #1336 Signed-off-by: Derek Collison <derek@nats.io>	2020-04-10 17:30:03 -07:00
Ivan Kozlovic	b200368e52	LeafNode: delay connect even when loop detected by accepting side If the loop is detected by a server accepting the leafnode connection, an error is sent back and connection is closed. This change ensures that the server checks an -ERR for "Loop detected" and then sets the connect delay, so that it does not try to reconnect right away. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-04-10 16:44:16 -06:00
Derek Collison	84841a35bb	Merge pull request #1334 from nats-io/leafnode_bug Fix for bug when requestor and leafnode on same server.	2020-04-10 08:51:31 -07:00
Derek Collison	e843a27bba	When a responder was on a leaf node and the requestor was connected to the same server as the leafnode we did not propagate the service reply wildcard properly. This fixes that. Signed-off-by: Derek Collison <derek@nats.io>	2020-04-10 08:35:09 -07:00
Ivan Kozlovic	34eb5bda31	[ADDED] Deny import/export options for LeafNode remote configuration This will allow a leafnode remote connection to prevent unwanted messages from being received, or prevent local messages from being sent to the remote server. Configuration will be something like: ``` leafnodes { remotes: [ { url: "nats://localhost:6222" deny_imports: ["foo.", "bar"] deny_exports: ["baz.", "bat"] } ] } ``` Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-04-09 18:55:44 -06:00
Derek Collison	d70426d3f6	Do prefix test to avoid read lock if possible Signed-off-by: Derek Collison <derek@nats.io>	2020-04-09 08:48:04 -07:00
Derek Collison	699502de8f	Detection for loops with leafnodes. We need to send the unique LDS subject to all leafnodes to properly detect setups like triangles. This will have the server who completes the loop be the one that detects the error solely based on its own loop detection subject. Other changes are just to fix tests that were not waiting for the new LDS sub. Signed-off-by: Derek Collison <derek@nats.io>	2020-04-08 20:00:40 -07:00
Ivan Kozlovic	76e8e1c9b0	[ADDED] Leafnode remote's Hub option This allows a node that creates a remote LeafNode connection to act as if it were the hub (of the hub and spoke topology). This is related to subscription interest propagation. Normally, a spoke (the one creating the remote LN connection) will forward only its local subscriptions and when receiving subscription interest would not try to forward to local cluster and/or gateways. If a remote has the Hub boolean set to true, even though the node is the one creating the remote LN connection, it will behave as if it was accepting that connection. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-04-07 13:42:55 -06:00
Matthias Hanel	6f77a54118	[FIXED] loop detection by checking for duplicate lds subscriptions This is in addition to checking if the own subscription comes back. The duplicated lds subscription must come from a different client. Added unit tests. Also prefixed lds with '$' to mark it as system subject going forward. This moves the loop detection check past other checks. These checks should not trigger in cases where a loop is initially detected. Fixes #1305 Signed-off-by: Matthias Hanel <mh@synadia.com>	2020-03-17 19:06:35 -04:00
Ivan Kozlovic	cbc0e5848a	Merge pull request #1300 from nats-io/reload [FIXED] trace/debug/sys_log reload will affect existing clients	2020-03-09 09:48:24 -06:00
Matthias Hanel	8a74add60b	Include port in trace Signed-off-by: Matthias Hanel <mh@synadia.com>	2020-03-06 15:44:32 -05:00
Matthias Hanel	6a1c3fc29b	Moving inbound tracing to the caller (client.parse) Tracing for outgoing operations is always done while holding the client lock. Signed-off-by: Matthias Hanel <mh@synadia.com>	2020-03-04 17:31:18 -05:00
Matthias Hanel	f5bd07b36c	[FIXED] trace/debug/sys_log reload will affect existing clients Fixed #1296, by altering client state on reload Detect a trace level change on reload and update all clients. To avoid data races, read client.trace while holding the lock, pass the value into functions that trace while not holding the lock. Delete unused client.debug. Signed-off-by: Matthias Hanel <mh@synadia.com>	2020-03-04 13:54:15 -05:00
Ivan Kozlovic	27ae160f75	Use CID and LeafNodeURLs as an indicator connected to proper port First, the test should be done only for the initial INFO and only for solicited connections. Based on the content of INFO coming from different "listen ports", use the CID and LeafNodeURLs for the indication that we are connected to the proper port. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-01-29 14:43:41 -07:00
Waldemar Quevedo	ecb5008fe3	Add check prevent leafnode connecting to client port Signed-off-by: Waldemar Quevedo <wally@synadia.com>	2020-01-28 12:43:27 -08:00
Ivan Kozlovic	47b08335a4	[FIXED] Reset of tlsName only for x509.HostnameError For issue #1256, we cleared the possibly saved tlsName on handshake failure. However, this meant that for normal use cases, if a reconnect failed for any reason we would not be able to reconnect if it is an IP until we get back to the URL that contained the hostname. We now clear only if the handshake error is of x509.HostnameError type, which includes errors such as: ``` "x509: Common Name is not a valid hostname: <x>" "x509: cannot validate certificate for <x> because it doesn't contain any IP SANs" "x509: certificate is not valid for any names, but wanted to match <x>" "x509: certificate is valid for <x>, not <y>" ``` Applied the same logic to solicited gateway connections, and fixed the fact that the tlsConfig should be cloned (since we set the ServerName). I have also made a change for leafnode connections similar to what we are doing for gateway connections, which is to use the saved tlsName only if tlsConfig.ServerName is empty, which may not be the case for users that embed NATS Server and pass directly tls configuration. In other words, if the option TLSConfig.ServerName is not empty, always use this value. Relates to #1256 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-01-28 13:16:38 -07:00
Derek Collison	643e73c0c5	Fix for #1256 , mixed IP and DNS for cluster and TLS with leafnodes Signed-off-by: Derek Collison <derek@nats.io>	2020-01-22 11:25:09 -08:00
Ivan Kozlovic	c73be88ac0	Updated based on comments Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-01-06 16:57:48 -07:00
Ivan Kozlovic	947798231b	[UPDATED] TCP Write and SlowConsumer handling - All writes will now be done by the writeLoop, unless when the writeLoop has not been started yet (likely in connection init). - Slow consumers for non CLIENT connections will be reported but not failed. The idea is that routes, gateway, etc.. connections should stay connected as much as possible. However if a flush operation times out and no data at all has been written, the connection will be closed (regardless of type). - Slow consumers due to max pending is only for CLIENT connections. This allows sending of SUBs through routes, etc.. to not have to be chunked. - The backpressure to CLIENT connections is increased (up to 1sec) based on the sub's connection pending bytes level. - Connection is flushed on close from the writeLoop as to not block the "fast path". Some tests have been fixed and adapted since now closeConnection() is not flushing/closing/removing connection in place. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-12-31 15:06:27 -07:00
Derek Collison	07253c0517	Merge pull request #1196 from nats-io/daisy Allow interest propagation with daisy-chained leafnodes	2019-11-17 17:46:23 -08:00
Derek Collison	07da68ce56	Allow interest propagation with daisy chained leafnodes Signed-off-by: Derek Collison <derek@nats.io>	2019-11-17 17:35:20 -08:00
Ivan Kozlovic	e0bc81d0ed	Make the Leafnode internal sub on _GR_.> This is needed for mapped gateway replies. We had used an extra token when implementing the new prefix, but it was then removed, but the leafnode subscription on _GR_...*.> was not updated. We now subscribe on _GR_.> There was a test that was passing because we were using inboxes that caused the pattern to match. Replaced with single token reply so that it would have caught this bug. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-11-17 17:37:09 -07:00
Ivan Kozlovic	aa843945c9	Work on Gateways reply mapping - New prefix that includes origin server for the request - Mapping done if request is service import or requestor has recent subscription - Subscription considered recent if less than 250ms - Destination server strip GW prefix before giving to client and restore when getting a reply on that subject - Mapping removed after 250ms - Server rejects client publish on "$GNR." (the new prefix) - Cluster and server hash are now 8 chars long and from base 62 alphabets - Mapped replies need to be sent to leafnode servers due to race (cluster B sends RS+ on GW inbound then RMSG on outbound, the RS+ may be processed later and cluster A may have given message to LN before RS+ on reply subject. So LN needs to accept the mapped reply but will strip to give to client and reassemble before sending it back) Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-11-06 16:06:49 -07:00
Ivan Kozlovic	cbbc21ac25	Some update to leafnode subscription handling - Send all subs in place if smap is small - Skip sending update until after sendAllLeafSubs() is done Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-10-30 20:01:49 -06:00
Ivan Kozlovic	51f83220c6	Fix race introduced in #1170 Code for leafnode loop detection had a data race. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-10-29 19:09:21 -06:00
Ivan Kozlovic	6bcb717722	Updates following code review - Make "lds." a constant - Create remote's get/reset functions for loop delay - Bump loop delay to 30 seconds Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-10-29 17:59:15 -06:00
Ivan Kozlovic	279cab2aaf	[FIXED] Detect loop between LeafNode servers This is achieved by subscribing to a unique subject. If the LS+ protocol is coming back for the same subject on the same account, then this indicates a loop. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-10-29 16:14:35 -06:00
Ivan Kozlovic	18a1702ba2	[ADDED] Basic auth for leafnodes Added a way to specify which account an accepted leafnode connection should be bound to when using simple auth (user/password). Singleton: ``` leafnodes { port: ... authorization { user: leaf password: secret account: TheAccount } } ``` With above configuration, if a soliciting server creates a LN connection with url: `nats://leaf:secret@host:port`, then the accepting server will bind the leafnode connection to the account "TheAccount". This account needs to exist otherwise the connection will be rejected. Multi: ``` leafnodes { port: ... authorization { users = [ {user: leaf1, password: secret, account: account1} {user: leaf2, password: secret, account: account2} ] } } ``` With the above, if a server connects using `leaf1:secret@host:port`, then the accepting server will bind the connection to account `account1`. If user/password (either singleton or multi) is defined, then the connecting server MUST provide the proper credentials otherwise the connection will be rejected. If no user/password info is provided, it is still possible to provide the account the connection should be associated with: ``` leafnodes { port: ... authorization { account: TheAccount } } ``` With the above, a connection without credentials will be bound to the account "TheAccount". If credentials are used (jwt, nkey or other), then the server will attempt to authenticate and if successful associate to the account for that specific user. If the user authentication fails (wrong password, no such user, etc..) the connection will also be rejected. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-09-30 19:42:11 -06:00
Derek Collison	52430c304a	System level services for debugging. This is the first pass at introducing exported services to the system account for generally debugging of blackbox systems. The first service reports number of subscribers for a given subject. The payload of the request is the subject, and optional queue group, and can contain wildcards. Signed-off-by: Derek Collison <derek@nats.io>	2019-09-17 09:37:35 -07:00
Ivan Kozlovic	cd9f898eb0	Made a server's helper to set first ping timer Defaults to 1sec but will be opts.PingInterval if value is lower. All non client connections invoked this function for the first PING. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-08-26 10:21:43 -06:00
Ivan Kozlovic	90d592e163	Leaf and Route RTT When a leaf or route connection is created, set the first ping timer to fire at 1sec, which will allow to compute the RTT reasonably soon (since the PingInterval could be user configured and set much higher). For Route in PR #1101, I was sending the PING on receiving the INFO which required changing bunch of tests. Changing that to also use the first timer interval of 1sec and reverted changes to route tests. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-08-26 09:34:17 -06:00
Ivan Kozlovic	7ca8723942	[FIXED] Some Leafnode issues - On startup, verify that local account in leafnode (if specified) can be found, otherwise fail startup. - At runtime, print error and continue trying to reconnect. Will need to decide a better approach. - When using basic auth (user/password), it was possible for a solicited Leafnode connection to not use user/password when trying an URL that was discovered through gossip. The server now saves the credentials of a configured URL to use with the discovered ones. Updated RouteRTT test in case RTT does not seem to be updated because getting always the same value. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-08-23 14:08:07 -06:00
Derek Collison	8f5bc503e5	Add ability for cross account import services to return streams as well as singletons. Take into account tracking of response maps that are created and do proper cleanup. Also fixes #1089 which was discovered while working on this. Signed-off-by: Derek Collison <derek@nats.io>	2019-08-06 14:15:40 -07:00
Ivan Kozlovic	0873b46f67	[FIXED] LeafNode urls may be missing in INFO sent to LN connections When a cluster of servers are having routes to each other, there is a chance that the list of leafnode URLs maintained on each server is not complete. This would result in LN servers connecting to this cluster to not get the full list of possible URLs the server could reconnect to. Also fixed a DATA RACE that appeared when running the updated TestLeafNodeInfoURLs test. Fixed the race and added specific test that easily demonstrated the race: TestLeafNodeNoRaceGeneratingNonce Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-07-12 19:15:30 -06:00

80 Commits