nats-server

mirror of https://github.com/gogrlx/nats-server.git synced 2026-04-16 19:14:41 -07:00

Author	SHA1	Message	Date
Derek Collison	82f585d83a	Updated to also resend leafnode connect on GW connect via first INFO Signed-off-by: Derek Collison <derek@nats.io>	2020-04-08 09:55:19 -07:00
Matthias Hanel	6a1c3fc29b	Moving inbound tracing to the caller (client.parse) Tracing for outgoing operations is always done while holding the client lock. Signed-off-by: Matthias Hanel <mh@synadia.com>	2020-03-04 17:31:18 -05:00
Matthias Hanel	fe373ac597	Incorporating comments. c -> client defer in oneliner argument order Signed-off-by: Matthias Hanel <mh@synadia.com>	2020-03-04 15:48:19 -05:00
Matthias Hanel	f5bd07b36c	[FIXED] trace/debug/sys_log reload will affect existing clients Fixed #1296, by altering client state on reload Detect a trace level change on reload and update all clients. To avoid data races, read client.trace while holding the lock, pass the value into functions that trace while not holding the lock. Delete unused client.debug. Signed-off-by: Matthias Hanel <mh@synadia.com>	2020-03-04 13:54:15 -05:00
Ivan Kozlovic	47b08335a4	[FIXED] Reset of tlsName only for x509.HostnameError For issue #1256, we cleared the possibly saved tlsName on handshake failure. However, this meant that for normal use cases, if a reconnect failed for any reason we would not be able to reconnect if it is an IP until we get back to the URL that contained the hostname. We now clear only if the handshake error is of x509.HostnameError type, which includes errors such as: ``` "x509: Common Name is not a valid hostname: <x>" "x509: cannot validate certificate for <x> because it doesn't contain any IP SANs" "x509: certificate is not valid for any names, but wanted to match <x>" "x509: certificate is valid for <x>, not <y>" ``` Applied the same logic to solicited gateway connections, and fixed the fact that the tlsConfig should be cloned (since we set the ServerName). I have also made a change for leafnode connections similar to what we are doing for gateway connections, which is to use the saved tlsName only if tlsConfig.ServerName is empty, which may not be the case for users that embed NATS Server and pass directly tls configuration. In other words, if the option TLSConfig.ServerName is not empty, always use this value. Relates to #1256 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-01-28 13:16:38 -07:00
Ivan Kozlovic	c097357b52	[FIXED] More than expected switch to Interest-Only mode for account When an account is switched to interest-only mode due to no interest, it was not possible to switch that account more than once. But the function switchAccountToInterestMode() that triggers a switch could possibly do it more than once. This should not cause problems but increased the number of traces in a big super cluster. Also fixed some flappers and a data race. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-01-09 13:35:08 -07:00
Ivan Kozlovic	c73be88ac0	Updated based on comments Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-01-06 16:57:48 -07:00
Ivan Kozlovic	947798231b	[UPDATED] TCP Write and SlowConsumer handling - All writes will now be done by the writeLoop, unless when the writeLoop has not been started yet (likely in connection init). - Slow consumers for non CLIENT connections will be reported but not failed. The idea is that routes, gateway, etc.. connections should stay connected as much as possible. However if a flush operation times out and no data at all has been written, the connection will be closed (regardless of type). - Slow consumers due to max pending is only for CLIENT connections. This allows sending of SUBs through routes, etc.. to not have to be chunked. - The backpressure to CLIENT connections is increased (up to 1sec) based on the sub's connection pending bytes level. - Connection is flushed on close from the writeLoop as to not block the "fast path". Some tests have been fixed and adapted since now closeConnection() is not flushing/closing/removing connection in place. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-12-31 15:06:27 -07:00
Ivan Kozlovic	a22da91647	[FIXED] Closing of Gateway or Route TLS connection may hang This could happen if the remote server is running but not dequeueing from the socket. TLS connection Close() may send/read and so we need to protect with a deadline. For non client/leaf connection, do not call flushOutbound(). Set the write deadline regardless of handshakeComplete flag, and set it to a low value. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-12-04 17:27:00 -07:00
Ivan Kozlovic	a0f8bd112e	[FIXED] Prevent A- for account that has service reply subscription Prevent sending an A- for a given account if the server has this account registered and an internal service reply subscription. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-11-26 16:21:36 -07:00
Derek Collison	b2cbde2616	Match comment about hash size Signed-off-by: Derek Collison <derek@nats.io>	2019-11-16 17:56:06 -08:00
Ivan Kozlovic	9b837813b1	Process service replies in gateway inbound Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-11-16 17:43:44 -07:00
Derek Collison	747ba1dc09	Change , remove T placeholder, 8 to 6 on hash len Signed-off-by: Derek Collison <derek@nats.io>	2019-11-16 13:06:56 -08:00
Derek Collison	6ad8287bbe	Introduced wildcard handling of _R_ mapped replies. We had too much special processing, so reduced to a single wildcard which will propagate across routes and gateways and is consistent with gateway handling of globally routed subjects and timeouts. Signed-off-by: Derek Collison <derek@nats.io>	2019-11-16 12:50:53 -08:00
Ivan Kozlovic	d046f7945f	Bump defaultGatewayRecentSubExpiration and RC2 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-11-15 10:06:38 -07:00
Ivan Kozlovic	b561bde366	Alternate approach to GW reply mapping expiration Use centralized sync map to gather *client that have GW replies. Tested with concurrent receiving clients and perf is as good as with timer per client but reduces need of that timer per client object. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-11-11 13:36:24 -07:00
Ivan Kozlovic	8a8695d07c	Backward compatibility with previous servers Want to keep this commit separate so that we can easily remove when we no longer want to support both prefixes. - If this server receives a "$GR." message, it takes the subject and tries to process this locally. If there is no cluster race reply may be received ok (like before). - If this server sends a routed reply, it detects if sending to an older server (then uses $GR.) or not (then uses $GNR) - Gateway INFO has a new field that indicates if the server is using the new prefix. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-11-08 16:22:34 -07:00
Ivan Kozlovic	9b7dab0548	Updates based on code review - Add atomic in client to skip check in processInboundClientMsg() if value is 0. Avoids getting the lock in fast path if not needed. - Have a timer per client instead of the global server list that was expiring: noticed a lot of contention there when running some perf/profiling tests. The timer is also not reset for every timestamp that is not yet expired since this too affects performance. Instead, it fires at regular intervals and is cleared when the map is empty after a cycle. - Move processing of gw map reply to its own function (in inbound msg). I have verified that this is inlined same way as when code was directly in processInboundClientMsg. - Use string(subj[]) for prefix detection: I have verified that it is actually faster. - Builds the RMSG with appends to local buffer in handleGatewayReply() instead of using fmt.Sprintf(). Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-11-08 15:56:28 -07:00
Ivan Kozlovic	aa843945c9	Work on Gateways reply mapping - New prefix that includes origin server for the request - Mapping done if request is service import or requestor has recent subscription - Subscription considered recent if less than 250ms - Destination server strip GW prefix before giving to client and restore when getting a reply on that subject - Mapping removed after 250ms - Server rejects client publish on "$GNR." (the new prefix) - Cluster and server hash are now 8 chars long and from base 62 alphabets - Mapped replies need to be sent to leafnode servers due to race (cluster B sends RS+ on GW inbound then RMSG on outbound, the RS+ may be processed later and cluster A may have given message to LN before RS+ on reply subject. So LN needs to accept the mapped reply but will strip to give to client and reassemble before sending it back) Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-11-06 16:06:49 -07:00
Ivan Kozlovic	75ec78c232	[FIXED] Explicit gateway not using discovered URLs If cluster A configures a gateway to cluster B, the server on A tries to connect to that server URL. If there is no server on B at that address, but a server on B with different address connects to server on cluster A, that server should be able to create its outbound connection in response. That was not the case because the configured URLs were snapshot before the loop of trying to connect. When accepting an inbound connection and updating the array, this new URL was not being used. The issue is only if the server on A had no outbound connection at that time. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-10-24 16:40:38 -06:00
Derek Collison	94f143ccce	Latency tracking updates. Will now breakout the internal NATS latency to show requestor client RTT, responder client RTT and any internal latency caused by hopping between servers, etc. Signed-off-by: Derek Collison <derek@nats.io>	2019-09-11 16:43:19 -07:00
Ivan Kozlovic	cd9f898eb0	Made a server's helper to set first ping timer Defaults to 1sec but will be opts.PingInterval if value is lower. All non client connections invoked this function for the first PING. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-08-26 10:21:43 -06:00
Ivan Kozlovic	c20afd4016	[FIXED] Connection could be closed twice This was introduced in PR#930. The first commit had the route's check if the flushOutbound() returned false, and if so would locally unlock/lock the connection's lock. Unfortunately, this was replaced in the second commit (`a6aeed3a6b`) to the flushOutbound() function itself. This causes the function closeConnection() to possibly unlock the connection while calling flushOutbound(), which if the connection is closed due to both a tls timeout for instance and explicitly, it would result in the connection being scheduled for a reconnect (if explicit gateway connection, possibly route). Added defensive code in Gateway to register a unique outbound gateway. Fixed a test that was now failing with newer Go version in which they fixed url.Parse() Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-08-13 20:11:03 -06:00
Ivan Kozlovic	0a72993d80	Add warning for TLS insecure setting on LeafNodes Also fix for #1071 in that we need to check remote gateways TLS config even if main gateway section is not configured with TLS. Related to #1071 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-07-12 17:22:57 -06:00
Ivan Kozlovic	9e09486e26	Use all caps for the production message Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-07-12 13:44:01 -06:00
Ivan Kozlovic	37d08a6c56	[FIXED] Allow TLS InsecureSkipVerify again This has an effect only on connections created by the server, so routes and gateways (explicit and implicit). Make sure that an explicit warning is printed if the insecure property is set, but otherwise allow it. Resolves #1062 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-07-12 12:10:28 -06:00
Derek Collison	8168aa1f81	Allow sublist cache to be disabled globally Signed-off-by: Derek Collison <derek@nats.io>	2019-07-02 07:34:02 -07:00
Derek Collison	d246359dc8	Merge pull request #1028 from nats-io/leaf_gw_si Bug fix for service import with leafnodes and gws	2019-05-31 11:29:33 -07:00
Derek Collison	3cf6f6a5d2	Bug fix for service import with leafnodes and gws Signed-off-by: Derek Collison <derek@nats.io>	2019-05-31 11:22:02 -07:00
Ivan Kozlovic	37f4e71246	Fixed race due to use of byte slice instead of string The go routine that is started during interest mode switch was using the accName (which was a byte slice) instead of account, which was a string copy of that byte slice. It meant that when printing the notice, the underlying buffer may have been overwritten by the readloop. Changing accName to a string - since we were doing a copy anyway, better change it at the function param level. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-05-30 18:43:01 -06:00
Ivan Kozlovic	37b3546e7b	Switch gateway to InterestMode only once When a leafnode connection is created, the server forces all gateway inbound connections to switch to InterestMode. Do this only once, regardless of how many times the LN (re)connects. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-05-30 17:21:15 -06:00
Ivan Kozlovic	66f5325cee	Merge pull request #1018 from nats-io/gw_log_interest_switch Added logging of account interest mode switch for gateways	2019-05-28 15:33:06 -06:00
Ivan Kozlovic	f5991e8a2b	Merge pull request #1015 from nats-io/restore_conn_error_default_attempts_to_one Update to connect/reconnect error reports logic	2019-05-28 14:57:29 -06:00
Ivan Kozlovic	2d4c3dd38f	Added logging of account interest mode switch for gateways Both sides will log when an account is switched to interest-only mode. There are 2 traces (start/complete) per account. They are logged at [INF] level. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-05-28 14:55:45 -06:00
Ivan Kozlovic	5478eaf01e	Added /gatewayz endpoint Such endpoint will list the gateway/cluster name, address and port then list of outbound/inbound connections. For each remote gateway there will be at most one outbound connection. There can be 0 or more inbound connections for the same remote gateway. For each of these outbound/inbound connection, the connection info similar to Connz is reported. Optionally, one can include the interest mode/stats for each account. Here are possible options: * No specific options http://host:port/gatewayz * Limit to specific remote gateway, say name "B": http://host:port/gatewayz?gw_name=B * Include accounts (default limit to 1024 accounts) http://host:port/gatewayz?accs=1 * Specific limit, say 200 (note accs=1 in this case is optional) http://host:port/gatewayz?accs=1&accs_limit=200 * Specific account, say "acc_1". Note that accs=1 is not required then http://host:port/gatewayz?acc_name=acc_1 * Above options can be mixed: specific remote gateway (B), with 100 accounts reported http://host:port/gatewayz?gw_name=B&accs_limit=200 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-05-28 12:41:09 -06:00
Ivan Kozlovic	d2578f9e05	Update to connect/reconnect error reports logic Changed the introduced new option and added a new one. The idea is to be able to differentiate between never connected and reconnected event. The never connected situation will be logged at first attempt and every hour (by default, configurable). However, once connected and if trying to reconnect, will report every attempt by default, but this is configurable too. These two options are supported for config reload. Related to #1000 Related to #1001 Resolves #969 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-05-26 17:51:01 -06:00
Ivan Kozlovic	b325cf1e4a	Fixed loss of queue subscription interest across Gateways in some cases Suppose two servers, SA in cluster A and SB in cluster B. If SA sends a message to SB on an account for which there is no interest at all (account not known or no subscription), SB will send an A- and keep track that it sent an A- for this account. When a queue subscription is created on SB, SB will send an RS+ to A because A needs to have perfect knowledge of all queue subs in all clusters. If then a regular subscription is also created on SB, SB will think that it needs to send an A+ because it had sent an A- for this account. However, SA had an entry for this account for the queue sub. The A+ would clear the entry in the map and would cause SA to not send messages to SB even if they would have been a match for the queue sub on SB. We fix this in two ways: - Clear the possible A- in SB when sending an RS+ for queue sub - Processing of A-/A+ to be aware of a possible entry in the map due to queue subs. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-05-25 16:27:00 -06:00
Ivan Kozlovic	55597a7e8b	[ADDED] URLs to cluster{} in /varz and update of gateway ones In varz's cluster{} section, there was no URLs field. This PR adds it and displays the routes defined in the cluster{} config section. The value gets updated should there be a config reload following addition/removal of an url from "routes". If config had 1 route to "nats://127.0.0.1:1234", here is what it would look like now: ``` "cluster": { "addr": "0.0.0.0", "cluster_port": 6222, "auth_timeout": 1, "urls": [ "127.0.0.1:1234" ] }, ``` Adding route to "127.0.0.1:4567" and doing config reload: ``` "cluster": { "addr": "0.0.0.0", "cluster_port": 6222, "auth_timeout": 1, "urls": [ "127.0.0.1:1234", "127.0.0.1:4567" ] }, ``` Note that due to how we handle discovered servers in the cluster, new urls dynamically discovered will not show in above output. This could be done, but would need some changes in how we store things (actually in this case, new urls are not stored, just attempted to be connected. Once they connect, they would be visible in /routez). For gateways, however, this PR displays the combination of the URLs defined in config and the ones that are discovered after a connection is made to a given cluster. So say cluster A has a single url to one server in cluster B, when connecting to that server, the server on A will get the list of the gateway URLs that one can connect to, and these will be reflected in /varz. So this is a different behavior than for routes. As explained above, we could harmonize the behavior in a future PR. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-05-24 13:42:41 -06:00
Ivan Kozlovic	48c3f7f846	Fixed some flappers Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-05-24 09:53:35 -06:00
Ivan Kozlovic	97ee89cc67	Check inbound GW connection connected state in parser If the first protocol for an inbound gateway connection is not CONNECT, reject with auth violation. Fixes #1006 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-05-22 12:31:16 -06:00
Ivan Kozlovic	1cdc3eb41f	Better randomize solicited Gateway URLs Shuffle the array created when iterating through the gateways URLs map since map iteration may not be well randomized with small maps. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-05-21 09:28:59 -06:00
Ivan Kozlovic	7272e4e317	Make the error report attempts configurable This is a continuation of #1000. Added a configuration to specify the number of attempts at which the repeated error is reported. The algo is now to print only the 1st attempt and when current attempt % <this config param> == 0. Resolves #969 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-05-20 16:28:48 -06:00
Ivan Kozlovic	03930ba0e4	[UPDATED] Reduce report of failed connection attempts This applies to routes, gateways and leaf node connections. The failed attempts will be printed at the first, after the first minute and then every hour. The connect/error statements now include the attempt number. Note that in debug mode, all attempts are traced, so you may get double trace (one for debug, one for info/error). Resolves #969 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-05-20 10:13:56 -06:00
Derek Collison	1c8d4b4b6e	Make sure we are set to RMSG for send to Gateways Signed-off-by: Derek Collison <derek@nats.io>	2019-05-01 15:31:54 -07:00
Ivan Kozlovic	dce9d672c1	Fixed panic with leafnode and gateway when no interest registered Say there are 2 clusters, A and B. A client connects to A and publishes messages on an account that B has no interest in. Then a leaf node server connects to B (using the same account as the one B has no interest in). Cluster B will ask cluster A to switch to interest mode only for leaf node account. This would cause a panic. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-05-01 13:40:17 -06:00
Ivan Kozlovic	9f497a6cd4	Revert to use Sublist but use the SublistNoCache version. Remove sub from rsubs sublist when user UNSUBs. Fix bench test that was not actually creating a SUB per request in the Benchmark_Gateways_Requests_CreateOneSubForEach test. Also UNSUBs older SUBs after a certain threshold to simulate actual req/reply. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-04-23 14:13:13 -06:00
Ivan Kozlovic	41436fb787	Updates based on comments Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-04-22 20:00:21 -06:00
Ivan Kozlovic	bb4e8ae0f9	Gateways: Fix race for request reply This addresses the following race: - client connection creates a subscription on a reply subject - client connection sends a request - server sends the subscription to inbound gateway - server sends the message to outbound gateway (those may be to different servers) - receiving server sends to sub interested in request subject - app sends reply - its server then checks for interest on the reply's subject In interestOnly mode, there is a possibility that this server has not received the interest on the reply subject yet and would then drop the reply. This PR detects above scenario and will prefix the reply subject to identify the origin cluster if it is detected that the last subscription from the sending connection was created less than a second ago. Once the destination has this prefix, the destination cluster will always send back that message to origin cluster even if there is no registered interest. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-04-22 20:00:21 -06:00
Ivan Kozlovic	bf07862140	Fixed invocations of startGoRoutine Resolves #960 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-04-18 09:51:56 -06:00
Ivan Kozlovic	d8098c134b	Reduce startup memory for gateways Similar to #956 but for gateways code. Also fixing route test TestLargeClusterMem. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-04-17 15:18:46 -06:00
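
Note on commit 7272e4e317: its message describes the reporting rule as "print only the 1st attempt and when current attempt % <this config param> == 0". A minimal Go sketch of that rule follows. The names `shouldReport` and `reportEvery` are hypothetical and are not taken from the nats-server source; the guard for a non-positive setting is likewise an assumption added to keep the sketch safe.

```go
package main

import "fmt"

// shouldReport is a hypothetical helper illustrating the rule from the
// commit message: report the first attempt, then every attempt whose
// number is a multiple of reportEvery.
func shouldReport(attempt, reportEvery int) bool {
	if attempt == 1 {
		return true
	}
	// Assumption: a zero or negative setting disables repeated reports
	// (this also avoids a modulo by zero).
	if reportEvery <= 0 {
		return false
	}
	return attempt%reportEvery == 0
}

func main() {
	// With reportEvery = 5, attempts 1, 5 and 10 are reported.
	for attempt := 1; attempt <= 10; attempt++ {
		if shouldReport(attempt, 5) {
			fmt.Printf("reporting failed connect attempt %d\n", attempt)
		}
	}
}
```

Later commits in this list (for example d2578f9e05) changed the options involved, so this sketch reflects only the rule as worded in 7272e4e317.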

81 Commits