nats-server

mirror of https://github.com/gogrlx/nats-server.git synced 2026-04-11 08:51:21 -07:00

Author	SHA1	Message	Date
Ivan Kozlovic	8e4b449119	Fixed flappers Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-02-19 13:19:08 -07:00
Ivan Kozlovic	bd28a015b1	[FIXED] Sublist isSubsetMatch to handle empty tokens If a subject has empty tokens, returns false. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-01-14 18:28:14 -07:00
Ivan Kozlovic	c097357b52	[FIXED] More than expected switch to Interest-Only mode for account When an account is switched to interest-only mode due to no interest, it was not possible to switch that account more than once. But the function switchAccountToInterestMode() that triggers a switch could possibly be doing it more than once. This should not cause problems but increased the number of traces in a big super cluster. Also fixed some flappers and a data race. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-01-09 13:35:08 -07:00
Ivan Kozlovic	947798231b	[UPDATED] TCP Write and SlowConsumer handling - All writes will now be done by the writeLoop, unless when the writeLoop has not been started yet (likely in connection init). - Slow consumers for non CLIENT connections will be reported but not failed. The idea is that routes, gateway, etc.. connections should stay connected as much as possible. However if a flush operation times out and no data at all has been written, the connection will be closed (regardless of type). - Slow consumers due to max pending is only for CLIENT connections. This allows sending of SUBs through routes, etc.. to not have to be chunked. - The backpressure to CLIENT connections is increased (up to 1sec) based on the sub's connection pending bytes level. - Connection is flushed on close from the writeLoop as to not block the "fast path". Some tests have been fixed and adapted since now closeConnection() is not flushing/closing/removing connection in place. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-12-31 15:06:27 -07:00
Ivan Kozlovic	a22da91647	[FIXED] Closing of Gateway or Route TLS connection may hang This could happen if the remote server is running but not dequeueing from the socket. TLS connection Close() may send/read and so we need to protect with a deadline. For non client/leaf connection, do not call flushOutbound(). Set the write deadline regardless of handshakeComplete flag, and set it to a low value. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-12-04 17:27:00 -07:00
Ivan Kozlovic	a0f8bd112e	[FIXED] Prevent A- for account that has service reply subscription Prevent sending an A- for a given account if the server has this account registered and an internal service reply subscription. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-11-26 16:21:36 -07:00
Ivan Kozlovic	63138509f7	Tune some code/test for Windows Running test suite on a Windows VM, I notice several failures. Updated the compute of the RTT to be at least 1ns. I think that this is just an issue with the VM I am running, but that change will have no impact for normal situations (since setting the rtt to the very minimum duration (1ns) instead of 0) and will prevent some tests from failing. Because of those same timer granularity issues, I had to add some delays between some actions in order for time.Sub()/Since() to actually report something more than 0. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-11-21 14:32:46 -07:00
Derek Collison	6ad8287bbe	Introduced wildcard handling of _R_ mapped replies. We had too much special processing, so reduced to a single wildcard which will propagate across routes and gateways and is consistent with gateway handling of globally routed subjects and timeouts. Signed-off-by: Derek Collison <derek@nats.io>	2019-11-16 12:50:53 -08:00
Ivan Kozlovic	b561bde366	Alternate approach to GW reply mapping expiration Use centralized sync map to gather *client that have GW replies. Tested with concurrent receiving clients and perf is as good as with timer per client but reduces need of that timer per client object. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-11-11 13:36:24 -07:00
Ivan Kozlovic	cacfb4a08c	Fix some gateway tests Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-11-08 19:07:57 -07:00
Ivan Kozlovic	9b7dab0548	Updates based on code review - Add atomic in client to skip check in processInboundClientMsg() if value is 0. Avoids getting the lock in fast path if not needed. - Have a timer per client instead of the global server list that was expiring: noticed a lot of contention there when running some perf/profiling tests. The timer is also not reset for every timestamp that is not yet expired since this too affects performance. Instead it fires at regular intervals and is cleared when the map is empty after a cycle. - Move processing of gw map reply to its own function (in inbound msg). I have verified that this is inlined same way as when code was directly in processInboundClientMsg. - Use string(subj[]) for prefix detection: I have verified that it is actually faster. - Builds the RMSG with appends to local buffer in handleGatewayReply() instead of using fmt.Sprintf(). Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-11-08 15:56:28 -07:00
Ivan Kozlovic	aa843945c9	Work on Gateways reply mapping - New prefix that includes origin server for the request - Mapping done if request is service import or requestor has recent subscription - Subscription considered recent if less than 250ms - Destination server strip GW prefix before giving to client and restore when getting a reply on that subject - Mapping removed after 250ms - Server rejects client publish on "$GNR." (the new prefix) - Cluster and server hash are now 8 chars long and from base 62 alphabets - Mapped replies need to be sent to leafnode servers due to race (cluster B sends RS+ on GW inbound then RMSG on outbound, the RS+ may be processed later and cluster A may have given message to LN before RS+ on reply subject. So LN needs to accept the mapped reply but will strip to give to client and reassemble before sending it back) Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-11-06 16:06:49 -07:00
Ivan Kozlovic	75ec78c232	[FIXED] Explicit gateway not using discovered URLs If cluster A configures a gateway to cluster B, the server on A tries to connect to that server URL. If there is no server on B at that address, but a server on B with different address connects to server on cluster A, that server should be able to create its outbound connection in response. That was not the case because the configured URLs were snapshot before the loop of trying to connect. When accepting an inbound connection and updating the array, this new URL was not being used. The issue is only if the server on A had no outbound connection at that time. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-10-24 16:40:38 -06:00
Ivan Kozlovic	77c63dbce1	Fix flappers Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-08-20 17:07:22 -06:00
Ivan Kozlovic	2f48ad5150	Fixed subscription close I noticed that TestNoRaceRoutedQueueAutoUnsubscribe started to fail a lot on Travis. Running locally I could see a 45 to 50% failures. After investigation I realized that the issue was that we have wrongly re-used `subscription.nm` and set to -1 on unsubscribe however, I believe that it was possible that when subscription was closed, the server may have already picked that consumer for a delivery which then causes nm==-1 to be bumped to 0, which was wrong. Commenting out the subscription.close() that sets nm to -1, I could not get the test to fail on macOS but would still get 7% failure on Linux VM. Adding the check to see if sub is closed in deliverMsg() completely erases the failures, even on Linux VM. We could still use `nm` set to -1 but check on deliverMsg(), the same way I use the closed int32 now. Fixed some flappers. Updated .travis.yml to failfast if one of the commands in the `script` fails. Use `set -e` and `set +e` as recommended in https://github.com/travis-ci/travis-ci/issues/1066 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-08-20 14:39:23 -06:00
Ivan Kozlovic	c20afd4016	[FIXED] Connection could be closed twice This was introduced in PR#930. The first commit had the route's check if the flushOutbound() returned false, and if so would locally unlock/lock the connection's lock. Unfortunately, this was replaced in the second commit (`a6aeed3a6b`) to the flushOutbound() function itself. This causes the function closeConnection() to possibly unlock the connection while calling flushOutbound(), which if the connection is closed due to both a tls timeout for instance and explicitly, it would result in the connection being scheduled for a reconnect (if explicit gateway connection, possibly route). Added defensive code in Gateway to register a unique outbound gateway. Fixed a test that was now failing with newer Go version in which they fixed url.Parse() Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-08-13 20:11:03 -06:00
Ivan Kozlovic	ed1901c792	Update go.mod to satisfy v2 requirements Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-06-03 19:45:47 -06:00
Ivan Kozlovic	37b3546e7b	Switch gateway to InterestMode only once When a leafnode connection is created, the server forces all gateway inbound connections to switch to InterestMode. Do this only once, regardless of how many times the LN (re)connects. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-05-30 17:21:15 -06:00
Ivan Kozlovic	2d4c3dd38f	Added logging of account interest mode switch for gateways Both sides will log when an account is switched to interest-only mode. There are 2 traces (start/complete) per account. They are logged at [INF] level. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-05-28 14:55:45 -06:00
Ivan Kozlovic	5478eaf01e	Added /gatewayz endpoint Such endpoint will list the gateway/cluster name, address and port then list of outbound/inbound connections. For each remote gateway there will be at most one outbound connection. There can be 0 or more inbound connections for the same remote gateway. For each of these outbound/inbound connection, the connection info similar to Connz is reported. Optionally, one can include the interest mode/stats for each account. Here are possible options: * No specific options http://host:port/gatewayz * Limit to specific remote gateway, say name "B": http://host:port/gatewayz/gw_name=B * Include accounts (default limit to 1024 accounts) http://host:port/gatewayz/accs=1 * Specific limit, say 200 (note accs=1 in this case is optional) http://host:port/gatewayz/accs=1&accs_limit=200 * Specific account, say "acc_1". Note that accs=1 is not required then http://host:port/gatewayz/acc_name=acc_1 * Above options can be mixed: specific remote gateway (B), with 100 accounts reported http://host:port/gatewayz/gw_name=B&accs_limit=200 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-05-28 12:41:09 -06:00
Ivan Kozlovic	4ed08dde07	Merge pull request #1013 from nats-io/fix_gw_qinterest_loss Fixed loss of queue subscription interest across Gateways in some cases	2019-05-26 18:23:06 -06:00
Ivan Kozlovic	ce1e6defab	Fix flappers - TestSystemAccountConnectionUpdatesStopAfterNoLocal: I believe that the check on number of notifications was wrong. Since we did not consume the ones for the connect, the expected count after the disconnect is 8 instead of 4. - Possible fix GW tests complaining about number of outbound/inbound I think that it may be possible that connection does not succeed right away (remote to fully started, etc) and due to dial timeout and reconnect attempt delay, I suspect that when given a max time of 1sec to complete, it may not be enough. Quick change for now is to override to 2secs for now in the wait helpers. If that proves conclusive, we could remove the timeout given to these helpers. - TestGatewaySendAllSubsBadProtocol: used a t.Fatalf() in checkFor instead of return fmt.Errorf(). - TestLeafNodeResetsMSGProto: this test is not about change to interest mode only, so to avoid possible mix of protos, delay a bit creation of gateway after creation of leaf node. - Some defer s.Shutdown() were missing Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-05-26 17:17:08 -06:00
Ivan Kozlovic	b325cf1e4a	Fixed loss of queue subscription interest across Gateways in some cases Suppose two servers, SA in cluster A and SB in cluster B. If SA sends a message to SB on an account for which there is no interest at all (account not known or no subscription), SB will send an A- and keep track that it sent an A- for this account. When a queue subscription is created on SB, SB will send an RS+ to A because A needs to have perfect knowledge of all queue subs in all clusters. If then a regular subscription is also created on SB, SB will think that it needs to send an A+ because it had sent an A- for this account. However, SA had an entry for this account for the queue sub. The A+ would clear the entry in the map and would cause SA to not send messages to SB even if they would have been a match for the queue sub on SB. We fix this in two ways: - Clear the possible A- in SB when sending an RS+ for queue sub - Processing of A-/A+ to be aware of a possible entry in the map due to queue subs. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-05-25 16:27:00 -06:00
Derek Collison	933f5d0df4	Add in TestGatewayServiceExportWithWildcards Signed-off-by: Derek Collison <derek@nats.io>	2019-05-21 15:28:42 -07:00
Derek Collison	67bb08af8b	Fixes for a few flappers. TestJWTAccountImportActivationExpires TestGatewayServiceImportWithQueue Signed-off-by: Derek Collison <derek@nats.io>	2019-05-21 15:12:31 -07:00
Ivan Kozlovic	1eff7bc112	Fixed gateway test race report Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-05-13 11:49:29 -06:00
Derek Collison	d7140a0fd1	Update for client rename Signed-off-by: Derek Collison <derek@nats.io>	2019-05-10 15:11:30 -07:00
Derek Collison	acfe372d63	Changes for rename from gnatsd -> nats-server Signed-off-by: Derek Collison <derek@nats.io>	2019-05-06 15:04:24 -07:00
Derek Collison	f320f318b7	Fixed merge conflict Signed-off-by: Derek Collison <derek@nats.io>	2019-04-23 17:28:42 -07:00
Derek Collison	bfe83aff81	Make account lookup faster with sync.Map Signed-off-by: Derek Collison <derek@nats.io>	2019-04-23 17:13:23 -07:00
Ivan Kozlovic	bb4e8ae0f9	Gateways: Fix race for request reply This addresses the following race: - client connection creates a subscription on a reply subject - client connection sends a request - server sends the subscription to inbound gateway - server sends the message to outbound gateway (those may be to different servers) - receiving server sends to sub interested in request subject - app sends reply - its server then check for interest on the reply's subject In interestOnly mode, there is a possibility that this server has not received the interest on the reply subject yet and would then drop the reply. This PR detects above scenario and will prefix the reply subject to identify the origin cluster if it is detected that the last subscription from the sending connection was created less than a second ago. Once the destination has this prefix, the destination cluster will always send back that message to origin cluster even if there is no registered interest. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-04-22 20:00:21 -06:00
Ivan Kozlovic	d8098c134b	Reduce startup memory for gateways Similar to #956 but for gateways code. Also fixing route test TestLargeClusterMem. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-04-17 15:18:46 -06:00
Ivan Kozlovic	4ea96337ed	Added test gateway tlsConfig.ServerName Checks that if not provided server fails to connect to remote gateway. Once set to expected hostname ("localhost"), connection works. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-04-12 11:21:57 -06:00
Ivan Kozlovic	18399a3808	Gateways: Rework Account Sub/Unsub We now send A- if an account does not exist, or if there is no interest on a given subject and no existing subscription. An A+ is sent if an A- was previously sent and a subscription for this account is registered. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-02-26 18:34:30 -07:00
Ivan Kozlovic	7ad4498a09	Gateways: Remove unused permissions options Permissions were configured but not implemented. Removing for now. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-01-10 09:49:36 -07:00
Ivan Kozlovic	7c220ba700	Support for service export with wildcards Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2018-12-13 21:22:01 -07:00
Ivan Kozlovic	519c3dab47	Add Gateway test for service import and interest only Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2018-12-11 14:44:02 -08:00
Ivan Kozlovic	4b70cdfc89	Fix Gateways with Service Imports Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2018-12-11 00:27:40 -08:00
Ivan Kozlovic	6eaa1dc351	Resolve IP if gateway listen is 0.0.0.0 or :: Otherwise, this may be sent to servers in the cluster and to other gateways which may result in attempt to connect to self which in case of TLS would produce error. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2018-12-07 17:28:21 -07:00
Ivan Kozlovic	95a5f79ac7	Added Gateway test for service import with queue group Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2018-12-06 19:13:39 -07:00
Ivan Kozlovic	111e050d32	Allow service import to work with Gateways This is not complete solution and is a bit hacky but is a start to be able to have service import work at least in some basic cases. Also fixed a bug where replySub would not be removed from connection's list of subs after delivery. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2018-12-05 20:35:43 -07:00
Derek Collison	2d54fc3ee7	Account lookup failures, account and client limits, options reload. Changed account lookup and validation failures to be more understandable by users. Changed limits to be -1 for unlimited to match jwt pkg. The limits changed exposed problems with options holding real objects causing issues with reload tests under race mode. Longer term this code should be reworked such that options only hold config data, not real structs, etc. Signed-off-by: Derek Collison <derek@nats.io>	2018-12-05 14:25:40 -08:00
Ivan Kozlovic	e7b6c5731e	Update based on comments Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2018-12-03 17:17:55 -07:00
Ivan Kozlovic	a23ef5b740	Switch to send-all-subs when number of RS- gets too big Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2018-12-03 13:15:11 -07:00
Ivan Kozlovic	f011db47c7	Fixed race issue with lookup/update of the sent no-interest map We can't use a simple sync.Map here because the noInterest map for inbound gateway connections are used concurrently. Indeed, whenever an account would have been registered or a new sub created this could trigger an update of that map in order to clear the fact that we had sent an A-/RS- and now are sending an A+/RS+. So changed to simple map but protected by gw connection's lock. Without this change, server would panic if there are messages published to cluster A that are sent to server B while a sub is then created on matching subject on B. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2018-11-29 14:22:56 -07:00
Ivan Kozlovic	60462b2a44	Fixed gateway flapper Need to make sure message is received before unsub'ing because otherwise it would be possible that the unsub happens before message is delivered, which would have resulted in an RS- while we were expecting the message to not cause one. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2018-11-28 19:16:35 -07:00
Ivan Kozlovic	cfc5ec4d44	Fixed test and remove grace period from total duration. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2018-11-28 18:25:15 -07:00
Ivan Kozlovic	086b26f14a	Gateways: Ignore reference to self Allows the use of a global include for all gateways and each gateway will ignore its own reference. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2018-11-28 14:24:28 -07:00
Ivan Kozlovic	d78b1ae464	Fixed issue with gateways - If/when splitting buffer to pass to queueOutbound(), it has to include the full protocol. - Fix counting of total queue subs - Fix tests - Send RS- if no plain sub interest even if there is queue sub interest. - Removed a one-liner function Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2018-11-28 13:15:47 -07:00
Ivan Kozlovic	52c724a83c	Updates based on comments - Solve RS+ with wildcards - Solve issue with messages not sent to remote gateways queue subs if there was a qsub on local server. - Made rcache a perAccountCache since it is now used by routes and gateways - Order outbound gateways only on RTT updates - Print a server's gateway name on startup - Augment/add some tests - Update TLS handling: when connecting, use hostname for ServerName if url is not IP, otherwise use a hostname that we saved when parsing/adding URLs for the remote gateway. - Send big buffer in chunks if needed. - Add caching for qsubs match Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2018-11-27 19:39:41 -07:00
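
The /gatewayz commit (5478eaf01e) in the log above lists several request options: gw_name, accs, accs_limit and acc_name. As a rough illustration only, here is a minimal Go sketch that assembles such a request URL. It assumes the options are sent as ordinary query parameters after a `?`, which is not how the commit message as shown here writes them (it separates them with `/`). The `GatewayzQuery` type, the `GatewayzURL` helper, and the `localhost:8222` address are hypothetical names chosen for this example and are not part of nats-server.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// GatewayzQuery is a hypothetical holder for the options named in the
// /gatewayz commit message. It is not a nats-server type.
type GatewayzQuery struct {
	GatewayName   string // gw_name: limit output to one remote gateway
	Accounts      bool   // accs=1: include per-account interest info
	AccountsLimit int    // accs_limit: cap on reported accounts (0 = unset)
	AccountName   string // acc_name: report a single account
}

// GatewayzURL builds a monitoring URL from a base address and the options.
// Assumption: options are encoded as standard query parameters.
func GatewayzURL(base string, q GatewayzQuery) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/gatewayz"

	v := url.Values{}
	if q.GatewayName != "" {
		v.Set("gw_name", q.GatewayName)
	}
	if q.Accounts {
		v.Set("accs", "1")
	}
	if q.AccountsLimit > 0 {
		v.Set("accs_limit", strconv.Itoa(q.AccountsLimit))
	}
	if q.AccountName != "" {
		v.Set("acc_name", q.AccountName)
	}
	// Encode sorts parameters by key, so the output is deterministic.
	u.RawQuery = v.Encode()
	return u.String(), nil
}

func main() {
	// Hypothetical monitoring address; adjust to the server's actual host and port.
	s, err := GatewayzURL("http://localhost:8222", GatewayzQuery{
		GatewayName:   "B",
		AccountsLimit: 200,
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```

Because `url.Values.Encode` orders keys alphabetically, the parameters may appear in a different order than in the commit message; that should not matter to a server that parses them as query parameters.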

51 Commits