nats-server

mirror of https://github.com/gogrlx/nats-server.git synced 2026-04-11 08:51:21 -07:00

Author	SHA1	Message	Date
Matthias Hanel	6a1c3fc29b	Moving inbound tracing to the caller (client.parse) Tracing for outgoing operations is always done while holding the client lock. Signed-off-by: Matthias Hanel <mh@synadia.com>	2020-03-04 17:31:18 -05:00
Matthias Hanel	fe373ac597	Incorporating comments. c -> client defer in oneliner argument order Signed-off-by: Matthias Hanel <mh@synadia.com>	2020-03-04 15:48:19 -05:00
Matthias Hanel	f5bd07b36c	[FIXED] trace/debug/sys_log reload will affect existing clients Fixed #1296, by altering client state on reload Detect a trace level change on reload and update all clients. To avoid data races, read client.trace while holding the lock, pass the value into functions that trace while not holding the lock. Delete unused client.debug. Signed-off-by: Matthias Hanel <mh@synadia.com>	2020-03-04 13:54:15 -05:00
Ivan Kozlovic	c73be88ac0	Updated based on comments Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-01-06 16:57:48 -07:00
Ivan Kozlovic	947798231b	[UPDATED] TCP Write and SlowConsumer handling - All writes will now be done by the writeLoop, unless when the writeLoop has not been started yet (likely in connection init). - Slow consumers for non CLIENT connections will be reported but not failed. The idea is that routes, gateway, etc.. connections should stay connected as much as possible. However if a flush operation times out and no data at all has been written, the connection will be closed (regardless of type). - Slow consumers due to max pending is only for CLIENT connections. This allows sending of SUBs through routes, etc.. to not have to be chunked. - The backpressure to CLIENT connections is increased (up to 1sec) based on the sub's connection pending bytes level. - Connection is flushed on close from the writeLoop as to not block the "fast path". Some tests have been fixed and adapted since now closeConnection() is not flushing/closing/removing connection in place. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-12-31 15:06:27 -07:00
Ivan Kozlovic	a22da91647	[FIXED] Closing of Gateway or Route TLS connection may hang This could happen if the remote server is running but not dequeueing from the socket. TLS connection Close() may send/read and so we need to protect with a deadline. For non client/leaf connection, do not call flushOutbound(). Set the write deadline regardless of handshakeComplete flag, and set it to a low value. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-12-04 17:27:00 -07:00
Derek Collison	6ad8287bbe	Introduced wildcard handling of _R_ mapped replies. We had too much special processing, so reduced to a single wildcard which will propagate across routes and gateways and is consistent with gateway handling of globally routed subjects and timeouts. Signed-off-by: Derek Collison <derek@nats.io>	2019-11-16 12:50:53 -08:00
Ivan Kozlovic	d85f9a9388	Fixed bug with duplicate route and GW replies When a duplicate route is detected and closed, we need to clear the route's hash in order to prevent the removal from the server's routeByHash map. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-11-15 17:24:50 -07:00
Ivan Kozlovic	aa843945c9	Work on Gateways reply mapping - New prefix that includes origin server for the request - Mapping done if request is service import or requestor has recent subscription - Subscription considered recent if less than 250ms - Destination server strip GW prefix before giving to client and restore when getting a reply on that subject - Mapping removed after 250ms - Server rejects client publish on "$GNR." (the new prefix) - Cluster and server hash are now 8 chars long and from base 62 alphabets - Mapped replies need to be sent to leafnode servers due to race (cluster B sends RS+ on GW inbound then RMSG on outbound, the RS+ may be processed later and cluster A may have given message to LN before RS+ on reply subject. So LN needs to accept the mapped reply but will strip to give to client and reassemble before sending it back) Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-11-06 16:06:49 -07:00
Ivan Kozlovic	d20f76cbaa	Merge pull request #1166 from nats-io/add_servername_to_routestat [ADDED] Server name in the RouteStat for statsz	2019-10-28 13:19:53 -06:00
Ivan Kozlovic	5a44e3b4c6	Changes on how tests can override route protocol I may need to introduce a new route protocol version for an upcoming PR and realized that this needed some cleaning. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-10-26 10:12:30 -06:00
Ivan Kozlovic	12eb1f5b00	[ADDED] Server name in the RouteStat for statsz Add the remote server name for a route in the statsz event Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-10-25 16:34:07 -06:00
Ivan Kozlovic	15201a19cd	Fixed a lock inversion issue with account In updateRouteSubscriptionMap(), when a queue sub is added/removed, the code locks the account and then the route to send the update. However, when a route is accepted and the subs are sent, the opposite (locking wise) occurs. The route is locked, then the account. This lock inversion is possible because a route is registered (added to the server's map) and then the subs are sent. Use a special lock to protect the send, but don't hold the acc.mu lock while getting the route's lock. The tests that were created for the original missed queue updates issue, namely TestClusterLeaksSubscriptions() and TestQueueSubWeightOrderMultipleConnections() pass with this change. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-09-13 14:30:00 -06:00
Ivan Kozlovic	2a8973a62b	Fixed flushOutbound With Go 1.12 (strangely was not able to reproduce with Go 1.11) the test TestRouteNoCrashOnAddingSubToRoute() would frequently lock up and consume all avail CPUs on the machine. Running this test with GOMAXPROCS=2 you would see server.test CPU usage pegged at 200% (assuming you have at least 2 CPUs). The reason was that the writeLoop was spinning because another routine was already in flushOutbound() and stack trace would show that it was stuck in system calls. It seems that even though the writeLoop does release the lock but grab it right away was not allowing the syscall to complete. So decided to put back the unlock/gosched/lock back in flushOutbound() when flag is already set, but then protect the closeConnection() with its own flag (similar to clearConnection) to not re-introduce issue fixed in #1092. Had to fix the benchmark test RoutedInterestGraph because after a route is accepted, the initial PING will be sent after 1sec which was breaking this test. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-08-29 12:59:27 -06:00
Ivan Kozlovic	cd9f898eb0	Made a server's helper to set first ping timer Defaults to 1sec but will be opts.PingInterval if value is lower. All non client connections invoked this function for the first PING. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-08-26 10:21:43 -06:00
Ivan Kozlovic	90d592e163	Leaf and Route RTT When a leaf or route connection is created, set the first ping timer to fire at 1sec, which will allow to compute the RTT reasonably soon (since the PingInterval could be user configured and set much higher). For Route in PR #1101, I was sending the PING on receiving the INFO which required changing bunch of tests. Changing that to also use the first timer interval of 1sec and reverted changes to route tests. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-08-26 09:34:17 -06:00
Ivan Kozlovic	89dd13f134	[ADDED] RTT in routez's route info Added the RTT field to each route reported in routez. Ensure that when a route is accepted, we send a PING to compute the first RTT and don't have to wait for the ping timer to fire. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-08-20 14:16:07 -06:00
Ivan Kozlovic	c20afd4016	[FIXED] Connection could be closed twice This was introduced in PR#930. The first commit had the route's check if the flushOutbound() returned false, and if so would locally unlock/lock the connection's lock. Unfortunately, this was replaced in the second commit (`a6aeed3a6b`) to the flushOutbound() function itself. This causes the function closeConnection() to possibly unlock the connection while calling flushOutbound(), which if the connection is closed due to both a tls timeout for instance and explicitly, it would result in the connection being scheduled for a reconnect (if explicit gateway connection, possibly route). Added defensive code in Gateway to register a unique outbound gateway. Fixed a test that was now failing with newer Go version in which they fixed url.Parse() Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-08-13 20:11:03 -06:00
Derek Collison	8f5bc503e5	Add ability for cross account import services to return streams as well as singletons. Take into account tracking of response maps that are created and do proper cleanup. Also fixes #1089 which was discovered while working on this. Signed-off-by: Derek Collison <derek@nats.io>	2019-08-06 14:15:40 -07:00
Derek Collison	495a1a7ec3	Allow dynamic publish permissions based on reply subjects of received msgs Signed-off-by: Derek Collison <derek@nats.io>	2019-07-25 13:17:26 -07:00
Derek Collison	df29be11ed	Changes based on PR comments Signed-off-by: Derek Collison <derek@nats.io>	2019-07-22 18:37:40 -07:00
Derek Collison	1d6c58074f	Fix for #1065 (leaked subscribers from dq subs across routes) Signed-off-by: Derek Collison <derek@nats.io>	2019-07-22 17:17:43 -07:00
Ivan Kozlovic	0873b46f67	[FIXED] LeafNode urls may be missing in INFO sent to LN connections When a cluster of servers are having routes to each other, there is a chance that the list of leafnode URLs maintained on each server is not complete. This would result in LN servers connecting to this cluster to not get the full list of possible URLs the server could reconnect to. Also fixed a DATA RACE that appeared when running the updated TestLeafNodeInfoURLs test. Fixed the race and added specific test that easily demonstrated the race: TestLeafNodeNoRaceGeneratingNonce Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-07-12 19:15:30 -06:00
Ivan Kozlovic	9e09486e26	Use all caps for the production message Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-07-12 13:44:01 -06:00
Ivan Kozlovic	37d08a6c56	[FIXED] Allow TLS InsecureSkipVerify again This has an effect only on connections created by the server, so routes and gateways (explicit and implicit). Make sure that an explicit warning is printed if the insecure property is set, but otherwise allow it. Resolves #1062 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-07-12 12:10:28 -06:00
Derek Collison	d1a782e014	Messages not distributed evenly when sourced from leafnode. When messages came from a leafnode there were not being distributed evenly to the destination cluster. Signed-off-by: Derek Collison <derek@nats.io>	2019-06-11 20:37:49 -07:00
Derek Collison	bd589fb20c	Warn on no random client Signed-off-by: Derek Collison <derek@nats.io>	2019-05-28 16:17:25 -07:00
Ivan Kozlovic	d2578f9e05	Update to connect/reconnect error reports logic Changed the introduced new option and added a new one. The idea is to be able to differentiate between never connected and reconnected event. The never connected situation will be logged at first attempt and every hour (by default, configurable). However, once connected and if trying to reconnect, will report every attempts by default, but this is configurable too. These two options are supported for config reload. Related to #1000 Related to #1001 Resolves #969 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-05-26 17:51:01 -06:00
Ivan Kozlovic	7272e4e317	Make the error report attempts configurable This is a continuation of #1000. Added a configuration to specify the number of attempts at which the repeated error is reported. The algo is now to print only the 1st attempt and when current attempt % <this config param> == 0. Resolves #969 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-05-20 16:28:48 -06:00
Ivan Kozlovic	03930ba0e4	[UPDATED] Reduce report of failed connection attempts This applies to routes, gateways and leaf node connections. The failed attempts will be printed at the first, after the first minute and then every hour. The connect/error statements now include the attempt number. Note that in debug mode, all attempts are traced, so you may get double trace (one for debug, one for info/error). Resolves #969 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-05-20 10:13:56 -06:00
Derek Collison	f320f318b7	Fixed merge conflict Signed-off-by: Derek Collison <derek@nats.io>	2019-04-23 17:28:42 -07:00
Derek Collison	bfe83aff81	Make account lookup faster with sync.Map Signed-off-by: Derek Collison <derek@nats.io>	2019-04-23 17:13:23 -07:00
Ivan Kozlovic	9f497a6cd4	Revert to use Sublist but use the SublistNoCache version. Remove sub from rsubs sublist when user UNSUBs. Fix bench test that was not actually creating a SUB per request in the Benchmark_Gateways_Requests_CreateOneSubForEach test. Also UNSUBs older SUBs after a certain threshold to simulate actual req/reply. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-04-23 14:13:13 -06:00
Ivan Kozlovic	41436fb787	Updates based on comments Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-04-22 20:00:21 -06:00
Ivan Kozlovic	bb4e8ae0f9	Gateways: Fix race for request reply This addresses the following race: - client connection creates a subscription on a reply subject - client connection sends a request - server sends the subscription to inbound gateway - server sends the message to outbound gateway (those may be to different servers) - receiving server sends to sub interested in request subject - app sends reply - its server then check for interest on the reply's subject In interestOnly mode, there is a possibility that this server has not received the interest on the reply subject yet and would then drop the reply. This PR detects above scenario and will prefix the reply subject to identify the origin cluster if it is detected that the last subscription from the sending connection was created less than a second ago. Once the destination has this prefix, the destination cluster will always send back that message to origin cluster even if there is no registered interest. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-04-22 20:00:21 -06:00
Ivan Kozlovic	bf07862140	Fixed invocations of startGoRoutine Resolves #960 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-04-18 09:51:56 -06:00
Derek Collison	f1d06d6c5b	Reduce startup memory for cluster Signed-off-by: Derek Collison <derek@nats.io>	2019-04-17 11:05:29 -07:00
Ivan Kozlovic	4dd1b26cc5	Add a warning if cluster's insecure setting is enabled For cluster, we allow to skip hostname verification from certificate. We now print a warning when this option is enabled, both on startup or if the property is enabled on config reload. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-04-09 17:37:53 -06:00
Ivan Kozlovic	98161722dc	Merge pull request #930 from nats-io/route_send_subs_go_routine_threshold Conditional send of routed subs from a go routine	2019-04-08 14:03:41 -06:00
Ivan Kozlovic	a6aeed3a6b	Move unlock/gosched/lock in flushOutbound Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-04-08 13:57:23 -06:00
Ivan Kozlovic	6b1918efb4	LeafNode: support for advertise A server that creates a LeafNode connection to a remote cluster will now be notified of all possible LeafNode URLs in that cluster. The list is updated when nodes in the cluster come and go. Also support for advertise address, similar to cluster, gateway, etc.. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-04-08 10:54:39 -06:00
Ivan Kozlovic	2a86112a30	Conditional send of routed subs from a go routine When a route is established, it is possible that each server sends its list of subscriptions to each other at the same time. Doing it in place from the readLoop could then cause problems because each side could reach a point where the outbound socket buffer is full and no one is dequeuing data (since readLoop is doing the send of the subs list). We changed sending this list from a go routine. However, for small number of subscriptions, it is not required and was causing some of the tests to fail because of timing issues. We will now send in place if the estimated size of all protocols is below a given threshold (1MB). Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-03-26 17:21:33 -06:00
Derek Collison	bacb73a403	First pass at leaf nodes. Basic functionality working, including gateways. What is not completed: 1. TLS 2. config to bind local account. 3. Info updates for solicitor to track topology changes like a client. 4. CONNECT sent after INFO for nonce authorization. 5. Authorization 6. Services and Streams tests. 7. config file parsing. Signed-off-by: Derek Collison <derek@nats.io>	2019-03-25 08:54:47 -07:00
Ivan Kozlovic	65cc218cba	[FIXED] Allow use of custom auth with config reload Resolves #923 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-03-20 15:45:17 -06:00
Ivan Kozlovic	04d824c4d4	[FIXED] Possible slow consumers when routes exchange sub list If each server has a long list of subscriptions, when the route is established, sending this list could result in each server treating the peer as a slow consumer, resulting in a reconnect, etc.. Also bumping the fan-in threshold for route connections. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-02-20 12:09:26 -08:00
Derek Collison	af78552549	Move ints to proper sizes for all Signed-off-by: Derek Collison <derek@nats.io>	2019-02-05 15:19:59 -08:00
Ivan Kozlovic	c310489689	Merge pull request #872 from nats-io/fix_mem_usage_on_tls_failure [FIXED] Memory usage for failed TLS connections	2019-01-10 09:16:16 -07:00
Ivan Kozlovic	ae239dc3b5	Fixed data race Resolves #870 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-01-09 18:41:48 -07:00
Ivan Kozlovic	b075c00103	[FIXED] Memory usage for failed TLS connections Moving some of the connection initialization post TLS handshake to avoid temporary memory growth when getting repeated failed connections to any of the client, route and gateway ports. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-01-09 15:50:23 -07:00
Ivan Kozlovic	7449e9ac53	Replace megacheck with staticcheck Fixed issues reported by staticcheck Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-01-09 14:14:47 -07:00


192 Commits