nats-server

mirror of https://github.com/gogrlx/nats-server.git synced 2026-04-14 18:20:42 -07:00

Author	SHA1	Message	Date
Derek Collison	f13fa767c2	Remove the swapping of accounts during processing of service imports. When processing service imports we would swap out the accounts during processing. With the addition of internal subscriptions and internal clients publishing in JetStream we had an issue with the wrong account being used. This was specific to delayed pull subscribers trying to unsubscribe due to max of 1 while other JetStream API calls were running concurrently.	2021-07-26 07:57:10 -07:00
Derek Collison	1270977322	When receiving a response across a gateway that has headers and a globally routed subject (_GR_) we were dropping header information. Signed-off-by: Derek Collison <derek@nats.io>	2021-06-10 14:29:33 -07:00
Matthias Hanel	b1dee292e6	[changed] pinned certs to check the server connected to as well (#2247) * [changed] pinned certs to check the server connected to as well on reload clients with removed pinned certs will be disconnected. The check happens only on tls handshake now. Signed-off-by: Matthias Hanel <mh@synadia.com>	2021-05-24 17:28:32 -04:00
Matthias Hanel	6f6f22e9a7	[added] pinned_cert option to tls block hex(sha256(spki)) (#2233) * [added] pinned_cert option to tls block hex(sha256(spki)) When read from config, the values are automatically lower cased. The check when seeing the values programmatically requires lower case to avoid having to alter the map at this point. Signed-off-by: Matthias Hanel <mh@synadia.com>	2021-05-20 17:00:09 -04:00
Ivan Kozlovic	2881e4a1f0	[FIXED] MQTT fixes and improvements Some issues that have been fixed would manifest by timeouts on connect, unexpected memory usage on high publish message rate. Some details: - Replies were not always GW routed properly because we were looking at the wrong connection's rsubs - GW routed replies would not be found because they were tracked in the subscription's client object, which may not be the same used to send the reply - Increased the mqtt timeout to wait for JS replies since in some tests it was sometimes taking more than the original 2 seconds - Incoming gateway messages destined for an MQTT internal subscription may have been rejected as a no interest if the account had service imports - Don't use time.After(), instead create explicit timer so it can be stopped when not timing out. - Unnecessary copy of a slice since we were converting to a string anyway. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-05-04 20:48:14 -06:00
Jaime Piña	e12181cb83	Return not ready for connection reason Currently, we use ReadyForConnections in server tests to wait for the server to be ready. However, when this fails we don't get a clue about why it failed. This change adds a new unexported method called readyForConnections that returns an error describing which check failed. The exported ReadyForConnections version works exactly as before. The unexported version gets used in internal tests only.	2021-04-20 11:45:08 -07:00
Ivan Kozlovic	56d0d9ec87	Do not propagate service import interest across GW and ROUTES Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-04-15 11:34:36 -06:00
Derek Collison	8eefff2b3b	Make sure the jetstream accounts use the name as the key to the map. This prevents possible double adds under reload or restart scenarios. Signed-off-by: Derek Collison <derek@nats.io>	2021-03-18 17:29:26 -07:00
Ivan Kozlovic	cbcff97244	[CHANGED] Move Gateway interest-only mode switch from INF to DBG Also fixed a test that would sometimes fail depending on timing. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-03-14 11:34:36 -06:00
Ivan Kozlovic	27f51d4028	Fix ephemeral consumer delete in single cluster Also remove retry of sources/mirror in the setSourceConsumer() itself when not getting a response. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-03-10 15:16:31 -07:00
Ivan Kozlovic	e7e756034a	Switch Gateway JS accounts to interest-only mode + some other fixes - Fixed the close of a TLS connection which starting Go 1.16 set the deadline to 5 seconds. - Fixed an issue with setHeader that was causing these error messages ``` === RUN TestServiceImportReplyMatchCycleMultiHops nats: message could not decode headers on connection [4] for subscription on "foo" --- PASS: TestServiceImportReplyMatchCycleMultiHops (0.04s) ``` - Fixed names of tests in norace_test.go since they must start with TestNoRace in order to make sure that we execute them in Travis: ``` go test -v -run=TestNoRace --failfast -p=1 ./... ``` Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-03-03 19:15:28 -07:00
Matthias Hanel	c50ee2a1c6	[Changed] all times exposed will be computed in UTC (#1943) This also applies to times that end up in that json. Where applicable moved time.Now() to where it is used. Moved calls to .UTC() to where time is created if that time is converted later anyway. Signed-off-by: Matthias Hanel <mh@synadia.com>	2021-03-02 21:37:42 -05:00
Ivan Kozlovic	1652fe62ef	Updates to when do snapshot Remove panic on runAsLeader when not able to subscribe (which happens on shutdown) Gateway name access does not need lock since it is immutable. Will prevent deadlocks in some situations. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-02-23 19:06:07 -07:00
Derek Collison	bb58d455f6	Revert switching to interest only mode Signed-off-by: Derek Collison <derek@nats.io>	2021-02-23 18:00:47 -08:00
Derek Collison	6d6a6c07ff	Don't send empty subjects, always put system account in interest only Signed-off-by: Derek Collison <derek@nats.io>	2021-02-23 10:57:12 -08:00
Derek Collison	fa8a74ceb5	Allow placement directives for metacontroller stepdown to allow placement to new clusters. Signed-off-by: Derek Collison <derek@nats.io>	2021-02-19 10:55:22 -08:00
Ivan Kozlovic	61bd1b8d86	MQTT clustering Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-02-19 08:50:00 -07:00
Ivan Kozlovic	8598de6dbe	[FIXED] Gateway's implicit connection not using global user/pass If a gateway is configured with an authorization block containing username and password and accepts an unknown Gateway connection, when initiating the outbound connection, it should use the gateway authorization's user/pass information. Resolves #1912 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-02-16 10:06:06 -07:00
Derek Collison	6d32c307ef	Remove pretty indent for json. Signed-off-by: Derek Collison <derek@nats.io>	2021-02-06 20:09:44 -08:00
Ivan Kozlovic	2b8c6e0124	Support for Websocket Leafnode connections Added two options in the remote leaf node configuration - compress, for websocket only at the moment - ws_masking, to force remote leafnode connections to mask websocket frames (default is no masking since it is communication between server to server) Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-01-28 13:13:11 -07:00
Ivan Kozlovic	131be1cb33	Make TLS client/server handshake helpers function This reduces code duplication Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-01-28 13:13:11 -07:00
Ivan Kozlovic	ef38abe75b	Fixed gateway reply mapping following changes in JetStream clustering Those changes are required to maintain backward compatibility. Since the replies are "_G_.<gateway name hash>.<server ID hash>" and the hash were 6 characters long, changing to 8 the hash function would break things. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-01-15 17:32:04 -07:00
Derek Collison	f0cdf89c61	JetStream Clustering WIP Signed-off-by: Derek Collison <derek@nats.io>	2021-01-14 01:14:52 -08:00
Ivan Kozlovic	d24e9b75b3	Fixed GW implicit reconnection PR #1412 had a fix for races during implicit GW reconnection. However, the fix was a bit too simplistic in that it was checking only if there was any inbound gateway to decide to try to reconnect an implicit disconnected GW. We need to check the name, not only presence of inbound GW connections. Related to #1412 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-12-28 12:28:55 -07:00
Ivan Kozlovic	fc1521636c	[FIXED] Config reload for gateways/leaf remote TLS configurations Presence of TLS config in any remote gateway or leafnode would cause the config reload to fail (because TLS config internal content may change which fails the DeepEqual check). This PR excludes the TLS configs in such case to check for changes in gateways and leafnodes. Although GW and LN config reload is technically supported, this PR updates the internal remotes' TLS configuration so that changes/updates to TLS certificates would take effect after a configuration reload. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-12-11 16:56:25 -07:00
Ivan Kozlovic	ffd476357e	[CHANGED] Gateway connections now always send PINGs Connections normally suppress sending PINGs if there was some activity. We now force GATEWAY connections to send PINGs at the configured interval or 15 seconds, whichever is the smallest. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-11-03 13:13:09 -07:00
Ivan Kozlovic	2ad2bed170	[ADDED] Support for route hostname resolution We previously simply called DialTimeout() on a route's url when soliciting. If it resolved to the IP of the host, it would create a route to self, which server detects, but then would not try again with other IPs that would have allowed to form a cluster with other servers running on the other IPs. This PR keeps track of local IPs + cluster port and exclude them from the list of IPs returned by LookupHost API. This even prevent solicitation of routes to self. Only non-local IPs will be tried. Resolves #1586 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-09-08 13:40:17 -06:00
Phil Pennock	3c680eceb9	Inhibit Go's default TCP keepalive settings for NATS (#1562) Inhibit Go's default TCP keepalive settings for NATS Go 1.13 changed the semantics of the tuning parameters for TCP keepalives, including the default value. This affects all TCP listeners. The NATS protocol has its own L7 keepalive system (PING/PONG) and the Go defaults are not a good fit for some valid deployment scenarios, while Go doesn't directly expose a working API for tuning these. Rather than add a configuration knob and pull in another dependency (with portability issues) just disable TCP keepalives for all listeners used for speaking the NATS protocol. Change the tests so we test the same logic. Do not change HTTP monitoring, profiling, or the websocket API listeners. Change KeepAlive on client connections too.	2020-08-14 13:37:59 -04:00
Ivan Kozlovic	b9764db478	Renamed gossipURLs type and moved its declaration to util.go Also made the add/remove/getAsStringSlice receiver for this type. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-07-16 11:22:58 -06:00
Ivan Kozlovic	9b0967a5d1	[FIXED] Handling of gossiped URLs If some servers in the cluster have the same connect URLs (due to the use of client advertise), then it would be possible to have a server sends the connect_urls INFO update to clients with missing URLs. Resolves #1515 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-07-15 17:39:12 -06:00
Ivan Kozlovic	4d495104de	Fixed no_responders use of sendProtoNow() The call sendProtoNow() should not normally be used (only when setting up a connection when the writeloop is not yet started and server needs to send something before being able to start the writeLoop. Instead, code should use enqueueProto(). For this particular case though, use queueOutbound() directly and add to the producer's pcd map. Also fixed other places where we were using queueOutbound() + flushSignal() which is what enqueueProto is doing. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-07-09 17:55:14 -06:00
Ivan Kozlovic	9288283d90	Fixed accept loops that could leave connections opened This was discovered with the test TestLeafNodeWithGatewaysServerRestart that was sometimes failing. Investigation showed that when cluster B was shutdown, one of the server on A that had a connection from B that just broke tried to reconnect (as part of reconnect retries of implicit gateways) to a server in B that was in the process of shutting down. The connection had been accepted but createGateway not called because the server's running boolean had been set to false as part of the shutdown. However, the connection was not closed so the server on A had a valid connection to a dead server from cluster B. When the B cluster (now single server) was restarted and a LeafNode connection connected to it, then the gateway from B to A was created, that server on A did not create outbound connection to that B server because it already had one (the zombie one). So this PR strengthens the starting of accept loops and also make sure that if a connection (all type of connections) is not accepted because the server is shutting down, that connection is properly closed. Since all accept loops had almost same code, made a generic function that accept functions to call specific create connection functions. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-07-06 17:03:19 -06:00
Derek Collison	120402241a	Fix for #1486 Signed-off-by: Derek Collison <derek@nats.io>	2020-06-18 21:04:34 -07:00
Derek Collison	4dee03b587	Allow mixed TLS and non-TLS on same port Signed-off-by: Derek Collison <derek@nats.io>	2020-06-05 18:04:11 -07:00
Ivan Kozlovic	5dba3cdd75	[FIXED] Race condition during implicit Gateway reconnection Say server in cluster A accepts a connection from a server in cluster B. The gateway is implicit, in that A does not have a configured remote gateway to B. Then the server in B is shutdown, which A detects and initiate a single reconnect attempt (since it is implicit and if the reconnect retries is not set). While this happens, a new server in B is restarted and connects to A. If that happens before the initial reconnect attempt failed, A will register that new inbound and do not attempt to solicit because it has already a remote entry for gateway B. At this point when the reconnect to old server B fails, then the remote GW entry is removed, and A will not create an outbound connection to the new B server. We fix that by checking if there is a registered inbound when we get to the point of removing the remote on a failed implicit reconnect. If there is one, we try the reconnection. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-05-22 13:01:17 -06:00
Derek Collison	0129a7fa09	Header support for GWs Signed-off-by: Derek Collison <derek@nats.io>	2020-05-19 14:33:56 -07:00
Derek Collison	ea5e5bd364	Services rewrite #2 This contains a rewrite to the services layer for exporting and importing. The code this merges to already had a first significant rewrite that moved from special interest processing to plain subscriptions. This code changes the prior version's dealing with reverse mapping which was based mostly on thresholds and manual pruning, with some sporadic timer usage. This version uses the jetstream branch's code that understands interest and failed deliveries. So this code is much more tuned to reacting to interest changes. It also removes thresholds and goes only by interest changes or expirations based around a new service export property, response thresholds. This allows a service provider to provide semantics on how long a response should take at a maximum. This commit also introduces formal support for service export streamed and chunked response types send an empty message to signify EOF. This commit also includes additions to the service latency tracking such that errors are now sent, not only successful interactions. We have added a Status field and an optional Error fields to ServiceLatency. We support the following Status codes, these are directly from HTTP. 400 Bad Request (request did not have a reply subject) 408 Request Timeout (when system detects request interest went away, old request style to make dependable).. 503 Service Unavailable (no service responders running) 504 Service Timeout (The new response threshold expired) Signed-off-by: Derek Collison <derek@nats.io>	2020-05-19 14:26:46 -07:00
Derek Collison	df774e44b0	Rework how service imports are handled to avoid performance hits Signed-off-by: Derek Collison <derek@nats.io>	2020-05-19 14:18:34 -07:00
Derek Collison	8d1f3cc7c2	Allow JetStream consumers to work across multi-server hops Signed-off-by: Derek Collison <derek@nats.io>	2020-05-19 14:16:03 -07:00
Derek Collison	0c2d539b06	Remote request API Signed-off-by: Derek Collison <derek@nats.io>	2020-05-19 14:13:22 -07:00
Derek Collison	0fb7ee32bc	Auto-expiration of ephemeral push based observables Signed-off-by: Derek Collison <derek@nats.io>	2020-05-19 14:07:02 -07:00
Ivan Kozlovic	fef94759ab	[FIXED] Update remote gateway URLs when node goes away in cluster If a node in the cluster goes away, an async INFO is sent to inbound gateway connections so they get a chance to update their list of remote gateway URLs. Same happens when a node is added to the cluster. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-04-20 13:48:47 -06:00
Derek Collison	82f585d83a	Updated to also resend leafnode connect on GW connect via first INFO Signed-off-by: Derek Collison <derek@nats.io>	2020-04-08 09:55:19 -07:00
Matthias Hanel	6a1c3fc29b	Moving inbound tracing to the caller (client.parse) Tracing for outgoing operations is always done while holding the client lock. Signed-off-by: Matthias Hanel <mh@synadia.com>	2020-03-04 17:31:18 -05:00
Matthias Hanel	fe373ac597	Incorporating comments. c -> client defer in oneliner argument order Signed-off-by: Matthias Hanel <mh@synadia.com>	2020-03-04 15:48:19 -05:00
Matthias Hanel	f5bd07b36c	[FIXED] trace/debug/sys_log reload will affect existing clients Fixed #1296, by altering client state on reload Detect a trace level change on reload and update all clients. To avoid data races, read client.trace while holding the lock, pass the value into functions that trace while not holding the lock. Delete unused client.debug. Signed-off-by: Matthias Hanel <mh@synadia.com>	2020-03-04 13:54:15 -05:00
Ivan Kozlovic	47b08335a4	[FIXED] Reset of tlsName only for x509.HostnameError For issue #1256, we cleared the possibly saved tlsName on Handshake failure. However, this meant that for normal use cases, if a reconnect failed for any reason we would not be able to reconnect if it is an IP until we get back to the URL that contained the hostname. We now clear only if the handshake error is of x509.HostnameError type, which include errors such as: ``` "x509: Common Name is not a valid hostname: <x>" "x509: cannot validate certificate for <x> because it doesn't contain any IP SANs" "x509: certificate is not valid for any names, but wanted to match <x>" "x509: certificate is valid for <x>, not <y>" ``` Applied the same logic to solicited gateway connections, and fixed the fact that the tlsConfig should be cloned (since we set the ServerName). I have also made a change for leafnode connections similar to what we are doing for gateway connections, which is to use the saved tlsName only if tlsConfig.ServerName is empty, which may not be the case for users that embed NATS Server and pass directly tls configuration. In other words, if the option TLSConfig.ServerName is not empty, always use this value. Relates to #1256 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-01-28 13:16:38 -07:00
Ivan Kozlovic	c097357b52	[FIXED] More than expected switch to Interest-Only mode for account When an account is switched to interest-only mode due to no interest, it was not possible to switch that account more than once. But the function switchAccountToInterestMode() that triggers a switch could possibly doing it more than once. This should not cause problems but increased the number of traces in a big super cluster. Also fixed some flappers and a data race. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-01-09 13:35:08 -07:00
Ivan Kozlovic	c73be88ac0	Updated based on comments Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-01-06 16:57:48 -07:00
Ivan Kozlovic	947798231b	[UPDATED] TCP Write and SlowConsumer handling - All writes will now be done by the writeLoop, unless when the writeLoop has not been started yet (likely in connection init). - Slow consumers for non CLIENT connections will be reported but not failed. The idea is that routes, gateway, etc.. connections should stay connected as much as possible. However if a flush operation times out and no data at all has been written, the connection will be closed (regardless of type). - Slow consumers due to max pending is only for CLIENT connections. This allows sending of SUBs through routes, etc.. to not have to be chunked. - The backpressure to CLIENT connections is increased (up to 1sec) based on the sub's connection pending bytes level. - Connection is flushed on close from the writeLoop as to not block the "fast path". Some tests have been fixed and adapted since now closeConnection() is not flushing/closing/removing connection in place. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2019-12-31 15:06:27 -07:00

123 Commits