nats-server

mirror of https://github.com/gogrlx/nats-server.git synced 2026-04-15 10:40:41 -07:00

Author	SHA1	Message	Date
peaaceChoi	038037381b	Fix some typos in code comment	2023-01-12 10:31:32 +09:00
Ivan Kozlovic	8d9c57ad44	[IMPROVED] Fan-out performance There was an observed degradation (around 5%) for large fan-out in v2.9.0 compared to earlier releases. This is because we added accounting of the in/out messages for the account, which results in 4 atomic operations, 2 for in and 2 for out; however, it means that for a fan-out of, say, 100 matching subscriptions, it is now 2 + 2 * 100 = 202. This PR reworks how the stats accounting is done, which removes the regression and even boosts the numbers a bit since we are doing the server stats update as an aggregate too. There are still degradations for queues and no-sub at all that need to be looked at. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-09-27 19:43:32 -06:00
Ivan Kozlovic	170ff49837	[ADDED] JetStream: peer (the hash of server name) in statsz/jsz A request to `$SYS.REQ.SERVER.PING.JSZ` would now return something like this: ``` ... "meta_cluster": { "name": "local", "leader": "A", "peer": "NUmM6cRx", "replicas": [ { "name": "B", "current": true, "active": 690369000, "peer": "b2oh2L6w" }, { "name": "Server name unknown at this time (peerID: jZ6RvVRH)", "current": false, "offline": true, "active": 0, "peer": "jZ6RvVRH" } ], "cluster_size": 3 } ``` Note the "peer" field following the "leader" field that contains the server name. The new field is the node ID, which is a hash of the server name. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-09-16 15:31:37 -06:00
Ivan Kozlovic	8d1fb4bc92	[FIXED] JetStream: possible routing issues through gateways Internally jetstream may subscribe to some subject and then send a request with a reply subject matching that subscription. Due to interest propagation through a super cluster, it is possible that the reply comes back to a node that is not yet aware of the subscription interest which would cause the reply to be dropped. Some code detects that the subscription is recent and "map" the reply subject so that it can be routed back to the origin server. However, this was done with the use of the connection object that created the subscription, but at the time of the send, a different internal "*client" object may be used which would then cause the code to not be aware of the recent subscription and not do the mapping. This code was changed to scope at the account level instead of connection. A recent change in PR #3412 is no longer needed and was reverted in favor of changes in this PR. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-31 14:18:28 -06:00
Derek Collison	98bf861a7a	Updates to stream and consumer move logic. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-30 16:11:35 -07:00
Ivan Kozlovic	9d1e773e8f	[FIXED] Gateway: system request/replies may not work properly When a subscription is recently made, gateway code ensures that if there is a reply subject, the reply is "mapped" or rewritten to allow the reply to come back to the origin cluster, regardless of subscription interest propagation. The issue was that this uses a map with a `*client` as the key but the pointer for SYSTEM clients would not always be the same, which meant that the rewrite would not happen, causing possible "loss" of replies. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-29 14:05:51 -06:00
Ivan Kozlovic	f6c4e5fcee	[CHANGED] Gateway: Switch all accounts to interest-only mode We are phasing out the optimistic-only mode. Servers accepting inbound gateway connections will switch the accounts to interest-only mode. The servers with outbound gateway connection will check interest and ignore the "optimistic" mode if it is known that the corresponding inbound is going to switch the account to interest-only. This is done using a boolean in the gateway INFO protocol. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-19 16:41:44 -06:00
Ivan Kozlovic	5d3ee8ebf4	[FIXED] Gateway: possible panic if monitor endpoint inspected too soon The monitoring http server is started early and the gateway setup (when configured) may not be fully ready when the `/gatewayz` endpoint is inspected and could cause a panic. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-17 13:30:58 -06:00
Ivan Kozlovic	3c9a7cc6e5	Move to Go 1.19, remove io/util, fix data race and a flapper Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-05 09:55:37 -06:00
Ivan Kozlovic	98c1f0ecb2	Fixed some data races and some flappers Got a data race: ``` ================== WARNING: DATA RACE Write at 0x00c001c736b0 by goroutine 605: runtime.mapassign_faststr() /home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/map_faststr.go:202 +0x0 github.com/nats-io/nats-server/v2/server.(Account).addServiceImport() /home/travis/gopath/src/github.com/nats-io/nats-server/server/accounts.go:1868 +0xb7b github.com/nats-io/nats-server/v2/server.(Account).AddServiceImportWithClaim() ... Previous read at 0x00c001c736b0 by goroutine 301: runtime.mapaccess2_faststr() /home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/map_faststr.go:107 +0x0 github.com/nats-io/nats-server/v2/server.(Server).registerSystemImports() /home/travis/gopath/src/github.com/nats-io/nats-server/server/events.go:1577 +0x284 github.com/nats-io/nats-server/v2/server.(Server).updateAccountClaimsWithRefresh() ... ``` Also, remove some condition in gateway.go on how we were checking if a subject was a service reply, which was causing a test to flap. Finally, used AckSync() in a test (instead of m.Respond(nil)) to prevent it from flapping. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-03-29 19:02:41 -06:00
Ivan Kozlovic	63c750e295	[CHANGED] Gateway: Detect duplicate names between clusters Gateway connection will be closed and error reported if a remote has a name that is a duplicate of the local cluster. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-03-15 15:00:13 -06:00
Ivan Kozlovic	85b3f8a7fd	Gateways: data race when setting first ping timer This was introduced when fixing #2881. The call to setFirstPingTimer needed to be done under the client's lock. Moved setFirstPingTimer from a server receiver to a client receiver. The only reason it was a server receiver is because we need the server options, but c.srv is always set when invoking this function, so we will get the server from c.srv in that function now. Related to #2881 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-03-04 19:55:07 -07:00
Ivan Kozlovic	08d6aaa78f	[FIXED] Gateway: connect could fail due to PING sent before CONNECT When a gateway connection was created (either accepted or initiated) the timer to fire the first PING was started at that time, which means that for an outbound connection, if the INFO coming from the other side was delayed, it was possible for the outbound to send the PING protocol before the CONNECT, which would cause the accepting side to close the connection due to a "parse" error (since the CONNECT for an inbound is supposed to be the very first protocol). Also noticed that we were not setting the auth timer like we do for the other type of connections. If authorization{timeout:<n>} is not set, the default is 2 seconds. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-02-23 15:19:20 -07:00
Ivan Kozlovic	5fc9e0e1cc	[FIXED] Gateway URLs gossip and `/varz` report issues - When detecting duplicate route, it was possible that a server would lose track of the peer's gateway URL, which would prevent it from gossiping that URL to inbound gateway connections - When a server has gateways enabled and has as a remote its own gateway, the monitoring endpoint `/varz` would include it but without the "urls" array. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-10-28 12:05:30 -06:00
Ivan Kozlovic	0bd38bd424	[FIXED] Monitoring: `/varz` gateway URLs not always updated When servers leave a cluster and their gateway URLs were not in the remote cluster's configuration, it is possible that their gateway URLs do not disappear from the list of URLs in the `/varz` monitoring endpoint. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-10-26 13:11:06 -06:00
Matthias Hanel	1c508220d8	Review comment Signed-off-by: Matthias Hanel <mh@synadia.com>	2021-10-19 18:03:59 -04:00
Matthias Hanel	c4a3a4c95e	fix timer not being stopped prior to reset Signed-off-by: Matthias Hanel <mh@synadia.com>	2021-10-19 16:56:20 -04:00
Derek Collison	f13fa767c2	Remove the swapping of accounts during processing of service imports. When processing service imports we would swap out the accounts during processing. With the addition of internal subscriptions and internal clients publishing in JetStream we had an issue with the wrong account being used. This was specific to delayed pull subscribers trying to unsubscribe due to max of 1 while other JetStream API calls were running concurrently.	2021-07-26 07:57:10 -07:00
Derek Collison	1270977322	When receiving a response across a gateway that has headers and a globally routed subject (_GR_) we were dropping header information. Signed-off-by: Derek Collison <derek@nats.io>	2021-06-10 14:29:33 -07:00
Matthias Hanel	b1dee292e6	[changed] pinned certs to check the server connected to as well (#2247) * [changed] pinned certs to check the server connected to as well on reload clients with removed pinned certs will be disconnected. The check happens only on tls handshake now. Signed-off-by: Matthias Hanel <mh@synadia.com>	2021-05-24 17:28:32 -04:00
Matthias Hanel	6f6f22e9a7	[added] pinned_cert option to tls block hex(sha256(spki)) (#2233) * [added] pinned_cert option to tls block hex(sha256(spki)) When read from config, the values are automatically lower cased. The check when seeing the values programmatically requires lower case to avoid having to alter the map at this point. Signed-off-by: Matthias Hanel <mh@synadia.com>	2021-05-20 17:00:09 -04:00
Ivan Kozlovic	2881e4a1f0	[FIXED] MQTT fixes and improvements Some issues that have been fixed would manifest by timeouts on connect, unexpected memory usage on high publish message rate. Some details: - Replies were not always GW routed properly because we were looking at the wrong connection's rsubs - GW routed replies would not be found because they were tracked in the subscription's client object, which may not be the same used to send the reply - Increased the mqtt timeout to wait for JS replies since in some tests it was sometimes taking more than the original 2 seconds - Incoming gateway messages destined for an MQTT internal subscription may have been rejected as a no interest if the account had service imports - Don't use time.After(), instead create explicit timer so it can be stopped when not timing out. - Unnecessary copy of a slice since we were converting to a string anyway. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-05-04 20:48:14 -06:00
Jaime Piña	e12181cb83	Return not ready for connection reason Currently, we use ReadyForConnections in server tests to wait for the server to be ready. However, when this fails we don't get a clue about why it failed. This change adds a new unexported method called readyForConnections that returns an error describing which check failed. The exported ReadyForConnections version works exactly as before. The unexported version gets used in internal tests only.	2021-04-20 11:45:08 -07:00
Ivan Kozlovic	56d0d9ec87	Do not propagate service import interest across GW and ROUTES Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-04-15 11:34:36 -06:00
Derek Collison	8eefff2b3b	Make sure the jetstream accounts use the name as the key to the map. This prevents possible double adds under reload or restart scenarios. Signed-off-by: Derek Collison <derek@nats.io>	2021-03-18 17:29:26 -07:00
Ivan Kozlovic	cbcff97244	[CHANGED] Move Gateway interest-only mode switch from INF to DBG Also fixed a test that would sometimes fail depending on timing. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-03-14 11:34:36 -06:00
Ivan Kozlovic	27f51d4028	Fix ephemeral consumer delete in single cluster Also remove retry of sources/mirror in the setSourceConsumer() itself when not getting a response. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-03-10 15:16:31 -07:00
Ivan Kozlovic	e7e756034a	Switch Gateway JS accounts to interest-only mode + some other fixes - Fixed the close of a TLS connection which starting Go 1.16 set the deadline to 5 seconds. - Fixed an issue with setHeader that was causing these error messages ``` === RUN TestServiceImportReplyMatchCycleMultiHops nats: message could not decode headers on connection [4] for subscription on "foo" --- PASS: TestServiceImportReplyMatchCycleMultiHops (0.04s) ``` - Fixed names of tests in norace_test.go since they must start with TestNoRace in order to make sure that we execute them in Travis: ``` go test -v -run=TestNoRace --failfast -p=1 ./... ``` Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-03-03 19:15:28 -07:00
Matthias Hanel	c50ee2a1c6	[Changed] all times exposed will be computed in UTC (#1943) This also applies to times that end up in that json. Where applicable moved time.Now() to where it is used. Moved calls to .UTC() to where time is created if that time is converted later anyway. Signed-off-by: Matthias Hanel <mh@synadia.com>	2021-03-02 21:37:42 -05:00
Ivan Kozlovic	1652fe62ef	Updates to when do snapshot Remove panic on runAsLeader when not able to subscribe (which happens on shutdown) Gateway name access does not need lock since it is immutable. Will prevent deadlocks in some situations. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-02-23 19:06:07 -07:00
Derek Collison	bb58d455f6	Revert switching to interest only mode Signed-off-by: Derek Collison <derek@nats.io>	2021-02-23 18:00:47 -08:00
Derek Collison	6d6a6c07ff	Don't send empty subjects, always put system account in interest only Signed-off-by: Derek Collison <derek@nats.io>	2021-02-23 10:57:12 -08:00
Derek Collison	fa8a74ceb5	Allow placement directives for metacontroller stepdown to allow placement to new clusters. Signed-off-by: Derek Collison <derek@nats.io>	2021-02-19 10:55:22 -08:00
Ivan Kozlovic	61bd1b8d86	MQTT clustering Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-02-19 08:50:00 -07:00
Ivan Kozlovic	8598de6dbe	[FIXED] Gateway's implicit connection not using global user/pass If a gateway is configured with an authorization block containing username and password and accepts an unknown Gateway connection, when initiating the outbound connection, it should use the gateway authorization's user/pass information. Resolves #1912 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-02-16 10:06:06 -07:00
Derek Collison	6d32c307ef	Remove pretty indent for json. Signed-off-by: Derek Collison <derek@nats.io>	2021-02-06 20:09:44 -08:00
Ivan Kozlovic	2b8c6e0124	Support for Websocket Leafnode connections Added two options in the remote leaf node configuration - compress, for websocket only at the moment - ws_masking, to force remote leafnode connections to mask websocket frames (default is no masking since it is communication between server to server) Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-01-28 13:13:11 -07:00
Ivan Kozlovic	131be1cb33	Make TLS client/server handshake helpers function This reduces code duplication Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-01-28 13:13:11 -07:00
Ivan Kozlovic	ef38abe75b	Fixed gateway reply mapping following changes in JetStream clustering Those changes are required to maintain backward compatibility. Since the replies are "_G_.<gateway name hash>.<server ID hash>" and the hash were 6 characters long, changing to 8 the hash function would break things. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2021-01-15 17:32:04 -07:00
Derek Collison	f0cdf89c61	JetStream Clustering WIP Signed-off-by: Derek Collison <derek@nats.io>	2021-01-14 01:14:52 -08:00
Ivan Kozlovic	d24e9b75b3	Fixed GW implicit reconnection PR #1412 had a fix for races during implicit GW reconnection. However, the fix was a bit too simplistic in that it was checking only if there was any inbound gateway to decide to try to reconnect an implicit disconnected GW. We need to check the name, not only presence of inbound GW connections. Related to #1412 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-12-28 12:28:55 -07:00
Ivan Kozlovic	fc1521636c	[FIXED] Config reload for gateways/leaf remote TLS configurations Presence of TLS config in any remote gateway or leafnode would cause the config reload to fail (because TLS config internal content may change which fails the DeepEqual check). This PR excludes the TLS configs in such case to check for changes in gateways and leafnodes. Although GW and LN config reload is technically supported, this PR updates the internal remotes' TLS configuration so that changes/updates to TLS certificates would take effect after a configuration reload. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-12-11 16:56:25 -07:00
Ivan Kozlovic	ffd476357e	[CHANGED] Gateway connections now always send PINGs Connections normally suppress sending PINGs if there was some activity. We now force GATEWAY connections to send PINGs at the configured interval or 15 seconds, whichever is the smallest. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-11-03 13:13:09 -07:00
Ivan Kozlovic	2ad2bed170	[ADDED] Support for route hostname resolution We previously simply called DialTimeout() on a route's url when soliciting. If it resolved to the IP of the host, it would create a route to self, which the server detects, but then would not try again with other IPs that would have allowed it to form a cluster with other servers running on the other IPs. This PR keeps track of local IPs + cluster port and excludes them from the list of IPs returned by the LookupHost API. This even prevents solicitation of routes to self. Only non-local IPs will be tried. Resolves #1586 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-09-08 13:40:17 -06:00
Phil Pennock	3c680eceb9	Inhibit Go's default TCP keepalive settings for NATS (#1562) Inhibit Go's default TCP keepalive settings for NATS Go 1.13 changed the semantics of the tuning parameters for TCP keepalives, including the default value. This affects all TCP listeners. The NATS protocol has its own L7 keepalive system (PING/PONG) and the Go defaults are not a good fit for some valid deployment scenarios, while Go doesn't directly expose a working API for tuning these. Rather than add a configuration knob and pull in another dependency (with portability issues) just disable TCP keepalives for all listeners used for speaking the NATS protocol. Change the tests so we test the same logic. Do not change HTTP monitoring, profiling, or the websocket API listeners. Change KeepAlive on client connections too.	2020-08-14 13:37:59 -04:00
Ivan Kozlovic	b9764db478	Renamed gossipURLs type and moved its declaration to util.go Also made the add/remove/getAsStringSlice receiver for this type. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-07-16 11:22:58 -06:00
Ivan Kozlovic	9b0967a5d1	[FIXED] Handling of gossiped URLs If some servers in the cluster have the same connect URLs (due to the use of client advertise), then it would be possible to have a server send the connect_urls INFO update to clients with missing URLs. Resolves #1515 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-07-15 17:39:12 -06:00
Ivan Kozlovic	4d495104de	Fixed no_responders use of sendProtoNow() The call sendProtoNow() should not normally be used (only when setting up a connection when the writeLoop is not yet started and the server needs to send something before being able to start the writeLoop). Instead, code should use enqueueProto(). For this particular case though, use queueOutbound() directly and add to the producer's pcd map. Also fixed other places where we were using queueOutbound() + flushSignal() which is what enqueueProto is doing. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-07-09 17:55:14 -06:00
Ivan Kozlovic	9288283d90	Fixed accept loops that could leave connections opened This was discovered with the test TestLeafNodeWithGatewaysServerRestart that was sometimes failing. Investigation showed that when cluster B was shut down, one of the servers on A that had a connection from B that just broke tried to reconnect (as part of reconnect retries of implicit gateways) to a server in B that was in the process of shutting down. The connection had been accepted but createGateway not called because the server's running boolean had been set to false as part of the shutdown. However, the connection was not closed so the server on A had a valid connection to a dead server from cluster B. When the B cluster (now single server) was restarted and a LeafNode connection connected to it, then the gateway from B to A was created, but that server on A did not create an outbound connection to that B server because it already had one (the zombie one). So this PR strengthens the starting of accept loops and also makes sure that if a connection (all types of connections) is not accepted because the server is shutting down, that connection is properly closed. Since all accept loops had almost the same code, made a generic function that accepts functions to call specific create connection functions. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2020-07-06 17:03:19 -06:00
Derek Collison	120402241a	Fix for #1486 Signed-off-by: Derek Collison <derek@nats.io>	2020-06-18 21:04:34 -07:00
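
The arithmetic in commit 8d9c57ad44 (2 + 2 * 100 = 202 atomic operations for a fan-out of 100) can be illustrated with a minimal Go sketch. This is not the server's code: the `stats` type, its fields, and the `perDelivery` and `aggregated` functions are hypothetical names chosen for illustration, and the `ops` counter exists only to make the number of atomic updates visible. It assumes the improvement amounts to accumulating the outbound totals locally and applying them once.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// stats is a hypothetical stand-in for per-account message counters.
type stats struct {
	inMsgs, inBytes, outMsgs, outBytes int64
	ops                                int64 // number of atomic updates performed (illustration only)
}

// add performs one atomic update and records that it happened.
// The ops counter is a plain increment because this sketch is single-goroutine.
func (s *stats) add(field *int64, n int64) {
	atomic.AddInt64(field, n)
	s.ops++
}

// perDelivery updates the outbound counters once per matching subscription.
func perDelivery(s *stats, fanout int, size int64) {
	s.add(&s.inMsgs, 1)
	s.add(&s.inBytes, size)
	for i := 0; i < fanout; i++ {
		s.add(&s.outMsgs, 1)
		s.add(&s.outBytes, size)
	}
}

// aggregated sums the outbound totals locally and applies them once.
func aggregated(s *stats, fanout int, size int64) {
	s.add(&s.inMsgs, 1)
	s.add(&s.inBytes, size)
	var msgs, total int64
	for i := 0; i < fanout; i++ {
		msgs++
		total += size
	}
	s.add(&s.outMsgs, msgs)
	s.add(&s.outBytes, total)
}

func main() {
	a, b := &stats{}, &stats{}
	perDelivery(a, 100, 10)
	aggregated(b, 100, 10)
	fmt.Println(a.ops, b.ops)                                     // 202 4
	fmt.Println(a.outMsgs == b.outMsgs, a.outBytes == b.outBytes) // true true
}
```

Both variants end with the same counter values; only the number of atomic updates per published message differs.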

140 Commits