nats-server

mirror of https://github.com/gogrlx/nats-server.git synced 2026-04-02 03:38:42 -07:00

Author	SHA1	Message	Date
Derek Collison	edbaa57e87	Fixes for move test. The default timeout for JetStream API calls is 10s, so in the case where we determine that we are the leader, but the stream info endpoint has not registered with the server we are connected to, the stream info call could fail and we would exhaust the whole checkFor since we would stay in one call for 10s. Fix is to override and make multiple attempts possible. Signed-off-by: Derek Collison <derek@nats.io>	2023-09-12 11:38:35 -07:00
Derek Collison	8544cb7adf	Merge branch 'main' into dev Signed-off-by: Derek Collison <derek@nats.io>	2023-08-22 20:04:59 -07:00
Derek Collison	ddb7f9f9d5	Fix for a peer-remove of an R1 that would brick the stream. Signed-off-by: Derek Collison <derek@nats.io>	2023-08-22 17:45:19 -07:00
Derek Collison	bcf5da04e3	Merge branch 'main' into dev	2023-08-22 06:50:36 -07:00
Derek Collison	e5d208bf33	When moving streams, we could check too soon and be in a gap where the replica peer has not registered a catchup request. This would cause us to incorrectly think the replica was caught up and drop our leadership, which would cancel any catchup requests. Signed-off-by: Derek Collison <derek@nats.io>	2023-08-21 20:07:48 -07:00
Jean-Noël Moyne	7ff114162c	Adds the same check for valid stream name for Mirror Fix test using invalid stream names Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>	2023-06-08 07:49:47 -07:00
Derek Collison	724160ebac	Fix flapping tests Signed-off-by: Derek Collison <derek@nats.io>	2023-02-28 14:30:23 -08:00
Derek Collison	2d794d09e1	Fix to flapping test to make sure we do not quickly blow away all consumer state. Signed-off-by: Derek Collison <derek@nats.io>	2023-02-17 14:23:34 -08:00
Marco Primi	f8a030bc4a	Use testing.TempDir() where possible Refactor tests to use go built-in temporary directory utility for tests. Also avoid binding to default port (which may be in use)	2022-12-12 13:18:44 -08:00
Ivan Kozlovic	170ff49837	[ADDED] JetStream: peer (the hash of server name) in statsz/jsz A request to `$SYS.REQ.SERVER.PING.JSZ` would now return something like this: ``` ... "meta_cluster": { "name": "local", "leader": "A", "peer": "NUmM6cRx", "replicas": [ { "name": "B", "current": true, "active": 690369000, "peer": "b2oh2L6w" }, { "name": "Server name unknown at this time (peerID: jZ6RvVRH)", "current": false, "offline": true, "active": 0, "peer": "jZ6RvVRH" } ], "cluster_size": 3 } ``` Note the "peer" field following the "leader" field that contains the server name. The new field is the node ID, which is a hash of the server name. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-09-16 15:31:37 -06:00
Ivan Kozlovic	29224c8ea9	Split more tests to speed up Travis run Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-09-09 12:45:48 -06:00
Matthias Hanel	f7cb5b1f0d	changed format of JSClusterNoPeers error (#3459) * changed format of JSClusterNoPeers error This error was introduced in #3342 and reveals too much information. This change gets rid of cluster names and peer counts. All other counts were changed to booleans, which are only included in the output when the filter was hit. In addition, the set of not matching tags is included. Furthermore, the static error description in server/errors.json is moved into selectPeerError sample errors: 1) no suitable peers for placement, tags not matched ['cloud:GCP', 'country:US']" 2) no suitable peers for placement, insufficient storage Signed-off-by: Matthias Hanel <mh@synadia.com> Signed-off-by: Ivan Kozlovic <ivan@synadia.com> Co-authored-by: Ivan Kozlovic <ivan@synadia.com>	2022-09-08 18:25:48 -07:00
Ivan Kozlovic	b69ffe244e	Fixed some tests Code change: - Do not start the processMirrorMsgs and processSourceMsgs go routine if the server has been detected to be shutdown. This would otherwise leave some go routine running at the end of some tests. - Pass the fch and qch to the consumerFileStore's flushLoop otherwise in some tests this routine could be left running. Tests changes: - Added missing defer NATS connection close - Added missing defer server shutdown Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-09-08 11:28:23 -06:00
Derek Collison	9c3bd17059	Updates to tests Signed-off-by: Derek Collison <derek@nats.io>	2022-09-06 13:33:39 -07:00
Derek Collison	b850a95d4c	Remove auto-promotion of direct get. Force stream config to set AllowDirect to true. Signed-off-by: Derek Collison <derek@nats.io>	2022-09-06 13:33:39 -07:00
Ivan Kozlovic	88ece75765	[FIXED] JetStream: Some nodes may never be reported as offline In some rare situations, it is possible that nodes are added to the cluster but are not properly tracked and not shown as offline when they exit the cluster. Relates to #3258 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-09-01 12:48:12 -06:00
Ivan Kozlovic	8d1fb4bc92	[FIXED] JetStream: possible routing issues through gateways Internally jetstream may subscribe to some subject and then send a request with a reply subject matching that subscription. Due to interest propagation through a super cluster, it is possible that the reply comes back to a node that is not yet aware of the subscription interest which would cause the reply to be dropped. Some code detects that the subscription is recent and "map" the reply subject so that it can be routed back to the origin server. However, this was done with the use of the connection object that created the subscription, but at the time of the send, a different internal "*client" object may be used which would then cause the code to not be aware of the recent subscription and not do the mapping. This code was changed to scope at the account level instead of connection. A recent change in PR #3412 is no longer needed and was reverted in favor of changes in this PR. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-31 14:18:28 -06:00
Derek Collison	98bf861a7a	Updates to stream and consumer move logic. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-30 16:11:35 -07:00
Ivan Kozlovic	380fa4499f	Merge pull request #3383 from nats-io/gw_switch_to_interest_only_right_away [CHANGED] Gateway: Switch all accounts to interest-only mode	2022-08-23 08:44:15 -06:00
Ivan Kozlovic	5663bc2fa3	Reduce length of some clustering tests Since PR #3381, the 2 tests modified here would take twice as long (around 245 seconds) to complete. Talking with Matthias, he suggested using a variable instead of a const and set it to 0 for those 2 tests since they don't really need that to be set. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-22 12:35:37 -06:00
Ivan Kozlovic	f6c4e5fcee	[CHANGED] Gateway: Switch all accounts to interest-only mode We are phasing out the optimistic-only mode. Servers accepting inbound gateway connections will switch the accounts to interest-only mode. The servers with outbound gateway connection will check interest and ignore the "optimistic" mode if it is known that the corresponding inbound is going to switch the account to interest-only. This is done using a boolean in the gateway INFO protocol. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-19 16:41:44 -06:00
Matthias Hanel	6bf50dbb77	induce delay prior to scale down (#3381) This is to avoid a narrow race between adding server and them catching up where they also register as current. Also wait for all peers to be caught up. This also avoids clearing catchup marker once catchup stalled. A stalled catchup would remove the marker causing the peer to register as current. Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-08-18 13:47:40 -07:00
Matthias Hanel	9892a132e7	Improve StreamMoveInProgressError (#3376) by adding progress indicators Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-08-17 15:12:32 -07:00
Matthias Hanel	76219f8e5b	fix unit test (#3359) Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-08-11 01:46:30 +02:00
Matthias Hanel	c6e37cf7af	Fix race between stream stop and monitorStream (#3350) * Fix race between stream stop and monitorStream monitorCluster stops the stream, when doing so, monitorStream needs to be stopped to avoid miscounting of store size. In a test stop and reset of store size happened first and then was followed by storing more messages via monitorStream Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-08-10 19:01:21 +02:00
Matthias Hanel	7015e46dd9	fix move cancel issue where tags and peers diverge (#3354) This can happen if the move was initiated by the user. A subsequent cancel resets the initial peer list. The original peer list was picked on the old set of tags. A cancel would then keep the new list of tags but reset to the old peers. Thus tags and peers diverge. The problem is that at the time of cancel, the old placement tags can't be found anymore. This fix causes cancel to remove the placement tags, if the old peers do not satisfy the new placement tags. Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-08-10 18:48:18 +02:00
Derek Collison	758b733d43	Attempt to improve long RTT catchup time during stream moves. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-08 11:06:10 -06:00
Matthias Hanel	52c4872666	better error when peer selection fails (#3342) * better error when peer selection fails It is pretty hard to diagnose what went wrong when not enough peers for an operation were found. This change now returns counts of reasons why peers were discarded. Changed the error to JSClusterNoPeers as it seems more appropriate of an error for that operation. Not having enough resources is one of the conditions for a peer not being considered. But so is having a non matching tag. Which is why JSClusterNoPeers seems more appropriate. In addition, JSClusterNoPeers was already used as error after one call to selectPeerGroup already. example: no suitable peers for placement: peer selection cluster 'C' with 3 peers offline: 0 excludeTag: 1 noTagMatch: 2 noSpace: 0 uniqueTag: 0 misc: 0 Example for mqtt: mid:12 - "mqtt" - unable to connect: create sessions stream for account "$G": no suitable peers for placement: peer selection cluster 'MQTT' with 3 peers offline: 0 excludeTag: 0 noTagMatch: 0 noSpace: 0 uniqueTag: 0 misc: 0 (10005) Signed-off-by: Matthias Hanel <mh@synadia.com> * review comment Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-08-06 00:17:01 +02:00
Derek Collison	28ccaa4371	Direct get across a leafnode using cross domain mappings to a queue subscriber did not work. The interest moved across the leafnode would be for the mapping, and not the actual qsub. So when received if we did detect that we are mapped and do not have a queue filter present make sure to ignore. This will allow queue subscriber processing on the local server that received the message from the leafnode. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-03 20:21:28 -07:00
Derek Collison	748890adb1	Auto-set and upgrade AllowDirect when MaxMsgsPerSubject is set. Also allow mirrors to inherit properly. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-03 12:36:52 -07:00
Ivan Kozlovic	38727417df	Moving super-cluster tests from cluster tests file to supercluster file Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-07-27 17:14:19 -06:00
Matthias Hanel	d53d2d0484	[Added] account specific monitoring endpoint(s) (#3250) Added http monitoring endpoint /accstatz It responds with a list of statz for all accounts with local connections the argument "unused=1" can be provided to get statz for all accounts This endpoint is also exposed as nats request under: This monitoring endpoint is exposed via the system account. $SYS.REQ.ACCOUNT..STATZ Each server will respond with connection statistics for the requested account. The format of the data section is a list (size 1) identical to the event $SYS.ACCOUNT.%s.SERVER.CONNS which is sent periodically as well as on connect/disconnect. Unless requested by options, server without the account, or server where the account has no local connections, will not respond. A PING endpoint exists as well. The response format is identical to $SYS.REQ.ACCOUNT..STATZ (however the data section will contain more than one account, if they exist) In addition to general filter options the request takes a list of accounts and an argument to include accounts without local connections (disabled by default) $SYS.REQ.ACCOUNT.PING.STATZ Each account has a new system account import where the local subject $SYS.REQ.ACCOUNT.PING.STATZ essentially responds as if the importing account name was used for $SYS.REQ.ACCOUNT..STATZ The only difference between requesting ACCOUNT.PING.STATZ from within the system account and an account is that the latter can only retrieve statz for the account the client requests from. Also exposed the monitoring /healthz via the system account under $SYS.REQ.SERVER..HEALTHZ $SYS.REQ.SERVER.PING.HEALTHZ No dedicated options are available for these. HEALTHZ also accept general filter options. Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-07-12 21:50:32 +02:00
Matthias Hanel	f0ee56cf0a	Fix unique_tag issue with stream replica increase When increasing the replica count, unique tags for already existing peers were ignored, which could lead to bad placement. Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-07-07 21:22:55 +02:00
Derek Collison	47bef915ed	Allow all members of a replicated stream to participate in direct access. We will wait until a non-leader replica is current to subscribe. Signed-off-by: Derek Collison <derek@nats.io>	2022-07-03 11:08:24 -07:00
Derek Collison	4075721651	Allow direct msg get for stream to operate in queue group and allows mirrors to opt-in to the same group. Signed-off-by: Derek Collison <derek@nats.io>	2022-07-02 14:16:55 -07:00
Ivan Kozlovic	53e3c53d96	[FIXED] JetStream: consumer with deliver new may miss messages This could happen when a consumer had not sent anything to the attached NATS subscription and there was a consumer leader step down or server restart. Signed-off-by: Derek Collison <derek@nats.io>	2022-05-23 12:01:48 -06:00
Derek Collison	f702e279ab	Fix for a consumer recovery issue. Also update healthz to check all assets that are assigned, not just running. Signed-off-by: Derek Collison <derek@nats.io>	2022-04-26 19:22:19 -07:00
Ivan Kozlovic	06ff4b2b29	Split JS cluster and super clusters tests and compile only on 1.16 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-04-26 16:24:05 -06:00

38 Commits
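
Commit 170ff49837 above adds a `peer` field (the node ID) next to `leader` and to each replica in the `meta_cluster` section of the JSZ response. The following is a minimal sketch, in Go, of how a client might decode that fragment and list the peer IDs of offline replicas. The struct types, the `parseMetaCluster` and `offlinePeers` helpers, and the idea that the fragment arrives as a standalone JSON object are assumptions made for illustration; only the field names and the sample values come from the commit message, and a real response carries more fields around this fragment.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// replica holds only the replica fields shown in the commit message sample.
type replica struct {
	Name    string `json:"name"`
	Current bool   `json:"current"`
	Offline bool   `json:"offline"`
	Active  int64  `json:"active"`
	Peer    string `json:"peer"` // node ID, a hash of the server name
}

// metaCluster holds only the meta_cluster fields shown in the sample.
type metaCluster struct {
	Name        string    `json:"name"`
	Leader      string    `json:"leader"`
	Peer        string    `json:"peer"` // node ID of the leader
	Replicas    []replica `json:"replicas"`
	ClusterSize int       `json:"cluster_size"`
}

// sample reproduces the fragment quoted in the commit message.
const sample = `{
  "meta_cluster": {
    "name": "local",
    "leader": "A",
    "peer": "NUmM6cRx",
    "replicas": [
      {"name": "B", "current": true, "active": 690369000, "peer": "b2oh2L6w"},
      {"name": "Server name unknown at this time (peerID: jZ6RvVRH)",
       "current": false, "offline": true, "active": 0, "peer": "jZ6RvVRH"}
    ],
    "cluster_size": 3
  }
}`

// parseMetaCluster is a hypothetical helper that extracts the meta_cluster
// object from a JSON document and ignores every other field.
func parseMetaCluster(data []byte) (*metaCluster, error) {
	var wrapper struct {
		Meta *metaCluster `json:"meta_cluster"`
	}
	if err := json.Unmarshal(data, &wrapper); err != nil {
		return nil, err
	}
	if wrapper.Meta == nil {
		return nil, fmt.Errorf("no meta_cluster field in response")
	}
	return wrapper.Meta, nil
}

// offlinePeers returns the peer IDs of replicas flagged as offline.
func offlinePeers(mc *metaCluster) []string {
	var ids []string
	for _, r := range mc.Replicas {
		if r.Offline {
			ids = append(ids, r.Peer)
		}
	}
	return ids
}

func main() {
	mc, err := parseMetaCluster([]byte(sample))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("leader %s has peer ID %s\n", mc.Leader, mc.Peer)
	for _, id := range offlinePeers(mc) {
		fmt.Println("offline peer ID:", id)
	}
}
```

Because the peer ID is present even when the server name is not yet known, keying on `peer` rather than `name` gives a stable identifier for the offline replica in the sample.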