nats-server

mirror of https://github.com/gogrlx/nats-server.git synced 2026-04-17 03:24:40 -07:00

Author	SHA1	Message	Date
Derek Collison	212adf5775	General improvements to clustered streams during server restart and KV/CAS scenarios. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-22 18:36:15 -07:00
Matthias Hanel	c02d1ad69e	fix consumer subject validation on recovery (#3389 ) This fixes an issue introduced in #3080 The consumer filter subject check was skipped on recovery. The intent was to bypass the upstream stream subjects. But it also filtered the downstream stream subject. This became a problem when the downstream was itself an upstream. Then during recover, the stream subject was not checked, which lead to delivery of filtered messages that should never have been delivered. Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-08-22 14:30:00 -07:00
Derek Collison	1228a32bc5	Merge pull request #3380 from nats-io/direct-get-delay Don't immediately listen on the direct get subjects.	2022-08-17 17:26:52 -07:00
Matthias Hanel	904b7aeefc	fixed consumer source update receiving upstream msgs (#3364 ) if an origin stream contains: 1M msgs with subject foo and 1M msgs with subject bar IF the source consumer changes their filter from foo to bar Then it would have received messages for subject bar. This happens because this tail was filtered and their respective seqno was not communicated to the consumer. This is somewhat unexpected. It is also coincidental. Had the last message in the stream had subject foo then this wouldn't happen. Therefore, when completely changing the subject say, from foo to bar, we only receive messages received after the time the change was made. However, if the old and new subject overlap in any way, we go by sequence number. Meaning in these cases the outlined behavior remains in order to not induce artificial message loss for the part of the subject space that is covered by old and new filter. Signed-off-by: Matthias Hanel <mh@synadia.com> Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-08-17 17:16:03 -07:00
Derek Collison	ce2d5fa173	Don't immediately listen on the direct get subjects. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-17 16:39:34 -07:00
Ivan Kozlovic	f0b098af92	[FIXED] JetStream: issue with max deliver and server/cluster restart This is a regression introduced in v2.8.3. If a message reaches the max redeliver count, it stops being delivered to the consumer. However, after a server or cluster restart, those messages would be redelivered again. Resolves #3361 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-16 17:05:47 -06:00
Marco Primi	c6af1ecc9c	Fix typo in comment	2022-08-16 09:07:05 -07:00
Matthias Hanel	c26e915c5b	adding source/mirror unit tests (#3352 ) * adding source/mirror unit tests Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-08-10 19:01:45 +02:00
Matthias Hanel	2cf2868406	fixed consumer restart on source filter update (#3355 ) * fixed consumer restart on source filter update When a stream source filter subject was updated, the internal consumer was not re created If the upstream stream contains a tail of previously filtered messages, these will now be delivered Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-08-10 18:47:19 +02:00
Matthias Hanel	5588c3d0de	Added check for source/mirror filter subjects (#3356 ) * Added check for source/mirror filter subjects When the origin stream exists, the sourec/mirror filter subject will be checked against the stream subjects. If there is no overlap, an error will be returned Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-08-10 18:46:52 +02:00
Ivan Kozlovic	ecddb08469	[IMPROVED] JetStream catchup can be aborted and better flow control If the leader sends messages but the follower for any reason aborts or retry the snapshot process, it will now send the error that caused this and the leader can then abort the catchup instead of waiting for its inactivity threshold of 5 seconds. Also make the send of a batch be delayed for a bit until the number of "acks" is 1/2 of the batch size or after reaching 100ms. This helps avoid trickling of messages. Tested with the new test TestJetStreamSuperClusterStreamCathupLongRTT() and see better results both in size of batches and overall time is smaller or similar but not longer. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-08 17:19:36 -06:00
Derek Collison	906afccb8a	Make a check loop based on review feedback. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-08 11:06:10 -06:00
Derek Collison	33526f4d93	Make sure empty msgs do not interfere with catchup process. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-08 11:06:10 -06:00
Derek Collison	e635de7526	Additional stability improvements for catchup. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-08 11:06:10 -06:00
Derek Collison	5a050fc10b	Improve handling when a snapshot represents state we no longer have. We would send skip messages for a sync request that was completely below our current state, but this could be more traffic then we might want. Now we only send EOF and the other side can detect the skip forward and adjust on a successful catchup. We still send skips if we can partially fill the sync request. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-08 11:06:08 -06:00
Ivan Kozlovic	267e6d1958	[IMPROVED] Replicas ordering and info regarding unknown in stream info If a cluster is brought down and then partially restarted, the replica information about the non restarted node would be completely missing. The CLI could report replicas 3 but then only the leader and the running replicas, but nothing about the other node. Since this node's server name is not know, this PR adds an entry with something similar to this: ``` <unknown (peerID: jZ6RvVRH)>, outdated, OFFLINE, not seen ``` Also, replicas array is now ordered, which will help when using a watcher or repeating stream info commands in that the replicas output will be stable in regards to the list of replicas. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-07 18:54:26 -06:00
Matthias Hanel	52c4872666	better error when peer selection fails (#3342 ) * better error when peer selection fails It is pretty hard to diagnose what went wrong when not enough peers for an operation where found. This change now returns counts of reasons why peers where discarded. Changed the error to JSClusterNoPeers as it seems more appropriate of an error for that operation. Not having enough resources is one of the conditions for a peer not being considered. But so is having a non matching tag. Which is why JSClusterNoPeers seems more appropriate In addition, JSClusterNoPeers was already used as error after one call to selectPeerGroup already. example: no suitable peers for placement: peer selection cluster 'C' with 3 peers offline: 0 excludeTag: 1 noTagMatch: 2 noSpace: 0 uniqueTag: 0 misc: 0 Examle for mqtt: mid:12 - "mqtt" - unable to connect: create sessions stream for account "$G": no suitable peers for placement: peer selection cluster 'MQTT' with 3 peers offline: 0 excludeTag: 0 noTagMatch: 0 noSpace: 0 uniqueTag: 0 misc: 0 (10005) Signed-off-by: Matthias Hanel <mh@synadia.com> * review comment Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-08-06 00:17:01 +02:00
Ivan Kozlovic	653b739fa1	Use filepath.Join() instead of manual concatenation Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-05 14:41:23 -06:00
Ivan Kozlovic	441c09799f	Fixed flapping test Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-05 14:17:43 -06:00
Ivan Kozlovic	88424a89ef	Remove io/ioutil Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-05 13:12:13 -06:00
Matthias Hanel	c56f3b9fbd	Adding account purge operation (#3319 ) * Adding account purge operation The new request is available for the system account. The subject to send the request to is $JS.API.ACCOUNT.PURGE.* With the name of the account to purge instead of the wildcard. Also added directory cleanup code such that server do not end up with empty streams directories and account dirs that only contain streams Also adding ACCOUNT to leaf node domain rewrite table Addresses #3186 and #3306 by providing a way to get rid of the streams for existing and non existing accounts Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-08-05 18:24:19 +02:00
Ivan Kozlovic	fe1feeba7d	Fixed JS cluster prefix name for Travis run Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-04 09:09:52 -06:00
Derek Collison	28ccaa4371	Direct get across a leafnode using cross domain mappings to a queue subscriber did not work. The interest moved across the leafnode would be for the mapping, and not the actual qsub. So when received if we did detect that we are mapped and do not have a queue filter present make sure to ignore. This will allow queue subscriber processing on the local server that received the message from the leafnode. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-03 20:21:28 -07:00
Ivan Kozlovic	b73afbdcb1	[FIXED] JetStream: reject stream update with changes to RePublish The update was not rejected, yet the republish update was not taking place. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-03 16:37:39 -06:00
Derek Collison	748890adb1	Auto-set and upgrade AllowDirect when MaxMsgsPerSubject is set. Also allow mirrors to inherit properly. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-03 12:36:52 -07:00
Derek Collison	8dc1e4b6de	When compact would reclaim head of block space, we needed to update block key for counter for new writes. Signed-off-by: Derek Collison <derek@nats.io>	2022-07-30 13:05:41 -07:00
Byron Ruth	095cfef9eb	Add check and test to prevent updating consumer MaxWaiting Signed-off-by: Byron Ruth <b@devel.io>	2022-07-29 15:05:00 -04:00
Derek Collison	27d87a68a4	Improvements to raft layer with snapshots on catchup. Signed-off-by: Derek Collison <derek@nats.io>	2022-07-29 09:01:03 -07:00
Derek Collison	e120bb86a9	Update tests to check last seq Signed-off-by: Derek Collison <derek@nats.io>	2022-07-28 07:23:39 -07:00
Ivan Kozlovic	38727417df	Moving super-cluster tests from cluster tests file to supercluster file Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-07-27 17:14:19 -06:00
Matthias Hanel	3358205de3	add implementation for consumer replica change (#3293 ) * add implementation for consumer replica change fixes #3262 also check peer list on every update Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-07-27 03:56:28 +02:00
Derek Collison	52f7765322	When msgs were expired on restart recovery we could lose track on subsequent restart of starting sequence with no additional activity. Signed-off-by: Derek Collison <derek@nats.io>	2022-07-23 17:15:16 -07:00
Ivan Kozlovic	a3e62f000c	Merge pull request #3265 from abegaj/fix-max-deliver-update [FIXED] Maximum Deliveries update has no effect	2022-07-21 16:06:41 -06:00
Ardit Begaj	c04ab08f2a	improve tests based on feedback	2022-07-21 23:21:58 +02:00
Matthias Hanel	918ce307af	fix and improve unit test (#3281 ) reduce number of messages sent try move+cancel for every server Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-07-21 22:18:59 +02:00
Ivan Kozlovic	1da5ecfb96	[IMPROVED] JetStream: stream already exists error description The `JSStreamNameExistErr` will now include in the description that the stream exists with a different configuration, because that is the error clients would get when trying to add a stream with a different configuration (otherwise this is a no-op and client don't get an error). Since that error was used in case of restore, a new error is added but uses the same description prefix "stream name already in use" but adds ", cannot restore" to indicate that this is a restore failure because the stream already exists. Resolves #3273 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-07-21 10:20:07 -06:00
Matthias Hanel	51b6d5233f	Fix raft issue where pindex of follower was off by 1 (#3277 ) introduced by `57395bba02` Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-07-21 00:51:26 +02:00
Ardit Begaj	94d22642d1	move test to appropriate file	2022-07-20 19:49:13 +02:00
Matthias Hanel	89b5e872ac	Move and cancel fixes (#3270 ) The Move/Cancel/Downscale mechanism did not take into account that the consumer's replica count can be set independently. This also alters peer selection to have the ability to skip unique tag prefix check for server that will be replaced. Say you have 3 az, and want to add another server to az:1, in order to replace a server that is the same zone. Without this change, uniqueTagPrefix check would filter the server to replace with and cause a failure. The cancel move response could not be received due to the wrong account name. Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-07-18 18:42:03 +02:00
Ivan Kozlovic	5f0ee2344a	[FIXED] JetStream: stream mirror updates not rejected in standalone Updates to stream mirror config are rejected in cluster mode, but were not in standalone. This PR adds the check in standalone mode. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-07-15 10:51:57 -06:00
Matthias Hanel	023500e1da	add the ability to cancel a move in progress (#3253 ) * add the ability to cancel a move in progress Move to individual subjects for move and cancel_move New subjects are: $JS.API.ACCOUNT.STREAM.MOVE.. $JS.API.ACCOUNT.STREAM.CANCEL_MOVE.. last and second to last token are account and stream name Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-07-12 21:54:18 +02:00
Matthias Hanel	d53d2d0484	[Added] account specific monitoring endpoint(s) (#3250 ) Added http monitoring endpoint /accstatz It responds with a list of statz for all accounts with local connections the argument "unused=1" can be provided to get statz for all accounts This endpoint is also exposed as nats request under: This monitoring endpoint is exposed via the system account. $SYS.REQ.ACCOUNT..STATZ Each server will respond with connection statistics for the requested account. The format of the data section is a list (size 1) identical to the event $SYS.ACCOUNT.%s.SERVER.CONNS which is sent periodically as well as on connect/disconnect. Unless requested by options, server without the account, or server where the account has no local connections, will not respond. A PING endpoint exists as well. The response format is identical to $SYS.REQ.ACCOUNT..STATZ (however the data section will contain more than one account, if they exist) In addition to general filter options the request takes a list of accounts and an argument to include accounts without local connections (disabled by default) $SYS.REQ.ACCOUNT.PING.STATZ Each account has a new system account import where the local subject $SYS.REQ.ACCOUNT.PING.STATZ essentially responds as if the importing account name was used for $SYS.REQ.ACCOUNT..STATZ The only difference between requesting ACCOUNT.PING.STATZ from within the system account and an account is that the later can only retrieve statz for the account the client requests from. Also exposed the monitoring /healthz via the system account under $SYS.REQ.SERVER..HEALTHZ $SYS.REQ.SERVER.PING.HEALTHZ No dedicated options are available for these. HEALTHZ also accept general filter options. Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-07-12 21:50:32 +02:00
Ivan Kozlovic	39a0cfccca	[FIXED] JetStream: servers may be reported as orphaned In some situations, a server may report that a remote server is detected as orphaned (and the node is marked as offline). This is because the orphaned detection relies on conns update to be received, however, servers would suppress the update if an account does not have any connections attached. This PR ensures that the update is sent regardless if the account is JS configured (not necessarily enabled at the moment). Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-07-11 16:15:01 -06:00
Derek Collison	59a0a3ba5c	Allow stream and mirror direct get abilities to be updated. Signed-off-by: Derek Collison <derek@nats.io>	2022-07-08 07:13:34 -07:00
Derek Collison	333e2fc2f1	Fix for stalled catchup in endless cycle on EOF trying to retrieve catchup msg. A customer experienced and endless failure to have a stream cacthup. The current leader was being asked for a message from a snapshot that was larger then what we had, resulting in EOF which silently failed. We now detect this and signal end of catchup and redo the bad snapshot if possible. Signed-off-by: Derek Collison <derek@nats.io>	2022-07-07 13:42:41 -07:00
Matthias Hanel	70be4b77f9	fixes peer removal, simplifies move, more tests Make sure when processing a peer removal that the stream assignment agrees. When a new leader takes over it can resend a peer removal, and if the stream/consumer really was rescheduled we could remove by accident. Also need to make sure that when we remove a stream we remove the node as part of the stream assignment. If we didn't, if the same asset returned to this server we would not start up the monitoring loop. Simplify migration logic in monitorStream, to be driven by leader only Improved unit tests Added failure when server not in peer list Move command does not require server anymore Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-07-07 03:32:13 +02:00
Derek Collison	c14fda51e7	Direct access to JetStream resources would be affected if across a leafnode that was down. This allows a solciting leafnode config to ask that any JetStream cluster assets that are a current leader have the leader stepdown. Signed-off-by: Derek Collison <derek@nats.io>	2022-07-05 12:35:09 -07:00
Derek Collison	e6479dafd2	Close leafnode connection when same cluster name detected Signed-off-by: Derek Collison <derek@nats.io>	2022-06-30 15:34:22 -07:00
Derek Collison	4a94a172c4	Merge pull request #3227 from nats-io/filtered_mirrors Allow filtered stream mirrors	2022-06-29 15:39:07 -07:00
Matthias Hanel	6bd14e1b7a	removed commented out code (#3228 ) Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-06-29 20:31:12 +02:00

1 2 3 4 5 ...

407 Commits