nats-server

mirror of https://github.com/gogrlx/nats-server.git synced 2026-04-16 19:14:41 -07:00

Author	SHA1	Message	Date
Derek Collison	d04763eb7d	CAS operations improved, hold lock past store. Use separate lock for consumer list and storage updates. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-24 18:30:44 -07:00
Ivan Kozlovic	02ecda535c	Stop the raft node so the test does not flap. Test TestNoRaceJetStreamClusterCorruptWAL() had started to flap because of the snapshot taken on cluster shutdown. Disable the snapshot on exit for this test by stopping the raft node before shutdown. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-16 18:44:32 -06:00
Ivan Kozlovic	7de4497815	Install consumer snapshot on clean exit, plus a few other fixes: - didRemove in applyMetaEntries() could be reset when processing multiple entries - change "no race" test names to include JetStream - separate raft node leader stepdown and stop in the server shutdown process - in InstallSnapshot, call wal.Compact() with lastIndex+1 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-16 17:05:49 -06:00
Derek Collison	8c04adc009	Improvements to filestore for large KVs. Use better indexing for lookups: we used to do a simple linear scan backwards, now we track the first and last block. The fss cache is now expired at will to reduce memory usage. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-09 15:51:13 -05:00
Derek Collison	06112d6885	Reset activity interval on catchup to the default instead of ramping up. Tweak test. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-08 11:06:10 -06:00
Derek Collison	758b733d43	Attempt to improve long RTT catchup time during stream moves. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-08 11:06:10 -06:00
Ivan Kozlovic	3c9a7cc6e5	Move to Go 1.19, remove io/ioutil, fix data race and a flapper Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-05 09:55:37 -06:00
Ivan Kozlovic	fe370955c8	Merge pull request #3288 from nats-io/debug_test_failure [FIXED] JetStream: Some scaling up issues	2022-07-26 08:57:17 -06:00
Ivan Kozlovic	1a6c5f1c90	[FIXED] JetStream: Some scaling up issues - Send snapshot only if leader - When processing a snapshot, start with a smaller inactivity interval that doubles up to 10sec, or use 10sec directly once we get a message. The reason is that the snapshot request may be sent before the leader has set up the subscription that receives the requests (or before that subscription has fully propagated across the cluster). - Don't remember snapfile on err. - Do not consider current if we have not had any activity. - Stabilize stream scale up under active heavy publishing. - Due to the publish pressure, move the check for followers' direct subs spinning up until after we stop publishing. Signed-off-by: Derek Collison <derek@nats.io> Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-07-25 18:44:18 -06:00
Ivan Kozlovic	ebeca00e20	[FIXED] JetStream/Cluster: Stream names/infos would return a bad response If there were more streams than the current limit of 1024, getting the list of names would return them all instead of using pagination. For "stream infos", the Total returned would be the API limit instead of the actual number of streams. Resolves https://github.com/nats-io/natscli/issues/541 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-07-25 14:41:05 -06:00
Derek Collison	69f522cb9f	Make sure to clean up client connection Signed-off-by: Derek Collison <derek@nats.io>	2022-07-06 19:29:32 -07:00
Derek Collison	f8939b40bc	Do not unsubscribe from direct access on leader stepdown, only when stopping. Also wait for the stream to have replicas and a leader in the test. Signed-off-by: Derek Collison <derek@nats.io>	2022-07-06 16:20:12 -07:00
Derek Collison	81a9906ad9	Wait a bit longer for the direct sub Signed-off-by: Derek Collison <derek@nats.io>	2022-07-03 12:54:15 -07:00
Derek Collison	47bef915ed	Allow all members of a replicated stream to participate in direct access. We will wait until a non-leader replica is current to subscribe. Signed-off-by: Derek Collison <derek@nats.io>	2022-07-03 11:08:24 -07:00
Ivan Kozlovic	4bf81420e2	[FIXED] Fast routed JetStream API requests were dropped If a JS API request was received from a non-client connection, it was processed in its own goroutine. To reduce the number of such goroutines, we limited the number of outstanding routines to 4096. However, in some situations it was possible to issue many requests at the same time, which would then cause those requests to be dropped. (An example was an MQTT benchmark tool that would create 5000 sessions, each with one QoS1 R1 consumer (using consumer_replicas=1). On abrupt exit of the tool, the consumers and their sessions needed to be deleted. This caused a burst of incoming delete consumer requests, and the original code dropped some of them.) Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-05-23 11:15:55 -06:00
Derek Collison	790d643431	Consumer's num pending can now rely on the stream's store instead of being maintained during runtime, which could be wrong under certain conditions. Signed-off-by: Derek Collison <derek@nats.io>	2022-05-20 08:45:43 -07:00
Derek Collison	ef3eea4d73	Speed up raft for tests Signed-off-by: Derek Collison <derek@nats.io>	2022-05-18 16:28:58 -07:00
Ivan Kozlovic	cadf921ed1	[FIXED] JetStream: PullConsumer MaxWaiting==1 and Canceled requests There was an issue with MaxWaiting==1 that caused a request with an expiration to never actually expire. This was because processWaiting would not pick it up, since wq.rp was equal to wq.wp (that is, the read pointer was equal to the write pointer for a slice of capacity 1). The other issue was that when reaching the maximum of waiting pull requests, a new request would evict an old one with a "408 Request Canceled". There is no reason for that. Instead, the server will first try to find existing expired requests (since some of the expiration is done lazily), but if none has expired and the queue is full, the server will return a "409 Exceeded MaxWaiting" to the new request, and not a "408 Request Canceled" to an old one... Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-05-03 15:17:20 -06:00
Ivan Kozlovic	d4d37e67f4	[FIXED] JetStream: file store compact and when to write index When deciding whether to compact a file, we need to exclude the empty records from the raw bytes; otherwise, for small messages, we would end up calling compact() too many times. When removing a message from the stream, in FIFO cases we would write the index at most every 2 seconds when doing it in place, but when dealing with out-of-order deletes, we would do it for every single delete, which can be costly. We now write it at most every 500ms for non-FIFO cases. Also fixed some unrelated code: - Decision to install a snapshot was based on an incorrect logical expression - In checkPending(), protect against the timer being nil, which could happen when the consumer is stopped or on a leadership change. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-04-28 12:35:19 -06:00
Ivan Kozlovic	0e2ab5eeea	Changes to tests that run on Travis - Remove code coverage from Travis and add it to a GitHub Action that will run nightly. - Use build tags to exclude some tests, such as the "norace" or JS tests. Since "go test" does not support "negative" regexes, there is no other way. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-04-26 14:11:31 -06:00
Ivan Kozlovic	b9463b322f	[FIXED] JetStream: stream mirror issues in mixed mode clusters Similar to PR #3061 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-04-20 23:21:15 -06:00
Ivan Kozlovic	df61a335c7	Merge pull request #3061 from nats-io/js_fix_stream_source [FIXED] JetStream: stream sources issue in mixed mode clusters	2022-04-20 23:20:41 -06:00
Ivan Kozlovic	9975a38c6e	[FIXED] JetStream: stream sources issue in mixed mode clusters The main issue was that in mixed mode, the interest through the gateway may still be in optimistic mode, so creating the source consumer would start delivery before we had a chance to set up the subscription to receive those messages. The approach is to create the subscription prior to sending the consumer create request. Also refactored the code a bit in the hope of making the retries more bulletproof. We may also look at making sure that gateways are switched to interest mode when detecting a mixed-mode setup. Also fixed a defect that could cause a source to be canceled when updating a stream. Resolves #2801 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-04-20 21:02:35 -06:00
Matthias Hanel	ff5d60973d	Introduce a minimum value of 100ms for max_age/dupe_window (#3056) Signed-off-by: Matthias Hanel <mh@synadia.com>	2022-04-20 13:58:19 -04:00
Ivan Kozlovic	a78ccdcb2f	[FIXED] JetStream: some stream SOURCE issues - Possibly missing some early messages from the sourced stream - In some cancel situations, the processing of sourced messages would no longer work Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-04-18 12:42:16 -06:00
Ivan Kozlovic	bd61d51a1c	[IMPROVED] JetStream: reduce unnecessary leader election - Wait for some sort of routing to be in place before starting the raft run loop - Remove use of a lock in apiDispatch that was not necessary but could have caused a route to block, causing memory growth, etc. Unrelated: renamed some tests so that they start with TestJetStream (and TestJetStreamCluster for cluster tests), fixed some flappers, and ensured that tests that change RAFT timeouts restore the default values on exit. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-04-14 10:47:14 -06:00
Derek Collison	3c0bced76e	Move test to no race, rename others Signed-off-by: Derek Collison <derek@nats.io>	2022-04-12 16:23:36 -07:00
Ivan Kozlovic	50c3986863	[FIXED] JetStream stream catchup issues - A stream could become leader when it should not, causing messages to be lost. - A catchup could stall because the server sending data could bail out of the runCatchup routine but still send the EOF signal. - Deadlock with monitoring of Jsz Signed-off-by: Ivan Kozlovic <ivan@synadia.com> Signed-off-by: Derek Collison <derek@nats.io> Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-04-12 16:05:12 -06:00
Derek Collison	5dfcc5e934	Fix for flapping WAL test Signed-off-by: Derek Collison <derek@nats.io>	2022-04-11 22:50:25 -07:00
Derek Collison	e330572cef	Select next leader before truncating Signed-off-by: Derek Collison <derek@nats.io>	2022-04-11 17:04:29 -07:00
Derek Collison	c3612b57c7	Fixes for some flapping tests Signed-off-by: Derek Collison <derek@nats.io>	2022-04-10 13:02:03 -07:00
Ivan Kozlovic	c78f7f343c	Add test that demonstrates the consumer filter perf degradation This is a follow-up to PR #3008. This test fails on v2.7.4 but passes on main. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-04-06 09:27:56 -06:00
Derek Collison	eb16c35016	OrderedConsumer was very conservative, with a slow start and small max outstanding bytes. This increases performance for longer RTTs. Signed-off-by: Derek Collison <derek@nats.io>	2022-03-30 05:08:36 -07:00
Derek Collison	607858f213	Improved consumer snapshot logic in clustered mode and disk usage. Also fixed a bug that could cause memory based replicated consumers to no longer work after snapshots and server restarts. The snapshot logic would allow non-state-changing updates to continuously grow the raft logs. We were also too conservative about when and why we snapshotted. Also added the ability to have FileStore.Compact() reclaim space from the block file from the head of the last changed block. Signed-off-by: Derek Collison <derek@nats.io>	2022-03-29 18:02:49 -07:00
Derek Collison	780d4c0dd8	Merge pull request #2960 from nats-io/mem_pool Additional improvements to memory pooling and management.	2022-03-28 17:10:16 -07:00
Derek Collison	5e5aab378e	Additional improvements to memory pooling and management. Also a logic fix for firstMatching, which did unnecessary work when matching all. Under contention on the head write blk, the system could perform worse memory-wise than the plain Go runtime. Also, some references to message subjects were bloating memory. Signed-off-by: Derek Collison <derek@nats.io>	2022-03-28 10:15:23 -07:00
Ivan Kozlovic	6ad93d9b34	Fix some flappers Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-03-25 18:24:17 -06:00
Derek Collison	ef8f543ea5	Improve memory usage through JetStream storage layer. Previously we relied more heavily on Go's garbage collector, since when we loaded a block for an underlying stream we would pass references upward to avoid copies. Now we always copy when passing data back to the upper layers, which allows us not only to expire our cache blocks but also to pool and reuse them. The upper layers also had changes made to allow the pooling layer at that level to optionally interoperate with the storage layer. Also fixed some flappers and a bug where de-dupe might not be reformed correctly. Signed-off-by: Derek Collison <derek@nats.io>	2022-03-24 17:45:15 -06:00
Ivan Kozlovic	29ff67e2ac	Tests: Replace all Ack() with AckSync() for now For the reason explained in the previous commit, tests that expected the number of ack/pending to be a certain value after an Ack() would flap. Replaced all references; we can go back to selectively calling Ack() when AckSync() is not needed. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-03-17 20:25:01 -06:00
Ivan Kozlovic	b4128693ed	Ensure file path is correct during stream restore Also had to change all references from `path.` to `filepath.` when dealing with files, so that it works properly on Windows. Also fixed lots of tests to defer the shutdown of the server after the removal of the storage, and fixed some config file directories to use single quotes `'` to surround the file path, again to work on Windows. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-03-09 13:31:51 -07:00
Derek Collison	330a40009c	Cleanup key files when removing message blocks. Signed-off-by: Derek Collison <derek@nats.io>	2022-02-17 11:33:41 -08:00
Derek Collison	68104d7cf3	During a filestore snapshot we generate the fss files, but we were not cleaning them up if the block was deleted before a server restart. https://gist.github.com/nekufa/010185dfb59261f222a0042d3a7d2a1c Signed-off-by: Derek Collison <derek@nats.io>	2022-02-09 17:12:08 -08:00
Derek Collison	d50febeeff	Improved sparse consumers replay time. When a stream has multiple subjects and a consumer filters the stream to a small and spread out list of messages the logic would do a linear scan looking for the next message for the filtered consumer. This CL allows the store layer to utilize the per subject info to improve the times. Signed-off-by: Derek Collison <derek@nats.io>	2022-02-07 17:26:32 -08:00
Derek Collison	6a3cf0f71e	Added the ability to get the number of subjects from StreamInfo and, optionally, per-subject details on how many messages each subject has. The details can also be filtered by subject when requested. Signed-off-by: Derek Collison <derek@nats.io>	2022-02-02 08:51:13 -08:00
Derek Collison	6b5332249b	This test was using fetch and failing if the complete batch was not filled. This has nothing to do with the test, we just want to make sure the leader steps down and there were no low level errors on the fetch. Signed-off-by: Derek Collison <derek@nats.io>	2022-02-01 13:34:00 -08:00
Derek Collison	8815072e34	Fix flapping test Signed-off-by: Derek Collison <derek@nats.io>	2022-01-30 14:54:24 -08:00
Derek Collison	6486cd8fc8	Added a /healthz endpoint for health and liveness probes in environments like k8s. Currently this code returns a 200 and { "status": "ok" } iff all configured ports are open and, if JetStream is configured, we have contact with the metaleader and the cluster and all streams are up to date. Signed-off-by: Derek Collison <derek@nats.io>	2022-01-24 19:30:10 -08:00
Derek Collison	c5fbb63614	JetStream ephemeral consumers could create a situation where the server would exhaust the OS thread limit (default 10k). In certain situations, a large number of consumers racing to update state or delete their stores during a delete would start taking up OS threads due to blocking disk IO. When this happened and a bunch of goroutines became runnable, the Go runtime would create extra OS threads to fill the runnable pool and would exhaust the max thread setting. This code places a channel as a simple semaphore to limit the number of OS threads blocked on disk IO. Signed-off-by: Derek Collison <derek@nats.io>	2021-12-29 07:05:34 -08:00
Derek Collison	af4d7dbe52	Memory store tracked interior deletes for stream state, but under KV semantics this could be very large. It is actually faster to not track them at all and generate them on the fly, and it saves lots of memory too. When we update the stream state to include runs, etc., we will update this as well. Signed-off-by: Derek Collison <derek@nats.io>	2021-12-20 17:37:16 -08:00
Derek Collison	98757253f9	Recreate client in case shutdown server was the one we were connected to Signed-off-by: Derek Collison <derek@nats.io>	2021-11-18 14:50:22 -08:00


132 Commits