nats-server

mirror of https://github.com/gogrlx/nats-server.git synced 2026-04-17 03:24:40 -07:00

Author	SHA1	Message	Date
Derek Collison	ebe08040e9	Attempt to fix flapper again Signed-off-by: Derek Collison <derek@nats.io>	2023-03-01 06:24:51 -08:00
Derek Collison	baca7bd751	Fix for test flapper Signed-off-by: Derek Collison <derek@nats.io>	2023-03-01 04:58:01 -08:00
Derek Collison	2642a8c03d	Optimize locking for when under heavy loads. Signed-off-by: Derek Collison <derek@nats.io>	2023-02-27 18:56:55 -08:00
Derek Collison	d347cb116a	When becoming leader optionally send current snapshot to followers if caught up. This can help sync on restarts and improve ghost ephemerals. Also added more code to suppress responses and API audits when we know we are recovering. Signed-off-by: Derek Collison <derek@nats.io>	2023-02-23 10:30:36 -08:00
Derek Collison	2972c11be6	Improve consumer create performance. In cases where we had a large subject space, a filestore with many msg blocks, and a filtered consumer with a wildcard filtered subject, creating a consumer could take more memory and time than we wanted. This improvement works when the consumer is DeliverAll and we used the upper layer in memory psim structure to scan but only in memory and avoid a file read for each msg block. Signed-off-by: Derek Collison <derek@nats.io>	2023-02-22 19:42:02 -08:00
Derek Collison	f16a7d8559	Skip test for now Signed-off-by: Derek Collison <derek@nats.io>	2023-02-22 15:49:48 -08:00
Derek Collison	d03d8e9d93	When having a max msgs per subject (e.g. KV) under heavy concurrent usage could skew the accounting for the underlying filestore. Signed-off-by: Derek Collison <derek@nats.io>	2023-02-22 12:50:43 -08:00
Derek Collison	11b0f214d0	Do not re-calculate NumPending on consumer info calls. We noticed this was being called a lot in user environments. When the consumer was filtered with a wildcard and the stream had a high cardinality of subjects and was falling behind this could take a substantial amount of time. Signed-off-by: Derek Collison <derek@nats.io>	2023-02-16 16:30:14 -08:00
Derek Collison	32b5ec16dd	Fixed test to correspond to new limit of 1024. Signed-off-by: Derek Collison <derek@nats.io>	2023-02-16 07:16:19 +04:00
Derek Collison	1e3c2810f4	Improve expireMsgs minAge calculation for when lots of messages to expire in each callback. This happens when under extreme load as shown in the skipped test. Signed-off-by: Derek Collison <derek@nats.io>	2023-02-13 18:39:39 +02:00
Derek Collison	e9a983c802	Do not let !NeedSnapshot() avoid snapshots and compaction. Signed-off-by: Derek Collison <derek@nats.io>	2023-02-01 22:05:25 -07:00
Derek Collison	390fd02918	Updates to tests for updated Go client changes Signed-off-by: Derek Collison <derek@nats.io>	2023-01-31 09:47:36 -08:00
Ivan Kozlovic	79ca0c1787	Move test to "norace_test.go" The test TestJetStreamClusterConsumerListPaging was in the jetstream_cluster_3_test.go and because of `-race` flag would take more than 440 seconds (7+ minutes) as seen here: https://app.travis-ci.com/github/nats-io/nats-server/jobs/593984385#L335 Without the `-race` flag, this test takes ~17 seconds. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2023-01-23 17:05:18 -07:00
Neil Twigg	14d0ba1c65	Fix some lint errors after move to `golangci-lint`	2022-12-30 20:00:08 +00:00
Derek Collison	c90fe9a2fa	Improve performance and latency with large number of sparse consumers. When a stream had a large number of consumers on a server that were sparse, the signaling mechanism would do a linear scan to signal matching consumers. Usage patterns have continued to have more consumers that are filtered and sparse, meaning a message is destined for a single or small number of consumers. This change moves selection to a sublist that tracks only active consumer leaders for selection, which optimizes selection of consumers to signal when the number of consumers is large. Signed-off-by: Derek Collison <derek@nats.io>	2022-12-13 15:25:55 -08:00
Marco Primi	f8a030bc4a	Use testing.TempDir() where possible Refactor tests to use go built-in temporary directory utility for tests. Also avoid binding to default port (which may be in use)	2022-12-12 13:18:44 -08:00
Derek Collison	894115b82b	Fix for server panic when consumer state was not decoded correctly. The bug was when a timestamp for the pending state was exactly -1 which could happen based on timing of the redelivered pending items which would set pending.Timestamp into the future potentially and the timing on the encodeConsumerState call. Minor fixes to raft. Signed-off-by: Derek Collison <derek@nats.io>	2022-12-06 14:16:20 -08:00
Derek Collison	9f241f3322	Offload signaling to consumers when number is large. Signed-off-by: Derek Collison <derek@nats.io>	2022-11-15 11:25:07 -08:00
Derek Collison	4dab6ce92c	Fix test timing Signed-off-by: Derek Collison <derek@nats.io>	2022-11-09 19:44:22 -08:00
Derek Collison	c6031382a1	Fix for #3499 When we deleted a consumer from an interest policy stream we would make sure to clean up any unacked messages. However we only based start from the ack floor for the consumer and did not take into account the first sequence of the stream. Signed-off-by: Derek Collison <derek@nats.io>	2022-11-05 13:56:45 -07:00
Ivan Kozlovic	170ff49837	[ADDED] JetStream: peer (the hash of server name) in statsz/jsz A request to `$SYS.REQ.SERVER.PING.JSZ` would now return something like this: ``` ... "meta_cluster": { "name": "local", "leader": "A", "peer": "NUmM6cRx", "replicas": [ { "name": "B", "current": true, "active": 690369000, "peer": "b2oh2L6w" }, { "name": "Server name unknown at this time (peerID: jZ6RvVRH)", "current": false, "offline": true, "active": 0, "peer": "jZ6RvVRH" } ], "cluster_size": 3 } ``` Note the "peer" field following the "leader" field that contains the server name. The new field is the node ID, which is a hash of the server name. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-09-16 15:31:37 -06:00
Derek Collison	6c97733bb8	Optimize needAck. Signed-off-by: Derek Collison <derek@nats.io>	2022-09-14 16:25:50 -07:00
Derek Collison	d979937bbd	Merge pull request #3456 from nats-io/max-bytes-pull [IMPROVED] Pull request logic	2022-09-08 12:08:10 -07:00
Derek Collison	dedf21d45d	Fix for issue #3455 When hitting max ack pending from getNextMsg would remove one shots incorrectly. Signed-off-by: Derek Collison <derek@nats.io>	2022-09-08 11:56:57 -07:00
Ivan Kozlovic	ae0d808f5b	Merge pull request #3457 from nats-io/cleanup_tests Fixed some tests	2022-09-08 12:24:07 -06:00
jnmoyne	95c1946231	Implements pagination for JS Stream Info requests	2022-09-08 10:45:20 -07:00
Ivan Kozlovic	b69ffe244e	Fixed some tests Code change: - Do not start the processMirrorMsgs and processSourceMsgs go routine if the server has been detected to be shutdown. This would otherwise leave some go routine running at the end of some tests. - Pass the fch and qch to the consumerFileStore's flushLoop otherwise in some tests this routine could be left running. Tests changes: - Added missing defer NATS connection close - Added missing defer server shutdown Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-09-08 11:28:23 -06:00
Derek Collison	9c3bd17059	Updates to tests Signed-off-by: Derek Collison <derek@nats.io>	2022-09-06 13:33:39 -07:00
Derek Collison	98bf861a7a	Updates to stream and consumer move logic. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-30 16:11:35 -07:00
Derek Collison	5f0ecef6f3	When writing a msg after the fss state was expired we would count the msg twice. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-30 05:38:16 -07:00
Derek Collison	d04763eb7d	CAS operations improved, hold lock past store. Use separate lock for consumer list and storage updates. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-24 18:30:44 -07:00
Ivan Kozlovic	02ecda535c	Stop the raft node to not cause test to flap. Test TestNoRaceJetStreamClusterCorruptWAL() would start to flap because of the snapshot on cluster shutdown. Disable the snapshot on exit for this test by stopping the raft node before shutdown. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-16 18:44:32 -06:00
Ivan Kozlovic	7de4497815	Install consumer snapshot on clean exit and few other fixes - didRemove in applyMetaEntries() could be reset when processing multiple entries - change "no race" test names to include JetStream - separate raft nodes leader stepdown and stop in server shutdown process - in InstallSnapshot, call wal.Compact() with lastIndex+1 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-16 17:05:49 -06:00
Derek Collison	8c04adc009	Improvements to filestore for large KVs. Use better indexing for lookups, we used to do simple linear scan backwards, now track first and last block. Will expire the fss cache at will to reduce memory usage. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-09 15:51:13 -05:00
Derek Collison	06112d6885	Reset activity interval on catchup to default vs ramp up. Tweak test. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-08 11:06:10 -06:00
Derek Collison	758b733d43	Attempt to improve long RTT catchup time during stream moves. Signed-off-by: Derek Collison <derek@nats.io>	2022-08-08 11:06:10 -06:00
Ivan Kozlovic	3c9a7cc6e5	Move to Go 1.19, remove io/util, fix data race and a flapper Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-08-05 09:55:37 -06:00
Ivan Kozlovic	fe370955c8	Merge pull request #3288 from nats-io/debug_test_failure [FIXED] JetStream: Some scaling up issues	2022-07-26 08:57:17 -06:00
Ivan Kozlovic	1a6c5f1c90	[FIXED] JetStream: Some scaling up issues - Send snapshot only if leader - When processing snapshot, start with a smaller inactivity interval that will double up to 10sec or use 10sec directly once we get a message. Reason for that is that it is possible that the request for snapshot is sent while the leader has not yet setup the subscription that receives the requests (or subscription has not fully reached the cluster). - Don't remember snapfile on err. - Do not consider current if we have not had any activity. - Stabilize stream scale up under active heavy publishing. - Due to the publish pressure move the check for followers direct subs spinning up til after we stop publishing. Signed-off-by: Derek Collison <derek@nats.io> Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-07-25 18:44:18 -06:00
Ivan Kozlovic	ebeca00e20	[FIXED] JetStream/Cluster: Stream names/infos would return bad response If there are more stream names than the current limit of 1024, getting the list of names would return them all instead of using pagination. For "stream infos", the Total amount returned would be the API limit instead of the actual number of streams. Resolves https://github.com/nats-io/natscli/issues/541 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-07-25 14:41:05 -06:00
Derek Collison	69f522cb9f	Make sure to clean up client connection Signed-off-by: Derek Collison <derek@nats.io>	2022-07-06 19:29:32 -07:00
Derek Collison	f8939b40bc	Do not unsubscribe from direct access on leader stepdown, only stopping. Also wait for stream to have replicas and leader for test. Signed-off-by: Derek Collison <derek@nats.io>	2022-07-06 16:20:12 -07:00
Derek Collison	81a9906ad9	Wait a bit longer for the direct sub Signed-off-by: Derek Collison <derek@nats.io>	2022-07-03 12:54:15 -07:00
Derek Collison	47bef915ed	Allow all members of a replicated stream to participate in direct access. We will wait until a non-leader replica is current to subscribe. Signed-off-by: Derek Collison <derek@nats.io>	2022-07-03 11:08:24 -07:00
Ivan Kozlovic	4bf81420e2	[FIXED] Fast routed JetStream API requests were dropped If a JS API request is received from a non client connection, it was processed in its own go routine. To reduce the number of such go routines, we were limiting the number of outstanding routines to 4096. However, in some situations, it was possible to issue many requests at the same time that would then cause those requests to be dropped. (an example was an MQTT benchmark tool that would create 5000 sessions, each with one QoS1 R1 consumer (with the use of consumer_replicas=1). On abrupt exit of the tool, the consumers and their sessions needed to be deleted. This would cause fast incoming delete consumer requests which would cause the original code to drop some of them) Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-05-23 11:15:55 -06:00
Derek Collison	790d643431	Consumer's num pending can now rely on the stream's store vs trying to maintain during runtime which could be wrong under certain conditions. Signed-off-by: Derek Collison <derek@nats.io>	2022-05-20 08:45:43 -07:00
Derek Collison	ef3eea4d73	Speed up raft for tests Signed-off-by: Derek Collison <derek@nats.io>	2022-05-18 16:28:58 -07:00
Ivan Kozlovic	cadf921ed1	[FIXED] JetStream: PullConsumer MaxWaiting==1 and Canceled requests There was an issue with MaxWaiting==1 that was causing a request with expiration to actually not expire. This was because processWaiting would not pick it up because wq.rp was actually equal to wq.wp (that is, the read pointer was equal to write pointer for a slice of capacity of 1). The other issue was that when reaching the maximum of waiting pull requests, a new request would evict an old one with a "408 Request Canceled". There is no reason for that, instead the server will first try to find some existing expired requests (since some of the expiration is lazily done), but if none is expired, and the queue is full, the server will return a "409 Exceeded MaxWaiting" to the new request, and not a "408 Request Canceled" to an old one... Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-05-03 15:17:20 -06:00
Ivan Kozlovic	d4d37e67f4	[FIXED] JetStream: file store compact and when to write index When deciding to compact a file, we need to remove from the raw bytes the empty records, otherwise, for small messages, we would end up calling compact() too many times. When removing a message from the stream, in FIFO cases we would write the index every 2 seconds at most when doing it in place, but when dealing with out of order deletes, we would do it for every single delete, which can be costly. We are now writing only every 500ms for non FIFO cases. Also fixed some unrelated code: - Decision to install a snapshot was based on incorrect logical expression - In checkPending(), protect against the timer being nil which could happen when consumer is stopped or leadership change. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-04-28 12:35:19 -06:00
Ivan Kozlovic	0e2ab5eeea	Changes to tests that run on Travis - Remove code coverage from Travis and add it to a GitHub Action that will be run as a nightly. - Use tag builds to exclude some tests, such as the "norace" or JS tests. Since "go test" does not support "negative" regexs, there is no other way. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2022-04-26 14:11:31 -06:00

162 Commits