Commit Graph

7452 Commits

Author SHA1 Message Date
Derek Collison
e7b01c4154 Merge branch 'main' into dev 2023-05-02 16:30:00 -07:00
Derek Collison
9ef71893db Bump to 2.9.17-beta.4
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-02 09:43:11 -07:00
Derek Collison
188eea42cc [IMPROVED] Do not hold filestore lock during remove that needs to do IO. (#4123)
When removing a msg, if we need to load the msg block and incur IO,
unlock the fs lock to avoid stalling other activity on other blocks,
e.g. removing and adding msgs at the same time.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-02 09:42:38 -07:00
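The locking pattern this commit describes — releasing the store lock around slow IO, then re-acquiring it — can be sketched as follows. The names (`fileStore`, `removeMsg`, `loadBlockIO`) are simplified stand-ins, not the server's actual filestore code:

```go
package main

import (
	"fmt"
	"sync"
)

// fileStore is a simplified stand-in for the server's filestore.
type fileStore struct {
	mu     sync.Mutex
	cached bool // whether the msg block is already in memory
}

// loadBlockIO simulates the slow disk read for a message block.
func (fs *fileStore) loadBlockIO() { fs.cached = true }

// removeMsg drops the store lock around the IO so that activity on
// other blocks is not stalled while the load is in flight.
func (fs *fileStore) removeMsg(seq uint64) bool {
	fs.mu.Lock()
	if !fs.cached {
		// Release the lock before incurring IO, then re-acquire.
		// Real code must re-validate state here, since other
		// goroutines may have changed it while we were unlocked.
		fs.mu.Unlock()
		fs.loadBlockIO()
		fs.mu.Lock()
	}
	defer fs.mu.Unlock()
	// ... perform the actual remove under the lock ...
	return true
}

func main() {
	fs := &fileStore{}
	fmt.Println(fs.removeMsg(1)) // true
}
```

The key design point is that anything learned before the unlock must be treated as stale after re-locking.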
Derek Collison
4a58feff27 When removing a msg, if we need to load the msg block and incur IO, unlock the fs lock to avoid stalling other activity on other blocks.
E.g. removing and adding msgs at the same time.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-02 08:56:43 -07:00
Derek Collison
eb1eb3c49e Merge branch 'main' into dev 2023-05-01 16:29:35 -07:00
Derek Collison
ff6c80350b [FIXED] A stream raft node could stay running after a stop(). (#4118)
This can happen when we reset a stream internally and the stream had a
prior snapshot.
Also make sure to always release resources back to the account,
regardless of whether the store is still present.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-01 16:23:03 -07:00
Derek Collison
c24229287f [ADDED] LeafNode: TLSHandshakeFirst option (#4119)
A new field in `tls{}` blocks forces the server to do the TLS handshake
before sending the INFO protocol.
```
leafnodes {
   port: 7422
   tls {
      cert_file: ...
      ...
      handshake_first: true
   }
   remotes [
       {
         url: tls://host:7423
         tls {
            ...
            handshake_first: true
         }
       }
   ]
}
```
Note that if `handshake_first` is set on the "accept" side (the first
`tls{}` block in the example above), a server trying to create a LeafNode
connection to this server would need to have `handshake_first` set to
true inside the `tls{}` block of the corresponding remote.

Configuration reload of leafnodes is generally not supported, but TLS
certificates can be reloaded, and support for reloading this new field
was also added.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-05-01 16:12:27 -07:00
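The behavioral change `handshake_first` introduces is purely one of ordering on the connection. A minimal sketch of that decision (illustrative helper only, not actual nats-server code):

```go
package main

import "fmt"

// acceptSteps sketches the ordering decision on the accept side;
// this is an illustrative helper, not actual nats-server code.
func acceptSteps(handshakeFirst bool) []string {
	if handshakeFirst {
		// With handshake_first, TLS completes before any protocol
		// traffic, so INFO is only ever sent encrypted.
		return []string{"tls_handshake", "send_info"}
	}
	// Default behavior: INFO goes out first, then TLS is initiated.
	return []string{"send_info", "tls_handshake"}
}

func main() {
	fmt.Println(acceptSteps(true)) // [tls_handshake send_info]
}
```

This also explains why both sides must agree: a remote waiting for INFO in the clear will deadlock against an accept side waiting for a TLS ClientHello.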
Derek Collison
f098c253aa Make sure we adjust accounting reservations when deleting a stream with any issues.
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-01 15:54:37 -07:00
Ivan Kozlovic
0a02f2121c [ADDED] LeafNode: TLSHandshakeFirst option
A new field in `tls{}` blocks forces the server to do the TLS handshake
before sending the INFO protocol.
```
leafnodes {
   port: 7422
   tls {
      cert_file: ...
      ...
      handshake_first: true
   }
   remotes [
       {
         url: tls://host:7423
         tls {
            ...
            handshake_first: true
         }
       }
   ]
}
```
Note that if `handshake_first` is set on the "accept" side (the
first `tls{}` block in the example above), a server trying to
create a LeafNode connection to this server would need to have
`handshake_first` set to true inside the `tls{}` block of
the corresponding remote.

Configuration reload of leafnodes is generally not supported,
but TLS certificates can be reloaded, and support for reloading
this new field was also added.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-05-01 16:41:51 -06:00
Derek Collison
f5ac5a4da0 Fix for a bug that could leave a raft node running when stopping a stream.
This can happen when we reset a stream internally and the stream had a prior snapshot.

Also make sure to always release resources back to the account regardless of whether the store is still present.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-01 13:22:06 -07:00
Derek Collison
1eed0e8c75 Bump to 2.9.17-beta.3
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-30 17:43:59 -07:00
Derek Collison
e158c46884 Merge branch 'main' into dev 2023-04-30 17:37:47 -07:00
Derek Collison
7ad2dd2510 [IMPROVED] Updating of a large fleet of leafnodes. (#4117)
When a fleet of leafnodes is isolated (not routed but using the same
cluster), we could do better at optimizing how we update the other
leafnodes: since they are all in the same cluster and we know we are
isolated, we can skip those updates.

We can improve further in 2.10.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-30 17:32:14 -07:00
Derek Collison
c15cc0054a When a fleet of leafnodes is isolated (not routed but using the same cluster) we could do better at optimizing how we update the other leafnodes.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-30 17:08:16 -07:00
Derek Collison
0321eb6484 Merge branch 'main' into dev 2023-04-29 19:52:57 -07:00
Derek Collison
91607d8459 [IMPROVED] Health repair (#4116)
Under certain scenarios we have witnessed healthz() calls that never
return healthy due to a stream or consumer being missing or stopped.
The healthz() call will now attempt to restart those
assets.

We will also periodically call this in clustered mode from the
monitorCluster routine.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-29 18:02:12 -07:00
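The repair behavior described above can be sketched as a health check that doubles as a restart pass. All names here (`asset`, `healthz`) are hypothetical stand-ins for the server's streams and consumers:

```go
package main

import "fmt"

// asset is a hypothetical stream or consumer tracked by the meta layer.
type asset struct {
	name    string
	running bool
}

// healthz reports healthy only when every expected asset is running;
// with repair enabled it also attempts to restart stopped assets,
// so a later check can succeed. Sketch only, not the server's code.
func healthz(assets []*asset, repair bool) bool {
	healthy := true
	for _, a := range assets {
		if !a.running {
			healthy = false
			if repair {
				a.running = true // restart attempt (stubbed out)
			}
		}
	}
	return healthy
}

func main() {
	assets := []*asset{{name: "stream-A", running: false}}
	fmt.Println(healthz(assets, true)) // false: stream-A was stopped
	fmt.Println(healthz(assets, true)) // true: restart succeeded
}
```

Calling this periodically (as the commit says monitorCluster now does in clustered mode) turns a one-shot health probe into a self-healing loop.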
Derek Collison
b27ce6de80 Add in a few more places to check on jetstream shutting down.
Add in a helper method.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-29 11:27:18 -07:00
Derek Collison
db972048ce Detect when we are shutting down or if a consumer is already closed when removing a stream.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-29 11:18:10 -07:00
Derek Collison
4eb4e5496b Do health check on startup once we have processed existing state.
Also do health checks in a separate goroutine.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-29 09:36:35 -07:00
Derek Collison
fac5658966 If we fail to create a consumer, make sure to clean up any raft nodes in the meta layer and to shut down the consumer if it was created before we encountered an error.
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-29 08:15:33 -07:00
Derek Collison
546dd0c9ab Make sure we can recover an underlying node being stopped.
Do not return healthy if the node is closed, and wait a bit longer for forward progress.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-29 07:42:23 -07:00
Derek Collison
85f6bfb2ac Check healthz periodically
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-28 17:58:45 -07:00
Derek Collison
ac27fd046a Fix data race
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-28 17:57:03 -07:00
Derek Collison
d107ba3549 Under certain scenarios we have witnessed healthz() calls that never return healthy due to a stream or consumer being missing or stopped.
This will now allow the healthz() call to attempt to restart those assets.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-28 17:11:08 -07:00
Derek Collison
0ba93ce6e5 [ADDED] Support for route S2 compression (#4115)
The new field `compression` in the `cluster{}` block allows specifying
which compression mode to use between servers.

It can be specified simply as a boolean or a string for the
simple modes, or as an object for the "s2_auto" mode, where
a list of RTT thresholds can be specified.

If no compression field is specified, the server defaults to
"accept", which means that a server will accept
compression from a remote and switch to that same compression
mode, but will otherwise not initiate compression. That is,
if 2 servers are configured with "accept", then compression
will actually be "off". If one of the servers had, say, s2_fast,
then they would both use that mode.

Here is the way to specify compression with a simple string:
```
cluster {
..
  # Possible values are "disabled", "off", "enabled", "on",
  # "accept", "s2_fast", "s2_better", "s2_best" or "s2_auto"
  compression: s2_fast
}
```
If the compression field is simply set to "s2_auto", then
the server will use default RTT thresholds of 10ms, 50ms 
and 100ms for the "uncompressed", "fast", "better"
and "best" modes.

To specify a different list of thresholds for s2_auto,
here is how it would look:
```
cluster {
..
  compression: {
    mode: s2_auto
    # This means that for RTT up to 5ms (included), then
    # the compression level will be "uncompressed", then
    # from 5ms+ to 15ms, the mode will switch to "s2_fast",
    # then from 15ms+ to 50ms, the level will switch to
    # "s2_better", and anything above 50ms will result
    # in the "s2_best" compression mode.
    rtt_thresholds: [5ms, 15ms, 50ms]
  }
}
```

If a server has a compression mode set (other than "off") but
connects to an older server, there will be no compression between
those two servers.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-28 14:55:45 -07:00
Ivan Kozlovic
349f01e86a Change the absence of a compression setting to default to "accept"
In that mode, a server accepts and will switch to the same compression
level as the remote (if one is set) but will not initiate compression.
So if no server in the cluster has a compression setting, they all
default to "accept", which means that compression is "off".

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-28 15:33:17 -06:00
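The "accept" semantics described in this commit can be sketched as a tiny negotiation function: "accept" adopts whatever the remote proposes, and two "accept" peers end up with compression off. Hypothetical helper, not the server's actual negotiation code:

```go
package main

import "fmt"

// negotiate sketches how two route peers could settle on a shared
// compression mode under the "accept" default: "accept" adopts
// whatever the remote proposes, and two "accept" peers end up with
// compression off. Hypothetical helper, not the server's code.
func negotiate(local, remote string) string {
	switch {
	case local == "accept" && remote == "accept":
		return "off"
	case local == "accept":
		return remote
	default:
		return local
	}
}

func main() {
	fmt.Println(negotiate("accept", "accept"))  // off
	fmt.Println(negotiate("accept", "s2_fast")) // s2_fast
}
```

This default means enabling compression on a single server is enough to light it up on all routes to that server, without forcing it cluster-wide.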
Ivan Kozlovic
5b8c9ee364 Changes based on code review
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-28 14:34:32 -06:00
Ivan Kozlovic
70af04a63f Other flappers.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-28 11:22:04 -06:00
Ivan Kozlovic
73ed55ae5b Fixed flapper
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-28 10:55:32 -06:00
Ivan Kozlovic
8d2683a062 Fixed data race
Reverts changes made in PR#4001: 105237cba8 (diff-1322a81c43dfdd05284ae128c43d9ea51c1a3b677587686561ef6de47024e14aR1340)

Since a fix was made here: b78ec39b1f
the changes made in that PR need to be reverted. The test
TestRoutePoolAndPerAccountWithServiceLatencyNoDataRace now passes.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-28 10:18:14 -06:00
Ivan Kozlovic
d6fe9d4c2d [ADDED] Support for route S2 compression
The new field `compression` in the `cluster{}` block allows specifying
which compression mode to use between servers.

It can be specified simply as a boolean or a string for the
simple modes, or as an object for the "s2_auto" mode, where
a list of RTT thresholds can be specified.

By default, if no compression field is specified, the server
will use the s2_auto mode with default RTT thresholds of
10ms, 50ms and 100ms for the "uncompressed", "fast", "better"
and "best" modes.

```
cluster {
..
  # Possible values are "disabled", "off", "enabled", "on",
  # "accept", "s2_fast", "s2_better", "s2_best" or "s2_auto"
  compression: s2_fast
}
```

To specify a different list of thresholds for s2_auto,
here is how it would look:
```
cluster {
..
  compression: {
    mode: s2_auto
    # This means that for RTT up to 5ms (included), then
    # the compression level will be "uncompressed", then
    # from 5ms+ to 15ms, the mode will switch to "s2_fast",
    # then from 15ms+ to 50ms, the level will switch to
    # "s2_better", and anything above 50ms will result
    # in the "s2_best" compression mode.
    rtt_thresholds: [5ms, 15ms, 50ms]
  }
}
```

Note that the "accept" mode means that a server will accept
compression from a remote and switch to that same compression
mode, but will otherwise not initiate compression. That is,
if 2 servers are configured with "accept", then compression
will actually be "off". If one of the servers had, say, s2_fast,
then they would both use that mode.

If a server has a compression mode set (other than "off") but
connects to an older server, there will be no compression between
those two servers.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-04-27 17:59:25 -06:00
Derek Collison
c75127b966 Benchmarks for stream limits, combine tests and benchmarks into fewer files (#4098)
- [ ] Link to issue, e.g. `Resolves #NNN`
- [ ] Documentation added (if applicable)
- [x] Tests added
- [x] Branch rebased on top of current main (`git pull --rebase origin main`)
- [x] Changes squashed to a single commit (described [here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
- [x] Build is green in Travis CI
- [x] You have certified that the contribution is your original work and that you license the work to the project under the [Apache 2 license](https://github.com/nats-io/nats-server/blob/main/LICENSE)

### Changes proposed in this pull request:

- Add JetStream benchmark that measures publish throughput with
different limits (MaxBytes, MaxMessages, MaxPerSubject, MaxAge, ...)
- Merge `jetstream_chaos_*_test.go` and
`jetstream_benchmark_*_test.go` into two files (down from 8)

Example output (filtered subset):

```
$ go test -v -bench 'BenchmarkJetStreamInterestStreamWithLimit/.*R=1.*/Storage=Memory/.*' -run NONE -benchtime 200000x -count 1

goos: darwin
goarch: arm64
pkg: github.com/nats-io/nats-server/v2/server
BenchmarkJetStreamInterestStreamWithLimit
    jetstream_benchmark_interest_stream_limit_test.go:44: BatchSize: 100, MsgSize: 256, Subjects: 2500, Publishers: 4, Random Message: true
BenchmarkJetStreamInterestStreamWithLimit/N=1,R=1
BenchmarkJetStreamInterestStreamWithLimit/N=1,R=1/Storage=Memory
BenchmarkJetStreamInterestStreamWithLimit/N=1,R=1/Storage=Memory/unlimited
    jetstream_benchmark_interest_stream_limit_test.go:230: Stream: {clusterSize:1 replicas:1}, Storage: [Memory] Limit: [unlimited], Ops: 200000
BenchmarkJetStreamInterestStreamWithLimit/N=1,R=1/Storage=Memory/unlimited-8              200000              9743 ns/op          26.28 MB/s             0 %error
BenchmarkJetStreamInterestStreamWithLimit/N=1,R=1/Storage=Memory/MaxMsg=1000
    jetstream_benchmark_interest_stream_limit_test.go:230: Stream: {clusterSize:1 replicas:1}, Storage: [Memory] Limit: [MaxMsg=1000], Ops: 200000
BenchmarkJetStreamInterestStreamWithLimit/N=1,R=1/Storage=Memory/MaxMsg=1000-8            200000              9800 ns/op          26.12 MB/s             0 %error
BenchmarkJetStreamInterestStreamWithLimit/N=1,R=1/Storage=Memory/MaxMsg=10
    jetstream_benchmark_interest_stream_limit_test.go:230: Stream: {clusterSize:1 replicas:1}, Storage: [Memory] Limit: [MaxMsg=10], Ops: 200000
BenchmarkJetStreamInterestStreamWithLimit/N=1,R=1/Storage=Memory/MaxMsg=10-8              200000              9717 ns/op          26.35 MB/s             0 %error
BenchmarkJetStreamInterestStreamWithLimit/N=1,R=1/Storage=Memory/MaxPerSubject=10
    jetstream_benchmark_interest_stream_limit_test.go:230: Stream: {clusterSize:1 replicas:1}, Storage: [Memory] Limit: [MaxPerSubject=10], Ops: 200000
BenchmarkJetStreamInterestStreamWithLimit/N=1,R=1/Storage=Memory/MaxPerSubject=10-8       200000             78796 ns/op           3.25 MB/s             0 %error
BenchmarkJetStreamInterestStreamWithLimit/N=1,R=1/Storage=Memory/MaxAge=1s
    jetstream_benchmark_interest_stream_limit_test.go:230: Stream: {clusterSize:1 replicas:1}, Storage: [Memory] Limit: [MaxAge=1s], Ops: 200000
BenchmarkJetStreamInterestStreamWithLimit/N=1,R=1/Storage=Memory/MaxAge=1s-8              200000              9648 ns/op          26.53 MB/s             0 %error
BenchmarkJetStreamInterestStreamWithLimit/N=1,R=1/Storage=Memory/MaxBytes=1MB
    jetstream_benchmark_interest_stream_limit_test.go:230: Stream: {clusterSize:1 replicas:1}, Storage: [Memory] Limit: [MaxBytes=1MB], Ops: 200000
BenchmarkJetStreamInterestStreamWithLimit/N=1,R=1/Storage=Memory/MaxBytes=1MB-8           200000              9706 ns/op          26.38 MB/s             0 %error
PASS
ok      github.com/nats-io/nats-server/v2/server        26.359s
```

cc: @jnmoyne
2023-04-27 16:53:31 -07:00
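Benchmarks like the ones above are normally run with `go test -bench`, but they can also be driven programmatically via `testing.Benchmark`. A minimal, self-contained sketch (the `payload` workload is a stand-in, not a real JetStream publish):

```go
package main

import (
	"fmt"
	"testing"
)

// payload stands in for building a 256-byte message, a rough proxy
// for the publish work the JetStream benchmarks above measure.
func payload() []byte {
	msg := make([]byte, 256)
	for i := range msg {
		msg[i] = byte(i)
	}
	return msg
}

func main() {
	// testing.Benchmark runs the function with an auto-scaled b.N,
	// just as `go test -bench` would.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			_ = payload()
		}
	})
	fmt.Printf("%d iterations, %d ns/op\n", res.N, res.NsPerOp())
}
```

The `-benchtime 200000x -count 1` flags in the example output above pin the iteration count instead of letting the framework auto-scale it, which makes runs comparable across limit configurations.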
Marco Primi
82eade93b4 Merge JS Chaos tests into a single file 2023-04-27 14:56:55 -07:00
Marco Primi
7908d8c05c Merge JS benchmarks into a single file 2023-04-27 14:56:55 -07:00
Marco Primi
df552351ec Benchmark for interest-based stream with limits
Measure publish throughput with different limits (MaxBytes, MaxMessages,
MaxPerSubject, MaxAge, ...)
2023-04-27 14:56:55 -07:00
Derek Collison
f972165b0e Bump to 2.9.17-beta.2
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-27 14:30:19 -07:00
Derek Collison
c3b07df86f The server's Start() used to block but no longer does. (#4111)
This updates tests and the function comment.

Signed-off-by: Derek Collison <derek@nats.io>

Resolves #4110
2023-04-27 09:50:03 -07:00
Derek Collison
59e2107435 Fix test flapper
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-27 07:19:56 -07:00
Derek Collison
a66ac8cb9b The server's Start() used to block but no longer does. This updates tests and the function comment.
Fix for #4110

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-27 06:55:03 -07:00
Neil
3feb9f73b9 Add op type to panics (#4108)
This updates the `panic` messages on applying meta entries to include
the faulty op type, so that we can better work out what's going on in
these cases.

Signed-off-by: Neil Twigg <neil@nats.io>
2023-04-27 13:02:28 +01:00
Neil Twigg
e30ea34625 Add op type to panics
Signed-off-by: Neil Twigg <neil@nats.io>
2023-04-27 11:38:52 +01:00
Derek Collison
d573b78aee Merge branch 'main' into dev 2023-04-26 18:42:31 -07:00
Derek Collison
f584df4b4a [IMPROVED] Clustered consumer improvements (#4107)
Consumer state from Jsz() would not be consistent for a leader vs
follower.

ConsumerFileStore could encode an empty state or update an empty state
on startup.
We needed to make sure at the lowest level that the state was read from
disk and not depend on the upper layer consumer.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-26 17:22:32 -07:00
Derek Collison
9999f63853 ConsumerFileStore could encode an empty state or update an empty state on startup.
We needed to make sure at the lowest level that the state was read from disk and did not depend on the upper layer consumer.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-26 15:48:10 -07:00
Derek Collison
7f06d6f5a7 When Jsz() was asked for consumer details, it would report incorrect data if the server was not the consumer leader.
This is due to the way state is maintained for leaders vs followers for consumers.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-26 15:03:15 -07:00
Derek Collison
4ebdb69daf Merge branch 'main' into dev 2023-04-26 11:34:37 -07:00
Derek Collison
aea4a4115d Stream migration update (#4104)
I noticed that stream migration could be delayed due to transferring
leadership while the new leader was still paused for an upper layer
catchup, resulting in downgrading to a normal lost quorum vote. This
allows a leadership transfer to move ahead once the upper layer resumes.
Also check quicker but slow down if the state we need to have is not
there yet.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-26 08:14:46 -07:00
Derek Collison
83293f86ff Reduce threshold for compressing messages during a catchup
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-25 19:01:06 -07:00
Derek Collison
3c964a12d7 Migration could be delayed due to transferring leadership while the new leader was still paused.
Also check quicker but slow down if the state we need to have is not there yet.

Signed-off-by: Derek Collison <derek@nats.io>
2023-04-25 18:58:49 -07:00
Neil
08d341801f Restore outbound queue coalescing (#4093)
This PR effectively reverts part of #4084 which removed the coalescing
from the outbound queues as I initially thought it was the source of a
race condition.

Further investigation has proven that not only was that untrue (the race
actually came from the WebSocket code, all coalescing operations happen
under the client lock) but removing the coalescing also worsens
performance.

Signed-off-by: Neil Twigg <neil@nats.io>
2023-04-25 15:53:00 +01:00
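The coalescing being restored here amounts to merging queued outbound buffers so that one write (one syscall) can flush several pending protocol messages. A minimal sketch; the server does this under the client lock with size limits this example omits:

```go
package main

import (
	"bytes"
	"fmt"
)

// coalesce merges queued outbound buffers into a single slice so one
// syscall can flush them, the optimization the PR above restores.
// Sketch only; the real code applies size limits omitted here.
func coalesce(pending [][]byte) []byte {
	if len(pending) == 1 {
		return pending[0] // nothing to merge
	}
	return bytes.Join(pending, nil)
}

func main() {
	out := coalesce([][]byte{[]byte("PUB a 1\r\n"), []byte("x\r\n")})
	fmt.Printf("%d queued writes -> 1 write of %d bytes\n", 2, len(out))
}
```

Fewer, larger writes reduce per-syscall overhead, which is why removing the coalescing worsened performance as the commit notes.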