nats-server

mirror of https://github.com/gogrlx/nats-server.git synced 2026-04-02 03:38:42 -07:00

Author	SHA1	Message	Date
Ivan Kozlovic	49907c4537	[FIXED] Configuration Reload: possible panic if done during Shutdown If a configuration reload is issued as the server is being shutdown, we could get 2 different panics. One due to JetStream if an account is JetStream enabled, and one due to the send to a go channel that has been closed. ``` panic: send on closed channel [recovered] panic: send on closed channel goroutine 440 [running]: testing.tRunner.func1.2({0x1038d58e0, 0x1039e1270}) /usr/local/go/src/testing/testing.go:1545 +0x274 testing.tRunner.func1() /usr/local/go/src/testing/testing.go:1548 +0x448 panic({0x1038d58e0?, 0x1039e1270?}) /usr/local/go/src/runtime/panic.go:920 +0x26c github.com/nats-io/nats-server/v2/server.(*Server).reloadAuthorization(0xc00024fb00) /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1998 +0x788 github.com/nats-io/nats-server/v2/server.(*Server).applyOptions(0xc00024fb00, 0xc00021dc00, {0xc00038e4e0, 0x2, 0xc00021dc28?}) /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1746 +0x2b8 github.com/nats-io/nats-server/v2/server.(*Server).reloadOptions(0xc000293500?, 0xc000118a80, 0xc000293500) /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1121 +0x178 github.com/nats-io/nats-server/v2/server.(*Server).ReloadOptions(0xc00024fb00, 0xc000293500) /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1060 +0x368 github.com/nats-io/nats-server/v2/server.(*Server).Reload(0xc00024fb00) /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/reload.go:995 +0x104 ``` and ``` panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x2 addr=0x0 pc=0x10077b224] goroutine 8 [running]: testing.tRunner.func1.2({0x101351640, 0x101b7d2a0}) /usr/local/go/src/testing/testing.go:1545 +0x274 testing.tRunner.func1() /usr/local/go/src/testing/testing.go:1548 +0x448 panic({0x101351640?, 0x101b7d2a0?}) /usr/local/go/src/runtime/panic.go:920 +0x26c github.com/nats-io/nats-server/v2/server.(*Account).EnableJetStream(0xc00020fb80, 0xc000220240) /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream.go:1045 +0xa4 github.com/nats-io/nats-server/v2/server.(*Server).configJetStream(0xc000226d80, 0xc00020fb80) /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream.go:707 +0xdc github.com/nats-io/nats-server/v2/server.(*Server).configAllJetStreamAccounts(0xc000226d80) /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream.go:768 +0x2b0 github.com/nats-io/nats-server/v2/server.(*Server).enableJetStreamAccounts(0xc000226d80?) /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/jetstream.go:637 +0x128 github.com/nats-io/nats-server/v2/server.(*Server).reloadAuthorization(0xc000226d80) /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/reload.go:2039 +0x93c github.com/nats-io/nats-server/v2/server.(*Server).applyOptions(0xc000226d80, 0xc000171c50, {0xc000074600, 0x2, 0xc000171c78?}) /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1746 +0x2b8 github.com/nats-io/nats-server/v2/server.(*Server).reloadOptions(0xc000276000?, 0xc0000a6000, 0xc000276000) /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1121 +0x178 github.com/nats-io/nats-server/v2/server.(*Server).ReloadOptions(0xc000226d80, 0xc000276000) /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/reload.go:1060 +0x368 github.com/nats-io/nats-server/v2/server.(*Server).Reload(0xc000226d80) /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/reload.go:995 +0x104 ``` Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2023-10-16 15:25:02 -06:00
Derek Collison	4df6c9aeb8	[ADDED] TLS: Handshake First for client connections (#4642 ) A new option instructs the server to perform the TLS handshake first, that is prior to sending the INFO protocol to the client. Only clients that implement an equivalent option would be able to connect if the server runs with this option enabled. The configuration would look something like this: ``` ... tls { cert_file: ... key_file: ... handshake_first: true } ``` The same option can be set to "auto" or a Go time duration to fall back to the old behavior. This is intended for deployments where it is known that not all clients have been upgraded to a client library providing the TLS handshake first option. After the delay has elapsed without receiving the TLS handshake from the client, the server reverts to sending the INFO protocol so that older clients can connect. Clients that do connect with the "TLS first" option will be marked as such in the monitoring's Connz page/result. It will allow the administrator to keep track of applications still needing to upgrade. The configuration would be similar to: ``` ... tls { cert_file: ... key_file: ... handshake_first: auto } ``` With the above value, the fallback delay used by the server is 50ms. The duration can be explicitly set, say 300 milliseconds: ``` ... tls { cert_file: ... key_file: ... handshake_first: "300ms" } ``` It is understood that any configuration other than "true" will result in the server sending the INFO protocol after the elapsed amount of time without the client initiating the TLS handshake. Therefore, for administrators that do not want any data transmitted in plain text, the value must be set to "true" only. It will require applications to be updated to a library that provides the option, which may or may not be readily available. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2023-10-16 07:49:08 -07:00
Derek Collison	a797f0d794	Add fan-in/out benchmarks (#4660 ) Benchmarks for NATS core fan-in and fan-out pattern workloads. Signed-off-by: Reuben Ninan <reuben@nats.io>	2023-10-14 10:00:31 -07:00
R.I.Pienaar	d61ecf8a89	Report the raft group name in stream and consumer info Signed-off-by: R.I.Pienaar <rip@devco.net>	2023-10-14 12:28:36 +03:00
Reuben Ninan	524c1f544a	Add fan-in/out benchmarks Signed-off-by: reubenninan <reuben@nats.io>	2023-10-14 00:56:09 -04:00
Waldemar Quevedo	996bf2bf1c	Release v2.10.3 Signed-off-by: Waldemar Quevedo <wally@nats.io>	2023-10-12 13:46:11 -07:00
Derek Collison	e2414e6a04	Bump to 2.10.3-RC.3 Signed-off-by: Derek Collison <derek@nats.io>	2023-10-12 13:12:19 -07:00
Derek Collison	0a64f18060	Only mark fs as dirty vs full write on mb compaction. Signed-off-by: Derek Collison <derek@nats.io>	2023-10-12 12:59:19 -07:00
Derek Collison	ea70590aa2	Bump to 2.10.3-RC.2 Signed-off-by: Derek Collison <derek@nats.io>	2023-10-12 12:35:54 -07:00
Derek Collison	b7b40b0a69	Fixed a bug that was not correctly selecting next first because it was not skipping dbit entries. This could result in lookups failing, e.g. after a change in max msgs per subject to a lower value. Also fixed a bug that would not properly update psim during compact when throwing away the whole block and a subject had more than one message. Signed-off-by: Derek Collison <derek@nats.io>	2023-10-12 10:58:37 -07:00
Derek Collison	1e8f6bf1e1	Fix updating a non unique consumer on workqueue stream not returning an error (#4654 ) This is a possible fix for #4653. Changes made: 1. Added tests for creating and updating consumers on a work queue stream with overlapping subjects. 2. Check for overlapping subjects before [updating](`a25af02c73/server/consumer.go (L770)`) the consumer config. 3. Changed [`func (*stream).partitionUnique(partitions []string) bool`](`a25af02c73/server/stream.go (L5269)`) to accept the consumer name being checked so we can skip it while checking for overlapping subjects (Required for [`FilterSubjects`](`a25af02c73/server/consumer.go (L75)`) updates), wasn't needed before because the checks were made on creation only. There's only 1 thing that I'm not sure about. In the [current work queue stream conflict checks](`a25af02c73/server/consumer.go (L796)`), the consumer config `Direct` is being checked if `false`, should we also make this check before the update? Signed-off-by: Pierre Mdawar <pierre@mdawar.dev>	2023-10-12 07:27:27 -07:00
Pierre Mdawar	c46d8093bc	Fix updating a non unique consumer on workqueue stream not returning an error	2023-10-12 12:18:24 +03:00
Derek Collison	38794e5af9	Bump to 2.10.3-RC.1 Signed-off-by: Derek Collison <derek@nats.io>	2023-10-11 08:26:09 -07:00
Derek Collison	94545f3206	[FIXED] Compaction with compression and added out of band compaction (#4645 ) This will also reclaim more space for streams with lots of interior deletes. Signed-off-by: Derek Collison <derek@nats.io>	2023-10-11 08:22:10 -07:00
Derek Collison	842d600e3f	Grab blk fn while mb lock held Signed-off-by: Derek Collison <derek@nats.io>	2023-10-11 07:54:36 -07:00
Lev Brouk	de1282c98d	Fixed a crash in MQTT outgoing PUBREL This really was a cut/paste/typo error. The effect was that when there was a pending PUBREL in JetStream, we would sometimes attempt to deliver it immediately once the client connected, cpending was already initialized, but the pubrel map was not (yet).	2023-10-10 18:08:18 -07:00
Derek Collison	f4387ec74e	Fix for compaction with compression and added an out of band compaction in syncBlocks to reclaim more space. Signed-off-by: Derek Collison <derek@nats.io>	2023-10-10 17:17:55 -07:00
Ivan Kozlovic	ce96de2ed5	[ADDED] TLS: Handshake First for client connections A new option instructs the server to perform the TLS handshake first, that is prior to sending the INFO protocol to the client. Only clients that implement an equivalent option would be able to connect if the server runs with this option enabled. The configuration would look something like this: ``` ... tls { cert_file: ... key_file: ... handshake_first: true } ``` The same option can be set to "auto" or a Go time duration to fall back to the old behavior. This is intended for deployments where it is known that not all clients have been upgraded to a client library providing the TLS handshake first option. After the delay has elapsed without receiving the TLS handshake from the client, the server reverts to sending the INFO protocol so that older clients can connect. Clients that do connect with the "TLS first" option will be marked as such in the monitoring's Connz page/result. It will allow the administrator to keep track of applications still needing to upgrade. The configuration would be similar to: ``` ... tls { cert_file: ... key_file: ... handshake_first: auto } ``` With the above value, the fallback delay used by the server is 50ms. The duration can be explicitly set, say 300 milliseconds: ``` ... tls { cert_file: ... key_file: ... handshake_first: "300ms" } ``` It is understood that any configuration other than "true" will result in the server sending the INFO protocol after the elapsed amount of time without the client initiating the TLS handshake. Therefore, for administrators that do not want any data transmitted in plain text, the value must be set to "true" only. It will require applications to be updated to a library that provides the option, which may or may not be readily available. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>	2023-10-10 09:46:01 -06:00
Byron Ruth	4ab65b1871	Bump v2.10.3 Signed-off-by: Byron Ruth <byron@nats.io>	2023-10-06 16:39:45 -04:00
Byron Ruth	f8c9d8e686	Release v2.10.2 Signed-off-by: Byron Ruth <byron@nats.io>	2023-10-06 15:23:06 -04:00
Derek Collison	0c3609ed2a	Bump to 2.10.2-RC.15 Signed-off-by: Derek Collison <derek@nats.io>	2023-10-06 09:58:55 -07:00
Derek Collison	f29c7863e7	[FIXED] Setting initial min on dmap caused subtle bugs with dmap. (#4631 ) Under heavy load with max msgs per subject of 1 the dmap, when considered empty and resetting the initial min, could cause lookup misses that would lead to excess messages in a stream and longer restore issues. Signed-off-by: Derek Collison <derek@nats.io>	2023-10-06 09:58:17 -07:00
Derek Collison	dd646f6b71	Set initial min on dmap caused subtle bugs with dmap. Some minor cleanup. Signed-off-by: Derek Collison <derek@nats.io>	2023-10-06 09:42:09 -07:00
Lev	beee6fc72a	[FIXED] MQTT PUBREL header incompatibility (#4616 ) https://hivemq.github.io/mqtt-cli/docs/test/ pointed out the incompatibility.	2023-10-05 08:07:50 -07:00
Waldemar Quevedo	4e414f1f05	Skip processing consumer assignments after JS has shutdown (#4625 ) Signed-off-by: Waldemar Quevedo <wally@nats.io>	2023-10-04 13:17:22 -07:00
Neil Twigg	7124dc7bdc	Revert changes to `nbPoolPut`, force compressor to forget byte buffer Signed-off-by: Neil Twigg <neil@nats.io>	2023-10-04 17:41:36 +01:00
Neil Twigg	e20ca9043f	Don't append empty slices in the unfragmented path Signed-off-by: Neil Twigg <neil@nats.io>	2023-10-04 17:18:47 +01:00
Neil Twigg	6b65452bc7	Reduce allocations in WebSocket compression Signed-off-by: Neil Twigg <neil@nats.io>	2023-10-04 12:36:32 +01:00
Derek Collison	dbe700d192	Bump to 2.10.0-RC.14 Signed-off-by: Derek Collison <derek@nats.io>	2023-10-03 16:11:30 -07:00
Derek Collison	3f1afb4ca2	[IMPROVED] Bumped inflight updates to 16 and move one lock to rlock. (#4621 ) Signed-off-by: Derek Collison <derek@nats.io>	2023-10-03 16:10:59 -07:00
Derek Collison	2d21bc7008	Fix datarace Signed-off-by: Derek Collison <derek@nats.io>	2023-10-03 15:35:20 -07:00
Derek Collison	1ccc6dbf30	Bumped inflight updates to 16 and move one lock to rlock. Signed-off-by: Derek Collison <derek@nats.io>	2023-10-03 15:01:34 -07:00
Derek Collison	2f1a384bcb	Holding onto the compressor and not recycling the internal byte slice was causing havoc with GC. This needs to be improved but this at least should allow the GC to clean up more effectively. Signed-off-by: Derek Collison <derek@nats.io>	2023-10-03 14:39:00 -07:00
Derek Collison	195227edfd	Bump to 2.10.0-RC.12 Signed-off-by: Derek Collison <derek@nats.io>	2023-10-02 09:53:30 -07:00
Derek Collison	e4ca15c2c3	Optimize locking for consumer info Signed-off-by: Derek Collison <derek@nats.io>	2023-10-02 09:22:44 -07:00
Derek Collison	4165f869d2	Bump to 2.10.2-RC.11 Signed-off-by: Derek Collison <derek@nats.io>	2023-10-01 08:18:28 -07:00
Derek Collison	00839280fb	[IMPROVED] Reduce contention for high connections in a JetStream enabled account with high API usage. (#4613 ) Several strategies are used which are listed below. 1. Checking a RaftNode to see if it is the leader now uses atomics. 2. Checking if we are the JetStream meta leader from the server now uses an atomic. 3. Accessing the JetStream context no longer requires a server lock, uses atomic.Pointer. 4. Filestore syncBlocks would hold msgBlock locks during sync, now does not. Signed-off-by: Derek Collison <derek@nats.io>	2023-10-01 08:17:15 -07:00
Derek Collison	dba03dbc2f	Optimizations to reduce contention for high connections in a JetStream enabled account with high API usage. Several strategies which are listed below. 1. Checking a RaftNode to see if it is the leader now uses atomics. 2. Checking if we are the JetStream meta leader from the server now uses an atomic. 3. Accessing the JetStream context no longer requires a server lock, uses atomic.Pointer. 4. Filestore syncBlocks would hold msgBlock locks during sync, now does not. Signed-off-by: Derek Collison <derek@nats.io>	2023-09-30 14:52:15 -07:00
Tomasz Pietrek	1f4b986125	Fix consumer info if consumer was closed Co-authored-by: Derek Collison <derek@nats.io> Signed-off-by: Tomasz Pietrek <tomasz@nats.io> Signed-off-by: Derek Collison <derek@nats.io> Signed-off-by: Tomasz Pietrek <tomasz@nats.io>	2023-09-29 21:40:55 +02:00
Neil Twigg	212d92ca7e	Add more pprof labels to consumers, sources, mirrors Signed-off-by: Neil Twigg <neil@nats.io>	2023-09-29 19:12:47 +01:00
Derek Collison	720ac605a2	Bump to 2.10.0-RC.10 Signed-off-by: Derek Collison <derek@nats.io>	2023-09-28 14:43:08 -07:00
Derek Collison	c9fa001ebf	[IMPROVED] Add in additional warning when subject skew detected (#4606 ) Signed-off-by: Derek Collison <derek@nats.io>	2023-09-28 14:42:29 -07:00
Derek Collison	fa5b7afcb6	[FIXED] Do not bypass authorization blocks when turning on $SYS account access (#4605 ) Only setup auto no-auth for $G account iff no authorization block was defined. Signed-off-by: Derek Collison <derek@nats.io> Resolves #4535	2023-09-28 14:17:24 -07:00
Derek Collison	cb74f3f26e	Add in additional warning when subject skew detected Signed-off-by: Derek Collison <derek@nats.io>	2023-09-28 14:16:27 -07:00
Derek Collison	2737c56352	Only setup auto no-auth for $G account iff no authorization block was defined. Signed-off-by: Derek Collison <derek@nats.io>	2023-09-28 13:51:45 -07:00
Derek Collison	3d5564bbb1	[FIXED] Flapping TestMQTTLockedSession (#4604 ) A test-only fix. I can not reproduce the flapping behavior, but did see a race during debugging suggesting that the CONNACK is delivered to the test before `mqttProcessConnect` finishes and releases the record.	2023-09-28 13:16:46 -07:00
Lev Brouk	214711654e	PR feedback: use checkFor	2023-09-28 12:42:18 -07:00
Lev Brouk	a05d4416ef	PR feedback: nit	2023-09-28 12:02:35 -07:00
Derek Collison	783edaa36d	[FIXED] Race condition in some leader failover scenarios leading to messages being potentially sourced more than once. (#4592 ) - [X] Tests added - [X] Branch rebased on top of current main (`git pull --rebase origin main`) - [X] Changes squashed to a single commit (described [here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html)) - [x] Build is green in Travis CI - [X] You have certified that the contribution is your original work and that you license the work to the project under the [Apache 2 license](https://github.com/nats-io/nats-server/blob/main/LICENSE) ### Changes proposed in this pull request: Fixes a race condition in some leader failover scenarios leading to messages being potentially sourced more than once. In some failure scenarios where the current leader of a stream sourcing from other stream(s) gets shutdown while publications are happening on the stream(s) being sourced leads to `setLeader(true)` being called on the new leader for the sourcing stream before all the messages having been sourced by the previous leader are completely processed such that when the new leader does its reverse scan from the last message in its view of the stream in order to know what sequence number to start the consumer for the stream being sourced from, such that the last message(s) sourced by the previous leader get sourced again, leading to some messages being sourced more than once. The existing `TestNoRaceJetStreamSuperClusterSources` test would sidestep the issue by relying on the deduplication window in the sourcing stream. Without deduplication the test is a flapper. This avoids the race condition by adding a small delay before scanning for the last message(s) having been sourced and starting the sources' consumer(s). Now the test (without using the deduplication window) never fails because more messages than expected have been received in the sourcing stream. (Also adds a guard to give up if `setupSourceConsumers()` is called and we are no longer the leader for the stream (that check was already present in `setupMirrorConsumer()` so assuming it was forgotten for `setupSourceConsumers()`))	2023-09-28 11:22:20 -07:00
Lev Brouk	4b59efd6e7	[FIXED] Flapping TestMQTTLockedSession	2023-09-28 11:13:48 -07:00


5718 Commits