For some older R1 streams created by previous server versions, the
stream assignment group could have no cluster, which would prevent
scale-up with newer servers.
This change inherits the cluster, when detected as absent, from the
placement tags or the client's cluster designation.
Signed-off-by: Derek Collison <derek@nats.io>
When doing a leadership transfer, step down as soon as we know we have
sent the EntryLeaderTransfer entry.
Delaying could allow something to be sent from the old leader which
would cause the new leader to bail on being a candidate even though it
would have gotten all the votes.
Signed-off-by: Derek Collison <derek@nats.io>
Added a leafnode lock to allow better traversal without copying large
leafnodes in a single hub account.
Signed-off-by: Derek Collison <derek@nats.io>
In #1943 we adopted `UTC()` in some timestamps, but an unintended side
effect is that it strips the monotonic clock reading (e5646b23de), so
subtracting times in other areas of the code can be prone to clock
skew.
This would impact only cases with accounts defined in configuration file
(as opposed to operator mode). During the configuration reload, new
accounts and sublists were created to later be replaced with existing
ones. That left a window of time during which a subscription could have
been added to (or attempted to be removed from) the "wrong" sublist.
This could lead to route subscriptions seemingly not being forwarded.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
I have seen cases, possibly due to the previous configuration reload
issue that missed subscriptions in the sublist because of the sublist
swap, where we would attempt to remove subscriptions in a batch but
some were not present. I would have expected all present subscriptions
to still be removed, even if the call overall returned an error.
This is now fixed and a test has been added demonstrating that
even on error, we remove all subscriptions that were present.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
One should not access s.opts directly but instead use s.getOpts().
Also, server lock needs to be released when performing an account
lookup (since this may result in server lock being acquired).
A function was calling s.LookupAccount under the client lock, which
technically creates a lock inversion situation.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When removing a msg requires loading the msg block and incurring IO,
unlock the fs lock to avoid stalling other activity on other blocks,
e.g. removing and adding msgs at the same time.
Signed-off-by: Derek Collison <derek@nats.io>
This can happen when we reset a stream internally and the stream had a
prior snapshot.
Also make sure to always release resources back to the account
regardless of whether the store is still present.
Signed-off-by: Derek Collison <derek@nats.io>
When a fleet of leafnodes is isolated (not routed but using the same
cluster), we can do better at optimizing how we update the other
leafnodes: since they are all in the same cluster and we know we are
isolated, we can skip those updates.
We can improve this further in 2.10.
Signed-off-by: Derek Collison <derek@nats.io>
Under certain scenarios we have witnessed a healthz() call that would
never return healthy due to a stream or consumer being missing or
stopped.
This will now allow the healthz() call to attempt to restart those
assets.
We will also periodically call this in clustered mode from the
monitorCluster routine.
Signed-off-by: Derek Collison <derek@nats.io>