Commit Graph

7174 Commits

Author SHA1 Message Date
Waldemar Quevedo
ee38f8bbc5 monitor: change account detail info back to utc when served (#4163)
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-05-15 15:33:57 -07:00
Derek Collison
584ea85d75 [FIXED] Protect against out of bounds access on usage updates. (#4164)
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-15 14:58:05 -07:00
Derek Collison
832df1cdba Protect against out of bounds access on usage updates.
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-15 14:38:26 -07:00
Derek Collison
fe71ef524c [FIXED] Service imports reporting for Accountz() when mapping to local subjects. (#4158)
Signed-off-by: Derek Collison <derek@nats.io>

Resolves #4144
2023-05-15 14:04:57 -07:00
Derek Collison
ea75beaeb1 [FIXED] Track all remote servers in a NATS system with different domains. (#4159)
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-15 13:47:06 -07:00
Waldemar Quevedo
3c4ed549a5 resolver: improve signaling for missing account lookups (#4151)
When using the nats account resolver and a JWT is not found, the client could
often get an i/o timeout error due to not receiving a timely response
before the account resolver fetch request times out. Now instead
of waiting for the fetch request to timeout, a resolver without JWTs
will notify as well that it could not find a matching JWT, waiting for a
response from all active servers.

Also included in this PR is some cleanup to the logs emitted by the
resolver.

Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-05-14 11:10:25 -07:00
Derek Collison
75d274a636 If a NATS system has multiple domains, make sure to process those during a remote update before bailing.
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-13 18:36:42 -07:00
Derek Collison
d293af1da6 Fix service imports reporting for Accountz() when the import subject is mapped to a different local subject.
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-13 12:57:05 -07:00
Derek Collison
a982bbcb73 [FIXED] Allow sorting by rtt for connz. (#4157)
Signed-off-by: Derek Collison <derek@nats.io>

Resolves #4150
2023-05-12 20:47:17 -07:00
Derek Collison
421775a32a Fix to allow sorting by rtt for connz.
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-12 20:22:07 -07:00
Derek Collison
c31e710d9e [FIXED] Allow user filtering on connz for other user types like nkeys etc. (#4156)
Signed-off-by: Derek Collison <derek@nats.io>
 
Resolves #4149
2023-05-12 15:38:46 -07:00
Derek Collison
7f17e07d66 Filter by user at the end for closed connections
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-12 15:24:42 -07:00
Derek Collison
0c13f174c0 Fixed cap mistake in comment
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-12 15:07:00 -07:00
Derek Collison
c5eb46cb06 Make sure closed clients capture all user types and work with user filtering as well
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-12 15:05:40 -07:00
Derek Collison
90d1063674 Fix for #4149 to allow proper user filtering on connz for other user types.
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-12 14:19:37 -07:00
Derek Collison
fc64c6119d Use monotonic time for measuring time internally (#4154)
2023-05-12 12:37:16 -07:00
Waldemar Quevedo
286a1632ca Use monotonic time for measuring time internally
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-05-12 08:27:46 -07:00
Derek Collison
bdb0ba9ae5 [FIXED] Can't scale up some older streams (#4146)
For some older R1 streams created by previous servers we could have no
cluster for the stream assignment group which would prevent scale up
with newer servers.

If the cluster is detected as absent, it is now inherited from the
placement tags or the client cluster designation.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-10 21:42:59 -07:00
Derek Collison
5e029d08d5 For older R1 streams created by previous servers we could have no cluster for the stream assignment group which would prevent scale up with newer servers.
This will inherit cluster if detected from placement tags or client cluster designation.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-10 17:59:28 -07:00
Derek Collison
2f2498ab7e Bump to 2.9.17-beta.7
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-10 15:32:45 -07:00
Derek Collison
81bf92b2c6 [IMPROVED] Leadership transfer (#4145)
When doing a leadership transfer, step down as soon as we know we have
sent the EntryLeaderTransfer entry.

Delaying could allow something to be sent from the old leader which
would cause the new leader to bail on being a candidate even though it
would have gotten all the votes.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-10 15:30:38 -07:00
Derek Collison
a17357c6ae When doing a leadership transfer, step down as soon as we know we have sent the EntryLeaderTransfer entry.
Delaying could allow something to be sent from the old leader which would cause the new leader to bail on being a candidate even though it would have gotten all the votes.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-10 12:27:33 -07:00
Derek Collison
72485608d0 [IMPROVED] Leader transfer process (#4143)
When doing a leader transfer, clear vote state on the leader and when
non-chosen peers receive the update.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-10 08:24:15 -07:00
Derek Collison
717afae9ef When doing a leader transfer, clear vote state on the leader and when non-chosen peers receive the update
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-10 07:49:22 -07:00
Derek Collison
c5c5a34fec Bump to 2.9.17-beta.6
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-09 20:12:22 -07:00
Derek Collison
b951cd155d Improvements on raft leader handoff. (#4142)
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-09 18:22:37 -07:00
Derek Collison
b9af0d0294 Only do no-leader stepdown on transfer after a delay if we are still the leader
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-09 17:19:14 -07:00
Derek Collison
b44beb4b54 Make sure to update peer set and remove old peers after new leader takes over
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-09 15:15:02 -07:00
Derek Collison
6e6ce3a6f6 Backport outbound queues test changes (#4120) to main (#4139)
This backports the changes to the outbound queues test to the `main`
branch.

Signed-off-by: Neil Twigg <neil@nats.io>
2023-05-09 07:41:23 -07:00
Neil Twigg
d7ae2cbb5f Backport #4120 to main
Signed-off-by: Neil Twigg <neil@nats.io>
2023-05-09 11:24:35 +01:00
Derek Collison
76f4358349 [IMPROVED] Optimizations for large single hub account leafnode fleets. (#4135)
Added a leafnode lock to allow better traversal without copying of large
leafnodes in a single hub account.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-06 09:53:08 -07:00
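As a rough sketch of the idea in #4135, a dedicated read-write lock lets a traversal visit a large set of leafnodes in place instead of first copying them into a slice. The `hubAccount` type, its `lmu` and `leafs` fields, and the helper methods are hypothetical names chosen for illustration; they are not taken from the server source.

```go
package main

import (
	"fmt"
	"sync"
)

// hubAccount is a hypothetical hub account holding many leafnode connections.
type hubAccount struct {
	lmu   sync.RWMutex   // dedicated lock guarding only leafs
	leafs map[string]int // leafnode name -> subscription count (illustrative)
}

func (a *hubAccount) addLeaf(name string, subs int) {
	a.lmu.Lock()
	a.leafs[name] = subs
	a.lmu.Unlock()
}

// forEachLeaf visits every leafnode in place under the read lock, so no
// snapshot slice has to be allocated per traversal. fn must not call addLeaf,
// since that would try to take the write lock while the read lock is held.
func (a *hubAccount) forEachLeaf(fn func(name string, subs int)) {
	a.lmu.RLock()
	defer a.lmu.RUnlock()
	for name, subs := range a.leafs {
		fn(name, subs)
	}
}

func main() {
	acc := &hubAccount{leafs: make(map[string]int)}
	for i := 0; i < 1000; i++ {
		acc.addLeaf(fmt.Sprintf("leaf-%d", i), i%3)
	}
	total := 0
	acc.forEachLeaf(func(_ string, subs int) { total += subs })
	fmt.Println("total subscriptions:", total)
}
```

The trade-off in this sketch is that the callback runs while the read lock is held, so it should stay short and must not try to modify the set.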
Derek Collison
80db7a22ab Optimizations for large single hub account leafnode fleets.
Added a leafnode lock to allow better traversal without copying of large leafnodes in a single hub account.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-05 13:14:49 -07:00
Waldemar Quevedo
40ea58fc51 Stop using UTC for time in flushClients (#4132)
#1943 adopted `UTC()` for some timestamps, but an unintended side effect
is that `UTC()` strips the monotonic clock reading
(e5646b23de),
so subtracting those times in other areas of the code can be prone to
clock skew.
2023-05-04 17:35:50 -07:00
Waldemar Quevedo
b886fed2fb Stop using UTC for time for flushClients
#1943 adopted `UTC()` for some timestamps,
but an unintended side effect is that `UTC()` strips
the monotonic clock reading, so subtracting those times
in other areas of the code can be prone to clock skew.
e5646b23de
2023-05-04 15:50:45 -07:00
Derek Collison
da8aeac91b Fix flapper
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-03 21:00:17 -07:00
Derek Collison
ae73e6a573 Bump to 2.9.17-beta.5
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-03 19:50:21 -07:00
Derek Collison
413486f57d [IMPROVED] Protect against usage drift (#4131)
If we detect a drift for any unforeseen reason, correct it.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-03 19:49:39 -07:00
Derek Collison
21239022bd Protect against usage drift for any unforeseen reason and, if detected, correct it.
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-03 17:09:06 -07:00
Derek Collison
793db749ff [FIXED] Subscription interest issue due to configuration reload (#4130)
This would impact only cases with accounts defined in configuration file
(as opposed to operator mode). During the configuration reload, new
accounts and sublists were created to later be replaced with existing
ones. That left a window of time where a subscription could have been
added (or attempted to be removed) from the "wrong" sublist. This could
lead to route subscriptions seemingly not being forwarded.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-05-03 16:15:33 -07:00
Ivan Kozlovic
8a4ead22bc Updates based on code review
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-05-03 16:14:51 -06:00
Ivan Kozlovic
7afe76caf8 Fixed Sublist.RemoveBatch to remove subs present, even if one isn't
I have seen cases, possibly caused by the earlier configuration
reload issue (where the sublist swap could leave subscriptions
missing from the sublist), in which we attempted to remove
subscriptions by batch but some were not present. I would have
expected all present subscriptions to still be removed, even if
the call overall returned an error.
This is now fixed, and a test has been added demonstrating that
even on error, we remove all subscriptions that were present.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-05-03 15:21:26 -06:00
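The batch removal semantics described in 7afe76caf8 can be sketched as below. This is an assumption-laden illustration rather than the actual `Sublist.RemoveBatch` code: `subSet`, `remove`, `removeBatch`, and `errNotFound` are hypothetical, and a plain map stands in for the sublist. The loop keeps going after a failed removal and reports the first error at the end.

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("subscription not found")

// subSet is a hypothetical stand-in for a sublist, keyed by subject.
type subSet map[string]struct{}

// remove deletes a single subject, or returns errNotFound if it is absent.
func (s subSet) remove(subject string) error {
	if _, ok := s[subject]; !ok {
		return errNotFound
	}
	delete(s, subject)
	return nil
}

// removeBatch attempts every removal. A missing entry does not stop the
// loop; the first error seen is remembered and returned once all present
// entries have been removed.
func (s subSet) removeBatch(subjects []string) error {
	var firstErr error
	for _, subj := range subjects {
		if err := s.remove(subj); err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}

func main() {
	s := subSet{"foo": {}, "bar": {}}
	err := s.removeBatch([]string{"foo", "missing", "bar"})
	fmt.Println("error:", err, "remaining:", len(s))
}
```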
Ivan Kozlovic
95e4f2dfe1 Fixed accounts configuration reload
Issues could manifest with subscription interest not properly
propagated.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-05-03 14:35:06 -06:00
Ivan Kozlovic
840c264f45 Cleanup use of s.opts and fixed some lock (deadlock/inversion) issues
One should not access s.opts directly but instead use s.getOpts().
Also, server lock needs to be released when performing an account
lookup (since this may result in server lock being acquired).
A function was calling s.LookupAccount under the client lock, which
technically creates a lock inversion situation.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-05-03 14:09:02 -06:00
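The two rules in 840c264f45 (read options through an accessor, and release the server lock before an account lookup) can be illustrated with a simplified sketch. The `server` type here is hypothetical and uses a single non-reentrant `sync.Mutex`; the real server's locking is more involved, so this only shows the shape of the pattern.

```go
package main

import (
	"fmt"
	"sync"
)

type options struct{ systemAccount string }

// server is a hypothetical, heavily simplified stand-in.
type server struct {
	mu       sync.Mutex
	opts     *options
	accounts map[string]string // account name -> description (illustrative)
}

// getOpts returns the current options under the lock, so callers never
// read s.opts directly while a reload may be replacing it.
func (s *server) getOpts() *options {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.opts
}

// lookupAccount acquires the server lock itself.
func (s *server) lookupAccount(name string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	acc, ok := s.accounts[name]
	return acc, ok
}

// systemAccount copies what it needs under the lock, releases the lock, and
// only then performs the lookup. Calling lookupAccount while still holding
// s.mu would deadlock, because sync.Mutex is not reentrant.
func (s *server) systemAccount() (string, bool) {
	s.mu.Lock()
	name := s.opts.systemAccount
	s.mu.Unlock()
	return s.lookupAccount(name)
}

func main() {
	s := &server{
		opts:     &options{systemAccount: "SYS"},
		accounts: map[string]string{"SYS": "system account"},
	}
	fmt.Println(s.getOpts().systemAccount)
	fmt.Println(s.systemAccount())
}
```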
Derek Collison
b61e411b44 Fix race in reload and gateway sublist check (#4127)
Fixes the following race: during reload account sublist can be changed:
2699465596/server/reload.go (L1598-L1610)
so this can become a race while checking interest in the gateway code
here:
79de3302be/server/gateway.go (L2683)

```
=== RUN   TestJetStreamSuperClusterPeerReassign
==================
WARNING: DATA RACE
Write at 0x00c0010854f0 by goroutine 15595:
  github.com/nats-io/nats-server/v2/server.(*Server).reloadAuthorization.func2()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/reload.go:1610 +0x486
  sync.(*Map).Range()
      /home/travis/.gimme/versions/go1.19.8.linux.amd64/src/sync/map.go:354 +0x225
  github.com/nats-io/nats-server/v2/server.(*Server).reloadAuthorization()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/reload.go:1594 +0x35d
  github.com/nats-io/nats-server/v2/server.(*Server).applyOptions()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/reload.go:1454 +0xf4
  github.com/nats-io/nats-server/v2/server.(*Server).reloadOptions()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/reload.go:908 +0x204
  github.com/nats-io/nats-server/v2/server.(*Server).ReloadOptions()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/reload.go:847 +0x4a4
  github.com/nats-io/nats-server/v2/server.(*Server).Reload()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/reload.go:782 +0x125
  github.com/nats-io/nats-server/v2/server.(*cluster).removeJetStream()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_helpers_test.go:1498 +0x310
  github.com/nats-io/nats-server/v2/server.TestJetStreamSuperClusterPeerReassign()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_super_cluster_test.go:395 +0xa38
  testing.tRunner()
      /home/travis/.gimme/versions/go1.19.8.linux.amd64/src/testing/testing.go:1446 +0x216
  testing.(*T).Run.func1()
      /home/travis/.gimme/versions/go1.19.8.linux.amd64/src/testing/testing.go:1493 +0x47
Previous read at 0x00c0010854f0 by goroutine 15875:
  github.com/nats-io/nats-server/v2/server.(*Server).gatewayHandleSubjectNoInterest()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/gateway.go:2683 +0x12d
  github.com/nats-io/nats-server/v2/server.(*client).processInboundGatewayMsg()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/gateway.go:2980 +0x595
  github.com/nats-io/nats-server/v2/server.(*client).processInboundMsg()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3532 +0xc7
  github.com/nats-io/nats-server/v2/server.(*client).parse()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/parser.go:497 +0x34f9
  github.com/nats-io/nats-server/v2/server.(*client).readLoop()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:1284 +0x17e8
  github.com/nats-io/nats-server/v2/server.(*Server).createGateway.func1()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/gateway.go:858 +0x37
Goroutine 15595 (running) created at:
  testing.(*T).Run()
      /home/travis/.gimme/versions/go1.19.8.linux.amd64/src/testing/testing.go:1493 +0x75d
  testing.runTests.func1()
      /home/travis/.gimme/versions/go1.19.8.linux.amd64/src/testing/testing.go:1846 +0x99
  testing.tRunner()
      /home/travis/.gimme/versions/go1.19.8.linux.amd64/src/testing/testing.go:1446 +0x216
  testing.runTests()
      /home/travis/.gimme/versions/go1.19.8.linux.amd64/src/testing/testing.go:1844 +0x7ec
  testing.(*M).Run()
      /home/travis/.gimme/versions/go1.19.8.linux.amd64/src/testing/testing.go:1726 +0xa84
  github.com/nats-io/nats-server/v2/server.TestMain()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/sublist_test.go:1577 +0x292
  main.main()
      _testmain.go:3615 +0x324
Goroutine 15875 (running) created at:
  github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:3098 +0x88
  github.com/nats-io/nats-server/v2/server.(*Server).createGateway()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/gateway.go:858 +0xfc4
  github.com/nats-io/nats-server/v2/server.(*Server).startGatewayAcceptLoop.func1()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/gateway.go:553 +0x48
  github.com/nats-io/nats-server/v2/server.(*Server).acceptConnections.func1()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:2184 +0x58
==================
    testing.go:1319: race detected during execution of test
--- FAIL: TestJetStreamSuperClusterPeerReassign (2.08s)
```
2023-05-02 18:12:56 -07:00
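The trace above shows an unsynchronized write during reload racing with a read in the gateway code. One common way to avoid this class of race, sketched here under the assumption that the sublist is treated as read-only once published, is to guard the pointer with a lock and have readers take a snapshot of it. The `account`, `sublist`, `swapSublist`, and `hasInterest` names are hypothetical, and this is not necessarily how #4127 resolved it.

```go
package main

import (
	"fmt"
	"sync"
)

// sublist is a hypothetical, read-only-after-publish set of subjects.
type sublist struct{ subjects map[string]bool }

type account struct {
	mu sync.RWMutex
	sl *sublist
}

// swapSublist replaces the sublist, as a reload might do.
func (a *account) swapSublist(sl *sublist) {
	a.mu.Lock()
	a.sl = sl
	a.mu.Unlock()
}

// hasInterest snapshots the pointer under the read lock, then queries the
// snapshot. Reading a.sl without the lock would race with swapSublist.
func (a *account) hasInterest(subject string) bool {
	a.mu.RLock()
	sl := a.sl
	a.mu.RUnlock()
	return sl.subjects[subject]
}

func main() {
	acc := &account{sl: &sublist{subjects: map[string]bool{"foo": true}}}

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // simulated reload
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			acc.swapSublist(&sublist{subjects: map[string]bool{"foo": true}})
		}
	}()
	go func() { // simulated gateway interest check
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			_ = acc.hasInterest("foo")
		}
	}()
	wg.Wait()
	fmt.Println(acc.hasInterest("foo"))
}
```

Running the sketch with `go run -race` is a quick way to check that the race detector stays quiet for this access pattern.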
Waldemar Quevedo
938ffcba20 Fix race in reload and gateway sublist check
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-05-02 17:51:53 -07:00
Derek Collison
8cb32930d9 Small raft improvements. (#4126)
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-02 17:29:34 -07:00
Derek Collison
ae73f7be55 Small raft improvements.
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-02 16:44:27 -07:00
Derek Collison
9ef71893db Bump to 2.9.17-beta.4
Signed-off-by: Derek Collison <derek@nats.io>
2023-05-02 09:43:11 -07:00
Derek Collison
188eea42cc [IMPROVED] Do not hold filestore lock during remove that needs to do IO. (#4123)
When removing a msg and we need to load the msg block and incur IO,
unlock the fs lock to avoid stalling other activity on other blocks,
e.g., removing and adding msgs at the same time.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-02 09:42:38 -07:00
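The locking change in #4123 follows a general pattern: hold the store-wide lock only long enough to find the block, then do any IO under the block's own lock. The sketch below is a simplified illustration with hypothetical `store` and `block` types, where a map assignment stands in for loading a message block from disk; it is not the filestore implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// block is a hypothetical message block with its own lock.
type block struct {
	mu     sync.Mutex
	loaded bool
	msgs   map[uint64]string
}

// load simulates reading the block from disk. The caller must hold b.mu.
func (b *block) load() {
	if !b.loaded {
		b.msgs = map[uint64]string{1: "a", 2: "b"} // stand-in for disk IO
		b.loaded = true
	}
}

// store is a hypothetical stand-in for a file store.
type store struct {
	mu     sync.Mutex
	blocks map[int]*block
	count  int // total messages across all blocks
}

// remove deletes one message. The store-wide lock is released before the
// potentially slow load, so activity on other blocks is not stalled.
func (s *store) remove(blk int, seq uint64) bool {
	s.mu.Lock()
	b := s.blocks[blk]
	s.mu.Unlock() // do not hold the store-wide lock across IO
	if b == nil {
		return false
	}

	b.mu.Lock()
	b.load()
	_, ok := b.msgs[seq]
	delete(b.msgs, seq)
	b.mu.Unlock()

	if ok {
		s.mu.Lock() // re-acquire briefly to update shared accounting
		s.count--
		s.mu.Unlock()
	}
	return ok
}

func main() {
	s := &store{blocks: map[int]*block{0: &block{}}, count: 2}
	fmt.Println(s.remove(0, 1), s.count)
}
```

One consequence of this design is that state can change between releasing and re-acquiring the store lock, so any shared accounting has to be updated based on what the block-level operation actually did, as `remove` does here with `ok`.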
Derek Collison
4a58feff27 When removing a msg and we need to load the msg block and incur IO, unlock the fs lock to avoid stalling other activity on other blocks.
E.g., removing and adding msgs at the same time.

Signed-off-by: Derek Collison <derek@nats.io>
2023-05-02 08:56:43 -07:00