Commit Graph

5533 Commits

Author SHA1 Message Date
Ivan Kozlovic
963cc8af92 Merge pull request #2957 from nats-io/fix_flappers
Fix some flappers
2022-03-28 08:59:39 -06:00
Ivan Kozlovic
6ad93d9b34 Fix some flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-25 18:24:17 -06:00
Ivan Kozlovic
27cfd22f5f Merge pull request #2951 from nats-io/fix_2912
[FIXED] JetStream: possible deadlock during consumer leadership change
2022-03-25 13:21:57 -06:00
Ivan Kozlovic
3f6d3c4936 Merge pull request #2954 from nats-io/fix_server_version_check
[FIXED] Server version check
2022-03-25 13:21:46 -06:00
Ivan Kozlovic
ef981be879 Merge pull request #2956 from nats-io/fix_msg_copy_with_go_routines
Fixed data race caused by moving some code inside startGoRoutine
2022-03-25 13:21:35 -06:00
Matthias Hanel
0b54a55e83 Merge pull request #2952 from nats-io/r1-consumer-update-fail
[FIXED] update of R1 Consumer in clustered setup.
2022-03-25 14:54:07 -04:00
Ivan Kozlovic
eaf5de05e9 Fixed data race caused by moving some code inside startGoRoutine
startGoRoutine executes the given closure as a go routine, so
passing copyBytes(msg) as an argument from within that closure
caused a race. The copy needs to be done before calling
startGoRoutine, as it was before being changed in https://github.com/nats-io/nats-server/pull/2925
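
A minimal sketch of the safe ordering follows. It assumes a simplified `startGoRoutine` that only runs the closure with `go`, plus a hypothetical `server` type and `handleRequest` method; the real nats-server code is more involved. The point it shows is that the copy is taken on the caller's goroutine, before the read loop is free to reuse its buffer.

```go
package main

import (
	"fmt"
	"sync"
)

// copyBytes returns an independent copy of b. It mirrors the helper named in
// the commit message but is a stand-in written for this sketch.
func copyBytes(b []byte) []byte {
	if b == nil {
		return nil
	}
	cp := make([]byte, len(b))
	copy(cp, b)
	return cp
}

// server is a hypothetical stand-in that only tracks the goroutines it starts.
type server struct{ wg sync.WaitGroup }

// startGoRoutine runs f in a new goroutine (simplified for this sketch).
func (s *server) startGoRoutine(f func()) {
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		f()
	}()
}

// handleRequest is hypothetical. msg aliases the read loop's buffer, so it is
// copied here, on the caller's goroutine, before the closure is started.
// Calling copyBytes(msg) inside the closure instead would read the buffer
// concurrently with the read loop's next write into it.
func (s *server) handleRequest(msg []byte, out chan<- string) {
	msg = copyBytes(msg)
	s.startGoRoutine(func() {
		out <- string(msg)
	})
}

func main() {
	s := &server{}
	out := make(chan string, 1)
	buf := []byte("hello")
	s.handleRequest(buf, out)
	copy(buf, "XXXXX") // the read loop reuses its buffer right away
	s.wg.Wait()
	fmt.Println(<-out) // prints "hello"
}
```

Running this with `go run -race` should report no race; moving `copyBytes` into the closure reproduces the kind of report shown in the trace.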

Here is the race observed:
```
==================
WARNING: DATA RACE
Write at 0x00c0001dd930 by goroutine 367:
  runtime.racewriterange()
      <autogenerated>:1 +0x29
  internal/poll.ignoringEINTRIO()
      /home/travis/.gimme/versions/go1.17.8.linux.amd64/src/internal/poll/fd_unix.go:582 +0x454
  internal/poll.(*FD).Read()
      /home/travis/.gimme/versions/go1.17.8.linux.amd64/src/internal/poll/fd_unix.go:163 +0x26
  net.(*netFD).Read()
      /home/travis/.gimme/versions/go1.17.8.linux.amd64/src/net/fd_posix.go:56 +0x50
  net.(*conn).Read()
      /home/travis/.gimme/versions/go1.17.8.linux.amd64/src/net/net.go:183 +0xb0
  net.(*TCPConn).Read()
      <autogenerated>:1 +0x64
  github.com/nats-io/nats-server/v2/server.(*client).readLoop()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:1188 +0x8f7
  github.com/nats-io/nats-server/v2/server.(*Server).createLeafNode.func1()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/leafnode.go:904 +0x5d
Previous read at 0x00c0001dd930 by goroutine 93:
  runtime.slicecopy()
      /home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/slice.go:284 +0x0
  github.com/nats-io/nats-server/v2/server.copyBytes()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/util.go:282 +0x10b
  github.com/nats-io/nats-server/v2/server.(*Server).jsStreamListRequest.func1()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_api.go:1613 +0x26
Goroutine 367 (running) created at:
  github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:3017 +0x86
  github.com/nats-io/nats-server/v2/server.(*Server).createLeafNode()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/leafnode.go:904 +0x1b08
  github.com/nats-io/nats-server/v2/server.(*Server).startLeafNodeAcceptLoop.func1()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/leafnode.go:604 +0x4b
  github.com/nats-io/nats-server/v2/server.(*Server).acceptConnections.func1()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:2122 +0x58
Goroutine 93 (running) created at:
  github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:3017 +0x86
  github.com/nats-io/nats-server/v2/server.(*Server).jsStreamListRequest()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_api.go:1613 +0xbf1
  github.com/nats-io/nats-server/v2/server.(*Server).jsStreamListRequest-fm()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_api.go:1554 +0xcc
  github.com/nats-io/nats-server/v2/server.(*jetStream).apiDispatch()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_api.go:680 +0xcf0
  github.com/nats-io/nats-server/v2/server.(*jetStream).apiDispatch-fm()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_api.go:652 +0xcc
  github.com/nats-io/nats-server/v2/server.(*client).deliverMsg()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3181 +0xbde
  github.com/nats-io/nats-server/v2/server.(*client).processMsgResults()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4164 +0xe1e
  github.com/nats-io/nats-server/v2/server.(*client).processInboundLeafMsg()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/leafnode.go:2183 +0x7eb
  github.com/nats-io/nats-server/v2/server.(*client).processInboundMsg()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3498 +0xb1
  github.com/nats-io/nats-server/v2/server.(*client).parse()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/parser.go:497 +0x3886
  github.com/nats-io/nats-server/v2/server.(*client).readLoop()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:1228 +0x1669
  github.com/nats-io/nats-server/v2/server.(*Server).createLeafNode.func1()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/leafnode.go:904 +0x5d
==================
    testing.go:1152: race detected during execution of test
```

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-25 12:51:52 -06:00
R.I.Pienaar
a1e77a9e7c Merge pull request #2932 from ripienaar/jsz_cluster_on_leader
[IMPROVED] Ensures the cluster info in jsz is sent from the leader only
2022-03-25 19:51:07 +01:00
Matthias Hanel
2438c965e7 Fix update of R1 Consumer in clustered setup.
A missing reply caused a timeout.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-25 14:48:15 -04:00
Derek Collison
7e4a4c8fdd Merge pull request #2890 from nats-io/jnm/partition_mapping
[ADDED] deterministic subject tokens to partition mapping
2022-03-25 11:30:24 -07:00
Ivan Kozlovic
5e89374ee9 Fixed another possible lock inversion consumer->stream
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-25 12:21:51 -06:00
Ivan Kozlovic
4739eebfc4 [FIXED] JetStream: possible deadlock during consumer leadership change
Could show up when the leader changes for a consumer that had
redelivered messages while, for instance, messages were inbound
on the stream.

Resolves #2912

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-25 12:21:51 -06:00
Ivan Kozlovic
91bdcc30cc [FIXED] Server version check
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-25 12:11:55 -06:00
R.I.Pienaar
055703f4fa ensures the cluster info in jsz is sent from the leader only
The data from other nodes is usually wrong, which can be quite
confusing for users, so we now only send it when we are the leader.

Signed-off-by: R.I.Pienaar <rip@devco.net>
2022-03-25 18:27:35 +01:00
Derek Collison
edcddfae58 Make at least work
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 19:12:31 -07:00
Derek Collison
1d38a73bcb Fix for version comparison
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 18:39:28 -07:00
Derek Collison
7a9c2336e7 Bump to 2.8.0-beta.1
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 16:52:42 -07:00
Derek Collison
5fb9a39bfc Merge pull request #2947 from nats-io/restart_stream_recover
[IMPROVED] Stream recovery and Memory Utilization
2022-03-24 16:50:25 -07:00
Derek Collison
ef8f543ea5 Improve memory usage through JetStream storage layer.
Previously we relied more heavily on Go's garbage collector: when we loaded a block for an underlying stream, we passed references upward to avoid copies.
Now we always copy when passing data back to the upper layers, which allows us not only to expire our cache blocks but also to pool and reuse them.
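
A rough sketch of the copy-on-return idea, using `sync.Pool` from the standard library. The `cacheBlock`, `loadBlock` and `blockPool` names are hypothetical and the buffer size is arbitrary; this is not the filestore implementation, only an illustration of why copying out makes pooling safe.

```go
package main

import (
	"fmt"
	"sync"
)

// blockPool recycles cache block buffers (hypothetical, sized for the example).
var blockPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 0, 4096)
		return &b
	},
}

// cacheBlock is a hypothetical in-memory copy of a loaded storage block.
type cacheBlock struct {
	buf *[]byte
}

// loadBlock fills a pooled buffer with the raw block data.
func loadBlock(data []byte) *cacheBlock {
	bp := blockPool.Get().(*[]byte)
	*bp = append((*bp)[:0], data...)
	return &cacheBlock{buf: bp}
}

// msg returns a copy of bytes [start, end), so callers never keep
// references into the cache buffer.
func (cb *cacheBlock) msg(start, end int) []byte {
	out := make([]byte, end-start)
	copy(out, (*cb.buf)[start:end])
	return out
}

// expire returns the buffer to the pool. This is only safe because msg()
// handed out copies rather than sub-slices of the buffer.
func (cb *cacheBlock) expire() {
	*cb.buf = (*cb.buf)[:0]
	blockPool.Put(cb.buf)
	cb.buf = nil
}

func main() {
	cb := loadBlock([]byte("msg-1msg-2"))
	m := cb.msg(0, 5)
	cb.expire()
	// The next load may reuse the same buffer from the pool.
	cb2 := loadBlock([]byte("XXXXXXXXXX"))
	defer cb2.expire()
	fmt.Println(string(m)) // prints "msg-1"
}
```

Had `msg()` returned `(*cb.buf)[start:end]` directly, `m` could be overwritten once the buffer was recycled, which is why the earlier reference-passing design had to leave reclamation to the garbage collector.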

The upper layers were also changed so that the pooling layer at that level can optionally interoperate with the storage layer.

Also fixed some flappers and a bug where de-dupe might not be reformed correctly.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 17:45:15 -06:00
Derek Collison
d7e1e5ae61 Make sure that we do not become a candidate/leader too soon or if we are not caught up.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 17:45:15 -06:00
Derek Collison
7fd5f4dc24 Update Go client
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 17:45:15 -06:00
Ivan Kozlovic
fde6c61f9f Merge pull request #2948 from nats-io/js_backoff_check_pending
JS: BackOff list caused too frequent checkPending() calls
2022-03-23 14:58:29 -06:00
Ivan Kozlovic
2253bb6f1a JS: BackOff list caused too frequent checkPending() calls
Since the "next" timer value is set to the AckWait value, which
is the first element in the BackOff list if present, the check
could happen at this interval even after we were past the first
redelivery and the backoff interval had increased.

The end user would still see redeliveries done at the durations
indicated by the BackOff list, but internally we would keep
checking at the initial BackOff's ack wait.

I added a test that uses the store's interface to detect how many
times the checkPending() function is invoked. For this test it
should have been invoked twice, but without the fix it was invoked
15 times.

Also fixed an unrelated test that could deadlock, causing tests
to be aborted due to inactivity on Travis.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-23 12:46:17 -06:00
Ivan Kozlovic
e9bf972cec Merge pull request #2946 from nats-io/fix_ipqueue_unregister
Fixed panic on stream create failure (with filestore)
2022-03-22 16:11:04 -06:00
Ivan Kozlovic
8d4ff4bc96 Fixed panic on stream create failure (with filestore)
This was introduced by the change for ipQueues in #2931.
The (*ipQueue).unregister() was written with a protection for
the ipQueue being nil; however, mset.outq is actually not a bare
ipQueue but a jsOutQ that embeds a pointer to an ipQueue. So we
need to implement unregister() for jsOutQ.

Added a test that reproduces the issue. It was originally found
with a flapping test (TestJetStreamLongStreamNamesAndPubAck) that
failed due to a file name being too long.
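
The failure mode can be illustrated with Go's promoted methods. In this sketch, `ipQueue`, `bareOutQ` and `jsOutQ` are simplified hypothetical stand-ins: calling a promoted method on a nil outer pointer first evaluates the embedded field, which dereferences nil before the inner nil check can run. Defining `unregister()` on the outer type itself lets it guard its own nil receiver.

```go
package main

import "fmt"

// ipQueue is a simplified stand-in; unregister tolerates a nil receiver.
type ipQueue struct {
	registered bool
}

func (q *ipQueue) unregister() {
	if q == nil {
		return
	}
	q.registered = false
}

// bareOutQ embeds *ipQueue without its own unregister(). Calling the promoted
// method on a nil *bareOutQ must read o.ipQueue first, which panics.
type bareOutQ struct{ *ipQueue }

// jsOutQ defines unregister() itself and checks for a nil receiver
// before delegating to the embedded queue.
type jsOutQ struct{ *ipQueue }

func (o *jsOutQ) unregister() {
	if o == nil {
		return
	}
	o.ipQueue.unregister()
}

// panics reports whether calling f panics.
func panics(f func()) (did bool) {
	defer func() {
		if recover() != nil {
			did = true
		}
	}()
	f()
	return false
}

func main() {
	var bare *bareOutQ
	var js *jsOutQ
	fmt.Println(panics(func() { bare.unregister() })) // true
	fmt.Println(panics(func() { js.unregister() }))   // false
}
```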

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-22 15:21:01 -06:00
Ivan Kozlovic
9d6525c8a3 Merge pull request #2943 from nats-io/fix_2926
[CHANGED] Duplicates in authorization{} and accounts{} now detected
2022-03-22 14:38:42 -06:00
Ivan Kozlovic
897c229fa9 Update test to capture accounts{} and single u/p or token
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-22 10:29:11 -06:00
Ivan Kozlovic
eef194c43b [CHANGED] Duplicates in authorization{} and accounts{} now detected
If accounts{} block is specified, authorization{} should not have
any user/password/token or users array defined.

The reason is that users parsed in accounts{} are associated with
their respective account, while users parsed in authorization{} are
associated with the global account. Internally, the order in which
those 2 blocks are parsed is completely random (even if they are
laid out in the config in a specific order), so if the same user
name is in both, that user may end up associated either with its
account or with the default global account.

To minimize breaking changes while still avoiding this unexpected
outcome, the server will now detect duplicate users (or nkeys)
inside the authorization{} block itself, and also between this
block and accounts{}.
The check will also detect whether accounts{} has any user/nkey, in
which case the authorization{} block must not have any
user/password/token, making this check similar to the one we
already had in the authorization{} block itself.
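
A sketch of the checks described above, applied to a hypothetical, already-parsed view of the config. The `authConfig` type and `checkAuthConfig` function are illustrative only; the server's real option parsing and error messages differ.

```go
package main

import (
	"errors"
	"fmt"
)

// authConfig is a hypothetical, simplified view of the authorization{} block.
type authConfig struct {
	user     string
	password string
	token    string
	users    []string // user names or nkeys from the users array
}

// checkAuthConfig rejects duplicates inside authorization{}, duplicates
// between authorization{} and accounts{}, and any user/password/token in
// authorization{} when accounts{} defines at least one user or nkey.
func checkAuthConfig(auth authConfig, accounts map[string][]string) error {
	names := auth.users
	if auth.user != "" {
		names = append([]string{auth.user}, auth.users...)
	}
	seen := make(map[string]bool, len(names))
	for _, n := range names {
		if seen[n] {
			return fmt.Errorf("duplicate user %q in authorization{}", n)
		}
		seen[n] = true
	}
	accountsHaveUsers := false
	for acc, users := range accounts {
		for _, n := range users {
			accountsHaveUsers = true
			if seen[n] {
				return fmt.Errorf("user %q defined in both authorization{} and account %q", n, acc)
			}
		}
	}
	if accountsHaveUsers && (len(names) > 0 || auth.password != "" || auth.token != "") {
		return errors.New("authorization{} must not define user/password/token when accounts{} has users")
	}
	return nil
}

func main() {
	accounts := map[string][]string{"A": {"alice"}}
	fmt.Println(checkAuthConfig(authConfig{users: []string{"alice"}}, accounts))
}
```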

Resolves #2926

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-21 19:50:16 -06:00
Jaime Piña
60773be03f Use random high port in placement test (#2940)
2022-03-21 15:38:01 -07:00
Ivan Kozlovic
13cd977e50 Merge pull request #2939 from nats-io/js_panic_leader_change
[FIXED] JetStream: possible panic on leadership change notices
2022-03-21 12:30:09 -06:00
Ivan Kozlovic
f11f7a61e8 Merge pull request #2938 from nats-io/fix_2920
[FIXED] Removal of an external source stream
2022-03-21 12:29:57 -06:00
Ivan Kozlovic
e75020e275 [FIXED] JetStream: possible panic on leadership change notices
I got this panic in a test:
```
=== RUN   TestJetStreamClusterAccountLoadFailure
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x78 pc=0xb1501b]
goroutine 47853 [running]:
github.com/nats-io/nats-server/v2/server.(*jetStream).processLeaderChange(0xc000b60580, 0x0)
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:3638 +0x9b
github.com/nats-io/nats-server/v2/server.(*jetStream).monitorCluster(0xc000b60580)
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:853 +0x60f
created by github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:3017 +0x87
FAIL	github.com/nats-io/nats-server/v2/server	227.888s
```

which, on that branch, points to this line in
processLeaderChange():
```
} else if node := js.getMetaGroup().GroupLeader(); node == _EMPTY_ {
```
which suggests that getMetaGroup() was returning `nil`.

Refactored the code to get the group leader in 2 steps: first the
meta group, then its leader.
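
A sketch of the two-step lookup, with hypothetical `raftNode`, `node` and `jetStream` stand-ins (only `getMetaGroup`, `GroupLeader` and `_EMPTY_` come from the snippet above). Calling a method on a nil interface value panics with a nil pointer dereference, so the meta group is checked before asking for its leader.

```go
package main

import "fmt"

const _EMPTY_ = ""

// raftNode is a hypothetical subset of a Raft node interface.
type raftNode interface {
	GroupLeader() string
}

type node struct{ leader string }

func (n *node) GroupLeader() string { return n.leader }

// jetStream is a hypothetical stand-in whose meta group may be nil,
// as the panic above suggests can happen.
type jetStream struct{ meta raftNode }

func (js *jetStream) getMetaGroup() raftNode { return js.meta }

// leaderStatus gets the meta group first and only then its leader, instead
// of chaining js.getMetaGroup().GroupLeader() in a single expression.
func (js *jetStream) leaderStatus() string {
	n := js.getMetaGroup()
	if n == nil {
		return "no meta group"
	}
	leader := n.GroupLeader()
	if leader == _EMPTY_ {
		return "no leader"
	}
	return "leader: " + leader
}

func main() {
	fmt.Println((&jetStream{}).leaderStatus())
	fmt.Println((&jetStream{meta: &node{leader: "S1"}}).leaderStatus())
}
```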

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-21 12:11:41 -06:00
Jaime Piña
33cfc748bf Disable some supercluster limit placement tests (#2937)
2022-03-21 11:05:13 -07:00
Ivan Kozlovic
68da3e8253 [FIXED] Removal of an external source stream
Removal of a stream source that was external was not working properly:
messages would still flow after the removal, until the server
hosting the stream from which the source was removed was
restarted.

Resolves #2920

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-21 10:59:47 -06:00
Ivan Kozlovic
29ff67e2ac Tests: Replace all Ack() with AckSync() for now
For the reason explained in the previous commit, tests that were
expecting the number of ack/pending to be a certain value after
an Ack() could flap. Replaced all references; we can later go back
to selectively calling Ack() where AckSync() is not
needed.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-17 20:25:01 -06:00
Ivan Kozlovic
ac52ecd9ff Fixing flapper
Since acks are now processed in a different go-routine, tests
that use Ack() cannot expect the number of ack messages to be
exact immediately. So in this test, use AckSync() to ensure that
the ack is processed. Alternatively, the pending count could
be checked with a checkFor().

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-17 19:53:33 -06:00
Ivan Kozlovic
a23b1b73ef Merge pull request #2931 from nats-io/ipq_changes
Changes to IPQueues
2022-03-17 19:13:02 -06:00
Derek Collison
a4e795c996 Attempt to fix flapper
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-17 17:38:32 -07:00
Ivan Kozlovic
c3da392832 Changes to IPQueues
Removed the warnings; instead, queues are registered/unregistered
in a sync.Map and can be inspected with an undocumented
monitor page.
Added the notion of "in progress", which is the number of messages
that have been pop()'ed. When recycle() is invoked, this count
goes down.
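
A simplified sketch of a registry plus an "in progress" counter. The `ipQueue` type, `newIPQueue`, `registry` and `inProgressCount` here are hypothetical and much smaller than the server's implementation. It assumes `pop()` hands back all pending elements at once and `recycle()` receives the same slice when the caller is done with it.

```go
package main

import (
	"fmt"
	"sync"
)

// registry maps queue names to queues so they can be inspected elsewhere.
var registry sync.Map

// ipQueue is a hypothetical simplified queue. inProgress counts elements
// returned by pop() that have not yet been handed back via recycle().
type ipQueue struct {
	mu         sync.Mutex
	name       string
	elts       []interface{}
	inProgress int
}

func newIPQueue(name string) *ipQueue {
	q := &ipQueue{name: name}
	registry.Store(name, q)
	return q
}

func (q *ipQueue) push(e interface{}) {
	q.mu.Lock()
	q.elts = append(q.elts, e)
	q.mu.Unlock()
}

// pop returns all pending elements and counts them as in progress.
func (q *ipQueue) pop() []interface{} {
	q.mu.Lock()
	defer q.mu.Unlock()
	elts := q.elts
	q.elts = nil
	q.inProgress += len(elts)
	return elts
}

// recycle signals that the popped elements have been processed.
func (q *ipQueue) recycle(elts []interface{}) {
	q.mu.Lock()
	q.inProgress -= len(elts)
	q.mu.Unlock()
}

func (q *ipQueue) unregister() { registry.Delete(q.name) }

func (q *ipQueue) inProgressCount() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.inProgress
}

func main() {
	q := newIPQueue("acks")
	for i := 0; i < 3; i++ {
		q.push(i)
	}
	elts := q.pop()
	fmt.Println(q.inProgressCount()) // 3
	q.recycle(elts)
	fmt.Println(q.inProgressCount()) // 0
	q.unregister()
}
```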

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-17 17:53:06 -06:00
Derek Collison
69d265601d Merge pull request #2930 from nats-io/dupe_urls_config
Detect exact duplicates for URLs for routes, gateways or leafnodes.
2022-03-17 16:23:56 -07:00
Derek Collison
0bb84bf76b Make warning more detailed
Co-authored-by: Waldemar Quevedo <wally@synadia.com>
2022-03-17 14:59:14 -07:00
Derek Collison
e204a7961d When detecting exact duplicate URLs for routes, gws or leafnodes, emit a warning and ignore them.
If misconfigured, this could prevent the JetStream system from electing a leader.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-17 14:52:01 -07:00
Jaime Piña
50ca685a3b Add stream limit update test (#2929)
This adds a test to see if we can update a stream when the stream limit
is 1. Currently, this test fails, so we're skipping it. This test will
be enabled in a future PR.
2022-03-17 13:49:37 -07:00
Derek Collison
0601da2186 Merge pull request #2928 from nats-io/m_ver
Show version on main monitoring page with link to source
2022-03-17 11:24:35 -07:00
Ivan Kozlovic
7d9bb32c1d Fix a flapper
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-17 12:18:22 -06:00
Derek Collison
fa098f1af0 Show version on main monitoring page with link to source
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-17 11:04:11 -07:00
Ivan Kozlovic
fe6d7b305f Merge pull request #2898 from nats-io/js_cons_ack_processing
[CHANGED] JetStream: Redeliveries may be delayed if necessary
2022-03-17 10:57:22 -06:00
Ivan Kozlovic
2c0f5046f1 Merge pull request #2923 from nats-io/gw_detect_duplicate_srv_name
[CHANGED] Gateway: Detect duplicate names between clusters
2022-03-17 10:57:08 -06:00
Derek Collison
b99fd81464 Merge pull request #2927 from nats-io/consumer_wipe
Improve consumer state handling.
2022-03-17 08:50:24 -07:00
Derek Collison
dbfa47f9b1 Improve state preservation for consumers, specifically DeliverNew variants when no activity has been present.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-16 20:55:14 -07:00