5549 Commits

Author SHA1 Message Date
Matthias Hanel
0c5f3688a7 [ADDED] Tiered limits and fix limit issues on updates (#2945)
* Adding tiered limits and fix limit issues on updates

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-28 20:47:54 -04:00
Derek Collison
26c6dcdc4d Bump version to 2.8.0-beta.2
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-28 17:11:29 -07:00
Derek Collison
780d4c0dd8 Merge pull request #2960 from nats-io/mem_pool
Additional improvements to memory pooling and management.
2022-03-28 17:10:16 -07:00
Derek Collison
bd0a0b28c7 When recycling blocks we could potentially place partials into a tier. This would possibly cause the load code to thrash since it would not be big enough for a full block and we would need to recycle again and make a new one.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-28 16:46:46 -07:00
Ivan Kozlovic
f82eda30aa Fix map init
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-28 17:46:01 -06:00
Ivan Kozlovic
909c6754cb Changed subjString to accept a byte slice
This may prevent memory copies when not necessary. Also fixed a bug
there that would check twice if there was only 1 subject and that
subject did not match (say configured subject is foo.* and key is
foo.bar).

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-28 17:37:28 -06:00
Ivan Kozlovic
929f849b93 Merge pull request #2966 from nats-io/fix_2941
[FIXED] JetStream: sampling not updated during consumer update
2022-03-28 14:13:10 -06:00
Derek Collison
004e5ce2c6 Merge pull request #2958 from nats-io/fix_2955
[FIXED] Scaling up an R1 stream would not replicate existing messages.
2022-03-28 12:18:20 -07:00
Derek Collison
5e5aab378e Additional improvements to memory pooling and management. Also logic fix for firstMatching that did unnecessary work when matching all.
During contention to the head write blk, the system could perform worse memory wise compared to simple go runtime.
Also had some references for the subject of messages bloating memory.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-28 10:15:23 -07:00
Ivan Kozlovic
25886e8819 [FIXED] JetStream: sampling not updated during consumer update
Resolves #2941

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-28 10:58:58 -06:00
Derek Collison
7607d37799 Make sure to prevent flappers if possible
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-28 09:34:48 -07:00
Derek Collison
0b8aa47259 Merge pull request #2959 from nats-io/fs_msgs_underflow
Heavy contention in filestore could result in underflow and panic.
2022-03-28 09:30:53 -07:00
Ivan Kozlovic
4e5519f999 Merge pull request #2942 from boris-ilijic/js-con-sampling-issue-update-flow
Add failing test for updating JS Consumer with sampling option
2022-03-28 10:21:29 -06:00
Ivan Kozlovic
963cc8af92 Merge pull request #2957 from nats-io/fix_flappers
Fix some flappers
2022-03-28 08:59:39 -06:00
Derek Collison
04d4f08e8c Under heavy contention skip combined with remove could result in index being stamped with underflow for number of messages.
We had a report of a panic on server restart with 2.8.0-beta.1. The panic was trying to malloc the size of a load block based off of the number of messages we thought the block had from the index.
Before, SkipMsg would decrement the count and writeMsgRecord would add it back when the record was written. However, we released the lock in between, meaning other things could run.
If the decrement took the count to 0 (we did protect against underflow there) and a remove ran before the record was written, the subsequent writeIndexInfo would stamp an underflowed count.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-26 11:05:38 -07:00
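The underflow described in the commit above comes from unsigned arithmetic: decrementing a `uint64` that is already 0 wraps to the maximum value, and sizing an allocation from that value will fail. The following is a minimal sketch of a guarded decrement, not the filestore code; `guardedDec` and `unguardedDec` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// unguardedDec shows the failure mode: 0 - 1 wraps around for unsigned integers.
func unguardedDec(n uint64) uint64 {
	return n - 1
}

// guardedDec never goes below zero, so a counter written to an index stays sane.
func guardedDec(n uint64) uint64 {
	if n == 0 {
		return 0
	}
	return n - 1
}

func main() {
	var msgs uint64 // a block that currently holds no messages
	fmt.Println("unguarded:", unguardedDec(msgs))
	fmt.Println("guarded:  ", guardedDec(msgs))
}
```

A guard on each decrement only helps if the counter is also protected by the same lock across the whole decrement-and-restore sequence, which is the part the commit addresses.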
Derek Collison
6b379329d8 Fix for #2955. When scaling up a stream with existing messages the existing messages were not being replicated.
Also fixed a bug where we were incorrectly not spinning up the monitoring loop for a stream when going from 3->1->3.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-26 07:26:46 -07:00
Ivan Kozlovic
6ad93d9b34 Fix some flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-25 18:24:17 -06:00
Ivan Kozlovic
27cfd22f5f Merge pull request #2951 from nats-io/fix_2912
[FIXED] JetStream: possible deadlock during consumer leadership change
2022-03-25 13:21:57 -06:00
Ivan Kozlovic
3f6d3c4936 Merge pull request #2954 from nats-io/fix_server_version_check
[FIXED] Server version check
2022-03-25 13:21:46 -06:00
Ivan Kozlovic
ef981be879 Merge pull request #2956 from nats-io/fix_msg_copy_with_go_routines
Fixed data race caused by moving some code inside startGoRoutine
2022-03-25 13:21:35 -06:00
Matthias Hanel
0b54a55e83 Merge pull request #2952 from nats-io/r1-consumer-update-fail
[FIXED] update of R1 Consumer in clustered setup.
2022-03-25 14:54:07 -04:00
Ivan Kozlovic
eaf5de05e9 Fixed data race caused by moving some code inside startGoRoutine
startGoRoutine will execute the closed function as a go routine,
so passing copyBytes(msg) as the argument caused a race. The
copy needs to be done before startGoRoutine, as it was before
being changed in https://github.com/nats-io/nats-server/pull/2925

Here is the race observed:
```
==================
WARNING: DATA RACE
Write at 0x00c0001dd930 by goroutine 367:
  runtime.racewriterange()
      <autogenerated>:1 +0x29
  internal/poll.ignoringEINTRIO()
      /home/travis/.gimme/versions/go1.17.8.linux.amd64/src/internal/poll/fd_unix.go:582 +0x454
  internal/poll.(*FD).Read()
      /home/travis/.gimme/versions/go1.17.8.linux.amd64/src/internal/poll/fd_unix.go:163 +0x26
  net.(*netFD).Read()
      /home/travis/.gimme/versions/go1.17.8.linux.amd64/src/net/fd_posix.go:56 +0x50
  net.(*conn).Read()
      /home/travis/.gimme/versions/go1.17.8.linux.amd64/src/net/net.go:183 +0xb0
  net.(*TCPConn).Read()
      <autogenerated>:1 +0x64
  github.com/nats-io/nats-server/v2/server.(*client).readLoop()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:1188 +0x8f7
  github.com/nats-io/nats-server/v2/server.(*Server).createLeafNode.func1()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/leafnode.go:904 +0x5d
Previous read at 0x00c0001dd930 by goroutine 93:
  runtime.slicecopy()
      /home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/slice.go:284 +0x0
  github.com/nats-io/nats-server/v2/server.copyBytes()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/util.go:282 +0x10b
  github.com/nats-io/nats-server/v2/server.(*Server).jsStreamListRequest.func1()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_api.go:1613 +0x26
Goroutine 367 (running) created at:
  github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:3017 +0x86
  github.com/nats-io/nats-server/v2/server.(*Server).createLeafNode()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/leafnode.go:904 +0x1b08
  github.com/nats-io/nats-server/v2/server.(*Server).startLeafNodeAcceptLoop.func1()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/leafnode.go:604 +0x4b
  github.com/nats-io/nats-server/v2/server.(*Server).acceptConnections.func1()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:2122 +0x58
Goroutine 93 (running) created at:
  github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:3017 +0x86
  github.com/nats-io/nats-server/v2/server.(*Server).jsStreamListRequest()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_api.go:1613 +0xbf1
  github.com/nats-io/nats-server/v2/server.(*Server).jsStreamListRequest-fm()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_api.go:1554 +0xcc
  github.com/nats-io/nats-server/v2/server.(*jetStream).apiDispatch()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_api.go:680 +0xcf0
  github.com/nats-io/nats-server/v2/server.(*jetStream).apiDispatch-fm()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_api.go:652 +0xcc
  github.com/nats-io/nats-server/v2/server.(*client).deliverMsg()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3181 +0xbde
  github.com/nats-io/nats-server/v2/server.(*client).processMsgResults()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4164 +0xe1e
  github.com/nats-io/nats-server/v2/server.(*client).processInboundLeafMsg()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/leafnode.go:2183 +0x7eb
  github.com/nats-io/nats-server/v2/server.(*client).processInboundMsg()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3498 +0xb1
  github.com/nats-io/nats-server/v2/server.(*client).parse()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/parser.go:497 +0x3886
  github.com/nats-io/nats-server/v2/server.(*client).readLoop()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:1228 +0x1669
  github.com/nats-io/nats-server/v2/server.(*Server).createLeafNode.func1()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/leafnode.go:904 +0x5d
==================
    testing.go:1152: race detected during execution of test
```

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-25 12:51:52 -06:00
R.I.Pienaar
a1e77a9e7c Merge pull request #2932 from ripienaar/jsz_cluster_on_leader
[IMPROVED] Ensures the cluster info in jsz is sent from the leader only
2022-03-25 19:51:07 +01:00
Matthias Hanel
2438c965e7 Fix update of R1 Consumer in clustered setup.
missing reply caused timeout

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-25 14:48:15 -04:00
Derek Collison
7e4a4c8fdd Merge pull request #2890 from nats-io/jnm/partition_mapping
[ADDED] deterministic subject tokens to partition mapping
2022-03-25 11:30:24 -07:00
Ivan Kozlovic
5e89374ee9 Fixed another possible lock inversion consumer->stream
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-25 12:21:51 -06:00
Ivan Kozlovic
4739eebfc4 [FIXED] JetStream: possible deadlock during consumer leadership change
Would possibly show up when a consumer leader changes for a consumer
that had redelivered messages and for instance messages were inbound
on the stream.

Resolves #2912

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-25 12:21:51 -06:00
Ivan Kozlovic
91bdcc30cc [FIXED] Server version check
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-25 12:11:55 -06:00
R.I.Pienaar
055703f4fa ensures the cluster info in jsz is sent from the leader only
The data from other nodes is usually wrong, which can be quite
confusing for users, so we now only send it when we are the leader

Signed-off-by: R.I.Pienaar <rip@devco.net>
2022-03-25 18:27:35 +01:00
Derek Collison
edcddfae58 Make at least work
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 19:12:31 -07:00
Derek Collison
1d38a73bcb Fix for version comparison
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 18:39:28 -07:00
Derek Collison
7a9c2336e7 Bump to 2.8.0-beta.1
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 16:52:42 -07:00
Derek Collison
5fb9a39bfc Merge pull request #2947 from nats-io/restart_stream_recover
[IMPROVED] Stream recovery and Memory Utilization
2022-03-24 16:50:25 -07:00
Derek Collison
ef8f543ea5 Improve memory usage through JetStream storage layer.
Previously we would rely more heavily on Go's garbage collector since when we loaded a block for an underlying stream we would pass references upward to avoid copies.
Now we always copy when passing back to the upper layers which allows us to not only expire our cache blocks but pool and reuse them.

The upper layers also had changes made to allow the pooling layer at that level to interoperate with the storage layer optionally.

Also fixed some flappers and a bug where de-dupe might not be reformed correctly.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 17:45:15 -06:00
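The copy-then-recycle idea in the commit above can be sketched with `sync.Pool` from the standard library. This is only an illustration of the general technique under simplifying assumptions, not the storage layer's implementation; `bufPool` and `readMsg` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool holds reusable scratch buffers. Pointers to slices are stored so
// that putting a buffer back does not allocate.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 0, 4096)
		return &b
	},
}

// readMsg loads data into a pooled scratch buffer, hands the caller a private
// copy, and then recycles the scratch buffer. Because the caller never holds a
// reference into the pooled buffer, reuse is safe.
func readMsg(src []byte) []byte {
	bp := bufPool.Get().(*[]byte)
	buf := append((*bp)[:0], src...) // stand-in for loading a block from disk

	out := make([]byte, len(buf))
	copy(out, buf)

	*bp = buf[:0]
	bufPool.Put(bp)
	return out
}

func main() {
	a := readMsg([]byte("one"))
	b := readMsg([]byte("two"))
	fmt.Println(string(a), string(b))
}
```

The trade-off is an extra copy per read in exchange for buffers that can be expired and reused instead of being kept alive by references held in upper layers.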
Derek Collison
d7e1e5ae61 Make sure that we do not become a candidate/leader too soon or if we are not caught up.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 17:45:15 -06:00
Derek Collison
7fd5f4dc24 Update Go client
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 17:45:15 -06:00
Ivan Kozlovic
fde6c61f9f Merge pull request #2948 from nats-io/js_backoff_check_pending
JS: BackOff list caused too frequent checkPending() calls
2022-03-23 14:58:29 -06:00
Ivan Kozlovic
2253bb6f1a JS: BackOff list caused too frequent checkPending() calls
Since the "next" timer value is set to the AckWait value, which
is the first element in the BackOff list if present, the check
would possibly happen at this interval, even when we were past
the first redelivery and the backoff interval had increased.

The end-user would still see the redelivery be done at the durations
indicated by the BackOff list, but internally, we would be checking
at the initial BackOff's ack wait.

I added a test that uses the store's interface to detect how many
times the checkPending() function is invoked. For this test it
should have been invoked twice, but without the fix it was invoked
15 times.

Also fixed an unrelated test that could possibly deadlock causing
tests to be aborted due to inactivity on Travis.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-23 12:46:17 -06:00
Ivan Kozlovic
e9bf972cec Merge pull request #2946 from nats-io/fix_ipqueue_unregister
Fixed panic on stream create failure (with filestore)
2022-03-22 16:11:04 -06:00
Ivan Kozlovic
8d4ff4bc96 Fixed panic on stream create failure (with filestore)
This was introduced by the change for ipQueues in #2931.
The (*ipQueue).unregister() was written with a protection for
the ipQueue to be nil, however, mset.outq is actually not a bare
ipQueue but a jsOutQ that embeds a pointer to an ipQueue. So we
need to implement unregister() for jsOutQ.

Added a test that reproduced the issue, but found it with a flapping
test (TestJetStreamLongStreamNamesAndPubAck) that failed due to
a file name too long.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-22 15:21:01 -06:00
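The panic in the commit above follows from how Go promotes methods through embedded pointers: a nil check inside the inner method does not help when the outer pointer is nil, because reaching the embedded field already dereferences the outer value. A minimal sketch, with `queue` and `outQ` as hypothetical stand-ins for the real types:

```go
package main

import "fmt"

// queue has a nil-safe unregister method.
type queue struct {
	registered bool
}

func (q *queue) unregister() {
	if q == nil {
		return
	}
	q.registered = false
}

// outQ embeds a pointer to queue. Without its own method, calling
// unregister on a nil *outQ would evaluate (*o).queue and panic before the
// inner nil check is ever reached.
type outQ struct {
	*queue
}

// unregister guards both the wrapper and the embedded pointer.
func (o *outQ) unregister() {
	if o == nil || o.queue == nil {
		return
	}
	o.queue.unregister()
}

func main() {
	var missing *outQ
	missing.unregister() // safe: handled by the wrapper's nil check

	q := &queue{registered: true}
	(&outQ{q}).unregister()
	fmt.Println("registered:", q.registered)
}
```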
Ivan Kozlovic
9d6525c8a3 Merge pull request #2943 from nats-io/fix_2926
[CHANGED] Duplicates in authorization{} and accounts{} now detected
2022-03-22 14:38:42 -06:00
Ivan Kozlovic
897c229fa9 Update test to capture accounts{} and single u/p or token
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-22 10:29:11 -06:00
Ivan Kozlovic
eef194c43b [CHANGED] Duplicates in authorization{} and accounts{} now detected
If accounts{} block is specified, authorization{} should not have
any user/password/token or users array defined.

The reason is that users parsed in accounts{} are associated with
their respective account but users parsed in authorization{} are
associated with the global account. If the same user name is
in both, and since internally the parsing of those 2 blocks is
completely random (even if laid out in the config in a specific
order), the outcome may be that a user is either associated with
an account or the default global account.

To minimize breaking changes, but still avoid this unexpected
outcome, the server will now detect if there are duplicate users
(or nkeys) inside authorization{} block itself, but also between
this block and accounts{}.
The check will also detect if accounts{} has any user/nkey, then
the authorization{} block should not have any user/password/token,
making this test similar to the check we had in authorization{}
block itself.

Resolves #2926

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-21 19:50:16 -06:00
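The duplicate check described in the commit above can be approximated by counting every user name across both blocks and reporting any name seen more than once. This is a simplified sketch that ignores nkeys and tokens and does not reflect the server's configuration parser; `findDuplicates` and its inputs are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// findDuplicates returns the user names that appear more than once across the
// authorization users and the per-account users, in sorted order.
func findDuplicates(authUsers []string, accountUsers map[string][]string) []string {
	seen := make(map[string]int)
	for _, u := range authUsers {
		seen[u]++
	}
	for _, users := range accountUsers {
		for _, u := range users {
			seen[u]++
		}
	}

	var dups []string
	for u, n := range seen {
		if n > 1 {
			dups = append(dups, u)
		}
	}
	// Map iteration order is random in Go, so sort for a stable result.
	sort.Strings(dups)
	return dups
}

func main() {
	auth := []string{"alice", "bob"}
	accounts := map[string][]string{
		"A": {"bob", "carol"},
		"B": {"dave"},
	}
	fmt.Println(findDuplicates(auth, accounts))
}
```

Rejecting the configuration when the result is non-empty removes the dependence on the order in which the two blocks happen to be processed.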
Boris Ilijic
a31d501f53 Add test for updating JS Consumer with sampling
2022-03-22 00:42:41 +01:00
Jaime Piña
60773be03f Use random high port in placement test (#2940)
2022-03-21 15:38:01 -07:00
Ivan Kozlovic
13cd977e50 Merge pull request #2939 from nats-io/js_panic_leader_change
[FIXED] JetStream: possible panic on leadership change notices
2022-03-21 12:30:09 -06:00
Ivan Kozlovic
f11f7a61e8 Merge pull request #2938 from nats-io/fix_2920
[FIXED] Removal of an external source stream
2022-03-21 12:29:57 -06:00
Ivan Kozlovic
e75020e275 [FIXED] JetStream: possible panic on leadership change notices
I got this panic in a test:
```
=== RUN   TestJetStreamClusterAccountLoadFailure
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x78 pc=0xb1501b]
goroutine 47853 [running]:
github.com/nats-io/nats-server/v2/server.(*jetStream).processLeaderChange(0xc000b60580, 0x0)
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:3638 +0x9b
github.com/nats-io/nats-server/v2/server.(*jetStream).monitorCluster(0xc000b60580)
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:853 +0x60f
created by github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:3017 +0x87
FAIL	github.com/nats-io/nats-server/v2/server	227.888s
```

which from that branch would point to function processLeaderChange()
line:
```
} else if node := js.getMetaGroup().GroupLeader(); node == _EMPTY_ {
```
which I guess meant that getMetaGroup() was returning `nil`.

Refactored a bit to get the group leader in 2 steps.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-21 12:11:41 -06:00
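The two-step approach mentioned in the commit above can be sketched as follows: fetch the group first, check it for nil, and only then ask for the leader. The types below are simplified stand-ins and not the server's definitions; `raftNode`, `jetStream`, and `hasLeader` are assumptions made for illustration.

```go
package main

import "fmt"

// raftNode is a stand-in for the meta group node.
type raftNode struct {
	leader string
}

// GroupLeader dereferences n, so calling it on a nil node panics.
func (n *raftNode) GroupLeader() string { return n.leader }

// jetStream is a stand-in that may or may not have a meta group yet.
type jetStream struct {
	meta *raftNode
}

func (js *jetStream) getMetaGroup() *raftNode { return js.meta }

// hasLeader avoids the chained call js.getMetaGroup().GroupLeader() by
// checking the intermediate result before using it.
func hasLeader(js *jetStream) bool {
	node := js.getMetaGroup()
	if node == nil {
		return false
	}
	return node.GroupLeader() != ""
}

func main() {
	fmt.Println(hasLeader(&jetStream{}))
	fmt.Println(hasLeader(&jetStream{meta: &raftNode{leader: "S1"}}))
}
```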
Jaime Piña
33cfc748bf Disable some supercluster limit placement tests (#2937)
2022-03-21 11:05:13 -07:00
Ivan Kozlovic
68da3e8253 [FIXED] Removal of an external source stream
Removal of a stream source that was external was not working properly,
allowing messages to still flow after the removal, until the server
hosting the stream from which the source was removed was restarted.

Resolves #2920

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-21 10:59:47 -06:00