Commit Graph

164 Commits

Author SHA1 Message Date
Ivan Kozlovic
9b5797f63c Undo sending bad request on no-interest in apiDispatch
This broke cross-account functionality. Ported the test from the
Go client that showed the failure after PR#2997 was merged.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-05 08:51:28 -06:00
Derek Collison
7f78d3e618 Not allowing streams to be created meant we could not recover on server restart.
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-01 06:41:22 -07:00
Jaime Piña
32b17f7a7e Skip SystemLimitsPlacement if we can't get the desired leader (#2989) 2022-03-31 16:24:29 -07:00
Matthias Hanel
92f4dc986a added max_ack_pending setting to js account limits (#2982)
* added max_ack_penind setting to js account limits

because of the addition, defaults now have to be set later (depend on
these new limits now)

also re-organized the code to closer track how stream create looks

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-31 14:17:16 -04:00
Derek Collison
2d7f941fea Merge pull request #2978 from nats-io/issue-2969
Fixes #2969, on reload stream import was not removed for js streams
2022-03-30 19:57:15 -07:00
Derek Collison
5182154cd2 We were not accounting for some newer internal clients (JETSTREAM, ACCOUNT, etc) when reloading authorization, etc.
We were also not copying over local state that has been added over the years to track different types of clients.
We also needed to make sure to reuse the account's internal client and the subscription id (acc.isid).

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-30 19:12:18 -07:00
Matthias Hanel
3933c1f3d8 Fixes #2969, on reload stream import was not removed for js streams
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-30 18:12:57 -04:00
Ivan Kozlovic
c0ab2d4959 [FIXED] Possible panic due to data races
A panic was reported that looked like this:
```
fatal error: concurrent map read and map write
goroutine 200 [running]:
runtime.throw({0xa366ce, 0xe620e0})
	/home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/panic.go:1198 +0x71 fp=0xc00105f098 sp=0xc00105f068 pc=0x434ff1
runtime.mapaccess1_faststr(0x0, 0x0, {0xc0054b6f18, 0x11})
	/home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/map_faststr.go:21 +0x3a5 fp=0xc00105f100 sp=0xc00105f098 pc=0x412285"
github.com/nats-io/nats-server/v2/server.(*consumer).processNextMsgReq(0xc000681000, 0xc00105f2a8, 0x4503e9, 0x11, {0x0, 0xc000246900}, {0xc0054b6f18, 0x11}, {0xc0002469c4, 0x90, ...})
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/consumer.go:2454 +0x8ce fp=0xc00105f250 sp=0xc00105f100 pc=0x77dc2e
github.com/nats-io/nats-server/v2/server.(*consumer).processNextMsgReq-fm(0x9c, 0x7f302e954fff, 0xc00105f2f8, {0xc000774280, 0x400}, {0xc0054b6f18, 0x40}, {0xc0002469c4, 0x90, 0x63c})
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/consumer.go:2380 +0x77 fp=0xc00105f2b8 sp=0xc00105f250 pc=x91e337
github.com/nats-io/nats-server/v2/server.(*client).deliverMsg(0xc0015f8000, 0xc003034f00, 0x41642f, {0xc000246969, 0x4b6166, 0x697}, {0xc0002469a9, 0x4b60be, 0x657}, {0xc0015f9480, ...}, ...)
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3180 +0xbb0 fp=0xc00105f530 sp=0xc00105f2b8 pc=0x764470
github.com/nats-io/nats-server/v2/server.(*client).processMsgResults(0xc0015f8000, 0x8cd7a5, 0xc0089fb440, {0xc0002469c4, 0x92, 0x63c}, {0x0, 0x0, 0x4}, {0xc000246969, ...}, ...)
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4163 +0x9af fp=0xc00105fa48 sp=0xc00105f530 pc=0x769e4f
github.com/nats-io/nats-server/v2/server.(*client).processInboundRoutedMsg(0xc0015f8000, {0xc0002469c4, 0xc0015f8220, 0x63c})
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:443 +0x159 fp=0xc00105fae8 sp=0xc00105fa48 pc=0x8ce299
github.com/nats-io/nats-server/v2/server.(*client).processInboundMsg(0xc0015f8000, {0xc0002469c4, 0x92, 0x79e})
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3493 +0x36 fp=0xc00105fb18 sp=0xc00105fae8 pc=0x765c76
github.com/nats-io/nats-server/v2/server.(*client).parse(0xc0015f8000, {0xc000246800, 0x800, 0xc087258a5d30c937})
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/parser.go:497 +0x246a fp=0xc00105fd98 sp=0xc00105fb18 pc=0x8a4f6a
github.com/nats-io/nats-server/v2/server.(*client).readLoop(0xc0015f8000, {0x0, 0x0, 0x0})"
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:1227 +0xe1f fp=0xc00105ffb0 sp=0xc00105fd98 pc=0x75841f
github.com/nats-io/nats-server/v2/server.(*Server).createRoute.func1()
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:1372 +0x25 fp=0xc00105ffe0 sp=0xc00105ffb0 pc=0x8d46a5
runtime.goexit
```

Writting a test showed the data race:
```
==================
WARNING: DATA RACE
Read at 0x00c0008ea240 by goroutine 62:
  runtime.mapaccess1_faststr()
      /usr/local/go/src/runtime/map_faststr.go:12 +0x0
  github.com/nats-io/nats-server/v2/server.(*consumer).processNextMsgRequest()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/consumer.go:2567 +0xa64
(...)
Previous write at 0x00c0008ea240 by goroutine 15:
  runtime.mapdelete_faststr()
      /usr/local/go/src/runtime/map_faststr.go:300 +0x0
  github.com/nats-io/nats-server/v2/server.(*Account).checkForReverseEntry()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/accounts.go:1759 +0x61c
  github.com/nats-io/nats-server/v2/server.(*client).unsubscribe()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/client.go:2838 +0xa27
(...)
```

After fixing this data race, another showed up:
```
==================
WARNING: DATA RACE
Read at 0x00c000352200 by goroutine 99:
  github.com/nats-io/nats-server/v2/server.(*Account).checkForReverseEntry()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/accounts.go:1752 +0x4b3
  github.com/nats-io/nats-server/v2/server.(*client).unsubscribe()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/client.go:2838 +0xa27
(...)
Previous write at 0x00c000352200 by goroutine 92:
  runtime.slicecopy()
      /usr/local/go/src/runtime/slice.go:284 +0x0
  github.com/nats-io/nats-server/v2/server.(*Account).checkForReverseEntry()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/accounts.go:1737 +0x871
  github.com/nats-io/nats-server/v2/server.(*Account).removeRespServiceImport()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/accounts.go:1622 +0x24c
(...)
```

This PR addresses both.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-30 13:51:52 -06:00
Matthias Hanel
1445153130 Adding max stream bytes check (#2970)
* Adding max stream bytes check

Also start checking on  stream update

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-30 15:50:28 -04:00
Derek Collison
eb16c35016 OrderedConsumer was very conservative with slow start and small max outstanding bytes. This is increasing perf for longer rtt.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-30 05:08:36 -07:00
Ivan Kozlovic
98c1f0ecb2 Fixed some data race and some flappers
Got a data race:
```
==================
WARNING: DATA RACE
Write at 0x00c001c736b0 by goroutine 605:
  runtime.mapassign_faststr()
      /home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/map_faststr.go:202 +0x0
  github.com/nats-io/nats-server/v2/server.(*Account).addServiceImport()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/accounts.go:1868 +0xb7b
  github.com/nats-io/nats-server/v2/server.(*Account).AddServiceImportWithClaim()
...
Previous read at 0x00c001c736b0 by goroutine 301:
  runtime.mapaccess2_faststr()
      /home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/map_faststr.go:107 +0x0
  github.com/nats-io/nats-server/v2/server.(*Server).registerSystemImports()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/events.go:1577 +0x284
  github.com/nats-io/nats-server/v2/server.(*Server).updateAccountClaimsWithRefresh()
...
```

Also, remove some condition in gateway.go on how we were checking
if a subject was a serviec reply, which was causing a test to flap.

Finally, used AckSync() in a rest (instead of m.Respond(nil)) to
prevent it from flapping.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-29 19:02:41 -06:00
Matthias Hanel
1aeaaf0ca3 Adding server limits (max ack pending/dedupe window) to js config (#2967)
* Adding server limits (max ack pending/dedupe window) to js config

Also shifting consumer config check to jsConsumerCreate as in clustered
mode this was enforced in the wrong place

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-29 13:19:36 -04:00
Matthias Hanel
0c5f3688a7 [ADDED] Tiered limits and fix limit issues on updates (#2945)
* Adding tiered limits and fix limit issues on updates

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-28 20:47:54 -04:00
Ivan Kozlovic
25886e8819 [FIXED] JetStream: sampling not updated during consumer update
Resolves #2941

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-28 10:58:58 -06:00
Ivan Kozlovic
4e5519f999 Merge pull request #2942 from boris-ilijic/js-con-sampling-issue-update-flow
Add failing test for updating JS Consumer with sampling option
2022-03-28 10:21:29 -06:00
Ivan Kozlovic
6ad93d9b34 Fix some flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-25 18:24:17 -06:00
Matthias Hanel
2438c965e7 Fix update of R1 Consumer in clustered setup.
missing reply caused timeout

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-25 14:48:15 -04:00
Derek Collison
ef8f543ea5 Improve memory usage through JetStream storage layer.
Previously we would rely more heavily on Go's garbage collector since when we loaded a block for an underlying stream we would pass references upward to avoimd copies.
Now we always copy when passing back to the upper layers which allows us to not only expire our cache blocks but pool and reuse them.

The upper layers also had changes made to allow the pooling layer at that level to interoperate with the storage layer optionally.

Also fixed some flappers and a bug where de-dupe might not be reformed correctly.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 17:45:15 -06:00
Ivan Kozlovic
2253bb6f1a JS: BackOff list caused too frequent checkPending() calls
Since the "next" timer value is set to the AckWait value, which
is the first element in the BackOff list if present, the check
would possibly happen at this interval, even when we were past
the first redelivery and the backoff interval had increased.

The end-user would still see the redelivery be done at the durations
indicated by the BackOff list, but internally, we would be checking
at the initial BackOff's ack wait.

I added a test that uses the store's interface to detect how many
times the checkPending() function is invoked. For this test it
should have been invoked twice, but without the fix it was invoked
15 times.

Also fixed an unrelated test that could possibly deadlock causing
tests to be aborted due to inactivity on Travis.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-23 12:46:17 -06:00
Ivan Kozlovic
8d4ff4bc96 Fixed panic on stream create failure (with filestore)
This was introduced by the change for ipQueues in #2931.
The (*ipQueue).unregister() was written with a protection for
the ipQueue to be nil, however, mset.outq is actually not a bare
ipQueue but a jsOutQ that embeds a pointer to an ipQueue. So we
need to implement register() for jsOutQ.

Added a test that reproduced the issue, but found it with a flapping
test (TestJetStreamLongStreamNamesAndPubAck) that failed due to
a file name too long.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-22 15:21:01 -06:00
Boris Ilijic
a31d501f53 Add test for updating JS Consumer with sampling 2022-03-22 00:42:41 +01:00
Jaime Piña
60773be03f Use random high port in placement test (#2940) 2022-03-21 15:38:01 -07:00
Ivan Kozlovic
f11f7a61e8 Merge pull request #2938 from nats-io/fix_2920
[FIXED] Removal of an external source stream
2022-03-21 12:29:57 -06:00
Jaime Piña
33cfc748bf Disable some supercluster limit placement tests (#2937) 2022-03-21 11:05:13 -07:00
Ivan Kozlovic
68da3e8253 [FIXED] Removal of an external source stream
Removal of a stream source that was external was not working properly,
allowing messages to still flow after the removal and until the
server hosting the stream to which the source was removed was
restarted.

Resolves #2920

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-21 10:59:47 -06:00
Ivan Kozlovic
29ff67e2ac Tests: Replace all Ack() with AckSync() for now
For reason explained in previous commit, for tests that were
expecting the number of ack/pending to be of a certain value after
an Ack(), they would be flapping. Replaced all references and
we can go back to selectively call Ack() when AckSync() is not
needed.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-17 20:25:01 -06:00
Ivan Kozlovic
ac52ecd9ff Fixing flapper
Since acks are now processed in different go-routine, the tests
that use Ack() cannot expect the number of ack messages to be
exact immediately. So in this test use AckSync() to ensure that
the ack is processed. Alternatively, the pending count should
be checked with a checkFor().

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-17 19:53:33 -06:00
Derek Collison
a4e795c996 Attempt to fix flapper
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-17 17:38:32 -07:00
Jaime Piña
50ca685a3b Add stream limit update test (#2929)
This adds a test to see if we can update a stream when the stream limit
is 1. Currently, this test fails, so we're skipping it. This test will
be enabled in a future PR.
2022-03-17 13:49:37 -07:00
Ivan Kozlovic
7d9bb32c1d Fix a flapper
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-17 12:18:22 -06:00
Ivan Kozlovic
fe6d7b305f Merge pull request #2898 from nats-io/js_cons_ack_processing
[CHANGED] JetStream: Redeliveries may be delayed if necessary
2022-03-17 10:57:22 -06:00
Jaime Piña
acfd456758 Prevent reserved bytes underflow (#2907) 2022-03-16 15:19:35 -07:00
Derek Collison
303bb93c18 Test ack metrics
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-15 16:41:06 -07:00
Ivan Kozlovic
b4128693ed Ensure file path is correct during stream restore
Also had to change all references from `path.` to `filepath.` when
dealing with files, so that it works properly on Windows.

Fixed also lots of tests to defer the shutdown of the server
after the removal of the storage, and fixed some config files
directories to use the single quote `'` to surround the file path,
again to work on Windows.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-09 13:31:51 -07:00
Ivan Kozlovic
0fae8067ae [FIXED] Some lock inversions
The established ordering is client -> Account, so fixed few places
where we had Account -> client.

Added a new file, locksordering.txt with the list of known ordering
for some of the objects.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-09 09:47:37 -07:00
Derek Collison
1b5f651c22 Fixed bug that would not recover a stream after non-clean shutdown with deleted messages.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-04 10:48:10 -08:00
Ivan Kozlovic
97612c0fac Fix test by using AckSync() to avoid flapping
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-03 15:09:01 -07:00
Ivan Kozlovic
804ce102ac [CHANGED] JetStream: Redeliveries may be delayed if necessary
We have seen situations where when a lot of pending messages accumulate,
there is a contention between the processing of the ACKs and the
checking of the pending map.

Decision is made to abort checking of pending list if processing of
ack(s) would be delayed because of that. The result is that a
redelivery may be post-poned.

Internally, the ACKs are also now using a queue to prevent processing
of them from the network handler, which could cause head-of-line
blocking, especially bad for routes.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-03 10:32:35 -07:00
Derek Collison
4efce40bbd Small improvements to send performance to a full stream.
Cleaned up some locking and if fifo make index updates lazy like writeMsgRecord.

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-17 05:39:27 -08:00
Derek Collison
5a93b0e9d8 Allow pull requests to specify a heartbeat when idle to detect when a request is invalidated.
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-11 09:51:51 -08:00
Ivan Kozlovic
55ffde7251 Fixed consumer dlv count and num pending wrong due to redeliveries
Introduced by #2848, so should not have impacted existing releases.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-02-10 17:33:53 -07:00
Derek Collison
0cc7302be9 A stream name is tied to its identity and can not be changed on a restore.
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-09 12:38:45 -08:00
Derek Collison
c13a84cf44 Fixed a bug that would calculate the first sequence of a filteredPending incorrectly.
Also added in more optimized version to select the first matching message in a message block for LoadNextMsg.

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-08 13:29:38 -08:00
Derek Collison
d50febeeff Improved sparse consumers replay time.
When a stream has multiple subjects and a consumer filters the stream to a small and spread out list of messages the logic would do a linear scan looking for the next message for the filtered consumer.
This CL allows the store layer to utilize the per subject info to improve the times.

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-07 17:26:32 -08:00
Derek Collison
55b7f11c9a Fixed flow control stall under specific conditions of message size.
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-05 20:15:48 -08:00
Derek Collison
5da0453964 Add in NumSubjects to StreamInfo
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-02 08:51:13 -08:00
Derek Collison
6a3cf0f71e Added in ability to get number of subjects from StreamInfo, and optionally details per subject on how many messages each subject has.
This can also be filtered, meaning you can filter out the subjects when asking for details.

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-02 08:51:13 -08:00
Derek Collison
12f5ea3655 When a consumer had not filtered subject and was attached to a interest policy retention stream we could incorrectly drop messages.
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-01 14:21:05 -08:00
Derek Collison
fa814f7cee Fixed behavior for when MaxMsgsPerSubject is set and DiscardNew is also set.
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-31 08:36:37 -08:00
Derek Collison
b38ced51b2 A true no wait pull request was not considering redeliveries.
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-30 11:52:35 -08:00