Commit Graph

193 Commits

Author SHA1 Message Date
Derek Collison
e08f6d863d Allow for republish to be headers only
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-30 12:05:17 -07:00
Derek Collison
daa4b97eeb Don't do advisories or API stats for a direct get msg from a stream.
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-30 09:32:07 -07:00
Ivan Kozlovic
a52f12613e Bump version to v2.8.4-beta.2 and fix flapper
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-05-25 13:25:45 -06:00
Derek Collison
72ed48d096 Merge pull request #3149 from nats-io/pull_perf_stable
[FIXED] Spurious pull consumer 408s under load
2022-05-25 09:44:27 -07:00
Derek Collison
d69394efad Fix spurious 408s under load and move processing of acks to their own Go routine.
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-25 09:27:34 -07:00
Derek Collison
46f7f7bfc9 Consumer pending was not correct when stream had max msgs per subject set > 1 and a consumer that filtered out part of the stream was created.
Also make sure to update stream's config on a stream restore in case of changes.

Signed-off-by: Derek Collison <derek@nats.io>
2022-05-24 14:44:15 -07:00
Ivan Kozlovic
53e3c53d96 [FIXED] JetStream: consumer with deliver new may miss messages
This could happen when a consumer had not sent anything to the
attached NATS subscription and there was a consumer leader
step down or server restart.

Signed-off-by: Derek Collison <derek@nats.io>
2022-05-23 12:01:48 -06:00
Derek Collison
c166c9b199 Enable republishing of messages once stored in a stream.
This enables lightweight distribution of messages to very large number of NATS subscribers.
We add in metadata as headers that allows for gap detection which enables initial value (via JetStream, maybe KV) and realtime NATS core updates but all globally ordered.

Signed-off-by: Derek Collison <derek@nats.io>
2022-05-17 15:18:54 -07:00
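The gap detection mentioned in c166c9b199 can be sketched on the subscriber side as follows. This is only an illustration: the header names `Nats-Sequence` and `Nats-Last-Sequence`, and the idea that the second one carries the sequence of the previously republished message, are assumptions here and not taken from the commit message. `gapDetector` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
)

// Assumed header names, used here for illustration only.
const (
	hdrSeq     = "Nats-Sequence"
	hdrLastSeq = "Nats-Last-Sequence"
)

// gapDetector remembers the last stream sequence seen on a republished subject.
type gapDetector struct {
	last uint64
}

// observe reports whether messages were missed between the previously
// observed message and this one, then records this message's sequence.
// The first observed message never counts as a gap.
func (g *gapDetector) observe(hdr map[string]string) (bool, error) {
	seq, err := strconv.ParseUint(hdr[hdrSeq], 10, 64)
	if err != nil {
		return false, fmt.Errorf("bad %s header: %w", hdrSeq, err)
	}
	prev, err := strconv.ParseUint(hdr[hdrLastSeq], 10, 64)
	if err != nil {
		return false, fmt.Errorf("bad %s header: %w", hdrLastSeq, err)
	}
	gap := g.last != 0 && prev != g.last
	g.last = seq
	return gap, nil
}

func main() {
	g := &gapDetector{}
	msgs := []map[string]string{
		{hdrSeq: "5", hdrLastSeq: "4"},
		{hdrSeq: "6", hdrLastSeq: "5"},
		{hdrSeq: "9", hdrLastSeq: "8"}, // 7 and 8 were never seen
	}
	for _, h := range msgs {
		gap, err := g.observe(h)
		fmt.Println(h[hdrSeq], gap, err)
	}
}
```

On a detected gap, a subscriber would typically fall back to fetching the missing range from the stream before resuming realtime updates.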
Derek Collison
6bbc5f627c Support for MaxBytes for pull requests.
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-16 08:43:33 -07:00
Derek Collison
bcecae42ac Fix for #3119
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-12 15:45:29 -07:00
Derek Collison
b35988adf9 Remember the last timestamp by not removing last msgBlk when empty and during purge pull last timestamp forward until new messages arrive.
When a downstream stream uses retention modes that delete messages, fallback to timebased start time for the new source consumers.

Signed-off-by: Derek Collison <derek@nats.io>
2022-05-09 09:04:19 -07:00
Ivan Kozlovic
cadf921ed1 [FIXED] JetStream: PullConsumer MaxWaiting==1 and Canceled requests
There was an issue with MaxWaiting==1 that was causing a request
with expiration to actually not expire. This was because processWaiting
would not pick it up because wq.rp was actually equal to wq.wp
(that is, the read pointer was equal to write pointer for a slice
of capacity of 1).

The other issue was that when reaching the maximum of waiting pull
requests, a new request would evict an old one with a "408 Request Canceled".

There is no reason for that, instead the server will first try to
find some existing expired requests (since some of the expiration
is lazily done), but if none is expired, and the queue is full,
the server will return a "409 Exceeded MaxWaiting" to the new
request, and not a "408 Request Canceled" to an old one...

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-05-03 15:17:20 -06:00
Derek Collison
e0f5fcffb8 Fix for subject transforms and JetStream delivery subjects.
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-28 15:50:28 -07:00
Derek Collison
138034b3a1 For memory store KV with history of 1 we were scanning for our next first when we did not have to.
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-26 19:55:47 -07:00
Ivan Kozlovic
0e2ab5eeea Changes to tests that run on Travis
- Remove code coverage from Travis and add it to a GitHub Action
that will be run as a nightly.
- Use tag builds to exclude some tests, such as the "norace" or
JS tests. Since "go test" does not support "negative" regexs, there
is no other way.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-26 14:11:31 -06:00
Ivan Kozlovic
df61a335c7 Merge pull request #3061 from nats-io/js_fix_stream_source
[FIXED] JetStream: stream sources issue in mixed mode clusters
2022-04-20 23:20:41 -06:00
Ivan Kozlovic
9975a38c6e [FIXED] JetStream: stream sources issue in mixed mode clusters
The main issue was that in mixed-mode, the interest through gateway
may still be in optimistic mode, which when creating the source
consumer would start delivery before we had a chance to setup
the subscription to receive those messages.

The approach is to create the subscription prior to sending
the consumer create request. Also refactored a bit the code in
the hope to make the retries a bit more bullet proof.

We may also look at making sure that gateways are switched to
interest-mode when detecting a mixed-mode setup.

Also fixed a defect that could cause a source to be canceled
when updating a stream.

Resolves #2801

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-20 21:02:35 -06:00
Matthias Hanel
254c970876 Fix subject renaming for leaf connections and queue subs (#3062)
* [fix] on queue sub, a consumer's delivery subject was not changed
to the original publish subject the stream received.
The code added is a copy of what regular subs do.

* [fixed] subject renaming for leaf node connections as well

also updated multi server test to test for queue and non queue scenarios

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-20 19:23:21 -04:00
Matthias Hanel
ff5d60973d introducing max_age/dupe_window minimum value of 100ms. (#3056)
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-20 13:58:19 -04:00
Matthias Hanel
79b4374d01 [Fixed] limits enforcement issues (#3046)
* [Fixed] limits enforcement issues

stream create had checks that stream restore did not have.
Moved code into commonly used function checkStreamCfg.
Also introduced (clustered/non-clustered) StreamLimitsCheck functions to
perform checks specific to clustered/non-clustered data structures.

Checking for valid stream config and limits/reservations before
receiving all the data. Now fails the request right away.

Added a jetstream limit "max_request_batch" to limit fetch batch size

Shortened max name length from 256 to 255, more common file name limit

Added check for loop in cyclic source stream configurations

features related to limits

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-18 01:53:48 -04:00
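The "check for loop in cyclic source stream configurations" from 79b4374d01 amounts to cycle detection over the graph of streams and their sources. A generic depth-first sketch follows; the map-based representation and the name `hasSourceCycle` are hypothetical and do not reflect how the server stores stream configs.

```go
package main

import "fmt"

// hasSourceCycle reports whether following source links from start
// ever leads back to a stream that is still being visited.
// sources maps a stream name to the names of the streams it sources from.
func hasSourceCycle(sources map[string][]string, start string) bool {
	visiting := map[string]bool{} // streams on the current DFS path
	done := map[string]bool{}     // streams fully explored without a cycle

	var visit func(name string) bool
	visit = func(name string) bool {
		if visiting[name] {
			return true
		}
		if done[name] {
			return false
		}
		visiting[name] = true
		for _, src := range sources[name] {
			if visit(src) {
				return true
			}
		}
		visiting[name] = false
		done[name] = true
		return false
	}
	return visit(start)
}

func main() {
	cyclic := map[string][]string{"A": {"B"}, "B": {"C"}, "C": {"A"}}
	acyclic := map[string][]string{"A": {"B", "C"}, "B": {"C"}}
	fmt.Println(hasSourceCycle(cyclic, "A"), hasSourceCycle(acyclic, "A"))
}
```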
Ivan Kozlovic
c25b08a178 Change "server limit" to "system limit"
Updated tests accordingly.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-15 18:38:42 -06:00
Ivan Kozlovic
fc873c6f2f Return limit in consumer max_ack_pending limit exceeded
- Updated tests that were checking for the error to include the limit
- Moved some tests above the benchmark ones

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-15 18:23:25 -06:00
Ivan Kozlovic
a6b62f61a7 Fix test that should have been fixed following FC tweak
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-14 18:06:25 -06:00
Ivan Kozlovic
bd61d51a1c [IMPROVED] JetStream: reduce unnecessary leader election
- Wait for some sort of routing to be in place before starting
the raft run loop
- Remove use of lock in apiDispatch that was not necessary but
could have caused a route to block, causing memory growth, etc.

Unrelated rename of some tests so that they start with TestJetStream
and TestJetStreamCluster for cluster tests, fixed some flappers
and ensure that tests that change RAFT timeouts put them back
to default values on exit.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-14 10:47:14 -06:00
Jaime Piña
0dabed2ea3 Re-enable placement tests (#3034)
2022-04-13 13:44:24 -07:00
Ivan Kozlovic
c1a17e890a Fixed JetStream flapper
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-13 09:55:24 -06:00
Ivan Kozlovic
50c3986863 [FIXED] JetStream stream catchup issues
- A stream could become leader when it should not, causing
messages to be lost.
- A catchup could stall because the server sending data
could bail out of the runCatchup routine but still send
the EOF signal.
- Deadlock with monitoring of Jsz

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-12 16:05:12 -06:00
Jaime Piña
cfa55281ec Refactor SystemLimitsPlacement tests (#3014)
2022-04-11 11:41:38 -07:00
Derek Collison
331c2faaa6 When using a stream import for a push consumer's messages, if the message crossed a route we dropped the delivered subject.
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-09 06:42:22 -07:00
Ivan Kozlovic
9b5797f63c Undo sending bad request on no-interest in apiDispatch
This broke cross-account functionality. Ported the test from the
Go client that showed the failure after PR#2997 was merged.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-05 08:51:28 -06:00
Derek Collison
7f78d3e618 Not allowing streams to be created meant we could not recover on server restart.
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-01 06:41:22 -07:00
Jaime Piña
32b17f7a7e Skip SystemLimitsPlacement if we can't get the desired leader (#2989)
2022-03-31 16:24:29 -07:00
Matthias Hanel
92f4dc986a added max_ack_pending setting to js account limits (#2982)
* added max_ack_pending setting to js account limits

because of the addition, defaults now have to be set later (depend on
these new limits now)

also re-organized the code to closer track how stream create looks

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-31 14:17:16 -04:00
Derek Collison
2d7f941fea Merge pull request #2978 from nats-io/issue-2969
Fixes #2969, on reload stream import was not removed for js streams
2022-03-30 19:57:15 -07:00
Derek Collison
5182154cd2 We were not accounting for some newer internal clients (JETSTREAM, ACCOUNT, etc) when reloading authorization, etc.
We were also not copying over local state that has been added over the years to track different types of clients.
We also needed to make sure to reuse the account's internal client and the subscription id (acc.isid).

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-30 19:12:18 -07:00
Matthias Hanel
3933c1f3d8 Fixes #2969, on reload stream import was not removed for js streams
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-30 18:12:57 -04:00
Ivan Kozlovic
c0ab2d4959 [FIXED] Possible panic due to data races
A panic was reported that looked like this:
```
fatal error: concurrent map read and map write
goroutine 200 [running]:
runtime.throw({0xa366ce, 0xe620e0})
	/home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/panic.go:1198 +0x71 fp=0xc00105f098 sp=0xc00105f068 pc=0x434ff1
runtime.mapaccess1_faststr(0x0, 0x0, {0xc0054b6f18, 0x11})
	/home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/map_faststr.go:21 +0x3a5 fp=0xc00105f100 sp=0xc00105f098 pc=0x412285
github.com/nats-io/nats-server/v2/server.(*consumer).processNextMsgReq(0xc000681000, 0xc00105f2a8, 0x4503e9, 0x11, {0x0, 0xc000246900}, {0xc0054b6f18, 0x11}, {0xc0002469c4, 0x90, ...})
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/consumer.go:2454 +0x8ce fp=0xc00105f250 sp=0xc00105f100 pc=0x77dc2e
github.com/nats-io/nats-server/v2/server.(*consumer).processNextMsgReq-fm(0x9c, 0x7f302e954fff, 0xc00105f2f8, {0xc000774280, 0x400}, {0xc0054b6f18, 0x40}, {0xc0002469c4, 0x90, 0x63c})
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/consumer.go:2380 +0x77 fp=0xc00105f2b8 sp=0xc00105f250 pc=0x91e337
github.com/nats-io/nats-server/v2/server.(*client).deliverMsg(0xc0015f8000, 0xc003034f00, 0x41642f, {0xc000246969, 0x4b6166, 0x697}, {0xc0002469a9, 0x4b60be, 0x657}, {0xc0015f9480, ...}, ...)
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3180 +0xbb0 fp=0xc00105f530 sp=0xc00105f2b8 pc=0x764470
github.com/nats-io/nats-server/v2/server.(*client).processMsgResults(0xc0015f8000, 0x8cd7a5, 0xc0089fb440, {0xc0002469c4, 0x92, 0x63c}, {0x0, 0x0, 0x4}, {0xc000246969, ...}, ...)
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4163 +0x9af fp=0xc00105fa48 sp=0xc00105f530 pc=0x769e4f
github.com/nats-io/nats-server/v2/server.(*client).processInboundRoutedMsg(0xc0015f8000, {0xc0002469c4, 0xc0015f8220, 0x63c})
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:443 +0x159 fp=0xc00105fae8 sp=0xc00105fa48 pc=0x8ce299
github.com/nats-io/nats-server/v2/server.(*client).processInboundMsg(0xc0015f8000, {0xc0002469c4, 0x92, 0x79e})
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3493 +0x36 fp=0xc00105fb18 sp=0xc00105fae8 pc=0x765c76
github.com/nats-io/nats-server/v2/server.(*client).parse(0xc0015f8000, {0xc000246800, 0x800, 0xc087258a5d30c937})
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/parser.go:497 +0x246a fp=0xc00105fd98 sp=0xc00105fb18 pc=0x8a4f6a
github.com/nats-io/nats-server/v2/server.(*client).readLoop(0xc0015f8000, {0x0, 0x0, 0x0})
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:1227 +0xe1f fp=0xc00105ffb0 sp=0xc00105fd98 pc=0x75841f
github.com/nats-io/nats-server/v2/server.(*Server).createRoute.func1()
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:1372 +0x25 fp=0xc00105ffe0 sp=0xc00105ffb0 pc=0x8d46a5
runtime.goexit
```

Writing a test showed the data race:
```
==================
WARNING: DATA RACE
Read at 0x00c0008ea240 by goroutine 62:
  runtime.mapaccess1_faststr()
      /usr/local/go/src/runtime/map_faststr.go:12 +0x0
  github.com/nats-io/nats-server/v2/server.(*consumer).processNextMsgRequest()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/consumer.go:2567 +0xa64
(...)
Previous write at 0x00c0008ea240 by goroutine 15:
  runtime.mapdelete_faststr()
      /usr/local/go/src/runtime/map_faststr.go:300 +0x0
  github.com/nats-io/nats-server/v2/server.(*Account).checkForReverseEntry()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/accounts.go:1759 +0x61c
  github.com/nats-io/nats-server/v2/server.(*client).unsubscribe()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/client.go:2838 +0xa27
(...)
```

After fixing this data race, another showed up:
```
==================
WARNING: DATA RACE
Read at 0x00c000352200 by goroutine 99:
  github.com/nats-io/nats-server/v2/server.(*Account).checkForReverseEntry()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/accounts.go:1752 +0x4b3
  github.com/nats-io/nats-server/v2/server.(*client).unsubscribe()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/client.go:2838 +0xa27
(...)
Previous write at 0x00c000352200 by goroutine 92:
  runtime.slicecopy()
      /usr/local/go/src/runtime/slice.go:284 +0x0
  github.com/nats-io/nats-server/v2/server.(*Account).checkForReverseEntry()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/accounts.go:1737 +0x871
  github.com/nats-io/nats-server/v2/server.(*Account).removeRespServiceImport()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/accounts.go:1622 +0x24c
(...)
```

This PR addresses both.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-30 13:51:52 -06:00
Matthias Hanel
1445153130 Adding max stream bytes check (#2970)
* Adding max stream bytes check

Also start checking on stream update

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-30 15:50:28 -04:00
Derek Collison
eb16c35016 OrderedConsumer was very conservative with slow start and small max outstanding bytes. This is increasing perf for longer rtt.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-30 05:08:36 -07:00
Ivan Kozlovic
98c1f0ecb2 Fixed some data race and some flappers
Got a data race:
```
==================
WARNING: DATA RACE
Write at 0x00c001c736b0 by goroutine 605:
  runtime.mapassign_faststr()
      /home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/map_faststr.go:202 +0x0
  github.com/nats-io/nats-server/v2/server.(*Account).addServiceImport()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/accounts.go:1868 +0xb7b
  github.com/nats-io/nats-server/v2/server.(*Account).AddServiceImportWithClaim()
...
Previous read at 0x00c001c736b0 by goroutine 301:
  runtime.mapaccess2_faststr()
      /home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/map_faststr.go:107 +0x0
  github.com/nats-io/nats-server/v2/server.(*Server).registerSystemImports()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/events.go:1577 +0x284
  github.com/nats-io/nats-server/v2/server.(*Server).updateAccountClaimsWithRefresh()
...
```

Also, remove some condition in gateway.go on how we were checking
if a subject was a service reply, which was causing a test to flap.

Finally, used AckSync() in a test (instead of m.Respond(nil)) to
prevent it from flapping.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-29 19:02:41 -06:00
Matthias Hanel
1aeaaf0ca3 Adding server limits (max ack pending/dedupe window) to js config (#2967)
* Adding server limits (max ack pending/dedupe window) to js config

Also shifting consumer config check to jsConsumerCreate as in clustered
mode this was enforced in the wrong place

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-29 13:19:36 -04:00
Matthias Hanel
0c5f3688a7 [ADDED] Tiered limits and fix limit issues on updates (#2945)
* Adding tiered limits and fix limit issues on updates

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-28 20:47:54 -04:00
Ivan Kozlovic
25886e8819 [FIXED] JetStream: sampling not updated during consumer update
Resolves #2941

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-28 10:58:58 -06:00
Ivan Kozlovic
4e5519f999 Merge pull request #2942 from boris-ilijic/js-con-sampling-issue-update-flow
Add failing test for updating JS Consumer with sampling option
2022-03-28 10:21:29 -06:00
Ivan Kozlovic
6ad93d9b34 Fix some flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-25 18:24:17 -06:00
Matthias Hanel
2438c965e7 Fix update of R1 Consumer in clustered setup.
missing reply caused timeout

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-25 14:48:15 -04:00
Derek Collison
ef8f543ea5 Improve memory usage through JetStream storage layer.
Previously we would rely more heavily on Go's garbage collector since when we loaded a block for an underlying stream we would pass references upward to avoid copies.
Now we always copy when passing back to the upper layers which allows us to not only expire our cache blocks but pool and reuse them.

The upper layers also had changes made to allow the pooling layer at that level to interoperate with the storage layer optionally.

Also fixed some flappers and a bug where de-dupe might not be reformed correctly.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 17:45:15 -06:00
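The copy-then-pool idea in ef8f543ea5 can be illustrated with `sync.Pool`. This is a sketch under assumptions, not the storage layer's code: the `block` type, `load` and `release` are hypothetical names, and the 4096-byte initial capacity is arbitrary.

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out reusable byte slices. Pointers to slices are pooled
// so that Put does not allocate.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 0, 4096)
		return &b
	},
}

// block is a hypothetical cached storage block.
type block struct {
	cache []byte
}

// load copies cache[off:off+n] into a pooled buffer. Because the caller
// gets a copy and not a reference, the block's cache can be expired
// without affecting data already handed to upper layers.
func (b *block) load(off, n int) *[]byte {
	bp := bufPool.Get().(*[]byte)
	*bp = append((*bp)[:0], b.cache[off:off+n]...)
	return bp
}

// release returns a buffer to the pool once the caller is done with it.
func release(bp *[]byte) {
	bufPool.Put(bp)
}

func main() {
	blk := &block{cache: []byte("hello world")}
	msg := blk.load(0, 5)
	blk.cache = nil // expire the cache; msg is unaffected
	fmt.Println(string(*msg))
	release(msg)
}
```

The trade-off is an extra copy per load in exchange for buffers that can be recycled instead of waiting on the garbage collector.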
Ivan Kozlovic
2253bb6f1a JS: BackOff list caused too frequent checkPending() calls
Since the "next" timer value is set to the AckWait value, which
is the first element in the BackOff list if present, the check
would possibly happen at this interval, even when we were past
the first redelivery and the backoff interval had increased.

The end-user would still see the redelivery be done at the durations
indicated by the BackOff list, but internally, we would be checking
at the initial BackOff's ack wait.

I added a test that uses the store's interface to detect how many
times the checkPending() function is invoked. For this test it
should have been invoked twice, but without the fix it was invoked
15 times.

Also fixed an unrelated test that could possibly deadlock causing
tests to be aborted due to inactivity on Travis.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-23 12:46:17 -06:00
Ivan Kozlovic
8d4ff4bc96 Fixed panic on stream create failure (with filestore)
This was introduced by the change for ipQueues in #2931.
The (*ipQueue).unregister() was written with a protection for
the ipQueue to be nil, however, mset.outq is actually not a bare
ipQueue but a jsOutQ that embeds a pointer to an ipQueue. So we
need to implement unregister() for jsOutQ.

Added a test that reproduced the issue, but found it with a flapping
test (TestJetStreamLongStreamNamesAndPubAck) that failed due to
a file name too long.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-22 15:21:01 -06:00
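The pitfall behind 8d4ff4bc96 is a general Go one: a nil check inside a method of an embedded pointer type does not protect callers that hold a nil pointer to the outer struct, because the method promotion has to dereference the outer pointer first. The sketch below reuses the type names from the commit message, but the fields and bodies are simplified stand-ins, and `bareOutQ` and `panics` are hypothetical helpers added for the demonstration.

```go
package main

import "fmt"

// ipQueue stands in for the real queue type; only the nil guard matters here.
type ipQueue struct {
	registered bool
}

// unregister is safe to call on a nil *ipQueue.
func (q *ipQueue) unregister() {
	if q == nil {
		return
	}
	q.registered = false
}

// bareOutQ embeds *ipQueue and relies on the promoted unregister method.
type bareOutQ struct {
	*ipQueue
}

// jsOutQ embeds *ipQueue but defines its own nil-safe unregister.
type jsOutQ struct {
	*ipQueue
}

func (q *jsOutQ) unregister() {
	if q == nil {
		return
	}
	q.ipQueue.unregister()
}

// panics reports whether calling f panics.
func panics(f func()) (p bool) {
	defer func() {
		if recover() != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	var bare *bareOutQ
	var safe *jsOutQ
	// Promoted call evaluates bare.ipQueue, dereferencing the nil outer pointer.
	fmt.Println("bare panics:", panics(func() { bare.unregister() }))
	// The wrapper's own method receives the nil pointer and returns early.
	fmt.Println("safe panics:", panics(func() { safe.unregister() }))
}
```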
Boris Ilijic
a31d501f53 Add test for updating JS Consumer with sampling
2022-03-22 00:42:41 +01:00