This could happen when a consumer had not sent anything to the
attached NATS subscription and there was a consumer leader
step down or server restart.
Signed-off-by: Derek Collison <derek@nats.io>
This enables lightweight distribution of messages to very large number of NATS subscribers.
We add in metadata as headers that allows for gap detection which enables initial value (via JetStream, maybe KV) and realtime NATS core updates but all globally ordered.
Signed-off-by: Derek Collison <derek@nats.io>
When a downstream stream uses retention modes that delete messages, fallback to timebased start time for the new source consumers.
Signed-off-by: Derek Collison <derek@nats.io>
There was an issue with MaxWaiting==1 that was causing a request
with expiration to actually not expire. This was because processWaiting
would not pick it up because wq.rp was actually equal to wq.wp
(that is, the read pointer was equal to write pointer for a slice
of capacity of 1).
The other issue was that when reaching the maximum of waiting pull
requests, a new request would evict an old one with a "408 Request Canceled".
There is no reason for that, instead the server will first try to
find some existing expired requests (since some of the expiration
is lazily done), but if none is expired, and the queue is full,
the server will return a "409 Exceeded MaxWaiting" to the new
request, and not a "408 Request Canceled" to an old one...
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- Remove code coverage from Travis and add it to a GitHub Action
that will be run as a nightly.
- Use tag builds to exclude some tests, such as the "norace" or
JS tests. Since "go test" does not support "negative" regexs, there
is no other way.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The main issue was that in mixed-mode, the interest through gateway
may still be in optimistic mode, which when creating the source
consumer would start delivery before we had a chance to setup
the subscription to receive those messages.
The approach is to create the subscription prior to sending
the consumer create request. Also refactored a bit the code in
the hope to make the retries a bit more bullet proof.
We may also look at making sure that gateways are switched to
interest-mode when detecting a mixed-mode setup.
Also fixed a defect that could cause a source to be canceled
when updating a stream.
Resovles #2801
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
* [fix] on queue sub, a consumers delivery subject, was not changed
to the original publish subject the stream received
the code added is a copy of what regular subs do
* [fixed] subject renaming for leaf node connections as well
also updated multi server test to test for queue and non queue scenarios
Signed-off-by: Matthias Hanel <mh@synadia.com>
* [Fixed] limits enforcement issues
stream create had checks that stream restore did not have.
Moved code into commonly used function checkStreamCfg.
Also introduced (cluster/non clustered) StreamLimitsCheck functions to
perform checks specific to clustered /non clustered data structures.
Checking for valid stream config and limits/reservations before
receiving all the data. Now fails the request right away.
Added a jetstream limit "max_request_batch" to limit fetch batch size
Shortened max name length from 256 to 255, more common file name limit
Added check for loop in cyclic source stream configurations
features related to limits
Signed-off-by: Matthias Hanel <mh@synadia.com>
- Updated tests that were checking for the error to include the limit
- Moved some tests above the benchmark ones
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- Wait of some sort of routing to be in place before starting
the raft run loop
- Remove use of lock in apiDispatch that was not necessary but
could have cause a route to block, causing memory growth, etc..
Unrelated rename of some tests so that they start with TestJetStream
and TestJetStreamCluster for cluster tests, fixed some flappers
and ensure that tests that change RAFT timeouts put them back
to default values on exit.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- A stream could become leader when it should not, causing
messages to be lost.
- A catchup could stall because the server sending data
could bail out of the runCatchup routine but still send
the EOF signal.
- Deadlock with monitoring of Jsz
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This broke cross-account functionality. Ported the test from the
Go client that showed the failure after PR#2997 was merged.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
* added max_ack_penind setting to js account limits
because of the addition, defaults now have to be set later (depend on
these new limits now)
also re-organized the code to closer track how stream create looks
Signed-off-by: Matthias Hanel <mh@synadia.com>
We were also not copying over local state that has been added over the years to track different types of clients.
We also needed to make sure to reuse the account's internal client and the subscription id (acc.isid).
Signed-off-by: Derek Collison <derek@nats.io>
Got a data race:
```
==================
WARNING: DATA RACE
Write at 0x00c001c736b0 by goroutine 605:
runtime.mapassign_faststr()
/home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/map_faststr.go:202 +0x0
github.com/nats-io/nats-server/v2/server.(*Account).addServiceImport()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/accounts.go:1868 +0xb7b
github.com/nats-io/nats-server/v2/server.(*Account).AddServiceImportWithClaim()
...
Previous read at 0x00c001c736b0 by goroutine 301:
runtime.mapaccess2_faststr()
/home/travis/.gimme/versions/go1.17.8.linux.amd64/src/runtime/map_faststr.go:107 +0x0
github.com/nats-io/nats-server/v2/server.(*Server).registerSystemImports()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/events.go:1577 +0x284
github.com/nats-io/nats-server/v2/server.(*Server).updateAccountClaimsWithRefresh()
...
```
Also, remove some condition in gateway.go on how we were checking
if a subject was a serviec reply, which was causing a test to flap.
Finally, used AckSync() in a rest (instead of m.Respond(nil)) to
prevent it from flapping.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
* Adding server limits (max ack pending/dedupe window) to js config
Also shifting consumer config check to jsConsumerCreate as in clustered
mode this was enforced in the wrong place
Signed-off-by: Matthias Hanel <mh@synadia.com>
Previously we would rely more heavily on Go's garbage collector since when we loaded a block for an underlying stream we would pass references upward to avoimd copies.
Now we always copy when passing back to the upper layers which allows us to not only expire our cache blocks but pool and reuse them.
The upper layers also had changes made to allow the pooling layer at that level to interoperate with the storage layer optionally.
Also fixed some flappers and a bug where de-dupe might not be reformed correctly.
Signed-off-by: Derek Collison <derek@nats.io>
Since the "next" timer value is set to the AckWait value, which
is the first element in the BackOff list if present, the check
would possibly happen at this interval, even when we were past
the first redelivery and the backoff interval had increased.
The end-user would still see the redelivery be done at the durations
indicated by the BackOff list, but internally, we would be checking
at the initial BackOff's ack wait.
I added a test that uses the store's interface to detect how many
times the checkPending() function is invoked. For this test it
should have been invoked twice, but without the fix it was invoked
15 times.
Also fixed an unrelated test that could possibly deadlock causing
tests to be aborted due to inactivity on Travis.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This was introduced by the change for ipQueues in #2931.
The (*ipQueue).unregister() was written with a protection for
the ipQueue to be nil, however, mset.outq is actually not a bare
ipQueue but a jsOutQ that embeds a pointer to an ipQueue. So we
need to implement register() for jsOutQ.
Added a test that reproduced the issue, but found it with a flapping
test (TestJetStreamLongStreamNamesAndPubAck) that failed due to
a file name too long.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>