Commit Graph

152 Commits

Author SHA1 Message Date
Matthias Hanel
0c5f3688a7 [ADDED] Tiered limits and fix limit issues on updates (#2945)
* Adding tiered limits and fix limit issues on updates

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-28 20:47:54 -04:00
Ivan Kozlovic
25886e8819 [FIXED] JetStream: sampling not updated during consumer update
Resolves #2941

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-28 10:58:58 -06:00
Ivan Kozlovic
4e5519f999 Merge pull request #2942 from boris-ilijic/js-con-sampling-issue-update-flow
Add failing test for updating JS Consumer with sampling option
2022-03-28 10:21:29 -06:00
Ivan Kozlovic
6ad93d9b34 Fix some flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-25 18:24:17 -06:00
Matthias Hanel
2438c965e7 Fix update of R1 Consumer in clustered setup.
missing reply caused timeout

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-25 14:48:15 -04:00
Derek Collison
ef8f543ea5 Improve memory usage through JetStream storage layer.
Previously we relied more heavily on Go's garbage collector: when we loaded a block for an underlying stream, we passed references upward to avoid copies.
Now we always copy when passing back to the upper layers which allows us to not only expire our cache blocks but pool and reuse them.

The upper layers also had changes made to allow the pooling layer at that level to interoperate with the storage layer optionally.

Also fixed some flappers and a bug where de-dupe might not be reformed correctly.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 17:45:15 -06:00
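The copy-then-pool idea from the commit above can be sketched as follows. This is a minimal illustration, not the server's storage code: `blockPool`, `copyOut`, and `loadAndCopy` are hypothetical names, and the 4096-byte capacity is an arbitrary choice.

```go
package main

import (
	"fmt"
	"sync"
)

// blockPool is a hypothetical pool of reusable cache buffers.
var blockPool = sync.Pool{
	New: func() interface{} { return make([]byte, 0, 4096) },
}

// copyOut returns a copy of msg that the caller owns, so the cache
// buffer it came from can be recycled without affecting the caller.
func copyOut(msg []byte) []byte {
	return append([]byte(nil), msg...)
}

// loadAndCopy simulates loading data into a pooled buffer, copying it out
// for the upper layer, and returning the buffer to the pool.
func loadAndCopy(data string) []byte {
	buf := blockPool.Get().([]byte)[:0]
	buf = append(buf, data...)
	out := copyOut(buf)
	// Scribble over the buffer before recycling it, to show that the
	// copy handed upward is independent of the pooled memory.
	for i := range buf {
		buf[i] = 0
	}
	blockPool.Put(buf[:0])
	return out
}

func main() {
	fmt.Printf("%s\n", loadAndCopy("hello"))
}
```

Because the upper layer only ever sees the copy, the lower layer is free to expire or reuse its buffer at any time.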
Ivan Kozlovic
2253bb6f1a JS: BackOff list caused too frequent checkPending() calls
Since the "next" timer value is set to the AckWait value, which
is the first element in the BackOff list if present, the check
would possibly happen at this interval, even when we were past
the first redelivery and the backoff interval had increased.

The end-user would still see the redelivery be done at the durations
indicated by the BackOff list, but internally, we would be checking
at the initial BackOff's ack wait.

I added a test that uses the store's interface to detect how many
times the checkPending() function is invoked. For this test it
should have been invoked twice, but without the fix it was invoked
15 times.

Also fixed an unrelated test that could possibly deadlock causing
tests to be aborted due to inactivity on Travis.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-23 12:46:17 -06:00
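One way to picture the behaviour described above is a helper that picks the wait before the next pending check from the BackOff entry matching the current delivery count, instead of always using the first entry. This is a sketch under assumptions: `nextInterval` and `exampleBackOff` are hypothetical, and reusing the last entry once the list is exhausted is a choice made for this example.

```go
package main

import (
	"fmt"
	"time"
)

// exampleBackOff is an illustrative BackOff list, not a server default.
var exampleBackOff = []time.Duration{time.Second, 5 * time.Second, 30 * time.Second}

// nextInterval returns how long to wait before the next pending check for a
// message that has been delivered `deliveries` times. With an empty BackOff
// list it falls back to ackWait. Past the end of the list, the last entry
// is reused.
func nextInterval(backoff []time.Duration, ackWait time.Duration, deliveries int) time.Duration {
	if len(backoff) == 0 {
		return ackWait
	}
	idx := deliveries - 1
	if idx < 0 {
		idx = 0
	}
	if idx >= len(backoff) {
		idx = len(backoff) - 1
	}
	return backoff[idx]
}

func main() {
	for d := 1; d <= 4; d++ {
		fmt.Println(d, nextInterval(exampleBackOff, exampleBackOff[0], d))
	}
}
```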
Ivan Kozlovic
8d4ff4bc96 Fixed panic on stream create failure (with filestore)
This was introduced by the change for ipQueues in #2931.
The (*ipQueue).unregister() was written with a protection for
the ipQueue to be nil, however, mset.outq is actually not a bare
ipQueue but a jsOutQ that embeds a pointer to an ipQueue. So we
need to implement unregister() for jsOutQ.

Added a test that reproduced the issue, but found it with a flapping
test (TestJetStreamLongStreamNamesAndPubAck) that failed due to
a file name too long.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-22 15:21:01 -06:00
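The pitfall described above is a general Go one: a nil check inside a method does not help when the method is reached through a nil wrapper that embeds the pointer. The sketch below uses hypothetical stand-in types (`queue`, `bareOutQ`, `outQ`) rather than the server's `ipQueue` and `jsOutQ`.

```go
package main

import "fmt"

type queue struct{ registered bool }

// unregister tolerates a nil receiver.
func (q *queue) unregister() {
	if q == nil {
		return
	}
	q.registered = false
}

// bareOutQ embeds *queue and relies on the promoted method only.
type bareOutQ struct{ *queue }

// outQ embeds *queue and adds its own nil-safe wrapper method.
type outQ struct{ *queue }

// unregister checks the wrapper itself before touching the embedded pointer.
func (o *outQ) unregister() {
	if o == nil {
		return
	}
	o.queue.unregister()
}

// panics reports whether calling f panics.
func panics(f func()) (p bool) {
	defer func() {
		if recover() != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	var b *bareOutQ
	var o *outQ
	// The promoted call dereferences the nil wrapper to find the embedded field.
	fmt.Println("bare wrapper panics:", panics(func() { b.unregister() }))
	fmt.Println("wrapper with own method panics:", panics(func() { o.unregister() }))
}
```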
Boris Ilijic
a31d501f53 Add test for updating JS Consumer with sampling
2022-03-22 00:42:41 +01:00
Jaime Piña
60773be03f Use random high port in placement test (#2940)
2022-03-21 15:38:01 -07:00
Ivan Kozlovic
f11f7a61e8 Merge pull request #2938 from nats-io/fix_2920
[FIXED] Removal of an external source stream
2022-03-21 12:29:57 -06:00
Jaime Piña
33cfc748bf Disable some supercluster limit placement tests (#2937)
2022-03-21 11:05:13 -07:00
Ivan Kozlovic
68da3e8253 [FIXED] Removal of an external source stream
Removal of a stream source that was external was not working properly,
allowing messages to still flow after the removal and until the
server hosting the stream from which the source was removed was
restarted.

Resolves #2920

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-21 10:59:47 -06:00
Ivan Kozlovic
29ff67e2ac Tests: Replace all Ack() with AckSync() for now
For the reason explained in the previous commit, tests that were
expecting the number of ack/pending to be a certain value after
an Ack() would be flapping. Replaced all references and
we can go back to selectively calling Ack() when AckSync() is not
needed.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-17 20:25:01 -06:00
Ivan Kozlovic
ac52ecd9ff Fixing flapper
Since acks are now processed in a different goroutine, the tests
that use Ack() cannot expect the number of ack messages to be
exact immediately. So in this test use AckSync() to ensure that
the ack is processed. Alternatively, the pending count should
be checked with a checkFor().

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-17 19:53:33 -06:00
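The alternative mentioned above, polling until the pending count settles, can be sketched with a small retry helper. `waitFor` here is a hypothetical stand-in written for this example; it is not the test suite's `checkFor()`, whose exact signature is not shown in this log.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// waitFor retries check every sleep interval until it returns nil or
// total elapses, and returns the last error on timeout.
func waitFor(total, sleep time.Duration, check func() error) error {
	deadline := time.Now().Add(total)
	for {
		err := check()
		if err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return err
		}
		time.Sleep(sleep)
	}
}

// demo simulates an ack that is processed on another goroutine, so the
// pending count only reaches zero some time after the ack was sent.
func demo() error {
	var pending int32 = 1
	go func() {
		time.Sleep(20 * time.Millisecond)
		atomic.StoreInt32(&pending, 0)
	}()
	return waitFor(2*time.Second, 5*time.Millisecond, func() error {
		if n := atomic.LoadInt32(&pending); n != 0 {
			return fmt.Errorf("expected 0 pending, got %d", n)
		}
		return nil
	})
}

func main() {
	fmt.Println("result:", demo())
}
```

An immediate assertion on the count would race with the background goroutine; the retry loop tolerates the delay without a fixed sleep.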
Derek Collison
a4e795c996 Attempt to fix flapper
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-17 17:38:32 -07:00
Jaime Piña
50ca685a3b Add stream limit update test (#2929)
This adds a test to see if we can update a stream when the stream limit
is 1. Currently, this test fails, so we're skipping it. This test will
be enabled in a future PR.
2022-03-17 13:49:37 -07:00
Ivan Kozlovic
7d9bb32c1d Fix a flapper
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-17 12:18:22 -06:00
Ivan Kozlovic
fe6d7b305f Merge pull request #2898 from nats-io/js_cons_ack_processing
[CHANGED] JetStream: Redeliveries may be delayed if necessary
2022-03-17 10:57:22 -06:00
Jaime Piña
acfd456758 Prevent reserved bytes underflow (#2907)
2022-03-16 15:19:35 -07:00
Derek Collison
303bb93c18 Test ack metrics
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-15 16:41:06 -07:00
Ivan Kozlovic
b4128693ed Ensure file path is correct during stream restore
Also had to change all references from `path.` to `filepath.` when
dealing with files, so that it works properly on Windows.

Also fixed lots of tests to defer the shutdown of the server
after the removal of the storage, and fixed some config files
directories to use the single quote `'` to surround the file path,
again to work on Windows.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-09 13:31:51 -07:00
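The difference between the two standard-library packages named above is that `path` always joins with forward slashes, while `path/filepath` uses the separator of the operating system the program runs on. A minimal sketch, with hypothetical helper names `storeFile` and `slashPath`:

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
)

// sep is the separator of the OS this program runs on
// ("\" on Windows, "/" on Unix-like systems).
var sep = string(filepath.Separator)

// storeFile builds an on-disk path using the OS-specific separator.
func storeFile(dir, name string) string {
	return filepath.Join(dir, name)
}

// slashPath builds a slash-separated path regardless of OS, which suits
// URLs and similar names but not files on Windows.
func slashPath(dir, name string) string {
	return path.Join(dir, name)
}

func main() {
	fmt.Println(storeFile("streams", "msgs"))
	fmt.Println(slashPath("streams", "msgs"))
}
```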
Ivan Kozlovic
0fae8067ae [FIXED] Some lock inversions
The established ordering is client -> Account, so fixed a few places
where we had Account -> client.

Added a new file, locksordering.txt with the list of known ordering
for some of the objects.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-09 09:47:37 -07:00
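A lock inversion arises when two code paths take the same pair of locks in opposite orders. The sketch below shows the client -> Account ordering applied consistently. The `client` and `account` types are simplified stand-ins invented for this example, not the server's structs.

```go
package main

import (
	"fmt"
	"sync"
)

type account struct {
	mu      sync.Mutex
	clients int
}

type client struct {
	mu  sync.Mutex
	acc *account
}

// register follows the client -> account order: the client lock is taken
// first, then the account lock. Every path that needs both locks must use
// this same order, otherwise two goroutines can deadlock.
func (c *client) register(a *account) {
	c.mu.Lock()
	defer c.mu.Unlock()
	a.mu.Lock()
	defer a.mu.Unlock()
	c.acc = a
	a.clients++
}

// registerAll registers n clients concurrently against one account and
// returns the final client count.
func registerAll(n int) int {
	a := &account{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			(&client{}).register(a)
		}()
	}
	wg.Wait()
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.clients
}

func main() {
	fmt.Println(registerAll(50))
}
```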
Derek Collison
1b5f651c22 Fixed bug that would not recover a stream after non-clean shutdown with deleted messages.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-04 10:48:10 -08:00
Ivan Kozlovic
97612c0fac Fix test by using AckSync() to avoid flapping
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-03 15:09:01 -07:00
Ivan Kozlovic
804ce102ac [CHANGED] JetStream: Redeliveries may be delayed if necessary
We have seen situations where, when a lot of pending messages accumulate,
there is contention between the processing of the ACKs and the
checking of the pending map.

The decision was made to abort checking of the pending list if processing
of ack(s) would be delayed because of it. The result is that a
redelivery may be postponed.

Internally, the ACKs are also now using a queue to prevent processing
of them from the network handler, which could cause head-of-line
blocking, especially bad for routes.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-03 10:32:35 -07:00
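The queueing idea above, handing ACKs off so the network handler returns quickly, can be sketched with a channel and a worker goroutine. This is only an illustration: the commit mentions a queue but not its implementation, so the buffered channel and the names `ackQueue`, `push`, and `close` are assumptions.

```go
package main

import "fmt"

// ackQueue decouples receiving ACKs from processing them.
type ackQueue struct {
	ch        chan uint64
	done      chan struct{}
	processed int
}

// newAckQueue starts a worker goroutine that drains the queue.
func newAckQueue(size int) *ackQueue {
	q := &ackQueue{ch: make(chan uint64, size), done: make(chan struct{})}
	go func() {
		defer close(q.done)
		for range q.ch {
			// Real ACK processing would happen here, off the handler's path.
			q.processed++
		}
	}()
	return q
}

// push is what a network handler would call: it only enqueues the sequence.
func (q *ackQueue) push(seq uint64) { q.ch <- seq }

// close stops the worker once the queue is drained and returns how many
// ACKs were processed.
func (q *ackQueue) close() int {
	close(q.ch)
	<-q.done
	return q.processed
}

func main() {
	q := newAckQueue(8)
	for seq := uint64(1); seq <= 100; seq++ {
		q.push(seq)
	}
	fmt.Println("processed:", q.close())
}
```

A bounded channel still blocks the producer when full; an unbounded queue would avoid that at the cost of memory, which is a trade-off this sketch does not explore.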
Derek Collison
4efce40bbd Small improvements to send performance to a full stream.
Cleaned up some locking and, if FIFO, made index updates lazy like writeMsgRecord.

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-17 05:39:27 -08:00
Derek Collison
5a93b0e9d8 Allow pull requests to specify a heartbeat when idle to detect when a request is invalidated.
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-11 09:51:51 -08:00
Ivan Kozlovic
55ffde7251 Fixed consumer dlv count and num pending wrong due to redeliveries
Introduced by #2848, so should not have impacted existing releases.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-02-10 17:33:53 -07:00
Derek Collison
0cc7302be9 A stream name is tied to its identity and can not be changed on a restore.
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-09 12:38:45 -08:00
Derek Collison
c13a84cf44 Fixed a bug that would calculate the first sequence of a filteredPending incorrectly.
Also added in more optimized version to select the first matching message in a message block for LoadNextMsg.

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-08 13:29:38 -08:00
Derek Collison
d50febeeff Improved sparse consumers replay time.
When a stream has multiple subjects and a consumer filters the stream to a small, spread-out list of messages, the logic would do a linear scan looking for the next message for the filtered consumer.
This CL allows the store layer to utilize the per subject info to improve the times.

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-07 17:26:32 -08:00
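To illustrate why per-subject information helps a sparse filtered consumer, the sketch below compares a linear scan with a lookup in a per-subject list of sequences. It assumes messages are held in ascending sequence order, and the full per-subject index used here is an assumption made for the example; the commit does not say what the store's per-subject info contains.

```go
package main

import (
	"fmt"
	"sort"
)

type msg struct {
	seq  uint64
	subj string
}

// linearNext scans every message to find the first match at or after start.
func linearNext(msgs []msg, subj string, start uint64) (uint64, bool) {
	for _, m := range msgs {
		if m.seq >= start && m.subj == subj {
			return m.seq, true
		}
	}
	return 0, false
}

// buildIndex records, per subject, the ascending sequences of its messages.
func buildIndex(msgs []msg) map[string][]uint64 {
	idx := make(map[string][]uint64)
	for _, m := range msgs {
		idx[m.subj] = append(idx[m.subj], m.seq)
	}
	return idx
}

// indexedNext binary-searches the subject's sequences instead of scanning.
func indexedNext(idx map[string][]uint64, subj string, start uint64) (uint64, bool) {
	seqs := idx[subj]
	i := sort.Search(len(seqs), func(i int) bool { return seqs[i] >= start })
	if i == len(seqs) {
		return 0, false
	}
	return seqs[i], true
}

// sampleMsgs builds 1000 messages where only three are on subject "rare".
func sampleMsgs() []msg {
	msgs := make([]msg, 0, 1000)
	for seq := uint64(1); seq <= 1000; seq++ {
		subj := "common"
		if seq == 1 || seq == 500 || seq == 1000 {
			subj = "rare"
		}
		msgs = append(msgs, msg{seq: seq, subj: subj})
	}
	return msgs
}

func main() {
	msgs := sampleMsgs()
	idx := buildIndex(msgs)
	fmt.Println(linearNext(msgs, "rare", 2))
	fmt.Println(indexedNext(idx, "rare", 2))
}
```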
Derek Collison
55b7f11c9a Fixed flow control stall under specific conditions of message size.
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-05 20:15:48 -08:00
Derek Collison
5da0453964 Add in NumSubjects to StreamInfo
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-02 08:51:13 -08:00
Derek Collison
6a3cf0f71e Added in ability to get number of subjects from StreamInfo, and optionally details per subject on how many messages each subject has.
This can also be filtered, meaning you can filter out the subjects when asking for details.

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-02 08:51:13 -08:00
Derek Collison
12f5ea3655 When a consumer had no filtered subject and was attached to an interest policy retention stream we could incorrectly drop messages.
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-01 14:21:05 -08:00
Derek Collison
fa814f7cee Fixed behavior for when MaxMsgsPerSubject is set and DiscardNew is also set.
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-31 08:36:37 -08:00
Derek Collison
b38ced51b2 A true no wait pull request was not considering redeliveries.
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-30 11:52:35 -08:00
Derek Collison
275d42628b Fix for #2828. The original design of the consumer and the subsequent store did not allow updates.
Now that we do, we need to store the new config into our storage layer.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-30 09:45:05 -08:00
Derek Collison
7f572983ac The 2.7 update broke the one-shot pull consumer fetch behavior due to a change and a fix to a bug that allowed it to work before.
This change tries to lock down all expected behaviors, and now does out of order timeouts for requests.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-20 18:06:29 -08:00
Ivan Kozlovic
3ce22adb76 Fixed some tests
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-01-13 13:14:05 -07:00
Derek Collison
43eff407b8 Add in explicit subscription for import responses when bound to a leafnode.
When we want to track service import response interest across a leafnode we need to send sub and unsub for all response _R_ subjects versus using a wildcard.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-12 18:24:03 -08:00
Derek Collison
32c3c9ecfb Track interest properly across accounts for pull consumers
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-12 12:16:53 -08:00
Derek Collison
279f31ecb5 Add in ability to have ephemeral pull based consumers
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-10 20:42:39 -08:00
Derek Collison
e12c8cda92 Add in ability to limit aspects of a pull request, specifically batch size and expiration.
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-10 17:29:04 -08:00
Derek Collison
5592d923c4 Updated pull consumers.
Cleaned up code, made more consistent, utilize loopAndGather.
Allow pull consumers to have AckAll as well as AckExplicit.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-10 16:59:01 -08:00
Derek Collison
c4198d603c Added test to show cross account interest for push consumers works
Signed-off-by: Derek Collison <derek@nats.io>
2021-12-21 19:30:35 -08:00
Matthias Hanel
3e8b66286d Js leaf deny (#2693)
Along a leaf node connection, unless the system account is shared AND the JetStream domain name is identical, the default JetStream traffic (without a domain set) will be denied.

As a consequence, all clients that want to access a domain other than the one in the server they are connected to must specify a domain name.
This change affects setups where a leaf node had no local JetStream OR the server the leaf node connected to had no local JetStream.
One of the two accounts that are connected via a leaf node remote must have no JetStream enabled.
The side that does not have JetStream enabled will lose JetStream access, and its clients must set `nats.Domain` manually.

For workarounds on how to restore the old behavior, look at:
https://github.com/nats-io/nats-server/pull/2693#issuecomment-996212582

New config values added:
`default_js_domain` is a mapping from account to domain, settable when JetStream is not enabled in an account.
`extension_hint` are hints for a non-clustered server to start in clustered mode (and be usable to extend)
`js_domain` is a way to set the JetStream domain to use for mqtt.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-12-16 16:53:20 -05:00
Derek Collison
ca12a11be3 There were situations where invalid subjects could be assigned to streams.
This will patch them on the fly during recovery. Specifically subjects with leading or trailing spaces and mirror streams with any subjects at all.

Signed-off-by: Derek Collison <derek@nats.io>
2021-12-01 14:00:23 -07:00
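The on-the-fly patching described above could look roughly like the sketch below: trim leading and trailing whitespace from subjects, and drop subjects entirely from a mirror. `streamConfig` and `patchSubjects` are hypothetical names for this example and do not reflect the server's recovery code.

```go
package main

import (
	"fmt"
	"strings"
)

// streamConfig is a simplified stand-in for a stream's stored configuration.
type streamConfig struct {
	Name     string
	Subjects []string
	IsMirror bool
}

// patchSubjects repairs cfg in place and reports whether anything changed.
// Mirrors must not carry subjects, so any present are removed. For other
// streams, leading and trailing whitespace is trimmed from each subject.
func patchSubjects(cfg *streamConfig) bool {
	if cfg.IsMirror {
		if len(cfg.Subjects) == 0 {
			return false
		}
		cfg.Subjects = nil
		return true
	}
	changed := false
	for i, s := range cfg.Subjects {
		if t := strings.TrimSpace(s); t != s {
			cfg.Subjects[i] = t
			changed = true
		}
	}
	return changed
}

func main() {
	cfg := &streamConfig{Name: "ORDERS", Subjects: []string{" orders.new", "orders.paid "}}
	fmt.Println(patchSubjects(cfg), cfg.Subjects)
}
```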
Ivan Kozlovic
1cf8b40304 Merge pull request #2719 from nats-io/js_mem_corruption
[FIXED] Corrupted headers receiving from consumer with meta-only
2021-12-01 13:42:47 -07:00