Commit Graph

296 Commits

Author SHA1 Message Date
Ivan Kozlovic
29ea280fe7 [FIXED] JetStream: send "bad request" response for malformed API requests
An example was a "consumer info" request with a consumer name that
had tokens, which is illegal. This results in the request being
dropped in apiDispatch() because there was no interest.
The server will now return a "bad request" error in such case.

Resolves #2995

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-04 11:25:55 -06:00
Ivan Kozlovic
19783a9f11 [CHANGED] Rate limit similar warnings
Some warnings, especially when dealing with JS limits that were
printed on a per-message basis, are now limited to ~1 per second
if the content of the warning is already found in a map.

This is also for "client" warnings, but the client porting of the
warning is not taken into account so that helps with reducing logging
for similar content, but coming from different clients.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-01 15:24:03 -06:00
Ivan Kozlovic
34650e9dd5 Fixed data race and some flappers
Data race that has been seen:
```
Read at 0x00c00134bec0 by goroutine 159:
  github.com/nats-io/nats-server/v2/server.(*client).msgHeaderForRouteOrLeaf()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:2935 +0x254
  github.com/nats-io/nats-server/v2/server.(*client).processMsgResults()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4364 +0x2147
(...)
Previous write at 0x00c00134bec0 by goroutine 201:
  github.com/nats-io/nats-server/v2/server.(*Server).addRoute()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:1475 +0xdb4
  github.com/nats-io/nats-server/v2/server.(*client).processRouteInfo()
      /home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:641 +0x1704
```

Also fixed some flappers and removed use of `s.js.` since we have
already captured `js` in Jsz monitoring.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-31 10:05:34 -06:00
Derek Collison
76b56b6b0e Fix for a flapping test
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-29 19:15:40 -07:00
Derek Collison
607858f213 Improved consumer snapshot logic in clustered mode and disk usage.
Also fixed a bug that could cause memory based replicated consumers to no longer work after snapshots and server restarts.

The snapshot logic would allow non-state changing updates to continously grow the raft logs. We also were too conservative on when we snapshotted and why.
Also added in ability to have FileStore.Compact() reclaim space from the block file from the head of last changed block.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-29 18:02:49 -07:00
Matthias Hanel
0c5f3688a7 [ADDED] Tiered limits and fix limit issues on updates (#2945)
* Adding tiered limits and fix limit issues on updates

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-28 20:47:54 -04:00
Derek Collison
004e5ce2c6 Merge pull request #2958 from nats-io/fix_2955
[FIXED] Scaling up an R1 stream would not replicate existing messages.
2022-03-28 12:18:20 -07:00
Derek Collison
7607d37799 Make sure to prevent flappers if possible
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-28 09:34:48 -07:00
Derek Collison
6b379329d8 Fix for #2955. When scaling up a stream with existing messages the existing messages were not being replicated.
Also fixed a bug where we were incorrectly not spining up the monitoring loop for a stream when going from 3->1->3.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-26 07:26:46 -07:00
Ivan Kozlovic
6ad93d9b34 Fix some flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-25 18:24:17 -06:00
Ivan Kozlovic
4739eebfc4 [FIXED] JetStream: possible deadlock during consumer leadership change
Would possibly show up when a consumer leader changes for a consumer
that had redelivered messages and for instance messages were inbound
on the stream.

Resolves #2912

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-25 12:21:51 -06:00
Derek Collison
ef8f543ea5 Improve memory usage through JetStream storage layer.
Previously we would rely more heavily on Go's garbage collector since when we loaded a block for an underlying stream we would pass references upward to avoimd copies.
Now we always copy when passing back to the upper layers which allows us to not only expire our cache blocks but pool and reuse them.

The upper layers also had changes made to allow the pooling layer at that level to interoperate with the storage layer optionally.

Also fixed some flappers and a bug where de-dupe might not be reformed correctly.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 17:45:15 -06:00
Derek Collison
7fd5f4dc24 Update Go client
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 17:45:15 -06:00
Ivan Kozlovic
2253bb6f1a JS: BackOff list caused too frequent checkPending() calls
Since the "next" timer value is set to the AckWait value, which
is the first element in the BackOff list if present, the check
would possibly happen at this interval, even when we were past
the first redelivery and the backoff interval had increased.

The end-user would still see the redelivery be done at the durations
indicated by the BackOff list, but internally, we would be checking
at the initial BackOff's ack wait.

I added a test that uses the store's interface to detect how many
times the checkPending() function is invoked. For this test it
should have been invoked twice, but without the fix it was invoked
15 times.

Also fixed an unrelated test that could possibly deadlock causing
tests to be aborted due to inactivity on Travis.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-23 12:46:17 -06:00
Ivan Kozlovic
29ff67e2ac Tests: Replace all Ack() with AckSync() for now
For reason explained in previous commit, for tests that were
expecting the number of ack/pending to be of a certain value after
an Ack(), they would be flapping. Replaced all references and
we can go back to selectively call Ack() when AckSync() is not
needed.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-17 20:25:01 -06:00
Derek Collison
e204a7961d When detecting exact duplicates for URLs for routes, gws or leafnodes, enter a warning and ignore.
If misconfigured could prevent the JetStream system from electing a leader.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-17 14:52:01 -07:00
Derek Collison
dbfa47f9b1 Improve state preservation for consumers, specifically DeliverNew variants when no activity has been present.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-16 20:55:14 -07:00
Derek Collison
e4ebc4648e When a stream or consumer was offline we would not properly respond to a delete.
We also would hang if no stream info requests were sent during a stream list due to the asset being offline.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-15 21:11:23 -07:00
Ivan Kozlovic
b4128693ed Ensure file path is correct during stream restore
Also had to change all references from `path.` to `filepath.` when
dealing with files, so that it works properly on Windows.

Fixed also lots of tests to defer the shutdown of the server
after the removal of the storage, and fixed some config files
directories to use the single quote `'` to surround the file path,
again to work on Windows.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-09 13:31:51 -07:00
Ivan Kozlovic
0cb0f6d380 Merge pull request #2914 from nats-io/fix_2913
[FIXED] Consumer with no activity can lose quorum
2022-03-09 11:55:50 -07:00
Matthias Hanel
9a2da9ed8c Adding denies $KV.>/$OBJ.> along leaf connections on differing domain (#2916)
* Adding denies $KV.>/$OBJ.> along leaf connections on differing domain

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-09 13:17:59 -05:00
Derek Collison
3216eb5ee5 When a consumer has no state we are now compacting the log, but were not snapshotting.
This caused issues on leader change and losing quorum.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-09 07:21:25 -05:00
Derek Collison
58da4b917a Made improvements to scale up and down for streams and consumers.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-06 16:59:02 -08:00
Derek Collison
eb1ed5574d Merge pull request #2904 from nats-io/peer_remove_bad_consumer_state
[FIXED] Inconsistent durable consumer state after stream peer removal
2022-03-06 10:24:31 -08:00
Ivan Kozlovic
196319b106 [FIXED] JetStream: Some stream advisories missing
The "deleted" advisory was missing because the stream's send loop
was closed before the advisory was pushed to the queue to be sent.

Added tests, both for single and clustered mode to test all stream
advisories.

Resolves #2886

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-06 10:52:42 -07:00
Derek Collison
31a19729b0 When removing a stream peer with an attached durable consumer, the consumer could become inconsistent.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-06 05:42:22 -08:00
Derek Collison
4b9bc29e53 If we had not heard from a source or mirror we would still calculate the delta since now.
This would wrap and create a large number which overflowed JSON's 2^53 limit.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-05 12:46:55 -08:00
Derek Collison
ad6020ae72 Fix for #2885.
When a filtered consumer who has no state, meaning no messages are being processed, it still will receive updates to properly track the delivered sequence as it relates to the entire stream.
Since we did not have state we were inadvertently skipping the compaction logic for the raft store.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-04 08:53:16 -08:00
Ivan Kozlovic
dfe96944d2 [FIXED] JetStream stream info consumers count in clustered mode
In clustering mode, the number of consumers in stream info may be
wrong in presence of non durable consumers. Ephemeral are handled
by specific nodes. The StreamInfo response would contain only the
consumer count that the stream leader is handling.

This fix overrides the stream's state consumers count with the
number of consumers from the stream assignment record.

Resolves #2895

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-03 09:46:35 -07:00
Derek Collison
1c8f7de848 On filtered subjects when consumers were staggered we need to disqualify a filtered consumer if not applicable.
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-16 18:24:27 -08:00
Derek Collison
ca1132a01d Allow stream placement by tags.
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-15 17:07:32 -08:00
Derek Collison
fb15dfd9b7 Allow replica updates during stream update.
Also add in HAAssets count to Jsz.

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-13 19:33:46 -08:00
Derek Collison
5a93b0e9d8 Allow pull requests to specify a heartbeat when idle to detect when a request is invalidated.
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-11 09:51:51 -08:00
Derek Collison
ecfe42630a Merge pull request #2858 from nats-io/add_consumer_with_info
Make sure we snapshot initial consumer info during consumer creation.
2022-02-09 17:05:01 -08:00
Derek Collison
0cc7302be9 A stream name is tied to its identity and can not be changed on a restore.
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-09 12:38:45 -08:00
Ivan Kozlovic
3dcf0246c6 [FIXED] Adding a consumer could return inaccurate consumer info
The issue is that the consumer info returned by the consumer create
API is gathered after the consumer is added and possibly after
starting to deliver pending messages.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-02-09 09:04:16 -07:00
Ivan Kozlovic
30c431a9a3 [FIXED] JetStream: BackOff redeliveries would always use first in list
If the consumer's sequence was not the same than the stream's sequence,
then the redelivery would always use the first duration from the
BackOff list.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-01-31 17:44:08 -07:00
Derek Collison
0d158728d1 Merge pull request #2824 from nats-io/fix-nodeinfo
Store JetStream Config in node info map
2022-01-31 13:58:12 -08:00
Derek Collison
a57bd96def Updating a push consumer to be pull would succeed but cause a panic if used.
This disallows that upgrade. We had a check in place for pull to push, but not the reverse.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-28 13:11:58 -08:00
Jaime Piña
ae8eedb88e Store JetStream Config in node info map 2022-01-27 14:46:41 -08:00
Derek Collison
6486cd8fc8 Added in /healthz endpoint for health and liveness probes in environments like k8s.
Currently this code returns a 200 and { "status": "ok" } iff all configured ports are open
and if JetStream is configured and we have contact with the metaleader and the cluster and all streams are up to date.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-24 19:30:10 -08:00
Derek Collison
bd78b1a99b Formal json version for NAK delay
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-24 15:01:52 -08:00
Derek Collison
d486c24199 Allow a consumer to be configured with BackOffs.
This allows a consumer to have exponential backoffs vs static AckWait and MaxDeliver.
When BackOff is set it will overridde AckWait to BackOff[0] and MaxDeliver will be len(BackOff)+1.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-24 14:57:36 -08:00
Derek Collison
579bf336ad Allow NAK to take a delay parameter to delay redelivery for a certain amount of time.
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-24 14:57:28 -08:00
Derek Collison
d332684322 Fixed data race and fuxed bug that we would not clear our waiting queue when a leader stepped down.
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-24 13:01:25 -08:00
Derek Collison
d962500827 Track reply subjects for pending pull requests across clustered consumers.
We will only send if all peers in our group are >= 2.7.1 and we will check for updates.
When a consumer follower takes over it will notify all pending requests that those requests are invalid now.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-21 16:31:59 -08:00
Derek Collison
103f710479 Fixed consumer info num pending bug.
Under load we could have a message committed to the underlying store when a consumer was being created and then it increase num pending again when the stream signals the consumers.
This fix just remembers the last seq of the state when we calculate sgap and test before adding in the stream code.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-12 20:03:26 -08:00
Derek Collison
5592d923c4 Updated pull consumers.
Cleaned up code, made more consistent, utilize loopAndGather.
Allow pull consumers to have AckAll as well as AckExplicit.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-10 16:59:01 -08:00
Derek Collison
d02ad88297 Only report peers that we have seen a stats/usage update for
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-07 10:42:06 -08:00
Derek Collison
52da55c8c6 Implement overflow placement for JetStream streams.
This allows stream placement to overflow to adjacent clusters.
We also do more balanced placement based on resources (store or mem). We can continue to expand this as well.
We also introduce an account requirement that stream configs contain a MaxBytes value.

We now track account limits and server limits more distinctly, and do not reserver server resources based on account limits themselves.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-06 19:33:08 -08:00