Commit Graph

239 Commits

Author SHA1 Message Date
Ivan Kozlovic
c3da392832 Changes to IPQueues
Removed the warnings, instead have a sync.Map where they are
registered/unregistered and can be inspected with an undocumented
monitor page.
Added the notion of "in progress" which is the number of messages
that have been pop()'ed. When recycle() is invoked this count
goes down.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-17 17:53:06 -06:00
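
The "in progress" notion described in c3da392832 can be sketched as follows. This is a minimal illustration only: the `queue` type, its fields, and `inProgressCount` are hypothetical names, not the actual nats-server IPQueue code. It shows the count rising when elements are popped and falling when the batch is recycled.

```go
package main

import (
	"fmt"
	"sync"
)

// queue is a hypothetical, simplified stand-in for an intra-process queue.
// It is not the nats-server implementation.
type queue struct {
	mu         sync.Mutex
	elts       []string
	inProgress int // elements handed out by pop() and not yet recycled
}

// push appends one element to the queue.
func (q *queue) push(e string) {
	q.mu.Lock()
	q.elts = append(q.elts, e)
	q.mu.Unlock()
}

// pop hands out every queued element and counts them as in progress.
func (q *queue) pop() []string {
	q.mu.Lock()
	defer q.mu.Unlock()
	out := q.elts
	q.elts = nil
	q.inProgress += len(out)
	return out
}

// recycle signals that the caller is done with a batch returned by pop().
func (q *queue) recycle(batch []string) {
	q.mu.Lock()
	q.inProgress -= len(batch)
	q.mu.Unlock()
}

// inProgressCount reports how many popped elements are still being processed.
func (q *queue) inProgressCount() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.inProgress
}

func main() {
	q := &queue{}
	q.push("a")
	q.push("b")
	batch := q.pop()
	fmt.Println(q.inProgressCount()) // prints 2
	q.recycle(batch)
	fmt.Println(q.inProgressCount()) // prints 0
}
```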
Derek Collison
dbfa47f9b1 Improve state preservation for consumers, specifically DeliverNew variants when no activity has been present.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-16 20:55:14 -07:00
Derek Collison
e4ebc4648e When a stream or consumer was offline we would not properly respond to a delete.
We also would hang if no stream info requests were sent during a stream list due to the asset being offline.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-15 21:11:23 -07:00
Ivan Kozlovic
b4128693ed Ensure file path is correct during stream restore
Also had to change all references from `path.` to `filepath.` when
dealing with files, so that it works properly on Windows.

Also fixed lots of tests to defer the shutdown of the server
after the removal of the storage, and fixed the directories in some
config files to use the single quote `'` to surround the file path,
again to work on Windows.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-09 13:31:51 -07:00
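
The `path.` versus `filepath.` distinction mentioned in b4128693ed can be illustrated with a short sketch. The `storeFile` helper and the directory and file names are made up for this example and are not taken from the server code. The standard library's `path` package always joins with `/`, while `path/filepath` uses the separator of the operating system the program runs on.

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
)

// storeFile builds the on-disk location of a file inside a storage directory.
// It is a hypothetical helper; it uses filepath so the result is valid on
// Windows as well as on Unix-like systems.
func storeFile(dir, name string) string {
	return filepath.Join(dir, name)
}

func main() {
	// path.Join always uses forward slashes, regardless of the OS.
	fmt.Println(path.Join("jetstream", "streams", "meta.inf")) // prints jetstream/streams/meta.inf

	// filepath.Join uses the OS-specific separator, so the output differs
	// between Windows and Unix-like systems.
	fmt.Println(storeFile(filepath.Join("jetstream", "streams"), "meta.inf"))
}
```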
Derek Collison
3216eb5ee5 When a consumer has no state we are now compacting the log, but were not snapshotting.
This caused issues on leader change and losing quorum.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-09 07:21:25 -05:00
Derek Collison
58da4b917a Made improvements to scale up and down for streams and consumers.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-06 16:59:02 -08:00
Derek Collison
31a19729b0 When removing a stream peer with an attached durable consumer, the consumer could become inconsistent.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-06 05:42:22 -08:00
Derek Collison
ad6020ae72 Fix for #2885.
When a filtered consumer has no state, meaning no messages are being processed, it still receives updates to properly track the delivered sequence as it relates to the entire stream.
Since we did not have state we were inadvertently skipping the compaction logic for the raft store.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-04 08:53:16 -08:00
Derek Collison
30009fdd78 Merge pull request #2897 from nats-io/js-raft-logging
Better startup logging to help debug RAFT to streams/consumers.
2022-03-03 11:26:09 -07:00
Derek Collison
11cad6be6b In the process of working on #2885 with a user, I was struggling to map $SYS directories to consumer names.
This change allows a bit better logging on startup to more easily map a RAFT log directory etc to the stream/consumer.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-03 09:50:00 -08:00
Ivan Kozlovic
dfe96944d2 [FIXED] JetStream stream info consumers count in clustered mode
In clustering mode, the number of consumers in stream info may be
wrong in presence of non durable consumers. Ephemeral are handled
by specific nodes. The StreamInfo response would contain only the
consumer count that the stream leader is handling.

This fix overrides the stream's state consumers count with the
number of consumers from the stream assignment record.

Resolves #2895

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-03 09:46:35 -07:00
Derek Collison
ca1132a01d Allow stream placement by tags.
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-15 17:07:32 -08:00
Derek Collison
fb15dfd9b7 Allow replica updates during stream update.
Also add in HAAssets count to Jsz.

Signed-off-by: Derek Collison <derek@nats.io>
2022-02-13 19:33:46 -08:00
R.I.Pienaar
6bb0861eb7 avoid seg fault when stream restore fails
Signed-off-by: R.I.Pienaar <rip@devco.net>
2022-02-11 10:45:09 +01:00
Derek Collison
da9046b2e6 Snapshot initial consumer info when needed.
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-09 15:23:53 -08:00
Derek Collison
579bf336ad Allow NAK to take a delay parameter to delay redelivery for a certain amount of time.
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-24 14:57:28 -08:00
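
As a rough sketch of what a delayed NAK (579bf336ad) might look like from the client side, the following builds an acknowledgement payload that carries a delay. The commit message does not state the wire format, so the payload shape used here (`-NAK` followed by a JSON object with a `delay` field in nanoseconds) is an assumption, and `nakDelay` and `nakWithDelay` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// nakDelay is a hypothetical payload type. The field name and the
// nanosecond unit are assumptions made for this sketch.
type nakDelay struct {
	Delay int64 `json:"delay"`
}

// nakWithDelay returns the acknowledgement payload for a NAK that asks
// for redelivery to be postponed by d. A non-positive d yields a plain NAK.
func nakWithDelay(d time.Duration) (string, error) {
	if d <= 0 {
		return "-NAK", nil
	}
	b, err := json.Marshal(nakDelay{Delay: d.Nanoseconds()})
	if err != nil {
		return "", err
	}
	return "-NAK " + string(b), nil
}

func main() {
	payload, err := nakWithDelay(2 * time.Second)
	if err != nil {
		panic(err)
	}
	fmt.Println(payload) // prints -NAK {"delay":2000000000}
}
```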
Derek Collison
6fd41e5ea4 Updates based on review feedback
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-24 10:23:47 -08:00
Derek Collison
d962500827 Track reply subjects for pending pull requests across clustered consumers.
We will only send if all peers in our group are >= 2.7.1 and we will check for updates.
When a consumer follower takes over it will notify all pending requests that those requests are invalid now.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-21 16:31:59 -08:00
Ivan Kozlovic
84f6cbb760 Pooling pubMsg and jsPubMsg objects
This should help with GC pressure, however, it may have an effect
on performance (based on some benchmark). Calling sync.Pool.Get/Put
too often has a performance impact...

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-01-13 13:14:25 -07:00
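
The pooling approach described in 84f6cbb760 can be sketched with `sync.Pool` from the standard library. The `pubMsg` type below is a trimmed-down, hypothetical version for illustration, and `newPubMsg` and `returnToPool` are assumed helper names rather than the server's actual functions.

```go
package main

import (
	"fmt"
	"sync"
)

// pubMsg is a hypothetical, simplified message type for this sketch.
type pubMsg struct {
	subject string
	data    []byte
}

// pubMsgPool reuses pubMsg objects to reduce allocations and GC pressure.
var pubMsgPool = sync.Pool{
	New: func() interface{} { return &pubMsg{} },
}

// newPubMsg takes an object from the pool and initializes it.
func newPubMsg(subject string, data []byte) *pubMsg {
	m := pubMsgPool.Get().(*pubMsg)
	m.subject, m.data = subject, data
	return m
}

// returnToPool clears references before putting the object back, so a
// pooled object does not keep a payload alive.
func (m *pubMsg) returnToPool() {
	m.subject, m.data = "", nil
	pubMsgPool.Put(m)
}

func main() {
	m := newPubMsg("orders.new", []byte("hello"))
	fmt.Println(m.subject, len(m.data)) // prints orders.new 5
	m.returnToPool()
}
```

As the commit message notes, each Get/Put call has a cost of its own, so pooling tends to pay off only when the saved allocations outweigh that overhead.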
Ivan Kozlovic
29c40c874c Adding logger for IPQueue
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-01-13 13:14:00 -07:00
Ivan Kozlovic
fc7a4047a5 Renamed variables, removing the "c" that indicated it was a channel
2022-01-13 13:11:05 -07:00
Ivan Kozlovic
62a07adeb9 Replaced catchup and stream restore channels
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-01-13 13:09:49 -07:00
Ivan Kozlovic
ceb06d6a13 Replaced RAFT's apply channel
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-01-13 13:06:10 -07:00
Ivan Kozlovic
23ebf9d2f8 Adapted jsOutQ
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-01-13 13:05:27 -07:00
Derek Collison
103f710479 Fixed consumer info num pending bug.
Under load we could have a message committed to the underlying store while a consumer was being created, and num pending would then be increased again when the stream signaled the consumers.
This fix just remembers the last seq of the state when we calculate sgap and test before adding in the stream code.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-12 20:03:26 -08:00
Derek Collison
d02ad88297 Only report peers that we have seen a stats/usage update for
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-07 10:42:06 -08:00
Derek Collison
16f5c95785 Update atomics placements based on feedback
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-07 09:50:19 -08:00
Derek Collison
de5022ad7e Make cluster placement log more detailed
Signed-off-by: Derek Collison <derek@nats.io>
2022-01-07 07:44:30 -08:00
Derek Collison
52da55c8c6 Implement overflow placement for JetStream streams.
This allows stream placement to overflow to adjacent clusters.
We also do more balanced placement based on resources (store or mem). We can continue to expand this as well.
We also introduce an account requirement that stream configs contain a MaxBytes value.

We now track account limits and server limits more distinctly, and do not reserve server resources based on account limits themselves.

Signed-off-by: Derek Collison <derek@nats.io>
2022-01-06 19:33:08 -08:00
Derek Collison
5932fa1852 Avoid deadlock, release js lock
Signed-off-by: Derek Collison <derek@nats.io>
2021-12-29 10:46:53 -08:00
Derek Collison
1a37f0963a Avoid race condition
Signed-off-by: Derek Collison <derek@nats.io>
2021-12-29 08:26:10 -08:00
Derek Collison
490acf5f29 Full stream state with interior delete details not needed by recipient of snapshot
Signed-off-by: Derek Collison <derek@nats.io>
2021-12-20 17:37:07 -08:00
Matthias Hanel
3e8b66286d Js leaf deny (#2693)
Along a leaf node connection, unless the system account is shared AND the JetStream domain name is identical, the default JetStream traffic (without a domain set) will be denied.

As a consequence, all clients that want to access a domain that is not the one in the server they are connected to must specify a domain name.
Affected by this change are setups where a leaf node had no local JetStream OR the server the leaf node connected to had no local JetStream.
One of the two accounts that are connected via a leaf node remote must have no JetStream enabled.
The side that does not have JetStream enabled will lose JetStream access, and its clients must set `nats.Domain` manually.

For workarounds on how to restore the old behavior, look at:
https://github.com/nats-io/nats-server/pull/2693#issuecomment-996212582

New config values added:
`default_js_domain` is a mapping from account to domain, settable when JetStream is not enabled in an account.
`extension_hint` is a hint for a non-clustered server to start in clustered mode (and be usable to extend).
`js_domain` is a way to set the JetStream domain to use for mqtt.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-12-16 16:53:20 -05:00
Matthias Hanel
dd735f4a18 Adding missing entry to stream/consumer list
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-12-08 18:44:40 -05:00
Matthias Hanel
aa25a2f600 Set incomplete error when cluster list fails
Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-12-02 12:31:31 -05:00
Matthias Hanel
0dc695762d Merge pull request #2722 from nats-io/stream-list-to
Aligning timeout to be shorter than 5 second cli default
2021-12-01 16:04:59 -05:00
Matthias Hanel
39a710780e Aligning timeout to be shorter than 5 second cli default
Also align stream and consumer timeouts

Signed-off-by: Matthias Hanel <mh@synadia.com>
2021-12-01 15:44:06 -05:00
Ivan Kozlovic
9f30bf00e0 [FIXED] Corrupted headers receiving from consumer with meta-only
When a consumer is configured with "meta-only" option, and the
stream was backed by a memory store, a memory corruption could
happen causing the application to receive corrupted headers.

Also replaced most usages of `append(a[:0:0], a...)` for making
copies. That idiom was based on this wiki:
https://github.com/go101/go101/wiki/How-to-efficiently-clone-a-slice%3F

But since Go 1.15, it is actually faster to call make+copy instead.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-12-01 10:50:15 -07:00
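
The make+copy approach mentioned in 9f30bf00e0 can be sketched as below. `cloneBytes` is a hypothetical helper name, not a function from the server. The point of cloning is that the copy has its own backing array, so later writes to the original buffer cannot corrupt data that was already handed to a consumer.

```go
package main

import "fmt"

// cloneBytes returns an independent copy of src using make+copy.
// A nil input yields a nil result.
func cloneBytes(src []byte) []byte {
	if src == nil {
		return nil
	}
	dst := make([]byte, len(src))
	copy(dst, src)
	return dst
}

func main() {
	hdr := []byte("NATS/1.0")
	saved := cloneBytes(hdr)

	// Overwriting the original buffer does not affect the clone.
	copy(hdr, "XXXX")
	fmt.Println(string(hdr))   // prints XXXX/1.0
	fmt.Println(string(saved)) // prints NATS/1.0
}
```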
Derek Collison
e65f3d4a30 [FIXED #2706] - Only utilize full state with deleted details when really needed. Otherwise fast state will suffice.
Signed-off-by: Derek Collison <derek@nats.io>
2021-11-29 10:50:28 -08:00
Derek Collison
6e78bf315e Use local variable that we got under the lock
Signed-off-by: Derek Collison <derek@nats.io>
2021-11-18 14:43:33 -08:00
Derek Collison
49c5c873ca Better handling of stream mismatch scenarios.
1. When a snapshot did not yield actionable data, we were not setting new last sequence if we have to readjust based on snapshot. This could lead to spinning on stream reset for followers.
2. When a stream has lots of failures by design, like KV abstraction, if we cleared the clfs state we would endlessly spin trying to reset the stream.

Signed-off-by: Derek Collison <derek@nats.io>
2021-11-18 14:00:41 -08:00
R.I.Pienaar
270ff87beb allow streams api to be filtered like list api
Signed-off-by: R.I.Pienaar <rip@devco.net>
2021-11-18 13:59:12 +01:00
Derek Collison
5ead954fee [ADDED] Allow certain consumer attributes to be updated #2670, #2603
Signed-off-by: Derek Collison <derek@nats.io>
2021-11-04 13:43:11 -07:00
Derek Collison
ae999aabe9 Merge pull request #2669 from nats-io/fix-2658
[FIXED] Duplicate stream create returned wrong response type #2658
2021-11-02 15:39:30 -07:00
Derek Collison
c78d700e90 Fix for #2658
Signed-off-by: Derek Collison <derek@nats.io>
2021-11-02 15:23:15 -07:00
Derek Collison
1af3ab1b4e Fix for #2666
When encountering errors for sequence mismatches that were benign we were returning an error and not processing the rest of the entries.
This would lead to more severe sequence mismatches later on that would cause stream resets.

Also added code to deal with server restarts and the clfs fixup states which should have been reset properly.

Signed-off-by: Derek Collison <derek@nats.io>
2021-11-02 14:38:22 -07:00
Derek Collison
cf5322088d Race around accessing storage type
Signed-off-by: Derek Collison <derek@nats.io>
2021-10-28 12:36:01 -07:00
Derek Collison
d4b0b38a8f Fix for #2642
There was a bug that would erase the sync subject for upper level catchup for streams.
Raft layer repair was ok but if that was compacted it gets kicked up to the upper layers which would fail.
Users would see "Catchup stalled" messages repeatedly and consumers that had their leaders attached to that replica would also stop working.

Changes were put in to repair the corrupt state after the fact as well, regardless of presence of fix.

Signed-off-by: Derek Collison <derek@nats.io>
2021-10-26 20:09:00 -07:00
Derek Collison
bbffd71c4a Improvements to meta raft layer around snapshots and recovery.
Signed-off-by: Derek Collison <derek@nats.io>
2021-10-12 05:53:52 -07:00
Derek Collison
5fc2cc5754 Allow streams to be sealed through a stream update.
Sealed streams cannot accept new messages, do not allow messages to be deleted or purged, and do not expire messages due to age.
A sealed stream cannot be unsealed through an update.

Signed-off-by: Derek Collison <derek@nats.io>
2021-09-29 15:25:38 -07:00
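
The sealing rules stated in 5fc2cc5754 can be expressed as a small validation sketch. The `streamConfig` type, `checkUpdate`, and `canPublish` are hypothetical names chosen for illustration and do not reflect the server's actual types or update logic. The sketch only encodes two rules from the commit message: a sealed stream rejects new messages, and an update may seal a stream but never unseal it.

```go
package main

import (
	"errors"
	"fmt"
)

// streamConfig is a hypothetical, minimal configuration for this sketch.
type streamConfig struct {
	Sealed bool
}

var errUnseal = errors.New("sealed stream cannot be unsealed")

// checkUpdate validates a configuration update. Sealing is allowed;
// going from sealed back to unsealed is rejected.
func checkUpdate(old, next streamConfig) error {
	if old.Sealed && !next.Sealed {
		return errUnseal
	}
	return nil
}

// canPublish reports whether the stream accepts new messages.
func canPublish(cfg streamConfig) bool {
	return !cfg.Sealed
}

func main() {
	open := streamConfig{Sealed: false}
	sealed := streamConfig{Sealed: true}

	fmt.Println(checkUpdate(open, sealed)) // prints <nil>
	fmt.Println(checkUpdate(sealed, open)) // prints sealed stream cannot be unsealed
	fmt.Println(canPublish(sealed))        // prints false
}
```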