Commit Graph

296 Commits

Author SHA1 Message Date
Derek Collison
e08f6d863d Allow for republish to be headers only
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-30 12:05:17 -07:00
Derek Collison
5592315e89 Suppress consumer create and R1 stream update advisories on server restart.
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-30 09:58:35 -07:00
R.I.Pienaar
dc9d6776f8 Export the subject transformer
This exports the one key function of the subject transformer
allowing external tools to be written to test mappings are
valid and see how they would interact without the hassle of
configuring a serrver

The APIs are specifically marked as being unsupported and
having kept the transform struct itself unexported one can
not cast from the interface to the real implementation

Signed-off-by: R.I.Pienaar <rip@devco.net>
2022-05-27 10:33:59 +02:00
Derek Collison
46f7f7bfc9 Consumer pending was not correct when stream had max msgs per subject set > 1 and a consumer that filtered out part of the stream was created.
Also make sure to update stream's config on a stream restore in case of changes.

Signed-off-by: Derek Collison <derek@nats.io>
2022-05-24 14:44:15 -07:00
Ivan Kozlovic
53e3c53d96 [FIXED] JetStream: consumer with deliver new may miss messages
This could happen when a consumer had not sent anything to the
attached NATS subscription and there was a consumer leader
step down or server restart.

Signed-off-by: Derek Collison <derek@nats.io>
2022-05-23 12:01:48 -06:00
Derek Collison
790d643431 Consumer's num pending can now rely on the stream's store vs trying to maintain furing runtime which could be wrong under certain conditions.
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-20 08:45:43 -07:00
Derek Collison
e3249d8b6c Move cfg check for republish to common func
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-17 15:33:43 -07:00
Derek Collison
c166c9b199 Enable republishing of messages once stored in a stream.
This enables lightweight distribution of messages to very large number of NATS subscribers.
We add in metadata as headers that allows for gap detection which enables initial value (via JetStream, maybe KV) and realtime NATS core updates but all globally ordered.

Signed-off-by: Derek Collison <derek@nats.io>
2022-05-17 15:18:54 -07:00
Derek Collison
bcecae42ac Fix for #3119
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-12 15:45:29 -07:00
Derek Collison
88ebfdaee8 Merge pull request #3109 from nats-io/issue-3107-3069
[FIXED] Downstream sourced retention policy streams during restart have redelivered messages
2022-05-09 09:13:48 -07:00
Derek Collison
b35988adf9 Remember the last timestamp by not removing last msgBlk when empty and during purge pull last timestamp forward until new messages arrive.
When a downstream stream uses retention modes that delete messages, fallback to timebased start time for the new source consumers.

Signed-off-by: Derek Collison <derek@nats.io>
2022-05-09 09:04:19 -07:00
Derek Collison
6507cba2a9 Fix for race on recovery
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-07 12:42:56 -07:00
Ivan Kozlovic
5050092468 [FIXED] JetStream: possible lock inversion
When updating usage, there is a lock inversion in that the jetStream
lock was acquired while under the stream's (mset) lock, which is
not correct. Also, updateUsage was locking the jsAccount lock, which
again, is not really correct since jsAccount contains streams, so
it should be jsAccount->stream, not the other way around.

Removed the locking of jetStream to check for clustered state since
js.clustered is immutable.

Replaced using jsAccount lock to update usage with a dedicated lock.

Originally moved all the update/limit fields in jsAccount to new
structure to make sure that I would see all code that is updating
or reading those fields, and also all functions so that I could
make sure that I use the new lock when calling these. Once that
works was done, and to reduce code changes, I put the fields back
into jsAccount (although I grouped them under the new usageMu mutex
field).

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-05-02 09:50:32 -06:00
Matthias Hanel
d520a27c36 [fixed] step down timing, consumer stream seqno, clear redelivery (#3079)
Step down timing for consumers or streams.
Signals loss of leadership and sleeps before stepping down.
This makes it less likely that messages are being processed during step
down.

When becoming leader, consumer stream seqno got reset,
even though the consumer existed already.

Proper cleanup of redelivery data structures and timer

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-27 03:32:08 -04:00
Derek Collison
f702e279ab Fix for a consumer recovery issue.
Also update healthz to check all assets that are assigned, not just running.

Signed-off-by: Derek Collison <derek@nats.io>
2022-04-26 19:22:19 -07:00
Ivan Kozlovic
b9463b322f [FIXED] JetStream: stream mirror issues in mixed mode clusters
Similar to PR #3061

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-20 23:21:15 -06:00
Ivan Kozlovic
df61a335c7 Merge pull request #3061 from nats-io/js_fix_stream_source
[FIXED] JetStream: stream sources issue in mixed mode clusters
2022-04-20 23:20:41 -06:00
Ivan Kozlovic
9975a38c6e [FIXED] JetStream: stream sources issue in mixed mode clusters
The main issue was that in mixed-mode, the interest through gateway
may still be in optimistic mode, which when creating the source
consumer would start delivery before we had a chance to setup
the subscription to receive those messages.

The approach is to create the subscription prior to sending
the consumer create request. Also refactored a bit the code in
the hope to make the retries a bit more bullet proof.

We may also look at making sure that gateways are switched to
interest-mode when detecting a mixed-mode setup.

Also fixed a defect that could cause a source to be canceled
when updating a stream.

Resovles #2801

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-20 21:02:35 -06:00
Matthias Hanel
ff5d60973d introducing max_age/dupe_window minimum value of 100ms. (#3056)
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-20 13:58:19 -04:00
Ivan Kozlovic
a78ccdcb2f [FIXED] JetStream: some stream SOURCE issues
- Possibly missing some early messages from the sourced stream
- In some cancel situations, the processing of sourced messages
would not longer work

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-18 12:42:16 -06:00
Matthias Hanel
79b4374d01 [Fixed] limits enforcement issues (#3046)
* [Fixed] limits enforcement issues

stream create had checks that stream restore did not have.
Moved code into commonly used function checkStreamCfg.
Also introduced (cluster/non clustered) StreamLimitsCheck functions to
perform checks specific to clustered /non clustered data structures.

Checking for valid stream config and limits/reservations before
receiving all the data. Now fails the request right away.

Added a jetstream limit "max_request_batch" to limit fetch batch size

Shortened max name length from 256 to 255, more common file name limit

Added check for loop in cyclic source stream configurations

features related to limits

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-18 01:53:48 -04:00
Derek Collison
12730af5c2 Make sure mirror re-syncs after origin is moved.
Speed up mirror and sources heartbeats.

Signed-off-by: Derek Collison <derek@nats.io>
2022-04-15 07:01:55 -07:00
Derek Collison
9748925f13 Improvements to stream and consumer move.
During elected stepdown and transfer allow the new leader to take over before we stepdown.
We could receive a leader change, so make sure to also check migration state.

Signed-off-by: Derek Collison <derek@nats.io>
2022-04-14 07:27:29 -07:00
Ivan Kozlovic
50c3986863 [FIXED] JetStream stream catchup issues
- A stream could become leader when it should not, causing
messages to be lost.
- A catchup could stall because the server sending data
could bail out of the runCatchup routine but still send
the EOF signal.
- Deadlock with monitoring of Jsz

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-12 16:05:12 -06:00
Derek Collison
aa256de55b Add in Domain to alternates
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-11 18:47:19 -06:00
Derek Collison
b7718e2b7a First pass support for stream alternates
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-11 18:47:19 -06:00
Derek Collison
7e38ebcb6e Allow assets such as streams and their associated consumers to migrate between clusters.
The system will allow an update to a stream, and subsequently all attached consumers, to be placed in another cluster either directly or via tag placement.
The meta layer will scale the underlying peerset appropriately to straddle the two clusters for both the stream and consumers, taking into account the consumer type.
Control will then pass to the current leaders of the assets who will monitor the catchup status of the new peers.
(Note we can optimize this later to only traverse once across a GW for any given asset, but for now this is simpler)
Once the original leaders have determined the assets are synched it will pass leadership to a member of the new peerset.
Once the new leader has been elected, it will forward a request for the meta layer to shrink the peerset by removing the old peers.

Signed-off-by: Derek Collison <derek@nats.io>
2022-04-04 18:28:36 -07:00
Ivan Kozlovic
19783a9f11 [CHANGED] Rate limit similar warnings
Some warnings, especially when dealing with JS limits that were
printed on a per-message basis, are now limited to ~1 per second
if the content of the warning is already found in a map.

This is also for "client" warnings, but the client porting of the
warning is not taken into account so that helps with reducing logging
for similar content, but coming from different clients.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-04-01 15:24:03 -06:00
Matthias Hanel
a77f95faa8 error handling and info when moving a stream from non existing tier (#2992)
adds unit test to test this scenario
improves reporting of correct error
only show info for non existing tiers where streams exist

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-04-01 14:21:35 -04:00
Derek Collison
7f78d3e618 Not allowing streams to be created meant we could not recover on server restart.
Signed-off-by: Derek Collison <derek@nats.io>
2022-04-01 06:41:22 -07:00
Matthias Hanel
1aeaaf0ca3 Adding server limits (max ack pending/dedupe window) to js config (#2967)
* Adding server limits (max ack pending/dedupe window) to js config

Also shifting consumer config check to jsConsumerCreate as in clustered
mode this was enforced in the wrong place

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-29 13:19:36 -04:00
Matthias Hanel
0c5f3688a7 [ADDED] Tiered limits and fix limit issues on updates (#2945)
* Adding tiered limits and fix limit issues on updates

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-03-28 20:47:54 -04:00
Derek Collison
6b379329d8 Fix for #2955. When scaling up a stream with existing messages the existing messages were not being replicated.
Also fixed a bug where we were incorrectly not spining up the monitoring loop for a stream when going from 3->1->3.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-26 07:26:46 -07:00
Ivan Kozlovic
4739eebfc4 [FIXED] JetStream: possible deadlock during consumer leadership change
Would possibly show up when a consumer leader changes for a consumer
that had redelivered messages and for instance messages were inbound
on the stream.

Resolves #2912

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-25 12:21:51 -06:00
Derek Collison
ef8f543ea5 Improve memory usage through JetStream storage layer.
Previously we would rely more heavily on Go's garbage collector since when we loaded a block for an underlying stream we would pass references upward to avoimd copies.
Now we always copy when passing back to the upper layers which allows us to not only expire our cache blocks but pool and reuse them.

The upper layers also had changes made to allow the pooling layer at that level to interoperate with the storage layer optionally.

Also fixed some flappers and a bug where de-dupe might not be reformed correctly.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 17:45:15 -06:00
Ivan Kozlovic
8d4ff4bc96 Fixed panic on stream create failure (with filestore)
This was introduced by the change for ipQueues in #2931.
The (*ipQueue).unregister() was written with a protection for
the ipQueue to be nil, however, mset.outq is actually not a bare
ipQueue but a jsOutQ that embeds a pointer to an ipQueue. So we
need to implement register() for jsOutQ.

Added a test that reproduced the issue, but found it with a flapping
test (TestJetStreamLongStreamNamesAndPubAck) that failed due to
a file name too long.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-22 15:21:01 -06:00
Ivan Kozlovic
68da3e8253 [FIXED] Removal of an external source stream
Removal of a stream source that was external was not working properly,
allowing messages to still flow after the removal and until the
server hosting the stream to which the source was removed was
restarted.

Resolves #2920

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-21 10:59:47 -06:00
Ivan Kozlovic
c3da392832 Changes to IPQueues
Removed the warnings, instead have a sync.Map where they are
registered/unregistered and can be inspected with an undocumented
monitor page.
Added the notion of "in progress" which is the number of messages
that have beend pop()'ed. When recycle() is invoked this count
goes down.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-17 17:53:06 -06:00
Jaime Piña
acfd456758 Prevent reserved bytes underflow (#2907) 2022-03-16 15:19:35 -07:00
Ivan Kozlovic
b4128693ed Ensure file path is correct during stream restore
Also had to change all references from `path.` to `filepath.` when
dealing with files, so that it works properly on Windows.

Fixed also lots of tests to defer the shutdown of the server
after the removal of the storage, and fixed some config files
directories to use the single quote `'` to surround the file path,
again to work on Windows.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-09 13:31:51 -07:00
Derek Collison
58da4b917a Made improvements to scale up and down for streams and consumers.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-06 16:59:02 -08:00
Ivan Kozlovic
196319b106 [FIXED] JetStream: Some stream advisories missing
The "deleted" advisory was missing because the stream's send loop
was closed before the advisory was pushed to the queue to be sent.

Added tests, both for single and clustered mode to test all stream
advisories.

Resolves #2886

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-06 10:52:42 -07:00
Derek Collison
4b9bc29e53 If we had not heard from a source or mirror we would still calculate the delta since now.
This would wrap and create a large number which overflowed JSON's 2^53 limit.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-05 12:46:55 -08:00
Derek Collison
11cad6be6b In the process of working on #2885 with a user, I was struggling to map $SYS directories to consumer names.
This change allows a bit better logging on startup to more easily map a RAFT log directory etc to the stream/consumer.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-03 09:50:00 -08:00
Derek Collison
0cc7302be9 A stream name is tied to its identity and can not be changed on a restore.
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-09 12:38:45 -08:00
Derek Collison
12f5ea3655 When a consumer had not filtered subject and was attached to a interest policy retention stream we could incorrectly drop messages.
Signed-off-by: Derek Collison <derek@nats.io>
2022-02-01 14:21:05 -08:00
Ivan Kozlovic
84f6cbb760 Pooling pubMsg and jsPubMsg objects
This should help with GC pressure, however, it may have an effect
on performance (based on some benchmark). Calling sync.Pool.Get/Put
too often has a performance impact...

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-01-13 13:14:25 -07:00
Ivan Kozlovic
29c40c874c Adding logger for IPQueue
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-01-13 13:14:00 -07:00
Ivan Kozlovic
d74dba2df9 Replaced RAFT's append entry response channel
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-01-13 13:06:48 -07:00
Ivan Kozlovic
05c033c46c Replaced stream's inbound list
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-01-13 13:05:51 -07:00