4128 Commits

Author SHA1 Message Date
Derek Collison
d48ccf4c5a When filestore is used for raft layer do not attempt to track subject metadata.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-17 13:46:13 -07:00
Ivan Kozlovic
5d3ee8ebf4 [FIXED] Gateway: possible panic if monitor endpoint inspected too soon
The monitoring http server is started early and the gateway setup
(when configured) may not be fully ready when the `/gatewayz`
endpoint is inspected and could cause a panic.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-17 13:30:58 -06:00
Matthias Hanel
c67d6aad79 fix jwt template ordering issue and error message (#3373)
ordering of templates got messed up by a map (now removed)
Also improved error message when template generation fails

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-16 19:00:41 -07:00
Ivan Kozlovic
02ecda535c Stop the raft node to not cause test to flap.
Test TestNoRaceJetStreamClusterCorruptWAL() would start to flap
because of the snapshot on cluster shutdown. Disable the snapshot
on exit for this test by stopping the raft node before shutdown.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-16 18:44:32 -06:00
Ivan Kozlovic
7de4497815 Install consumer snapshot on clean exit and few other fixes
- didRemove in applyMetaEntries() could be reset when processing
multiple entries
- change "no race" test names to include JetStream
- separate raft nodes leader stepdown and stop in server
shutdown process
- in InstallSnapshot, call wal.Compact() with lastIndex+1

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-16 17:05:49 -06:00
Ivan Kozlovic
f0b098af92 [FIXED] JetStream: issue with max deliver and server/cluster restart
This is a regression introduced in v2.8.3. If a message reaches
the max redeliver count, it stops being delivered to the consumer.
However, after a server or cluster restart, those messages would
be redelivered again.

Resolves #3361

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-16 17:05:47 -06:00
Derek Collison
443f04d262 Bump to 2.9.0-RC.4
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-16 13:21:46 -07:00
Derek Collison
09a3da1412 Merge pull request #3371 from nats-io/aes
[ADDED] Support for AES-GCM as a cipher along with ChaChaPoly.
2022-08-16 13:21:03 -07:00
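As an illustration of the AES-GCM cipher named in the merge above, here is a minimal sketch using only Go's standard library. It is not the filestore's code: the `seal` and `open` helpers, the nonce-prefix layout, and the key handling are assumptions made for this example.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD from a 16, 24 or 32 byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext and prepends the random nonce to the result.
// (Hypothetical helper, not part of any NATS API.)
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends ciphertext and auth tag to its first argument (the nonce).
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open reverses seal and returns an error if the data was modified.
func open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("sealed data too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32) // AES-256; a real key must come from a secure source
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := seal(key, []byte("stream block"))
	if err != nil {
		panic(err)
	}
	plain, err := open(key, sealed)
	fmt.Println(string(plain), err)
}
```

Because GCM is an authenticated mode, `open` rejects any ciphertext whose bytes were altered after sealing.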
Derek Collison
9508276b98 Make kek function based on review feedback
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-16 12:49:03 -07:00
Marco Primi
02a34117e4 Add chaos tests for Ordered, Async, Pull, Durable consumers
Tests consist of a single client trying to consume a fixed number of messages from a stream
while the cluster is being bounced by a chaos monkey.
2022-08-16 09:52:48 -07:00
Marco Primi
c6af1ecc9c Fix typo in comment
2022-08-16 09:07:05 -07:00
Derek Collison
ef91d67708 Support auto-conversion
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-16 08:41:39 -07:00
Derek Collison
827b34a77a Add support for AES cipher encryption for filestore.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-15 14:21:37 -07:00
Matthias Hanel
b7ee177497 Adding templates to scoped signing key user permis (#3367)
For security reasons we introduced scoped signing keys to JWT.
They carry user permissions, which is why JWTs issued by those keys are not allowed to carry their own permissions.
Instead, the permissions are copied from the signing key.
If a scoped signing key is compromised, an attacker can only issue JWTs with the permissions of that key.
With a plain signing key, an attacker can create arbitrary users with arbitrary permissions.
Because user JWT creation is greatly simplified, we added a single utility function to go/java/.net which issues a user for such keys.
This function is documented in ADR-14:

```
/**
 * signingKey, is a mandatory account nkey pair to sign the generated jwt.
 * accountId, is a mandatory public account nkey. Will return error when not set or not account nkey.
 * publicUserKey, is a mandatory public user nkey. Will return error when not set or not user nkey.
 * name, optional human readable name. When absent, default to publicUserKey.
 * expiration, optional but recommended duration, when the generated jwt needs to expire. If not set, JWT will not expire.
 * tags, optional list of tags to be included in the JWT.
 *
 * Returns:
 * error, when issues arose.
 * string, resulting jwt.
 **/
IssueUserJWT(signingKey nkey, accountId string, publicUserKey string, name string, expiration time.Duration, tags []string) (error, string)
```

Currently the only downside of this is that the permissions are static and can't be tailored to the user.

This PR changes that by allowing the user pub/sub permissions to be parameterized with templates.

Templates apply to entire tokens only and include:
{{name()}} -> username
{{subject()}} -> user subject (nkey)
{{account-name()}} -> user's account name
{{account-subject()}} -> user's account subject (nkey)

{{tag(arbitrary-prefix)}}
provided the tag "arbitrary-prefix:value" will result in "value"
provided the tags ["arbitrary-prefix:1", "arbitrary-prefix:2"] will result in two subjects "1" & "2"

If the resulting subject is not valid (say a tag is not present
or the name is not set), this results in an error for deny subjects
and in no subject for allow subjects.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-15 12:49:35 -07:00
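The whole-token template rules in b7ee177497 can be sketched roughly as follows. This is an illustration under stated assumptions, not the server's implementation: the `expand` function is hypothetical, it handles only `{{name()}}` and `{{tag(prefix)}}`, and it reports an invalid result simply by returning no subjects.

```go
package main

import (
	"fmt"
	"strings"
)

// expand substitutes whole-token templates in a dot-separated subject.
// A tag template with several matching tags yields several subjects.
// If a template cannot be resolved, no subjects are returned.
func expand(subject, name string, tags []string) []string {
	results := []string{""}
	for i, tok := range strings.Split(subject, ".") {
		var values []string
		switch {
		case tok == "{{name()}}":
			if name != "" {
				values = []string{name}
			}
		case strings.HasPrefix(tok, "{{tag(") && strings.HasSuffix(tok, ")}}"):
			// "{{tag(dept)}}" selects tags of the form "dept:value".
			prefix := tok[len("{{tag("):len(tok)-len(")}}")] + ":"
			for _, t := range tags {
				if strings.HasPrefix(t, prefix) {
					values = append(values, strings.TrimPrefix(t, prefix))
				}
			}
		default:
			values = []string{tok} // literal token, kept as-is
		}
		// Combine every partial subject with every value for this token.
		var next []string
		for _, r := range results {
			for _, v := range values {
				if i == 0 {
					next = append(next, v)
				} else {
					next = append(next, r+"."+v)
				}
			}
		}
		results = next
	}
	return results
}

func main() {
	tags := []string{"dept:1", "dept:2"}
	fmt.Println(expand("orders.{{tag(dept)}}.{{name()}}", "alice", tags))
	fmt.Println(len(expand("orders.{{tag(region)}}", "alice", tags)))
}
```

A caller could treat an empty result as an error for deny subjects and as "no subject" for allow subjects, matching the behavior the commit message describes.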
Ivan Kozlovic
9e748ed2e7 Bump to RC.3
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-12 11:15:38 -06:00
Ivan Kozlovic
396aa5527c Merge pull request #3366 from nats-io/fs-subject-state
[FIXED] Make sure when SubjectState is called we have loaded fss state.
2022-08-12 11:15:06 -06:00
Derek Collison
d7534dff5f Make sure when SubjectState is called we have loaded fss state.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-12 07:14:39 -05:00
Ivan Kozlovic
00345cac64 [FIXED] JetStream: subject overlap error should be returned
In standalone mode, when attempting to create a stream which has
subjects that overlap with an existing stream, the generic
stream create error "10049" was returned instead of the more
accurate "10065" error code corresponding to subject overlap,
as it was the case in clustered mode.

Resolves #3362

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-11 13:32:29 -06:00
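For readers unfamiliar with what "subject overlap" means here, the following is a minimal sketch of an overlap test between two subjects with NATS wildcards (`*` matches one token, `>` matches one or more trailing tokens). The `overlaps` function is hypothetical and assumes both inputs are already valid subjects; it is not the check the server performs.

```go
package main

import (
	"fmt"
	"strings"
)

// overlaps reports whether some literal subject could match both a and b.
// It assumes ">" appears only as the final token of a subject.
func overlaps(a, b string) bool {
	ta, tb := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(ta) && i < len(tb); i++ {
		// ">" swallows the rest of the other subject, which has
		// at least one token left at this position.
		if ta[i] == ">" || tb[i] == ">" {
			return true
		}
		// Two different literals can never match the same token.
		if ta[i] != tb[i] && ta[i] != "*" && tb[i] != "*" {
			return false
		}
	}
	// Without ">", both subjects must have the same number of tokens.
	return len(ta) == len(tb)
}

func main() {
	fmt.Println(overlaps("orders.*", "orders.new"))
	fmt.Println(overlaps("orders.>", "orders"))
	fmt.Println(overlaps("orders.new", "orders.old"))
}
```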
Matthias Hanel
76219f8e5b fix unit test (#3359)
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-11 01:46:30 +02:00
Ivan Kozlovic
1f428310b0 Fixed message timestamp formatting for direct message get feature
In normal message get, the returned format is RFC3339Nano, which
is what is being used when using JSON marshaling. However, for
the direct get we had to pass a string to construct the header
and we were using time.Time.String() which was using a different
layout. So use time.Time.MarshalJSON() to be consistent with
the non-direct get message.

Libraries that already parsed the non RFC3339Nano time format
can be updated since none should have been released yet (since
the feature in the server is not released yet)

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-10 12:53:08 -06:00
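The difference between the two layouts mentioned in 1f428310b0 can be seen with a short sketch. The `formatTimestamp` and `roundTrips` helpers and the sample instant are made up for the example; `time.Time.MarshalJSON` produces the same RFC3339Nano text, wrapped in double quotes.

```go
package main

import (
	"fmt"
	"time"
)

// sampleTime is an arbitrary fixed instant used for the demonstration.
var sampleTime = time.Date(2022, 8, 10, 12, 53, 8, 123456789, time.UTC)

// formatTimestamp renders t in the RFC3339Nano layout.
func formatTimestamp(t time.Time) string {
	return t.Format(time.RFC3339Nano)
}

// roundTrips reports whether the formatted value parses back to the same instant.
func roundTrips(t time.Time) bool {
	parsed, err := time.Parse(time.RFC3339Nano, formatTimestamp(t))
	return err == nil && parsed.Equal(t)
}

func main() {
	fmt.Println(sampleTime.String())         // layout of time.Time.String()
	fmt.Println(formatTimestamp(sampleTime)) // RFC3339Nano layout
	fmt.Println(roundTrips(sampleTime))
}
```

The `String()` form is meant for debugging output, so a header built from it is awkward for clients to parse, whereas the RFC3339Nano form parses directly with `time.Parse`.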
Matthias Hanel
f1d42646fe bump version to 2.9.0-RC.2 (#3357)
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-10 19:17:06 +02:00
Matthias Hanel
c26e915c5b adding source/mirror unit tests (#3352)
* adding source/mirror unit tests

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-10 19:01:45 +02:00
Matthias Hanel
c6e37cf7af Fix race between stream stop and monitorStream (#3350)
* Fix race between stream stop and monitorStream

monitorCluster stops the stream. When doing so, monitorStream
needs to be stopped as well to avoid miscounting the store size.
In a test, the stop and reset of the store size happened first and
was then followed by storing more messages via monitorStream.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-10 19:01:21 +02:00
Matthias Hanel
7015e46dd9 fix move cancel issue where tags and peers diverge (#3354)
This can happen if the move was initiated by the user.
A subsequent cancel resets the initial peer list.
The original peer list was picked on the old set of tags.
A cancel would then keep the new list of tags but reset
to the old peers. Thus tags and peers diverge.

The problem is that at the time of cancel, the old
placement tags can't be found anymore.

This fix causes cancel to remove the placement tags, if
the old peers do not satisfy the new placement tags.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-10 18:48:18 +02:00
Matthias Hanel
2cf2868406 fixed consumer restart on source filter update (#3355)
* fixed consumer restart on source filter update

When a stream source filter subject was updated, the internal consumer
was not re-created.

If the upstream stream contains a tail of previously filtered messages,
these will now be delivered

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-10 18:47:19 +02:00
Matthias Hanel
5588c3d0de Added check for source/mirror filter subjects (#3356)
* Added check for source/mirror filter subjects

When the origin stream exists, the source/mirror filter subject
will be checked against the stream subjects.
If there is no overlap, an error will be returned

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-10 18:46:52 +02:00
Derek Collison
6bc82bb4e6 Fix a data race
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-09 17:42:02 -05:00
Derek Collison
9a92d10cc9 Bump to 2.9.0-RC.1
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-09 17:29:15 -05:00
Derek Collison
9a61537b1e Merge pull request #3351 from nats-io/fs-kv
[IMPROVED] DirectGet performance and memory usage for large streams.
2022-08-09 15:27:39 -07:00
Derek Collison
8c04adc009 Improvements to filestore for large KVs.
Use better indexing for lookups: we used to do a simple linear scan backwards, now we track the first and last block.
Will expire the fss cache at will to reduce memory usage.

Signed-off-by: Derek Collison <derek@nats.io>
2022-08-09 15:51:13 -05:00
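The idea of tracking a first and last block instead of scanning can be sketched as below. This assumes the tracking is per subject, which the commit message does not spell out; the `blockRange` and `subjectIndex` types are hypothetical and do not mirror the filestore's data structures.

```go
package main

import "fmt"

// blockRange remembers the first and last block that hold a subject.
type blockRange struct{ first, last int }

// subjectIndex maps a subject to the blocks bounding its messages.
type subjectIndex map[string]blockRange

// record notes that a message for subject was written to the given block.
func (idx subjectIndex) record(subject string, block int) {
	r, ok := idx[subject]
	if !ok {
		idx[subject] = blockRange{first: block, last: block}
		return
	}
	if block < r.first {
		r.first = block
	}
	if block > r.last {
		r.last = block
	}
	idx[subject] = r
}

// lookup returns the bounding blocks without scanning any block.
func (idx subjectIndex) lookup(subject string) (blockRange, bool) {
	r, ok := idx[subject]
	return r, ok
}

func main() {
	idx := subjectIndex{}
	idx.record("kv.a", 3)
	idx.record("kv.b", 5)
	idx.record("kv.a", 7)
	fmt.Println(idx.lookup("kv.a"))
}
```

A "last value for key" lookup can then start at the recorded last block rather than walking backwards through every block.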
Ivan Kozlovic
a4bf4e87f6 Merge pull request #3326 from mfaizanse/health_endpoint_params
Added param options to /healthz endpoint
2022-08-09 08:49:22 -06:00
Muhammad Faizan
1634f33de7 Added param options to /healthz endpoint
2022-08-09 08:32:54 +02:00
Ivan Kozlovic
502e5b13f7 Declare some catchup static errors
Use `var .. = errors.New()`.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-08 17:51:31 -06:00
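The `var .. = errors.New()` pattern from 502e5b13f7 looks roughly like this. The error names and the `runCatchup` function are invented for the example and are not the ones declared in the server.

```go
package main

import (
	"errors"
	"fmt"
)

// Static (sentinel) errors are created once at package level so callers
// can compare against them. These names are hypothetical.
var (
	errCatchupAborted = errors.New("catchup aborted")
	errCatchupStalled = errors.New("catchup stalled")
)

// runCatchup wraps a static error with context using %w.
func runCatchup(abort bool) error {
	if abort {
		return fmt.Errorf("stream ORDERS: %w", errCatchupAborted)
	}
	return nil
}

// isAborted checks for the sentinel even when it has been wrapped.
func isAborted(err error) bool {
	return errors.Is(err, errCatchupAborted)
}

func main() {
	err := runCatchup(true)
	fmt.Println(err, isAborted(err), errors.Is(err, errCatchupStalled))
}
```

Compared with calling `errors.New` at each return site, a package-level value gives every occurrence the same identity, which is what makes `errors.Is` comparisons possible.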
Ivan Kozlovic
ecddb08469 [IMPROVED] JetStream catchup can be aborted and better flow control
If the leader sends messages but the follower for any reason aborts
or retries the snapshot process, the follower will now send the error
that caused this, and the leader can then abort the catchup instead of
waiting for its inactivity threshold of 5 seconds.

Also delay the sending of a batch until the number
of "acks" is 1/2 of the batch size or until 100ms have elapsed. This
helps avoid trickling of messages. Tested with the new test
TestJetStreamSuperClusterStreamCathupLongRTT(): batch sizes are
better, and the overall time is smaller or similar, never longer.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-08 17:19:36 -06:00
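The "half the batch acked or 100ms, whichever comes first" rule in ecddb08469 could be sketched like this. The `waitForAcks` function, the ack channel, and the `maxBatchDelay` constant are assumptions for the example and do not reflect how the server tracks acks internally.

```go
package main

import (
	"fmt"
	"time"
)

// maxBatchDelay is the upper bound on how long the next batch is held back.
const maxBatchDelay = 100 * time.Millisecond

// waitForAcks blocks until half of batchSize acks have arrived on acks or
// maxWait has elapsed, and returns the number of acks seen.
func waitForAcks(acks <-chan struct{}, batchSize int, maxWait time.Duration) int {
	need := batchSize / 2
	timer := time.NewTimer(maxWait)
	defer timer.Stop()
	seen := 0
	for seen < need {
		select {
		case <-acks:
			seen++
		case <-timer.C:
			return seen // timed out: send the next batch anyway
		}
	}
	return seen
}

func main() {
	acks := make(chan struct{}, 10)
	for i := 0; i < 5; i++ {
		acks <- struct{}{}
	}
	fmt.Println(waitForAcks(acks, 10, maxBatchDelay)) // enough acks already queued
	fmt.Println(waitForAcks(acks, 10, maxBatchDelay)) // no acks left, waits for the timeout
}
```

Waiting for a fraction of the acks lets the sender build a larger next batch instead of sending one message per ack, while the timeout keeps a slow link from stalling the catchup.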
Derek Collison
c4abba4ed5 Bump to 2.9.0-beta.22
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 10:33:09 -07:00
Derek Collison
06112d6885 Reset activity interval on catchup to default vs ramp up. Tweak test.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
Derek Collison
906afccb8a Make a check loop based on review feedback.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
Derek Collison
33526f4d93 Make sure empty msgs do not interfere with catchup process.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
Derek Collison
758b733d43 Attempt to improve long RTT catchup time during stream moves.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
Derek Collison
e635de7526 Additional stability improvements for catchup.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
Derek Collison
3407112292 Write lock not needed
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
Derek Collison
d54899de0a No need to hold server write lock since sendq has its own.
While investigating a catchup bug, I noticed some contention on the server write lock.
Medium term we could have a separate lock; longer term, formal client support in the server will alleviate this.

Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
Derek Collison
a5119008a5 Fix up some processing during account purge to fix flapping tests
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
Derek Collison
5a050fc10b Improve handling when a snapshot represents state we no longer have.
We would send skip messages for a sync request that was completely below our current state, but this could be more traffic than we might want.
Now we only send EOF and the other side can detect the skip forward and adjust on a successful catchup.
We still send skips if we can partially fill the sync request.

Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:08 -06:00
Ivan Kozlovic
d96e801825 Change the report to something like this instead:
```
Replica: Server name unknown at this time (peerID: jZ6RvVRH), outdated, OFFLINE, not seen
```
After discussing with @ripienaar, this text better conveys
that this is a transient situation.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-08 09:30:37 -06:00
Ivan Kozlovic
267e6d1958 [IMPROVED] Replicas ordering and info regarding unknown in stream info
If a cluster is brought down and then partially restarted, the
replica information about the non-restarted node would be completely
missing. The CLI could report 3 replicas but then list only the leader
and the running replicas, with nothing about the other node.
Since this node's server name is not known, this PR adds an entry
similar to this:
```
<unknown (peerID: jZ6RvVRH)>, outdated, OFFLINE, not seen
```

Also, the replicas array is now ordered, which helps when using
a watcher or repeating stream info commands because the replicas
output stays stable.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-07 18:54:26 -06:00
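A stable replica listing, as described in 267e6d1958, can be obtained by sorting before reporting. The sketch below assumes ordering by name, which the commit message does not specify; the `replica` type is a hypothetical, trimmed-down stand-in.

```go
package main

import (
	"fmt"
	"sort"
)

// replica is a hypothetical view of one entry in a stream's replica list.
type replica struct {
	Name    string
	Offline bool
}

// sortReplicas orders the list by name in place so that repeated
// reports print the replicas in the same order every time.
func sortReplicas(rs []replica) {
	sort.Slice(rs, func(i, j int) bool { return rs[i].Name < rs[j].Name })
}

func main() {
	rs := []replica{
		{Name: "n3"},
		{Name: "<unknown (peerID: jZ6RvVRH)>", Offline: true},
		{Name: "n2"},
	}
	sortReplicas(rs)
	for _, r := range rs {
		fmt.Println(r.Name, r.Offline)
	}
}
```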
Marco Primi
be460b7bf1 Exclude chaos tests from build by default
Before: build chaos tests unless `skip_js_chaos_tests` is set
After: exclude chaos tests unless `js_chaos_tests` is set
2022-08-05 15:20:09 -07:00
Marco Primi
815948f02f Exclude chaos tests helpers from default build
2022-08-05 15:20:09 -07:00
Matthias Hanel
52c4872666 better error when peer selection fails (#3342)
* better error when peer selection fails

It is pretty hard to diagnose what went wrong when not enough peers for
an operation were found. This change now returns counts of the reasons why
peers were discarded.

Changed the error to JSClusterNoPeers as it seems more appropriate
for that operation. Not having enough resources is one of
the conditions for a peer not being considered, but so is having a
non-matching tag, which is why JSClusterNoPeers seems more appropriate.
In addition, JSClusterNoPeers was already used as the error after one call
to selectPeerGroup.

example:
no suitable peers for placement: peer selection cluster 'C' with 3 peers
offline: 0
excludeTag: 1
noTagMatch: 2
noSpace: 0
uniqueTag: 0
misc: 0

Example for mqtt:
mid:12 - "mqtt" - unable to connect: create sessions stream for account "$G":
no suitable peers for placement: peer selection cluster 'MQTT' with 3 peers
        offline: 0
        excludeTag: 0
        noTagMatch: 0
        noSpace: 0
        uniqueTag: 0
        misc: 0
         (10005)

Signed-off-by: Matthias Hanel <mh@synadia.com>

* review comment

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-06 00:17:01 +02:00
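An error that carries per-reason counters, in the spirit of 52c4872666, might be modeled as follows. The `selectionFailure` type and its field names are taken from the labels in the example output above but are otherwise hypothetical; this is not the server's error type.

```go
package main

import "fmt"

// selectionFailure counts why candidate peers were discarded.
type selectionFailure struct {
	cluster    string
	peers      int
	offline    int
	excludeTag int
	noTagMatch int
	noSpace    int
	uniqueTag  int
	misc       int
}

// Error renders the counters one per line, similar to the commit's example.
func (f *selectionFailure) Error() string {
	return fmt.Sprintf(
		"peer selection cluster '%s' with %d peers\n"+
			"offline: %d\nexcludeTag: %d\nnoTagMatch: %d\n"+
			"noSpace: %d\nuniqueTag: %d\nmisc: %d",
		f.cluster, f.peers, f.offline, f.excludeTag, f.noTagMatch,
		f.noSpace, f.uniqueTag, f.misc)
}

func main() {
	var err error = &selectionFailure{cluster: "C", peers: 3, excludeTag: 1, noTagMatch: 2}
	fmt.Println("no suitable peers for placement:", err)
}
```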
Ivan Kozlovic
653b739fa1 Use filepath.Join() instead of manual concatenation
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-05 14:41:23 -06:00
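The benefit of `filepath.Join()` over manual concatenation, as in 653b739fa1, is that it uses the platform's separator and cleans the result. The `storeFile` helper and the file names below are made up for the example.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// storeFile builds the path of a file inside a directory.
// filepath.Join inserts the OS-specific separator and collapses
// duplicate separators, which plain string concatenation does not.
func storeFile(dir, name string) string {
	return filepath.Join(dir, name)
}

func main() {
	// Manual concatenation keeps the doubled separator.
	fmt.Println("data/" + "/" + "state.dat")
	// Join yields the same cleaned path with or without a trailing separator.
	fmt.Println(storeFile("data/", "state.dat"))
	fmt.Println(storeFile("data", "state.dat"))
}
```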