Commit Graph

279 Commits

Author SHA1 Message Date
Derek Collison
2463d83ad5 Merge branch 'main' into dev 2023-02-25 19:48:47 -08:00
Derek Collison
24c2f3b452 Improved performance of subjects details for stream info.
This version avoids all disk IO in the filestore version.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-24 17:22:18 -08:00
Derek Collison
7ce23c46ce Merge branch 'main' into dev 2023-02-21 08:34:08 -08:00
Neil Twigg
68961ffedd Refactor ipQueue to use generics, reduce allocations 2023-02-21 14:50:09 +00:00
Derek Collison
cac712b1d1 Merge branch 'main' into dev 2023-02-20 18:30:26 -08:00
Derek Collison
efa3bcc49d Parallel consumer creation could drop responses (create and info) and could also run monitorConsumer twice.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-18 05:16:05 -08:00
Tomasz Pietrek
b390163908 Make JetStream errors naming consistent
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-02-13 14:08:52 +01:00
Tomasz Pietrek
af338d0d59 Add multiple subject filters 2023-02-13 09:38:40 +01:00
Tomasz Pietrek
9ab52d4ba9 Add metadata to StreamConfig and ConsumerConfig
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-01-27 07:27:25 +01:00
Tomasz Pietrek
86da656fff Fix not validating single token filtered consumer
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-01-11 15:48:38 +01:00
Todd Beets
3fdfb8a12f Merge branch 'main' into ut-replacepeer
# Conflicts:
#	server/jetstream_cluster_3_test.go
2022-12-06 10:51:22 -08:00
Todd Beets
ef27d4d534 tag policies not honored in reassignment after peer remove 2022-12-04 20:39:11 -08:00
Derek Collison
5f7c8e21a2 Fixed issues with multiple concurrent stream create requests.
First issue was applications not getting any response.
However, there was also a more serious issue that would create multiple raft groups for each concurrent request.
The servers would only run one stream monitor loop, however they would update the state to the new raft group's name, so on server restart the stream would be using a different raft group then existing servers.

Signed-off-by: Derek Collison <derek@nats.io>
2022-12-04 19:13:51 -08:00
Ivan Kozlovic
6ffa6d1e4b [FIXED] JetStream: possible panic on stream info when leader not elected
It is possible that a stream info request would be handled at a
time where the raft group would not yet be set/created, causing
a panic.

Resolves #3626 (at least the panic reports there)

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-11-15 11:56:41 -07:00
Ivan Kozlovic
3358247e6b Added warning if internal sub callback takes too long
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-10-10 14:39:37 -06:00
Derek Collison
fef702a688 [FIXED] bug in consumer names paging, did not honor limits and returned duplicate results.
Signed-off-by: Derek Collison <derek@nats.io>
2022-09-29 06:14:00 -07:00
Ivan Kozlovic
170ff49837 [ADDED] JetStream: peer (the hash of server name) in statsz/jsz
A request to `$SYS.REQ.SERVER.PING.JSZ` would now return something
like this:
```
...
    "meta_cluster": {
      "name": "local",
      "leader": "A",
      "peer": "NUmM6cRx",
      "replicas": [
        {
          "name": "B",
          "current": true,
          "active": 690369000,
          "peer": "b2oh2L6w"
        },
        {
          "name": "Server name unknown at this time (peerID: jZ6RvVRH)",
          "current": false,
          "offline": true,
          "active": 0,
          "peer": "jZ6RvVRH"
        }
      ],
      "cluster_size": 3
    }
```
Note the "peer" field following the "leader" field that contains
the server name. The new field is the node ID, which is a hash of
the server name.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-09-16 15:31:37 -06:00
Ivan Kozlovic
f113163b9f Change ByID boolean to Peer string and add Peer id in replicas output
The CLI will now be able to display the peer IDs in MetaGroupInfo
if it choses to do so, and possibly help user select the peer ID
from a list with a new command to remove by peer ID instead of
by server name.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-09-15 10:39:23 -06:00
Ivan Kozlovic
e1f0361b98 [ADDED] JetStream: ability to remove a server by peer ID instead of name
This can be helpful after a partial cluster restart since in that
case the server name may not be known. However "server report jetstream"
would report the peer ID that then can be used.

For instance here is the output after a cluster restart where server "C"
is not restarted.

```
nats -s nats://sys:pwd@localhost:4222 server report jetstream
...
╭────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                  RAFT Meta Group Information                                   │
├─────────────────────────────────────────────────────┬────────┬─────────┬────────┬────────┬─────┤
│ Name                                                │ Leader │ Current │ Online │ Active │ Lag │
├─────────────────────────────────────────────────────┼────────┼─────────┼────────┼────────┼─────┤
│ A                                                   │ yes    │ true    │ true   │ 0.00s  │ 0   │
│ B                                                   │        │ true    │ true   │ 0.53s  │ 0   │
│ Server name unknown at this time (peerID: jZ6RvVRH) │        │ false   │ false  │ 0.00s  │ 0   │
╰─────────────────────────────────────────────────────┴────────┴─────────┴────────┴────────┴─────╯
```

With a change to the NATS CLI we could have something like:
```
nats -s nats://sys:pwd@localhost:4222 server raft peer-remove jZ6RvVRH --by_id
```

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-09-14 18:10:26 -06:00
Matthias Hanel
f7cb5b1f0d changed format of JSClusterNoPeers error (#3459)
* changed format of JSClusterNoPeers error

This error was introduced in #3342 and reveals to much information
This change gets rid of cluster names and peer counts.

All other counts where changed to booleans,
which are only included in the output when the filter was hit.

In addition, the set of not matching tags is included.
Furthermore, the static error description in server/errors.json 
is moved into selectPeerError

sample errors:
1) no suitable peers for placement, tags not matched ['cloud:GCP', 'country:US']"
2) no suitable peers for placement, insufficient storage

Signed-off-by: Matthias Hanel <mh@synadia.com>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Co-authored-by: Ivan Kozlovic <ivan@synadia.com>
2022-09-08 18:25:48 -07:00
jnmoyne
95c1946231 Implements pagination for JS Stream Info requests 2022-09-08 10:45:20 -07:00
Derek Collison
c3203a3bb5 Use lostQuorum default versus live for reporting.
Signed-off-by: Derek Collison <derek@nats.io>
2022-09-07 13:56:38 -07:00
Derek Collison
98bf861a7a Updates to stream and consumer move logic.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-30 16:11:35 -07:00
Derek Collison
aa94a0bc0f New consumer create that allows elevation of stream and consumer names, and optional filter subject to the request subject.
Similar to changes in direct get allows proper security if needed for filter subject selection.

Signed-off-by: Derek Collison <derek@nats.io>
2022-08-30 09:29:38 -07:00
Matthias Hanel
7015e46dd9 fix move cancel issue where tags and peers diverge (#3354)
This can happen if the move was initiated by the user.
A subsequent cancel resets the initial peer list.
The original peer list was picked on the old set of tags.
A cancel would then keep the new list of tags but reset
to the old peers. Thus tags and peers diverge.

The problem is that at the time of cancel, the old
placement tags can't be found anymore.

This fix causes cancel to remove the placement tags, if
the old peers do not satisfy the new placement tags.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-10 18:48:18 +02:00
Derek Collison
a5119008a5 Fix up some processing during account purge to fix flapping tests
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
Matthias Hanel
52c4872666 better error when peer selection fails (#3342)
* better error when peer selection fails

It is pretty hard to diagnose what went wrong when not enough peers for
an operation where found. This change now returns counts of reasons why
peers where discarded.

Changed the error to JSClusterNoPeers as it seems more appropriate
of an error for that operation. Not having enough resources is one of
the conditions for a peer not being considered. But so is having a non
matching tag. Which is why JSClusterNoPeers seems more appropriate
In addition, JSClusterNoPeers was already used as error after one call
to selectPeerGroup already.

example:
no suitable peers for placement: peer selection cluster 'C' with 3 peers
offline: 0
excludeTag: 1
noTagMatch: 2
noSpace: 0
uniqueTag: 0
misc: 0

Examle for mqtt:
mid:12 - "mqtt" - unable to connect: create sessions stream for account "$G":
no suitable peers for placement: peer selection cluster 'MQTT' with 3 peers
        offline: 0
        excludeTag: 0
        noTagMatch: 0
        noSpace: 0
        uniqueTag: 0
        misc: 0
         (10005)

Signed-off-by: Matthias Hanel <mh@synadia.com>

* review comment

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-06 00:17:01 +02:00
Ivan Kozlovic
d90854a45f Merge pull request #3341 from nats-io/go_1_19
Move to Go 1.19, remote io/util, fix data race and a flapper
2022-08-05 12:49:06 -06:00
Matthias Hanel
c56f3b9fbd Adding account purge operation (#3319)
* Adding account purge operation

The new request is available for the system account.
The subject to send the request to is $JS.API.ACCOUNT.PURGE.*
With the name of the account to purge instead of the wildcard.

Also added directory cleanup code such that server do not
end up with empty streams directories and account dirs that
only contain streams

Also adding ACCOUNT to leaf node domain rewrite table

Addresses #3186 and #3306 by providing a way to
get rid of the streams for existing and non existing accounts

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-05 18:24:19 +02:00
Ivan Kozlovic
3c9a7cc6e5 Move to Go 1.19, remote io/util, fix data race and a flapper
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-08-05 09:55:37 -06:00
Derek Collison
28ccaa4371 Direct get across a leafnode using cross domain mappings to a queue subscriber did not work.
The interest moved across the leafnode would be for the mapping, and not the actual qsub.
So when received if we did detect that we are mapped and do not have a queue filter present make sure to ignore.
This will allow queue subscriber processing on the local server that received the message from the leafnode.

Signed-off-by: Derek Collison <derek@nats.io>
2022-08-03 20:21:28 -07:00
Derek Collison
c82c49451c Allow direct get by subject to be all subject based.
This avoids marshalling or unmarshalling but also allows subject based permissioning.

Signed-off-by: Derek Collison <derek@nats.io>
2022-08-02 18:19:33 -07:00
Matthias Hanel
3358205de3 add implementation for consumer replica change (#3293)
* add implementation for consumer replica change

fixes #3262

also check peer list on every update

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-07-27 03:56:28 +02:00
Ivan Kozlovic
ebeca00e20 [FIXED] JetStream/Cluster: Stream names/infos would return bad response
If there are more stream names that the current limit of 1024, getting
the list of names would return them all instead of using pagination.

For "stream infos", the Total amount returned would be the API limit
instead of the actual number of streams.

Resolves https://github.com/nats-io/natscli/issues/541

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-07-25 14:41:05 -06:00
Ivan Kozlovic
1da5ecfb96 [IMPROVED] JetStream: stream already exists error description
The `JSStreamNameExistErr` will now include in the description that
the stream exists with a different configuration, because that is
the error clients would get when trying to add a stream with a
different configuration (otherwise this is a no-op and client
don't get an error).

Since that error was used in case of restore, a new error is added
but uses the same description prefix "stream name already in use"
but adds ", cannot restore" to indicate that this is a restore
failure because the stream already exists.

Resolves #3273

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-07-21 10:20:07 -06:00
Matthias Hanel
89b5e872ac Move and cancel fixes (#3270)
The Move/Cancel/Downscale mechanism did not take into account that
the consumer's replica count can be set independently.

This also alters peer selection to have the ability to skip 
unique tag prefix check for server that will be replaced.
Say you have 3 az, and want to add another server to az:1, 
in order to replace a server that is the same zone.
Without this change, uniqueTagPrefix check would filter 
the server to replace with and cause a failure.

The cancel move response could not be received due to 
the wrong account name.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-07-18 18:42:03 +02:00
Matthias Hanel
023500e1da add the ability to cancel a move in progress (#3253)
* add the ability to cancel a move in progress

Move to individual subjects for move and cancel_move

New subjects are:
$JS.API.ACCOUNT.STREAM.MOVE.*.*
$JS.API.ACCOUNT.STREAM.CANCEL_MOVE.*.*

last and second to last token are account and stream name

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-07-12 21:54:18 +02:00
Derek Collison
7ff534d1da Allow get next for json stream get version
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-08 07:19:15 -07:00
Matthias Hanel
70be4b77f9 fixes peer removal, simplifies move, more tests
Make sure when processing a peer removal that the stream assignment agrees.
When a new leader takes over it can resend a peer removal, and if the stream/consumer really was rescheduled we could remove by accident.

Also need to make sure that when we remove a stream we remove the node as part of the stream assignment.
If we didn't, if the same asset returned to this server we would not start up the monitoring loop.

Simplify migration logic in monitorStream, to be driven by leader only

Improved unit tests

Added failure when server not in peer list

Move command does not require server anymore

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-07-07 03:32:13 +02:00
Derek Collison
1ea608eabf Allows direct get to also do get next for subject with starting sequence
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-06 14:22:28 -07:00
Derek Collison
4075721651 Allow direct msg get for stream to operate in queue group and allows mirrors to opt-in to the same group.
Signed-off-by: Derek Collison <derek@nats.io>
2022-07-02 14:16:55 -07:00
R.I.Pienaar
97ad346c34 fix json tag for meta stream move
Signed-off-by: R.I.Pienaar <rip@devco.net>
2022-07-02 11:51:50 +02:00
Matthias Hanel
6bd14e1b7a removed commented out code (#3228)
Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-06-29 20:31:12 +02:00
Derek Collison
abc5905aa9 Merge pull request #3221 from nats-io/direct
Made direct get from a stream part of the $JS.API hierarchy vs separate.
2022-06-28 09:59:44 -07:00
Derek Collison
b8ef9b19a0 Made direct get from a stream part of the $JS.API hierarchy vs separate.
Also for direct get and for pull requests, if we are not on a client connection check how long we have been away from the readloop.
If need be execute in a separate go routine.

Signed-off-by: Derek Collison <derek@nats.io>
2022-06-28 08:53:48 -07:00
Matthias Hanel
3421c49310 [Add] ability for operator to move streams (#3217)
Also added:
ability to reload tags
special tag (!jetstream) to remove peer from peer placement
$JS.API.SERVER.STREAM.MOVE subject to initiate move away from a server

This changes a detail about regular stream move as well.
Before differing cluster names where used to start/stop a transfer.
Now only the peer list and it's size relative to configured replica matter.
Once a transfer is considered completed, excess peers will be dropped
from the beginning of the list.
This allows transfers within the cluster as well.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-06-28 02:36:32 +02:00
Derek Collison
1ade8fc881 When stream or consumer names contained path separators it prevented backup and restore.
Signed-off-by: Derek Collison <derek@nats.io>
2022-06-20 11:59:18 -07:00
Derek Collison
301eb11725 Merge pull request #3168 from nats-io/no_fds_imp
[IMPROVED] Loaded server and low on resources like FDs.
2022-06-06 06:01:20 -07:00
Derek Collison
e1c8f9fb55 This improves when a server is under load or low on resources like FDs and a user is trying to delete a stream with lots of consumers.
Signed-off-by: Derek Collison <derek@nats.io>
2022-06-04 16:49:17 -07:00
Derek Collison
c8a730ce55 Stream get for KV was going through API layer, but with popularity needed a more peformant and lighter weight and direct approach.
Signed-off-by: Derek Collison <derek@nats.io>
2022-05-30 16:34:54 -07:00