302 Commits

Author SHA1 Message Date
Derek Collison
e4ca15c2c3 Optimize locking for consumer info
Signed-off-by: Derek Collison <derek@nats.io>
2023-10-02 09:22:44 -07:00
Derek Collison
00839280fb [IMPROVED] Reduce contention for high connections in a JetStream enabled account with high API usage. (#4613)
Several strategies are used which are listed below.

1. Checking a RaftNode to see if it is the leader now uses atomics.
2. Checking if we are the JetStream meta leader from the server now uses
an atomic.
3. Accessing the JetStream context no longer requires a server lock,
uses atomic.Pointer.
4. Filestore syncBlocks would hold msgBlock locks during sync, now does
not.

Signed-off-by: Derek Collison <derek@nats.io>
2023-10-01 08:17:15 -07:00
Derek Collison
dba03dbc2f Optimizations to reduce contention for high connections in a JetStream enabled account with high API usage.
Several strategies which are listed below.

1. Checking a RaftNode to see if it is the leader now uses atomics.
2. Checking if we are the JetStream meta leader from the server now uses an atomic.
3. Accessing the JetStream context no longer requires a server lock, uses atomic.Pointer.
4. Filestore syncBlocks would hold msgBlock locks during sync, now does not.

Signed-off-by: Derek Collison <derek@nats.io>
2023-09-30 14:52:15 -07:00
Tomasz Pietrek
1f4b986125 Fix consumer info if consumer was closed
Co-authored-by: Derek Collison <derek@nats.io>
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-09-29 21:40:55 +02:00
Derek Collison
a7ca71017b When under load, concurrent stream creation of the same stream could return stream not found, which is odd.
Here we know that if we can't find the stream but have the stream assignment, this is a distinct possibility. So we wait, since not processed inline, to see if it appears.

Fixes TestJetStreamClusterParallelStreamCreation as well that could flap.

Signed-off-by: Derek Collison <derek@nats.io>
2023-09-27 18:05:43 -07:00
Derek Collison
9781025b40 Fix for data race accessing consumer assignment
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-15 16:21:12 -07:00
Derek Collison
4a5b76b0e8 Print out restore time for streams to nearest millisecond.
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-03 13:28:18 -07:00
Tomasz Pietrek
d105e68c96 Add consumer api action for create and update
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-08-07 08:28:21 +02:00
Derek Collison
42752ec551 Merge branch 'main' into dev
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-01 21:46:54 -07:00
Derek Collison
5c8db89506 Make sure we do not drift on accounting.
Three issues were found and resolved.

1. Purge replays after recovery could execute full purge.
2. Callback was registered without lock, which could lead to skew.
3. Cluster reset could stop stream store and recreate it, which could lead to double accounting.

Signed-off-by: Derek Collison <derek@nats.io>
2023-08-01 18:35:20 -07:00
Jean-Noël Moyne
449b27535e [ADDED] Support for multi-filter in stream sources (#4276)
- [X] Tests added
- [X] Branch rebased on top of current main (`git pull --rebase origin
main`)
- [X] Changes squashed to a single commit (described
[here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
 - [X] Build is green in Travis CI
- [X] You have certified that the contribution is your original work and
that you license the work to the project under the [Apache 2
license](https://github.com/nats-io/nats-server/blob/main/LICENSE)

### Changes proposed in this pull request:

Adds support for multi-filter (and associated transform destinations) to
stream sources

---------

Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
2023-08-01 10:50:11 -07:00
Derek Collison
4c41c765f7 Merge branch 'main' into dev 2023-07-26 22:30:17 -07:00
Derek Collison
9a8f846dbb Merge branch 'main' into dev 2023-07-26 22:22:34 -07:00
Tomasz Pietrek
4b72e37f27 Fix not validating single token filtered consumer
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
Signed-off-by: Neil Twigg <neil@nats.io>
2023-07-26 16:21:00 +01:00
Waldemar Quevedo
bbfeb2a887 Fix typo on internal function
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-07-22 20:40:26 -07:00
R.I.Pienaar
c24547eb4e Record the stream and consumer info timestamps (#4133)
This records the server time when info for streams and consumers are
created so that tools such as the nats cli can calculate time deltas for
last ack, last delivered and so forth in the context of the server
clock.

This will help aleviate problems with client devices experiencing clock
jitter that can show up in user interfaces as negative seconds since
last ack etc
2023-06-02 08:53:28 +03:00
Derek Collison
7760aa5107 Merge branch 'main' into dev 2023-05-16 14:01:57 -07:00
Savion
cd192f0e03 CHANGED - typo err 2023-05-16 16:41:52 +08:00
R.I.Pienaar
fb1d86d506 Record the stream and consumer info timestamps
This records the server time when info for streams and
consumers are created so that tools such as the nats cli
can calculate time deltas for last ack, last delivered and
so forth in the context of the server clock.

This will help aleviate problems with client devices experiencing
clock jitter that can show up in user interfaces as negative
seconds since last ack etc

Signed-off-by: R.I.Pienaar <rip@devco.net>
2023-05-05 18:53:09 +02:00
Derek Collison
ce0d8514be Merge branch 'main' into dev 2023-04-07 05:32:05 -07:00
Derek Collison
33451e5d56 Fix for API data race. (#4030)
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-06 10:45:29 -07:00
Derek Collison
dee5229f9b Fix data race
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-06 10:37:31 -07:00
Derek Collison
0d2269b1e9 Move stepdowns of streams and consumers to not be inline
Signed-off-by: Derek Collison <derek@nats.io>
2023-04-06 10:27:36 -07:00
Derek Collison
2463d83ad5 Merge branch 'main' into dev 2023-02-25 19:48:47 -08:00
Derek Collison
24c2f3b452 Improved performance of subjects details for stream info.
This version avoids all disk IO in the filestore version.

Signed-off-by: Derek Collison <derek@nats.io>
2023-02-24 17:22:18 -08:00
Derek Collison
7ce23c46ce Merge branch 'main' into dev 2023-02-21 08:34:08 -08:00
Neil Twigg
68961ffedd Refactor ipQueue to use generics, reduce allocations 2023-02-21 14:50:09 +00:00
Derek Collison
cac712b1d1 Merge branch 'main' into dev 2023-02-20 18:30:26 -08:00
Derek Collison
efa3bcc49d Parallel consumer creation could drop responses (create and info) and could also run monitorConsumer twice.
Signed-off-by: Derek Collison <derek@nats.io>
2023-02-18 05:16:05 -08:00
Tomasz Pietrek
b390163908 Make JetStream errors naming consistent
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-02-13 14:08:52 +01:00
Tomasz Pietrek
af338d0d59 Add multiple subject filters 2023-02-13 09:38:40 +01:00
Tomasz Pietrek
9ab52d4ba9 Add metadata to StreamConfig and ConsumerConfig
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-01-27 07:27:25 +01:00
Tomasz Pietrek
86da656fff Fix not validating single token filtered consumer
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-01-11 15:48:38 +01:00
Todd Beets
3fdfb8a12f Merge branch 'main' into ut-replacepeer
# Conflicts:
#	server/jetstream_cluster_3_test.go
2022-12-06 10:51:22 -08:00
Todd Beets
ef27d4d534 tag policies not honored in reassignment after peer remove 2022-12-04 20:39:11 -08:00
Derek Collison
5f7c8e21a2 Fixed issues with multiple concurrent stream create requests.
First issue was applications not getting any response.
However, there was also a more serious issue that would create multiple raft groups for each concurrent request.
The servers would only run one stream monitor loop, however they would update the state to the new raft group's name, so on server restart the stream would be using a different raft group then existing servers.

Signed-off-by: Derek Collison <derek@nats.io>
2022-12-04 19:13:51 -08:00
Ivan Kozlovic
6ffa6d1e4b [FIXED] JetStream: possible panic on stream info when leader not elected
It is possible that a stream info request would be handled at a
time where the raft group would not yet be set/created, causing
a panic.

Resolves #3626 (at least the panic reports there)

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-11-15 11:56:41 -07:00
Ivan Kozlovic
3358247e6b Added warning if internal sub callback takes too long
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-10-10 14:39:37 -06:00
Derek Collison
fef702a688 [FIXED] bug in consumer names paging, did not honor limits and returned duplicate results.
Signed-off-by: Derek Collison <derek@nats.io>
2022-09-29 06:14:00 -07:00
Ivan Kozlovic
170ff49837 [ADDED] JetStream: peer (the hash of server name) in statsz/jsz
A request to `$SYS.REQ.SERVER.PING.JSZ` would now return something
like this:
```
...
    "meta_cluster": {
      "name": "local",
      "leader": "A",
      "peer": "NUmM6cRx",
      "replicas": [
        {
          "name": "B",
          "current": true,
          "active": 690369000,
          "peer": "b2oh2L6w"
        },
        {
          "name": "Server name unknown at this time (peerID: jZ6RvVRH)",
          "current": false,
          "offline": true,
          "active": 0,
          "peer": "jZ6RvVRH"
        }
      ],
      "cluster_size": 3
    }
```
Note the "peer" field following the "leader" field that contains
the server name. The new field is the node ID, which is a hash of
the server name.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-09-16 15:31:37 -06:00
Ivan Kozlovic
f113163b9f Change ByID boolean to Peer string and add Peer id in replicas output
The CLI will now be able to display the peer IDs in MetaGroupInfo
if it choses to do so, and possibly help user select the peer ID
from a list with a new command to remove by peer ID instead of
by server name.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-09-15 10:39:23 -06:00
Ivan Kozlovic
e1f0361b98 [ADDED] JetStream: ability to remove a server by peer ID instead of name
This can be helpful after a partial cluster restart since in that
case the server name may not be known. However "server report jetstream"
would report the peer ID that then can be used.

For instance here is the output after a cluster restart where server "C"
is not restarted.

```
nats -s nats://sys:pwd@localhost:4222 server report jetstream
...
╭────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                  RAFT Meta Group Information                                   │
├─────────────────────────────────────────────────────┬────────┬─────────┬────────┬────────┬─────┤
│ Name                                                │ Leader │ Current │ Online │ Active │ Lag │
├─────────────────────────────────────────────────────┼────────┼─────────┼────────┼────────┼─────┤
│ A                                                   │ yes    │ true    │ true   │ 0.00s  │ 0   │
│ B                                                   │        │ true    │ true   │ 0.53s  │ 0   │
│ Server name unknown at this time (peerID: jZ6RvVRH) │        │ false   │ false  │ 0.00s  │ 0   │
╰─────────────────────────────────────────────────────┴────────┴─────────┴────────┴────────┴─────╯
```

With a change to the NATS CLI we could have something like:
```
nats -s nats://sys:pwd@localhost:4222 server raft peer-remove jZ6RvVRH --by_id
```

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-09-14 18:10:26 -06:00
Matthias Hanel
f7cb5b1f0d changed format of JSClusterNoPeers error (#3459)
* changed format of JSClusterNoPeers error

This error was introduced in #3342 and reveals to much information
This change gets rid of cluster names and peer counts.

All other counts where changed to booleans,
which are only included in the output when the filter was hit.

In addition, the set of not matching tags is included.
Furthermore, the static error description in server/errors.json 
is moved into selectPeerError

sample errors:
1) no suitable peers for placement, tags not matched ['cloud:GCP', 'country:US']"
2) no suitable peers for placement, insufficient storage

Signed-off-by: Matthias Hanel <mh@synadia.com>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Co-authored-by: Ivan Kozlovic <ivan@synadia.com>
2022-09-08 18:25:48 -07:00
jnmoyne
95c1946231 Implements pagination for JS Stream Info requests 2022-09-08 10:45:20 -07:00
Derek Collison
c3203a3bb5 Use lostQuorum default versus live for reporting.
Signed-off-by: Derek Collison <derek@nats.io>
2022-09-07 13:56:38 -07:00
Derek Collison
98bf861a7a Updates to stream and consumer move logic.
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-30 16:11:35 -07:00
Derek Collison
aa94a0bc0f New consumer create that allows elevation of stream and consumer names, and optional filter subject to the request subject.
Similar to changes in direct get allows proper security if needed for filter subject selection.

Signed-off-by: Derek Collison <derek@nats.io>
2022-08-30 09:29:38 -07:00
Matthias Hanel
7015e46dd9 fix move cancel issue where tags and peers diverge (#3354)
This can happen if the move was initiated by the user.
A subsequent cancel resets the initial peer list.
The original peer list was picked on the old set of tags.
A cancel would then keep the new list of tags but reset
to the old peers. Thus tags and peers diverge.

The problem is that at the time of cancel, the old
placement tags can't be found anymore.

This fix causes cancel to remove the placement tags, if
the old peers do not satisfy the new placement tags.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-10 18:48:18 +02:00
Derek Collison
a5119008a5 Fix up some processing during account purge to fix flapping tests
Signed-off-by: Derek Collison <derek@nats.io>
2022-08-08 11:06:10 -06:00
Matthias Hanel
52c4872666 better error when peer selection fails (#3342)
* better error when peer selection fails

It is pretty hard to diagnose what went wrong when not enough peers for
an operation where found. This change now returns counts of reasons why
peers where discarded.

Changed the error to JSClusterNoPeers as it seems more appropriate
of an error for that operation. Not having enough resources is one of
the conditions for a peer not being considered. But so is having a non
matching tag. Which is why JSClusterNoPeers seems more appropriate
In addition, JSClusterNoPeers was already used as error after one call
to selectPeerGroup already.

example:
no suitable peers for placement: peer selection cluster 'C' with 3 peers
offline: 0
excludeTag: 1
noTagMatch: 2
noSpace: 0
uniqueTag: 0
misc: 0

Examle for mqtt:
mid:12 - "mqtt" - unable to connect: create sessions stream for account "$G":
no suitable peers for placement: peer selection cluster 'MQTT' with 3 peers
        offline: 0
        excludeTag: 0
        noTagMatch: 0
        noSpace: 0
        uniqueTag: 0
        misc: 0
         (10005)

Signed-off-by: Matthias Hanel <mh@synadia.com>

* review comment

Signed-off-by: Matthias Hanel <mh@synadia.com>
2022-08-06 00:17:01 +02:00