
5461 Commits

Author SHA1 Message Date
Neil Twigg
af2ff3d17d Fix lock issue in filestore
Signed-off-by: Neil Twigg <neil@nats.io>
2023-08-31 15:16:15 +01:00
Derek Collison
adef8281a2 Updates to the way meta indexing is handled for filestore.
Historically we kept indexing information, either by sequence or by subject, as a per msg block operation. These were the "*.idx" and "*.fss" indexing files. When streams became very large, this could increase recovery time. Also, with encryption, the fast path for determining whether the indexing was current required loading and decrypting the complete block.

This design moves to a more traditional WAL and snapshot approach. The snapshots for the complete stream, including summary information, global per-subject information maps (PSIM), and per msg block details including summary and dmap, are processed asynchronously. The snapshot includes the msg block and hash for the last record considered in the snapshot. On recovery the snapshot is read and processed, and then any additional records past the point of the snapshot are processed. To this end, any removal of a message has to be expressed as a delete tombstone that is always added to the fs.lmb file. These tombstones are processed on recovery, and our indexing layer knows to skip them.

Changing to this method drastically improves startup and recovery times, and has simplified the code. Some normal performance benefits have been seen as well.
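A minimal sketch of the recovery idea, assuming a hypothetical in-memory log: load a snapshot, then replay only the records written after it, treating delete tombstones as index updates rather than messages. The `record` and `snapshot` types, the `applied` counter (standing in for the real msg block and hash marker) and the PSIM-like map are illustrative only, not the filestore's actual structures.

```go
package main

import "fmt"

// record is a hypothetical WAL entry: either a stored message or a delete
// tombstone naming the sequence that was removed.
type record struct {
	seq       uint64 // message sequence (for a tombstone, the removed sequence)
	subj      string // subject of a stored message; empty for tombstones
	tombstone bool
}

// snapshot is a hypothetical index snapshot. applied counts how many WAL
// records it already reflects, standing in for the real "last record" marker.
type snapshot struct {
	applied int
	psim    map[string]uint64 // per-subject message counts
	subjOf  map[uint64]string // live sequence -> subject
}

// recoverIndex copies the snapshot state and replays only the records after it.
func recoverIndex(snap snapshot, wal []record) snapshot {
	st := snapshot{
		psim:   make(map[string]uint64, len(snap.psim)),
		subjOf: make(map[uint64]string, len(snap.subjOf)),
	}
	for k, v := range snap.psim {
		st.psim[k] = v
	}
	for k, v := range snap.subjOf {
		st.subjOf[k] = v
	}
	start := snap.applied
	if start > len(wal) {
		start = len(wal) // nothing newer than the snapshot to replay
	}
	for _, r := range wal[start:] {
		if r.tombstone {
			// Tombstones are index updates, not messages: apply the removal only.
			if subj, ok := st.subjOf[r.seq]; ok {
				delete(st.subjOf, r.seq)
				st.psim[subj]--
				if st.psim[subj] == 0 {
					delete(st.psim, subj)
				}
			}
			continue
		}
		st.subjOf[r.seq] = r.subj
		st.psim[r.subj]++
	}
	st.applied = len(wal)
	return st
}

func main() {
	wal := []record{
		{seq: 1, subj: "orders.new"},
		{seq: 2, subj: "orders.new"},
		{seq: 3, subj: "orders.paid"},
		{seq: 1, tombstone: true},
	}
	// Snapshot written after the first two records.
	snap := snapshot{
		applied: 2,
		psim:    map[string]uint64{"orders.new": 2},
		subjOf:  map[uint64]string{1: "orders.new", 2: "orders.new"},
	}
	st := recoverIndex(snap, wal)
	fmt.Println(st.psim["orders.new"], st.psim["orders.paid"], len(st.subjOf)) // 1 1 2
}
```

Only the two records past the snapshot are touched on recovery, which is where the startup gain in this design comes from.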

Signed-off-by: Derek Collison <derek@nats.io>
2023-08-30 16:12:45 -07:00
Neil Twigg
8d194e8bf9 Tweak TestJetStreamClusterMetaSnapshotsMultiChange and TestJetStreamClusterStreamUpdateSyncBug
Signed-off-by: Neil Twigg <neil@nats.io>
2023-08-30 15:49:50 +01:00
Derek Collison
7f884062d1 Merge branch 'main' into dev 2023-08-29 20:01:26 -07:00
Derek Collison
e4a1b81d30 Fix on rebuild first when rebuild results in empty block (from dev branch)
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-29 19:50:33 -07:00
Derek Collison
acfb593ed5 Merge branch 'main' into dev 2023-08-29 16:48:04 -07:00
Derek Collison
8865c2a703 Fix for update to max msgs per where recalculating first was not checking for seq < mb.first.seq
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-29 16:02:52 -07:00
Derek Collison
a64f7a0d18 MQTT: Cleanup code regarding retain flag and add test (#4443)
As per specification MQTT-3.3.1-8, we now set the RETAIN flag
when delivering to new subscriptions and clear the flag in all other
conditions.
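A minimal sketch of the rule, assuming the MQTT 3.1.1 PUBLISH fixed-header layout (packet type in the high nibble, DUP at bit 3, QoS at bits 1-2, RETAIN at bit 0). The `publishFlags` function and its `fromNewSub` parameter are hypothetical and not the server's code.

```go
package main

import "fmt"

const (
	publishType byte = 0x30 // PUBLISH control packet type (3) in the high nibble
	dupFlag     byte = 0x08 // DUP is bit 3
	retainFlag  byte = 0x01 // RETAIN is bit 0
)

// publishFlags builds the first byte of an outgoing PUBLISH fixed header.
// fromNewSub is a hypothetical parameter: true when a retained message is
// being sent because the client just made a new subscription.
func publishFlags(qos byte, dup, fromNewSub bool) byte {
	b := publishType | (qos&0x03)<<1 // QoS occupies bits 1-2
	if dup {
		b |= dupFlag
	}
	if fromNewSub {
		b |= retainFlag // MQTT-3.3.1-8: set RETAIN for deliveries caused by a new subscription
	}
	// In every other case RETAIN stays 0, regardless of how the publisher set it.
	return b
}

func main() {
	fmt.Printf("%#x %#x\n", publishFlags(1, false, true), publishFlags(1, false, false)) // 0x33 0x32
}
```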

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-08-29 15:58:11 -07:00
Ivan Kozlovic
d6bc12d18b Since the server is connected to 2 servers and the pool size is 5,
the limit of 10 was too small.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-08-29 15:18:56 -06:00
Ivan Kozlovic
0d74453919 Fixed a route pooling flapper
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-08-29 14:20:36 -06:00
Ivan Kozlovic
8bd68b550d [FIXED] MQTT: Retain flag did not always have the correct value.
As per specification MQTT-3.3.1-8, we now set the RETAIN
flag when delivering to new subscriptions and clear the flag in
all other conditions.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2023-08-29 12:39:59 -06:00
Lev
dbd2cb61da [FIXED] MQTT: Removed the use of tkDomain from retained msg subjects (#4440)
(Partially?) addresses
https://github.com/nats-io/nats-server/pull/4349#discussion_r1306576048

@kozlovic @neilalexander I did not remove the use of `domainTk` in the
session subject since it seems to have significance to it; removing it
failed `TestMQTTSessionsDifferentDomains` and I did not understand the
specifics of the issue enough. Please let me know your thoughts.
2023-08-29 11:13:02 -07:00
Lev
bd93f087d4 [Added] MQTT: QoS2 support (#4349)
@derekcollison @neilalexander @kozlovic 

#### Summary

Adds MQTT QoS2 support

- [x] Resolves https://github.com/nats-io/nats-server/issues/3244
- [x] Tests added
- [x] Build is green in Travis CI
2023-08-29 11:09:49 -07:00
Lev Brouk
ad2e9d7b8d MQTT QoS2 support 2023-08-28 11:52:01 -07:00
Waldemar Quevedo
d366027bbf Fix resetting TLS name from solicited remotes
In Go 1.20+, x509.HostnameError is wrapped in a
tls.CertificateVerificationError, so sometimes the name would not
be reset, making tests extra flaky.

Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-08-28 10:09:55 -07:00
Lev Brouk
b9ea85b5d0 MQTT: Removed the use of tkDomain from retained msg subjects 2023-08-28 04:13:50 -07:00
Derek Collison
f50b772a14 Merge branch 'main' into dev 2023-08-27 14:20:45 -07:00
Derek Collison
b66a7f6e9f When expiring complete blocks make sure to update global subject index psim.
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-27 12:03:44 -07:00
Derek Collison
70bbf5081a Merge branch 'main' into dev
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-26 12:53:07 -07:00
Derek Collison
5b18e80d42 Added CORS support for the monitoring server (#4423)
- [x] Link to issue, e.g. `Resolves #NNN`
- [ ] Documentation added (if applicable)
- [x] Tests added
- [ ] Branch rebased on top of current main (`git pull --rebase origin main`)
- [ ] Changes squashed to a single commit (described [here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
- [ ] Build is green in Travis CI
- [x] You have certified that the contribution is your original work and that you license the work to the project under the [Apache 2 license](https://github.com/nats-io/nats-server/blob/main/LICENSE)

Resolves #4422 

### Changes proposed in this pull request:

- Added `Access-Control-Allow-Origin` header to allow CORS requests for
the monitoring server
- Added a check in the tests for the header when the `Content-Type` is
`application/json`
2023-08-25 14:49:09 -07:00
Derek Collison
0d135d4161 Bump to 2.9.22-RC.1
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-25 11:04:37 -07:00
Derek Collison
f1bf4127c5 Merge branch 'main' into dev 2023-08-25 11:03:54 -07:00
Derek Collison
e19f883120 [FIX] PurgeEx with keep and deleted bug (#4431)
Fix for purge with keep bug with user deletes and improved search for
large number of blocks.
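A minimal sketch of why user deletes matter for purge-with-keep, assuming the purge bound is derived from a list of surviving sequences. The `purgeUpTo` function is hypothetical and does not reproduce the actual fix or the block search; it only shows that treating sequences as dense keeps too few messages once there are interior deletes.

```go
package main

import "fmt"

// purgeUpTo returns the last sequence to purge and how many messages that
// removes, so that exactly keep messages remain. live must hold only the
// surviving (not user-deleted) sequences, in ascending order.
func purgeUpTo(live []uint64, keep int) (upTo uint64, purged int) {
	if keep < 0 {
		keep = 0
	}
	if keep >= len(live) {
		return 0, 0 // nothing to purge
	}
	n := len(live) - keep
	return live[n-1], n
}

func main() {
	live := []uint64{1, 2, 3, 5} // sequence 4 was deleted by the user
	naive := live[len(live)-1] - 2 // dense assumption: purging through 3 leaves only seq 5
	upTo, n := purgeUpTo(live, 2)
	fmt.Println(naive, upTo, n) // 3 2 2
}
```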

Signed-off-by: Derek Collison <derek@nats.io>
2023-08-25 11:03:32 -07:00
Derek Collison
e5625b9d9b If a leader is asked for an item and we have no items left, make sure to also step down.
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-25 10:20:07 -07:00
Derek Collison
22ed97c6c9 Fix for purge with keep bug and improved search for large number of blocks.
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-25 08:59:47 -07:00
Waldemar Quevedo
1417ca6671 Fix shutdown deadlock in TestJetStreamClusterMemLeaderRestart (#4430)
While shutting down a server, an error during purge from a memory stream
could sometimes cause a deadlock; this occasionally showed up in
`TestJetStreamClusterMemLeaderRestart` while tearing down the cluster.

This was introduced in
4d8d01949b,
so it only affects v2.10.
2023-08-25 07:41:23 -07:00
Tomasz Pietrek
6df4403913 Fix flaky TestJetStreamClusterConsumerFollowerStoreStateAckFloorBug
Signed-off-by: Tomasz Pietrek <tomasz@nats.io>
2023-08-25 15:31:20 +02:00
Waldemar Quevedo
f8b6728d3a Fix shutdown deadlock in TestJetStreamClusterMemLeaderRestart
While shutting down a server, an error during purge from a memory stream
could sometimes cause a deadlock; this occasionally showed up in
TestJetStreamClusterMemLeaderRestart while tearing down the cluster.

Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-08-25 01:23:24 -07:00
Derek Collison
fd50bc2918 Merge branch 'main' into dev 2023-08-24 21:10:22 -07:00
Derek Collison
2669f77190 Make sure to reset election timer on catching up
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-24 19:58:08 -07:00
Derek Collison
346c22788e Merge branch 'main' into dev 2023-08-24 16:20:46 -07:00
Derek Collison
48bf7ba151 When a consumer reached a max delivered condition, we did not properly synchronize the state, so on a restore or leader switch the ack pending count could jump above max ack pending and block the consumer.
This propagates a delivered update, and the store state engine now does the right thing when the condition is reached.

Signed-off-by: Derek Collison <derek@nats.io>
2023-08-24 16:00:27 -07:00
Pierre Mdawar
e5836fc98d Added CORS support for the monitoring server 2023-08-23 16:47:30 +03:00
Derek Collison
a04a3154af Bump to 2.10.0-beta.52
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-22 20:05:45 -07:00
Derek Collison
8544cb7adf Merge branch 'main' into dev
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-22 20:04:59 -07:00
Derek Collison
ddb7f9f9d5 Fix for a peer-remove of an R1 that would brick the stream.
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-22 17:45:19 -07:00
Waldemar Quevedo
2b2fbf7359 Bump to v2.9.22-beta.1
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-08-22 13:37:12 -07:00
Waldemar Quevedo
baa2805de9 Fix discarding explicit routes while removing duplicate ones (#4414)
In the new clustering logic for v2.10, the `TestStressChainedSolicitWorks`
test would sometimes flake because a node would end up with only implicit routes. In this change,
we mark one of the remotes as configured so that each node has at least one explicitly
configured remote node.
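A minimal sketch of the "stamp" idea, assuming duplicates are detected by remote server ID: when a duplicate route is dropped, its configured marker is carried over to the surviving route. The `route` type and `dedupRoutes` function are hypothetical and only illustrate the described behavior, not the server's route-handling code.

```go
package main

import "fmt"

// route is a hypothetical view of a route connection.
type route struct {
	remoteID   string // server ID of the peer
	connID     uint64
	configured bool // solicited from an explicitly configured URL
}

// dedupRoutes keeps the first route seen per remote. If a discarded duplicate
// was the configured one, the survivor is stamped as configured so the node
// does not end up with only implicit routes.
func dedupRoutes(routes []route) []route {
	idx := make(map[string]int)
	out := make([]route, 0, len(routes))
	for _, r := range routes {
		if i, ok := idx[r.remoteID]; ok {
			if r.configured {
				out[i].configured = true // carry the explicit marker over
			}
			continue
		}
		idx[r.remoteID] = len(out)
		out = append(out, r)
	}
	return out
}

func main() {
	rs := dedupRoutes([]route{
		{remoteID: "A", connID: 1},
		{remoteID: "A", connID: 2, configured: true},
		{remoteID: "B", connID: 3},
	})
	fmt.Println(len(rs), rs[0].connID, rs[0].configured) // 2 1 true
}
```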
2023-08-22 08:50:35 -07:00
Derek Collison
84536761a9 Bump to 2.9.22-beta
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-22 08:27:44 -07:00
Waldemar Quevedo
bdb874a6a8 Update LastActivity on connect for routes
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-08-22 07:10:30 -07:00
Derek Collison
bcf5da04e3 Merge branch 'main' into dev 2023-08-22 06:50:36 -07:00
Derek Collison
e5d208bf33 When moving streams, we could check too soon and land in a gap where the replica peer had not yet registered a catchup request.
This would cause us to incorrectly think the replica was caught up and drop our leadership, which would cancel any catchup requests.

Signed-off-by: Derek Collison <derek@nats.io>
2023-08-21 20:07:48 -07:00
Derek Collison
e088583cd3 Bump to 2.10.0-beta.50
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-21 15:59:53 -07:00
Derek Collison
f0e2765b44 Fixes for merge conflicts from main
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-21 15:55:31 -07:00
Derek Collison
fb8525b713 Merge branch 'main' into dev
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-21 15:55:00 -07:00
Derek Collison
2fc3f45ea1 [FIXED] Durable pull consumers could get cleaned up incorrectly on leader change. (#4412)
Fix for a bug that would allow old leaders of pull-based durables to
inadvertently delete a consumer from an inactivity threshold timer.

Signed-off-by: Derek Collison <derek@nats.io>
2023-08-21 15:35:44 -07:00
Derek Collison
6e3ae20650 [FIXED] Fixed deadlock when checkAndSync was being called as part of storing message (#4411)
We violated the locking pattern, so we now make sure we do this in a
separate goroutine, with checks so that it only runs once.
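A minimal sketch of the pattern, assuming a store guarded by a non-reentrant `sync.Mutex`: calling a routine that also needs the lock while already holding it would deadlock, so the work is handed to a goroutine, and an `atomic.Bool` (Go 1.19+) ensures only one is pending at a time. The `store` type is hypothetical; `checkAndSync` reuses the name from the commit title, but its body is invented.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// store is a hypothetical message store guarded by mu.
type store struct {
	mu      sync.Mutex
	msgs    int
	syncing atomic.Bool  // true while a background sync is pending or running
	syncs   atomic.Int32 // number of syncs performed
	wg      sync.WaitGroup
}

// storeMsg holds mu while storing. Calling checkAndSync inline here would try
// to lock mu again and deadlock, so it is started in a goroutine instead,
// and only if one is not already pending.
func (s *store) storeMsg() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.msgs++
	if s.syncing.CompareAndSwap(false, true) {
		s.wg.Add(1)
		go s.checkAndSync()
	}
}

// checkAndSync acquires mu on its own, outside the caller's critical section.
func (s *store) checkAndSync() {
	defer s.wg.Done()
	defer s.syncing.Store(false) // runs after mu is released
	s.mu.Lock()
	defer s.mu.Unlock()
	s.syncs.Add(1)
}

func main() {
	s := &store{}
	for i := 0; i < 100; i++ {
		s.storeMsg()
	}
	s.wg.Wait()
	fmt.Println(s.msgs, s.syncs.Load() >= 1) // 100 true
}
```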

Signed-off-by: Derek Collison <derek@nats.io>
2023-08-21 15:28:58 -07:00
Waldemar Quevedo
673f654fbe Fix discarding explicit routes while removing duplicate ones
In the new clustering logic, the TestStressChainedSolicitWorks test would sometimes
fail because a node would end up with only implicit routes.
In this change, we mark one of the remotes as configured so that each node
has at least one explicitly configured remote node.

Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-08-21 15:16:08 -07:00
Derek Collison
0a86bf4a9a Should reset to false, not true when done
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-21 14:57:17 -07:00
Derek Collison
43314fd439 Fix for a bug that would allow old leaders of pull based durables to delete a consumer from an inactivity threshold.
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-21 14:53:09 -07:00