8378 Commits

Author SHA1 Message Date
Derek Collison
4df5f515ca Fix for filestore data race on hash during snapshots
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-31 19:38:09 -07:00
Derek Collison
cb8b94a9e9 Fixes to /healthz response (v2.10) (#4467)
Follow up from #4437 content-type fix for v2.9.22, some fixes to the
response from `/healthz` for dev:

- In #[3326](https://github.com/nats-io/nats-server/pull/4097) it was
changed to return a 500 status where we previously returned 503, so this
changes it back.
- Also as part of #3326 we started to return `status_code` in the
healthz response (e.g. `{"status":"ok","status_code":200}`), so this
removes it for HTTP responses and relies on the HTTP header alone.
2023-08-31 19:26:33 -07:00
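The response shape described in cb8b94a9e9 can be sketched roughly as follows. This is an illustration under stated assumptions, not the nats-server code: `healthStatus`, `healthzHandler` and `probe` are hypothetical names, and the `unavailable` status string is borrowed from the test description in #4437. The sketch only shows an unhealthy server answering 503 and a JSON body with no `status_code` field.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// healthStatus is a hypothetical body type; note there is no status_code field.
type healthStatus struct {
	Status string `json:"status"`
}

// healthzHandler reports 200 when healthy() is true and
// 503 Service Unavailable otherwise.
func healthzHandler(healthy func() bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		code, body := http.StatusOK, healthStatus{Status: "ok"}
		if !healthy() {
			code, body = http.StatusServiceUnavailable, healthStatus{Status: "unavailable"}
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(code)
		json.NewEncoder(w).Encode(body)
	}
}

// probe runs the handler in-process and returns the status code and body.
func probe(healthy bool) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/healthz", nil)
	healthzHandler(func() bool { return healthy })(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	for _, h := range []bool{true, false} {
		code, body := probe(h)
		fmt.Printf("%d %s", code, body)
	}
}
```

A client that wants the numeric status reads it from the HTTP response itself rather than from the JSON body.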
Derek Collison
83fab5c9a7 [FIXED] Unlock panic on start when filestore needs to remove msgs for enforcement. (#4469)
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-31 19:26:03 -07:00
Derek Collison
0ec42f85f0 Fix for merge issue that duplicated the index increment, causing snapshot tests to fail
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-31 18:51:34 -07:00
Derek Collison
411ac175fc Fixed: MQTT: more consistent name for PUBREL durable (#4466)
Resolves: no ticket

### Changes proposed in this pull request:

- rename PUBREL durable consumer from `<idhash>_pubrel` to
`$MQTT_PUBREL_<idhash>` for consistency with other durable consumer
names.
2023-08-31 17:15:00 -07:00
Derek Collison
60fa2d8781 Only have removeMsg release lock if it really has a callback.
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-31 16:56:40 -07:00
Derek Collison
9ff3261af2 On startup, make sure to hold the lock for enforcing limits, since removeMsg() may need to remove msgs.
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-31 16:56:35 -07:00
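The locking discipline touched by 60fa2d8781 and 9ff3261af2 might look roughly like the sketch below. Everything here is hypothetical (`store`, `onRemove`, `enforceLimit`, `run`); it is not the filestore implementation. It only illustrates the idea stated in the commit subjects: the caller holds the lock while enforcing limits, and the removal path releases it only when there really is a callback to invoke.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for a message store guarded by one mutex.
type store struct {
	mu       sync.Mutex
	msgs     map[uint64]string
	first    uint64
	onRemove func(seq uint64) // optional; may be nil
}

func newStore(n int, cb func(seq uint64)) *store {
	s := &store{msgs: make(map[uint64]string), first: 1, onRemove: cb}
	for seq := uint64(1); seq <= uint64(n); seq++ {
		s.msgs[seq] = "payload"
	}
	return s
}

// removeMsg must be called with s.mu held and returns with s.mu held.
func (s *store) removeMsg(seq uint64) bool {
	if _, ok := s.msgs[seq]; !ok {
		return false
	}
	delete(s.msgs, seq)
	// Release the lock only if a callback exists, and re-acquire afterwards.
	if cb := s.onRemove; cb != nil {
		s.mu.Unlock()
		cb(seq)
		s.mu.Lock()
	}
	return true
}

// enforceLimit takes the lock itself, so removeMsg's precondition holds.
func (s *store) enforceLimit(limit int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for len(s.msgs) > limit && len(s.msgs) > 0 {
		s.removeMsg(s.first)
		s.first++
	}
}

func (s *store) size() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.msgs)
}

// run builds a store whose callback re-enters the store, then enforces a limit.
func run(n, limit int) (int, []uint64) {
	var removed []uint64
	var s *store
	s = newStore(n, func(seq uint64) {
		_ = s.size() // would deadlock if removeMsg kept the lock held
		removed = append(removed, seq)
	})
	s.enforceLimit(limit)
	return s.size(), removed
}

func main() {
	left, removed := run(5, 2)
	fmt.Println(left, removed) // prints "2 [1 2 3]"
}
```

With a nil callback the lock is never dropped, so there is no window in which an unlock can run without a matching lock.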
Waldemar Quevedo
1f2d56a554 Fixes to http healthz monitoring response
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2023-08-31 16:05:09 -07:00
Derek Collison
b1a59a35e2 Bump to 2.10.0-beta.54
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-31 15:52:58 -07:00
Derek Collison
2bfa14d9bd Fix from main merge
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-31 15:52:36 -07:00
Derek Collison
49c30b6d2f Merge branch 'main' into dev
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-31 15:52:00 -07:00
Derek Collison
45e6812d70 [FIXED] Sending too fast to have replicas be caught up enough to register directs. (#4468)
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-31 15:43:14 -07:00
Derek Collison
afb052651a Sending too fast to have replicas be caught up enough to register direct subs
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-31 15:16:19 -07:00
Derek Collison
d7ea3b94d9 [FIXED] Check for checksum violations for all records and before any sequence processing. (#4465)
Also small bug fix for leaking fds under certain scenarios during
corruption.

Signed-off-by: Derek Collison <derek@nats.io>
2023-08-31 15:08:04 -07:00
Derek Collison
a45281d51f Added check to test
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-31 14:00:14 -07:00
Pierre Mdawar
6d6d3cfa55 Fix Content-Type header in /healthz when status is not 200 OK (#4437)
- Added a new internal function `handleResponse` that accepts the HTTP 
  status code and sets it after setting the headers
- Added tests for the `/healthz` endpoint for the `ok`, `error` and `unavailable` statuses
- Changed the IETF API health check URL to 
https://datatracker.ietf.org/doc/html/draft-inadarei-api-health-check

Resolves #4436
2023-08-31 13:55:20 -07:00
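The ordering issue behind 6d6d3cfa55 comes from how Go's `net/http` works: headers set after `WriteHeader` is called are not sent. A minimal sketch, assuming hypothetical helpers `respond`, `respondWrongOrder` and `record` (the commit message does not show the real signature of `handleResponse`):

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// respond sets headers first, then the status code, then the body.
func respond(w http.ResponseWriter, code int, data []byte) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(code)
	w.Write(data)
}

// respondWrongOrder shows the failure mode: the header is set too late.
func respondWrongOrder(w http.ResponseWriter, code int, data []byte) {
	w.WriteHeader(code)
	w.Header().Set("Content-Type", "application/json")
	w.Write(data)
}

// record calls fn against an in-memory recorder and returns the status
// code and Content-Type that a client would observe.
func record(fn func(http.ResponseWriter, int, []byte)) (int, string) {
	rec := httptest.NewRecorder()
	fn(rec, http.StatusServiceUnavailable, []byte(`{"status":"unavailable"}`))
	res := rec.Result()
	return res.StatusCode, res.Header.Get("Content-Type")
}

func main() {
	code, ct := record(respond)
	fmt.Println(code, ct)
	code, ct = record(respondWrongOrder)
	fmt.Println(code, ct)
}
```

Both variants return 503, but only the first one delivers `application/json` as the content type.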
Derek Collison
c110ceea94 Check for checksum violations for all records and before sequence processing.
Also fix for bitrot test and a small bug fix for a leaking fd.

Signed-off-by: Derek Collison <derek@nats.io>
2023-08-31 13:53:28 -07:00
Derek Collison
7c8f402264 Fix data race when updating account (#4435)
Fixes race that would make the `TestJetStreamJWTMove` test fail
sometimes:

[0]:
f1bf4127c5/server/accounts.go (L3535)
[1]:
f1bf4127c5/server/server.go (L1902)

 ```
=== FAIL: server TestJetStreamJWTMove/non-tiered/R1 (4.79s)
==================
WARNING: DATA RACE
Write at 0x00c0014631f8 by goroutine 22900:

github.com/nats-io/nats-server/v2/server.(*Server).updateAccountClaimsWithRefresh()
      /go/server/accounts.go:3535 +0x53dc

github.com/nats-io/nats-server/v2/server.(*Server).UpdateAccountClaims()
      /go/server/accounts.go:3074 +0x45

github.com/nats-io/nats-server/v2/server.(*Server).updateAccountWithClaimJWT()
      /go/server/server.go:1937 +0x3e5
  github.com/nats-io/nats-server/v2/server.(*Server).updateAccount()
      /go/server/server.go:1910 +0x1f1
  github.com/nats-io/nats-server/v2/server.(*Server).lookupAccount()
      /go/server/server.go:1875 +0x176
  github.com/nats-io/nats-server/v2/server.(*Server).LookupAccount()
      /go/server/server.go:1895 +0x2e4
  github.com/nats-io/nats-server/v2/server.(*Server).getRequestInfo()
      /go/server/jetstream_api.go:936 +0x2b4

github.com/nats-io/nats-server/v2/server.(*Server).jsStreamCreateRequest()
      /go/server/jetstream_api.go:1285 +0xca

github.com/nats-io/nats-server/v2/server.(*Server).jsStreamCreateRequest-fm()
      <autogenerated>:1 +0xcc

github.com/nats-io/nats-server/v2/server.(*Server).processJSAPIRoutedRequests()
      /go/server/jetstream_api.go:799 +0x60c

github.com/nats-io/nats-server/v2/server.(*Server).processJSAPIRoutedRequests-fm()
      <autogenerated>:1 +0x39

github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine.func1()
      /go/server/server.go:3604 +0x27d
 
Previous read at 0x00c0014631f8 by goroutine 22995:
  github.com/nats-io/nats-server/v2/server.(*Server).updateAccount()
      /go/server/server.go:1902 +0x6d
  github.com/nats-io/nats-server/v2/server.(*Server).lookupAccount()
      /go/server/server.go:1875 +0x176
  github.com/nats-io/nats-server/v2/server.(*Server).LookupAccount()
      /go/server/server.go:1895 +0x4e

github.com/nats-io/nats-server/v2/server.(*Server).updateInterestForAccountOnGateway()
      /go/server/leafnode.go:2030 +0x3a

github.com/nats-io/nats-server/v2/server.(*client).processGatewayRSub.func1()
      /go/server/gateway.go:1966 +0xc4
  runtime.deferreturn()
      /usr/local/go/src/runtime/panic.go:476 +0x32
  github.com/nats-io/nats-server/v2/server.(*client).parse()
      /go/server/parser.go:664 +0x40b7
  github.com/nats-io/nats-server/v2/server.(*client).readLoop()
      /go/server/client.go:1373 +0x1c98

github.com/nats-io/nats-server/v2/server.(*Server).createGateway.func1()
      /go/server/gateway.go:858 +0x37

github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine.func1()
      /go/server/server.go:3604 +0x27d
 
Goroutine 22900 (running) created at:
  github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine()
      /go/server/server.go:3600 +0x2f2

github.com/nats-io/nats-server/v2/server.(*Server).setJetStreamExportSubs()
      /go/server/jetstream_api.go:820 +0x178
  github.com/nats-io/nats-server/v2/server.(*Server).enableJetStream()
      /go/server/jetstream.go:425 +0xcf1
  github.com/nats-io/nats-server/v2/server.(*Server).EnableJetStream()
      /go/server/jetstream.go:217 +0x6f7
  github.com/nats-io/nats-server/v2/server.(*Server).Start()
      /go/server/server.go:2218 +0x1924
  github.com/nats-io/nats-server/v2/server.RunServer()
      /go/server/server_test.go:95 +0x30e
  github.com/nats-io/nats-server/v2/server.RunServerWithConfig()
      /go/server/server_test.go:117 +0x44

github.com/nats-io/nats-server/v2/server.createJetStreamSuperClusterWithTemplateAndModHook()
      /go/server/jetstream_helpers_test.go:449 +0x1331
  github.com/nats-io/nats-server/v2/server.TestJetStreamJWTMove.func1()
      /go/server/jetstream_jwt_test.go:303 +0x204
github.com/nats-io/nats-server/v2/server.TestJetStreamJWTMove.func3.2()
      /go/server/jetstream_jwt_test.go:409 +0x50
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1576 +0x216
  testing.(*T).Run.func1()
      /usr/local/go/src/testing/testing.go:1629 +0x47
 
Goroutine 22995 (running) created at:
  github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine()
      /go/server/server.go:3600 +0x2f2
  github.com/nats-io/nats-server/v2/server.(*Server).createGateway()
      /go/server/gateway.go:858 +0xf04
  github.com/nats-io/nats-server/v2/server.(*Server).solicitGateway()
      /go/server/gateway.go:707 +0x12e7

github.com/nats-io/nats-server/v2/server.(*Server).solicitGateways.func1()
      /go/server/gateway.go:643 +0x44

github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine.func1()
      /go/server/server.go:3604 +0x27d
==================
    testing.go:1446: race detected during execution of test
        --- FAIL: TestJetStreamJWTMove/non-tiered/R1 (4.79s)
 
=== FAIL: server TestJetStreamJWTMove/non-tiered (11.03s)
    testing.go:1446: race detected during execution of test
    --- FAIL: TestJetStreamJWTMove/non-tiered (11.03s)
 
=== FAIL: server TestJetStreamJWTMove (23.30s)
    testing.go:1446: race detected during execution of test
```
2023-08-31 13:46:17 -07:00
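The report above shows one goroutine writing a field while another reads it without synchronization. The commit message does not say how the fix was made, so the following is only a generic sketch of the usual remedy, with hypothetical names (`account`, `claimJWT`, `hammer`): every read and write of the shared field goes through the same mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// account is a hypothetical stand-in, not the nats-server Account type.
type account struct {
	mu       sync.RWMutex
	claimJWT string
}

// setClaim writes the shared field under the write lock.
func (a *account) setClaim(jwt string) {
	a.mu.Lock()
	a.claimJWT = jwt
	a.mu.Unlock()
}

// claim reads the shared field under the read lock.
func (a *account) claim() string {
	a.mu.RLock()
	defer a.mu.RUnlock()
	return a.claimJWT
}

// hammer runs n concurrent writers and n concurrent readers,
// then returns the final value.
func hammer(n int) string {
	a := &account{claimJWT: "jwt-v1"}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); a.setClaim("jwt-v2") }()
		go func() { defer wg.Done(); _ = a.claim() }()
	}
	wg.Wait()
	return a.claim()
}

func main() {
	fmt.Println(hammer(50))
}
```

Running a program like this under `go run -race` or `go test -race` is how reports such as the one above are produced; with the accessors in place the detector has nothing to flag.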
Lev Brouk
8de48339d3 Fixed: MQTT: more consistent name for PUBREL durable
2023-08-31 12:46:13 -07:00
Waldemar Quevedo
76c3942609 Fix leaf connection missing LS+ sometimes (#4464)
`TestNoRaceLeafNodeSmapUpdate` could occasionally fail with missing
`LS+` commands due to not capturing all the inflight SUB commands as they
were being processed outside the client lock.
2023-08-31 11:18:00 -07:00
Ivan Kozlovic
9a9e84ea5c Fix leaf connection missing LS+ sometimes
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-08-31 10:06:02 -07:00
Derek Collison
2834142bdd Revert lock guard
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-31 08:59:15 -07:00
Derek Collison
0bd4763584 Revert lock guard
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-31 08:58:22 -07:00
Derek Collison
8a9f441c40 Bump to 2.9.22-RC.3
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-31 08:33:22 -07:00
Derek Collison
fbaed8f220 Merge branch 'main' into dev
2023-08-31 08:29:30 -07:00
Derek Collison
887a4ae692 [FIXED] Unlock needed to be guarded, could deadlock filestore (#4461)
Needed to check guard for unlock here.

Signed-off-by: Derek Collison <derek@nats.io>
2023-08-31 08:24:08 -07:00
Derek Collison
2b677c231a Unlock needed to be guarded
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-31 08:16:47 -07:00
Derek Collison
9e26574707 Make sure we unlock only if we did not acquire
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-31 08:09:16 -07:00
Derek Collison
b25b4f2cff Fix lock issue in filestore (#4458)
This should hopefully fix a panic on unlock of unlocked mutex in the
file store.

Signed-off-by: Neil Twigg <neil@nats.io>
2023-08-31 07:59:02 -07:00
Neil Twigg
af2ff3d17d Fix lock issue in filestore
Signed-off-by: Neil Twigg <neil@nats.io>
2023-08-31 15:16:15 +01:00
Neil Twigg
d08eeee94d Use Go 1.21 for nightlies, Dockerfile, code coverage, bump go.mod version to Go 1.20
Signed-off-by: Neil Twigg <neil@nats.io>
2023-08-31 09:02:20 +01:00
Waldemar Quevedo
ee4c04dec4 Run tests using Go 1.21 (#4433)
Flips the order to test with ~~Go 1.20~~ Go 1.21 instead of Go 1.19
2023-08-30 17:02:38 -07:00
Derek Collison
b9b284dffa Updates to the way meta indexing is handled for filestore. (#4450)
Historically we kept indexing information, either by sequence or by
subject, as a per msg block operation. These were the "*.idx" and
"*.fss" indexing files. When streams became very large this could have
an impact on recovery time. Also, for encryption the fast path for
determining if the indexing was current would require loading and
decrypting the complete block.

This design moves to a more traditional WAL and snapshot approach. The
snapshots for the complete stream, including summary information, global
per subject information maps (PSIM) and per msg block details including
summary and dmap, are processed asynchronously. The snapshot includes
the msg block and hash for the last record that was considered in
the snapshot. On recovery the snapshot is read and processed and any
additional records past the point of the snapshot itself are processed.
To this end, any non-system removal of a message has to be expressed as
a delete tombstone that is always added to the fs.lmb file. These are
processed on recovery and our indexing layer knows to skip them.

Changing to this method drastically improves startup and recovery times,
and has simplified the code. Some normal performance benefits have been
seen as well.

Signed-off-by: Derek Collison <derek@nats.io>
2023-08-30 16:49:37 -07:00
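The recovery flow described in b9b284dffa can be sketched in miniature. This is a toy model under loud assumptions, not the filestore format: a single in-memory slice stands in for the message blocks, and `record`, `snapshot`, `takeSnapshot` and `recoverIndex` are hypothetical names. It shows the three ideas from the description: a snapshot that remembers the hash of the last record it covers, replay of only the records past that point, and deletions expressed as tombstone records.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// record is one entry in the hypothetical append-only log.
type record struct {
	seq       uint64
	subject   string
	tombstone bool // true: seq has been deleted
}

// snapshot captures index state for the first upTo records of the log.
type snapshot struct {
	upTo     int
	lastHash uint64 // hash of the last record covered
	live     map[uint64]string
}

func hashRecord(r record) uint64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "%d|%s|%t", r.seq, r.subject, r.tombstone)
	return h.Sum64()
}

// apply folds one record into the index; tombstones remove entries.
func apply(idx map[uint64]string, r record) {
	if r.tombstone {
		delete(idx, r.seq)
		return
	}
	idx[r.seq] = r.subject
}

func takeSnapshot(log []record) snapshot {
	snap := snapshot{upTo: len(log), live: make(map[uint64]string)}
	for _, r := range log {
		apply(snap.live, r)
	}
	if len(log) > 0 {
		snap.lastHash = hashRecord(log[len(log)-1])
	}
	return snap
}

// recoverIndex trusts the snapshot only if its last record still matches
// the log, then replays whatever was appended afterwards.
func recoverIndex(snap snapshot, log []record) map[uint64]string {
	idx := make(map[uint64]string)
	start := 0
	if snap.upTo > 0 && snap.upTo <= len(log) && hashRecord(log[snap.upTo-1]) == snap.lastHash {
		for seq, subj := range snap.live {
			idx[seq] = subj
		}
		start = snap.upTo
	}
	for _, r := range log[start:] {
		apply(idx, r)
	}
	return idx
}

func demo() map[uint64]string {
	log := []record{{1, "a", false}, {2, "b", false}, {3, "c", false}}
	snap := takeSnapshot(log)
	// After the snapshot: delete seq 2 via a tombstone, then add seq 4.
	log = append(log, record{seq: 2, tombstone: true}, record{seq: 4, subject: "d"})
	return recoverIndex(snap, log)
}

func main() {
	fmt.Println(demo()) // prints "map[1:a 3:c 4:d]"
}
```

If the hash check fails, the sketch falls back to replaying the whole log, which is slower but still yields a correct index.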
Waldemar Quevedo
4eedcecf78 Run tests using Go 1.21
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-08-30 16:24:08 -07:00
Derek Collison
2e1392a234 [FIXED] potential message duplication from sources when downgrading back from 2.10 (#4454)
- [X] Changes squashed to a single commit (described
[here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
- [ ] Build is green in Travis CI
- [X] You have certified that the contribution is your original work and
that you license the work to the project under the [Apache 2
license](https://github.com/nats-io/nats-server/blob/main/LICENSE)

2.10 grows the space-separated fields in the sourcing message header
from 2 to 4, but the current 2.9 code is too strict: it checks that the
number of fields is exactly 2 rather than at least 2.
2023-08-30 16:21:59 -07:00
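The strictness problem described in 2e1392a234 can be illustrated with a small parser. This is a sketch only: `parseSourceHeader` is a hypothetical name, and treating the first two fields as an origin name and a sequence number is an assumption made for the example, since the commit message does not spell out the field meanings.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSourceHeader accepts a header value with at least two
// space-separated fields and ignores any extra trailing fields,
// so a value written by a newer version still parses.
func parseSourceHeader(v string) (string, uint64, bool) {
	fields := strings.Fields(v)
	if len(fields) < 2 { // a strict parser would test len(fields) != 2
		return "", 0, false
	}
	n, err := strconv.ParseUint(fields[1], 10, 64)
	if err != nil {
		return "", 0, false
	}
	return fields[0], n, true
}

func main() {
	for _, v := range []string{"ORDERS 42", "ORDERS 42 extra1 extra2", "ORDERS"} {
		origin, seq, ok := parseSourceHeader(v)
		fmt.Println(origin, seq, ok)
	}
}
```

Accepting "at least 2" lets the older parser read both the two-field and the four-field form, which is what makes a downgrade safe.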
Derek Collison
415bbb2ee1 [FIXED] Make sure order correct (#4455)
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-30 16:17:12 -07:00
Derek Collison
abae24086c Make sure order correct
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-30 16:13:56 -07:00
Derek Collison
adef8281a2 Updates to the way meta indexing is handled for filestore.
Historically we kept indexing information, either by sequence or by subject, as a per msg block operation. These were the "*.idx" and "*.fss" indexing files. When streams became very large this could have an impact on recovery time. Also, for encryption the fast path for determining if the indexing was current would require loading and decrypting the complete block.

This design moves to a more traditional WAL and snapshot approach. The snapshots for the complete stream, including summary information, global per subject information maps (PSIM) and per msg block details including summary and dmap, are processed asynchronously. The snapshot includes the msg block and hash for the last record considered in the snapshot. On recovery the snapshot is read and processed and any additional records past the point of the snapshot itself are processed. To this end, any removal of a message has to be expressed as a delete tombstone that is always added to the fs.lmb file. These are processed on recovery and our indexing layer knows to skip them.

Changing to this method drastically improves startup and recovery times, and has simplified the code. Some normal performance benefits have been seen as well.

Signed-off-by: Derek Collison <derek@nats.io>
2023-08-30 16:12:45 -07:00
Derek Collison
1de649a690 Remove OCSP debug log on reload (#4453)
When reloading TLS we would always be logging the attempt to plug OCSP:

```
[42801] 2023/08/30 14:52:33.766638 [INF] Reloaded: authorization users
[42801] 2023/08/30 14:52:33.766648 [INF] Reloaded: accounts
[42801] 2023/08/30 14:52:33.766652 [INF] Reloaded: tls = enabled
[42801] 2023/08/30 14:52:33.766756 [DBG] Plugging TLS OCSP peer for [Client]
[42801] 2023/08/30 14:52:33.766763 [INF] Reloaded server configuration
```
2023-08-30 16:05:24 -07:00
Derek Collison
3be9e97760 Bump to 2.9.22-RC.2
Signed-off-by: Derek Collison <derek@nats.io>
2023-08-30 15:32:31 -07:00
Derek Collison
774987cd99 [IMPROVED] Allow 2.10 tombstones to be skipped and allow us to recover on downgrade (#4452)
Also fixed small bug that could set bad first sequence for subject
tracking info.

Signed-off-by: Derek Collison <derek@nats.io>
2023-08-30 15:31:51 -07:00
Jean-Noël Moyne
003daf3db8 Fixes possible message duplication in sourcing streams if upgrading to 2.10 and then back down to 2.9
2.10 grows the space-separated fields in the sourcing message header from 2 to 4, but the current 2.9 code is too strict: it checks that the number of fields is exactly 2 rather than at least 2.

Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
2023-08-30 15:27:26 -07:00
Waldemar Quevedo
4109e420d2 Remove ocsp debug log on reload
Signed-off-by: Waldemar Quevedo <wally@nats.io>
2023-08-30 14:54:30 -07:00
Derek Collison
8841432d03 Allow 2.10 tombstones to be skipped and allow us to recover on downgrade from 2.10 to 2.9.
Also fixed small bug that could set bad first seq.

Signed-off-by: Derek Collison <derek@nats.io>
2023-08-30 14:38:41 -07:00
Neil
ce08d452d4 Tweak TestJetStreamClusterMetaSnapshotsMultiChange and TestJetStreamClusterStreamUpdateSyncBug (#4449)
This should resolve a couple flakes.

Signed-off-by: Neil Twigg <neil@nats.io>
2023-08-30 17:35:09 +01:00
Neil Twigg
8d194e8bf9 Tweak TestJetStreamClusterMetaSnapshotsMultiChange and TestJetStreamClusterStreamUpdateSyncBug
Signed-off-by: Neil Twigg <neil@nats.io>
2023-08-30 15:49:50 +01:00
Ginger Collison
d6e7106eee Update Slack invite URL for Slack badge (#4448)
This badge was using an old expired invite URL for NATS Slack. Updating
to the general slack.nats.io URL for invites
2023-08-30 09:35:31 -05:00
Ginger Collison
6ab7f0c0a8 Update Slack invite URL for badges
This badge was using an old expired invite URL for NATS Slack. Updating to the general slack.nats.io URL for invites
2023-08-30 09:30:50 -05:00
Neil
bd23469ebe Add benchmark for request-reply workload over encrypted connection (#4399)
- [x] Tests added
- [x] Branch rebased on top of current main (`git pull --rebase origin
main`)
- [x] Changes squashed to a single commit (described
[here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
- [x] You have certified that the contribution is your original work and
that you license the work to the project under the [Apache 2
license](https://github.com/nats-io/nats-server/blob/main/LICENSE)

### Changes proposed in this pull request:
- Creates new TLS certificates and private keys for testing with various
key types
    - RSA (1024, 2048, 4096)
    - ED25519
- Adds a benchmark that measures NATS Core request-reply performance
over TLS-encrypted connections
2023-08-30 10:10:25 +01:00
Derek Collison
7f884062d1 Merge branch 'main' into dev
2023-08-29 20:01:26 -07:00