Commit Graph

8391 Commits

Author SHA1 Message Date
Waldemar Quevedo
2d23e9b348 Fix to stop forwarding proposals in consumers after scaling down a stream (#4556)
Sometimes when scaling down a stream, a raft node could continue
forwarding proposals after already being closed. In the debug logs this
can be confirmed by many entries logging 'Direct proposal ignored, not
leader (state: CLOSED)'.
2023-09-18 13:51:44 -07:00
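The fix in #4556 (together with 850c89e175 and 71b8a33456) makes the proposal-forwarding goroutine stop instead of pushing proposals at a node that has lost leadership or closed. The sketch below shows that "bail when leadership changes" loop shape in Go. It is a minimal illustration under stated assumptions, not the nats-server code: `proposalForwarder`, `loopAndForward`, and the string proposals are hypothetical stand-ins for the real consumer and Raft types.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// proposalForwarder is a hypothetical stand-in for a consumer's Raft
// wrapper; it is not a nats-server type.
type proposalForwarder struct {
	leader    atomic.Bool // true while this node believes it is leader
	forwarded []string    // proposals handed on to the (imaginary) Raft node
}

// loopAndForward drains queued proposals and forwards them, returning as
// soon as quit is closed, the queue is closed, or leadership is lost, so
// no proposals are pushed at a node that has stepped down or shut down.
func (f *proposalForwarder) loopAndForward(proposals <-chan string, quit <-chan struct{}) {
	for {
		select {
		case <-quit:
			return
		case p, ok := <-proposals:
			if !ok {
				return
			}
			if !f.leader.Load() {
				// Bail out instead of forwarding to a non-leader.
				return
			}
			f.forwarded = append(f.forwarded, p)
		}
	}
}

func main() {
	f := &proposalForwarder{}
	f.leader.Store(true)
	proposals := make(chan string, 2)
	proposals <- "p1"
	proposals <- "p2"
	close(proposals)
	f.loopAndForward(proposals, make(chan struct{}))
	fmt.Println(f.forwarded) // [p1 p2]
}
```

Returning from the goroutine, rather than dequeuing and silently dropping each proposal, is what keeps the log free of repeated "not leader (state: CLOSED)" entries in this sketch.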
Derek Collison
71b8a33456 Update to not pop directly, just bail when we detect leadership change
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-18 13:27:27 -07:00
Waldemar Quevedo
ea775a80e8 Skip TestJetStreamClusterRestartThenScaleStreamReplicas for now
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2023-09-18 12:46:53 -07:00
Derek Collison
850c89e175 When scaling a consumer down, make sure to pop the loopAndForwardProposals goroutine
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-18 12:26:25 -07:00
Waldemar Quevedo
27245891f2 Add test for scaling replica with pull consumers
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2023-09-18 12:26:05 -07:00
Derek Collison
de76275d8e Update of dependencies (#4555)
```
-       github.com/klauspost/compress v1.16.7
+       github.com/klauspost/compress v1.17.0
        github.com/minio/highwayhash v1.0.2
-       github.com/nats-io/jwt/v2 v2.5.0
+       github.com/nats-io/jwt/v2 v2.5.2
        github.com/nats-io/nats.go v1.29.0
-       github.com/nats-io/nkeys v0.4.4
+       github.com/nats-io/nkeys v0.4.5
        github.com/nats-io/nuid v1.0.1
        go.uber.org/automaxprocs v1.5.3
-       golang.org/x/crypto v0.12.0
+       golang.org/x/crypto v0.13.0
```
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-18 11:51:07 -07:00
Derek Collison
da70ef27b5 Update of dependencies
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-18 11:20:39 -07:00
Derek Collison
22514a033f Add logfile_max_num feature (#4548)
### Changes proposed in this pull request:

NATS Server 2.9 has `logfile_size_limit` option which allows the
operator to set an optional byte limit on the NATS Server log file which
when met causes a "rotation" such that the current log file is renamed
(original file name appended with a time stamp to nanosecond accuracy)
and a new log file is instantiated.

This PR adds a new `logfile_max_num` companion option (alias
`log_max_num`) which allows the operator to designate that the server
should prune the **total number of log files** -- the currently active
log file plus backups -- to the maximum setting.

A max value of `0` (the implicit default) or a negative number means
unlimited log files (no maximum), as this is an opt-in feature.

A max value of `1` is effectively a truncate-only logging pattern as any
backup made at rotation will subsequently be purged.

A max value of `2` will maintain the active log file plus the latest
backup. And so on...

> The currently active log file is never purged. Only backups are
purged.

When enabled, backup log deletion is evaluated inline after each
successful rotation event. To be considered for log deletion, backup log
files MUST adhere to the file naming format used in log rotation as well
as agree with the current `logfile` name and location. Any other files
or sub-directories in the log directory will be ignored. E.g. if an
operator makes a manual copy of the log file to `logfile.bak` that file
will not be evaluated as a backup.

### Typical use case:

This feature is useful in a constrained hosting environment for NATS
Server, for example an embedded, edge-compute, or IoT device scenario,
in which _more featureful_ platform or operating system log management
features do not exist or the complexity is not required.
2023-09-18 09:02:41 -07:00
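The pruning rule described above (the active file is always kept, at most `logfile_max_num` files in total, and only correctly named backups are considered) can be sketched in Go. This is a minimal illustration, not the nats-server implementation: the backup-name pattern `<logfile>.<digits and dots>` and the helper names `backupsToPurge` and `pruneLogDir` are hypothetical, and the sketch assumes the timestamp suffix is fixed-width so that filename order is chronological. `os.ReadDir` already returns entries sorted by filename, so no separate sort is needed.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
)

// backupsToPurge returns the backup names to delete so that the active log
// file plus the remaining backups total at most maxNum files. names must be
// sorted oldest-first. maxNum <= 0 means unlimited, so nothing is purged.
func backupsToPurge(active string, names []string, maxNum int) []string {
	if maxNum <= 0 {
		return nil
	}
	// Hypothetical backup pattern standing in for the real rotation suffix;
	// files such as "<active>.bak" do not match and are ignored.
	re := regexp.MustCompile("^" + regexp.QuoteMeta(active) + `\.[0-9.]+$`)
	var backups []string
	for _, n := range names {
		if re.MatchString(n) {
			backups = append(backups, n)
		}
	}
	keep := maxNum - 1 // the active file always counts as one
	if len(backups) <= keep {
		return nil
	}
	return backups[:len(backups)-keep] // oldest backups go first
}

// pruneLogDir applies backupsToPurge to the directory holding logfile.
func pruneLogDir(logfile string, maxNum int) error {
	dir, base := filepath.Split(logfile)
	if dir == "" {
		dir = "."
	}
	entries, err := os.ReadDir(dir) // sorted by filename
	if err != nil {
		return err
	}
	var names []string
	for _, e := range entries {
		if !e.IsDir() { // sub-directories are never candidates
			names = append(names, e.Name())
		}
	}
	for _, n := range backupsToPurge(base, names, maxNum) {
		if err := os.Remove(filepath.Join(dir, n)); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	names := []string{"nats.log", "nats.log.2023.09.18.01", "nats.log.2023.09.18.02", "nats.log.bak"}
	fmt.Println(backupsToPurge("nats.log", names, 2)) // [nats.log.2023.09.18.01]
}
```

With `maxNum` of 1 every matching backup is returned, which matches the truncate-only behavior described above.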
Derek Collison
8f0e65fe0d Bump to 2.10.0-RC.5
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-17 21:38:34 -07:00
Derek Collison
216df811ff Various fixes and improvements to tombstone and buffer gaps. (#4553)
We fixed a few bugs in tombstone handling and formalized support for
holes in the underlying buffers. Based on customer data from the field,
we also now use holes during compaction.

Signed-off-by: Derek Collison <derek@nats.io>
2023-09-17 21:36:07 -07:00
Derek Collison
acffa0668a Various fixes and improvements to tombstone and buffer gaps.
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-17 19:49:01 -07:00
Waldemar Quevedo
156e1a5b1c Fix for data race when changing retention policy (#4551)
2023-09-17 19:46:22 -07:00
Waldemar Quevedo
0e63608716 Fix for data race when changing retention policy
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2023-09-17 18:15:55 -07:00
Waldemar Quevedo
fc51af9542 Fix for data race in memstore.LoadNextMsg (#4552)
2023-09-17 18:15:11 -07:00
Waldemar Quevedo
32021f66f1 Use write lock in memstore.LoadNextMsg
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2023-09-17 17:24:53 -07:00
Derek Collison
6f3805650b [FIXED] Data race, protect access to c.acc (#4550)
Signed-off-by: Derek Collison <derek@nats.io>

Resolves #4549
2023-09-17 10:35:34 -07:00
Derek Collison
4e9cd9aa36 Protect access to c.acc
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-17 10:01:24 -07:00
Derek Collison
0d9328027f Change code coverage GHA workflow to use main (#4546)
2023-09-16 11:00:04 -07:00
Neil
0283c4bc45 Add Raft goroutine labels, tweak logging (#4545)
This adds some more debugging information to the Raft goroutines in
pprof and improves the logging when a consumer was already running.

Example:
```
1 @ 0x1025b1838 0x1025c2ac8 0x102a47d1c 0x102a47244 0x102a858e0 0x1025e5ad4
# labels: {"account":"$SYS", "group":"_meta_", "type":"metaleader"}
#	0x102a47d1b	github.com/nats-io/nats-server/v2/server.(*raft).runAsFollower+0xbb		server/raft.go:1795
#	0x102a47243	github.com/nats-io/nats-server/v2/server.(*raft).run+0x2c3			server/raft.go:1715
#	0x102a858df	github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine.func1+0x17f	server/server.go:3609

1 @ 0x1025b1838 0x1025c2ac8 0x102a47d1c 0x102a47244 0x102a858e0 0x1025e5ad4
# labels: {"account":"$G", "group":"S-R3M-hn5zv7o3", "stream":"benchstream", "type":"stream"}
#	0x102a47d1b	github.com/nats-io/nats-server/v2/server.(*raft).runAsFollower+0xbb		server/raft.go:1795
#	0x102a47243	github.com/nats-io/nats-server/v2/server.(*raft).run+0x2c3			server/raft.go:1715
#	0x102a858df	github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine.func1+0x17f	server/server.go:3609

1 @ 0x1025b1838 0x1025c2ac8 0x102a49b60 0x102a47250 0x102a858e0 0x1025e5ad4
# labels: {"account":"$G", "consumer":"foobar", "group":"C-R3M-djqHTUCq", "stream":"benchstream", "type":"consumer"}
#	0x102a49b5f	github.com/nats-io/nats-server/v2/server.(*raft).runAsLeader+0x4bf		server/raft.go:2198
#	0x102a4724f	github.com/nats-io/nats-server/v2/server.(*raft).run+0x2cf			server/raft.go:1719
#	0x102a858df	github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine.func1+0x17f	server/server.go:3609
```

Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-16 11:28:43 +01:00
Neil Twigg
1f9ddf2bbd Add Raft goroutine labels, tweak logging
Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-16 11:15:06 +01:00
Todd Beets
349e718d39 Changes for max log files option (active plus backups); remove redundant lexical sort of backups; adjust test
2023-09-15 22:08:09 -07:00
Derek Collison
7df0e42ce8 [FIXED] Fix for data race accessing consumer assignment (#4547)
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-15 16:48:00 -07:00
Todd Beets
46147cf0ea Add logfile_max_archives feature and test.
2023-09-15 16:21:51 -07:00
Derek Collison
9781025b40 Fix for data race accessing consumer assignment
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-15 16:21:12 -07:00
Derek Collison
a5344c099f AuthCallout request should include TLS data when client is NATS WS client (#4544)
Make sure the client handshake flag is set when TLS handshake is made as
part of WebSocket connection/upgrade (notionally HTTPS) rather than as
part of the NATS protocol TLS initiation chain. AuthCallout tests the
flag when building the data for the AuthCallout service request.

Added AuthCallout unit test for NATS WS client auth that requires the
TLS data.
2023-09-15 15:56:12 -07:00
Todd Beets
aed99441c6 Use preferred value tests (equal, not equal) rather than booleans for better fail logs
2023-09-15 14:41:41 -07:00
Todd Beets
7b0a12d7da Add *tls.Conn safe type check, as some black box unit tests override the natural underlying type for test purposes, which would otherwise cause a panic
2023-09-15 13:52:41 -07:00
Todd Beets
40cf145ee6 Map both 127.0.0.1 and 127.0.1.1 to localhost for HTTPS server host validate
2023-09-15 13:13:24 -07:00
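The host mapping in 40cf145ee6 can be pictured as a small normalization step before certificate host validation; 127.0.1.1 is the address some Debian-derived Linux distributions assign to the machine's hostname in `/etc/hosts`, so a test may see either loopback form. A minimal sketch, with `hostForVerify` a hypothetical name:

```go
package main

import "fmt"

// hostForVerify maps the two loopback addresses named in the commit to
// "localhost" before host validation; any other host passes through.
func hostForVerify(host string) string {
	switch host {
	case "127.0.0.1", "127.0.1.1":
		return "localhost"
	default:
		return host
	}
}

func main() {
	fmt.Println(hostForVerify("127.0.1.1")) // localhost
}
```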
Byron Ruth
8b089b4a12 Change ref to main
Signed-off-by: Byron Ruth <byron@nats.io>
2023-09-15 16:12:46 -04:00
Todd Beets
75d2ddb26b AuthCallout request should include TLS data when client is NATS WS client
2023-09-15 12:36:34 -07:00
Derek Collison
0af378cf85 Bump to 2.10.0-RC.4
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-15 08:54:27 -07:00
Derek Collison
d7c66e753f [FIXED] Possible panic in consumer, needed to recheck if consumer was closed (#4541)
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-15 08:53:57 -07:00
Derek Collison
097e4097d1 Allow longer times due to travis slowdowns
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-15 08:52:50 -07:00
Derek Collison
f2e7ed91cb Fix for panic in consumer, needed to recheck if consumer was closed
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-15 08:40:21 -07:00
Waldemar Quevedo
8f84ea4224 Bump to 2.10.0-RC.3 (#4537)
2023-09-14 12:16:56 -07:00
Waldemar Quevedo
76cbef79cc Bump to 2.10.0-RC.3
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2023-09-14 12:11:09 -07:00
Derek Collison
56c5e4aede [IMPROVED] Consumer cleanup monitoring and FIX for data race (#4536)
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-14 11:57:30 -07:00
Derek Collison
22f40eafa0 Add in jitter in case there are many that all try to clean up at the same time
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-14 11:24:32 -07:00
Derek Collison
392f25b6da Fix for data race and adjustment to do a backoff on making sure consumers are cleaned up.
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-14 11:21:11 -07:00
Waldemar Quevedo
b79b180498 flake: Fix TestJetStreamConsumerAckFloorFill (#4534)
The test can sometimes fail the first time it checks the ack floor,
but passes after checking again.
2023-09-14 10:02:05 -07:00
Waldemar Quevedo
db0faf4538 flake: Fix TestJetStreamConsumerAckFloorFill
The test can sometimes fail the first time it checks the ack floor,
but passes after checking again.

Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2023-09-14 09:31:38 -07:00
Neil Twigg
f38faafbc9 Bump to 2.10.0-RC.2
Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-14 16:35:36 +01:00
Neil
46361e86a3 Fix leaking timers in stream sources (#4532)
Repeated calls to `scheduleSetSourceConsumerRetry` could end up creating
multiple timers for the same source; these would eventually schedule
even more timers, resulting in runaway CPU usage. This PR instead
bounds this to one timer per source per stream.

Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-14 16:32:36 +01:00
Neil
f259207270 De-flake TestJetStreamClusterAccountPurge (#4533)
This adds a new `waitForAccount` test helper that ensures that an
account exists across the cluster, and updates
`TestJetStreamClusterAccountPurge` to use it after submitting new JWTs.
This should prevent `require no error, but got: nats: JetStream not
enabled for account` errors.

Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-14 13:34:54 +01:00
Neil Twigg
904f4c388e De-flake TestJetStreamClusterAccountPurge by waiting for account to exist
Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-14 11:40:30 +01:00
Neil Twigg
6f3f544841 Fix leaking timers in stream sources
Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-14 10:30:24 +01:00
Derek Collison
ea93e77b7f [FIXED] Fix for a call into mb.recalculateFirstForSubj() that did not hold lock. (#4530)
This unprotected access most likely allowed the cache to be flushed, so
that after a subsequent writeMsgRecord the offset would be > the slot
value. That can't happen if the lock is held, because we load the cache
properly at the beginning of the function.

Signed-off-by: Derek Collison <derek@nats.io>

Resolves #4529
2023-09-13 16:26:39 -07:00
Derek Collison
787f6acf31 Fix for a call into fs.recalculateFirstForSubj() from fs.recalculateFirstForSubj() that did not lock the mb properly.
Signed-off-by: Derek Collison <derek@nats.io>
2023-09-13 15:35:34 -07:00
Neil
c7d5441900 Bump to 2.10.0-RC.1 (#4527)
Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-13 17:22:52 +01:00
Neil Twigg
505190266a Bump to 2.10.0-RC.1
Signed-off-by: Neil Twigg <neil@nats.io>
2023-09-13 17:22:30 +01:00