mirror of https://github.com/gogrlx/nats-server.git synced 2026-04-02 03:38:42 -07:00

Files

Derek Collison 783edaa36d [FIXED] Race condition in some leader failover scenarios leading to messages being potentially sourced more than once. (#4592 )

- [X] Tests added
- [X] Branch rebased on top of current main (`git pull --rebase origin main`)
- [X] Changes squashed to a single commit (described [here](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
- [X] Build is green in Travis CI
- [X] You have certified that the contribution is your original work and
that you license the work to the project under the [Apache 2
license](https://github.com/nats-io/nats-server/blob/main/LICENSE)

### Changes proposed in this pull request:

Fixes a race condition in some leader failover scenarios leading to
messages being potentially sourced more than once.

In some failure scenarios, the current leader of a stream that sources
from other stream(s) is shut down while messages are being published to
the stream(s) being sourced. When that happens, `setLeader(true)` can be
called on the new leader of the sourcing stream before all of the
messages sourced by the previous leader have been fully processed. The
new leader then does its reverse scan from the last message in its view
of the stream to determine the sequence number at which to start the
consumer on the stream being sourced from. Because that view does not
yet include the last message(s) sourced by the previous leader, those
messages get sourced again, and some messages end up in the sourcing
stream more than once.
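The reverse scan and its failure mode can be illustrated with a small simulation. This is a simplified sketch, not the server's code: `sourcedMsg` and `nextOriginSeq` are hypothetical names, a single source is assumed, and the origin sequence that the real server keeps with each sourced message is modeled as a plain struct field.

```go
package main

import "fmt"

// sourcedMsg is a hypothetical, simplified record of a message stored in the
// sourcing stream. The origin information the server keeps with each sourced
// message is modeled here as a plain field.
type sourcedMsg struct {
	seq       uint64 // sequence in the sourcing stream
	originSeq uint64 // sequence in the sourced (origin) stream; 0 if not sourced
}

// nextOriginSeq scans the local view backwards from the last message and
// returns the origin sequence at which a new source consumer should start.
func nextOriginSeq(view []sourcedMsg) uint64 {
	for i := len(view) - 1; i >= 0; i-- {
		if view[i].originSeq > 0 {
			return view[i].originSeq + 1
		}
	}
	return 1 // nothing sourced yet: start from the first origin message
}

func main() {
	// The old leader sourced origin messages 1..5, but when setLeader(true)
	// runs on the new leader only the first three have been applied locally.
	const lastSourcedByOldLeader = 5
	applied := []sourcedMsg{{1, 1}, {2, 2}, {3, 3}}

	start := nextOriginSeq(applied)
	dups := lastSourcedByOldLeader - start + 1
	fmt.Println("start at origin seq", start) // start at origin seq 4
	fmt.Println("sourced twice:", dups)       // sourced twice: 2
}
```

The scan itself is correct; the duplicates come purely from running it against a view that is still missing the tail of what the previous leader sourced.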

The existing `TestNoRaceJetStreamSuperClusterSources` test would
sidestep the issue by relying on the deduplication window in the
sourcing stream. Without deduplication the test is a flapper.

This avoids the race condition by adding a small delay before scanning
for the last sourced message(s) and starting the sources' consumer(s).
With this change, the test (run without the deduplication window) no
longer fails because of the sourcing stream receiving more messages than
expected.

(This also adds a guard that gives up if `setupSourceConsumers()` is
called when we are no longer the leader for the stream. That check was
already present in `setupMirrorConsumer()`, so it was presumably
forgotten in `setupSourceConsumers()`.)

2023-09-28 11:22:20 -07:00

avl

Allow more time in TestNoRaceSeqSetEncodeLarge

2023-09-08 16:36:28 +01:00

certidp

OCSP Peer Feature

2023-08-02 11:25:48 -07:00

certstore

Fixed local issuer determination for OCSP Staple, issue #3773

2023-08-02 11:52:36 -07:00

configs

Use dynamic port number in benchmark

2023-09-01 12:58:52 -07:00

pse

Initial support for zOS (#4209 )

2023-07-18 12:21:31 -07:00

sysmem

Initial support for zOS

2023-06-03 10:03:23 +05:30

accounts_test.go

test: fix TestAccountImportCycle flake

2023-08-08 23:41:18 -07:00

accounts.go

Small performance tweak to checkForReverseEntries.

2023-09-26 21:43:20 -07:00

auth_callout_test.go

Use preferred value tests (equal, not equal) rather than booleans for better fail logs

2023-09-15 14:41:41 -07:00

auth_callout.go

Cleanup for some staticcheck warnings

2023-07-21 19:17:54 -07:00

auth_test.go

Authentication and Authorization callouts for server configuration mode.

2022-12-28 10:32:45 -08:00

auth.go

Moved to atomics to detect if we have mapped subjects for an account since check for each inbound message.

2023-09-25 11:43:34 -07:00

benchmark_publish_test.go

…

certstore_windows_test.go

Cert Store (aka wincert)

2023-06-22 12:25:54 -07:00

ciphersuites.go

…

client_test.go

monitoring: track slow consumers per connection type

2023-08-09 05:57:42 -07:00

client.go

When unsubscribing do not check rrMap for reserved replies.

2023-09-26 21:43:36 -07:00

closed_conns_test.go

…

config_check_test.go

Fixes for merge conflicts from main

2023-08-21 15:55:31 -07:00

const.go

Bump to 2.10.2-RC.9

2023-09-27 20:49:55 -07:00

consumer.go

Bump start interval for cleanup check

2023-09-24 15:44:15 -07:00

core_benchmarks_test.go

Add benchmark for TLS content encryption overhead

2023-09-01 12:58:52 -07:00

dirstore_test.go

Fix some lint errors after move to golangci-lint

2022-12-30 20:00:08 +00:00

dirstore.go

resolver: improve signaling for missing account lookups (#4151 )

2023-05-14 11:10:25 -07:00

disk_avail_netbsd.go

…

disk_avail_openbsd.go

…

disk_avail_wasm.go

…

disk_avail_windows.go

…

disk_avail.go

…

errors_gen.go

…

errors_test.go

…

errors.go

Harmonize subject mapping error variable names

2023-06-01 14:15:27 -07:00

errors.json

Consumers inherit limits for max_ack_pending and inactive_threshold from stream

2023-09-01 10:54:11 +01:00

events_test.go

flake: Fixes TestAccountReqMonitoring

2023-09-06 03:43:11 -07:00

events.go

Move server running state to atomic to avoid contention at NRG layer.

2023-09-25 11:18:15 -07:00

filestore_test.go

Add in warnings for filestore recover state if happy path fails.

2023-09-27 16:22:15 -07:00

filestore.go

Additional markers for dirty state

2023-09-27 20:32:17 -07:00

fuzz.go

…

gateway_test.go

monitoring: track slow consumers per connection type

2023-08-09 05:57:42 -07:00

gateway.go

Make server shutdown an atomic and check inside unsubscribe to avoid unnecessary work.

2023-09-26 17:53:58 -07:00

ipqueue_test.go

RWMutex does not help here and could hurt

2023-04-05 20:26:45 -07:00

ipqueue.go

different panic fixes

2023-06-02 13:19:22 +03:00

jetstream_api.go

When under load, concurrent stream creation of the same stream could return stream not found, which is odd.

2023-09-27 18:05:43 -07:00

jetstream_benchmark_test.go

Refactor cluster creation for JS benchmarks

2023-09-27 09:26:11 -07:00

jetstream_chaos_test.go

Reduce messages in chaos tests

2023-06-09 17:07:53 +01:00

jetstream_cluster_1_test.go

De-flake TestJetStreamClusterAccountPurge by waiting for account to exist

2023-09-14 11:40:30 +01:00

jetstream_cluster_2_test.go

[FIXED] Account resolver lock inversion

2023-09-25 15:09:11 -06:00

jetstream_cluster_3_test.go

[FIXED] Routes: Pinned Accounts connect/reconnect in some cases

2023-09-28 10:46:32 -06:00

jetstream_cluster.go

Add in warnings for filestore recover state if happy path fails.

2023-09-27 16:22:15 -07:00

jetstream_consumer_test.go

Remove rand.Seed use, not needed in Go +1.20

2023-09-05 16:55:04 -07:00

jetstream_errors_generated.go

Consumers inherit limits for max_ack_pending and inactive_threshold from stream

2023-09-01 10:54:11 +01:00

jetstream_errors_test.go

…

jetstream_errors.go

…

jetstream_events.go

…

jetstream_helpers_test.go

[FIXED] Account resolver lock inversion

2023-09-25 15:09:11 -06:00

jetstream_jwt_test.go

Merge branch 'main' into dev

2022-12-13 13:08:35 -08:00

jetstream_leafnode_test.go

Move assigned ports out of ephemeral range

2023-07-14 15:08:17 +01:00

jetstream_super_cluster_test.go

Fixes for move test.

2023-09-12 11:38:35 -07:00

jetstream_test.go

[FIXED] Race condition in some leader failover scenarios leading to messages being potentially sourced more than once.

2023-09-28 10:50:54 -07:00

jetstream.go

flake: Fixes TestAccountReqMonitoring

2023-09-06 03:43:11 -07:00

jwt_test.go

flake: Fixes TestServerOperatorModeUserInfoExpiration

2023-09-13 11:57:53 +02:00

jwt.go

Authentication and Authorization callouts for server configuration mode.

2022-12-28 10:32:45 -08:00

leafnode_test.go

Set S2 writer concurrency to 1

2023-09-25 09:54:54 +01:00

leafnode.go

Make server shutdown an atomic and check inside unsubscribe to avoid unnecessary work.

2023-09-26 17:53:58 -07:00

log_test.go

Add logtime_utc option

2023-07-21 16:56:13 -07:00

log.go

Changes for max log files option (active plus backups); remove redundant lexical sort of backups; adjust test

2023-09-15 22:08:09 -07:00

memstore_test.go

Track deleted with single avl.SeqSet dmap for now vs old method.

2023-08-05 12:32:29 -07:00

memstore.go

Use write lock in memstore.LoadNextMsg

2023-09-17 17:24:53 -07:00

monitor_sort_opts.go

Fix monitoring server connz idle time sorting

2023-09-01 14:32:08 +03:00

monitor_test.go

Allow sync intervals to be set and the ability to have all data writes synchronous.

2023-09-04 11:05:13 -07:00

monitor.go

Add *tls.Conn safe type check as some black box unit tests override the natural underlying type for test purposes which would otherwise cause a panic

2023-09-15 13:52:41 -07:00

mqtt_test.go

[FIXED] Increased AckWait in TestMQTTQoS2RetriesPublish, TestMQTTQoS2RetriesPubRel (#4518 )

2023-09-13 11:49:17 +01:00

mqtt.go

Make server shutdown an atomic and check inside unsubscribe to avoid unnecessary work.

2023-09-26 17:53:58 -07:00

nkey_test.go

Replace rand.Rand with crypto/rand.Read in Go +1.20

2023-09-05 16:57:49 -07:00

nkey.go

Use crypto/rand.Read instead of math/rand.Read

2023-07-13 12:04:58 +01:00

norace_test.go

[FIXED] Race condition in some leader failover scenarios leading to messages being potentially sourced more than once. (#4592 )

2023-09-28 11:22:20 -07:00

ocsp_peer.go

Remove ocsp debug log on reload

2023-08-30 14:54:30 -07:00

ocsp_responsecache.go

OCSP Peer Feature

2023-08-02 11:25:48 -07:00

ocsp.go

Merge branch 'main' into dev

2023-08-04 10:15:35 -07:00

opts_test.go

Allow sync intervals to be set and the ability to have all data writes synchronous.

2023-09-04 11:05:13 -07:00

opts.go

Add prof_block_rate option for enabling/configuring the block profile

2023-09-25 21:04:25 +01:00

parser_test.go

[ADDED] Multiple routes and ability to have per-account routes

2023-04-03 09:32:25 -06:00

parser.go

…

ping_test.go

…

raft_helpers_test.go

Add Raft goroutine labels, tweak logging

2023-09-16 11:15:06 +01:00

raft_test.go

Basic raft tests

2023-04-12 11:48:22 -07:00

raft.go

In lameduck mode shutdown jetstream at start, do not leave running during connection drain.

2023-09-24 14:42:59 -07:00

rate_counter_test.go

…

rate_counter.go

…

README-MQTT.md

[ADDED] README-MQTT.md: MQTT implementation notes

2023-09-04 06:46:16 -07:00

README.md

…

reload_test.go

Fixes to service imports on reload

2023-08-05 18:21:01 -07:00

reload.go

Add prof_block_rate option for enabling/configuring the block profile

2023-09-25 21:04:25 +01:00

ring_test.go

…

ring.go

…

route.go

[FIXED] Routes: Pinned Accounts connect/reconnect in some cases

2023-09-28 10:46:32 -06:00

routes_test.go

[FIXED] Routes: Pinned Accounts connect/reconnect in some cases

2023-09-28 10:46:32 -06:00

sendq.go

Optimize to not allocate converting strings to []byte

2023-04-08 20:46:05 -07:00

server_test.go

Make server shutdown an atomic and check inside unsubscribe to avoid unnecessary work.

2023-09-26 17:53:58 -07:00

server.go

Make server shutdown an atomic and check inside unsubscribe to avoid unnecessary work.

2023-09-26 17:53:58 -07:00

service_test.go

…

service_windows_test.go

Fix spelling

2023-01-17 17:40:39 -08:00

service_windows.go

More compact syntax

2022-12-27 09:41:39 +01:00

service.go

…

signal_test.go

Match --signal PIDs with globular-style expression.

2023-08-07 10:16:05 -07:00

signal_wasm.go

…

signal_windows.go

…

signal.go

Match --signal PIDs with globular-style expression.

2023-08-07 10:16:05 -07:00

split_test.go

[ADDED] Multiple routes and ability to have per-account routes

2023-04-03 09:32:25 -06:00

store.go

Detect mal-formed stream state snapshots and return appropriate error

2023-07-30 11:06:06 -07:00

stream.go

[FIXED] Race condition in some leader failover scenarios leading to messages being potentially sourced more than once. (#4592 )

2023-09-28 11:22:20 -07:00

subject_transform_test.go

[ADDED] Support for multi-filter in stream sources (#4276 )

2023-08-01 10:50:11 -07:00

subject_transform.go

[ADDED] Support for multi-filter in stream sources (#4276 )

2023-08-01 10:50:11 -07:00

sublist_test.go

Fixes to http healthz monitoring response

2023-08-31 16:05:09 -07:00

sublist.go

Don't take sublist write lock in match if sublist cache disabled

2023-09-27 16:33:58 +01:00

test_test.go

Allow more time in TestFileStoreNumPendingLargeNumBlks, improve logging on failure

2023-09-12 11:02:43 +01:00

trust_test.go

Use testing.TempDir() where possible

2022-12-12 13:18:44 -08:00

util_test.go

…

util.go

[ADDED] Multiple routes and ability to have per-account routes

2023-04-03 09:32:25 -06:00

websocket_test.go

test: update TestWSTLSVerifyClientCert for go1.21

2023-08-09 21:50:46 -07:00

websocket.go

Make server shutdown an atomic and check inside unsubscribe to avoid unnecessary work.

2023-09-26 17:53:58 -07:00

README.md

Tests

Tests that run on Travis have been split into jobs that each run in their own VM, in parallel. This reduces the overall running time and also allows recycling a single job when we get a flapper, instead of having to recycle the whole test suite.

JetStream Tests

For JetStream tests, we need to observe a naming convention so that no tests are omitted when running on Travis.

The script runTestsOnTravis.sh will run a given job based on the definition found in ".travis.yml".

As for the naming convention:

- The names of all JetStream tests should start with TestJetStream
- Cluster tests should go into jetstream_cluster_test.go and start with TestJetStreamCluster
- Super-cluster tests should go into jetstream_super_cluster_test.go and start with TestJetStreamSuperCluster

Not following this convention means that some tests may not be executed on Travis.
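A convention like this is easy to check mechanically. The sketch below is a hypothetical helper, not part of the repository, and it assumes that Travis jobs pick tests by name prefix (for example via a `go test -run '^TestJetStreamCluster'` pattern); how `runTestsOnTravis.sh` actually selects tests is not shown here. It accepts cluster tests in any file whose name starts with `jetstream_cluster`, since the listing above shows them split across `jetstream_cluster_1_test.go`, `jetstream_cluster_2_test.go`, and `jetstream_cluster_3_test.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// requiredPrefix returns the prefix that Test functions in the given file
// must carry under the naming rules above ("" means no rule applies).
func requiredPrefix(file string) string {
	if !strings.HasSuffix(file, "_test.go") {
		return ""
	}
	switch {
	case strings.HasPrefix(file, "jetstream_super_cluster"):
		return "TestJetStreamSuperCluster"
	case strings.HasPrefix(file, "jetstream_cluster"):
		return "TestJetStreamCluster"
	case strings.HasPrefix(file, "jetstream"):
		return "TestJetStream"
	}
	return ""
}

// follows reports whether the function name in file matches the convention.
// Only names starting with "Test" are checked; benchmarks and helpers pass.
func follows(file, name string) bool {
	if !strings.HasPrefix(name, "Test") {
		return true
	}
	return strings.HasPrefix(name, requiredPrefix(file))
}

func main() {
	fmt.Println(follows("jetstream_cluster_1_test.go", "TestJetStreamClusterAccountPurge")) // true
	fmt.Println(follows("jetstream_test.go", "TestStreamSourcing"))                        // false
}
```

Here `TestStreamSourcing` is a made-up name used to show a violation: a test with that name in `jetstream_test.go` would not match a `^TestJetStream` pattern and could silently be skipped.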