Commit Graph

5512 Commits

Author SHA1 Message Date
Ivan Kozlovic
4739eebfc4 [FIXED] JetStream: possible deadlock during consumer leadership change
Would possibly show up when a consumer leader changes for a consumer
that had redelivered messages and for instance messages were inbound
on the stream.

Resolves #2912

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-25 12:21:51 -06:00
Derek Collison
edcddfae58 Make at least work
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 19:12:31 -07:00
Derek Collison
1d38a73bcb Fix for version comparison
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 18:39:28 -07:00
Derek Collison
7a9c2336e7 Bump to 2.8.0-beta.1
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 16:52:42 -07:00
Derek Collison
5fb9a39bfc Merge pull request #2947 from nats-io/restart_stream_recover
[IMPROVED] Stream recovery and Memory Utilization
2022-03-24 16:50:25 -07:00
Derek Collison
ef8f543ea5 Improve memory usage through JetStream storage layer.
Previously we relied more heavily on Go's garbage collector: when we loaded a block for an underlying stream, we would pass references upward to avoid copies.
Now we always copy when passing back to the upper layers which allows us to not only expire our cache blocks but pool and reuse them.

The upper layers also had changes made to allow the pooling layer at that level to interoperate with the storage layer optionally.

Also fixed some flappers and a bug where de-dupe might not be reformed correctly.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 17:45:15 -06:00
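The copy-then-pool idea in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the server's storage code: `bufPool`, `block`, `newBlock`, `load`, and `release` are hypothetical names. The point is only that once callers receive copies, the block's backing buffer can safely go back into a `sync.Pool`.

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool holds reusable byte slices (hypothetical stand-in for the
// pooling described in the commit message).
var bufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 0, 4096) },
}

// block simulates a cached storage block backed by a pooled buffer.
type block struct {
	buf []byte
}

// newBlock fills a pooled buffer with data.
func newBlock(data []byte) *block {
	buf := bufPool.Get().([]byte)
	return &block{buf: append(buf[:0], data...)}
}

// load returns a copy of buf[start:end], so the caller never holds a
// reference into the block's own buffer.
func (b *block) load(start, end int) []byte {
	out := make([]byte, end-start)
	copy(out, b.buf[start:end])
	return out
}

// release hands the buffer back to the pool; safe only because load copies.
func (b *block) release() {
	bufPool.Put(b.buf[:0])
	b.buf = nil
}

func main() {
	b := newBlock([]byte("hello world"))
	msg := b.load(0, 5)
	b.release()
	// A later block may reuse the same memory without disturbing msg.
	b2 := newBlock([]byte("XXXXXXXXXXX"))
	fmt.Println(string(msg), len(b2.buf))
}
```

The trade-off is one extra copy per load in exchange for buffers whose lifetime no longer depends on what the upper layers still reference.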
Derek Collison
d7e1e5ae61 Make sure that we do not become a candidate/leader too soon or if we are not caught up.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 17:45:15 -06:00
Derek Collison
7fd5f4dc24 Update Go client
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-24 17:45:15 -06:00
Ivan Kozlovic
fde6c61f9f Merge pull request #2948 from nats-io/js_backoff_check_pending
JS: BackOff list caused too frequent checkPending() calls
2022-03-23 14:58:29 -06:00
Ivan Kozlovic
2253bb6f1a JS: BackOff list caused too frequent checkPending() calls
Since the "next" timer value is set to the AckWait value, which
is the first element in the BackOff list if present, the check
would possibly happen at this interval, even when we were past
the first redelivery and the backoff interval had increased.

The end-user would still see the redelivery be done at the durations
indicated by the BackOff list, but internally, we would be checking
at the initial BackOff's ack wait.

I added a test that uses the store's interface to detect how many
times the checkPending() function is invoked. For this test it
should have been invoked twice, but without the fix it was invoked
15 times.

Also fixed an unrelated test that could possibly deadlock causing
tests to be aborted due to inactivity on Travis.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-23 12:46:17 -06:00
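To illustrate the timing issue described in the commit above, here is a small sketch of picking the next check interval from a BackOff list based on how many times a message has been delivered. The helper `nextCheck` and the example values are assumptions made for illustration; the server's actual checkPending() logic is not shown in the commit message.

```go
package main

import (
	"fmt"
	"time"
)

// Example values, chosen only for this sketch.
var (
	exampleAckWait = time.Second
	exampleBackOff = []time.Duration{time.Second, 5 * time.Second, 30 * time.Second}
)

// nextCheck returns how long to wait before checking a message again after
// it has been delivered deliveryCount times (1 = first delivery).
// Without a BackOff list it falls back to ackWait; past the end of the
// list it keeps using the last entry.
func nextCheck(ackWait time.Duration, backoff []time.Duration, deliveryCount int) time.Duration {
	if len(backoff) == 0 {
		return ackWait
	}
	idx := deliveryCount - 1
	if idx < 0 {
		idx = 0
	}
	if idx >= len(backoff) {
		idx = len(backoff) - 1
	}
	return backoff[idx]
}

func main() {
	for n := 1; n <= 4; n++ {
		fmt.Printf("delivery %d: next check in %v\n", n, nextCheck(exampleAckWait, exampleBackOff, n))
	}
}
```

Always re-arming the timer with the first entry would keep redelivery times correct but wake the checker far more often than needed, which is the behavior the commit describes.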
Ivan Kozlovic
e9bf972cec Merge pull request #2946 from nats-io/fix_ipqueue_unregister
Fixed panic on stream create failure (with filestore)
2022-03-22 16:11:04 -06:00
Ivan Kozlovic
8d4ff4bc96 Fixed panic on stream create failure (with filestore)
This was introduced by the change for ipQueues in #2931.
The (*ipQueue).unregister() was written with a protection for
the ipQueue to be nil, however, mset.outq is actually not a bare
ipQueue but a jsOutQ that embeds a pointer to an ipQueue. So we
need to implement unregister() for jsOutQ.

Added a test that reproduced the issue, but found it with a flapping
test (TestJetStreamLongStreamNamesAndPubAck) that failed due to
a file name too long.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-22 15:21:01 -06:00
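The Go behavior behind the panic in the commit above can be reproduced in a few lines. The types below are simplified, hypothetical versions of the ones named in the message; only the nil handling is illustrated. A nil check inside a method on the embedded pointer does not help when the outer pointer is nil, because the embedded field has to be read first.

```go
package main

import "fmt"

// ipQueue is a simplified stand-in for the type named in the commit message.
type ipQueue struct {
	registered bool
}

// unregister tolerates a nil *ipQueue receiver.
func (q *ipQueue) unregister() {
	if q == nil {
		return
	}
	q.registered = false
}

// jsOutQ embeds a pointer to an ipQueue.
type jsOutQ struct {
	*ipQueue
}

// unregister on the wrapper is required: without it, calling the promoted
// method on a nil *jsOutQ dereferences the nil wrapper to reach the
// embedded field and panics before the inner nil check can run.
func (q *jsOutQ) unregister() {
	if q == nil {
		return
	}
	q.ipQueue.unregister()
}

func main() {
	var outq *jsOutQ  // e.g. creation failed before the queue was set up
	outq.unregister() // safe thanks to the nil-safe wrapper method
	fmt.Println("unregister on nil jsOutQ did not panic")
}
```

Removing the `(*jsOutQ).unregister` method from this sketch makes `outq.unregister()` panic with a nil pointer dereference.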
Ivan Kozlovic
9d6525c8a3 Merge pull request #2943 from nats-io/fix_2926
[CHANGED] Duplicates in authorization{} and accounts{} now detected
2022-03-22 14:38:42 -06:00
Ivan Kozlovic
897c229fa9 Update test to capture accounts{} and single u/p or token
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-22 10:29:11 -06:00
Ivan Kozlovic
eef194c43b [CHANGED] Duplicates in authorization{} and accounts{} now detected
If accounts{} block is specified, authorization{} should not have
any user/password/token or users array defined.

The reason is that users parsed in accounts{} are associated with
their respective account but users parsed in authorization{} are
associated with the global account. If the same user name is
in both, and since internally the parsing of those 2 blocks is
completely random (even if laid out in the config in a specific
order), the outcome may be that a user is either associated with
an account or the default global account.

To minimize breaking changes, but still avoid this unexpected
outcome, the server will now detect if there are duplicate users
(or nkeys) inside authorization{} block itself, but also between
this block and accounts{}.
The check will also detect if accounts{} has any user/nkey, then
the authorization{} block should not have any user/password/token,
making this test similar to the check we had in authorization{}
block itself.

Resolves #2926

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-21 19:50:16 -06:00
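A duplicate check of the kind described in the commit above might look like the sketch below. It assumes users have already been parsed into plain name lists; `duplicateUsers` and its parameters are hypothetical and do not mirror the server's option-parsing code.

```go
package main

import (
	"fmt"
	"sort"
)

// duplicateUsers returns, in sorted order, every user name that appears
// more than once across the authorization{} users and all accounts{} users.
// Hypothetical helper for illustration only.
func duplicateUsers(authUsers []string, accountUsers map[string][]string) []string {
	seen := make(map[string]int)
	for _, u := range authUsers {
		seen[u]++
	}
	for _, users := range accountUsers {
		for _, u := range users {
			seen[u]++
		}
	}
	var dups []string
	for u, n := range seen {
		if n > 1 {
			dups = append(dups, u)
		}
	}
	// Map iteration order is random, so sort for a stable result.
	sort.Strings(dups)
	return dups
}

func main() {
	auth := []string{"alice", "bob"}
	accounts := map[string][]string{
		"A": {"bob", "carol"},
		"B": {"dave"},
	}
	fmt.Println("duplicates:", duplicateUsers(auth, accounts))
}
```

Counting names in a single map makes the result independent of the order in which the two blocks are parsed, which is the source of the ambiguity the commit describes.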
Jaime Piña
60773be03f Use random high port in placement test (#2940)
2022-03-21 15:38:01 -07:00
Ivan Kozlovic
13cd977e50 Merge pull request #2939 from nats-io/js_panic_leader_change
[FIXED] JetStream: possible panic on leadership change notices
2022-03-21 12:30:09 -06:00
Ivan Kozlovic
f11f7a61e8 Merge pull request #2938 from nats-io/fix_2920
[FIXED] Removal of an external source stream
2022-03-21 12:29:57 -06:00
Ivan Kozlovic
e75020e275 [FIXED] JetStream: possible panic on leadership change notices
I got this panic in a test:
```
=== RUN   TestJetStreamClusterAccountLoadFailure
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x78 pc=0xb1501b]
goroutine 47853 [running]:
github.com/nats-io/nats-server/v2/server.(*jetStream).processLeaderChange(0xc000b60580, 0x0)
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:3638 +0x9b
github.com/nats-io/nats-server/v2/server.(*jetStream).monitorCluster(0xc000b60580)
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:853 +0x60f
created by github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine
	/home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:3017 +0x87
FAIL	github.com/nats-io/nats-server/v2/server	227.888s
```

which from that branch would point to function processLeaderChange()
line:
```
} else if node := js.getMetaGroup().GroupLeader(); node == _EMPTY_ {
```
which I guess meant that getMetaGroup() was returning `nil`.

Refactored a bit to get the group leader in 2 steps.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-21 12:11:41 -06:00
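The "2 steps" refactor mentioned in the commit above could look roughly like this. The types here are hypothetical simplifications (in particular, `raftNode` is a plain struct in this sketch), and the real change in jetstream_cluster.go is not shown in the message, so treat this only as an illustration of checking the intermediate value before calling a method on it.

```go
package main

import "fmt"

// raftNode and jetStream are hypothetical, simplified types for this sketch.
type raftNode struct {
	leader string
}

// GroupLeader dereferences n, so it panics when n is nil.
func (n *raftNode) GroupLeader() string { return n.leader }

type jetStream struct {
	meta *raftNode
}

func (js *jetStream) getMetaGroup() *raftNode { return js.meta }

// hasMetaLeader fetches the meta group first and only asks for the group
// leader when the group exists, instead of chaining the two calls.
func hasMetaLeader(js *jetStream) bool {
	meta := js.getMetaGroup()
	if meta == nil {
		return false
	}
	return meta.GroupLeader() != ""
}

func main() {
	fmt.Println(hasMetaLeader(&jetStream{}))                              // no meta group yet
	fmt.Println(hasMetaLeader(&jetStream{meta: &raftNode{leader: "S1"}})) // leader known
}
```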
Jaime Piña
33cfc748bf Disable some supercluster limit placement tests (#2937)
2022-03-21 11:05:13 -07:00
Ivan Kozlovic
68da3e8253 [FIXED] Removal of an external source stream
Removal of an external stream source was not working properly:
messages could still flow after the removal, until the server
hosting the stream from which the source was removed was
restarted.

Resolves #2920

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-21 10:59:47 -06:00
Ivan Kozlovic
29ff67e2ac Tests: Replace all Ack() with AckSync() for now
For the reason explained in the previous commit, tests that
expected the number of ack/pending to be a certain value right
after an Ack() would flap. Replaced all references; we can go
back to selectively calling Ack() where AckSync() is not
needed.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-17 20:25:01 -06:00
Ivan Kozlovic
ac52ecd9ff Fixing flapper
Since acks are now processed in a different goroutine, the tests
that use Ack() cannot expect the number of ack messages to be
exact immediately. So in this test use AckSync() to ensure that
the ack is processed. Alternatively, the pending count should
be checked with a checkFor().

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-17 19:53:33 -06:00
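The polling alternative mentioned in the commit above can be sketched with a small retry helper. The `checkFor` signature here is an assumption for illustration (the helper in the server's test suite may differ), and `waitForAckProcessed` simulates an ack being handled by another goroutine rather than using a real client.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// checkFor calls f until it returns nil or totalWait has elapsed, sleeping
// sleepDur between attempts. It returns the last error on timeout.
func checkFor(totalWait, sleepDur time.Duration, f func() error) error {
	deadline := time.Now().Add(totalWait)
	for {
		err := f()
		if err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return err
		}
		time.Sleep(sleepDur)
	}
}

// waitForAckProcessed simulates an ack handled asynchronously: the pending
// count drops to zero a little later, in another goroutine.
func waitForAckProcessed() error {
	var pending int64 = 1
	go func() {
		time.Sleep(20 * time.Millisecond)
		atomic.StoreInt64(&pending, 0)
	}()
	return checkFor(2*time.Second, 5*time.Millisecond, func() error {
		if n := atomic.LoadInt64(&pending); n != 0 {
			return fmt.Errorf("expected 0 pending, got %d", n)
		}
		return nil
	})
}

func main() {
	if err := waitForAckProcessed(); err != nil {
		fmt.Println("failed:", err)
		return
	}
	fmt.Println("pending count reached zero")
}
```

Asserting the count immediately after the asynchronous ack would race with the goroutine; polling with a deadline removes the race without making the test wait longer than needed.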
Ivan Kozlovic
a23b1b73ef Merge pull request #2931 from nats-io/ipq_changes
Changes to IPQueues
2022-03-17 19:13:02 -06:00
Derek Collison
a4e795c996 Attempt to fix flapper
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-17 17:38:32 -07:00
Ivan Kozlovic
c3da392832 Changes to IPQueues
Removed the warnings, instead have a sync.Map where they are
registered/unregistered and can be inspected with an undocumented
monitor page.
Added the notion of "in progress" which is the number of messages
that have been pop()'ed. When recycle() is invoked this count
goes down.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-17 17:53:06 -06:00
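The "in progress" accounting described in the commit above can be sketched as a queue that counts elements between pop() and recycle(). The type below is a hypothetical, much-simplified queue of strings and does not reflect the real ipQueue fields or signatures.

```go
package main

import (
	"fmt"
	"sync"
)

// ipQueue is a simplified, hypothetical queue used only to show the
// "in progress" counter.
type ipQueue struct {
	mu         sync.Mutex
	elts       []string
	inprogress int
}

func (q *ipQueue) push(e string) {
	q.mu.Lock()
	q.elts = append(q.elts, e)
	q.mu.Unlock()
}

// pop removes and returns all queued elements and counts them as in progress.
func (q *ipQueue) pop() []string {
	q.mu.Lock()
	defer q.mu.Unlock()
	elts := q.elts
	q.elts = nil
	q.inprogress += len(elts)
	return elts
}

// recycle signals that the popped elements have been processed.
func (q *ipQueue) recycle(elts []string) {
	q.mu.Lock()
	q.inprogress -= len(elts)
	q.mu.Unlock()
}

// inProgress reports how many popped elements have not been recycled yet.
func (q *ipQueue) inProgress() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.inprogress
}

func main() {
	q := &ipQueue{}
	q.push("a")
	q.push("b")
	batch := q.pop()
	fmt.Println("in progress after pop:", q.inProgress())
	q.recycle(batch)
	fmt.Println("in progress after recycle:", q.inProgress())
}
```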
Derek Collison
69d265601d Merge pull request #2930 from nats-io/dupe_urls_config
Detect exact duplicates for URLs for routes, gateways or leafnodes.
2022-03-17 16:23:56 -07:00
Derek Collison
0bb84bf76b Make warning more detailed
Co-authored-by: Waldemar Quevedo <wally@synadia.com>
2022-03-17 14:59:14 -07:00
Derek Collison
e204a7961d When detecting exact duplicates for URLs for routes, gws or leafnodes, enter a warning and ignore.
If misconfigured could prevent the JetStream system from electing a leader.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-17 14:52:01 -07:00
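Detecting exact duplicates, as in the commit above, amounts to a string comparison over the configured URLs. The sketch below assumes URLs are available as plain strings; `dedupeURLs` is a hypothetical helper, and the example addresses are made up.

```go
package main

import "fmt"

// dedupeURLs keeps the first occurrence of each URL string and reports the
// exact duplicates that were ignored. Hypothetical helper for illustration.
func dedupeURLs(urls []string) (unique, dups []string) {
	seen := make(map[string]struct{}, len(urls))
	for _, u := range urls {
		if _, ok := seen[u]; ok {
			dups = append(dups, u)
			continue
		}
		seen[u] = struct{}{}
		unique = append(unique, u)
	}
	return unique, dups
}

func main() {
	routes := []string{
		"nats://127.0.0.1:6222",
		"nats://127.0.0.1:6223",
		"nats://127.0.0.1:6222", // exact duplicate
	}
	unique, dups := dedupeURLs(routes)
	for _, d := range dups {
		fmt.Printf("warning: duplicate URL %q ignored\n", d)
	}
	fmt.Println("using:", unique)
}
```

Only byte-for-byte identical entries are caught this way; two different spellings of the same host would still count as distinct.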
Jaime Piña
50ca685a3b Add stream limit update test (#2929)
This adds a test to see if we can update a stream when the stream limit
is 1. Currently, this test fails, so we're skipping it. This test will
be enabled in a future PR.
2022-03-17 13:49:37 -07:00
Derek Collison
0601da2186 Merge pull request #2928 from nats-io/m_ver
Show version on main monitoring page with link to source
2022-03-17 11:24:35 -07:00
Ivan Kozlovic
7d9bb32c1d Fix a flapper
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-17 12:18:22 -06:00
Derek Collison
fa098f1af0 Show version on main monitoring page with link to source
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-17 11:04:11 -07:00
Ivan Kozlovic
fe6d7b305f Merge pull request #2898 from nats-io/js_cons_ack_processing
[CHANGED] JetStream: Redeliveries may be delayed if necessary
2022-03-17 10:57:22 -06:00
Ivan Kozlovic
2c0f5046f1 Merge pull request #2923 from nats-io/gw_detect_duplicate_srv_name
[CHANGED] Gateway: Detect duplicate names between clusters
2022-03-17 10:57:08 -06:00
Derek Collison
b99fd81464 Merge pull request #2927 from nats-io/consumer_wipe
Improve consumer state handling.
2022-03-17 08:50:24 -07:00
Derek Collison
dbfa47f9b1 Improve state preservation for consumers, specifically DeliverNew variants when no activity has been present.
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-16 20:55:14 -07:00
Derek Collison
287b567b1c Add consumer check to healthz and allow to be called directly
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-16 20:52:31 -07:00
Jaime Piña
acfd456758 Prevent reserved bytes underflow (#2907)
2022-03-16 15:19:35 -07:00
Derek Collison
59753ec0da Bump to 2.7.5-beta.2
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-16 09:29:58 -07:00
Derek Collison
848670e45c Merge pull request #2925 from nats-io/delete_offline_stream
Offline streams behavior during list and delete improved.
2022-03-16 09:29:20 -07:00
Derek Collison
2290a132ad Merge pull request #2924 from nats-io/ack_metrics
Test ack metrics
2022-03-16 09:04:54 -07:00
Derek Collison
e4ebc4648e When a stream or consumer was offline we would not properly respond to a delete.
We also would hang if no stream info requests were sent during a stream list due to the asset being offline.

Signed-off-by: Derek Collison <derek@nats.io>
2022-03-15 21:11:23 -07:00
Derek Collison
303bb93c18 Test ack metrics
Signed-off-by: Derek Collison <derek@nats.io>
2022-03-15 16:41:06 -07:00
Ivan Kozlovic
63c750e295 [CHANGED] Gateway: Detect duplicate names between clusters
Gateway connection will be closed and error reported if a remote
has a name that is a duplicate of the local cluster.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-15 15:00:13 -06:00
Ivan Kozlovic
5c0d1999ff Bump version
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-09 14:21:30 -07:00
Ivan Kozlovic
a86b84a9f3 Merge pull request #2918 from nats-io/release_2_7_4
Release v2.7.4
2022-03-09 14:00:53 -07:00
Ivan Kozlovic
773636c1c5 Release v2.7.4
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-09 13:58:33 -07:00
Ivan Kozlovic
818c2c7a7e Merge pull request #2917 from nats-io/file_path
Ensure file path is correct during stream restore
2022-03-09 13:52:01 -07:00
Ivan Kozlovic
b4128693ed Ensure file path is correct during stream restore
Also had to change all references from `path.` to `filepath.` when
dealing with files, so that it works properly on Windows.

Also fixed many tests to defer the shutdown of the server
after the removal of the storage, and fixed some config files
directories to use the single quote `'` to surround the file path,
again to work on Windows.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-09 13:31:51 -07:00
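The `path.` versus `filepath.` distinction in the commit above comes from the Go standard library: `path` always joins with forward slashes, while `path/filepath` uses the operating system's separator. A minimal sketch, where `streamDir` and the directory names are hypothetical:

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
)

// sep is the OS-specific path separator ("\\" on Windows, "/" elsewhere).
var sep = string(filepath.Separator)

// streamDir builds an on-disk directory for a stream using the OS-specific
// separator. The layout is made up for this example.
func streamDir(storeDir, stream string) string {
	return filepath.Join(storeDir, "streams", stream)
}

func main() {
	fmt.Println("filepath.Join:", streamDir("data", "ORDERS"))
	// path.Join is intended for slash-separated paths such as URLs and
	// always uses "/", regardless of the operating system.
	fmt.Println("path.Join:    ", path.Join("data", "streams", "ORDERS"))
}
```

On Linux and macOS both lines print the same path; on Windows only the `filepath` version uses backslashes.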