Commit Graph

777 Commits

Author SHA1 Message Date
Derek Collison
9b6dbe112c Make sure randomServer() adapts for shutdown servers
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-25 20:14:11 -08:00
Derek Collison
27f8cbd069 Wait for interest
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-25 19:46:43 -08:00
Derek Collison
83e2c719b7 Wait in case stopped server was also stream leader
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-25 19:00:51 -08:00
Derek Collison
0e5e9cb5ee Possible retry in case peers have not committed state
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-25 18:54:42 -08:00
Derek Collison
e40c3e6f55 Templates not supported currently in clustered mode
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-25 17:13:31 -08:00
Derek Collison
c8a75e1ed0 test fixes
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-25 16:15:28 -08:00
Derek Collison
76058c5ec6 Timing for state propagation
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-25 14:32:38 -08:00
Derek Collison
c7c86c7929 Attempt to fix flapper
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-25 10:25:47 -08:00
Derek Collison
5148bbf898 Fixes based on PR feedback, cleanup
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-25 10:04:21 -08:00
Derek Collison
7b1e84c086 Fixed raft bug that would cause entries to be missed on restart with leader HB trigger.
Also added in creation times to stream and consumer assignments to make them consistent.

Signed-off-by: Derek Collison <derek@nats.io>
2021-01-25 08:47:37 -08:00
Derek Collison
117607ef11 Fix for race and test for issue R.I. was seeing in nightly. Also fixed flappers.
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-24 21:21:02 -08:00
Derek Collison
a72ddedb55 Fix for issue with stream info and R=1 and fix for a flapper
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-24 19:48:33 -08:00
Derek Collison
9c858d197a Added ability to properly restore consumers from a snapshot.
This made us add forwarding proposals functionality in the raft layer.
More general cleanup and bug fixes as well.

Signed-off-by: Derek Collison <derek@nats.io>
2021-01-24 19:30:34 -08:00
Derek Collison
cad0db2aec Cleanup the consumer assignments when consumers become inactive.
This involved extending our raft implementation to forward proposals to the current leader.

Signed-off-by: Derek Collison <derek@nats.io>
2021-01-23 13:44:10 -08:00
Derek Collison
d1d2d5b24e Fix for consumer names list in clustered mode
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-23 10:09:44 -08:00
Derek Collison
d7cfb8f6e9 Use client version for stream and consumer extended info
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-22 13:11:36 -08:00
Derek Collison
a43a69a403 Fix for interest only, broken test
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-22 11:04:06 -08:00
Derek Collison
227901a56b More cleanup and stabilization for consumers and failing when sending messages.
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-22 10:09:30 -08:00
Derek Collison
6f2b50a374 Added support for clustered account info and limit enforcement
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-21 18:47:21 -08:00
Derek Collison
74c06ed046 Shutdown cluster on errors
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-20 11:58:31 -08:00
Derek Collison
ff54c9dc9c Reworked snapshot and restore.
Underestimated the effort to get stream restore working properly in cluster mode.
Some good bug fixes and stability improvments.

Signed-off-by: Derek Collison <derek@nats.io>
2021-01-20 11:58:31 -08:00
Derek Collison
a1730f1b31 Report on RAFT group information.
This adds in optional reporting to stream and consumer info when running in clsutered mode.

Signed-off-by: Derek Collison <derek@nats.io>
2021-01-20 11:58:31 -08:00
Ivan Kozlovic
f5df209022 Fixed SIGSEGV when sending update for unknown stream
Will now return an error that the stream is unknown.

Resolves #1827

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-20 12:42:14 -07:00
Ivan Kozlovic
7d1a4778b8 Merge pull request #1826 from nats-io/fix_consumer_loop_delivery_exit
Fix stop of consumer's delivery loop
2021-01-20 10:34:57 -07:00
Ivan Kozlovic
c4a284b58f Fix stop of consumer's delivery loop
I noticed that some consumer go routines were left running at the end
of the test suite.
It turns out that there was a race the way the consumer's qch was closed.
Since it was closed and then set to nil, it is possible that the go
routines that are started and then try to capture o.qch would actually
get qch==nil, wich then when doing a select on that nil channel would
block forever.

So we know pass the qch to the 2 go routines loopAndGatherMsgs() and
loopAndDeliverMsgs() so that when we close the channel there is
no risk of that race happening.

I do believe that there is still something that should be looked at:
it seems that a consumer's delivery loop can now be started/stopped
many times based on leadership acquired/lost. If that is the case,
I think that the consumer should wait for previous go routine to
complete before trying to start new ones.

Also moved 3 JetStream tests to the test/norace_test.go file because
they would consumer several GB of memory when running with the -race flag.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-19 17:39:32 -07:00
Ivan Kozlovic
42dcdd2eb2 Simplify sendSubsToRoute()
Since we were creating subs on the fly, sub.im would always be nil.
We were passing a client because it was needed in sendRouteSubOrUnSubProtos().

This PR simply fills the buffer with each account's subscriptions.
There is also no need to have subs sent from different go routine
based on some threshold. Routes are no longer subject to max pending.

Some code has been made into a function so that they can be shared
by sendSubsToRoute() and sendRouteSubOrUnSubProtos(). The function
is simply adding to given buffer the RS+/- protocol.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-19 14:01:43 -07:00
Derek Collison
78747b2414 Stability improvements around startup and restore.
We were incorrectly starting clustering before enabling accounts and restoring state.

Signed-off-by: Derek Collison <derek@nats.io>
2021-01-17 13:44:49 -08:00
Derek Collison
a603f439bb Make all requests same timeout
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-16 14:17:35 -08:00
Derek Collison
a18a6803c1 Added support for stream and consumer lists.
This utilizes a scatter and gather approach.

Signed-off-by: Derek Collison <derek@nats.io>
2021-01-16 12:42:45 -08:00
Derek Collison
cb69df7118 Add proper support for stream update
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-16 06:29:37 -08:00
Derek Collison
b606dceb59 Stabilize restart/catchup for raft.
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-16 05:47:48 -08:00
Ivan Kozlovic
1874964498 Merge pull request #1812 from nats-io/leafnode_fixes
Fixed some leafnode issues introduced from JS cluster work
2021-01-15 18:22:02 -07:00
Ivan Kozlovic
0d78bce9cf Fixed some leafnode issues introduced from JS cluster work
Also fixed a flapper.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-15 12:00:34 -07:00
Ryota
91a1d9a556 Update error message with correct config value 2021-01-15 13:18:31 +00:00
Ivan Kozlovic
6c4229300a Fixed service import cycle detection that broke with JS clustering
Also added some no-op error handler for some tests to silence the
error report in the log.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-14 11:27:36 -07:00
Derek Collison
1b0e740123 Fix for race
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-14 07:28:14 -08:00
Derek Collison
491b3c34cd Tweak timing more for tests
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-14 07:07:04 -08:00
Derek Collison
ab2a645791 Fix for various flappers
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-14 06:54:08 -08:00
Derek Collison
4bfe9d4c24 Fixes to PR.
Add nats to default storage directory
Fix race in raft, change leader notice
Fix test crash on failure

Signed-off-by: Derek Collison <derek@nats.io>
2021-01-14 05:56:05 -08:00
Derek Collison
37cf7584bd Merge branch 'master' into jsc 2021-01-14 02:52:35 -07:00
Derek Collison
f0cdf89c61 JetStream Clustering WIP
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-14 01:14:52 -08:00
Ivan Kozlovic
7b116379cb Propose going back to condition variable to notify writeLoop
This is how it was up to v2.1.2 included (changed in v2.1.4 onward).
I added a benchmark that has 3 subscribers running and increase
the number of publishers: 1, 2, 5 and 10. This is the comparison
between the pre-PR and post-PR:

```
benchcmp old.txt new.txt
benchmark                           old ns/op     new ns/op     delta
Benchmark___BumpPubCount_1x3-16     396           385           -2.78%
Benchmark___BumpPubCount_2x3-16     495           406           -17.98%
Benchmark___BumpPubCount_5x3-16     542           395           -27.12%
Benchmark__BumpPubCount_10x3-16     549           515           -6.19%

benchmark                           old MB/s     new MB/s     speedup
Benchmark___BumpPubCount_1x3-16     717.27       737.54       1.03x
Benchmark___BumpPubCount_2x3-16     574.31       699.02       1.22x
Benchmark___BumpPubCount_5x3-16     524.35       718.80       1.37x
Benchmark__BumpPubCount_10x3-16     517.26       551.53       1.07x
```

It is inline with what the user reported, seeing a 20% drop in performance
when going from 1 publisher to 2. But, as we can see, the difference
between go channel and cond variable reduces with the increased number
of publishers after a certain number.

I am not sure of the performance impact on other situations, so this
PR is more of a proposal than a fix.

Resolves #1786

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-12 12:24:37 -07:00
Ivan Kozlovic
14aecb2202 Fixed headers support for inbound leafnode connection
The server that solicits a LeafNode connection does not send an
INFO, so the accepting side had no way to know if the remote
supports headers or not. The solicit side will now send the headers
support capability in the CONNECT protocol so that the receiving
side can mark the inbound connection with headers support based
on that and its own support for headers.

Resolves #1781

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-12-21 11:53:24 -07:00
Alberto Ricart
f09992a889 updated iteration of signing keys (previously a list, now a map). (#1779) 2020-12-17 13:59:18 -07:00
Derek Collison
eb403ed4d0 Merge pull request #1773 from nats-io/cycle_wc_bug
Catch condition where a serviceImport response matched the original import.
2020-12-14 08:20:55 -08:00
Derek Collison
ced28eca93 Fix flapper
Signed-off-by: Derek Collison <derek@nats.io>
2020-12-13 10:29:34 -08:00
Derek Collison
a3f7e97f9a Catch condition where a serviceImport response matched the original import subject.
Signed-off-by: Derek Collison <derek@nats.io>
2020-12-13 10:17:29 -08:00
Ivan Kozlovic
d5f255b98e Merge pull request #1771 from nats-io/gw_ln_tls_config_reload
[FIXED] Config reload for gateways/leaf remote TLS configurations
2020-12-12 10:51:52 -07:00
Ivan Kozlovic
9f345ac420 Reduce risk of failure for TestJetStreamConsumerMaxDeliveryAndServerRestart
Just increased the AckWait from 20ms to 100ms and reduced max
deliveries from 4 to 3.

I believe that there is still the risk that the message is redelivered
while the server is being shutdown and that message is not making it
to the sub.

But using those new values (100ms/3), I have ran 200 rounds on a Linux
VM and did not get the failure (but did before the change).

Again, this is not proper test fix, but may help. This test has been
failing 11 times already (keeping track in spreadsheet) and causes
several minutes of tests to have to be recycled.
Note that the test ran in about 0.4s and now 0.7s, so not that much
and would be worth the added delay if it helps not breaking the whole
test suite!

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-12-11 19:49:58 -07:00
Ivan Kozlovic
399ff89817 Fixed debug num subs tests
Subject interest propagation delays could cause some of the system
service tests to fail.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-12-11 19:27:23 -07:00