36 Commits

Author SHA1 Message Date
Ivan Kozlovic
f805f23d6e Travis updates
- Add Go 1.17
- Fix go fmt from Go 1.17 (build directives)
- Download version of misspell and staticcheck instead of doing
"go get" since current staticcheck would be broken without go.mod

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-11-15 17:23:08 -07:00
Ivan Kozlovic
a025ce7472 Set defaultServerOptions port to -1 for random
Updated some tests based on this change, and also added missing deferred
connection close or server shutdown calls.

Fixed how the OCSP run go routine would shutdown, which would
never complete because grWG was not decremented by this go routine
prior to invoking s.Shutdown()
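
A minimal sketch of the shutdown problem described above, not the server's actual code: the `server` type, `startMonitor`, and `shutdownCompletes` are hypothetical names, and only `grWG` and `Shutdown` come from the commit message. A goroutine counted in a `sync.WaitGroup` has to call `Done` before it invokes a shutdown that waits on that same group.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// server is a hypothetical stand-in that tracks its goroutines in grWG.
type server struct {
	grWG sync.WaitGroup
	once sync.Once
	quit chan struct{}
}

// Shutdown signals all goroutines to stop and waits for them to finish.
func (s *server) Shutdown() {
	s.once.Do(func() { close(s.quit) })
	s.grWG.Wait()
}

// startMonitor runs a tracked goroutine that shuts the server down when
// fatal is closed.
func (s *server) startMonitor(fatal <-chan struct{}) {
	s.grWG.Add(1)
	go func() {
		select {
		case <-fatal:
			// Release our own slot first. Calling Shutdown while still
			// counted in grWG would make Wait block on this goroutine.
			s.grWG.Done()
			s.Shutdown()
		case <-s.quit:
			s.grWG.Done()
		}
	}()
}

// shutdownCompletes reports whether Shutdown returns within two seconds
// once the monitor goroutine has been told to stop.
func shutdownCompletes() bool {
	s := &server{quit: make(chan struct{})}
	fatal := make(chan struct{})
	s.startMonitor(fatal)
	close(fatal)

	finished := make(chan struct{})
	go func() {
		s.Shutdown()
		close(finished)
	}()
	select {
	case <-finished:
		return true
	case <-time.After(2 * time.Second):
		return false
	}
}

func main() {
	fmt.Println("shutdown completed:", shutdownCompletes())
}
```

If the `s.grWG.Done()` call in the `fatal` branch were moved after `s.Shutdown()`, the wait could never return, which matches the symptom the commit describes.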

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-09-02 14:22:56 -06:00
Derek Collison
5f93ca09cd Bumped memory ceiling
Signed-off-by: Derek Collison <derek@nats.io>
2021-06-10 07:28:20 -07:00
Jaime Piña
d929ee1348 Check errors when removing test directories and files
Currently in tests, we have calls to os.Remove and os.RemoveAll where we
don't check the returned error. This hides useful error messages when
tests fail to run, such as "too many open files".

This change checks for more filesystem related errors and calls t.Fatal
if there is an error.
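
The idea can be sketched as a small helper, assuming a hypothetical `removeDir` function and a `fataler` interface that `*testing.T` would satisfy. The `recorder` type exists only so the sketch runs outside of `go test`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// fataler is the subset of testing.TB used here (hypothetical name).
type fataler interface {
	Helper()
	Fatalf(format string, args ...interface{})
}

// removeDir removes dir and reports any error instead of dropping it.
func removeDir(t fataler, dir string) {
	t.Helper()
	if err := os.RemoveAll(dir); err != nil {
		t.Fatalf("Error removing directory %q: %v", dir, err)
	}
}

// recorder stands in for *testing.T so the sketch can run as a program.
type recorder struct{ failures []string }

func (r *recorder) Helper() {}

func (r *recorder) Fatalf(format string, args ...interface{}) {
	r.failures = append(r.failures, fmt.Sprintf(format, args...))
}

// cleanupDemo creates a temporary directory with one file, removes it
// through removeDir, and reports whether it is gone and how many
// failures were recorded.
func cleanupDemo() (removed bool, failures int) {
	dir, err := os.MkdirTemp("", "cleanup-demo")
	if err != nil {
		panic(err)
	}
	file := filepath.Join(dir, "data.txt")
	if err := os.WriteFile(file, []byte("x"), 0600); err != nil {
		panic(err)
	}
	r := &recorder{}
	removeDir(r, dir)
	_, statErr := os.Stat(dir)
	return os.IsNotExist(statErr), len(r.failures)
}

func main() {
	removed, failures := cleanupDemo()
	fmt.Println("removed:", removed, "failures:", failures)
}
```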
2021-04-07 11:09:47 -07:00
Derek Collison
d4e4c37e94 Test fixes
Signed-off-by: Derek Collison <derek@nats.io>
2021-03-14 06:18:50 -07:00
Derek Collison
eecec2aed1 Increase due to sendq
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-16 14:17:46 -08:00
Derek Collison
c16f6e193d Move JetStream direct APIs to private.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-07 15:19:22 -08:00
Derek Collison
38267d5110 Stability improvements
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-06 20:15:32 -08:00
Derek Collison
a8982c040f Suppress lost quorum processing if too close to raft node creation time.
Signed-off-by: Derek Collison <derek@nats.io>
2021-02-02 06:27:07 -08:00
Derek Collison
9b20d5c888 Fixed bug on raft inline catchup when apply channel was full.
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-30 13:22:27 -08:00
Ivan Kozlovic
7d1a4778b8 Merge pull request #1826 from nats-io/fix_consumer_loop_delivery_exit
Fix stop of consumer's delivery loop
2021-01-20 10:34:57 -07:00
Ivan Kozlovic
c4a284b58f Fix stop of consumer's delivery loop
I noticed that some consumer go routines were left running at the end
of the test suite.
It turns out that there was a race in the way the consumer's qch was closed.
Since it was closed and then set to nil, it is possible that the go
routines that are started and then try to capture o.qch would actually
get qch==nil, which then when doing a select on that nil channel would
block forever.

So we now pass the qch to the 2 go routines loopAndGatherMsgs() and
loopAndDeliverMsgs() so that when we close the channel there is
no risk of that race happening.
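
A minimal sketch of that pattern, assuming a simplified `consumer` type with a single generic `loop` in place of loopAndGatherMsgs() and loopAndDeliverMsgs(). The channel is captured once and handed to the goroutines, so a later `o.qch = nil` cannot leave them receiving on a nil channel.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// consumer is a simplified, hypothetical stand-in for the real type.
type consumer struct {
	mu  sync.Mutex
	qch chan struct{}
	wg  sync.WaitGroup
}

// start captures qch under the lock and passes that value to the
// goroutines instead of letting them read o.qch later on.
func (o *consumer) start() {
	o.mu.Lock()
	qch := o.qch
	o.mu.Unlock()

	o.wg.Add(2)
	go o.loop(qch)
	go o.loop(qch)
}

// loop blocks until the channel it was given is closed.
func (o *consumer) loop(qch chan struct{}) {
	defer o.wg.Done()
	<-qch
}

// stop closes the quit channel and clears the field.
func (o *consumer) stop() {
	o.mu.Lock()
	if o.qch != nil {
		close(o.qch)
		o.qch = nil
	}
	o.mu.Unlock()
}

// loopsExit reports whether both goroutines exit within two seconds
// when stop is called right after start.
func loopsExit() bool {
	o := &consumer{qch: make(chan struct{})}
	o.start()
	o.stop()

	done := make(chan struct{})
	go func() {
		o.wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(2 * time.Second):
		return false
	}
}

func main() {
	fmt.Println("loops exited:", loopsExit())
}
```

Had `loop` read `o.qch` itself after `stop` ran, it would receive on a nil channel, and a receive on a nil channel blocks forever in Go.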

I do believe that there is still something that should be looked at:
it seems that a consumer's delivery loop can now be started/stopped
many times based on leadership acquired/lost. If that is the case,
I think that the consumer should wait for previous go routine to
complete before trying to start new ones.

Also moved 3 JetStream tests to the test/norace_test.go file because
they would consume several GB of memory when running with the -race flag.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-19 17:39:32 -07:00
Ivan Kozlovic
42dcdd2eb2 Simplify sendSubsToRoute()
Since we were creating subs on the fly, sub.im would always be nil.
We were passing a client because it was needed in sendRouteSubOrUnSubProtos().

This PR simply fills the buffer with each account's subscriptions.
There is also no need to have subs sent from different go routine
based on some threshold. Routes are no longer subject to max pending.

Some code has been made into a function so that it can be shared
by sendSubsToRoute() and sendRouteSubOrUnSubProtos(). The function
simply adds the RS+/- protocol to the given buffer.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-01-19 14:01:43 -07:00
Derek Collison
f0cdf89c61 JetStream Clustering WIP
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-14 01:14:52 -08:00
Derek Collison
439e090e0d Updates based on feedback
Signed-off-by: Derek Collison <derek@nats.io>
2020-06-30 18:14:30 -07:00
Derek Collison
06ca580334 Update write deadline, client processing and slow proxy
Signed-off-by: Derek Collison <derek@nats.io>
2020-06-30 16:41:01 -07:00
Ivan Kozlovic
25bd5ca352 [FIXED] Unsubscribe may not be propagated through a leaf node
There is a race between the processing of a subscription
and the init/send of subscriptions when accepting a leaf node
connection. It may cause a subscription's subject to be counted
more than once internally, which would then prevent the send of
an LS- when the subscription's interest goes away.

Imagine this sequence of events, each side represents a "thread"
of execution:
```
client readLoop                         leaf node readLoop
----------------------------------------------------------
recv SUB foo 1
sub added to account's sublist

                                         recv CONNECT
                                     auth, added to acc.

updateSmap
smap["foo"]++ -> 1
no LS+ because !allSubsSent

                                         init smap
                                    finds sub in acc sl
                                    smap["foo"]++ -> 2
                                        sends LS+ foo
                                    allSubsSent == true

recv UNSUB 1
updateSmap
smap["foo"]-- -> 1
no LS- because count != 0
----------------------------------------------------------
```
Equivalent result but with slightly different execution:
```
client readLoop                         leaf node readLoop
----------------------------------------------------------
recv SUB foo 1
sub added to account's sublist

                                         recv CONNECT
                                     auth, added to acc.

                                         init smap
                                    finds sub in acc sl
                                    smap["foo"]++ -> 1
                                        sends LS+ foo
                                    allSubsSent == true

updateSmap
smap["foo"]++ -> 2
no LS+ because count != 1

recv UNSUB 1
updateSmap
smap["foo"]-- -> 1
no LS- because count != 0
----------------------------------------------------------
```

The approach for the fix is to delay the creation of the smap
until we actually initialize the map and send the subs on processing
of the CONNECT.
In the meantime, as soon as the LN connection is registered
and available in updateSmap, we check whether smap is nil.
If nil, we do nothing.

In "init smap" we keep track of the subscriptions that have been
added to smap. This map will be short lived, just enough to
protect against races above.

In updateSmap, when smap is not nil, we need to check, if we
are adding, that the subscription has not already been handled.
The temporary subscription map will be ultimately emptied/set to
nil with the use of a timer (if not emptied in place when
processing smap updates).
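
The fix can be illustrated with a single-threaded replay of the two interleavings shown earlier. This is a sketch under assumptions, not the server's code: the `leaf` and `sub` types, the `tsub` field name, and the `sent` log are hypothetical, and the locking and the timer that clears the temporary map are left out.

```go
package main

import "fmt"

// sub is a simplified subscription (hypothetical type).
type sub struct{ subject string }

// leaf models only the state relevant to the race.
type leaf struct {
	smap map[string]int    // subject -> interest count, nil until init
	tsub map[*sub]struct{} // subs already counted by initSmap
	sent []string          // protocols that would go on the wire
}

// initSmap creates smap on CONNECT and records every sub it counted.
func (l *leaf) initSmap(accSubs []*sub) {
	l.smap = make(map[string]int)
	l.tsub = make(map[*sub]struct{})
	for _, s := range accSubs {
		l.smap[s.subject]++
		l.tsub[s] = struct{}{}
		if l.smap[s.subject] == 1 {
			l.sent = append(l.sent, "LS+ "+s.subject)
		}
	}
}

// updateSmap applies a subscription change (+1 or -1).
func (l *leaf) updateSmap(s *sub, delta int) {
	if l.smap == nil {
		return // not initialized yet: initSmap will pick the sub up
	}
	if delta > 0 {
		if _, seen := l.tsub[s]; seen {
			delete(l.tsub, s) // already counted by initSmap
			return
		}
	}
	n := l.smap[s.subject] + delta
	if n <= 0 {
		if _, ok := l.smap[s.subject]; ok {
			delete(l.smap, s.subject)
			l.sent = append(l.sent, "LS- "+s.subject)
		}
		return
	}
	l.smap[s.subject] = n
	if delta > 0 && n == 1 {
		l.sent = append(l.sent, "LS+ "+s.subject)
	}
}

// replay runs SUB, CONNECT and UNSUB in one of the two orders.
func replay(initFirst bool) []string {
	s := &sub{subject: "foo"}
	accSubs := []*sub{s} // SUB foo is already in the account's sublist
	l := &leaf{}
	if initFirst {
		l.initSmap(accSubs)
		l.updateSmap(s, 1)
	} else {
		l.updateSmap(s, 1)
		l.initSmap(accSubs)
	}
	l.updateSmap(s, -1) // UNSUB
	return l.sent
}

func main() {
	fmt.Println(replay(false))
	fmt.Println(replay(true))
}
```

In both orders the subject is counted once, so the UNSUB brings the count to zero and an LS- is sent.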

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-06-05 10:07:15 -06:00
Derek Collison
2bd7553c71 System Account on by default.
Most of the changes are to turn it off for tests that were watching subscriptions and such.

Signed-off-by: Derek Collison <derek@nats.io>
2020-05-29 17:56:45 -07:00
Ivan Kozlovic
46b45b3148 Ensure route INFO is processed before starting queue test
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-05-25 06:58:23 -07:00
Derek Collison
0a36706958 PubAck details that provide stream name and sequence assigned
Signed-off-by: Derek Collison <derek@nats.io>
2020-05-19 14:21:27 -07:00
Derek Collison
1394a7118d Breaking change to upgrade ConsumerConfig for consumer creation.
This is a breaking change and will not be able to restore consumers from a filestore when upgraded.
We are getting close to settling on the API and once that happens we will not be introducing any
breaking changes.

Signed-off-by: Derek Collison <derek@nats.io>
2020-05-19 14:20:02 -07:00
Derek Collison
47c28b2fb0 JetStream major refactor for name changes.
MsgSet -> Stream
Observable -> Consumer

Signed-off-by: Derek Collison <derek@nats.io>
2020-05-19 14:16:03 -07:00
Derek Collison
75908f80a4 API cleanup
Signed-off-by: Derek Collison <derek@nats.io>
2020-05-19 14:07:02 -07:00
Derek Collison
b7b98df4ee Server limits and account reservations
Signed-off-by: Derek Collison <derek@nats.io>
2020-05-19 14:07:02 -07:00
Derek Collison
d02b2a3d9c NoAck option for MsgSets
Signed-off-by: Derek Collison <derek@nats.io>
2020-05-19 14:07:02 -07:00
Derek Collison
0a92d8e87d AckWait and redelivery
Signed-off-by: Derek Collison <derek@nats.io>
2020-05-19 14:07:02 -07:00
Derek Collison
16e6952cd6 Move load balance test to norace
Signed-off-by: Derek Collison <derek@nats.io>
2020-05-19 14:07:02 -07:00
Ivan Kozlovic
1b2754475b Refactor async client tests
Updated all tests that use "async" clients.
- start the writeLoop (this is in preparation for changes in the
  server that will not do send-in-place for some protocols, such
  as PING, etc..)
- Added missing defers in several tests
- fixed an issue in client.go where test was wrong possibly causing
  a panic.
- Had to skip a test for now since it would fail without server code
  change.

The next step will be to ensure that all protocols are sent through
the writeLoop and that the data is properly flushed on close (important
for -ERR for instance).

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-12-12 11:58:24 -07:00
Derek Collison
3330820502 Fixed a bug where we leaked service imports. Prior to this, it would have leaked subscriptions as well.
Signed-off-by: Derek Collison <derek@nats.io>
2019-11-14 13:29:17 -08:00
Ivan Kozlovic
15201a19cd Fixed a lock inversion issue with account
In updateRouteSubscriptionMap(), when a queue sub is added/removed,
the code locks the account and then the route to send the update.
However, when a route is accepted and the subs are sent, the
opposite (locking wise) occurs. The route is locked, then the account.

This lock inversion is possible because a route is registered (added
to the server's map) and then the subs are sent.

Use a special lock to protect the send, but don't hold the acc.mu
lock while getting the route's lock.
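
A minimal sketch of that locking scheme, with hypothetical `account` and `route` types and a `sendMu` field standing in for the "special lock". The update path never holds the account lock while acquiring the route lock, so it cannot form a cycle with the path that locks the route first.

```go
package main

import (
	"fmt"
	"sync"
)

// account and route are simplified, hypothetical stand-ins.
type account struct {
	mu     sync.RWMutex
	sendMu sync.Mutex // the "special lock": serializes update sends
	qsubs  map[string]int
}

type route struct {
	mu  sync.Mutex
	out []string
}

// updateQueueSub changes a queue sub weight and sends it to the route.
// Lock order: sendMu, then acc.mu (released), then r.mu.
func updateQueueSub(acc *account, r *route, key string, delta int) {
	acc.sendMu.Lock()
	defer acc.sendMu.Unlock()

	acc.mu.Lock()
	acc.qsubs[key] += delta
	n := acc.qsubs[key]
	acc.mu.Unlock() // released before taking the route's lock

	r.mu.Lock()
	r.out = append(r.out, fmt.Sprintf("%s=%d", key, n))
	r.mu.Unlock()
}

// sendSubs is the "route accepted" path: route lock, then account lock.
func sendSubs(acc *account, r *route) {
	r.mu.Lock()
	defer r.mu.Unlock()

	acc.mu.RLock()
	for key, n := range acc.qsubs {
		r.out = append(r.out, fmt.Sprintf("%s=%d", key, n))
	}
	acc.mu.RUnlock()
}

// runConcurrently runs both paths n times each in parallel and returns
// the final weight. With an inverted lock order it could hang instead.
func runConcurrently(n int) int {
	acc := &account{qsubs: make(map[string]int)}
	r := &route{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(2)
		go func() {
			defer wg.Done()
			updateQueueSub(acc, r, "foo bar", 1)
		}()
		go func() {
			defer wg.Done()
			sendSubs(acc, r)
		}()
	}
	wg.Wait()

	acc.mu.RLock()
	defer acc.mu.RUnlock()
	return acc.qsubs["foo bar"]
}

func main() {
	fmt.Println("final weight:", runConcurrently(100))
}
```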

The tests that were created for the original missed queue updates
issue, namely TestClusterLeaksSubscriptions() and
TestQueueSubWeightOrderMultipleConnections() pass with this change.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-09-13 14:30:00 -06:00
Ivan Kozlovic
90d592e163 Leaf and Route RTT
When a leaf or route connection is created, set the first ping
timer to fire at 1sec, which allows the RTT to be computed
reasonably soon (since the PingInterval could be user configured
and set much higher).

For Route in PR #1101, I was sending the PING on receiving the
INFO which required changing a bunch of tests. Changing that to
also use the first timer interval of 1sec and reverted changes
to route tests.
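
A small sketch of the timer choice, assuming a hypothetical `firstPingDelay` helper. The 1 second value comes from the commit message. Capping it at the configured interval when that interval is shorter is an assumption made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// firstPingInterval is the 1 second first-ping delay described above.
const firstPingInterval = time.Second

// firstPingDelay returns how long to wait before the first PING on a
// leaf or route connection. Using the configured interval when it is
// shorter than one second is an assumption of this sketch.
func firstPingDelay(pingInterval time.Duration) time.Duration {
	if pingInterval < firstPingInterval {
		return pingInterval
	}
	return firstPingInterval
}

func main() {
	// A user-configured interval much higher than one second.
	configured := 2 * time.Minute
	fmt.Println("first ping after:", firstPingDelay(configured))
	fmt.Println("later pings every:", configured)
}
```

With this choice the first PING/PONG exchange, and therefore the first RTT sample, does not have to wait for a long user-configured PingInterval.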

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-26 09:34:17 -06:00
Ivan Kozlovic
89dd13f134 [ADDED] RTT in routez's route info
Added the RTT field to each route reported in routez.
Ensure that when a route is accepted, we send a PING to compute
the first RTT and don't have to wait for the ping timer to fire.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-20 14:16:07 -06:00
Ivan Kozlovic
fc8087daa7 Updates based on comments
- add sha256 algo
- move some mem hungry tests while running with -race to the norace
- remove GOGC=10

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-15 09:06:56 -06:00
Derek Collison
8bfe14bbfd check response perms more often, make sure we limit memory growth
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-25 16:53:54 -07:00
Derek Collison
d1a782e014 Messages not distributed evenly when sourced from leafnode.
When messages came from a leafnode they were not being distributed evenly to the destination cluster.

Signed-off-by: Derek Collison <derek@nats.io>
2019-06-11 20:37:49 -07:00
Ivan Kozlovic
04d824c4d4 [FIXED] Possible slow consumers when routes exchange sub list
If each server has a long list of subscriptions, when the route
is established, sending this list could result in each server
treating the peer as a slow consumer, resulting in a reconnect,
etc..
Also bumping the fan-in threshold for route connections.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-02-20 12:09:26 -08:00