Will now breakout the internal NATS latency to show requestor client RTT, responder client RTT and any internal latency caused by hopping between servers, etc.
Signed-off-by: Derek Collison <derek@nats.io>
Changed some of client.outbound fields to int64.
Moved fields around to minimize size of struct (checked with
unsafe.Sizeof())
Checked benchmark results before/after
Added test
Resolves#1118
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
If account A imports from B and B from A, when the account A
is built, it causes B to be fetch, but since B imports from A,
A was fetch/built again in an infinite loop.
Resolves#1117
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Currently, the $SYSTEM subject is used in this repo, but it seems like this
subject name is out of date.
This change updates the code to use $SYS to match the documentation.
With Go 1.12 (strangely was not able to reproduce with Go 1.11)
the test TestRouteNoCrashOnAddingSubToRoute() would frequently
locks up and consume all avail CPUs on the machine. Running
this test with GOMAXPROCS=2 you would see server.test CPU usage
pegged at 200% (assuming you have at least 2 CPUs).
The reason was that the writeLoop was spinning because another
routine was already in flushOutbound() and stack trace would
show that it was stuck in system calls. It seems that even though
the writeLoop does release the lock but grab it right away was
not allowing the syscall to complete.
So decided to put back the unlock/gosched/lock back in flushOutbound()
when flag is already set, but then protect the closeConnection()
with its own flag (similar to clearConnection) to not re-introduce
issue fixed in #1092.
Had to fix the benchmark test RoutedInterestGraph because after a
route is accepted, the initial PING will be sent after 1sec which
was breaking this test.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Defaults to 1sec but will be opts.PingInterval if value is lower.
All non client connections invoked this function for the first
PING.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When a leaf or route connection is created, set the first ping
timer to fire at 1sec, which will allow to compute the RTT
reasonably soon (since the PingInterval could be user configured
and set much higher).
For Route in PR #1101, I was sending the PING on receiving the
INFO which required changing bunch of tests. Changing that to
also use the first timer interval of 1sec and reverted changes
to route tests.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- On startup, verify that local account in leafnode (if specified
can be found otherwise fail startup).
- At runtime, print error and continue trying to reconnect.
Will need to decide a better approach.
- When using basic auth (user/password), it was possible for a
solicited Leafnode connection to not use user/password when
trying an URL that was discovered through gossip. The server
now saves the credentials of a configured URL to use with
the discovered ones.
Updated RouteRTT test in case RTT does not seem to be updated
because getting always the same value.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
I noticed that TestNoRaceRoutedQueueAutoUnsubscribe started to
fail a lot on Travis. Running locally I could see a 45 to 50%
failures. After investigation I realized that the issue was that
we have wrongly re-used `subscription.nm` and set to -1 on unsubscribe
however, I believe that it was possible that when subscription was
closed, the server may have already picked that consumer for a delivery
which then causes nm==-1 to be bumped to 0, which was wrong.
Commenting out the subscription.close() that sets nm to -1, I could
not get the test to fail on macOS but would still get 7% failure on
Linux VM. Adding the check to see if sub is closed in deliverMsg()
completely erase the failures, even on Linux VM.
We could still use `nm` set to -1 but check on deliverMsg(), the
same way I use the closed int32 now.
Fixed some flappers.
Updated .travis.yml to failfast if one of the command in the
`script` fails. User `set -e` and `set +e` as recommended in
https://github.com/travis-ci/travis-ci/issues/1066
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Added the RTT field to each route reported in routez.
Ensure that when a route is accepted, we send a PING to compute
the first RTT and don't have to wait for the ping timer to fire.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This was introduced in PR#930. The first commit had the route's
check if the flushOutbound() returned false, and if so would
locally unlock/lock the connection's lock. Unfortunately, this
was replaced in the second commit (a6aeed3a6b)
to the flushOutbound() function itself.
This causes the function closeConnection() to possibly unlock
the connection while calling flushOutbound(), which if the
connection is closed due to both a tls timeout for instance
and explicitly, it would result in the connection being scheduled
for a reconnect (if explicit gateway connection, possibly route).
Added defensive code in Gateway to register a unique outbound gateway.
Fixed a test that was now failing with newer Go version in which
they fixed url.Parse()
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Take into account tracking of response maps that are created and do proper cleanup.
Also fixes#1089 which was discovered while working on this.
Signed-off-by: Derek Collison <derek@nats.io>
When a route receives a message, it uses a thread local cache to
find the account and subscriptions match for a given subject.
When not found, an entry is added to this cache. The problem is
that this cache will reference subscriptions that in turn
reference connections.
When the subscriptions/connections are closed, this thread local
cannot be purged from those closed subscriptions (since it is
thread local - no lockin is used).
The real issue is that connection's buffer was not set to nil on
close, which then could cause more than expected memory to be
still referenced. Setting the buffer to nil will help reduce the
memory being used.
When an entry is added to the cache, the cache may reach a size
that will cause the server to prune some entries. From time to
time, the cache will be scanned to look for entries that contain
only closed subscriptions and remove those.
Resolves#1082
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This is helpful for clients who send data and close the connection.
Also helpful to process errors like auth for solicited leafnodes.
Signed-off-by: Derek Collison <derek@nats.io>
When a cluster of servers are having routes to each other, there
is a chance that the list of leafnode URLs maintained on each
server is not complete. This would result in LN servers connecting
to this cluster to not get the full list of possible URLs the
server could reconnect to.
Also fixed a DATA RACE that appeared when running the updated
TestLeafNodeInfoURLs test. Fixed the race and added specific
test that easily demonstrated the race: TestLeafNodeNoRaceGeneratingNonce
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Also fix for #1071 in that we need to check remote gateways TLS
config even if main gateway section is not configured with TLS.
Related to #1071
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>