1259 Commits

Author SHA1 Message Date
Ivan Kozlovic
802074292f Release v2.1.0
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-09-20 12:19:27 -06:00
Waldemar Quevedo
d44b0dec51 Merge pull request #1136 from nats-io/svc-latency-values
Adjust to zero negative latency values
2019-09-20 11:39:33 -05:00
Waldemar Quevedo
d0e36f3b88 Adjust to zero negative latency values
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2019-09-20 09:24:18 -07:00
Derek Collison
ffdbe864a8 Version bump
Signed-off-by: Derek Collison <derek@nats.io>
2019-09-19 19:50:03 -07:00
Derek Collison
37a5612460 Merge pull request #1137 from nats-io/latency_update
Latency tracking updates
2019-09-19 19:48:53 -07:00
Derek Collison
0360f46c2f fixes based on PR updates
Signed-off-by: Derek Collison <derek@nats.io>
2019-09-19 17:16:36 -07:00
Derek Collison
7fe47ace2b Make sure to turn latency on with a claim update
Signed-off-by: Derek Collison <derek@nats.io>
2019-09-19 14:20:35 -07:00
Ivan Kozlovic
731941a18f Fixed ResponsePermissions
- Ensure that defaults are set when values are 0
- Fixed some tests
- Added some helpers in jwt tests to reduce copy/paste

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-09-19 14:42:38 -06:00
Ivan Kozlovic
256ad4ac15 Bump version to 2.1.0-RC1
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-09-18 14:26:42 -06:00
Jaime Piña
ab24cddc06 Add latency config
Currently, the config file doesn't recognize the latency config block in
account exports. This change exposes those settings in the config file.

Signed-off-by: Jaime Piña <jaime@synadia.com>
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2019-09-18 13:20:26 -07:00
Ivan Kozlovic
6a70f36e09 Merge pull request #1131 from nats-io/fix_acc_lookup
[FIXED] Locking issue around account lookup/updates
2019-09-18 12:59:28 -06:00
Ivan Kozlovic
20a925ae86 Updates to registerAccount
Make it a function that grabs server lock/unlock and invokes
registerAccountNoLock(). Use that function when already under
the server's lock.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-09-18 12:45:12 -06:00
Derek Collison
7cf211b056 Use multiple connections to amortize TLS
Signed-off-by: Derek Collison <derek@nats.io>
2019-09-18 11:40:00 -07:00
Derek Collison
0551371b31 Add in JWT support for tracking latency
Signed-off-by: Derek Collison <derek@nats.io>
2019-09-18 08:51:43 -07:00
Ivan Kozlovic
150d47cab3 [FIXED] Locking issue around account lookup/updates
Ensure that lookupAccount does not hold server lock during
updateAccount and fetchAccount.
Updating the account cannot have the server lock because it is
possible that during updateAccountClaims(), clients are being
removed, which would try to get the server lock (deep down in
closeConnection/s.removeClient).
Added a test that would have shown the deadlock prior to the changes
in this PR.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-09-17 18:48:23 -06:00
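The locking rule described in this commit (do not hold the server lock across an account update that may remove clients) can be sketched as follows. This is an illustration under assumptions: `Server`, `Account`, and the method bodies are simplified stand-ins that borrow names from the message, not the actual nats-server implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// Account and Server are simplified stand-ins for illustration only.
type Account struct {
	Name string
}

type Server struct {
	mu       sync.Mutex
	accounts map[string]*Account
	clients  map[string]string // client id -> account name
}

// removeClient takes the server lock, like the path the commit describes.
func (s *Server) removeClient(id string) {
	s.mu.Lock()
	delete(s.clients, id)
	s.mu.Unlock()
}

// updateAccount may remove clients, so it must be called WITHOUT s.mu held.
// sync.Mutex is not reentrant: calling this under s.mu would deadlock.
func (s *Server) updateAccount(acc *Account) {
	var ids []string
	s.mu.Lock()
	for id, name := range s.clients {
		if name == acc.Name {
			ids = append(ids, id)
		}
	}
	s.mu.Unlock()
	for _, id := range ids {
		s.removeClient(id)
	}
}

// lookupAccount reads the map under the lock, then releases the lock
// before invoking the update.
func (s *Server) lookupAccount(name string) *Account {
	s.mu.Lock()
	acc := s.accounts[name]
	s.mu.Unlock()
	if acc != nil {
		s.updateAccount(acc)
	}
	return acc
}

func main() {
	s := &Server{
		accounts: map[string]*Account{"A": {Name: "A"}},
		clients:  map[string]string{"c1": "A", "c2": "B"},
	}
	acc := s.lookupAccount("A")
	fmt.Println(acc.Name, len(s.clients))
}
```

In this sketch the update removes every client of the account purely to exercise the `removeClient` path; the real update logic is not shown in the log.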
Derek Collison
b98b75b166 Merge pull request #1127 from nats-io/sysdebug
System level services for debugging.
2019-09-17 09:45:53 -07:00
Derek Collison
52430c304a System level services for debugging.
This is the first pass at introducing exported services to the system account for general debugging of blackbox systems.
The first service reports the number of subscribers for a given subject. The payload of the request is the subject, and an optional queue group, and can contain wildcards.

Signed-off-by: Derek Collison <derek@nats.io>
2019-09-17 09:37:35 -07:00
Alberto Ricart
eb56ad22ea review comment
2019-09-17 09:56:03 -05:00
Alberto Ricart
af97b5b9df FIX #1128 - Modified the cluster listenstr parsing to allow cluster urls that have
a -1 for a port. This re-enables ability to create clusters on a random
port for testing.
2019-09-16 10:45:27 -05:00
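One way to accept a `-1` port in a listen string is to split host and port by hand instead of relying on URL parsing. The sketch below assumes a bare `host:port` input and a hypothetical helper named `parseListen`; it is not the parsing code from this commit.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// parseListen is a hypothetical helper (not from nats-server).
// It accepts "host:port" where port may be -1, meaning "pick a random port".
func parseListen(hostPort string) (string, int, error) {
	// net.SplitHostPort only splits the string; it does not validate the port.
	host, portStr, err := net.SplitHostPort(hostPort)
	if err != nil {
		return "", 0, err
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return "", 0, fmt.Errorf("invalid port %q: %v", portStr, err)
	}
	if port < -1 || port > 65535 {
		return "", 0, fmt.Errorf("port %d out of range", port)
	}
	return host, port, nil
}

func main() {
	host, port, err := parseListen("127.0.0.1:-1")
	fmt.Println(host, port, err)
}
```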
Ivan Kozlovic
5eebc42f47 Merge pull request #1126 from nats-io/fix_acc_lock_issue
Fixed a lock inversion issue with account
2019-09-13 15:11:02 -06:00
Ivan Kozlovic
15201a19cd Fixed a lock inversion issue with account
In updateRouteSubscriptionMap(), when a queue sub is added/removed,
the code locks the account and then the route to send the update.
However, when a route is accepted and the subs are sent, the
opposite (locking wise) occurs. The route is locked, then the account.

This lock inversion is possible because a route is registered (added
to the server's map) and then the subs are sent.

Use a special lock to protect the send, but don't hold the acc.mu
lock while getting the route's lock.

The tests that were created for the original missed queue updates
issue, namely TestClusterLeaksSubscriptions() and
TestQueueSubWeightOrderMultipleConnections() pass with this change.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-09-13 14:30:00 -06:00
Derek Collison
26db43001f Shorter names for latency tracking JSON
Signed-off-by: Derek Collison <derek@nats.io>
2019-09-12 15:11:43 -07:00
Derek Collison
25d5cb337d Make json tags consistent
Signed-off-by: Derek Collison <derek@nats.io>
2019-09-11 17:30:01 -07:00
Derek Collison
94f143ccce Latency tracking updates.
Will now break out the internal NATS latency to show requestor client RTT, responder client RTT, and any internal latency caused by hopping between servers, etc.

Signed-off-by: Derek Collison <derek@nats.io>
2019-09-11 16:43:19 -07:00
Ivan Kozlovic
effa30ce4a [FIXED] MaxPending > MaxInt32 causes client to be disconnected
Changed some of client.outbound fields to int64.
Moved fields around to minimize size of struct (checked with
unsafe.Sizeof())
Checked benchmark results before/after
Added test

Resolves #1118

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-09-11 14:29:02 -06:00
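The commit does not spell out the exact failure mechanism, but one plausible way a limit above MaxInt32 misbehaves is truncation when it is stored in a 32-bit field. The sketch below uses hypothetical function names and only demonstrates the arithmetic, not the server's outbound accounting.

```go
package main

import (
	"fmt"
	"math"
)

// exceeds32 mimics a check done with 32-bit fields: a limit above
// MaxInt32 wraps to a negative number when converted to int32.
func exceeds32(pending, maxPending int64) bool {
	return int32(pending) > int32(maxPending)
}

// exceeds64 performs the same check with 64-bit values.
func exceeds64(pending, maxPending int64) bool {
	return pending > maxPending
}

func main() {
	maxPending := int64(math.MaxInt32) + 1

	// With 32-bit storage, 1 KiB of pending data appears to exceed the limit.
	fmt.Println(exceeds32(1024, maxPending))
	// With 64-bit storage, the comparison behaves as intended.
	fmt.Println(exceeds64(1024, maxPending))
}
```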
Ivan Kozlovic
4253b31dcf [FIXED] Circular account service import dependency
If account A imports from B and B from A, when the account A
is built, it causes B to be fetched, but since B imports from A,
A was fetched/built again in an infinite loop.

Resolves #1117

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-09-10 18:05:21 -06:00
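A common guard against this kind of import cycle is to mark an account as visited before following its imports. The sketch below is an assumption about the general technique, with a hypothetical `resolver` type; the log does not show how the server itself breaks the loop.

```go
package main

import "fmt"

// resolver is a hypothetical stand-in for the account build logic.
type resolver struct {
	imports map[string][]string // account -> accounts it imports from
	built   map[string]bool
	fetches int
}

// build fetches an account once and then builds everything it imports.
// Marking the account as built BEFORE recursing is what ends the cycle.
func (r *resolver) build(name string) {
	if r.built[name] {
		return
	}
	r.built[name] = true
	r.fetches++
	for _, dep := range r.imports[name] {
		r.build(dep)
	}
}

func main() {
	r := &resolver{
		imports: map[string][]string{
			"A": {"B"},
			"B": {"A"}, // circular: B imports from A as well
		},
		built: map[string]bool{},
	}
	r.build("A")
	fmt.Println(r.fetches)
}
```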
Derek Collison
97f89ffd3f Merge pull request #1115 from nats-io/update-system-account
Update SYS account name
2019-09-05 19:09:42 +03:00
Jaime Piña
176a19de75 Update SYS account name
Currently, the $SYSTEM subject is used in this repo, but it seems like this
subject name is out of date.

This change updates the code to use $SYS to match the documentation.
2019-09-04 13:49:59 -05:00
Derek Collison
67470911fe Prune remote reply tracking
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-30 17:35:20 -07:00
Derek Collison
bb11f7bd2d Merge pull request #1111 from nats-io/latency
Track latency for exported services
2019-08-30 11:02:36 -07:00
Derek Collison
7989118c3f First pass latency tracking for exported services
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-30 10:52:48 -07:00
Ivan Kozlovic
2a8973a62b Fixed flushOutbound
With Go 1.12 (strangely, I was not able to reproduce with Go 1.11),
the test TestRouteNoCrashOnAddingSubToRoute() would frequently
lock up and consume all available CPUs on the machine. Running
this test with GOMAXPROCS=2, you would see server.test CPU usage
pegged at 200% (assuming you have at least 2 CPUs).
The reason was that the writeLoop was spinning because another
routine was already in flushOutbound(), and the stack trace would
show that it was stuck in system calls. It seems that the writeLoop
releasing the lock but grabbing it again right away was not
allowing the syscall to complete.

So decided to put the unlock/gosched/lock back in flushOutbound()
when the flag is already set, but then protect closeConnection()
with its own flag (similar to clearConnection) so as not to re-introduce
the issue fixed in #1092.

Had to fix the benchmark test RoutedInterestGraph because after a
route is accepted, the initial PING will be sent after 1sec which
was breaking this test.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-29 12:59:27 -06:00
Ivan Kozlovic
cd4b8d3fad [ADDED] /leafz endpoint
Resolves #1061

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-26 12:00:24 -06:00
Ivan Kozlovic
cd9f898eb0 Made a server's helper to set first ping timer
Defaults to 1sec but will be opts.PingInterval if that value is lower.
All non-client connections invoke this function for the first
PING.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-26 10:21:43 -06:00
Ivan Kozlovic
90d592e163 Leaf and Route RTT
When a leaf or route connection is created, set the first ping
timer to fire at 1sec, which allows the RTT to be computed
reasonably soon (since the PingInterval could be user configured
and set much higher).

For Route in PR #1101, I was sending the PING on receiving the
INFO, which required changing a bunch of tests. Changing that to
also use the first timer interval of 1sec and reverted changes
to route tests.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-26 09:34:17 -06:00
Ivan Kozlovic
5518cbe070 Merge pull request #1106 from nats-io/fix_leafnode
[FIXED] Some Leafnode issues
2019-08-26 09:17:36 -06:00
Ivan Kozlovic
7ca8723942 [FIXED] Some Leafnode issues
- On startup, verify that the local account in leafnode (if specified)
  can be found, otherwise fail startup.
- At runtime, print error and continue trying to reconnect.
  Will need to decide a better approach.
- When using basic auth (user/password), it was possible for a
  solicited Leafnode connection to not use user/password when
  trying a URL that was discovered through gossip. The server
  now saves the credentials of a configured URL to use with
  the discovered ones.

Updated RouteRTT test in case RTT does not seem to be updated
because it always gets the same value.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-23 14:08:07 -06:00
Guangming Wang
927991321d Cleanup: fix some typos in code comment
Signed-off-by: Guangming Wang <guangming.wang@daocloud.io>
2019-08-22 21:36:37 +08:00
Ivan Kozlovic
2959b982ea Merge pull request #1101 from nats-io/route_rtt
[ADDED] RTT in routez's route info
2019-08-20 17:23:18 -06:00
Ivan Kozlovic
77c63dbce1 Fix flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-20 17:07:22 -06:00
Ivan Kozlovic
2f48ad5150 Fixed subscription close
I noticed that TestNoRaceRoutedQueueAutoUnsubscribe started to
fail a lot on Travis. Running locally I could see a 45 to 50%
failure rate. After investigation I realized that the issue was that
we had wrongly re-used `subscription.nm` and set it to -1 on unsubscribe.
However, I believe that it was possible that when the subscription was
closed, the server may have already picked that consumer for a delivery,
which then causes nm==-1 to be bumped to 0, which was wrong.
Commenting out the subscription.close() that sets nm to -1, I could
not get the test to fail on macOS but would still get 7% failure on
a Linux VM. Adding the check to see if the sub is closed in deliverMsg()
completely eliminates the failures, even on the Linux VM.

We could still use `nm` set to -1 but check on deliverMsg(), the
same way I use the closed int32 now.

Fixed some flappers.
Updated .travis.yml to failfast if one of the commands in the
`script` fails. Use `set -e` and `set +e` as recommended in
https://github.com/travis-ci/travis-ci/issues/1066

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-20 14:39:23 -06:00
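The approach described above (a separate closed flag checked at delivery time, instead of overloading the message counter) can be sketched like this. The `subscription` struct and `deliverMsg` function here are simplified stand-ins that reuse names from the message; the real types carry far more state.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// subscription is a simplified stand-in for illustration only.
type subscription struct {
	nm     int64 // number of messages delivered
	closed int32 // set to 1 atomically when the subscription is closed
}

func (s *subscription) close() {
	atomic.StoreInt32(&s.closed, 1)
}

func (s *subscription) isClosed() bool {
	return atomic.LoadInt32(&s.closed) == 1
}

// deliverMsg refuses delivery to a closed subscription, so a sub that was
// picked just before being closed does not get its counter bumped.
func deliverMsg(s *subscription) bool {
	if s.isClosed() {
		return false
	}
	s.nm++
	return true
}

func main() {
	sub := &subscription{}
	fmt.Println(deliverMsg(sub), sub.nm)

	sub.close()
	fmt.Println(deliverMsg(sub), sub.nm)
}
```

Keeping the closed state out of `nm` means the counter never has to encode a sentinel value such as -1.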
Ivan Kozlovic
89dd13f134 [ADDED] RTT in routez's route info
Added the RTT field to each route reported in routez.
Ensure that when a route is accepted, we send a PING to compute
the first RTT and don't have to wait for the ping timer to fire.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-20 14:16:07 -06:00
Ivan Kozlovic
07e3db6b8e Prepare for v2.0.4 with goreleaser
Also fixed some flappers

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-15 09:06:56 -06:00
Derek Collison
0fd42cffcb Merge pull request #1096 from nats-io/flap
Fix for flapping test
2019-08-15 07:50:41 -07:00
Guangming Wang
09954eee5c cleanup: fix word errors in errors.go
Signed-off-by: Guangming Wang <guangming.wang@daocloud.io>
2019-08-15 22:12:57 +08:00
Derek Collison
93313a149e Fix for flapping test
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-14 23:52:49 -07:00
Waldemar Quevedo
5c776d4363 Fix typo
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2019-08-13 19:59:28 -07:00
Ivan Kozlovic
c20afd4016 [FIXED] Connection could be closed twice
This was introduced in PR#930. The first commit had the route code
check whether flushOutbound() returned false and, if so,
locally unlock/lock the connection's lock. Unfortunately, this
was moved in the second commit (a6aeed3a6b)
into the flushOutbound() function itself.
This causes the function closeConnection() to possibly unlock
the connection while calling flushOutbound(), which, if the
connection is closed both due to a tls timeout for instance
and explicitly, would result in the connection being scheduled
for a reconnect (if explicit gateway connection, possibly route).

Added defensive code in Gateway to register a unique outbound gateway.

Fixed a test that was now failing with a newer Go version in which
they fixed url.Parse()

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-13 20:11:03 -06:00
Stephen Asbury
4d63709852 Added support for service response types
Test checks that response types are initialized
Updated to latest JWT library with response types
Updated jwt in vendor
2019-08-09 17:54:17 -07:00
Derek Collison
2fad14a915 Add in copy of respMap on reload
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-06 18:43:06 -07:00