Commit Graph

1435 Commits

Author SHA1 Message Date
Ivan Kozlovic
effa30ce4a [FIXED] MaxPending > MaxInt32 causes client to be disconnected
Changed some of client.outbound fields to int64.
Moved fields around to minimize size of struct (checked with
unsafe.Sizeof())
Checked benchmark results before/after
Added test

Resolves #1118

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-09-11 14:29:02 -06:00
Ivan Kozlovic
4253b31dcf [FIXED] Circular account service import dependency
If account A imports from B and B from A, when the account A
is built, it causes B to be fetch, but since B imports from A,
A was fetch/built again in an infinite loop.

Resolves #1117

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-09-10 18:05:21 -06:00
Derek Collison
97f89ffd3f Merge pull request #1115 from nats-io/update-system-account
Update SYS account name
2019-09-05 19:09:42 +03:00
Jaime Piña
176a19de75 Update SYS account name
Currently, the $SYSTEM subject is used in this repo, but it seems like this
subject name is out of date.

This change updates the code to use $SYS to match the documentation.
2019-09-04 13:49:59 -05:00
Derek Collison
67470911fe Prune remote reply tracking
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-30 17:35:20 -07:00
Derek Collison
bb11f7bd2d Merge pull request #1111 from nats-io/latency
Track latency for exported services
2019-08-30 11:02:36 -07:00
Derek Collison
7989118c3f First pass latency tracking for exported services
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-30 10:52:48 -07:00
Ivan Kozlovic
2a8973a62b Fixed flushOutbound
With Go 1.12 (strangely was not able to reproduce with Go 1.11)
the test TestRouteNoCrashOnAddingSubToRoute() would frequently
locks up and consume all avail CPUs on the machine. Running
this test with GOMAXPROCS=2 you would see server.test CPU usage
pegged at 200% (assuming you have at least 2 CPUs).
The reason was that the writeLoop was spinning because another
routine was already in flushOutbound() and stack trace would
show that it was stuck in system calls. It seems that even though
the writeLoop does release the lock but grab it right away was
not allowing the syscall to complete.

So decided to put back the unlock/gosched/lock back in flushOutbound()
when flag is already set, but then protect the closeConnection()
with its own flag (similar to clearConnection) to not re-introduce
issue fixed in #1092.

Had to fix the benchmark test RoutedInterestGraph because after a
route is accepted, the initial PING will be sent after 1sec which
was breaking this test.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-29 12:59:27 -06:00
Ivan Kozlovic
cd4b8d3fad [ADDED] /leafz endpoint
Resolves #1061

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-26 12:00:24 -06:00
Ivan Kozlovic
cd9f898eb0 Made a server's helper to set first ping timer
Defaults to 1sec but will be opts.PingInterval if value is lower.
All non client connections invoked this function for the first
PING.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-26 10:21:43 -06:00
Ivan Kozlovic
90d592e163 Leaf and Route RTT
When a leaf or route connection is created, set the first ping
timer to fire at 1sec, which will allow to compute the RTT
reasonably soon (since the PingInterval could be user configured
and set much higher).

For Route in PR #1101, I was sending the PING on receiving the
INFO which required changing bunch of tests. Changing that to
also use the first timer interval of 1sec and reverted changes
to route tests.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-26 09:34:17 -06:00
Ivan Kozlovic
5518cbe070 Merge pull request #1106 from nats-io/fix_leafnode
[FIXED] Some Leafnode issues
2019-08-26 09:17:36 -06:00
Ivan Kozlovic
7ca8723942 [FIXED] Some Leafnode issues
- On startup, verify that local account in leafnode (if specified
  can be found otherwise fail startup).
- At runtime, print error and continue trying to reconnect.
  Will need to decide a better approach.
- When using basic auth (user/password), it was possible for a
  solicited Leafnode connection to not use user/password when
  trying an URL that was discovered through gossip. The server
  now saves the credentials of a configured URL to use with
  the discovered ones.

Updated RouteRTT test in case RTT does not seem to be updated
because getting always the same value.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-23 14:08:07 -06:00
Guangming Wang
927991321d Cleanup: fix some typos in code comment
Signed-off-by: Guangming Wang <guangming.wang@daocloud.io>
2019-08-22 21:36:37 +08:00
Ivan Kozlovic
2959b982ea Merge pull request #1101 from nats-io/route_rtt
[ADDED] RTT in routez's route info
2019-08-20 17:23:18 -06:00
Ivan Kozlovic
77c63dbce1 Fix flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-20 17:07:22 -06:00
Ivan Kozlovic
2f48ad5150 Fixed subscription close
I noticed that TestNoRaceRoutedQueueAutoUnsubscribe started to
fail a lot on Travis. Running locally I could see a 45 to 50%
failures. After investigation I realized that the issue was that
we have wrongly re-used `subscription.nm` and set to -1 on unsubscribe
however, I believe that it was possible that when subscription was
closed, the server may have already picked that consumer for a delivery
which then causes nm==-1 to be bumped to 0, which was wrong.
Commenting out the subscription.close() that sets nm to -1, I could
not get the test to fail on macOS but would still get 7% failure on
Linux VM. Adding the check to see if sub is closed in deliverMsg()
completely erase the failures, even on Linux VM.

We could still use `nm` set to -1 but check on deliverMsg(), the
same way I use the closed int32 now.

Fixed some flappers.
Updated .travis.yml to failfast if one of the command in the
`script` fails. User `set -e` and `set +e` as recommended in
https://github.com/travis-ci/travis-ci/issues/1066

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-20 14:39:23 -06:00
Ivan Kozlovic
89dd13f134 [ADDED] RTT in routez's route info
Added the RTT field to each route reported in routez.
Ensure that when a route is accepted, we send a PING to compute
the first RTT and don't have to wait for the ping timer to fire.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-20 14:16:07 -06:00
Ivan Kozlovic
07e3db6b8e Prepare for v2.0.4 with goreleaser
Also fixed some flappers

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-15 09:06:56 -06:00
Derek Collison
0fd42cffcb Merge pull request #1096 from nats-io/flap
Fix for flapping test
2019-08-15 07:50:41 -07:00
Guangming Wang
09954eee5c cleanup: fix word errors in errors.go
Signed-off-by: Guangming Wang <guangming.wang@daocloud.io>
2019-08-15 22:12:57 +08:00
Derek Collison
93313a149e Fix for flapping test
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-14 23:52:49 -07:00
Waldemar Quevedo
5c776d4363 Fix typo
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2019-08-13 19:59:28 -07:00
Ivan Kozlovic
c20afd4016 [FIXED] Connection could be closed twice
This was introduced in PR#930. The first commit had the route's
check if the flushOutbound() returned false, and if so would
locally unlock/lock the connection's lock. Unfortunately, this
was replaced in the second commit (a6aeed3a6b)
to the flushOutbound() function itself.
This causes the function closeConnection() to possibly unlock
the connection while calling flushOutbound(), which if the
connection is closed due to both a tls timeout for instance
and explicitly, it would result in the connection being scheduled
for a reconnect (if explicit gateway connection, possibly route).

Added defensive code in Gateway to register a unique outbound gateway.

Fixed a test that was now failing with newer Go version in which
they fixed url.Parse()

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-13 20:11:03 -06:00
Stephen Asbury
4d63709852 Added support for service response types
Test checks that response types are initialized
Updated to latest JWT library with response types
Updated jwt in vendor
2019-08-09 17:54:17 -07:00
Derek Collison
2fad14a915 Add in copy of respMap on reload
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-06 18:43:06 -07:00
Derek Collison
35c96713a0 fixes based on feedback
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-06 15:55:33 -07:00
Derek Collison
8f5bc503e5 Add ability for cross account import services to return streams as well as singeltons.
Take into account tracking of response maps that are created and do proper cleanup.
Also fixes #1089 which was discovered while working on this.

Signed-off-by: Derek Collison <derek@nats.io>
2019-08-06 14:15:40 -07:00
Derek Collison
5252099ff6 Bump version [ci skip]
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-01 15:44:39 -07:00
Ivan Kozlovic
f5a6c0d476 Merge pull request #1087 from nats-io/reduce_memuse_routing
[FIXED] Reduce memory usage on routes
2019-07-29 21:00:30 -06:00
Ivan Kozlovic
b537f130cc Use goto to remove entry from cache
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-07-29 20:52:57 -06:00
Ivan Kozlovic
6fd6ac2821 Update based on comments
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-07-29 20:38:22 -06:00
Derek Collison
ea33b1093b Add leafnode connections to varz
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-29 21:43:30 -04:00
Ivan Kozlovic
887e744d07 [FIXED] Reduce memory usage on routes
When a route receives a message, it uses a thread local cache to
find the account and subscriptions match for a given subject.
When not found, an entry is added to this cache. The problem is
that this cache will reference subscriptions that in turn
reference connections.
When the subscriptions/connections are closed, this thread local
cannot be purged from those closed subscriptions (since it is
thread local - no lockin is used).
The real issue is that connection's buffer was not set to nil on
close, which then could cause more than expected memory to be
still referenced. Setting the buffer to nil will help reduce the
memory being used.
When an entry is added to the cache, the cache may reach a size
that will cause the server to prune some entries. From time to
time, the cache will be scanned to look for entries that contain
only closed subscriptions and remove those.

Resolves #1082

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-07-29 17:54:21 -06:00
Derek Collison
5bec08ac6a Added support for user and activation token revocation
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-28 06:49:39 -07:00
Derek Collison
bf902d9e7c Add in user JWT support for ResponsePermissions
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-26 16:15:13 -07:00
Derek Collison
8bfe14bbfd check response perms more often, make sure we limit memory growth
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-25 16:53:54 -07:00
Derek Collison
495a1a7ec3 Allow dynamic publish permissions based on reply subjects of received msgs
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-25 13:17:26 -07:00
Derek Collison
df29be11ed Changes based on PR comments
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-22 18:37:40 -07:00
Derek Collison
1d6c58074f Fix for #1065 (leaked subscribers from dq subs across routes)
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-22 17:17:43 -07:00
Alberto Ricart
273e5af0a8 Fixed an issue where the leaf authentication was not checking for account/signers, so user JWTs signed by a signer failed authentication. 2019-07-17 16:03:55 -04:00
Ivan Kozlovic
b61744aa17 Release 2.0.2
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-07-15 09:49:00 -06:00
Andy Xie
2f99b144aa add ut for tracemsg 2019-07-15 14:02:02 +08:00
Derek Collison
7a3fb4ebe0 Merge pull request #1057 from andyxning/allow_limits_to_traced_message
allow limit to traced message
2019-07-14 21:34:31 -07:00
Andy Xie
cd214fca89 allow limit to traced message 2019-07-15 11:39:00 +08:00
Derek Collison
8262082289 If we read data and have an error, still process and parse data.
This is helpful for clients who send data and close the connection.
Also helpful to process errors like auth for solicited leafnodes.

Signed-off-by: Derek Collison <derek@nats.io>
2019-07-13 05:19:35 -07:00
Ivan Kozlovic
0873b46f67 [FIXED] LeafNode urls may be missing in INFO sent to LN connections
When a cluster of servers are having routes to each other, there
is a chance that the list of leafnode URLs maintained on each
server is not complete. This would result in LN servers connecting
to this cluster to not get the full list of possible URLs the
server could reconnect to.

Also fixed a DATA RACE that appeared when running the updated
TestLeafNodeInfoURLs test. Fixed the race and added specific
test that easily demonstrated the race: TestLeafNodeNoRaceGeneratingNonce

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-07-12 19:15:30 -06:00
Ivan Kozlovic
0a72993d80 Add warning for TLS insecure setting on LeafNodes
Also fix for #1071 in that we need to check remote gateways TLS
config even if main gateway section is not configured with TLS.

Related to #1071

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-07-12 17:22:57 -06:00
Derek Collison
7766f27616 Bump version to RC2 [ci skip]
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-12 14:29:07 -07:00
Derek Collison
18a2c357e4 Merge pull request #1072 from nats-io/handshake
Report authorization error and use TLS hostname for IPs on leafnodes.
2019-07-12 14:11:53 -07:00