Commit Graph

2218 Commits

Author SHA1 Message Date
Derek Collison
67470911fe Prune remote reply tracking
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-30 17:35:20 -07:00
Derek Collison
bb11f7bd2d Merge pull request #1111 from nats-io/latency
Track latency for exported services
2019-08-30 11:02:36 -07:00
Derek Collison
7989118c3f First pass latency tracking for exported services
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-30 10:52:48 -07:00
Ivan Kozlovic
8ca58c7de0 Merge pull request #1110 from nats-io/fix_flush_outbound
Fixed flushOutbound
2019-08-29 13:32:37 -06:00
Ivan Kozlovic
2a8973a62b Fixed flushOutbound
With Go 1.12 (strangely was not able to reproduce with Go 1.11)
the test TestRouteNoCrashOnAddingSubToRoute() would frequently
locks up and consume all avail CPUs on the machine. Running
this test with GOMAXPROCS=2 you would see server.test CPU usage
pegged at 200% (assuming you have at least 2 CPUs).
The reason was that the writeLoop was spinning because another
routine was already in flushOutbound() and stack trace would
show that it was stuck in system calls. It seems that even though
the writeLoop does release the lock but grab it right away was
not allowing the syscall to complete.

So decided to put back the unlock/gosched/lock back in flushOutbound()
when flag is already set, but then protect the closeConnection()
with its own flag (similar to clearConnection) to not re-introduce
issue fixed in #1092.

Had to fix the benchmark test RoutedInterestGraph because after a
route is accepted, the initial PING will be sent after 1sec which
was breaking this test.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-29 12:59:27 -06:00
Ivan Kozlovic
8a0120d1b8 Merge pull request #1108 from nats-io/add_leafz
[ADDED] /leafz endpoint
2019-08-26 12:46:44 -06:00
Ivan Kozlovic
cd4b8d3fad [ADDED] /leafz endpoint
Resolves #1061

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-26 12:00:24 -06:00
Ivan Kozlovic
6c4a88f34e Merge pull request #1107 from nats-io/leaf_route_rtt
Leaf and Route RTT
2019-08-26 11:57:54 -06:00
Ivan Kozlovic
cd9f898eb0 Made a server's helper to set first ping timer
Defaults to 1sec but will be opts.PingInterval if value is lower.
All non client connections invoked this function for the first
PING.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-26 10:21:43 -06:00
Ivan Kozlovic
90d592e163 Leaf and Route RTT
When a leaf or route connection is created, set the first ping
timer to fire at 1sec, which will allow to compute the RTT
reasonably soon (since the PingInterval could be user configured
and set much higher).

For Route in PR #1101, I was sending the PING on receiving the
INFO which required changing bunch of tests. Changing that to
also use the first timer interval of 1sec and reverted changes
to route tests.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-26 09:34:17 -06:00
Ivan Kozlovic
5518cbe070 Merge pull request #1106 from nats-io/fix_leafnode
[FIXED] Some Leafnode issues
2019-08-26 09:17:36 -06:00
Ivan Kozlovic
7ca8723942 [FIXED] Some Leafnode issues
- On startup, verify that local account in leafnode (if specified
  can be found otherwise fail startup).
- At runtime, print error and continue trying to reconnect.
  Will need to decide a better approach.
- When using basic auth (user/password), it was possible for a
  solicited Leafnode connection to not use user/password when
  trying an URL that was discovered through gossip. The server
  now saves the credentials of a configured URL to use with
  the discovered ones.

Updated RouteRTT test in case RTT does not seem to be updated
because getting always the same value.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-23 14:08:07 -06:00
Derek Collison
452a98572a Merge pull request #1105 from ethan-daocloud/typo-terminator
Cleanup: fix some typos in code comment
2019-08-22 07:57:29 -07:00
Guangming Wang
927991321d Cleanup: fix some typos in code comment
Signed-off-by: Guangming Wang <guangming.wang@daocloud.io>
2019-08-22 21:36:37 +08:00
Ivan Kozlovic
2959b982ea Merge pull request #1101 from nats-io/route_rtt
[ADDED] RTT in routez's route info
2019-08-20 17:23:18 -06:00
Ivan Kozlovic
41a7457ecf Merge pull request #1103 from nats-io/flappers
Fix flappers
2019-08-20 17:21:56 -06:00
Ivan Kozlovic
77c63dbce1 Fix flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-20 17:07:22 -06:00
Ivan Kozlovic
02acbb6419 Merge pull request #1102 from nats-io/fix_sub_close
Fixed subscription close
2019-08-20 15:14:29 -06:00
Ivan Kozlovic
2f48ad5150 Fixed subscription close
I noticed that TestNoRaceRoutedQueueAutoUnsubscribe started to
fail a lot on Travis. Running locally I could see a 45 to 50%
failures. After investigation I realized that the issue was that
we have wrongly re-used `subscription.nm` and set to -1 on unsubscribe
however, I believe that it was possible that when subscription was
closed, the server may have already picked that consumer for a delivery
which then causes nm==-1 to be bumped to 0, which was wrong.
Commenting out the subscription.close() that sets nm to -1, I could
not get the test to fail on macOS but would still get 7% failure on
Linux VM. Adding the check to see if sub is closed in deliverMsg()
completely erase the failures, even on Linux VM.

We could still use `nm` set to -1 but check on deliverMsg(), the
same way I use the closed int32 now.

Fixed some flappers.
Updated .travis.yml to failfast if one of the command in the
`script` fails. User `set -e` and `set +e` as recommended in
https://github.com/travis-ci/travis-ci/issues/1066

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-20 14:39:23 -06:00
Ivan Kozlovic
89dd13f134 [ADDED] RTT in routez's route info
Added the RTT field to each route reported in routez.
Ensure that when a route is accepted, we send a PING to compute
the first RTT and don't have to wait for the ping timer to fire.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-20 14:16:07 -06:00
Ivan Kozlovic
25e15d7162 Merge pull request #1100 from nats-io/remove_skip_email_notifications
Removing skipping email notification [ci skip]
2019-08-15 12:37:17 -06:00
Ivan Kozlovic
ef65ccbfd4 Removing skipping email notification [ci skip]
I had put that in place when testing the release process.
Removing now.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-15 11:00:34 -06:00
Ivan Kozlovic
c8ca58efb4 Merge pull request #1095 from nats-io/goreleaser
Prepare for v2.0.4 with goreleaser
v2.0.4
2019-08-15 10:02:32 -06:00
Ivan Kozlovic
a44e3f037c Change checksum file name to SHA256SUMS so it can be used with rget
See:
https://github.com/merklecounty/rget/blob/master/Documentation/integrations.md
6c02b98031/.goreleaser.yml (L30)

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-15 09:07:20 -06:00
Ivan Kozlovic
e230e7fde9 Attempt at fixing flapper again
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-15 09:06:56 -06:00
Ivan Kozlovic
fc8087daa7 Updates based on comments
- add sha256 algo
- move some mem hungry tests while running with -race to the norace
- remove GOGC=10

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-15 09:06:56 -06:00
Ivan Kozlovic
07e3db6b8e Prepare for v2.0.4 with goreleaser
Also fixed some flappers

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-15 09:06:56 -06:00
Derek Collison
0fd42cffcb Merge pull request #1096 from nats-io/flap
Fix for flapping test
2019-08-15 07:50:41 -07:00
Derek Collison
2657092a04 Merge pull request #1098 from ethan-daocloud/patch-1
cleanup: fix word errors in errors.go
2019-08-15 07:36:26 -07:00
Guangming Wang
09954eee5c cleanup: fix word errors in errors.go
Signed-off-by: Guangming Wang <guangming.wang@daocloud.io>
2019-08-15 22:12:57 +08:00
Derek Collison
93313a149e Fix for flapping test
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-14 23:52:49 -07:00
Derek Collison
c185a88146 Merge pull request #1094 from nats-io/cover
tmp disable of coveralls testing due to site down
2019-08-13 20:26:30 -07:00
Derek Collison
1fc7e99e76 tmp disable of coveralls testing due to site down
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-13 20:16:18 -07:00
Derek Collison
e76f66c1cd Merge pull request #1093 from wallyqs/typo
Fix typo
2019-08-13 20:05:08 -07:00
Waldemar Quevedo
5c776d4363 Fix typo
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2019-08-13 19:59:28 -07:00
Ivan Kozlovic
a11524c543 Merge pull request #1092 from nats-io/fix_duplicate_conn_close
[FIXED] Connection could be closed twice
2019-08-13 20:36:22 -06:00
Ivan Kozlovic
c20afd4016 [FIXED] Connection could be closed twice
This was introduced in PR#930. The first commit had the route's
check if the flushOutbound() returned false, and if so would
locally unlock/lock the connection's lock. Unfortunately, this
was replaced in the second commit (a6aeed3a6b)
to the flushOutbound() function itself.
This causes the function closeConnection() to possibly unlock
the connection while calling flushOutbound(), which if the
connection is closed due to both a tls timeout for instance
and explicitly, it would result in the connection being scheduled
for a reconnect (if explicit gateway connection, possibly route).

Added defensive code in Gateway to register a unique outbound gateway.

Fixed a test that was now failing with newer Go version in which
they fixed url.Parse()

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-13 20:11:03 -06:00
Derek Collison
5620300074 Merge pull request #1091 from nats-io/jwt_response_type
Added support for service response types
2019-08-12 09:03:20 -06:00
Stephen Asbury
4d63709852 Added support for service response types
Test checks that response types are initialized
Updated to latest JWT library with response types
Updated jwt in vendor
2019-08-09 17:54:17 -07:00
Derek Collison
49ab0dfbb7 Merge pull request #1090 from nats-io/streams
Add ability for cross account import services to return streams
2019-08-06 19:40:47 -07:00
Derek Collison
2fad14a915 Add in copy of respMap on reload
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-06 18:43:06 -07:00
Derek Collison
35c96713a0 fixes based on feedback
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-06 15:55:33 -07:00
Derek Collison
8f5bc503e5 Add ability for cross account import services to return streams as well as singeltons.
Take into account tracking of response maps that are created and do proper cleanup.
Also fixes #1089 which was discovered while working on this.

Signed-off-by: Derek Collison <derek@nats.io>
2019-08-06 14:15:40 -07:00
Derek Collison
5252099ff6 Bump version [ci skip]
Signed-off-by: Derek Collison <derek@nats.io>
2019-08-01 15:44:39 -07:00
Ivan Kozlovic
f5a6c0d476 Merge pull request #1087 from nats-io/reduce_memuse_routing
[FIXED] Reduce memory usage on routes
2019-07-29 21:00:30 -06:00
Ivan Kozlovic
b537f130cc Use goto to remove entry from cache
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-07-29 20:52:57 -06:00
Ivan Kozlovic
6fd6ac2821 Update based on comments
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-07-29 20:38:22 -06:00
Derek Collison
5d9ca4a919 Merge pull request #1088 from nats-io/leafnode
Add leafnode connections to varz
2019-07-29 22:30:01 -04:00
Derek Collison
ea33b1093b Add leafnode connections to varz
Signed-off-by: Derek Collison <derek@nats.io>
2019-07-29 21:43:30 -04:00
Ivan Kozlovic
887e744d07 [FIXED] Reduce memory usage on routes
When a route receives a message, it uses a thread local cache to
find the account and subscriptions match for a given subject.
When not found, an entry is added to this cache. The problem is
that this cache will reference subscriptions that in turn
reference connections.
When the subscriptions/connections are closed, this thread local
cannot be purged from those closed subscriptions (since it is
thread local - no lockin is used).
The real issue is that connection's buffer was not set to nil on
close, which then could cause more than expected memory to be
still referenced. Setting the buffer to nil will help reduce the
memory being used.
When an entry is added to the cache, the cache may reach a size
that will cause the server to prune some entries. From time to
time, the cache will be scanned to look for entries that contain
only closed subscriptions and remove those.

Resolves #1082

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-07-29 17:54:21 -06:00