Commit Graph

76 Commits

Author SHA1 Message Date
Ivan Kozlovic
a22da91647 [FIXED] Closing of Gateway or Route TLS connection may hang
This could happen if the remote server is running but not dequeueing
from the socket. TLS connection Close() may send/read and so we
need to protect with a deadline.

For non client/leaf connection, do not call flushOutbound().
Set the write deadline regardless of handshakeComplete flag, and
set it to a low value.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-12-04 17:27:00 -07:00
Ivan Kozlovic
cd9f898eb0 Made a server's helper to set first ping timer
Defaults to 1sec but will be opts.PingInterval if value is lower.
All non client connections invoked this function for the first
PING.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-26 10:21:43 -06:00
Ivan Kozlovic
90d592e163 Leaf and Route RTT
When a leaf or route connection is created, set the first ping
timer to fire at 1sec, which will allow to compute the RTT
reasonably soon (since the PingInterval could be user configured
and set much higher).

For Route in PR #1101, I was sending the PING on receiving the
INFO which required changing bunch of tests. Changing that to
also use the first timer interval of 1sec and reverted changes
to route tests.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-26 09:34:17 -06:00
Ivan Kozlovic
7ca8723942 [FIXED] Some Leafnode issues
- On startup, verify that local account in leafnode (if specified
  can be found otherwise fail startup).
- At runtime, print error and continue trying to reconnect.
  Will need to decide a better approach.
- When using basic auth (user/password), it was possible for a
  solicited Leafnode connection to not use user/password when
  trying an URL that was discovered through gossip. The server
  now saves the credentials of a configured URL to use with
  the discovered ones.

Updated RouteRTT test in case RTT does not seem to be updated
because getting always the same value.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-23 14:08:07 -06:00
Ivan Kozlovic
2959b982ea Merge pull request #1101 from nats-io/route_rtt
[ADDED] RTT in routez's route info
2019-08-20 17:23:18 -06:00
Ivan Kozlovic
77c63dbce1 Fix flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-20 17:07:22 -06:00
Ivan Kozlovic
89dd13f134 [ADDED] RTT in routez's route info
Added the RTT field to each route reported in routez.
Ensure that when a route is accepted, we send a PING to compute
the first RTT and don't have to wait for the ping timer to fire.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-20 14:16:07 -06:00
Ivan Kozlovic
03930ba0e4 [UPDATED] Reduce report of failed connection attempts
This applies to routes, gateways and leaf node connections.
The failed attempts will be printed at the first, after the first
minute and then every hour.
The connect/error statements now include the attempt number.

Note that in debug mode, all attempts are traced, so you may get
double trace (one for debug, one for info/error).

Resolves #969

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-20 10:13:56 -06:00
Derek Collison
d7140a0fd1 Update for client rename
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-10 15:11:30 -07:00
Ivan Kozlovic
288f00ff81 Fixed panic when server needs to send message to more than 8 routes
Resolves #955

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-04-17 13:02:41 -06:00
Ivan Kozlovic
d654b18476 Fixed reload of boolean flags
PR #874 caused an issue in case logtime was actually not configured
and not specified in the command line. A reload would then remove
logtime.

Revisited the fix for that and included other boolean flags, such
as debug, trace, etc..

Related to #874

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-01-14 19:18:00 -07:00
Derek Collison
18bca5603f Added server version and cluster name to statsz.
Fixed account connection accounting sending after local connections is 0.

Signed-off-by: Derek Collison <derek@nats.io>
2018-12-06 10:57:39 -08:00
Ivan Kozlovic
1817b354e3 Update tests on Travis with tweaked GC settings
Moved some tests to "no race" tests that are run separately.
Removing -v and adding -p=1.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-11-08 16:56:20 -07:00
Derek Collison
ea5a6d9589 Updates for comments, some golint fixes
Signed-off-by: Derek Collison <derek@nats.io>
2018-10-31 20:28:44 -07:00
Derek Collison
47963303f8 First pass at new cluster design
Signed-off-by: Derek Collison <derek@nats.io>
2018-10-24 21:29:29 -07:00
Ivan Kozlovic
d5ceade750 Merge pull request #753 from nats-io/route_perms_reload
[ADDED] Support for route permissions config reload
2018-09-27 10:08:55 -06:00
Ivan Kozlovic
2a1811b600 Fixed flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-09-26 15:58:48 -06:00
Ivan Kozlovic
e7f5cc82f0 Updates
- Use stack buffers
- Ensure that buffer size is no greater than 90% of max_pending
- Added test with low max_pending

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-09-26 12:19:14 -06:00
Ivan Kozlovic
846544ecfe Merge pull request #747 from nats-io/update_route_perms
[CHANGED] Cluster permissions moved out of cluster's authorization
2018-09-11 10:04:13 -06:00
Ivan Kozlovic
e1202dd30a [CHANGED] Cluster permissions moved out of cluster's authorization
It will be possible to set subjects permissions regardless of the
presence of an authorization block.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-09-10 17:03:50 -06:00
Waldemar Quevedo
5e3950df0a Add Warnf to logger interface
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2018-09-10 14:50:48 -07:00
Derek Collison
90a3a1d8b4 Slow down sweeper to make sure we receive all messages
Signed-off-by: Derek Collison <derek@nats.io>
2018-07-02 12:02:59 -07:00
Derek Collison
3b953ce838 Allow localhost to not be defined, only need 127.0.0.1
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-28 16:10:19 -07:00
Ivan Kozlovic
aff1dcf089 Fix some tests
Add some helpers to check on some state.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-06-27 17:26:49 -06:00
Derek Collison
844f376140 Performance optimizations, beta3, fixes to various tests.
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-11 15:11:03 -07:00
Derek Collison
4dd4d2bd9d lock users access
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-04 17:45:05 -07:00
Derek Collison
26dafe464b Don't send route unsub with max
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-04 17:45:05 -07:00
Derek Collison
049db6e854 Support for queue subscriber retries over routes
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-04 17:45:05 -07:00
Derek Collison
50a99241ea Slow consumer updates and latency improvements.
Use pending bytes as slow consumer trigger, so reintroduce max_pending.
Improve latency with inplace flush calls when appropriate. Utilize simple
time budget for readLoop routine.

Signed-off-by: Derek Collison <derek@nats.io>
2018-06-04 17:45:05 -07:00
Ivan Kozlovic
ac42bb0bb9 Remove route connection from temp map
When a route connection is created, the server will keep track
of the client structure in a special map until the route protocol
completes. This is meant so that if the server is shutdown before
the route is registered in routes map, the server can kick out
the connection's readLoop.

The route connection was correctly removed on success, but was
not for route connections that were not registered and dropped.
This was not causing any issue, but for correctness, doing the
removal now when server removes a route connection.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-04-11 16:49:08 -06:00
Ivan Kozlovic
40cf0107d6 Ensure sig handler routine returns on shutdown, turn it off in most tests
I noticed that when running the test suite, there would be a file
server/log1.txt left. This file is created by one of the config
reload test. Running this test individually was doing the proper
cleanup. I noticed that the Signal test that was checking
that files could be rotated was causing this side effect.
It turns out that none of the config reload tests were disabling
the signal handler (NoSigs=true), and since the go routine would
be left running, running the TestSignalToReOpenLogFile() test
would interact with an already finished test.

I put a thread dump in handleSignals() to track all tests that
were causing this function to start the go routine because NoSigs
was not set to true. I fixed all those tests. At this time, there
are only 2 tests that need to start the signal handler.

I have also fixed the code so that the signal handler routine select
on a server quitCh that is closed on shutdown so that this go routine
exit and is waiting on using the grWG wait group.
2018-04-06 17:14:02 -06:00
Derek Collison
b2a9ed97d6 Merge pull request #650 from nats-io/cncf
Move to CNCF and Apache 2 License
2018-03-16 16:23:10 -07:00
Derek Collison
00901acc78 Update license to Apache 2 2018-03-15 22:31:07 -07:00
Ivan Kozlovic
ee8f001131 [IMPROVED] Better attempt at delivering messages to routed queue subs
This PR is based out of #633. It imroves parsing QRSID so that the
TestRouteQueueSemantics test now passes (when dealing with malformed
QRSID).
A test similar to what is reported in #632 was also added. This
test however, uncovers a race condition that will be fixed in a
separate PR.

Resolves #632
2018-03-09 14:45:52 -07:00
Ivan Kozlovic
461ab96a58 Additional fix to #631
This is the result of flapping tests in go-nats that were caused
by a defect (see PR https://github.com/nats-io/go-nats/pull/348).
However, during debugging, I realize that there were also things
that were not quite right in the server side. This change should
make it the notification of cluster topology changes to clients
more robust.
2018-03-05 20:03:46 -07:00
Ivan Kozlovic
aeca31ce51 [FIXED] Cluster toplogy change possibly not sent to clients
When a server accepts a route, it will keep track of that server
`connectURLs` array. However, if the server was creating a route
to that other server at the same time, it will promote the route
as a solicited one. The content of that array was not transfered,
which means that on a disconnect, it was possible that the cluster
topology change was not properly sent to clients.
2018-03-04 14:38:03 -07:00
Ivan Kozlovic
1acf330e07 [ADDED] Notification to clients when servers leave the cluster
Until now, a server would only notify clients of servers that join
the cluster. More than that, a server would send ot its clients only
information if new servers were added.
This PR changes this by sending to clients that support async INFO
the list of URLs for all servers in the cluster any time that there
is a change (joining or leaving the cluster).
As of now, clients will not be affected by the change (and will not
take benefit of this: removing servers from their server pool). This
will be addressed in each supported client once this is merged.
2018-02-27 14:22:13 -07:00
Ivan Kozlovic
acf4a31e4b Major updates + support for config reload of client/cluster advertise 2018-02-05 20:15:36 -07:00
Peter Miron
306a3f9507 Resolving Ivan's feedback. 2017-11-29 15:35:05 -05:00
Peter Miron
4829592107 removed support for array of Advertise addresses. Added support for Route advertise address. 2017-11-29 11:41:08 -05:00
Ivan Kozlovic
69891d37ac Fix import ordering 2017-07-19 09:02:25 -06:00
Ivan Kozlovic
0e2882d741 [FIXED] Handling of duplicate routes
When A connects to B and B connects to A (either based on static
configuration - explicit routes, or because of auto-discovery -
implicit routes), it is possible that each server initially
registers the route from the opposite TCP connection. It will
then result in each server dropping the connection.

We were previously setting a retry flag in the first accepted route
based on the name of servers, which means that regardless of
duplicate detection, the server with the "smaller" server name would
try to reconnect when the route connection was closed. For instance,
suppose that server B connects to server A, when B disconnects, A
would try to reconnect once to B. This became problematic in the
case of configuration reload, because removing the route from B to
A would still result in a route created from A to B.

Also, when a route attempts a reconnect, a random delay is added
to avoid repeated failure cycles that may occur in case where
A connects to B and B to A.
2017-07-18 18:25:56 -06:00
Ivan Kozlovic
6755617914 Bumped timeout for checkClusterFormed and use stackFatalf() 2017-07-11 10:19:34 -06:00
Tyler Treat
ca1b3485cb Fix "error opening file" errors in tests for Windows
This sort of just punts on the problem by not creating log files in the
tests, but it seemed like the simplest solution.
2017-07-10 12:21:01 -05:00
Tyler Treat
6e3f58f48c Fix tests and cleanup log output 2017-06-14 10:38:21 -05:00
Peter Miron
00744ff426 converted MonitorAddr and ClusterAddr to *net.TCPAddr 2017-06-12 17:40:36 -04:00
Peter Miron
606502091c Updated based on @tylertreat feedback. 2017-06-12 10:48:30 -04:00
Peter Miron
d1f38f38a2 changes to support random ports for clusters and profiler. 2017-06-10 10:35:01 -04:00
Tyler Treat
9902c3da84 First pass at implementing config reload 2017-05-30 16:18:36 -05:00
Derek Collison
acb740dbe9 Use checkClusterFormed 2017-04-20 18:34:08 -07:00