Commit Graph

98 Commits

Author SHA1 Message Date
Ivan Kozlovic
b4128693ed Ensure file path is correct during stream restore
Also had to change all references from `path.` to `filepath.` when
dealing with files, so that it works properly on Windows.

Fixed also lots of tests to defer the shutdown of the server
after the removal of the storage, and fixed some config files
directories to use the single quote `'` to surround the file path,
again to work on Windows.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2022-03-09 13:31:51 -07:00
Ivan Kozlovic
a025ce7472 Set defaultServerOptions port to -1 for random
Updated some tests based on this change but also missing defer
connection close or server shutdown.

Fixed how the OCSP run go routine would shutdown, which would
never complete because grWG was not decremented by this go routine
prior to invoking s.Shutdown()

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-09-02 14:22:56 -06:00
Derek Collison
476c264560 If we are in a simple mixed-mode setup with just global account and system account and clustered, allow pass through.
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-26 09:41:01 -07:00
Derek Collison
944dd248c4 Fix for tests
Signed-off-by: Derek Collison <derek@nats.io>
2021-08-14 17:39:51 -07:00
Ivan Kozlovic
4865dc7ae3 [CHANGED] Check that max_payload is not greater than max_pending
This is related to PR #2407. Since the 64MB pending size is actually
configurable, we should fail only if max_payload is greater than
the configured max_pending. This is done in validateOptions() which
covers both config file and direct options in embedded cases.
The check in opts.go is reverted to max int32 since at this point
we don't know if/what max_pending will be, so we simply check
that it is not more than a int32.

For the next minor release, we could have another change that
imposes a lower limit to max_payload (regardless if max_pending
is higher).

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-08-04 16:33:21 -06:00
Ivan Kozlovic
d7933631a9 [FIXED] Failed route TLS handshake would leave failed conn's lock, locked
This is a regression introduced in v2.2.6.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-06-22 14:05:43 -06:00
Jaime Piña
e12181cb83 Return not ready for connection reason
Currently, we use ReadyForConnections in server tests to wait for the
server to be ready. However, when this fails we don't get a clue about
why it failed.

This change adds a new unexported method called readyForConnections that
returns an error describing which check failed. The exported
ReadyForConnections version works exactly as before. The unexported
version gets used in internal tests only.
2021-04-20 11:45:08 -07:00
Ivan Kozlovic
a7e5853a3c Fixed expected pid path in options
This was introduced by PR #2071.

On some tests, options are loaded based on a config file that has
the pid set to "/tm/nats-server/nats-server.pid", however, the
expected option's pid path was set based on tmpRoot. The problem
is that on macOS, that value would be "/var/folders/xxx" which
would not match.
So this PR simply reverts the changes to the expected pid file
name: it simply needs to match was in the test.conf file.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2021-04-08 15:08:23 -06:00
Jaime Piña
d929ee1348 Check errors when removing test directories and files
Currently in tests, we have calls to os.Remove and os.RemoveAll where we
don't check the returned error. This hides useful error messages when
tests fail to run, such as "too many open files".

This change checks for more filesystem related errors and calls t.Fatal
if there is an error.
2021-04-07 11:09:47 -07:00
Jaime Piña
e44275b963 Consolidate temporary test files and directories
Currently, temporary test files and directories are written in lots of
different paths within the OS's temp dir. This makes it hard to know
which files are from nats-server and which are unrelated. This in turn
makes it hard to clean up nats-server test files.
2021-04-06 10:42:55 -07:00
Derek Collison
f0cdf89c61 JetStream Clustering WIP
Signed-off-by: Derek Collison <derek@nats.io>
2021-01-14 01:14:52 -08:00
Ivan Kozlovic
13df1a55fd Changed warning message
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-10-09 09:36:30 -06:00
Ivan Kozlovic
df9d5f5fd9 Accepting route warns if remote server has same name
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-10-08 17:59:33 -06:00
Ivan Kozlovic
2ad2bed170 [ADDED] Support for route hostname resolution
We previously simply called DialTimeout() on a route's url when
soliciting. If it resolved to the IP of the host, it would create
a route to self, which server detects, but then would not try again
with other IPs that would have allowed to form a cluster with
other servers running on the other IPs.

This PR keeps track of local IPs + cluster port and exclude them
from the list of IPs returned by LookupHost API. This even prevent
solicitation of routes to self. Only non-local IPs will be tried.

Resolves #1586

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-09-08 13:40:17 -06:00
Phil Pennock
3c680eceb9 Inhibit Go's default TCP keepalive settings for NATS (#1562)
Inhibit Go's default TCP keepalive settings for NATS

Go 1.13 changed the semantics of the tuning parameters for TCP keepalives, including the default value.  This affects all TCP listeners.  The NATS protocol has its own L7 keepalive system (PING/PONG) and the Go defaults are not a good fit for some valid deployment scenarios, while Go doesn't directly expose a working API for tuning these.

Rather than add a configuration knob and pull in another dependency (with portability issues) just disable TCP keepalives for all listeners used for speaking the NATS protocol.

Change the tests so we test the same logic.  Do not change HTTP monitoring, profiling, or the websocket API listeners.

Change KeepAlive on client connections too.
2020-08-14 13:37:59 -04:00
Ivan Kozlovic
96ccf91566 [FIXED] Possible deadlock with solicited leafnodes when cluster conflict
We cannot call c.closeConnection() under the server lock because
closeConnection() can invoke server lock in some cases.

Created a test that should run without `-race` to reproduce the deadlock
(which it does) but sometimes would fail because cluster would not be
formed. This unconvered an issue with conflict resolution which
test TestRouteClusterNameConflictBetweenStaticAndDynamic() can reproduce
easily. The issue was that we were not updating a dynamic name with
the remote if the remote was non dynamic.

Resolves #1543

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-07-30 18:45:36 -06:00
Ivan Kozlovic
9288283d90 Fixed accept loops that could leave connections opened
This was discovered with the test TestLeafNodeWithGatewaysServerRestart
that was sometimes failing. Investigation showed that when cluster B
was shutdown, one of the server on A that had a connection from B
that just broke tried to reconnect (as part of reconnect retries of
implicit gateways) to a server in B that was in the process of shuting down.
The connection had been accepted but createGateway not called because
the server's running boolean had been set to false as part of the shutdown.
However, the connection was not closed so the server on A had a valid
connection to a dead server from cluster B. When the B cluster (now single
server) was restarted and a LeafNode connection connected to it, then
the gateway from B to A was created, that server on A did not create outbound
connection to that B server because it already had one (the zombie one).

So this PR strengthens the starting of accept loops and also make sure
that if a connection (all type of connections) is not accepted because
the server is shuting down, that connection is properly closed.

Since all accept loops had almost same code, made a generic function
that accept functions to call specific create connection functions.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-07-06 17:03:19 -06:00
Derek Collison
146d8f5dcb Updates based on feedback, sped up some slow tests
Signed-off-by: Derek Collison <derek@nats.io>
2020-06-12 17:26:43 -07:00
Derek Collison
dd61535e5a Cluster names are now required.
Added cluster names as required for prep work for clustered JetStream. System can dynamically pick a cluster name and settle on one even in large clusters.

Signed-off-by: Derek Collison <derek@nats.io>
2020-06-12 15:48:38 -07:00
Ivan Kozlovic
b9bd5c2d35 Fixed flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-06-09 15:34:52 -06:00
Derek Collison
2bd7553c71 System Account on by default.
Most of the changes are to turn it off for tests that were watching subscriptions and such.

Signed-off-by: Derek Collison <derek@nats.io>
2020-05-29 17:56:45 -07:00
Ivan Kozlovic
f76f0df5ce Remove update of start in readLoop
That broke sending async INFO in case where there was an update
between accepting the tcp connection and receiving the CONNECT
that indicates that client can receive async INFO.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-05-25 06:58:23 -07:00
Ivan Kozlovic
a22da91647 [FIXED] Closing of Gateway or Route TLS connection may hang
This could happen if the remote server is running but not dequeueing
from the socket. TLS connection Close() may send/read and so we
need to protect with a deadline.

For non client/leaf connection, do not call flushOutbound().
Set the write deadline regardless of handshakeComplete flag, and
set it to a low value.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-12-04 17:27:00 -07:00
Ivan Kozlovic
cd9f898eb0 Made a server's helper to set first ping timer
Defaults to 1sec but will be opts.PingInterval if value is lower.
All non client connections invoked this function for the first
PING.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-26 10:21:43 -06:00
Ivan Kozlovic
90d592e163 Leaf and Route RTT
When a leaf or route connection is created, set the first ping
timer to fire at 1sec, which will allow to compute the RTT
reasonably soon (since the PingInterval could be user configured
and set much higher).

For Route in PR #1101, I was sending the PING on receiving the
INFO which required changing bunch of tests. Changing that to
also use the first timer interval of 1sec and reverted changes
to route tests.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-26 09:34:17 -06:00
Ivan Kozlovic
7ca8723942 [FIXED] Some Leafnode issues
- On startup, verify that local account in leafnode (if specified
  can be found otherwise fail startup).
- At runtime, print error and continue trying to reconnect.
  Will need to decide a better approach.
- When using basic auth (user/password), it was possible for a
  solicited Leafnode connection to not use user/password when
  trying an URL that was discovered through gossip. The server
  now saves the credentials of a configured URL to use with
  the discovered ones.

Updated RouteRTT test in case RTT does not seem to be updated
because getting always the same value.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-23 14:08:07 -06:00
Ivan Kozlovic
2959b982ea Merge pull request #1101 from nats-io/route_rtt
[ADDED] RTT in routez's route info
2019-08-20 17:23:18 -06:00
Ivan Kozlovic
77c63dbce1 Fix flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-20 17:07:22 -06:00
Ivan Kozlovic
89dd13f134 [ADDED] RTT in routez's route info
Added the RTT field to each route reported in routez.
Ensure that when a route is accepted, we send a PING to compute
the first RTT and don't have to wait for the ping timer to fire.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-08-20 14:16:07 -06:00
Ivan Kozlovic
03930ba0e4 [UPDATED] Reduce report of failed connection attempts
This applies to routes, gateways and leaf node connections.
The failed attempts will be printed at the first, after the first
minute and then every hour.
The connect/error statements now include the attempt number.

Note that in debug mode, all attempts are traced, so you may get
double trace (one for debug, one for info/error).

Resolves #969

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-20 10:13:56 -06:00
Derek Collison
d7140a0fd1 Update for client rename
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-10 15:11:30 -07:00
Ivan Kozlovic
288f00ff81 Fixed panic when server needs to send message to more than 8 routes
Resolves #955

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-04-17 13:02:41 -06:00
Ivan Kozlovic
d654b18476 Fixed reload of boolean flags
PR #874 caused an issue in case logtime was actually not configured
and not specified in the command line. A reload would then remove
logtime.

Revisited the fix for that and included other boolean flags, such
as debug, trace, etc..

Related to #874

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-01-14 19:18:00 -07:00
Derek Collison
18bca5603f Added server version and cluster name to statsz.
Fixed account connection accounting sending after local connections is 0.

Signed-off-by: Derek Collison <derek@nats.io>
2018-12-06 10:57:39 -08:00
Ivan Kozlovic
1817b354e3 Update tests on Travis with tweaked GC settings
Moved some tests to "no race" tests that are run separately.
Removing -v and adding -p=1.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-11-08 16:56:20 -07:00
Derek Collison
ea5a6d9589 Updates for comments, some golint fixes
Signed-off-by: Derek Collison <derek@nats.io>
2018-10-31 20:28:44 -07:00
Derek Collison
47963303f8 First pass at new cluster design
Signed-off-by: Derek Collison <derek@nats.io>
2018-10-24 21:29:29 -07:00
Ivan Kozlovic
d5ceade750 Merge pull request #753 from nats-io/route_perms_reload
[ADDED] Support for route permissions config reload
2018-09-27 10:08:55 -06:00
Ivan Kozlovic
2a1811b600 Fixed flappers
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-09-26 15:58:48 -06:00
Ivan Kozlovic
e7f5cc82f0 Updates
- Use stack buffers
- Ensure that buffer size is no greater than 90% of max_pending
- Added test with low max_pending

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-09-26 12:19:14 -06:00
Ivan Kozlovic
846544ecfe Merge pull request #747 from nats-io/update_route_perms
[CHANGED] Cluster permissions moved out of cluster's authorization
2018-09-11 10:04:13 -06:00
Ivan Kozlovic
e1202dd30a [CHANGED] Cluster permissions moved out of cluster's authorization
It will be possible to set subjects permissions regardless of the
presence of an authorization block.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-09-10 17:03:50 -06:00
Waldemar Quevedo
5e3950df0a Add Warnf to logger interface
Signed-off-by: Waldemar Quevedo <wally@synadia.com>
2018-09-10 14:50:48 -07:00
Derek Collison
90a3a1d8b4 Slow down sweeper to make sure we receive all messages
Signed-off-by: Derek Collison <derek@nats.io>
2018-07-02 12:02:59 -07:00
Derek Collison
3b953ce838 Allow localhost to not be defined, only need 127.0.0.1
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-28 16:10:19 -07:00
Ivan Kozlovic
aff1dcf089 Fix some tests
Add some helpers to check on some state.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-06-27 17:26:49 -06:00
Derek Collison
844f376140 Performance optimizations, beta3, fixes to various tests.
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-11 15:11:03 -07:00
Derek Collison
4dd4d2bd9d lock users access
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-04 17:45:05 -07:00
Derek Collison
26dafe464b Don't send route unsub with max
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-04 17:45:05 -07:00
Derek Collison
049db6e854 Support for queue subscriber retries over routes
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-04 17:45:05 -07:00