This is an issue in master only, not in any public release.
The issue is that permissions should be assigned as-is for the
route perms because Publish/Subscribe could be nil, so trying
to dereference Publish.Allow/Deny or Subscribe.Allow/Deny could
crash. The code checking for permissions correctly check if
Publish/Subscribe is nil or not.
This was introduced with PR #725
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When changing something in the cluster, such as Timeout and doing
a config reload, the route could be closed with an `Authorization
Error` report. Moreover, the route would not try to reconnect,
even if specified as an explicit route.
There were 2 issues:
- When checking if a solicited route is still valid, we need to
check the Routes' URL against the URL that we try to connect
to but not compare the pointers, but either do a reflect
deep equal, or compare their String representation (this is
what I do in the PR).
- We should check route authorization only if this is an accepted
route, not an explicit one. The reason is that we a server
explicitly connect to another server, it does not get the remote
server's username and password. So the check would always fail.
Note: It is possible that a config reload even without any change
in the cluster triggers the code checking if routes are properly
authorized, and that happens if there is TLS specified. When
the reload code checks if config has changed, the TLSConfig
between the old and new seem to indicate a change, eventhough there
is apparently none. Another reload does not detect a change. I
suspect some internal state in TLSConfig that causes the
reflect.DeepEqual() to report a difference.
Note2: This commit also contains fixes to regex that staticcheck
would otherwise complain about (they did not have any special
character), and I have removed printing the usage on startup when
getting an error. The usage is still correctly printed if passing
a parameter that is unknown.
Resolves#719
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- Increate WriteDeadline test that otherwise could cause a client
connect to fail
- Check failed NumRoutes() with retry
- Check that subs are propagated in route permissions test
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Some tests consume too much memory when running with -race which
can cause some failures on Travis.
Moreover, those tests may not be meaningful if they are running
slow, which -race causes.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The added option writes a file in the specified directory called <exename>_<pid>.ports which
contains a JSON representation of ports that the gnatsd has opened.
This change is intended to facilitate testing by having ports be specified with a -1, so
they are auto assigned and allow tests to locate and connect to the launched gnatsd(s).
The `client.perms` struct is left unchanged. We simply map Import
and Export semantics to existing Publish and Subscribe ones.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Use pending bytes as slow consumer trigger, so reintroduce max_pending.
Improve latency with inplace flush calls when appropriate. Utilize simple
time budget for readLoop routine.
Signed-off-by: Derek Collison <derek@nats.io>
I noticed that when running the test suite, there would be a file
server/log1.txt left. This file is created by one of the config
reload test. Running this test individually was doing the proper
cleanup. I noticed that the Signal test that was checking
that files could be rotated was causing this side effect.
It turns out that none of the config reload tests were disabling
the signal handler (NoSigs=true), and since the go routine would
be left running, running the TestSignalToReOpenLogFile() test
would interact with an already finished test.
I put a thread dump in handleSignals() to track all tests that
were causing this function to start the go routine because NoSigs
was not set to true. I fixed all those tests. At this time, there
are only 2 tests that need to start the signal handler.
I have also fixed the code so that the signal handler routine select
on a server quitCh that is closed on shutdown so that this go routine
exit and is waiting on using the grWG wait group.
This PR is based out of #633. It imroves parsing QRSID so that the
TestRouteQueueSemantics test now passes (when dealing with malformed
QRSID).
A test similar to what is reported in #632 was also added. This
test however, uncovers a race condition that will be fixed in a
separate PR.
Resolves#632
This is used by RunDefaultServer() and some external projects tests
may rely on the fact that this runs on the default port.
Our tests that want to use ephemeral ports to avoid port conflicts
should be updated to not use these default options and/or RunDefaultServer().
Until now, a server would only notify clients of servers that join
the cluster. More than that, a server would send ot its clients only
information if new servers were added.
This PR changes this by sending to clients that support async INFO
the list of URLs for all servers in the cluster any time that there
is a change (joining or leaving the cluster).
As of now, clients will not be affected by the change (and will not
take benefit of this: removing servers from their server pool). This
will be addressed in each supported client once this is merged.
When the option Cluster.NoAdvertise is false, a server will send
an INFO protocol message to its client when a server has joined
the cluster.
Previously, the protocol would be sent only if the
joining server's "client URLs" (the addresses where clients connect
to) were new. It will now be sent regardless if the server joins
(for the first time) or rejoins the cluster.
Clients are still by default invoking the DiscoveredServersCB callback
only if they themselves detect that new URLs were added. A separate
PR may be filled to client libraries repo to be able to invoke
the callback anytime an async INFO protocol is received.
Based on @madgrenadier PR #597.
The http servers for those two were recently modified to set
a ReadTimeout and WriteTimeout. The WriteTimeout specifically
caused issues for Profiling since it is common to ask sampling
of several seconds. Pprof code would reject the request if it
detected that http server's WriteTimeout was more than sampling
in request.
For monitoring, any situation that would cause the monitoring code
to take more than 2 seconds to gather information (could be due
to locking, amount of objects to return, time required for sorting,
etc..) would also cause cURL to return empty response or WebBrowser
to fail to display the page.
Resolves#600