This is the first pass at introducing exported services to the system account for generally debugging of blackbox systems.
The first service reports number of subscribers for a given subject. The payload of the request is the subject, and optional queue group, and can contain wildcards.
Signed-off-by: Derek Collison <derek@nats.io>
Will now breakout the internal NATS latency to show requestor client RTT, responder client RTT and any internal latency caused by hopping between servers, etc.
Signed-off-by: Derek Collison <derek@nats.io>
Changed some of client.outbound fields to int64.
Moved fields around to minimize size of struct (checked with
unsafe.Sizeof())
Checked benchmark results before/after
Added test
Resolves#1118
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
With Go 1.12 (strangely was not able to reproduce with Go 1.11)
the test TestRouteNoCrashOnAddingSubToRoute() would frequently
locks up and consume all avail CPUs on the machine. Running
this test with GOMAXPROCS=2 you would see server.test CPU usage
pegged at 200% (assuming you have at least 2 CPUs).
The reason was that the writeLoop was spinning because another
routine was already in flushOutbound() and stack trace would
show that it was stuck in system calls. It seems that even though
the writeLoop does release the lock but grab it right away was
not allowing the syscall to complete.
So decided to put back the unlock/gosched/lock back in flushOutbound()
when flag is already set, but then protect the closeConnection()
with its own flag (similar to clearConnection) to not re-introduce
issue fixed in #1092.
Had to fix the benchmark test RoutedInterestGraph because after a
route is accepted, the initial PING will be sent after 1sec which
was breaking this test.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
I noticed that TestNoRaceRoutedQueueAutoUnsubscribe started to
fail a lot on Travis. Running locally I could see a 45 to 50%
failures. After investigation I realized that the issue was that
we have wrongly re-used `subscription.nm` and set to -1 on unsubscribe
however, I believe that it was possible that when subscription was
closed, the server may have already picked that consumer for a delivery
which then causes nm==-1 to be bumped to 0, which was wrong.
Commenting out the subscription.close() that sets nm to -1, I could
not get the test to fail on macOS but would still get 7% failure on
Linux VM. Adding the check to see if sub is closed in deliverMsg()
completely erase the failures, even on Linux VM.
We could still use `nm` set to -1 but check on deliverMsg(), the
same way I use the closed int32 now.
Fixed some flappers.
Updated .travis.yml to failfast if one of the command in the
`script` fails. User `set -e` and `set +e` as recommended in
https://github.com/travis-ci/travis-ci/issues/1066
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This was introduced in PR#930. The first commit had the route's
check if the flushOutbound() returned false, and if so would
locally unlock/lock the connection's lock. Unfortunately, this
was replaced in the second commit (a6aeed3a6b)
to the flushOutbound() function itself.
This causes the function closeConnection() to possibly unlock
the connection while calling flushOutbound(), which if the
connection is closed due to both a tls timeout for instance
and explicitly, it would result in the connection being scheduled
for a reconnect (if explicit gateway connection, possibly route).
Added defensive code in Gateway to register a unique outbound gateway.
Fixed a test that was now failing with newer Go version in which
they fixed url.Parse()
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Take into account tracking of response maps that are created and do proper cleanup.
Also fixes#1089 which was discovered while working on this.
Signed-off-by: Derek Collison <derek@nats.io>
When a route receives a message, it uses a thread local cache to
find the account and subscriptions match for a given subject.
When not found, an entry is added to this cache. The problem is
that this cache will reference subscriptions that in turn
reference connections.
When the subscriptions/connections are closed, this thread local
cannot be purged from those closed subscriptions (since it is
thread local - no lockin is used).
The real issue is that connection's buffer was not set to nil on
close, which then could cause more than expected memory to be
still referenced. Setting the buffer to nil will help reduce the
memory being used.
When an entry is added to the cache, the cache may reach a size
that will cause the server to prune some entries. From time to
time, the cache will be scanned to look for entries that contain
only closed subscriptions and remove those.
Resolves#1082
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This is helpful for clients who send data and close the connection.
Also helpful to process errors like auth for solicited leafnodes.
Signed-off-by: Derek Collison <derek@nats.io>
When a solicited leafnode comes from multiple servers that themselves are a cluster, cycles were formed.
This change allows solicited leafnodes to behave similar to gateways in that each server of a cluster
is expected to have a solicted leafnode per destination account and cluster.
We no longer forward subscription interest or messages to a cluster from a server that has a solicited leafnode.
Signed-off-by: Derek Collison <derek@nats.io>
One could craft a PUB protocol to cause server to panic. This can
happen if the size in the PUB protocol overruns an int32.
(note that if authorization is enabled, the user would need to
authenticate first, limiting the impact).
Thank you to Aviv Sasson and Ariel Zelivansky from Twistlock
for the security report!
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Issue a warning in readLoop if execution of code after connection
Read() until end of for loop reaches a certain threshold.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Remove sub from rsubs sublist when user UNSUBs.
Fix bench test that was not actually creating a SUB per request
in the Benchmark_Gateways_Requests_CreateOneSubForEach test.
Also UNSUBs older SUBs after a certain threshold to simulate
actual req/reply.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This addresses the following race:
- client connection creates a subscription on a reply subject
- client connection sends a request
- server sends the subscription to inbound gateway
- server sends the message to outbound gateway (those may be
to different servers)
- receiving server sends to sub interested in request subject
- app sends reply
- its server then check for interest on the reply's subject
In interestOnly mode, there is a possibility that this server
has not received the interest on the reply subject yet and would
then drop the reply.
This PR detects above scenario and will prefix the reply subject
to identify the origin cluster if it is detected that the last
subscription from the sending connection was created less than
a second ago.
Once the destination has this prefix, the destination cluster
will always send back that message to origin cluster even if
there is no registered interest.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>