Commit Graph

97 Commits

Author SHA1 Message Date
Matthias Hanel
f5bd07b36c [FIXED] trace/debug/sys_log reload will affect existing clients
Fixed #1296, by altering client state on reload

Detect a trace level change on reload and update all clients.
To avoid data races, read client.trace while holding the lock,
pass the value into functionis that trace while not holding the lock.
Delete unused client.debug.

Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-03-04 13:54:15 -05:00
Ivan Kozlovic
37291df206 Fixed yet another flapper
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-02-19 17:33:16 -07:00
Matthias Hanel
82a275943e Fix unreliable test
Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-02-17 20:22:52 -05:00
Matthias Hanel
db83b8a55a Avoid all else and adhere to general style. Adding Flush as requested.
Signed-off-by: Matthias Hanel <mh@synadia.com>
2020-02-17 16:52:13 -05:00
Ivan Kozlovic
7208e7f817 [ADDED] Ability to specify TLS configuration for account resolver
A new config section allows to specify specific TLS parameters for
the account resolver:
```
resolver_tls {
  cert_file: ...
  key_file: ...
  ca_file: ...
}
```

Resolves #1271

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2020-02-03 14:35:05 -07:00
Ivan Kozlovic
947798231b [UPDATED] TCP Write and SlowConsumer handling
- All writes will now be done by the writeLoop, unless when the
  writeLoop has not been started yet (likely in connection init).
- Slow consumers for non CLIENT connections will be reported but
  not failed. The idea is that routes, gateway, etc.. connections
  should stay connected as much as possible. However if a flush
  operation times out and no data at all has been written, the
  connection will be closed (regardless of type).
- Slow consumers due to max pending is only for CLIENT connections.
  This allows sending of SUBs through routes, etc.. to not have
  to be chunked.
- The backpressure to CLIENT connections is increased (up to 1sec)
  based on the sub's connection pending bytes level.
- Connection is flushed on close from the writeLoop as to not block
  the "fast path".

Some tests have been fixed and adapted since now closeConnection()
is not flushing/closing/removing connection in place.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-12-31 15:06:27 -07:00
Ivan Kozlovic
1b2754475b Refactor async client tests
Updated all tests that use "async" clients.
- start the writeLoop (this is in preparation for changes in the
  server that will not do send-in-place for some protocols, such
  as PING, etc..)
- Added missing defers in several tests
- fixed an issue in client.go where test was wrong possibly causing
  a panic.
- Had to skip a test for now since it would fail without server code
  change.

The next step will be ensure that all protocols are sent through
the writeLoop and that the data is properly flushed on close (important
for -ERR for instance).

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-12-12 11:58:24 -07:00
Ivan Kozlovic
5a44e3b4c6 Changes on how tests can override route protocol
I may need to introduce a new route protocol version for an upcoming
PR and realized that this needed some cleaning.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-10-26 10:12:30 -06:00
Derek Collison
874f06a212 Fix bugs on reloadAuthorization
When tls is on routes it can cause reloadAuthorization to be called.
We were assuming configured accounts, but did not copy the remote map.
This copies the remote map when transferring for configured accounts
and also handles operator mode. In operator mode we leave the accounts
in place, and if we have a memory resolver we will remove accounts that
are not longer defined or have bad claims.

Signed-off-by: Derek Collison <derek@nats.io>
2019-05-29 13:19:58 -07:00
Ivan Kozlovic
d2578f9e05 Update to connect/reconnect error reports logic
Changed the introduced new option and added a new one. The idea
is to be able to differentiate between never connected and reconnected
event. The never connected situation will be logged at first attempt
and every hour (by default, configurable).
However, once connected and if trying to reconnect, will report every
attempts by default, but this is configurable too.

These two options are supported for config reload.

Related to #1000
Related to #1001
Resolves #969

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-26 17:51:01 -06:00
Derek Collison
d7140a0fd1 Update for client rename
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-10 15:11:30 -07:00
Ivan Kozlovic
c014211318 [FIXED] Changes to Varz content and fixed race conditions
----------------------------------------------------------------
Backward-incompatibility note:

Varz used to embed *Info and *Options which are other server objects.
However, Info is a struct that servers used to send protocols to other
servers or clients and its content must contain json tags since we
need to marshal those to be sent over. The problem is that it made
those fields now accessible to users calling Varz() and also visible
to the http /varz output. Some fields in Info were introduced in the
2.0 branch that clashed with json tag in Options, which made cluster{}
for instance disappear in the /varz output - because a Cluster string
in Info has the same json tag, and Cluster in Info is empty in some
cases.
For users that embed NATS and were using Server.Varz() directly,
without the use of the monitoring endpoint, they were then given
access (which was not the intent) to server internals (Info and Options).
Fields that were in Info or Options or directly in Varz that did not
clash with each other could be referenced directly, for instace, this
is you could access the server ID:

v, _ := s.Varz(nil)
fmt.Println(v.ID)

Another way would be:

fmt.Println(v.Info.ID)

Same goes for fields that were brought from embedding the Options:

fmt.Println(v.MaxConn)

or

fmt.Println(v.Options.MaxConn)

We have decided to explicitly define fields in Varz, which means
that if you previously accessed fields through v.Info or v.Options,
you will have to update your code to use the corresponding field
directly: v.ID or v.MaxConn for instance.

So fields were also duplicated between Info/Options and Varz itself
so depending on which one your application was accessing, you may
have to update your code.
---------------------------------------------------------------

Other issues that have been fixed is races that were introduced
by the fact that the creation of a Varz object (pointing to
some server data) was done under server lock, but marshaling not
being done under that lock caused races.

The fact that object returned to user through Server.Varz() also
had references to server internal objects had to be fixed by
returning deep copy of those internal objects.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-09 14:33:04 -06:00
Derek Collison
acfe372d63 Changes for rename from gnatsd -> nats-server
Signed-off-by: Derek Collison <derek@nats.io>
2019-05-06 15:04:24 -07:00
Ivan Kozlovic
5e01570ad4 Fixed failed configuration reload due to present of leafnode with TLS
We don't support reload of leafnode config yet, but we need to make
sure it does not fail the reload process if nothing has been changed.
(it would fail because TLSConfig internally do change in some cases)

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-05-02 15:49:56 -06:00
Ivan Kozlovic
81eb065391 Ensure leafnode listen port set to -1 does not prevent config reload
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-03-25 15:04:52 -06:00
Ivan Kozlovic
65cc218cba [FIXED] Allow use of custom auth with config reload
Resolves #923

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-03-20 15:45:17 -06:00
Derek Collison
007c98dc03 Support reload of max_control_line by updating connected clients
Signed-off-by: Derek Collison <derek@nats.io>
2019-02-06 14:33:34 -08:00
Ivan Kozlovic
d654b18476 Fixed reload of boolean flags
PR #874 caused an issue in case logtime was actually not configured
and not specified in the command line. A reload would then remove
logtime.

Revisited the fix for that and included other boolean flags, such
as debug, trace, etc..

Related to #874

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-01-14 19:18:00 -07:00
Ivan Kozlovic
d8817a37e6 [FIXED] Logtime reset to true on config reload
Resolves #789

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2019-01-09 19:51:37 -07:00
Ivan Kozlovic
a4fa06aaec Fixed TLS tests to work with new go-nats behavior
Since we no longer default to InsecureSkipVerify:true when
not specifying tls://, some tests needed updating.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-12-19 12:08:46 -07:00
Ivan Kozlovic
7c220ba700 Support for service export with wildcards
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-12-13 21:22:01 -07:00
Derek Collison
2d54fc3ee7 Account lookup failures, account and client limits, options reload.
Changed account lookup and validation failures to be more understandable by users.
Changed limits to be -1 for unlimited to match jwt pkg.

The limits changed exposed problems with options holding real objects causing issues with reload tests under race mode.
Longer term this code should be reworked such that options only hold config data, not real structs, etc.

Signed-off-by: Derek Collison <derek@nats.io>
2018-12-05 14:25:40 -08:00
Ivan Kozlovic
4f8100ebc8 Fix config reload that failed because of Gateways
Although Gateways reload is not supported at the moment, I had
to add the trap in the switch statement because it would find
a difference. The reason is the TLSConfig object that is likely
to not pass the reflect.DeepEqual test. So for now, I exclude this
from the deep equal test and fail the reload only if the user
has explicitly changed the configuration.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-12-04 19:25:59 -07:00
Ivan Kozlovic
0ba587249a Fixing setting of default gateway TLS Timeout
Moved setting to the default value in setBaselineOptions()
so that config reload does not fail.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-12-03 18:20:15 -07:00
Derek Collison
744795ead5 Allow servers to send system events.
Specifically this is to support distributed tracking of number of account connections across clusters.
Gateways may not work yet based on attempts to only generate payloads when we know there is outside interest.

Signed-off-by: Derek Collison <derek@nats.io>
2018-12-01 13:54:25 -08:00
Derek Collison
e2ce2c0cff Change to RawURLEncoding
Signed-off-by: Derek Collison <derek@nats.io>
2018-11-29 17:04:58 -08:00
Derek Collison
0ee714ce28 Add JWT support for users, accounts and import activations.
Add in trusted keys options and binary stamp
User JWT and Account fetch with AccountResolver
Account and User expiration
Account Imports/Exports w/ updates
Import activation expiration

Signed-off-by: Derek Collison <derek@nats.io>
2018-11-21 10:36:32 -08:00
Derek Collison
3dde5b5a93 megacheck fix
Signed-off-by: Derek Collison <derek@nats.io>
2018-11-06 20:06:18 -08:00
Derek Collison
b2ec5b3a98 Added more tests, e.g. reload
Signed-off-by: Derek Collison <derek@nats.io>
2018-11-06 19:58:42 -08:00
Derek Collison
47963303f8 First pass at new cluster design
Signed-off-by: Derek Collison <derek@nats.io>
2018-10-24 21:29:29 -07:00
Ivan Kozlovic
d35bb56d11 Added support for Accounts reload
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-10-23 14:58:53 -06:00
Ivan Kozlovic
178766d6c9 [ADDED] Support for route permissions config reload
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-09-18 18:28:40 -06:00
Ivan Kozlovic
deec3b821a Fixed flappers
During a config reload, it is possible for the server to send
an -ERR with auth violation and then close the connection.
Client library most of the time will process the -ERR but in
some cases, the socket read gets an error before that can happen.

Some tests were expectign the async error handler to fire, and would
fail the test otherwise. Changed those tests to still check that
if the async error is fire, we get the expected error, but not fail
the test if we don't. We still must get the disconnected callback
in those cases though.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-09-07 11:56:21 -06:00
Derek Collison
2ee868ba18 Propogate route imports and exports to other connected servers
Signed-off-by: Derek Collison <derek@nats.io>
2018-09-05 16:15:31 -07:00
Ivan Kozlovic
c5203dc763 Update some tests
- Config reload tests have been modified to not rely on symlink.
- Close logger on shutdown (for Windows tests cleanup)

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-09-05 10:45:09 -06:00
Ivan Kozlovic
d98d51c8cc [FIXED] Possible cluster Authorization Error during config reload
When changing something in the cluster, such as Timeout and doing
a config reload, the route could be closed with an `Authorization
Error` report. Moreover, the route would not try to reconnect,
even if specified as an explicit route.

There were 2 issues:
- When checking if a solicited route is still valid, we need to
  check the Routes' URL against the URL that we try to connect
  to but not compare the pointers, but either do a reflect
  deep equal, or compare their String representation (this is
  what I do in the PR).
- We should check route authorization only if this is an accepted
  route, not an explicit one. The reason is that we a server
  explicitly connect to another server, it does not get the remote
  server's username and password. So the check would always fail.

Note: It is possible that a config reload even without any change
in the cluster triggers the code checking if routes are properly
authorized, and that happens if there is TLS specified. When
the reload code checks if config has changed, the TLSConfig
between the old and new seem to indicate a change, eventhough there
is apparently none. Another reload does not detect a change. I
suspect some internal state in TLSConfig that causes the
reflect.DeepEqual() to report a difference.

Note2: This commit also contains fixes to regex that staticcheck
would otherwise complain about (they did not have any special
character), and I have removed printing the usage on startup when
getting an error. The usage is still correctly printed if passing
a parameter that is unknown.

Resolves #719

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-08-15 18:20:29 -06:00
Derek Collison
85c2edc314 Make sure to flush the sub
Signed-off-by: Derek Collison <derek@nats.io>
2018-07-02 12:10:17 -07:00
Derek Collison
bd972a9aca fixes
Signed-off-by: Derek Collison <derek@nats.io>
2018-07-02 11:46:40 -07:00
Derek Collison
e78d587083 Added support for maximum subscriptions per connection
Signed-off-by: Derek Collison <derek@nats.io>
2018-07-01 15:13:59 -07:00
Derek Collison
3b953ce838 Allow localhost to not be defined, only need 127.0.0.1
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-28 16:10:19 -07:00
Derek Collison
719deacc3d Fixes #686
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-28 13:14:18 -07:00
Ivan Kozlovic
aff1dcf089 Fix some tests
Add some helpers to check on some state.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-06-27 17:26:49 -06:00
Ivan Kozlovic
a759ad23aa Add back NoSigs=true to runServerWithSymlinkConfig()
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2018-06-26 18:52:56 -06:00
Derek Collison
17fecd4c9b Support CID in client INFO, allow filtering /connz by CID
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-21 15:23:15 -07:00
Derek Collison
240e21ac5c Fix restart of server
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-19 22:32:50 -07:00
Derek Collison
37352edff0 Fixes #681
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-19 16:42:39 -07:00
Derek Collison
6299e034cb dynamic buffer updates
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-04 17:45:05 -07:00
Derek Collison
644376209b Added large payload pub/sub benchmark
Signed-off-by: Derek Collison <derek@nats.io>
2018-06-04 17:45:05 -07:00
Ivan Kozlovic
40cf0107d6 Ensure sig handler routine returns on shutdown, turn it off in most tests
I noticed that when running the test suite, there would be a file
server/log1.txt left. This file is created by one of the config
reload test. Running this test individually was doing the proper
cleanup. I noticed that the Signal test that was checking
that files could be rotated was causing this side effect.
It turns out that none of the config reload tests were disabling
the signal handler (NoSigs=true), and since the go routine would
be left running, running the TestSignalToReOpenLogFile() test
would interact with an already finished test.

I put a thread dump in handleSignals() to track all tests that
were causing this function to start the go routine because NoSigs
was not set to true. I fixed all those tests. At this time, there
are only 2 tests that need to start the signal handler.

I have also fixed the code so that the signal handler routine select
on a server quitCh that is closed on shutdown so that this go routine
exit and is waiting on using the grWG wait group.
2018-04-06 17:14:02 -06:00
Derek Collison
00901acc78 Update license to Apache 2 2018-03-15 22:31:07 -07:00