New configuration fields:
```
cluster {
  ...
  pool_size: 5
  accounts: ["A", "B"]
}
```
The `pool_size` setting in the example above means that this
server will create 5 routes to a remote server, assuming that the
remote server uses the same `pool_size` value.
Accounts that are not listed in the `accounts` configuration
are assigned a specific route in this pool, and the assignment is
the same on all servers in the cluster.
Accounts that are listed in the `accounts` field each get
a dedicated route connection. This allows the account name to be
suppressed in some of the route protocols, reducing the number of
bytes transmitted, which may improve performance.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This will be used mainly by CustomClientAuthentication implementations
to indicate that the user connection should be disconnected at some
point in the future, such as when a certificate or token expires.
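A minimal sketch of the idea, with self-contained types standing in for
the server's actual interfaces; the names and shapes below are assumptions
for illustration, not the server's API:
```go
package auth

import "time"

// authResult is an illustrative stand-in for what a custom authenticator
// reports back: whether the client is accepted, and an optional deadline
// after which the server should disconnect it.
type authResult struct {
	ok       bool
	deadline time.Time // zero value means "no expiration"
}

// tokenAuth is a hypothetical authenticator backed by a token-expiry store.
type tokenAuth struct {
	validUntil map[string]time.Time // token -> expiry
}

func (a *tokenAuth) check(token string) authResult {
	exp, known := a.validUntil[token]
	if !known || time.Now().After(exp) {
		return authResult{ok: false}
	}
	// Accept, but ask the server to close the connection once the token expires.
	return authResult{ok: true, deadline: exp}
}
```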
Signed-off-by: R.I.Pienaar <rip@devco.net>
Some warnings, especially those about JetStream limits that were
printed on a per-message basis, are now limited to roughly one per
second: a warning is suppressed if its content is already found in a map.
This also applies to "client" warnings, but the client-specific portion
of the warning is not taken into account, which helps reduce logging
of similar content coming from different clients.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Removed the warnings; instead there is a sync.Map where they are
registered/unregistered and can be inspected with an undocumented
monitor page.
Added the notion of "in progress", which is the number of messages
that have been pop()'ed. When recycle() is invoked, this count
goes down.
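A minimal sketch of the register/unregister idea with sync.Map, combined
with the one-per-second suppression described above (illustrative only,
not the server's actual code):
```go
package warnings

import (
	"log"
	"sync"
	"time"
)

var pending sync.Map // warning content -> struct{}{}

// rateLimitedWarn logs identical warning content at most once per interval.
func rateLimitedWarn(warn string, interval time.Duration) {
	// Register the content; if it is already present, suppress the log.
	if _, loaded := pending.LoadOrStore(warn, struct{}{}); loaded {
		return
	}
	log.Printf("WARNING: %s", warn)
	// Unregister after the interval so the same content can be logged again.
	time.AfterFunc(interval, func() { pending.Delete(warn) })
}
```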
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Also had to change all references from `path.` to `filepath.` when
dealing with files, so that it works properly on Windows.
Also fixed lots of tests to defer the shutdown of the server
until after the removal of the storage, and fixed some config file
directories to use the single quote `'` to surround the file path,
again to work on Windows.
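A small illustration of why the `path` to `filepath` switch matters
(this is standard library behavior, not server code):
```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
)

func main() {
	// path always joins with '/', regardless of the OS.
	fmt.Println(path.Join("store", "jetstream")) // store/jetstream everywhere

	// filepath uses the OS-specific separator, e.g. '\' on Windows.
	fmt.Println(filepath.Join("store", "jetstream")) // store\jetstream on Windows
}
```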
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Across a leaf node connection, unless the system account is shared AND the JetStream domain name is identical, default JetStream traffic (without a domain set) will be denied.
As a consequence, all clients that want to access a domain other than the one of the server they are connected to must specify that domain name.
Affected by this change are setups where either the leaf node had no local JetStream OR the server the leaf node connected to had no local JetStream.
In other words, one of the two accounts connected via a leaf node remote must have JetStream disabled.
The side that does not have JetStream enabled will lose JetStream access, and its clients must set `nats.Domain` manually.
For workarounds on how to restore the old behavior, look at:
https://github.com/nats-io/nats-server/pull/2693#issuecomment-996212582
New config values added:
`default_js_domain` is a mapping from account to domain, settable when JetStream is not enabled in an account.
`extension_hint` is a hint for a non-clustered server to start in clustered mode (and thus be usable to extend a cluster).
`js_domain` is a way to set the JetStream domain to use for MQTT.
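A sketch of how these might appear in a configuration file; the account
name `A`, the domain `hub`, and the `extension_hint` value shown are
placeholders/assumptions:
```
# On a server without local JetStream: route account A's default
# (domain-less) JetStream traffic to the hub's domain.
default_js_domain {
  A: "hub"
}

jetstream {
  # Hint that this non-clustered server should start in clustered
  # mode so it can be used to extend a cluster.
  extension_hint: "will_extend"
}

mqtt {
  port: 1883
  # JetStream domain to use for MQTT.
  js_domain: "hub"
}
```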
Signed-off-by: Matthias Hanel <mh@synadia.com>
This will solve the issue of naming the durable per server for
the "retained messages" stream in situations where a cluster
of servers does not have JetStream defined but connects to another
cluster that has it. All the servers within the cluster without
JetStream would otherwise cause the durable's delivery subject to be
updated to that of the last server starting the durable.
Updated the check that MQTT requires JetStream when running in
standalone mode so that it only applies when no leafnode configuration
is present.
Replaced uses of fmt.Errorf() with errors.New() where the string
was static. Updated tests that were checking for errors to use those
error variables instead of repeating the string.
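An illustrative before/after of that change; the variable name and
message below are made up, not the server's actual identifiers:
```go
package mqtt

import "errors"

// Before: a new error value was allocated through fmt.Errorf on every
// call, even though the message is static:
//
//	return fmt.Errorf("mqtt requires JetStream to be enabled")

// After: a single sentinel error that tests can compare against directly.
var errMQTTRequiresJS = errors.New("mqtt requires JetStream to be enabled")

func validateMQTT(jsEnabled bool) error {
	if !jsEnabled {
		return errMQTTRequiresJS
	}
	return nil
}
```
Tests can then assert err == errMQTTRequiresJS instead of repeating the
message string.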
Added a test that has a hub cluster with JS enabled and a remote server
that has mqtt{} without JetStream, and ensured that this works.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Currently, we use ReadyForConnections in server tests to wait for the
server to be ready. However, when this fails we don't get a clue about
why it failed.
This change adds a new unexported method called readyForConnections that
returns an error describing which check failed. The exported
ReadyForConnections version works exactly as before. The unexported
version gets used in internal tests only.
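A minimal, self-contained sketch of that wrapper pattern; the checks map
and polling interval below are illustrative, not the server's actual
internals:
```go
package server

import (
	"fmt"
	"time"
)

type Server struct {
	// check name (e.g. "server", "route", "websocket") -> readiness probe
	checks map[string]func() bool
}

// ReadyForConnections keeps the original exported boolean API.
func (s *Server) ReadyForConnections(wait time.Duration) bool {
	return s.readyForConnections(wait) == nil
}

// readyForConnections reports which readiness check failed, for use in
// internal tests where a bare "false" gives no clue.
func (s *Server) readyForConnections(wait time.Duration) error {
	deadline := time.Now().Add(wait)
	for {
		failed := ""
		for name, ready := range s.checks {
			if !ready() {
				failed = name
				break
			}
		}
		if failed == "" {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("not ready for connections after %v: check %q failed", wait, failed)
		}
		time.Sleep(25 * time.Millisecond)
	}
}
```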
Currently in tests, we have calls to os.Remove and os.RemoveAll where we
don't check the returned error. This hides useful error messages when
tests fail to run, such as "too many open files".
This change checks for more filesystem-related errors and calls t.Fatal
if there is an error.
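A sketch of the pattern as a test helper; removeDir is a hypothetical
name, not the actual helper in the test suite:
```go
package server

import (
	"os"
	"testing"
)

// removeDir fails the test immediately if the removal errors out
// (e.g. "too many open files"), instead of silently hiding it.
func removeDir(t *testing.T, dir string) {
	t.Helper()
	if err := os.RemoveAll(dir); err != nil {
		t.Fatalf("Error removing directory %q: %v", dir, err)
	}
}
```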
Currently, temporary test files and directories are written in lots of
different paths within the OS's temp dir. This makes it hard to know
which files are from nats-server and which are unrelated. This in turn
makes it hard to clean up nats-server test files.
This PR introduces native support for MQTT clients. It requires the use
of accounts with JetStream enabled. Since clustering is not available
as of now, MQTT is limited to a single instance.
Only QoS 0 and 1 are supported at the moment. MQTT clients can
exchange messages with NATS clients and vice-versa.
Since JetStream is required, accounts with JetStream enabled must
exist in order for an MQTT client to connect to the NATS Server.
The administrator can limit which users can use MQTT with the
allowed_connection_types option in the user section. For instance:
```
accounts {
  mqtt {
    users [
      {user: all, password: pwd, allowed_connection_types: ["STANDARD", "WEBSOCKET", "MQTT"]}
      {user: mqtt_only, password: pwd, allowed_connection_types: "MQTT"}
    ]
    jetstream: enabled
  }
}
```
The "mqtt_only" can only be used for MQTT connections, which the user
"all" accepts standard, websocket and MQTT clients.
Here is what a configuration to enable MQTT looks like:
```
mqtt {
  # Specify a host and port to listen for MQTT connections
  #
  # listen: "host:port"

  # It can also be configured with individual parameters,
  # namely host and port.
  #
  # host: "hostname"
  port: 1883

  # TLS configuration section
  #
  # tls {
  #   cert_file: "/path/to/cert.pem"
  #   key_file: "/path/to/key.pem"
  #   ca_file: "/path/to/ca.pem"
  #
  #   # Time allowed for the TLS handshake to complete
  #   timeout: 2.0
  #
  #   # Takes the user name from the certificate
  #   #
  #   # verify_and_map: true
  # }

  # Authentication override. Here are possible options.
  #
  # authorization {
  #   # Simple username/password
  #   #
  #   user: "some_user_name"
  #   password: "some_password"
  #
  #   # Token. The server will check the password field of the MQTT
  #   # CONNECT protocol against this token.
  #   #
  #   # token: "some_token"
  #
  #   # Time allowed for the client to send the MQTT CONNECT protocol
  #   # after the TCP connection is established.
  #   #
  #   timeout: 2.0
  # }

  # If an MQTT client connects without providing a username/password and
  # this option is set, the server will use this user (and therefore account).
  #
  # no_auth_user: "some_user_name"

  # This is the time after which the server will redeliver a QoS 1 message
  # sent to a subscription that has not acknowledged it (PUBACK).
  # The default is 30 seconds.
  #
  # ack_wait: "1m"

  # This limits the number of QoS 1 messages sent to a session without
  # receiving an acknowledgement (PUBACK) from that session. The MQTT
  # specification defines the packet identifier as an unsigned 16-bit
  # integer, so the maximum value is 65535. The default value is 1024.
  #
  # max_ack_pending: 100
}
```
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
We previously simply called DialTimeout() on a route's URL when
soliciting. If the URL resolved to an IP of the host itself, this would
create a route to self, which the server detects, but it would then not
try again with the other IPs, which would have allowed it to form a
cluster with the servers running on those IPs.
This PR keeps track of the local IPs plus the cluster port and excludes
them from the list of IPs returned by the LookupHost API. This even
prevents solicitation of routes to self. Only non-local IPs will be tried.
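A sketch of the filtering idea (illustrative, not the server's actual
code): resolve the route host, then drop any addresses that match the
server's own interface IPs.
```go
package server

import "net"

// nonLocalIPs resolves host and filters out any IPs that belong to this
// server's own interfaces, so a route to self is never solicited while
// the remaining, non-local IPs are still tried.
func nonLocalIPs(host string, localIPs map[string]struct{}) ([]string, error) {
	ips, err := net.LookupHost(host)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, ip := range ips {
		if _, isLocal := localIPs[ip]; !isLocal {
			out = append(out, ip)
		}
	}
	return out, nil
}
```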
Resolves #1586
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The new option Websocket.NoTLS would have to be set to true
to disable the server check that enforces TLS configuration.
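In the configuration file this would look like the sketch below, assuming
the config form of this option is `no_tls` (the port value is just an
example):
```
websocket {
  port: 8080
  # Explicitly opt out of TLS; without this, the server rejects a
  # websocket listener that has no tls{} block.
  no_tls: true
}
```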
Resolves #1529
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
If some servers in the cluster have the same connect URLs (due
to the use of client advertise), then it would be possible for a
server to send a connect_urls INFO update to clients with
missing URLs.
Resolves #1515
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This was discovered with the test TestLeafNodeWithGatewaysServerRestart,
which was sometimes failing. Investigation showed that when cluster B
was shut down, one of the servers in A whose connection from B had just
broken tried to reconnect (as part of the reconnect retries of implicit
gateways) to a server in B that was in the process of shutting down.
The connection had been accepted, but createGateway was not called because
the server's running boolean had been set to false as part of the shutdown.
However, the connection was not closed, so the server in A had a seemingly
valid connection to a dead server in cluster B. When the B cluster (now a
single server) was restarted and a LeafNode connection connected to it,
the gateway from B to A was created, but the server in A did not create an
outbound connection to that B server because it already had one (the
zombie).
So this PR strengthens the starting of accept loops and also makes sure
that if a connection (of any type) is not accepted because the server is
shutting down, that connection is properly closed.
Since all accept loops had almost the same code, a generic accept function
was added that calls type-specific connection-creation functions.
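A minimal sketch of such a generic accept loop; the locking and names
are illustrative, not the server's actual code:
```go
package server

import (
	"net"
	"sync"
)

type Server struct {
	mu      sync.Mutex
	running bool
}

// acceptConnections is a generic accept loop: each connection type
// (client, route, gateway, leafnode, ...) passes its own createConn.
func (s *Server) acceptConnections(l net.Listener, createConn func(net.Conn)) {
	for {
		conn, err := l.Accept()
		if err != nil {
			return // listener closed
		}
		s.mu.Lock()
		running := s.running
		s.mu.Unlock()
		if !running {
			// The server is shutting down: close the connection instead
			// of leaving the remote side with a zombie connection.
			conn.Close()
			return
		}
		go createConn(conn)
	}
}
```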
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Added cluster names, required as prep work for clustered JetStream. The system can dynamically pick a cluster name and settle on one even in large clusters.
Signed-off-by: Derek Collison <derek@nats.io>
The grace period used to be hardcoded at 10 seconds.
This option allows the user to configure the amount of time the
server will wait before initiating the closing of client connections.
Note that the grace period needs to be strictly lower than the overall
lame_duck_duration. The server deducts the grace period from that
overall duration and spreads the closing of connections during
that time.
For instance, if there are 1000 connections, the lame duck
duration is set to 30 seconds, and the grace period to 10, then
the server will use 30 - 10 = 20 seconds to spread the closing
of those 1000 connections, that is, roughly 50 clients per second.
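The corresponding configuration, using the values from the example above:
```
# Total time the server takes, once signaled, to complete lame duck mode.
lame_duck_duration: "30s"
# Time the server waits before it starts closing client connections.
lame_duck_grace_period: "10s"
```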
Resolves #1459.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This is related to #1408.
Make sure that we close the websocket "accept loop", if one is
configured, before proceeding with the lame duck mode.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- A race test may have consumed a lot of fds going into TIME_WAIT,
which could cause issues for other tests
- A missing defer filestore.Stop() would leave flushLoop()
go routines running
- Added a missing defer for the "from" server in a LeafNode test
- Reworked [Re]ConnectErrorReports, which was failing often for me
locally (probably due to exhaustion of fds - too many TIME_WAIT).
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The server was incorrectly processing a queue subscription removal
as both a plain sub and a queue sub, which may have resulted in
a drop of interest even when some queue subs remained.
Resolves #1421
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This will ensure that there is no race where clients are accepted
after the LDM INFO notification.
Also added a check to the test to make sure that we don't send an INFO
when routes are disconnected due to the internal closing of connections
during the shutdown process.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Also send an INFO to routes so that the remote servers can remove the
LDM server's client URLs and notify their own clients of this
change.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When no tag was set, os.Getenv("TRAVIS_TAG") used to return an empty
string. Travis now sets TRAVIS_TAG to `''`, so check for both.
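A sketch of that check (illustrative helper, not the actual build script):
```go
package build

import "os"

// travisTag reports the tag, treating both the historical empty value and
// Travis's newer literal `''` as "no tag set".
func travisTag() (string, bool) {
	tag := os.Getenv("TRAVIS_TAG")
	if tag == "" || tag == "''" {
		return "", false
	}
	return tag, true
}
```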
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Updated all tests that use "async" clients.
- Start the writeLoop (this is in preparation for changes in the
server that will not do send-in-place for some protocols, such
as PING, etc..)
- Added missing defers in several tests
- Fixed an issue in client.go where a test was wrong, possibly causing
a panic.
- Had to skip a test for now since it would fail without the server code
change.
The next step will be to ensure that all protocols are sent through
the writeLoop and that the data is properly flushed on close (important
for -ERR, for instance).
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>