The test TestMQTTPersistedSession() flapped once on GA. It turned
out that, right after the server sent the CONNACK, the test was
using a NATS publisher to send a message that was then not received
by the MQTT subscription of the recovered session.
Sending the CONNACK before restoring subscriptions left a
window during which a different connection could publish and messages
would be missed. This is technically acceptable, I think, and the test
could easily have been fixed to ensure that we don't NATS-publish
before the session is fully restored.
However, I have changed the order to first restore the subscriptions
and then send the CONNACK. Given the way locking works for MQTT
subscriptions, we are sure that the CONNACK will be sent first:
even if there are inflight messages, the MQTT callbacks have to wait
for the session lock, which is held while the subscriptions are
restored and the CONNACK is sent.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Had a deadlock with the new preconditions. We need to hold the lock
across the Store() call, but that call may invoke storeUpdate(), in
which case we must not try to acquire the lock again. We can also enter
this callback from the storage layer itself, where the lock is not held,
so an atomic was added to track which case applies.
Signed-off-by: Derek Collison <derek@nats.io>
Based on how the MQTT callback operates, it is safe to finish the setup
of the MQTT subscriptions after processSub() returns. So I have
reverted the changes to processSub(), which minimizes changes
to non-MQTT related code.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
- Added non-public stream and consumer configuration options to
  achieve the "no subject" and "no interest" capabilities. Had
  to implement custom FileStreamInfo and FileConsumerInfo marshal/
  unmarshal methods so that those non-public fields can be
  persisted/recovered properly.
- Restored some of the original JetStream code (since we can now use
  the config instead of passing booleans to the functions).
- Use RLock for deliveryFormsCycle() check (unrelated to MQTT).
- Removed restriction on creating streams with MQTT prefix.
- Preventing API deletion of internal streams and their consumers.
- Added comment on Sublist's ReverseMatch method.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
MQTT streams are special in that we do not set subjects in the config
since they capture all subjects. Otherwise, we would have been forced
to create a stream on, say, "MQTT.>", but then all publishes would have
had to be prefixed with "MQTT." in order to be captured.
However, if one uses the "nats" tool to inspect those streams, the tool
would fail with:
```
server response is not a valid "io.nats.jetstream.api.v1.stream_info_response" message:
(root): Must validate one and only one schema (oneOf)
config: subjects is required
config: Must validate all the schemas (allOf)
```
To solve that, if we detect that the user asks for the MQTT streams, we
artificially set the returned config's subjects to ">".
Alternatively, we may want to not return those streams at all, although
there may be value in seeing the info for MQTT streams/consumers.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Tests dealing with the MQTT "will" needed to wait for the server to
process the MQTT client's close of the connection. Only then do we
have the guarantee that the server produced the "will" message.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This PR introduces native support for MQTT clients. It requires the use
of accounts with JetStream enabled. Since clustering is not available
as of now, MQTT is limited to a single instance.
Only QoS 0 and 1 are supported at the moment. MQTT clients can
exchange messages with NATS clients and vice versa.
Since JetStream is required, accounts with JetStream enabled must
exist in order for an MQTT client to connect to the NATS Server.
The administrator can limit the users that can use MQTT with the
allowed_connection_types option in the user section. For instance:
```
accounts {
  mqtt {
    users [
      {user: all, password: pwd, allowed_connection_types: ["STANDARD", "WEBSOCKET", "MQTT"]}
      {user: mqtt_only, password: pwd, allowed_connection_types: "MQTT"}
    ]
    jetstream: enabled
  }
}
```
The user "mqtt_only" can only be used for MQTT connections, while the
user "all" accepts standard, websocket and MQTT clients.
Here is what a configuration to enable MQTT looks like:
```
mqtt {
    # Specify a host and port to listen for MQTT connections
    #
    # listen: "host:port"

    # It can also be configured with individual parameters,
    # namely host and port.
    #
    # host: "hostname"
    port: 1883

    # TLS configuration section
    #
    # tls {
    #     cert_file: "/path/to/cert.pem"
    #     key_file: "/path/to/key.pem"
    #     ca_file: "/path/to/ca.pem"
    #
    #     # Time allowed for the TLS handshake to complete
    #     timeout: 2.0
    #
    #     # Take the user name from the client certificate
    #     #
    #     # verify_and_map: true
    # }

    # Authentication override. Here are the possible options.
    #
    # authorization {
    #     # Simple username/password
    #     #
    #     user: "some_user_name"
    #     password: "some_password"
    #
    #     # Token. The server will check the MQTT password in the CONNECT
    #     # protocol against this token.
    #     #
    #     # token: "some_token"
    #
    #     # Time allowed for the client to send the MQTT CONNECT protocol
    #     # after the TCP connection is established.
    #     #
    #     timeout: 2.0
    # }

    # If an MQTT client connects without providing a username/password and
    # this option is set, the server will use this user (and therefore account).
    #
    # no_auth_user: "some_user_name"

    # This is the time after which the server will redeliver a QoS 1 message
    # sent to a subscription that has not acknowledged it (PUBACK).
    # The default is 30 seconds.
    #
    # ack_wait: "1m"

    # This limits the number of QoS 1 messages sent to a session without
    # receiving an acknowledgement (PUBACK) from that session. The MQTT
    # specification defines the packet identifier as an unsigned 16-bit
    # integer, so the maximum value is 65535. The default is 1024.
    #
    # max_ack_pending: 100
}
```
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
When a system account was configured (i.e., not the default one), a
reload would lose the JetStream service exports.
Signed-off-by: Derek Collison <derek@nats.io>
When the cluster origin code was added, a server could send an LS+ with
an origin cluster name in the protocol. The parsing code for a ROUTER
connection was adjusted to understand this LS+ protocol.
However, the server was also sending an LS- with the origin, which the
parsing code was not able to understand. When the unsub was
for a queue subscription, this would cause the parser to error out
and close the route connection.
This PR sends the LS- without the origin in this case (so that tracing
makes sense in terms of LS+/LS- sent to a route). The receiving side
then traces the appropriate LS- but processes it as a normal RS-.
Resolves #1751
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
In some cases, the reply subject of a request message is prefixed when
going over a gateway so that, if the reply comes back to a different
server than the one the request originated from, it can be routed back.
For system accounts, this routed reply subject was not tracked,
so the server would reply directly to the inbox and might reach a server
that had not yet processed (through the route) the interest
on that inbox. If the reply comes with the GW routed info, that
server knows to route it to the original server.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
We were not properly restoring our state for consumers, and we also had
a bug where we would not properly encode and write the redelivered state.
Signed-off-by: Derek Collison <derek@nats.io>
Made several changes based on feedback.
1. Made PubAckResponse only optionally include an ApiError and not force an API type.
2. Allow FilterSubject to be set on a consumer config and cleared if it matches the only stream subject.
3. Remove LookupStream by subject, and add in filters for stream names API.
Signed-off-by: Derek Collison <derek@nats.io>
It would be convenient if we did not need to use cgo to compile for
FreeBSD for common architectures, allowing builds to be generated in
Linux CI flows.
The sysctl interface uses offsets which can vary by build architecture,
so doing this in the general case without cgo is not realistic. But we
can front-load the C work to get the offsets for a given architecture,
then use encoding/binary at run-time.
While doing this, I saw that the existing FreeBSD code was
miscalculating `%cpu` by converting the `fixpt_t` scaled integer straight
to a double without dividing by the scaling factor, so we also fix this
for all other architectures by introducing a division by `FSCALE`.
The offsets-emitting code is in `freebsd.txt`, with the filename chosen
to keep the Go toolchain from picking it up and trying to compile it.
The result is unsafe-free, cgo-free builds for FreeBSD/amd64.
A newly introduced test (TestLeafNodeTwoRemotesBindToSameAccount)
had a server creating two remotes to the same server/account.
This test quite often showed the data race:
```
go test -race -v -run=TestLeafNodeTwoRemotesBindToSameAccount ./server -count 100 --failfast
=== RUN TestLeafNodeTwoRemotesBindToSameAccount
==================
WARNING: DATA RACE
Write at 0x00c000168790 by goroutine 34:
  github.com/nats-io/nats-server/v2/server.(*client).processLeafNodeConnect()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/leafnode.go:1177 +0x314
  github.com/nats-io/nats-server/v2/server.(*client).processConnect()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/client.go:1719 +0x9e4
  github.com/nats-io/nats-server/v2/server.(*client).parse()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/parser.go:870 +0xf88
  github.com/nats-io/nats-server/v2/server.(*client).readLoop()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/client.go:1052 +0x7a5
  github.com/nats-io/nats-server/v2/server.(*Server).createLeafNode.func4()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/leafnode.go:872 +0x52

Previous read at 0x00c000168790 by goroutine 32:
  github.com/nats-io/nats-server/v2/server.(*client).remoteCluster()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/leafnode.go:1203 +0x42d
  github.com/nats-io/nats-server/v2/server.(*Server).updateLeafNodes()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/leafnode.go:1375 +0x2cf
  github.com/nats-io/nats-server/v2/server.(*client).processLeafSub()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/leafnode.go:1619 +0x858
  github.com/nats-io/nats-server/v2/server.(*client).parse()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/parser.go:624 +0x5031
  github.com/nats-io/nats-server/v2/server.(*client).readLoop()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/client.go:1052 +0x7a5
  github.com/nats-io/nats-server/v2/server.(*Server).createLeafNode.func4()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/leafnode.go:872 +0x52

Goroutine 34 (running) created at:
  github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/server.go:2627 +0xc7
  github.com/nats-io/nats-server/v2/server.(*Server).createLeafNode()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/leafnode.go:872 +0xf7a
  github.com/nats-io/nats-server/v2/server.(*Server).startLeafNodeAcceptLoop.func1()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/leafnode.go:474 +0x5e
  github.com/nats-io/nats-server/v2/server.(*Server).acceptConnections.func1()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/server.go:1784 +0x57

Goroutine 32 (running) created at:
  github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/server.go:2627 +0xc7
  github.com/nats-io/nats-server/v2/server.(*Server).createLeafNode()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/leafnode.go:872 +0xf7a
  github.com/nats-io/nats-server/v2/server.(*Server).startLeafNodeAcceptLoop.func1()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/leafnode.go:474 +0x5e
  github.com/nats-io/nats-server/v2/server.(*Server).acceptConnections.func1()
      /Users/ivan/dev/go/src/github.com/nats-io/nats-server/server/server.go:1784 +0x57
==================
    testing.go:965: race detected during execution of test
--- FAIL: TestLeafNodeTwoRemotesBindToSameAccount (0.05s)
```
This is because, as soon as a LEAF is registered with the account, it is
available in the account's lleafs map, even before the CONNECT for this
connection is processed. If another LEAF connection is processing an LSUB,
the code goes over all leaf connections for the account and may find the
new connection that is still in the process of connecting.
The check accessed c.leaf.remoteCluster unlocked, and that field was also
set unlocked during the CONNECT. The fix is to perform both the set and
the check at those particular locations under the client's lock.
Ideally, I believe the connection should not be in the account's lleafs
map, or at least should not be used, until the CONNECT for this leaf
connection is fully processed.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
If a LeafNode message is sent across a route and the message
does not fit in the buffer, the parser would incorrectly process
the "pub args" as if it were a ROUTED message, not a LEAF message.
This caused clonePubArg() to return an error that would make
the parser end with a protocol violation.
We now keep track that we are processing an LMSG so that we can pass
that information to clonePubArg() and do the proper parsing in the
split scenario.
Resolves #1743
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
There was a check to prevent an erroneous loop detection when a
remote would reconnect (due to a stale connection) while the accepting
side had not yet detected the bad connection.
However, this check was racy because it was performed prior to adding
the connection to the map.
In the case of a misconfiguration where the remote creates two different
remote connections that end up binding to the same account on the
accepting side, it was possible that this would not be detected.
And when it was, the remote side would be unaware, since the disconnect/
reconnect attempts would not show up unless running in debug mode.
This change makes sure that the detection is no longer racy and returns
an error to the remote, so that at least the log/console of the remote
will show the "duplicate connection" error messages.
Resolves#1730
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>