For a normal message get, the returned timestamp format is RFC3339Nano,
which is what JSON marshaling uses. However, for the direct get we had
to pass a string to construct the header, and we were using
time.Time.String(), which uses a different layout. So use
time.Time.MarshalJSON() to be consistent with the non-direct message get.
Libraries that already parsed the non-RFC3339Nano time format
can be updated, since none should have been released yet (the
feature in the server is not released yet either).
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
* Fix race between stream stop and monitorStream
monitorCluster stops the stream; when doing so, monitorStream
needs to be stopped as well to avoid miscounting of the store size.
In a test, the stop and reset of the store size happened first and
was then followed by storing more messages via monitorStream.
Signed-off-by: Matthias Hanel <mh@synadia.com>
This can happen if the move was initiated by the user.
A subsequent cancel resets the initial peer list.
The original peer list was picked based on the old set of tags.
A cancel would then keep the new list of tags but reset to the
old peers, so tags and peers diverge.
The problem is that at the time of the cancel, the old
placement tags can no longer be found.
This fix makes cancel remove the placement tags if
the old peers do not satisfy the new placement tags.
Signed-off-by: Matthias Hanel <mh@synadia.com>
* fixed consumer restart on source filter update
When a stream source filter subject was updated, the internal consumer
was not re-created.
If the upstream stream contains a tail of previously filtered messages,
these will now be delivered.
Signed-off-by: Matthias Hanel <mh@synadia.com>
* Added check for source/mirror filter subjects
When the origin stream exists, the source/mirror filter subject
will be checked against the stream subjects.
If there is no overlap, an error will be returned.
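A minimal sketch of what such an overlap check can look like, assuming simplified NATS wildcard semantics ('*' matches exactly one token, '>' matches the remaining tokens); `subjectsOverlap` is our name, and the server's actual routine differs in details:

```go
package main

import (
	"fmt"
	"strings"
)

// subjectsOverlap reports whether two subject patterns can match a common
// subject: '*' matches exactly one token, '>' matches the trailing tokens.
func subjectsOverlap(a, b string) bool {
	at, bt := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; ; i++ {
		if i == len(at) || i == len(bt) {
			// Overlap only if both patterns end at the same depth.
			return i == len(at) && i == len(bt)
		}
		if at[i] == ">" || bt[i] == ">" {
			return true // a full wildcard covers the rest on either side
		}
		if at[i] != "*" && bt[i] != "*" && at[i] != bt[i] {
			return false // literal tokens differ
		}
	}
}

func main() {
	fmt.Println(subjectsOverlap("orders.eu.*", "orders.>")) // true
	fmt.Println(subjectsOverlap("orders.eu", "billing.>"))  // false
}
```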
Signed-off-by: Matthias Hanel <mh@synadia.com>
Use better indexing for lookups: we used to do a simple linear scan
backwards; now we track the first and last block.
Also expire the fss cache at will to reduce memory usage.
Signed-off-by: Derek Collison <derek@nats.io>
If the leader sends messages but the follower, for any reason, aborts
or retries the snapshot process, the follower will now send the error
that caused this, and the leader can then abort the catchup instead of
waiting for its inactivity threshold of 5 seconds.
Also, delay the send of a batch a bit, until the number of "acks"
reaches 1/2 of the batch size or 100ms have elapsed. This helps
avoid trickling of messages. Tested with the new test
TestJetStreamSuperClusterStreamCathupLongRTT(): we see better
results in the size of the batches, and the overall time is smaller
or similar, but not longer.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
I noticed some contention on the server write lock while
investigating a catchup bug.
Medium term we could have a separate lock; longer term, formal client
support in the server will alleviate this.
Signed-off-by: Derek Collison <derek@nats.io>
We would send skip messages for a sync request that was completely below our current state, but this could be more traffic than we might want.
Now we only send EOF, and the other side can detect the skip forward and adjust on a successful catchup.
We still send skips if we can partially fill the sync request.
Signed-off-by: Derek Collison <derek@nats.io>
```
Replica: Server name unknown at this time (peerID: jZ6RvVRH), outdated, OFFLINE, not seen
```
After discussing with @ripienaar, this text better conveys
that this is a transient situation.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
If a cluster is brought down and then partially restarted, the
replica information about the non-restarted node would be completely
missing. The CLI could report replicas 3 but then show only the leader
and the running replicas, with nothing about the other node.
Since this node's server name is not known, this PR adds an entry
similar to this:
```
<unknown (peerID: jZ6RvVRH)>, outdated, OFFLINE, not seen
```
Also, the replicas array is now ordered, which will help when using
a watcher or repeating stream info commands: the output will be
stable with regard to the list of replicas.
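The ordering amounts to a name sort over the replica entries; a minimal sketch, with `PeerInfo` and `sortReplicas` as illustrative stand-ins for the server's types:

```go
package main

import (
	"fmt"
	"sort"
)

// PeerInfo is a stand-in for a replica entry in stream info output.
type PeerInfo struct {
	Name    string
	Offline bool
}

// sortReplicas orders replicas by name so repeated stream info calls
// produce stable output for watchers.
func sortReplicas(rs []PeerInfo) {
	sort.Slice(rs, func(i, j int) bool { return rs[i].Name < rs[j].Name })
}

func main() {
	rs := []PeerInfo{{Name: "S3"}, {Name: "S1"}, {Name: "S2", Offline: true}}
	sortReplicas(rs)
	for _, r := range rs {
		fmt.Println(r.Name)
	}
}
```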
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
* better error when peer selection fails
It is pretty hard to diagnose what went wrong when not enough peers
were found for an operation. This change now returns counts of the
reasons why peers were discarded.
Changed the error to JSClusterNoPeers, as it seems the more
appropriate error for that operation: not having enough resources is
one of the conditions for a peer not being considered, but so is
having a non-matching tag, which is why JSClusterNoPeers seems more
appropriate.
In addition, JSClusterNoPeers was already used as the error after one
call to selectPeerGroup.
Example:
```
no suitable peers for placement: peer selection cluster 'C' with 3 peers
offline: 0
excludeTag: 1
noTagMatch: 2
noSpace: 0
uniqueTag: 0
misc: 0
```
Example for MQTT:
```
mid:12 - "mqtt" - unable to connect: create sessions stream for account "$G":
no suitable peers for placement: peer selection cluster 'MQTT' with 3 peers
offline: 0
excludeTag: 0
noTagMatch: 0
noSpace: 0
uniqueTag: 0
misc: 0
(10005)
```
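The tallying behind this output can be sketched as counting a discard reason per rejected candidate; the type and names below are ours, mirroring the labels in the error text, not the server's implementation:

```go
package main

import "fmt"

// discardReason enumerates why a candidate peer was rejected during placement.
type discardReason int

const (
	offline discardReason = iota
	excludeTag
	noTagMatch
	noSpace
	uniqueTag
	misc
)

// tallyReasons counts how many candidates were discarded for each reason,
// so the final error can report them all.
func tallyReasons(reasons []discardReason) map[discardReason]int {
	counts := make(map[discardReason]int)
	for _, r := range reasons {
		counts[r]++
	}
	return counts
}

func main() {
	counts := tallyReasons([]discardReason{excludeTag, noTagMatch, noTagMatch})
	fmt.Println(counts[excludeTag], counts[noTagMatch], counts[noSpace]) // 1 2 0
}
```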
Signed-off-by: Matthias Hanel <mh@synadia.com>
* review comment
Signed-off-by: Matthias Hanel <mh@synadia.com>
* Adding account purge operation
The new request is available on the system account.
The subject to send the request to is $JS.API.ACCOUNT.PURGE.*,
with the name of the account to purge in place of the wildcard.
Also added directory cleanup code so that servers do not
end up with empty stream directories, or account directories that
only contain streams.
Also adds ACCOUNT to the leaf node domain rewrite table.
Addresses #3186 and #3306 by providing a way to
get rid of the streams of existing and non-existing accounts.
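Building the request subject from the account name follows directly from the pattern above; `purgeSubject` is a hypothetical helper, and the request itself must be sent from a system-account connection:

```go
package main

import "fmt"

// purgeSubject builds the system-account request subject for purging the
// given account, per the $JS.API.ACCOUNT.PURGE.* pattern.
func purgeSubject(account string) string {
	return "$JS.API.ACCOUNT.PURGE." + account
}

func main() {
	fmt.Println(purgeSubject("ACME")) // $JS.API.ACCOUNT.PURGE.ACME
}
```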
Signed-off-by: Matthias Hanel <mh@synadia.com>
A server maintains a map for the subject+queue to track the number
of members in the same group. However, on unsubscribe, when we got to
the last member being unsubscribed, we were removing the entry from
the map but then unfortunately adding it back with a value of 0,
which caused a leak. If the same subscription came back, this map
entry would be reused, but for a queue sub that never comes back,
memory could grow continuously.
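The leak pattern can be reduced to a few lines; this is a simplified illustration of the bug and the fix, not the server's actual code:

```go
package main

import "fmt"

// unsubscribeBuggy shows the leak: the entry is deleted but then written
// back with a value of 0, so keys of never-returning queue subs accumulate.
func unsubscribeBuggy(members map[string]int, key string) {
	n := members[key] - 1
	delete(members, key)
	members[key] = n // bug: re-adds the key with 0 after the last member leaves
}

// unsubscribeFixed only keeps the entry while members remain.
func unsubscribeFixed(members map[string]int, key string) {
	if n := members[key] - 1; n > 0 {
		members[key] = n
	} else {
		delete(members, key) // last member: remove the entry for good
	}
}

func main() {
	buggy := map[string]int{"subj queue": 1}
	unsubscribeBuggy(buggy, "subj queue")
	fmt.Println(len(buggy)) // 1: entry leaked with value 0

	fixed := map[string]int{"subj queue": 1}
	unsubscribeFixed(fixed, "subj queue")
	fmt.Println(len(fixed)) // 0: entry removed
}
```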
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This allows embedded use-cases where the user does not
have the ability to use a credentials file. Instead, a signature
callback is specified and invoked when the server sends the CONNECT
protocol. The user is responsible for providing the JWT and signing
the nonce.
Resolves #3331
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Maybe that is the place where it could be set, rather than in
NewServer(), but we want to minimize the risk of breaking something
close to 2.9.0.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
A test, TestJetStreamClusterLeafNodeSPOFMigrateLeaders, was added at
some point that needed the remotes to stop (re)connecting. It made
use of the existing leafNodeEnabled, which was used for GW/Leaf
interest propagation races, to disable the reconnect, but that may
not be the best approach since it could affect users embedding
servers and adding leafnodes "dynamically".
So this PR introduces a boolean specific to that test.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The interest moved across the leafnode would be for the mapping, and not the actual qsub.
So when it is received, if we detect that we are mapped and do not have a queue filter present, make sure to ignore it.
This allows queue subscriber processing on the local server that received the message from the leafnode.
Signed-off-by: Derek Collison <derek@nats.io>