If a stream R2 had one of its server network-partitioned and at
that time the stream was edited to be scaled down to an R1 it
would cause the stream to no longer have quorum even when the
network partition is resolved.
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
If start by time is before what we remember during recovery use that instead
Resolves#3559
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This is specific to setup described [here](https://github.com/nats-io/nats-server/issues/3191#issuecomment-1296974382)
and does not require JetStream to be reproduced. The added test
reproduces the above setup but without JetStream enabled in
the accounts.
Each cluster has a leafnode for a given account to the other
cluster. The accounts import/export a subject. When a consumer
is connected to cluster "B" and the producer is on cluster "A"
there was a duplicate message. Due to shadow subscription caused
by the import/export rules, an additional subscription was
sent across the leafnode.
Resolves#3191
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Very difficult to reproduce. Had to run TestJetStreamSuperClusterMoveCancel
in covermode=atomic on a slow machine to hit the condition where
the monitorConsumer go routine is started by RAFT node is nil,
which caused the warning message to produce the panic (since n is nil)
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The issue that a "first sequence mismatch" during processing of
a snapshot was causing the state to be reset and caused a lot
of catchup from the follower. An attempt to fix that in PR #3567
caused an issue that was addressed in PR #3589. However, this was
then causing the follower to sometime never able to catchup or
took a very long time.
This PR - we believe - addresses the original and subsequent issues.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
The server would reset its INFO's TLSRequired to the presence
of a TLS configuration without checking for the allow_non_tls
option.
Resolves#3581
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
This is based of @neilalexander PR #3558.
It ensures that the timer is reset/canceled on configuration
update (by the leader only).
Fixed also the issue with a super-cluster where the delete timer
would always be reset at every gateway interval check.
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Originally, only solicited routes were retried in case of a disconnect,
but that was before gossip protocol was introduced. Since then, two
servers that connect to each other due to gossip should retry to
reconnect if the connection breaks, even if the route is not explicit.
However, server will retry only once or more accurately, ConnectRetries+1.
This PR solves the issue that the reconnect attempt was not initiated
for a "solicited route" that was not explicit.
Maybe related to #3571
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
If we had to flush while loading in subject info we would fail there and not properly rebuild state as we did on a failure for flushPending.
Signed-off-by: Derek Collison <derek@nats.io>
When a server was restarted and expired messages, but the leader had a snapshot that
still had the old messages we would reset complete follower stream state, this fix
just skips over the expired as we prepare the request to the leader.
Resolves#3516
Signed-off-by: Derek Collison <derek@nats.io>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
If an ephemeral was given a name by the user, if the consumer leader
was then shutdown, the ephemeral would be migrated using a server
generated new name instead of keeping the user given name.
Also, in some cases the migration would not even occur. This was
likely due to the fact that RAFT node(s) were shutdown prior to
the ephemeral migration code was invoked.
Resolves#3550
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
If the server is unable to write to disk, there is a failure mode that
results in the fss map being nil when writing message block data to it,
resulting in a panic. This fix checks whether an error is returned by
the function that loads the fss map, and if so, returns an error to
prevent future assignment.
A simple configuration like this:
```
...
mappings = {
foo: bar
}
mqtt {
port: 1883
}
```
would cause an MQTT subscription on "bar" to not receive messages
published on "foo".
In otherwords, the subject transformation was not done when parsing
a PUBLISH packet.
This PR also handles the case of service imports where transformation
occurs after the initial publish parsing.
Resolves#3547
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>