[FIXED] Race condition during implicit Gateway reconnection

Say a server in cluster A accepts a connection from a server in
cluster B.
The gateway is implicit, in that A does not have a configured
remote gateway to B.
Then the server in B is shut down; A detects this and initiates
a single reconnect attempt (since the gateway is implicit and
the reconnect retries option is not set).
While this happens, a new server in B is started and connects
to A. If that happens before the initial reconnect attempt
fails, A will register that new inbound and will not attempt to
solicit because it already has a remote entry for gateway B.
At this point, when the reconnect to the old server in B fails,
the remote GW entry is removed, and A will not create an outbound
connection to the new B server.

We fix that by checking if there is a registered inbound when
we get to the point of removing the remote on a failed implicit
reconnect. If there is one, we keep the remote and try the
reconnection again.
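
The sequence above can be modeled with a small, self-contained Go
sketch. It is not the server code: gatewayState, onInbound,
onImplicitReconnectFailed and simulate are hypothetical names used
only for illustration, and the checkInbound flag stands in for
"with the fix" versus "without the fix".

```go
package main

import "fmt"

// gatewayState is a hypothetical, simplified stand-in for the gateway
// bookkeeping on a server in cluster A. It is not the nats-server type.
type gatewayState struct {
	remotes map[string]bool // clusters that have a remote (outbound) entry
	inbound map[string]int  // registered inbound connections per cluster
}

// onInbound registers an inbound connection from the given cluster.
// An outbound is solicited only if no remote entry exists yet.
func (g *gatewayState) onInbound(cluster string) (solicited bool) {
	g.inbound[cluster]++
	if g.remotes[cluster] {
		return false
	}
	g.remotes[cluster] = true
	return true
}

// onImplicitReconnectFailed models the point where a failed implicit
// reconnect would remove the remote entry. With checkInbound set, a
// registered inbound keeps the entry and asks for another attempt.
func (g *gatewayState) onImplicitReconnectFailed(cluster string, checkInbound bool) (retry bool) {
	if checkInbound && g.inbound[cluster] > 0 {
		return true
	}
	delete(g.remotes, cluster)
	return false
}

// simulate replays the race: the remote entry for B still exists while
// the reconnect to the old server is in flight, and a new server in B
// connects before that attempt fails.
func simulate(checkInbound bool) (retry, remoteKept bool) {
	g := &gatewayState{
		remotes: map[string]bool{"B": true},
		inbound: map[string]int{},
	}
	g.onInbound("B") // new B server connects; nothing is solicited
	retry = g.onImplicitReconnectFailed("B", checkInbound)
	return retry, g.remotes["B"]
}

func main() {
	for _, fixed := range []bool{false, true} {
		retry, kept := simulate(fixed)
		fmt.Printf("checkInbound=%v retry=%v remoteKept=%v\n", fixed, retry, kept)
	}
}
```

Without the check, the remote entry is dropped and no outbound to the
new B server is ever created; with the check, the entry is kept and
the reconnect is attempted again.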

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Ivan Kozlovic
2020-05-22 12:55:33 -06:00
parent 0e3c73192d
commit 5dba3cdd75
3 changed files with 73 additions and 3 deletions

@@ -740,6 +740,7 @@ func createClusterEx(t *testing.T, doAccounts bool, gwSolicit time.Duration, wai
 // All of these need system accounts.
 o.Accounts, o.Users = createAccountsAndUsers()
 o.SystemAccount = "$SYS"
+o.ServerName = fmt.Sprintf("%s1", clusterName)
 // Run the server
 s := RunServer(o)
 bindGlobal(s)
@@ -761,6 +762,7 @@ func createClusterEx(t *testing.T, doAccounts bool, gwSolicit time.Duration, wai
 // All of these need system accounts.
 o.Accounts, o.Users = createAccountsAndUsers()
 o.SystemAccount = "$SYS"
+o.ServerName = fmt.Sprintf("%s%d", clusterName, i+1)
 s := RunServer(o)
 bindGlobal(s)