[FIXED] Race condition during implicit Gateway reconnection

Say a server in cluster A accepts a connection from a server in cluster B. The gateway is implicit, in that A does not have a configured remote gateway to B. The server in B is then shut down, which A detects, and A initiates a single reconnect attempt (since the gateway is implicit, and if the reconnect retries setting is not set).

While this happens, a new server in B is started and connects to A. If that happens before the initial reconnect attempt has failed, A registers the new inbound connection but does not attempt to solicit, because it already has a remote entry for gateway B. When the reconnect to the old B server then fails, the remote gateway entry is removed, and A never creates an outbound connection to the new B server.

We fix that by checking whether there is a registered inbound connection when we get to the point of removing the remote on a failed implicit reconnect. If there is one, we retry the reconnection.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
2026-04-17 03:24:40 -07:00 · 2020-05-22 12:55:33 -06:00
parent 0e3c73192d
commit 5dba3cdd75
3 changed files with 73 additions and 3 deletions
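
The race and the fix described in the commit message can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the server's actual code: the `gatewayState` type and the `registerInbound`, `closeInbound`, `onImplicitReconnectFailed`, and `hasRemote` names are hypothetical and were invented for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// gatewayState is a hypothetical, simplified stand-in for the bookkeeping
// a server in cluster A keeps about implicit remote gateways.
type gatewayState struct {
	mu       sync.Mutex
	remotes  map[string]bool // gateway name -> implicit remote entry exists
	inbounds map[string]int  // gateway name -> registered inbound connections
}

func newGatewayState() *gatewayState {
	return &gatewayState{
		remotes:  make(map[string]bool),
		inbounds: make(map[string]int),
	}
}

// registerInbound records an inbound connection from the named gateway.
// It reports whether the caller should solicit an outbound connection,
// which is only the case when no remote entry exists yet.
func (g *gatewayState) registerInbound(name string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.inbounds[name]++
	if g.remotes[name] {
		return false
	}
	g.remotes[name] = true
	return true
}

// closeInbound records that one inbound connection from the gateway is gone.
func (g *gatewayState) closeInbound(name string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.inbounds[name] > 0 {
		g.inbounds[name]--
	}
}

// onImplicitReconnectFailed is called when the reconnect attempt to an
// implicit gateway has failed. Before removing the remote entry it checks
// for a registered inbound connection. If one exists, the entry is kept and
// the function reports that the reconnection should be tried again.
func (g *gatewayState) onImplicitReconnectFailed(name string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.inbounds[name] > 0 {
		return true
	}
	delete(g.remotes, name)
	return false
}

// hasRemote reports whether a remote entry exists for the gateway.
func (g *gatewayState) hasRemote(name string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.remotes[name]
}

func main() {
	g := newGatewayState()
	fmt.Println("solicit on first inbound:", g.registerInbound("B"))
	g.closeInbound("B") // old B server shuts down, reconnect attempt starts
	fmt.Println("solicit on new inbound:", g.registerInbound("B"))
	fmt.Println("retry after failed reconnect:", g.onImplicitReconnectFailed("B"))
	fmt.Println("remote entry kept:", g.hasRemote("B"))
}
```

Without the inbound check in `onImplicitReconnectFailed`, the model would delete the remote entry while the new B server is still connected inbound, which is the state the commit message describes: no remote entry and no outbound connection to the new server.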
--- a/test/leafnode_test.go
+++ b/test/leafnode_test.go
@@ -740,6 +740,7 @@ func createClusterEx(t *testing.T, doAccounts bool, gwSolicit time.Duration, wai
 	// All of these need system accounts.
 	o.Accounts, o.Users = createAccountsAndUsers()
 	o.SystemAccount = "$SYS"
+	o.ServerName = fmt.Sprintf("%s1", clusterName)
 	// Run the server
 	s := RunServer(o)
 	bindGlobal(s)
@@ -761,6 +762,7 @@ func createClusterEx(t *testing.T, doAccounts bool, gwSolicit time.Duration, wai
 		// All of these need system accounts.
 		o.Accounts, o.Users = createAccountsAndUsers()
 		o.SystemAccount = "$SYS"
+		o.ServerName = fmt.Sprintf("%s%d", clusterName, i+1)
 		s := RunServer(o)
 		bindGlobal(s)