[SERVER-42047] Windows 4.2 mongo shell cannot connect to a 4.2 Cluster Created: 02/Jul/19  Updated: 29/Oct/23  Resolved: 17/Jul/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.2.0-rc3

Type: Bug Priority: Major - P3
Reporter: Alex Bevilacqua Assignee: Backlog - Storage Engines Team
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on WT-4913 Fix the Windows CRC32 on blocks that ... Closed
Documented
is documented by DOCS-15779 [SERVER] mongo shell cannot connect t... Closed
Related
is related to DOCS-12883 4.2 downgrade to 4.0 on windows will ... Closed
Assigned Teams:
Storage Engines
Backwards Compatibility: Major Change
Operating System: ALL
Sprint: Storage Engines 2019-07-29
Participants:
Case:

 Description   

Currently cannot connect from a 4.2.0-rc2 Windows shell to a 4.2 Atlas cluster.

When testing from a 4.0.10 Windows shell to a 4.2.0-rc1 Atlas cluster:

MongoDB shell version v4.0.10
Enter password:
connecting to: mongodb://42betatest-shard-00-00-dgtt4.gcp.mongodb.net.:27017,42betatest-shard-00-01-dgtt4.gcp.mongodb.net.:27017,42betatest-shard-00-02-dgtt4.gcp.mongodb.net.:27017/test?authSource=admin&gssapiServiceName=mongodb&replicaSet=42BetaTest-shard-0&ssl=true
2019-07-02T13:24:07.928-0400 I NETWORK  [js] Starting new replica set monitor for 42BetaTest-shard-0/42betatest-shard-00-00-dgtt4.gcp.mongodb.net.:27017,42betatest-shard-00-01-dgtt4.gcp.mongodb.net.:27017,42betatest-shard-00-02-dgtt4.gcp.mongodb.net.:27017
2019-07-02T13:24:08.056-0400 I NETWORK  [ReplicaSetMonitor-TaskExecutor] Successfully connected to 42betatest-shard-00-01-dgtt4.gcp.mongodb.net.:27017 (1 connections now open to 42betatest-shard-00-01-dgtt4.gcp.mongodb.net.:27017 with a 5 second timeout)
2019-07-02T13:24:08.056-0400 I NETWORK  [js] Successfully connected to 42betatest-shard-00-00-dgtt4.gcp.mongodb.net.:27017 (1 connections now open to 42betatest-shard-00-00-dgtt4.gcp.mongodb.net.:27017 with a 5 second timeout)
2019-07-02T13:24:08.083-0400 I NETWORK  [js] changing hosts to 42BetaTest-shard-0/42betatest-shard-00-00-dgtt4.gcp.mongodb.net:27017,42betatest-shard-00-01-dgtt4.gcp.mongodb.net:27017,42betatest-shard-00-02-dgtt4.gcp.mongodb.net:27017 from 42BetaTest-shard-0/42betatest-shard-00-00-dgtt4.gcp.mongodb.net.:27017,42betatest-shard-00-01-dgtt4.gcp.mongodb.net.:27017,42betatest-shard-00-02-dgtt4.gcp.mongodb.net.:27017
2019-07-02T13:24:08.198-0400 I NETWORK  [ReplicaSetMonitor-TaskExecutor] Successfully connected to 42betatest-shard-00-00-dgtt4.gcp.mongodb.net:27017 (1 connections now open to 42betatest-shard-00-00-dgtt4.gcp.mongodb.net:27017 with a 5 second timeout)
2019-07-02T13:24:08.198-0400 I NETWORK  [js] Successfully connected to 42betatest-shard-00-01-dgtt4.gcp.mongodb.net:27017 (1 connections now open to 42betatest-shard-00-01-dgtt4.gcp.mongodb.net:27017 with a 5 second timeout)
2019-07-02T13:24:08.344-0400 I NETWORK  [ReplicaSetMonitor-TaskExecutor] Successfully connected to 42betatest-shard-00-02-dgtt4.gcp.mongodb.net:27017 (1 connections now open to 42betatest-shard-00-02-dgtt4.gcp.mongodb.net:27017 with a 5 second timeout)
Implicit session: session { "id" : UUID("e5c3291d-b04b-4646-ab33-04085a8436ed") }
MongoDB server version: 4.2.0-rc1
WARNING: shell and server versions do not match
MongoDB Enterprise 42BetaTest-shard-0:PRIMARY>

When testing from a 4.2.0-rc2 Windows shell to a 4.2.0-rc1 Atlas cluster:

MongoDB shell version v4.2.0-rc2
Enter password:
connecting to: mongodb://42betatest-shard-00-01-dgtt4.gcp.mongodb.net:27017,42betatest-shard-00-02-dgtt4.gcp.mongodb.net:27017,42betatest-shard-00-00-dgtt4.gcp.mongodb.net:27017/test?authSource=admin&compressors=disabled&gssapiServiceName=mongodb&replicaSet=42BetaTest-shard-0&ssl=true
2019-07-02T13:26:01.810-0400 I  NETWORK  [js] Starting new replica set monitor for 42BetaTest-shard-0/42betatest-shard-00-01-dgtt4.gcp.mongodb.net:27017,42betatest-shard-00-02-dgtt4.gcp.mongodb.net:27017,42betatest-shard-00-00-dgtt4.gcp.mongodb.net:27017
2019-07-02T13:26:01.811-0400 I  CONNPOOL [ReplicaSetMonitor-TaskExecutor] Connecting to 42betatest-shard-00-00-dgtt4.gcp.mongodb.net:27017
2019-07-02T13:26:01.811-0400 I  CONNPOOL [ReplicaSetMonitor-TaskExecutor] Connecting to 42betatest-shard-00-02-dgtt4.gcp.mongodb.net:27017
2019-07-02T13:26:01.811-0400 I  CONNPOOL [ReplicaSetMonitor-TaskExecutor] Connecting to 42betatest-shard-00-01-dgtt4.gcp.mongodb.net:27017
2019-07-02T13:26:02.024-0400 I  CONNPOOL [ReplicaSetMonitor-TaskExecutor] Ending connection to host 42betatest-shard-00-00-dgtt4.gcp.mongodb.net:27017 due to bad connection status: HostUnreachable: Connection closed by peer; 0 connections to that host remain open
2019-07-02T13:26:02.029-0400 I  CONNPOOL [ReplicaSetMonitor-TaskExecutor] Ending connection to host 42betatest-shard-00-02-dgtt4.gcp.mongodb.net:27017 due to bad connection status: HostUnreachable: Connection closed by peer; 0 connections to that host remain open
2019-07-02T13:26:02.050-0400 W  NETWORK  [ReplicaSetMonitor-TaskExecutor] Unable to reach primary for set 42BetaTest-shard-0
2019-07-02T13:26:02.051-0400 I  NETWORK  [ReplicaSetMonitor-TaskExecutor] Cannot reach any nodes for set 42BetaTest-shard-0. Please check network connectivity and the status of the set. This has happened for 1 checks in a row.
2019-07-02T13:26:02.052-0400 I  CONNPOOL [ReplicaSetMonitor-TaskExecutor] Ending connection to host 42betatest-shard-00-01-dgtt4.gcp.mongodb.net:27017 due to bad connection status: HostUnreachable: Connection closed by peer; 0 connections to that host remain open
2019-07-02T13:26:02.338-0400 I  CONNPOOL [ReplicaSetMonitor-TaskExecutor] Ending connection to host 42betatest-shard-00-00-dgtt4.gcp.mongodb.net:27017 due to bad connection status: HostUnreachable: Connection closed by peer; 0 connections to that host remain open
2019-07-02T13:26:02.339-0400 I  CONNPOOL [ReplicaSetMonitor-TaskExecutor] Ending connection to host 42betatest-shard-00-01-dgtt4.gcp.mongodb.net:27017 due to bad connection status: HostUnreachable: Connection closed by peer; 0 connections to that host remain open
2019-07-02T13:26:02.340-0400 W  NETWORK  [ReplicaSetMonitor-TaskExecutor] Unable to reach primary for set 42BetaTest-shard-0
2019-07-02T13:26:02.340-0400 I  NETWORK  [ReplicaSetMonitor-TaskExecutor] Cannot reach any nodes for set 42BetaTest-shard-0. Please check network connectivity and the status of the set. This has happened for 2 checks in a row.
2019-07-02T13:26:02.340-0400 I  CONNPOOL [ReplicaSetMonitor-TaskExecutor] Ending connection to host 42betatest-shard-00-02-dgtt4.gcp.mongodb.net:27017 due to bad connection status: HostUnreachable: Connection closed by peer; 0 connections to that host remain open
2019-07-02T13:26:02.843-0400 I  CONNPOOL [ReplicaSetMonitor-TaskExecutor] Ending connection to host 42betatest-shard-00-00-dgtt4.gcp.mongodb.net:27017 due to bad connection status: HostUnreachable: Connection closed by peer; 0 connections to that host remain open

The above will loop indefinitely. To verify the 4.2.0-rc2 shell could still connect to non-4.2 servers I also tested connecting to a 3.6.13 Atlas cluster:

MongoDB shell version v4.2.0-rc2
Enter password:
connecting to: mongodb://m10-dev-shard-00-02-dgtt4.mongodb.net:27017,m10-dev-shard-00-00-dgtt4.mongodb.net:27017,m10-dev-shard-00-01-dgtt4.mongodb.net:27017/test?authSource=admin&compressors=disabled&gssapiServiceName=mongodb&replicaSet=m10-dev-shard-0&ssl=true
2019-07-02T13:30:34.669-0400 I  NETWORK  [js] Starting new replica set monitor for m10-dev-shard-0/m10-dev-shard-00-02-dgtt4.mongodb.net:27017,m10-dev-shard-00-00-dgtt4.mongodb.net:27017,m10-dev-shard-00-01-dgtt4.mongodb.net:27017
2019-07-02T13:30:34.670-0400 I  CONNPOOL [ReplicaSetMonitor-TaskExecutor] Connecting to m10-dev-shard-00-02-dgtt4.mongodb.net:27017
2019-07-02T13:30:34.670-0400 I  CONNPOOL [ReplicaSetMonitor-TaskExecutor] Connecting to m10-dev-shard-00-00-dgtt4.mongodb.net:27017
2019-07-02T13:30:34.670-0400 I  CONNPOOL [ReplicaSetMonitor-TaskExecutor] Connecting to m10-dev-shard-00-01-dgtt4.mongodb.net:27017
2019-07-02T13:30:35.433-0400 I  NETWORK  [ReplicaSetMonitor-TaskExecutor] Confirmed replica set for m10-dev-shard-0 is m10-dev-shard-0/m10-dev-shard-00-00-dgtt4.mongodb.net:27017,m10-dev-shard-00-01-dgtt4.mongodb.net:27017,m10-dev-shard-00-02-dgtt4.mongodb.net:27017
Implicit session: session { "id" : UUID("08d396c6-ce2d-4f4c-80a2-60c47f325dc9") }
MongoDB server version: 3.6.13
WARNING: shell and server versions do not match
MongoDB Enterprise m10-dev-shard-0:PRIMARY>

My OS Details are as follows:

systeminfo | findstr /B /C:"OS Name" /C:"OS Version"
OS Name:                   Microsoft Windows 10 Pro
OS Version:                10.0.17763 N/A Build 17763



 Comments   
Comment by Kelsey Schubert [ 21/Jul/19 ]

Yes, this was fully resolved by WT-4913.

Comment by Alexander Gorrod [ 15/Jul/19 ]

mark.benvenuto Can we close this ticket as a duplicate of WT-4913, or is there follow-on work required?

Comment by Mark Benvenuto [ 04/Jul/19 ]

I can confirm this repros with 4.2.0-rc2 Windows and the provided Atlas URI. I can also repro this locally without SSL (checksums are disabled when running with SSL post 4.2.0-rc2) between a 4.2.0-rc2 Windows shell and Linux server. 

The server is killing the connection because the client and server disagree on the message CRC.

2019-07-03T17:26:20.059-0400 I  NETWORK  [conn4] DBException handling request, closing client connection: ChecksumMismatch: OP_MSG checksum does not match contents
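
To illustrate why a CRC disagreement ends in a closed connection, here is a minimal sketch of a receiver that recomputes a trailing CRC-32C and rejects the message on mismatch. It is a simplified model, not the server's code: it ignores the real OP_MSG header and flag bits, and the names `append_checksum`, `verify_checksum`, and `payload` are hypothetical.

```python
import struct

CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (slow, but easy to follow)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def append_checksum(message: bytes) -> bytes:
    """Hypothetical sender: append the CRC-32C as a little-endian uint32."""
    return message + struct.pack("<I", crc32c(message))


def verify_checksum(framed: bytes) -> bool:
    """Hypothetical receiver: recompute the CRC over everything but the trailer."""
    body, trailer = framed[:-4], framed[-4:]
    (claimed,) = struct.unpack("<I", trailer)
    return crc32c(body) == claimed


payload = b"example message bytes"  # placeholder, not a real wire message
good = append_checksum(payload)
# Simulate a sender whose CRC routine produced a different value.
bad = payload + struct.pack("<I", crc32c(payload) ^ 1)

print(verify_checksum(good))  # True
print(verify_checksum(bad))   # False
```

In this model the payload bytes are identical in both cases; only the sender's checksum differs, which is enough for the receiver to reject the message.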

 
Linux and Windows use different compilers (GCC vs MSVC) and have different implementations of the CRC-32 function. On Linux, this code is used. On Windows, this code is used. The Linux implementation hard codes the op-codes for the CRC32 instruction while the MSVC solution uses intrinsics.

If we examine the last loop (the following explanation is also true of the first loop which is skipped when computing the CRC in the op_msg CRC case), we can see that different Intel CRC instructions are called. I have provided the raw hex of the assembly.

On GCC/Linux, GDB decoded, AT&T style,

   0x000055555b000f8e <+172>:   89 c6   mov    %eax,%esi
   0x000055555b000f90 <+174>:   89 d1   mov    %edx,%ecx
   0x000055555b000f92 <+176>:   f2 0f 38 f0 f1  crc32b %cl,%esi  <=== 8 bits crc
   0x000055555b000f97 <+181>:   89 f0   mov    %esi,%eax

On MSVC, as decoded by llvm-objdump, AT&T style,

142397fb6:      0f b6 00        movzbl  (%rax), %eax
142397fb9:      8b 0c 24        movl    (%rsp), %ecx
142397fbc:      f2 0f 38 f1 c8  crc32l  %eax, %ecx     <=== 32 bits crc
142397fc1:      8b c1   movl    %ecx, %eax

As a result of this code generation, we compute different CRCs for anything that is not 8-byte aligned and not a multiple of 8 bytes.
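
The effect of the two instructions can be sketched with a software model of the CRC32 instruction's update step. This is an illustrative model under the assumption that the instruction folds its operand into the accumulator byte by byte, least-significant byte first, using the reflected Castagnoli polynomial; the helper names `crc32c_step`, `crc32b`, and `crc32l` are chosen here to mirror the mnemonics above and are not taken from the server or WiredTiger source.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected


def crc32c_step(crc: int, value: int, width_bytes: int) -> int:
    """Fold width_bytes little-endian bytes of value into crc (no init/final XOR)."""
    for byte in value.to_bytes(width_bytes, "little"):
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc


def crc32b(crc: int, byte: int) -> int:
    """Model of the 8-bit form: consumes exactly one byte."""
    return crc32c_step(crc, byte, 1)


def crc32l(crc: int, dword: int) -> int:
    """Model of the 32-bit form: consumes four bytes."""
    return crc32c_step(crc, dword, 4)


state = 0xFFFFFFFF   # arbitrary accumulator value for the demonstration
tail_byte = 0x61     # one trailing byte of the buffer

as_byte = crc32b(state, tail_byte)
# movzbl zero-extends the byte, so the 32-bit form also folds in three zero bytes.
as_dword = crc32l(state, tail_byte)

print(hex(as_byte), hex(as_dword), as_byte == as_dword)
```

In this model the 32-bit form behaves like the 8-bit form followed by three extra zero bytes, so the accumulator diverges as soon as a single trailing byte is processed that way.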

For C code, the MSVC compiler miscompiles the intrinsic as taking a 32-bit value if the header is missing, but does not report an error (even under /Wall). GCC and Clang warn that it is undefined. For C++ code, the MSVC compiler will not compile the code, since the intrinsic is not defined.

The fix to the code is to add an include for nmmintrin.h.
