[SERVER-31995] Logged initial sync statistics may exceed 16mb causing fassert Created: 16/Nov/17 Updated: 20/Dec/23 Resolved: 02/May/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.4.2 |
| Fix Version/s: | 3.4.16, 3.6.6, 4.0.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Benoit Bui | Assignee: | Benety Goh |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | initialSync | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||
| Backport Requested: | v3.6, v3.4 |
| Participants: | |||||||||||||||||||||
| Linked BF Score: | 11 | ||||||||||||||||||||
| Description |
|
Hi Team, We have a sharded cluster of 10 shards (each Primary / Secondary / Arbiter) that hosts 70k databases. Here is the distribution across the shards:
We are currently unable to resync this shard from scratch; the attempt fails with the following error:
On another cluster with the same architecture but fewer databases per shard, we do not encounter this issue. We plan to upgrade from version 3.4.4 to 3.4.10, but we haven't found anything related to this issue in the changelog. Thanks. Regards, |
| Comments |
| Comment by Githook User [ 22/May/18 ] | |||||||||
|
Author: {'username': 'benety', 'name': 'Benety Goh', 'email': 'benety@mongodb.com'} Message: (cherry picked from commit 1a605a99bdc242fd957afdb6e9fe9b8f9c32c862) | |||||||||
| Comment by Githook User [ 22/May/18 ] | |||||||||
|
Author: {'username': 'benety', 'name': 'Benety Goh', 'email': 'benety@mongodb.com'} Message: (cherry picked from commit 6fdc0e7d43a3a6550f8d93ec2cdfa25e23a0bdba) | |||||||||
| Comment by Githook User [ 22/May/18 ] | |||||||||
|
Author: {'username': 'benety', 'name': 'Benety Goh', 'email': 'benety@mongodb.com'} Message: (cherry picked from commit 880de296d30736f6005e283aa7737bb2f335dc61) | |||||||||
| Comment by Githook User [ 02/May/18 ] | |||||||||
|
Author: {'email': 'benety@mongodb.com', 'name': 'Benety Goh', 'username': 'benety'}Message: | |||||||||
| Comment by Systems [ 27/Apr/18 ] | |||||||||
|
Is there a current workaround to this? When is the fix expected to be out for this issue? | |||||||||
| Comment by Systems [ 27/Apr/18 ] | |||||||||
|
I am experiencing the same issue here. After losing 2 of the 3 replica set members, we attempted a re-sync from the remaining primary. Once the initial clean sync from the primary completes, mongod crashes (complete trace attached):
[replication-100] DBException::toString(): 10334 BSONObj size: 38694152 (0x24E6D08) is invalid. Size must be between 0 and 16793600(16MB) First element: databasesCloned: 56584
Upon restart, mongod attempts a full re-sync and fails:
[replication-1] dropAllDatabasesExceptLocal 56592
db version v3.4.10 | |||||||||
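As a rough illustration of the numbers in the crash message quoted in the comment above, the following Python sketch estimates how many statistics bytes each cloned database contributes and at roughly what database count the logged document would cross the size limit. It assumes the document grows linearly with the number of databases cloned, which the ticket does not confirm; the function names are hypothetical and exist only for this example.

```python
# Figures copied from the crash message quoted in the comment above.
OBSERVED_DOC_BYTES = 38_694_152  # "BSONObj size: 38694152"
BSON_LIMIT_BYTES = 16_793_600    # "Size must be between 0 and 16793600"
DATABASES_CLONED = 56_584        # "databasesCloned: 56584"


def bytes_per_database(doc_bytes: int, databases: int) -> float:
    """Average statistics bytes per cloned database.

    Assumption (not stated in the ticket): the statistics document
    grows linearly with the number of databases cloned.
    """
    return doc_bytes / databases


def max_databases_under_limit(doc_bytes: int, databases: int, limit: int) -> int:
    """Largest database count whose estimated document still fits in the limit."""
    # Integer arithmetic avoids floating-point rounding near the boundary.
    return limit * databases // doc_bytes


if __name__ == "__main__":
    per_db = bytes_per_database(OBSERVED_DOC_BYTES, DATABASES_CLONED)
    threshold = max_databases_under_limit(
        OBSERVED_DOC_BYTES, DATABASES_CLONED, BSON_LIMIT_BYTES
    )
    print(f"approx. bytes of statistics per database: {per_db:.1f}")
    print(f"estimated database count at which the limit is reached: {threshold}")
```

Under that linear assumption, a member holding tens of thousands of databases would exceed the limit, which is consistent with the reports in this ticket that clusters with fewer databases per shard are unaffected.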
| Comment by Spencer Brody (Inactive) [ 25/Apr/18 ] | |||||||||
|
The log message that causes this issue appears to have been added in https://github.com/mongodb/mongo/commit/c911be4e42994ad6106d12ca6a760c255e5d0452#diff-64adfcffbeb50a1887b7aa86d2689bfcR740, which landed in 3.3.12. So I don't expect 3.4.1 to be any better than 3.4.2 at avoiding this issue. This is a problem in all versions of 3.4. | |||||||||
| Comment by Kelsey Schubert [ 16/Feb/18 ] | |||||||||
|
Hi BenoitSIB, Sorry for the delay in getting back to you. We've confirmed this issue; I've updated the ticket summary to describe it better and marked the ticket to be scheduled against currently planned work. Please continue to watch for updates. Kind regards, | |||||||||
| Comment by Anthony Brodard [ 13/Feb/18 ] | |||||||||
|
Hi Kelsey, Are you able to reproduce the issue on your side? Regards, | |||||||||
| Comment by Anthony Brodard [ 01/Dec/17 ] | |||||||||
|
Hi Andy, We have upgraded our cluster to the latest version (3.4.10). We still have the same issue. Thank you, | |||||||||
| Comment by Benoit Bui [ 16/Nov/17 ] | |||||||||
|
clust-users-2-shard3-2.log Thanks. | |||||||||
| Comment by Andy Schwerin [ 16/Nov/17 ] | |||||||||
|
Can you upload the entire log? Or at least from the beginning of the extract you provided above to the end of the log file? | |||||||||
| Comment by Anthony Brodard [ 16/Nov/17 ] | |||||||||
|
Hello, The error seems to come from https://github.com/mongodb/mongo/blob/r3.4.4/src/mongo/db/repl/initial_syncer.cpp#L1073 Thank you, | |||||||||
| Comment by Benoit Bui [ 16/Nov/17 ] | |||||||||
|
Hi, A more complete extract from the log:
So it finished the initial sync and failed (and crashed) right after. Regards, |