[SERVER-65930] DDL coordinators and rename participant initial checkpoint may incur in DuplicateKey error Created: 25/Apr/22  Updated: 29/Oct/23  Resolved: 16/May/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.3.2, 6.0.0-rc7, 5.0.10, 6.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Pierlauro Sciarelli Assignee: Allison Easton
Resolution: Fixed Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Duplicate
is duplicated by SERVER-65340 Operations hang when re-using dropped... Closed
Related
is related to SERVER-66336 ConfigsvrCoordinators initial checkpo... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.0, v5.3, v5.0
Sprint: Sharding EMEA 2022-05-16, Sharding EMEA 2022-05-30
Participants:

 Description   

Each DDL coordinator calls _insertStateDocument to initially checkpoint the received operation on disk. Since the write uses a write concern with a timeout, the following can happen:

  1. DDL coordinator starts and calls _insertStateDocument
  2. The document is locally written but not yet majority committed
  3. The write concern timeout is hit
  4. The coordinator retries
  5. The retry fails with a DuplicateKey error because the document has already been inserted locally
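
The steps above can be sketched as a small simulation. This is only an illustrative model, not the server code and not necessarily the fix that was committed: FakeStateStore, naive_checkpoint, tolerant_checkpoint and the exception classes are hypothetical names invented for this example. It assumes the local write and the majority wait are separate steps, and shows one plausible way a retry could tolerate a DuplicateKey error when the stored document is identical.

```python
class DuplicateKeyError(Exception):
    """Raised when a document with the same _id already exists locally."""


class WriteConcernTimeout(Exception):
    """Raised when the majority wait does not complete in time."""


class FakeStateStore:
    """Hypothetical stand-in for the on-disk state document collection."""

    def __init__(self, timeouts_before_commit):
        self.docs = {}
        self._timeouts_left = timeouts_before_commit

    def insert(self, doc):
        if doc["_id"] in self.docs:
            raise DuplicateKeyError(doc["_id"])
        self.docs[doc["_id"]] = dict(doc)  # step 2: local write succeeds
        self.wait_for_majority()           # step 3: may time out

    def wait_for_majority(self):
        if self._timeouts_left > 0:
            self._timeouts_left -= 1
            raise WriteConcernTimeout()


def naive_checkpoint(store, doc, attempts=3):
    """Retries the insert on timeout; a retry hits DuplicateKeyError."""
    for _ in range(attempts):
        try:
            store.insert(doc)
            return "ok"
        except WriteConcernTimeout:
            continue  # step 4: retry the whole insert
    raise WriteConcernTimeout()


def tolerant_checkpoint(store, doc, attempts=3):
    """Assumed mitigation: an identical existing document counts as inserted."""
    for _ in range(attempts):
        try:
            store.insert(doc)
            return "ok"
        except WriteConcernTimeout:
            continue
        except DuplicateKeyError:
            if store.docs[doc["_id"]] != doc:
                raise  # a different document owns this _id: a real conflict
            try:
                store.wait_for_majority()  # only the commit wait is pending
                return "ok"
            except WriteConcernTimeout:
                continue
    raise WriteConcernTimeout()
```

With a store that times out once, naive_checkpoint raises DuplicateKeyError on its second attempt while the document stays in the store, which mirrors the state described in this ticket. tolerant_checkpoint returns normally in the same situation.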

In some cases, such as for renameCollection, the result is that the DDL coordinator document remains on disk but the in-memory instance is released because of the exception. When this happens, the coordinator can only be resumed in one of two ways: either the user invokes the operation again, or a new node steps up as primary of the source database's primary shard.

[EDIT] The rename participant can hit the same problem, since it implements the same logic.



 Comments   
Comment by Githook User [ 19/May/22 ]

Author:

{'name': 'Allison Easton', 'email': 'allison.easton@mongodb.com', 'username': 'allisoneaston'}

Message: SERVER-65930 DDL coordinators and rename participant initial checkpoint may incur in DuplicateKey error

(cherry picked from commit 1eb5a9257b3bfc0c768b342d73c3668cc6566841)
Branch: v5.0
https://github.com/mongodb/mongo/commit/ea66e125bf6368da4146c1d8974c595e29787542

Comment by Githook User [ 19/May/22 ]

Author:

{'name': 'Allison Easton', 'email': 'allison.easton@mongodb.com', 'username': 'allisoneaston'}

Message: SERVER-65930 DDL coordinators and rename participant initial checkpoint may incur in DuplicateKey error

(cherry picked from commit 1eb5a9257b3bfc0c768b342d73c3668cc6566841)
Branch: v5.3
https://github.com/mongodb/mongo/commit/2bd6810b6afe06530969d77ffd3931978feef8c3

Comment by Githook User [ 17/May/22 ]

Author:

{'name': 'Allison Easton', 'email': 'allison.easton@mongodb.com', 'username': 'allisoneaston'}

Message: SERVER-65930 DDL coordinators and rename participant initial checkpoint may incur in DuplicateKey error
Branch: v6.0
https://github.com/mongodb/mongo/commit/1a44e197ee6d2e8ebdec97f8fd817619e84aacaa

Comment by Githook User [ 16/May/22 ]

Author:

{'name': 'Allison Easton', 'email': 'allison.easton@mongodb.com', 'username': 'allisoneaston'}

Message: SERVER-65930 DDL coordinators and rename participant initial checkpoint may incur in DuplicateKey error
Branch: master
https://github.com/mongodb/mongo/commit/1eb5a9257b3bfc0c768b342d73c3668cc6566841
