[SERVER-67962] Applying config.image_collection deletes needs better concurrency control Created: 11/Jul/22  Updated: 01/Nov/23  Resolved: 24/Jul/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 6.0.0-rc1, 5.3.0, 5.2.0, 5.1.0, 4.2.16, 4.0.27, 5.0.3, 4.4.9
Fix Version/s: 7.1.0-rc0, 7.0.4

Type: Bug Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive) Assignee: Matthew Russotto
Resolution: Fixed Votes: 0
Labels: repl-shortlist
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-80791 Potential data consistency issue with... Closed
related to SERVER-81423 Prevent the fuzzer from generating wr... Closed
is related to SERVER-69497 Have internal_sessions_reaping_basic.... Closed
Assigned Teams:
Replication
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v7.0
Sprint: Repl 2022-08-08, Repl 2022-08-22, Repl 2022-09-05, Repl 2023-07-24
Participants:
Linked BF Score: 152

 Description   

On secondaries, inserts and updates to config.image_collection happen inside the same transactions that perform the corresponding data writes. Deletes, however, are replicated explicitly, as their own oplog entries. Thus, if a batch contains both a data write that upserts an image_collection document and a delete of that same document, different writer threads can apply them out of order.
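The routing mismatch can be sketched as a toy model in Python. This is not MongoDB's oplog applier. `OplogEntry`, `routing_key`, `image_key_touched`, and `split_at_image_deletes` are hypothetical names. The sketch assumes that entries are routed to writer threads by the namespace and `_id` they name, and that an image document's `_id` is the session id. Under those assumptions, the data write and the delete touch the same image document but get different routing keys, so nothing orders them. The barrier at the end shows one possible form of concurrency control. It is an illustration, not the fix that was committed.

```python
from dataclasses import dataclass
from typing import List, Optional

IMAGE_NS = "config.image_collection"

@dataclass(frozen=True)
class OplogEntry:
    ts: int                     # position in the oplog
    op: str                     # "u": data write that also upserts a retry image; "d": delete
    ns: str                     # namespace named by the entry
    doc_id: str                 # _id named by the entry
    lsid: Optional[str] = None  # session whose image document a "u" entry upserts (assumed)

def routing_key(e: OplogEntry) -> str:
    # Hypothetical routing: by the document the entry names. The image upsert
    # runs as a side effect of the data write, so it inherits the data
    # document's key, not the image document's key.
    return f"{e.ns}/{e.doc_id}"

def image_key_touched(e: OplogEntry) -> Optional[str]:
    """Return the image document an entry actually modifies, if any."""
    if e.ns == IMAGE_NS and e.op == "d":
        return e.doc_id  # assumption: image documents are keyed by session id
    return e.lsid if e.op == "u" else None

def split_at_image_deletes(batch: List[OplogEntry]) -> List[List[OplogEntry]]:
    """Hypothetical barrier: apply each image_collection delete in its own
    sub-batch, so every earlier entry is applied before it and every later
    entry after it, regardless of which writer thread handles them."""
    out: List[List[OplogEntry]] = []
    current: List[OplogEntry] = []
    for e in batch:
        if e.ns == IMAGE_NS and e.op == "d":
            if current:
                out.append(current)
            out.append([e])
            current = []
        else:
            current.append(e)
    if current:
        out.append(current)
    return out

batch = [
    OplogEntry(1, "u", "test.coll", "1", lsid="s1"),  # data write + image upsert for s1
    OplogEntry(2, "d", IMAGE_NS, "s1"),               # explicit delete of s1's image
]
# Same image document, different routing keys: no ordering between them.
print(routing_key(batch[0]), routing_key(batch[1]), image_key_touched(batch[0]))
print(split_at_image_deletes(batch))
```

A barrier trades some parallelism for ordering. Routing both entries to the same writer thread would be an alternative, but it must not break the per-document ordering that other entries in the batch rely on.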

This can manifest in two ways:

  • If this is a new image_collection document, that document will be leaked on secondaries that apply the writes out of order.
  • If this is an update to an existing image_collection document, an out-of-order write assertion will crash the process. No data is corrupted, and restarting the secondary will eventually succeed.
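The two outcomes above can be reproduced with a toy timestamped key/value store. This is a sketch under stated assumptions, not MongoDB's storage engine. `ToyStore`, `OutOfOrderWrite`, and `apply_ops` are hypothetical. The rejection rule loosely mimics a storage engine's out-of-order timestamp check: it refuses a write to a key whose newest recorded write is later. A delete of a missing key writes nothing. The oplog order is assumed to be the upsert at ts 1 followed by the delete at ts 2.

```python
from typing import Dict, List, Optional, Tuple

class OutOfOrderWrite(Exception):
    pass

class ToyStore:
    """Toy store: rejects a write older than the key's newest recorded write."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}
        self.last_ts: Dict[str, int] = {}

    def _check(self, key: str, ts: int) -> None:
        if self.last_ts.get(key, -1) > ts:
            raise OutOfOrderWrite(f"{key}: write at {ts} after write at {self.last_ts[key]}")

    def upsert(self, key: str, value: str, ts: int) -> None:
        self._check(key, ts)
        self.docs[key] = value
        self.last_ts[key] = ts

    def delete(self, key: str, ts: int) -> None:
        if key not in self.docs:
            return  # nothing to delete, so nothing is written or timestamped
        self._check(key, ts)
        del self.docs[key]
        self.last_ts[key] = ts

def apply_ops(store: ToyStore, ops: List[Tuple[str, int]]) -> Optional[str]:
    """Apply (kind, ts) ops to the image document "s1"; return what remains."""
    for kind, ts in ops:
        if kind == "upsert":
            store.upsert("s1", f"image@{ts}", ts)
        else:
            store.delete("s1", ts)
    return store.docs.get("s1")

# New image document, applied out of order: the delete is a no-op, the upsert leaks.
print(apply_ops(ToyStore(), [("delete", 2), ("upsert", 1)]))

# Existing image document, applied out of order: the late upsert is rejected.
existing = ToyStore()
existing.upsert("s1", "old", 0)
try:
    apply_ops(existing, [("delete", 2), ("upsert", 1)])
except OutOfOrderWrite as err:
    print("assertion:", err)
```

Applied in oplog order, both cases end with no image document.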

Tripping this bug also requires an unlikely ingredient. Logical sessions on primaries are reaped only after a configured period of inactivity (on the order of minutes), so the bug requires either:

  • A secondary applying a batch that spans an entire reaping window.
  • A client that stops using a logical session long enough for it to be reaped, and then uses the LSID again right as it is being reaped.


 Comments   
Comment by Githook User [ 31/Oct/23 ]

Author:

{'name': 'Matthew Russotto', 'email': 'matthew.russotto@mongodb.com', 'username': 'mtrussotto'}

Message: SERVER-67962 Applying config.image_collection deletes needs better concurrency control
Branch: v7.0
https://github.com/mongodb/mongo/commit/65d4425e4635073876ad7d92649acfbd89de26c7

Comment by Githook User [ 22/Jul/23 ]

Author:

{'name': 'Matthew Russotto', 'email': 'matthew.russotto@mongodb.com', 'username': 'mtrussotto'}

Message: SERVER-67962 Applying config.image_collection deletes needs better concurrency control
Branch: master
https://github.com/mongodb/mongo/commit/8c1a7aceb40c7156dbc43b66e2a04cccf28fb013

Generated at Thu Feb 08 06:09:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.