[SERVER-55248] CRUD state on FSM workload can fail if executed concurrently with a rename Created: 17/Mar/21  Updated: 29/Oct/23  Resolved: 24/Mar/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.0.0-rc0

Type: Bug Priority: Major - P3
Reporter: Marcos José Grillo Ramirez Assignee: Pierlauro Sciarelli
Resolution: Fixed Votes: 0
Labels: PM-1965-Milestone-1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
is caused by SERVER-52813 No-stepdown concurrency suites for th... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

In the CRUD/DDL FSM workload, an update can fail with the following error if it runs concurrently with a rename:

 Error: write failed with error: {
    "nMatched" : 0,
    "nUpserted" : 0,
    "nModified" : 0,
    "writeError" : {
        "code" : 175,
        "errmsg" : "collection renamed from 'test0_fsmdb0.fsmcoll00' to 'test0_fsmdb0.fsmcoll01615840135680'. UUID 020b4e1f-f1d1-4af6-8c1a-371e774082ec"
    }
 }
 
_getErrorWithCode@src/mongo/shell/utils.js:25:13
quietlyDoAssert@jstests/concurrency/fsm_libs/assert.js:53:18
assert.writeOK@src/mongo/shell/assert.js:863:13
_assertCommandWorked@src/mongo/shell/assert.js:704:17
assert.commandWorked@src/mongo/shell/assert.js:811:16
wrapAssertFn@jstests/concurrency/fsm_libs/assert.js:65:13
assertWithLevel/</assertWithLevel[fn]@jstests/concurrency/fsm_libs/assert.js:104:13
CRUD@jstests\concurrency\fsm_workloads\random_DDL_CRUD_operations.js:141:13
runFSM@jstests/concurrency/fsm_libs/fsm.js:132:17
@eval:8:9
main@jstests/concurrency/fsm_libs/worker_thread.js:217:17
@eval:5:12
@eval:3:24
_threadStartWrapper@:26:16

This ticket can be solved by wrapping the write operations in a try/catch and ignoring this specific error, which is related to the query planner. Before doing that, we should verify that:

  • The same error happens on insert and remove
  • The error is the same if we are running a replica set


 Comments   
Comment by Githook User [ 24/Mar/21 ]

Author:

{'name': 'Pierlauro Sciarelli', 'email': 'pierlauro.sciarelli@mongodb.com', 'username': 'pierlauro'}

Message: SERVER-55248 CRUD state on FSM workload can fail if executed concurrently with a rename (part 2)
Branch: master
https://github.com/mongodb/mongo/commit/cfa7a4ead4b5ab08d3ab61460b9e7fea1c547b20

Comment by Githook User [ 19/Mar/21 ]

Author:

{'name': 'Pierlauro Sciarelli', 'email': 'pierlauro.sciarelli@mongodb.com', 'username': 'pierlauro'}

Message: SERVER-55248 CRUD state on FSM workload can fail if executed concurrently with a rename
Branch: master
https://github.com/mongodb/mongo/commit/f3919cf092f76e317d539a1cde33cf2123c54055

Comment by Pierlauro Sciarelli [ 17/Mar/21 ]

The root cause is described in SERVER-31695, even though the context is different (here it does not happen during initial sync): a rename can happen after an update yields, both on replica sets and on sharded clusters. The same issue could arise during a multi-document insert or remove.

It should be safe to catch QueryPlanKilled errors and ignore them.
