[SERVER-16165] multi-update gets stuck retrying after WriteConflict
Created: 14/Nov/14  Updated: 23/Mar/15  Resolved: 26/Nov/14

Status: Closed
Project: Core Server
Component/s: Concurrency
Affects Version/s: 2.8.0-rc0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: David Percy Assignee: Kaloian Manassiev
Resolution: Duplicate Votes: 0
Labels: 28qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: backtraces, log
Operating System: ALL
Participants:

 Description   

We have a collection with 10 docs, and 2 client threads are operating on it. Each one is repeatedly doing a multi-update on all docs in the collection. The multi-update involves both normal indexes and multikey indexes.
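For illustration only, the workload described above could be sketched roughly as follows. This is not the original repro script: the field names `a` and `tags`, the helper names, and the iteration count are all assumptions, and `collection` is any object exposing an `update_many(filter, update)` method (pymongo's `Collection` has one).

```python
import threading


def make_docs(n=10):
    # Hypothetical documents: 'a' is a scalar field (a normal index on it),
    # 'tags' is an array field (an index on it becomes multikey).
    return [{"_id": i, "a": i, "tags": [i, i + 1]} for i in range(n)]


def multi_update_spec(value):
    # An empty filter matches every document; the update touches both
    # the scalar field and the array field.
    return {}, {"$set": {"a": value, "tags": [value, value + 1]}}


def run_client(collection, iterations):
    # One client thread: repeatedly issue a multi-update over all docs.
    for i in range(iterations):
        flt, upd = multi_update_spec(i)
        collection.update_many(flt, upd)


def run_workload(collection, n_threads=2, iterations=100):
    # Start the client threads and wait for all of them to finish.
    threads = [
        threading.Thread(target=run_client, args=(collection, iterations))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With pymongo, one would presumably first call `insert_many(make_docs())` on the collection and `create_index("a")` and `create_index("tags")` before calling `run_workload(collection)`. Whether this particular shape triggers the hang is not established here.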

At some point, the server starts logging "[conn5] Got a WriteConflict in the middle of a multi-update, retrying the current doc" over and over, and the clients stop making progress.

Attached are the server log and the backtraces of all threads after attaching gdb to the server.



 Comments   
Comment by Eliot Horowitz (Inactive) [ 26/Nov/14 ]

Confirmed SERVER-16143 fixes this as well.

Comment by Eliot Horowitz (Inactive) [ 18/Nov/14 ]

Almost positive this is the same root issue as SERVER-16143.

Comment by J Rassi [ 18/Nov/14 ]

The test in question spawns 64 threads performing multi-updates. With David's repro script, I can reproduce a scenario in which 2 of them enter an infinite loop, and the other 62 are waiting on a lock:

  • 1 multi-update thread is waiting to upgrade collection IX=>X for IndexCatalogEntry::setMultiKey().
  • 61 multi-update threads are waiting for collection IX, to start the multi-update.
  • 2 multi-update threads are holding collection IX, and are in an infinite loop. The threads get a return value of WT_ROLLBACK from a call to (WT_CURSOR*)->update() (in WiredTigerRecordStore::updateRecord()), then log "Had WriteConflict in the middle of a multi-update, retrying the current update", then go back to the beginning of the while(1) loop in UpdateStage::work().
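The retry behaviour in the last bullet can be modelled with a small toy loop. This is a sketch under stated assumptions, not the server's code: `WriteConflict`, `run_update_loop`, `make_update`, and `always_conflicts` are hypothetical names, and the `max_attempts` cap exists only so the example terminates (the loop described above has no such cap).

```python
class WriteConflict(Exception):
    """Stand-in for the WT_ROLLBACK result described above."""


def run_update_loop(update_record, max_attempts):
    """Retry update_record until it succeeds or the cap is reached.

    Returns the number of attempts used on success, or None if every
    attempt up to max_attempts hit a conflict.
    """
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        try:
            update_record()
            return attempts
        except WriteConflict:
            # Models "log the conflict, go back to the top of the loop".
            continue
    return None


def make_update(conflicts_before_success):
    """Build a hypothetical update that conflicts N times, then succeeds."""
    remaining = [conflicts_before_success]

    def update_record():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise WriteConflict()

    return update_record


def always_conflicts():
    # Models a conflict that never clears between retries.
    raise WriteConflict()
```

A transient conflict resolves after a few retries (`run_update_loop(make_update(3), 10)` uses 4 attempts), while a conflict that never clears exhausts any cap (`run_update_loop(always_conflicts, 1000)` returns `None`). Without the cap, the second case would spin forever, which matches the symptom reported here.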

eliot, kaloian.manassiev: one of you want to take it from here?
