[SERVER-16717] unexpected WriteConflict exceptions with WiredTiger b-tree Created: 05/Jan/15 Updated: 23/Jan/15 Resolved: 22/Jan/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | 2.8.0-rc4 |
| Fix Version/s: | 3.0.0-rc6 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Mark Callaghan | Assignee: | Michael Cahill (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | wiredtiger |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | Start mongod: Run iibench (https://github.com/tmcallaghan/iibench-mongodb) with a few changes to run.simple.bash: I also edited jmongoiibench.java to increase the client timeout to avoid errors on checkpoint stalls. I used this mongo.conf: |
| Participants: |
| Description |
|
This occurs when running iibench with the WiredTiger b-tree. This does not occur with mmapv1, wiredtiger LSM, tokumx or rocksdb engines for MongoDB. While I won't rule out the client code, it looks like a WT bug to me. It seems to occur around the time of a checkpoint stall.
The iibench client prints this on an error:
And the mongod error log has:
keyUpdates:0 reslen:40 12929ms
keyUpdates:0 reslen:40 12894ms
keyUpdates:0 reslen:40 12907ms
ninserted:0 keyUpdates:0 exception: WriteConflict code:112 4204ms
This also occurs when I use a small cache for WT which should make checkpoint stalls less of an issue. |
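To see whether the WriteConflict errors cluster around slow operations, the log fragments quoted above can be scanned mechanically. The following is a minimal sketch that assumes each relevant mongod log line ends with a duration in milliseconds, optionally preceded by `exception: <name> code:<n>`, as in the excerpt. The function name `summarize` and the regular expression are illustrative and not part of mongod or iibench.

```python
import re

# Matches the tail of a log line as quoted in the description:
# an optional "exception: <Name> code:<N> " followed by "<digits>ms" at end of line.
LINE_RE = re.compile(
    r"(?:exception: (?P<name>\w+) code:(?P<code>\d+) )?(?P<ms>\d+)ms\s*$"
)


def summarize(lines):
    """Split matched log lines into WriteConflict durations and other durations (ms)."""
    conflicts, others = [], []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a line in the assumed format
        ms = int(m.group("ms"))
        if m.group("name") == "WriteConflict":
            conflicts.append(ms)
        else:
            others.append(ms)
    return conflicts, others


# The four fragments from the description above.
sample = [
    "keyUpdates:0 reslen:40 12929ms",
    "keyUpdates:0 reslen:40 12894ms",
    "keyUpdates:0 reslen:40 12907ms",
    "ninserted:0 keyUpdates:0 exception: WriteConflict code:112 4204ms",
]
conflicts, others = summarize(sample)
print("WriteConflict durations (ms):", conflicts)
print("other slow operations (ms):", others)
```

Comparing the timestamps of the two groups in a full log would show whether the exceptions follow the multi-second stalls.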
| Comments |
| Comment by Eliot Horowitz (Inactive) [ 22/Jan/15 ] |
|
Caused by
| Comment by Mark Callaghan [ 16/Jan/15 ] |
|
Is this code the source of the duplicate key error?
| Comment by Mark Callaghan [ 16/Jan/15 ] |
|
The pattern is that I first get the bogus write conflict exception and then a duplicate key exception. I don't see how the duplicate key exception can occur given that the collection has 3 non-unique secondary indexes and the client doesn't set the _id on insert. See https://github.com/tmcallaghan/iibench-mongodb for the code.
Code that generates attribute values for new documents.
| Comment by Mark Callaghan [ 16/Jan/15 ] |
|
Can someone label this as a WT bug?
| Comment by Michael Cahill (Inactive) [ 15/Jan/15 ] |
|
Hi mdcallag, thanks for the report and sorry to waste your time. We will add logging for this case.
| Comment by Mark Callaghan [ 05/Jan/15 ] |
|
Given the comment about this being a pathological case, can you add logging for it and for other pathological cases to avoid the need to attach gdb?
| Comment by Mark Callaghan [ 05/Jan/15 ] |
|
Looking at the code, WriteConflict is thrown for WT_ROLLBACK; see wiredtiger_util.cpp
And then in WiredTiger source
And from running with gdb attached, that "return(WT_ROLLBACK)" appears to be the cause
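The relationship described above, where a rollback return code from the storage engine surfaces as a WriteConflict exception, can be sketched as follows. This is an illustration only, not the code in wiredtiger_util.cpp: the names `check` and `with_retry` are hypothetical, the numeric value given for `WT_ROLLBACK` is an assumption, and the retry loop shows one way a caller might react to the exception rather than what mongod does.

```python
WT_ROLLBACK = -31800  # assumed value for illustration; the real one is defined in wiredtiger.h


class WriteConflict(Exception):
    """Stand-in for the server exception reported as code 112 in the log."""
    code = 112


def check(ret):
    """Translate an engine return code into an exception (hypothetical helper)."""
    if ret == 0:
        return
    if ret == WT_ROLLBACK:
        raise WriteConflict("storage engine requested a rollback")
    raise RuntimeError(f"storage engine error {ret}")


def with_retry(op, max_attempts=5):
    """Run op(), retrying on WriteConflict; return (result, attempts used)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return op(), attempt
        except WriteConflict:
            if attempt == max_attempts:
                raise  # give up and surface the conflict to the caller


# Fake insert: the engine reports a rollback twice, then succeeds.
codes = iter([WT_ROLLBACK, WT_ROLLBACK, 0])


def fake_insert():
    check(next(codes))
    return "ok"


result, attempts = with_retry(fake_insert)
print(result, "after", attempts, "attempts")
```

If the rollback is returned for a reason other than a genuine conflict between transactions, as the gdb session suggests, a loop like this would either spin until the condition clears or surface the exception to the client, which matches the "bogus" WriteConflict seen by iibench.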
| Comment by Mark Callaghan [ 05/Jan/15 ] |
|
The error is more likely with zlib compression and using a smaller cache (--wiredTigerCacheSizeGB 4) might also make it more likely. The iibench client code does not set the _id field in documents. In a test I just started, there are 4 exceptions 68 seconds into the test. All my tests run with 10 client threads via: export NUM_LOADER_THREADS=10 |