[SERVER-15998] Don't fall-through on WriteConflictExceptions in write command update Created: 06/Nov/14  Updated: 11/Jul/16  Resolved: 07/Nov/14

Status: Closed
Project: Core Server
Component/s: Concurrency, Write Ops
Affects Version/s: None
Fix Version/s: 2.8.0-rc0

Type: Bug Priority: Major - P3
Reporter: Daniel Pasette (Inactive) Assignee: Greg Studer
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-16006 write conflict on single-update isn't... Closed
Operating System: ALL
Steps To Reproduce:

python benchrun.py -f testcases/simple_update_mms.js -t 1 2 4 8 --nodyno --mongo-repo-path ../mongo/

Participants:

 Description   

WriteConflictException was not being handled properly by the write command update executor. This caused them to erroneously appear to succeed.

Original Description
The update_mms tests suddenly stopped scaling at all, where last week they did. This is something in the mongo tree, not the WT library as I keep that constant during my testing here and see very different results.

I spent a bit of time today narrowing it down and the issue happens between changeset c84d5ea and changeset 7f9e3f3 (FYI, c84d5ea is a parent of 7f9e3f3). The diff between these two changesets is very small and attached.

Here is the page of commits from GitHub:
https://github.com/wiredtiger/mongo/commits/master?page=9

Here are the results. I only show Update.MmsIncShallow1 although all show similar patterns.

GOOD:
db version: 2.7.9-pre-
c84d5ea9f484948ba4f52bf00274bd84e45b837e
 
Update.MmsIncShallow1
1       10495.389587181788
1       10455.777272892861
2       15549.864735174962
2       15670.030276924275
4       25134.398199163585
4       25159.574244737894
8       32214.23890536317
8       32237.931283570488
Finished Testing.

BAD:
db version: 2.7.9-pre-
7f9e3f383b0bd77e5119a5d57883631950c1a6ff
 
Update.MmsIncShallow1
1       10431.11424139814
1       10637.16293351225
2       11245.63626083444
2       11264.545902335614
4       10201.82719233331
4       11287.419799605117
8       12088.974243081819
8       12127.817603989664
Finished Testing.



 Comments   
Comment by Greg Studer [ 10/Nov/14 ]

Opened one for you - SERVER-16046

Comment by Asya Kamsky [ 10/Nov/14 ]

I've got a build with this fix now, will see if it happens again - if so I will open a new ticket (maybe I should any way?)

Comment by Greg Studer [ 10/Nov/14 ]

Not sure this is the same issue - this was a very specific problem where a write wouldn't get retried if there was a conflict (but would report success). Here it seems like logOp() doesn't expect WriteConflicts in some cases.

Definitely an issue though, will ask repl team.

Comment by Asya Kamsky [ 10/Nov/14 ]

I think I just ran into this one... I'm on up-to-date master and I have this which appears to be identical, on a single document update by _id:

2014-11-10T08:34:33.382-0500 F REPLSETS [conn18] Fatal DBException in logOp(): 112 WriteConflict
2014-11-10T08:34:33.389-0500 F -        [conn18] terminate() called.
 
 0x101bf96 0x101bf39 0x7fcc3c910846 0x7fcc3c910873 0xcbb5b9 0xa73176 0xa7395e 0xc3437e 0xc34dff 0xb7a09f 0x9e113c 0x9e18a5 0x9e1aec 0x9e2155 0x9e510d 0xa0202d 0xa02da7 0xa038e1 0xc1fe6a 0xafff7a 0x7fc9ee 0xfd608d 0x7fcc3cf6ee9a 0x7fcc3c07e31d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"C1BF96"},{"b":"400000","o":"C1BF39"},{"b":"7FCC3C85B000","o":"B5846"},{"b":"7FCC3C85B000","o":"B5873"},{"b":"400000","o":"8BB5B9"},{"b":"400000","o":"673176"},{"b":"400000","o":"67395E"},{"b":"400000","o":"83437E"},{"b":"400000","o":"834DFF"},{"b":"400000","o":"77A09F"},{"b":"400000","o":"5E113C"},{"b":"400000","o":"5E18A5"},{"b":"400000","o":"5E1AEC"},{"b":"400000","o":"5E2155"},{"b":"400000","o":"5E510D"},{"b":"400000","o":"60202D"},{"b":"400000","o":"602DA7"},{"b":"400000","o":"6038E1"},{"b":"400000","o":"81FE6A"},{"b":"400000","o":"6FFF7A"},{"b":"400000","o":"3FC9EE"},{"b":"400000","o":"BD608D"},{"b":"7FCC3CF67000","o":"7E9A"},{"b":"7FCC3BF8A000","o":"F431D"}],"processInfo":{ "mongodbVersion" : "2.7.9-pre-", "gitVersion" : "70dff8ebbac4ddc24ad5a742f7a27644a25ebfc8", "uname" : { "sysname" : "Linux", "release" : "3.5.0-54-generic", "version" : "#81~precise1-Ubuntu SMP Tue Jul 15 04:02:22 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "468C50BB421B3B21CB9AF219DEA41B331A318C7F" }, { "b" : "7FFFBF8FF000", "elfType" : 3, "buildId" : "CAB4773A6DC38342CE3FAEC3A776EF5FC8A277D4" }, { "b" : "7FCC3D184000", "path" : "/home/asya/wiredtiger/LOCAL_INSTALL/lib/libwiredtiger-2.4.2.so", "elfType" : 3, "buildId" : "7C4BC69DD4195055E3D917E76604B96A62C49C20" }, { "b" : "7FCC3CF67000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "C340AF9DEE97C17C730F7D03693286C5194A46B8" }, { "b" : "7FCC3CD5F000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "352C5B373A50E6C4AB881A5DB6F5766FDF81EEE0" }, { "b" : "7FCC3CB5B000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "D181AF551DBBC43E9D55913D532635FDE18E7C4E" }, { "b" : "7FCC3C85B000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "B534DA725D06A04267EB2FEB92B9CC14C838B57B" }, { "b" : "7FCC3C55F000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "817AA99B3DD02501F8BC04A3E9A9358A08F20D7D" }, { "b" : "7FCC3C349000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "ECF322A96E26633C5D10F18215170DD4395AF82C" }, { "b" : "7FCC3BF8A000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "E7863FF2EA3F4B5C6FF90689ADE4D6B5F3A78020" }, { "b" : "7FCC3D474000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "E25AD1A11CCF57E734116B8EC9C69F643DCA9F18" }, { "b" : "7FCC3BD73000", "path" : "/lib/x86_64-linux-gnu/libz.so.1", "elfType" : 3, "buildId" : "F695ECFCF3918D5D34989398A14B7ECDD9F46CD0" }, { "b" : "7FCC3BB6D000", "path" : "/usr/lib/libsnappy.so.1", "elfType" : 3, "buildId" : "73722FE265941076386865069324C698E3398297" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x26) [0x101bf96]
 mongod(+0xC1BF39) [0x101bf39]
 libstdc++.so.6(+0xB5846) [0x7fcc3c910846]
 libstdc++.so.6(+0xB5873) [0x7fcc3c910873]
 mongod(_ZN5mongo4repl5logOpEPNS_16OperationContextEPKcS4_RKNS_7BSONObjEPS5_Pbb+0x1E9) [0xcbb5b9]
 mongod(_ZN5mongo11UpdateStage18transformAndUpdateERNS_7BSONObjERNS_7DiskLocE+0x746) [0xa73176]
 mongod(_ZN5mongo11UpdateStage4workEPm+0x35E) [0xa7395e]
 mongod(_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_7DiskLocE+0xAE) [0xc3437e]
 mongod(_ZN5mongo12PlanExecutor11executePlanEv+0x3F) [0xc34dff]
 mongod(_ZN5mongo14UpdateExecutor7executeEPNS_8DatabaseE+0x9F) [0xb7a09f]
 mongod(+0x5E113C) [0x9e113c]
 mongod(_ZN5mongo18WriteBatchExecutor10execUpdateERKNS_12BatchItemRefEPNS_7BSONObjEPPNS_16WriteErrorDetailE+0x75) [0x9e18a5]
 mongod(_ZN5mongo18WriteBatchExecutor11bulkExecuteERKNS_21BatchedCommandRequestEPSt6vectorIPNS_19BatchedUpsertDetailESaIS6_EEPS4_IPNS_16WriteErrorDetailESaISB_EE+0x13C) [0x9e1aec]
 mongod(_ZN5mongo18WriteBatchExecutor12executeBatchERKNS_21BatchedCommandRequestEPNS_22BatchedCommandResponseE+0x455) [0x9e2155]
 mongod(_ZN5mongo8WriteCmd3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x1FD) [0x9e510d]
 mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x3D) [0xa0202d]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xB47) [0xa02da7]
 mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x561) [0xa038e1]
 mongod(_ZN5mongo11newRunQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERNS_5CurOpES3_b+0x76A) [0xc1fe6a]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortEb+0xC3A) [0xafff7a]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xAE) [0x7fc9ee]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x4BD) [0xfd608d]
 libpthread.so.0(+0x7E9A) [0x7fcc3cf6ee9a]
 libc.so.6(clone+0x6D) [0x7fcc3c07e31d]
-----  END BACKTRACE  -----

Comment by Githook User [ 07/Nov/14 ]

Author:

{u'username': u'gregstuder', u'name': u'Greg Studer', u'email': u'greg@10gen.com'}

Message: SERVER-15998 don't fall-through on WriteConflictExceptions in write command update
Branch: master
https://github.com/mongodb/mongo/commit/be3bb1821242a7704ca1c6f6621bdd6abc4acfb4

Comment by Daniel Pasette (Inactive) [ 06/Nov/14 ]

Greg and I found where we're not handling the conflict correctly in writeCmd's

Comment by Daniel Pasette (Inactive) [ 06/Nov/14 ]

Thanks for that info. The crucial difference is the writeCmd.
I'm able to repro when I run with writeCmd=false. It disappears when I run with writeCmd=true. This seems like the WriteConflict exception is not being handled properly.

BAD (writeCmd=false)
dan:(git)mongo-perf[master]/$ python benchrun.py -m 1 -s ../mongo/mongo --mongo-repo-path ../mongo --writeCmd=false --trialTime 30 -t 1 1 2 2 4 4 8 8 -f testcases/simple_update_mms.js
db version: 2.7.9-pre-
62dd588ef3d22480feaa4ae8339785e94f199bcf
 
Update.MmsIncShallow1
1	3342.007601493934
2	3355.7675812491248
4	3190.0981064374614
8	3271.416252794708

GOOD (writeCmd=true)
dan:(git)mongo-perf[master]/$ python benchrun.py -m 1 -s ../mongo/mongo --mongo-repo-path ../mongo --writeCmd=true --trialTime 30 -t 1 2 4 8 -f testcases/simple_update_mms.js
db version: 2.7.9-pre-
62dd588ef3d22480feaa4ae8339785e94f199bcf
 
Update.MmsIncShallow1
1	3236.3579273469445
2	7846.795260829793
4	17047.81209983696
8	25630.17061336483

Comment by Susan LoVerso [ 06/Nov/14 ]

I am using an AWS HDD system, m2.4xlarge. 8 cores. Linux (I also turned WT logging off only because that was the baseline I had from the previous week.) Not sure what the difference is. Both my good numbers and bad numbers are very consistent with each other for all changesets I tested.

[sue@ip-10-159-40-39 (1) ~]$ uname -a
Linux ip-10-159-40-39 3.4.76-65.111.amzn1.x86_64 #1 SMP Tue Jan 14 21:06:49 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

My mongo-perf command is:

python benchrun.py -m 1 -s ../mongo/mongo --mongo-repo-path ../mongo --writeCmd=false --trialTime 30 -t 1 1 2 2 4 4 8 8 -f testcases/simple_update_mms.js

Comment by Daniel Pasette (Inactive) [ 06/Nov/14 ]

sue, I'm not able to reproduce this on an 8 core linux box. What OS are you using?

dan:(git)mongo-perf[master]/$ python benchrun.py -f testcases/simple_update_mms.js -t 1 4 8 16 --nodyno --shell ../mongo/mongo --mongo-repo-path ../mongo
db version: 2.7.9-pre-
62dd588ef3d22480feaa4ae8339785e94f199bcf
 
Update.MmsIncShallow1
1	3346.1623507254394
4	19005.22801025671
8	27757.409364448846
16	29142.766271238834

Comment by Matt Kangas [ 06/Nov/14 ]

Commit range noted above: https://github.com/wiredtiger/mongo/compare/c84d5ea...7f9e3f3

Generated at Thu Feb 08 03:39:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.