[SERVER-16667] Isolated writes with extreme latencies in replica set under WiredTiger Created: 24/Dec/14  Updated: 18/Jan/15  Resolved: 06/Jan/15

Status: Closed
Project: Core Server
Component/s: Concurrency
Affects Version/s: 2.8.0-rc4
Fix Version/s: 2.8.0-rc5

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Michael Cahill (Inactive)
Resolution: Done Votes: 0
Labels: wiredtiger
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File capped-truncate-latencies.png     Text File gdbmon.log     PNG File oplog-inserts-overtake-deletes.png     PNG File profile.png     PNG File wt-pre4-jp-100-rs0.png     PNG File wt-pre4-jp-100-rs1.png     PNG File wt-schwerin-jp-100-rs1-fix.png     PNG File wt-schwerin-jp-100-rs1-nofix.png    
Issue Links:
Duplicate
is duplicated by SERVER-16868 Significant slowdown in capped collec... Closed
Related
is related to SERVER-16651 Mass deletions and updates (including... Closed
is related to SERVER-16640 Long running queries on the secondary... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   
  • tests run on Ubuntu, VMware, 6 cores.
  • heavy mixed workload (read, update, insert) with 100 clients; a minimal driver sketch follows this list.
  • using a recent pre-rc4 build from master (6105f06402fe1e7578d41f3e4e583a1476ef2455); subsequently reproduced under rc4.
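For reference, here is a minimal sketch of such a mixed workload driver, assuming pymongo; the testsrv.data namespace matches the slow-op log lines below, while the connection string, thread count, operation mix, and document shape are illustrative only, not the actual test harness.

# Minimal sketch of the mixed read/update/insert workload, assuming pymongo.
# The testsrv.data namespace matches the log lines below; everything else
# (connection string, ratios, document shape) is an illustrative assumption.
import random
import threading
from bson import ObjectId
from pymongo import MongoClient

NUM_CLIENTS = 100        # 100 client threads, per the description
OPS_PER_CLIENT = 100000  # illustrative

def worker():
    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
    coll = client.testsrv.data
    ids = []
    for _ in range(OPS_PER_CLIENT):
        op = random.random()
        if op < 0.4 and ids:                      # read
            coll.find_one({"_id": random.choice(ids)})
        elif op < 0.7 and ids:                    # update
            coll.update_one({"_id": random.choice(ids)},
                            {"$set": {"f0": random.randint(0, 1 << 30)}})
        else:                                     # insert
            _id = {"o": ObjectId(), "seq": random.getrandbits(63)}
            coll.insert_one({"_id": _id, "f2": "x" * 50})
            ids.append(_id)

threads = [threading.Thread(target=worker) for _ in range(NUM_CLIENTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()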

Run with replica set of 1:

  • Throughout the run mongod write queues (second line) sporadically build up, indicating a write bottleneck in mongod. This may be the same issue as SERVER-16651.
  • At a couple of points during the run (A, B, C), the mongod log reports isolated (single) write operations (updates or inserts) taking extremely long times, up to 111891 ms in this test (you can read this off the "max logged query" graph: the vertical axis is the longest operation, in ms, logged in each 0.1-second interval). It is unknown whether this is the same bottleneck or a different issue, so opening this issue to investigate.
  • The pattern seen in the graph above (three or so such isolated extreme latencies of increasing size throughout the run) is repeatable with this workload; each one is roughly double the length of the preceding one.

For comparison, here's the same workload standalone:

  • Write queues do not build.
  • We see pauses in WT (e.g. A, B; subject of SERVER-16662). Those pauses are less evident in the replica set case above, probably because the mongod bottleneck in that case throttles the workload going into WT.
  • Associated with the pauses are numerous long-latency operations of around 11 seconds, but no extreme long-latency operations like those seen in the preceding test.
  • Overall throughput is much higher in the standalone case, possibly because of the bottleneck in the replica set case.

Here are the specific queries, two updates and one insert, that showed extreme latencies:

2014-12-24T12:17:14.182-0500 I QUERY    [conn95] command testsrv.$cmd command: update { update: "data", writeConcern: {}, ordered: true, updates: [ { q: { _id: { o: ObjectId('549af4f61d41c883ae5622d2'), t: "This is a string, not a small string either, it's in here to make the index larger", seq: -1729382253017956352 } }, u: { $set: { f0: -1097749151, f1: -1097691211, f2: "bkfiv   Wfi vWfiv Wf ivW fivW fivW  fiv Wfiv Wf i ", f3: "dsz   szs zsz szs zs zszszs z szsz  szszs z szs zs", f4: "xa jmH szsz sz sz sz sz szszs zszsz sz  s zszszs z", f5: "szsz  szs zsz szs zs zszszs z szsz  szszs z szs zs", f6: "ivWfi  vWfi vWf ivW fiv  Wf i vW f  ivW f ivWfiv W", f7: "yvWfiv Wfiv Wf iv Wf iv WfivW fivWf iv  W fivWfi v", f8: "mH s   zszs zsz szs zsz  sz s zs z  szs z szszsz s", f9: "rqjmH   szs zszsz sz szs zszs zszs  zsz szsz sz s ", fa: 47242, fb: 91501, fc: "ajmH s zszs zs zszsz sz  szsz szsz  szszs zszsz sz", fd: "mHszsz szsz sz sz sz sz szszs zszsz sz  s zszszs z", fe: "uHsz  szs zsz szs zs zszszs z szsz  szszs z szs zs", ff: "zszsz   szs zszsz sz szs zszs zszs  zsz szsz sz s ", f10: "kfivW f i v Wf ivWfi vW fi  v Wfiv  WfivW f ivW fi", f11: "tkfiv Wfi v W f i vWfiv Wf iv WfivW fivWf ivWfi vW", f12: "zszs   zszs zsz szs zsz  sz s zs z  szs z szszsz s", f13: "pKNkf   ivW fivWf iv Wfi vWfi vWfi  vWf ivWf iv W ", f14: 51733, f15: 90312, f16: "qjmHs zs zszs zsz s zsz szs zsz szs zszsz sz szsz ", f17: "yv W   fivW fiv Wfi vWf  iv W fi v  Wfi v WfivWf i", f18: "bkf i vWfiv Wf ivWf ivW fiv W f iv  Wfi v WfivW fi", f19: "ajmHs zs zszs zsz s zsz szs zsz szs zszsz sz szsz ", f1a: "bkfivW fivW fiv Wfi vWf  iv W fi v  Wfi v WfivWf i", f1b: "cnajmH  szs zszsz sz szs zszs zszs  zsz szsz sz s ", f1c: "cn a   jmHs zsz szs zsz  sz s zs z  szs z szszsz s", f1d: "gt k   fivW fiv Wfi vWf  iv W fi v  Wfi v WfivWf i" } }, multi: false, upsert: false } ] } ntoreturn:1 keyUpdates:0  reslen:95 24494ms
2014-12-24T12:18:06.958-0500 I QUERY    [conn100] command testsrv.$cmd command: insert { insert: "data", writeConcern: {}, ordered: true, documents: [ { _id: { o: ObjectId('549af4f61d41c883a908b7c2'), t: "This is a string, not a small string either, it's in here to make the index larger", seq: 3819052484899373056 }, f0: 1826216753, f1: 1826277117, f2: "bkfivW fivW fiv Wfi vWf  iv W fi v  Wfi v WfivWf i", f3: "lKNkf iv  W f ivW f ivWf ivWf ivWf ivWfiv WfivW fi", f4: "wNkfi  vWfi vWf ivW fiv  Wf i vW f  ivW f ivWfiv W", f5: "qj m   Hszs zsz szs zsz  sz s zs z  szs z szszsz s", f6: "xajmH   szs zszsz sz szs zszs zszs  zsz szsz sz s ", f7: "oHszs zs zszs zsz s zsz szs zsz szs zszsz sz szsz ", f8: "zsz   szs zsz szs zs zszszs z szsz  szszs z szs zs", f9: "sz s   zszs zsz szs zsz  sz s zs z  szs z szszsz s", fa: 54913, fb: 94391, fc: "szsz  szs zsz szs zs zszszs z szsz  szszs z szs zs", fd: "ivWfi vW fivW fiv W fiv Wfi vWf ivW fivWf iv Wfiv ", fe: "jmHszs zszs zsz szs zsz  sz s zs z  szs z szszsz s", ff: "fivWf iv  W f ivW f ivWf ivWf ivWf ivWfiv WfivW fi", f10: "rq jmH szsz sz sz sz sz szszs zszsz sz  s zszszs z", f11: "wNkfi vW fivW fiv W fiv Wfi vWf ivW fivWf iv Wfiv ", f12: "lKN k fivWf iv Wfiv Wfi vWf i v Wf  ivW f ivWfi vW", f13: "vWf i vWfiv Wf ivWf ivW fiv W f iv  Wfi v WfivW fi", f14: 50071, f15: 107864, f16: "mHsz    szs zszsz sz szs zszs zszs  zsz szsz sz s ", f17: "ezszsz szsz sz sz sz sz szszs zszsz sz  s zszszs z", f18: "yvWfiv Wfiv Wf iv Wf iv WfivW fivWf iv  W fivWfi v", f19: "tkfiv Wf  i v Wfi v Wfiv Wfiv Wfiv WfivWf ivWfi vW", f1a: "ivWfi  vWfi vWf ivW fiv  Wf i vW f  ivW f ivWfiv W", f1b: "fivWf   ivW fivWf iv Wfi vWfi vWfi  vWf ivWf iv W ", f1c: "mHsz    szs zszsz sz szs zszs zszs  zsz szsz sz s ", f1d: "oHsz    szs zszsz sz szs zszs zszs  zsz szsz sz s " } ] } ntoreturn:1 keyUpdates:0  reslen:80 52917ms
2014-12-24T12:19:58.656-0500 I QUERY    [conn39] command testsrv.$cmd command: update { update: "data", writeConcern: {}, ordered: true, updates: [ { q: { _id: { o: ObjectId('549af4f61d41c8837743de22'), t: "This is a string, not a small string either, it's in here to make the index larger", seq: 3602879702735257600 } }, u: { $set: { f0: -1097747785, f1: -1097704122, f2: "rqj m Hszsz sz szsz szs zsz s z sz  szs z szszs zs", f3: "mHszs zszsz sz szsz szs zsz s z sz  szs z szszs zs", f4: "bkfivW fivW fiv Wfi vWf  iv W fi v  Wfi v WfivWf i", f5: "vWfiv   Wfi vWfiv Wf ivW fivW fivW  fiv Wfiv Wf i ", f6: "fivW   fivW fiv Wfi vWf  iv W fi v  Wfi v WfivWf i", f7: "xajmH sz  s z szs z szsz szsz szsz szszsz szszs zs", f8: "jmH   szs zsz szs zs zszszs z szsz  szszs z szs zs", f9: "ajmH s zszs zs zszsz sz  szsz szsz  szszs zszsz sz", fa: 44944, fb: 82734, fc: "qjmHs zs zszs zsz s zsz szs zsz szs zszsz sz szsz ", fd: "uH s   zszs zsz szs zsz  sz s zs z  szs z szszsz s", fe: "mHszsz szsz sz sz sz sz szszs zszsz sz  s zszszs z", ff: "sz s   zszs zsz szs zsz  sz s zs z  szs z szszsz s", f10: "dszszs zszs zs zszsz sz  szsz szsz  szszs zszsz sz", f11: "tkfiv   Wfi vWfiv Wf ivW fivW fivW  fiv Wfiv Wf i ", f12: "ezszs zszsz sz szsz szs zsz s z sz  szs z szszs zs", f13: "ivWf  ivW fiv Wfi vW fivWfi v Wfiv  WfivW f ivW fi", f14: 40990, f15: 88315, f16: "oHsz    szs zszsz sz szs zszs zszs  zsz szsz sz s ", f17: "ivWfi vW fivW fiv W fiv Wfi vWf ivW fivWf iv Wfiv ", f18: "xa jmH szsz sz sz sz sz szszs zszsz sz  s zszszs z", f19: "iv W   fivW fiv Wfi vWf  iv W fi v  Wfi v WfivWf i", f1a: "fivW   fivW fiv Wfi vWf  iv W fi v  Wfi v WfivWf i", f1b: "bk fiv Wfiv Wf iv Wf iv WfivW fivWf iv  W fivWfi v", f1c: "kfivWf ivWf iv Wf iv Wf ivWfi vWfiv Wf  i vWfivW f", f1d: "bkfi   vWfi vWf ivW fiv  Wf i vW f  ivW f ivWfiv W" } }, multi: false, upsert: false } ] } ntoreturn:1 keyUpdates:0  reslen:95 111891ms



 Comments   
Comment by Michael Cahill (Inactive) [ 06/Jan/15 ]

The issues about scaling with many more threads than cores are captured in SERVER-16662.

Comment by Githook User [ 02/Jan/15 ]

Author: Michael Cahill (michaelcahill) <michael.cahill@wiredtiger.com>

Message: SERVER-16667 Fix WT truncate of all records in a capped collection.
Branch: master
https://github.com/mongodb/mongo/commit/53d7bce974d18fc4a92262de5ac6e604b79d8a04

Comment by Githook User [ 02/Jan/15 ]

Author: Michael Cahill (michaelcahill) <michael.cahill@wiredtiger.com>

Message: SERVER-16667 Only truncate if we find some documents.
Branch: master
https://github.com/mongodb/mongo/commit/2f6817253ebfef52f06c55f5a58a799997e014ad

Comment by Githook User [ 02/Jan/15 ]

Author: Michael Cahill (michaelcahill) <michael.cahill@wiredtiger.com>

Message: SERVER-16667 Faster capped collections in WT.

Use a single WiredTiger truncate operation rather than multiple delete
operations to delete from a capped collection.
Branch: master
https://github.com/mongodb/mongo/commit/1b522cf5745c40754f9d5aa4f4e35def4e8fee74
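This change replaces the per-record cursor removes that previously trimmed a capped collection with a single WiredTiger range truncate. The contrast at the WiredTiger API level looks roughly like the sketch below, written against the WiredTiger Python bindings; the table name, key format, and record ranges are illustrative assumptions, not the server's actual capped-collection code.

# Sketch of per-record deletes vs. a single range truncate in WiredTiger,
# using the wiredtiger Python API; 'table:capped' and the key ranges are
# illustrative only, not MongoDB's capped-collection implementation.
import os
from wiredtiger import wiredtiger_open

os.makedirs("WT_HOME", exist_ok=True)
conn = wiredtiger_open("WT_HOME", "create")
session = conn.open_session()
session.create("table:capped", "key_format=q,value_format=S")

cursor = session.open_cursor("table:capped")
for k in range(1, 10001):                 # populate 10k records
    cursor.set_key(k)
    cursor.set_value("payload")
    cursor.insert()
cursor.close()

# Old approach: one remove() call per record trimmed from the collection.
cursor = session.open_cursor("table:capped")
for k in range(1, 5001):
    cursor.set_key(k)
    cursor.remove()
cursor.close()

# New approach: a single truncate call deletes an entire key range at once.
start = session.open_cursor("table:capped")
start.set_key(5001)
start.search()
stop = session.open_cursor("table:capped")
stop.set_key(7500)
stop.search()
session.truncate(None, start, stop, None)
start.close()
stop.close()
conn.close()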

Comment by Githook User [ 29/Dec/14 ]

Author: Michael Cahill (michaelcahill) <michael.cahill@wiredtiger.com>

Message: SERVER-16667 Have threads block if they insert enough additional records while a capped collection is being truncated.
Branch: capped-truncate
https://github.com/mongodb/mongo/commit/e15538f67138812d647d047efea4c3fb5e5023de
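The commit message describes a back-pressure mechanism for writers. A hypothetical sketch of the shape of that idea follows; this is not the mongod code, and the class name and threshold are invented purely for illustration: inserters that get too far ahead of an in-progress truncation wait until the truncating thread finishes.

# Hypothetical illustration of bounding how far inserters can run ahead of an
# in-progress capped-collection truncation; not the mongod implementation.
import threading

class CappedTruncateGate:
    MAX_EXCESS_DOCS = 1000          # hypothetical threshold

    def __init__(self):
        self._cond = threading.Condition()
        self._truncating = False
        self._inserted_during_truncate = 0

    def before_insert(self):
        with self._cond:
            # Block if a truncate is running and we have already inserted
            # "enough additional records" since it started.
            while (self._truncating and
                   self._inserted_during_truncate >= self.MAX_EXCESS_DOCS):
                self._cond.wait()
            if self._truncating:
                self._inserted_during_truncate += 1

    def begin_truncate(self):
        with self._cond:
            self._truncating = True
            self._inserted_during_truncate = 0

    def end_truncate(self):
        with self._cond:
            self._truncating = False
            self._cond.notify_all()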

Comment by Githook User [ 29/Dec/14 ]

Author: Michael Cahill (michaelcahill) <michael.cahill@wiredtiger.com>

Message: SERVER-16667 Use a single truncate operation to manage capped collections rather than multiple deletes.

Bound the work done managing capped collections for any individual update to avoid long latencies if deletes can't keep up with inserts.
Branch: capped-truncate
https://github.com/mongodb/mongo/commit/79a784b2809ad1ed7ce5be361f66f86d8e7049ea

Comment by Bruce Lucas (Inactive) [ 26/Dec/14 ]

Looks like the isolated ops with extreme latencies are stuck inside WT, not the mongod layer.

Here's a long op that ran for 78 seconds on conn76, from about 09:34:43 to 09:36:01:

2014-12-26T09:36:01.448-0500 I QUERY    [conn76] command testsrv.$cmd command: update { update: "data", writeConcern: {}, ordered: true, updates: [ { q: { _id: { o: ObjectId('549d718a1d41c848573ac682'), t: "This is a string, not a small string either, it's in here to make the index larger", seq: -1297036684394758144 } }, u: { $set: { f0: -295740523, f1: -295695630, f2: "dszsz szs zsz szs zs zszszs z szsz  szszs z szs zs", f3: "jmHsz sz  s z szs z szsz szsz szsz szszsz szszs zs", f4: "hgt   kfi vWf ivW fi vWfivW f ivWf  ivWfi v Wfi vW", f5: "uHsz s zszs zs zszsz sz  szsz szsz  szszs zszsz sz", f6: "rqjmH szs zsz szs zs zszszs z szsz  szszs z szs zs", f7: "qj m   Hszs zsz szs zsz  sz s zs z  szs z szszsz s", f8: "kfivW f i v Wf ivWfi vW fi  v Wfiv  WfivW f ivW fi", f9: "wNkfi v W f iv WfivW fi vW  f ivWf  ivWfi v Wfi vW", fa: 38995, fb: 103702, fc: "ds zsz szsz sz sz sz sz szszs zszsz sz  s zszszs z", fd: "uHszs  zszs zsz szs zsz  sz s zs z  szs z szszsz s", fe: "jm Hsz szsz sz sz sz sz szszs zszsz sz  s zszszs z", ff: "kfivWf ivWf iv Wf iv Wf ivWfi vWfiv Wf  i vWfivW f", f10: "gtkfiv Wfiv Wf iv Wf iv WfivW fivWf iv  W fivWfi v", f11: "zsz   szs zsz szs zs zszszs z szsz  szszs z szs zs", f12: "ezszs  zszs zsz szs zsz  sz s zs z  szs z szszsz s", f13: "uHsz s zszs zs zszsz sz  szsz szsz  szszs zszsz sz", f14: 52194, f15: 104076, f16: "fivWf   ivW fivWf iv Wfi vWfi vWfi  vWf ivWf iv W ", f17: "mHsz  szs zsz szs zs zszszs z szsz  szszs z szs zs", f18: "naj   mHs zsz szs zs zszszs z szsz  szszs z szs zs", f19: "gtkfiv Wfiv Wf iv Wf iv WfivW fivWf iv  W fivWfi v", f1a: "mH s   zszs zsz szs zsz  sz s zs z  szs z szszsz s", f1b: "hgtk   fivW fiv Wfi vWf  iv W fi v  Wfi v WfivWf i", f1c: "oHsz  szs zsz szs zs zszszs z szsz  szszs z szs zs", f1d: "najmH szs z s z s zszsz sz sz szszs zszsz szszs zs" } }, multi: false, upsert: false } ] } ntoreturn:1 keyUpdates:0  reslen:95 78144ms

I had enhanced the "end connection" message to include the tid at the end of the message, which is 18578 in this case:

2014-12-26T09:36:01.449-0500 I NETWORK  [conn76] end connection 127.0.0.1:33528 (1 connection now open) 18578

The above run was done with gdb sampling every 5 seconds throughout the run. Focusing on the call tree for those stack samples for just tid (LWP) 18578, we see that it spends its time during that long-running op (marked with blue lines below) doing sched_yield in one of three places inside WT, all within __wt_log_write:

Also attaching the raw stack-trace samples from gdb as gdbmon.log, from which the above call tree was generated; there we can double-check that LWP 18578 is seen in __wt_log_write calling sched_yield in every sample from 09:34:46.342897 onwards, corresponding to the time that the op was stuck.
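For anyone wanting to reproduce the sampling, something along these lines produces a gdbmon.log-style capture. This is only a sketch: the pid and output path are assumptions, while the 5-second interval matches the run described above.

# Sketch of periodic gdb stack sampling: attach to mongod every 5 seconds,
# dump all thread backtraces, detach. Pid and output path are assumptions.
import datetime
import subprocess
import time

MONGOD_PID = "12345"        # substitute the real mongod pid

with open("gdbmon.log", "a") as out:
    while True:
        out.write("=== %s ===\n" % datetime.datetime.now().isoformat())
        out.flush()
        subprocess.run(
            ["gdb", "-p", MONGOD_PID, "-batch",
             "-ex", "set pagination off",
             "-ex", "thread apply all bt"],
            stdout=out, stderr=subprocess.DEVNULL)
        time.sleep(5)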
