Core Server / SERVER-28172

Renaming collection causes permanent performance issue on secondary

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Duplicate
    • Affects Version/s: 3.4.2
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Operating System:
      ALL
    • Steps To Reproduce:

      Here's the rough pseudocode to reproduce:

      // this aggregation uses disk, is in cursor mode, and returns roughly 35,000 rows
      List<DBObject> rows = doBigAggregation();
      DBCollection temp = db.getCollection("schoolusage_temp");
      temp.drop(); // just in case this process failed last night
      BulkWriteOperation bulkwrite = temp.initializeOrderedBulkOperation();
      for (DBObject row : rows) bulkwrite.insert(row); // in reality we insert in batches of 250
      bulkwrite.execute();
      temp.rename("schoolusage", true); // true = drop the target collection if it exists
      


      Description

      Every night, we do an aggregation, followed by a bulk insert, followed by a rename with the "drop target" option.

      Without fail, roughly one second (the time it takes for the rename to propagate to the secondary) after the rename completes, our secondary goes completely haywire and performance goes out the window. Performance does not recover unless we restart the secondary.

      This behavior did not occur until we upgraded from 3.2.10 to 3.4.2.

      Once performance drops, every couple of seconds we see hundreds of very simple queries pile up. They all block and then complete simultaneously. It looks as though it's some sort of concurrency issue.
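That "block, then complete simultaneously" pattern can be confirmed directly from the mongod log: if the stall is real, many slow-query lines will share the exact same completion timestamp. A minimal sketch (not part of this ticket; the class and method names are hypothetical) that groups slow-query log lines by their completion instant:

```java
import java.util.*;
import java.util.regex.*;

// Hypothetical helper: count how many slow-query log lines finish at the
// same instant. mongod 3.4 log lines begin with an ISO-8601 timestamp,
// e.g. "2017-03-02T16:02:32.034+0000", which is the op's completion time.
public class StallDetector {

    private static final Pattern TS = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3})");

    // Map each completion timestamp (to the millisecond) to the number of
    // slow-query lines that finished at that instant.
    public static Map<String, Integer> completionsByInstant(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = TS.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "2017-03-02T16:02:32.034+0000 I COMMAND [conn1328] ... 6356ms",
                "2017-03-02T16:02:32.034+0000 I COMMAND [conn1289] ... 6443ms",
                "2017-03-02T16:02:31.900+0000 I COMMAND [conn9999] ... 12ms");
        // Two of the three sample lines complete at the same millisecond.
        System.out.println(completionsByInstant(sample));
    }
}
```

A spike of dozens or hundreds of lines sharing one timestamp, repeating every few seconds, is consistent with queries queuing behind a lock and being released together.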

      Included are some graphs and logs that illustrate the problem.

      Server restarted at 22:00 and aggregation happened at 00:00:

      Same period in MMS:

      Here are some very simple queries taking over 6 seconds to complete:

      2017-03-02T16:02:32.034+0000 I COMMAND  [conn1328] command thinkmap.gamequestion command: find { find: "gamequestion", filter: { _id: 163258 }, projection: { workflow-writing: 0, votes: 0, ag: 0 }, limit: 1, singleBatch: true } planSummary: IDHACK keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:1 nreturned:1 reslen:740 locks:{ Global: { acquireCount: { r: 4 }, acquireWaitCount: { r: 2 }, timeAcquiringMicros: { r: 6288267 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { r: 2 } } } protocol:op_query 6356ms
      2017-03-02T16:02:32.034+0000 I COMMAND  [conn1289] command thinkmap.gamequestion command: find { find: "gamequestion", filter: { _id: 300683 }, projection: { workflow-writing: 0, votes: 0, ag: 0 }, limit: 1, singleBatch: true } planSummary: IDHACK keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:1 nreturned:1 reslen:642 locks:{ Global: { acquireCount: { r: 4 }, acquireWaitCount: { r: 2 }, timeAcquiringMicros: { r: 6378585 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { r: 2 } } } protocol:op_query 6443ms
      2017-03-02T16:02:32.034+0000 I COMMAND  [conn1166] command thinkmap.gamequestion command: find { find: "gamequestion", filter: { _id: 131553 }, projection: { workflow-writing: 0, votes: 0, ag: 0 }, limit: 1, singleBatch: true } planSummary: IDHACK keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:1 nreturned:1 reslen:526 locks:{ Global: { acquireCount: { r: 4 }, acquireWaitCount: { r: 2 }, timeAcquiringMicros: { r: 6555928 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { r: 2 } } } protocol:op_query 6623ms
      2017-03-02T16:02:32.034+0000 I COMMAND  [conn1369] command thinkmap.gamequestion command: find { find: "gamequestion", filter: { _id: 212580 }, projection: { workflow-writing: 0, votes: 0, ag: 0 }, limit: 1, singleBatch: true } planSummary: IDHACK keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:1 nreturned:1 reslen:683 locks:{ Global: { acquireCount: { r: 4 }, acquireWaitCount: { r: 2 }, timeAcquiringMicros: { r: 6193103 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { r: 2 } } } protocol:op_query 6257ms
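      Note that in each of these lines the `timeAcquiringMicros` figure accounts for nearly the entire reported duration, i.e. the op spent almost all of its time waiting on the global read lock rather than doing work. A small sketch (not from the ticket; names are hypothetical) that extracts both numbers from a log line for comparison:

```java
import java.util.regex.*;

// Hypothetical helper: pull the global-lock wait time and the total
// duration out of a mongod 3.4 slow-query log line of the form shown
// above, so the two can be compared.
public class LockWait {

    private static final Pattern WAIT =
            Pattern.compile("timeAcquiringMicros: \\{ r: (\\d+) \\}");
    private static final Pattern TOTAL = Pattern.compile("(\\d+)ms$");

    // Microseconds spent waiting to acquire the global read lock.
    public static long acquireWaitMicros(String line) {
        Matcher m = WAIT.matcher(line);
        return m.find() ? Long.parseLong(m.group(1)) : 0L;
    }

    // Total reported duration of the operation, in milliseconds.
    public static long totalMillis(String line) {
        Matcher m = TOTAL.matcher(line.trim());
        return m.find() ? Long.parseLong(m.group(1)) : 0L;
    }

    public static void main(String[] args) {
        String line = "... timeAcquiringMicros: { r: 6288267 } ... 6356ms";
        long waitMs = acquireWaitMicros(line) / 1000; // lock wait, in ms
        long total = totalMillis(line);               // whole op, in ms
        System.out.println("waited " + waitMs + "ms of " + total + "ms total");
    }
}
```

For the first log line above this yields roughly 6288 ms of lock wait out of a 6356 ms operation: the IDHACK lookup itself is trivial, and essentially all of the latency is lock acquisition.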
      

        Attachments

        1. bad-mongod.log.gz (5.64 MB)
        2. Screen Shot 2017-03-02 at 11.32.19 AM.png (94 kB)
        3. Screen Shot 2017-03-02 at 11.34.55 AM.png (137 kB)
        4. Screen Shot 2017-03-02 at 9.34.42 PM.png (505 kB)

          Issue Links

            Activity

              People

              Assignee:
              mark.agarunov Mark Agarunov
              Reporter:
              tinkler@vocabulary.com Marc Tinkler
              Participants:
              Votes:
              0
              Watchers:
              13

                Dates

                Created:
                Updated:
                Resolved: