Type: Bug
Resolution: Incomplete
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.0.4
Component/s: MapReduce, Sharding, WiredTiger
Labels:
None

Operating System:
ALL
Steps To Reproduce:

1. Install a sharded MongoDB 3.0.4 setup w/
2. Run a MapReduce (MR) job every 5 minutes
3. After a few days, the MR job (and the mongod daemon) will crash with the following error
4. The same failure also reproduces with MMAPv1

Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

We hit an error similar to ~~SERVER-16429~~ on version 3.0.4.
It happens about once a week in a sharded environment with the WiredTiger (WT) storage engine on the mongod instances.
The evidence is a crashed shard, while on the other shard a leftover tmp collection remains that was not deleted.
Attached is the error log from the failed shard:

2015-12-12T00:35:40.814+0000 I COMMAND  [conn92487] mr failed, removing collection :: caused by :: WriteConflict
2015-12-12T00:35:40.818+0000 I COMMAND  [conn92487] CMD: drop XXXX.tmp.mr.account_231015
2015-12-12T00:35:40.822+0000 I NETWORK  [initandlisten] connection accepted from XXX.XXX.XXX.XXX:XXXXX #110728 (47 connections now open)
2015-12-12T00:35:40.920+0000 I COMMAND  [conn92487] command XXXX.$cmd command: drop { drop: "tmp.mr.account_231015" } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:122 locks:{ Global: { acquireCount: { r: 8, w: 4 } }, Database: { acquireCount: { r: 1, w: 1, R: 1, W: 4 }, acquireWaitCount: { W: 4 }, timeAcquiringMicros: { W: 7627049380 } }, Collection: { acquireCount: { r: 1, w: 1, W: 1 } } } 102ms
2015-12-12T00:35:40.920+0000 I QUERY    [conn110694] query XXXX.endpoints query: { $query: { gw: { $gt: 0 }, $or: [ { status: "unmanaged" }, { status: "managed" } ] }, $readPreference: { mode: "secondaryPreferred" } } planSummary: IXSCAN { gw: -1.0, status: -1.0 } ntoreturn:0 ntoskip:0 nscanned:0 nscannedObjects:0 keyUpdates:0 writeConflicts:0 numYields:1 nreturned:0 reslen:20 locks:{ Global: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 2 }, acquireWaitCount: { r: 2 }, timeAcquiringMicros: { r: 3690878996 } }, Collection: { acquireCount: { r: 2 } } } 106ms
2015-12-12T00:35:40.920+0000 I QUERY    [conn110692] query XXXX.endpoints query: { $query: { gw: { $gt: 0 }, $or: [ { status: "unmanaged" }, { status: "managed" } ] }, $readPreference: { mode: "secondaryPreferred" } } planSummary: IXSCAN { gw: -1.0, status: -1.0 } ntoreturn:0 ntoskip:0 nscanned:0 nscannedObjects:0 keyUpdates:0 writeConflicts:0 numYields:3 nreturned:0 reslen:20 locks:{ Global: { acquireCount: { r: 8 } }, Database: { acquireCount: { r: 4 }, acquireWaitCount: { r: 4 }, timeAcquiringMicros: { r: 3690908386 } }, Collection: { acquireCount: { r: 4 } } } 182338ms
2015-12-12T00:35:40.927+0000 I COMMAND  [conn92491] command admin.$cmd command: listDatabases { listDatabases: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:290 locks:{ Global: { acquireCount: { r: 6 } }, Database: { acquireCount: { r: 3 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 14825915727 } } } 179101ms
2015-12-12T00:35:40.978+0000 E NETWORK  [conn92487] Uncaught std::exception: std::exception, terminating
2015-12-12T00:35:40.978+0000 I CONTROL  [conn92487] dbexit:  rc: 100
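
The log lines above report lock waits in the `timeAcquiringMicros` field, which is easy to misread at a glance. As a rough triage aid, the sketch below converts those values to seconds. It assumes the 3.0-style `locks:{ ... }` layout shown in this log and that the field holds cumulative microseconds spent waiting for locks; the helper name `lock_wait_seconds` is hypothetical and not part of any MongoDB tool.

```python
import re

# Matches the body of each "timeAcquiringMicros: { W: 7627049380 }" group.
_WAIT_RE = re.compile(r"timeAcquiringMicros: \{ ([^}]*) \}")
# Matches each "mode: microseconds" pair inside such a group.
_MODE_RE = re.compile(r"(\w+): (\d+)")


def lock_wait_seconds(log_line):
    """Return {lock mode: seconds waited}, summed over all lock levels in the line.

    Hypothetical helper; assumes the log format shown in this ticket.
    """
    totals = {}
    for group in _WAIT_RE.findall(log_line):
        for mode, micros in _MODE_RE.findall(group):
            totals[mode] = totals.get(mode, 0.0) + int(micros) / 1_000_000
    return totals


# Shortened copy of the "drop" command line from the log above.
sample = (
    'command XXXX.$cmd command: drop { drop: "tmp.mr.account_231015" } '
    "locks:{ Global: { acquireCount: { r: 8, w: 4 } }, Database: { acquireCount: "
    "{ r: 1, w: 1, R: 1, W: 4 }, acquireWaitCount: { W: 4 }, "
    "timeAcquiringMicros: { W: 7627049380 } } } 102ms"
)

if __name__ == "__main__":
    print(lock_wait_seconds(sample))
```

Under these assumptions, the `drop` command's database-level `W` wait works out to roughly 7,627 seconds (about two hours), far more than the 102 ms reported for the command itself.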

Assignee: Kelsey Schubert
Reporter: Moshe Kaplan [X]
Participants: Kelsey Schubert, Moshe Kaplan [X], Ramon Fernandez Marina
Votes: 0
Watchers: 9

Created: Dec 18 2015 01:45:31 PM UTC
Updated: Mar 05 2016 03:02:43 PM UTC
Resolved: Mar 05 2016 03:02:43 PM UTC
