[SERVER-20738] Oplog stones does not enforce ascending order of RecordIds Created: 02/Oct/15  Updated: 15/Oct/15  Resolved: 13/Oct/15

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 3.1.8
Fix Version/s: 3.2.0-rc0

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Max Hirschhorn
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-19551 Keep "milestones" against WT oplog to... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: QuInt A (10/12/15), QuInt B (11/02/15)
Participants:

 Description   

Had a failure during a patch build where WT_SESSION::truncate() returned an error that the starting position of the truncate is after the stop position.

[js_test:fsm_all_replication] 2015-10-02T06:36:50.117+0000 d20010| 2015-10-02T06:36:50.100+0000 E STORAGE  [WT RecordStoreThread: local.oplog.rs] WiredTiger (22) [1443767810:97093][18185:0x2b3eed902940], WT_SESSION.truncate: the start cursor position is after the stop cursor position: Invalid argument
[js_test:fsm_all_replication] 2015-10-02T06:36:50.118+0000 d20010| 2015-10-02T06:36:50.100+0000 I -        [WT RecordStoreThread: local.oplog.rs] Invariant failure: session->truncate(session, nullptr, start, end, nullptr) resulted in status BadValue 22: Invalid argument at src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp 1131
[js_test:fsm_all_replication] 2015-10-02T06:36:50.118+0000 d20010| 2015-10-02T06:36:50.110+0000 I CONTROL  [WT RecordStoreThread: local.oplog.rs]
[js_test:fsm_all_replication] 2015-10-02T06:36:50.119+0000 d20010|  0x1262992 0x1210914 0x11fcb6d 0xff47ce 0xffe03b 0xffe4d8 0x1200827 0x19a5d60 0x2b3ee024783d 0x2b3ee0532fdd
[js_test:fsm_all_replication] 2015-10-02T06:36:50.119+0000 d20010| ----- BEGIN BACKTRACE -----
[js_test:fsm_all_replication] 2015-10-02T06:36:50.121+0000 d20010| {"backtrace":[{"b":"400000","o":"E62992"},{"b":"400000","o":"E10914"},{"b":"400000","o":"DFCB6D"},{"b":"400000","o":"BF47CE"},{"b":"400000","o":"BFE03B"},{"b":"400000","o":"BFE4D8"},{"b":"400000","o":"E00827"},{"b":"400000","o":"15A5D60"},{"b":"2B3EE0241000","o":"683D"},{"b":"2B3EE045E000","o":"D4FDD"}],"processInfo":{ "mongodbVersion" : "3.1.9-pre-", "gitVersion" : "3223f84a8eeaf89a30d6789038e5d68c7b019108", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "2.6.18-194.el5xen", "version" : "#1 SMP Tue Mar 16 22:01:26 EDT 2010", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "2B3EDF6A2000", "path" : "/lib64/librt.so.1", "elfType" : 3 }, { "b" : "2B3EDF8AB000", "path" : "/lib64/libdl.so.2", "elfType" : 3 }, { "b" : "2B3EDFAAF000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3 }, { "b" : "2B3EDFDB0000", "path" : "/lib64/libm.so.6", "elfType" : 3 }, { "b" : "2B3EE0033000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3 }, { "b" : "2B3EE0241000", "path" : "/lib64/libpthread.so.0", "elfType" : 3 }, { "b" : "2B3EE045E000", "path" : "/lib64/libc.so.6", "elfType" : 3 }, { "b" : "2B3EDF484000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
[js_test:fsm_all_replication] 2015-10-02T06:36:50.122+0000 d20010|  mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x1262992]
[js_test:fsm_all_replication] 2015-10-02T06:36:50.122+0000 d20010|  mongod(_ZN5mongo10logContextEPKc+0x134) [0x1210914]
[js_test:fsm_all_replication] 2015-10-02T06:36:50.122+0000 d20010|  mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0xAD) [0x11fcb6d]
[js_test:fsm_all_replication] 2015-10-02T06:36:50.122+0000 d20010|  mongod(_ZN5mongo21WiredTigerRecordStore12reclaimOplogEPNS_16OperationContextE+0x76E) [0xff47ce]
[js_test:fsm_all_replication] 2015-10-02T06:36:50.123+0000 d20010|  mongod(+0xBFE03B) [0xffe03b]
[js_test:fsm_all_replication] 2015-10-02T06:36:50.123+0000 d20010|  mongod(+0xBFE4D8) [0xffe4d8]
[js_test:fsm_all_replication] 2015-10-02T06:36:50.123+0000 d20010|  mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0x157) [0x1200827]
[js_test:fsm_all_replication] 2015-10-02T06:36:50.123+0000 d20010|  mongod(+0x15A5D60) [0x19a5d60]
[js_test:fsm_all_replication] 2015-10-02T06:36:50.124+0000 d20010|  libpthread.so.0(+0x683D) [0x2b3ee024783d]
[js_test:fsm_all_replication] 2015-10-02T06:36:50.124+0000 d20010|  libc.so.6(clone+0x6D) [0x2b3ee0532fdd]
[js_test:fsm_all_replication] 2015-10-02T06:36:50.124+0000 d20010| -----  END BACKTRACE  -----
[js_test:fsm_all_replication] 2015-10-02T06:36:50.124+0000 d20010| 2015-10-02T06:36:50.110+0000 I -        [WT RecordStoreThread: local.oplog.rs]
[js_test:fsm_all_replication] 2015-10-02T06:36:50.125+0000 d20010|
[js_test:fsm_all_replication] 2015-10-02T06:36:50.125+0000 d20010| ***aborting after invariant() failure

Proposal is to skip creating a stone in WiredTigerRecordStore::OplogStones::createNewStoneIfNeeded() if the specified RecordId comes before the most recently created stone's last record.



 Comments   
Comment by Githook User [ 13/Oct/15 ]

Author:

{u'username': u'visemet', u'name': u'Max Hirschhorn', u'email': u'max.hirschhorn@mongodb.com'}

Message: SERVER-20738 Enforce ascending order of 'lastRecord' in oplog stones.

Skip creating a new stone when the record's position comes before the
most recently created stone.
Branch: master
https://github.com/mongodb/mongo/commit/16d50744c22953720be2cd96fd536c6416d96295

Comment by Jonathan Abrahams [ 09/Oct/15 ]

I saw the same thing as well in 3.2.0-rc0-pre patch build :

[js_test:fsm_all_replication] 2015-10-09T15:43:23.993+0000 d20010| 2015-10-09T15:43:23.896+0000 I COMMAND  [conn795] command local.oplog.rs command: getMore { getMore: 23628610977, collection: "oplog.rs", maxTimeMS: 5000, term: 1, lastKnownCommittedOpTime: { ts: Timestamp 1444405403000|1569, t: 1 } } ntoreturn:1 ntoskip:0 keyUpdates:0 writeConflicts:0 numYields:3 reslen:1848575 locks:{ Global: { acquireCount: { r: 8 } }, Database: { acquireCount: { r: 4 } }, oplog: { acquireCount: { r: 4 } } } protocol:op_command 109ms
[js_test:fsm_all_replication] 2015-10-09T15:43:24.019+0000 d20010| 2015-10-09T15:43:23.901+0000 I COMMAND  [conn794] command local.oplog.rs command: getMore { getMore: 23785421040, collection: "oplog.rs", maxTimeMS: 5000, term: 1, lastKnownCommittedOpTime: { ts: Timestamp 1444405403000|1438, t: 1 } } ntoreturn:1 ntoskip:0 keyUpdates:0 writeConflicts:0 numYields:4 reslen:2415710 locks:{ Global: { acquireCount: { r: 10 } }, Database: { acquireCount: { r: 5 } }, oplog: { acquireCount: { r: 5 } } } protocol:op_command 127ms
[js_test:fsm_all_replication] 2015-10-09T15:43:24.468+0000 d20010| 2015-10-09T15:43:24.467+0000 I COMMAND  [conn795] command local.oplog.rs command: getMore { getMore: 23628610977, collection: "oplog.rs", maxTimeMS: 5000, term: 1, lastKnownCommittedOpTime: { ts: Timestamp 1444405404000|697, t: 1 } } ntoreturn:1 ntoskip:0 keyUpdates:0 writeConflicts:0 numYields:0 reslen:67435 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, oplog: { acquireCount: { r: 1 } } } protocol:op_command 102ms
[js_test:fsm_all_replication] 2015-10-09T15:43:24.471+0000 d20010| 2015-10-09T15:43:24.467+0000 I COMMAND  [conn794] command local.oplog.rs command: getMore { getMore: 23785421040, collection: "oplog.rs", maxTimeMS: 5000, term: 1, lastKnownCommittedOpTime: { ts: Timestamp 1444405404000|697, t: 1 } } ntoreturn:1 ntoskip:0 keyUpdates:0 writeConflicts:0 numYields:0 reslen:67435 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, oplog: { acquireCount: { r: 1 } } } protocol:op_command 102ms
[js_test:fsm_all_replication] 2015-10-09T15:43:26.683+0000 d20010| 2015-10-09T15:43:26.677+0000 I COMMAND  [conn794] command local.oplog.rs command: getMore { getMore: 23785421040, collection: "oplog.rs", maxTimeMS: 5000, term: 1, lastKnownCommittedOpTime: { ts: Timestamp 1444405406000|959, t: 1 } } ntoreturn:1 ntoskip:0 keyUpdates:0 writeConflicts:0 numYields:3 reslen:1688937 locks:{ Global: { acquireCount: { r: 8 } }, Database: { acquireCount: { r: 4 } }, oplog: { acquireCount: { r: 4 } } } protocol:op_command 133ms
[js_test:fsm_all_replication] 2015-10-09T15:43:26.685+0000 d20010| 2015-10-09T15:43:26.678+0000 I COMMAND  [conn795] command local.oplog.rs command: getMore { getMore: 23628610977, collection: "oplog.rs", maxTimeMS: 5000, term: 1, lastKnownCommittedOpTime: { ts: Timestamp 1444405406000|1087, t: 1 } } ntoreturn:1 ntoskip:0 keyUpdates:0 writeConflicts:0 numYields:3 reslen:1667932 locks:{ Global: { acquireCount: { r: 8 } }, Database: { acquireCount: { r: 4 } }, oplog: { acquireCount: { r: 4 } } } protocol:op_command 124ms
[js_test:fsm_all_replication] 2015-10-09T15:43:26.882+0000 d20010| 2015-10-09T15:43:26.873+0000 E STORAGE  [WT RecordStoreThread: local.oplog.rs] WiredTiger (22) [1444405406:873657][17228:0x2b687e155940], WT_SESSION.truncate: the start cursor position is after the stop cursor position: Invalid argument
[js_test:fsm_all_replication] 2015-10-09T15:43:26.883+0000 d20010| 2015-10-09T15:43:26.873+0000 I -        [WT RecordStoreThread: local.oplog.rs] Invariant failure: session->truncate(session, nullptr, start, end, nullptr) resulted in status BadValue 22: Invalid argument at src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp 1131
[js_test:fsm_all_replication] 2015-10-09T15:43:26.883+0000 d20010| 2015-10-09T15:43:26.878+0000 I CONTROL  [WT RecordStoreThread: local.oplog.rs]
[js_test:fsm_all_replication] 2015-10-09T15:43:26.883+0000 d20010|  0x1269bb2 0x1217b34 0x1203d8d 0xff924e 0x1002abb 0x1002f58 0x1207a47 0x19ad1e0 0x2b6870a9a83d 0x2b6870d85fdd
[js_test:fsm_all_replication] 2015-10-09T15:43:26.883+0000 d20010| ----- BEGIN BACKTRACE -----
[js_test:fsm_all_replication] 2015-10-09T15:43:26.885+0000 d20010| {"backtrace":[{"b":"400000","o":"E69BB2"},{"b":"400000","o":"E17B34"},{"b":"400000","o":"E03D8D"},{"b":"400000","o":"BF924E"},{"b":"400000","o":"C02ABB"},{"b":"400000","o":"C02F58"},{"b":"400000","o":"E07A47"},{"b":"400000","o":"15AD1E0"},{"b":"2B6870A94000","o":"683D"},{"b":"2B6870CB1000","o":"D4FDD"}],"processInfo":{ "mongodbVersion" : "3.1.10-pre-", "gitVersion" : "1dde60ed210cd82ba8d0cbd8cc6a5089cb4ca5a8", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "2.6.18-194.el5xen", "version" : "#1 SMP Tue Mar 16 22:01:26 EDT 2010", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "2B686FEF5000", "path" : "/lib64/librt.so.1", "elfType" : 3 }, { "b" : "2B68700FE000", "path" : "/lib64/libdl.so.2", "elfType" : 3 }, { "b" : "2B6870302000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3 }, { "b" : "2B6870603000", "path" : "/lib64/libm.so.6", "elfType" : 3 }, { "b" : "2B6870886000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3 }, { "b" : "2B6870A94000", "path" : "/lib64/libpthread.so.0", "elfType" : 3 }, { "b" : "2B6870CB1000", "path" : "/lib64/libc.so.6", "elfType" : 3 }, { "b" : "2B686FCD7000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
[js_test:fsm_all_replication] 2015-10-09T15:43:26.885+0000 d20010|  mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x1269bb2]
[js_test:fsm_all_replication] 2015-10-09T15:43:26.886+0000 d20010|  mongod(_ZN5mongo10logContextEPKc+0x134) [0x1217b34]
[js_test:fsm_all_replication] 2015-10-09T15:43:26.886+0000 d20010|  mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0xAD) [0x1203d8d]
[js_test:fsm_all_replication] 2015-10-09T15:43:26.886+0000 d20010|  mongod(_ZN5mongo21WiredTigerRecordStore12reclaimOplogEPNS_16OperationContextE+0x76E) [0xff924e]
[js_test:fsm_all_replication] 2015-10-09T15:43:26.886+0000 d20010|  mongod(+0xC02ABB) [0x1002abb]
[js_test:fsm_all_replication] 2015-10-09T15:43:26.886+0000 d20010|  mongod(+0xC02F58) [0x1002f58]
[js_test:fsm_all_replication] 2015-10-09T15:43:26.886+0000 d20010|  mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0x157) [0x1207a47]
[js_test:fsm_all_replication] 2015-10-09T15:43:26.887+0000 d20010|  mongod(+0x15AD1E0) [0x19ad1e0]
[js_test:fsm_all_replication] 2015-10-09T15:43:26.887+0000 d20010|  libpthread.so.0(+0x683D) [0x2b6870a9a83d]
[js_test:fsm_all_replication] 2015-10-09T15:43:26.887+0000 d20010|  libc.so.6(clone+0x6D) [0x2b6870d85fdd]
[js_test:fsm_all_replication] 2015-10-09T15:43:26.887+0000 d20010| -----  END BACKTRACE  -----
[js_test:fsm_all_replication] 2015-10-09T15:43:26.887+0000 d20010| 2015-10-09T15:43:26.879+0000 I -        [WT RecordStoreThread: local.oplog.rs]
[js_test:fsm_all_replication] 2015-10-09T15:43:26.887+0000 d20010|
[js_test:fsm_all_replication] 2015-10-09T15:43:26.887+0000 d20010| ***aborting after invariant() failure
[js_test:fsm_all_replication] 2015-10-09T15:43:26.888+0000 d20010|
[js_test:fsm_all_replication] 2015-10-09T15:43:26.888+0000 d20010|

Generated at Thu Feb 08 03:55:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.