Type: Question
Resolution: Works as Designed
Priority: Minor - P4
Affects Version/s: 3.2.11
Component/s: Querying, WiredTiger
Environment: Debian packages
Hello,
I am using the oplog collection to create incremental backups from a MongoDB replica (Secondary).
Sometimes, during busy periods on the replica set, and with an undersized WiredTiger cache on the secondary, a small percentage of the backups fail with the following error from the MongoDB server:
29-Mar 06:01 + oplog_backup
29-Mar 07:06 since filter: {'ts': {'$gt': Timestamp(1522278803, 94)}}
29-Mar 07:06 Traceback (most recent call last):
29-Mar 07:06   File "oplog_backup", line 99, in <module>
29-Mar 07:06     main()
29-Mar 07:06   File "oplog_backup", line 74, in main
29-Mar 07:06     for doc in oplog.find(filter).sort("$natural"):
29-Mar 07:06   File "/usr/lib/python2.7/dist-packages/pymongo/cursor.py", line 1114, in next
29-Mar 07:06     if len(self.__data) or self._refresh():
29-Mar 07:06   File "/usr/lib/python2.7/dist-packages/pymongo/cursor.py", line 1036, in _refresh
29-Mar 07:06     self.__collation))
29-Mar 07:06   File "/usr/lib/python2.7/dist-packages/pymongo/cursor.py", line 928, in __send_message
29-Mar 07:06     helpers._check_command_response(doc['data'][0])
29-Mar 07:06   File "/usr/lib/python2.7/dist-packages/pymongo/helpers.py", line 210, in _check_command_response
29-Mar 07:06     raise OperationFailure(msg % errmsg, code, response)
29-Mar 07:06 pymongo.errors.OperationFailure: Executor error during find command: CappedPositionLost: CollectionScan died due to position in capped collection being deleted. Last seen record id: RecordId(<InsertLongNumber>)
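One way to make such a backup resilient to this failure is to remember the last oplog timestamp actually copied and restart the scan from there. The sketch below is only an illustration of that idea, not the reporter's script: `CappedPositionLost` is a stand-in for `pymongo.errors.OperationFailure` with error code 136, and `fetch` is a hypothetical helper that yields `(ts, doc)` pairs for oplog entries newer than a given timestamp.

```python
class CappedPositionLost(Exception):
    """Stand-in for pymongo.errors.OperationFailure with code 136."""


def backup_oplog(fetch, start_ts, max_retries=3):
    """Copy oplog entries newer than start_ts, resuming on position loss.

    fetch(ts) is assumed to yield (ts, doc) pairs in $natural order for
    entries with ts greater than the argument.
    """
    last_ts = start_ts
    copied = []
    for _attempt in range(max_retries):
        try:
            for ts, doc in fetch(last_ts):
                copied.append(doc)
                last_ts = ts  # remember the resume point as we go
            return copied
        except CappedPositionLost:
            # The capped-collection position was overwritten; retry the
            # scan from the last timestamp we successfully copied.
            continue
    raise RuntimeError("oplog backup failed after %d retries" % max_retries)
```

This turns a mid-scan `CappedPositionLost` into a resume-and-continue rather than a failed backup, at the cost of re-issuing the find with a later `{'ts': {'$gt': ...}}` filter.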
From the pymongo traceback it appears that the CappedPositionLost error originates on the MongoDB secondary server.
Log from mongod:
2018-03-29T07:06:19.873+0300 W QUERY [conn492712] Could not restore RecordCursor for CollectionScan: local.oplog.rs
2018-03-29T07:06:19.873+0300 E QUERY [conn492712] Plan executor error during find command: DEAD, stats: { stage: "COLLSCAN", filter: { ts: { $gt: Timestamp 1522278803000|94 } }, nReturned: 0, executionTimeMillisEstimate: 1690, works: 1241191, advanced: 0, needTime: 1241190, needYield: 0, saveState: 9697, restoreState: 9697, isEOF: 0, invalidates: 0, direction: "forward", docsExamined: 1241189 }
2018-03-29T07:06:19.874+0300 I COMMAND [conn492712] command local.oplog.rs command: find { find: "oplog.rs", filter: { ts: { $gt: Timestamp 1522278803000|94 } }, sort: { $natural: 1 } } keyUpdates:0 writeConflicts:0 numYields:9697 reslen:256 locks:{ Global: { acquireCount: { r: 19396 }, acquireWaitCount: { r: 2519 }, timeAcquiringMicros: { r: 3889033515 } }, Database: { acquireCount: { r: 9698 } }, oplog: { acquireCount: { r: 9698 } } } protocol:op_query 3890891ms
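A quick back-of-the-envelope check on the figures copied from the slow-query log line above suggests why the capped-collection position could be lost: the find ran for roughly 65 minutes, almost all of it spent waiting to acquire the global read lock, which is ample time for old oplog entries to be overwritten on a busy replica set.

```python
# Figures copied verbatim from the mongod slow-query log line above.
time_acquiring_micros = 3889033515  # Global lock timeAcquiringMicros (r)
total_duration_ms = 3890891         # total command duration (protocol:op_query)

lock_wait_s = time_acquiring_micros / 1e6
total_s = total_duration_ms / 1e3

# Both work out to roughly 65 minutes: the query spent ~99.9% of its
# runtime waiting on the lock rather than scanning.
print(round(lock_wait_s / 60, 1), round(total_s / 60, 1))
```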
Replication info shows an oplog window of more than ten days, and each incremental snapshot covers between 20 minutes and a couple of hours, so I do not see how a record can be missing from the oplog.
configured oplog size:   9502.277587890625MB
log length start to end: 989089secs (274.75hrs)
oplog first event time:  Sun Mar 18 2018 05:08:54 GMT+0200 (EET)
oplog last event time:   Thu Mar 29 2018 16:53:43 GMT+0300 (EEST)
now:                     Thu Mar 29 2018 16:54:01 GMT+0300 (EEST)
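As a sanity check (plain arithmetic on the replication-info figures above, nothing mongod-specific), the reported oplog window does indeed dwarf the longest snapshot interval:

```python
window_secs = 989089.0        # "log length start to end" reported above
window_hours = window_secs / 3600.0
window_days = window_hours / 24.0

snapshot_max_hours = 2        # longest incremental-snapshot interval mentioned above

# The oplog covers ~274.75 hours (~11.4 days), far more than the
# 20-minute-to-2-hour interval between incremental snapshots.
print(round(window_hours, 2), round(window_days, 1))
```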
Is this normal behavior for mongod?