Details
Type: Question
Resolution: Works as Designed
Priority: Minor - P4
Fix Version/s: None
Affects Version/s: 3.2.11
Component/s: None
Environment: Debian, Debian packages
Description
Hello,
I am using the oplog collection to create incremental backups from a MongoDB replica set secondary.
Sometimes, during busy periods on the replica set, with an undersized WiredTiger cache on the secondary, a very small percentage of the backups fail with the following error from the MongoDB server.
29-Mar 06:01 + oplog_backup
29-Mar 07:06 since filter: {'ts': {'$gt': Timestamp(1522278803, 94)}}
29-Mar 07:06 Traceback (most recent call last):
29-Mar 07:06   File "oplog_backup", line 99, in <module>
29-Mar 07:06     main()
29-Mar 07:06   File "oplog_backup", line 74, in main
29-Mar 07:06     for doc in oplog.find(filter).sort("$natural"):
29-Mar 07:06   File "/usr/lib/python2.7/dist-packages/pymongo/cursor.py", line 1114, in next
29-Mar 07:06     if len(self.__data) or self._refresh():
29-Mar 07:06   File "/usr/lib/python2.7/dist-packages/pymongo/cursor.py", line 1036, in _refresh
29-Mar 07:06     self.__collation))
29-Mar 07:06   File "/usr/lib/python2.7/dist-packages/pymongo/cursor.py", line 928, in __send_message
29-Mar 07:06     helpers._check_command_response(doc['data'][0])
29-Mar 07:06   File "/usr/lib/python2.7/dist-packages/pymongo/helpers.py", line 210, in _check_command_response
29-Mar 07:06     raise OperationFailure(msg % errmsg, code, response)
29-Mar 07:06 pymongo.errors.OperationFailure: Executor error during find command: CappedPositionLost: CollectionScan died due to position in capped collection being deleted. Last seen record id: RecordId(<InsertLongNumber>)
From the pymongo traceback, it seems the CappedPositionLost error originates on the MongoDB secondary server.
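For reference, the failing read is essentially the following (a minimal sketch reconstructed from the traceback above; the connection settings and the handle() function are placeholders, not the real script):

from pymongo import MongoClient
from bson.timestamp import Timestamp

# Placeholder connection; reading the oplog on a secondary needs a
# read preference that allows secondary reads.
client = MongoClient("mongodb://secondary.example:27017/",
                     readPreference="secondaryPreferred")
oplog = client.local["oplog.rs"]

def handle(doc):
    print(doc["ts"])  # placeholder for the real backup writer

# Checkpoint saved by the previous incremental run (value from the log).
last_ts = Timestamp(1522278803, 94)

# Plain, non-tailable forward scan of the capped oplog collection.
# If the oplog wraps past the cursor's saved position while the query
# yields on the server, the next batch fails with CappedPositionLost.
for doc in oplog.find({"ts": {"$gt": last_ts}}).sort("$natural", 1):
    handle(doc)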
Log from mongod:

2018-03-29T07:06:19.873+0300 W QUERY [conn492712] Could not restore RecordCursor for CollectionScan: local.oplog.rs
2018-03-29T07:06:19.873+0300 E QUERY [conn492712] Plan executor error during find command: DEAD, stats: { stage: "COLLSCAN", filter: { ts: { $gt: Timestamp 1522278803000|94 } }, nReturned: 0, executionTimeMillisEstimate: 1690, works: 1241191, advanced: 0, needTime: 1241190, needYield: 0, saveState: 9697, restoreState: 9697, isEOF: 0, invalidates: 0, direction: "forward", docsExamined: 1241189 }
2018-03-29T07:06:19.874+0300 I COMMAND [conn492712] command local.oplog.rs command: find { find: "oplog.rs", filter: { ts: { $gt: Timestamp 1522278803000|94 } }, sort: { $natural: 1 } } keyUpdates:0 writeConflicts:0 numYields:9697 reslen:256 locks:{ Global: { acquireCount: { r: 19396 }, acquireWaitCount: { r: 2519 }, timeAcquiringMicros: { r: 3889033515 } }, Database: { acquireCount: { r: 9698 } }, oplog: { acquireCount: { r: 9698 } } } protocol:op_query 3890891ms
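From the stats, the scan yielded 9697 times (saveState/restoreState) and the find ran for about 65 minutes (3890891 ms), so presumably the oplog truncated past the cursor's saved position during one of those yields. Would the expected way to handle this be to catch the error and reopen the cursor from the last timestamp actually processed? A rough sketch of what I mean (the numeric error code 136 for CappedPositionLost and the handle() callback are my assumptions):

from pymongo.errors import OperationFailure

CAPPED_POSITION_LOST = 136  # assumed numeric code for CappedPositionLost

def dump_oplog_since(oplog, last_ts, handle):
    # Restartable scan: on CappedPositionLost, reopen the cursor from the
    # last timestamp we processed instead of failing the whole backup.
    while True:
        try:
            for doc in oplog.find({"ts": {"$gt": last_ts}}).sort("$natural", 1):
                handle(doc)
                last_ts = doc["ts"]  # advance the checkpoint as we go
            return last_ts
        except OperationFailure as exc:
            if exc.code != CAPPED_POSITION_LOST:
                raise
            # The oplog wrapped during a yield; retry from the last document
            # seen. If that entry itself was overwritten, the incremental
            # chain has a real gap and a new base backup is needed.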
The replication info shows an oplog window of more than ten days, and each incremental snapshot covers between 20 minutes and a couple of hours, so I do not see how a record can be missing from the oplog.
configured oplog size: 9502.277587890625MB
log length start to end: 989089secs (274.75hrs)
oplog first event time: Sun Mar 18 2018 05:08:54 GMT+0200 (EET)
oplog last event time: Thu Mar 29 2018 16:53:43 GMT+0300 (EEST)
now: Thu Mar 29 2018 16:54:01 GMT+0300 (EEST)
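Those numbers are self-consistent (a trivial check of the printed values, nothing assumed beyond them):

from datetime import timedelta

print(timedelta(seconds=989089))  # 11 days, 10:44:49 -- Mar 18 to Mar 29
print(989089 / 3600.0)            # ~274.75, the reported hours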
Is this normal behavior for mongod?