Core Server / SERVER-34188

CappedPositionLost errors in very large oplog

    • Type: Question
    • Resolution: Works as Designed
    • Priority: Minor - P4
    • Fix Version/s: None
    • Affects Version/s: 3.2.11
    • Component/s: Querying, WiredTiger
    • Labels: None
    • Environment:
      Debian
      Debian packages

      Hello,

      I am using the oplog collection to create incremental backups from a MongoDB replica set secondary.

      Sometimes, during busy periods on the replica set and with an undersized WiredTiger cache on the secondary, a very small percentage of the backups fail with the following error from the MongoDB server:

      29-Mar 06:01 + oplog_backup
      29-Mar 07:06 since filter: {'ts': {'$gt': Timestamp(1522278803, 94)}}
      29-Mar 07:06 Traceback (most recent call last):
      29-Mar 07:06   File "oplog_backup", line 99, in <module>
      29-Mar 07:06     main()
      29-Mar 07:06   File "oplog_backup", line 74, in main
      29-Mar 07:06     for doc in oplog.find(filter).sort("$natural"):
      29-Mar 07:06   File "/usr/lib/python2.7/dist-packages/pymongo/cursor.py", line 1114, in next
      29-Mar 07:06     if len(self.__data) or self._refresh():
      29-Mar 07:06   File "/usr/lib/python2.7/dist-packages/pymongo/cursor.py", line 1036, in _refresh
      29-Mar 07:06     self.__collation))
      29-Mar 07:06   File "/usr/lib/python2.7/dist-packages/pymongo/cursor.py", line 928, in __send_message
      29-Mar 07:06     helpers._check_command_response(doc['data'][0])
      29-Mar 07:06   File "/usr/lib/python2.7/dist-packages/pymongo/helpers.py", line 210, in _check_command_response
      29-Mar 07:06     raise OperationFailure(msg % errmsg, code, response)
      29-Mar 07:06 pymongo.errors.OperationFailure: Executor error during find command: CappedPositionLost: CollectionScan died due to position in capped collection being deleted. Last seen record id: RecordId(<InsertLongNumber>)
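
      The read pattern implied by this traceback is roughly the following (a minimal sketch, not the actual oplog_backup script; the connection string, the starting checkpoint and the output handling are assumptions, and only the filter and the $natural sort mirror the traceback):

      # Minimal sketch of the oplog read implied by the traceback above.
      # The URI, the checkpoint value and what is done with each document
      # are assumptions; only the filter and sort match the log.
      from pymongo import MongoClient
      from bson.timestamp import Timestamp

      client = MongoClient("mongodb://secondary.example:27017",
                           readPreference="secondaryPreferred")
      oplog = client.local["oplog.rs"]

      last_ts = Timestamp(1522278803, 94)   # checkpoint from the previous run
      query = {"ts": {"$gt": last_ts}}

      # Forward collection scan over the capped oplog. If the oplog wraps past
      # the cursor's saved position while the scan is yielded, the server kills
      # the cursor with CappedPositionLost, as seen in the traceback above.
      for doc in oplog.find(query).sort("$natural", 1):
          # ... write doc to the incremental backup here ...
          last_ts = doc["ts"]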
      

      From pymongo it appears that the CappedPositionLost error originates from the MongoDB secondary server.

      Log from mongod

      2018-03-29T07:06:19.873+0300 W QUERY    [conn492712] Could not restore RecordCursor for CollectionScan: local.oplog.rs
      2018-03-29T07:06:19.873+0300 E QUERY    [conn492712] Plan executor error during find command: DEAD, stats: { stage: "COLLSCAN", filter: { ts: { $gt: Timestamp 1522278803000|94 } }, nReturned: 0, executionTimeMillisEstimate: 1690, works: 1241191, advanced: 0, needTime: 1241190, needYield: 0, saveState: 9697, restoreState: 9697, isEOF: 0, invalidates: 0, direction: "forward", docsExamined: 1241189 }
      2018-03-29T07:06:19.874+0300 I COMMAND  [conn492712] command local.oplog.rs command: find { find: "oplog.rs", filter: { ts: { $gt: Timestamp 1522278803000|94 } }, sort: { $natural: 1 } } keyUpdates:0 writeConflicts:0 numYields:9697 reslen:256 locks:{ Global: { acquireCount: { r: 19396 }, acquireWaitCount: { r: 2519 }, timeAcquiringMicros: { r: 3889033515 } }, Database: { acquireCount: { r: 9698 } }, oplog: { acquireCount: { r: 9698 } } } protocol:op_query 3890891ms
      

      The replication info shows an oplog window covering roughly ten days, while each incremental snapshot spans from 20 minutes up to a couple of hours, so I do not see how a record can be missing from the oplog.

      configured oplog size:   9502.277587890625MB
      log length start to end: 989089secs (274.75hrs)
      oplog first event time:  Sun Mar 18 2018 05:08:54 GMT+0200 (EET)
      oplog last event time:   Thu Mar 29 2018 16:53:43 GMT+0300 (EEST)
      now:                     Thu Mar 29 2018 16:54:01 GMT+0300 (EEST)
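
      For reference, the same figures can be reproduced from pymongo by reading the first and last oplog entries (a sketch; the output above appears to come from db.printReplicationInfo() in the mongo shell, and the connection string below is an assumption):

      # Sketch: derive the oplog window shown above via pymongo.
      # The connection string is an assumption.
      from pymongo import MongoClient

      client = MongoClient("mongodb://secondary.example:27017",
                           readPreference="secondaryPreferred")
      oplog = client.local["oplog.rs"]

      first_ts = oplog.find_one(sort=[("$natural", 1)])["ts"]
      last_ts = oplog.find_one(sort=[("$natural", -1)])["ts"]

      window_secs = last_ts.time - first_ts.time
      print("oplog first event time:  %s" % first_ts.as_datetime())
      print("oplog last event time:   %s" % last_ts.as_datetime())
      print("log length start to end: %d secs (%.2f hrs)"
            % (window_secs, window_secs / 3600.0))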
      

      Is this normal behavior for mongod?

            Assignee:
            dmitry.agranat@mongodb.com Dmitry Agranat
            Reporter:
            vfxcode Stefanos Boglou
            Votes:
            0
            Watchers:
            5

              Created:
              Updated:
              Resolved: