[SERVER-40737] Use Date_t() as default wallTime value in OpTimeAndWallTimes Created: 19/Apr/19  Updated: 29/Oct/23  Resolved: 11/Jun/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.2.0-rc1, 4.3.1

Type: Task Priority: Major - P3
Reporter: Shane Harvey Assignee: Maria van Keulen
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File SERVER-40737.py    
Issue Links:
Backports
Depends
is depended on by SERVER-40683 Make wall clock times in replication ... Closed
Problem/Incident
causes PYTHON-1822 Test failure - Database.current_op fa... Closed
Related
related to SERVER-40565 Omit last applied and last durable wa... Closed
related to SERVER-43465 Remove InvalidBSON try/catch blocks f... Closed
is related to SERVER-40078 Add wall clock time corresponding to ... Closed
is related to SERVER-40080 Add lastApplied and lastDurable wall ... Closed
is related to SERVER-40353 Add wall clock time corresponding to ... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.2, v4.0, v3.6
Sprint: Execution Team 2019-06-17
Participants:

 Description   

Similar to SERVER-40565, replSetUpdatePosition can include Dates that pymongo cannot parse. These can show up in currentOp output and lead to decoding failures as in PYTHON-1822.

See "durableOpTime: { ts: Timestamp(0, 0), t: -1 }, durableWallTime: new Date(-9223372036854775808)"

2019-04-18T22:25:48.747+0000 I  NETWORK  [conn45] received client metadata from 127.0.0.1:33807 conn45: { driver: { name: "NetworkInterfaceTL", version: "4.1.10-147-g2c8a3592b0" }, os: { type: "Linux", name: "Red Hat Enterprise Linux Server release 6.2 (Santiago)", architecture: "x86_64", version: "Kernel 2.6.32-220.el6.x86_64" } }
2019-04-18T22:25:48.747+0000 I  COMMAND  [conn45] command admin.$cmd command: isMaster { isMaster: 1, client: { driver: { name: "NetworkInterfaceTL", version: "4.1.10-147-g2c8a3592b0" }, os: { type: "Linux", name: "Red Hat Enterprise Linux Server release 6.2 (Santiago)", architecture: "x86_64", version: "Kernel 2.6.32-220.el6.x86_64" } }, hostInfo: "ip-10-122-17-221:27018", compression: [ "zstd", "zlib", "snappy", "noop" ], internalClient: { minWireVersion: 8, maxWireVersion: 8 }, hangUpOnStepDown: false, saslSupportedMechs: "local.__system", $db: "admin" } numYields:0 reslen:880 locks:{} protocol:op_query 0ms
...
 
2019-04-18T22:28:18.918+0000 I  COMMAND  [conn45] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { durableOpTime: { ts: Timestamp(1555626497, 3017), t: 1 }, durableWallTime: new Date(1555626497804), appliedOpTime: { ts: Timestamp(1555626497, 3018), t: 1 }, appliedWallTime: new Date(1555626497881), memberId: 0, cfgver: 1 }, { durableOpTime: { ts: Timestamp(1555626498, 5072), t: 1 }, durableWallTime: new Date(1555626498915), appliedOpTime: { ts: Timestamp(1555626498, 5072), t: 1 }, appliedWallTime: new Date(1555626498915), memberId: 1, cfgver: 1 }, { durableOpTime: { ts: Timestamp(0, 0), t: -1 }, durableWallTime: new Date(-9223372036854775808), appliedOpTime: { ts: Timestamp(1555626497, 6), t: 1 }, appliedWallTime: new Date(1555626497075), memberId: 2, cfgver: 1 } ], $replData: { term: 1, lastOpCommitted: { ts: Timestamp(1555626498, 5071), t: 1 }, lastCommittedWall: new Date(1555626498899), lastOpVisible: { ts: Timestamp(1555626498, 5071), t: 1 }, configVersion: 1, replicaSetId: ObjectId('5cb8f956c92f6916e5db17a3'), primaryIndex: 0, syncSourceIndex: 0 }, $clusterTime: { clusterTime: Timestamp(1555626498, 5072), signature: { hash: BinData(0, E508C0B96DCD2E0F8319ECF3BF161FB1E1B1FB2C), keyId: 6681364246506242049 } }, $db: "admin" } numYields:0 reslen:396 locks:{} protocol:op_msg 0ms

And "lastOpCommitted: { ts: Timestamp(0, 0), t: -1 }, lastCommittedWall: new Date(-9223372036854775808)"

2019-04-18T22:25:39.637+0000 I  COMMAND  [conn17] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { durableOpTime: { ts: Timestamp(1555626326, 1), t: -1 }, durableWallTime: new Date(1555626326893), appliedOpTime: { ts: Timestamp(1555626338, 1), t: 1 }, appliedWallTime: new Date(1555626338907), memberId: 0, cfgver: 1 }, { durableOpTime: { ts: Timestamp(1555626338, 1), t: 1 }, durableWallTime: new Date(1555626338907), appliedOpTime: { ts: Timestamp(1555626338, 1), t: 1 }, appliedWallTime: new Date(1555626338907), memberId: 1, cfgver: 1 } ], $replData: { term: 1, lastOpCommitted: { ts: Timestamp(0, 0), t: -1 }, lastCommittedWall: new Date(-9223372036854775808), lastOpVisible: { ts: Timestamp(0, 0), t: -1 }, configVersion: 1, replicaSetId: ObjectId('5cb8f956c92f6916e5db17a3'), primaryIndex: 0, syncSourceIndex: 0 }, $clusterTime: { clusterTime: Timestamp(1555626339, 1), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, $db: "admin" } numYields:0 reslen:396 locks:{} protocol:op_msg 0ms

I imagine this could also happen in appliedWallTime too.



 Comments   
Comment by Githook User [ 11/Jun/19 ]

Author:

{'name': 'Maria van Keulen', 'email': 'maria@mongodb.com', 'username': 'mvankeulen94'}

Message: SERVER-40737 Use epoch as default OpTimeAndWallTime wallTime value

(cherry picked from commit cc24f2022c761b32eecb18f6f5c841c3a43fe7c1)
Branch: v4.2
https://github.com/mongodb/mongo/commit/fdb56a92bfea1af0344044856df04af4d464a3b4

Comment by Githook User [ 11/Jun/19 ]

Author:

{'name': 'Maria van Keulen', 'email': 'maria@mongodb.com', 'username': 'mvankeulen94'}

Message: SERVER-40737 Use epoch as default OpTimeAndWallTime wallTime value
Branch: master
https://github.com/mongodb/mongo/commit/cc24f2022c761b32eecb18f6f5c841c3a43fe7c1

Comment by Shane Harvey [ 06/Jun/19 ]

I've attached a python script (SERVER-40737.py) that will log the error from currentOp along with the problematic bson document. The output looks like this:

$ python3.7 SERVER-40737.py
Server version: 4.1.11
Error decoding currentOp response doc: year 292278994 is out of range
Raw bson command response: b"\xaf...."
 
Traceback (most recent call last):
  File "SERVER-40737.py", line 14, in <module>
    BSON(raw_doc.raw).decode()
  File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 1164, in decode
    return _bson_to_dict(self, codec_options)
bson.errors.InvalidBSON: year 292278994 is out of range

Comment by Maria van Keulen [ 26/Apr/19 ]

Initializing wall clock times to Date_t() everywhere where they are currently initialized to Date_t::min() for OpTimeAndWallTimes (e.g., here) should address the issues posed in this ticket. Any usages of isFormattable() or equality comparisons against Date_t::min() to check that the wall clock time is not equal to the default should be replaced with equality checks against Date_t().

Generated at Thu Feb 08 04:55:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.