[SERVER-44180] replicationBatchIsComplete() should prompt the WiredTigerJournalFlusher thread to run instead of the _oplogJournalThreadLoop Created: 23/Oct/19  Updated: 29/Oct/23  Resolved: 07/Nov/19

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 4.3.1

Type: Task Priority: Major - P3
Reporter: Dianna Hohensee (Inactive) Assignee: Dianna Hohensee (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-41389 Pause the oplog visibility thread in ... Closed
is depended on by SERVER-45665 Make JournalFlusher flush on command ... Closed
Duplicate
is duplicated by SERVER-44181 replicationBatchIsComplete() should m... Closed
Backwards Compatibility: Fully Compatible
Sprint: Execution Team 2019-11-04, Execution Team 2019-11-18
Participants:

 Description   

This project makes WiredTigerOplogManager::_oplogJournalThreadLoop stop calling waitUntilDurable(), which is the call replication wants triggered after each round of batch application so that readers with 'j' read concern can see the data as quickly as possible. replicationBatchIsComplete() should therefore prompt the WiredTigerJournalFlusher thread, whose sole responsibility is to call waitUntilDurable(), to run immediately instead.
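
A minimal standalone sketch of the prompt-to-flush mechanism the description asks for, assuming a flusher thread that calls waitUntilDurable() on a periodic interval but can be woken immediately by the batch applier. The names (JournalFlusherModel, triggerFlushNow) and the 100ms interval are illustrative stand-ins, not the actual mongod classes.

// Standalone model, not mongod code: a journal-flusher thread that normally
// flushes periodically but can be prompted to flush right away, which is what
// replicationBatchIsComplete() is asked to do instead of waking the oplog
// visibility loop.
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

class JournalFlusherModel {
public:
    explicit JournalFlusherModel(std::function<void()> waitUntilDurable)
        : _waitUntilDurable(std::move(waitUntilDurable)),
          _thread([this] { _run(); }) {}

    ~JournalFlusherModel() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _shutdown = true;
        }
        _cv.notify_one();
        _thread.join();
    }

    // Called by the batch applier to wake the flusher immediately.
    void triggerFlushNow() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _flushRequested = true;
        }
        _cv.notify_one();
    }

private:
    void _run() {
        std::unique_lock<std::mutex> lk(_mutex);
        while (!_shutdown) {
            // Wake on the periodic deadline, an explicit trigger, or shutdown.
            _cv.wait_for(lk, std::chrono::milliseconds(100),
                         [this] { return _flushRequested || _shutdown; });
            if (_shutdown)
                break;
            _flushRequested = false;
            lk.unlock();
            _waitUntilDurable();  // the journal flush itself
            lk.lock();
        }
    }

    std::function<void()> _waitUntilDurable;
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _flushRequested = false;
    bool _shutdown = false;
    std::thread _thread;
};

In this model, replicationBatchIsComplete() would simply call triggerFlushNow() at the end of each applied batch, so journal durability no longer depends on the oplog visibility loop running.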



 Comments   
Comment by Githook User [ 05/Nov/19 ]

Author:

{'name': 'Dianna Hohensee', 'username': 'DiannaHohensee', 'email': 'dianna.hohensee@mongodb.com'}

Message: SERVER-44180 repl batch applier thread should prompt WiredTigerJournalFlusher thread to flush the journal rather than the oplog visibility thread
Branch: master
https://github.com/mongodb/mongo/commit/b96794b6a5fb8805ebc226d2e6f26c78107fdee2

Comment by Dianna Hohensee (Inactive) [ 01/Nov/19 ]

[ShardedClusterFixture:job0:shard0:secondary0] 2019-11-01T15:43:57.958-0400 I  REPL     [rsSync-0] ~~~OplogApplierImpl::_applyOplogBatch, just flushed journal, last op timestamp: Timestamp(1572637437, 16), durable timestamp: Timestamp(1572637437, 28), oplog read timestamp: Timestamp(1572637437, 15), op itself: { op: "c", ns: "config.$cmd", ui: UUID("2806c1ab-f858-40a1-8fbc-59e5b3691928"), o: { commitIndexBuild: "cache.chunks.config.system.sessions", indexBuildUUID: UUID("7f46e669-27ea-4265-b607-e13bd4d4c7ed"), indexes: [ { name: "lastmod_1", key: { lastmod: 1 }, v: 2 } ] }, ts: Timestamp(1572637437, 16), t: 1, wall: new Date(1572637437628), v: 2 }
[ShardedClusterFixture:job0:shard0:secondary0] 2019-11-01T15:43:57.958-0400 I  REPL     [rsSync-0] ~~~OplogApplierImpl::_applyOplogBatch, IS BAD, last op ts is less than allDurableTs
[ShardedClusterFixture:job0:shard0:secondary0] 2019-11-01T15:43:57.958-0400 I  REPL     [rsSync-0] ~~~OplogApplierImpl::_applyOplogBatch, ops:
[ShardedClusterFixture:job0:shard0:secondary0] { op: "c", ns: "config.$cmd", ui: UUID("2806c1ab-f858-40a1-8fbc-59e5b3691928"), o: { commitIndexBuild: "cache.chunks.config.system.sessions", indexBuildUUID: UUID("7f46e669-27ea-4265-b607-e13bd4d4c7ed"), indexes: [ { name: "lastmod_1", key: { lastmod: 1 }, v: 2 } ] }, ts: Timestamp(1572637437, 16), t: 1, wall: new Date(1572637437628), v: 2 }
[ShardedClusterFixture:job0:shard0:secondary0] 2019-11-01T15:43:57.958-0400 I  REPL     [rsSync-0] ~~~OplogApplierImpl::_run, applied batch in _applyOplogBatch, lastOpTimeAppliedInBatch timestamp: Timestamp(1572637437, 16), oplog read timestamp: Timestamp(1572637437, 15), all_durable: Timestamp(1572637437, 28), lastOpInBatch: { op: "c", ns: "config.$cmd", ui: UUID("2806c1ab-f858-40a1-8fbc-59e5b3691928"), o: { commitIndexBuild: "cache.chunks.config.system.sessions", indexBuildUUID: UUID("7f46e669-27ea-4265-b607-e13bd4d4c7ed"), indexes: [ { name: "lastmod_1", key: { lastmod: 1 }, v: 2 } ] }, ts: Timestamp(1572637437, 16), t: 1, wall: new Date(1572637437628), v: 2 }
[ShardedClusterFixture:job0:shard0:secondary0] 2019-11-01T15:43:57.960-0400 I  REPL     [rsSync-0] ~~~OplogApplierImpl::_run, after setOplogReadTimestamp, lastOpTimeAppliedInBatch: Timestamp(1572637437, 16); all_durable, is: Timestamp(1572637437, 28) oplog read timestamp: Timestamp(1572637437, 16); done calling setOplogReadTimestamp.
[ShardedClusterFixture:job0:shard0:secondary0] 2019-11-01T15:43:57.960-0400 I  REPL     [rsSync-0] ~~~GOT A PROBLEM! oplog read ts isn't equal to all_durable after setOplogReadTimestamp! readTimestamp: Timestamp(1572637437, 16)tsDurable: Timestamp(1572637437, 28)

MongoDB Enterprise shard-rs0:SECONDARY> db.oplog.rs.find({ts: Timestamp(1572637437, 16)}).pretty()
{
	"op" : "c",
	"ns" : "config.$cmd",
	"ui" : UUID("2806c1ab-f858-40a1-8fbc-59e5b3691928"),
	"o" : {
		"commitIndexBuild" : "cache.chunks.config.system.sessions",
		"indexBuildUUID" : UUID("7f46e669-27ea-4265-b607-e13bd4d4c7ed"),
		"indexes" : [
			{
				"name" : "lastmod_1",
				"key" : {
					"lastmod" : 1
				},
				"v" : 2
			}
		]
	},
	"ts" : Timestamp(1572637437, 16),
	"t" : NumberLong(1),
	"wall" : ISODate("2019-11-01T19:43:57.628Z"),
	"v" : NumberLong(2)
}

MongoDB Enterprise shard-rs0:SECONDARY> db.oplog.rs.find({ts: Timestamp(1572637437, 28)}).pretty()
{
	"op" : "u",
	"ns" : "config.cache.collections",
	"ui" : UUID("b17acd80-4988-49ec-888b-58a20b753e97"),
	"o" : {
		"$v" : 1,
		"$set" : {
			"lastRefreshedCollectionVersion" : Timestamp(1, 0),
			"refreshing" : false
		}
	},
	"o2" : {
		"_id" : "config.system.sessions"
	},
	"ts" : Timestamp(1572637437, 28),
	"t" : NumberLong(1),
	"wall" : ISODate("2019-11-01T19:43:57.671Z"),
	"v" : NumberLong(2)
}

[ShardedClusterFixture:job0:shard0:secondary0] 2019-11-01T15:43:57.971-0400 I  REPL     [rsSync-0] ~~~OplogApplierImpl::_applyOplogBatch, just flushed journal, last op timestamp: Timestamp(1572637437, 28), durable timestamp: Timestamp(1572637437, 28), oplog read timestamp: Timestamp(1572637437, 16), op itself: { op: "u", ns: "config.cache.collections", ui: UUID("b17acd80-4988-49ec-888b-58a20b753e97"), o: { $v: 1, $set: { lastRefreshedCollectionVersion: Timestamp(1, 0), refreshing: false } }, o2: { _id: "config.system.sessions" }, ts: Timestamp(1572637437, 28), t: 1, wall: new Date(1572637437671), v: 2 }
[ShardedClusterFixture:job0:shard0:secondary0] 2019-11-01T15:43:57.971-0400 I  REPL     [rsSync-0] ~~~OplogApplierImpl::_applyOplogBatch, ops:
[ShardedClusterFixture:job0:shard0:secondary0] { op: "i", ns: "config.system.sessions", ui: UUID("136f6741-cf72-496c-ad1b-0315dd19cea5"), o: { _id: { id: UUID("10833ed2-1d0b-43d7-8ed3-ddfdceecdb8c"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, lastUse: new Date(1572637437629) }, ts: Timestamp(1572637437, 17), t: 1, wall: new Date(1572637437629), v: 2 }
[ShardedClusterFixture:job0:shard0:secondary0] { op: "i", ns: "config.system.sessions", ui: UUID("136f6741-cf72-496c-ad1b-0315dd19cea5"), o: { _id: { id: UUID("a4fbfe3a-0ab1-44cf-a413-4da53d1e8142"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, lastUse: new Date(1572637437632) }, ts: Timestamp(1572637437, 18), t: 1, wall: new Date(1572637437632), v: 2 }
[ShardedClusterFixture:job0:shard0:secondary0] { op: "i", ns: "config.system.sessions", ui: UUID("136f6741-cf72-496c-ad1b-0315dd19cea5"), o: { _id: { id: UUID("ec74a244-9177-42cc-b402-27cbb50b2d62"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, lastUse: new Date(1572637437634) }, ts: Timestamp(1572637437, 19), t: 1, wall: new Date(1572637437634), v: 2 }
[ShardedClusterFixture:job0:shard0:secondary0] { op: "i", ns: "config.system.sessions", ui: UUID("136f6741-cf72-496c-ad1b-0315dd19cea5"), o: { _id: { id: UUID("35454c1d-2e26-4a99-b452-a93049119778"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, lastUse: new Date(1572637437636) }, ts: Timestamp(1572637437, 20), t: 1, wall: new Date(1572637437636), v: 2 }
[ShardedClusterFixture:job0:shard0:secondary0] { op: "i", ns: "config.system.sessions", ui: UUID("136f6741-cf72-496c-ad1b-0315dd19cea5"), o: { _id: { id: UUID("f316b904-1a53-40d9-a726-290f2708c0c8"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, lastUse: new Date(1572637437639) }, ts: Timestamp(1572637437, 21), t: 1, wall: new Date(1572637437639), v: 2 }
[ShardedClusterFixture:job0:shard0:secondary0] { op: "i", ns: "config.system.sessions", ui: UUID("136f6741-cf72-496c-ad1b-0315dd19cea5"), o: { _id: { id: UUID("5785dc7b-ff6d-4541-959c-002b527ffd0d"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, lastUse: new Date(1572637437641) }, ts: Timestamp(1572637437, 22), t: 1, wall: new Date(1572637437641), v: 2 }
[ShardedClusterFixture:job0:shard0:secondary0] { op: "i", ns: "config.system.sessions", ui: UUID("136f6741-cf72-496c-ad1b-0315dd19cea5"), o: { _id: { id: UUID("0c2f9628-6253-4302-b961-c583d7c9aced"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, lastUse: new Date(1572637437642) }, ts: Timestamp(1572637437, 23), t: 1, wall: new Date(1572637437642), v: 2 }
[ShardedClusterFixture:job0:shard0:secondary0] { op: "i", ns: "config.system.sessions", ui: UUID("136f6741-cf72-496c-ad1b-0315dd19cea5"), o: { _id: { id: UUID("9882f7c8-8831-4d09-b246-15f56316f5a6"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, lastUse: new Date(1572637437644) }, ts: Timestamp(1572637437, 24), t: 1, wall: new Date(1572637437644), v: 2 }
[ShardedClusterFixture:job0:shard0:secondary0] { op: "i", ns: "config.system.sessions", ui: UUID("136f6741-cf72-496c-ad1b-0315dd19cea5"), o: { _id: { id: UUID("e6c8f298-eb09-44e5-869a-f05989da7567"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, lastUse: new Date(1572637437645) }, ts: Timestamp(1572637437, 25), t: 1, wall: new Date(1572637437646), v: 2 }
[ShardedClusterFixture:job0:shard0:secondary0] { op: "i", ns: "config.system.sessions", ui: UUID("136f6741-cf72-496c-ad1b-0315dd19cea5"), o: { _id: { id: UUID("66c1b1dc-92f1-489d-a371-5c0ef026c6da"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, lastUse: new Date(1572637437647) }, ts: Timestamp(1572637437, 26), t: 1, wall: new Date(1572637437647), v: 2 }
[ShardedClusterFixture:job0:shard0:secondary0] { op: "i", ns: "config.cache.chunks.config.system.sessions", ui: UUID("2806c1ab-f858-40a1-8fbc-59e5b3691928"), o: { _id: { _id: MinKey }, max: { _id: MaxKey }, shard: "shard-rs0", lastmod: Timestamp(1, 0), history: [ { validAfter: Timestamp(1572637436, 16), shard: "shard-rs0" } ] }, ts: Timestamp(1572637437, 27), t: 1, wall: new Date(1572637437669), v: 2 }
[ShardedClusterFixture:job0:shard0:secondary0] { op: "u", ns: "config.cache.collections", ui: UUID("b17acd80-4988-49ec-888b-58a20b753e97"), o: { $v: 1, $set: { lastRefreshedCollectionVersion: Timestamp(1, 0), refreshing: false } }, o2: { _id: "config.system.sessions" }, ts: Timestamp(1572637437, 28), t: 1, wall: new Date(1572637437671), v: 2 }
[ShardedClusterFixture:job0:shard0:secondary0] 2019-11-01T15:43:57.971-0400 I  REPL     [rsSync-0] ~~~OplogApplierImpl::_run, applied batch in _applyOplogBatch, lastOpTimeAppliedInBatch timestamp: Timestamp(1572637437, 28), oplog read timestamp: Timestamp(1572637437, 16), all_durable: Timestamp(1572637437, 28), lastOpInBatch: { op: "u", ns: "config.cache.collections", ui: UUID("b17acd80-4988-49ec-888b-58a20b753e97"), o: { $v: 1, $set: { lastRefreshedCollectionVersion: Timestamp(1, 0), refreshing: false } }, o2: { _id: "config.system.sessions" }, ts: Timestamp(1572637437, 28), t: 1, wall: new Date(1572637437671), v: 2 }
[ShardedClusterFixture:job0:shard0:secondary0] 2019-11-01T15:43:57.973-0400 I  REPL     [rsSync-0] ~~~OplogApplierImpl::_run, after setOplogReadTimestamp, lastOpTimeAppliedInBatch: Timestamp(1572637437, 28); all_durable, is: Timestamp(1572637437, 28) oplog read timestamp: Timestamp(1572637437, 28); done calling setOplogReadTimestamp.

Comment by Dianna Hohensee (Inactive) [ 01/Nov/19 ]

[ShardedClusterFixture:job0:shard0:secondary1] 2019-11-01T14:30:02.448-0400 I  REPL     [rsSync-0] ~~~OplogApplierImpl::_applyOplogBatch, just flushed journal, last op timestamp: Timestamp(1572633002, 4), durable timestamp: Timestamp(1572633002, 16), oplog read timestamp: Timestamp(1572633002, 3), op itself: { op: "c", ns: "config.$cmd", ui: UUID("1838ba5a-685a-4dc7-81db-d34e37486314"), o: { commitIndexBuild: "cache.chunks.config.system.sessions", indexBuildUUID: UUID("39a0a63b-994f-4aff-9b56-55a8380aa5f0"), indexes: [ { name: "lastmod_1", key: { lastmod: 1 }, v: 2 } ] }, ts: Timestamp(1572633002, 4), t: 1, wall: new Date(1572633002137), v: 2 }
[ShardedClusterFixture:job0:shard0:secondary1] 2019-11-01T14:30:02.448-0400 I  REPL     [rsSync-0] ~~~OplogApplierImpl::_run, applied batch in _applyOplogBatch, lastOpTimeAppliedInBatch timestamp: Timestamp(1572633002, 4), oplog read timestamp: Timestamp(1572633002, 3), all_durable: Timestamp(1572633002, 16), lastOpInBatch: { op: "c", ns: "config.$cmd", ui: UUID("1838ba5a-685a-4dc7-81db-d34e37486314"), o: { commitIndexBuild: "cache.chunks.config.system.sessions", indexBuildUUID: UUID("39a0a63b-994f-4aff-9b56-55a8380aa5f0"), indexes: [ { name: "lastmod_1", key: { lastmod: 1 }, v: 2 } ] }, ts: Timestamp(1572633002, 4), t: 1, wall: new Date(1572633002137), v: 2 }
[ShardedClusterFixture:job0:shard0:secondary1] 2019-11-01T14:30:02.450-0400 I  REPL     [rsSync-0] ~~~OplogApplierImpl::_run, after setOplogReadTimestamp, lastOpTimeAppliedInBatch: Timestamp(1572633002, 4); all_durable, is: Timestamp(1572633002, 16) oplog read timestamp: Timestamp(1572633002, 4); done calling setOplogReadTimestamp.
[ShardedClusterFixture:job0:shard0:secondary1] 2019-11-01T14:30:02.450-0400 I  REPL     [rsSync-0] ~~~GOT A PROBLEM! oplog read ts isn't equal to all_durable after setOplogReadTimestamp! readTimestamp: Timestamp(1572633002, 4)tsDurable: Timestamp(1572633002, 16)

last op timestamp: Timestamp(1572633002, 4)

last op: { op: "c", ns: "config.$cmd", ui: UUID("1838ba5a-685a-4dc7-81db-d34e37486314"), o: { commitIndexBuild: "cache.chunks.config.system.sessions", indexBuildUUID: UUID("39a0a63b-994f-4aff-9b56-55a8380aa5f0"), indexes: [ { name: "lastmod_1", key: { lastmod: 1 }, v: 2 } ] }, ts: Timestamp(1572633002, 4), t: 1, wall: new Date(1572633002137), v: 2 }

all_durable timestamp: Timestamp(1572633002, 16)

with corresponding oplog entries

MongoDB Enterprise shard-rs0:SECONDARY> db.oplog.rs.find({ts: Timestamp(1572633002, 16)}).pretty()
{
	"op" : "u",
	"ns" : "config.cache.collections",
	"ui" : UUID("8b2d6568-d355-4d30-b9b0-37b6c448dc8b"),
	"o" : {
		"$v" : 1,
		"$set" : {
			"lastRefreshedCollectionVersion" : Timestamp(1, 0),
			"refreshing" : false
		}
	},
	"o2" : {
		"_id" : "config.system.sessions"
	},
	"ts" : Timestamp(1572633002, 16),
	"t" : NumberLong(1),
	"wall" : ISODate("2019-11-01T18:30:02.170Z"),
	"v" : NumberLong(2)
}

MongoDB Enterprise shard-rs0:SECONDARY> db.oplog.rs.find({ts: Timestamp(1572633002, 4)}).pretty()
{
	"op" : "c",
	"ns" : "config.$cmd",
	"ui" : UUID("1838ba5a-685a-4dc7-81db-d34e37486314"),
	"o" : {
		"commitIndexBuild" : "cache.chunks.config.system.sessions",
		"indexBuildUUID" : UUID("39a0a63b-994f-4aff-9b56-55a8380aa5f0"),
		"indexes" : [
			{
				"name" : "lastmod_1",
				"key" : {
					"lastmod" : 1
				},
				"v" : 2
			}
		]
	},
	"ts" : Timestamp(1572633002, 4),
	"t" : NumberLong(1),
	"wall" : ISODate("2019-11-01T18:30:02.137Z"),
	"v" : NumberLong(2)
}

logs after this...

[ShardedClusterFixture:job0:shard0:secondary1] 2019-11-01T14:30:02.457-0400 I  SHARDING [repl-writer-worker-2] Marking collection config.cache.chunks.config.system.sessions as collection version: <unsharded>
[ShardedClusterFixture:job0:shard0:secondary1] 2019-11-01T14:30:02.466-0400 I  STORAGE  [IndexBuildsCoordinatorMongod-0] Index build completed successfully: 17868726-bc95-4454-a14c-a47a3664cf28: config.cache.chunks.config.system.sessions ( 1838ba5a-685a-4dc7-81db-d34e37486314 ). Index specs built: 1. Indexes in catalog before build: 1. Indexes in catalog after build: 2
[ShardedClusterFixture:job0:shard0:secondary1] 2019-11-01T14:30:02.469-0400 I  REPL     [rsSync-0] ~~~OplogApplierImpl::_applyOplogBatch, just flushed journal, last op timestamp: Timestamp(1572633002, 16), durable timestamp: Timestamp(1572633002, 16), oplog read timestamp: Timestamp(1572633002, 4), op itself: { op: "u", ns: "config.cache.collections", ui: UUID("8b2d6568-d355-4d30-b9b0-37b6c448dc8b"), o: { $v: 1, $set: { lastRefreshedCollectionVersion: Timestamp(1, 0), refreshing: false } }, o2: { _id: "config.system.sessions" }, ts: Timestamp(1572633002, 16), t: 1, wall: new Date(1572633002170), v: 2 }
[ShardedClusterFixture:job0:shard0:secondary1] 2019-11-01T14:30:02.469-0400 I  REPL     [rsSync-0] ~~~OplogApplierImpl::_run, applied batch in _applyOplogBatch, lastOpTimeAppliedInBatch timestamp: Timestamp(1572633002, 16), oplog read timestamp: Timestamp(1572633002, 4), all_durable: Timestamp(1572633002, 16), lastOpInBatch: { op: "u", ns: "config.cache.collections", ui: UUID("8b2d6568-d355-4d30-b9b0-37b6c448dc8b"), o: { $v: 1, $set: { lastRefreshedCollectionVersion: Timestamp(1, 0), refreshing: false } }, o2: { _id: "config.system.sessions" }, ts: Timestamp(1572633002, 16), t: 1, wall: new Date(1572633002170), v: 2 }
[ShardedClusterFixture:job0:shard0:primary] 2019-11-01T14:30:02.469-0400 I  COMMAND  [conn41] command config.$cmd command: update { update: "system.sessions", bypassDocumentValidation: false, ordered: false, updates: 11, shardVersion: [ Timestamp(1, 0), ObjectId('5dbc79a9c5c025b714b6980e') ], writeConcern: { w: "majority", wtimeout: 15000 }, allowImplicitCollectionCreation: false, lsid: { id: UUID("b7a58891-e0eb-4d5e-ae90-ec96e3ac682d"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, $clusterTime: { clusterTime: Timestamp(1572633002, 1), signature: { hash: BinData(0, 0F76C2B0750908262B166FF542C766DF39ECB723), keyId: 6754407144696578064 } }, $client: { driver: { name: "PyMongo", version: "3.9.0" }, os: { type: "Linux", name: "Linux", architecture: "x86_64", version: "5.0.0-20-generic" }, platform: "CPython 3.7.0.final.0" }, $configServerState: { opTime: { ts: Timestamp(1572633001, 17), t: 1 } }, $db: "config" } numYields:0 reslen:1487 locks:{ ParallelBatchWriterMode: { acquireCount: { r: 11 } }, ReplicationStateTransition: { acquireCount: { w: 11 } }, Global: { acquireCount: { w: 11 } }, Database: { acquireCount: { w: 11 } }, Collection: { acquireCount: { w: 11 } }, Mutex: { acquireCount: { r: 44 } } } flowControl:{ acquireCount: 11 } storage:{ timeWaitingMicros: { schemaLock: 28095 } } protocol:op_msg 363ms
[ShardedClusterFixture:job0:shard0:secondary1] 2019-11-01T14:30:02.470-0400 I  REPL     [rsSync-0] ~~~OplogApplierImpl::_run, after setOplogReadTimestamp, lastOpTimeAppliedInBatch: Timestamp(1572633002, 16); all_durable, is: Timestamp(1572633002, 16) oplog read timestamp: Timestamp(1572633002, 16); done calling setOplogReadTimestamp.

last op timestamp: Timestamp(1572633002, 16)

last op: { op: "u", ns: "config.cache.collections", ui: UUID("8b2d6568-d355-4d30-b9b0-37b6c448dc8b"), o: { $v: 1, $set: { lastRefreshedCollectionVersion: Timestamp(1, 0), refreshing: false } }, o2: { _id: "config.system.sessions" }, ts: Timestamp(1572633002, 16), t: 1, wall: new Date(1572633002170), v: 2 }

all_durable timestamp: Timestamp(1572633002, 16)

Some other log lines that may be relevant...

[ShardedClusterFixture:job0:shard0:secondary1] 2019-11-01T14:30:02.176-0400 I  REPL     [replication-1] ~~~ new _lastFetched: { ts: Timestamp(1572633002, 4), t: 1 }, old _lastFetched: { ts: Timestamp(1572633002, 1), t: 1 }
[ShardedClusterFixture:job0:shard0:secondary1] 2019-11-01T14:30:02.200-0400 I  REPL     [replication-2] ~~~ new _lastFetched: { ts: Timestamp(1572633002, 15), t: 1 }, old _lastFetched: { ts: Timestamp(1572633002, 4), t: 1 }
[ShardedClusterFixture:job0:shard0:secondary1] 2019-11-01T14:30:02.206-0400 I  REPL     [replication-0] ~~~ new _lastFetched: { ts: Timestamp(1572633002, 16), t: 1 }, old _lastFetched: { ts: Timestamp(1572633002, 15), t: 1 }
[ShardedClusterFixture:job0:shard0:secondary1] 2019-11-01T14:30:02.536-0400 I  REPL     [replication-2] ~~~ new _lastFetched: { ts: Timestamp(1572633002, 17), t: 1 }, old _lastFetched: { ts: Timestamp(1572633002, 16), t: 1 }

Comment by Dianna Hohensee (Inactive) [ 01/Nov/19 ]

// Just after calling replicationBatchIsComplete, renamed to triggerJournalFlush, which now prompts the JournalFlusher thread instead of the oplog visibility loop
// (so the WT all_durable ts is not used to update setOplogReadTimestamp, just the last op timestamp)
[ShardedClusterFixture:job0:shard1:secondary1] 2019-11-01T14:00:25.791-0400 I  REPL     [rsSync-0] ~~~OplogApplierImpl::_applyOplogBatch, just flushed journal, last op timestamp: Timestamp(1572631219, 18), durable timestamp: Timestamp(1572631221, 1), oplog read timestamp: Timestamp(1572631219, 5), op itself: { op: "d", ns: "test0_fsmdb0.fsmcoll0_0", ui: UUID("bdc16854-2366-4684-95d6-5c2a4bf59e83"), o: { a: 28.0, b: 28.0, _id: ObjectId('5dbc72a97b7dc76aecc0ba8e') }, ts: Timestamp(1572631219, 18), t: 1, wall: new Date(1572631219818), v: 2 }
 
// Haven't done anything more yet; just returning to the caller of _applyOplogBatch...
[ShardedClusterFixture:job0:shard1:secondary1] 2019-11-01T14:00:25.791-0400 I  REPL     [rsSync-0] ~~~OplogApplierImpl::_run, applied batch in _applyOplogBatch, lastOpTimeAppliedInBatch timestamp: Timestamp(1572631219, 18), oplog read timestamp: Timestamp(1572631219, 5), all_durable: Timestamp(1572631221, 1), lastOpInBatch: { op: "d", ns: "test0_fsmdb0.fsmcoll0_0", ui: UUID("bdc16854-2366-4684-95d6-5c2a4bf59e83"), o: { a: 28.0, b: 28.0, _id: ObjectId('5dbc72a97b7dc76aecc0ba8e') }, ts: Timestamp(1572631219, 18), t: 1, wall: new Date(1572631219818), v: 2 }
 
// Now we've called setOplogReadTimestamp directly with the last applied op's timestamp, updating the oplog read timestamp without going through the oplog visibility loop.
[ShardedClusterFixture:job0:shard1:secondary1] 2019-11-01T14:00:25.793-0400 I  REPL     [rsSync-0] ~~~OplogApplierImpl::_run, after setOplogReadTimestamp, lastOpTimeAppliedInBatch: Timestamp(1572631219, 18); all_durable, is: Timestamp(1572631221, 1) oplog read timestamp: Timestamp(1572631219, 18); done calling setOplogReadTimestamp.
[ShardedClusterFixture:job0:shard1:secondary1] 2019-11-01T14:00:25.793-0400 I  REPL     [rsSync-0] ~~~GOT A PROBLEM! oplog read ts isn't equal to all_durable after setOplogReadTimestamp! readTimestamp: Timestamp(1572631219, 18)tsDurable: Timestamp(1572631221, 1)

Somehow, we've got something in the oplog such that all_durable is Timestamp(1572631221, 1) while the last op applied is only Timestamp(1572631219, 18).
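
To make the ordering in the annotated logs above concrete, here is a standalone sketch of the flow under test: apply the batch, prompt the journal flusher, advance the oplog read timestamp to the last applied op, then run the diagnostic comparison against all_durable that produced the "GOT A PROBLEM!" lines. The types and names (StorageModel, onBatchApplied) are simplified stand-ins, not the real OplogApplierImpl or WiredTiger code.

// Standalone sketch of the instrumented batch-apply path described in the
// comments above; not the real server code.
#include <cstdint>
#include <iostream>

struct Timestamp {
    uint32_t secs = 0;
    uint32_t inc = 0;
    bool operator!=(const Timestamp& o) const {
        return secs != o.secs || inc != o.inc;
    }
};

std::ostream& operator<<(std::ostream& os, const Timestamp& t) {
    return os << "Timestamp(" << t.secs << ", " << t.inc << ")";
}

// Stand-ins for the storage-engine hooks exercised in the experiment.
struct StorageModel {
    Timestamp allDurable;         // WT all_durable under the old scheme
    Timestamp oplogReadTimestamp;

    void triggerJournalFlush() { /* wake the JournalFlusher (see sketch above) */ }
    void setOplogReadTimestamp(Timestamp ts) { oplogReadTimestamp = ts; }
};

void onBatchApplied(StorageModel& storage, Timestamp lastOpTimeAppliedInBatch) {
    // Behavior under test: prompt the flusher rather than the oplog visibility
    // loop, then advance oplog visibility to the last applied op directly.
    storage.triggerJournalFlush();
    storage.setOplogReadTimestamp(lastOpTimeAppliedInBatch);

    // The diagnostic that produced the "GOT A PROBLEM!" log lines above: the
    // logs show all_durable running ahead of the last applied op's timestamp.
    if (storage.oplogReadTimestamp != storage.allDurable) {
        std::cout << "oplog read ts isn't equal to all_durable: readTimestamp: "
                  << storage.oplogReadTimestamp
                  << " tsDurable: " << storage.allDurable << "\n";
    }
}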
