[SERVER-40324] sharded cluster backtraces with errormessage "XXX was committed once with opTime YYY and a second time with opTime ZZZ Created: 25/Mar/19  Updated: 06/Dec/22  Resolved: 25/Mar/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.6.10
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Jonathan Schmieg Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-39847 Migrating session info can trigger fa... Closed
Assigned Teams:
Sharding
Operating System: ALL
Participants:

 Description   

While evaluating the performance of our sharded mongodb cluster: three replicasets (one for configservers, two for sharded data) the servers keep crashing after some time with the following backtrace:

 

2019-03-15T11:26:05.271+0100 I COMMAND [conn189352] command admin.$cmd command: _recvChunkStatus { _recvChunkStatus: "applicationStorage.processStateHistory", waitForSteadyOrDone: true, sessionId: "appstore_app store2_5c8b7dbc8e10cc1ae695b136", $clusterTime: { clusterTime: Timestamp(1552645564, 2742), signature: { hash: BinData(0, 2A55878002F62882A63A5BA4253A0B5EC1F2D609), keyId: 6667140865305083910 } }, $configServerState: { opTime: { ts: Timestamp(1552645564, 1086), t: 1 } }, $db: "admin" } numYields:0 reslen:680 locks:{} protocol:op_msg 999ms 2019-03-15T11:26:05.316+0100 F STORAGE [sessionCatalogMigration-appstore_appstore2_5c8b7dbc8e10cc1ae695] Statement id 0 from transaction [ { id: UUID("8673c1c4-609e-4380-8251-f63f09526359"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }:1170 ] was committed once with opTime { ts: Timestamp(1552644646, 788), t: 1 } and a second time with opTime { ts: Timestamp(1552645565, 4495), t: 1 }. This indicates possible data corruption or server bug and the process will be terminated. 2019-03-15T11:26:05.329+0100 F - [sessionCatalogMigration-appstore_appstore2_5c8b7dbc8e10cc1ae695] Fatal Assertion 40526 at src/mongo/db/session.cpp 69 2019-03-15T11:26:05.350+0100 F - [sessionCatalogMigration-appstore_appstore2_5c8b7dbc8e10cc1ae695] ***aborting after fassert() failure 2019-03-15T11:26:05.431+0100 I SHARDING [Collection Range Deleter] Finished deleting documents in merchantNotification.notificationJob range [{ _id: ObjectId('5c8b77c7a2c5342604fcfdb9') }, { _id: ObjectId('5c8b77dc84842a4eddbad856') }) 2019-03-15T11:26:05.508+0100 I COMMAND [conn191246] command applicationStorage.entityLock appName: "opgcore" command: insert { insert: "entityLock", bypassDocumentValidation: false, ordered: true, stmtIds: [ 0 ], documents: [ { _id: "5c8b75f7a8d2367bd46db1b3lhsnb126sjv88ao9ha3jo721p7-process", data: { entityType: "process", entityId: "5c8b75f7a8d2367bd46db1b3lhsnb126sjv88ao9ha3jo721p7", serverId: "192.168.223.3", lockId: "MiCP3lymgZpNpdR", lockDate: new Date(1552645565391), autoUnlockDate: new Date(1552645745391), id: "5c8b75f7a8d2367bd46db1b3lhsnb126sjv88ao9ha3jo721p7-process" } } ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("81522b57-d23a-4f48-9d77-9d400c1323d1"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, txnNumber: 5682, $clusterTime: { clusterTime: Timestamp(1552645565, 6140), signature: { hash: BinData(0, 1BA8D1BE7A3A1E47D8397B8BF0FA416A5D7C91C6), keyId: 6667140865305083910 } }, $client: { driver: { name: "mongo-java-driver", version: "3.9.1" }, os: { type: "Linux", name: "Linux", architecture: "amd64", version: "4.19.0-2-cloud-amd64" }, platform: "Java/Oracle Corporation/1.8.0_171-8u171-b11-2-b11", application: { name: "opgcore" }, mongos: { host: "lasttests-opgcore-0:27017", client: "127.0.0.1:27886", version: "3.6.10-3.0" } }, $configServerState: { opTime: { ts: Timestamp(1552645564, 1099), t: 1 } }, $db: "applicationStorage" } ninserted:0 writeConflicts:1 exception: E11000 duplicate key error collection: applicationStorage.entityLock index: _id_ dup key: { : "5c8b75f7a8d2367bd46db1b3lhsnb126sjv88ao9ha3jo721p7-process" } code:DuplicateKey numYields:0 reslen:571 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 2 } } } protocol:op_msg 116ms 2019-03-15T11:26:05.701+0100 I NETWORK [listener] connection accepted from 127.0.0.1:45436 #191827 (370 connections now open) 2019-03-15T11:26:05.733+0100 I NETWORK [conn191827] end connection 127.0.0.1:45436 (369 connections now open) 2019-03-15T11:26:05.750+0100 I NETWORK [conn191515] end connection 192.168.223.12:18284 (368 connections now open) 2019-03-15T11:26:05.752+0100 I NETWORK [conn191218] end connection 192.168.223.12:65266 (367 connections now open) 2019-03-15T11:26:05.762+0100 I NETWORK [conn191194] end connection 192.168.223.12:65236 (366 connections now open) 2019-03-15T11:26:05.765+0100 I NETWORK [conn191590] end connection 192.168.223.12:18538 (365 connections now open) 2019-03-15T11:26:05.777+0100 F - [sessionCatalogMigration-appstore_appstore2_5c8b7dbc8e10cc1ae695] Got signal: 6 (Aborted). 0x5579d4babd1a 0x5579d4baaf79 0x5579d4bab446 0x7f69ee654730 0x7f69ee4b68bb 0x7f69ee4a1535 0x5579d3186b9e 0x5579d387192a 0x5579d38761ce 0x5579d3352989 0x5579d396ff33 0x5579d397120f 0x5579d397342a 0x7f69ee8bbb2f 0x7f69ee649fa3 0x7f69ee57882f ----- BEGIN BACKTRACE ----- {"backtrace":[{"b":"5579D281E000","o":"238DD1A","s":"_ZN5mongo15printStackTraceERSo"},{"b":"5579D281E000","o":"238CF79"},{"b":"5579D281E000","o":"238D446"},{"b":"7F69EE642000","o":"12730"},{"b":"7F69EE47F000","o":"378BB","s":"gsignal"},{"b":"7F69EE47F000","o":"22535","s":"abort"},{"b":"5579D281E000","o":"968B9E","s":"_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj"},{"b":"5579D281E000","o":"105392A"},{"b":"5579D281E000","o":"10581CE"},{"b":"5579D281E000","o":"B34989","s":"_ZN5mongo22WiredTigerRecoveryUnit7_commitEv"},{"b":"5579D281E000","o":"1151F33"},{"b":"5579D281E000","o":"115320F","s":"_ZN5mongo34SessionCatalogMigrationDestination31_retrieveSessionStateFromSourceEPNS_14ServiceContextE"},{"b":"5579D281E000","o":"115542A"},{"b":"7F69EE800000","o":"BBB2F"},{"b":"7F69EE642000","o":"7FA3"},{"b":"7F69EE47F000","o":"F982F","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.6.10-3.0", "gitVersion" : "da6faf62946711bd29a9c30806dc9c535eb1becb", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.19.0-2-cloud-amd64", "version" : "#1 SMP Debian 4.19.16-1 (2019-01-17)", "machine" : "x86_64" }, "somap" : [ { "b" : "5579D281E000", "elfType" : 3, "buildId" : "A21E4DCB049E3C0650DEFF25851716DACBDBCBA7" }, { "b" : "7FFDBB7DC000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "C26B1C5AA9BD65265D83852CC18BA05D93607325" }, { "b" : "7F69EF0AC000", "path" : "/lib/x86_64-linux-gnu/libz.so.1", "elfType" : 3, "buildId" : "3AF7C4BCEB19B6C83F76E2822B9A23041D85F6D1" }, { "b" : "7F69EF099000", "path" : "/lib/x86_64-linux-gnu/libbz2.so.1.0", "elfType" : 3, "buildId" : "0F4BE7DF3B64A62D63203974A3DB18A167D6873D" }, { "b" : "7F69EF07C000", "path" : "/usr/lib/x86_64-linux-gnu/libsasl2.so.2", "elfType" : 3, "buildId" : "E5A6C1B52E9B18F9B6B1B54BC006507409683CE7" }, { "b" : "7F69EF062000", "path" : "/lib/x86_64-linux-gnu/libresolv.so.2", "elfType" : 3, "buildId" : "23FAE799E09DBEA2935DA2AC99C15B0CE90D654F" }, { "b" : "7F69EEBFE000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.2", "elfType" : 3, "buildId" : "B26F780956D55C181206CBB67FEEFD665209C227" }, { "b" : "7F69EE995000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.0.2", "elfType" : 3, "buildId" : "8D36AB4C9418B58CC011D77F38B7FA8B918E2CF8" }, { "b" : "7F69EE98E000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "CEB2E0633BA706855DEDC23FD4DB9EE8D66FDCE3" }, { "b" : "7F69EE984000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "C843859D314AFE8A43B7D35B93DF751E90800043" }, { "b" : "7F69EE800000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "3DD2314D6BAAD3B26AB487142EEB7F8C26681C69" }, { "b" : "7F69EE67D000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "516F5A44B9F34F22234C050718F8ECDE3EE03A07" }, { "b" : "7F69EE663000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "5972B8FD957DB2A465A37EEE3E142BDEEC74DE7B" }, { "b" : "7F69EE642000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "8746B2EA9DEA22E2D79E0DB815D6606BA1930008" }, { "b" : "7F69EE47F000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "55E7DE84DF6813C82C5B12822219C17D19C9C4B0" }, { "b" : "7F69EF2D1000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "755312DCB2382EB2FDE78494879BB2104028AE80" }, { "b" : "7F69EE46F000", "path" : "/usr/lib/x86_64-linux-gnu/sasl2/libsasldb.so", "elfType" : 3, "buildId" : "F6CCCB520A568F50285B8EF541C7D059F9175448" }, { "b" : "7F69EE2B3000", "path" : "/usr/lib/x86_64-linux-gnu/libdb-5.3.so", "elfType" : 3, "buildId" : "674E2DC8FF2D056A41D25065EC098D81392952A3" } ] }} mongod(_ZN5mongo15printStackTraceERSo+0x3A) [0x5579d4babd1a] mongod(+0x238CF79) [0x5579d4baaf79] mongod(+0x238D446) [0x5579d4bab446] libpthread.so.0(+0x12730) [0x7f69ee654730] libc.so.6(gsignal+0x10B) [0x7f69ee4b68bb] libc.so.6(abort+0x121) [0x7f69ee4a1535] mongod(_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj+0x0) [0x5579d3186b9e] mongod(+0x105392A) [0x5579d387192a] mongod(+0x10581CE) [0x5579d38761ce] mongod(_ZN5mongo22WiredTigerRecoveryUnit7_commitEv+0x39) [0x5579d3352989] mongod(+0x1151F33) [0x5579d396ff33] mongod(_ZN5mongo34SessionCatalogMigrationDestination31_retrieveSessionStateFromSourceEPNS_14ServiceContextE+0x108F) [0x5579d397120f] mongod(+0x115542A) [0x5579d397342a] libstdc++.so.6(+0xBBB2F) [0x7f69ee8bbb2f] libpthread.so.0(+0x7FA3) [0x7f69ee649fa3] libc.so.6(clone+0x3F) [0x7f69ee57882f] ----- END BACKTRACE -----



 Comments   
Comment by Kelsey Schubert [ 25/Mar/19 ]

MongoDB 3.6.12 is currently scheduled to be released in the next couple weeks. As a workaround, I would suggest disabling the balancer. If it is not feasible to disable the balancer until 3.6.12 is generally available, my recommendation would be to increase your oplog size.

Comment by Kaloian Manassiev [ 25/Mar/19 ]

Hi jschmieg,

We are sorry that you are hitting this problem. This is a known bug, which we have identified and fixed under SERVER-39847 and it is available in version 3.6.12 (and 4.0.7).

Even though the message says so, there is actually no data corruption.

Best regards,
-Kal.

Generated at Thu Feb 08 04:54:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.