Hello,
I am trying to upgrade our MongoDB 4.2.1 replica set to 4.4.2. Our Spring Boot 2.4 application with Java driver 4.1.1 uses multi-document transactions and manipulates many documents in the same transaction (batch writing).
We have no issues on MongoDB 4.2, but these transactions fail with a WriteConflict on 4.4.
Below is a simple Java/Spring unit test that passes on MongoDB 4.2 but not on MongoDB 4.4. It does the following:
- initializes the collection outside the transaction with 20 000 docs (each with two properties: an id and a 1000-character string)
- in a transaction, removes all 20 000 docs and re-inserts 20 000 similar docs
Steps to reproduce:
- Run the following test on MongoDB 4.2.1: it passes
@RunWith(SpringRunner.class)
@ActiveProfiles(value = "test-mongo44")
@Import({MongoAutoConfiguration.class, MongoDataAutoConfiguration.class})
@SpringBootTest()
public class Mongo44Test {

    @Autowired
    private MongoTemplate mongoTemplate;

    @Autowired
    private MongoTransactionManager mongoTransactionManager;

    @Test
    public void test1() {
        // init the collection with 20 000 docs with a 1000 chars length string
        Collection<Document> elements = new ArrayList<Document>();
        IntStream.range(0, 20000).forEach(i -> {
            Document doc = new Document();
            doc.put("id", "site");
            doc.put("string", RandomStringUtils.random(1000, true, true));
            elements.add(doc);
        });
        mongoTemplate.insert(elements, "mongo44");

        TransactionTemplate transactionTemplate = new TransactionTemplate(mongoTransactionManager);
        transactionTemplate.execute(new TransactionCallbackWithoutResult() {
            @Override
            protected void doInTransactionWithoutResult(TransactionStatus status) {
                // remove all docs in transaction
                mongoTemplate.remove(Query.query(Criteria.where("id").is("site")), "mongo44");
                elements.clear();
                // and re-insert 20 000 docs with a 1000 chars length string in the transaction
                IntStream.range(0, 20000).forEach(i -> {
                    Document doc = new Document();
                    doc.put("id", "site");
                    doc.put("string", RandomStringUtils.random(1000, true, true));
                    elements.add(doc);
                });
                mongoTemplate.insert(elements, "mongo44");
            }
        });
    }
}
- Run the same test on MongoDB 4.4.2: it fails with the following error:
org.springframework.transaction.TransactionSystemException: Could not commit Mongo transaction for session [ClientSessionImpl@6d4a8c85 id = {"id": {"$binary": {"base64": "uCifnjaYRkqu73ku3PC4Tw==", "subType": "04"}}}, causallyConsistent = true, txActive = false, txNumber = 3, error = d != java.lang.Boolean].;
nested exception is com.mongodb.MongoCommandException: Command failed with error 112 (WriteConflict): 'WriteConflict error: this operation conflicted with another operation. Please retry your operation or multi-document transaction.' on server localhost:27017.
The full response is {"errorLabels": ["TransientTransactionError"], "operationTime": {"$timestamp": {"t": 1608556898, "i": 20000}}, "ok": 0.0, "errmsg": "WriteConflict error: this operation conflicted with another operation. Please retry your operation or multi-document transaction.", "code": 112, "codeName": "WriteConflict", "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1608556900, "i": 70000}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": 0}}}
    at org.springframework.data.mongodb.MongoTransactionManager.doCommit(MongoTransactionManager.java:203)
    at org.springframework.transaction.support.AbstractPlatformTransactionManager.processCommit(AbstractPlatformTransactionManager.java:743)
    at org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java:711)
    at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:152)
    at com.myproject.mongo44.Mongo44Test.test1(Mongo44Test.java:54)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.springframework.test.context.junit4.statements.RunBeforeTestExecutionCallbacks.evaluate(RunBeforeTestExecutionCallbacks.java:74)
    at org.springframework.test.context.junit4.statements.RunAfterTestExecutionCallbacks.evaluate(RunAfterTestExecutionCallbacks.java:84)
    at org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:75)
    at org.springframework.test.context.junit4.statements.RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.java:86)
    at org.springframework.test.context.junit4.statements.SpringRepeat.evaluate(SpringRepeat.java:84)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:251)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:97)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
    at org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:190)
    at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
    at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
    at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
    at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
    at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
    at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
    at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
    at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
    at com.sun.proxy.$Proxy5.processTestClass(Unknown Source)
    at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
    at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
    at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:182)
    at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:164)
    at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:413)
    at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
    at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.mongodb.MongoCommandException: Command failed with error 112 (WriteConflict): 'WriteConflict error: this operation conflicted with another operation. Please retry your operation or multi-document transaction.' on server localhost:27017. The full response is {"errorLabels": ["TransientTransactionError"], "operationTime": {"$timestamp": {"t": 1608556898, "i": 20000}}, "ok": 0.0, "errmsg": "WriteConflict error: this operation conflicted with another operation. Please retry your operation or multi-document transaction.", "code": 112, "codeName": "WriteConflict", "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1608556900, "i": 70000}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": 0}}}
    at com.mongodb.internal.connection.ProtocolHelper.getCommandFailureException(ProtocolHelper.java:175)
    at com.mongodb.internal.connection.InternalStreamConnection.receiveCommandMessageResponse(InternalStreamConnection.java:302)
    at com.mongodb.internal.connection.InternalStreamConnection.sendAndReceive(InternalStreamConnection.java:258)
    at com.mongodb.internal.connection.UsageTrackingInternalConnection.sendAndReceive(UsageTrackingInternalConnection.java:99)
    at com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection.sendAndReceive(DefaultConnectionPool.java:500)
    at com.mongodb.internal.connection.CommandProtocolImpl.execute(CommandProtocolImpl.java:71)
    at com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:224)
    at com.mongodb.internal.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:202)
    at com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:118)
    at com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:110)
    at com.mongodb.internal.operation.CommandOperationHelper$13.call(CommandOperationHelper.java:710)
    at com.mongodb.internal.operation.OperationHelper.withReleasableConnection(OperationHelper.java:500)
    at com.mongodb.internal.operation.CommandOperationHelper.executeRetryableCommand(CommandOperationHelper.java:703)
    at com.mongodb.internal.operation.TransactionOperation.execute(TransactionOperation.java:69)
    at com.mongodb.internal.operation.CommitTransactionOperation.execute(CommitTransactionOperation.java:132)
    at com.mongodb.internal.operation.CommitTransactionOperation.execute(CommitTransactionOperation.java:53)
    at com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:207)
    at com.mongodb.client.internal.ClientSessionImpl.commitTransaction(ClientSessionImpl.java:129)
    at org.springframework.data.mongodb.MongoTransactionManager$MongoTransactionObject.commitTransaction(MongoTransactionManager.java:469)
    at org.springframework.data.mongodb.MongoTransactionManager.doCommit(MongoTransactionManager.java:236)
    at org.springframework.data.mongodb.MongoTransactionManager.doCommit(MongoTransactionManager.java:200)
    ... 60 more
Is there a change in transaction handling between MongoDB 4.2 and 4.4 that could explain this behavior?
|
|
We have completed WT-7185 and backported it to 4.4.
These changes are scheduled to be included in the next MongoDB 4.4 release (4.4.7).
|
|
We are also seeing this problem in 4.4.2 with large transactions. We're working around it temporarily by reducing our transaction sizes, but are eager to hear when/how MongoDB plans to address the root problem.
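For anyone needing the same stopgap, here is a minimal sketch of that kind of chunking against the reporter's Spring setup; the chunk size, class name and helper method are illustrative assumptions, not code from this ticket:

import java.util.List;
import org.bson.Document;
import org.springframework.data.mongodb.MongoTransactionManager;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.transaction.support.TransactionTemplate;

class ChunkedBatchWriter {

    // Illustrative chunk size: small enough that each transaction stays far below the
    // problematic size, large enough to keep the number of commits reasonable.
    private static final int CHUNK_SIZE = 1_000;

    static void insertInChunks(MongoTemplate mongoTemplate,
                               MongoTransactionManager txManager,
                               List<Document> elements,
                               String collection) {
        TransactionTemplate txTemplate = new TransactionTemplate(txManager);
        for (int from = 0; from < elements.size(); from += CHUNK_SIZE) {
            List<Document> chunk =
                    elements.subList(from, Math.min(from + CHUNK_SIZE, elements.size()));
            // each chunk commits in its own, much smaller multi-document transaction
            txTemplate.executeWithoutResult(status -> mongoTemplate.insert(chunk, collection));
        }
    }
}

Note that splitting the work across several transactions gives up the all-or-nothing semantics of the single large transaction, which is exactly the rewrite cost mentioned elsewhere in this ticket.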
|
|
Note that I was also able to reproduce this problem with a large number of updates to a single large document in a bulk write within a transaction, so the issue is not unique to inserting many documents.
|
|
Hi,
Thanks for the feedback, I will watch that issue as well.
For the moment we are stuck on MongoDB 4.2, because reducing the number of documents per transaction would require rewriting part of our application (it is a BI application that manipulates a lot of documents through aggregations and bulk removes/inserts).
|
|
Hi o.boudet@gmail.com,
We have created WT-7185 to investigate this issue further in the storage engines team. Please follow that issue if you would like progress updates once it is scheduled in a future iteration. In the interim, to move forward with the 4.4 upgrade, is it possible to reduce the number of documents updated in the transaction?
|
|
I've reproduced the issue, and indeed the check added in WT-6444 does appear to be problematic. As alexander.gorrod has mentioned, I'll begin investigating improvements to the heuristics used around that check and whether we can distinguish between the two scenarios.
I'm currently running the tpcc reproducer as well to create a baseline for performance.
|
|
The change in WT-6444 was put into place to handle cases where MongoDB users do batch inserts - when those batch inserts get to a certain size, the batch tends to fail and MongoDB retries each operation individually. If WiredTiger continues "trying" very hard to apply the batch of operations in a single transaction it can cause the performance for such batches to be very poor, so we implemented the above code to give up in that case.
It appears as though this case with multi-document transactions is triggering the same mechanism (which isn't surprising), but it can't fall back to breaking the operations up into smaller pieces (since it's a transaction).
The mode of WT_ROLLBACK being triggered here is based on pinning an individual page of content in cache, and such a page can hold up to 10MB of data (before we need to split it), not the entire configured dirty cache content, which is why the cache usage reported here is relatively modest.
We will need to take this workload and investigate whether there are any heuristics we can apply to distinguish between it and the workloads that drove the changes in WT-6444. If not we might consider whether MongoDB server can pass in additional context to WiredTiger to allow it to distinguish between the different cases.
|
|
I had a quick look at the code luke.pearson mentioned. It looks like we are doing a forced eviction on a page that is too large. So whether we do forced eviction or not in this case is not related to the overall cache usage. luke.pearson, is my understanding correct?
|
|
In my repro it seemed that almost every attempt was failing, so I'm not sure that retrying is useful - it might eventually succeed, but that would just turn a functional problem into a severe performance problem.
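For context on what retrying would look like at the application level: the commit error above carries the TransientTransactionError label, and the sync Java driver's withTransaction helper already re-runs the transaction body when it sees that label, up to a built-in time limit. A minimal sketch, with the connection string, namespace and document shape assumed rather than taken from the reproducer:

import com.mongodb.client.ClientSession;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

public class TransientRetryExample {
    public static void main(String[] args) {
        // Assumptions: a replica set reachable on localhost:27017; database, collection
        // and document shape roughly mirror the reproducer but are not taken from it.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> coll =
                    client.getDatabase("test").getCollection("mongo44");
            List<Document> docs = new ArrayList<>();
            for (int i = 0; i < 20_000; i++) {
                docs.add(new Document("id", "site").append("string", "x".repeat(1000)));
            }
            try (ClientSession session = client.startSession()) {
                // withTransaction re-runs the body whenever the server returns an error
                // labelled TransientTransactionError (such as the WriteConflict above),
                // up to the driver's built-in time limit.
                session.withTransaction(() -> {
                    coll.deleteMany(session, new Document("id", "site"));
                    coll.insertMany(session, docs);
                    return null;
                });
            }
        }
    }
}

As noted above, when almost every attempt conflicts this retry loop only converts the hard failure into a long stall, so it is not a real fix for this ticket.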
|
|
Thanks gregory.wlodarek.
We introduced that change to handle the scenario where the oldest transaction, which was pinning content, was also getting stuck in forced eviction and consequently taking longer to complete, pinning even more content.
At the storage layer we expected that operations to which we returned WT_ROLLBACK would be translated into a WriteConflictException (WCE) and retried. Is that not the case for the oplog? However, we should also determine why that thread got pulled into forced eviction with only 0.281% cache usage.
|
|
luke.pearson, that did the trick! I can no longer reproduce this issue with your patch.
|
|
I could be wrong, but this could be WT-6444. gregory.wlodarek, could you try running the reproducer with this patch?
diff --git a/src/btree/bt_read.c b/src/btree/bt_read.c
index f44bc11cb..1c0fef980 100644
--- a/src/btree/bt_read.c
+++ b/src/btree/bt_read.c
@@ -348,10 +348,10 @@ read:
              * Forced eviction failed: check if this transaction is keeping content pinned
              * in cache.
              */
-            if (force_attempts > 1 &&
+            /*if (force_attempts > 1 &&
               (ret = __wt_txn_is_blocking(session, true)) == WT_ROLLBACK)
                 WT_STAT_CONN_INCR(session, cache_eviction_force_rollback);
-            WT_RET(ret);
+            WT_RET(ret);*/
             stalled = true;
             break;
         }
|
|
louis.williams pointed out WT-6627 to me, could these be related?
|
|
I've attached some FTDC data from the reproducer. I found out that WiredTiger can return WT_ROLLBACK on write transactions to free up cache space, but looking at the FTDC data we're only using 0.281% of the cache at the time, so I don't believe that's the problem.

Does WiredTiger impose some sort of per-transaction limit on large transactions that forces them to be rolled back?
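As an aside, the cache-usage figures discussed here can also be spot-checked outside FTDC with the serverStatus command; a small sketch (the host is an assumption, and the statistic names are the ones exposed under serverStatus().wiredTiger.cache):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import org.bson.Document;

public class CacheUsageCheck {
    public static void main(String[] args) {
        // Assumption: a mongod reachable on localhost:27017.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            Document status = client.getDatabase("admin")
                    .runCommand(new Document("serverStatus", 1));
            Document cache = status.get("wiredTiger", Document.class)
                    .get("cache", Document.class);
            double max = ((Number) cache.get("maximum bytes configured")).doubleValue();
            double used = ((Number) cache.get("bytes currently in the cache")).doubleValue();
            double dirty = ((Number) cache.get("tracked dirty bytes in the cache")).doubleValue();
            // the same counters FTDC samples, read on demand
            System.out.printf("cache used: %.3f%%, dirty: %.3f%%%n",
                    100.0 * used / max, 100.0 * dirty / max);
        }
    }
}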
|
|
I've verified Bruce's observations. The write conflict exceptions are being thrown when inserting operations into the oplog collection.
In the reproducer provided by Bruce, there are no concurrent operations running at the time of the transaction, so I'm not sure why we're getting the write conflict exception thrown.
From what I can see, when we're committing the transaction there are multiple oplog entries associated with it (identical txnNumber) as it's a large transaction and gets split up.
The first oplog entry inserts successfully into the table with the RecordId key 6917308536905007105 and the following contents:
{
  "op": {
    "lsid": {
      "id": {"$uuid": "94c1804b-cd1f-4091-b201-173215e52c14"},
      "uid": {"$binary": {"base64": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=", "subType": "0"}}
    },
    "txnNumber": 0,
    "op": "c",
    "ns": "admin.$cmd",
    "o": {
      "applyOps": [
        {"op": "i", "ns": "test.c", "ui": {"$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10"}, "o": {"_id": {"$oid": "5fff376f524b3165ab754a2b"}, "x": "xxxxx.........xxxxx"}},
        {"op": "i", "ns": "test.c", "ui": {"$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10"}, "o": {"_id": {"$oid": "5fff376f524b3165ab754a2c"}, "x": "xxxxx.........xxxxx"}},
        {"op": "i", "ns": "test.c", "ui": {"$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10"}, "o": {"_id": {"$oid": "5fff376f524b3165ab754a2d"}, "x": "xxxxx.........xxxxx"}},
        {"op": "i", "ns": "test.c", "ui": {"$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10"}, "o": {"_id": {"$oid": "5fff376f524b3165ab754a2e"}, "x": "xxxxx.........xxxxx"}},
        {"op": "i", "ns": "test.c", "ui": {"$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10"}, "o": {"_id": {"$oid": "5fff376f524b3165ab754a2f"}, "x": "xxxxx.........xxxxx"}},
        {"op": "i", "ns": "test.c", "ui": {"$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10"}, "o": {"_id": {"$oid": "5fff376f524b3165ab754a30"}, "x": "xxxxx.........xxxxx"}},
        {"op": "i", "ns": "test.c", "ui": {"$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10"}, "o": {"_id": {"$oid": "5fff376f524b3165ab754a31"}, "x": "xxxxx.........xxxxx"}},
        {"op": "i", "ns": "test.c", "ui": {"$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10"}, "o": {"_id": {"$oid": "5fff376f524b3165ab754a32"}, "x": "xxxxx.........xxxxx"}},
        {"op": "i", "ns": "test.c", "ui": {"$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10"}, "o": {"_id": {"$oid": "5fff376f524b3165ab754a33"}}}
      ]
    }
  },
  "ts": {"$timestamp": {"t": 1610561399, "i": 1}}
}
The insertion of the second oplog entry into the table is the one that is consistently returning WT_ROLLBACK (-31800) for me, resulting in the write conflict exception being thrown. For this entry, we use the RecordId key 6917308536905007106 and it contains the following contents:
{
  "op": {
    "lsid": {
      "id": {"$uuid": "94c1804b-cd1f-4091-b201-173215e52c14"},
      "uid": {"$binary": {"base64": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=", "subType": "0"}}
    },
    "txnNumber": 0,
    "op": "c",
    "ns": "admin.$cmd",
    "o": {
      "applyOps": [
        {"op": "i", "ns": "test.c", "ui": {"$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10"}, "o": {"_id": {"$oid": "5fff3770524b3165ab75860e"}, "x": "xxxxx.........xxxxx"}},
        {"op": "i", "ns": "test.c", "ui": {"$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10"}, "o": {"_id": {"$oid": "5fff3770524b3165ab75860f"}, "x": "xxxxx.........xxxxx"}},
        {"op": "i", "ns": "test.c", "ui": {"$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10"}, "o": {"_id": {"$oid": "5fff3770524b3165ab758610"}, "x": "xxxxx.........xxxxx"}},
        {"op": "i", "ns": "test.c", "ui": {"$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10"}, "o": {"_id": {"$oid": "5fff3770524b3165ab758611"}, "x": "xxxxx.........xxxxx"}},
        {"op": "i", "ns": "test.c", "ui": {"$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10"}, "o": {"_id": {"$oid": "5fff3770524b3165ab758612"}, "x": "xxxxx.........xxxxx"}},
        {"op": "i", "ns": "test.c", "ui": {"$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10"}, "o": {"_id": {"$oid": "5fff3770524b3165ab758613"}, "x": "xxxxx.........xxxxx"}},
        {"op": "i", "ns": "test.c", "ui": {"$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10"}, "o": {"_id": {"$oid": "5fff3770524b3165ab758614"}, "x": "xxxxx.........xxxxx"}},
        {"op": "i", "ns": "test.c", "ui": {"$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10"}, "o": {"_id": {"$oid": "5fff3770524b3165ab758615"}, "x": "xxxxx.........xxxxx"}},
        {"op": "i", "ns": "test.c", "ui": {"$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10"}, "o": {"_id": {"$oid": "5fff3770524b3165ab758616"}}}
      ]
    }
  },
  "ts": {"$timestamp": {"t": 1610561399, "i": 2}}
}
During this time there have been no other operations with the RecordId key 6917308536905007106, so I don't see how we can be getting a WT_ROLLBACK from WiredTiger.
I'm going to send this over to WiredTiger for further investigation. This issue also reproduces consistently for me on v4.9.
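For anyone wanting to repeat this inspection, the split applyOps entries for a large transaction can be read back from the oplog directly; a rough sketch (the connection string is an assumption, and reading local.oplog.rs requires appropriate privileges):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Sorts;
import org.bson.Document;

import java.util.List;

public class OplogApplyOpsDump {
    public static void main(String[] args) {
        // Assumption: a replica-set member on localhost:27017 with access to local.oplog.rs.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> oplog =
                    client.getDatabase("local").getCollection("oplog.rs");
            // applyOps entries are the split pieces of large multi-document transactions;
            // entries sharing the same lsid and txnNumber belong to the same transaction.
            for (Document entry : oplog.find(Filters.exists("o.applyOps"))
                                       .sort(Sorts.descending("$natural"))
                                       .limit(10)) {
                List<Document> ops = entry.get("o", Document.class).getList("applyOps", Document.class);
                System.out.println("txnNumber=" + entry.get("txnNumber")
                        + " ops=" + ops.size() + " ts=" + entry.get("ts"));
            }
        }
    }
}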
|
|
It looks like the write conflict exceptions are occurring while inserting the operations into the oplog.
(gdb) bt
#0 0x000055da5f67ef98 in mongo::WriteConflictException::WriteConflictException() ()
#1 0x000055da5f7ea8b4 in mongo::wtRCToStatus_slow(int, char const*) ()
#2 0x000055da5f7d5b1a in mongo::WiredTigerRecordStore::_insertRecords(mongo::OperationContext*, mongo::Record*, mongo::Timestamp const*, unsigned long) ()
#3 0x000055da5f7d5c74 in mongo::WiredTigerRecordStore::insertRecords(mongo::OperationContext*, std::vector<mongo::Record, std::allocator<mongo::Record> >*, std::vector<mongo::Timestamp, std::allocator<mongo::Timestamp> > const&) ()
#4 0x000055da6000c7a3 in mongo::CollectionImpl::insertDocumentsForOplog(mongo::OperationContext*, std::vector<mongo::Record, std::allocator<mongo::Record> >*, std::vector<mongo::Timestamp, std::allocator<mongo::Timestamp> > const&) ()
#5 0x000055da600739b8 in mongo::repl::_logOpsInner(mongo::OperationContext*, mongo::NamespaceString const&, std::vector<mongo::Record, std::allocator<mongo::Record> >*, std::vector<mongo::Timestamp, std::allocator<mongo::Timestamp> > const&, mongo::Collection*, mongo::repl::OpTime, mongo::Date_t) ()
#6 0x000055da600742a8 in mongo::repl::logOp(mongo::OperationContext*, mongo::repl::MutableOplogEntry*) ()
#7 0x000055da5fe165a4 in mongo::(anonymous namespace)::logOperation(mongo::OperationContext*, mongo::repl::MutableOplogEntry*) ()
#8 0x000055da5fe181b0 in mongo::(anonymous namespace)::logApplyOpsForTransaction(mongo::OperationContext*, mongo::repl::MutableOplogEntry*, boost::optional<mongo::DurableTxnStateEnum>, boost::optional<mongo::repl::OpTime>, bool) [clone .constprop.961] ()
#9 0x000055da5fe1a7da in mongo::(anonymous namespace)::logOplogEntriesForTransaction(mongo::OperationContext*, std::vector<mongo::repl::ReplOperation, std::allocator<mongo::repl::ReplOperation> >*, std::vector<mongo::repl::OpTime, std::allocator<mongo::repl::OpTime> > const&, unsigned long, bool) ()
#10 0x000055da5fe1df93 in mongo::OpObserverImpl::onUnpreparedTransactionCommit(mongo::OperationContext*, std::vector<mongo::repl::ReplOperation, std::allocator<mongo::repl::ReplOperation> >*, unsigned long) ()
#11 0x000055da5f756ce5 in mongo::OpObserverRegistry::onUnpreparedTransactionCommit(mongo::OperationContext*, std::vector<mongo::repl::ReplOperation, std::allocator<mongo::repl::ReplOperation> >*, unsigned long) ()
#12 0x000055da600ae2bb in mongo::TransactionParticipant::Participant::commitUnpreparedTransaction(mongo::OperationContext*) ()
#13 0x000055da5fc452dc in mongo::(anonymous namespace)::CmdCommitTxn::run(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, mongo::BSONObjBuilder&) ()
#14 0x000055da603ebe5a in mongo::BasicCommandWithReplyBuilderInterface::Invocation::run(mongo::OperationContext*, mongo::rpc::ReplyBuilderInterface*) ()
#15 0x000055da603e639f in mongo::CommandHelpers::runCommandInvocation(mongo::OperationContext*, mongo::OpMsgRequest const&, mongo::CommandInvocation*, mongo::rpc::ReplyBuilderInterface*) ()
#16 0x000055da5fb8e8b1 in mongo::(anonymous namespace)::invokeWithSessionCheckedOut(mongo::OperationContext*, mongo::OpMsgRequest const&, mongo::CommandInvocation*, mongo::OperationSessionInfoFromClient const&, mongo::rpc::ReplyBuilderInterface*) ()
#17 0x000055da5fb8f828 in mongo::(anonymous namespace)::runCommandImpl(mongo::OperationContext*, mongo::CommandInvocation*, mongo::OpMsgRequest const&, mongo::rpc::ReplyBuilderInterface*, mongo::LogicalTime, mongo::ServiceEntryPointCommon::Hooks const&, mongo::BSONObjBuilder*, mongo::OperationSessionInfoFromClient const&) ()
#18 0x000055da5fb91ff9 in mongo::(anonymous namespace)::execCommandDatabase(mongo::OperationContext*, mongo::Command*, mongo::OpMsgRequest const&, mongo::rpc::ReplyBuilderInterface*, mongo::ServiceEntryPointCommon::Hooks const&) ()
#19 0x000055da5fb932e0 in mongo::(anonymous namespace)::receivedCommands(mongo::OperationContext*, mongo::Message const&, mongo::ServiceEntryPointCommon::Hooks const&) ()
#20 0x000055da5fb93f6d in mongo::ServiceEntryPointCommon::handleRequest(mongo::OperationContext*, mongo::Message const&, mongo::ServiceEntryPointCommon::Hooks const&) ()
#21 0x000055da5fb8195c in mongo::ServiceEntryPointMongod::handleRequest(mongo::OperationContext*, mongo::Message const&) ()
#22 0x000055da5fb8c48c in mongo::ServiceStateMachine::_processMessage(mongo::ServiceStateMachine::ThreadGuard) ()
#23 0x000055da5fb8a1c5 in mongo::ServiceStateMachine::_runNextInGuard(mongo::ServiceStateMachine::ThreadGuard) ()
#24 0x000055da5fb8b176 in std::_Function_handler<void (), mongo::ServiceStateMachine::_scheduleNextWithGuard(mongo::ServiceStateMachine::ThreadGuard, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName, mongo::ServiceStateMachine::Ownership)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#25 0x000055da60e47b22 in mongo::transport::ServiceExecutorSynchronous::schedule(std::function<void ()>, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName) ()
#26 0x000055da5fb895db in mongo::ServiceStateMachine::_scheduleNextWithGuard(mongo::ServiceStateMachine::ThreadGuard, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName, mongo::ServiceStateMachine::Ownership) ()
#27 0x000055da5fb8a93d in mongo::ServiceStateMachine::_sourceCallback(mongo::Status) ()
#28 0x000055da5fb8aa20 in auto mongo::future_details::call<mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::{lambda(mongo::StatusWith<mongo::Message>)#2}&, mongo::StatusWith<mongo::Message> >(mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::{lambda(mongo::StatusWith<mongo::Message>)#2}&, mongo::StatusWith<mongo::Message>&&) ()
#29 0x000055da5fb8aca5 in mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard) ()
#30 0x000055da5fb8a23a in mongo::ServiceStateMachine::_runNextInGuard(mongo::ServiceStateMachine::ThreadGuard) ()
#31 0x000055da5fb8b176 in std::_Function_handler<void (), mongo::ServiceStateMachine::_scheduleNextWithGuard(mongo::ServiceStateMachine::ThreadGuard, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName, mongo::ServiceStateMachine::Ownership)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#32 0x000055da60e48188 in std::_Function_handler<void (), mongo::transport::ServiceExecutorSynchronous::schedule(std::function<void ()>, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName)::{lambda()#2}>::_M_invoke(std::_Any_data const&) ()
#33 0x000055da61197b06 in std::_Function_handler<void (), mongo::launchServiceWorkerThread(std::function<void ()>)::{lambda()#3}>::_M_invoke(std::_Any_data const&) ()
#34 0x000055da61197b74 in mongo::(anonymous namespace)::runFunc(void*) ()
#35 0x00007f18e63da6db in start_thread (arg=0x7f18bb27d700) at pthread_create.c:463
#36 0x00007f18e610388f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
|
|
Thanks o.boudet@gmail.com, we're passing this report on to an appropriate team to investigate further.
|
|
I was able to reproduce this with the following:
function repro() {
    const doc = {x: "x".repeat(1000)}
    const many = new Array(20000).fill(doc)
    const session = db.getMongo().startSession()
    db = session.getDatabase("test")
    db.createCollection("c")
    for (var i = 0; i < 10; i++) {
        try {
            session.startTransaction({readConcern: {level: "snapshot"}})
            db.c.insertMany(many)
            session.commitTransaction()
            print("success")
        } catch (e) {
            print(e)
        }
    }
}
- Removing documents is not necessary to reproduce the problem
- Interestingly, the bulk insert fails on the first try, succeeds on the second, then fails on most subsequent tries, but succeeds occasionally
- Does not reproduce as readily if you cut either the size of the string or the number of documents in half, but still occasionally fails; however, those occasional failures are on the insert, not on the commit
- Neither the serverStatus counter nor the slow query log message shows any write conflicts, even though the error returned from the commit is a write conflict error
- Fails in the same way if the bulk insert is replaced by a loop inserting the same number of documents individually
- Fails on 4.4.0 and 4.4.1 as well
|
|
o.boudet@gmail.com This seems to be a change in server behavior, so we've moved this issue to the SERVER project.