[SERVER-34994] The test secondary_reads_with_catalog_changes fails on timestamp safe unique index. Created: 15/May/18  Updated: 29/Oct/23  Resolved: 27/Jul/18

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 4.1.2

Type: Improvement Priority: Major - P3
Reporter: Neha Khatri Assignee: Neha Khatri
Resolution: Fixed Votes: 0
Labels: nonnyc, storage-engines
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File s34994.log    
Backwards Compatibility: Fully Compatible
Sprint: Storage Non-NYC 2018-07-16, Storage Engines 2018-07-30
Participants:

 Description   

The test fails with a duplicate key error in a timestamp safe unique index on the primary.

[fsm_workload_test:secondary_reads_with_catalog_changes] 2018-05-15T02:56:36.008+0000         	"errmsg" : "E11000 duplicate key error collection: test139_fsmdb0.secondary_reads index: x_1 dup key: { : 3768.0 }",
[fsm_workload_test:secondary_reads_with_catalog_changes] 2018-05-15T02:56:36.008+0000         	"code" : 11000,
[fsm_workload_test:secondary_reads_with_catalog_changes] 2018-05-15T02:56:36.008+0000         	"codeName" : "DuplicateKey",
[fsm_workload_test:secondary_reads_with_catalog_changes] 2018-05-15T02:56:36.009+0000         	"$clusterTime" : {
[fsm_workload_test:secondary_reads_with_catalog_changes] 2018-05-15T02:56:36.009+0000         		"clusterTime" : Timestamp(1526352975, 1257),
[fsm_workload_test:secondary_reads_with_catalog_changes] 2018-05-15T02:56:36.009+0000         		"signature" : {
[fsm_workload_test:secondary_reads_with_catalog_changes] 2018-05-15T02:56:36.009+0000         			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
[fsm_workload_test:secondary_reads_with_catalog_changes] 2018-05-15T02:56:36.009+0000         			"keyId" : NumberLong(0)
      }

Investigate the cause of the failure.



 Comments   
Comment by Githook User [ 27/Jul/18 ]

Author:

{'name': 'nehakhatri5', 'email': 'neha.khatri@mongodb.com', 'username': 'nehakhatri5'}

Message: SERVER-34994 Allow identical index key insert in new format unique index

Ignore the duplicate key error when inserting pre-existing index key in
new format unique index. This can happen during concurrent index insert
and background index build.
Branch: master
https://github.com/mongodb/mongo/commit/4734396bdfcd68c6300f9fd95a0575e7c5287ad2
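
To make the race and the fix concrete, here is a minimal, self-contained C++ model of the behavior the commit describes. It is a sketch only: the container, the key layout, and every name in it are illustrative stand-ins, not the actual WiredTigerIndexUnique code. The idea it captures is that in the new (timestamp safe) format the RecordId is part of the table key, so a duplicate on insert can only mean the identical (key, RecordId) entry already exists (for example, written by a concurrent background index build) and can safely be treated as success, while a genuine uniqueness violation (same key, different RecordId) is still rejected.

#include <cassert>
#include <set>
#include <string>
#include <utility>

// (key, RecordId) pair; in the real new-format unique index the RecordId
// is appended to the KeyString, so the WT table key embeds both.
using IndexEntry = std::pair<std::string, long long>;

struct TimestampSafeUniqueIndexModel {
    std::set<IndexEntry> table;  // stand-in for the WiredTiger table

    // Returns true on success, false on a genuine duplicate key violation.
    bool insert(const std::string& key, long long recordId) {
        // Uniqueness check: is the same key already present under a
        // different RecordId? (Modeled as a scan from the smallest
        // RecordId; this model assumes RecordIds are positive.)
        auto it = table.lower_bound({key, 0});
        if (it != table.end() && it->first == key && it->second != recordId)
            return false;  // real duplicate -> E11000

        // Insert the full (key, RecordId) entry. If the identical entry is
        // already there -- e.g. a background index build raced with a
        // regular insert, as in this ticket -- the insert is a no-op and
        // is treated as success rather than a duplicate key error.
        table.insert({key, recordId});
        return true;
    }
};

int main() {
    TimestampSafeUniqueIndexModel idx;
    assert(idx.insert("5320.0", 5321));   // normal insert
    assert(idx.insert("5320.0", 5321));   // identical entry: ignored, no E11000
    assert(!idx.insert("5320.0", 9999));  // same key, new RecordId: rejected
    return 0;
}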

Comment by Neha Khatri [ 25/Jul/18 ]

Yes milkie, this situation occurs with old format unique indexes too. The old unique-index-record-insert path already has the logic that prevents a duplicate key error here.

We can have a similar check in the new index-record-insert logic too, i.e. if the new index entry being inserted has an identical key + RecordId to an existing index entry, then ignore the duplicate key error returned by the WiredTiger insert.

Comment by Eric Milkie [ 24/Jul/18 ]

I would imagine this situation would surface with old format unique indexes as well. To make background index builds work, the indexing code must ignore duplicate key errors on index-record-insert and ignore missing key errors on index-record-remove (unindex). Today's code already always ignores all missing key errors on unindex, although the logic is complicated (see wiredtiger_index.cpp:1599). The code also needs to ignore duplicate key errors for a background index build; the logic for that is also pretty complicated, and there could be problems with the code in that area.
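
A similarly simplified sketch of the unindex half of that contract (again with purely illustrative names, not the actual code at wiredtiger_index.cpp:1599): a remove that finds no entry is silently ignored instead of being treated as an error, since with a background build in flight the entry may legitimately not have been written yet.

#include <cassert>
#include <map>
#include <string>

// Toy model of unindex (index-record-remove); illustrative names only.
struct UniqueIndexModel {
    std::map<std::string, long long> table;  // key -> RecordId

    // Removing a key that is absent (WiredTiger would return WT_NOTFOUND)
    // is swallowed: a background index build may not have inserted the
    // entry yet, so a missing key on unindex is not an error.
    void unindex(const std::string& key, long long recordId) {
        auto it = table.find(key);
        if (it == table.end())
            return;  // missing key: ignored
        if (it->second == recordId)
            table.erase(it);  // only remove the entry for this record
    }
};

int main() {
    UniqueIndexModel idx;
    idx.table["8256.0"] = 1;
    idx.unindex("8256.0", 1);  // normal removal
    idx.unindex("8256.0", 1);  // key already gone: silently ignored
    assert(idx.table.empty());
    return 0;
}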

Comment by Neha Khatri [ 24/Jul/18 ]

Thanks max.hirschhorn for the inputs. With additional logging, I can see that the background index build [conn125] and the insert thread tid:0 = [conn110] are trying to insert the same index document in parallel:

481 [ReplicaSetFixture:job0:primary] 2018-07-24T15:03:37.836+1000 I INDEX    [conn125] build index on: test0_fsmdb0.secondary_reads properties: { v: 2, unique: true, key: { x: 1.0 }, name: "x_1", ns: "test0_fsmdb0.secondary_reads", background: true }
585 [ReplicaSetFixture:job0:primary] 2018-07-24T15:03:37.915+1000 I STORAGE  [conn110] WT index (0x7f5399b619a0) Timestamp safe unique idx key: { : 5320.0 } id: RecordId(5321)
16329 [ReplicaSetFixture:job0:primary] 2018-07-24T15:03:43.706+1000 I STORAGE  [conn125] WT index (0x7f5399b619a0) Timestamp safe unique idx key: { : 5320.0 } id: RecordId(5321)
16330 [ReplicaSetFixture:job0:primary] 2018-07-24T15:03:43.706+1000 I INDEX    [conn125] insertAllDocumentsInCollection inserted doc { _id: 5320.0, x: 5320.0 } 

I am trying to find out why only new format unique indexes are seeing this.

Comment by Max Hirschhorn [ 24/Jul/18 ]

> In the attached test log, I see that tid:0 = conn126
>
> [fsm_workload_test:secondary_reads_with_catalog_changes] 2018-07-23T15:17:12.052+1000 [tid:0, conn:conn126] setting random seed: 1289723610
>
> But there are other threads [conn94],[conn136],[conn77],[conn76],[conn103],[conn116],[conn61] that are attempting to insert index entries. The inserts from these non-tid:0 threads are resulting in duplicate key errors.

neha.khatri, milkie, I think there's a misunderstanding about what the secondary_reads_with_catalog_changes.js FSM workload is doing. While there is only one thread (with tid=0) inserting documents into the collection, any of the other threads can drop the {x: 1} index if it exists, or create it if it was dropped by some other thread. Given that $config.threadCount=50, there are possibly 49 threads doing this at different times while the workload is running.

I don't see any duplicate key errors if I change the workload to always build the {x: 1} index in the foreground, so I believe the uniqueness constraint is somehow being violated when the index is built in the background. I hope that helps to point you in the right direction as you continue your investigation.

Comment by Neha Khatri [ 24/Jul/18 ]

In the attached test log, I see that tid:0 = conn126

[fsm_workload_test:secondary_reads_with_catalog_changes] 2018-07-23T15:17:12.052+1000 [tid:0, conn:conn126] setting random seed: 1289723610

But there are other threads [conn94],[conn136],[conn77],[conn76],[conn103],[conn116],[conn61] that are attempting to insert index entries. The inserts from these non-tid:0 threads are resulting in duplicate key errors.

e.g.

[ReplicaSetFixture:job0:primary] 2018-07-23T15:17:16.541+1000 I STORAGE  [conn77] WT index (0x7f95e17943e0) Timestamp safe unique idx key: { : 8256.0 } id: RecordId(1)
[fsm_workload_test:secondary_reads_with_catalog_changes] 2018-07-23T15:17:27.987+1000         	"errmsg" : "E11000 duplicate key error collection: test0_fsmdb0.secondary_reads index: x_1 dup key: { : 8256.0 }"

Comment by Neha Khatri [ 23/Jul/18 ]

milkie, the above log snippet was from a local run. The same test failure was also seen in Evergreen here.

Also attaching the log from the local run: s34994.log.

Comment by Max Hirschhorn [ 23/Jul/18 ]

> The fsm test appears to check if the "tid" field is 0, in order to determine if a particular thread should do inserts. Could it be that multiple threads have a tid of 0? I can't find where the tid field gets set.

milkie, the thread ids are assigned by ThreadManager#spawnAll(). We increment the tid variable once for every call to makeThread().

Comment by Eric Milkie [ 23/Jul/18 ]

The fsm test appears to check if the "tid" field is 0, in order to determine if a particular thread should do inserts.  Could it be that multiple threads have a tid of 0?  I can't find where the tid field gets set.

Comment by Neha Khatri [ 23/Jul/18 ]

The duplicate key error is seen on the primary when inserting entries into the unique index.

The test says the workload has a dedicated thread to write documents. However, for index writes it can be seen that more than one thread is attempting to write the same document; see the index key { : 5384.0 } below. Thread conn126 is inserting all the index entries, and in between, thread conn136 inserts an already inserted index entry:

65 [ReplicaSetFixture:job0:primary] 2018-07-23T15:17:14.844+1000 I STORAGE [conn126] WT index (0x7f95e788d920) Timestamp safe unique idx key: { : 5384.0 } id: RecordId(1) 
66 [ReplicaSetFixture:job0:primary] 2018-07-23T15:17:14.845+1000 I STORAGE [conn126] WT index (0x7f95e788d920) Timestamp safe unique idx key: { : 5385.0 } id: RecordId(2) 
67 [ReplicaSetFixture:job0:primary] 2018-07-23T15:17:14.845+1000 I STORAGE [conn126] WT index (0x7f95e788d920) Timestamp safe unique idx key: { : 5386.0 } id: RecordId(3) 
...
128 [ReplicaSetFixture:job0:primary] 2018-07-23T15:17:14.855+1000 I STORAGE [conn126] WT index (0x7f95e788d920) Timestamp safe unique idx key: { : 5447.0 } id: RecordId(64) 
129 [ReplicaSetFixture:job0:primary] 2018-07-23T15:17:14.869+1000 I STORAGE [conn136] WT index (0x7f95e788d920) Timestamp safe unique idx key: { : 5384.0 } id: RecordId(1) 
130 [ReplicaSetFixture:job0:primary] 2018-07-23T15:17:14.877+1000 I STORAGE [conn126] WT index (0x7f95e788d920) Timestamp safe unique idx key: { : 5448.0 } id: RecordId(65)

Two threads attempting to insert identical index entries in a unique index is causing the failure.

 
