[SERVER-40535] Possibility to get a non-existent key if using ReadConcern level:local when reading signing keys in ReplicaSet Created: 08/Apr/19 Updated: 29/Oct/23 Resolved: 20/Jun/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.6.12, 4.0.8 |
| Fix Version/s: | 4.0.11, 4.2.0-rc3, 4.3.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Misha Tyulenev | Assignee: | Misha Tyulenev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||||||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Backport Requested: |
v4.2, v4.0, v3.6
|
||||||||||||||||
| Sprint: | Sharding 2019-05-06, Repl 2019-06-03, Sharding 2019-06-17, Sharding 2019-07-01 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||
| Description |
|
There is a possible scenario in which the admin.system.keys collection diverges, so the customer gets a signing key that does not exist, which causes errors in query processing. |
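A toy illustration of how a local read can return a key that later stops existing. This is a sketch under assumptions, not server code: `ToyNode`, `majority_committed`, and `roll_back` are hypothetical names, and the ticket itself does not spell out the exact divergence mechanism. The model only shows the general difference between a local view (everything this node has written) and a majority view (only what a majority has acknowledged).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyNode:
    """Hypothetical single replica-set member holding signing key ids."""

    # Key ids in the order this node wrote them (stand-in for admin.system.keys).
    keys: List[int] = field(default_factory=list)
    # Number of leading entries that a majority of the set has acknowledged.
    majority_committed: int = 0

    def read_keys(self, level: str) -> List[int]:
        """Return the keys visible at the given read concern level."""
        if level == "local":
            # Local reads see every write on this node, acknowledged or not.
            return list(self.keys)
        if level == "majority":
            # Majority reads see only the majority-acknowledged prefix.
            return self.keys[: self.majority_committed]
        raise ValueError(f"unsupported read concern level: {level}")

    def roll_back(self) -> None:
        """Discard writes that never reached a majority."""
        del self.keys[self.majority_committed:]


# Key 2 was written locally but has not been acknowledged by a majority.
node = ToyNode(keys=[1, 2], majority_committed=1)
local_view = node.read_keys("local")        # includes key 2
majority_view = node.read_keys("majority")  # excludes key 2
node.roll_back()
surviving = node.read_keys("local")         # key 2 is gone after the rollback
```

In this model a reader that cached `local_view` would hold key 2 after it no longer exists anywhere, while a reader that used `majority_view` would not.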
| Comments |
| Comment by Danny Hatcher (Inactive) [ 22/Jul/19 ] | ||||
|
We have decided not to backport this ticket to 3.6. Due to the way key generation is written in 3.6, it would be a significantly larger code change to backport to that version than it was to backport to 4.0. Additionally, the fix described in this ticket only resolves scenarios in which Read Concern "Majority" is enabled. As the driver automatically corrects the problem after receiving the error, we encourage users to retry the operation. | ||||
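The retry advice above could be applied on the application side with a small wrapper. This is a minimal sketch under assumptions: `KeyNotFoundError` is a hypothetical stand-in for whatever error the application actually observes, and `retry_on_key_error` is not part of any MongoDB driver API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class KeyNotFoundError(Exception):
    """Hypothetical stand-in for the error raised when a signing key is missing."""


def retry_on_key_error(operation: Callable[[], T], attempts: int = 3,
                       delay_seconds: float = 0.0) -> T:
    """Run `operation`, retrying only when it raises KeyNotFoundError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except KeyNotFoundError:
            # Re-raise once the retry budget is exhausted.
            if attempt == attempts - 1:
                raise
            time.sleep(delay_seconds)
    raise AssertionError("unreachable")  # the loop always returns or raises


# Usage: an operation that fails once and then succeeds.
calls = {"count": 0}


def flaky_query() -> str:
    calls["count"] += 1
    if calls["count"] == 1:
        raise KeyNotFoundError("signing key not found")
    return "ok"


result = retry_on_key_error(flaky_query)
```

Only the one error type is retried, so unrelated failures still surface immediately.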
| Comment by Githook User [ 15/Jul/19 ] | ||||
|
Author: {'name': 'Misha Tyulenev', 'email': 'misha@mongodb.com', 'username': 'mikety'} Message: (cherry picked from commit 7d88bdb226e8a3dc9b5eb4b57edcca111619c5f9) | ||||
| Comment by Githook User [ 15/Jul/19 ] | ||||
|
Author: {'name': 'Misha Tyulenev', 'email': 'misha@mongodb.com', 'username': 'mikety'} Message: (cherry picked from commit 1d158cabb504fa9dba3ed0f0688cdf14cb7b0cba) | ||||
| Comment by Githook User [ 03/Jul/19 ] | ||||
|
Author: {'name': 'Misha Tyulenev', 'username': 'mikety', 'email': 'misha@mongodb.com'} Message: (cherry picked from commit 7d88bdb226e8a3dc9b5eb4b57edcca111619c5f9) | ||||
| Comment by Githook User [ 03/Jul/19 ] | ||||
|
Author: {'name': 'Misha Tyulenev', 'username': 'mikety', 'email': 'misha@mongodb.com'}Message: | ||||
| Comment by Misha Tyulenev [ 28/Jun/19 ] | ||||
|
mark.brinsmead, when majority reads are disabled, the system can have this bug. | ||||
| Comment by Githook User [ 21/Jun/19 ] | ||||
|
Author: {'name': 'Misha Tyulenev', 'email': 'misha@mongodb.com', 'username': 'mikety'} Message: (cherry picked from commit 1d158cabb504fa9dba3ed0f0688cdf14cb7b0cba) | ||||
| Comment by Githook User [ 20/Jun/19 ] | ||||
|
Author: {'name': 'Misha Tyulenev', 'email': 'misha@mongodb.com', 'username': 'mikety'}Message: | ||||
| Comment by Judah Schvimer [ 04/Jun/19 ] | ||||
|
testingSnapshotBehaviorInIsolation prevents the stableTimestamp from advancing which prevents majority reads from being available. This is correct expected behavior. I'd recommend adding a failpoint to work around this behavior or talking to the storage team about how to maintain the coverage of that test with your change. | ||||
| Comment by Misha Tyulenev [ 04/Jun/19 ] | ||||
|
I built the test case based on the failures in the patch run, and this is part of that test. | ||||
| Comment by Judah Schvimer [ 04/Jun/19 ] | ||||
|
Why are you turning on testingSnapshotBehaviorInIsolation? | ||||
| Comment by Misha Tyulenev [ 04/Jun/19 ] | ||||
|
judah.schvimer, I attached the git patch and the test.js. I run it in the no_passthrough suite. server-40535.diff | ||||
| Comment by Judah Schvimer [ 03/Jun/19 ] | ||||
|
misha.tyulenev, I tried reproducing this issue with the following but it did not reproduce. Can you please provide a repro script:
| ||||
| Comment by Misha Tyulenev [ 29/May/19 ] | ||||
|
judah.schvimer to fix the issue on the ticket I need to change the read concern to be RC majority here. Once I make this change the
fails to start and initiate as it hangs waiting for RC majority | ||||
| Comment by Misha Tyulenev [ 24/May/19 ] | ||||
|
Over to the repl team to investigate why readConcern majority reads are not possible once the transition to primary has completed on a one-node RS. | ||||
| Comment by Misha Tyulenev [ 29/Apr/19 ] | ||||
|
ankur.raina I will be working on this fix and plan to push the changes to 3.6 within two weeks. | ||||
| Comment by Misha Tyulenev [ 18/Apr/19 ] | ||||
|
renctan you are correct - the keyGenerator needs readConcern local or it will get stuck when trying to check for keys. | ||||
| Comment by Randolph Tan [ 08/Apr/19 ] | ||||
|
I took a look at the code again, and it made me realize that we want read concern majority for KeysCollectionCache but local for the KeyGenerator. Currently, they both share the same opCtx, so we should be careful not to contaminate the opCtx while setting the read concern. |
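One way to avoid contaminating a shared context is to set the read concern only for the duration of one read and then restore the previous value. The following is a language-neutral sketch written in Python, not the server's C++ code: `ToyOpCtx` and `scoped_read_concern` are hypothetical names used only to illustrate the save-and-restore pattern.

```python
from contextlib import contextmanager
from typing import Iterator


class ToyOpCtx:
    """Hypothetical stand-in for an operation context shared by two components."""

    def __init__(self) -> None:
        self.read_concern = "local"


@contextmanager
def scoped_read_concern(op_ctx: ToyOpCtx, level: str) -> Iterator[ToyOpCtx]:
    """Temporarily set the read concern, restoring the old value on exit."""
    previous = op_ctx.read_concern
    op_ctx.read_concern = level
    try:
        yield op_ctx
    finally:
        # Restore even if the read raised, so the next user sees the old level.
        op_ctx.read_concern = previous


# Usage: the cache refresh reads at majority, the key generator stays at local.
shared_ctx = ToyOpCtx()
with scoped_read_concern(shared_ctx, "majority") as ctx:
    level_during_cache_refresh = ctx.read_concern
level_for_key_generator = shared_ctx.read_concern
```

The restore happens in a `finally` clause, so an exception during the majority read does not leave the shared context at the wrong level.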