[SERVER-48137] Use DBLock instead of AutoGetDB in index builds
Created: 12/May/20  Updated: 29/Oct/23  Resolved: 14/May/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.4.0-rc7, 4.7.0

Type: Bug
Priority: Major - P3
Reporter: Louis Williams
Assignee: Louis Williams
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Execution Team 2020-05-18
Participants:
Linked BF Score: 30

 Description   

AutoGetDB can throw with a StaleDBVersion error.

Index builds do not handle exceptions until they reach a final critical section. Anything thrown before that point will cause the server to crash.

Use a DBLock instead, then manually check the dbVersion inside the critical section, where the exception handler is in place.
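
The difference between the two locking styles can be sketched with a small self-contained model. This is only an illustration under stated assumptions: `Database`, `OperationContext`, `LockAndCheck`, `LockOnly`, `checkDbVersion`, and `runIndexBuild` are hypothetical stand-ins invented for this sketch, not MongoDB's real classes or signatures. `LockAndCheck` plays the role of an AutoGetDB-style helper that may throw on acquisition, and `LockOnly` plays the role of a DBLock-style helper that only takes the lock.

```cpp
#include <mutex>
#include <optional>
#include <stdexcept>

// Hypothetical stand-ins for illustration; not MongoDB's real types.
struct StaleDbVersion : std::runtime_error {
    using std::runtime_error::runtime_error;
};

struct Database {
    std::mutex mutex;
    int version = 1;  // assumed to change on events such as movePrimary
};

struct OperationContext {
    // Version the client expected; empty when the thread carries no version.
    std::optional<int> expectedDbVersion;
};

// Explicit version check. Throws only if the context carries a version
// and that version no longer matches the database.
void checkDbVersion(const OperationContext& opCtx, const Database& db) {
    if (opCtx.expectedDbVersion && *opCtx.expectedDbVersion != db.version) {
        throw StaleDbVersion("database version changed");
    }
}

// AutoGetDB-style model: locks and checks the version in the constructor,
// so merely acquiring it can throw.
class LockAndCheck {
public:
    LockAndCheck(const OperationContext& opCtx, Database& db) : _lock(db.mutex) {
        checkDbVersion(opCtx, db);
    }

private:
    std::unique_lock<std::mutex> _lock;
};

// DBLock-style model: takes the lock and performs no version check.
class LockOnly {
public:
    explicit LockOnly(Database& db) : _lock(db.mutex) {}

private:
    std::unique_lock<std::mutex> _lock;
};

enum class BuildResult { Committed, Aborted };

// Intermediate phase: no exception handler is active here, so it must
// not use a helper that can throw on acquisition.
void scanPhase(Database& db) {
    LockOnly lock(db);
    // ... scan the collection and build index keys ...
}

// Final phase: the version is checked explicitly, under a handler that
// turns a stale version into a clean abort.
BuildResult commitPhase(const OperationContext& opCtx, Database& db) {
    LockOnly lock(db);
    try {
        checkDbVersion(opCtx, db);
        return BuildResult::Committed;
    } catch (const StaleDbVersion&) {
        return BuildResult::Aborted;
    }
}

BuildResult runIndexBuild(const OperationContext& opCtx, Database& db) {
    scanPhase(db);
    return commitPhase(opCtx, db);
}
```

In this model, a stale version surfaces only in `commitPhase`, where it is caught and reported as an abort. Had `scanPhase` used `LockAndCheck`, the same condition would have escaped with no handler in place, which corresponds to the crash described above.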



 Comments   
Comment by Githook User [ 15/May/20 ]

Author: Louis Williams <louis.williams@mongodb.com> (louiswilliams)

Message: SERVER-48137 Use DBLock instead of AutoGetDB in index builds

Index builds do not handle exceptions until a final critical section. AutoGetDB can throw
due to stale database or shard versions, which is checked explicitly when an index build completes.

(cherry picked from commit 5dc21b311ba95877eae491f2f3422402bddd8ee0)
Branch: v4.4
https://github.com/mongodb/mongo/commit/cec3439a2d03fc94a4eec2903c76fe44e4c5b69d

Comment by Githook User [ 14/May/20 ]

Author: Louis Williams <louis.williams@mongodb.com> (louiswilliams)

Message: SERVER-48137 Use DBLock instead of AutoGetDB in index builds

Index builds do not handle exceptions until a final critical section. AutoGetDB can throw
due to stale database or shard versions, which is checked explicitly when an index build completes.
Branch: master
https://github.com/mongodb/mongo/commit/5dc21b311ba95877eae491f2f3422402bddd8ee0

Comment by Louis Williams [ 13/May/20 ]

Great, that makes sense. I'll switch to using DBLock to bypass the dbversion checks until the explicit call when we can handle exceptions.

Comment by Kaloian Manassiev [ 13/May/20 ]

Ah, this actually makes more sense now. The reason we check for DB/Shard version changes as part of the index build is to discover changes to the index composition, abort the index build, and force it to start over (at least that's how I understand it - jack.mulrow?). So the setting of both the DB and Shard versions must stay in some form. I am not familiar with the concurrency rules around the index build critical section, but if the lock cannot be dropped and re-taken after this check, then it should be fine to not use AutoGetDb if it causes an exception to be thrown at an inopportune time.

A cleaner solution would be to just handle the exception.

Comment by Louis Williams [ 13/May/20 ]

kaloian.manassiev, it looks like we copy the dbVersion and shardVersion from the client OperationContext to the index builder thread's operation context. This was added as part of SERVER-44719. But that was done before SERVER-46122 which allowed the 'drop' command to abort in-progress index builds. Do you think the dbversion initialization is not necessary anymore?

Comment by Kaloian Manassiev [ 13/May/20 ]

I don't know why this check is there and, more importantly, why there is a DBVersion on the OperationContext at all if it is an internal thread. There must be something higher up in that code path that is setting it, or is it not always an internal thread?

The database version can change as a result of movePrimary or dropDatabase, which should technically interrupt the index build, but this should happen through the act of dropping the collection, not by doing version checking internally. jack.mulrow, is this something new perhaps due to the "Consistent Indexes" project?

Comment by Louis Williams [ 12/May/20 ]

This is happening on an internal thread that is started on behalf of a user "createIndex" operation. There is a manual check for the dbVersion in the critical section that has been there for a very long time, so I assume it's still required?

Comment by Kaloian Manassiev [ 12/May/20 ]

Is this error thrown from an internal index build thread, or does it happen on the user thread? Internal threads are not supposed to have a DbVersion on the OpContext, so they should never throw this exception. This might be similar to the issue in SERVER-48128.

Generated at Thu Feb 08 05:16:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.