[SERVER-39323] Don't consider in-progress indexes when deciding whether or not to build an index under an intent lock Created: 31/Jan/19  Updated: 29/Oct/23  Resolved: 13/May/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.1.8
Fix Version/s: 4.1.12

Type: Bug Priority: Major - P3
Reporter: Shane Harvey Assignee: Dianna Hohensee (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-40926 createIndexes should return IndexBuil... Closed
depends on SERVER-40927 createIndexes should wait for indexes... Closed
Gantt Dependency
has to be done after SERVER-37644 Make the createIndexes command join a... Closed
Problem/Incident
causes PYTHON-1734 Test failure - test_legacy_api.TestLe... Backlog
Related
is related to DRIVERS-644 Hybrid index build Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Storage NYC 2019-04-08, Storage NYC 2019-04-22, Storage NYC 2019-05-06
Participants:

 Description   

I started seeing two Pymongo test failures on 4.2-latest that I think are related to the hybrid index build project. Both tests pass on 4.1.7 but fail on v4.1.7-176-gd807556. They are listed in PYTHON-1734 but I will put them here too along with how to run each case individually:

$ python3.7 setup.py test -s test.test_legacy_api.TestLegacy.test_ensure_index_threaded
running test
...
test_ensure_index_threaded (test.test_legacy_api.TestLegacy) ... FAIL
 
======================================================================
FAIL: test_ensure_index_threaded (test.test_legacy_api.TestLegacy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/shane/git/mongo-python-driver/test/test_legacy_api.py", line 1296, in test_ensure_index_threaded
    self.assertEqual(index_doc, first)
AssertionError: {'_id[250 chars]ion'}, 'foo2_1': {'v': 2, 'key': [('foo2', 1)][43 chars]on'}} != {'_id[250 chars]ion'}}
  {'_id_': {'key': [('_id', 1)],
            'ns': 'pymongo_test.threaded_index_creation',
            'v': 2},
   'foo0_1': {'key': [('foo0', 1)],
              'ns': 'pymongo_test.threaded_index_creation',
              'v': 2},
   'foo1_1': {'key': [('foo1', 1)],
              'ns': 'pymongo_test.threaded_index_creation',
-             'v': 2},
-  'foo2_1': {'key': [('foo2', 1)],
-             'ns': 'pymongo_test.threaded_index_creation',
              'v': 2}}
 
----------------------------------------------------------------------
Ran 1 test in 0.334s
 
FAILED (failures=1)

The source for this test is here: https://github.com/mongodb/mongo-python-driver/blob/3.7.2/test/test_legacy_api.py#L1266-L1293
From the output, it looks like the test expects that an index created with createIndex should show up immediately in the listIndexes output.

And the second failing test:

python3.7 setup.py test -s test.test_legacy_api.TestLegacy.test_ensure_unique_index_threaded
running test
...
test_ensure_unique_index_threaded (test.test_legacy_api.TestLegacy) ... FAIL
 
======================================================================
FAIL: test_ensure_unique_index_threaded (test.test_legacy_api.TestLegacy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/shane/git/mongo-python-driver/test/test_legacy_api.py", line 1355, in test_ensure_unique_index_threaded
    self.assertEqual(10001, coll.count())
AssertionError: 10001 != 10018
 
----------------------------------------------------------------------
Ran 1 test in 0.493s
 
FAILED (failures=1)

The source for this test is here: https://github.com/mongodb/mongo-python-driver/blob/3.7.2/test/test_legacy_api.py#L1328-L1354
This test creates the same unique index in multiple threads and then every thread attempts to insert a document with the same value for the unique field (and only one insert should succeed). It looks even though the createIndex succeeds (in some threads) the index is never actually created.

These failures are reproducible on standalones, replica sets, and sharded clusters.



 Comments   
Comment by Dianna Hohensee (Inactive) [ 13/May/19 ]

SERVER-40926 and SERVER-40927 have been completed. A createIndexes request for an index that is already in the process of being built will now wait for that index build to finish.

Comment by Dianna Hohensee (Inactive) [ 01/May/19 ]

I've created SERVER-40926 to return the error, which I think is v4.2 required; and SERVER-40927 to add the waiting functionality as v4.2 desired. We'll try to get the latter in, but it might be after our code complete date and then squeaked in as a bug fix

Comment by Shane Harvey [ 30/Apr/19 ]

Mongomirror would need to write a retry loop as you say. If we decided to do the same in gridfs (to maintain backward compatibility) all drivers would need to implement the retry loop too. I still think returning IndexBuildAlreadyInProgress would be better than returning OK because at least the error is explicit. Even better would be to implement this retry/join/wait logic in the server and have no downstream changes at all.

Comment by Dianna Hohensee (Inactive) [ 30/Apr/19 ]

Thanks for the additional info. It does sound like we'll need to do something about it.

shane.harvey, if we were to return an error IndexBuildAlreadyInProgress instead of OK, would that require a lot of downstream work to handle the error correctly? I'd be concerned about callers hammering the server retrying the createIndexes cmd in a simple loop. Callers would have to periodically re-run the cmd if the index build takes a significant amount of time. Index builds say on empty collections should be really quick and retrying the cmd in such a small window should be okay.

Comment by Dianna Hohensee (Inactive) [ 22/Apr/19 ]

shane.harvey, what does MongoMirror use the createIndexes cmd for? I thought MongoMirror used applyOps to apply createIndexes oplog entries from one replica set to a new one: I believe that path still uses foreground index builds.

Comment by Shane Harvey [ 22/Apr/19 ]

I still think it is a bug to reply with ok:1 before the index is actually built, should I file a new ticket?

Ideally, the server would join the second createIndexes thread with the in progress build. There are real world scenario where the same index may be created by multiple threads.

For example, I believe mongomirror could be impacted by this bug. Currently it retries createIndexes commands when they fail with network (or other transient) errors. With the buggy behavior in 4.1, a retried createIndexes may "succeed" immediately even though the indexes are not built. This is a problem for two reasons:

  1. the indexes may eventually fail and thus will never be created.
  2. the index builds could still be running when the application cuts over to Atlas which will have adverse performance impacts.

Another practical situation where multiple threads may attempt to build the same index is in GridFS. Before the first write operation gridfs will attempt to build two indexes if they do not already exist. GridFS depends on a unique index on files_id to enforce uniqueness. The buggy behavior in 4.1 can cause some threads to violate the unique index on chunks and cause the actual index build to fail with a DuplicateKey error.

Comment by Dianna Hohensee (Inactive) [ 22/Apr/19 ]

I'm closing this without changing behavior – I should have filed a separate ticket for the refactor, on reflection, sorry about that.

We concluded that there was no good story for what the user would do with an IndexBuildAlreadyInProgress error instead of an OK response. They would not be able to retry the command until successful because that would simply overload the server with requests. Also, we did not think it was a reasonable scenario for a user to be running an index build on one thread, then start up another thread to run the same index build in parallel.

Comment by Githook User [ 22/Apr/19 ]

Author:

{'name': 'Dianna', 'username': 'DiannaHohensee', 'email': 'dianna.hohensee@10gen.com'}

Message: SERVER-39323 refactor index spec validation code to make it more modularized and flexible.

  • Remove a "$freelist" collection reference, which was removed with MMAPv1
  • Modularize a {buildIndexes:false}

    check that causes a IndexAlreadyExists error to be returned

  • Split IndexCatalog::removeExistingIndexes into removeExistingIndexes and removeExistingIndexesNoChecks,
    to be more explicit about the internal behavior.
    Branch: master
    https://github.com/mongodb/mongo/commit/75f5f054406ae2a9805960872d5bae49561d4d82
Comment by Dianna Hohensee (Inactive) [ 05/Apr/19 ]

Eventually for the simul project's join functionality in v4.4:
1) need a way to only filter out ready indexes – so having a check to only filter the ready indexes will be needed anyway
2) then we would need checks to find out whether the remaining indexes match any existing builds or conflict with any existing builds. The IndexBuildsCoordinator will have a registry of in-progress builds. I think it would still be easiest to use the IndexCatalog to check whether the remaining indexes conflict or are already being built, then go to the IndexBuildsCoordinator's registry to join any in-progress builds if the requirements are met – like a request can only match to a single in-progress build, not multiple. So I think the simul project would also benefit from code we write for v4.2, but we should definitely keep the IndexBuildsCoordinator in mind when writing the v4.2 code, so it's easy later: for example, making it easy to distinguish indexes that conflict with existing/being built indexes and indexes that match and are already being built.

Comment by Dianna Hohensee (Inactive) [ 05/Apr/19 ]

The proposal for v4.2:
1) The createIndexes cmd will only silently remove already existing (ready) indexes from user requests. If none remain to be built, then the cmd returns OK
2) We will add a second check for remaining indexes after step 1) to discover whether any are already being built. If any are being built already, we will return a IndexBuildAlreadyInProgress error with a msg about which indexes are already being built.
3) The MultiIndexBlock::init code will also have to differentiate between already built and being built indexes in order to return IndexAlreadyExists and IndexBuildAlreadyInProgress, respectively and appropriately.

Comment by Dianna Hohensee (Inactive) [ 05/Apr/19 ]

v4.0 behavior:
1) we still have foreground index builds
2) the createIndexes cmd will check for index already existing/being built under an intent lock – this was backported or we originally committed it in v4.0. We return OK if the request's indexes are found to already exist or are in the process of being built.
3) if there were a race between createIndexes cmd requests during relocking after the intent lock check mentioned in 2), then the code will reach the MultiIndexBlock::init code and return an IndexAlreadyExists error.

Comment by Eric Milkie [ 04/Apr/19 ]

Ah, I got it now. When the description says that createIndex succeeds, it is referring to the second invocation of createIndex for the same index. Subsequent to that, the first invocation eventually fails, leaving the index unbuilt forever.

Comment by Shane Harvey [ 04/Apr/19 ]

For the first test (test_ensure_index_threaded) the indexes are created eventually. For the second test (test_ensure_unique_index_threaded) the index is never created because the createIndex that's actually waiting for the index build fails with a DuplicateKeyError.

Comment by Eric Milkie [ 04/Apr/19 ]

Just to be clear, shane.harvey in your description you say that the index is "never actually created", but we now believe it is simply still building and would actually be created eventually, correct?

Comment by Eric Milkie [ 20/Feb/19 ]

Marking this as 4.1 Required and we'll reconsider next steps after the dependent ticket is closed.

Comment by Louis Williams [ 13/Feb/19 ]

On second through, I don't know if this is as simple as updating IndexCatalogImpl::_doesSpecConflictWithExisting() to pass "false" for "includeUnfinishedIndexes". Indexes are only ever created with an exclusive collection lock. With this change, when we first check with an IX lock and skip in-progress indexes, we will queue up an X exclusive lock and wait. If the in-progress index yields and the waiting thread gets an X lock, it needs to see that there is an index build in-progress, otherwise it will start building the same index.

If this second check sees the in-progress index, it will return early because an identical index exists, in-progress. This would also not solve this problem. I think for now, we should wait to see the outcome of SERVER-37644. Otherwise we may need to wait for all other background index builds to finish, which would mean you can't build multiple background indexes simultaneously.

Comment by Louis Williams [ 13/Feb/19 ]

That makes sense to me. Thanks for reporting this. I'll update the description to match the desired behavior: Don't consider in-progress indexes when deciding whether or not to build an index under an intent lock. 

Comment by Shane Harvey [ 12/Feb/19 ]

SERVER-36961 was backported to 4.0.3 and 3.6.9 but these pymongo tests pass on those server versions with both background and foreground indexes.

I think it's a bug for createIndexes to return ok:1 when the index it refers to is still in progress. So far I'm only seeing this bug on 4.1.7-latest.

Comment by Louis Williams [ 11/Feb/19 ]

What you noticed is because of SERVER-36961: we take a weak lock and return successfully if an identical index exists, even if it is in-progress. And, like you said, the "createIndexes" command will block until the index build completes. So while the test does wait for all threads to exit, the issue is that not all threads complete with the same index_information. Some threads finish early while other threads are still building the indexes.

SERVER-37644 will likely be completed at some point before the 4.2 release, but I'm not sure what the timeline is.

The change in SERVER-36961 was backported to 4.0 and 3.6, so this same behavior exists, but only for background index builds. SERVER-37644 will definitely never be backported. If the "joinable" behavior is important, it should be possible to just exclude in-progress index builds when first checking whether or not an index exists with an intent lock. I believe that will restore the behavior that this test relied on. Is SERVER-37644 sufficient or should this be backported to 4.0 and 3.6?

Comment by Shane Harvey [ 11/Feb/19 ]

Even if the index is built in the background I expect that the createIndex command should block until the index is created, right? From these test failures it looks like a concurrent index build on the same index returns ok:1 immediately even though the index build is still in progress. Will this go away once SERVER-37644 is implemented?

Comment by Louis Williams [ 11/Feb/19 ]

shane.harvey I just noticed that there is a joinall used before checking the indexes, so I'll see what might be going on there. The index build threads shouldn't return until they are completed.

Comment by Louis Williams [ 11/Feb/19 ]

shane.harvey as of SERVER-37270, all index builds run in the background by default. By creating a foreground index, this test depended on the exclusive locking nature of foreground indexes. The reader thread would block until the index builds finished. Background index builds yield locks so non-conflicting requests can still be processed. This test is failing because multiple index builds are started in background threads, and the reader sees a state where these indexes are not ready.

My suggestions would be either:

  • Rewrite the test with retries to assert that the index builds "eventually" complete.
  • Use the enableHybridIndexBuilds=false setParameter to disable this non-blocking behavior. I would advise against doing this because of the reduced test coverage.

If there is any work we can do on our side, let me know, otherwise I will close this ticket.

Generated at Thu Feb 08 04:51:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.