[SERVER-25168] Foreground index build blocks all R/W on ALL database on a sharded cluster with secondaryPreferred read preference Created: 20/Jul/16  Updated: 22/Jul/16  Resolved: 21/Jul/16

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Replication
Affects Version/s: 3.0.12
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Alessio Checcucci Assignee: Kelsey Schubert
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-6883 index creation on secondaries need no... Closed
Related
related to SERVER-20328 Allow secondary reads while applying ... Closed
Participants:

 Description   

One of the users of our MongoDB 3.0.12 six-shard cluster issued a foreground index build on their own database (pride_archive_ms). After the operation completed on the primary member of each replica set (shard), it was replicated to the secondaries as expected.

I expected the database where the index build is still in progress to be blocked for reads and writes, but in fact reads and writes to ALL databases are blocked when using the secondary or secondaryPreferred read preference.

Basically:

  • if I try to connect directly to the secondary member (admin database) with the mongo shell, the session hangs before displaying the prompt, blocked on the final recvfrom call below:

    mongo --username root --password YYYYYYYY admin --port 27018
    MongoDB shell version: 3.0.12
    connecting to: 127.0.0.1:27018/admin
     
    [ .... ]
    getsockname(3, {sa_family=AF_INET, sin_port=htons(50864), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
    sendto(3, "<\0\0\0\0\0\0\0\0\0\0\0\324\7\0\0\0\0\0\0admin.$cmd\0\0"..., 60, MSG_NOSIGNAL, NULL, 0) = 60
    recvfrom(3, "N\0\0\0\257Z(\0\0\0\0\0\1\0\0\0", 16, MSG_NOSIGNAL, NULL, NULL) = 16
    recvfrom(3, "\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0*\0\0\0\2you\0\20\0\0"..., 62, MSG_NOSIGNAL, NULL, NULL) = 62
    futex(0x22b9fc8, FUTEX_WAKE_PRIVATE, 1) = 1
    futex(0x22b9fc8, FUTEX_WAKE_PRIVATE, 1) = 1
    futex(0x22b9fc8, FUTEX_WAKE_PRIVATE, 1) = 1
    sendto(3, ">\0\0\0\1\0\0\0\0\0\0\0\324\7\0\0\0\0\0\0admin.$cmd\0\0"..., 62, MSG_NOSIGNAL, NULL, 0) = 62
    recvfrom(3, "\277\1\0\0\260Z(\0\1\0\0\0\1\0\0\0", 16, MSG_NOSIGNAL, NULL, NULL) = 16
    recvfrom(3, "\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\233\1\0\0\2setName"..., 431, MSG_NOSIGNAL, NULL, NULL) = 431
    futex(0x22b9fc8, FUTEX_WAKE_PRIVATE, 1) = 1
    open("/dev/urandom", O_RDONLY)          = 4
    read(4, "\177\241\0316?s\361C\310HMd\205\300BJo\331'\347\324\356\33\225\24\23\265\314A\27D\353"..., 8191) = 8191
    close(4)                                = 0
    sendto(3, "\220\0\0\0\2\0\0\0\0\0\0\0\324\7\0\0\0\0\0\0admin.$cmd\0\0"..., 144, MSG_NOSIGNAL, NULL, 0) = 144
    recvfrom(3, ^C <unfinished ...>
    

  • Connecting via the Mongo router I can properly authenticate, but any query issued with secondary or secondaryPreferred read preference gets blocked on ANY database (please note, the database is different from the one the index is built on):

    mongo --host mongos-hxvm-001 --username ddi_user --password XXXX --authenticationDatabase admin ddi_db
    MongoDB shell version: 3.0.12
    connecting to: mongos-hxvm-001:27017/ddi_db
    ddi_db@  - undefined> db.getMongo().setReadPref("primary")
    ddi_db@  - undefined> db.datasets.dataset.count()
    78234
    ddi_db@  - undefined> db.getMongo().setReadPref("secondaryPreferred")
    ddi_db@  - undefined> db.datasets.dataset.count()
    

MongoDB docs claim that:

Any operation that requires a read or write lock on all databases (e.g. listDatabases) will wait for the foreground index build to complete.

But I don't see how that applies to this case. In fact, a simple find() on a database other than the one the index is being built on cannot proceed.

It seems like:

  • the index build is blocking R/W access to ALL the databases on the secondaries (during the index build)
  • the mongo router is unable to detect that the secondary can't answer the query on ANY database and should steer it to the primary
  • writes using { w: majority } are also suspended on ANY database

Could you please comment on whether this is the expected behaviour? Our system is suffering from availability problems because of this.

Thanks a lot



 Comments   
Comment by Kelsey Schubert [ 22/Jul/16 ]

Hi alessio.checcucci@gmail.com,

You can find additional details about concurrency and locking in MongoDB at https://docs.mongodb.com/manual/faq/concurrency/. Specifically, the behavior you have observed is documented here.

In MongoDB 3.4, we will support a new eligibility requirement for server selection based on the staleness of the secondaries, SERVER-24421. For more information about this feature, please read our recently published driver spec. This new feature will significantly limit the disruption you observe when foreground index builds are being replicated to secondaries.
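
To illustrate the idea, here is a simplified sketch of how a staleness bound could steer a secondaryPreferred read back to the primary. It is not the driver's actual algorithm: the `lagSeconds` field, the `selectServer` function, and the host names are hypothetical, and real drivers estimate staleness from heartbeat data as defined in the driver spec. The sketch also assumes that a secondary busy applying the index build shows up as lagging.

```javascript
// Simplified model of secondaryPreferred selection with a staleness bound.
// `lagSeconds`, `selectServer` and the host names are hypothetical.
function selectServer(servers, maxStalenessSeconds) {
  // Keep only secondaries that are within the allowed staleness.
  var eligible = servers.filter(function (s) {
    return s.type === "secondary" && s.lagSeconds <= maxStalenessSeconds;
  });
  if (eligible.length > 0) {
    // For determinism, this sketch picks the least stale eligible secondary.
    eligible.sort(function (a, b) { return a.lagSeconds - b.lagSeconds; });
    return eligible[0];
  }
  // No eligible secondary: secondaryPreferred falls back to the primary.
  var primaries = servers.filter(function (s) { return s.type === "primary"; });
  return primaries.length > 0 ? primaries[0] : null;
}

// Both secondaries are hours behind while they apply the index build.
var topology = [
  { host: "shard1-a", type: "primary", lagSeconds: 0 },
  { host: "shard1-b", type: "secondary", lagSeconds: 7200 },
  { host: "shard1-c", type: "secondary", lagSeconds: 7300 }
];
console.log(selectServer(topology, 120).host);
```

With a bound of 120 seconds, neither secondary qualifies in this example, so the read is routed to the primary instead of waiting on a blocked secondary.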

There is currently an open ticket to configure whether a default index build is foreground or background - please feel free to vote for SERVER-20960 and watch it for updates.

Kind regards,
Thomas

Comment by Alessio Checcucci [ 22/Jul/16 ]

Hi Thomas,
thank you very much for the reply. I see that the secondaries are read-locked while the foreground index build is applied from the oplog. Sorry, but I couldn't find any reference to that behaviour in the documentation. Our cluster has multi-TB collections (it is multi-tenant and multi-DC, too), and as you may imagine this is causing considerable disruption.

This leads to the second part of my question. Given that the secondaries cannot serve any query during the build, which spans many hours, shouldn't the Mongo router detect this condition and serve from the primary when secondaryPreferred is specified as the read preference (this is what we recommend our customers use)? If the main aim of the secondary read lock is to prevent serving stale data, the primary would be the perfect candidate to answer any query and would prevent availability issues.

Last, but not least, is there any way to alter the default index build method to background? I know the answer is "currently not", but I wanted to double check.
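
For context, the build method can still be chosen per call through the `background` option of `createIndex`. The sketch below wraps that call so that a background build is requested unless the caller overrides it. `createIndexBackgroundByDefault` is a hypothetical helper name, and the collection and field names in the usage comment are only illustrative; this is a client-side convention, not a server-side default.

```javascript
// Hypothetical helper: request a background build unless told otherwise.
// `coll` is a mongo shell collection object, e.g. db.getCollection("test").
function createIndexBackgroundByDefault(coll, keys, options) {
  var opts = { background: true };
  // Caller-supplied options win, including an explicit background: false.
  for (var name in (options || {})) {
    opts[name] = options[name];
  }
  return coll.createIndex(keys, opts);
}

// Usage in the mongo shell (illustrative names):
//   createIndexBackgroundByDefault(db.test, { payload: 1 });
```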

Thanks a lot
Alessio

Comment by Kelsey Schubert [ 21/Jul/16 ]

Hi alessio.checcucci@gmail.com,

Thank you for reporting this behavior. It is expected that operations such as a foreground index build will block reads on a secondary. Replication holds a lock when applying batches of oplog entries to ensure no readers can consume inconsistent data.
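
The effect can be pictured with a toy model: while a batch is being applied, readers queue behind the lock and are served only once it is released, regardless of which database they target. This is a simplified, single-threaded sketch for illustration only; `BatchApplyLock` and its methods are hypothetical names and do not correspond to the server's internal lock manager.

```javascript
// Toy model of readers waiting behind the lock held during oplog batch apply.
// All names are hypothetical; this is not the server's lock manager.
function BatchApplyLock() {
  this.held = false;
  this.waiting = [];
  this.log = [];
}

// Replication takes the lock before applying a batch of oplog entries.
BatchApplyLock.prototype.beginBatch = function (label) {
  this.held = true;
  this.log.push("apply start: " + label);
};

// A read runs immediately if no batch is being applied, otherwise it queues.
BatchApplyLock.prototype.read = function (label) {
  if (this.held) {
    this.waiting.push(label);
    this.log.push("read queued: " + label);
  } else {
    this.log.push("read served: " + label);
  }
};

// Releasing the lock lets every queued read proceed, in arrival order.
BatchApplyLock.prototype.endBatch = function (label) {
  this.held = false;
  this.log.push("apply done: " + label);
  while (this.waiting.length > 0) {
    this.log.push("read served: " + this.waiting.shift());
  }
};

var lock = new BatchApplyLock();
lock.read("ddi_db count");                           // served at once
lock.beginBatch("createIndex on pride_archive_ms");
lock.read("ddi_db count");                           // queued, different database
lock.endBatch("createIndex on pride_archive_ms");
console.log(lock.log.join("\n"));
```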

We have open tickets that would improve this behavior: SERVER-6883 and SERVER-20328. Please feel free to vote for these tickets and watch them for updates.

Kind regards,
Thomas

Comment by Alessio Checcucci [ 21/Jul/16 ]

I have reproduced (using MongoDB 3.2.8) the behaviour I reported in a very simple manner:

  • Created a very simple sharded cluster with:
    • 1x Mongos router
    • 1x Mongo Config server
    • 1 shard (3-way Replica Set)

Test setup:

  • created a "test" database
  • populated a "test.test" collection with 10 million documents containing random data
  • triggered a foreground index build on the collection
  • waited for the operation to complete on the primary and be propagated to the secondaries
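
The setup steps above can be sketched for the mongo shell roughly as follows. The field names (`seq`, `payload`), the batch size, and the helper names are assumptions made for illustration; the original test data was simply random, and the exact index keys are not recorded in this report.

```javascript
// Reproduction sketch for the mongo shell. Field names, batch size and the
// helper names are assumptions, not the exact script used for the report.
function populate(coll, total, batchSize) {
  var inserted = 0;
  while (inserted < total) {
    var batch = [];
    var n = Math.min(batchSize, total - inserted);
    for (var i = 0; i < n; i++) {
      batch.push({ seq: inserted + i, payload: Math.random() });
    }
    coll.insertMany(batch); // available in the 3.2 shell
    inserted += n;
  }
  return inserted;
}

function buildForegroundIndex(coll) {
  // background: false is the default; it is spelled out here for clarity.
  return coll.createIndex({ payload: 1 }, { background: false });
}

// Usage in the mongo shell, connected through the mongos router:
//   var coll = db.getSiblingDB("test").getCollection("test");
//   populate(coll, 10000000, 1000);
//   buildForegroundIndex(coll);
```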

Tests after the index build on the secondaries was triggered:

  • I couldn't connect directly to any database on the secondaries during the build; the Mongo shell simply hangs until the operation is complete. As reported, it seems that the authentication phase is blocked (authenticationDatabase is admin). By contrast, I can connect to the primary and use any database apart from "test" while the index build is ongoing on the primary itself.
  • I could connect to the "test" database via the Mongo router and run any query with primary and primaryPreferred read preference, but using secondaryPreferred blocks the operation until the index build on secondaries is completed. Similarly, during the foreground index build on the primary, using primaryPreferred read preference hangs the operation.

Could you please shed some light on whether this is the behaviour I should expect? I expected to:

  • Be able to connect to all the databases on the secondaries, excluding "test". Does authentication require a "read or write lock on all databases"? According to the documentation, "User authentication requires a read lock on the admin database" only.
  • Be able to use the secondaryPreferred read preference while connected to the "test" database, and have the Mongo router detect the blocked secondaries and steer the operation to the primary.

Thanks in advance for any help you can provide
