While an index build is in progress, we take any concurrent writes and insert them into the side writes table. Once the collection scan and bulk load complete, we drain the writes from the side writes table into the index. However, it is possible for these concurrent writes to also be seen by the collection scan. In this case, we end up inserting the same index key twice: once during the collection scan/bulk load, and once while draining side writes. This is not a correctness issue; for non-unique indexes we simply ignore that the key already exists, and for unique indexes we record the key as a potential duplicate. For the unique case, we then check all of these false "duplicates" under a MODE_X collection lock before committing the index. Because the MODE_X lock blocks all other operations on the collection while the check runs, this can lead to a loss of availability if the index is being built on a large collection that receives many writes during the collection scan.
A couple of potential solutions:
- Initially check duplicates under a MODE_IX lock and only later take a MODE_X lock for whatever remains, similar to what we do for draining the side writes table
- When draining the side writes table for a unique index, if the entire key (including the record id, as opposed to only the key prefix) already exists in the index table, do not count it as a potential duplicate
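
To make the second option concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not the server's implementation: `ToyUniqueIndex`, `bulk_load`, `drain_side_write`, and the `skip_exact_match` flag are hypothetical names, and the index is modelled as a plain map from key prefix to the set of record ids stored under it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyUniqueIndex:
    """Hypothetical in-memory stand-in for a unique index under construction."""

    # Key prefix -> record ids already stored under that prefix.
    entries: dict = field(default_factory=dict)
    # Key prefixes that would have to be re-checked before commit.
    potential_duplicates: list = field(default_factory=list)

    def bulk_load(self, key, record_id):
        # Insert produced by the collection scan / bulk load phase.
        self.entries.setdefault(key, set()).add(record_id)

    def drain_side_write(self, key, record_id, skip_exact_match):
        # Insert replayed from the side writes table.
        record_ids = self.entries.setdefault(key, set())
        if record_id in record_ids:
            # The full key (prefix + record id) is already present, so this
            # is the same document the collection scan saw, not a real
            # duplicate. Without the proposed check it is still recorded.
            if not skip_exact_match:
                self.potential_duplicates.append(key)
            return
        if record_ids:
            # A different record id under the same prefix: a genuine
            # candidate for a uniqueness violation.
            self.potential_duplicates.append(key)
        record_ids.add(record_id)


def replay(skip_exact_match):
    """Scan sees (a, 1); side writes replay (a, 1) and then (b, 7)."""
    index = ToyUniqueIndex()
    index.bulk_load("a", 1)
    index.drain_side_write("a", 1, skip_exact_match)
    index.drain_side_write("b", 7, skip_exact_match)
    return index.potential_duplicates
```

In this model, `replay(skip_exact_match=False)` leaves key `"a"` in the list to be re-checked, while `replay(skip_exact_match=True)` leaves the list empty, so nothing would remain to verify under the MODE_X lock. A side write with a different record id for an existing key prefix is still recorded in both modes.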