[SERVER-10456] get cursor logic used to find docs to clone (in migration) is not same as in removeRange Created: 07/Aug/13  Updated: 13/Sep/19  Resolved: 02/Sep/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.6.15, 4.0.13, 4.2.1, 4.3.1

Type: Bug
Priority: Major - P3
Reporter: Greg Studer
Assignee: Kevin Pulo
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Duplicate
is duplicated by SERVER-18226 shardCollection command can orphan do... Closed
Related
related to SERVER-18226 shardCollection command can orphan do... Closed
Backwards Compatibility: Minor Change
Operating System: ALL
Backport Requested: v4.2, v4.0, v3.6
Sprint: Sharding 2018-09-24, Sharding 2018-10-08, Sharding 2018-10-22, Sharding 2018-12-03, Sharding 2018-12-17, Sharding 2018-12-31, Sharding 2019-01-14, Sharding 2019-01-28, Sharding 2019-02-11, Sharding 2019-02-25, Sharding 2019-03-11, Sharding 2019-03-25, Sharding 2019-09-09
Participants:

 Description   

storeClonedLocs() uses nsd->findIndexByPrefix(), whereas removeRange() uses nsd->findIndexByKeyPattern(). The two lookups can select different indexes, which may be a problem if the user has multiple indexes prefixed by the shard key and some of those indexes are sparse.
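
To illustrate why sparseness matters here, the following is a minimal toy model, not server code. It assumes the documented rule that a sparse compound index only holds an entry for a document containing at least one of the indexed fields. `Doc`, `ToyIndex`, `hasEntry` and `docsSeenByScan` are hypothetical names introduced only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy document: just the set of top-level field names it contains.
using Doc = std::set<std::string>;

// Hypothetical stand-in for an index definition (not a server type).
struct ToyIndex {
    std::vector<std::string> keys;  // e.g. {"a", "b"} for {a: 1, b: 1}
    bool sparse;
};

// A non-sparse index has an entry for every document. A sparse compound
// index only has an entry when the document contains at least one key field.
bool hasEntry(const ToyIndex& index, const Doc& doc) {
    if (!index.sparse)
        return true;
    for (const auto& key : index.keys)
        if (doc.count(key) > 0)
            return true;
    return false;
}

// Number of documents that a full scan of the given index would return.
std::size_t docsSeenByScan(const ToyIndex& index, const std::vector<Doc>& docs) {
    std::size_t seen = 0;
    for (const auto& doc : docs)
        if (hasEntry(index, doc))
            ++seen;
    return seen;
}
```

In this model a document with neither `a` nor `b` is visible through a plain `{a: 1}` index but not through a sparse `{a: 1, b: 1}` index, so two code paths that pick different indexes would see different sets of documents.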



 Comments   
Comment by Githook User [ 13/Sep/19 ]

Author: Kevin Pulo (devkev) <kevin.pulo@mongodb.com>

Message: SERVER-10456 do not use sparse indexes as shard key indexes

(cherry picked from commit 0e6b9119f4d3a0fae681ed28220bc55ed1469f81)
Branch: v4.0
https://github.com/mongodb/mongo/commit/3c0aa2b8e37c813035a8e07f28b0decf4ce2a74f

Comment by Githook User [ 13/Sep/19 ]

Author: Kevin Pulo (devkev) <kevin.pulo@mongodb.com>

Message: SERVER-10456 do not use sparse indexes as shard key indexes

(cherry picked from commit 0e6b9119f4d3a0fae681ed28220bc55ed1469f81)
Branch: v4.2
https://github.com/mongodb/mongo/commit/11eaee8aace14a3b8ba1d7c3ab462e8badc8ffee

Comment by Githook User [ 13/Sep/19 ]

Author: Kevin Pulo (devkev) <kevin.pulo@mongodb.com>

Message: SERVER-10456 do not use sparse indexes as shard key indexes

(cherry picked from commit 0e6b9119f4d3a0fae681ed28220bc55ed1469f81)
Branch: v3.6
https://github.com/mongodb/mongo/commit/d4cadf356110abd83334bc4a1ba7650266ae4673

Comment by Githook User [ 02/Sep/19 ]

Author: Kevin Pulo (devkev) <kevin.pulo@mongodb.com>

Message: SERVER-10456 do not use sparse indexes as shard key indexes
Branch: master
https://github.com/mongodb/mongo/commit/0e6b9119f4d3a0fae681ed28220bc55ed1469f81

Comment by Kevin Pulo [ 02/Sep/19 ]

Confirmed that this patch:

diff --git a/src/mongo/db/catalog/index_catalog_impl.cpp b/src/mongo/db/catalog/index_catalog_impl.cpp
index f216a83dea5..341800c35d8 100644
--- a/src/mongo/db/catalog/index_catalog_impl.cpp
+++ b/src/mongo/db/catalog/index_catalog_impl.cpp
@@ -1103,7 +1103,7 @@ const IndexDescriptor* IndexCatalogImpl::findShardKeyPrefixedIndex(OperationCont
         const IndexDescriptor* desc = ii->next()->descriptor();
         bool hasSimpleCollation = desc->infoObj().getObjectField("collation").isEmpty();
 
-        if (desc->isPartial())
+        if (desc->isPartial() || desc->isSparse())
             continue;
 
         if (!shardKey.isPrefixOf(desc->keyPattern(), SimpleBSONElementComparator::kInstance))

fixes the problem, producing this result:

$ mlaunch init --sharded 1 --replicaset --nodes 1 --port 12345 --binarypath .
launching: "./mongod" on port 12346
launching: config server on port 12347
replica set 'configRepl' initialized.
replica set 'shard01' initialized.
launching: ./mongos on port 12345
adding shards. can take up to 30 seconds...
$ ./mongo --port 12345
MongoDB shell version v0.0.0
connecting to: mongodb://127.0.0.1:12345/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("b00e2241-7629-4296-9994-b3bf704858a7") }
MongoDB server version: 0.0.0
connection to 127.0.0.1:12345, version 0.0.0
db: test
Server has startup warnings:
2019-09-02T09:28:07.878+0000 I  CONTROL  [main]
2019-09-02T09:28:07.878+0000 I  CONTROL  [main] ** WARNING: Access control is not enabled for the database.
2019-09-02T09:28:07.878+0000 I  CONTROL  [main] **          Read and write access to data and configuration is unrestricted.
2019-09-02T09:28:07.878+0000 I  CONTROL  [main]
2019-09-02T09:28:07.878+0000 I  CONTROL  [main] ** WARNING: This server is bound to localhost.
2019-09-02T09:28:07.878+0000 I  CONTROL  [main] **          Remote systems will be unable to connect to this server.
2019-09-02T09:28:07.878+0000 I  CONTROL  [main] **          Start the server with --bind_ip <address> to specify which IP
2019-09-02T09:28:07.878+0000 I  CONTROL  [main] **          addresses it should serve responses from, or with --bind_ip_all to
2019-09-02T09:28:07.878+0000 I  CONTROL  [main] **          bind to all interfaces. If this behavior is desired, start the
2019-09-02T09:28:07.878+0000 I  CONTROL  [main] **          server with --bind_ip 127.0.0.1 to disable this warning.
2019-09-02T09:28:07.878+0000 I  CONTROL  [main]
mongos> db.test.insert({a:1})
WriteResult({ "nInserted" : 1 })
mongos> db.test.ensureIndex({a:1,b:1},{sparse:true})
{
        "raw" : {
                "shard01/localhost:12346" : {
                        "createdCollectionAutomatically" : false,
                        "numIndexesBefore" : 1,
                        "numIndexesAfter" : 2,
                        "commitQuorum" : 1,
                        "ok" : 1
                }
        },
        "ok" : 1,
        "operationTime" : Timestamp(1567416511, 3),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1567416511, 3),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
mongos> db.test.ensureIndex({a:1})
{
        "raw" : {
                "shard01/localhost:12346" : {
                        "createdCollectionAutomatically" : false,
                        "numIndexesBefore" : 2,
                        "numIndexesAfter" : 3,
                        "commitQuorum" : 1,
                        "ok" : 1
                }
        },
        "ok" : 1,
        "operationTime" : Timestamp(1567416526, 3),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1567416526, 3),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
mongos> db.enableSharding()
{
        "ok" : 1,
        "operationTime" : Timestamp(1567416532, 3),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1567416532, 3),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
mongos> db.test.shardCollection({a:1})
{
        "collectionsharded" : "test.test",
        "collectionUUID" : UUID("7abf2db8-5301-47d6-b4fa-9a140232a1ba"),
        "ok" : 1,
        "operationTime" : Timestamp(1567416540, 11),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1567416540, 11),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
mongos> db.test.find()
{ "_id" : ObjectId("5d6ce0ba6cf65212382ee13c"), "a" : 1 }
mongos> db.test.find().itcount()
1
mongos> db.test.getIndexes()
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_"
        },
        {
                "v" : 2,
                "key" : {
                        "a" : 1,
                        "b" : 1
                },
                "name" : "a_1_b_1",
                "sparse" : true
        },
        {
                "v" : 2,
                "key" : {
                        "a" : 1
                },
                "name" : "a_1"
        }
]
mongos> db.test.dropIndex({a:1})
{
        "raw" : {
                "shard01/localhost:12346" : {
                        "nIndexesWas" : 3,
                        "ok" : 1
                }
        },
        "ok" : 1,
        "operationTime" : Timestamp(1567416565, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1567416565, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
mongos> db.test.find()
{ "_id" : ObjectId("5d6ce0ba6cf65212382ee13c"), "a" : 1 }
mongos> db.test.find().itcount()
1
mongos>

Comment by Kevin Pulo [ 01/Aug/19 ]

I think this is still an issue. IndexCatalogImpl::findShardKeyPrefixedIndex() checks isPartial() (and isMultiKey()), but not isSparse(). And indeed, the repro from SERVER-18226 still reproduces:

kev@devkev-2:~$ mlaunch init --sharded 1 --replicaset --nodes 1 --port 12345 --binarypath /omni/4.2.0-rc2/bin
launching: "/omni/4.2.0-rc2/bin/mongod" on port 12346
launching: config server on port 12347
replica set 'configRepl' initialized.
replica set 'shard01' initialized.
launching: /omni/4.2.0-rc2/bin/mongos on port 12345
adding shards. can take up to 30 seconds...
kev@devkev-2:~$ mongo --port 12345
MongoDB shell version v4.0.0
connecting to: mongodb://127.0.0.1:12345/
MongoDB server version: 4.2.0-rc2
WARNING: shell and server versions do not match
connection to 127.0.0.1:12345, version 4.2.0-rc2
db: test
session { "id" : UUID("837de8bb-74d5-4327-9d5a-ff67d7fbbbb8") }
Server has startup warnings:
2019-07-10T12:31:50.978+0000 I  CONTROL  [main]
2019-07-10T12:31:50.978+0000 I  CONTROL  [main] ** WARNING: Access control is not enabled for the database.
2019-07-10T12:31:50.978+0000 I  CONTROL  [main] **          Read and write access to data and configuration is unrestricted.
2019-07-10T12:31:50.978+0000 I  CONTROL  [main]
2019-07-10T12:31:50.978+0000 I  CONTROL  [main] ** WARNING: This server is bound to localhost.
2019-07-10T12:31:50.978+0000 I  CONTROL  [main] **          Remote systems will be unable to connect to this server.
2019-07-10T12:31:50.978+0000 I  CONTROL  [main] **          Start the server with --bind_ip <address> to specify which IP
2019-07-10T12:31:50.978+0000 I  CONTROL  [main] **          addresses it should serve responses from, or with --bind_ip_all to
2019-07-10T12:31:50.978+0000 I  CONTROL  [main] **          bind to all interfaces. If this behavior is desired, start the
2019-07-10T12:31:50.978+0000 I  CONTROL  [main] **          server with --bind_ip 127.0.0.1 to disable this warning.
2019-07-10T12:31:50.978+0000 I  CONTROL  [main]
mongos> db.test.insert({})
WriteResult({ "nInserted" : 1 })
mongos> db.test.ensureIndex({a:1,b:1},{sparse:true})
{
        "raw" : {
                "shard01/localhost:12346" : {
                        "createdCollectionAutomatically" : false,
                        "numIndexesBefore" : 1,
                        "numIndexesAfter" : 2,
                        "ok" : 1
                }
        },
        "ok" : 1,
        "operationTime" : Timestamp(1562761966, 2),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1562761966, 2),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
mongos> db.test.ensureIndex({a:1})
{
        "raw" : {
                "shard01/localhost:12346" : {
                        "createdCollectionAutomatically" : false,
                        "numIndexesBefore" : 2,
                        "numIndexesAfter" : 3,
                        "ok" : 1
                }
        },
        "ok" : 1,
        "operationTime" : Timestamp(1562764833, 2),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1562764833, 2),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
mongos> db.enableSharding()
{
        "ok" : 1,
        "operationTime" : Timestamp(1562764867, 3),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1562764867, 3),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
mongos> db.test.shardCollection({a:1})
{
        "collectionsharded" : "test.test",
        "collectionUUID" : UUID("9f7d9441-c5e5-41fa-8d0e-ab171d7a106f"),
        "ok" : 1,
        "operationTime" : Timestamp(1562764878, 9),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1562764878, 9),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
mongos> db.test.find()
mongos> db.test.find().itcount()
0
mongos> db.test.getIndexes()
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "test.test"
        },
        {
                "v" : 2,
                "key" : {
                        "a" : 1,
                        "b" : 1
                },
                "name" : "a_1_b_1",
                "ns" : "test.test",
                "sparse" : true
        },
        {
                "v" : 2,
                "key" : {
                        "a" : 1
                },
                "name" : "a_1",
                "ns" : "test.test"
        }
]
mongos> db.test.dropIndex({a:1})
{
        "raw" : {
                "shard01/localhost:12346" : {
                        "nIndexesWas" : 3,
                        "ok" : 1
                }
        },
        "ok" : 1,
        "operationTime" : Timestamp(1562764983, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1562764983, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
mongos> db.test.find().itcount()
0
mongos> db.test.find()
mongos> 

I think we just need to check isSparse() along with isPartial(), and skip any such indexes.
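
The proposed check can be sketched in isolation as follows. This is a simplified stand-in, not the real IndexCatalogImpl::findShardKeyPrefixedIndex(): `ToyDescriptor`, `isPrefixOf` and `pickShardKeyIndex` are hypothetical names, key patterns are reduced to ordered field names, and the multikey and collation handling of the real function is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified index descriptor; the real IndexDescriptor differs.
struct ToyDescriptor {
    std::string name;
    std::vector<std::string> keyFields;  // field names in key-pattern order
    bool partial;
    bool sparse;
};

// True if the shard key fields match the leading fields of the index key.
bool isPrefixOf(const std::vector<std::string>& shardKey,
                const std::vector<std::string>& keyFields) {
    if (shardKey.size() > keyFields.size())
        return false;
    for (std::size_t i = 0; i < shardKey.size(); ++i)
        if (shardKey[i] != keyFields[i])
            return false;
    return true;
}

// Returns the first usable shard key index, or nullptr if there is none.
const ToyDescriptor* pickShardKeyIndex(const std::vector<ToyDescriptor>& indexes,
                                       const std::vector<std::string>& shardKey) {
    for (const auto& desc : indexes) {
        // Partial and sparse indexes may omit documents, so skip both.
        if (desc.partial || desc.sparse)
            continue;
        if (!isPrefixOf(shardKey, desc.keyFields))
            continue;
        return &desc;
    }
    return nullptr;
}
```

With the three indexes from the repro (`_id_`, sparse `a_1_b_1`, and `a_1`) and shard key `{a: 1}`, this sketch selects `a_1`. If `a_1` is absent, it returns no index rather than falling back to the sparse one.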

Comment by Kelsey Schubert [ 17/Nov/18 ]

kevin.pulo, any update?

Comment by Kevin Pulo [ 26/Sep/18 ]

The issue where findIndexByKeyPattern() was incorrectly used (instead of findIndexByPrefix() followed by extendRangeBound()) by the range deleter was fixed in SERVER-8598 (specifically, the Step 3 commit), which actually predates this ticket.

However, the change made the code use findIndexByPrefix() to first identify the correct index, before passing that index's key pattern to findIndexByKeyPattern(). So this ticket may simply have resulted from a misreading of that code (combined with the historical problem).

In any case, current branches all definitively use findShardKeyPrefixedIndex() (which is what findIndexByPrefix() was renamed to) in all relevant sharding places, and not the findIndexBy...() variants (the one exception being where the range deleter calls findIndexByName() after calling findShardKeyPrefixedIndex(), possibly superfluously). So the behaviour should not be inconsistent between, say, the cloner and the range deleter.

The final step is to check the behaviour of that function in the presence of multiple matching prefix indexes, and index options such as sparseness or partialness. I suspect that SERVER-17915, which is where findIndexByPrefix() was renamed to findShardKeyPrefixedIndex(), took care of this (but that's what I need to check).
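
The difference between the two lookup styles discussed in this comment can be sketched with a toy model. The function names below echo the historical ones, but the bodies are hypothetical simplifications: key patterns are reduced to ordered field names, and the first-match iteration order is an assumption of the sketch, not a statement about the server.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy index: a name plus its key fields in order.
struct ToyIndex {
    std::string name;
    std::vector<std::string> keyFields;
};

// Exact lookup: the index key must equal the requested pattern field for field.
const ToyIndex* findByKeyPattern(const std::vector<ToyIndex>& indexes,
                                 const std::vector<std::string>& pattern) {
    for (const auto& index : indexes)
        if (index.keyFields == pattern)
            return &index;
    return nullptr;
}

// Prefix lookup: the requested fields only have to lead the index key.
const ToyIndex* findByPrefix(const std::vector<ToyIndex>& indexes,
                             const std::vector<std::string>& prefix) {
    for (const auto& index : indexes) {
        if (prefix.size() > index.keyFields.size())
            continue;
        bool matches = true;
        for (std::size_t i = 0; i < prefix.size(); ++i) {
            if (index.keyFields[i] != prefix[i]) {
                matches = false;
                break;
            }
        }
        if (matches)
            return &index;
    }
    return nullptr;
}
```

Called directly with the shard key, the two lookups can return different indexes. Passing the key pattern of the prefix-matched index to the exact lookup, as the SERVER-8598 change is described as doing, makes both resolve to the same index.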

Comment by Kaloian Manassiev [ 07/Sep/18 ]

kevin.pulo, can you please check whether this is still an issue and whether it is something we need to fix?
