[CSHARP-2043] Add GridFSBucketOptions.SuppressEnsureIndexes Created: 25/Sep/17  Updated: 31/Mar/22

Status: Backlog
Project: C# Driver
Component/s: GridFS
Affects Version/s: 2.4.4
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: John Gibbons Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: driver, gridfs, sharding
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

All


Issue Links:
Related
Backwards Compatibility: Fully Compatible

 Description   

The GridFS Driver currently calls EnsureIndexes during UploadFromStream. This is normally highly convenient because it ensures that files/chunks tables are correctly set up, with zero admin setup required. However, there is one highly significant downside (plus one less important downside).

Downside 1: if you have SHARDED files/chunks collections, then the EnsureIndexes routine will fail when ANY shard is unavailable (even if the write operation pertains to a shard that is perfectly fine). This is because the EnsureIndexes routine issues a "find" operation and that find operation gets broadcast to all shards - so if one shard is down, the find will fail and so the whole write will fail. In my case, users in London are unable to save to London data centre, because New York data centre is currently offline - bad!

Downside 2: Running EnsureIndexes requires user to have find permission, list (and potentially create) indexes permission. In my use case, I do not want user to have find permission (they can write but cannot read).



 Comments   
Comment by John Gibbons [ 08/Mar/18 ]

HI Robert, glad to see this has been scheduled. Looking forward to seeing it out in the wild and let me know if anything I can do to help. NOte: there is already a pull request with the fix included so hopefully that helps. Rgds, JG

Comment by John Gibbons [ 11/Oct/17 ]

https://github.com/mongodb/mongo-csharp-driver/pull/294

Comment by John Gibbons [ 11/Oct/17 ]

https://jira.mongodb.org/browse/SERVER-31511 raised to report issue. Working on AssumeIndexesExist fix now ....

Comment by Robert Stam [ 10/Oct/17 ]

Also, if you think the server is not treating AllowPartialResults properly would you be willing to create a SERVER ticket reporting that?

Comment by Robert Stam [ 10/Oct/17 ]

Thanks so much for all the work you have done investigating this.

Based on your findings I think we should conclude that the AllowPartialResults = true approach is not going to work. So I think we should return to your original proposal.

I do have one suggestion though. We try to avoid words like "Suppress" in options. Can we find another name for this new option? Perhaps options.AssumeIndexesExist?

Comment by John Gibbons [ 10/Oct/17 ]

THIS ONE PRETTY MUCH SAME EXCEPT SETS UP 2 SHARDS BACKED BY [single mongod] REPLICA SETS. partial DOES NOT HAVE DESIRED EFFECT IF ONE MONGOD KILLED.

kill -2 $(pgrep -a mongo | grep " 400" | awk '{print $1}')
sleep 1
cd
rm PartialTest2 -r -f
mkdir PartialTest2
cd PartialTest2/
mkdir config
mkdir shard1
mkdir shard2
mkdir logs
mongod --port 40001 --configsvr --replSet config --dbpath config --logpath logs/config.log --fork
mongo localhost:40001 --quiet --eval 'rs.initiate( { _id: "config", configsvr:true, members:[{_id:0, host: "localhost:40001"}]} )'
mongos --port 40002 --configdb config/localhost:40001 --logpath logs/mongos.log --fork
mongod --port 40011 --shardsvr --replSet shard1 --dbpath shard1 --logpath logs/shard1.log --fork
mongo localhost:40011 --quiet --eval 'rs.initiate( { _id: "shard1", members:[{_id:0, host: "localhost:40011"}]} )'
mongod --port 40012 --shardsvr --replSet shard2 --dbpath shard2 --logpath logs/shard2.log --fork
mongo localhost:40012 --quiet --eval 'rs.initiate( { _id: "shard2", members:[{_id:0, host: "localhost:40012"}]} )'
mongo localhost:40002 --quiet --eval 'sh.addShard("shard1/localhost:40011")'
mongo localhost:40002 --quiet --eval 'sh.addShard("shard2/localhost:40012")'
mongo localhost:40002 --quiet --eval 'sh.enableSharding("partial")'
mongo localhost:40002/partial --quiet --eval 'db.test.createIndex({"key" : 1})'
mongo localhost:40002 --quiet --eval 'sh.shardCollection("partial.test", { "key" : 1 })'
mongo localhost:40002 --quiet --eval 'sh.addShardTag("shard1", "firsthalf")'
mongo localhost:40002 --quiet --eval 'sh.addShardTag("shard2", "secondhalf")'
mongo localhost:40002 --quiet --eval 'sh.addTagRange("partial.test", { key : "A"}, {key: "L"}, "firsthalf")'
mongo localhost:40002 --quiet --eval 'sh.addTagRange("partial.test", { key : "L"}, {key: "Z"}, "secondhalf")'
mongo localhost:40002/partial --quiet --eval 'db.test.insert({ "key" : "B" })'
mongo localhost:40002/partial --quiet --eval 'db.test.insert({ "key" : "Y" })'
mongo localhost:40002/partial --quiet --eval 'db.test.find().addOption(DBQuery.Option.partial)' #should return 2
kill -2 $(pgrep -a mongo | grep "40011" | awk '{print $1}') db.test.find().addOption(DBQuery.Option.partial)
mongo localhost:40002/partial --quiet --eval 'db.test.find().addOption(DBQuery.Option.partial)' #should return 1 but fails

[mongod@BLAH PartialTest2]$ mongo localhost:40002/partial --quiet --eval 'db.test.find()'

{ "_id" : ObjectId("59dcc375f955bba2a5b15850"), "key" : "B" } { "_id" : ObjectId("59dcc375f90654100e5edb0e"), "key" : "Y" }

[mongod@BLAH PartialTest2]$ mongo localhost:40002/partial --quiet --eval 'db.test.find().addOption(DBQuery.Option.partial)'

{ "_id" : ObjectId("59dcc375f955bba2a5b15850"), "key" : "B" } { "_id" : ObjectId("59dcc375f90654100e5edb0e"), "key" : "Y" }

[mongod@BLAH PartialTest2]$ kill -2 $(pgrep -a mongo | grep "40011" | awk '

{print $1}

')

[mongod@BLAHPartialTest2]$ mongo localhost:40002/partial --quiet --eval 'db.test.find().addOption(DBQuery.Option.partial)'

Error: error: {

"ok" : 0,

"errmsg" : "could not find host matching read preference { mode: \"primary\", tags: [ {} ] } for set shard1",

"code" : 133,

"codeName" : "FailedToSatisfyReadPreference"

}

Comment by John Gibbons [ 10/Oct/17 ]

THIS ONE SETS UP 2 SHARDS BACKED DIRECTLY BY SINGLE MONGODS
AND partial WORKS FINE IF ONE MONGOD KILLED.

 
 
kill -2 $(pgrep -a mongo | grep " 300" | awk '{print $1}')
sleep 1
cd
rm PartialTest -r -f
mkdir PartialTest
cd PartialTest/
mkdir config
mkdir shard1
mkdir shard2
mkdir logs
mongod --port 30001 --configsvr --replSet config --dbpath config --logpath logs/config.log --fork
sleep 1
mongo localhost:30001 --quiet --eval 'rs.initiate( { _id: "config", configsvr:true, members:[{_id:0, host: "localhost:30001"}]} )'
sleep 5
mongos --port 30002 --configdb config/localhost:30001 --logpath logs/mongos.log --fork
mongod --port 30011 --shardsvr --dbpath shard1 --logpath logs/shard1.log --fork
mongod --port 30012 --shardsvr --dbpath shard2 --logpath logs/shard2.log --fork
mongo localhost:30002 --quiet --eval 'sh.addShard("localhost:30011")'
mongo localhost:30002 --quiet --eval 'sh.addShard("localhost:30012")'
mongo localhost:30002 --quiet --eval 'sh.enableSharding("partial")'
mongo localhost:30002/partial --quiet --eval 'db.test.createIndex({"key" : 1})'
mongo localhost:30002 --quiet --eval 'sh.shardCollection("partial.test", { "key" : 1 })'
mongo localhost:30002 --quiet --eval 'sh.addShardTag("shard0000", "firsthalf")'
mongo localhost:30002 --quiet --eval 'sh.addShardTag("shard0001", "secondhalf")'
mongo localhost:30002 --quiet --eval 'sh.addTagRange("partial.test", { key : "A"}, {key: "L"}, "firsthalf")'
mongo localhost:30002 --quiet --eval 'sh.addTagRange("partial.test", { key : "L"}, {key: "Z"}, "secondhalf")'
mongo localhost:30002/partial --quiet --eval 'db.test.insert({ "key" : "B" })'
mongo localhost:30002/partial --quiet --eval 'db.test.insert({ "key" : "Y" })'
mongo localhost:30002/partial --quiet --eval 'db.test.find()' #should return 2
mongo localhost:30002/partial --quiet --eval 'db.test.find().addOption(DBQuery.Option.partial)' #should return 2
kill -2 $(pgrep -a mongo | grep "30011" | awk '{print $1}')
mongo localhost:30002/partial --quiet --eval 'db.test.find().addOption(DBQuery.Option.partial)' #should return 1
mongo localhost:30002/partial --quiet --eval 'db.test.find()' #should error

Results:
mongos> db.test.find().addOption(DBQuery.Option.partial)

{ "_id" : ObjectId("59dcb0f38b49a86cdec9e780"), "key" : "B" } { "_id" : ObjectId("59dcb0f37950b2dc5ff66fd5"), "key" : "Y" }

mongos> exit

bye

[mongod@BLAH PartialTest]$ kill -2 $(pgrep -a mongo | grep "30011" | awk '

{print $1}

')

[mongod@BLAH PartialTest]$ mongo localhost:30002/partial --quiet --eval 'db.test.find().addOption(DBQuery.Option.partial)' #should return 1

{ "_id" : ObjectId("59dcb0f37950b2dc5ff66fd5"), "key" : "Y" }

[mongod@BLAH PartialTest]$ mongo localhost:30002/partial --quiet --eval 'db.test.find()' #should error because no partial

Error: error:

{ "ok" : 0, "errmsg" : "Connection refused", "code" : 6, "codeName" : "HostUnreachable" }
Comment by John Gibbons [ 10/Oct/17 ]

OK I see what the issue is. the DBQuery.Option.partial works if the sharded cluster is a cluster of SINGLE MONGODs. Whereas it does not work if the sharded cluster is a cluster of REPLICA SETS. Attaching evidence shortly.

Comment by John Gibbons [ 10/Oct/17 ]

Hi Robert,

It's not really working for me as yet. Firstly, I added the AllowPartialResults = true in C# code, but it made no difference - that is to say it continued to report an error saying that one of the shards was down. Then I tried same from the mongo client, and got what I believe to be same result [1].

In both cases I'm executing against a mongos pointing to a config replica set which governs a 3 shard cluster. One shard I've killed the 2 mongods.

I'm continuing to research this (e.g. by building up a second, completely new cluster from scratch to retest all this) - but in the meantime, any thoughts?

A quick question, how does one officially set the partial results cursor option from the command line in Mongo 3.4 - In the documentation [2] it says that the cursor.addOption feature has been deprecated in shell since 3.2, and use cursor methods instead, but there is no cursor method for partial results afaik. Makes me wonder whether there are any issues around the partial cursor concept ...

[1]
mongos> db.fs.files.find().limit(1).addOption(DBQuery.Option.partial)
Error: error: {
"ok" : 0,
"errmsg" : "could not find host matching read preference { mode: \"primary\", tags: [ {} ] } for set HK1",
"code" : 133,
"codeName" : "FailedToSatisfyReadPreference"
}

[2]
https://docs.mongodb.com/v3.4/reference/method/cursor.addOption/

Note
Deprecated in the mongo Shell since v3.2
Starting in v3.2, the cursor.addOption() operator is deprecated in the mongo shell. In the mongo shell, use available cursor methods instead.

Comment by Robert Stam [ 06/Oct/17 ]

Adding AllowPartialResults = true will change the resulting error in certain scenarios.

I assume that if all shards are down a FindOperation with AllowPartialResults = true will return 0 documents instead of an error. This will cause the driver to conclude the fs.files collection is empty and proceed to call EnsureIndexes, which will fail because all shards are down. Either way an error would have occurred, but the error is now different.

A similar situation could arise when all the shards that are up are empty (and the shards that are down are not empty). This will also result in the driver erroneously concluding that the fs.files collection is empty.

Let me know what your test reveals. And what you think about the possible change in errors reported.

Comment by John Gibbons [ 06/Oct/17 ]

Thanks for suggestion Robert! I'd feel most comfortable to do a quick test on it just to triple check it does as you suggest and if so I'll raise a new PR (I'm assuming that is what's most convenient for you ...) Rgds, JG

Comment by Robert Stam [ 06/Oct/17 ]

Rather than adding a new knob which must be configured I think it would be better to change the Find operation that is used to check if the files collection is empty to set AllowPartialResults to true. This will allow the Find to return without error even if one shard is down.

Would you like to submit a new pull request that adds:

    AllowPartialResults = true

when creating the FindOperation here:

https://github.com/mongodb/mongo-csharp-driver/blob/master/src/MongoDB.Driver.GridFS/GridFSBucket.cs#L695

Or just let me know if you would prefer for me to make the change for you.

Thanks for submitting this.

Comment by John Gibbons [ 06/Oct/17 ]

Pull Request Submitted
CSHARP-2043: Added SuppressEnsureIndexes option to GridFSBucketOption… #292

Comment by John Gibbons [ 25/Sep/17 ]

forgot to mention, proposed solution would be as follows:

1. Create a new field in GridFSBucketOptions (and its immutable counterpart) called SuppressEnsureIndexes. This defaults to false, and thus the default behaviour remains unchanged for 100% backward compatibility.
2. In EnsureIndexes (and EnsureIndexesAsync) insert test at top of method and exit out immediately if appropriate.

/// <summary>
/// Gets the suppress ensure indexes setting
/// </summary>
/// <value>
/// The suppress ensure indexes setting
/// </value>
public bool SuppressEnsureIndexes
{
get

{ return _suppressEnsureIndexes; }

}

private void EnsureIndexes(IReadWriteBindingHandle binding, CancellationToken cancellationToken)
{
_ensureIndexesSemaphore.Wait(cancellationToken);

try
{
if (!_ensureIndexesDone)
{
if (!_options.SuppressEnsureIndexes)
{
var isFilesCollectionEmpty = IsFilesCollectionEmpty(binding, cancellationToken);
if (isFilesCollectionEmpty)

Generated at Wed Feb 07 21:41:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.