[SERVER-4676] $unset-ing field in sparse key does not reduce index size Created: 13/Jan/12  Updated: 29/Feb/12  Resolved: 27/Jan/12

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: 2.0.2
Fix Version/s: None

Type: Improvement
Priority: Minor - P4
Reporter: Harry Mexxian
Assignee: Aaron Staple
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: WIN C#

Participants:

 Description   

Not sure if this is by design, a bug, or whether I am doing something wrong, but see findings in this thread:

https://groups.google.com/forum/#!topic/mongodb-user/_3e0fmVorD0

The use case where this comes into play is explained in the thread: "Using unset on a sparse index as an archiving alternative to two collections".



 Comments   
Comment by Scott Hernandez (Inactive) [ 27/Jan/12 ]

Your syntax for creating the index is indeed incorrect: sparse is an index option, not an index key.

You want to use the ensureIndex(keys, options) method of the collection – http://api.mongodb.org/csharp/current/html/9e19ab20-3843-7ad8-93f9-c4ace3320753.htm
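
To illustrate why the keys/options distinction matters for index size, here is a toy model in plain JavaScript. It is not MongoDB's implementation: countIndexEntries is a hypothetical helper, and it assumes that a non-sparse index keeps one entry per document (missing fields indexed as null) while a sparse index skips documents that lack the indexed fields.

```javascript
// Toy model of index entry counts; not MongoDB's actual implementation.
// Assumption: a non-sparse index stores one entry per document, while a
// sparse index skips documents that contain none of the indexed fields.
function countIndexEntries(docs, keys, options = {}) {
  const fields = Object.keys(keys);
  if (!options.sparse) {
    return docs.length;
  }
  return docs.filter((doc) => fields.some((f) => f in doc)).length;
}

// 10 documents; only the last 2 still have s_dt (the rest were "archived").
const docs = [];
for (let i = 0; i < 10; i++) {
  docs.push(i >= 8 ? { f_dt: i, s_dt: i } : { f_dt: i });
}

// "sparse" passed as a key: a non-sparse compound index on (s_dt, sparse).
const asKey = countIndexEntries(docs, { s_dt: 1, sparse: true });
// "sparse" passed as an option: a sparse index on s_dt alone.
const asOption = countIndexEntries(docs, { s_dt: 1 }, { sparse: true });

console.log(asKey, asOption); // 10 2
```

Under these assumptions, only the second form shrinks as s_dt is unset from more documents.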

Comment by Harry Mexxian [ 27/Jan/12 ]

My bad. Figured out the correct syntax:

 
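            BsonDocument keys = new BsonDocument();
            keys.Add("s_dt", 1);                                    // index key only
            indexKeys = new IndexKeysDocument(keys);
            var options = new IndexOptionsDocument("sparse", true); // sparse is an option, not a key
            collection.EnsureIndex(indexKeys, options);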
 

"User error" on this one. Sorry for the wasted time Aaron and thanks for the help.

Comment by Harry Mexxian [ 27/Jan/12 ]

OK. I now think that the C# driver is not actually creating a sparse index. It's just calling it sparse by adding sparse to the index name.

Comment by Harry Mexxian [ 27/Jan/12 ]

> db.docs.stats();
{
"ns" : "MongoPerfTests.docs",
"count" : 100000,
"size" : 6400048,
"avgObjSize" : 64.00048,
"storageSize" : 11182080,
"numExtents" : 6,
"nindexes" : 3,
"lastExtentSize" : 8388608,
"paddingFactor" : 1,
"flags" : 1,
"totalIndexSize" : 8985424,
"indexSizes" : { "_id_" : 3262224, "s_dt_1_sparse_" : 2910656, "f_dt_1" : 2812544 },
"ok" : 1
}
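
As a quick sanity check on these figures, the arithmetic below compares the two secondary indexes. It only uses numbers copied from the stats output above; bytesPerDoc is a hypothetical helper name, and reading the result as "every document has an entry in both indexes" is an inference, not something the output states directly.

```javascript
// Figures copied from the db.docs.stats() output above.
const count = 100000;
const indexSizes = { s_dt_1_sparse_: 2910656, f_dt_1: 2812544 };

// Average index bytes per document in the collection.
const bytesPerDoc = (size) => size / count;

const ratio = indexSizes.s_dt_1_sparse_ / indexSizes.f_dt_1;
console.log(bytesPerDoc(indexSizes.s_dt_1_sparse_).toFixed(1)); // 29.1
console.log(bytesPerDoc(indexSizes.f_dt_1).toFixed(1));         // 28.1
console.log(ratio.toFixed(2));                                   // 1.03
```

The index named s_dt_1_sparse_ is slightly larger than f_dt_1, which would be unexpected if it only held entries for the small fraction of documents that still have s_dt.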

Comment by Harry Mexxian [ 27/Jan/12 ]

Database records around the 98,000 'border', where documents start to have s_dt set:

/* 97998 */
{
"_id" : ObjectId("4f22846b99198de77d4cf42a"),
"_t" : "DataObject",
"f_dt" : 97998
}

/* 97999 */
{
"_id" : ObjectId("4f22846b99198de77d4cf42b"),
"_t" : "DataObject",
"f_dt" : 97999
}

/* 98000 */
{
"_id" : ObjectId("4f22846b99198de77d4cf42c"),
"_t" : "DataObject",
"f_dt" : 98000,
"s_dt" : 98000
}

/* 98001 */
{
"_id" : ObjectId("4f22846b99198de77d4cf42d"),
"_t" : "DataObject",
"f_dt" : 98001,
"s_dt" : 98001
}

/* 98002 */
{
"_id" : ObjectId("4f22846b99198de77d4cf42e"),
"_t" : "DataObject",
"f_dt" : 98002,
"s_dt" : 98002
}

Comment by Harry Mexxian [ 27/Jan/12 ]

If I get a chance later, I will experiment with a JavaScript version so it's easier for you to replicate.

Comment by Harry Mexxian [ 27/Jan/12 ]

Hi Aaron. Here is the C# code I use. I check the database itself to make sure it is working correctly. In this test, only the last 1000 out of the 100,000 documents still have the sparse field (s_dt) set, yet the two indices (s_dt, f_dt) are the same size.

    class Test1
    {
        private const int iterations = 100000;
   
        public Test1()
        {
 
            string connectionString = "mongodb://localhost";
            MongoServer server = MongoServer.Create(connectionString);
            MongoDatabase database = server.GetDatabase("MongoPerfTests",SafeMode.True);
 
            MongoCollection<DataObject> collection = database.GetCollection<DataObject>("docs");
 
 
            database.Drop();
            var stopwatch = System.Diagnostics.Stopwatch.StartNew();
            IMongoIndexKeys indexKeys;
            
            BsonDocument keys = new BsonDocument();
            keys.Add("s_dt", 1);
            keys.Add("sparse", true);   // adds "sparse" as a second index key; it is not treated as an index option
            indexKeys = new IndexKeysDocument(keys);
            collection.CreateIndex(indexKeys);
 
            keys = new BsonDocument();
            keys.Add("f_dt", 1);
            indexKeys = new IndexKeysDocument(keys);
            collection.CreateIndex(indexKeys);
 
            for (int i = 0; i < iterations; i++)
            {
                DataObject A = new DataObject(i);
                myInsert(collection, A);
 
                /*
                 * Rolling archiving: after 1000 inserts, unset 1 doc for each new row inserted
                 *
                if (i > 1000)
                {
                    int recordnumtoarchive = i - 1001;
                    var query = new QueryDocument {
                        { "s_dt", recordnumtoarchive },
                    };
                    var update = new UpdateDocument {
                        { "$unset", new BsonDocument("s_dt", "1") }
                    };
                    collection.Update(query, update);
 
                }
                */
 
                /*
                 *  Batch archiving, After 2000 inserts, archive 1000 items every 1000 inserts
                 * 
                 */
      
                if (i >= 2000 && i % 1000 == 0 )
                {
                    int recordnumtoarchive = i - 1000;
                    
                    var query = new QueryDocument {
                        { "s_dt", new BsonDocument {
                            {"$lt", recordnumtoarchive}
                            }
                        }
                    };
                     
 
                    var update = new UpdateDocument {
                        { "$unset", new BsonDocument("s_dt", "1") }
                    };
                    collection.Update(query, update,UpdateFlags.Multi);
 
                }
                 
                /*
                 * 
                 * Remove instead of unsetting
                 * 
                if (i >= 2000 && i % 1000 == 0)
                {
                    int recordnumtoarchive = i - 1000;
 
                    var query = new QueryDocument {
                        { "s_dt", new BsonDocument {
                            {"$lt", recordnumtoarchive}
                            }
                        }
                    };
 
 
                    var update = new UpdateDocument {
                        { "$unset", new BsonDocument("s_dt", "1") }
                    };
                    collection.Remove(query);
 
                }
                 */
            }
 
 
            Console.ReadLine();
 
        }
 
        static bool myInsert(MongoCollection collection, object obj) {
 
            try
            {
                SafeModeResult result = collection.Insert(obj);
                if (result != null && !result.Ok)
                {
                    Console.WriteLine(result.ErrorMessage);
                    return false;
                }
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
                return false;
 
            }
            finally
            {
                
 
            }
            return true;
 
        }
 
    }
 
   
 
    class DataObject
    {
        public int f_dt { get; set; }
        public int s_dt { get; set; }
 
        public DataObject(int num)
        {
            f_dt = num;
            s_dt = f_dt;
 
        }
 
 
    }
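
To check how many documents should still carry s_dt once the batch-archiving branch has run, the loop's bookkeeping can be replayed without a database. This is a minimal sketch in plain JavaScript, assuming every insert and every multi-update succeeds; the hasSdt array is a stand-in for the collection and is not part of the original test.

```javascript
// Replays only the insert/unset bookkeeping of the batch-archiving loop;
// no database is involved.
const iterations = 100000;
const hasSdt = new Array(iterations).fill(false);

for (let i = 0; i < iterations; i++) {
  hasSdt[i] = true; // insert: DataObject(i) sets s_dt = f_dt = i

  if (i >= 2000 && i % 1000 === 0) {
    const cutoff = i - 1000;
    // multi-update: $unset s_dt where s_dt < cutoff
    for (let j = 0; j < cutoff; j++) hasSdt[j] = false;
  }
}

const remaining = hasSdt.filter(Boolean).length;
const firstWithSdt = hasSdt.indexOf(true);
console.log(remaining, firstWithSdt); // 2000 98000
```

The replay leaves s_dt on documents 98000 through 99999, which agrees with the 98,000 border seen in the stored records: the last batch runs at i = 99000 with a cutoff of 98000, so about 2,000 documents keep the field rather than 1,000.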
 

Comment by Aaron Staple [ 26/Jan/12 ]

Could you send the test that you performed?

Comment by Aaron Staple [ 26/Jan/12 ]

Hi Harry,

I wrote a simple test and was not able to duplicate the behavior you described.

Test

c = db.c;
c.drop();
 
for( i = 0; i < 5*1000; ++i ) {
    c.save( {a:i} );
}
 
c.ensureIndex( {a:1}, {sparse:true} );
printjson( c.stats() );
 
c.update( {}, { $unset:{a:1} }, false, true );
printjson( c.stats() );
 
db.runCommand( {compact:'c'} );
printjson( c.stats() );

Here is the output:

connecting to: test
{
	"ns" : "test.c",
	"count" : 5000,
	"size" : 180020,
	"avgObjSize" : 36.004,
	"storageSize" : 348160,
	"numExtents" : 4,
	"nindexes" : 2,
	"lastExtentSize" : 262144,
	"paddingFactor" : 1,
	"flags" : 1,
	"totalIndexSize" : 310688,
	"indexSizes" : {
		"_id_" : 171696,
		"a_1" : 138992
	},
	"ok" : 1
}
{
	"ns" : "test.c",
	"count" : 5000,
	"size" : 180020,
	"avgObjSize" : 36.004,
	"storageSize" : 348160,
	"numExtents" : 4,
	"nindexes" : 2,
	"lastExtentSize" : 262144,
	"paddingFactor" : 1,
	"flags" : 1,
	"totalIndexSize" : 179872,
	"indexSizes" : {
		"_id_" : 171696,
		"a_1" : 8176
	},
	"ok" : 1
}
{
	"ns" : "test.c",
	"count" : 5000,
	"size" : 180020,
	"avgObjSize" : 36.004,
	"storageSize" : 1048576,
	"numExtents" : 1,
	"nindexes" : 2,
	"lastExtentSize" : 1048576,
	"paddingFactor" : 1,
	"flags" : 1,
	"totalIndexSize" : 163520,
	"indexSizes" : {
		"_id_" : 155344,
		"a_1" : 8176
	},
	"ok" : 1
}
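
The effect of the $unset on the sparse index can be read off the three outputs with a little arithmetic. The sketch below uses only the a_1 sizes printed above; treating 8176 bytes as a single allocation unit is an inference from the fact that every index size here is an exact multiple of it, not something stated in the output.

```javascript
// a_1 index sizes in bytes, copied from the three stats() outputs above.
const a1 = { afterEnsureIndex: 138992, afterUnset: 8176, afterCompact: 8176 };

// Fraction of the index size released by unsetting the field everywhere.
const reduction = 1 - a1.afterUnset / a1.afterEnsureIndex;
// How many 8176-byte units the index occupied before the $unset.
const unitsBefore = a1.afterEnsureIndex / a1.afterUnset;

console.log((reduction * 100).toFixed(1) + "%"); // 94.1%
console.log(unitsBefore);                        // 17
```

The sparse index drops to a single 8176-byte unit immediately after the $unset, and the compact command does not change it further.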
 
