[SERVER-21403] $snapshot can return duplicates on 3.2 in the case of an MMAPv1 document move Created: 11/Nov/15  Updated: 14/Dec/15  Resolved: 17/Nov/15

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 3.2.0-rc2
Fix Version/s: 3.2.0-rc4

Type: Bug Priority: Major - P3
Reporter: Gustavo Niemeyer Assignee: David Storch
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File SnapshotTest.go     Text File snapShotJs.log     File snapshotTest.js    
Issue Links:
Related
related to SERVER-21563 storage_rocks_index_test is broken Closed
related to SERVER-14703 Snapshot queries can miss records if ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: QuInt C (11/23/15)
Participants:

 Description   

The $snapshot option doesn't seem to be working in 3.2.

The following Go test case passes on every other release, but on 3.2 it fails with "Error: seen duplicated key: 3". The test exercises a worst-case scenario: during a forward iteration, documents are resized (grown) beyond their padding, in reverse order, so that they are moved forward.

 func (s *S) TestFindIterSnapshot(c *C) {
        session, err := mgo.Dial("localhost:40001")
        c.Assert(err, IsNil)
        defer session.Close()
 
        // Insane amounts of logging otherwise due to the
        // amount of data being shuffled.
        mgo.SetDebug(false)
        defer mgo.SetDebug(true)
 
        coll := session.DB("mydb").C("mycoll")
 
        var a [1024000]byte
 
        for n := 0; n < 10; n++ {
                err := coll.Insert(M{"_id": n, "n": n, "a1": &a})
                c.Assert(err, IsNil)
        }
 
        query := coll.Find(M{"n": M{"$gt": -1}}).Batch(2).Prefetch(0)
        query.Snapshot()
        iter := query.Iter()
 
        seen := map[int]bool{}
        result := struct {
                Id int "_id"
        }{}
        for iter.Next(&result) {
                if len(seen) == 2 {
                        // Grow all entries so that they have to move.
                        // Backwards so that the order is inverted.
                        for n := 10; n >= 0; n-- {
                                _, err := coll.Upsert(M{"_id": n}, M{"$set": M{"a2": &a}})
                                c.Assert(err, IsNil)
                        }
                }
                if seen[result.Id] {
                        c.Fatalf("seen duplicated key: %d", result.Id)
                }
                seen[result.Id] = true
        }
        c.Assert(iter.Close(), IsNil)
}

Test was performed using the mmapv1 storage engine.



 Comments   
Comment by Githook User [ 17/Nov/15 ]

Author: David Storch (dstorch) <david.storch@10gen.com>

Message: SERVER-21403 make SortedDataInterface::Cursor implementations for unique indices never return dup keys
Branch: master
https://github.com/mongodb/mongo/commit/957f4ddf7cfb060eafb58fce581fb10d7a0d49b1

Comment by Githook User [ 17/Nov/15 ]

Author: David Storch (dstorch) <david.storch@10gen.com>

Message: SERVER-21403 plumb flag indicating whether or not index is unique down to MMAPv1 BtreeLogic

Includes similar plumbing for the ephemeralForTest storage
engine. WiredTiger integration layer changes are unnecessary
since WT index cursors are already unique-aware.
Branch: master
https://github.com/mongodb/mongo/commit/009268e918a1e08d611daad34ec94fa5a351db60

Comment by David Storch [ 12/Nov/15 ]

The problem has to do with the way in which an MMAPv1 IndexCursor restores its position after a yield. In particular, it may restore incorrectly when an MMAP document's RecordId changes due to a move. This was introduced in the 3.1 dev cycle during the rewrite of the Storage Engine API's index cursor interface.

Comment by Scott Hernandez (Inactive) [ 12/Nov/15 ]

I've updated the test to use a large (1MB) string, and that seems to be enough to cause the failure on master.

Comment by Gustavo Niemeyer [ 12/Nov/15 ]

Unlike the Go test case, the attached JavaScript test uses a very small array, which means an update could fit into the padding.
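
The padding argument can be sketched with a simplified model. This assumes a power-of-2 record allocation strategy and ignores record headers, the other fields in the document, and any special handling of very large documents; `allocSize` and `mustMove` are hypothetical helpers, and the 100-byte and 120-byte sizes are made-up stand-ins for a "very small" document.

```go
package main

import "fmt"

// allocSize returns the record size that a power-of-2 allocator would
// reserve for a document of docSize bytes (simplified model).
func allocSize(docSize int) int {
	size := 32
	for size < docSize {
		size *= 2
	}
	return size
}

// mustMove reports whether growing a document from oldSize to newSize
// exceeds the space originally reserved, which would force a move.
func mustMove(oldSize, newSize int) bool {
	return newSize > allocSize(oldSize)
}

func main() {
	const big = 1024000 // size of the byte array used in the Go test

	// Adding a second array of the same size roughly doubles the document.
	fmt.Println(mustMove(big, 2*big)) // true

	// A small document growing slightly still fits in its padding.
	fmt.Println(mustMove(100, 120)) // false
}
```

Under this model, only updates that outgrow the reserved record size change the document's location, which is why the size of the array matters for reproducing the issue.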

Comment by Scott Hernandez (Inactive) [ 12/Nov/15 ]

Thomas, I updated your jstest to assert instead of printing so we can use it as a test in resmoke and so it fits into the regression tests in the server code base.

Unfortunately, it doesn't seem to have a problem and passes with both WiredTiger and mmapv1 on master for me locally. Please confirm that my changes still produce the error on your system, and then we can figure out what the differences are.

./buildscripts/resmoke.py --storageEngine mmapv1 ~/Downloads/snapshotTest.js 
./buildscripts/resmoke.py --storageEngine wiredTiger ~/Downloads/snapshotTest.js 
 
// Note: When running master's resmoke against an older mongod, use these options for legacy reads/writes.
./buildscripts/resmoke.py --storageEngine mmapv1 --shellWriteMode legacy --shellReadMode legacy ~/Downloads/snapshotTest.js

Comment by Scott Hernandez (Inactive) [ 11/Nov/15 ]

Can this be reproduced with another driver or language, for example in the shell? In mgo, which protocol is being used to talk to the server: the find command, or OP_QUERY on the wire protocol?

Please include the server logs with level 2 verbosity, and explain output as well to help understand what is being sent.

Comment by Kelsey Schubert [ 11/Nov/15 ]

This issue was introduced in 3.1.2.

Comment by Kelsey Schubert [ 11/Nov/15 ]

I have verified the issue. I have attached a version of this test case in a program that can be executed outside of the test framework and its fixtures. The issue is only present for mmapv1, and does not occur with WiredTiger.

Generated at Thu Feb 08 03:57:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.