[SERVER-22778] Invariant failure _uncommittedSnapshots.empty() with enableMajorityReadConcern Created: 20/Feb/16  Updated: 09/Jun/16  Resolved: 22/Feb/16

Status: Closed
Project: Core Server
Component/s: Replication, Stability
Affects Version/s: 3.2.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Vlad Galu Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-22617 SnapshotThread hits invariant due to ... Closed
Related
related to SERVER-22740 mongod crashes on FreeBSD while setin... Closed
Operating System: ALL
Steps To Reproduce:

Running a replica set with one primary and two priority 0 secondaries, all on FreeBSD 10.2 with ZFS and WiredTiger. Pushing small documents into a lightly indexed collection with write concern {w: "majority", j: true}.
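
For reference, a minimal mongo shell sketch of a matching replica set configuration, with two priority 0 secondaries that cannot be elected primary (the replica set name and host names are placeholders, not taken from this ticket):

rs.initiate({
    _id: "rs0",
    members: [
        { _id: 0, host: "host1:27017" },               // primary-eligible member
        { _id: 1, host: "host2:27017", priority: 0 },  // priority 0 secondary
        { _id: 2, host: "host3:27017", priority: 0 }   // priority 0 secondary
    ]
})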

Participants:

 Description   

2016-02-20T05:26:41.252+0100 I -        [SnapshotThread] Invariant failure _uncommittedSnapshots.empty() src/mongo/db/repl/replication_coordinator_impl.cpp 3232
2016-02-20T05:26:41.252+0100 I -        [SnapshotThread] 
 
***aborting after invariant() failure
 
 
2016-02-20T05:26:41.264+0100 F -        [SnapshotThread] Got signal: 6 (Abort trap).
 
 0x11f1d4b 0x11f16ba 0x803c479aa 0x803c471a8
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"DF1D4B","s":"_ZN5mongo15printStackTraceERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEE"},{"b":"400000","o":"DF16BA","s":"_ZN5mongo29reportOutOfMemoryErrorAndExitEv"},{"b":"803C3A000","o":"D9AA","s":"pthread_sigmask"},{"b":"803C3A000","o":"D1A8","s":"pthread_getspecific"}],"processInfo":{ "mongodbVersion" : "3.2.3", "gitVersion" : "b326ba837cf6f49d65c2f85e1b70f6f31ece7937", "compiledModules" : [], "uname" : { "sysname" : "FreeBSD", "release" : "10.2-RELEASE-p9", "version" : "FreeBSD 10.2-RELEASE-p9 #0: Thu Jan 14 01:32:46 UTC 2016     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC", "machine" : "amd64" } }}
 mongod(_ZN5mongo15printStackTraceERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEE+0x2B) [0x11f1d4b]
 mongod(_ZN5mongo29reportOutOfMemoryErrorAndExitEv+0x14A) [0x11f16ba]
 libthr.so.3(pthread_sigmask+0x4AA) [0x803c479aa]
 libthr.so.3(pthread_getspecific+0xDD8) [0x803c471a8]
-----  END BACKTRACE  -----



 Comments   
Comment by Ramon Fernandez Marina [ 22/Feb/16 ]

Understood, thanks vgalu. We're closing these tickets as duplicates of SERVER-22617 then.

Regards,
Ramón.

Comment by Vlad Galu [ 20/Feb/16 ]

Hi Ramon,

Many thanks for getting back to us so swiftly on a Saturday. I can confirm that the majority read concern is indeed in use; we wanted to give it a try. We will disable it and report back in a few days, if that's OK.

Vlad

Comment by Ramon Fernandez Marina [ 20/Feb/16 ]

vgalu, a colleague points out that this is most likely SERVER-22617, so there should be no need to keep this system up. However, can you please share the startup log for these servers? This issue should only appear when using --enableMajorityReadConcern: true, and the logs should show whether that's the case.
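
One way to confirm whether the option is enabled on a running node, assuming shell access (a standard admin command, not taken from this ticket):

// Shows the options mongod was started with; if majority read concern was
// enabled, it typically appears under parsed.replication.enableMajorityReadConcern.
db.adminCommand({ getCmdLineOpts: 1 })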

Comment by Vlad Galu [ 20/Feb/16 ]

P.S. This is a purely synthetic test where all documents are identical except for the _id, which is a unique 16-byte array, and the two timestamps.

Comment by Vlad Galu [ 20/Feb/16 ]

Hi Ramon, I sadly no longer have the diagnostics data, as the setup on which we observed the issue was transient, but we have brought it back online and are waiting for another assert. Meanwhile, here is what our schema looks like:

 
rs0:PRIMARY> db.collection.getIndexes()
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "db.collection"
        }
]
rs0:PRIMARY> db.collection.findOne()
{
        "_id" : BinData(0,"AAAAAFbIb60UNKrKPUyCUg=="),
        "nb" : ISODate("2016-02-20T13:52:45.136Z"),
        "na" : ISODate("2016-02-20T13:52:45.136Z"),
        "bd" : BinData(0,"w+iLA4Hy5DRy6ZH5MwxHW+j44b4TRvv5Lwp2eXZhpOhONByy1cj89jSf/7c246eXeq/tbB60E7G68ejCsaOlksNY0jXuRpWXd+cBNXEBNK/k81uR6jY9etIeyjlp0uFyFYqG3Sj3DUcNUX4vj2SHPtFBvwb6MKWZRZOvIyvF+nKtkvnbobhct3xuwdjqjmyQYpsa7Fw4uBKWA4YJku8v/eXohhaipqWxZoX2WyqNTbvnTY5qA88ReuhWugRmAWcdKE3sr2rlXN4pwhGYGqPeMxjT+dMRPgCteQ47q5e9j6xPQFE3ll7MzX8qS1fq5nfu3IE/QIPb8zoXQuuJdKQ0pA==")
}
rs0:PRIMARY> 
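
A rough mongo shell sketch of a generator matching the document shape above, writing with the same {w: "majority", j: true} concern (the document count, payload size, and the randomHex helper are assumptions for illustration, not the reporter's actual script):

// Hypothetical helper: build a random hex string of the given byte length.
function randomHex(bytes) {
    var s = "";
    for (var i = 0; i < bytes * 2; i++) {
        s += "0123456789abcdef"[Math.floor(Math.random() * 16)];
    }
    return s;
}

// Documents are identical except for a unique 16-byte binary _id and two timestamps.
for (var i = 0; i < 100000; i++) {
    var now = new Date();
    db.collection.insert(
        {
            _id: HexData(0, randomHex(16)),   // unique 16-byte binary _id
            nb: now,                          // first timestamp
            na: now,                          // second timestamp
            bd: HexData(0, randomHex(256))    // fixed binary payload
        },
        { writeConcern: { w: "majority", j: true } }
    );
}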

Comment by Ramon Fernandez Marina [ 20/Feb/16 ]

vgalu, can you please specify on which node you get the invariant failure? Having full logs for the affected node would be helpful, if you could upload them. I'm trying on my end on Linux with documents of the following shape and indexes:

> db.c.findOne()
{
        "_id" : 11000000,
        "x" : 0,
        "y" : 0.94040102610125,
        "padding" : [
                0.0979344929655781,
                0.7888911063018895,
                0.18485241756570758,
                0.5824662338181952,
                0.8659871878908234,
                0.845018010741569,
                0.7572533686161478,
                0.9750469207183414,
                0.20008059867222983,
                0.9742682568175951
        ]
}
> db.c.getIndexes()
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "db1.c"
        },
        {
                "v" : 1,
                "key" : {
                        "x" : 1
                },
                "name" : "x_1",
                "ns" : "db1.c"
        },
        {
                "v" : 1,
                "key" : {
                        "padding" : 1
                },
                "name" : "padding_1",
                "ns" : "db1.c"
        }
]

But data distribution is often important, so if you have a reproduction script you can share, that would be of great help in investigating this ticket and SERVER-22740 as well.

Thanks,
Ramón.
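
A rough shell sketch that recreates the indexes and document shape used in the test above (the document count and value distribution are assumptions for illustration, not the exact test):

// Create the secondary indexes shown above, then insert documents of the same
// shape with the journaled majority write concern from the original report.
db.c.createIndex({ x: 1 });
db.c.createIndex({ padding: 1 });
for (var i = 0; i < 100000; i++) {
    var padding = [];
    for (var k = 0; k < 10; k++) { padding.push(Math.random()); }
    db.c.insert(
        { _id: i, x: i % 100, y: Math.random(), padding: padding },
        { writeConcern: { w: "majority", j: true } }
    );
}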
