|
Author:
{'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'}
Message: SERVER-71937 Change dropDatabase to drop the views collection first to ensure time-series deletion oplog order of view THEN buckets
Branch: master
https://github.com/mongodb/mongo/commit/3770a31b999e708bca5fb7500267bf829ddc2a2e
|
|
Author:
{'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'}
Message: SERVER-71937 Change dropDatabase to drop the views collection first to ensure time-series deletion oplog order of view THEN buckets
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/8cb48ab0796a6246f629af546213d8fdb692723a
|
|
Spoke offline with Yuhong. We think the best approach is to change dropDatabase to drop the views collection first, before the other collections, instead of in arbitrary order. That way we can ensure that a buckets collection can exist without a view entry, but not the other way around. Then validate will no longer be able to find a view, subsequently fail to find its buckets collection, and return an error.
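A rough sketch of the ordering idea, in plain JavaScript rather than the server's C++. The helper name `orderForDropDatabase` and the example collection names are made up for illustration; this is not the code in the linked commit, only the invariant it aims for (system.views is handled before everything else).

```javascript
// Hypothetical helper: put "system.views" at the front of the drop order and
// keep the relative order of all other collections unchanged.
function orderForDropDatabase(collectionNames) {
  const views = collectionNames.filter((name) => name === "system.views");
  const others = collectionNames.filter((name) => name !== "system.views");
  return [...views, ...others];
}

// Example input in the problematic order (buckets before views).
const dropOrder = orderForDropDatabase(["system.buckets.ts", "plain", "system.views"]);
console.log(dropOrder);
```

With this order, a reader that observes the catalog part-way through the drops can see a buckets collection without its view, but never a view without its buckets collection.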
|
|
We appear to hold a MODE_X database lock across dropDatabase for the duration of all collection drops.
I think this leads to the conclusion that dropDatabase and dropCollection both block access to both the view and buckets namespaces while the time-series collection state is removed. So the validate code should only see either both or neither. If validate finds neither a collection nor a view, the code throws a NamespaceNotFound error. It seems like it would make sense to throw a NamespaceNotFound error with a different err msg if neither the view nor the buckets collection is found, checked under a continuous lock. I don't know what's appropriate for actually finding either the view or the buckets collection and not the other. What should the user do in that case? The oplog could be chopped in between by rollback/crash recovery, I imagine.
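To make the four possible states concrete, here is a small sketch. The `catalog` shape (two `Set`s of names) and the function name are assumptions for illustration only; this is not how the server's validate code is structured, and it deliberately does not decide what the mixed states should return.

```javascript
// Hypothetical snapshot of a catalog: one set of view namespaces and one set
// of collection namespaces, both read under the same (assumed) lock.
function classifyTimeseriesState(catalog, dbName, collName) {
  const hasView = catalog.views.has(`${dbName}.${collName}`);
  const hasBuckets = catalog.collections.has(`${dbName}.system.buckets.${collName}`);

  if (hasView && hasBuckets) return "both";
  if (!hasView && !hasBuckets) return "neither"; // the NamespaceNotFound case
  return hasView ? "view-only" : "buckets-only";
}

// Example: the state observed in the failure, a view whose buckets are gone.
const snapshot = {
  views: new Set(["godb.ts"]),
  collections: new Set(),
};
console.log(classifyTimeseriesState(snapshot, "godb", "ts"));
```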
|
|
The dropCollection logic removes both namespaces under locks on both namespaces. So in that case the validate cmd should never find one without the other because it checks under at least one lock. I'm not sure about the dropDatabase cmd yet.
|
|
Ah. This failure scenario is specific to dropDatabase (which is what the BF failure logs show occurred), not dropCollection. dropDatabase drops the buckets collection first, then deletes the view, whereas dropCollection deletes the view first, then drops the buckets collection. It makes sense now that it's an OK scenario for the buckets collection to be missing when the view is still present.
// create
|
|
// Timestamp(1673386580, 1)
|
{ "op" : "c", "ns" : "godb.$cmd", "ui" : UUID("9d2bb203-b2e3-4fbe-a56a-98adad321128"), "o" : { "create" : "system.buckets.ts", "validator" : { "$jsonSchema" : { "bsonType" : "object", "required" : [ "_id", "control", "data" ], "properties" : { "_id" : { "bsonType" : "objectId" }, "control" : { "bsonType" : "object", "required" : [ "version", "min", "max" ], "properties" : { "version" : { "bsonType" : "number" }, "min" : { "bsonType" : "object", "required" : [ "t" ], "properties" : { "t" : { "bsonType" : "date" } } }, "max" : { "bsonType" : "object", "required" : [ "t" ], "properties" : { "t" : { "bsonType" : "date" } } }, "closed" : { "bsonType" : "bool" }, "count" : { "bsonType" : "number", "minimum" : 1 } }, "additionalProperties" : false }, "data" : { "bsonType" : "object" }, "meta" : { } }, "additionalProperties" : false } }, "clusteredIndex" : true, "timeseries" : { "timeField" : "t", "granularity" : "minutes", "bucketMaxSpanSeconds" : 86400 } }, "ts" : Timestamp(1673386580, 1), "t" : NumberLong(8), "v" : NumberLong(2), "wall" : ISODate("2023-01-10T21:36:20.226Z") }
|
|
// Timestamp(1673386580, 2)
|
{ "op" : "i", "ns" : "godb.system.views", "ui" : UUID("bff8ee7d-88d2-4088-9f4b-2e063808e2d3"), "o" : { "_id" : "godb.ts", "viewOn" : "system.buckets.ts", "pipeline" : [ { "$_internalUnpackBucket" : { "timeField" : "t", "bucketMaxSpanSeconds" : 86400 } } ] }, "o2" : { "_id" : "godb.ts" }, "ts" : Timestamp(1673386580, 2), "t" : NumberLong(8), "v" : NumberLong(2), "wall" : ISODate("2023-01-10T21:36:20.227Z") }
|
|
|
// dropDatabase
|
|
// Timestamp(1673386591, 1) -- buckets
|
{ "op" : "c", "ns" : "godb.$cmd", "ui" : UUID("9d2bb203-b2e3-4fbe-a56a-98adad321128"), "o" : { "drop" : "system.buckets.ts" }, "o2" : { "numRecords" : 0 }, "ts" : Timestamp(1673386591, 1), "t" : NumberLong(8), "v" : NumberLong(2), "wall" : ISODate("2023-01-10T21:36:31.142Z") }
|
|
// Timestamp(1673386591, 2) -- view
|
{ "op" : "c", "ns" : "godb.$cmd", "ui" : UUID("bff8ee7d-88d2-4088-9f4b-2e063808e2d3"), "o" : { "drop" : "system.views" }, "o2" : { "numRecords" : 1 }, "ts" : Timestamp(1673386591, 2), "t" : NumberLong(8), "v" : NumberLong(2), "wall" : ISODate("2023-01-10T21:36:31.144Z") }
|
|
// Timestamp(1673386591, 3) -- database
|
{ "op" : "c", "ns" : "godb.$cmd", "o" : { "dropDatabase" : 1 }, "ts" : Timestamp(1673386591, 3), "t" : NumberLong(8), "v" : NumberLong(2), "wall" : ISODate("2023-01-10T21:36:31.236Z") }
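The ordering difference can be checked mechanically. The sketch below works on simplified oplog entries that keep only the `op`, `ns`, and `o` fields from the dumps in this ticket, and assumes the entries are in ascending timestamp order. The function name and return values are made up for illustration.

```javascript
// Report whether the view removal or the buckets drop appears first in a
// list of simplified oplog entries (ascending timestamp order assumed).
function timeseriesDropOrder(entries, dbName, collName) {
  const cmdNs = dbName + ".$cmd";
  const bucketsName = "system.buckets." + collName;
  const viewId = dbName + "." + collName;

  const bucketsIdx = entries.findIndex(
    (e) => e.op === "c" && e.ns === cmdNs && e.o.drop === bucketsName
  );
  const viewIdx = entries.findIndex(
    (e) =>
      // dropCollection deletes the view document from system.views ...
      (e.op === "d" && e.ns === dbName + ".system.views" && e.o._id === viewId) ||
      // ... while dropDatabase drops the whole system.views collection.
      (e.op === "c" && e.ns === cmdNs && e.o.drop === "system.views")
  );

  if (bucketsIdx === -1 || viewIdx === -1) return "incomplete";
  return viewIdx < bucketsIdx ? "view-first" : "buckets-first";
}

// Entries reduced from the dropDatabase dump above.
const dropDatabaseLog = [
  { op: "c", ns: "godb.$cmd", o: { drop: "system.buckets.ts" } },
  { op: "c", ns: "godb.$cmd", o: { drop: "system.views" } },
  { op: "c", ns: "godb.$cmd", o: { dropDatabase: 1 } },
];
// Entries reduced from the dropCollection dump elsewhere in this ticket.
const dropCollectionLog = [
  { op: "d", ns: "godb.system.views", o: { _id: "godb.ts" } },
  { op: "c", ns: "godb.$cmd", o: { drop: "system.buckets.ts" } },
];

console.log(timeseriesDropOrder(dropDatabaseLog, "godb", "ts"));
console.log(timeseriesDropOrder(dropCollectionLog, "godb", "ts"));
```

The first call reports the buckets drop ahead of the view removal, the second reports the view removal first, which matches the two dumps.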
|
|
|
Ah. The validate logic releases and locks a new namespace here before the check, so the time-series view namespace might also be gone at this point: we don't recheck for it.
|
db.oplog.rs.find({}).limit(20).sort({ts:-1})
|
|
// create
|
|
// Timestamp(1673385204, 1)
|
{ "op" : "c", "ns" : "godb.$cmd", "ui" : UUID("5b2a849b-af1c-47be-bd6e-d060040c66d9"), "o" : { "create" : "system.buckets.ts", "validator" : { "$jsonSchema" : { "bsonType" : "object", "required" : [ "_id", "control", "data" ], "properties" : { "_id" : { "bsonType" : "objectId" }, "control" : { "bsonType" : "object", "required" : [ "version", "min", "max" ], "properties" : { "version" : { "bsonType" : "number" }, "min" : { "bsonType" : "object", "required" : [ "t" ], "properties" : { "t" : { "bsonType" : "date" } } }, "max" : { "bsonType" : "object", "required" : [ "t" ], "properties" : { "t" : { "bsonType" : "date" } } }, "closed" : { "bsonType" : "bool" }, "count" : { "bsonType" : "number", "minimum" : 1 } }, "additionalProperties" : false }, "data" : { "bsonType" : "object" }, "meta" : { } }, "additionalProperties" : false } }, "clusteredIndex" : true, "timeseries" : { "timeField" : "t", "granularity" : "minutes", "bucketMaxSpanSeconds" : 86400 } }, "ts" : Timestamp(1673385204, 1), "t" : NumberLong(8), "v" : NumberLong(2), "wall" : ISODate("2023-01-10T21:13:24.434Z") }
|
|
// Timestamp(1673385204, 2)
|
{ "op" : "c", "ns" : "godb.$cmd", "ui" : UUID("bff8ee7d-88d2-4088-9f4b-2e063808e2d3"), "o" : { "create" : "system.views", "idIndex" : { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_" } }, "ts" : Timestamp(1673385204, 2), "t" : NumberLong(8), "v" : NumberLong(2), "wall" : ISODate("2023-01-10T21:13:24.458Z") }
|
|
// Timestamp(1673385204, 3)
|
{ "op" : "i", "ns" : "godb.system.views", "ui" : UUID("bff8ee7d-88d2-4088-9f4b-2e063808e2d3"), "o" : { "_id" : "godb.ts", "viewOn" : "system.buckets.ts", "pipeline" : [ { "$_internalUnpackBucket" : { "timeField" : "t", "bucketMaxSpanSeconds" : 86400 } } ] }, "o2" : { "_id" : "godb.ts" }, "ts" : Timestamp(1673385204, 3), "t" : NumberLong(8), "v" : NumberLong(2), "wall" : ISODate("2023-01-10T21:13:24.466Z") }
|
|
..........
|
|
// drop
|
|
// Timestamp(1673385313, 1)
|
{ "op" : "d", "ns" : "godb.system.views", "ui" : UUID("bff8ee7d-88d2-4088-9f4b-2e063808e2d3"), "o" : { "_id" : "godb.ts" }, "ts" : Timestamp(1673385313, 1), "t" : NumberLong(8), "v" : NumberLong(2), "wall" : ISODate("2023-01-10T21:15:13.066Z") }
|
|
// Timestamp(1673385313, 2)
|
{ "op" : "c", "ns" : "godb.$cmd", "ui" : UUID("5b2a849b-af1c-47be-bd6e-d060040c66d9"), "o" : { "drop" : "system.buckets.ts" }, "o2" : { "numRecords" : 0 }, "ts" : Timestamp(1673385313, 2), "t" : NumberLong(8), "v" : NumberLong(2), "wall" : ISODate("2023-01-10T21:15:13.070Z") }
|
|
|
|
It looks like the view is dropped before the underlying buckets collection of a time-series namespace. So a buckets collection could exist without a view registered on the time-series namespace.
It isn't yet obvious how we can run into the "Cannot validate a time-series collection without its bucket collection" error, to make sure being in such a state is not error-worthy. I'll keep digging.
|
Generated at Thu Feb 08 06:20:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.