[SERVER-51561] Defective query plans gets indefinitely cached Created: 14/Oct/20  Updated: 27/Oct/23  Resolved: 28/Oct/20

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 3.6.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Dror Levy Assignee: Dmitry Agranat
Resolution: Community Answered Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

Due to unknown causes, the mongodb server in my company generated a defective query plan for a certain query.

The issue with the query is it took extreme amount of time to execute for something that should only take milliseconds.

Exacerbating the issue, I suspect that because the query was taking so long to finish, it never got expired from the cache.

Manually cleaning the query cache using

db.<collection>.getPlanCache().clear()

resolved the issue.

 

 

Technical info:

The query was in the form of

 

db.reservations
.find({ "deleted_at"null"_id": { "$in": [ObjectId("abcdef12f8c7030001792750"), ObjectId("abcdef122546d60001e38bf7"), ObjectId("abcdef12f4858d00010badb6"), ObjectId("abcdef12021c1700015e4e9b"), ObjectId("abcdef12e9d35c0001608522"), ObjectId("abcdef1262d4a900015bb867"), ObjectId("abcdef12901e3e00015bc2ac"), ObjectId("abcdef125beb580001f3f42c"), ObjectId("abcdef12f4858d00010c2f44"), ObjectId("abcdef12a806fa00017a981b"), ObjectId("abcdef120255970001f52fe3"), ObjectId("abcdef12908a2100013276f9"), ObjectId("abcdef12097e6600018d01ea"), ObjectId("abcdef12be535b000187afaa"), ObjectId("abcdef129ad77100011eb105"), ObjectId("abcdef12aee5080001c42906"), ObjectId("abcdef12d2f27c0001cf4b89"), ObjectId("abcdef12f8c70300017ab382"), ObjectId("abcdef1290f4740001bc6191"), ObjectId("abcdef1200b9cf0001a542c7"), ObjectId("abcdef12c4ea200001f5d7c5"), ObjectId("abcdef124d31bc000114ce2f"), ObjectId("abcdef12d2f27c0001ce9343"), ObjectId("abcdef12629ec300017bec63"), ObjectId("abcdef12903e160001b8d09e"), ObjectId("abcdef1262d4a90001e191d3"), ObjectId("abcdef12803ace000180beeb"), ObjectId("abcdef1262661e00014b0064"), ObjectId("abcdef12fdbe210001b85490"), ObjectId("abcdef12751db50001662763"), ObjectId("abcdef12702aa40001fa8d13"), ObjectId("abcdef1242775f00011381cd"), ObjectId("abcdef12ce728400011668ad"), ObjectId("abcdef12613aa9000192eed9")] } })
.sort({ st: 1, shop_id: 1 })

This query, and queries like it with different objectIDs, were taking as much as 6 hours to complete.

 

The execution plan that was generated for this type of query was

 

  {
        "stage" : "SORT",
            "sortPattern" : {
            "shop_id" : 1,
                "st" : 1
        },
        "inputStage" : {
            "stage" : "SORT_KEY_GENERATOR",
                "inputStage" : {
                "stage" : "FETCH",
                    "filter" : {
                    "deleted_at" : {
                        "$eq" : null
                    }
                },
                "inputStage" : {
                    "stage" : "IXSCAN",
                        "keyPattern" : {
                        "_id" : 1
                    },
                    "indexName" : "_id_",
                        "isMultiKey" : false,
                            "isUnique" : true,
                                "isSparse" : false,
                                    "isPartial" : false,
                                        "indexVersion" : 1,
                                            "direction" : "forward",
                                                "indexBounds" : {
                        "_id" : [
                            "[ObjectId('abcdef12021c1700015e4e9b'), ObjectId('abcdef12021c1700015e4e9b')]",
                            "[ObjectId('abcdef12751db50001662763'), ObjectId('abcdef12751db50001662763')]",
                            "[ObjectId('abcdef12fdbe210001b85490'), ObjectId('abcdef12fdbe210001b85490')]",
                            "[ObjectId('abcdef12702aa40001fa8d13'), ObjectId('abcdef12702aa40001fa8d13')]",
                            "[ObjectId('abcdef12ce728400011668ad'), ObjectId('abcdef12ce728400011668ad')]",
                            "[ObjectId('abcdef1242775f00011381cd'), ObjectId('abcdef1242775f00011381cd')]",
                            "[ObjectId('abcdef12613aa9000192eed9'), ObjectId('abcdef12613aa9000192eed9')]",
                            "[ObjectId('abcdef1262661e00014b0064'), ObjectId('abcdef1262661e00014b0064')]",
                            "[ObjectId('abcdef12803ace000180beeb'), ObjectId('abcdef12803ace000180beeb')]",
                            "[ObjectId('abcdef12f8c7030001792750'), ObjectId('abcdef12f8c7030001792750')]",
                            "[ObjectId('abcdef1262d4a90001e191d3'), ObjectId('abcdef1262d4a90001e191d3')]",
                            "[ObjectId('abcdef12903e160001b8d09e'), ObjectId('abcdef12903e160001b8d09e')]",
                            "[ObjectId('abcdef124d31bc000114ce2f'), ObjectId('abcdef124d31bc000114ce2f')]",
                            "[ObjectId('abcdef12629ec300017bec63'), ObjectId('abcdef12629ec300017bec63')]",
                            "[ObjectId('abcdef12d2f27c0001ce9343'), ObjectId('abcdef12d2f27c0001ce9343')]",
                            "[ObjectId('abcdef122546d60001e38bf7'), ObjectId('abcdef122546d60001e38bf7')]",
                            "[ObjectId('abcdef12c4ea200001f5d7c5'), ObjectId('abcdef12c4ea200001f5d7c5')]",
                            "[ObjectId('abcdef1200b9cf0001a542c7'), ObjectId('abcdef1200b9cf0001a542c7')]",
                            "[ObjectId('abcdef1290f4740001bc6191'), ObjectId('abcdef1290f4740001bc6191')]",
                            "[ObjectId('abcdef12f8c70300017ab382'), ObjectId('abcdef12f8c70300017ab382')]",
                            "[ObjectId('abcdef12d2f27c0001cf4b89'), ObjectId('abcdef12d2f27c0001cf4b89')]",
                            "[ObjectId('abcdef12aee5080001c42906'), ObjectId('abcdef12aee5080001c42906')]",
                            "[ObjectId('abcdef129ad77100011eb105'), ObjectId('abcdef129ad77100011eb105')]",
                            "[ObjectId('abcdef12be535b000187afaa'), ObjectId('abcdef12be535b000187afaa')]",
                            "[ObjectId('abcdef12097e6600018d01ea'), ObjectId('abcdef12097e6600018d01ea')]",
                            "[ObjectId('abcdef12908a2100013276f9'), ObjectId('abcdef12908a2100013276f9')]",
                            "[ObjectId('abcdef120255970001f52fe3'), ObjectId('abcdef120255970001f52fe3')]",
                            "[ObjectId('abcdef12f4858d00010badb6'), ObjectId('abcdef12f4858d00010badb6')]",
                            "[ObjectId('abcdef12e9d35c0001608522'), ObjectId('abcdef12e9d35c0001608522')]",
                            "[ObjectId('abcdef1262d4a900015bb867'), ObjectId('abcdef1262d4a900015bb867')]",
                            "[ObjectId('abcdef12901e3e00015bc2ac'), ObjectId('abcdef12901e3e00015bc2ac')]",
                            "[ObjectId('abcdef125beb580001f3f42c'), ObjectId('abcdef125beb580001f3f42c')]",
                            "[ObjectId('abcdef12f4858d00010c2f44'), ObjectId('abcdef12f4858d00010c2f44')]",
                            "[ObjectId('abcdef12a806fa00017a981b'), ObjectId('abcdef12a806fa00017a981b')]"
                        ]
                    }
                }
            }
        }
    }

 
Performing the same query without the sort was taking <1ms.

To further verify the issue is with the query plan cache, the same query was modified to include a non-op in the form of adding a rightmost sort key on the _id

db.reservations
.find({ "deleted_at"null"_id": { "$in": [ObjectId("abcdef12f8c7030001792750"), ObjectId("abcdef122546d60001e38bf7"), ObjectId("abcdef12f4858d00010badb6"), ObjectId("abcdef12021c1700015e4e9b"), ObjectId("abcdef12e9d35c0001608522"), ObjectId("abcdef1262d4a900015bb867"), ObjectId("abcdef12901e3e00015bc2ac"), ObjectId("abcdef125beb580001f3f42c"), ObjectId("abcdef12f4858d00010c2f44"), ObjectId("abcdef12a806fa00017a981b"), ObjectId("abcdef120255970001f52fe3"), ObjectId("abcdef12908a2100013276f9"), ObjectId("abcdef12097e6600018d01ea"), ObjectId("abcdef12be535b000187afaa"), ObjectId("abcdef129ad77100011eb105"), ObjectId("abcdef12aee5080001c42906"), ObjectId("abcdef12d2f27c0001cf4b89"), ObjectId("abcdef12f8c70300017ab382"), ObjectId("abcdef1290f4740001bc6191"), ObjectId("abcdef1200b9cf0001a542c7"), ObjectId("abcdef12c4ea200001f5d7c5"), ObjectId("abcdef124d31bc000114ce2f"), ObjectId("abcdef12d2f27c0001ce9343"), ObjectId("abcdef12629ec300017bec63"), ObjectId("abcdef12903e160001b8d09e"), ObjectId("abcdef1262d4a90001e191d3"), ObjectId("abcdef12803ace000180beeb"), ObjectId("abcdef1262661e00014b0064"), ObjectId("abcdef12fdbe210001b85490"), ObjectId("abcdef12751db50001662763"), ObjectId("abcdef12702aa40001fa8d13"), ObjectId("abcdef1242775f00011381cd"), ObjectId("abcdef12ce728400011668ad"), ObjectId("abcdef12613aa9000192eed9")] } })
.sort({ st: 1, shop_id: 1, _id: 1 })

This again took <1ms to execute.

After cleaning the query cache the query was executed again, but this time took <1ms, and it's query plan stayed identical to the above.

This query plan stayed in the cache for over 18 hours, and never expired, nor got fixed.

 

 



 Comments   
Comment by Johnny Shields [ 16/Oct/20 ]

Dima, thank you. We will proceed to upgrade to MongoDB 4.2

Comment by Dmitry Agranat [ 15/Oct/20 ]

Hi dror@tablecheck.com,

Looking at the reported MongoDB version, 3.6.2 is quite old and was released almost 3 years ago.

It should be harder to trigger this in MongoDB 4.2 due to SERVER-32452, which prevents a single query from establishing an active cache entry. The 4.2 release, which includes SERVER-32452, is designed to prevent bad plans from remaining in the plan cache. Generally speaking, we expect 4.2 to have fewer issues around bad plans remaining in the plan cache for too long.

Can you test with 4.2.10 and report back if you still experience the same issue?

Thanks,
Dima

Comment by Johnny Shields [ 14/Oct/20 ]

Additional info: the bad query plan was cached only on the primary. The secondaries did not have the bad plan and continued to execute within milliseconds.

Comment by Dror Levy [ 14/Oct/20 ]

Couldn't find a way to edit the issue, but this is for mongodb version 3.6.2

Generated at Thu Feb 08 05:25:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.