[SERVER-80776] create fails on already-sharded time-series collection
Created: 05/Sep/23  Updated: 29/Oct/23  Resolved: 11/Oct/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 7.0.1
Fix Version/s: 7.2.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Felipe Gasper
Assignee: Gregory Noma
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Problem/Incident
is caused by SERVER-76547 Create command on a time-series colle... Closed
Related
is related to SERVER-82072 Time-series collection creation does ... Backlog
Assigned Teams:
Storage Execution
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Run the script from the Description, e.g., with `mongosh`.

Sprint: Execution NAMR Team 2023-10-02, Execution NAMR Team 2023-10-16
Participants:

 Description   

The following fails for me every time on 7.1.0-alpha0-201-g1eb2c72:

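// Start from a clean database.
db = db.getSiblingDB("mytest")

db.dropDatabase()

// 1. Create the time-series collection.
db.createCollection(
    "weather",
    {
        timeseries: {
            timeField: "mytimefield",
        },
    },
)

// 2. Shard it on the time field.
db.getSiblingDB("admin").runCommand(
    {
        shardCollection: db.getName() + ".weather",
        key: {
            mytimefield: 1,
        },
    },
)

// 3. Create the same collection again with identical options; this call fails.
db.createCollection(
    "weather",
    {
        timeseries: {
            timeField: "mytimefield",
        },
    },
)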

The failure is:

MongoServerError: got stale shardVersion response from shard src-sh02 at host localhost:28022 :: caused by :: timestamp mismatch detected for TestE2EIntegrationTestSuite_TestTimeSeries_Sha-8120c373cd712241.system.buckets.weather

In v7.0.1 the error is less serious-looking:

MongoServerError: namespace mytest.weather already exists, but is a view on mytest.system.buckets.weather rather than mytest.

In v6.0.9 the error also looks wrong:

MongoServerError: ns: mytest.weather already exists with different options: { timeseries: { timeField: "mytimefield", granularity: "seconds", bucketMaxSpanSeconds: 3600 } }


I would expect there to be no error since I’m just creating the existing collection. The 7.1.0 error looks worst, followed by 6.0.9, then 7.0.1.
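
To make the expected behavior concrete: a repeated create should be a no-op when the requested options equal the existing ones once defaults are filled in. The following is only a client-side sketch of that comparison, not server code. The helper names (`withDefaults`, `sameTimeseriesOptions`) are hypothetical, and the defaults are the ones shown in the v6.0.9 error message above.

```javascript
// Hypothetical helper, not part of MongoDB or mongosh.
// Assumed defaults, taken from the v6.0.9 error message in this ticket.
// This only models the default-granularity case.
const ASSUMED_DEFAULTS = { granularity: "seconds", bucketMaxSpanSeconds: 3600 };

// Fill in the assumed defaults for any option the caller left out.
function withDefaults(timeseries) {
    return Object.assign({}, ASSUMED_DEFAULTS, timeseries);
}

// True when both option sets are equal after defaults are applied.
function sameTimeseriesOptions(existing, requested) {
    const a = withDefaults(existing);
    const b = withDefaults(requested);
    const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
    for (const key of keys) {
        if (a[key] !== b[key]) {
            return false;
        }
    }
    return true;
}
```

In mongosh, the existing options could be read with `db.getCollectionInfos({ name: "weather" })[0].options.timeseries` and compared against the requested ones before calling `createCollection` again.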



 Comments   
Comment by Githook User [ 11/Oct/23 ]

Author: Gregory Noma <gregory.noma@gmail.com> (gregorynoma)

Message: SERVER-80776 Prevent `StaleConfig` on time-series creation
Branch: master
https://github.com/mongodb/mongo/commit/5f753b64500373a5a36acefd4c9867a3aafc187c

Comment by Gregory Noma [ 22/Sep/23 ]

On v7.1, the error returned to the user looks like this:

"ok" : 0,
"errmsg" : "got stale shardVersion response from shard shard-rs1 at host localhost:20001 :: caused by :: timestamp mismatch detected for timeseries_create.system.buckets.timeseries_0",
"code" : 13388,
"codeName" : "StaleConfig",
"ns" : "timeseries_create.system.buckets.timeseries_0",
"vReceived" : {
	"e" : ObjectId("000000000000000000000000"),
	"t" : Timestamp(0, 0),
	"v" : Timestamp(0, 0)
},
"vWanted" : {
	"e" : ObjectId("6508b89b93d09586582b184e"),
	"t" : Timestamp(1695070363, 24),
	"v" : Timestamp(1, 0)
},
"shardId" : "shard-rs1",
"$clusterTime" : {
	"clusterTime" : Timestamp(1695070363, 46),
	"signature" : {
		"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
		"keyId" : NumberLong(0)
	}
},
"operationTime" : Timestamp(1695070363, 46)

This error appears to result from SERVER-76547, in particular the usage of acquireCollectionMaybeLockFree that it added. I believe the cause is that when the create command is forwarded from the router to the shard, it carries the shard versioning information for the user-facing time-series view namespace, which is unsharded. The collection acquisition, however, is on the underlying buckets collection, which is sharded. The received and wanted versions therefore do not match, and we end up with this StaleConfig error.
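
The mismatch can be illustrated with a toy model. This is a sketch under stated assumptions, not the server's implementation: `acquire`, `shardMetadata`, and `sameVersion` are hypothetical names, the view namespace is inferred from the buckets namespace, and the version values are copied from the error output above.

```javascript
// Toy model only; every name here is illustrative.
const VIEW_NS = "timeseries_create.timeseries_0"; // inferred user-facing view
const BUCKETS_NS = "timeseries_create.system.buckets.timeseries_0";

// "Unsharded" version, as seen in vReceived above.
const UNSHARDED = { e: "000000000000000000000000", t: [0, 0], v: [0, 0] };

// What the shard expects per namespace (buckets values are vWanted above).
const shardMetadata = {
    [VIEW_NS]: UNSHARDED,
    [BUCKETS_NS]: { e: "6508b89b93d09586582b184e", t: [1695070363, 24], v: [1, 0] },
};

function sameVersion(a, b) {
    return a.e === b.e &&
        a.t[0] === b.t[0] && a.t[1] === b.t[1] &&
        a.v[0] === b.v[0] && a.v[1] === b.v[1];
}

// Simulated acquisition: compares the version attached to the request
// with the version the shard wants for the namespace being acquired.
function acquire(ns, vReceived) {
    const vWanted = shardMetadata[ns];
    if (!sameVersion(vReceived, vWanted)) {
        const err = new Error("timestamp mismatch detected for " + ns);
        err.code = 13388;
        err.codeName = "StaleConfig";
        err.vReceived = vReceived;
        err.vWanted = vWanted;
        throw err;
    }
    return ns;
}

// The request carries the view's (unsharded) version...
acquire(VIEW_NS, UNSHARDED); // succeeds
// ...but acquiring the sharded buckets collection with it fails.
try {
    acquire(BUCKETS_NS, UNSHARDED);
} catch (err) {
    console.log(err.codeName + ": " + err.message);
}
```

In this model the failure goes away only if the acquisition on the buckets namespace is checked against a version that matches that namespace, which is consistent with the diagnosis above.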
