[SERVER-65910] the server tests attempt to set illegal durable and oldest timestamps Created: 23/Apr/22  Updated: 26/Apr/22  Resolved: 26/Apr/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Keith Bostic (Inactive) Assignee: Keith Bostic (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Gantt Dependency
has to be done before WT-9042 commit/durability timestamps can race... Backlog
Operating System: ALL
Participants:

 Description   

In WT-9042, the WT code to set the global timestamps was simplified, but also changed to eliminate possible races and to validate the final state of the timestamps. Running an MDB server build reported some errors.

The MDB server build is here:
https://spruce.mongodb.com/version/626427e83e8e866de9dca260/tasks?limit=20&page=0&sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC

There are complaints about setting the oldest timestamp after stable, here's an example from replica_sets_large_txns_format_2_enterprise-rhel-80-64-bit-dynamic-all-feature-flags-required, at: https://logkeeper.mongodb.org/build/3f81046

{"ts_sec":1650735882,"ts_usec":511535,"thread":"22910:0x7f53ff400500","session_name":"WT_CONNECTION.set_timestamp","category":"WT_VERB_DEFAULT","category_id":9,"verbose_level":"ERROR","verbose_level_id":-3,"msg":"__wt_txn_global_set_timestamp:421:set_timestamp: oldest timestamp (1650735870, 106) must not be after the stable timestamp (1650735870, 6)","error_str":"Invalid argument","error_code":22}}}
[js_test:startup_recovery_for_restore_needs_rollback] d20521| {"t":{"$date":"2022-04-23T17:44:42.511+00:00"},"s":"F",  "c":"ASSERT",   "id":23083,   "ctx":"initandlisten","msg":"Invariant failure","attr":{"expr":"_conn->set_timestamp(_conn, oldestTSConfigString.c_str())","error":"BadValue: 22: Invalid argument","file":"src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp","line":2282}}
[js_test:startup_recovery_for_restore_needs_rollback] d20521| {"t":{"$date":"2022-04-23T17:44:42.511+00:00"},"s":"F",  "c":"ASSERT",   "id":23084,   "ctx":"initandlisten","msg":"\n\n***aborting after invariant() failure\n\n"}

There are complaints about setting the durable timestamp before stable, here's an example from disk_wiredtiger, at: https://logkeeper.mongodb.org/build/ec4076fcbe010429d5ba553821902f95/test/62642f26be07c42af4119b16?raw=1

{"ts_sec":1650732901,"ts_usec":804554,"thread":"11348:0x7fbc01cb3700","session_name":"WT_CONNECTION.set_timestamp","category":"WT_VERB_DEFAULT","category_id":9,"verbose_level":"ERROR","verbose_level_id":-3,"msg":"__wt_txn_global_set_timestamp:409:set_timestamp: durable timestamp (0, 1) must not be before the stable timestamp (1650732887, 2)","error_str":"Invalid argument","error_code":22}}}
[js_test:repair_invalidates_replica_set_config] d20021| {"t":{"$date":"2022-04-23T16:55:01.804+00:00"},"s":"F",  "c":"ASSERT",   "id":23083,   "ctx":"ReplCoordExtern-0","msg":"Invariant failure","attr":{"expr":"_conn->set_timestamp(_conn, oldestTSConfigString.c_str())","error":"BadValue: 22: Invalid argument","file":"src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp","line":2282}}



 Comments   
Comment by Keith Bostic (Inactive) [ 25/Apr/22 ]

Thanks, daniel.gottlieb@mongodb.com  – yes, this one is mine for now.

Comment by Daniel Gottlieb (Inactive) [ 25/Apr/22 ]

There are two states which cause the majority of failures:

  • startupRecoveryForRestore
  • File based initial sync

Both states are capable of setting advanceTimestampsEachBatch which will advance the oldest timestamp as batches complete without advancing the stable timestamp.

Those cases (I claim) are ultimately benign. We're only taking checkpoints with use_timestamp=false. The history preserved by the oldest timestamp is not needed for readers. And historical values are not needed for fuzzy checkpoints restoring a stable checkpoint.

keith.bostic@mongodb.com and I talked offline about the relationships between the oldest vs stable timestamps and came up with a more precise set of assertions. Keith is going to run a follow-up patch to see if there's anything leftover which looks concerning (correct me if I'm wrong Keith!)

Comment by Keith Bostic (Inactive) [ 23/Apr/22 ]

daniel.gottlieb@mongodb.com, geert.bosch@mongodb.com, I'm seeing unexpected results when I tighten down setting global timestamps, can I please ask for some server assistance here?

 

Dan, we could easily drop core if there's an attempt to set an illegal timestamp combination in HAVE_DIAGNOSTIC mode, just like we do when checking timestamps at commit time, if that would help with debugging.

Generated at Thu Feb 08 06:03:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.