- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Schema Management, Storage Engines, Storage Engines - Foundations
- Labels: (copied to CRM)
- Story Points: 8
- Sprint: StorEng - 2025-03-14, StorEng - 2025-03-28, StorEng - 2025-04-25
- Fix Version/s: v8.1, v8.0, v7.0, v6.0
Summary
Fix the issue where an unnecessary schema lock is taken for actively used file:-prefixed dhandles because their corresponding table:-prefixed dhandles are expired by the sweep server. This leads to schema lock contention, especially during checkpoint prepare, and degrades performance.
Description
When opening a file:-prefixed dhandle, the table:-prefixed dhandle is used to determine whether the corresponding file:-prefixed dhandle is a simple table. However, the sweep server also sweeps table:-prefixed dhandles, leading to their premature expiration.
Since file:-prefixed dhandles have code paths that reset their "Time of Death", they remain active. However, there is no "Time of Death" reset mechanism for table:-prefixed dhandles during schema operations, so they expire. Moreover, even when there are no schema operations, the corresponding table:-prefixed dhandles are still expired by the sweep server.
This results in:
- Unnecessary reopening of table:-prefixed dhandles, which requires taking the schema lock.
- Schema lock contention during checkpoint prepare, since checkpoint prepare also needs the schema lock.
- Performance degradation due to increased blocking between application threads and the checkpoint thread.
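To make the expiration mechanism concrete, here is a minimal conceptual sketch in Python. It is not WiredTiger's actual C implementation; the DhandleModel class, its fields, and the sweep function are simplified stand-ins that only model the "Time of Death" bookkeeping described above.

import time

# Conceptual model only: each dhandle records when it was last marked idle,
# and the sweep server expires handles that have been idle for too long.
class DhandleModel:
    def __init__(self, name):
        self.name = name            # e.g. 'file:test1.wt' or 'table:test1'
        self.time_of_death = 0      # 0 means "in use / not yet marked idle"

    def mark_idle(self, now):
        # The sweep server stamps a "time of death" on handles it finds idle.
        if self.time_of_death == 0:
            self.time_of_death = now

    def touch(self):
        # file:-prefixed dhandles have code paths that clear the stamp while
        # they are actively used; table:-prefixed dhandles currently do not,
        # which is why they expire even though the table is busy.
        self.time_of_death = 0

def sweep(dhandles, close_idle_time, now=None):
    # Return the handles the sweep server would close (and that later have to
    # be reopened under the schema lock) because they exceeded close_idle_time.
    now = now if now is not None else time.time()
    return [d for d in dhandles
            if d.time_of_death and now - d.time_of_death > close_idle_time]

In this model, touch() is only ever called for file:-prefixed handles, so their table:-prefixed counterparts accumulate idle time and are swept, which then forces a reopen under the schema lock as described above.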
Reproducer
In the Python test below, I create 1,000 dhandles and then, for 100 iterations, spawn one thread per dhandle to perform inserts, ensuring that all dhandles remain actively used.
import threading

# Methods from a wttest-based Python test; self.uri, self.conn and self.session
# are provided by the test harness.
def insert(self, i, start, rows):
    session = self.conn.open_session()
    uri = self.uri + str(i)
    cursor = session.open_cursor(uri)
    session.begin_transaction()
    for k in range(start, rows):
        cursor.set_key(k)
        cursor.set_value(str(k))
        cursor.insert()
    session.commit_transaction()
    cursor.close()
    session.close()

def test_dhandles(self):
    dhandles = 1000
    # Assumed table format matching the integer keys and string values above.
    format = 'key_format=i,value_format=S'
    for i in range(1, dhandles):
        uri = self.uri + str(i)
        self.session.create(uri, format)
    for _ in range(1, 100):
        threads = []
        for i in range(1, dhandles):
            thread = threading.Thread(target=self.insert, args=(i, 0, 100))
            thread.start()
            threads.append(thread)
        for thread in threads:
            thread.join()
To exercise the sweep server, I modify the connection configuration accordingly:
file_manager=(close_handle_minimum=0,close_idle_time=60,close_scan_interval=30)
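For reference, a minimal sketch of how such a connection configuration could be applied when opening WiredTiger from Python; in the wttest harness the same string would normally be supplied through the test's connection configuration (e.g. conn_config), and 'WT_HOME' is just a placeholder directory name.

from wiredtiger import wiredtiger_open

# Open a connection with an aggressive sweep configuration so idle dhandles
# are closed quickly; the file_manager settings mirror the line above.
conn = wiredtiger_open(
    'WT_HOME',
    'create,'
    'file_manager=(close_handle_minimum=0,close_idle_time=60,close_scan_interval=30)')
session = conn.open_session()
# ... run the reproducer workload here ...
conn.close()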
Scope:
- Decide between the potential solutions described in WT-13663.
- Fix the issue by applying the chosen solution.