Type: Task
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: Schema Management
Storage Engines, Storage Engines - Foundations
In our current shared metadata design, we maintain a separate shared metadata table called WiredTigerShared. When a follower picks up a new checkpoint, it copies the entire contents of the shared metadata table into its local metadata table, WiredTiger.wt, and creates any missing ingest tables in the process. We chose this design because it could be implemented quickly and was good enough for our current requirements. However, it would not scale to deployments with thousands or even millions of tables.
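A minimal Python model of the current checkpoint-pickup path, for illustration only. The metadata tables are plain dicts keyed by URI. `pick_up_checkpoint`, `create_ingest_table`, and the `ingest_uri_for` naming rule are hypothetical stand-ins, not WiredTiger APIs. The model shows that each pickup does work proportional to the total number of tables, not to the number that changed.

```python
from typing import Dict

# Hypothetical model of the current design; NOT WiredTiger code.
# Metadata tables are modelled as dicts mapping URI -> config string.


def ingest_uri_for(layered_uri: str) -> str:
    # Assumed naming rule for a layered table's ingest table (hypothetical).
    name = layered_uri.split(":", 1)[1]
    return f"file:{name}.wt_ingest"


def create_ingest_table(local_meta: Dict[str, str], uri: str) -> None:
    # Stand-in for creating an empty ingest table and recording it locally.
    local_meta[uri] = "ingest=true"


def pick_up_checkpoint(shared_meta: Dict[str, str],
                       local_meta: Dict[str, str]) -> int:
    """Copy every shared entry into the local table; return entries copied."""
    copied = 0
    for uri, config in shared_meta.items():
        local_meta[uri] = config              # full copy: O(total tables)
        copied += 1
        if uri.startswith("layered:"):
            ingest = ingest_uri_for(uri)
            if ingest not in local_meta:      # missing ingest table found here
                create_ingest_table(local_meta, ingest)
    return copied
```

The second pickup in a row still copies every entry, even though nothing changed. This is the cost the sections below try to remove.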
1) Avoiding metadata copy
The first requirement for a scalable metadata representation is to avoid copying the entire contents of the shared metadata table into the local metadata table during checkpoint pickup, because the cost of that copy grows with the total number of tables. For example, with a million tables and even just 2KB of metadata per table, each checkpoint pickup would have to copy 2GB of metadata.
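Spelled out, taking 1KB as 1,000 bytes, the estimate in the example above is:

```latex
N \cdot s = 10^{6}\ \text{tables} \times 2\,\text{KB/table} = 2 \times 10^{9}\ \text{bytes} \approx 2\,\text{GB}
```

The copy happens on every checkpoint pickup. So this cost is paid repeatedly, regardless of how many tables changed since the previous checkpoint.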
We could possibly solve this by making the metadata: cursor a union cursor over the shared metadata table and the local metadata table. The two tables should be mostly (and ideally entirely) disjoint. The shared metadata table would contain the file: entries for the stable tables, along with the layered: and table: entries for layered tables. The local metadata table would store the file: entries for the ingest tables. However, we would still need to work out some corner cases, such as what to do with the layered: and table: entries when a follower creates a layered table during oplog application.
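A minimal sketch of the union-cursor idea, assuming both metadata tables can be iterated in key order. Plain dicts stand in for the tables, and `heapq.merge` combines the two sorted streams. The function names are hypothetical. The tie-break rule is also an assumption: if a URI appears in both tables, the local entry wins. The corner case above (layered: and table: entries created by a follower during oplog application) is not addressed here.

```python
import heapq
from typing import Iterator, Mapping, Optional, Tuple


def union_metadata_cursor(shared: Mapping[str, str],
                          local: Mapping[str, str]) -> Iterator[Tuple[str, str]]:
    """Yield (uri, config) in key order across both tables (hypothetical).

    Assumed tie-break: if a URI exists in both tables, the local entry is
    returned and the shared one is skipped.
    """
    # Tag each source so that, for equal URIs, local (0) sorts before shared (1).
    local_iter = ((uri, 0, cfg) for uri, cfg in sorted(local.items()))
    shared_iter = ((uri, 1, cfg) for uri, cfg in sorted(shared.items()))
    last_uri = None
    for uri, _source, cfg in heapq.merge(local_iter, shared_iter):
        if uri == last_uri:
            continue  # duplicate URI: keep the first (local) entry
        last_uri = uri
        yield uri, cfg


def union_search(shared: Mapping[str, str], local: Mapping[str, str],
                 uri: str) -> Optional[str]:
    # Point lookup with the same precedence: local first, then shared.
    if uri in local:
        return local[uri]
    return shared.get(uri)
```

With disjoint tables, the tie-break never fires and the merge simply interleaves the two key ranges. Nothing is copied at checkpoint pickup. The URIs in the usage below (for example `file:a_ingest.wt`) are illustrative only.

```python
shared = {"layered:a": "s1", "table:a": "s2", "file:a_stable.wt": "s3"}
local = {"file:a_ingest.wt": "l1"}
list(union_metadata_cursor(shared, local))
```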
2) Creating ingest tables on demand
A likely consequence of the first requirement is the need to create ingest tables when they are first used. In the current design, we can easily detect and create a missing ingest table during checkpoint pickup, but we would lose this ability once we stop copying entries from the shared metadata table into the local metadata table. As a result, we would need either a more efficient way to determine which ingest tables are missing (without scanning the shared metadata table) or a way to create ingest tables on demand. The latter would likely be simpler and more performant.
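A minimal sketch of on-demand ingest creation, under stated assumptions. `IngestTables` is a hypothetical registry, not a WiredTiger API. The ingest naming rule is also assumed. A lock makes creation happen at most once when several threads hit the same table for the first time.

```python
import threading
from typing import Dict


class IngestTables:
    """Hypothetical registry that creates ingest tables lazily on first use."""

    def __init__(self, shared_meta: Dict[str, str]) -> None:
        self.shared_meta = shared_meta            # consulted, never copied
        self.local_meta: Dict[str, str] = {}      # only ingest tables in use
        self._lock = threading.Lock()

    def open_ingest(self, layered_uri: str) -> str:
        """Return the ingest URI for layered_uri, creating it if missing."""
        if layered_uri not in self.shared_meta:
            raise KeyError(f"unknown layered table: {layered_uri}")
        # Assumed naming rule for the ingest table (hypothetical).
        ingest_uri = "file:" + layered_uri.split(":", 1)[1] + ".wt_ingest"
        with self._lock:  # concurrent first uses create the table only once
            if ingest_uri not in self.local_meta:
                self.local_meta[ingest_uri] = "ingest=true"
        return ingest_uri
```

In this model, checkpoint pickup does no per-table work. The cost of creating an ingest table moves to its first access, and the local metadata only grows with the tables that are actually used.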
3) Bonus: Cleaning up unnecessary ingest tables
If a follower stops using a layered table, its ingest table eventually becomes empty. Once the follower picks up a checkpoint that already includes all of the table's contents, everything in the ingest table is obsolete.
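A simplified model of how an unused ingest table drains to empty. It assumes that each ingest update carries a commit timestamp. It also assumes that updates at or below the picked-up checkpoint's timestamp are already contained in that checkpoint. Both the `IngestTable` class and this obsolescence rule are hypothetical simplifications, not WiredTiger's actual logic.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class IngestTable:
    # Each update is (commit_ts, key, value); hypothetical, simplified model.
    updates: List[Tuple[int, str, str]] = field(default_factory=list)

    def write(self, commit_ts: int, key: str, value: str) -> None:
        self.updates.append((commit_ts, key, value))

    def prune_obsolete(self, checkpoint_ts: int) -> None:
        # Assumption: updates at or below checkpoint_ts are already in the
        # picked-up checkpoint, so keeping them in the ingest table is redundant.
        self.updates = [u for u in self.updates if u[0] > checkpoint_ts]

    def is_empty(self) -> bool:
        return not self.updates
```

A table that keeps receiving writes always has updates newer than the latest checkpoint and never empties. A table the follower stops writing to drains completely after a later pickup.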
Especially if we create ingest tables on demand, we could be more memory-efficient by periodically closing empty ingest tables (e.g., during a periodic sweep, which could run alongside the dhandle sweep). Closing them would discard the associated btree structures and free the corresponding memory. We could then take this a step further by dropping such ingest tables entirely to free even more memory.
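A minimal sketch of such a sweep, assuming each open ingest table is tracked with a live row count and a last-used time. `OpenIngest` and `sweep_ingest_tables` are hypothetical names. The idle threshold is an added assumption, intended to avoid closing a table that has just emptied but is still in use. Dropping is modelled only as removing the local metadata entry. With on-demand creation, a later access would simply recreate the table.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class OpenIngest:
    rows: int          # live entries left after obsolete content is pruned
    last_used: float   # last access time, in clock() units


def sweep_ingest_tables(open_tables: Dict[str, OpenIngest],
                        local_meta: Dict[str, str],
                        idle_secs: float,
                        drop_empty: bool,
                        clock: Callable[[], float] = time.monotonic) -> List[str]:
    """Close (and optionally drop) empty, idle ingest tables; return closed URIs."""
    now = clock()
    closed = []
    for uri, t in list(open_tables.items()):  # copy: we mutate while iterating
        if t.rows == 0 and now - t.last_used >= idle_secs:
            del open_tables[uri]               # close: frees the in-memory btree
            if drop_empty:
                local_meta.pop(uri, None)      # drop: also remove metadata entry
            closed.append(uri)
    return closed
```

Passing `clock` explicitly keeps the sweep deterministic in tests. Setting `drop_empty=False` corresponds to the close-only variant described above.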
Cleaning up unused ingest tables could be a differentiator for large deployments with millions of tables in which only a fraction of tables are in use at any given time. It would bound the size of the local metadata table by the number of tables in the application's current working set. That, in turn, would let us keep the entire metadata table in memory without having to spill it to disk (WT-15163).