[SERVER-19334] TTL index deletions cannot always keep up with insertions with WiredTiger Created: 08/Jul/15  Updated: 06/Dec/22  Resolved: 02/Apr/20

Status: Closed
Project: Core Server
Component/s: Performance, Storage, WiredTiger
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Eitan Klein Assignee: Backlog - Storage Execution Team
Resolution: Incomplete Votes: 1
Labels: 32qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File FindOne and Index.txt     Text File serverstatus.txt    
Issue Links:
Duplicate
is duplicated by SERVER-19688 TTL Index is having more Latency Closed
Assigned Teams:
Storage Execution
Operating System: ALL
Sprint: Integration F (02/01/16), Integration 10 (02/22/16)

Description

Configuration:

db version v3.1.6-pre-
git version: 21c4ce42c34008a966e66214b1dd83cdc066bf1c

Environment

  • Windows
  • Standalone or replica set
  • Storage engine WT
  • TTL configured to 20 sec

Workload:

  • Hammer.mongo insert-only profile with TTL = 20 sec (a configurable variable in simple_insert.sh); I'm using it to simulate a cache workload

Issues:

MongoDB reports that documents have been deleted (per db.serverStatus().metrics.ttl), but all documents remain, the DB size grows as if this were an insert-only workload with no TTL, and the documents are still visible via db.c.find().

TTL works correctly on the same workload and the same build with the MMAP storage engine.
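
The counters in question can be inspected from the shell; a minimal sketch (the values shown are illustrative, not from this ticket):

db.serverStatus().metrics.ttl
// e.g. { "passes" : NumberLong(42), "deletedDocuments" : NumberLong(100000) }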

Additional information

Please see SERVER-19688 as well.



Comments
Comment by Githook User [ 16/Oct/15 ]

Author:

Geert Bosch (GeertBosch) <geert@mongodb.com>

Message: SERVER-19334: Refactor handling of status in delete stage
Branch: master
https://github.com/mongodb/mongo/commit/ec9317104da7816cabb41cf603c084099d768df0

Comment by Daniel Pasette (Inactive) [ 03/Aug/15 ]

Re-opening this issue to pursue the approach of batching deletes.

Comment by Geert Bosch [ 13/Jul/15 ]

The issue is that restoreState resets the timer, and we call this function in the delete loop of doTTLForIndex:

(gdb) bt
#0  mongo::ElapsedTracker::resetLastTime (this=0x12b07a58) at src/mongo/util/elapsed_tracker.cpp:62
#1  0x0000000000c7ac5e in mongo::PlanYieldPolicy::resetTimer (this=<optimized out>) at src/mongo/db/query/plan_yield_policy.cpp:57
#2  0x0000000000c75024 in mongo::PlanExecutor::restoreStateWithoutRetrying (this=this@entry=0x119d88f0, opCtx=opCtx@entry=0x7f371147d870)
    at src/mongo/db/query/plan_executor.cpp:290
#3  0x0000000000c75091 in mongo::PlanExecutor::restoreState (this=0x119d88f0, opCtx=opCtx@entry=0x7f371147d870) at src/mongo/db/query/plan_executor.cpp:272
#4  0x0000000000eb8b61 in mongo::TTLMonitor::doTTLForIndex (this=this@entry=0x3341ba0, txn=txn@entry=0x7f371147d870, dbName=..., idx=...) at src/mongo/db/ttl.cpp:307
#5  0x0000000000eb9b92 in mongo::TTLMonitor::doTTLPass (this=this@entry=0x3341ba0) at src/mongo/db/ttl.cpp:145
#6  0x0000000000eba1f8 in mongo::TTLMonitor::run (this=0x3341ba0) at src/mongo/db/ttl.cpp:113
#7  0x000000000105da07 in mongo::BackgroundJob::jobBody (this=0x3341ba0) at src/mongo/util/background.cpp:152
#8  0x00007f3716b15b60 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>) at ../../../../../libstdc++-v3/src/c++11/thread.cc:84
#9  0x00007f3716328182 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007f3716052fbd in clone () from /lib/x86_64-linux-gnu/libc.so.6

As a result the TTL monitor never yields.

Comment by Daniel Pasette (Inactive) [ 13/Jul/15 ]

Broke out the separate issue found while investigating this one here: SERVER-19374

Comment by Daniel Pasette (Inactive) [ 10/Jul/15 ]

This did expose a possible issue with the TTL monitor, though. While the TTL monitor is running, I found that listCollections would block. Easy to repro (sketched in the shell below):

  • Create a TTL index
  • Start an insert job into that collection (benchrun will work)
  • While the TTL monitor job is running, try running db.getCollectionNames(). It will block until the monitor is done deleting docs.
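
A minimal shell sketch of the steps above (the collection name cache and the insert loop are illustrative; benchrun works equally well):

// shell 1: TTL index plus a continuous insert load
db.cache.createIndex({date: 1}, {expireAfterSeconds: 20})
while (true) { db.cache.insert({x: Math.random(), date: new Date()}) }

// shell 2: while a TTL pass is deleting, this blocks until the pass finishes
db.getCollectionNames()
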
Comment by Daniel Pasette (Inactive) [ 10/Jul/15 ]

Depending on the exact parameters used in the test, it is very easy to outstrip mongod's ability to delete documents as fast as they are being inserted. The TTL monitor runs once per minute and is single-threaded, deleting documents one at a time. This can be exacerbated by a small cache, which forces the to-be-deleted documents to be pulled back off disk. With MMAP, I believe the insertion rate is naturally throttled by the deletions because of the collection-level lock.
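
For illustration only (these rates are hypothetical, not measurements from this ticket), the mismatch compounds quickly:

var insertsPerSec = 5000;     // hypothetical sustained insert rate
var ttlDeletesPerSec = 1000;  // hypothetical single-threaded delete rate during a pass
print((insertsPerSec - ttlDeletesPerSec) * 60);  // 240000 expired docs left behind per minute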

This explains why the data would take a long time to be deleted. The disk space would then only be reclaimed gradually by new data; without an explicit compact, I don't think it would be returned to the system, and thus it would stay at the high-water mark it reached during the insertion phase.
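
A sketch of the explicit compact mentioned above (the collection name is illustrative; compact is a blocking operation, so treat it as a maintenance-window command):

db.runCommand({ compact: "foo" })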

I don't think there is a ton we can do right now to make this much better, other than possibly parallelizing the deletes in some manner.

Comment by Sam Kleinman (Inactive) [ 08/Jul/15 ]

The TTL process only deletes documents every 60 seconds, and the configurability of the interval at which the TTL deletion operation runs is not documented (it is noted as "for testing only"): https://github.com/mongodb/mongo/blob/master/src/mongo/db/ttl.cpp#L78
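
The parameter in question is ttlMonitorSleepSecs. A sketch of changing it at runtime, with the same "testing only" caveat (it can also be passed at startup via --setParameter):

db.adminCommand({ setParameter: 1, ttlMonitorSleepSecs: 10 })  // run TTL passes every 10 seconds instead of 60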

The TTL documentation states that expired documents may continue to exist in the database for some time after they've reached the timeout. Thus this is expected behavior.

Cheers,
sam

Comment by Eitan Klein [ 08/Jul/15 ]

Easy repro script

db.foo.createIndex({date: 1}, {expireAfterSeconds: 20})
for (var i = 0; i < 100; i++) { db.foo.insert({x: i, date: new Date()}) }

Wait 20 sec; db.foo.find() should then return an empty set.
