[SERVER-53382] Add additional initial sync metrics to logs Created: 16/Dec/20  Updated: 05/Apr/22

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Major - P3
Reporter: Evin Roesle
Assignee: Alan Zheng
Resolution: Unresolved
Votes: 0
Labels: former-quick-wins
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-53476 Don't include initial sync metrics in... Closed
depends on SERVER-50624 Submit Log Ingestion Rule Request For... Closed
Related
related to SERVER-53381 Submit Log Ingestion Rule Request For... Closed
related to SERVER-50624 Submit Log Ingestion Rule Request For... Closed
is related to SERVER-47528 Presence of initialSyncStatus in repl... Closed
is related to SERVER-47863 Initial Sync Progress Metrics Closed
Participants:

 Description   

In order to have a better picture of how our initial sync is performing in current and previous versions, we need to add more metrics to our logs.

Important metrics that we already have in our logs:

  • failedInitialSyncAttempts
  • approxTotalDataSize
  • approxTotalBytesCopied
  • totalInitialSyncElapsedMillis
  • initialSyncAttempts
  • appliedOps
  • databases.databasesCloned

Future metrics to add (names and methods are TBD):

  • successfulInitialSyncAttempts
  • resumedInitialSyncAttempts
  • collectionsCloned = length of listCollections response
  • <collection>.approxBytesCopied = collStats.avgObjSize * documentsCopied
  • indexesCloned = dbStats.indexes
  • totalIndexSize = dbStats.indexSize
  • totalInitialSyncOplogElapsedMillis = time for oplog application phase 
  • initialSyncMethod 
  • resyncOrAddingNode
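
As a rough illustration of the derivations proposed above, here is a minimal Python sketch that computes the size-related metrics from plain dictionaries shaped like the listCollections, collStats, and dbStats replies. The function name, the argument names, and the per-collection documentsCopied counter are hypothetical, and the sketch assumes the whole listCollections result fits in cursor.firstBatch. It is not the server's implementation, and the final names and methods remain TBD.

```python
from typing import Any, Dict


def derive_initial_sync_metrics(
    list_collections_reply: Dict[str, Any],
    coll_stats_by_name: Dict[str, Dict[str, Any]],
    documents_copied_by_name: Dict[str, int],
    db_stats: Dict[str, Any],
) -> Dict[str, Any]:
    """Derive the proposed metrics from already-fetched command replies.

    All parameter names are hypothetical; the inputs are plain dicts so the
    sketch runs without a server.
    """
    # collectionsCloned = length of the listCollections response.
    # Assumption: every collection is in firstBatch (no getMore needed).
    collections = list_collections_reply["cursor"]["firstBatch"]

    # <collection>.approxBytesCopied = collStats.avgObjSize * documentsCopied.
    # Assumption: a missing avgObjSize is treated as 0.
    approx_bytes_copied = {}
    for name, documents_copied in documents_copied_by_name.items():
        avg_obj_size = coll_stats_by_name.get(name, {}).get("avgObjSize", 0)
        approx_bytes_copied[name] = avg_obj_size * documents_copied

    return {
        "collectionsCloned": len(collections),
        "approxBytesCopied": approx_bytes_copied,
        # indexesCloned = dbStats.indexes
        "indexesCloned": db_stats["indexes"],
        # totalIndexSize = dbStats.indexSize
        "totalIndexSize": db_stats["indexSize"],
    }
```

Note that approxBytesCopied holds one entry per collection, so the result grows with the number of collections cloned.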


 Comments   
Comment by Alan Zheng [ 16/Jul/21 ]

Agreed that we should add additional initial sync metrics. I can help with defining this ticket. Reassigning to myself.

Comment by Bruce Lucas (Inactive) [ 16/Dec/20 ]

Is this for the logs, as stated, or for serverStatus or replSetGetStatus? If either of the latter, then <collection>.approxBytesCopied is problematic when there are a large number of collections, both because of the size of the resulting document and because of the load on FTDC.
