Audit write paths to ensure the 16MB document size limit is enforced explicitly and consistently

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Storage Execution
    • 3
    • TBD

      Before SERVER-104405, the 16MB size limit on user documents was enforced by an implicit size check in the BSONObj constructor. That check did not enforce the limit consistently, and different size limits need to be enforced at different points, so we decided to effectively remove the check from the constructor by raising its upper limit to BufferMaxSize (125MB).

      As a result, callers must now enforce the correct size limit on the resulting BSON object or array. SERVER-104405 added explicit size validation at the places that called the BSONObj constructor, to confirm that the appropriate limit was being used and that it was enforced as early as possible in the primary write path. That validation was added to many call sites of BSONObjBuilder::done(), BSONObjBuilder::obj(), BSONArrayBuilder::done(), BSONObj(), etc., which were identified from test failures caused by removing the size check from the BSONObj constructor.

      To ensure correctness, each team needs to audit all of its other call sites of the BSONObj constructor and, more generally, every place where a user document must stay within the max user limit (16MB), to make sure the correct size limit is being enforced. SERVER-104405 provides the helper function validateBSONObjSize(), which can be used for this.
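
      As a rough illustration of what an explicit check at a call site looks like, here is a standalone C++ sketch. It does not use the server's code: checkDocSize() and acceptUserDocument() are hypothetical stand-ins, the constants are illustrative, and the real signature of validateBSONObjSize() is not given in this ticket. The sketch also assumes that a document of exactly 16MB is allowed.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Illustrative constants only; these are not the server's definitions.
constexpr std::size_t kMaxUserDocSizeBytes = 16 * 1024 * 1024;   // 16MB user document limit
constexpr std::size_t kBuilderCeilingBytes = 125 * 1024 * 1024;  // constructor ceiling named in this ticket

// Hypothetical stand-in for a size-validation helper.
// Throws if the object is larger than the limit the caller asks for.
inline void checkDocSize(std::size_t objSizeBytes,
                         std::size_t limitBytes = kMaxUserDocSizeBytes) {
    if (objSizeBytes > limitBytes) {
        throw std::length_error("object size " + std::to_string(objSizeBytes) +
                                " exceeds limit " + std::to_string(limitBytes));
    }
}

// Hypothetical write-path entry point: validate right after the document
// is built, before anything is written or logged.
inline bool acceptUserDocument(std::size_t builtSizeBytes) {
    try {
        checkDocSize(builtSizeBytes);
        return true;
    } catch (const std::length_error&) {
        return false;
    }
}
```

      The point of the sketch is that an object between 16MB and the constructor ceiling can now be built successfully, so rejecting it is entirely the caller's job.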

      It is extremely important that the primary always performs size validation on every path. If the primary takes a path that doesn't correctly enforce the size limit, and the secondary takes a different path that does, an operation that succeeded on the primary could fail on the secondary. This causes a crash loop in oplog application and effectively brings the cluster down (see SERVER-108119 and HELP-53890).
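
      The failure mode can be modeled with a toy C++ sketch. Everything in it is hypothetical (the function names, the enum, and the simplified notion of a "path"); it only shows why a check that exists on the secondary but is missing on the primary is dangerous.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kUserLimitBytes = 16 * 1024 * 1024;  // 16MB user document limit

enum class ApplyResult { kApplied, kRejected };

// Hypothetical primary write path that is missing the explicit size check.
inline ApplyResult primaryWriteUnchecked(std::size_t /*docSizeBytes*/) {
    return ApplyResult::kApplied;
}

// Hypothetical primary write path with the explicit size check.
inline ApplyResult primaryWriteChecked(std::size_t docSizeBytes) {
    return docSizeBytes > kUserLimitBytes ? ApplyResult::kRejected
                                          : ApplyResult::kApplied;
}

// Hypothetical secondary apply path that does validate size.
inline ApplyResult secondaryApply(std::size_t docSizeBytes) {
    return docSizeBytes > kUserLimitBytes ? ApplyResult::kRejected
                                          : ApplyResult::kApplied;
}

// True when the primary accepted a write that the secondary cannot apply,
// which is the situation that leads to the oplog application crash loop.
inline bool diverges(ApplyResult primary, ApplyResult secondary) {
    return primary == ApplyResult::kApplied && secondary == ApplyResult::kRejected;
}
```

      With an oversized document, the unchecked primary path diverges from the secondary, while the checked path rejects the write up front and nothing reaches the oplog.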

              Assignee: Unassigned
              Reporter: Ruchitha Rajaghatta