[SERVER-71564] _writeTestPipeBsonFile() test shell extension for Named Pipes benchmarks Created: 22/Nov/22  Updated: 29/Oct/23  Resolved: 30/Nov/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.3.0-rc0

Type: Task Priority: Major - P3
Reporter: Kevin Cherkauer Assignee: Kevin Cherkauer
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Participants:

 Description   

These are subtasks from PERF-3313:

  1. ClickBench hits.json – Convert a subset (e.g. the first ~1M objects) to BSON and upload it to the SE dsi-donot-remove S3 bucket.
  2. _writeTestPipeBsonFile() - Create this as another new test shell function that reads objects from a BSON file and writes them to a named pipe. (SERVER-70392 is a reference for this.)
  3. jstests/noPassthrough/external_data_source.js - Add a test case for #2 to this test.


 Comments   
Comment by Githook User [ 30/Nov/22 ]

Author:

{'name': 'Kevin Cherkauer', 'email': 'kevin.cherkauer@mongodb.com', 'username': 'kevin-cherkauer'}

Message: SERVER-71564 Named Pipes _writeTestPipeBsonFile() test shell function
Branch: master
https://github.com/mongodb/mongo/commit/003d7559230acf925ca0069750cc914f7db0ab5b

Comment by Kevin Cherkauer [ 28/Nov/22 ]

Implementation:

  1. Read the BSON file into a std::vector<BSONObj>.
  2. Call existing NamedPipeHelper::writeToPipeObjectsAsync() on this vector.

This round-robins the objects into the pipe until the number of objects written reaches the requested count. It lets us feed BSON files into named pipes very fast, since the pipe writes are done from in-memory images. For the parent PERF-3313 we want the writer (producer) to be as fast as possible so it is not the bottleneck in the benchmarks.

Example:

_writeTestPipeBsonFile("pipeName", 100000, "objectsFile.bson");

Comment by Kevin Cherkauer [ 23/Nov/22 ]

I have created a set of compressed benchmark data files at several orders of magnitude in both BSON and JSON formats, uploaded the original Queries.zip, and added a README.txt. These 16 files are in the Query Team Google Drive directory
 
Query/Benchmarks/ClickBench/
[https://drive.google.com/drive/folders/1xC_aAtY_W8Hn5zQq5n7opd5N4NBz1lmq]
 
These have been uploaded to S3 path

https://s3-us-west-2.amazonaws.com/dsi-donot-remove/Query/Benchmarks/ClickBench/

by Ryan Timmons of the Performance Tools team.

The subsets contain the first N objects from the full ClickBench dataset, for N in {1, 10, 100, 1,000, 10,000, 100,000, 1,000,000}. The original full dataset has almost 100 million objects and is unwieldy at 22.1 GB compressed JSON or 216.7 GB uncompressed. We do not need such a huge dataset for our benchmarks. These seven subsets will let us easily pick the scale of data we want to run any given benchmark on. The largest subset is about 2.2 GB of BSON when uncompressed or 162 MB when gzipped.

Generated at Thu Feb 08 06:19:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.