[CDRIVER-474] "Failure to buffer 4 bytes" error on OS X with 300 client threads Created: 02/Dec/14  Updated: 06/Jan/15  Resolved: 06/Jan/15

Status: Closed
Project: C Driver
Component/s: None
Affects Version/s: 1.0.0
Fix Version/s: 1.1.0

Type: Bug Priority: Major - P3
Reporter: Ben Becker Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Mac OS X 10.10.1



 Description   

When running mongo-sim with 300 client threads on OS X, I'm getting the following error from each thread:

Failure to buffer 4 bytes: Failed to buffer 4 bytes within 300000 milliseconds.

Note that each thread reports this almost immediately; it's not timing out after 300 seconds or anything. You can reproduce by checking out 10gen/mongo-sim.git, building, and running the following from the build directory:

./src/simulation/mongo-sim -w workloads/simple_read -t 300 -e 10000

This spawns 300 client threads, one mongoc client per thread and executes a query 10,000 times total. Each thread executes a query operation 33 or 34 times, then iterates over every document in a tight loop (using mongoc_cursor_error(), mongoc_cursor_more() and mongoc_cursor_next() as outlined in cursor.hpp's Cursor::itcount() method.

Note that the same error occurs when running an insert-only workload (change -w to insert_only). I have not been able to reproduce this issue on Linux.

Sorry I haven't been able to narrow this down to a minimal reproducible case yet. I don't have many free cycles, but I'm happy to help anywhere I can.



 Comments   
Comment by Ben Becker [ 02/Dec/14 ]

Yay, fixed!

Thanks Jason!

Comment by Ben Becker [ 02/Dec/14 ]

I was on tag 1.0.2, but it seems I have the old logic. I'll upgrade to the 1.1.0-rc0 tag and let you know!

   static mach_timebase_info_data_t info = { 0 };
   static double ratio = 0.0;
 
   if (!info.denom) {
      mach_timebase_info (&info);
      ratio = info.numer / info.denom;
   }
 
   return mach_absolute_time () * ratio;

Comment by Mira Carey [ 02/Dec/14 ]

Actually, what version of libbson are you using? I took a patch for adjusting mach_absolute_time to microseconds a couple of months back and I think I'm getting the units right at this point.

   static mach_timebase_info_data_t info = { 0 };
   static double ratio = 0.0;
 
   if (!info.denom) {
      // the value from mach_absolute_time () * info.numer / info.denom
      // is in nano seconds. So we have to divid by 1000.0 to get micro seconds
      mach_timebase_info (&info);
      ratio = (double)info.numer / (double)info.denom / 1000.0;
   }
 
   return mach_absolute_time () * ratio;

I see the race you're talking about, but the units seem sane.

Comment by Mira Carey [ 02/Dec/14 ]

aah, good catch. thanks for the report

Comment by Ben Becker [ 02/Dec/14 ]

It looks like bson_get_monotonic_time() is not thread safe due to the use of static variables in the function. Also, mach_absolute_time() needs to be adjusted; see https://developer.apple.com/library/mac/qa/qa1398/_index.html.

Generated at Wed Feb 07 21:09:36 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.