[SERVER-26120] Incompatible string conversion of floating point numbers in between different Windows builds Created: 14/Sep/16  Updated: 30/Sep/16  Resolved: 30/Sep/16

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.3.12
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Randolph Tan Assignee: Randolph Tan
Resolution: Won't Fix Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File fp_error.js    
Issue Links:
Depends
Related
related to SERVER-26419 Modify parallel.js to workaround floa... Closed
Operating System: ALL
Sprint: Sharding 2016-10-10
Participants:
Linked BF Score: 0

 Description   

It looks like 40000/12, when converted to a string, yields "3333.333333333334" in v3.2 but "3333.333333333333" in master. This causes problems in sharding because the _id of each config.chunks document contains the string representation of that chunk's min value.
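To make the failure mode concrete, here is a minimal sketch in plain JavaScript. The `chunkId` helper and the `ns-field_value` layout are assumptions for illustration only and may not match how the server actually builds the _id. The two string literals are the ones reported above.

```javascript
// Hypothetical helper: the real _id layout in config.chunks may differ.
function chunkId(ns, field, minAsString) {
    return ns + "-" + field + "_" + minAsString;
}

// The same double, rendered with the two strings reported in this ticket.
const idFromV32 = chunkId("test.foo", "x", "3333.333333333334");
const idFromMaster = chunkId("test.foo", "x", "3333.333333333333");

// An exact-match query on _id built by one build cannot find a document
// whose _id was built by the other.
const idsMatch = idFromV32 === idFromMaster;

// For reference: 16 significant digits as produced by this JavaScript runtime.
const local = (40000 / 12).toPrecision(16);
```

The numeric value is identical on both sides. Only the decimal rendering differs, and that is enough to make the two _id strings unequal.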



 Comments   
Comment by Andy Schwerin [ 30/Sep/16 ]

Outside of tests, shard version boundaries are not chosen by floating point division, so this should not affect production deployments. The `config.chunks` table is due for a redesign, and when we get to that, we can eliminate the stringified shard key index.

Comment by Randolph Tan [ 30/Sep/16 ]

Yes. Unless the v3.2 nodes change their queries, they won't be able to modify chunks that were given to them by a v3.4 shard and carry an incompatible stringified _id.

Comment by Andy Schwerin [ 30/Sep/16 ]

In what sense must this solution be backported to 3.2 for completeness? Are you considering mixed-mode operation during upgrade and downgrade?

Comment by Randolph Tan [ 29/Sep/16 ]

schwerin My proposal to work around this bug is to change the query portion in applyOps (merge/split/moveChunk) to use { ns: xxx, min: yyy } instead of { _id: "stringify xxx + yyy" }. The update portion will also need to be tweaked to exclude the _id for basic updates, but the _id should be kept when doing upserts. To be complete, this will require the fix to be backported to v3.2 as well. In this proposal, we will continue generating the _id as before, but we will no longer use it in our queries. Assuming that v3.2 and v3.0 produce compatible strings, this change will not break compatibility between the two versions even when the workaround is backported to v3.2. However, this issue can manifest again if a v3.4 cluster is downgraded to v3.2.
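A rough sketch of the difference between the two query shapes, using an in-memory array as a stand-in for config.chunks. The document contents and the _id layout here are illustrative assumptions, not the actual applyOps code.

```javascript
// Illustrative stand-in for config.chunks; only _id, ns and min are modeled,
// and the _id layout is an assumption.
const chunks = [
    { _id: "test.foo-x_3333.333333333333", ns: "test.foo", min: { x: 40000 / 12 } },
];

// Query by stringified _id, as issued by a node that formats the number differently.
const byId = chunks.find((c) => c._id === "test.foo-x_3333.333333333334");

// Query by namespace and numeric min value, which does not depend on string formatting.
const byNsMin = chunks.find((c) => c.ns === "test.foo" && c.min.x === 40000 / 12);
```

In this sketch `byId` finds nothing, while `byNsMin` finds the chunk, because the numeric comparison never goes through a string conversion.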

Comment by Randolph Tan [ 14/Sep/16 ]

Attached a test file that demonstrates the bug. It works fine on my Linux Mint machine but fails on the Windows build.
