[CDRIVER-1452] Attempt to clean/normalize values collected in mongoc-metadata Created: 05/Aug/16  Updated: 19/Oct/16  Resolved: 02/Sep/16

Status: Closed
Project: C Driver
Component/s: None
Affects Version/s: None
Fix Version/s: 1.5.0

Type: Improvement Priority: Minor - P4
Reporter: Ian Boros Assignee: Hannes Magnusson
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to CDRIVER-1552 handshake meta data on s390/rhel 7.2 ... Closed
Epic Link: mongodb-handshake

 Description   

In some parts of the metadata collection process, we do nothing to "sanitize" the strings we collect. For example, the values may have quotes around them, and we don't do much to make sure they are trimmed of whitespace. Hannes brought this up in a code review for the Linux distro scanner, so I'm opening a ticket for it.

I think we should either:

a) Decide not to do this, and just put the responsibility on whoever is analyzing this data.

b) Do this as a separate step in the mongoc-metadata code. If we go with this option we should define exactly what kind of sanitizing we want to do here.
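
To make option b) concrete, here is a minimal sketch of what such a sanitizing step might look like, assuming "sanitizing" means only trimming surrounding whitespace and removing one pair of matching quotes. The helper name `sanitize_value` is hypothetical and not part of mongoc-metadata; the rules the driver should actually apply would still need to be agreed on.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not part of mongoc-metadata. Modifies s in place:
 * trims leading and trailing whitespace, then removes one pair of matching
 * single or double quotes if the trimmed value is wrapped in them.
 * Returns s for convenience. */
static char *
sanitize_value (char *s)
{
   size_t len;
   size_t start = 0;

   assert (s != NULL);
   len = strlen (s);

   /* Skip leading whitespace and drop trailing whitespace. */
   while (start < len && isspace ((unsigned char) s[start])) {
      start++;
   }
   while (len > start && isspace ((unsigned char) s[len - 1])) {
      len--;
   }

   /* Strip one pair of matching surrounding quotes, e.g. "Fedora". */
   if (len - start >= 2 && (s[start] == '"' || s[start] == '\'') &&
       s[len - 1] == s[start]) {
      start++;
      len--;
   }

   /* Shift the cleaned value to the front of the buffer and terminate it. */
   memmove (s, s + start, len - start);
   s[len - start] = '\0';
   return s;
}
```

Under these assumptions, a collected value such as `"Fedora"` (with the quotes, as values in `/etc/os-release` are often written) would be reduced to `Fedora`. Whitespace inside the quotes and unbalanced quotes are deliberately left untouched in this sketch.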



 Comments   
Comment by Githook User [ 02/Sep/16 ]

Author: Hannes Magnusson (bjori, bjori@php.net)

Message: CDRIVER-1452 Remove quotes around values

And use the distro name, not its ID.
This makes for a more human-readable value.
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/f80882af4dc13018015995ff017bcda284541065

Comment by Hannes Magnusson [ 29/Aug/16 ]

These quotes around the distro name are getting a little bit annoying. I think we should consider doing this sooner rather than later.

Comment by Ian Boros [ 05/Aug/16 ]

I'm putting my take on both options in this comment to keep my opinions separate from the options themselves (as I see them).

a)
This is a good option, I think, since whoever analyzes this data will also have to deal with things like the OS name being "RHEL" in some cases and "Red Hat" in others. As part of reconciling these inconsistencies, they'll probably also have to sanitize the strings, so putting in the extra work to do it in C isn't worth it.

b)
The only good argument I see for this is that extra whitespace and quotes take up space, and we should try to conserve space. That said, this would be a micro-optimization (we'd maybe save a few bytes per document...).
