[SERVER-51519] Describe collection schema or explain validation to get current schema Created: 13/Oct/20  Updated: 08/Mar/23

Status: Backlog
Project: Core Server
Component/s: Querying, Tools
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Minor - P4
Reporter: Joshua Hemphill Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: qopt-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2020-10-19 at 2.55.19 PM.png    
Assigned Teams:
Query Optimization
Participants:
Case:

 Description   

There's not a good way currently to get Mongo's bson types for every field that exists in a collection. Trying to use external tools to sample and guess at the bson types has proved difficult. 

This would be especially helpful when creating validators to give a starting point, or for creating documentation.



 Comments   
Comment by Edwin Zhou [ 20/Oct/20 ]

jhemphill@tecnicocorp.com,
Thank you for further clarifying your use-case. We're assigning this ticket to the appropriate team to be evaluated against our currently planned work. Updates will be posted on this ticket as they happen.

In the meantime, you can attempt to create an aggregation pipeline using $objectToArray and $map. Here is a snippet that will flatly display the types of each field as a starting point.

db.collection.aggregate(
  [{
    $project: {
      types: {
        $arrayToObject: {
          $map: {
            input: {
              $objectToArray: "$$ROOT"
            },
            in: {
              v: {
                $type: "$$this.v"
              },
              k: "$$this.k"
            }
          }
        }
      }
    }
  }]
)

Which outputs:

_id:ObjectId("5f74b4200885de90a48fd57e"),
types: {
  _id:"objectId",
  title:"string",
  year:"double",
  rated:"string",
  runtime:"double",
  countries:"array",
  genres:"array",
  director:"string",
  writers:"array",
  actors:"array",
  plot:"string",
  poster:"string",
  imdb:"object",
  tomato:"object",
  metacritic:"double",
  awards:"object",
  type:"string"
}

Best,

Edwin

Comment by Joshua Hemphill [ 19/Oct/20 ]

@edwin.zhou , if you mean directly query for only the schema of a nested document, no; but get the schema of the nested documents when retrieving the whole document schema, yes.

Although I see the issue. That would have to be configurable, since I'm sure use-case by use-case some people would either want to ignore them, sample some number, or analyze all of them.

It might make sense as available in the aggregation pipeline, and have it ignore or provide some property/hook by default to let that behavior be configured or consumed in user space.  

Comment by Edwin Zhou [ 19/Oct/20 ]

jhemphill@tecnicocorp.com, does your use case need to also query nested documents?

Edwin

Comment by Joshua Hemphill [ 19/Oct/20 ]

Hello, 

No I'm talking about programmatic access. Compass is fine for looking at the database for manual review, but there's no way to export that schema or consume it with other applications to make it actionable. Having some way to collect the schema from a record would let an external program determine both the widest and narrowest schema that exist on a collection and would allow for point-in-time documentation to be maintained externally.

For example, if the schema is also being tracked in the consuming codebase, it can only document the JSON types since that's all it has the ability to obtain, and if that's the case, it wouldn't be possible to generate new collections derived from that documentation and still be able to guarantee compatibility with the one it was documenting.

Comment by Edwin Zhou [ 19/Oct/20 ]

Hi jhemphill@tecnicocorp.com,

Thank you for your request. We'd like some additional information about your use case. Are you looking to produce a list of fields given a smaller sample of your collection? In this case, you can use MongoDB Compass, a GUI for MongoDB to provide a schema with the types of each field in your collection. I've attached an image as an example of the output. 

Please let me know if this is adequate for your use case.

Generated at Thu Feb 08 05:25:42 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.