[SERVER-17357] Add flag to enable logging in JSON format Created: 23/Feb/15  Updated: 06/Dec/22  Resolved: 08/Apr/20

Status: Closed
Project: Core Server
Component/s: Logging
Affects Version/s: None
Fix Version/s: 4.4.0

Type: New Feature Priority: Major - P3
Reporter: Travis Redman Assignee: DO NOT USE - Backlog - Dev Tools
Resolution: Done Votes: 33
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-21857 Mongodb output log as json? Closed
Assigned Teams:
Developer Tools
Backwards Compatibility: Fully Compatible

 Description   

The mongo logs contain a lot of varied information and, as a result, lack a consistent structure. This makes writing accurate and efficient parsers difficult, because a parser has to understand how each kind of line is printed in each context. For instance, queries are logged very differently from updates:

Tue Feb 17 21:19:53.300 [conn947952] update foo.bar query: { _id: "abc" } update: { $set: { x: 0 } } nscanned:1 nscannedObjects:1 nMatched:1 nModified:1 keyUpdates:0 numYields:0 locks(micros) w:168 0ms

vs.

Tue Feb 17 21:19:53.301 [conn927275] query foo.bar query: { $query: { _id: "abc" } } planSummary: COLLSCAN ntoreturn:100 ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:93 nreturned:1 reslen:521 0ms

Outside of the realm of op queries, the lines get even more diverse:

Fri Feb 20 21:56:12.332 [initandlisten] connection accepted from 127.0.0.1:48345 #6552246 (5768 connections now open)

Fri Feb 20 21:57:02.003 [conn6479488] getmore local.oplog.rs cursorid:901395238872875 ntoreturn:0 keyUpdates:0 numYields:0 locks(micros) r:40 nreturned:1 reslen:201 4ms

Fri Feb 20 21:57:42.989 [conn6549710] Plan 2 involved in multi-way tie. ns: data.foo query: { foo: "bar" } sort: {} projection: {} skip: 0 limit: 0 score: 1.5003 summary: IXSCAN { foo: 1 }

This type of data seems like a good candidate for JSON formatting. It would be helpful to have a config option that prints every log line as JSON. That would make parsing straightforward and consistent, regardless of the presence or absence of various fields. For example:

{ "time": "Tue Feb 17 21:19:53.301", "thread": "conn927275", "op": "query", "ns": "foo.bar", "query": "{ $query: { _id: \"abc\" } }", "planSummary": "COLLSCAN", "ntoreturn":100, "ntoskip":0, "nscanned":1, "nscannedObjects":1, "keyUpdates":0, "numYields":0, "locks(micros)" : { "r":93 }, "nreturned":1, "reslen":521, "duration_ms":0 }



 Comments   
Comment by Bryce Jones [ 07/Feb/20 ]

If changing things from the inside out is so difficult, could you at least provide a grok parser that can make sense of the logs? I understand that changing everything to JSON is a big effort, but if you tried to parse your own logs you would find that they are really non-standard and vary wildly from line to line. You could then make the logs more uniform, and perhaps this need goes away once they are simple to parse.

Comment by David Lambert [ 06/Feb/20 ]

So, this has moved back to "Backlog"? I was really looking forward to it being released in 4.3... Do you have an updated release date?

Comment by Luc Charbonneau [ 25/Nov/19 ]

Side note: it would be nice if the JSON logs could also be streamed remotely, through Kafka or other means.

Comment by Andrew Morrow (Inactive) [ 11/Apr/19 ]

david.lambert@sabre.com -

Thanks for letting us know that this issue is important to you. My team is actually in the process of planning our upcoming work for the next release, and a project to address this issue is among the items we are considering. As I'm sure you can imagine though, there are many potential projects competing for finite development resources.

If we do end up pursuing the project, it will not be a simple change. The issue is that almost all logging in the server codebase is written in the following style:

log() << "Foo has the value" << foo;

Here, the object returned by the call to log() is effectively a C++ std::ostream, and the << operators are the standard C++ streaming operators, which incrementally build a std::string; that generated std::string is then logged as a single item. And there are thousands of such calls.

In order to log to other formats like JSON, this needs to change, since once we have stringified and concatenated the arguments it is too late to change the format. Non-exhaustively, and for the purpose of illustration, here are three approaches we might take to change that. We could:

1) Manually re-write all of the logging statements in the codebase to target a wholly new logging framework that is not based on streams.
2) Use a mechanism like C++ expression templates to defer evaluation of all arguments to << in each log statement, instead accumulating them into a "log statement arguments packet" of some sort (presumably a std::variant) and then defining different evaluators that can unroll the argument packet into different formats: a single std::string, a JSON object, etc.
3) Implement the wholly new logging framework from 1, above, and then implement a source-to-source transformation engine, using something like LLVM's AST-level transformations, to automatically re-write all existing log statements into the new syntax.

Each of these approaches has its challenges (option one is probably a non-starter), so a proper design will need to be written, probably with some proof of concept work as well to help steer the design choices. It is also important that we ensure that changes to our logging system do not have adverse performance consequences, particularly when dealing with debug logging statements. Such statements are almost always disabled in production so they need to be close to zero overhead, especially in hot paths through the database.
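
To make option 2 a bit more concrete, here is a rough, purely illustrative sketch of the idea. None of these names (LogValue, LogAttr, logStructured) are a real server API; the point is only that capturing the arguments as typed key/value attributes, rather than streaming them into a string, lets a pluggable formatter decide later whether to render plain text or JSON.

// Illustrative only: accumulate log arguments as typed attributes so the
// output format can be chosen after the fact. Hypothetical names, not the
// actual server logging API.
#include <iostream>
#include <string>
#include <variant>
#include <vector>

using LogValue = std::variant<long long, double, std::string>;

struct LogAttr {
    std::string name;
    LogValue value;
};

void logStructured(const std::string& msg, const std::vector<LogAttr>& attrs) {
    // A real implementation would dispatch to the configured formatter;
    // here we just emit a JSON-like line to show the shape of the data.
    std::cout << "{ \"msg\": \"" << msg << "\"";
    for (const auto& a : attrs) {
        std::cout << ", \"" << a.name << "\": ";
        std::visit([](const auto& v) { std::cout << v; }, a.value);
    }
    std::cout << " }\n";
}

int main() {
    // Instead of: log() << "Foo has the value " << foo;
    logStructured("Foo has the value", {{"foo", LogValue{42LL}}});
    return 0;
}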

The various support and engineering teams at MongoDB also spend a lot of time reading and analyzing MongoDB log files. Making this change would not only benefit our customers, but would be invaluable for us as well, so there is genuine interest in making this change purely on the technical merits.

I hope that helps clarify the state of this request. Any specific input or requests you might have regarding the implementation are most welcome. Thanks again for the helpful feedback!

Comment by David Lambert [ 11/Apr/19 ]

I know my tiny input to this isn't going to make Mongo wake up and do the right thing, but here it is.

I too have spent WAY too much time writing a logstash/GROK parser for these logs, and it's ugly and doesn't catch everything. But it does get the major job done.

The idea that a document database company can't write its logs out in document form is just crazy, really. I would almost be willing to bet that they are stored internally as JSON.

Sadly, unless they find a way to monetize this, they aren't going to do the right thing... 

Comment by Petr Kukrál [ 11/Apr/19 ]

+1 for this. We are in the age of OpenShift and EFK, where we ship our applications' logs in JSON format. That approach is far easier and more manageable than creating parsing rules for every specific type of log.

Comment by Nicholas Whitehead [ 16/Nov/18 ]

If this issue moves, please add json logging to mongos as well.

Comment by Paul Frischknecht [ 19/Feb/18 ]

https://github.com/mongodb-js/log attempts to provide a parser.

Some motivation to implement this:

From

https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

(Actually, if you think about it, the idea of humans reading through logs on individual machines is something of an anachronism. This approach quickly becomes an unmanageable strategy when many services and servers are involved and the purpose of logs quickly becomes as an input to queries and graphs to understand behavior across many machines—something for which english text in files is not nearly as appropriate as the kind structured log described here.)

Comment by Anton [X] [ 06/Feb/18 ]

Any news on this task?

Comment by Mike Zraly [ 19/Nov/17 ]

Agreed. I have spent those hours parsing MongoDB logs and it is a frustrating experience, both because of the time I had to spend dealing with the crazy format (query too long to print? we'll overwrite the middle with a 10-period ellipsis and break parseability!) and the time my parsing code has to spend working around it.

Why parse logs at all you ask?

First, to get detailed collection- and query-level data without incurring the expense of enabling the profiler. I can set the slow query threshold to 0 if I want to look at everything, and that's less of an issue to me than enabling the profiler because log files go to a separate disk so there is no contention for IOPS.

Second, to get data that I can't get from serverStatus or collStats commands. For example, which collections or queries are seeing lock contention (acquireWaitCount, timeAcquiringMicros)? That data is only provided at the server level in the serverStatus locks document.

It's also kind of ironic that everything else in MongoDB is so JSON-centric, but not the logs.

Besides, think about how much easier it would be for you guys to implement per-query analytics in Ops Manager or Atlas if you had an easy-to-parse log format that you could extend without breaking parsing code. Isn't that the promise of using a document format like JSON in the database to begin with?

You could then take things one step further and imagine a downstream application that takes the JSON-formatted logs and writes them, or a subset of them, to a capped collection in another MongoDB installation where they can be easily queried.

Maybe you could use this to implement something like AWS RDS Performance Insights (https://aws.amazon.com/rds/performance-insights/) without having to enable the profiler.

It's all goodness, really.

Comment by Yauhen Artsiukhou [ 19/Nov/17 ]

Up!

Either implement a strict log format for everything, or make it possible to change the format type, e.g. emit each log entry as JSON.
It is currently impossible to parse your logs: a sudden warning in the middle of a message, truncation, different message formats.
Even your documentation is lying:
https://docs.mongodb.com/manual/reference/log-messages/

<timestamp> <severity> <component> [<context>] <message>

Here the <component> field is padded, so I actually have to read it like:

<timestamp> <severity> <component> +[<context>] (warning:.* \.\.\. )?<message>
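
As a rough illustration (assuming only the documented fields plus that undocumented padding, and ignoring the optional truncation warning), a matcher ends up looking like this:

// Rough illustration only: matching the documented format requires allowing
// the undocumented whitespace padding after <component>.
#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string line =
        "2017-06-15T13:49:07.103+0300 I WRITE    [conn4604436] update stag_peopl ...";

    // <timestamp> <severity> <component> +[<context>] <message>
    std::regex pattern(R"((\S+) (\S+) (\S+)\s+\[([^\]]+)\] (.*))");

    std::smatch m;
    if (std::regex_match(line, m, pattern)) {
        std::cout << "component=" << m[3].str()
                  << " context=" << m[4].str() << "\n";
    }
    return 0;
}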

Because of crappy logs we have:
https://github.com/rueckstiess/mongodb-log-spec - specification???
https://github.com/vjeantet/grok/blob/master/patterns/mongodb - wrong
https://gist.github.com/gswallow/9011322 - wrong
https://gist.github.com/xizhibei/081444ea66dbf527624b4d53b635f9bd - wrong
https://gist.github.com/mohit/6207215 - wrong

I have my own grok pattern. It is huge, it is ugly, and it doesn't work well.
We run MongoDB clusters, and we need to aggregate logs for different types of analysis:

  • Time spent per query
  • Scanned objects
  • Lock statistics
  • Database name
  • Collection name
  • etc

You would have to spend a couple of hours to implement JSON, or some other format from which every field can be extracted exactly.
On the other hand, thousands of MongoDB users MUST each waste tens of hours implementing something that only partially works.

Comment by Yauhen Artsiukhou [ 15/Jun/17 ]

The same issue: it is impossible to work with MongoDB logs. One more case is when a long query leaves multiple lines of garbage in the logs, like:

2017-06-15T13:49:07.103+0300 I WRITE    [conn4604436] warning: log line attempted (15kB) over max size (10kB), printing beginning and end ... update stag_peopl LONG LONG QUERY GOES HERE, AND FIELD="multi
lines
 
of text
 
"

Comment by pravin dwiwedi [ 28/Mar/17 ]

Nowadays JSON is supported by almost all applications and parsers. I want to use the ELK stack for our log monitoring. Could you please let us know if you have anything in the pipeline to provide logs in JSON format?

Comment by Feng Yu [ 11/Dec/15 ]

+1 for this feature. The MongoDB log is hard to parse; Logstash grok patterns cannot handle it well.
