|
I'm not sure how many users run Pig with the MongoDB Hadoop connector, but there is currently no documentation for it. The README in the pig folder of mongo-hadoop covers almost everything you need to know, with one missing caveat: if a statement depends on an earlier statement that uses MongoStorage*, the EXEC statement must be placed after the earlier statement. Without it, Pig may run the jobs out of order. For example, if a script inserts into test.a and then updates test.a, the update can run before the insert and the updated document may be lost.
Pig refers to this situation as Implicit Dependencies.
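A minimal sketch of the insert-then-update case follows, to show where the EXEC statement would go. The input files, field names, and MongoDB URI are hypothetical, the REGISTER statements for the connector jars are omitted, and the storage function arguments follow the pattern in the mongo-hadoop pig README as I understand it; they may differ between connector versions.

```pig
-- Hypothetical input and schema, for illustration only.
new_docs = LOAD 'new_docs.tsv' AS (id:chararray, status:chararray);

-- First statement: insert the documents into test.a.
STORE new_docs INTO 'mongodb://localhost:27017/test.a'
    USING com.mongodb.hadoop.pig.MongoInsertStorage('', '');

-- Force Pig to run everything above before it plans the rest of the script.
-- Pig cannot see that the update below depends on the insert above,
-- because both only share the MongoDB collection, not a Pig alias.
exec;

-- Hypothetical second input with the new status values.
changes = LOAD 'changes.tsv' AS (id:chararray, status:chararray);

-- Second statement: update the documents that were just inserted.
-- Arguments (assumed order): query, update, schema.
STORE changes INTO 'mongodb://localhost:27017/test.a'
    USING com.mongodb.hadoop.pig.MongoUpdateStorage(
        '{id: "\$id"}',
        '{\$set: {status: "\$status"}}',
        'id:chararray, status:chararray');
```

Without the `exec;` line, Pig is free to schedule the two STORE jobs in either order, which is how the update can be applied before the document exists.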
While documenting this, it may also be useful to write some Hive documentation.
|