[CSHARP-1169] ParallelScan Return Type
Created: 24/Jan/15  Updated: 05/Apr/19  Resolved: 04/Apr/15

Status: Closed
Project: C# Driver
Component/s: API
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Wolfie Wolf [X] Assignee: Unassigned
Resolution: Done Votes: 0
Labels: question
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows x64



 Description   

Hi, I'm new here and to MongoDB, so please excuse me if this has been asked before; I tried to find similar questions and issues. I'm attempting to use the ParallelScan method in the official C# driver and noticed a couple of things. First, the example on the website is not C#. I don't know what language it is, but C# requires the loop variable to be initialized in a for loop, so something like:

for (int c = 0; c < cursors.Count(); c++) { }

would be more accurate. Next is the return type: it appears to be a ReadOnlyCollection of type BsonDocument. Wouldn't it make more sense if this returned a MongoCollection that could be iterated with the recommended foreach structure? As it is, I get back a massive BSON document with the contents of every document the cursor returned, and if I want to do anything with the elements I would have to parse it like a CSV or something equally strange. If it returned a MongoCollection, or a pseudo-MongoCollection, then a standard foreach (BsonDocument ...) loop would let you grab individual elements from each returned document.
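For reference, the idiomatic C# form of a JavaScript-style `for (var cursor in cursors)` loop is `foreach (var cursor in cursors)`. In the 1.x driver the ParallelScan result appears to be a ReadOnlyCollection of IEnumerator<BsonDocument>, one enumerator per server cursor, rather than one big document; each enumerator is advanced with MoveNext() and read through Current. The sketch below shows both loop forms over such a collection. It uses plain ints in place of BsonDocument so it runs without a server, and the class and method names (`CursorLoops`, `DrainWithForeach`, `DrainWithFor`, `FromLists`) are hypothetical.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

// Stand-in for a ParallelScan result: one IEnumerator per cursor.
// Ints replace BsonDocument so the sketch runs without a server.
public static class CursorLoops
{
    // foreach over the collection, then MoveNext/Current on each enumerator.
    public static List<T> DrainWithForeach<T>(ReadOnlyCollection<IEnumerator<T>> cursors)
    {
        var results = new List<T>();
        foreach (var cursor in cursors)
        {
            while (cursor.MoveNext())
            {
                results.Add(cursor.Current);
            }
        }
        return results;
    }

    // Index-based equivalent. ReadOnlyCollection<T> has a Count property,
    // so the LINQ Count() extension method is not required.
    public static List<T> DrainWithFor<T>(ReadOnlyCollection<IEnumerator<T>> cursors)
    {
        var results = new List<T>();
        for (int c = 0; c < cursors.Count; c++)
        {
            while (cursors[c].MoveNext())
            {
                results.Add(cursors[c].Current);
            }
        }
        return results;
    }

    // Builds test input: each list becomes one "cursor".
    public static ReadOnlyCollection<IEnumerator<T>> FromLists<T>(params List<T>[] parts)
    {
        return parts.Select(p => (IEnumerator<T>)p.GetEnumerator()).ToList().AsReadOnly();
    }
}
```

Either loop works; the key point is that each element must be drained with MoveNext() before its documents are visible.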

Again, I apologize if I am completely off base here. I don't claim to be a developer, just a guy trying to use some really cool code that you've developed.

Kind Regards

/W



 Comments   
Comment by Wolfie Wolf [X] [ 24/Jan/15 ]

Hi Craig,

Many thanks for your prompt reply. Unfortunately the Google Groups community forum is very disorganized; to be honest, I couldn't even follow a thread in there properly, let alone search it for information. However, since you have kindly offered to help, I will describe my scenario, and maybe you can offer some advice. I really appreciate it.

First, though, I have to say (and remember, I will never claim to be a developer) that the example in the docs you mentioned is the one I'm talking about, and unfortunately it doesn't work. Go ahead and type:

for (var cursor in cursors){}

into any C# IDE and you're going to get a complaint that implicitly typed variables must be initialized. I just found the page below, and its example at least looks like C# to me:

http://mongodb.github.io/node-mongodb-native/driver-articles/anintroductionto1_4_and_2_6.html

"I don't know much about art, but I know what I like." I know I'm new, but I know what I want to achieve, and ParallelScan is exactly what I need. Here's my scenario (thanks for helping). My intention is to use MongoDB for raw tick data (market data) as well as other things, but for now tick data is the focus. I have one database and a collection for each instrument. Every document in a collection has an identical schema; this one is crude oil:

{
"_id" : ObjectId("54c2ca2ed2f47ff39ab85abb"),
"TimeStamp" : "2015-01-13 00:00:08.678",
"Last" : 45.3,
"LastSize" : NumberInt(1),
"TotalVolume" : NumberInt(15850),
"Bid" : 45.29,
"Ask" : 45.3,
"TickId" : NumberInt(3413392),
"BasisForLast" : "C",
"TradeMarketCenter" : NumberInt(36),
"TradeConditions" : NumberInt(1)
}
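Because TimeStamp is stored as a string rather than a BSON date, every consumer has to parse it. A minimal sketch, assuming the format is always `yyyy-MM-dd HH:mm:ss.fff` as in the sample above: a plain class mirroring the fields, with an exact, culture-independent parser. The `Tick` class and its members are hypothetical; it is not mapped by the driver's serializer here.

```csharp
using System;
using System.Globalization;

// Hypothetical plain class mirroring the tick document above.
// It would be filled from each BsonDocument; no driver mapping is set up.
public sealed class Tick
{
    public DateTime TimeStamp { get; set; }
    public double Last { get; set; }
    public int LastSize { get; set; }
    public int TotalVolume { get; set; }
    public double Bid { get; set; }
    public double Ask { get; set; }
    public int TickId { get; set; }
    public string BasisForLast { get; set; }
    public int TradeMarketCenter { get; set; }
    public int TradeConditions { get; set; }

    // Assumed from the sample value "2015-01-13 00:00:08.678".
    public const string TimeStampFormat = "yyyy-MM-dd HH:mm:ss.fff";

    // Exact parse, independent of the machine's regional settings.
    public static DateTime ParseTimeStamp(string value)
    {
        return DateTime.ParseExact(value, TimeStampFormat, CultureInfo.InvariantCulture);
    }
}
```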

I have 10 days of data at the moment, which is every trade down to the millisecond, so we are talking about hundreds of thousands of documents. I haven't counted them; the GUI tool I'm using to browse the store only lets me display 10K. As a simple test application, I decided to use the tick data to generate "tick charts" of 1000 ticks. Simple, right? I want to create bars, so for every 1000 documents in the database I want to do the following:

- Get the last document's date/time stamp as the "Close" time for the bar
- Get the first document's "Last" price as the Open
- Get the maximum "Last" price as the High
- Get the minimum "Last" price as the Low
- Get the last document's "Last" price as the Close
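The five rules above amount to a single pass over ticks in trade order. The sketch below assumes the ticks already arrive ordered and uses a hypothetical `OhlcBar` type (and hypothetical `TickBars.Build` method) instead of sciChart's PriceBar. It tracks High and Low incrementally rather than buffering 1000 values and, like the rules above, drops a final group that has fewer than `ticksPerBar` ticks.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical bar type; the original code uses sciChart's PriceBar instead.
public sealed class OhlcBar
{
    public DateTime CloseTime { get; set; }
    public double Open { get; set; }
    public double High { get; set; }
    public double Low { get; set; }
    public double Close { get; set; }
}

public static class TickBars
{
    // Groups ticks (assumed to be in trade order) into bars of ticksPerBar ticks.
    // A trailing group smaller than ticksPerBar is dropped.
    public static List<OhlcBar> Build(IEnumerable<(DateTime Time, double Last)> ticks, int ticksPerBar)
    {
        if (ticksPerBar <= 0) throw new ArgumentOutOfRangeException(nameof(ticksPerBar));
        var bars = new List<OhlcBar>();
        var current = new OhlcBar();
        int count = 0;
        foreach (var (time, last) in ticks)
        {
            if (count == 0)
            {
                // First tick of a new bar sets Open and seeds High/Low.
                current = new OhlcBar { Open = last, High = last, Low = last };
            }
            current.High = Math.Max(current.High, last);
            current.Low = Math.Min(current.Low, last);
            current.Close = last;     // latest tick so far becomes the Close
            current.CloseTime = time; // and its timestamp the bar's close time
            count++;
            if (count == ticksPerBar)
            {
                bars.Add(current);
                count = 0;
            }
        }
        return bars;
    }
}
```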

For every 1000 records I stuff all of that into a new array, and then I render it to a WPF chart. The code goes like this:

This is the event that fires when the chart is loading. I'm kicking off a background worker thread so as not to tie up the GUI while the data is loading:

private void ChartDock_OnLoaded(object sender, RoutedEventArgs e)
{
    Thread mongoThread = new Thread(new ThreadStart(MongoDB));
    mongoThread.Start();
}

Everything is called "FirstBar" as I thought I was going to have to do this in stages, but as it turns out I was able to run the whole thing in one step, so please excuse the inaccurate naming...

public void MongoDB()
{
    // MongoDB (legacy 1.x driver API)
    var connectionString = "mongodb://192.168.2.249";
    var client = new MongoClient(connectionString);
    var server = client.GetServer();
    var database = server.GetDatabase("SkyNet");
    MongoCollection<BsonDocument> trades = database.GetCollection<BsonDocument>("CL_Tick");
    var firstBarCursor = trades.FindAll();

    // Create sciChart data structures
    var tickDataSeries = new OhlcDataSeries<DateTime, double>();
    var tickPriceSeries = new PriceSeries();

    // First bar
    firstBarCursor.SetBatchSize(1000);
    int i = 0;
    double[] firstBarLast = new double[1000];
    int[] firstBarTickID = new int[1000];
    string[] firstBarTimeStamp = new string[1000];

    foreach (BsonDocument trade in firstBarCursor)
    {
        firstBarLast[i] = trade.GetElement("Last").Value.ToDouble();
        firstBarTickID[i] = trade.GetElement("TickId").Value.ToInt32();
        firstBarTimeStamp[i] = trade.GetElement("TimeStamp").Value.ToString();
        i += 1;
        if (i == 1000)
        {
            var firstTickPriceBar = new PriceBar();
            // We want the time the bar closed
            firstTickPriceBar.DateTime = DateTime.Parse(firstBarTimeStamp[999], DateTimeFormatInfo.InvariantInfo);
            firstTickPriceBar.Open = firstBarLast[0];    // Get the first value
            firstTickPriceBar.High = firstBarLast.Max(); // Get the highest value in the array
            firstTickPriceBar.Low = firstBarLast.Min();  // Get the lowest value in the array
            firstTickPriceBar.Close = firstBarLast[999]; // Get the last value
            tickPriceSeries.Add(firstTickPriceBar);
            i = 0;
        }
    }

    tickDataSeries.Append(
        tickPriceSeries.TimeData,
        tickPriceSeries.OpenData,
        tickPriceSeries.HighData,
        tickPriceSeries.LowData,
        tickPriceSeries.CloseData);

    // WPF controls may only be touched on the UI thread, so the chart is
    // updated through the Dispatcher rather than directly from this worker thread.
    Dispatcher.BeginInvoke((Action)(() =>
    {
        sciChart.RenderableSeries[1].DataSeries = tickDataSeries;
    }));
}

I use Dispatcher.BeginInvoke to hand the results back to the UI thread, and this works. The problem is that I have to wait 25 seconds, staring at a blank chart, until this completes. Your next question is probably about infrastructure: the app is running on a Windows VM under OS X, and the network adapter is emulating an Intel E1000 Gigabit Ethernet adapter at full duplex. MongoDB runs on OS X with its --dbpath on an SSD; I timed it with both the SSD and a RAM drive, and there was no difference in speed.

So the assumption I am working on is that what I am running up against is the TCP window size; it's safe to say the round-trip latency is negligible. A single TCP socket can only carry so much data at a time, but with multiple sockets that all changes. That is why torrents are so fast and SMB is so slow. Say you have 50 kids to pick up from the pool and the bus holds 10 kids at a time: with one bus you have to make 5 trips, but with 5 buses it gets done all at once.

So the idea of ParallelScan seems to be exactly what I'm after: multiple cursors grab different sections of my collection at the same time, establish new TCP sockets between the client and server, and shove it all down the wire (or in this case silicon, because it's a bridged virtual interface) at once.
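The idea can be sketched without the driver: drain each enumerator on its own task and merge the results. Two hedges apply. Whether more cursors actually shorten the 25-second load depends on where the time goes (network, server, or client-side processing), which has not been measured here. And documents from different cursors arrive in no guaranteed order, so tick bars would need the merged data sorted first, for example by TickId, assuming TickId increases with trade order. The helper name `DrainAll` is hypothetical, and ints stand in for BsonDocument.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelDrain
{
    // Drains every enumerator on its own task and returns all items sorted by key.
    // Enumerators are not thread-safe, so each one is used by exactly one task.
    public static List<T> DrainAll<T, TKey>(
        IReadOnlyList<IEnumerator<T>> cursors,
        Func<T, TKey> sortKey)
    {
        var bag = new ConcurrentBag<T>();
        var tasks = cursors
            .Select(cursor => Task.Run(() =>
            {
                while (cursor.MoveNext())
                {
                    bag.Add(cursor.Current);
                }
            }))
            .ToArray();
        Task.WaitAll(tasks);
        // Items arrive from the cursors in no particular order; restore it here.
        return bag.OrderBy(sortKey).ToList();
    }
}
```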

As for the ParallelScan code, this is what I've tried:

var args = new ParallelScanArgs<BsonDocument>
{
    NumberOfCursors = 3,
    BatchSize = 1000
};
for (int c = 0; c < cursors.Count(); c++)
{
    while (cursors[c].MoveNext())
    {
        // I was hoping to use the following in here:
        // foreach (BsonDocument trade in cursors[c])
        // {
        // }
    }
}
But that doesn't work, because cursors[c] isn't a regular cursor; it's a CursorCommand or something along those lines, and I'm just starting to learn what that is.
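A `foreach` over `cursors[c]` fails because an IEnumerator<T> is not an IEnumerable<T>: it has MoveNext() and Current but no GetEnumerator(). A small iterator method can bridge the two. The extension name `ToEnumerable` is hypothetical and not part of the driver; with it, the while loop above could be replaced by `foreach (BsonDocument trade in cursors[c].ToEnumerable())`.

```csharp
using System.Collections.Generic;

public static class CursorExtensions
{
    // Wraps an IEnumerator<T> in an iterator so it can be used with foreach.
    // The underlying enumerator is consumed as the loop runs, so the result
    // can only be enumerated once.
    public static IEnumerable<T> ToEnumerable<T>(this IEnumerator<T> cursor)
    {
        while (cursor.MoveNext())
        {
            yield return cursor.Current;
        }
    }
}
```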

Any advice you can offer would be greatly appreciated.

Kind Regards

Wolfie

Comment by Craig Wilson [ 24/Jan/15 ]

Hi,

Sorry you are having trouble. ParallelScan is a very specialized function and I wouldn't expect someone new to MongoDB to need it. Regardless, I'm happy to help. For future reference, questions like this would be better directed to our user group at https://groups.google.com/forum/?pli=1#!forum/mongodb-user or to stackoverflow.com.

Providing a code example of what you tried would allow us to better help you solve the problem you are having. I'm not sure what example you looked at, but the C# documentation for parallel collection scan is here: http://docs.mongodb.org/ecosystem/tutorial/use-csharp-driver/#parallelscan.

Craig

Generated at Wed Feb 07 21:38:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.