[COMPASS-4163] Imported large CSV file into Compass creates document count mismatch Created: 24/Feb/20 Updated: 29/Oct/23 Resolved: 09/Mar/20 |
|
| Status: | Closed |
| Project: | Compass |
| Component/s: | Compass, Import/Export |
| Affects Version/s: | 1.20.5 |
| Fix Version/s: | 1.21.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Felicia Hsieh | Assignee: | Lucas Hrabovsky (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | macOS 10.14.6 |
| Attachments: | |
| Issue Links: | |
| Sprint: | Iteration Yak |
| Description |
|
Used Compass > Import Collection > CSV > local file to import a very large CSV file (3.7GB) downloaded from http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.csv. The import fails with an error after about 20 seconds of attempting to complete. After clicking the "Import" button to continue, it appears to try again. The database collections screen shows "0" documents, but inspecting the collection shows that 2.9M documents were saved. Perhaps also give Compass users guidance on the maximum size of an import file that can be attempted.
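One way to check whether the "0" is a stale display rather than missing data is to compare the server's metadata-based estimated count against an exact count. A minimal PyMongo sketch, assuming a local server; the database and collection names are hypothetical, not taken from this ticket:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["test"]["pp_complete"]  # hypothetical names

# Fast count read from collection metadata; it can lag behind reality
# after an interrupted or still-running bulk import.
print("estimated:", coll.estimated_document_count())

# Exact count; slower on large collections but authoritative.
print("exact:", coll.count_documents({}))
```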
|
| Comments |
| Comment by Lucas Hrabovsky (Inactive) [ 09/Mar/20 ] |
|
On current master, the import took ~30 min against localhost: all 25M documents were created, but the source .csv is headerless, so the field names are munged into the values of the first row. Workarounds for avoiding the wacky field names in the meantime (illustrative sketches of both follow the list):

1. Manually add the headers as the first line of that .csv before importing. My best guess as to the field names is based on the docs.
2. Use the aggregation pipeline to clean it up (create a view, or materialize with $out/$merge). You'll need field mappings in a $project stage if you use that original .csv without adding headers:
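A minimal sketch of workaround 1 in Python, streaming the file rather than loading it into memory; the header string and file names below are illustrative placeholders, not the actual Land Registry column names:

```python
# Workaround 1 sketch: prepend a guessed header row to the headerless CSV.
# HEADER and both file names are hypothetical placeholders.
import shutil

HEADER = "transaction_id,price,date_of_transfer,postcode\n"  # guess, incomplete

with open("pp-complete-with-header.csv", "w", encoding="utf-8") as out:
    out.write(HEADER)
    with open("pp-complete.csv", "r", encoding="utf-8") as src:
        # Stream the 3.7GB body in chunks instead of reading it all at once.
        shutil.copyfileobj(src, out)
```

And a minimal sketch of workaround 2 with PyMongo. Because the headerless import turns the first data row's values into field names, the source keys in the $project stage below are placeholders — inspect one imported document to recover the real munged names. The database and collection names are also assumptions:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["test"]  # hypothetical database name

pipeline = [
    {"$project": {
        "_id": 0,
        "transaction_id": "$MUNGED_KEY_1",    # placeholder: real key is a first-row value
        "price": "$MUNGED_KEY_2",             # placeholder
        "date_of_transfer": "$MUNGED_KEY_3",  # placeholder
        # ...one mapping per CSV column...
    }},
    # Materialize the cleaned documents into a new collection;
    # a view on the same pipeline (or $merge) works as well.
    {"$out": "pp_complete_clean"},
]
db["pp_complete"].aggregate(pipeline)
```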
We currently have no maximum size constraint on the client side. |