Remove implicit thread creation in XEB sampling#6643
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #6643 +/- ##
==========================================
- Coverage 97.81% 97.81% -0.01%
==========================================
Files 1067 1067
Lines 91551 91547 -4
==========================================
- Hits 89550 89545 -5
- Misses 2001 2002 +1 ☔ View full report in Codecov by Sentry. |
|
Thanks, Nour! |
| new_records = run_batch(task) | ||
| if dataset_directory is not None: | ||
| os.makedirs(f'{dataset_directory}', exist_ok=True) | ||
| protocols.to_json(new_records, f'{dataset_directory}/xeb.{uuid.uuid4()}.json') |
There was a problem hiding this comment.
Would the order of records be useful here?
If so, it would be better to write json at the end when they are all collected in the records list or at least add an enumeration suffix to the filename.
There was a problem hiding this comment.
without threading the records are written in the same order as their inputs ... using threads with as_completed (the currently) behaviour wrote them in the order they were completed (so randomly) .... adding the suffix can work now but it will break when I reintroduce the optional threading pools ... eitherway this seems to be more of debug feature than anything else
There was a problem hiding this comment.
Sounds good, thanks.
@eliottrosenberg - can you chime in if this debugging feature is still needed?
If not, we should consider removing it in a separate PR.
There was a problem hiding this comment.
I'm not following at this level of detail; I am just a user here :)
There was a problem hiding this comment.
fyi, this part of the library was written 3+ years ago
There was a problem hiding this comment.
I see, let's then leave it as is.
pavoljuhas
left a comment
There was a problem hiding this comment.
LGTM, but please see comment on optional writing of json files.
XEB sampling creates threads inorder to speed sampling up. however this leads to a race condition when the sampler itself uses threads in its internal implementation. This PR removes the implicit behaviour
In a follow up PR I'll make passing a threading pool possible but optional
cc: @eliottrosenberg