BIDMach's Architecture
jcanny edited this page May 1, 2014 · 12 revisions
BIDMach has a modular design intended to make it easy to create new models, to run diverse datasources, and to tailor the performance measures that are optimized during training. A graphic of BIDMach's architecture appears below:

The elements of the architecture are:
- Datasources support a "next" method, which produces a minibatch of data, i.e. a block of samples of a specified size. The datasource itself may be backed by an array in memory, a collection of files on disk, or an HDFS source. In general, datasources output multiple matrices in response to the next method: for instance, a datasource for training regression models outputs a block of k samples as a sparse matrix and a block of k class membership vectors as a dense matrix. Some datasources also support a "putBack" method, which allows data to be pushed back into the datasource. Such sources are therefore both sources and sinks. For instance, a datasource for regression prediction has two "output" matrices: one contains the data instances to predict from, and the second contains the predictions.
- Models
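
To make the "next" / "putBack" contract from the Datasources item above more concrete, here is a minimal conceptual sketch in Python. It is not BIDMach's actual API: the class name `ArrayDataSource`, the `has_next` helper, and the use of plain Python lists in place of sparse and dense matrices are all assumptions made for illustration. It models a prediction datasource with two "output" blocks (instances and predictions) that acts as both a source and a sink.

```python
class ArrayDataSource:
    """Hypothetical in-memory datasource; an illustration, not BIDMach's API."""

    def __init__(self, data, batch_size):
        self.data = data                    # one row per sample
        self.preds = [None] * len(data)     # filled in later via put_back
        self.batch_size = batch_size
        self.pos = 0                        # start index of the next minibatch
        self.last = (0, 0)                  # row range of the most recent minibatch

    def has_next(self):
        # Assumed helper: True while unread samples remain.
        return self.pos < len(self.data)

    def next(self):
        # Return the next minibatch as two blocks: (instances, predictions).
        start = self.pos
        end = min(start + self.batch_size, len(self.data))
        self.last = (start, end)
        self.pos = end
        return self.data[start:end], self.preds[start:end]

    def put_back(self, predictions):
        # Push predictions for the most recent minibatch back into the source.
        start, end = self.last
        if len(predictions) != end - start:
            raise ValueError("predictions must match the last minibatch size")
        self.preds[start:end] = predictions


# Usage: a stand-in "model" that predicts the sum of each row.
source = ArrayDataSource([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], batch_size=2)
while source.has_next():
    instances, _ = source.next()
    source.put_back([sum(row) for row in instances])
```

After the loop, `source.preds` holds one prediction per input row, which is the sense in which such a datasource is both a source and a sink.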