Friday 24 March 2017

[WIP] Deep Learning hardware for the Commons

Deep learning requires huge amounts of processing power. Building such hardware in every household is infeasible in the near future (even if there is a breakthrough in hardware technology for ML, it will most likely be costlier than options already available).

All ML systems consist of two phases: training and prediction (or classification; inference, in general). Fortunately, the actual prediction process requires far less processing power than training.
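To illustrate that asymmetry, here is a minimal sketch in plain Python (all names and numbers are hypothetical): fitting even a one-variable linear model means many gradient-descent passes over the data, while a single prediction is just one multiply-add.

```python
# Minimal sketch: training (many passes) vs. prediction (one multiply-add).
# Fits y = w*x + b on noiseless toy data generated from y = 2x + 1.

data = [(x, 2 * x + 1) for x in range(10)]

w, b = 0.0, 0.0
lr = 0.01

# Training phase: repeated updates over the whole dataset.
for epoch in range(1000):
    for x, y in data:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err

# Prediction phase: a single multiply-add per input.
def predict(x):
    return w * x + b

print(round(predict(5.0), 2))  # close to 2*5 + 1 = 11
```

The training loop here does 10,000 parameter updates; the prediction is one expression, which is why it can run on far weaker hardware.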

It might therefore be useful to cooperatively design, build, and run one powerful machine for training, and then run the model produced by training on less powerful machines such as a Raspberry Pi.
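One way to sketch that hand-off, assuming all we need to ship is the learned parameters (file name and model shape hypothetical): serialize the weights on the training machine, then load them on the small device, which needs no training code at all.

```python
import json

# On the shared training machine: suppose training produced these
# weights for a linear model y = w*x + b (values hypothetical).
trained = {"w": 2.0, "b": 1.0}
with open("model.json", "w") as f:
    json.dump(trained, f)

# On the Raspberry Pi: load the weights and predict.
with open("model.json") as f:
    model = json.load(f)

def predict(x):
    return model["w"] * x + model["b"]

print(predict(3.0))  # 7.0
```

For real TensorFlow models the same idea applies, just with the framework's own serialized model format instead of a hand-rolled JSON file.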

Note that training need not end once and for all. We can keep collecting new data samples after deployment; not every application area caters to this process, but there are certainly areas where we can harness it.
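A rough sketch of that loop, in plain Python with hypothetical names: the deployed device buffers new labeled samples and triggers a retrain once enough have accumulated (in practice the retrain would run back on the shared training machine, not on the device).

```python
# Sketch: collect samples after deployment, retrain when the buffer fills.
BUFFER_LIMIT = 4
buffer = []
retrain_count = 0

def retrain(samples):
    # Placeholder for shipping the batch back to the training cluster.
    global retrain_count
    retrain_count += 1

def on_new_sample(x, y):
    buffer.append((x, y))
    if len(buffer) >= BUFFER_LIMIT:
        retrain(list(buffer))
        buffer.clear()

# Simulate ten samples arriving after deployment.
for i in range(10):
    on_new_sample(i, 2 * i + 1)

print(retrain_count)  # 2 full batches triggered; 2 samples still buffered
```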
 
A large system with all the bells and whistles.

A compute cluster with heterogeneous nodes: nodes with different combinations of CPUs/GPUs, and nodes with different kinds of FPGA-based accelerators (along with a compiler from TensorFlow model graphs to FPGA).
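One way to picture scheduling on such a cluster (node names and structure entirely hypothetical): tag each node with its accelerator types and route each job to a node that covers what it needs.

```python
# Sketch: route jobs on a heterogeneous cluster by accelerator capability.
nodes = [
    {"name": "node-a", "accel": {"cpu"}},
    {"name": "node-b", "accel": {"cpu", "gpu"}},
    {"name": "node-c", "accel": {"cpu", "fpga"}},
]

def pick_node(required):
    # Return the first node whose accelerators cover the job's needs.
    for node in nodes:
        if required <= node["accel"]:
            return node["name"]
    return None  # no capable node available

print(pick_node({"gpu"}))   # node-b
print(pick_node({"fpga"}))  # node-c
```

A real scheduler would also weigh load and queue depth, but capability matching is the piece heterogeneity adds.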

FPGAs can also be used to run trained models in deployment.

