
Osmosis 2020: Closing Keynote - Application Specific ML: Building and Executing ML Models at Ultra-High Speeds with Applications in Debugging of SoCs

Claudionor José Nunes Coelho, VP/Fellow of AI - Head of AI Labs, Palo Alto Networks

While the quest for more accurate solutions is pushing deep learning research towards larger and more complex algorithms, edge devices with hard real-time constraints demand very efficient inference engines, e.g. with reduced model size, latency, and energy consumption. In this talk, we introduce a novel method for designing heterogeneously quantized versions of deep neural network models for minimum-energy, high-accuracy, nanosecond inference and fully automated on-chip deployment. Our technique, called AutoQKeras, combines AutoML with QKeras to perform layer hyperparameter selection and quantization optimization together. Users can select among several optimization strategies, such as global optimization of network hyperparameters and quantizers, or splitting the optimization problem into smaller search problems to cope with search complexity. We have applied this technique to several designs, including the event selection procedure in proton-proton collisions at the CERN Large Hadron Collider, where resources are strictly limited and hard real-time latency must stay below 1 μs. Application-specific ML (ASML) has several further uses, including the creation of high-level ML-enabled light posts that can be used in bug hunting or post-silicon debugging using formal technologies.
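To make the idea of heterogeneous quantization concrete, the following is a minimal sketch in plain Python, not the AutoQKeras API or the search procedure presented in the talk. It assumes a simple signed fixed-point quantizer and a per-layer search that picks the smallest bit width whose worst-case rounding error stays within a tolerance. The names `quantize`, `max_error`, `choose_bit_widths`, and the example layers are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Sequence


def quantize(x: float, bits: int, integer_bits: int = 0) -> float:
    """Round x onto a signed fixed-point grid with `bits` total bits (illustrative)."""
    frac_bits = bits - 1 - integer_bits      # one bit is reserved for the sign
    scale = 2.0 ** frac_bits
    lo = -(2.0 ** integer_bits)              # most negative representable value
    hi = 2.0 ** integer_bits - 1.0 / scale   # most positive representable value
    return min(max(round(x * scale) / scale, lo), hi)


def max_error(weights: Sequence[float], bits: int) -> float:
    """Worst-case absolute quantization error over a layer's weights."""
    return max(abs(w - quantize(w, bits)) for w in weights)


def choose_bit_widths(
    layers: Dict[str, List[float]],
    tolerance: float,
    candidates: Sequence[int] = (2, 4, 8, 16),
) -> Dict[str, int]:
    """Pick, per layer, the smallest candidate bit width within the tolerance.

    This is a per-layer search (an assumption for illustration): each layer is
    treated as its own small problem instead of optimizing all layers jointly.
    """
    chosen = {}
    for name, weights in layers.items():
        fitting = (b for b in sorted(candidates) if max_error(weights, b) <= tolerance)
        chosen[name] = next(fitting, max(candidates))  # fall back to the widest option
    return chosen


# Hypothetical weights: the first layer sits exactly on a coarse grid, the second does not.
layers = {
    "dense_1": [0.5, -0.25, 0.75],
    "dense_2": [0.3, -0.7, 0.11],
}
print(choose_bit_widths(layers, tolerance=0.01))
```

In this toy setting the two layers end up with different bit widths, which is the sense in which the quantization is heterogeneous. A real flow would score candidates by model accuracy and an energy or resource estimate, not by weight rounding error alone.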
