Authors:
Abstract: We propose binarized-soft-tensor-core, a software-hardware co-design approach that provides bit-manipulation capability on modern GPUs so that they can effectively exploit the bit-level parallelism emerging in binarized neural networks (BNNs) and a variety of other domains. We further propose intra- and inter-layer fusion techniques that allow the entire BNN inference process to be realized in a single GPU kernel, which we call Singular-Binarized-Neural-Network. Experiments show that our design achieves over 1000x speedup in raw inference latency and 10x in inference throughput over state-of-the-art full-precision simulated BNN inference for AlexNet on ImageNet.
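As a hedged illustration of the bit-level parallelism mentioned above (a generic textbook formulation, not the binarized-soft-tensor-core implementation), the sketch below shows the XOR/XNOR-popcount trick that BNN kernels commonly rely on. Encoding +1 as bit 1 and -1 as bit 0, equal bits contribute +1 and differing bits contribute -1, so the dot product of two n-element vectors is n - 2 * popcount(a XOR b). The helper names `pack_bits`, `binary_dot`, and `reference_dot` are hypothetical and chosen only for this example.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int: bit i is 1 iff values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("BNN values must be +1 or -1")
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed n-element {-1, +1} vectors via XOR + popcount.

    Equal bits contribute +1 and differing bits -1, so
    dot = (n - mismatches) - mismatches = n - 2 * popcount(a XOR b).
    Bits at positions >= n must be zero in both inputs.
    """
    mismatches = bin(a_bits ^ b_bits).count("1")  # portable popcount
    return n - 2 * mismatches


def reference_dot(a, b):
    """Element-by-element dot product, used to check the bitwise version."""
    return sum(x * y for x, y in zip(a, b))


x = [1, -1, 1, 1, -1]
y = [1, 1, -1, 1, -1]
print(binary_dot(pack_bits(x), pack_bits(y), len(x)))  # → 1
```

On a GPU the same idea applies word by word: each 32-bit or 64-bit word processes that many binary multiply-accumulates with one XOR and one population count (for example, CUDA's `__popc` / `__popcll` intrinsics), which is where the bit-level parallelism comes from.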