BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20200129T163600Z
LOCATION:607
DTSTART;TZID=America/Denver:20191118T121000
DTEND;TZID=America/Denver:20191118T123000
UID:submissions.supercomputing.org_SC19_sess124_ws_lasalss109@linklings.co
m
SUMMARY:Toward Half-Precision Computation for Complex Matrices: A Case Stu
dy for Mixed Precision Solvers on GPUs
DESCRIPTION:Workshop\n\nToward Half-Precision Computation for Complex Matr
ices: A Case Study for Mixed Precision Solvers on GPUs\n\nAbdelfattah, Tom
ov, Dongarra\n\nLow-precision computations are popular in machine learning
and artificial intelligence (AI) applications. Hardware architectures, su
ch as high-end GPUs, now support native 16-bit floating point arithmetic (
i.e., half-precision). While half-precision naturally provides 2x/4x speed
ups over single/double precision, modern GPUs are equipped with hardware a
ccelerators for even higher FP16 performance. These accelerators, called t
ensor cores, have a theoretical peak performance that is 8x/16x that of FP
32/FP64, respectively. Such a high level of performance has encouraged res
earchers to harness the com
pute power of the tensor cores outside AI applications. \n\nThis paper pre
sents a mixed-precision dense linear solver (Ax = b) for complex matrices
using the tensor core units of the GPU. Unlike similar efforts that have d
iscussed accelerating Ax=b using real FP16 arithmetic, this paper focuses o
n complex precisions. The developed solution uses a "half-complex" precisio
n to accelerate the solution of Ax=b while maintaining single-complex preci
sion accuracy. The proposed solver requires a matrix multiplication k
ernel that can accept half-complex inputs. We discuss two possible designs
for such a kernel, and integrate both of them into a mixed-precision LU f
actorization. The other component of our solution is an iterative refineme
nt solver, which recovers the single-complex accuracy using a precondition
ed GMRES solver. Our experiments, which are conducted on a V100 GPU, show
that the mixed-precision solver can be up to 2.5x faster than a full singl
e-complex precision solver.\n\nTag: Workshop Reg Pass, Algorithms, Scalabl
e Computing\n\nRegistration Category: Workshop Reg Pass, Algorithms, Scala
ble Computing
URL:https://sc19.supercomputing.org/presentation/?id=ws_lasalss109&sess=se
ss124
END:VEVENT
END:VCALENDAR