Matrix multiplies have the property that as the size of inputs grow, the ratio of compute, O(n^3), to data access, O(n^2), improves. If you can dedicate hardware to speeding up arithmetic and coordinating data movement you can exploit this,
ai algorithms hardware