Right! `KP` is what LinearAlgebra:-KroneckerProduct does for the exact case. So it's hardly surprising that it too is much slower than the full LinearAlgebra:-LA_Main:-KroneckerProduct routine when dealing with float Matrices. You haven't done anything "wrong". This just illustrates the differences in approach.
Consider: LinearAlgebra:-KroneckerProduct calls LinearAlgebra:-LA_Main:-KroneckerProduct, and that latter routine branches on whether the data is exact or float. If the data is exact, it uses code just like `KP`. If the data is float, then it is free to convert the data to float, construct a much lighter float result container, and then do the entire computation in external compiled C.
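One rough way to see the two branches from the outside is to query the datatype of the result for an exact input and for a float input. This is only a sketch: the small Matrices below are made up for illustration, and it inspects the returned container rather than the internal dispatch itself.

```maple
with(LinearAlgebra):

# Exact (integer) input, made up for illustration.
A := Matrix([[1, 2], [3, 4]]):
B := Matrix([[0, 1], [1, 0]]):
K := KroneckerProduct(A, B):
rtable_options(K, datatype);    # query the datatype of the exact result

# The same entries as hardware floats.
Af := Matrix([[1.0, 2.0], [3.0, 4.0]], datatype = float[8]):
Bf := Matrix([[0.0, 1.0], [1.0, 0.0]], datatype = float[8]):
Kf := KroneckerProduct(Af, Bf):
rtable_options(Kf, datatype);   # query the datatype of the float result
```

If the float branch behaves as described, the second query should report a float datatype while the first reports a generic one.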
All the posted versions at top are much slower than LinearAlgebra:-KroneckerProduct for float Matrices. And they consume much more memory since -- unlike LinearAlgebra:-KroneckerProduct -- they haven't been designed to return a float Matrix as the result. They use fundamentally different approaches from what LinearAlgebra:-KroneckerProduct does in the float case. It doesn't make much sense to compare them in that way -- except of course for the purpose of seeing how differently they behave.
It might make sense to compare LinearAlgebra:-KroneckerProduct to new versions of the posted codes, redesigned for floats.
Or it might make sense to compare the posted routine `KP` to the others originally posted (ideally for exact examples), since that looks like the exact case code inside LinearAlgebra:-KroneckerProduct. I guess it might be kinda sorta interesting to compare `KP` against the others originally posted on large float examples, just as an experiment (while acknowledging that was not the designed purpose, but at least it's a "fair" comparison).
It makes sense to compare `KP` against the posted versions. Or to compare LinearAlgebra:-KroneckerProduct against the posted versions for exact examples (large or small). It doesn't make a lot of sense to compare LinearAlgebra:-KroneckerProduct against the posted codes (or `KP`) for float examples.
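A like-for-like comparison along those lines might be set up roughly as follows. This is a sketch under assumptions: the size `n` and the random test data are arbitrary choices of mine, and the commented-out call assumes `KP` (from earlier in this thread) has already been defined and takes two Matrices.

```maple
n := 30:   # arbitrary size, adjust to taste

# Exact test data: random integer entries.
A := LinearAlgebra:-RandomMatrix(n, n, generator = -10 .. 10):

# Float copy of the same data, stored as hardware floats.
Af := Matrix(A, datatype = float[8]):

# Exact case: this is the comparison that makes sense against KP and the others.
CodeTools:-Usage(LinearAlgebra:-KroneckerProduct(A, A)):
# CodeTools:-Usage(KP(A, A)):   # assumes KP is defined and called as KP(A, B)

# Float case: shown only to see the gap, not as a fair comparison.
CodeTools:-Usage(LinearAlgebra:-KroneckerProduct(Af, Af)):
```

CodeTools:-Usage prints the time and memory used by each call, so the exact and float runs can be read off side by side.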
The `KPKP` routine posted ran ok for me, without accessing out of range. Is it actually fastest on the original small exact symbolic test example?
Mathematical Software, Maplesoft