Featured Post

Last week, DeepMind announced their new AlphaEvolve software along with a number of new results improving optimization problems in a number of areas. One of those results is a new, smaller, tensor decomposition for multiplying two 4 × 4 matrices in 48 scalar multiplications, improving on the previous best decomposition which needed 49 multiplications (Strassen 1969). While the DeepMind paper was mostly careful about stating this, when writing the introduction, the authors mistakenly/confusingly wrote:

For 56 years, designing an algorithm with rank less than 49 over any field with characteristic 0 was an open problem. AlphaEvolve is the first method to find a rank-48 algorithm to multiply two 4 × 4 complex-valued matrices

This gives the impression that this new result is the way to multiply 4 × 4 matrices over a field with the fewest number of field multiplications. However, it's not even the case that Strassen's 1969 paper gave a faster way to multiply 4 × 4 matrices. In fact, Winograd's 1968 paper on computing inner products faster could be applied to 4 × 4 matrix multiplication to get a formula using only 48 multiplications. Winograd uses a trick 

which changes two multiplications of ab's into one multiplication of ab's and two multiplications involving only a's or only b's. The latter multiplications can be pre-computed and saved when calculating all the innerproducts in a matrix multiplication

    # i=1..4, 4*2 = 8 multiplications
    p[i] := -A[i, 1]*A[i, 2] - A[i, 3]*A[i, 4];
    # 4*2 = 8 multiplications
    q[i] := -B[1, i]*B[2, i] - B[3, i]*B[4, i];
    # i=1..4, j=1..4, 4*4*2 = 32 multiplications
    C[i,j] = p[i] + q[j] 
            + (A[i,1]+B[2,j])*(A[i,2]+B[1,j])
            + (A[i,3]+B[4,j])*(A[i,4]+B[3,j])

It is simple to verify that C[i,j] = A[i,..] . B[..,j] and that the above formula calculates all of the entries of C=A.B with 8+8+32=48 multiplications.

So, if Winograd achieved a formula of 48 multiplications in 1968, why is the DeepMind result still interesting? Well, the relative shortcoming of Winograd's method is that it only works if the entries of A and B commute, while tensor decomposition formulas for matrix multiplication work even over noncommutative rings. That seems very esoteric, but everyone's favorite noncommutative ring is the ring of n × n matrices. So if a formula applies to a matrix of matrices, that means it gives a recursive formula for matrix multiplication - an tensor decomposition formulas do just that.

The original 1969 Strassen result gave a tensor decomposition of 2 × 2 matrix multiplication using only 7 scalar multiplications (rather than the 8 required by the inner product method) and leads to a recursive matrix multiplication algorithm using O(N^log[2](7)) scalar multiplications. Since then, it has been proved that 7 is the smallest rank tensor decompostion for the 2 × 2 case and so researchers have been interested in the 3x3 and larger cases. Alexandre Sedoglavic at the University of Lille catalogs the best tensor decompositions at https://fmm.univ-lille.fr/ with links to a Maple file with an evaluation formula for each.

The previous best 4 × 4 tensor decomposition was based on using the 2 × 2 Strassen decomposition recursively (2 × 2 matrix of 2 × 2 matrices) which led to 7 2 × 2 matrix multiplications each requiring 7 scalar multiplications, for a total of 49. The new DeepMind result reduces that to 48 scalar multiplications, which leads to a recursive algorithm using O(Nlog[4](48)) scalar multiplications: O(N2.7925) vs. O(N2.8074) for Strassen. This is a theoretical improvment over Strassen but in 2024 the best known multiplication algorithm has much lower complexity: O(N2.3713) (see https://arxiv.org/abs/2404.16349). Now, there might be some chance that the DeepMind result could be used in practical implementations, but its reduction in the number of multiplications comes at the cost of many more additions. Before doing any optimizations I counted 1264 additions in the DeepMind tensor, compared to 318 in the unoptimized 4×4 Strassen tensor (which can be optimized to 12*7+4*12=132). Finally, the DeepMind tensor decomposition involves 1/2 and sqrt(-1), so it cannot be used for fields of characteristic 2, or fields without sqrt(-1). Of course, the restriction on characteristic 2 is not a big deal, since the DeepMind team found a 4 × 4 tensor decomposition of rank 47 over GF2 in 2023 (see https://github.com/google-deepmind/alphatensor).

Now if one is only interested in 4 × 4 matrices over a commutative ring, the Winograd result isn't even the best possible. In 1970, Waksman https://ieeexplore.ieee.org/document/1671519 demonstrated an improvement of Winograd's method that used only 46 multiplications as long as the ring allowed division by 2. Waksman's method has since been improved by Rosowski to remove the divisions see https://arxiv.org/abs/1904.07683. Here's that nice compact straight-line program that computes C = A · B in 46 multiplications.

And, here attached are five Maple worksheets that create the explicit formulas for the 4 × 4 matrix methods mentioned in this post, and verify their operation count, and that they are correct.

Strassen_444.mw  Winograd.mw  DM_444.mw Waksman.mw Rosowski.mw

If you are interested in some of the 2022 results, I posted some code to my GitHub account for creating formulas from tensor decompositions, and verifying them on symbolic matrix multiplications: https://github.com/johnpmay/MapleSnippets/tree/main/FastMatrixMultiplication

The clear followup question to all of this is:  does Maple use any of these fast multiplication schemes?  And the answer to that is mostly hidden in low level BLAS code, but generally the answer is: No, the straightforward inner-product scheme for multiplying matrices optimized memory access thus usually ends up being the fastest choice in most cases.

Featured Post

3347

 

For a long time, triggered by a disagreement with one of my teachers, I wanted to demonstrate that Euler equations are not absolutely necessary to reproduce gyroscopic effects. Back then, there were no computer tools like Maple or programming languages with powerful libraries like Python. Propper calculations by hand (combining Newton’s equations and vector calculus) would have required days without guarantee of immediate success. Overall, costs and risks were too high to go into an academic argument with someone in charge of grading students.

Some years ago, I remembered the unfinished discussion with my teacher and simulated with MapleSim the simple gyroscope with two point masses that I had in mind at the time. It took only 10 minutes to demonstrate that I was right. At least I thought so. As I discovered recently when investigating the intermediate axis theorem, MapleSim derives behind the scenes Euler equations. This devalued the demonstration.

This post is about a second and successful attempt of a demonstration with Maple employing Lagrangian mechanics.  A rotating system of three point masses connected by rigid struts is used. The animation below from the attached Maple worksheet exactly reproduces a simulation of an equivalent T-shaped structure of three identical masses presented here.

 

Lagrangian Mechanics

The worksheet uses Lagrangian mechanics to derive equations of motion. Only translational energy terms are used in the Lagrangian to prevent Euler equations from being derived. To account for the bound motion of the three point masses, geometric constraints with Lagrange multipliers were added to the Lagrangian L of the system. This lead to a modified Lagrangian Lthat can be used with dedicated Maple commands to derive with little effort a set of Lagrange’s equations of the first kind and the corresponding constraints

Ein Bild, das Text, Schrift, weiß, Reihe enthält.

KI-generierte Inhalte können fehlerhaft sein.

     Ein Bild, das Schrift, Handschrift, Text, Typografie enthält.

KI-generierte Inhalte können fehlerhaft sein.

(source https://en.wikipedia.org/wiki/Lagrangian_mechanics)

For the system of three point masses the above equations lead to 9 coupled second-order ordinary differential equations (ODEs) and 3 algebraic constraints 

(Maple output from the command Physics,LagrangeEquations)

where xi, yi and zi are the components of the position vectors i of the masses 1 to 3 and the li,j are the constraints between the masses i and j, and b and h are the base and the height of the triangle.

The 12 equations together are also referred to as differential algebraic equations (DAEs). Maple has dedicated solvers for such systems which make implementation easy. The most difficult part is setting the initial conditions for all point masses. In this respect MapleSim is even easier to use since not all initial conditions have to be exactly defined by the user. MapleSim also detects constraints that allow for a simplification of the problem by reducing the number of variables to solve. This leads automatically to 3 instead of 12 equations to be solved. Computational effort is reduced in this way significantly.

 

Newtonian Mechanics

One could argue now that the above is a demonstration with Lagrangian mechanics and not with Newtonian mechanics. To treat the system in a Newtonian way, the masses must be isolated and internal forces acting on each mass via the struts are applied to each mass and effectively become external forces. This leads to 9 ODEs with 27 unknows

Ein Bild, das Text, Schrift, Screenshot enthält.

KI-generierte Inhalte können fehlerhaft sein.

Following actio=reactio for each of the (massless) struts reduces the number of unknows to 18

Ein Bild, das Text, Handschrift, Schrift enthält.

KI-generierte Inhalte können fehlerhaft sein.

To solve for the 18 unknows, 9 more relations are required. 3 algebraic constraints that keep the distance between the masses constant have already been listed in the previous section. 6 further algebraic constraints can be established from the fact that the force vectors point towards the opposing mass (see also below).
The effort to solve this system of equations will be even greater but with the benefit of having information about the internal forces in the system.
Before making this effort, it is advisable to take a closer look at the equations of motion derived so far.

 

Forces and the "mysterious" Lagrange Multipliers

Rearranging equations of motion from Lagrangian mechanics to

Ein Bild, das Text, Schrift, Screenshot, Electric Blue (Farbe) enthält.

KI-generierte Inhalte können fehlerhaft sein.

and comparing this to the equations of motion from Newtonian mechanics yields in vector notation 

or more general for the forces

Ein Bild, das Schrift, Handschrift, Reihe, Typografie enthält.

KI-generierte Inhalte können fehlerhaft sein.

where the i are the position vectors of the individual masses and the  are the constraint forces between them.  

In the case of a triangle formed by struts all internal forces must act in the direction of the edges of the triangle. If they would not act this way, opposing pairs of  and   would create a torque around the struts which would lead to an infinite angular acceleration of the massless struts. The above equation confirms this reasoning: The internal forces act in the direction of the difference vectors between the position vectors of the masses (which describe the edges) and scale with lij.

The beauty of the Lagrange multipliers in this case is that they hit 3 birds (three components of the vectors ) with one stone. This reduces the number of unknowns.

However, the Lagrange multipliers are somehow mysterious because they do not represent a physical quantity, but they can be used to calculate meaningful and correct physical results.

What makes them even more mysterious is the fact that positions constraints can be expressed in different ways. In the above example the square of the distance between the masses is kept constant instead of the distance. There are many more possibilities to formulate the constraint of constant distance and each of them results in different multipliers lij with different units. In principle they should all work equivalently but might not all be usable with dedicated solvers.

According to the above equation, the internal forces in a strut scale with the Lagrange multipliers and the length of the strut. During the back and forth flip of the triangle in the above animation the forces vary which can be appreciated from the lij in the following plot. 

Ein Bild, das Diagramm, Reihe enthält.

KI-generierte Inhalte können fehlerhaft sein.

Some observations:

  • At the start, only the strut between the red and the green masses is tensed by centrifugal forces. This would have been intuitively expected.
  • At the start, the broken symmetry in the initial conditions is already visible by the imbalance of the forces in the two other struts.
  • At no time two forces are zero
  • There is never compression in two struts at the same time. The existence of compression forces renders any attempt to replace the struts by cables useless
  • Plotted together in 3D the Lagrange multipliers describe a seemingly perfect circle.

Ein Bild, das Diagramm, Reihe, Kreis, Origami enthält.

KI-generierte Inhalte können fehlerhaft sein.

The last observation in particular shows that both the Lagrange multipliers and the IAT still hold some secrets that need to be clarified:

  • Is it an exact circle? How to prove that?
  • Does a circle appear for all initial conditions and geometries (obtuse isosceles triangles) when the intermediate axis theorem manifests?
  • What does the circle represent? Is it a kind of an invariant between the Lagrange multipliers that allows the calculation of one force when the two others are known?
  • Is there an analytical representation of the circle as a function of the geometry and the initial conditions?
  • What determines the center, the radius and the orientation of the circle?

 

Conclusion

Overall, the approach of adding non-physical terms that are zero to a physical quantity (the difference between kinetic and potential energy) to derive something meaningful is not obvious at all and underlines the genius of Lagrange. For a system of three bound masses it could be used to calculate internal forces as opposed to the more common use of calculating external constraint forces. Beyond the lack of fully satisfying intuitive explanations, the IAT still offers unanswered scientific questions.

 

PS.: 

  • The exercise was a nice cross-check of MapleSim and Maple.
  • dsolve and odeplot are awsome

 

IAT_without_Euler_equations.mw



Flattening a Matrix

Maple 2024 asked by Harry Gars... 269 Yesterday

BarChart-Statistics

Maple 2024 asked by jalal 435 May 27