Postdocs, matrix multiplication, and WSJ: yet more shorties
I’m proud to say that Nick Hunter-Jones and Matteo Ippoliti—both of whom work at the interface between quantum information science and condensed-matter physics (Nick closer to the former and Matteo to the latter)—have joined the physics faculty at UT Austin this year. And Nick, Matteo, and I are jointly seeking postdocs to start in Fall 2023! Please check out our call for applications here. The deadline is December 1; you apply through AcademicJobsOnline rather than by emailing me as in past years.
The big news in AI and complexity theory this week was DeepMind’s AlphaTensor, and its automated discovery of new algorithms for matrix multiplication. (See here for the Nature paper.) More concretely, they’ve used AI to discover (among other things) an algorithm for multiplying 4×4 matrices, over finite fields of characteristic 2, using only 47 scalar multiplications. This beats the 49=7×7 that you’d get from Strassen’s algorithm. There are other improvements for other matrix dimensions, many of which work over fields of other characteristics.
Since I’ve seen confusion about the point on social media: this does not improve over the best known asymptotic exponent for matrix multiplication, which over any field, still stands at the human-discovered 2.373 (meaning, we know how to multiply two N×N matrices in O(N2.373) time, but not faster). But it does asymptotically improve over Strassen’s O(N2.81) algorithm from 1968, conceivably even in a way that could have practical relevance for multiplying hundreds-by-hundreds or thousands-by-thousands matrices over F2.
Way back in 2007, I gave a talk at MIT CSAIL’s “Wild and Crazy Ideas Session,” where I explicitly proposed to use computer search to look for faster algorithms for 4×4 and 5×5 matrix multiplication. The response I got at the time was that it was hopeless, since the search space was already too huge. Of course, that was before the deep learning revolution.
This morning, the Wall Street Journal published an article by Karen Hao about competition between China and the US in quantum computing. Unfortunately paywalled, but includes the following passage:
Meanwhile, American academics say it’s gotten harder for Chinese students to obtain visas to conduct quantum research in the U.S. “It’s become common knowledge that when Chinese students or postdocs come to the U.S., they can’t say they’re doing quantum computing,” says Scott Aaronson, director of the Quantum Information Center at the University of Texas, Austin.
Comment #1 October 7th, 2022 at 11:50 am
While AlphaTensor’s result implies a faster non-galactic algorithm for matrix multiplication than Strassen’s algorithm, with an exponent of \( \log_4 47 = 2.777\) as compared to Strassen’s \( \log_2 7 = 2.807\), the best known asymptotics for any non-galactic algorithm for matrix multiplication is based on Smirnov’s 3x3x6 matrix multiplication algorithm, with an exponent of 2.774 [0]. For more, see this blog post by Adam Goucher [1].
[0]: https://link.springer.com/article/10.1134/S0965542513120129
[1]: https://cp4space.hatsya.com/2022/10/06/matrix-multiplication-update/
Comment #2 October 7th, 2022 at 12:34 pm
Thank you for explaining the practical impact of the AlphaTensor discovery! It was not clear from the news why this was useful. I was also wondering if a regular/exhaustive computer search could find a similar optimization. In a virtuous feedback cycle, I wonder if this advance could lead to a faster deep learning algorithm itself. It is also good to know how far we are from the currently known ceiling (exponent) on the speed of matrix multiplication.
Comment #3 October 7th, 2022 at 12:37 pm
Hi Scott, their 4×4 matrix multiplication improvement only holds over F_2, so it may not yield practically relevant speedups. There is some discussion of the implications of their results in the comment here:
https://www.lesswrong.com/posts/5Zfyktwgz3rvAvZyL/paper-discovering-novel-algorithms-with-alphatensor-deepmind?commentId=sR3HpF4GGoLnEhEay
Comment #4 October 7th, 2022 at 1:29 pm
The 4×4 algorithm is only for mod2 matrix multiplication, mind you.
Comment #5 October 7th, 2022 at 2:00 pm
SR #3 and Ryan O’Donnell #4: (gasp) Thanks so much; I’d completely missed that crucial caveat! Edited the post (while in an Uber en route to Lenny Susskind’s 82nd birthday conference 🙂 )
Comment #6 October 7th, 2022 at 2:23 pm
Is there a paper summarizing the major algorithms that were discovered using AI/search? (Is this charge too vague?) What is the most important/relevant one?
Comment #7 October 7th, 2022 at 2:52 pm
Eric B #6: Cases where some aspect of an algorithm was optimized using computer search are probably too numerous to list. This is the first case I know about where an optimization relevant to complexity theory was done with crucial assistance from deep learning. Because of the restriction to fixed-size matrices, it’s still not optimization over an “unbounded” set of potential algorithms.
Comment #8 October 7th, 2022 at 2:53 pm
BTW Scott. I know this is off-topic from your thread, but I was wondering how long it took you to fully recover from COVID-19 (I saw your previous post from September about this). Are you still experiencing any lingering symptoms? (Hopefully not)
Comment #9 October 7th, 2022 at 4:18 pm
Scott —
There are only 2^32 matrix multiplications of 4×4 matrices in F_2. You could just store them all in a table. Like you, I had missed that this only applies over F_2;. Now I’m really not sure why this is exciting.
— Ernie
Comment #10 October 7th, 2022 at 4:44 pm
Aaron G #8: It took 2 weeks, probably because Paxlovid prematurely suppressed the virus before my body had learned to fight it, but without suppressing it entirely. If I had to do it over, I’d either skip the Paxlovid or else harangue a doctor into giving me more of it! 🙂
Thankfully, I don’t notice any lingering symptoms, and I choose to believe that my brain was unharmed (although it might also be convenient to blame long COVID for my next scientific error…).
Comment #11 October 7th, 2022 at 4:49 pm
Looking at the paper again, the particular case of reducing two 4×4 matrices from 49 to 47 multiplications is indeed only over F_2, but they have some other results that they say apply more generally. For instance, they found a way of multiplying a 4×5 by a 5×5 matrix that applies in general, assuming that I’m reading the paper p. 4 correctly (not a safe assumption).
Comment #12 October 7th, 2022 at 4:52 pm
Ernest Davis #9: What exactly do you mean by “232 multiplications”? Aren’t the relevant objects to be counted sequences of multiplication gates? Also, if it was easy to do by brute-force enumeration, why wasn’t it done before?
Comment #13 October 7th, 2022 at 5:11 pm
Probably I’m misunderstanding something. But it seems to me that there are only 2^16 different 4×4 matrices over F_{2} and therefore only 2^32 different multiplication problems. You can precompute them all and save them in a table.
As for why it wasn’t done before, my guess is that the explanation is that multiplying 2 4×4 Boolean matrices is not an important enough problem to be worth reserving 16 Gigabytes.
Comment #14 October 7th, 2022 at 5:39 pm
Ernest Davis #13: OK, but that’s answering a different question than the one both complexity theorists and the AlphaTensor folks care about. The latter question is: assuming you’re only allowed field operations (so, no random accesses to a giant table or whatever), what’s the minimum number needed to compute the matrix product?
Note that answers to the latter question generalize to all matrix sizes, via Strassen-like recursion. The table lookup approach wouldn’t do so.
Comment #15 October 7th, 2022 at 5:47 pm
@Ernest Davis — my understanding is that this approach should also work over any finite field of characteristic 2; and large fields of characteristic 2 are actually used for e.g. cryptography, so this is potentially useful.
Comment #16 October 7th, 2022 at 11:05 pm
Dear Scott,
There’s this very convenient website called archive.is; the key is to put archive.is/(URL). For example, you can go to http://archive.is/www.wsj.com/articles/china-competing-us-quantum-computing-11664997892 – you can Archive the page if it has not been already. In this case, it has been. So you get sent to https://archive.ph/pDKLV which is indeed non-paywalled.
RIP Aaron Swartz, coming up on 10 years, never forget.
-Ryan
Comment #17 October 8th, 2022 at 2:23 am
I gotta wonder if that reinforcement learning stuff can be used for cryptanalysis. There has been some success using SAT solvers to bash through some encryption steps. Similarly, I wonder if SAT solvers could attack binary matrix multiplication.
Comment #18 October 8th, 2022 at 10:55 am
Just to make it perfectly clear: there are many finite fields of characteristic 2, not just \(F_2\). For any positive integer n, the polynomials of degree < n with coefficients in \(F_2\) form such a field, wth \(2^n\) elements. These are the fields that this new algorithm works over.
Comment #19 October 8th, 2022 at 11:21 am
TonyK #18: Yes, I’d assumed as much, thanks!
Comment #20 October 8th, 2022 at 5:44 pm
asdf #17: SAT solvers have indeed been used to find binary matrix multiplication algorithms: see Heule, Kauers, Seidl https://arxiv.org/abs/1905.10192 and Courtois, Bard, Hulme https://arxiv.org/abs/1108.2830.
Comment #21 October 9th, 2022 at 10:31 am
I have a very basic question. Let A and B be 4×4 matrices with entries in some field. What is the simplest algorithm to compute C = AB in less that 112 field operations?
112 = 16*7. C has 16 entries, each computed as a dot product with 4 multiplications and 3 additions.
Comment #22 October 9th, 2022 at 1:13 pm
Everyone: I’ve been amused by multiple anonymous comments, which I’ve left in moderation, attacking me for supposedly not understanding the distinction between “field of characteristic 2” and “field with 2 elements.” If I indeed didn’t understand that, my abstract algebra professor from Cornell would surely be ashamed of me, and so would Avi Wigderson, with whom I coauthored the Algebrization paper which relies on finite fields extensively (albeit, usually of prime order for simplicity 🙂 ).
That wasn’t the issue at all. The Nature paper itself talks about matrices over “Z2” (with even that sufficiently buried that most other social media posts missed it). Once I saw the “Z2” part, I assumed it would work for any field of characteristic 2, but decided to leave that out, both because I wasn’t 100% certain (if it generalized that way, why didn’t they say so?) and because it was probably irrelevant to 98% of my readers anyway. I put it in after being explicitly nitpicked about it.
Comment #23 October 9th, 2022 at 2:16 pm
Reading round a little further about this: there was an interesting article last year (cited in the DeepMind Nature piece) on using SAT-solvers as a first step in finding solution sto the 3×3 x 3×3 case. (They found 17,000 new solutions with 23 multiplications, tying the existing record, but none that broke that record.)
https://www.sciencedirect.com/science/article/pii/S0747717120301139
There’s a lot of interesting material here, but one point in particular struck me “There is also a way to do it [3×3 x 3×3] with only 22 multiplications (Marakov, 1986), but this scheme requires the assumption that the coefficient ring is commutative, which prevents it from being applied recursively.” I’m puzzled how this could work. In solutions like Strassen’s or like the ones in the Nature DM article the multiplications are all (sum of As) * (sum of Bs) so it’s hard to see how commutativity could be an issue. Google Scholar doesn’t list an online version of (Marakov, 1986). Anyone know what is going on here?
Comment #24 October 9th, 2022 at 3:09 pm
First you were the “king of the incels,” and now you’re an idiot who doesn’t understand undergraduate abstract algebra, apparently. Why do these people seem so intent on attacking you for every single thing you say, no matter what? You have this harmless academic who just wants to talk about quantum computing and swarms of contemptuous idiots who want to make him miserable for no reason at all.
Comment #25 October 9th, 2022 at 4:13 pm
Ernest Davis #23: Rather than just (sum of entries of A)*(sum of entries of B), Makarov’s solution uses multiplications of the form (sum of entries of A and B)*(sum of entries of A and B). The assumption of ring commutativity allows the multiplication to expand as a sum of terms a*b (a an entry of A and b an entry of B) whereas it would also include terms b*a in a noncommutative setting. By the way, an algorithm for 3×3 matrix multiplication over a commutative ring with 21 multiplications was discovered recently: https://doi.org/10.1016/j.jsc.2022.05.002
Comment #26 October 10th, 2022 at 8:37 am
Is there any explanation of why deep learning methods were useful in this sort of search? I generally think of deep learning as showing up when there is some sort of categorization task that a human could do instinctively but couldn’t explain well, like sorting movies into genres or recognizing drawings of people. As I understand it, the role of deep learning in Alpha Go is to recognize “reasonable Go positions” in order to narrow the game tree, after which more standard forms of search are used.
So I am surprised to see it show up in a problem which is mathematically precise and which, to me, doesn’t seem to have much room for human perception.
Comment #27 October 10th, 2022 at 9:20 am
“The response I got at the time was that it was hopeless, since the search space was already too huge. Of course, that was before the deep learning revolution.”
So, how exactly is deep learning managing to be so efficient searching a space that’s too huge? Is it because of a certain characteristic of the problem itself? Do we know ahead of time what deep learning is going to be able to crack?
Comment #28 October 10th, 2022 at 5:09 pm
Curtis Bright #25: Thanks very much!
Comment #29 October 10th, 2022 at 11:58 pm
This is an interesting response to DeepMind’s paper: https://arxiv.org/abs/2210.04045 . They further improve the multiplication count for 5 x 5 multiplication over F_2, as well as find an alternative scheme for 4 x 4 multiplication over F_2 which attains the same count as DM did (they claim it is not isomorphic to the DM scheme).
Comment #30 October 11th, 2022 at 11:14 am
Interestingly, matrix multiplication mod 2 is relevant to quantum stabilizer simulation. Composing two stabilizer tableaus is effectively a matrix multiplication mod 2 problem. And when you need to measure a constant fraction of the qubits in a stabilizer simulation, the expensive part can be reduced to composing tableaus.
Normally it’s thought that in order to do the measurement you need to perform gaussian elimination, using row operations that correspond to right-multiplying the tableau by S, CZ, and CX operations. These operations are available because they do nothing to the state; because effectively they’re being placed immediately after resets where the controls are not satisfied so the operations do nothing. But because these operations can all be applied to bits, with the S and CZ operations having no effect while CX has the usual classical effect, and you are measuring the qubits into bits , you can also left multiply the state by these operations (at least within the subset that you are measuring) and therefore you have access to column operations. So instead of performing gaussian elimination you can perform diagonalization, and diagonalization can be done in a divide and conquer way that’s limited by matrix multiplication.
Comment #31 October 11th, 2022 at 12:56 pm
If I may, I have two more questions for the knowledgeable readers and writer of this blog. I had been taught, and, I’m sorry to say, have taught many of my own students, that Strassen’s algorithm was of purely theoretical interest; first, because you only gain the time advantage when N is too big for the multiplication to be carried out, and second because it’s numerically unstable. According both to the Wikipedia article on Strassen’s algorithm and to the article about using a SAT solver for 3×3 matrices that I linked earlier, the first is absolutely wrong, and the second is debatable; and in fact Strassen’s algorithm is used extensively in practice, certainly if one is doing exact arithmetic or integer arithmetic.
Question 1: Suppose that one or both matrices are sparse, as is the case in many practical applications. Can Strassen’s algorithm be adapted so that it is a practical improvement over “conventional” sparse matrix multiplication algorithms?
Question 2: The theoretical asymptotic running times of matrix inverse and matrix multiplication are the same. Does Strassen’s algorithm yield a practically useful algorithm for matrix inverse or for solving systems of linear equations?
Comment #32 October 15th, 2022 at 10:22 am
My understanding is that we usually count multiplications because they are more “expensive” than additions on computers (ignoring vectorisation etc), but in the case of binary coefficients isn’t the cost the same (AND VS XOR)? If so, the only metric that would matter is the total number of operations (multiplication+addition).
Comment #33 October 15th, 2022 at 12:36 pm
Kevin O #32: Again, the fundamental reason to count multiplications as more important than additions is that only the multiplications matter when you apply the algorithm recursively to get an asymptotic speedup.
Comment #34 October 20th, 2022 at 7:57 am
Fast linear algebra is numerically stable: https://arxiv.org/pdf/math/0612264.pdf
Comment #35 October 21st, 2022 at 12:12 pm
Isaac #1, What is an 3x3x6 matrix?