# A Lemma of Bateman-Katz

Here we present a formulation of a lemma from Bateman-Katz‘s work on the cap-set problem, which may be of independent interest in additive combinatorics at large. The proof is also novel — Ben Green knows of no other instances of this argument — and may yet find other applications. Indeed, Thomas Bloom recently used this lemma together with other bounds to give his improvement to Roth’s theorem.

[Our main result will be ‘Corollary 3’ below, which will follow from Lemma 1 and Theorem 2. ]

For any abelian group $G$ and finite $A\subseteq G$, we define the ‘2$m$-fold additive energy’  of $A$ by

$E_{2m}(A)=\left\lvert\{(a_{1},\cdots,a_{m},a_{1}^{\prime},\cdots,a_{m}^{\prime})\in A^{2m}:a_{1}+\cdots+a_{m}=a_{1}^{\prime}+\cdots a_{m}^{\prime}\}\right\rvert$

Any $(a_{1},\cdots,a_{m},a_{1}^{\prime},\cdots,a_{m}^{\prime})$ counted on the right-hand-side we call an ‘additive tuple’. The energy is one way of measuring the additive structure of $A$. The main theorems of this note link data concerning the energy to data concerning another notion of additive structure, namely (in the case where $G$ is a finite vector space) dimension.

Lemma 1

If $\lvert S\rvert = d$ and $\text{dim}(\text{span}(S))=d-k$, with $1\leqslant k\leqslant d$, then $E_{2m}(S)\leqslant 2^{2m+1}m^{4m+3}k^{2m}d^{m}$

Remarks on Lemma 1

1. There are about $m^{2m} d^{m}$ diagonal solutions counted in the additive energy, and — thinking of $m$ as much smaller than $d$ — the claim is exactly saying that (up to some tame factors) these dominate when $k$ is very small, i.e. when $S$ is almost completely linearly independent. This matches our intuition.
2. The claim is (worse than) trivial if $k$ is the same order as $d$.
3. We shall see that information about dimension meshes most naturally with equations in the elements of $S$, and not additive tuples. The count for the former is roughly equivalent to Bloom’s notion of ‘restricted energy’, and we will, like him, have to undergo a messy calculation to convert information about this restricted energy  into information about  $E_{2m} (S)$. The issue that $E_{2m} (S)$ cannot be calculated directly does genuinely seem to have been missed by B-K. Broadly speaking this means that their definition giving what it means for the large spectrum to be ‘additive smoothing’ is overly optimistic. I am currently trying to establish whether this has any major effect on their work; this will probably be the subject of a future blog post.
4. My $m$ dependence is extremely wasteful, and no doubt one can do much better. However, don’t be too scared of extra $m^m$ factors; in Bloom’s argument they roughly correspond to $\log\log N$ factors in the final bound. Also remember that there are $m^m$ factors in the trivial lower bound, so we cannot eschew them completely.
5. The reason we go to the trouble of proving these theorems for $E_{2m} (S)$ rather than the restricted energy is that  $E_{2m} (S)$ has a particularly pleasant expression in terms of the Fourier transform, which allows for many useful manipulations (in particular Holder’s inequality).

Viewing Lemma 1 in the contrapositive, it is saying that sets with larger-than-trivial energy have lower-than-trivial dimension. Theorem 2 is a more refined version, namely that sets with large energy have subsets with very low dimension.

Theorem 2

Let integers $d\geqslant 2m\geqslant\text{max}(2,2Ce)$ be arbitrary parameters and $C>0$ be independent of $d$.  Let V be finite vector space, and $A\subseteq V$ satisfy $E_{2m}(A)>\lvert A \rvert ^{2m}d^{-m}\left(2^{4m+3}m^{6m+4}C^{2m}+2m^{2m+1}\right)$. Then $\exists S\subseteq A$ with $\text{dim}(\text{span}(S))\leqslant d$ and $\lvert S\rvert \geqslant\dfrac{C}{d} \lvert A\rvert$

Remarks on Theorem 2

1. This theorem is good in the case where $d$ is a smallish power of $\lvert A\rvert$; B-K have $d\approx \lvert A\rvert ^{\frac{1}{3}}$. It gives some structural information even when the 2$m$-fold additive energy of $A$ is quite a long way below the maximum possible.
2. This is the first main ‘new’ result in Bateman-Katz’s paper, and is the main topic of this post.
3. $2m\geqslant 2Ce$ is just a minor technical condition that streamlines a particular estimation step. It could certainly be removed with a little more effort, and no effort at all if one didn’t care about how the constants in Theorem 2 depended on $m$.
4. One needs Lemma 1 as an ingredient in the proof of Theorem 2, applied to a suitably chosen random subset $S\subseteq A$.
5. For certain choices of the parameters we can use Theorem 2  to bootstrap an improved theorem regarding $E_{8}(A)$, say, which would be better than applying Theorem 2 directly with $m=4$.

Expanding on this last remark, we note that by Holder’s inequality we have

$E_{2m}(A)\geqslant \dfrac{E_{8}(A)^{\frac{m-1}{3}}}{\lvert A \rvert ^{\frac{m-4}{3}}}$

[This follows from the fact that $E_{2m}(A)=\sum\lvert \hat{A}(r)\rvert^{2m}$ and taking Holder in the form $\sum\lvert f\rvert ^{2k}\leqslant\left(\sum\lvert f \rvert^{2m}\right)^{\frac{k-1}{m-1}}\left(\sum\lvert f \rvert ^{2}\right)^{\frac{m-k}{m-1}}$ with $k=4$, followed by Parseval applied to the second bracket].

Therefore we can make the conclusion of Theorem 2 provided that, for some $m$,

$E_{8}(A)>\left(\lvert A\rvert ^{2m+\frac{m-4}{3}}\right)^{\frac{3}{m-1}}d^{\frac{-3m}{m-1}}\left(2^{4m+3}m^{6m+4}C^{2m}+2m^{2m+3}\right)^{\frac{3}{m-1}}$   (1)

Suppose we wanted to be able to take $d=\lvert A \rvert ^{\epsilon}$ for some fixed $\epsilon$. Applying Theorem 2 directly would only allow us to do so when $E_{8}(A)\gg\lvert A\rvert ^{8-4\epsilon}$ (so for example we would be forced to have $\epsilon>\frac{1}{4}$ to get a non-trivial result). However, equation (1) allows us to conclude whenever $E_{8}(A)\gg_{m}\lvert A\rvert ^{7+\frac{3}{m-1}-\frac{3\epsilon m}{m-1}}$. Taking $m$ large enough, depending on $\epsilon$, we win if $E_{8}(A)\gg_{\epsilon}\lvert A\rvert ^{7-2\epsilon}$, which is better for $\epsilon<\frac{1}{2}$. We state this as a corollary:

Corollary 3

Let $\epsilon\in (0,\frac{1}{2})$, and let V be finite vector space, and $A\subseteq V$. Then there exist a constant $K=K(\epsilon)$ such that if $E_{8}(A)\geqslant K(\epsilon)\lvert A\rvert ^{7-2\epsilon}$ then $\exists S\subseteq A$ with $\text{dim}(\text{span}(S))\leqslant \lvert A\rvert^{\epsilon}$ and $\lvert S\rvert\geqslant\lvert A\rvert^{1-\epsilon}$

Proof: Above discussion.

Proofs of Lemma 1 and Theorem 2 are below.

Proof of Lemma 1

Let $U$ be the formal vector space of dimension $d$ spanned by the elements of $S$. Let $E\subseteq U$ be the set of all linear combinations of the elements of $S$, with at most $2m$ variables, which equal $0$ when evaluated in $V$.

Now, consider the linear map from $U$ to $\text{span}(S)$ given by evaluation in $V$. This map is clearly surjective, so has a kernel of dimension $k$. But every element of $E$ is in this kernel, so $\text{span}(E)$ has a basis of at most $k$ elements of $U$, which we can assume — recalling the linear algebra fact that every spanning set contains a basis — consists of elements from $E$. These basis elements mention at most $2mk$ elements of $S$, so all equations in $E$ combined mention at most $2mk$ elements of $S$.

Regrettably $E$ does not contain all additive $2m$-tuples $(a_{1},\cdots,a_{m},a_{1}^{\prime},\cdots,a_{m}^{\prime})$. There are two phenomena which are not counted in $E$:

1. We could have $a_{i}=a_{j}^{\prime}$ for one or more pairs $(i,j)$ (diagonal behaviour).
2. We could have repetition of $a_{i}$ on one side of the equation that is some multiple of the characteristic of the underlying field.

To prove the claim all we do is take both of these into account. However, the reader is allowed to find this simple process rather painful. There is a similar — and equally painful — lemma in Bloom.

Indeed, a particular  additive $2m$-tuple could have $r$ pairs $a_{i}=a_{j}^{\prime}$, say. There are $(m(m-1)\cdots (m-r+1))^2$ choices for the positions of these pairs, and $d^{r}$ choices for what the elements are. We remove these pairs from both sides to get another equation, in which no $a_{i}$ equals any $a_{j}^{\prime}$.

The elements mentioned in this equation only come from that special set of size at most $2mk$ except those elements which appear a multiple of $p$ times on one side of the equation, where $p$ is the characteristic of the underlying field.

Let’s say that there are $np$ such elements $a_{i}$. Then there are

$\left(\begin{matrix} m-r\\p,p,\cdots,p,m-r-np\end{matrix}\right)$

ways of choosing the subsets of repeated indices (where there are $n$ instances of $p$ in the multinomial coefficient) and $d^n$ way of assigning elements of $S$ to these repeated blocks. [Of course this overcounts tuples where a particular element of $S$ is repeated $2p$ times, but we’re only after an upper bound]. There are at most $(2mk)^{m-r-np}$ ways of filling in the remaining $a_{i}$.

By performing a similar analysis for the $a_{j}^{\prime}$ we see that

$E_{2m}(S)\leqslant\sum\limits_{r=0}^{m} (m(m-1)\cdots (m-r+1))^{2} d^{r}\left(\sum\limits_{n=0}^{\frac{m-r}{p}}\left(\begin{smallmatrix} m-r\\p,p,\cdots,p,m-r-np\end{smallmatrix}\right) d^{n}(2mk)^{m-r-np}\right)^{2}$

which — approximating factorials by powers, forgetting about the denominator of the multinomial coefficient completely, and taking the maximum value of everything in the sum — is at most

$\sum\limits_{r=0}^{m}m^{2r}d^{r}\left(\left(\dfrac{m-r}{p}+1\right)m^{m-r}d^{\frac{m-r}{p}}(2mk)^{m-r}\right)^2$

which — as $p\geqslant 2$ — is at most

$(m+1)m^{2m+2}(2m)^{2m}k^{2m}d^{m}$

which is at most

$2^{2m+1}m^{4m+3}k^{2m}d^{m}$

as claimed.

Remark: When $k=0$ we only get contribution from terms which come entirely from phenomena 1 and 2. Running the above calculation we get a bound of $2m^{2m+1}d^{m}$ in this case.

Proof of Theorem 2

We prove the contrapositive statement. Suppose every $S\subseteq A$ with $\text{dim}(\text{span}(S))\leqslant d$  satisfies $\lvert S\rvert \leqslant \frac{C}{d}\lvert A\rvert$. We will show that this assumption makes a random subset of $A$ of size $d$  have small energy, and therefore that the entire set $A$ has small energy.

Indeed, let $S\subseteq A$ be a subset of size $d$ picked uniformly at random.

Claim: For any integer $k$ in the range $0\leqslant k\leqslant d$, we claim that $\mathbb{P}(\text{dim}(\text{span}(S))=d-k)\leqslant \dfrac{C^k}{k!}$.

Proof of Claim: We can consider picking the elements of $S$ one at a time, uniformly, without replacement. For $0\leqslant l\leqslant d-1$, let $S_l$ be the set constructed after $l trials, and $a_{l+1}\in A$ be the next chosen element. Then

$\mathbb{P}\left(\text{dim}(\text{span}(S_{l}\cup\{a_{l+1}\})=\text{dim}(\text{span}(S_{l}))\right)=\mathbb{P}\left(a_{l+1}\in\text{span}(S_{l})\right)\leqslant\dfrac{C}{d}$

by assumption, since $\text{span}(S_{l})\cap A$ is certainly a subset of $A$ whose span has dimension at most $d$.

Now, the events $X_{l}:= a_{l+1}\in \text{span}(S_{l})$ are not — in general — independent. Indeed, suppose that $V=\mathbb{F}_{3}^{N}$ and let $A=\{u,-u,v,-v\}$, for some linearly independent vectors $u$ and $v$. Then $X_2$ and $X_3$ both have positive probabilities but $\mathbb{P}(X_{2}\cap X_{3})=0$. [Bloom’s paper uses a slightly different probabilistic model in which we pick with replacement, but this example still scuppers independence even in that instance].

However, it is true that $\mathbb{P}\left(\bigcap\limits_{l\in L}X_{l}\right)\leqslant \prod\limits_{l\in L}\mathbb{P}(X_{l})$. Indeed, if any of the $X_l$ have probability zero then the above inequality is trivial. Otherwise, if $l_{1} < l_{2}$ then conditioning on $X_{l_{1}}$ cannot increase $\text{span}(S_{l_{2}})$ hence cannot increase $\mathbb{P}(X_{l_{2}})$, i.e. $\mathbb{P}(X_{l_{2}}\vert X_{l_{1}})\leqslant\mathbb{P}(X_{l_{2}})$.  This shows the inequality for pairwise intersections. We may show similarly that $\mathbb{P}\left(X_{l}\vert\bigcap\limits_{l\in L^{\prime}}X_{l}\right)\leqslant\mathbb{P}(X_{l})$ whenever $\text{max}(L^\prime). Writing out $\mathbb{P}\left(\bigcap\limits_{l\in L}X_{l}\right)$ as a suitable product of conditional probabilities, this proves the inequality.

[In B-K and in Bloom, it is not entirely obvious from the wording whether independence is implicitly (and incorrectly) assumed or whether this weaker statement is (correctly) assumed.]

The probability that $\text{dim}(\text{span}(S))=d-k$ is exactly the probability that $k$ of the events $X_l$ occur. By the above discussion, we have

$\mathbb{P}(\text{dim}(\text{span}(S))=d-k)\leqslant \left(\begin{smallmatrix} d\\k\end{smallmatrix}\right)(\dfrac{C}{d})^k\leqslant\dfrac{C^k}{k!}$

thus proving the claim.

Claim: With $S$ the random set of size $d$ constructed earlier, $\mathbb{E}(E_{2m}(S))\leqslant d^{m}\left(2^{4m+3}m^{6m+4}C^{2m}+2m^{2m+3}\right)$

Proof of claim: We have $\mathbb{E}(E_{2m}(S))=\sum\limits_{k=0}^{d}\mathbb{E}(E_{2m}(S)\vert \text{dim}(\text{span}(S))=d-k)\mathbb{P}(\text{dim}(\text{span}(S))=d-k)$

which, by Theorem 1 and the remark following its proof, and the previous claim, is at most

$2^{2m+1}m^{4m+3}d^{m}\sum\limits_{k=1}^{d}k^{2m}\dfrac{C^k}{k!}+2m^{2m+1}d^{m}$

It remains just to bound this sum. The infinite sum $\sum\limits_{k=1}^{\infty}k^{2m}\dfrac{C^k}{k!}$ is clearly convergent;  we observe that if $k\geqslant2C\left(1+\dfrac{1}{k}\right)^{2m}$ then  $\dfrac{\left(\frac{k+1}{k}\right)^{2m}C}{k}\leqslant \dfrac{1}{2}$.  and we can bound the sum by the ratio test. If $k\geqslant \text{max}(2m,2Ce)$ then the above certainly holds. By assumption, $2m\geqslant 2Ce$, hence we may split the sum as

$\sum\limits_{k=1}^{2m-1}k^{2m}\dfrac{C^k}{k!}+\sum\limits_{k=2m}^{\infty}k^{2m}\dfrac{C^k}{k!}$

and bound both bits as

$2m(2m)^{2m}C^{2m}+(2m)^{2m}\dfrac{C^{2m}}{(2m)!}\cdot 2$

which is certainly at most

$2^{2m+2}m^{2m+1}C^{2m}$

Inputting this into the upper bound for the expected energy, and adding in the $k=0$ contribution, we get the bound

$\mathbb{E}(E_{2m}(S))\leqslant d^{m}\left(2^{4m+3}m^{6m+4}C^{2m}+2m^{2m+1}\right)$

as claimed.

We are now nearly ready to finish to proof of the Theorem 2. Indeed, we can easily bound $\mathbb{E}(E_{2m}(S))$ below by noting that the probability that a certain additive $2m$-tuple is entirely contained in $S$ is at least $\left(\dfrac{d}{\lvert A \rvert}\right)^{2m}$ and so by linearity of expectation $\mathbb{E}(E_{2m}(S))\geqslant\left(\dfrac{d}{\lvert A \rvert}\right)^{2m}E_{2m}(A)$. Therefore, combining our upper and lower bounds, we get

$E_{2m}(A)\leqslant \lvert A \rvert ^{2m}d^{-m}\left(2^{4m+3}m^{6m+4}C^{2m}+2m^{2m+1}\right)$

as required.