Math 256x: The Theory of Error-Correcting Codes (Fall 2013)

This year’s Math 256x is a “topics class” (taught here before, but last in 2001) that develops the mathematical theory of error-correcting codes and its connections with other areas of mathematics ranging from combinatorics to group and representation theory to projective and algebraic geometry. We meet Tuesdays and Thursdays from 11:30 AM to 1:00 PM in Room 309Room 109 of the Science Center.

If you find a mistake, omission, etc., please let me know by e-mail.

September 3: Overview
September 5: Introduction: Hamming distance and Hamming space; (n, M, d) codes; transmission and error-detection rates; the Singleton bound and MDS codes; the Hamming (sphere-packing) bound and perfect codes; the q-entropy function and the asymptotic form of the sphere-packing bound; the Gilbert-(Shannon-)Varshamov bound and our asymptotic embarrassment
September 10: Linear codes (and isometries of Hamming space)
First problem set: continuity of the asymptotic rate functions; a bit on spherical codes; binary codes with 2d > n (a foretaste of the “linear programming” bounds); optimality and uniqueness of the [7, 4, 3]2 (dual Hamming) code
September 12: No Class
September 17: Linear codes and point configurations in projective space (and introduction of several structures and constructions that will recur later in the semester)
Second problem set [Corrected Sep.18, see 3ii]: More about point configurations and the associated linear codes
September 19: Codes and point configurations, cont’d; Segre’s theorem on ovals in algebraic projective planes of odd order
September 24: More on conics, especially over finite fields, and projective spaces in general; rational normal curves and classical Goppa codes.
We’ll use these Lecture notes
September 26: Introduction to weight enumerators and the MacWilliams identity
See also Part II of my Notices article “Lattices, Linear Codes, and Invariants” (Part I is in the analogous picture for lattices in Euclidean space, which is developed to an extent later in the course).
October 1: Some uses of the MacWilliams identity; Gleason on the Hamming weight enumerators of self-dual binary codes
October 3: Gleason for self-dual codes of Type II, III, and IV
October 8: Existence and uniqueness of the binary Golay codes
Third problem set: Synthemes, totals, and the projective plane of order 5
October 10: The binary Golay codes and related structures, cont’d
October 15: Extremal enumerators and codes; the Mallows-Odlyzko-Sloane theorem
October 17: The sphere-packing problem; lattices and their theta series
Table listing, in several code and lattice contexts, the parameters q, F, Δ, q0 and 1/Δ(q0), and the first value m0 for which the extremal weight enumerator or theta series has a negative coefficient. (The m0 information can also be found in Theorem 29 on page  of Rains and Sloane’s chapter “Self-Dual Codes” from the Handbook of Coding Theory, citing a preprint by Zhang Shengyang that meanwhile was published in Discrete Applied Math. 91 (1999), 277–286.)
October 22: Overview of sphere-packing bounds; theta series of unimodular lattices; extremal theta functions and lattices
October 24: Type I and Type II lattices (a.k.a. odd and even unimodular lattices) and their theta series; Construction A
October 29: Asymptotics of kissing numbers of extremal codes and lattices via the stationary-phase method
October 31: Asymptotic impossibility of (nearly-)extremal codes. Start on Reed-Muller codes
November 5: Reed-Muller codes, cont’d
November 7: a bit more on Reed-Muller codes; introduction to cyclic codes
November 12: Cyclic codes cont’d: the BCH bound and BCH codes; QR codes, and Golay codes as QR codes
November 14: Krawtchouk polynomials and Lloyd’s theorem on perfect codes
November 19: Introduction to the linear programming (LP) bounds on error-correcting codes
November 21: orthogonal polynomial basics
November 26: LP bounds II: an asymptotic upper bound
November 28: No Class (Thanksgiving break)
December 3: The ternary Golay codes and related objects

Tuesday, Sep. 3: Overview

The general motivating setup from information theory for error-correcting block codes (as opposed to other kinds such as convolutional, let alone cryptographic: we’re aiming to protect against error, not eavesdropping). Natural languages such as English are very suboptimal error-correcting codes (fingerpuinting?); warning about real-world™ errors beyond the usual scope of the mathematical theory (eror-collecting colds). Coming attractions, drawing on combinatorics, symmetry and groups, linear algebra, finite geometry, invariant theory, orthogonal polynomials, etc. Example: the binary Golay code G23, suggested by the coincidence

1 + 23 + (23·22)/2! + (23·22·21)/3! = 1 + 23 + 253 + 1771 = 2048 = 211,

has point stabilizer the sporadic Mathieu group M23. Another example: encoding polynomials of degree <k on Fq by their values on all q field elements yields a code that can detect qk errors; thus it can correct (qk)/2 errors, and in this special case we’ll see that that this is computationally feasible (whereas in general finding the nearest codeword in a linear code is NP-complete).

[formalities about texts, office hours, grading, etc.; see the initial handout]

Tuesday, Sep. 5: Introduction

General setup for block codes:

Fix a finite set A (“alphabet”) of q>1 elements (“letters”).
[q=1 is vacuous,
as already observed by Lewis Carroll — in general, a letter from an alphabet of size q carries log2(q) bits’ worth of information.]

For any integer n>0 consider the set An of n-letter “words” w = (w1, …, wn). Define the Hamming distance between two words w, w as the number of coordinates i where wiwi; that is, d(w, w’) is the number of one-letter changes (errors) one must make in w to reach w’. Exercise: This is in fact a distance function. We can thus regard An as a metric space, known as Hamming space.

A code is a nonempty subset C of Hamming space. C is said to be binary, ternary, quaternary, “etc.” if the alphabet is of size 2, 3, 4, etc. The integer n is the length of the code. The key parameters of C are q, n, the number of elements (“codewords”) of C, and the minimum (nonzero) distance dmin between codewords. [If there is only one codeword then it is sometimes convenient to set dmin = n + 1, rather than dmin = ∞ which is the usual convention for the minimum of an empty set.] We use the notation “(n, M, d) code” for a code of length n, size M, and minimum distance d. A code of minimum distance at least d is (d−1)-error detecting, and thus also e-error correcting if d > 2e. Sometimes the alphabet size q is incorporated into the notation as a subscript, making C an “(n, M, d)q code”.

A basic question of coding theory is how large M can get given q, n, and d (or more-or-less equivalently how large d can get given q, n, and M). As is often the case in this kind of combinatorics, except in trivial cases it is rare that we can get an exact answer for a given choice of q, n, and M or d, and even the asymptotic behavior is not known: we can only give lower and upper bounds, which hardly ever coincide. Here one standard asymptotic direction is to fix q, let n→∞, and try to make the transmission rate R = (logqM) / n and the error-detection rate δ = d / n both large. One of our recurring themes in this course will be such bounds and asymptotic bounds, and the rare and often special cases where the bounds agree and we get a provably optimal code.

An elementary upper bound is due to Singleton (1964): an (n, M, d) code must have Mqnd+1. The proof is an easy pigeonhole argument. (Note that when M = 1 we must use the convention d = n + 1 for the Singleton bound to hold in this case.) Equivalently, R + δ ≤ (n + 1) / n. Codes that attain this bound are known as maximum distance separable, or MDS for short. We shall see that q is a prime power such a code exists for all d and n as long as 0 ≤ d−1 ≤ nq+1, but the behavior for n much larger than q is very different. Indeed, it follows from the Singleton bound that asymptotically R + δ ≤ 1, but we’ll see that for fixed q and large n this bound cannot be attained except for (R,δ)=(1,0) and (0,1), and moreover that if R > 0 for an infinite family of codes with fixed q then 1−δ is bounded away from zero.

Two trivial examples of MDS codes are An itself (with d=1) and a one-word code (with d=n+1 — so yes, a singleton code attains equality in the Singleton bound). Only slightly less trivial is the example with d=n of the repetition code consisting of the q words of the form (a, a, …, a). A final easy example is a single checksum code with d=2 (also called a parity check code in the case q = 2): give A a group structure (usually abelian, though this isn’t necessary), and let C consist of the qn−1 words whose entries sum to zero. A curious variant is the ISBN, which is in effect the (10, 11·109, 2)11 code obtained from an MDS code by discarding all codewords that use 10 for any but the last letter; this code (even in its unmutilated MDS form) also has the feature of detecting single-transposition errors, because it is defined by an F11-linear relation with pairwise distinct coefficients.

An example of an optimal code that is not MDS is the Hamming code (1950), with parameters q = 2 and (n, M, d) = (7, 16, 3). [NB an MDS code with the same n and M would have had d=4.] The 7 coordinates are identified with the vertices of the Fano plane P2(F2); the alphabet A is F2 = {0,1}, and the codewords are the all-zero word, the characteristic functions of the lines, and the ones’ complements of these 1+7=8 words. Check that each codeword c is at distance 3 from seven codewords and at distance 4 from seven other codewords, with the remaining codeword other than c itself being the antipodal word, at distance 7. Since no two codewords are at distance less than 3, the “Hamming spheres” (actually Hamming-distance balls) of radius 1 about the codewords are disjoint; but each contains 1+7=8 words, and there are 16 of them, so they tile the 27 words of A7 perfectly, proving that M=16 is optimal given q=2, n=7, and d=4.

In general, the sphere-packing bound (a.k.a. the Hamming bound) is the observation that if d > 2e then M can be no larger than qn divided by the number of words in a Hamming ball of radius e. A code that attains this bound is said to be perfect. This happens very rarely: besides generalizations of the Hamming code that we’ll see before long, all of which are perfect single-error-correcting codes, the only known perfect codes are the trivial ones of a one-word code and a binary repetition code of odd length, and the very nontrivial examples of the binary and ternary Golay codes. It is known (and we may show towards the end of the class) that these are the only perfect codes for which q is a prime power. [Warning: the condition that qn be an exact multiple of the sphere volume is necessary but not sufficient; a notable example: it is known (and we’ll prove) that there is no perfect binary (90, 278, 5) code.]

What does the Hamming bound look like in the asymptotic regime where we fix q and let n and d grow while their ratio δ remains roughly constant? We need the following fundamental computation. The number of words at distance exactly e from a given word is (q−1)e Binomial(n, e). If we fix q and let n, e → ∞ with e/n → ε for some fixed ε in (0,1), then we can estimate Binomial(n, e) using Stirling’s approximation. We find that log(Binomial(n, e)) is asymptotic to nH(ε) where H is the (binary) entropy function

H(ε) = −ε log ε − (1−ε) log (1−ε).

This gives us the logarithmic asymptotics of the count (q−1)e Binomial(n, e) for q=2; in general we must add a term e log(q−1) to find that our count is asmyptotically exp(nHq(ε)+o(n)), where Hq is the q-entropy function

Hq(ε) = −ε log ε − (1−ε) log (1−ε) + ε log (q−1).

For any q, this function Hq is convex downwards, with vertical tangents at (ε, Hq(ε)) = (0, 0) and (1, log(q−1)). Thus Hq has a unique maximum in (0,1), and it’s an elementary calculus exercise to locate this maximum at ε = (q−1)/q, where Hq(ε) = log q. [The location and value of the maximum were predictable in advance — do you see why? Likewise the symmetry H(ε) = H(1−ε) of the binary entropy function.]

It soon follows that the number of words at distance at most e from a given word is given by the same asymptotic formula exp(nHq(ε)+o(n)) for ε ≤ (q−1)/q, and is (1−o(1)) qn once ε > (q−1)/q. Therefore the asymptotic form of the sphere-packing bound is

R ≤ 1 − (Hq(δ/2) / log(q)).

This bound is better (i.e. smaller) than the asymptotic Singleton bound for small δ (because of the vertical tangent at δ=0), and for all δ in (0,1) when q=2 (because the two bounds agree at the endpoints and the function 1 − (H(δ/2) / log(2)) of δ is convex upwards).

Replacing e by d−1 in the Hamming bound yields the Gilbert-(Shannon-)Varshamov bound, which is a lower bound: there exists an (n, M, ≥d)q code provided that qn is no larger than the product of M with the number of words in a Hamming ball of radius d−1, because until that happens we can keep adding words to C that do not bring its minimum distance below d. The asymptotic form of this lower bound is

R = 1 − (H(δ) / log(q))    (δ < (q−1)/q).

It is a perennial source of embarrassment that, while the Gilbert-Varshamov bound is quite weak for small n, the asymptotic form is very hard to beat; the only improvements known use surprisingly sophisticated techniques from number theory (curves with many points over finite fields), and for small q (all q up to about 40) the Gilbert-Varshamov bound is still the best we have for all δ. [There are similar embarrassments elsewhere in combinatorics, as in Ramsey theory and Euclidean sphere packing in high dimensions, where the probabilistic method is asymptotically better than any known explicit construction.]

Tuesday, Sep. 10: Linear codes

We usually take for q a prime power, and identify A with a finite field, which as usual we denote by F [the alternative k, for German Körper, is pre-empted as we shall see a couple of paragraphs below]. Hamming space is then the F-vector space Fn, and the Hamming distance is translation-invariant, so

d(w, w’) = d(0, w’−w) = wt(w’−w)

where “wt” is the (Hamming) weight: the weight of any word w is its number #{i: wi ≠ 0} of nonzero coordinates. This is a discrete norm on Fn, satisfying the triangle inequality wt(w+w’) ≤ wt(w) + wt(w’) for all w, w, and also the homogeneity wt(cw) = wt(w) for all words w and nonzero scalars c.

A code C ⊆ Fn is said to be linear if it is closed under addition and scalar multiplication, that is, if it is a vector subspace of Fn. This is a very special case, but an important one; we shall soon say more about the advantages and disadvantages of linear codes vs. unrestricted codes, but for now we note that almost all our examples of codes from the previous lecture become linear when A is identified with a finite field, including the Hamming (7, 16, 3) code (check this!). This is clearly true for An and the repetition code; a one-word code is linear iff it is the zero word; the one-checksum code is linear provided we chose (F,+) for the group structure; and the ISBN code is just a one-checksum code over the 11-element field with some of its codewords removed without increasing the minimum distance.

Suppose C is a linear (n, M, d) code. Then M = qk where k is the dimension of C as an F-vector space; and d is the minimum (nonzero) weight mincC, c≠0 wt(c) because C is closed under subtraction. A linear code’s basic parameters of distance, dimension, and minimum distance are given in square brackets: a linear (n, qk, d) code is an [n, k, d] code (again with optional subscript q). The one-word code, repetition code, Hamming code, single-checksum code, and the code Fn have parameters [n, 0, n+1] (or [n, 0, ∞]),  [n, 1, n],  [7, 4, 3],  [n, n−1, 2],  [n, n, 1] respectively.

The rate R = (logqM) / n of an [n, k, d] code is simply k/n, and the Singleton bound is k + dn + 1. This makes sense, and remains true, even when F is not assumed finite, though then our proof (which used the pigeonhole principle) does not work. We shall rarely study linear codes over infinite fields, but can easily prove the Singleton bound for a linear code over an arbitrary field: choose any d−1 coordinates, and note that the subspace of C consisting of codewords supported on those coordinates must be zero; but this subspace has codimension at most nd+1 in C, so nd+1 is an upper bound on dim(C). A linear MDS code is a k-dimensional code whose words supported on any nm coordinates constitute a subspace of dimension max(km, 0). This is the usual behavior when F is infinite, which is why we rarely do coding theory over infinite fields (but see “compressive sensing” for a recent nontrivial analogue of linear error-correcting codes that works in the context of linear algebra over R or C).

The sphere-packing bound of course holds for linear codes as a special case. The Gilbert-Varshamov bound holds for linear codes as well, though here we must prove it anew. One way is to mimic our proof for arbitrary codes, adding one basis vector at a time while the Hamming spheres of radius d−1 have not yet filled Fn, and using the invariance of the Hamming metric under translation. We obtain the same bound, possibly improved by a small factor (no larger than q, and thus asymptotically negligible). Alternatively, we can choose a k-dimensional subspace of Fn randomly (with each subspace equally likely), and estimate the probability that it has minimum weight at least d by computing the average number of nonzero codewords of weight <d. As long as this average is less than 1, the probability of success is positive; indeed it suffices for the avearge to be under q−1 (why?). The largest k for which this argument works, call it kGV, again yields a code whose number qkGV of words is within a constant factor of the GV bound for arbitrary codes. Moreover, if we apply the same argument for linear codes of somewhat smaller dimension, say kGVc (which is asymptotically indistinguishable from kGV), then the probability of success is 1−O(qc), so we are almost certain to succeed. But (possibly even more embarrassing than before) we still don’t know how to do it in practice for large n; even if we choose C at random from codes of length kGVc, then C almost surely has minimum weight at least d, but we cannot prove it because checking whether an arbitrary linear code has a word of length <d is computationally intractable!

Often in mathematics when we introduce a new mathematical structure we follow it up not just with examples but also with operations that give new examples from old (direct sums and quotients of groups or vector spaces, field extensions and polynomial rings, etc.). Such operations exist for codes too, and we’ve seen a few examples without calling attention to the general constructions (e.g. a subset of an (n, M, d) code is an (n, M’, d’) code with M’ ≤ M and d’ ≥ d, and the subspace of an [n, k, d] code supported on n’ of the coordinates is an [n’, k’, d’] code with k’ ≥ k − (nn’) and d’ ≥ d); but for now we need not pursue this systematically. We do give one fundamental construction that’s not so immediate: the dual, which is a bijection between linear codes of length n and dimension k and linear codes of length n and dimension nk. Denote by ⟨·,·⟩ : Fn × FnF the usual perfect symmetric pairing

v,w⟩   =   ∑i viwi   =   v1w1 + v2w2 + · · · + vnwn.

[Warning: since we hardly ever use FR, there are usually plenty of nonzero v with v,v⟩ = 0; but we shall have other uses for v,v before long.] If C is a linear code then the dual code of C is

C := {c* ∈ Fn : ⟨c*, c⟩ = 0   for all c C}.

As usual C⊥⊥ = C. The zero code and Fn are each other’s dual, as are the repetition code and single-checksum code. In general the minimum distance of C cannot be predicted from the minimum distance of C; for instance, if C is [n, n−1, 1] then C can have minimum distance anywhere from 1 to n−1. However, a linear code is MDS iff its dual is, so the dual of an [n, k, nk+1] code is always an [n, nk, k+1] code. This is illustrated by our above examples with k = 0, 1, n−1, n. To prove it in general, suppose C is a length n code of dimension k such that C is not MDS. Then C has a nonzero word of weight at most k. This gives a nontrivial linear relation on k coordinates of C (not just “at most k because we can always include some idle coordinates to bring the total to k). But then the subspace of C supported on the remaining nk coordinates has codimension strictly less than k, and thus contains some nonzero word of weight at most nk. Therefore C is not MDS either, QED.

The dual of the Hamming [7, 4, 3] code is the [7, 3, 4] code whose nonzero codewords are the line complements; this is a constant-weight code: all nonzero words have the same weight, here 4. That [7, 3, 4] code is thus contained in its dual; a code C C is said to be self-orthogonal, because an equivalent condition is c, c’⟩ = 0 for all c, c C. (Except in characteristic 2 it is enough to check this for c=c, as usual for quadratic forms.) Further examples are the zero code (of course) and the repetition code when the length is a multiple of the characteristic. Such a code must have knk, that is, kn/2. An important special case is a self-dual code, which is a linear code satisfying C = C; equivalently, a self-orthogonal code of length n and dimension n/2. An easy example is the repetition code of length 2 over a field of characteristic 2. A more interesting example is the extended Hamming code, the [8, 4, 4] binary code obtained by adding a checksum coordinate to the Hamming [7, 4, 3] code. The Hamming code contains the antipode (one’s complement) of each of its codewords, and each non-antipodal pair of codewords is at Hamming distance 4. (In general, adding a checksum coordinate to a binary linear code yields a code of the same dimension all of whose words have even weight; applying this transformation to an [n, k, d] code of odd minimum distance such as the Hamming code yields an [n+1, k, d+1] code.)

Interlude on isometries: Hamming space An has two natural sources of isometries: we can apply a permutation of A to each coordinate, or an arbitrary permutation to the set of n coordinates. These generate a group, call it G, of n! · q!n isometries, which is the “wreath product” that’s the semidirect product of the symmetric group Sn acting on (Sq)n. We claim that G is in fact the full isometry group of Hamming space. One way to show this is to check that G acts simply transitively on (n+1)-tuples (w0, W1, W2, …, Wn) where w0 is a word and each Wi is a permutation of the q−1 words w that differ from w0 only in coordinate σ(i), where σ is some permutation of {1, 2, …, n}. If the full isometry group were any larger than G, then its stabilizer acting on these (n+1)-tuples would be nontrivial; but that’s impossible because we can use the Hamming metric, together with the choice of w and of the ordering of the n(q−1) words w at distance 1 from w0, to reconstruct the rest of Hamming space (each word w’ is uniquely determined by d(w0, w’) and the d(w0, w’) words w at distance 1 from w0 that are closer to w than w0).

Two codes C, C’ of the same length over the same alphabet are said to be isomorphic if C’ = g(C) for some isometry g of Hamming space, and the automorphism group Aut(C) of C is the stabilizer of C in the isometry group G of Hamming space (or sometimes the image of that group in the group of permutations of C; this is usually the same but there are codes whose stabilizer does not act faithfully).

If C is linear then translation by any codeword is an automorphism. But we usually exclude nonzero translations of linear codes, because the choice of 0 is part of the vector space structure, and thus should be fixed by any automorphism. Of the isometries of Fn that preserve 0, the only linear ones are the F*-signed permutation matrices, that is, the matrices each of whose rows and columns has a single nonzero entry; the group of F*-signed permutation matrices (a.k.a. F*-signed coordinate permutations) is a semidirect product — again a wreath product — of Sn acting on (F*)n. The subgroup that takes C to C is the group of automorphisms of C as a linear code, and always contains the group F* of scalar matrices as a normal subgroup. (Again, we might be interested not in this group but in its image in GL(C) when the action might not be faithful.) Two linear codes are said to be isomorphic if they are equivalent under an F*-signed coordinate permutation matrix.

Putting the signed permutation matrices together with translations by (F*)n yields a group, call it Glin, of Hamming isometries consistent with the linear structure. This group is a semidirect product in two ways: the F*-signed permutation matrices acting on (F*)n, and Sn acting on the nth power of the ax+b group of affine-linear permutations of F (a.k.a. GL1(F)). When q is 2 or 3, the ax+b group is all of Sq, so concentrating on linear codes does not artificially break the symmetry of the problem. But for large F the ax+b group is tiny compared with Sq (index (q−2)!), which makes the restriction to linear codes feel less natural, and might suggest imposing alternative structures on the alphabet — e.g., later this term we might briefly consider codes over P1(Fq), which has q+1 letters and an action of PGL2(Fq). The borderline case is q=4, where the ax+b group is the alternating group A4, and we recover the symmetric group S4 by adjoining the nontrivial field automorphism of F. In general, when F has nontrivial automorphisms (i.e., when q = pe with e > 1), we can augment Glin slightly by taking a further semidirect product with the action of Aut(F). Note, though, that we must apply the same field automorphism to all coordinates at once (the “diagonal action” of Aut(F)), so even when q = 4 we do not quite get all of G once n > 1.

Tuesday, Sep. 17: Linear codes and point configurations in projective space

The interlude on Aut(C) also serves as a segue to a dictionary between linear codes and configurations of points in finite projective spaces. In applications, and in treatises aimed at applications, an [n, k, d] code will often be described by a generator matrix in block form (I | C), where I is the identity matrix and C is regarded as a matrix of checksums, so that the first k letters are the message (“information bits”, for a binary code) and the remaining nk are the redundant information that makes error correction possible. But this breaks the Sn symmetry on the coordinates; and of course choosing any generator matrix for C breaks its GL(C) symmetry. As usual in linear algebra, when it comes to doing explicit computation we usually need to make a symmetry-breaking choice, but for structural study we want to work instrinsically to the extent that we can.

Here’s one way to think about it, leading to a nice geometric description of linear codes. Hamming space Fn is quite a concrete vector space, coming to us with a choice of coordinates, ambiguous only up to permutation and scaling individual coordinates. But C has no intrinsic structure except for what it gains as a subspace of Fn. So we regard C as an abstract vector space together with an embedding into Fn, that is, a choice of n linear functionals on C. Permutation equivalence says that we have not an ordered n-tuple but an unordered set (or perhaps a multiset, as for the repetition code); and scaling equivalence on each coordinate says each functional is defined up to nonzero scaling. We assume that none of the coordinate functions is the zero functional. (*) Then the coordinates are nonzero vectors in the linear dual C* = Hom(C, F) of C [NB this is not the same as the dual code C!], and scalar equivalence means they’re really points in the k−1 dimensional projective space P(C*) = (C*−{0}) / F*. These points give a linear map from C to Fn (up to F*-signed coordinate permutation); this map is injective iff the functionals span C*, that is, iff the n points are contained in no hyperplane of P(C*) — we say that the points must “span” P(C*). [It takes at least k points to do that, but we already know that n > k.] Therefore:

An [n, k, *] code without identically-zero coordinates is tantamount to a spanning (multi)set SC of n points in Pk−1(F).
This respects the relevant symmetries: two codes are isomorphic iff their associated configurations are equivalent under Aut(Pk−1(F)) = PGLk(F).

(*) In general, an [n, k, d] code C with z identically-zero coordinates is just an [nz, k, d] code C0 with no such coordinates that has been artificially inflated to length n with z idle coordinates that contribute neither information nor error correction; if we can understand C0 then we’ve also understood C.

So how to recover the minimum weights d and d* of C and C from the geometry of this configuration SC in the dual projective space? Nonzero codewords c up to scaling ↔ hyperplanes H in the dual Pk−1, and the weight of c is the number of points of S not in H. So the minimum weight is given by the formula

d = n − maxS #(S ∩ H).

Since any k−1 points span a hyperplane, the Singleton bound follows immediately (you should check that this is really the same proof that we gave already), and MDS codes correspond to configurations of points in “general linear position”, with no k of the points in a hyperplane. As for d*, a nonzero dual codeword is a nontrivial linear relation on the coordinates of C. We already excluded the most trivial of relations by requiring that C have no nonzero coordinates; so d* > 1. Beyond that, d* = 2 means two coordinates are proportional, so two of the points of S coincide; d* = 3 means S has distinct points but three of them are collinear; and so forth. In general, we see that

d* is the smallest size of a linearly dependent subset of S.
(In general, δ points in a projective space are said to be “linearly dependent” when they lie on a linear projective subspace of dimension less than δ−1; here there must be d* points on a subspace of dimension exactly d−2, else the minimum dual weight would be even smaller.) So again C is MDS iff C is MDS.

Note that if C has neither an identically zero coordinate nor a weight-1 codeword, then the same is true of C. This happens iff S has no subset of n−1 points all on a hyperplane. Given such S, we can reconstruct C, then form the dual code C, and find its associated point configuration S*, which again is n points (permuted in the same way as the points in S) with no n−1 points on a hyperplane, but in a projective space of dimension nk−1, not k−1. Since C⊥⊥ = C, applying this construction to S* recovers S. So we have a duality between n-point configurations in Pk−1 and in Pnk−1 over any field (not necessarily finite), with both configurations satisfying the condition that no n−1 of the points are on a hyperplane. This is equivalent to the (Coble-)Gale duality in algebraic geometry; see for instance the 1998 article by Eisenbud and Popescu.

Returning now to our usual setting of a finite field, we can use this picture to construct and study some basic examples of linear codes. Suppose for starters that C* is single-error-correcting, i.e. that d ≥3. This is equivalent to the condition that the points of the configuration S associated to C are distinct. Thus n = #S is at most #Pk−1(F) = (qk−1) / (q−1), and conversely if n satisfies that bound then we can find S and recover C and then an [n, nk, ≥3] code C*. If n equals its maximum value (qk−1) / (q−1), then C* is determined uniquely up to isomorphism, and is perfect because in this case qk = 1 + (q−1)n. This shows that perfect single-error-correcting codes of all possible lengths exist whenever q is a prime power. [The linear ones are unique as noted, but there can be nonlinear codes with the same parameters.] There are called Hamming codes, because the (q,n) = (2,3) case recovers the Hamming [7, 4, 3] code. (NB Sometimes C* is called a “Hamming code” only when q=2, in which case the parameters of C* are [2k−1, 2kk−1, 3]. In this case C itself is a constant-weight [2k−1, k, 2k−1] code, of which we shall have more to say soon.) Another remarkable property of the Hamming code C* is that Aut(C*) = GL(C*) ≅ GLk(F); to see this, note that Aut(C*) is a subgroup of GL(C*) that contains F* and that maps surjectively to PGL(C*) (because any projective linear transformation permutes SC when SC consists of all F-points of Pk−1(F)).

When k=2, the Hamming code is MDS, and is the longest codimension-k MDS code over F. The same is true trivially for k=1. This no longer holds for k ≥ 3. We next consider the case k=3, when an MDS code corresponds to a set of n (distinct) points in the projective plane P2(F) no three of which are collinear. Such a set is called an n-arc” in P2(F), and n-arcs were already studied in finite projective geometry before the advent of coding theory. We next give some of the known results about small and large n-arcs in finite projective planes.

When n is small, it is easy to describe and enumerate n-arcs, even without assuming that our projective plane is algebraic, that is, using only the combinatorial definition of a projective plane of order q as an incidence relation on q2+q+1 points and q2+q+1 lines with each point (resp. line) incident to q+1 lines (points) and any two points (lines) incident on a unique line (point). We find that for n ≤ 6 the number of ways to extend an (n−1)-arc to an n-arc is a number An depending only on (q and) n, so the total number of n-arcs is (1/n!) ∏1≤in Ai, where Ai is q2+q+1, q2+q, q2, (q−1)2, (q−2)(q−3) for i=1, 2, 3, 4, 5, and q2−9q+21 = (q−4)(q−5) + 1 for i=6. In particular, the number of ordered 4-arcs is 1≤i4 Ai = q2 (q3−1) (q3q) = #PGL3(F), and indeed in an algebraic projective plane PGL3 acts simply transitively on the ordered 4-arcs; more generally, for each k the group PGLk acts simply transitively on the ordered (k+1)-tuples of points in general linear position in Pk−1. In the coding picture, this tells us that the MDS codes with parameters [k+1, k, 2] (and dually [k+1, 1, k]) are unique up to isomorphism — which however is not hard to see directly for the 1-dimensional code. Returning to the case of a projective plane, n=4 is also the first case where C satisfies our hypothesis of no n−1 points on a hyperplane(=line). For n=5, the new factors of (q−2)(q−3) can be explained by noting that five points with no three on a line lie on a unique conic, and a conic has q+1 points, of which we’re choosing 5 which can be done in (q+1)q(q−1)(q−2)(q−3) / 5! ways. [More on conics in finite algebraic projective planes next time; the count is q5q2.] In particular, 5- and 6-arcs exist on any projective plane of order at least 4, and on a plane of order 4 or 5 every 5-arc extends uniquely to a 6-arc.

Once n>6, the number of extensions of an (n−1)-arc to an n-arc is no longer constant, and even the total number of n-arcs in a finite projective plane Π can no longer be given by a simple formula because it depends on details of the geometry of Π. After some inclusion-exclusion analysis one finds that for n=7 the count of n-arcs depends also on the number of copies of the Fano (order-2) projective planes in Π For n=8 we need that count as well as the number of occurrences of a certain configuration of 8 points and 8 lines; for n=9 several new counts arise, including the number of copies in Π of the affine plane of order 3 (a configuration of 9 points and 12 lines of 3, one joining each pair of points); and it quickly gets more complicated after that. [See David Glynn: Rings of geometries II, J. Combinatorial Theory A 49 (1988), 26–66, especially Theorems 4.2 and 4.4; thanks to Nathan Kaplan for this reference.] When Π = P2(Fq), these first few contributions depend only on the residue of q mod 2 and 3 respectively (e.g. there are no Fano planes except when q is a power of 2, and no affine planes of order 3 if q ≡ −1 mod 3); but even in this case the formulas get arbitrarily complicated as n grows, in a sense made precise by a theorem of Mnëv (1988) and memorably described by R. Vakil as “Murphy’s Law”.

Thursday, Sep. 19: Codes and point configurations, cont’d; Segre’s theorem

Still we can say something about the largest n-arcs, or equivalently the longest MDS codes of dimension 3 (and dually of dimension n−3). We already saw in effect that a conic is an (q+1)-arc; and it is easy to see that there cannot be an n-arc for n>q+2: each of the q+1 lines through a point of the arc can contain at most one further point. Thus the maximal n is either q+1 or q+2. We shall see that:
n = q+2 is possible iff q is even;
n = q+1 is attained only by conics when q is odd.
The latter is a celebrated
theorem of Segre (Canad. J. Math. 7 (1955), 414–416 [NB what Segre calls a “Galois field” γ is our finite field F]); in our setting, this theorem says that the [q+1, 3, q−1] MDS code over a field of odd order is unique up to isomorphism. Arcs of q+1 and q+2 points are called “ovals” and “hyperovals” respectively; thus we assert that P2(F) contains a hyperoval iff q is even, while for odd q the only ovals are conics.

The upper bound q+2 has a purely combinatorial proof, and thus holds for an arbitrary finite projective plane, not just for algebraic planes; the same is true for the result that if a plane of order q has a hyperoval then q is even. The former result is easy: given a point P of an n-arc, each of the q+1 lines through P can go through only one further point of the arc, for a total of q+2. For the latter, observe that since P was an arbitrary point of the arc, any line not disjoint from a hyperoval must meet it in exactly two points; considering all line through a given point off the hyperoval gives a partition of the hyperoval’s points into pairs, whence q+2 is even, which proves the claim. [Note that this argument fails in the degenerate case of the “projective plane of order 1”, for lack of a point off the hyperoval (we need q2+q+1 > q+2); and indeed that “plane” does have a hyperoval. But we won’t worry about geometry over a “one-element field”, though it can be Fun to contemplate such things.]

Still working in the general combinatorial setting, we see that each point P of an oval O is on exactly one line tP that meets O in no other point, and is thus called the “tangent” to O at P. Each point not in O is then on an even number of tangents if q is odd, and on an odd number of tangents if q is even. In the former case, a second-moment argument shows that the even number is either 0 or 2, just as in the familiar case of real conics (where a point in the plane lies on 0, 1, or 2 tangents according as it is inside, on, or outside the conic — in the projective setting the “inside” of a conic is the simply-connected component of its complement). For each of the q2 points P not in O, let τP be the number of tangents through P. There are q+1 tangents, each lying on q points off O; thus P τP = (q+1) q. Each pair of tangents meets in a unique point, which is not on O, and each P is the intersection of τPP−1)/2 such pairs; thus P τPP−1) = (q+1) q. But then P τPP−2) = 0, and since each τP is even none of the summands is negative. Hence all summands vanish, whence each τP is 0 or 2 as claimed. A similar argument (see Problem 3 of the second problem set) shows that when q is even each τP is either 1 or q+1, the latter arising once; that is, all the “tangents” meet at a point! This point is called the “center” of the oval, and an oval together with its center is a hyperoval. For Π = P2(Fq) we shall show algebraically that every conic has a center, which will prove the existence of hyperovals in algebraic projective planes of even order.

[Proof of Segre’s theorem, where the key step is in effect that every triangle inscribed in an oval (i.e., formed by three of its points) has a “symmedian point”, or equivalently that every triangle circumscribed about an oval (i.e., formed by three of its tangents) has a Gergonne point. (For more on these, see X(6) and X(7) in the “Encyclopedia of Triangle Centers”, which starts with X(1)=incenter and X(2)=centroid, and has reached X(5394) and beyond.) Along the way we use in effect the classical theorems of Menelaus (115±25 CE), Ceva (1678, but anticipated by about 700 years by al-Mutaman), and (in one direction only) Wilson(-Lagrange 1771, known and perhaps proved a century or 7+ earlier)! When q is even, the same argument gives an alternative proof (but valid only for finite algebraic Π) that any three tangents to an oval, and thus all its tangents, meet at a point. Here’s another application of Segre’s result, showing that quadratic polynomials on a prime field F=Z/pZ of odd order are the only maps f : FF such that xf (x+a) − f (x) is a permutation for each nonzero a in F. (And here’s how I found out about it.)]

Tuesday, Sep. 24: More on conics and other configurations in projective spaces

See the notes.

Once we know that every conic with a point can be written as xy + xz + yz = 0 we can check directly that in characteristic 2 all the tangents meet at (1:1:1). The existence of a “center” of a conic P(x, y, z) = 0 in characteristic 2 can be understood as follows. A quadratic form P(·) over any field yields a symmetric bilinear form B(·,·) defined by B(v,w) = P(v+w) − P(v) − P(w). In odd or zero characteristic, this form is nondegenerate iff P is. But in characteristic 2, B is also alternating, so if the number of variables is odd (as it is for a ternary form) there must be at least a one-dimensional kernel — and in our setting the kernel can be no larger (if B is identically zero then the conic is a double line). So there is some nonzero c, unique up to scaling, for which B(c,v) = 0 for all v. But then P(v+ac) = P(v) + a2P(c) for all scalars a, so if P(v+ac) = 0 (that is, if v is on the conic) then the line joining v to c meets the conic with multiplicity 2 at v, so is tangent to the conic, as claimed.

Once we know that every conic C with a rational point is a rational normal curve of degree 2, Bézout’s theorem becomes elementary for the intersection of C with any curve. Indeed once we identify C with P1, the intersection of C with a curve of degree d becomes a polynomial of degree 2d on that P1, and this polynomial is either identically zero (which happens precisely when our degree-d curve contains C) or has exactly 2d zeros counted with multiplicity over an algebraic closure. In particular, taking d=2 we see that two distinct smooth conics can meet in at most 4 points. This shows that when q is a power of 2 larger than 4 there are non-conic ovals obtained by starting from a conic plus its center and removing one point of the conic. (And it is known that once q>8 there are hyperovals that are not of the form conic+center at all.)

Thursday, Sep. 26: Introduction to weight enumerators and the MacWilliams identity

The (Hamming) weight enumerator of a linear code C of length n is a generating function that encodes the counts of words of weight w for each w = 0, 1, …, n, in a generating function WC(X,Y), defined as cC Xn−wt(c)Ywt(c), so for each w the XnwYw coefficient is the number of words of weight w. [WC also encodes the distribution of Hamming distances between codewords, thanks to the formula d(w, w’) = d(0, w’−w) = wt(w’−w); to generalize WC to nonlinear codes we would sum Xnd(c, c’)Yd(c, c’) over all pairs of codewords, and possibly divide by #(C).] The weight enumerator encodes more refined information than just the minimum weight, and (while [n, k, d] for a linear code do not in general determine the minimum weight of its dual) the weight enumerator of a linear code C determines the weight enumerator of its dual code C, and thus in particular also the minimum distance of C.

Before giving and proving MacWilliams’ formula relating WC with WC, we give some examples and simpler properties.

By now we have enough examples to guess the rule, at least for binary codes:

Theorem (MacWilliams identity for binary codes). The weight enumerators of any linear binary code C and its dual code C are related by

WC (X,Y) = |C|−1WC(X+Y, XY).

The proof is an application of Poisson summation for finite abelian groups. For any subgroup H of a finite abelian group G, and any function f :GC, Poisson summation writes hH f (h) as a multiple of the sum of the discrete Fourier transform of f over the annihilator of H, which consists of all characters of G whose restriction to H is trivial. For G = Fn, we shall identify the character group (a.k.a. the “Pontrjagin dual”) of G with G, and show that under this identification the annihilator of C is none other than the dual code C. When q=2, the Fourier transform of the function taking c to Xn−wt(c)Ywt(c) is the function (X+Y)nw(XY)w, which will imply the binary MacWilliams identity. For arbitrary F, the Fourier transform of the same function is (X+(q−1)Y)nw(XY)w, giving rise to the MacWilliams identity for linear codes over any finite field:

Theorem (MacWilliams identity for Hamming weight enumerators). The weight enumerators of any linear code C over a q-element field and its dual code C are related by

WC (X,Y) = |C|−1WC(X+(q−1)Y, XY).

So, for instance, our formula for the weight enumerator of a single-checksum code generalizes to ((X+(q−1)Y)n + (q−1)(XY)n) / q. Check that this is indeed consistent with the fact that this code has minimum weight 2!

We next develop Pontrjagin duality and Fourier analysis for finite abelian groups, where all the relevant C-vector spaces are finite dimensional. See for example paragraphs 2 and 3 of page 5 in this chapter of my notes on analytic number theory from 2009 (where this theory is used to describe Dirichlet characters); see also the further remarks on page 8, and Exercise 5 on page 10. The Fourier transform on binary Hamming space (Z/2Z)n is an important special case also known by other names such as the “Hadamard transform”. As usual with Fourier analysis there are several choices of normalization in the literature; we’ll use the same normalization that I chose here (page 1385), where the Fourier transform of f :GC is the function taking any character χ to gG χ(g) f (g), which saddles the formula for the inverse Fourier transform with the factor |G|−1 and the complex conjugate χ(g). [For G = (Z/2Z)n we need not worry about the complex conjugate because χ(g) is always ±1.]

Poisson summation figures in analytic number theory too (see this chapter of the 2009 notes), but again we need only the finite case, which is easier than Poisson summation on Z ⊂ R and entirely elementary. (We may cover Poisson summation on Z ⊂ R, or more generally on lattices in Rn, when/if we get to the connection between linear codes and Euclidean lattices.)

Once q>2 we can form a weight enumerator that keeps track of more information than WC, counting not just the distribution of zero vs. nonzero coordinates in codewords but also which nonzero field elements occur. (Yes, this breaks some of our symmetry, though it still treats all n coordinates equally.) The resulting complete weight enumerator cweC is a homogeneous polynomial of degree n in q variables indexed by F. Each codeword c contributes a term i Xci. The MacWilliams identity extends to complete weight enumerators: to obtain the c.w.e. of the dual code, start with cweC, apply the discrete Fourier transform to the function a ↦ Xa, and divide by |C|.
Tuesday, Oct. 1: Some uses of the MacWilliams identity; Gleason’s theorem for self-dual binary codes

Some applications of the MacWilliams identity:

Suppose C is a self-dual binary code of length n. Then C is automatically “even”: every codeword c has even weight, because c, c⟩ = 0. But a binary code is even iff its dual code contains the all-ones word 1. Since C is self-dual, it thus contains 1. Therefore C is closed under the ones’-complement involution c ↔ c+1, which takes words of length w to words of length nw.

Now consider what this means for the weight enumerator WC(X,Y).

In effect we’ve seen that WC is invariant under a group, call it GI, of 16 linear transformations of (X, Y); this group is dihedral: before the MacWilliams step, we had the symmetries of the square with vertices (±1, 0) and (0, ±1), and the MacWilliams involution adds to this the reflection about a line making angle π/8 with the horizontal, producing the symmetries of the regular octagon with vertices (±1, 0), (0, ±1), and (±2−½, ±2−½). We then showed that invariance under GI makes WC a weighted-homogeneous polynomial in X2+Y2 and δ8. [To prove the last step, we might write WC in terms of X2+Y2 and X2Y2 − (X2Y2)2/4, and then apply MacWilliams; this gives another set of generators of the invariant ring, but an equivalent one because
(X2Y2 − (X2Y2)2/4)2   =   (X2+Y2)4/16  −  δ8.]
This is Gleason’s theorem for self-dual binary codes. As a sample application, we determine all self-dual codes of length n ≤ 8. For any even n, Gleason’s theorem (together with the fact that WC has leading term Xn) shows that it takes only floor(n/8) coefficients to specify the weight enumerator of a self-dual code of length n; in particular, for n < 8 the weight enumerator must be (X2+Y2)n/2. It follows that there are n/2 words of weight 2. But in a self-dual (more generally, a self-orthogonal) code C with a word c of weight 2, every word is either disjoint from c or the sum of c with a word disjoint from c. That is, C decomposes as the direct sum of a self-dual (or self-orthogonal) code of length n−2 with the [2,1,2] repetition code supported on c. In our setting we conclude that our code C is the direct sum of n/2 copies of that [2,1,2] code. For n=8, either C contains a word of weight 2 — in which case we’ve reduced to n=6, and conclude that C is the direct sum of 4 copies of the [2,1,2] code — or WC is the unique linear combination of (X2+Y2)4 and δ8 of the form X8+O(Y4), which is (X2+Y2)4 − 4δ8 = X8 + 14X4Y4 + Y8, a polynomial we recognize as the weight enumerator of the [8, 4, 4] extended Hamming code. It is not hard to show that the extended Hamming code is the unique [8, 4, 4] code (for instance, by reducing to the uniqueness of the perfect [7, 4, 3] Hamming code, which we have shown already). This completes the classification of self-dual codes of length at most 8 up to isomorphism. [Such codes have by now been classified through length 24 and a bit beyond, but this requires some further tools beyond the MacWilliams identity, plus some nontrivial computation to treat the largest few cases; an eventual combinatorial explosion is inevitable, because the number of self-dual codes grows as 2n2/4−O(n), and there are only n! = 2O(n log n) permutations to identify different codes, making the total at least 2n2/4−O(n log n).]

A special property of binary codes is that once a linear code C is self-orthogonal the map taking a codeword c to its weight modulo 4 is a homomorphism C → 2Z/4Z. This follows from the inclusion-exclusion formula for symmetric differences (which correspond to addition in Z/2Z under the usual dictionary between (0,1)-valued functions and subsets): |AΔB| = |A| + |B| − 2|AB|. [This formula generalizes to symmetric differences of more than two sets; triple, quadruple, quintuple, etc. intersections are counted with multiplicity +4, −8, +16, and so forth. We’ll prove and use this generalization some weeks hence.] For a self-orthogonal code, |A|, |B| and |AB| are all even, so |AΔB| ≡ |A| + |B| mod 4 as claimed. (It also follows from the |AΔB| formula that conversely a binary linear code in which all weights are multiples of 4 must be self-orthogonal.) As a corollary, either C has as many words of weight 0 mod 4 as there are of weight 2 mod 4, or all the weights in C are divisible by 4. A linear binary code C is called singly even in the former case, and doubly even in the latter. If C is self-dual, it is also said to be of Type I  if singly even, and of Type II  if doubly even. For example, the [2,1,2] repetition code is Type I, while the extended Hamming code is Type II.

We can detect whether a self-orthogonal binary code C is singly or doubly even from its weight enumerator: WC(1, i) is zero if C is singly even, and equals |C| when C is doubly even. Now if C is self-dual then by Gleason it is a polynomial in X2+Y2 and δ8, and we see that substituting (X,Y) = (1, i) yields zero in X2+Y2 but −4 for δ8. Therefore WC(1, i) = 0 unless n is a multiple of 8, and we conclude that the length of every Type II code is a multiple of 8, in which case the 8)n/8 coefficient of WC is (−4)n/8. Conversely, if 8 | n we obtain a Type II code of length n as the direct sum of n/8 copies of the extended Hamming code (and again for large n there will be many other choices). Also, if 8 | n then a Type I code of length n must have weight enumerator divisible by X2+Y2, so the 8)n/8 coefficient of WC is zero, and we have determined one of the n/8 unknown coefficients of the expansion of WC in monomials of degree n in X2+Y2 and δ8.

The weight enumerator of a Type II code is also invariant under the substitution of (X, iY) for (X,Y); this linear transformation together with GI generates a larger group GII that leaves invariant any such weight enumerator. We next describe GII and its invariants, deduce the Gleason theorem for Type II codes, and obtain some consequences.

Thursday, Oct. 3: Gleason for self-dual codes of Type II, III, and IV

[Overview of Gleason I–IV in the context of complex reflections groups. [Here’s the paper “Finite unitary reflection groups” by Shephard and Todd (Canad. J. Math. 6 (1954), 274–304) that determined all such groups and gave the first proof of the theorem that these are precisely the finite subgroups of GLn(C) with a polynomial invariant ring.]

[For a picture of the vertices of the regular octahedron permuted by GII/μ8, see Figure 2 here (page 6 [1387 in the journal]; for what it’s worth, Figure 1 on the previous page shows a somewhat pixellated graph of δ8 and its double zeros on the unit disc X2+Y2≤1). For the approach I m taking to the invariant rings of GI and GII, see Appendix A (pages 33–34) of my paper with S. Kominers.]

[...] Thus the weight enumerator of a Type II code of length n has only floor(n/24) undetermined coefficients. In particular, for n < 24 there are no undetermined coefficients, and WC = ϕ8n/8 is forced. This does not, however, mean that C must be isomorphic with the direct sum of n/8 copies of the [8, 4, 4] code, as one might guess by analogy with the Type I case. In fact for n=16 there is another Type II code, obtained as follows. Start with the direct sum of 8 copies of the [2, 1, 2] repetition code; then form the doubly even [16, 7, 4] subcode, which consists of words (a1, a1, a2, a2, a3, a3, …, a8, a8) such that a1 + a2 + a3 + … + a8 = 0, and adjoin the vector (1, 0, 1, 0, 1, 0, …, 1, 0). You should check that this is a Type II code and verify directly that its weight enumerator is ϕ82. But it is not isomorphic with the direct sum of the [8, 4, 4] code with itself, because unlike that direct sum our new code is not linearly generated by its words of weight 4.

It is known that these are the only two Type II codes of length 16, and there are 9 such codes of length 24 (after which combinatorial explosion soon sets in). This is the first case where Gleason does not determine WC uniquely, and again (as with Type I) we can make WC unique by doubling the minimum weight, in this case getting the enumerator of a [24, 12, 8] code. We find that for any such code WC must be ϕ83 − 42δ24, which comes to

X24 + 759 X16Y8 + 2576 X12Y12 + 759 X8Y16 + Y24.

Remarkably it is again true that there is a unique such code, and that removing any coordinate yields a perfect code, in this case the binary Golay code G23, with parameters [23, 12, 7]. The extremal [24, 12, 8] code itself is called the “extended binary Golay code” G24, and we shall describe it and its automorphism group (which is the largest of the five sporadic simple groups discovered by Mathieu) after giving the Gleason theorems and their initial consequences for self-dual codes of Type III and IV.

A Type III code is a self-dual ternary code. (Recall that “ternary code” means “code over the 3-element field”.) A ternary linear code C is self-orthogonal iff c, c⟩ = 0 for all cC, which in turn holds iff every codeword has weight divisible by 3. This means that WC is invariant under the 3-cycle (X, Y) ↦ (X, ρY) where ρ is a cube root of unity. If C is self-dual then WC is also invariant under (X, Y) ↔ 3−½(X+2Y, XY) by MacWilliams. Thus WC is invariant under the group, call it GIII, generated by these two linear transformations, which preserves the unitary form |X|2 + 2 |Y|2. An example of a Type III code is the [4, 3, 2] “tetracode”, which is the unique MDS code of these parameters (four points in P1(Z/3Z); explicit generators can be 0 + + + and + 0 + −), a constant-weight code with weight enumerator ϕ4 = X4 + 8 XY3. So GIII permutes the zeros of ϕ4 in the Riemann sphere, which are at (X : Y) = 0, −2, and 1 ± sqrt(−3), forming a regular tetrahedron with respect to our unitary form |X|2 + 2 |Y|2. Since (X, Y) ↦ (X, ρY) acts by a 3-cycle and MacWilliams by a double transposition, GIII acts on the vertices of our regular tetrahedron by its full group A4 of orientation-preserving isometries. The kernel of the map GIIIA4 consists of roots of unity of order at most 4 (because we have a degree-4 invariant invariant ϕ, or alternatively because GIII is generated by linear transformations of determinant ±1). In fact all 4th roots of unity occur: the product of our two generators has determinant −1 and maps to a 3-cycle in A4, so its cube is a scalar of determinant −1, which must be ±i. [We’ve thus also shown en passant that n must be a multiple of 4; again this has a purely algebraic proof using the structure of quadratic forms over finite fields, which shows more generally that a self-dual code over a field of 4k−1 elements must have length divisible by 4.] Therefore |GIII| = 4 |A4| = 48. We could find a second generator of the invariant ring as we did for GII, by filtering the group down to the identity with each step using a normal subgroup with a small quotient and keeping track of the invariant subring of C[X, Y] at each step; alternatively, we can construct a second invariant directly from a quartic vanishing on the dual tetrahedron, which are the points where (X : Y) is either ∞ or a cube root of 1. That polynomial Y3(X3Y3) is not quite invariant under GIII: it“s fixed by the MacWilliams involution, but multiplying Y by ρ does the same to the quartic. Therefore the cube δ12 = Y3 (X3Y3)3 of the quartic is invariant under all of GIII, and it soon follows that C4, δ12] is the full ring of invariants (for example we can check that the two generators are algebraically independent and that the product 4 · 12 = 48 of their degrees equals the size of GIII). We now have Gleason’s theorem for self-dual ternary codes: the Hamming weight enumerator of any such code is a polynomial in ϕ4 and δ12.

We draw the same kind of consequences as before: it takes only floor(n/12) coefficients to specify the weight enumerator of a Type III of length n, and for n < 12 there are no coefficients to determine and the weight enumerator must be ϕ4n/4. This time we can show that a code with that weight enumerator must be isomorphic with the direct sum of n/4 copies of the tetracode: There are 2n pairs of words of weight 3, so some two of them must overlap; but then they overlap in exactly 2 coordinates, and then they generate a copy of the tetracode, which must be a direct summand because the tetracode is self-dual, etc. At n=12 we first have a choice, and if we use it to eliminate the coefficient of WC that counts words of weight 3 we find that any Type III [12, 6, ≥6] code must have weight enumerator

WC = ϕ43 − 24 δ12 = X12 + 264 X6Y6 + 440 X3Y9 + 24 Y6.

Yet again it turns out (and we may show) that there is a unique such code, and that removing any coordinate yields a perfect code, this time the 2-error-correcting ternary Golay code G11, with parameters [11, 6, 5] (check: the Hamming ball of radius 2 in (Z/3Z)11 has 1 + 2·11 + 22(11·10)/2 = 1 + 22 + 220 = 243 = 35 points). The extremal [12, 6, 6] code itself is called the “extended ternary Golay code” G12, and its automorphism group is the double cover 2.M12 of the sporadic Mathieu group M12. [Here there’s a nontrivial center (Z/3Z)* = {±1}, and the quotient Aut(G12) / {±1} is isomorphic with M12, but the Aut(G12) is not just the product of {±1} with M12, so is a nontrivial double cover, which turns out to be unique for M12 though we won’t prove this uniqueness statement.]

A Type IV code is a quaternary linear code that is its own Hermitian dual, i.e. dual under the sesquilinear pairing

v,w⟩   =   ∑i vi σ(wi)   =   v1σ(w1) + v2σ(w2) + · · · + vnσ(wn)

where σ: aa2 is the Galois involution of the 4-element field. In other words, the Hermitian dual of C is the image of the ordinary linear dual under componentwise application of σ. A quaternary linear code C is contained in its Hermitian dual iff c, c⟩ = 0 for all cC, which in turn holds iff every codeword has even weight. This makes WC invariant under the involution (X, Y) ↔ (X, −Y). If C is self-dual then WC is also invariant under (X, Y) ↔ (X+3Y, XY) / 2 by MacWilliams. Thus WC is invariant under the group, call it GIV, generated by these two linear transformations, which preserves the quadratic form X2 + 3Y2. We soon see that GIV is the 12-element dihedral group of isometries of the regular hexagon with vertices (1, 0) and (±½, ±½) [yes, this is a regular hexagon in the Euclidean geometry associated to our quadratic form]. An example of a Type IV code is the [2, 1, 2] repetition code, whose weight numerator is the invariant quadratic form X2 + 3Y2 itself. The ring of invariants consists of polynomials in this quadratic form and the sextic δ6 := Y2 (X2Y2)2 with double zeros on the long diagonals of our hexagon. Thus it takes floor(n/6) coefficients to specify WC, and in particular for n<6 the weight enumerator must be (X2 + 3Y2)n/2, indicating 3n/2 words of weight 2. For n=2, we clearly must have the [2, 1, 2] repetition code; and for n=4 and n=6, we can argue as we did for Type I that the only Type IV code is the direct sum of n/2 copies of the [2, 1, 2] code, unless n=6 and there are no words of weight 2, in which case C has parameters [6, 3, 4] and

WC = (X2 + 3Y2)3 − 9 δ6 = X6 + 45 X4Y2 + 18 Y6.

But we already know what quaternary [6, 3, 4] codes look like, even without any self-duality hypothesis: they are MDS codes associated to hyperovals in in P2(F4) — and indeed such a code must be even (and thus self-dual in our Hermitian sense) because every line meets a hyperoval in a 0 or 2 points and thus corresponds to a line of codewords of weight 6 or 4. We already know enough to show that the hyperoval is unique up to the action of PGL3(F4) (choose any four points in general linear position, all of which are equivalent under PGL3(F4), and then add the two points not collinear with any two of the four); thus C is unique up to isomorphism, and is called the “hexacode”. Curiously hyperovals in P2(F4) will also play a crucial part in our development of the self-dual [24, 12, 8] code G24.

Type IV is the final variation on our theme, at least for Hamming weight enumerators: there are no GV, GVI, etc., because (using the classification of finite subgroups of PGL2(C)) there are no further cases where the MacWilliams involution together with a map (X, Y) ↦ (X, ζY) (for some root of unity ζ≠1) generate a finite group. The closest one can come (which is misleadingly called “Type V” in some sources, though it doesn’t really belong in the same family — I’d rather think of it as “Type Zero”) is to take ζ=−1, getting a finite-index subgroup of the orthogonal group for X2 + (q−1) Y2; but then any even self-dual code over a field of more than 4 elements must have weight enumerator (X2 + (q−1) Y2)n/2 and is the direct sum of self-dual [2,1,2] codes (which exist iff  −1 is a square).

Beyond Hamming weight enumerators, there are some further examples (complete or joint weight enumerators, etc.) where elementary observations plus MacWilliams yield invariance under a complex reflection group, or a group close enough to being a complex reflection group that one can still give a satisfactory explanation of its invariant ring. See for instance Rains and Sloane’s chapter “Self-Dual Codes” from the Handbook of Coding Theory, especially Section 6 (“Invariant Theory”, starting on p.29) on general results and techniques, and Section 7 (“Gleason’s theorem and generalizations”, starting on p.47) for many further cases. For example (p.59), a self-dual quinary code C ⊂ (Z/5Z)n has symmetrized weight enumerator sweC(x, y, z) = cweC(x, y, z, z, y) invariant under a group isomorphic over C with the symmetries of an icosahedron, with a polynomial invariant ring with generator degrees 2, 6, 10; setting z = y recovers the Hamming weight enumerator as an element of a somewhat smaller ring than what one gets by applying the MacWilliams identity directly to WC.

Tuesday, Oct. 8: Existence and uniqueness of the binary Golay codes

There are (at least) two natural directions to go with the Gleason theorems: the extended Golay codes G24 and G12, which are the first codes of Type II and III without words of length 4 and 3 respectively, and have some remarkable combinatorial properties; and extremal enumerators and codes for large n, which lead to a different flavor of mathematics and some long-open problems (e.g. it is still unknown whether there exists a Type II [72, 36, 16] code). We take up the binary Golay code(s) first.

The Golay codes have symmetry groups that are large (e.g. large enough to be multiply transitive on the coordinates) and sporadic. The size indicates that there are various ways to prove uniqueness: in each case there are various natural objects on which the group acts uniquely (small subsets of the coordinates, codewords of given weight, etc.), and we can fix one of them, try to show there’s still a unique way to fit a code of the desired properties around it, and then get as a bonus the transitivity on objects of that type. The fact that the groups are sporadic suggests that no proof of uniqueness can tell us everything of interest about the Golay codes: different choices will reveal only different parts of the structure of the code and its automorphism group, and suggest generalizations in a different direction. One could give an entire graduate course on the Golay codes and Mathieu groups (even if I  might not be “one” who could give such a course), but since that’s not an option I must choose one approach and more-or-less stick with it. Given what we’ve seen so far, I’ll take the route via the subgroup M21 = PSL3(F4) of M24 = Aut(G24).

Suppose, then, that C is a Type II [24, 12, 8] code. We noted already that dropping any one coordinate yields a [23, 12, 7] code which is a perfect 3-error-correcting code. In particular, if we choose any word w of weight 5 supported on the remaining 23 coordinates then it is at distance at most 3 from one of the codewords, and thus at most 4 from one of the words c of C; but C is even with minimum distance 8, so c has weight 8 and contains the support of w, and is determined uniquely by that property (this can be seen directly from the triangle inequality together with dmin(C)=8). In other words, the supports of the 759 minimal words constitute a (5, 8, 24) Steiner system. As a check on this computation, verify that 759 = Binomial(24, 5) / Binomial(8, 5). [A similar argument works for the extended Hamming code, showing that its 14 words of weight 4 give a (3, 4, 8) Steiner system; and indeed 14 is Binomial(8, 3) / Binomial(4, 3). More generally, for each d ≥ 3 the quadruples in (Z/2Z)d that sum to zero constitute a (3, 4, 2d) Steiner system with automorphisms by AGLd(Z/2Z), and that group acts 3-transitively on the 2d points.] Moreover, any two of the blocks of the (5, 8, 24) Steiner system intersect in 0, 2, or 4 points. [This can also be checked combinatorially, without assuming that the blocks come from a [24, 12, 8] code of Type II, by calculating the “intersection triangle” of the system; for more on this, and Steiner systems in general, see for instance Cameron and van Lint’s text Designs, Graphs, Codes and their Links (London Math. Society, 1991, reprinted 1996).]

If we fix one, two, or three of the 24 coordinates, and consider only those octads (8-element blocks) containing the chosen point(s), then the residual heptads, hexads, or pentads constitute a (4, 7, 23), (3, 6, 22), or (2, 5, 21) Steiner system with 253, 77, or 21 blocks respectively. In the last case, any two blocks intersect in one point (because the original octads meet at the three chosen points, and thus in a unique fourth one; in fact this holds automatically for a (2, 5, 21) system because of general properties of “square designs” with as many blocks as points, via the minimal equations of their incidence matrices — see again Cameron and van Lint). So we get a combinatorial projective plane of order 4. We thus begin by showing that any such plane Π is isomorphic with the algebraic projective plane P2(F4). We use the hyperovals that we already encountered earlier, and along the way encounter some exceptional behavior of the alternating and symmetric groups A6 and S6, namely the triple cover 3. A6 and the outer automorphism of S6.

We already know Π has hyperovals; choose one, and call it O. Every point outside O is on three lines meeting O in two points each, and thus gives a “syntheme”, which in this context is the (somewhat quaint 19th-century) terminology for a partition of the six points of O into three pairs. But there are 21−6 = 15 points, and only 6! / (233!) = 15 synthemes, so each occurs for some point. We now have a labeling of the 21 points of Π by the 6 points and 15 synthemes. As for the lines, 15 of them meet O in two points (here 15 arises as Binomial(6,2)), so we have a labeling of all but 6 of the lines by pairs of O-points. We claim that those lines form a hyperoval O* in the dual project plane Π*. Indeed we noted already that every point outside O lies on three lines that meet O, and thus only two lines disjoint from O, i.e. two lines of O*. So no three lines of O* meet at a point, which makes O* a hyperoval in Π* because |O*| = 6. This O* is called the dual hyperoval of O (and indeed it is clear that O** = O). NB this construction is special to projective planes of order 4, unlike the notion of a “dual oval” for projective planes of odd order, where the dual oval consists of all lines tangent to (not “disjoint from”) the oval, and the construction works for all odd q (and for any choice of projective plane if there’s more than one plane that admits an oval).

Each of the O* lines, say l, lies on five syntheme points. This gives 3 · 5 = 15 pairs, and indeed each pair yields a line that meets l at a unique point and thus appears in exactly one of the five synthemes in l. So l determines a partition of the 15 pairs into five synthemes. But there are exactly six such partitions, so O* must contain them all! They’re called the “synthematic totals” or simply “totals” of the 6-element set O. We have thus accounted for all the points and lines of Π (points are 6 oval points and 15 syntheme points, and lines are 15 pairs and 6 totals), and shown in effect that Aut(Π) acts simply transitively on the 21 · 20 · 16 · 9 · 2 · 1 ordered hyperovals. Since we already know a projective plane of order 4 with that many automorphisms, we conclude that Π ≅ P2(F4) and Aut(Π) ≅ Aut(P2(F4)) = PΓL3(F4) (where Γ in place of G indicates semidirect product with Aut(F4); likewise ΣLn(F) and PΣLn(F) mean the semidirect products of SLn(F) and PSLn(F) with Aut(F)). We shall also need the number of hyperovals, which is 21 · 20 · 16 · 9 · 2 · 1 / 6! = 168.

Before resuming the construction of G24 we note some further properties of the stabilizer of O in Aut(Π). Becauase Π acts simply transitively on ordered hyperovals, the stabilizer of an unordered hyperoval O is isomorphic with the group S6 of permutations of the points of O. This group is generated by simple transpositions, which are not in PGL3(F4) because that group acts simply transitively on ordered 4-arcs. So they must be in the complement of PGL3(F4) in PΓL3(F4). This can also be seen explicitly by choosing coordinates so that four points of O are at (1, 0, 0),  (0, 1, 0),  (0, 0, 1), and (1, 1, 1): the remaining two points must be at (1, ρ, ρ2) and (1, ρ2, ρ), and are thus switched by the field automorphism acting coordinatewise. It follows that a permutation of O lifts to a PGL3(F4) automorphism of Π iff it is an even permutation. I claim that indeed all the even permutations are in PSL3(F4), and the entire S6 is in PΣL3(F4). Note that this is a nontrivial statement: because F4 consists of zero and the three cube roots of unity, every scalar matrix in GL3(F4) has determinant 1, so the determinant map GL3(F4) → (F4)* descents to a well-defined function on PGL3(F4). Now A6 is a simple group, so the restriction of our map to A6 must be trivial, i.e. A6 is in the kernel PSL3(F4) of the map, as claimed. This could also be seen directly using our coordinates for O: the group is generated by 3-cycles, all of which are conjugate and thus have the same determinant; and one of these cycles is diag(1, ρ, ρ2) (fixing the unit-vector points and cyclically permuting the other three), which indeed has determinant 1. We could also have used the permutation matrix corresponding to a 3-cycle, under which the last three points are fixed and the first three permuted cyclically. We conclude that for two hyperovals O, O all g PGL3(F4) such that g(O) = O have the same determinant; this gives a partition of the hyperovals into three PSL3(F4) orbits each of size 168 / 3 = 56.

We pause to note some exceptional behavior of the permutation groups A6 and S6 that are visible in this picture, namely the outer automorphism of S6 and the triple cover of A6.

Our copy of S6 in Aut(Π) also acts on O*, and this gives an outer automorphism of S6it is known that S6, alone among all symmetric groups, has an outer automorphism, and we can see it in the simultaneous action of the hyperoval stabilier on O and O*. This outer automorphism switches the conjugacy classes of 15 simple and 15 triple transpositions, and also the 40 single and 40 double 3-cycles, and the 120 six-cycles with the 120 permutations of cyclc structure 112131. The point stabilizer S5 maps to a transitive subgroup (indeed a sharply 3-transitive subgroup) of S6, which can be understood as PGL2(Z/5Z) acting on the projective line over Z/5Z. Note that two triple transpositions commute iff  they share a 2-cycle, whereas two simple transpositions commute unless they overlap; thus a total, considered as a collection of 5 triple transpositions, is mapped under an outer automorphism to a collection of 5 overlapping simple transpositions, i.e. a clique of size 5 in the line graph of the complete graph K6. For each of the six vertices p of this K6, the 5 edges through p are such a clique, and it is easy to see that there are no others (the other maximal cliques are triangles, which correspond to the one wrong turn that one might make when constructing a total).

The other exceptional object is the triple cover 3. A6 of the alternating group A6. The simple alternating groups An all have double covers, which are their preimages in the universal covers of the orthogonal groups On−1(R), but for n=6 and n=7 these do not give the full Schur multiplier of An because there is also a triple cover. For n=6 we readily obtain this triple cover as the preimage of A6 under the quotient map SL3(F4) → PSL3(F4). We show that this triple cover is nontrivial by checking that its restriction to a 3-Sylow subgroup is already nontrivial. This subgroup is generated by disjoint 3-cycles, such as the two that we have already represented by diag(1, ρ, ρ2) and a permutation matrix. These transformations commute in PSL3(F4), but their lifts to SL3(F4) have a nontrivial commutator, ρ±1 times the identity, and thus generate a Heisenberg group over Z/3Z, whereas a trivial triple cover would restrict to an elementary abelian 3-group. Because O is also the point configuration associated to the hexacode, we see that 3. A6 is the hexacode’s automorphism group in the group of (F4)*-signed permutation matrices; If we allow also field automorphism we get automorphisms by a triple cover 3. S6 of the symmetric group of permutations of the coordinates. The semidirect product “26:(3. S6)” of this automorphism group with the hexacode then arises as a maximal subgroup of M24, namely the stabilizer of a “sextet” (a partition of the 24 coordinates into 6 tetrads any two of which form an octad).

Further remarks: One can likewise use (hyper)ovals to prove the uniqueness of the projective planes of orders 2, 3, and 5, and count their automorphisms. For q=2 (respectively q=3), a hyperoval (resp. oval) O has four points, six secants, and three points off O where two secants meet (so here we get not an outer automorphism S6S6 but the nontrivial map S4S3 that makes S4 solvable); in each case we soon label the remaining lines and points, and find the incidences between them. S4 then arises as the affine linear group AGL2(Z/2Z) in the q=2 setting (stabilizer of a line in the projective plane), and the projective linear group PGL2(Z/3Z) for q=3 (automorphism group of the projective line, which is isomorphic with O). Once you’ve done this exercise, try the q=5 case, where the ovals have six points, and synthemes and totals again come into play but are combined in a different way from what we saw for q=4.

Thursday, Oct. 10: The binary Golay codes and related structures, cont’d

Let C1 be the code generated by (characteristic functions) of lines in Π and C0 be the even subcode of C1, which is generated by symmetric differences of lines. Then C is self-dual, and thus doubly even because the generators have weight 8. The [24, 12, 8] code C that we started with contains (c | 000) for every c C0, and (c | 111) for every c C1 not in C0. We shall show that C0 has dimension 9 (and minimum weight 8), and thus that (C0) has dimension 21 − 12 = 9, from which we’ll soon deduce the existence and uniqueness of C.

We need to study (C0), because C is self-dual and contains (c | 000) for every c C0, whence every codeword (c | ???) of C must have its Π part c in (C0). Now (C0) consists of all w such that w, l has constant parity as l varies over the 21 lines. It readily follows that either w is 0 or it has weight at least 5, with equality iff it is one of the lines. Now, as an extension of our earlier observations on doubly even codes, since C0 is doubly even every coset of C0 in (C0) has constant weight mod 4. For example, every word in C1 has weight congruent to either 0 mod 4 (if in C0) or 1 mod 4 (if not). Since C0 is contained in its dual, which has minimum weight 5, it follows that C0 has minimum weight 8, which is one property we’ll need if C is to have minimum weight 8. Further examples of low-weight words in (C0) are the 168 weight-6 hyperovals we counted Tuesday, and the 360 subplanes (copies of Π2 in Π4), which have weight 7. Now C1 contains the all-1 word (sum of all the lines, or of the 5 lines through any given point); so (C1) is the even subcode of (C0), and consists of all words w such that w, l is even for every line l. If such w has weight at least 6 then it has 4 collinear points, and is thus congruent mod C0 to a word of lower weight. It follows that (C0) has no words of weight 0 mod 4 other than those already in (C0), and the only words of weight 8 in C0 are the Binomial(21, 2) = 210 line pairs. We claim that there are exactly three cosets of C0 in (C0), other than C0 itself, that consist of words of even weight. To find three, start from any 3-arc and check that it is contained in three hyperovals, no two of which are in the same coset. But for any three binary words wi (i=1,2,3), if each wi and wi + wj has weight 2 mod 4 then w1 + w2 + w3 has weight 0 mod 4 (by inclusion-exclusion for symmetric differences). So once we’ve found three nontrivial cosets we’ve found them all. It follows that dim((C0)) = dim(C0) + 3, and thus that C0 has dimension 9 as claimed.


Here are some more details about the (re)construction of the (3, 6, 22),  (4, 7, 23), and (5, 8, 24) Steiner systems from Π, taken from a combinatorics course I taught here in 2009: lecture notes from March 24 and March 26. (At that point in the class I had not yet introduced linear codes, so some results had to be obtained differently.)

Tuesday, Oct. 15: Extremal enumerators and codes; the Mallows-Odlyzko-Sloane theorem

Each of the special codes suggested by Gleason’s Theorem (extended Hamming, G24, G12, and hexacode for Types I, II, III, IV respectively) was suggested by considering the smallest n for which Gleason’theorem does not determine the weight enumerator uniquely and then making the enumerator unique by incrementing the minimum weight, doubling it from (say) e to 2e (this e is 2, 4, 3, 2 for Types I, II, III, IV respectively). In general, in each Type if the weight enumerator of a code of length n has m undetermined coefficients then we can try incrementing the minimum weight m times, finding that the condition wtmin(C) ≥ (m+1)e determines WC uniquely. We call the resulting polynomial the extremal weight enumerator for its Type and length, and call any C with that enumerator an extremal code. Examples include the four special codes listed above, as well as the codes so short that Gleason already determines WC uniquely without any condition on the minimum weight. But in this generality it is no longer necessarily true that there is a unique length-n code of the specified Type with minimum weight (m+1)e. We already saw one example where the code is not unique (the two Type II codes of length 16), and one where no code exists (the Nordstrom-Robinson weight enumerator for Type I codes of length 16). We must also consider the possibility that an extremal code has minimum weight larger than (m+1)e: conceivably the coefficient of WC that counts words of that weight might happen to vanish. We already saw one such case of extra zeros, though it did not affect the count of words of weight (m+1)e: the extended binary Golay code G24 is also extremal of Type I, but has no words of weight 10. We first show that in fact the coefficient of the extremal weight enumerator that counts words of weight (m+1)e is always positive. This means that (m+1)e is an upper bound on the minimum weight of any self-dual code of each Type; for Types II, III, and IV this bound is asymptotically tighter than the sphere-packing bound for codes of length n → ∞ and rate 1/2. [For Type I the sphere-packing bound shows that for large enough n there are no extremal codes, and indeed for each k there are only finitely many n for which there can be a self-dual binary code of minimum weight ≥(n/4)−k; we shall soon see that this is true for each Type in a different way: while the coefficient that counts words of weight (m+1)e is positive, the next coefficient eventually goes negative, and indeed for large n it is not possible for each count through (m+2)e to be nonnegative if the minimal distance is within k of extremality.]

We can describe each of our four cases as follows. Dehomogenize WC by setting X=1, and let Y = qe because WC(1,Y) is a polynomial in this q. […]

Proposition (Mallows-Odlyzko-Sloane 1975). Suppose all the coefficients of the power series for F0 are positive, and the qk coefficient in the power series expansion of Δ−1 is positive for all k ≥ −1. Then for each m the extremal weight enumerator of degree m in F and Δ has a positive qm+1 coefficient.

The hypothesis holds for each of our four Types of self-dual codes, so we obtain:

i) A Type I code of length n has minimum distance at most  2 floor(n/8) + 2.
ii) A Type II code of length n has minimum distance at most 
4 floor(n/24) + 4.
iii) A Type III code of length n has minimum distance at most 
3 floor(n/12) + 3.
iv) A Type IV code of length n has minimum distance at most 
2 floor(n/6) + 2.

Proof of Proposition, using the invariance of the formal residue of a power series: see this TeX page. [This approach is simpler than the original proof from 1975, though that proof gives further information that we’ll return to next time. The approach we use here does let us give formulas for the qm+1 coefficient (number of minimal words of a putative extremal code) in each case, such as the formulas exhibited with OEIS Sequence A034414 for Type II codes.]

Thursday, Oct. 17: The sphere-packing problem; lattices and their theta series

[General introduction to Euclidean lattices, the Hermite constant (best lattice packing of equal spheres), etc., with material mostly contained in the first chapter of “SPLAG” = Sphere Packings, Latiices, and Groups by Conway and Sloane. For the analogue in the setting of unimodular lattices and their theta series, see also Part I of my Notices article “Lattices, Linear Codes, and Invariants”. In this setting, in the Type II (= even unimodular) case, Δ is actually the modular form usually denoted by Δ, so its inverse is q−1n≥1 (1−qn)−24, which is manifestly positive because it is a product of positive geometric series; while in the Type II (odd unimodular) case, the factor (1−qn)−24 is replaced by the 8th power of (1+qn) / (1−q4n), which again has a positive power-series expansion. This (together with the positivity of the theta functions F) gives us two further parts of our Corollary: a unimodular lattice of rank n has minimum norm at most floor(n/8) + 1, and an even unimodular lattice of rank n has minimum norm at most 2 floor(n/24) + 2. The second part was already proved in 1969 by Siegel, who was led to the question from a rather different direction (as one can already gather from the title of his paper “Berechnung von Zetafunktionen an ganzzahligen Stellen” = calculation of zeta functions [of number fields] at integers, which can be found in volume IV of his collected works).]

Tuesday, Oct. 22: Overview of sphere-packing bounds; theta series of unimodular lattices; extremal theta functions and lattices

The analogue for sphere packings of the Gilbert-Varshamov bound is the Minkowski-Hlawka bound, which holds for packings not just of spheres but of translates of an arbitrary centrally-symmetric convex body B in Rn, as long as B is bounded and has positive volume. It looks simpler than the Gilbert-Varshamov bound because real vector spaces have nontrivial homothecies: while all balls in a given Hamming space “look” different, 2B = B+B is just a scaled copy of B, and thus has volume 2n Vol(B). This gives us a lower bound of 2n of the packing density. Again an averaging argument lets us obtain a lattice packing satisfying the same bound; and again it is a long-standing embarrassment that we cannot attain this bound “explicitly”, nor do substantially better, even for Euclidean spheres (the best bounds there are of order n½, and still by averaging arguments). For lp balls with p > 2, one does get exponential improvements (but still not by explicit lattices, except for rather large p — NB l balls pack with density 1); but the Poisson summation formula, together with positivity of the Fourier transforms of Gaussian functions, prevents such improvements for p = 2 (and likewise the fact that exp(−|x|p) has positive Fourier transform for p < 2 prevents this technique from improving on Minkowski-Hlawka for p < 2). See the paper

Elkies, N.D., Odlyzko, A.M., and Rush, J.A.: On the packing densities of superballs and other bodies, Invent. Math. 105 (1991), 613–639.
For upper bounds on Euclidean sphere-packings, the (literal!) sphere-packing bound on the density is 1, and it’s not too hard to improve this to O(n2n/2) starting from the result of the second problem on our first problem set. This is already enough to show that we cannot have extremal Type I lattices for large n, because their density would grow as e/16+o(1))n/2, and πe/16 = 0.53+ > 1/2. The best asymptotic upper bound known (using much the same technique that we’ll develop for codes) is 2(−c+o(1))n with c ≈ 0.599, a constant that has not been improved on in decades now.
Thursday, Oct. 24: Overview of sphere-packing bounds; theta series of unimodular lattices; extremal theta functions and lattices

If L is unimodular, then (as for a self-dual binary code) the map taking a lattice vector v to its norm v, v⟩ mod 2 is a homomorphism, and the lattice is even (= “Type II”) when this homomorphism is trivial. Again this can happen only for n divisible by 8, which we prove using theta functions though there is also an algebraic proof. (See the first part of Serre’s A Course in Arithmetic, which also treats modular forms and the theta functions of unimodular lattices in the final chapter.) Whether or not this homomorphism is trivial, it’s given by v ↦ ⟨v, c⟩ mod 2 for some lattice vector c, because L is unimodular, and c is determined uniquely mod 2L. Such c is said to be a characteristic vector of the lattice; such vectors form a coset of 2L in L called the characteristic coset. (The same information is carried by the translate of L formed by all c/2 for c in this coset; this translate is known as the “shadow” of L.) The lattice is even iff the zero vector is characteristic iff the characteristic coset is trivial. iff L is its own shadow. Now the norm of any coset of 2L in L is constant mod 4, but the characteristic coset has constant norm mod 8. For example, the characteristic vectors of the Zn lattice are those all of whose coordinates are odd, and these have norm n mod 8. The fact that doubly even lattices exist only in dimensions divisible by 8 has a refinement: the characteristic norm is always n mod 8. Again this has an algebraic proof but can also be obtained from the transformation formulas for the theta function of L.

Construction A associates to any binary linear code C(Z/2Z)n a lattice LCRn by starting from the subgroup of Zn consisting of vectors whose reduction mod 2 is in C, and then scaling by 2−½ (i.e. dividing all inner products by 2). This last step makes Construction A commute with duality: the lattice LC. (This is proved starting from the observation that LC contains 2½Zn and is contained in 2−½Zn, and these lattices are each other’s dual, so the dual of LC is sandwiched between them as well.) If C has parameters [n, k, d] then LC has density 2k−(n/2) (so discriminant 2n−2k) and minimum norm min(2, d/2).

[Remarks on some nonlattice periodic sphere packings: Construction LC makes sense for any binary code C, whether linear or not; in general LC is a periodic subset of Rn, and if C has parameters (n, M, d) then LC has density 2−(n/2)M and the formula 2k−(n/2) still gives the minimum norm of a nonzero difference between vectors of LC. For example, suppose C is the Best [sic] code, which has parameters (10, 40, 4) and is known to maximize M given (n, d) = (10, 4). Then LC has minimum distance 2 and density 2−540 = 5/4. This is denser than any known lattice packing in R10, and is conjectured to be the densest possible in this dimension. This is the smallest dimension in which the Euclidean sphere-packing problem is expected to be solved only by a non-lattice packing (NB already for n=3, and several other small dimensions (some below 10), there are periodic nonlattice packings that have exactly the same density as the densest lattice packing). For more about Construction A, see SPLAG chapter 5, which also gives further Constructions, e.g. “Construction B” generalizes Leech’s construction of Λ24 from G24. Look up “Construction” in the index for further variations later in the book; for example there’s a “Construction A3 that associates a lattice in Rn to a ternary code of length n. We say a bit more about this a few paragraphs below.]

Back to the case of linear codes C and lattices LC: the theta series θL(τ) = ΘL(eπ) can be expressed in terms of the Hamming weight enumerator WC(X,Y): substitute

θZ⟨2⟩(τ) = nZ ein2τ = 1 + 2eiτ + 2eiτ + 2e18πiτ + ···
for X, and what might be called
θ(Z+½)⟨2⟩(τ) = nZ eπi(n+½)2τ = 2eπiτ/2 (1 + eiτ + e12πiτ + e24πiτ + ···)

for Y. [The notation “L⟨2⟩” means the lattice L with all inner products multiplied by 2; for L=Z the resulting lattice is also known as the root lattice A1.] The transformation formulas of these weight-½ modular forms under the generators τ ↦ τ+1 and τ ↦ −1/τ of PSL2(Z) should look familiar: the former takes (X, Y) to (X, iY), and the latter takes (X, Y) to (τ/i)½(X+Y, XY) / 2½. This explains why the same magic numbers 8 and 24 arise in the description of theta functions of Type II lattices as did for weight enumerators of Type II codes; indeed our substitution of θ(Z+½)⟨2⟩(τ) and θZ⟨2⟩(τ)) for X and Y takes the weight enumerator of the extended Hamming code to θE8, and takes δ24 = X4Y4 (X4Y4) to 16Δ. […]

This story generalizes in various directions (the space of lattices in Rn is bigger than the space of codes in Hamming space). In general if L is any lattice with rational inner products then θL is a modular form of weight n/2 for some congruence subgroup Γ of PSL2(Rn) (i.e., Γ contains (necessarily with finite index) some Γ(N), that being the subgroup of PSL2(Zn) consisting of matrices congruent to ±1 mod N; warning: Γ may also contain elements not in PSL2(Zn)). This ultimately comes down to Poisson summation again, but the proof is subtler because — with few notable exceptions — Γ is not generated by just a translation together with some involution τ ↔ −c. These “notable” exceptions include Γ(1) = PSL2(Z) itself, and also the index-3 subgroup of Γ(1) consisting of transformations of the theta function of a Type I lattice. Two further examples arise for even lattices L whose duals are isomorphic with L⟨1/N for N=2 or 3, such as D4 and A2 respectively. (This case includes several further record lattices, in dimensions 16 and 32 for N=2, and dimension 12 for N=3.) Here the rings of modular forms behave much as they do for Γ(1), but with the weights 4, 12 replaced by 2, 8 for N=2 and 1, 6 for N=3. This again gives rise to notions of extremal theta series and extremal lattices (such as the above records), with results analogous to what we saw for Type II lattices.

Construction A generalizes in several ways, with Z⟨2⟩ and ½Z⟨2⟩ replaced by any lattice L0 contained with finite index in some other lattice L1: additive codes C of length n with alphabet L1/L0 correspond to lattices LC between (L0)n and (L1)n. As long as all nontrivial cosets of L0 in L1 are isomorphic, the theta series of LC can be obtained from the Hamming weight enumeratorf WC. (Without the isomorphic-coset condition we need cweC.) Examples of such pairs (L0, L1) are (3Z, Z) [“Construction A3”], ((2+i)Z[i], Z[i]) [for codes over Z/5Z], and ((2−ρ)Z[ρ], Z[ρ]) [for codes over Z/7Z, where ρ is a cube root of unity, so Z[ρ] ≅ A2]. Two examples of particular interest are L0 = A2 or D4 and L1 = (L0)*: if C is of Type III or IV respectively then LC is an even unimodular lattice. Taking C=G12 in the former case, and the hexacode in the latter, we obtain lattices LC that are within one step from the Leech lattice, as with Leech’s original construction from G24.

[For more on “shadows” and their weight enumerators or theta series, see also Chapter 5 (starting on page 26) and also pages 48–49 and 73–75 of Rains-Sloane, and also my two papers “a characterization of the Zn lattice” and “Lattices and codes with long shadows”.]

Tuesday, Oct. 29: Asymptotics of kissing numbers of extremal codes and lattices via the stationary-phase method

For a statement and proof sketch of the stationary-phase asymptotics that we use, see this write-up (and note the two Exercises at the end). There are naturally plenty of sources for this and related techniques in general; for instance it is a recurring [no pun intended] theme in Stanley’s Enumerative Combinatorics.

Thursday, Oct. 31: Asymptotic impossibility of (nearly-)extremal codes. Start on Reed-Muller codes

See this write-up outlining a proof of the Mallows-Odlyzko-Sloane theorem.


CORRECTED Nov.2 (and replaced k by F for the field, because we need k for the dimension):

Reed-Muller codes RM(r, m) are linear binary codes of length n = 2m that generalize some of the codes we’ve seen already: zero, repetition, extended Hamming, parity-check, and the full Hamming space. Recall that the extended Hamming code can be constructed as the space of affine-linear functions on F3 where F = Z/2Z. This is RM(1, 3). In general, RM(r, m) consists of the space of polynomial functions of degree at most r on km (and thus automatically has automorphisms at least by the affine linear group AGLm(F)). We next find the dimension and minimum weight of each RM(r, m), prove that the dual of a Reed-Muller code is again Reed-Muller (specifically RM(m−1−r, m)), show that RM(r, m) is doubly even when it’s self-orthogonal (and even more highly even when r is small enough compared with m), and generalize to Reed-Muller codes over arbitrary finite fields.

The minimum weight is actually easier: I claim that RM(r, m) has minimum weight 2mr, and that this minimum is attained precisely by affine subspaces of codimension r in Fm. This is proved by induction on m, being clear for m=0. To get from m to m+1, split Fm+1 into two parallel copies of Fm, and use the fact that if a polynomial of degree r vanishes on a hyperplane then it factors as the corresponding linear polynomial times a polynomial of degree r−1. (NB this last step is not as trivial as it may seem because we’re allowed to evaluate only at points in Fm, not over the algebraic closure; at any rate the analysis in the next paragraph will suffice to establish this.) This will also let us prove by induction that a word c of weight 2mr in RM(r, m) is a codimension-r subspace, once we show that it must be contained some a hyperplane (except for r=0 when the claim is immediate). But if this weren’t true then every hyperplane would intersect the support of c in exactly half of wt(c), and then the function (−1)c: FmR would have a discrete Fourier transform supported on the origin, making (−1)c constant. But then c is constant and we’re back in the “immediate” case of r=0.

[The same inductive technique shows that if cRM(r, m) has weight less than 2m+1−r then 2m+1−r−wt(c) is a power of 2. In fact the words of weight less than 2.5·2mr have been completely described by Azumi, Kasami, and Tokura (Information and Control 30 (1976), 380–395).]

To obtain the dimension of RM(r, m), fix some choice of coordinates on Fm, and write any polynomial function FmF as a sum of monomials linear in each variable separately — this is possible because x2 = x for all (i.e. for both) x in F. This shows that every polynomial function is a linear combination of the 2m monomials. These monomials span the space of all functions FmF because the characteristic function of any point p is the polynomial i (1 + xi − pi). Therefore our 2m monomials are linearly independent, and RM(r, m) has dimension jr Binomial(m, j). At this point we can prove our claim about polynomials vanishing on a hyperplane in Fm+1: Choose coordinates so that the hyperplane is x1 = 0, and consider the 2m monomials divisible by x1. They vanish on the hyperplane x1 = 0, and on the complementary hyperplane x1 = 1 they agree with our 2m monomial generators of the functions on Fm. Since those monomials are independent, they generate the 2m-dimensional space of functions on Fm+1 that vanish on x1 = 0, and we conclude that every polynomial of degree at most r that vanishes on x1 = 0 can be written as x1 times a polynomial of degree at most r−1.

It follows that RM(m−1−r, m) has dimension 2m − dim(RM(r, m)). So we can prove that these two Reed-Muller codes are each other’s dual by checking that they are orthogonal to each other. By linearity it is enough to check that any two monomials of degrees at most r and m−1−r are orthogonal to each other, which is to say that the sum over Fm of any monomial of degree <m vanishes; but this is clear — indeed monomial of any degree d has weight exactly 2md. This also proves that RM(r, m) is the parity-check code for r = m − 1, and thus that RM(r, m) is even for all r < m. Moreover, RM(r, m) is self-orthogonal iff 2rm−1, and self-dual iff 2r = m−1; we conclude that RM(r, m) is doubly even if 2rm−1, with the one exception of (r, m) = (0, 1). In particular, if r ≥ 1 then RM(m−1−r, 2r+1) is a Type II code, and is extremal for r=1 (the extended Hamming code) and r=2 (a [32, 16, 8] code), but not for any r>2 (and very far from extremal as r gets large, because the minimum weight grows only as the square root of the length).

Tuesday, Nov. 5: Reed-Muller codes, cont’d

We saw that RM(r, m) is even for rm−1, and doubly even for 2rm−1. Also, all nonzero weights in RM(1, m) are 2m−1, and it can be shown that RM(2, m) has all weights either 2m−1 or 2m−1±2ρ for some ρ with floor((m−1)/2) ≤ ρ ≤ m−1 (this is routine at least if you know about quadratic forms over F; we already noted that we can show inductively that the weight is either 2m−1 or 2m−1±2ρ for some integer ρ, but not the lower bound on ρ). This suggests that in general all words in RM(r, m) have weight divisible by 2v where v = floor((m−1)/r) — that is, that the number of rational points on any hypersurface of degree at most r in Fm is a multiple of 2v. We prove this by using the inclusion-exclusion formula for symmetric differences that we already mentioned several times earlier in the class, but this time need in full generality. Choose coordinates as before, and write any word c in RM(r, m) as sum iI ci of monomials of degree at most r. Then inclusion-exclusion for Δ gives

wt(c) = ∅≠JI ((−2)|J|−1wt(ΠjJ cj))

and each subset of e monomials has weight a power of 2 that’s at least 2mre (and also at least 1). Multiplying by (−2)|J|−1 always yields a multiple of 2v (check this!), so the same is true of wt(c), QED.

The weight enumerator of RM(r, m) is still unknown for most values of r and m. It is trivial for r=0, and easy for r=1 (where all weights are 2m−1 except for the all-zero and all-ones words); and for r=2 the enumerator can be obtained from the theory of quadratic forms. By MacWilliams we thus know the weight enumerator also for r ≥ m−3. But for intermediate r only a handful of cases have been computed; for large m (and r in [3, m−4]) there are still many coefficient undetermined even once we use the known results on divisibility and on words of weight up to 2.5wtmin for both the code and its dual.

We easily generalize the construction of Reed-Muller codes by making F an arbitrary finite field and regarding the polynomials of degree at most r on Fm as words of length qm where q is the cardinality of F. Most of the results for binary codes generalize, though sometimes with nontrivial effort. For starters, the space of functions FmF now has a basis consisting of the qm monomials of degree at most q−1 in each variable (proved as before by constructing the characteristic function of any point as a product of polynomials of degree q−1 in the coordinates). The polynomial functions of degree ≤r thus constitute a code whose dimension is the sum over dr of the zd coefficients of the generating function (1 + z + z2 + ··· + zq−1)m. The codes of degrees r and (q−1)m−1−r have dimensions that sum to qm, so again we can show that they’re each other’s dual by checking orthogonality, which here reduces to showing that the sum over Fm of any monomial of degree less than (q−1)m vanishes; and we already saw this in the context of Chevalley’s part of the Chevalley-Warning theorem.

Once q>2 we cannot expect that most of these codes will have any nontrivial divisibility condition on their Hamming weights; for instance, the single-checksum code of degree (q−1)m−2 has all possible weights except 1. But it is still true that the weights are highly divisible for small r: for r=1 all nonzero weights are qm (for constant functions) and qm − qm−1 (for actual hyperplanes), and for r=2 we can use the structure of quadratic forms over finite fields to show that there are only O(m) different weights, all divisible by the floor((m−1)/2) power of q. In fact it is true in general that, as we saw for q=2, the number of points on any hypersurface of degree at most r is always a multiple of qv where v = floor((m−1)/r); but here this is a nontrivial theorem (J. Ax, Zeros of polynomials over finite fields, Amer. J. Math. 86 (1964), 255–261) and it seems that no easy proof is known. […]

Tuesday, Nov. 7: a bit more on Reed-Muller codes; introduction to cyclic codes

Tuesday, Nov. 12: Cyclic codes cont’d: the BCH bound and BCH codes; QR codes, and Golay codes as QR codes

Away from characteristic 2 it is simpler to proceed as follows: define c(0) to have zeroth coordinate α and the other coordinates 1 or −1 according to the Legendre symbol (quadratic character) mod n. Then each of the translates c(a) of c(0) has inner product α2+n−1 with itself and −1 with all the other translates. So, when (q/n) = +1, we can (by quadratic reciprocity) choose α in F so that α2 = −n, and then (c(a), c(b)) = −1 for all a, b mod n, whether equal or not. Extending each c(a) by a coordinate of 1 “at ∞” then gives generators of the extended QR code of length n+1, while the cyclic QR code of length n is generated by differences c(a) − c(b).

Thursday, Nov. 14: Krawtchouk polynomials and Lloyds’s theorem on perfect codes

At the end of the Krawtchouk/Lloyd notes we need the fact that the Krawtchouk polynomial Ke(x) actually has degree e, and no smaller. We can prove this by calculating that its xe coefficient is (−q)e/e!, a fact that is also key to the proof of the Tietäväinen–van Lint theorem; the factor qe comes from the binomial expansion of ((q−1)+1)e. We could also have deduced this from the definition of Ke in terms of the discrete Fourier transform of 1Si : if any Ke had degree strictly less than e then the Krawtchouk polynomials would be linearly dependent, and this linear dependence would be inherited by the characteristic functions 1Si — which is absurd because the Hamming spheres Si are pairwise disjoint.

Tuesday, Nov. 19: Introduction to the linear programming (LP) bounds on error-correcting codes

Thursday, Nov. 21: orthogonal polynomial basics

The Krawtchouk polynomials for given q and n are orthogonal. We shall apply some general results about orthogonal polynomials, which are classical but not well-known, at least not known as well as they would have been ome decades ago. So most of this class will be devoted to developing the results we shall use. This material is still standard enough that special-purpose lecture notes should not be necessary; for references, I think the relevant chapters of Körner’s Fourier Analysis contain everything we shall use, while Szegö’s Orthogonal Polynomials has a more extensive treatment of the topic.

[I switched the order of the last two because the three-term recurrence also gives an easy inductive proof of the interlacing property: given that Pi−1 changes sign between each pairs of roots of Pi, we deduce the same for Pi+1, but with opposite sign (because Pi+1 is congruent mod Pi to a negative multiple of Pi−1); and then the signs at ±∞ give the two extra roots of Pi+1 beyond the smallest and largest roots of Pi.]

Tuesday, Nov. 26: LP bounds II: an asymptotic upper bound

We show that for fixed q and δ ∈ [0, (q−1)/q] the asymptotic rate R = logq(M) / n of a code with error-detection rate δ = d/n is bounded by R log qHq(ι), where Hq is the q-entropy function

H(ε) = −ε log ε − (1−ε) log (1−ε),

we introduced back in mid-September, and
ι = (1−1/q) − (1−2/q)δ − (2/q) [(q−1) (δ−δ2)]½
is the smaller root of the quadratic
q2(ι+δ)2 − 4qιδ − 2q(q−1)(ι+δ) + (q−1)2 = 0
(which decreases from ι = (q−1)/q at δ = 0 to ι = 0 at δ = (q−1)/q).


When a system {Pi} of orthogonal polynomials does allow for a closed form for the triple products ⟨1, Pi Pj Pk⟩ = ⟨Pi Pj, Pk (which give the expansion of PiPj as a linear combination of the Pk), the resulting expression is sometimes called the “Adams formula”, because Adams gave such a formula for the Legendre polynomials (which are orthogonal with respect to dx on an interval). For the Krawtchouk polynomials, such a formula is available only in the symmetrical case q=2, when the generating function simplifies to (1 + YY' + YY'' + Y'Y'')n, in which each term is either zero or involves a trinomial coefficient.

The stationary-phase estimates on Ki(x) show further that the oscillatory region between the smallest and largest roots of Ki is also the region that contributes the most to Ki, Kj: the normalized polynomial |Sx|½Ki(x) oscillates with constant amplitude (to within factors nO(1)) within that region, and decays exponentially outside it. [This is similar to the asymptotic behavior of the Hermite functions, obtained in the same way by normalizing the Hermite polynomials, which are orthogonal with respect to the weight function exp(−x2) on R.] Here is a Sage plot that illustrates this behavior for q=3, n=150, and i=10 and 11 (so also illustrating the interlacing property):

[Taking ι=10/150 and 11/150 in the relation between δ and ι at the transition points, and multiplying the resulting δ’s by 150, gives the estimates [61.4, 131.9] (i =10, red) and [59.5, 133.2] (i =11, blue) for the oscillatory region, which looks about right already for n=150; the transition gets sharper (as a fraction of n) as n grows. Here’s the code I wrote; Sage has documentation for an implementation of Krawtchouk polynomials, but I wasn’t able to use Krawtchouk(...) directly — fortunately my rudimentary skill in Sage programming was sufficient. The function K(n,q,i,x) computes Ki(x) itself; K1(n,q,i,x) computes a normalized polynomial (qn |Sx|)½Ki(x); and K2(n,q,i,x) normalizes further by dividing by Ki, Ki½ = |Si|½ to make 0≤xnK2(x)2 = 1. ]

Tuesday, Dec. 3: The ternary Golay codes and related objects

Here are extended lecture notes for existence and uniqueness of the extended ternary Golay code G12 and related objects: the order-12 Hadamard matrix, the sporadic Mathieu group M12, and the perfect Golay code G11 and the smallest sporadic Mathieu group M11. (The lecture notes are “extended” because we won’t cover the development via Steiner systems in class; see my Math 155 notes for this approach; the relation between affine, inversive, and projective planes is developed in the notes for 17 February ‘09.)

A couple of addenda:

1) We noted already that Leech’s construction of his lattice Λ24 from G24 has a G12 analogue, with (A2)12 in place of Z24; it is one of many miracles of Λ24 and its symmetry group Co0 that they accommodate both the (G24, M24) and the (G12, M12) structures.

2) [...]