Almost exactly one year ago (10 Sept 2010), Cem Say asked the following deceptively simple question:

Is the language $latex \{ a^{i}b^{j}c^{k} \mid i \neq j, i \neq k, j \neq k \}$ non-context-free?

Recall that a language is context-free if it can be accepted by a pushdown automaton. A pushdown automaton, in turn, is a finite automaton with access to a stack (so it has access to an unbounded amount of memory). A standard example of a context-free language that cannot be recognized by a finite automaton without a stack is {$latex a^n b^n \mid n \geq 1 $}. If a finite automaton has no access to a stack, it cannot store the value of $latex n$ if $latex n$ gets too large, so it has no way to be certain that the number of $latex b$ appearing exactly match the number of $latex a$. This intuition is formalized in the famous pumping lemma; and there is a similar tool, the pumping lemma for context-free languages, that can be used to prove that a language is not context-free.

It is easy to apply a standard technique — the pumping lemma — to show the language Cem Say asked about is not regular. (We make the exponents large enough that a finite automaton has to repeat a state, and take advantage of that confusion.) However, there doesn’t appear to be a good way to apply the pumping lemma for context-free languages to show this particular language is context-free. Initially, though, some site members thought the question was easy enough to be undergraduate homework, since the definition of the language is so close to other languages whose status can be resolved with a pumping lemma.

Fortunately, Frank Weinberg knew about Ogden’s Lemma, which is an extension of the pumping lemma for context-free languages (the pumping lemma is a special case of Ogden’s Lemma). With this tool, he was able to prove that, indeed, the language Cem Say asked about is not context-free. The proof I will give in this blog entry can be found in Frank Weinberg’s answer; I will give a version of the proof polished by Tsuyoshi Ito in the comments to that answer.

First, here is a statement of Ogden’s Lemma.

Ogden’s Lemma: If $latex L$ is a context-free language, then there exists some number $latex p>0$ such that for any string $latex w$ of length at least $latex p$ in $latex L$, and every way of “marking” $latex p$ or more of the positions in $latex w$, $latex w$ can be written as $latex w=uxyzv$ with string $latex u$, $latex x$, $latex y$, $latex z$, $latex v$, such that

- $latex xz$ has at least one marked position,
- $latex xyz$ has at most $latex p$ marked positions, and
- $latex ux^iyz^iv$ is in $latex L$ for every $latex i \geq 0$.

The pumping lemma for context-free languages is the special case where every position is marked. Armed with this tool, we can now prove that $latex \lbrace a^{i}b^{j}c^{k} \mid i \neq j, i \neq k, j \neq k \rbrace$ is not context-free.

Proof(that Cem Say’s language is not context-free): For a given $latex p$, choose $latex a^i b^p c^k$ and mark all the $latex b$’s (and nothing else). We choose $latex i$ and $latex k$ such that for every choice of how many $latex b$’s are actually pumped there is one pumping exponent such that the number of $latex b$’s is equal to $latex i$ and one where it is equal to $latex k$. Then $latex i$ and $latex k$ have to be from the set $latex S= \bigcap_{1 \leq n \leq p} \lbrace p-n + m*n \mid m \in IN_0\rbrace$ where $latex IN_0$ is the set of positive integers. The set $latex S$ is infinite, because it contains $latex p+ im$ for all $latex i \geq 0$, where $latex m$ is the least common muiltiple of $latex \lbrace 1, \ldots, p \rbrace$. So, assuming the language is context-free, we can always find an $latex i$ and $latex k$ in $latex S$ for any $latex b$, and by condition (3) of Ogden’s Lemma, the resulting string will be in $latex L$, contradicting the original definition of $latex L$. So the language cannot be context-free after all.

For questions and answers on related topics, click on the formal languages tag.

Filed under Answer of the Week

Tagged: context-free languages, language theory, original proof

Is $IN_0 = \mathbb{N}_0$? If so, it should say “non-negative integers”.

I wonder; is Ogden’s Lemma not typically taught in undergrad theory lectures? I know it was presented to us. I always have to look its exact formulation up, but the fact that there is a “stronger version” of pumping lemma for CFL should be handed to all students.

Hi Raphael,

I would say that Ogden’s lemma is not

typicallytaught in an undergraduate course in formal language theory. Considering that its proof is obtained just by a careful observation of a proof of the pumping lemma for CFL, I imagine that Ogden’s lemma can be one of the possible advanced topics covered in such a course. But clearly we have to draw a line somewhere.To fully motivate Ogden’s lemma, we would like to argue that it is strictly stronger than the pumping lemma. However, showing that the non-context-freeness of some language cannot be proved by the pumping lemma seems to be beyond the scope of an undergraduate course. This might be one of the reasons why Ogden’s lemma is not taught in many undergraduate courses.

Raphael,

For what it is worth, I have taught this material at a few universities in Europe and I have never taught Ogden’s Lemma. In fact, the Pumping Lemma for CFL is already hard enough for most undergraduate students.

You write: “the fact that there is a “stronger version” of pumping lemma for CFL should be handed to all students.” I guess that this is correct for theory-oriented graduate students, but I believe that in an undergraduate course our main goal should be to give students a flavour of the material and entice them towards further study of the material in the future. Less is actually more. (For instance, we do not teach students many different versions of the Pumping Lemma for regular languages.)

Nce. I’ve heard that there are some quite simple languages whose context-free status is still unknown. Can anyone back this up with examples? One of them

mighthave been$\{xx’: x, x’$ are of equal length, and disagree in at most 2 positions$\}$.

Also, consider the kind of algorithmic problem that Cem Say’s example suggests. We’re given an expression in some grammar augmented with variable exponents (i, j, k, etc.), along with some relations between these–perhaps just inequalities, perhaps also arithmetic relations. Question: is the resulting language regular? Context-free?

Obviously with a rich-enough grammar or set of exponent-relation constraints this is going to be undecidable. But what is the simplest specification vocabulary for which this problem is undecidable?

Tsuyoshi, Luca, we were given Ogden’s Lemma with proof in our first (undergraduate) theory course in second term. As you have noted, the proof is not considerably harder as the one of Pumping Lemma which was given as a special case.

I don’t think it was formally proven that OL is strictly more powerful than PL but we were given examples that convinced us.

But you are certainly right: You have to draw the line somewhere, and I can’t say that most students can use Ogden’s Lemma after that course.