# Archive for August, 2011

## On Learning Regular Languages

In our first post on learning theory, we’ll start by examining a nice question that started an interesting discussion and brought out some of the core issues learning theorists constantly have to deal with. It also serves as a good introduction to one of the classic areas of learning theory.

**The Question**

Laszlo Kozma asked the following question: **Is finding the minimum regular expression an NP complete problem?** Moreover, Laszlo wanted to know if this has any connections to learning regular languages. This question and the answers to it brought out many subtile points, including issues of representation, consistency, and complexity.

As we know, regular expressions describe regular languages, which are also equivalent to the class of Finite State Automata (DFA), a classic object in computability theory. Suppose we get examples of strings that are members of a regular language (positive examples) and strings that are not members of the language (negative examples), labeled accordingly. Laszlo wanted to know whether finding the smallest regular expression (or DFA) consistent with these examples is NP-hard, and moreover, what this has to do with learning regular languages.

**Learning and Occam’s Razor**

When we talk about learning, we need to first pick a model, and the first natural one to think about is Leslie Valiant’s classical PAC learning model (’84), partly for which he received this year’s Turing award. In PAC learning, the learner is trying to learn some concept (in this case, a specific regular language) and is given positive and negative examples (member strings and not) from some unknown distribution (say, uniform over all strings). The learner’s goal is to produce a hypothesis that will match the label of the hidden target (the membership in the target regular language) on most examples from the same distribution, with high probability. The learner has to succeed for every choice of target and every distribution over examples.

Say the learner knows the target class (in our case, that the target is some regular language). In that case, if the learner gets some polynomially many examples in the representation of the concept and can find *a small* hypothesis from the target class that is correct on all the examples, even if this hypothesis isn’t exactly equal to the target, it will generalize well and be suitable for PAC learning. This theorem is aptly named Occam’s Razor (Blumer et al. ’87).

**Hardness of Learning**

So, indeed, Laszlo was right — whether or not finding the smallest regular expression (or smallest DFA) consistent with a given set of examples is NP-hard has *a lot* to do with being able to learn regular languages *efficiently*. Note that if we’re *not* concerned with efficiency, exhaustive search through all automata would yield the smallest one. But unfortunately, it is indeed NP-hard to find the smallest DFA (Angluin ’78; Gold ’78), and moreover, it is NP-hard to approximate a target DFA of size OPT by a DFA of size OPT^k for any constant factor k (Pitt and Warmuth ’93).

So does this imply regular languages are hard to learn? Unfortunately not; efficiently finding a small, zero error, DFA would have given us an efficient learning algorithm, but just because we can’t find one doesn’t mean there isn’t some *other* way to learn regular languages. For example, perhaps, we could learn by finding a hypothesis that’s something else entirely, and not a DFA. (For those interested, this is the difference between proper and improper learning.) Just because we’ve eliminated one way of learning doesn’t mean we’ve eliminated other venues.

There is, however, a more serious impediment to learning regular languages. In an important paper, Kearns and Valiant (’94) proved that you can encode the RSA function into a DFA. So, even if the labeled examples come from the uniform distribution, being able to generalize to future examples (also even coming from the uniform distribution) would break the RSA cryptosystem (Rivest et al. ’78), which is quite unlikely. This type of result is known as a cryptographic hardness of learning result. Hence, there is good reason to believe that DFA aren’t efficiently PAC learnable after all.

**A Small Puzzle**

This question raised a lot of interesting discussion among the answers and in the comments, with the pointers to the crpytographic hardness results bringing up a puzzle. It is known that given an automaton, there exist algorithms to minimize it efficiently (Huffman ’54; Moore ’56), meaning it is possible to efficiently find the automaton with the smallest number of states that accepts exactly the same language as the given one. At first, it may seem that contradicts the the NP-hardness result, that one can find the smallest automaton consistent with a given set of examples using the following procedure: construct a large DFA so that for each positive example, we have a chain accepting only that example. Then, minimize that DFA. It seems like this should work, but why doesn’t it?

The answer turns out to be that when you construct a DFA using the procedure above, you commit to a certain language. Then, upon minimizing the DFA, you have to stay within the same language, whereas the smallest DFA consistent with all the examples might encode a different language altogether. One can easily see this by the following example: imagine the only positive examples are “000001” and “000011.” Building the DFA in the manner described above commits to the language {“000001”, “000011”}, whereas the smallest DFA consistent with all examples is just one accept state pointing to itself, and that accepts everything (because we have no negative examples, this is allowed). Puzzle resolved.

**Other Models**

The question didn’t specify what it means to learn, and one great thing about the cstheory site is that different answers were able to address learnability in different models. We’ve just discussed the PAC model and learning regular languages in it, but the picture changes when we consider other models of learning. One model addressed in the answers is Mark Gold’s learning in the limit (’67), the first learning model considered in the literature. Here, the learner is presented an infinite stream of data and must eventually converge to the correct hypothesis. It turns out that in this setting, regular languages are unconditionally not learnable in the limit (Gold ’67) from text (positive examples).

To give a positive result, regular language learning is the canonical problem for another model, learning with membership and equivalence queries. In this model, the learner gets to ask membership queries (“Is string [x] in the language?”) and equivalence queries (“Is my hypothesis correct, and if not, what is a string on which it disagrees with the target?”). In a seminal paper, Dana Angluin (’87), who introduced the model, showed that regular languages are exactly learnable with membership and equivalence queries, and this result helped illustrate the power and relevance of this query model.

**Open Problems**

Because this is a blog post and not a survey, many important results have been left out from this discussion. Nonetheless, it may seem that everything is already known about the learnability of automata. This is far from true. There are many interesting problems left to explore: What happens for specific subclasses of reglar languages? Can we learn when the target language isn’t worst-case, but comes from some distribution? What if we restrict to certain types of algorithms? etc.

But instead of listing many specific questions, we will instead turn to a fascinating (and difficult) question asked on this site which has implications for learning regular languages. Aryeh Kontorovich recently asked: **How many DFAs accept two given strings? ** Or, rather, how hard is it to compute this quantity, as a function of n, the number of states in the DFA. At the time of this post, this question has no accepted answers, so if you think you have an answer, come to cstheory.stackexchange.com and let us know!

**References**

D. Angluin. On the complexity of minimum inference of regular sets. Information and Control, 3(39):337–350, 1978.

D. Angluin. Learning regaular sets from queries and counterexamples. Information and Computation, 75:87–106, 1987.

A. Blumer, A. Ehrenfeucht, D. Haussler, M. K. Warmuth. Occam’s razor. Information Processing Letters 24, 377-380, 1987.

E. M. Gold. Language identification in the limit. Information and Control, 10:447–474, 1967.

E. M. Gold. Complexity of automaton identification from given data. Information and Control, 3(37):302–420, 1978.

D. A. Huﬀman: The Synthesis of Sequential Switching Circuits. Journal of the Franklin Institute 257: 161-190 34, 1954.

E. F. Moore: Gedanken-experiments on sequential machines. Automata Studies. Princeton University Press, 129-15, 1956.

L. Pitt and M. Warmuth. The minimum consistent DFA problem cannot be approximated within any polynomial. Journal of the Association for Computing Machinery, 40(1):95–142, 1993.

R. L. Rivest, A. Shamir, and L. Adleman. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, Volume 21 Issue 2, Feb. 1978.

L. G. Valiant. A Theory of the Learnable. Communications of the ACM, 1984,

## Happy birthday, cstheory !

Consider the problem UNSHUFFLE:

A shuffle of two strings is formed by interspersing the characters into a new string, keeping the characters of each string in order. For example,`MISSISSIPPI`

is a shuffle of`MISIPP`

and`SSISI`

. Let me call a stringsquareif it is a shuffle of two identical strings. For example,`ABCABDCD`

is square, because it is a shuffle of`ABCD`

and`ABCD`

, but the string`ABCDDCBA`

is not square. Is there a fast algorithm to determine whether a string is square, or is it NP-hard ?

Today is the one year anniversary of this problem, first posed by Jeff Erickson on cstheory on August 16, 2010.

The problem received furious attention on the site, with a number of (sadly erroneous) solutions being proposed. One nontrivial observation was made by Per Austrin, who showed that the problem is easy (by reduction to 2-SAT) if each character occurs at most four times. This problem also spawned a paper by Henshall, Rampersad and Shallit that studies properties of shuffle languages.

There’s another anniversary today – of cstheory itself ! On August 16, 2010, the cstheory stackexchange site came online. In the past year, we’ve racked up 2100+ questions and nearly 5000 users. We elected site moderators, and even started a dedicated blog (that you’re reading right now). We’ve had a number of original proofs produced on the site, and answers covering topics in complexity theory, algorithms and data structures, geometry, quantum computing, and so many more areas. We’re also beginning to see references to the site appear in papers.

It’s been a busy year.

A year on in, one could ask: do we need cstheory ? John Sidles provided one answer:

in the 21st century, mathematical genius is as scarce as ever, and yet fortunately, the ability to “prove what we want” is becoming ever-more-widely distributed. This is in consequence of the confluence of several factors, among which are the ever-increasing volume of mathematical literature, the ever-improving access and searchability of that literature, and the literature’s increasing emphasis upon naturality and universality. And yet, these gains in mathematical volume, access, naturality, and universality aren’t much good without the additional crucial ingredient ofcommunity… and here both cstheory and its sister site Mathoverflow have made a contribution that (to my mind) is absolutely essential and wonderful. Thus (for me), not the sole contribution of cstheory to mathematics, but also not the least important, is the sense of community that cstheory fosters, and the concomitant mathematical confidence that “yes, we can prove what we want”, which cstheory so ably helps distribute among many people (young researchers especially).

Community is important. It’s becoming impossible to keep track of all the results being published. It’s also hard (and time-consuming) to travel to conferences. So a place where you can have ‘in the corridor’ discussions across the globe is very valuable. I think it’s particularly neat that a Ph.D student working in quantum computing can ask a question about whether to publish a result and have Peter Shor (and many others) help him. In fact, there’s now an invaluable and growing body of theoryCS-specific career advice that combines the knowledge and experience of the community, and would be difficult to get any other way.

So what can we look forward to in the coming year ? There are still many researchers out there who either haven’t heard of the site, or are not quite sure what it can do for them. I hope (and your ideas are welcomed) that we can grow the cstheory community to encompass more of the theoretical CS community at large.

I’d like to see ways of integrating this site into other research forums like conferences and workshops. How that might be done is again a topic for discussion (meta question, anyone?). We were hoping to get a formal link between this site and SIGACT, but the powers that be at Stackexchange Inc. appears to have cooled on that idea for now.

But ultimately, we plan to be around for the second anniversary, and the third…

Our goal is to be the site every theoretican visits every day, and even more, the site that every theoretician feels COMPELLED to visit every day.

p.s An appeal: we’re looking for volunteers to write blog posts for the cstheory blog. Posts don’t necessarily have to be about the site: in fact, it might be neat to have conference reports posted here. If you don’t run your own blog (or do), and found yourself berating the theoryCS bloggers (that danged geomblog!) for not writing about your favorite topic, this is your chance ! If you’re interested, sign up by adding your name to this list.

## If a family of forbidden subgraphs is hard, does that imply that the graph class is hard?

The previous graph theory post mentioned the concept of forbidden minors — a class of graphs that contains no minors of a particular type. We can also, of course, consider the notion of *forbidden subgraphs*, in which a graph class $latex \mathcal{C}$ is defined by avoidance of another graph class $latex \mathcal{H}$: graph $latex G$ is a member of $latex \mathcal{C}$ iff $latex G$ contains no subgraph that is a member of $latex \mathcal{H}$. A partial list of graph classes defined in this way can be found here.

Suppose the forbidden subgraph family $latex \mathcal{H}$ is finite. If we want to check whether $latex G \in \mathcal{C}$ “all we have to do” is to confirm that $latex G$ does not contain finitely many subgraphs of a particular type, which can be done in time polynomial in the size of $latex G$. While this has tremendous real-world applications, it also lands us in the world of “nonconstructive polynomial time”: there are graph classes that are provably characterized by finite forbidden families, and yet, we don’t know what the families are. Moreover, there may be no reasonable way to find out, as shown by the following theorem of Fellows and Langston:

Theorem: There is no algorithm to compute, from a finite description of a minor-closed family represented by a Turing machine that accepts precisely the graphs in the family, the set of obstructions for that family.

I will turn now to a question about what happens when the forbidden family is infinite but easy (or hard) to recognize. Nonconstructive polynomial time, and the use of finite forbidden family characterization in graph algorithms, is dealt with in more detail in many places, including Algorithmic Implications of the Graph Minor Theorem, by Bienstock and Langston.

Consider chordal graphs, which are graphs where every cycle of length four or more contains a chord, that is to say, an edge between two vertices that are part of the cycle. So chordal graphs have a simple forbidden graph characterization: no subgraphs that are cycles of size four or more with no chords. This is an infinite forbidden family, but one that is computationally simple to recognize. Of course, not all graph classes are simple to recognize, which motivated Vincius Santos to ask the question on CSTheory: *is there a relationship between the computational complexity of recognizing a forbidden family of subgraphs, and the computational complexity of the graph class defined by the forbidden family?*

Hugo Nobrega gave a remarkable answer to this question, but I wanted to blog about this question both to publicize the answer, and also because the original question remains open. Nobrega’s answer boils down to “probably not,” though it provides a beautiful example that shows any straightforward “intuitions” will not work. Perhaps someone reading this blog knows the answer, or has further insight. If you know any work in this direction, please do leave a comment on the post, or on the original question. Nobrega’s answer is brief enough that I will reproduce it in full (lightly edited).

Although it seems intuitive that the list of forbidden (induced) subgraphs for a class $latex \mathcal{C}$ of graphs which has NP-hard recognition *should* possess some “intrinsic” complexity, I have recently found some striking negative evidence to this intuition in the literature.

Perhaps the simplest to describe is the following, taken from an article by B. Lévêque, D. Lin, F. Maffray and N. Trotignon.

Let $latex \mathcal{F}$ be the family of graphs which are composed of a cycle of length at least four, plus three vertices: two adjacent to the same vertex $latex u$ of the cycle, and one adjacent to a vertex $latex v$ of the cycle, where $latex u$ and $latex v$ are not consecutive in the cycle (and no other edges).

Now let $latex \mathcal{F}^*$ be the family of graphs which are composed exactly the same way, except that you add *four* vertices: two adjacent to the same vertex $latex u$ of the cycle (as before), but now *two* adjacent to the same vertex $latex v$ of the cycle, where again $latex u$ and $latex v$ are not consecutive.

Then the class of graphs which has $latex \mathcal{F}$ as the forbidden induced subgraphs has polynomial-time recognition, whereas the recognition of the class which has $latex \mathcal{F}^*$ as the forbidden induced subgraphs is NP-hard.

Therefore, I find it hard to conceive of any *general* condition that a list of forbidden induced subgraphs has to satisfy when it results in a class with (NP-) hard recognition, considering that such a condition will have to separate the “very similar” $latex \mathcal{F}$ and $latex \mathcal{F}^*$ above.

## Lower bounds by negative adversary method

Are some questions harder than others?

Last time we quantified hardness of answering a question with a quantum computer as the quantum query complexity. We promised that this model would allow us to develop techniques for proving lower bounds. In fact, in this model there are two popular tools: the polynomial method, and the (negative) adversary method.

The original version of the quantum adversary method, was proposed by Ambainis [Amb00]. The method starts by choosing two sets of inputs on which \(f\) takes different values. Then the lower bound is determined by combinatorial properties of the graph of the chosen inputs. Some functions, such as sorting or ordered search, could not be satisfactorily lower-bounded by the unweighted adversary method. Hoyer, Neerbek, and Shi [HNS02] weighted the input pairs and obtained the lower bound by evaluating the spectral norm of the Hilbert matrix. Barnum, Saks, and Szegedy [BSS03] proposed a complex general method and described a special case, the spectral method, which gives a lower bound in terms of spectral norms of an adversary matrix. Ambainis also published a weighted version of his adversary method [Amb03]. He showed that it is stronger than the unweighted method but this method is slightly harder to apply, because it requires one to design a weight scheme, which is a quantum counterpart of the classical hard distribution on inputs. Zhang [Zha05] observed that Ambainis had generalized his oldest method in two independent ways, so he united them, and published a strong weighted adversary method. Finally, Laplante and Magniez [LM04] used Kolmogorov complexity in an unusual way and described a Kolmogorov complexity method. Spalek and Szegedy [SS06] unified the above methods into one equivalent method that can be formulated as a semidefinite program (SDP).

We will present an alternative development observed by Reichardt in terms of optimization problems [Rei11]. Our approach will use SDP duality extensively, but we will not prove the duals explicitly, referring the interested reader to previous literature instead [SS06, Rei09].

## From certificate complexity to the adversary method

The certificate complexity of a function \(f\) is the non-deterministic version of query complexity and can be found by solving the following optimization problem:

\[ \begin{aligned} C(f) & = \min_{\vec{p}_x \in \{0,1\}^n} \max_x ||\vec{p}_x||^2 \\ \text{s.t.} & \sum_{j:x_j \neq y_j} p_x[j]p_y[j] \geq 1 \quad \text{if} \; f(x) \neq f(y) \end{aligned} \]

We think of \(\vec{p}_x\) as a bit-vector with the \(j\)-th component telling us if index \(j\) is in the certificate or not. The constraint is simply the contrapositive of the definition of certificate complexity: if \(f(x) \neq f(y)\) then there must be at least one bit of overlap on the certificates of \(x\) and \(y\) such that they differ on that bit. The discrete nature of the bit-vector makes certificate complexity a difficult optimization, and it is natural to consider a relaxation:

\[ \begin{aligned} Adv^*(f) & = \min_{\vec{p}_x \in \mathbb{R}^n} \max_x ||\vec{p}_x||^2 \\ \text{s.t.} & \sum_{j:x_j \neq y_j} p_x[j]p_y[j] \geq 1 \quad \text{if} \; f(x) \neq f(y) \end{aligned} \]

Note the suggestive name: the above equation corresponds to the dual of the adversary method. Its more common form is the primal version:

\[ \begin{aligned} Adv(f) & = \max_{\Gamma \in \mathbb{R}^{|D| \times |D|}} ||\Gamma|| \\ \text{s.t.} & \forall j \quad ||\Gamma \cdot \sum_{x,y:x_j \neq y_j} |x\rangle\langle y | || \leq 1 \\ & \Gamma[x,y] = 0 \quad \text{if} \; f(x) = f(y) \\ & \Gamma[x,y] \geq 0 \end{aligned} \]

Since the adversary bound is a relaxation of the certificate complexity, we know that \(Adv(f) \leq C(f)\) for all \(f\). In fact, two even stronger barrier stands in the way of the adversary method: the certificate and property testing barriers. The certificate barrier is that for all functions \(f\), \(Adv(f) \leq \min { \sqrt{C_0(f)n}, \sqrt{C_1(f)n} }\), and if \(f\) is total, then we have \(Adv(f) \leq \sqrt{C_0(f) C_1(f)}\) [Zha05, SS06]. The property testing barrier is for partial functions, where every zero-input is of Hamming distance at least \(\epsilon n\) from every one-input. In that case the method does not yield a lower bound better than \(1/\epsilon\).

## Negative adversary method

To overcome these barriers, Hoyer, Lee, and Spalek [HLS07] introduced the negative adversary method. The method is a relaxation of the adversary method that allows negative weights in the adversary matrix:

\[ \begin{aligned} Adv^{\pm}(f) & = \max_{\Gamma \in \mathbb{R}^{|D| \times |D|}} ||\Gamma|| \\ \text{s.t.} & \forall j \quad ||\Gamma \cdot \sum_{x,y:x_j \neq y_j} |x\rangle\langle y | || \leq 1 \\ & \Gamma[x,y] = 0 \quad \text{if} \; f(x) = f(y) \end{aligned} \]

Note that we started with certificate complexity and relaxed that to get the dual of the adversary method. We took the dual to get the adversary method, and then relaxed the adversary method to get the negative adversary method. Since we relaxed in both directions in two different ways, the negative adversary method is not clearly related to the certificate complexity.

We can simplify this idea with the above picture. The certificate complexity involved a minimization over the integers, so its feasible region is above the boundary drawn by the red line. The optimization problem for \(C(f)\) tries to find the lowest point in the feasible region. The adversary method replaces the integer constraint by real values, and is represented by the blue line. Since the optimization of \(Adv^*(f)\) is still a minimization, this allows us to find lower points in the bigger feasible region. Since the goal of the method is to provide a lower bound, we instead look at \(Adv(f)\) as a maximization problem (with boundary in orange), now the feasible region is below the orange line and any feasible point provides a lower bound. Note that by duality this lower bound is always less than or equal to a feasible point in \(Adv^*(f)\)’s region and thus less than or equal to the certificate complexity. The negative adversary method’s feasible region is represented by the black line. It takes the maximization in \(Adv(f)\) and removes the non-negativity constraint, expanding the feasible region and thus making it possible to find points with \(Adv^\pm(f) > Adv(f)\). Finding natural problems where we have a separation though, has proved to be difficult. In particular, for important problems beyond the certificate complexity barrier (like the collision problem) our best lower bounds come from the polynomial method, and no one has found an approach that uses the negative adversary method directly. Do you know a natural \(f\) such that \(Adv^\pm(f) > Adv(f)\) or better yet, such that \(Adv^\pm(f) > C(f)\)?

## Query complexity and the negative adversary method

Of course, without context the method is not very useful. It is the fact that it lower bounds quantum query complexity that first made it a useful tool. Given a function \(f: D \subseteq \{0,1\}^n \rightarrow \{0,1\}\), we have:

\[ Q_\epsilon(f) \geq \frac{1 – 2\sqrt{\epsilon(1 – \epsilon)}}{2} Adv^{\pm}(f) \]

To prove this relation, we think of an adversary holding a quantum superposition \(|\delta\rangle\) over oracle strings instead of the oracle representing a specific input. This \(|\delta\rangle\) is found by the negative adversary method, and corresponds to any eigenvalue of \(\Gamma\) with eigenvalue \(||\Gamma||\). Thus, any algorithm for solving the problem starts with the initial state (refer to the query complexity post for background on notation):

\[ |\Psi_0\rangle = \delta_x |x\rangle_I \otimes |1,0\rangle_Q |0\rangle_W \]

We then consider the reduced density matrix \(\rho_t = Tr_I |\Psi_t\rangle\langle \Psi_t |\) that the adversary holds. This state begins with no entanglement, and in order to learn \(x\) we must create a lot of entanglement in the state. Specifically, we define a progress function \(W^t = \langle \Gamma, \rho_t \rangle\) and show 3 points:

- \(W^0 = ||\Gamma||\).
- \(W^t – W^{t+1} \leq 2 \max_j ||\Gamma \cdot \sum_{x,y:x_j \neq y_j} |x\rangle\langle y | ||\)
- \(W^T = 2\sqrt{\epsilon(1 – \epsilon}||\Gamma||\)

The first step is obvious from our choice of the adversary’s initial state and progress function. The second step is an application of Cauchy–Schwarz and triangle inequalities. The third-step demands that the algorithm give the correct output: \(||\Pi_{f(x)} |\psi_x^T\rangle ||^2 \geq 1 – \epsilon\). This is the most difficult step, and the proof goes through a few special choices of operators and an application of Holder’s inequality. We reference the reader to the original paper [HLS07] for details.

## Where do we go from here?

Does the negative adversary method answer if some questions are harder than others? Next time we will see that by looking at span program we can show that the negative adversary method characterizes query complexity: \(Q(f) = \Theta(Adv^\pm(f))\). Thus, the method is a great way to show that some questions are more difficult than others. It also behaves well with respect to iterated functions. In particular, to break the certificate complexity barrier, we can iterate the Kushilevitz-Ambainus partial function \(f\) on 6 bits, defined as:

- Zero inputs of \(f\) are: \( 111000, 011100, 001110, 100110, 110010, 101001, 010101, 001011, 100101, 010011\),
- One inputs of \(f\) are: \( 000111, 100011, 110001, 011001, 001101, 010110, 101010, 110100, 011010, 101100\).

We leave it as an exercise to the eager reader to use their favorite SDP solver to show that \( Adv^\pm(f) \geq 2 + 3\sqrt{5}/5 > C(f) = 3\).

## References

[Amb00] A. Ambainis. Quantum lower bounds by quantum arguments. In Proceedings of the thirty-second annual ACM symposium on Theory of computing, pages 636-643. ACM, 2000, arXiv:quant-ph/0002066v1.

[Amb03] A. Ambainis. Polynomial degree vs. quantum query complexity. In Foundations of Computer Science, 2003. Proceedings. 44th Annual IEEE Symposium on, pages 230{239. IEEE, 2003, arXiv:quant-ph/0305028.

[BSS03] H. Barnum, M. Saks, and M. Szegedy. Quantum decision trees and semidefinite programming. In Proc. of 18th IEEE Complexity, pages 179-193, 2003.

[HLS07] Peter Hoyer, Troy Lee, and Robert Spalek. Negative weights make adversaries stronger. 2007, arXiv:quant-ph/0611054v2.

[HNS02] P. Hoyer, J. Neerbek, and Y. Shi. Quantum complexities of ordered searching, sorting, and element distinctness. Algorithmica, 34(4):429-448, 2002, arXiv:quant-ph/0102078.

[LM04] S. Laplante and F. Magniez. Lower bounds for randomized and quantum query complexity using Kolmogorov arguments. 2004, arXiv:quant-ph/0311189.

[Rei09] Ben W. Reichardt. Span programs and quantum query complexity: The general adversary bound is nearly tight for every boolean function. In 2009 50th Annual IEEE Symposium on Foundations of Computer Science, pages 544-551. IEEE, 2009, arXiv:0904.2759v1 [quant-ph].

[Rei11] Ben W. Reichardt. Quantum query complexity. Tutorial at QIP2011 Singapore, 2011.

[SS06] Robert Spalek and Mario Szegedy. All quantum adversary methods are equivalent. Theory of Computing, 2:1-18, 2006, arXiv:quant-ph/0409116v3.

[Zha05] S. Zhang. On the power of Ambainis lower bounds. Theoretical Computer Science, 339(2-3):241-256, 2005, arXiv:quant-ph/0311060.