Active set method. This proceeds by partitioning inequality
constraints into two sets: active
(or sufficiently close to be deemed active for this iteration)
and
inactive. The inactive constraints
are ignored for the iteration. The active set for this iteration is
sometimes called the working set. Then, the new point is
selected by moving on the surface defined by the working set (called
the working surface).
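The bookkeeping is simple; here is a minimal Python sketch of forming a
working set, assuming constraints of the form g_i(x) <= 0 (the tolerance and
the constraint values are made up for illustration):

```python
# Sketch of forming the working set described above: an inequality constraint
# g_i(x) <= 0 is deemed active when g_i(x) is within a tolerance of zero.
def working_set(g_values, tol=1e-6):
    return [i for i, gi in enumerate(g_values) if abs(gi) <= tol]

# Constraint values at a current iterate (made up): indices 0 and 2 are active.
print(working_set([0.0, -3.2, 1e-8, -0.7]))  # -> [0, 2]
```

The inactive constraints (indices 1 and 3 here) would then be ignored while
moving on the working surface.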
 _____
| c_j |  <--- cost or revenue
|-----|
|  -  |  <--- input to activity j
|     |
|-----|
|  +  |  <--- output from activity j
|_____|
In general, the
reduced cost then represents the net worth of the activity at the
prices, p:
d_j = c_j - pA_j = input cost - output revenue.
This leads to an economic interpretation not only of LP,
but also of the simplex method:
agents (activity owners) respond instantaneously to changes in prices,
and the activity with the greatest net revenue wins a bid to become
active (basic), thus changing the prices for the next time (iteration).
In this context, activities can be regarded as transformations,
from inputs to outputs. Three prevalent transformations are:
form, place and time. Here are examples:
   blend                transport            inventory
   _____                 _____                _____
  |  c  |               |  c  |              |  c  |
  |-----|               |-----|              |-----|
  | -.4 | apples        | -1  | Chicago      | -1  | t
  | -.6 | cranberries   |-----|              |-----|
  |-----|               | .6  | Denver       |  1  | t+1
  |  1  | cranapple     | .4  | Seattle      |     |
  |_____|               |_____|              |_____|

  Transformation       Transformation       Transformation
    of form              of place             of time
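To make the reduced-cost formula concrete, here is a small Python computation
for the blend activity; the direct cost and all prices are made up for
illustration:

```python
# Hypothetical data for the "blend" activity: column A_j (inputs negative,
# output positive), direct cost c_j, and made-up prices p.
A_j = [-0.4, -0.6, 1.0]   # apples in, cranberries in, cranapple out
c_j = 0.1                 # direct cost of running the activity at unit level
p   = [0.5, 0.3, 0.6]     # prices of apples, cranberries, cranapple

# d_j = c_j - pA_j = input cost - output revenue
d_j = c_j - sum(pi * aij for pi, aij in zip(p, A_j))
print(round(d_j, 2))      # -> -0.12
```

At these prices the net revenue is positive (negative reduced cost), so the
blend activity would win a bid to become basic.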
Admissibility need not refer to directions of change in the context of
sensitivity analysis; the term is used elsewhere and is defined by the
context in which it is used. For example, a
primal ascent algorithm could define a direction, say d, to be
admissible if x+td is feasible for all t in [0, t*) and
f(x+td) > f(x) for all t in (0, t*), for some t* > 0.
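A numerical sketch of this admissibility test in Python (the objective,
feasible region, and sampling scheme are made up; sampling a few step sizes
illustrates the definition, it does not prove it):

```python
# d is deemed admissible at x if x + t*d stays feasible and improves f for
# all small t > 0; here we merely sample a few step sizes in (0, t*).
def is_admissible(x, d, f, feasible, t_star=1e-2, samples=5):
    ts = [t_star * (k + 1) / samples for k in range(samples)]
    return all(feasible([xi + t * di for xi, di in zip(x, d)]) and
               f([xi + t * di for xi, di in zip(x, d)]) > f(x)
               for t in ts)

# Example: maximize f(x) = -(x1^2 + x2^2) over the nonnegative orthant;
# at x = (1, 1), d = (-1, -1) is an ascent direction, d = (1, 1) is not.
f = lambda x: -(x[0]**2 + x[1]**2)
feasible = lambda x: all(xi >= 0 for xi in x)
print(is_admissible([1.0, 1.0], [-1.0, -1.0], f, feasible))  # -> True
```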
Advanced basis. Beginning the
simplex method with a basis other than
simply all logical variables. The advanced basis could be
optimal for some parameter values, but those values have changed.
It is usually better to start with an advanced basis because it
will generally require fewer iterations to reach the new optimal
basis.
Estimate dual: y = [AX²A']^(-1) AX²c' and
d = c - y'A.
Move: x' = x - aX²d'/Max{d_j x_j},
where a is a parameter in (0, 1).
Multiplication of A and c by X is a scaling operation, using current
levels as scale values (i.e., AX multiplies column j of A by x_j, and
Xc'
multiplies c_j by x_j). Note that the
scaled direction, X²d', lies in the null space of A (i.e., AX²d' = 0), so
it is a feasible direction.
This affine scaling direction is used in its own right for
interior methods. It
minimizes the duality gap subject to
Ax=b and x in an ellipsoid (replacing x >= 0).
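A single iteration can be sketched in Python with NumPy; the LP data below is
made up, and the normalization assumes Max{d_j x_j} > 0:

```python
import numpy as np

# One affine-scaling move for min c'x subject to Ax = b, x > 0.
def affine_scaling_step(A, b, c, x, a=0.5):
    X2 = np.diag(x ** 2)                           # scale by current levels
    y = np.linalg.solve(A @ X2 @ A.T, A @ X2 @ c)  # estimate dual
    d = c - A.T @ y                                # reduced cost
    return x - a * (X2 @ d) / np.max(d * x)        # move

# Made-up LP: min x1 + 2 x2 subject to x1 + x2 = 2, from interior point (1, 1).
A = np.array([[1.0, 1.0]])
b = np.array([2.0])
c = np.array([1.0, 2.0])
x_new = affine_scaling_step(A, b, c, np.array([1.0, 1.0]))
print(x_new)  # -> [1.5 0.5]
```

Because AX²d = 0, the move stays on Ax = b; here the iterate drifts toward
the optimal vertex (2, 0).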
A stochastic algorithm is one whose algorithm map is a random
variable. Unlike the meaning in computer science, most people in
mathematical programming mean an algorithm whereby an iterate is selected
by some random mechanism. One example is
simulated annealing. When x^k
is a random variable, there is a meaning to
convergence that differs from ordinary (deterministic) sequences.
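A minimal simulated-annealing sketch in Python makes the random mechanism
explicit; the objective, neighborhood, and cooling schedule are all made up:

```python
import math
import random

# The next iterate x^k is chosen by a random mechanism: a random neighbor is
# accepted always if it improves f, and with probability e^(-delta/T) if not.
def anneal(f, x, steps=2000, temp=1.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    best = x
    for _ in range(steps):
        cand = x + rng.uniform(-0.5, 0.5)      # random neighbor
        delta = f(cand) - f(x)
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x = cand
        if f(x) < f(best):
            best = x
        temp *= cooling                        # cool the temperature
    return best

print(anneal(lambda x: (x - 3)**2, 0.0))       # lands near the minimizer x = 3
```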
An online algorithm is one that obtains a solution for an
online problem. Typically, this
is a heuristic that obtains a
"good" solution because there is not enough time to guarantee
optimality.
Note that the analytic center depends on how the set is defined -- i.e.,
the nature of g, rather than just the set itself. For example,
consider the analytic center of the box, [0,1]^n. One form is to have
2n functions as: {x: x >= 0 and 1-x >= 0}. In this case, the
analytic center is x*_j=y* for all j, where y* is the solution to:
Max ln(y) + ln(1 - y): 0 < y < 1.
Since y* = 1/2, the analytic center is what we usually think of
as the center of the box. However, the upper bounds could also be
defined by 1-(x_j)^p >= 0 for all j, where p > 1 (so 1-(x_j)^p
is concave). This changes
the functional definition, but not the set -- it's still the unit box.
The analytic center becomes skewed towards the corner because now the
defining mathematical program is:
Max ln(y) + ln(1 - y^p): 0 < y < 1.
The solution is y* = [1/(1+p)]^(1/p), so the analytic center
approaches (1,1,...,1) as p-->inf.
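The closed-form solution above is easy to check numerically in Python:

```python
# Analytic center of the unit box under the 1-(x_j)^p >= 0 description:
# y*(p) = [1/(1+p)]^(1/p). For p = 1 this is the usual center 1/2; as p
# grows, it drifts toward 1.
def center(p):
    return (1.0 / (1.0 + p)) ** (1.0 / p)

print(center(1))     # -> 0.5
print(center(10))
print(center(1000))  # close to 1
```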
ANALYZE.
A computer-assisted analysis system to analyze mathematical
programs, including an artificially intelligent level of support.
Assembly line balancing problem. This is a
combinatorial optimization
problem. An assembly line consists of m work stations, connected
by a conveyor belt. Each station performs a set of tasks whose total time
must not exceed the cycle time, c. Each task must be assigned to exactly one
station such that
precedence constraints are satisfied. One model is to minimize the
cycle time, given the number of stations. Another is to minimize
the number of stations, given the maximum allowable cycle time. There
are variants of this simple form.
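As a sketch of the second model, here is a greedy Python heuristic that packs
tasks into stations without exceeding a given cycle time c; the task times and
precedence relations are made up, precedences are assumed acyclic, and a
greedy pass gives a feasible assignment, not necessarily a minimal one:

```python
# Greedy sketch: open tasks in a precedence-feasible order and pack each into
# the current station until the cycle time c would be exceeded.
def balance(times, precedes, c):
    order, placed = [], set()
    while len(order) < len(times):             # build a topological order
        for t in times:
            if t not in placed and all(p in placed for p in precedes.get(t, [])):
                order.append(t)
                placed.add(t)
    stations, load = [[]], 0.0
    for t in order:
        if load + times[t] > c:                # open a new station
            stations.append([])
            load = 0.0
        stations[-1].append(t)
        load += times[t]
    return stations

times = {"a": 3, "b": 2, "c": 4, "d": 1}
precedes = {"c": ["a"], "d": ["b", "c"]}
print(balance(times, precedes, c=5))           # -> [['a', 'b'], ['c', 'd']]
```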
Multi-dimensional assignment. An assignment problem with more than 2 indexes;
the variables are of the form X(i, j, k, ...). The constraints sum over all
but one index -- for example, the 3-index assignment problem is given by:
Min Sum i,j,k{C(i, j, k) X(i, j, k)}: X >= 0;
Sum j,k{X(i, j, k)} = 1 for all i;
Sum i,k{X(i, j, k)} = 1 for all j;
Sum i,j{X(i, j, k)} = 1 for all k.
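A feasible 0-1 solution of the 3-index problem picks, for each i, a pair
(j, k) so that each j and each k is also used exactly once -- i.e., a pair of
permutations. Here is a brute-force Python sketch (the costs are made up, and
enumeration is only workable for tiny n):

```python
from itertools import permutations

# Enumerate all pairs of permutations (pj assigns j to i, pk assigns k to i)
# and keep the cheapest, solving the 3-index assignment problem exactly.
def three_index_assignment(C):
    n = len(C)
    best = None
    for pj in permutations(range(n)):
        for pk in permutations(range(n)):
            cost = sum(C[i][pj[i]][pk[i]] for i in range(n))
            if best is None or cost < best[0]:
                best = (cost, pj, pk)
    return best

C = [[[4, 2], [1, 3]],
     [[2, 5], [4, 1]]]
print(three_index_assignment(C))   # -> (5, (0, 1), (0, 1))
```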
Asymptotic Linear Programming. A
linear program in which the coefficients are
functions of a single parameter, usually denoting time.
(Some authors require the functions to be rational -- i.e., of the form p(t)/q(t),
where p and q are polynomials.)
The problem is to find a steady state solution -- i.e., one that is
optimal (or nearly optimal) for sufficiently
large values of the time parameter.
Augmented Lagrangian. The
Lagrangian augmented by a term that retains the stationary
properties of a solution to the original mathematical program but alters
the hessian in the subspace defined by the
active set of constraints.
The added term is sometimes called a penalty term, as it
worsens the value of the augmented Lagrangian for x off the surface
defined by the active constraints.
For example, the penalty term could be proportional to the squared
norm of the active constraints:
L_a(x, y) = L(x, y) + p G(x)'G(x),
where p is a parameter (negative for maximization), G(x) is the vector
of active constraints (at x), and L is the usual Lagrangian.
In this case, suppose x* is a
Kuhn-Tucker point with multiplier y*. Then, L_a(x*, y*) = L(x*, y*)
(since G(x*)=0). Further, the gradient of the penalty term with
respect to x is 0 at x*, and the hessian of the penalty term at x* is
2p A(x*)'A(x*), where A(x*) is the
matrix whose rows are the gradients of the active constraints.
Since A(x*)'A(x*) is
positive semi-definite,
the penalty term has the effect of increasing the
eigenvalues (decreasing them if maximizing).
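These claims can be checked numerically on a made-up minimization problem,
min x1² + x2² subject to g(x) = x1 + x2 - 2 = 0, whose Kuhn-Tucker point is
x* = (1, 1) with multiplier y* = 2 (the penalty parameter p is arbitrary):

```python
# At x*, g(x*) = 0, so the penalty p*g(x)^2 vanishes, its gradient
# 2*p*g(x)*A is zero, and its hessian is 2p*A'A with A = grad g = (1, 1).
p = 10.0
g = lambda x: x[0] + x[1] - 2.0
L = lambda x, y: x[0]**2 + x[1]**2 - y * g(x)   # ordinary Lagrangian
La = lambda x, y: L(x, y) + p * g(x)**2         # augmented Lagrangian

xstar, ystar = (1.0, 1.0), 2.0
print(La(xstar, ystar) == L(xstar, ystar))      # -> True, since g(x*) = 0

grad_pen = [2 * p * g(xstar) * 1.0, 2 * p * g(xstar) * 1.0]
print(grad_pen)                                 # -> [0.0, 0.0]

H = [[2 * p * 1 * 1, 2 * p * 1 * 1],            # hessian of penalty: 2p*A'A
     [2 * p * 1 * 1, 2 * p * 1 * 1]]
print(H)                                        # -> [[20.0, 20.0], [20.0, 20.0]]
```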