Mathematical Programming Glossary - L

Label correcting algorithm. Arises in labeling algorithms for the shortest path problem. In each iteration a label is set to an estimate of the shortest path length from a given node. All labels become exact values at termination.

Label setting algorithm. Arises in labeling algorithms for the shortest path problem. In each iteration a label becomes the actual shortest path length from some node. (Termination occurs when the destination node(s) are permanently labeled.)

Labeling algorithm. A class of algorithms for path-related problems on graphs and networks. To illustrate, consider a network [V,A] with arc values c.

    The Ford-Fulkerson max flow labeling algorithm (c = capacity).

    Input. source (s), destination (t), capacities (c).
    Output. max flow from s to t.

  • Initialize all flows to be zero and assign the label (-,inf) to the source. All other nodes are unlabeled, and all nodes (including the source) are unscanned.
  • Labeling process: Select any labeled, unscanned node, say k with label (a,b). If node j is an unlabeled neighbor and x(k, j) < c(k, j), assign the label (k+,b*) to node j, where b* = Min{b, c(k,j)-x(k,j)}. If node j is an unlabeled neighbor and x(j, k) > 0, assign the label (k-,b*) to node j, where b* = Min{b, x(j,k)}. (In either case, define node j as labeled, but unscanned, while node k is now labeled and scanned.) Repeat this until either the sink is labeled and unscanned, or until no more labels can be assigned. In the latter case terminate; in the former case, go to the flow change step.
  • Flow change: Let the sink (t) have label (a,b). If a is k+, replace x(k,t) by x(k,t)+b; if a is k-, replace x(t,k) by x(t,k)-b. In either case, apply the flow change (b or -b) along the path back to the source. The result is a new flow whose value is b more than before. Erase all labels except on the source, and return to the labeling process.

    (See technical note, with an example.)
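
    A minimal Python sketch of these labeling and flow-change steps (the dictionary-based network representation, function name, and example data are illustrative assumptions, not part of the algorithm statement):

    from collections import deque

    def max_flow_labeling(cap, s, t):
        """Ford-Fulkerson labeling sketch; cap[(i, j)] is the capacity of arc (i, j)."""
        flow = {arc: 0 for arc in cap}
        while True:
            # Labeling process: label[v] = (node it was scanned from, '+' or '-', slack b).
            label = {s: (None, '+', float('inf'))}
            queue = deque([s])
            while queue and t not in label:
                k = queue.popleft()
                b = label[k][2]
                for (i, j) in cap:
                    if i == k and j not in label and flow[(k, j)] < cap[(k, j)]:
                        label[j] = (k, '+', min(b, cap[(k, j)] - flow[(k, j)]))
                        queue.append(j)
                    elif j == k and i not in label and flow[(i, k)] > 0:
                        label[i] = (k, '-', min(b, flow[(i, k)]))
                        queue.append(i)
            if t not in label:               # no augmenting path remains: the flow is maximal
                value = sum(flow[a] for a in cap if a[0] == s) - sum(flow[a] for a in cap if a[1] == s)
                return value, flow
            # Flow change: trace the labels back from the sink, changing flow by b.
            b, v = label[t][2], t
            while v != s:
                k, sign, _ = label[v]
                if sign == '+':
                    flow[(k, v)] += b
                else:
                    flow[(v, k)] -= b
                v = k

    # Example: two unit-capacity paths from 's' to 't' give max flow 2.
    cap = {('s', 'a'): 1, ('a', 't'): 1, ('s', 'b'): 1, ('b', 't'): 1}
    print(max_flow_labeling(cap, 's', 't'))

    Each outer pass either augments the flow by b along the labeled path or proves that no augmenting path remains.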

    Dijkstra's shortest path algorithm (c = distance).

    Input. source (s), destinations (T not Ø), distances (c).
    Output. shortest path from s to each node in T.

  • Initialize labels: L(v)=inf for v not= s and L(s)=0. Set S=Ø.
  • Label: select u in argmin{L(v): v in V\S} (terminate if S=V). Then, update S := S\/{u}, and for v in V\S: if L(u) + c(u,v) < L(v), set L(v) := L(u) + c(u,v).
Terminate when T is a subset of S (i.e., all destination nodes are labeled with their shortest distance from node s).
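
    A minimal Python sketch of this label-setting step, assuming nonnegative arc lengths stored in a dictionary c (the example network is illustrative):

    import math

    def dijkstra(nodes, c, s):
        """Dijkstra label-setting sketch; c[(u, v)] is a nonnegative arc length."""
        L = {v: math.inf for v in nodes}     # tentative labels
        L[s] = 0
        S = set()                            # permanently labeled nodes
        while S != set(nodes):
            u = min((v for v in nodes if v not in S), key=L.get)
            S.add(u)                         # L[u] is now the exact shortest distance from s
            for v in nodes:
                if v not in S and (u, v) in c and L[u] + c[(u, v)] < L[v]:
                    L[v] = L[u] + c[(u, v)]
        return L

    # Example: shortest distances from 'a'.
    nodes = ['a', 'b', 'c', 'd']
    c = {('a', 'b'): 2, ('a', 'c'): 5, ('b', 'c'): 1, ('b', 'd'): 7, ('c', 'd'): 3}
    print(dijkstra(nodes, c, 'a'))           # {'a': 0, 'b': 2, 'c': 3, 'd': 6}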

    A generic shortest path labeling algorithm from all nodes to destination node t (c = distance).

    Input. destination (t), distances (c).
    Output. shortest path from each node to t.

  • Initialize labels, L(v) = estimate of shortest path length from node v to node t, with L(t) = 0.
  • Label: Find an arc, say (i, j), that violates the distance equations, say L(i) > c(i, j) + L(j). If none, terminate; otherwise, update the label: L(i) = c(i, j) + L(j).
The labeling step is repeated until there are no violations. At termination, L(j) = length of shortest path from node j to node t.
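
    A minimal Python sketch of this label-correcting loop (arc lengths in a dictionary c; the example data are illustrative):

    import math

    def label_correcting(nodes, c, t):
        """Generic labeling sketch: L[v] estimates the shortest distance from v to t."""
        L = {v: math.inf for v in nodes}
        L[t] = 0
        changed = True
        while changed:                         # repeat the labeling step until no violations remain
            changed = False
            for (i, j) in c:                   # arc (i, j) with length c[(i, j)]
                if L[i] > c[(i, j)] + L[j]:    # (i, j) violates the distance equations
                    L[i] = c[(i, j)] + L[j]
                    changed = True
        return L

    # Example: distances to node 'd'.
    nodes = ['a', 'b', 'c', 'd']
    c = {('a', 'b'): 2, ('b', 'c'): 1, ('c', 'd'): 3, ('a', 'd'): 9}
    print(label_correcting(nodes, c, 'd'))     # {'a': 6, 'b': 4, 'c': 3, 'd': 0}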

Another type of labeling algorithm occurs in simplicial subdivision, used to compute a fixed point that represents an economic equilibrium.

Lagrange conditions. The optimality equation (plus feasibility conditions) in Lagrange's multiplier theorem. Here is a biography of Lagrange.

Lagrange Multiplier Rule (LMR). From the extension to Lagrange's multiplier theorem:

Suppose x* is in argmax{f(x): g(x) <= 0, h(x) = 0}, where f, g, h are in C^1. Then, there exist multipliers (u, v) for which the following conditions hold:

  1. grad_x[f(x*) - ug(x*) - vh(x*)] = 0;
  2. u >= 0;
  3. ug(x*)=0.

Since g(x*) <= 0, the last condition, given u >= 0, is equivalent to complementary slackness. These are considered first-order optimality conditions, though the Lagrange Multiplier Rule is not always valid -- see constraint qualifications.
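
    A small symbolic check of these conditions, using sympy, on a toy problem that is an illustrative assumption: max f(x) = -(x1-2)^2 - (x2-2)^2 subject to g(x) = x1 + x2 - 2 <= 0.

    import sympy as sp

    x1, x2, u = sp.symbols('x1 x2 u', real=True)
    f = -(x1 - 2)**2 - (x2 - 2)**2
    g = x1 + x2 - 2

    # LMR: grad[f - u*g] = 0 together with complementary slackness u*g = 0.
    eqs = [sp.diff(f - u*g, x1), sp.diff(f - u*g, x2), u*g]
    for sol in sp.solve(eqs, [x1, x2, u], dict=True):
        if sol[u] >= 0 and g.subs(sol) <= 0:     # keep only u >= 0 and feasible points
            print(sol)                           # x1 = x2 = 1 with multiplier u = 2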

Lagrange's multiplier theorem. Let f, h be in C^1 and suppose grad_h(x*) has full row rank. Then, x* is in argmax{f(x): h(x)=0} only if there exists v in R^m such that:

grad_f(x*) - v grad_h(x*) = 0.

v_i is called the Lagrange multiplier associated with the i-th constraint (h_i(x)=0). Extensions to remove the rank condition and/or allow inequality constraints were given by Carathéodory, John, Karush, and Kuhn & Tucker. Also see the Lagrange Multiplier Rule.

Lagrangian. For the mathematical program in standard form, the Lagrangian is the function:

L(x, u, v) = f(x) - u g(x) - v h(x) for x in X and u >= 0.

Note that the Lagrange Multiplier Rule can be written as the first-order conditions for (x*, u*,v*) to be a saddlepoint of L. In Lagrange's multiplier theorem (where X=R^n and g is vacuous), this is simply that grad_L(x*,v*)=0, which could be any type of stationary point.

A more general Lagrangian can have nonlinear functions, U(g) and V(h), instead of the linear forms, ug and vh, respectively. In this form, U is monotonic non-decreasing (u >= 0 in the linear case), and the extension of complementary slackness is that U(g(x)) = U(0). This is a type of generalized penalty function.

Lagrangian duality. See it in the list of particular duals.

Lagrangian relaxation. See Generalized Lagrange Multiplier method.

Lattice. A set S that contains x /\ y and x \/ y for all x, y in S.

Lattice program. Minimizing a subadditive function on a lattice.

Lattice search. This finds the maximum of a unimodal function on a finite set, {x1, x2, ..., xm}, by evaluating the function at points placed according to the modified Fibonacci sequence: K_(n+2) = K_(n+1) + K_n + 1. The modification comes from the improvement obtained by dropping the inferior evaluated point itself: if the candidate points are indexed in [a, b] and f(x) < f(y) for evaluated points x < y, the new index range is [x+1, b], dropping the point x from the remaining search set.

The method proceeds the same as Fibonacci search, except the placements follow the modified Fibonacci sequence, which is equivalent to K_n = F_(n+1) - 1, reflecting the improvement that accumulates as n increases from the extra point dropped after each evaluation. The placement ratio, K_(n-1)/K_n, also approaches the golden mean, so lattice search also approaches golden section search.

Some refer to Fibonacci search when they mean lattice search. The following table shows the difference. For example, with 20 evaluations, Fibonacci search can guarantee finding the maximum on a set of 10,946 points, while lattice search can find it on a set of 17,710 points. Golden section search is included for comparison as well; with 20 evaluations it can find the maximum on a set of only 9,349 points.

       n                     5     10     15     20
       ============================================
       F_n                   8     89    987  10946
       K_n                  12    143   1596  17710
       golden section        6     76    843   9349
       ============================================
Lattice search minimizes the maximum number of evaluations needed to find the maximum on a given set of points. If the number of points in that set is K_N, the number of evaluations is N. If the number in the set is not exactly some modified Fibonacci number, one can fill in dummy points at the end of the set. For example, if K_(N-1) < m < K_N, let the set be {x1, ..., xm, y1, ..., yk}, where k = K_N - m. Then, the (minimum) number of evaluations needed to guarantee finding the maximum of any unimodal function is N.
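
    A minimal Python sketch of lattice search under these placements, padding the set with dummy points up to the next modified Fibonacci number (the index conventions and names are illustrative assumptions):

    import math

    def lattice_search(f, m):
        """Return an index in 1..m maximizing the unimodal sequence f(1), ..., f(m)."""
        if m == 1:
            return 1
        K = [0, 1]                            # modified Fibonacci numbers: K[n] = K[n-1] + K[n-2] + 1
        while K[-1] < m:
            K.append(K[-1] + K[-2] + 1)
        n = len(K) - 1                        # m <= K[n]; pad with dummy points worth -inf
        val = lambda i: f(i) if i <= m else -math.inf
        lo, hi = 1, K[n]
        probes = {}                           # evaluated points, carried from step to step
        while lo < hi:
            a, b = lo + K[n - 2], lo + K[n - 1]
            for p in (a, b):                  # after the first step, only one of these is new
                if p not in probes:
                    probes[p] = val(p)
            if probes[a] >= probes[b]:
                hi = b - 1                    # the maximum cannot be at b or to its right
            else:
                lo = a + 1                    # the maximum cannot be at a or to its left
            n -= 1
        return lo

    # Example: f(i) = -(i - 13)**2 is unimodal on {1, ..., 20} with its maximum at i = 13;
    # 20 = K_6, so 6 evaluations suffice.
    print(lattice_search(lambda i: -(i - 13)**2, 20))   # 13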

Level set. For f:X-->R, the c-level set is {x: x in X, f(x) <= c}. Usually, the 0-level set is called simply the level set of f.

Levenberg-Marquardt algorithm. This is designed for solving nonlinear programs using the equation:

(H_f(x) + pI)d = -grad_f(x),

where H_f is the hessian. For unconstrained optimization, the solution d serves as a direction vector for the iteration. This is also used in a trust region method. The parameter p is set to give a balance between Newton's method (p=0) and Cauchy's steepest descent (p >> 0). A high value of p pushes the step toward steepest descent, which helps get through regions of difficult curvature and guarantees some descent; a low value keeps the step close to Newton's, for fast local convergence.
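
    A minimal numerical sketch of the direction computation, assuming the gradient and hessian are available (the quadratic test data are illustrative):

    import numpy as np

    def lm_direction(hess, grad, p):
        """Solve (H + pI) d = -grad for the Levenberg-Marquardt direction."""
        return np.linalg.solve(hess + p * np.eye(len(grad)), -grad)

    # Toy quadratic f(x) = x1^2 + 10*x2^2 at x = (1, 1): grad = (2, 20), H = diag(2, 20).
    grad = np.array([2.0, 20.0])
    hess = np.array([[2.0, 0.0], [0.0, 20.0]])
    print(lm_direction(hess, grad, 0.0))      # p = 0: the Newton step, [-1, -1]
    print(lm_direction(hess, grad, 100.0))    # large p: a short step close to -grad/p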

Lexicographic order. A nonzero vector is lexicographically positive if its first non-zero coordinate is positive. The vector x is lexicographically greater than the vector y if x-y is lexicographically positive, and this defines the lexicographic order in R^n. This is a total ordering in that every two vectors are either equal, or one is lexicographically greater than the other.

This was first used in mathematical programming to resolve cycling in the simplex method (for LP). It also provides a way to obtain solutions for multiple objectives with the property that x is a Pareto maximum if f(x) is lexicographically greater than or equal to f(y) for all feasible y.
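
    A small Python illustration of these definitions (the function names are illustrative):

    def lex_positive(x):
        """A nonzero vector is lexicographically positive if its first nonzero entry is > 0."""
        for xi in x:
            if xi != 0:
                return xi > 0
        return False          # the zero vector is not lexicographically positive

    def lex_greater(x, y):
        """x is lexicographically greater than y if x - y is lexicographically positive."""
        return lex_positive([a - b for a, b in zip(x, y)])

    print(lex_greater([0, 3, -5], [0, 2, 100]))   # True: the first nonzero difference is 3 - 2 > 0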

Lifting. In integer programming, it is creating a valid inequality for the original set, given one for a lower dimensional set, by augmenting a binary variable that is absent from the original inequality. That is, suppose S is a subset of {0,1}^n and ax <= b is a valid inequality for S/\{x in {0,1}^n: x(1)=0} (where it does not matter what a(1) is since x(1)=0). Then, under certain conditions, the inequality is valid for S. The "conditions" are problem dependent, and a(1) is set as large as possible. For example, suppose x(2) + x(3) + x(4) <= 2 is a valid inequality for S. A lifting is the strengthening, 2x(1) + x(2) + x(3) + x(4) <= 2, if it can be ascertained that this is a valid inequality for S. The coefficient of x(1) (2 in the example) is determined by considering how large a(1) can be with x(1)=1. (In this case, the lifted inequality is valid if x(2) + x(3) + x(4) <= 2 is valid for {x in S: x(1)=0} and if x(1)=1 implies x(2) = x(3) = x(4) = 0.) The motivation for doing this arises in a branch and cut algorithm strategy. At a node in the search tree we might have a valid inequality, ax <= b, when the branch had x(1)=0, and we want to make it valid even if x(1)=1. The inequality is valid for all a(1) <= b - max{a(2)x(2) +...+ a(n)x(n): x feasible and x(1)=1}.
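
    A small Python illustration of computing the lifted coefficient from the formula above, on a toy feasible set chosen to match the example (the set S below is an illustrative assumption):

    from itertools import product

    # Toy feasible set: x in {0,1}^4 with "x1 = 1 forces x2 = x3 = x4 = 0"
    # and "at most two of x2, x3, x4 equal 1".
    S = [x for x in product((0, 1), repeat=4)
         if not (x[0] == 1 and sum(x[1:]) > 0) and sum(x[1:]) <= 2]

    # x2 + x3 + x4 <= 2 is valid on {x in S: x1 = 0}; lift in x1:
    a, b = [0, 1, 1, 1], 2
    best = max(sum(ai * xi for ai, xi in zip(a, x)) for x in S if x[0] == 1)
    a1 = b - best                      # largest coefficient keeping a1*x1 + x2 + x3 + x4 <= b valid
    print(a1)                          # 2, giving the lifted inequality 2x1 + x2 + x3 + x4 <= 2
    assert all(a1 * x[0] + sum(x[1:]) <= b for x in S)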

LINDO. A solver using a simplified problem specification for Linear, Integer, Nonlinear and Dynamic optimization.

Line search. Optimizing the objective along a line: Opt{f(x+sd): s >= 0}, where x is a given point and d is a given direction vector (s is a scalar). A coarse tolerance could be used, so "Opt" is not entirely accurate. (In fact, there are reasons to believe not optimizing is best.)
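
    As one common inexact rule, here is a sketch of an Armijo-style backtracking line search for a minimization setting with a descent direction d (a coarse rule in the spirit of the remark above; the test function is an illustrative assumption):

    import numpy as np

    def backtracking_line_search(f, grad_f, x, d, alpha=0.3, beta=0.5):
        """A coarse line search sketch: shrink s until f decreases enough along d."""
        s, fx, slope = 1.0, f(x), np.dot(grad_f(x), d)
        while f(x + s * d) > fx + alpha * s * slope:
            s *= beta
        return s

    # Example: f(x) = x1^2 + 4*x2^2 from x = (2, 1) along the steepest-descent direction.
    f = lambda x: x[0]**2 + 4 * x[1]**2
    grad_f = lambda x: np.array([2 * x[0], 8 * x[1]])
    x = np.array([2.0, 1.0])
    d = -grad_f(x)
    s = backtracking_line_search(f, grad_f, x, d)
    print(s, f(x + s * d))    # a step length with sufficient decrease (f drops below f(x) = 8)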

Line segment of end points x and y in R^n. {ax + (1-a)y: a in [0, 1]}, which is denoted [x, y] and is the closed line segment. The open line segment is (x, y) = {ax + (1-a)y: a in (0, 1)}.

Linear combination. The vector y is a linear combination of vectors x^1, x^2, ..., x^m if y = Sum_k{a_k x^k}, where a_k is called the coefficient of x^k.

Linear convergence. See convergence.

Linear program (LP). Opt{cx: Ax = b, x >= 0}. (Other forms of the constraints are possible, such as Ax <= b.) The standard form assumes A has full row rank. Computer systems ensure this by augmenting a logical variable (y) to each constraint, so the form appears as Opt{cx: Ax + y = b, L <= (x, y) <= U} (also allowing general bounds on the variables). The original variables (x) are called structural. Note that each logical variable can be a slack, surplus, or artificial variable, depending on the form of the original constraint. This computer form also represents a range constraint with simple bounds on the logical variable. Some bounds can be infinite (i.e., absent), and a variable (logical or structural) is free when both of its bounds are infinite.
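
    A small illustration, assuming scipy is available, that augmenting logical (slack) variables reproduces the same optimum as the inequality form (the data are made up):

    import numpy as np
    from scipy.optimize import linprog

    c = np.array([-3.0, -2.0])                 # linprog minimizes, so this is max 3x1 + 2x2
    A = np.array([[1.0, 1.0], [1.0, 3.0]])
    b = np.array([4.0, 6.0])

    direct = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2)

    # Computer form: structural x plus logical y, with Ax + Iy = b and y >= 0.
    c_aug = np.concatenate([c, np.zeros(2)])
    A_aug = np.hstack([A, np.eye(2)])
    augmented = linprog(c_aug, A_eq=A_aug, b_eq=b, bounds=[(0, None)] * 4)

    print(direct.fun, augmented.fun)           # the same optimal value, -12.0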

Linearity interval. When a univariate function is piece-wise linear, it has the form a(i)x + b(i) for x in the interval [c(i), c(i+1)], where a(i) is not equal to a(i+1). (The phrase usually means the function is continuous.) This arises in linear programming when considering the optimal value as a function of varying right-hand sides (or cost coefficients) in fixed proportions: b+td (or c+td), where d is an admissible change vector and t is the (scalar) variable. Then, for the range of t where the LP has a solution, the optimal value function has this piece-wise linear (continuous) form, and the intervals that comprise the domain are called linearity intervals.

LINGO. A modeling language companion to LINDO.

Lipschitz continuous (of a function, f:X-->R). There exists a value K, called the Lipschitz constant, such that |f(x) - f(y)| <= K||x-y|| for all x, y in X. This relation is called the Lipschitz condition. It is stronger than continuity because it limits the slope to be within [-K, K]. The generalized Lipschitz condition is that there exists a monotonically increasing function, m:R-->R with the property that m(z)-->0 as z-->0, such that there exists K: |f(x)-f(y)| <= Km(||x-y||) for all x, y in X. Here is a biography of Lipschitz.

Local convergence. See convergence.

Local optimum. A feasible point x* that is an optimal solution to the mathematical program whose feasible region is the intersection of the original region and some neighborhood of x*.

Locally convex function. There exists a subset of the domain, such as the neighborhood of a point, such that the function is convex on that subset.

Location problem. See facility location problem.

Lockbox problem. A firm wants a check that it receives to clear as quickly as possible (so it can use the money). To reduce the number of days to clear a check, the firm opens offices, called lockboxes, in different cities to handle the checks. There is a cost to open a lockbox, and there are average annual losses calculated from interest rates and the number of days to clear a check sent from source i to lockbox city j. Here is an integer programming model of this problem.
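
    The referenced integer programming model is not reproduced here; the following brute-force Python sketch only illustrates the cost trade-off that model captures, with made-up opening costs and float losses:

    from itertools import combinations, chain

    open_cost = {'Chicago': 40, 'Dallas': 35, 'Boston': 30}
    loss = {                    # loss[i][j] = annual float loss if source i mails to box j
        'West':  {'Chicago': 50, 'Dallas': 40, 'Boston': 80},
        'South': {'Chicago': 45, 'Dallas': 25, 'Boston': 70},
        'East':  {'Chicago': 40, 'Dallas': 60, 'Boston': 20},
    }

    def total_cost(open_boxes):
        """Opening cost plus each source's loss at its cheapest open lockbox."""
        return (sum(open_cost[j] for j in open_boxes)
                + sum(min(loss[i][j] for j in open_boxes) for i in loss))

    boxes = list(open_cost)
    subsets = chain.from_iterable(combinations(boxes, k) for k in range(1, len(boxes) + 1))
    best = min(subsets, key=total_cost)
    print(best, total_cost(best))   # ('Dallas', 'Boston') 150 for this made-up data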

Logical variable. See linear program.

Lot size problem. This is one of the oldest mixed-integer programs in operations research, first presented by Wagner and Whitin in 1958. The problem is to minimize cost while satisfying product demands over (discrete) time. Let y_t be the number of units produced in period t, for t=1,...,n (n is called the planning horizon), and let x_t = 1 if a setup occurs in period t; 0 otherwise. Let D_t be the demand in period t, and let the demand from period i to period j, inclusive, be d_(ij) := Sum_{i<=t<=j} D_t. Then, a MILP formulation is:

min cx + dy: x in {0,1}^n, y >= 0,

Sum_{t<=i} y_t >= d_(1i) for i=1,...,n-1,

Sum_{t<=n} y_t = d_(1n),

d_(in) x_i - y_i >= 0 for i=1,...,n.

(This is preferred over the MILP that defines inventory balance equations; even though the two are equivalent, the LP relaxation of the above is sharper.)
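
    A sketch of this MILP in the PuLP modeling package (an assumption: pulp and its bundled solver are installed; the demand and cost data are made up for illustration):

    from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, value

    D = [20, 0, 30, 40]                     # demand D_t, t = 1..n
    c, d = 100, 1                           # setup cost and unit production cost
    n = len(D)
    dd = lambda i, j: sum(D[i - 1:j])       # d_(ij) = D_i + ... + D_j

    prob = LpProblem("lot_size", LpMinimize)
    x = [LpVariable(f"x{t}", cat=LpBinary) for t in range(1, n + 1)]
    y = [LpVariable(f"y{t}", lowBound=0) for t in range(1, n + 1)]
    prob += lpSum(c * xi for xi in x) + lpSum(d * yi for yi in y)
    for i in range(1, n):
        prob += lpSum(y[:i]) >= dd(1, i)    # cumulative production covers cumulative demand
    prob += lpSum(y) == dd(1, n)            # total production equals total demand
    for i in range(1, n + 1):
        prob += dd(i, n) * x[i - 1] - y[i - 1] >= 0   # no production without a setup
    prob.solve()
    # For this data, one setup in period 1 producing all 90 units is optimal (cost 190).
    print([value(v) for v in x], [value(v) for v in y], value(prob.objective))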

Lower semi-continuity (or lower semi-continuous, abbr. lsc). Suppose {x^k}-->x. Of a function, f is lsc if lim inf f(x^k) >= f(x). Of a point-to-set map, A, let N_e[S] be a neighborhood of the set S: for each e > 0, there exists K such that for all k > K, A(x) is a subset of N_e[A(x^k)]. A pathology to show that the feasibility map may not be lsc:

A(b) = L_b(g) = {x in [-1, 1]: g(x) <= b}, where

g(x) = x if x is in [-1, 0], and g(x) = 0 if x in [0,1].

(Note that g is continuous.) At b=0, A(0) = [-1, 1]. Let b^k = -1/k, so {b^k}-->0 and A(b^k) = [-1, -1/k]. Thus, N_e[A(b^k)] = [-1-e, -1/k+e] does not contain 1, which is in A(0). We lose the right half of the feasibility region, (0, 1], when we approach b=0 from below.

Lower triangular matrix. A square matrix, A, such that A(i, j)=0 for i < j (that is, all entries above the main diagonal are zero).

LPL. Linear Programming Language - available from Tony Hürlimann.

LU decomposition (of matrix A). Finding a lower triangular matrix (L) and an upper triangular matrix (U) for which A=LU. The advantage is that to solve the system Ax=b, we first apply forward substitution to solve Ly=b, then backward substitution to solve Ux=y. Similarly, to solve vA=c, first solve uU=c, then vL=u.
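
    A minimal Python sketch of the decomposition (Doolittle form, no pivoting) and the forward/backward substitution solves described above (the 2x2 example is illustrative):

    import numpy as np

    def lu_decompose(A):
        """Doolittle LU sketch (no pivoting): returns L (unit lower triangular) and U with A = LU."""
        n = len(A)
        L, U = np.eye(n), np.zeros((n, n))
        for i in range(n):
            for j in range(i, n):                    # row i of U
                U[i, j] = A[i, j] - L[i, :i] @ U[:i, j]
            for j in range(i + 1, n):                # column i of L
                L[j, i] = (A[j, i] - L[j, :i] @ U[:i, i]) / U[i, i]
        return L, U

    def solve_lu(L, U, b):
        """Solve Ax = b as Ly = b (forward substitution) then Ux = y (backward substitution)."""
        n = len(b)
        y = np.zeros(n)
        for i in range(n):
            y[i] = b[i] - L[i, :i] @ y[:i]           # L has a unit diagonal
        x = np.zeros(n)
        for i in reversed(range(n)):
            x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
        return x

    A = np.array([[4.0, 3.0], [6.0, 3.0]])
    L, U = lu_decompose(A)
    print(np.allclose(L @ U, A), solve_lu(L, U, np.array([10.0, 12.0])))   # True [1. 2.]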

 



Send questions and comments to aholder@trinity.edu.
Copyright© 1996 – 2006, Mathematical Programming Glossary, by the INFORMS Computing Society