
DOI: 10.1155/2001/569670


Recursive approach in sparse matrix LU factorization

Jack Dongarra, Victor Eijkhout and Piotr Łuszczek∗

University of Tennessee, Department of Computer Science, Knoxville, TN 37996-3450, USA

Tel.: +865 974 8295; Fax: +865 974 8296

∗Corresponding author: Piotr Luszczek, Department of Computer Science, 1122 Volunteer Blvd., Suite 203, Knoxville, TN 37996-3450, USA. Tel.: +865 974 8295; Fax: +865 974 8296; E-mail: luszczek@cs.utk.edu.

Scientific Programming 9 (2001) 51–60

J. Dongarra et al. / Recursive approach in sparse matrix LU factorization

This paper describes a recursive method for the LU factorization of sparse matrices. The recursive formulation of common linear algebra codes has been proven very successful in dense matrix computations. An extension of the recursive technique for sparse matrices is presented. Performance results given here show that the recursive approach may perform comparably to leading software packages for sparse matrix factorization in terms of execution time, memory usage, and error estimates of the solution.

1. Introduction

Typically, a system of linear equations has the form:

$$Ax = b, \tag{1}$$

where A is an n by n real matrix ($A \in \mathbb{R}^{n \times n}$), and x and b are n-dimensional real vectors ($b, x \in \mathbb{R}^{n}$). The values of A and b are known and the task is to find x satisfying Eq. (1). In this paper, it is assumed that the matrix A is large (of order commonly exceeding ten thousand) and sparse (there are enough zero entries in A that it is beneficial to use special computational methods to factor the matrix rather than to use a dense code). There are two common approaches that are used to deal with such a case, namely, iterative [33] and direct methods [17].

Iterative methods, in particular Krylov subspace techniques such as the Conjugate Gradient algorithm, are the methods of choice for the discretizations of elliptic or parabolic partial differential equations where the resulting matrix is often guaranteed to be positive definite or close to it. However, when the linear system matrix is strongly unsymmetric or indefinite, as is the case with matrices originating from systems of ordinary differential equations or the indefinite matrices arising from shift-invert techniques in eigenvalue methods, one has to revert to direct methods, which are the focus of this paper.

In direct methods, Gaussian elimination with partial pivoting is performed to find a solution of Eq. (1). Most commonly, the factored form of A is given by means of matrices L, U, P and Q such that:

$$LU = PAQ, \tag{2}$$

where:

– L is a lower triangular matrix with unitary diagonal,
– U is an upper triangular matrix with arbitrary diagonal,
– P and Q are row and column permutation matrices, respectively (each row and column of these matrices contains a single non-zero entry which is 1, and the following holds: $PP^{T} = QQ^{T} = I$, with I being the identity matrix).

A simple transformation of Eq. (1) yields:

$$(PAQ)\,Q^{-1}x = Pb, \tag{3}$$

which in turn, after applying Eq. (2), gives:

$$LU(Q^{-1}x) = Pb. \tag{4}$$

Solution to Eq. (1) may now be obtained in two steps:

$$Ly = Pb, \tag{5}$$

$$U(Q^{-1}x) = y, \tag{6}$$

and these steps are performed through forward/backward substitution since the matrices involved are triangular. The most computationally intensive part of solving Eq. (1) is the LU factorization defined by Eq. (2). This operation has computational complexity of order $O(n^{3})$ when A is a dense matrix, as compared to $O(n^{2})$ for the solution phase. Therefore, optimization of the factorization is the main determinant of the overall performance.
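The two-step solve of Eqs. (5) and (6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the factors are taken to be already available as dense nested lists, the permutations are stored as index lists `perm_p` and `perm_q` such that entry (i, j) of PAQ equals `A[perm_p[i]][perm_q[j]]`, and all function names are hypothetical rather than part of any library.

```python
def forward_substitution(L, rhs):
    """Solve L y = rhs, where L is lower triangular with unit diagonal."""
    n = len(rhs)
    y = [0.0] * n
    for i in range(n):
        y[i] = rhs[i] - sum(L[i][k] * y[k] for k in range(i))
    return y


def backward_substitution(U, rhs):
    """Solve U z = rhs, where U is upper triangular with nonzero diagonal."""
    n = len(rhs)
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (rhs[i] - s) / U[i][i]
    return z


def solve_with_factors(L, U, perm_p, perm_q, b):
    """Solve A x = b given L U = P A Q (assumed index-list permutations)."""
    pb = [b[i] for i in perm_p]          # right-hand side P b of Eq. (5)
    y = forward_substitution(L, pb)      # Eq. (5): L y = P b
    z = backward_substitution(U, y)      # Eq. (6): U z = y, with z = Q^{-1} x
    x = [0.0] * len(b)
    for j, col in enumerate(perm_q):     # undo the column permutation: x = Q z
        x[col] = z[j]
    return x
```

For example, with `L = [[1.0, 0.0], [0.5, 1.0]]`, `U = [[4.0, 2.0], [0.0, 3.0]]`, `perm_p = [1, 0]` and `perm_q = [1, 0]`, the factors correspond to `A = [[4, 2], [2, 4]]`, and calling `solve_with_factors` with `b = [8.0, 10.0]` recovers `x = [1.0, 2.0]`. Both substitutions touch each triangular entry once, which is where the $O(n^{2})$ cost of the solution phase comes from.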

When both of the matrices P and Q of Eq. (2) are

non-trivial, i.e. neither of them is an identity matrix,

then the factorization is said to be using complete piv-

oting. In practice, however, Q is an identity matrix and

this strategy is called partial pivoting which tends to be

sufficient to retain numerical stability of the factorization, unless the matrix A is singular or nearly so. Moderate values of the condition number $\kappa = \|A^{-1}\| \cdot \|A\|$

guarantee a success for a direct method as opposed to

matrix structure and spectrum considerations required

for iterative methods.

When the matrix A is sparse, i.e. enough of its entries

are zeros, it is important for the factorization process

to operate solely on the non-zero entries of the matrix.

However, new nonzero entries are introduced in the

L and U factors which are not present in the original

matrix A of Eq. (2). The new entries are referred to

as ﬁll-in and cause the number of non-zero entries in

the factors (we use the notation η(A) for the number

of nonzeros in a matrix) to be (almost always) greater

than that of the original matrix A: $\eta(L + U) \ge \eta(A)$.

The amount of ﬁll-in can be controlled with the ma-

trix ordering performed prior to the factorization and

consequently, for the sparse case, both of the matrices

P and Q of Eq. (2) are non-trivial. Matrix Q induces

the column reordering that minimizes ﬁll-in and P per-

mutes rows so that pivots selected during the Gaussian

elimination guarantee numerical stability.
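The effect of ordering on fill-in can be illustrated with a small symbolic elimination. The sketch below is a simplified illustration rather than the procedure used by any of the packages discussed: it works on the sparsity pattern only, ignores pivoting and numerical cancellation, and uses an "arrow" pattern (a hypothetical test case) whose dense row and column are placed either first or last.

```python
def count_factor_nonzeros(pattern, n):
    """Return eta(L + U) for an n-by-n sparsity pattern given as a set of (row, col).

    Eliminating column k creates an entry (i, j) whenever (i, k) and (k, j)
    are both present with i, j > k; these new entries are the fill-in.
    """
    filled = set(pattern)
    for k in range(n):
        rows = [i for i in range(k + 1, n) if (i, k) in filled]
        cols = [j for j in range(k + 1, n) if (k, j) in filled]
        for i in rows:
            for j in cols:
                filled.add((i, j))
    return len(filled)


def arrow_pattern(n, hub):
    """Full diagonal plus one dense row and one dense column at index `hub`."""
    pattern = {(i, i) for i in range(n)}
    for i in range(n):
        pattern.add((hub, i))
        pattern.add((i, hub))
    return pattern
```

With `n = 6` the arrow pattern has 16 nonzeros. Eliminating the dense row and column first (`hub = 0`) fills the whole matrix, so the factors hold 36 entries, whereas ordering them last (`hub = 5`) produces no fill-in at all and the factors keep 16 entries.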

Recursion started playing an important role in applied numerical linear algebra with the introduction of Strassen's algorithm [6,31,36] which reduced the complexity of the matrix-matrix multiply operation from $O(n^{3})$ to $O(n^{\log_2 7})$. Later on it was recognized that factorization codes may also be formulated recursively [3,4,21,25,27] and codes formulated this way perform better [38] than leading linear algebra packages [2] which apply only a blocking technique to increase performance. Unfortunately, the recursive approach cannot be applied directly to sparse matrices because the sparsity pattern of a matrix has to be taken into account in order to reduce both the storage requirements and the floating point operation count, which are the determining factors of the performance of a sparse code.

2. Dense recursive LU factorization

Figure 1 shows the classical LU factorization code which uses Gaussian elimination. Rearrangement of the loops and introduction of blocking techniques can significantly increase the performance of this code [2,9]. However, the recursive formulation of the Gaussian elimination shown in Fig. 2 exhibits superior performance [25]. It does not contain any looping statements and most of the floating point operations are performed by the Level 3 BLAS [14] routines: xTRSM() and xGEMM(). These routines achieve near-peak MFLOP/s rates on modern computers with a deep memory hierarchy. They are incorporated in many vendor-optimized libraries, and they are used in the Atlas project [16] which automatically generates implementations tuned to specific platforms.

Fig. 1. Iterative LU factorization function of a dense matrix A. It is equivalent to LAPACK's xGETRF() function and is performed using Gaussian elimination (without a pivoting clause).

Fig. 2. Recursive LU factorization function of a dense matrix A equivalent to the LAPACK's xGETRF() function with a partial pivoting code.

Yet another implementation of the recursive algorithm is shown in Fig. 3, this time without pivoting code. Experiments show that this code performs as well as the code from Fig. 2. The experiments also provide indications that further performance improvements are possible if the matrix is stored recursively [26]. Such a storage scheme is illustrated in Fig. 4. This scheme causes the dense submatrices to be aligned recursively in memory. The recursive algorithm from Fig. 3 then traverses the recursive matrix structure all the way down to the level of a single dense submatrix. At this point an appropriate computational routine is called (either BLAS or xGETRF()). Depending on the size of the submatrices (referred to as a block size [2]), it is possible to achieve higher execution rates than for the case when the matrix is stored in the column-major or row-major order. This observation made us adopt the code from Fig. 3 as the base for the sparse recursive algorithm presented below.

3. Sparse matrix factorization

Matrices originating from the Finite Element Method [35], or most other discretizations of Partial Differential Equations, have most of their entries equal to zero. During factorization of such matrices it pays off to take advantage of the sparsity pattern for a significant reduction in the number of floating point operations and execution time. The major issue of the sparse factorization is the aforementioned fill-in phenomenon. It turns out that the proper ordering of the matrix, represented by the matrices P and Q, may reduce the amount of fill-in. However, the search for the optimal ordering is an NP-complete problem [39]. Therefore, many heuristics have been devised to find an ordering which approximates the optimal one. These heuristics range from the divide and conquer approaches such as Nested Dissection [22,29] to the greedy schemes such as Minimum Degree [1,37]. For certain types of matrices, bandwidth and profile reducing orderings such as Reverse Cuthill-McKee [8,23] and the Sloan ordering [34] may perform well.
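As an illustration of a bandwidth-reducing ordering, the following is a minimal sketch of the Reverse Cuthill-McKee idea for a symmetric sparsity pattern given as an adjacency dictionary. It is a simplified variant written for this illustration: each component is started from a node of minimum degree instead of a carefully chosen pseudo-peripheral node, and the function names are hypothetical.

```python
from collections import deque


def reverse_cuthill_mckee(adjacency):
    """Return a node ordering; `adjacency` maps node -> set of neighbours."""
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    visited = set()
    order = []
    # Visit every connected component, starting from a low-degree node.
    for start in sorted(adjacency, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Breadth-first search, neighbours taken in order of increasing degree.
            for w in sorted(adjacency[v] - visited, key=lambda u: (degree[u], u)):
                visited.add(w)
                queue.append(w)
    return order[::-1]  # reversing the Cuthill-McKee order gives RCM


def bandwidth(adjacency, order):
    """Largest distance from the diagonal of any off-diagonal entry under `order`."""
    position = {v: k for k, v in enumerate(order)}
    return max(
        (abs(position[v] - position[w]) for v in adjacency for w in adjacency[v]),
        default=0,
    )
```

For a chain of five unknowns numbered out of sequence, `{0: {3}, 3: {0, 1}, 1: {3, 4}, 4: {1, 2}, 2: {4}}`, the natural ordering has bandwidth 3, while the ordering returned by `reverse_cuthill_mckee` has bandwidth 1.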

Once the amount of ﬁll-in is minimized through

the appropriate ordering, it is still desirable to use the

optimized BLAS to perform the ﬂoating point opera-

tions. This poses a problem since the sparse matrix

coefﬁcients are usually stored in a form that is not

suitable for BLAS. There exist two major approach-

es that efﬁciently cope with this, namely the multi-

frontal [20] and supernodal [5] methods. The Super-

LU package [28] is an example of a supernodal code,

whereas UMFPACK [11,12] is a multifrontal one.

Factorization algorithms for sparse matrices typical-

ly include the following phases, which sometimes are

intertwined:

– matrix ordering to reduce fill-in,
– symbolic factorization,
– search for dense submatrices,
– numerical factorization.

The first phase is aimed at reducing the aforementioned amount of fill-in. Also, it may be used to improve the numerical stability of the factorization (it is then referred to as static pivoting [18]). In our code, this phase serves both of these purposes, whereas in SuperLU and UMFPACK the pivoting is performed only during the factorization. The actual pivoting strategy being used in these packages is called threshold pivoting: the pivot is not necessarily the largest in absolute value in the current column (which is the case in the dense codes) but instead, it is just large enough to preserve numerical stability. This makes the pivoting much more efficient, especially with the complex data structures involved in sparse factorization.

The next phase finds the fill-in and allocates the required storage space. This process can be performed solely based on the matrix sparsity pattern information without considering matrix values. Substantial performance improvements are obtained in this phase if graph-theoretic concepts such as elimination trees and elimination dags [24] are efficiently utilized.

The last two phases are usually performed jointly. They aim at executing the required floating point operations at the highest rate possible. This may be achieved in a portable fashion through the use of BLAS. SuperLU uses supernodes, i.e. sets of columns of a similar sparsity structure, to call the Level 2 BLAS. Memory bandwidth is the limiting factor of the Level 2 BLAS, so, to reuse the data in cache and consequently improve the performance, the BLAS calls are reorganized, yielding the so-called Level 2.5 BLAS technique [13,28]. UMFPACK uses frontal matrices that are formed during the factorization process. They are stored as dense matrices and may be passed to the Level 3 BLAS.

Fig. 3. Recursive LU factorization function used for sparse matrices (no pivoting is performed).

Fig. 4. Column-major storage scheme versus recursive storage (left) and function for converting a square matrix from the column-major to recursive storage (right).
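The threshold pivoting rule described above can be sketched as follows. This is a hedged illustration, not the selection logic of SuperLU or UMFPACK: the relative threshold `u`, the function name, and in particular the tie-breaking rule (prefer the candidate row with the fewest nonzeros) are assumptions made for the example.

```python
def threshold_pivot(column, row_counts, u=0.1):
    """Pick a pivot row from the candidates in the current column.

    column     -- dict mapping row index -> value in the current column
    row_counts -- dict mapping row index -> number of nonzeros in that row
    u          -- relative threshold in (0, 1]; u = 1 gives partial pivoting
    """
    largest = max(abs(v) for v in column.values())
    if largest == 0.0:
        raise ValueError("column is numerically zero; no pivot available")
    # Any entry within a factor u of the largest one is considered large enough.
    acceptable = [r for r, v in column.items() if abs(v) >= u * largest]
    # Assumed tie-break: sparsest row first, then larger magnitude, then row index.
    return min(acceptable, key=lambda r: (row_counts[r], -abs(column[r]), r))
```

With `column = {0: 10.0, 1: 2.0, 2: 0.5}` and `row_counts = {0: 7, 1: 2, 2: 1}`, a threshold of `u = 0.1` accepts rows 0 and 1 and selects the sparser row 1, whereas `u = 1.0` reduces to the dense-code rule and selects row 0, the largest entry.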

4. Sparse recursive factorization algorithm

The essential part of any sparse factorization code is the data structure used for storing matrix entries. The storage scheme for the sparse recursive code is illustrated in Fig. 5. It has the following characteristics:

– the data structure that describes the sparsity pattern is recursive,
– the storage scheme for numerical values has two levels:
  ∗ the lower level, which consists of dense square submatrices (blocks) which enable direct use of the Level 3 BLAS, and
  ∗ the upper level, which is a set of integer indices that describe the sparsity pattern of the blocks.

There are two important ramifications of this scheme. First, the number of integer indices that describe the sparsity pattern is decreased because each of these indices refers to a block of values rather than individual values. It allows for more compact data structures and during the factorization it translates into a shorter execution time because there is less sparsity pattern data to traverse and more floating point operations are performed by efficient BLAS codes rather than by code that relies on compiler optimization. Second, the blocking introduces additional nonzero entries that would not be present otherwise. These artificial nonzeros amount to an increase in storage requirements. Also, the execution time is longer because it is spent on floating point operations that are performed on the additional zero values. This leads to the conclusion that the sparse recursive storage scheme performs best when almost dense blocks exist in the L and U factors of the matrix. Such a structure may be achieved with the band-reducing orderings such as Reverse Cuthill-McKee [8,23] or Sloan [34]. These orderings tend to incur more fill-in than others such as Minimum Degree [1,37] or Nested Dissection [22,29], but this effect can be expected to be alleviated by the aforementioned compactness of the data storage scheme and utilization of the Level 3 BLAS.

The algorithm from Fig. 3 remains almost unchanged in the sparse case; the differences are that calls to BLAS are replaced by calls to their sparse recursive counterparts and that the data structure is no longer the same. Figures 6 and 7 show the recursive BLAS routines used by the sparse recursive factorization algorithm. They traverse the sparsity pattern and upon reaching a single dense block level they call the dense BLAS, which perform the actual floating point operations.
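The dense recursion without pivoting on which the sparse algorithm is based can be sketched in plain Python. The code below is an illustrative reconstruction under stated assumptions, not the authors' code from Fig. 3: matrices are nested lists, the recursion stops at 1-by-1 blocks instead of a tuned block size, and the helpers `solve_lower_unit`, `solve_upper_right` and `matmul` are simple stand-ins for the xTRSM() and xGEMM() calls.

```python
def matmul(X, Y):
    """Dense product X Y (stand-in for xGEMM)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def subtract(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def solve_lower_unit(L, B):
    """Solve L X = B with L unit lower triangular (stand-in for xTRSM)."""
    X = []
    for i, row in enumerate(B):
        new = list(row)
        for k in range(i):
            new = [v - L[i][k] * xk for v, xk in zip(new, X[k])]
        X.append(new)
    return X


def solve_upper_right(U, B):
    """Solve X U = B with U upper triangular (stand-in for xTRSM)."""
    X = []
    for row in B:
        x = []
        for j in range(len(U)):
            s = sum(x[k] * U[k][j] for k in range(j))
            x.append((row[j] - s) / U[j][j])
        X.append(x)
    return X


def recursive_lu(A):
    """Return (L, U) with A = L U and L unit lower triangular; no pivoting."""
    n = len(A)
    if n == 1:
        return [[1.0]], [[float(A[0][0])]]
    h = n // 2
    A11 = [row[:h] for row in A[:h]]
    A12 = [row[h:] for row in A[:h]]
    A21 = [row[:h] for row in A[h:]]
    A22 = [row[h:] for row in A[h:]]
    L11, U11 = recursive_lu(A11)               # factor the leading block
    U12 = solve_lower_unit(L11, A12)           # L11 U12 = A12
    L21 = solve_upper_right(U11, A21)          # L21 U11 = A21
    schur = subtract(A22, matmul(L21, U12))    # A22 - L21 U12
    L22, U22 = recursive_lu(schur)             # factor the Schur complement
    L = [r + [0.0] * (n - h) for r in L11] + [a + b for a, b in zip(L21, L22)]
    U = [a + b for a, b in zip(U11, U12)] + [[0.0] * h + r for r in U22]
    return L, U
```

For `A = [[4, 3, 2], [8, 8, 5], [4, 7, 9]]` the sketch returns `L = [[1, 0, 0], [2, 1, 0], [1, 2, 1]]` and `U = [[4, 3, 2], [0, 2, 1], [0, 0, 5]]`. In the sparse setting, the three helper calls would be the places where block-sparse counterparts skip blocks that are entirely zero.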

Fig. 5. Sparse recursive blocked storage scheme with the blocking factor equal to 2.

Fig. 6. Recursive formulation of the xGEMM() function which is used in the sparse recursive factorization.

Fig. 7. Recursive formulation of the xTRSM() functions used in the sparse recursive factorization.

5. Performance results

To test the performance of the sparse recursive factorization code it was compared to SuperLU Version 2.0 (available at http://www.nersc.gov/~xiaoye/SuperLU/) and UMFPACK Version 3.0 (available at http://www.cise.ufl.edu/research/sparse/umfpack/). The tests were performed on a Pentium III Linux workstation whose characteristics are given in Table 1.

Table 1
Parameters of the machine used in the tests

Hardware specifications
  CPU type                 Pentium III
  CPU clock rate           550 MHz
  Bus clock rate           100 MHz
  L1 data cache            16 Kbytes
  L1 instruction cache     16 Kbytes
  L2 unified cache         512 Kbytes
  Main memory              512 MBytes
CPU performance
  Peak                     550 MFLOP/s
  Matrix-matrix multiply   390 MFLOP/s
  Matrix-vector multiply   100 MFLOP/s

Each of the codes was used to factor selected matrices from the Harwell-Boeing collection [19], and Tim Davis' [10] matrix collection. These matrices were used to evaluate the performance of SuperLU [28]. The matrices are unsymmetric so they cannot be used directly with the Conjugate Gradient method and there is no general method for finding the optimal iterative method other than trying each one in turn or running all of the methods in parallel [7]. Table 2 shows the total execution time of factorization (including symbolic and numerical phases) and forward error estimates.

The performance of a sparse factorization code can be tuned for a given computer architecture and a particular matrix. For SuperLU, the most influential parameter was the fill-in reducing ordering used prior to factorization. All of the available ordering schemes that come with SuperLU were used and Table 2 gives the best time that was obtained. UMFPACK supports only one kind of ordering (a column oriented version of the Approximate Minimum Degree algorithm [1]) so it was used with the default values of its tuning parameters and threshold pivoting disabled. For the recursive approach all of the matrices were ordered using the Reverse Cuthill-McKee ordering. However, the block size selected somewhat influences the execution time. Table 2 shows the best running time out of the block sizes ranging between 40 and 120. The block size depends mostly on the size of the Level 1 cache but also on the sparsity pattern of the matrix. Nevertheless, running times for the different block sizes are comparable. SuperLU and UMFPACK also have tunable parameters that functionally resemble the block size parameter but their importance is marginal as compared to that of the matrix ordering.

The total factorization time from Table 2 favors the recursive approach for some matrices, e.g., ex11, psmigr_1 and wang3, and for others it strongly discourages its use (matrices mcfe, memplus and raefsky4). There are two major reasons for the poor performance of the recursive code on the second class. First, there is an average density factor which is the ratio of the true nonzero entries of the factored matrix to all the entries in the blocks. It indicates how many artificial nonzeros were introduced by the blocking technique. Whenever this factor drops below 70%, i.e. 30% of the factored matrix entries do not come from the L and U factors, the performance of the recursive code will most likely suffer. Second, even when the density factor is satisfactory, the amount of fill-in incurred by the Reverse Cuthill-McKee ordering may substantially exceed that of other orderings. In both cases, i.e. with a low value of the density factor or excessive fill-in, the recursive approach performs too many unnecessary floating point operations and even the high execution rates of the Level 3 BLAS are not able to offset it.

The computed forward error is similar for all of the codes despite the fact that two different approaches to pivoting were employed. Only SuperLU performed threshold pivoting; of the other two codes, UMFPACK had threshold pivoting disabled and the recursive code contains no code for any kind of pivoting.

Table 3 shows the matrix parameters and storage requirements for the test matrices. It can be seen that SuperLU and UMFPACK use slightly less memory and consequently perform fewer floating point operations. This may be attributed to the Minimum Degree algorithm used as an ordering strategy by these codes which minimizes the fill-in and thus the space required to store the factored matrix.

Table 2
Factorization time and error estimates for the test matrices for three factorization codes

Columns: Matrix name; T [s] and FERR for SuperLU; T [s] and FERR for UMFPACK; T [s] and FERR for Recursion.
Rows: af23560, ex11, goodwin, jpwh_991, mcfe, memplus, olafu, orsreg_1, psmigr_1, raefsky3, raefsky4, saylr4, sherman3, sherman5, wang3.
[Numeric entries of Table 2 are not legible in this copy.]

T – combined time for symbolic and numerical factorization
FERR = $\|\hat{x} - x\|_{\infty} / \|x\|_{\infty}$ (forward error)
∗ the matrix raefsky4 requires the threshold pivoting in UMFPACK to be enabled in order to give a satisfactory forward error

Table 3
Parameters of the test matrices and their storage requirements for three factorization codes

Columns: Name, N and NZ·10⁻³ (matrix parameters); L + U [MB] for SuperLU; L + U [MB] for UMFPACK; L + U [MB] and block size for Recursion.

  Name        N
  af23560     23560
  ex11        16614
  goodwin     7320
  jpwh_991    991
  mcfe        765
  memplus     17758
  olafu       16146
  orsreg_1    2205
  psmigr_1    3140
  raefsky3    21200
  raefsky4    19779
  saylr4      3564
  sherman3    5005
  sherman5    3312
  wang3       26064

[Remaining numeric entries of Table 3 are not legible in this copy.]

N – order of the matrix
NZ – number of nonzero entries in the matrix
L + U – size of memory required to store the L and U factors

6. Conclusions and future work

We have shown that the recursive approach to the sparse matrix factorization may lead to an efficient implementation. The execution time, storage requirements, and error estimates of the solution are comparable to those of supernodal and multifrontal codes. However, there are still matrices for which the recursive code does not perform well. These cases should be investigated further and possibly a metric devised that would allow selecting the best factorization method for a given matrix. This metric will probably include the aforementioned density factor. During a preprocessing phase, the density factor would be computed, and the recursive code would be used only if it exceeds a certain threshold. An open question is which code to choose when the recursive one is not appropriate. A performance model is necessary that links together the features of the multifrontal and supernodal approaches with the characteristics of the matrix to be factored and the machine it is to be used on.

The problem with low values of the density factor may be regarded as a future research direction. The aim should be to make the recursive code more adaptive to the matrix sparsity pattern. It could allow the use of matrix orderings other than Reverse Cuthill-McKee because the high average density of the blocks will not be as crucial any more.
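The density-factor test suggested above could look roughly like the following sketch. It is an illustration under assumptions rather than part of the described code: the positions of the true nonzeros of the factors are assumed to be known from symbolic factorization, every touched block is assumed to be stored as a full `block_size` by `block_size` square, and the default threshold of 0.7 simply reuses the 70% figure quoted in the performance discussion.

```python
def block_density(nonzeros, block_size):
    """Ratio of true nonzeros to all entries stored in square blocks.

    nonzeros   -- iterable of (row, col) positions of nonzeros in L + U
    block_size -- edge length of the dense blocks
    """
    entries = set(nonzeros)
    # A block is stored as soon as it contains at least one true nonzero.
    blocks = {(i // block_size, j // block_size) for i, j in entries}
    if not blocks:
        return 1.0
    return len(entries) / (len(blocks) * block_size ** 2)


def prefer_recursive_code(nonzeros, block_size, threshold=0.7):
    """Hypothetical selection rule: use the recursive code only for dense blocks."""
    return block_density(nonzeros, block_size) >= threshold
```

For a 4-by-4 tridiagonal pattern and a block size of 2, the 10 true nonzeros occupy 4 blocks holding 16 entries, so the density factor is 0.625 and the rule would advise against the recursive code; a pattern consisting of one fully dense 2-by-2 block has density 1.0 and passes the test.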

Another outstanding issue is the numerical stability of the factorization process. As it is now, it does not perform pivoting and still delivers acceptable accuracy. On matrices other than those tested, the method may still fail, and even iterative refinement may be unable to regain sufficient accuracy. Therefore, an extended version that performs at least some form of pivoting would likely be much more robust.

A parallel version of the recursive approach for sparse matrices is also under consideration. At this point, there are many issues to be resolved and the main direction is still not clear. Supernodal and multifrontal approaches use symbolical data structures from the sequential algorithm to assist the parallel implementation. In the recursive approach no such structures are used and consequently parallelism has to be exploited in some other way. On the other hand, dense codes [21,30] use recursion only locally and resort to other techniques in order to expose parallelism inherent in the factorization process [32].

Acknowledgments

This work was supported in part by the University of California Berkeley through subcontract number SA2283JB, as part of the prime contract ACI-9813362 from the National Science Foundation; and by the University of California Berkeley through subcontract number SA1248PG, as part of the prime contract DEFG03-94ER25219 from the Department of Energy.

References

[1] R. Amestoy, T. Davis and I. Duff, An approximate minimum degree algorithm, Technical Report TR/PA/95/09, CERFACS, Toulouse, France.

[2] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney and D. Sorensen, LAPACK User's Guide, Society for Industrial and Applied Mathematics, Philadelphia, PA, Third edition, 1999.

[3] B. Andersen, F. Gustavson and J. Wasniewski, A recursive formulation of the Cholesky factorization operating on a matrix in packed storage form, in: Proceedings of the 9th SIAM Conference on Parallel Processing for Scientific Computing, San Antonio, TX, USA, March 24–27, 1999.

[4] B. Andersen, F. Gustavson, A. Karaivanov, J. Wasniewski and P. Yalamov, LAWRA – linear algebra with recursive algorithms, in: Proceedings of the Conference on Parallel Processing and Applied Mathematics, Kazimierz Dolny, Poland, September 14–17, 1999.

[5] C. Ashcraft, R. Grimes, J. Lewis, B. Peyton and H. Simon, Progress in sparse matrix methods in large sparse linear systems on vector supercomputers, Intern. J. of Supercomputer Applications 1 (1987), 10–30.

[6] D. Bailey, K. Lee and H. Simon, Using Strassen's algorithm to accelerate the solution of linear systems, The Journal of Supercomputing 4 (1990), 357–371.

[7] R. Barrett, M. Berry, J. Dongarra, V. Eijkhout and C. Romine, Algorithmic bombardment for the iterative solution of linear systems: a poly-iterative approach, JCAM 74 (1996), 91–109.

[8] E. Cuthill and J. McKee, Reducing the bandwidth of sparse symmetric matrices, in: Proceedings of ACM National Conference, Association of Computing Machinery, New York, 1969.

[9] M. Dayde and I. Duff, Level 3 BLAS in LU factorization on Cray-2, ETA-10P and IBM 3090-200/VF, The International Journal of Supercomputer Applications 3 (1989), 40–70.

[10] T. Davis, University of Florida Sparse Matrix Collection, http://www.cise.ufl.edu/~davis/sparse/, ftp://ftp.cise.ufl.edu/pub/faculty/davis/matrices, NA Digest 94(42) (October 16, 1994), NA Digest 96(28) (July 23, 1996), and NA Digest 97(23) (June 7, 1997).

[11] T. Davis and I. Duff, An unsymmetric-pattern multifrontal method for sparse LU factorization, Technical Report RAL-93-036, Rutherford Appleton Laboratory, Chilton, Didcot, Oxfordshire, 1994.

[12] T. Davis, User's guide for the unsymmetric-pattern multifrontal package (UMFPACK), Technical Report TR-93-020, Computer and Information Sciences Department, University of Florida, June 1993.

[13] J. Demmel, S. Eisenstat, J. Gilbert, X. Li and J. Liu, A supernodal approach to sparse partial pivoting, Technical report UCB//CSD-95-883, Computer Science Division, U.C. Berkeley, Berkeley, California, 1995.

[14] J. Dongarra, J. Du Croz, I. Duff and S. Hammarling, A set of Level 3 FORTRAN Basic Linear Algebra Subprograms, ACM Transactions on Mathematical Software 16 (March 1990), 1–17.

[15] J. Dongarra, J. Du Croz, S. Hammarling and R. Hanson, An extended set of FORTRAN Basic Linear Algebra Subprograms, ACM Transactions on Mathematical Software 14 (March 1988), 1–17.

[16] J. Dongarra and R. Whaley, Automatically Tuned Linear Algebra Software (ATLAS), in: Proceedings of SC'89, 1989.

[17] I. Duff, A. Erisman and J. Reid, Direct methods for sparse matrices, Oxford University Press, 1989.

[18] I. Duff and J. Koster, The design and use of algorithms for permuting large entries to the diagonal of sparse matrices, SIAM J. Matrix Anal. Appl. 20 (1999), 889–901.

[19] I. Duff, R. Grimes and J. Lewis, Sparse matrix test problems, ACM Transactions on Mathematical Software 15 (1989), 1–14.

[20] I. Duff and J. Reid, The multifrontal solution of indefinite sparse symmetric linear equations, ACM Transactions on Mathematical Software 9(3) (September 1983), 302–325.

[21] E. Elmroth and F. Gustavson, Applying recursion to serial and parallel QR factorization leads to better performance, IBM Journal of Research and Development 44(4) (2000), 605–624.

[22] A. George, Nested dissection of a regular finite element mesh, SIAM Journal of Numerical Analysis 10 (1973), 345–363.

[23] N.E. Gibbs, W.G. Poole and P.K. Stockmeyer, An algorithm for reducing the bandwidth and profile of a sparse matrix, SIAM Journal of Numerical Analysis 13(2) (April 1976).

[24] J.R. Gilbert and J.W.H. Liu, Elimination structures for unsymmetric sparse LU factors, SIAM J. Matrix Anal. Appl. 14(2) (April 1993), 334–352.

[25] F. Gustavson, Recursion leads to automatic variable blocking for dense linear-algebra algorithms, IBM Journal of Research and Development 41(6) (November 1997), 737–755.

[26] F. Gustavson, A. Henriksson, I. Jonsson, B. Kågström and P. Ling, Recursive blocked data formats and BLAS's for dense linear algebra algorithms, in: Proceedings of Applied Parallel Computing, PARA'98, B. Kågström, J. Dongarra, E. Elmroth and J. Waśniewski, eds, Lecture Notes in Computer Science 1541, Springer-Verlag, Berlin, 1998, pp. 195–206.

[27] F. Gustavson and I. Jonsson, Minimal-storage high-performance Cholesky factorization via blocking and recursion, IBM Journal of Research and Development 44(6) (November 2000), 823–850.

[28] X. Li, Sparse Gaussian elimination on high performance computers, Ph.D. thesis, University of California at Berkeley, Computer Science Department, 1996.

[29] R.J. Lipton, D.J. Rose and R.E. Tarjan, Generalized Nested Dissection, SIAM Journal on Numerical Analysis 16 (1979), 346–358.

[30] A. Petitet, R.C. Whaley, J. Dongarra and A. Cleary, HPL – A portable implementation of the high-performance Linpack benchmark for distributed-memory computers, http://icl.cs.utk.edu/hpl/, http://www.netlib.org/benchmark/hpl/.

[31] M. Paprzycki and C. Cyphers, Using Strassen's matrix multiplication in high performance solution of linear systems, Computers Math. Applic. 31(4/5) (1996), 55–61.

[32] Y. Saad, Communication complexity of the Gaussian elimination algorithm on multiprocessors, Linear Algebra and Its Applications 77 (1986), 315–340.

[33] Y. Saad, Iterative methods for sparse linear systems, PWS Publishing Company, New York, 1996.

[34] S.W. Sloan, An algorithm for profile and wavefront reduction of sparse matrices, International Journal for Numerical Methods in Engineering 23 (1986), 239–251.

[35] G. Strang and G. Fix, An analysis of the Finite Element Method, Prentice-Hall, Inc., 1973.

[36] V. Strassen, Gaussian elimination is not optimal, Numerical Mathematics 13 (1969), 354–356.

[37] W. Tinney and J. Walker, Direct solutions of sparse network equations by optimally ordered triangular factorization, Proceedings of the IEEE 55 (1967), 1801–1809.

[38] S. Toledo, Locality of Reference in LU Decomposition with partial pivoting, SIAM J. Matrix Anal. Appl. 18(4) (October 1997), 1065–1081.

[39] M. Yannakakis, Computing the minimum fill-in is NP-complete, SIAM Journal on Algebraic and Discrete Methods 2(1) (March 1981), 77–79.



