Digital Kommunikationselektronik TNE027 Lecture 3 1 Multiply-Accumulator (MAC) Compute Sum of...

Digital Kommunikationselektronik TNE027 Lecture 3

1

Multiply-Accumulator (MAC)• Compute Sum of Product (SOP)• Linear convolution

y[n] = f[n]*x[n] = Σ f[k] x[n-k]

requires L consecutive multiplications and L–1 additions.

• For a NN multiplier, the product is 2N bits wide.• The Accumulator should be designed with an

extra K bits in width depending on L.

k=0

L-1


2

Distributed Arithmetic (DA)

• A method for computing Sum of Product

y = <c, x> = Σ c[n] x[n]

= Σ c[n] Σ xb[n] 2b

= Σ 2b Σ c[n] xb[n]

= Σ 2b Fc(xb)

N-1

n=0

b=0

B-1N-1

n=0

B-1

b=0

N-1

n=0

B-1

b=0


3

Fc(xb) = Σ c[n] xb[n]

where xb = (xb[0], xb[1], …, xb[N–1])

Fc(xb) is a function of an N-bit input vector and can be pre-computed and saved in a look-up table.

• During computation, the look-up table accepts one bit from each of the N words as its input.

• The output of the look-up table is weighted by the approximate power-of-two factor and accumulated. Note that, in Fig.2.31, the content of the accumulator is shifted instead of shifting the output of the look-up table.

N-1

n=0


4

• Signed DA systems

x[n] = – 2B xB[n] + Σ xb[n] 2b

where x[n] has (B+1) bits.

y = – 2B Fc(xB) + Σ 2b Fc(xb)

where xB is the N-bit vector of sign bit.– The signed DA system can be implemented by

• An accumulator with add/subtract control (See Fig. 2.31)

• Using a ROM with one additional input bit to

indicate Fc(xb) or – Fc(xb) and an accumulator without add/subtract control.

B-1

b=0

B-1

b=0


5

• Modified DA Solutions– Use partial LUTs to reduce the size of ROM

Suppose the length LN inner product is computed

y = <c, x> = Σ c[n] x[n]

= Σ Σ c[Nl+n] x[Nl+n]

See Fig. 2.32.

The ROM size is reduced from 2LN B to

L 2N B.

LN –1

n=0

L –1 N –1

l=0 n=0


6

Modified DA Solutions (cont.)• Use additional LUTs to increase speed

– A basic DA architecture accepts one bit from each of the N words as input to its LUT. If two bits per words are accepted, then the computational speed can be essentially doubled.

– The maximum speed can be achieved with the fully pipelined word-parallel architecture.

See Fig. 2.33.


7

Computation of Special Functions Using CORDIC

• Digital signal processing algorithms might use certain trigonometric, hyperbolic, linear and logarithmic functions.

• Taylor series can be computed to approximate the function and a sequence of multiply and add operations should be performed.


8

Coordinate Rotation Digital Computer (CORDIC)

• CORDIC algorithm is a more efficient alternative approach.

• CORDIC algorithm:At each iteration, compute

[ ]= [ ] [Xk+1

Yk+1

Xk] Yk

1

1

–mδk2–k

δk2–k

Zk+1 = Zk – δk θk


9

CORDIC TheoryThe following text is based on the article

”A survey of CORDIC algorithms for FPGA based computers” by Ray Andraka

• Vector rotation transform

x´ = x cosφ – y sinφ

y´ = y cosφ + x sinφ

which rotate a vector in a Cartesian plane by the angle φ.

x´ = cosφ (x – y tanφ)

y´ = cosφ (y + x tanφ)


10

If the rotation angles are restricted so that tanφ = ±2–i, the multiplication by the tangent term is reduced to a simple shift operation.Arbitrary angles of rotation are obtainable by performing a series of successively smaller elementary rotations.If the decision at each iteration, i, is which direction to rotate, then the cos(δi) term becomes a constant, because cos(δi) = cos(– δi).

xi + 1 = Ki (xi – yi di •2–i) yi + 1 = Ki (yi + xi di •2–i) where:

Ki = cos(tan–1 2–i) = 1/ (1 + 2–2i)1/2

di = ±1


11

• Removing the scale constant from the iterative equations yields a shift-add algorithm for vector rotation.

• The product of the Ki’s can be applied elsewhere in the system or treated as part of a system processing gain.

• That product approaches 0.6073 as the number of iterations goes to infinity.

• Therefore, the rotation algorithm has a gain, An, of approximately 1.647. The exact gain depends on the number of iterations, and obeys the relation

An = П (1 + 2–2i)1/2n


12

• The angle of a composite rotation is uniquely defined by the sequence of the directions of the elementary rotations. That sequence can be represented by a decision vector.

• Conversion from this vector representation to a convensional angle representation can be accomplished by using a look-up table.

• A better conversion method uses an additional adder-subtractor that accumulates the elementary rotation angles at each iteration.

• The angular values are supplied by a small look-up table (one entry per iteration) or are hardwired.


13

• The angle accumulator adds a third difference equation to the CORDIC algorithm:

zi + 1 = zi – di • tan–1(2–i)

• CORDIC rotator is normally operated in one of two modes: – Rotation mode– Vectoring mode


14

• The rotation mode rotates the input vector by a specified angle.

• In the rotation mode, the CORDIC equations are:

xi + 1 = xi – yi di •2–i

yi + 1 = yi + xi di •2–i

zi + 1 = zi – di • tan–1(2–i)

where di = –1 if zi < 0, +1 otherwise.

The angle acummulator zi is initialized with the desired rotation angle z0. xi and yi are initialized with the input vector x0 and y0.


15

• The following results of the rotation mode are obtained when n is reasonably large: xn = An (x0 cosz0 – y0 sinz0)

yn = An (y0 cosz0 + x0 sinz0)

zn = 0

An = П (1 + 2–2i)1/2

where x0 and y0 define the initial input vector and z0 defines the desired rotation angle.

The result xn and yn should be scaled by

1/An to get the actual rotated vector.

n


16

• The vectoring mode rotates the input vector to the x axis while recoding the angle required to make that rotation.

• In the vectoring mode, the CORDIC equations are:

xi + 1 = xi – yi di •2–i

yi + 1 = yi + xi di •2–i

zi + 1 = zi – di • tan–1(2–i)

where di = +1 if yi < 0, –1 otherwise.


17

In the vectoring mode, after n iterations, the result is:

xn = An (x0 2 + y0

2) 1/2

yn = 0

zn = z0 + tan–1(y0 / x0 )

An = П (1 + 2–2i)1/2

zn is the rotated angle, i.e., the angle between the input vector (x0 , y0) and x axis.

xn/An is the magnitude of the input vector

(x0 , y0).

n


18

• The CORDIC rotation and vectoring algorithms are limited to rotation angles between –π/2 and π/2. The limitation is due to the use of 20 for the tangent in the first iteration.

• For composite rotation angles larger than π/2, an additional rotation of π/2 or π is required.


19

• Computing sin(θ) and cos(θ) – Use the rotation mode

– Setting y0 = 0, x0 = 1/ An and z0 = θ

– sin(θ) and cos(θ) can be obtained at the end of rotation:

xn = An ((1/ An)cos θ – 0 · sin θ)

yn = An (0 · cos θ + ((1/ An) sin θ)– Note that multiplication of 1/ An is performed as

part of rotation operation. This eliminates the need for another multiplier for modulation.


20

• Extension to Linear Functions– A simple modification to the CORDIC equation

permits the computation of linear functions:

xi + 1 = xi – 0 • yi di •2–i = xi yi + 1 = yi + xi di •2–i

zi + 1 = zi – di • (2–i)For rotation mode, di = – 1 if zi < 0, +1 otherwise, the linear rotation produces:

xn = x0

yn = y0 + x0 z0

zn = 0The operation is similar to that of a shift-add multiplier.


21

For the vectoring mode, di = +1 if yi < 0, –1 otherwise, it provides a method for evaluating ratios:

xn = x0

yn = 0

zn = z0 + y0 /x0

Note that for linear functions, no scaling corrections are required, since the gain is a unity gain.


22

• Extension to Hyperbolic FunctionsThe close relationship between the trigonometric and hyperbolic functions suggests the same architecture can be used to compute hyperbolic functions. The CORDIC equations are:

xi + 1 = xi + yi di •2–i

yi + 1 = yi + xi di •2–i

zi + 1 = zi – di • tanh–1(2–i)

For rotation mode, di = –1 if zi < 0, +1 otherwise.

For vectoring mode, di = +1 if yi < 0, –1 otherwise.

Note that An = П (1 – 2–2i)1/2 0.80 .n


23

CORDIC Architectures

• CORDIC state machine

See Fig. 2.38 and Fig. 2.39.

• Fast CORDIC pipeline

See Fig. 2.40.

Digital Kommunikationselektronik TNE027 Lecture 3 1 Multiply-Accumulator (MAC) Compute Sum of...

Documents

Transcript of Digital Kommunikationselektronik TNE027 Lecture 3 1 Multiply-Accumulator (MAC) Compute Sum of...