Simple AEAD Hardware Interface SAEHI in a SoC: Implementing an On-Chip Keyak/WhirlBob Coprocessor


2014

1. “Beyond Modes: Building a Secure Record Protocol from a Cryptographic Sponge Permutation.” CT-RSA 2014. LNCS 8366, pp. 270–285, Springer (2014)

2. “CBEAM: Efficient Authenticated Encryption from Feebly One-Way ϕ Functions.” CT-RSA 2014. LNCS 8366, pp. 251–269, Springer (2014)

3. “STRIBOB: Authenticated Encryption from GOST R 34.11-2012 LPS Permutation.” CTCrypt ’14. To appear in Mathematical Aspects of Cryptography, Steklov Mathematical Institute of RAS (2014)

4. “Simple AEAD Hardware Interface (SÆHI) in a SoC: Implementing an On-Chip Keyak/WhirlBob Coprocessor.” TrustED 2014, ACM CCS 2014 Workshops, 03 November 2014, Scottsdale AZ US. To appear. ACM (2014)

5. “Lighter, Faster, and Constant-Time: WHIRLBOB, the Whirlpool variant of STRIBOB.” With Billy Bob Brumley. IACR ePrint 2014/501. Submitted (2014)

6. “BRUTUS: Identifying Cryptanalytic Weaknesses in CAESAR First Round Candidates.” IACR ePrint 2014/850. Submitted (2014)

+ Invited Talks.

1 / 17

Simple AEAD Hardware Interface (SÆHI) in a SoC: Implementing an On-Chip Keyak/WhirlBob Coprocessor

Dr. Markku-Juhani O. Saarinen

NORWEGIAN UNIVERSITY OF SCIENCE AND TECHNOLOGY

TrustED ’14 – 03 November 2014 – Scottsdale AZ

2 / 17

Authenticated Encryption with Associated Data

An Authenticated Encryption with Associated Data (AEAD) primitive provides:

▶ Encryption or confidentiality / privacy protection, and
▶ Authentication or integrity protection for encrypted and associated data.

Preferably in a single pass over the data.

Security protocols such as IPSec and SSL/TLS usually required two processing steps for each packet in the 1990s and 2000s:

▶ Authentication was handled with an HMAC (Hash-based Message Authentication Code).
▶ Encryption was provided either with a block cipher such as 3DES-CBC or AES-CBC, or with a stream cipher such as RC4.

Hardware implementation of such a twin set-up is cumbersome.

Transition to AE has been swift during recent years because of AES-GCM’s status in Suite B (Classified COTS) and many attacks like CRIME, LUCKY13, POODLE.

3 / 17

Background: CAESAR project for new AEAD algorithms 2014-2017

NIST-sponsored international Competition for Authenticated Encryption: Security, Applicability, and Robustness.
http://competitions.cr.yp.to/caesar-call.html

▶ Jan 2013 Announced by Dan Bernstein (secretary)
▶ Mar 2014 Deadline for first-round submissions (57)
▶ May 2014 Deadline for first-round software
▶ Aug 2014 DIAC ’14 Workshop, UCSB
▶ Jan 2015 Second round candidates announced
▶ Feb 2015 Second round tweaks (fixes)
▶ Feb 2015 Second round Verilog / VHDL (this talk)
▶ Dec 2015 Third round candidates
▶ Dec 2016 Final round candidates
▶ Dec 2017 Final CAESAR portfolio announcement

4 / 17

Hardware API for Authenticated Encryption

CAESAR candidates came in many shapes and sizes. Here’s a rough breakdown:

 8 are clearly based on a SHA3-style Sponge construction.
 9 are (somehow) constructed from AES components.
19 are AES modes of operation.
21 are based on other design paradigms or are entirely ad hoc.

We want consistent testing across second round candidates.

Signalling. How to communicate with the hardware? Can a consistent, high-level “hardware API” be constructed?

Memory access. Some prominent proposals (AEZ and SIV) require two passes over the data, so APIs in the style of hash functions don’t really work.

What to test. Realistic test profiles via operating system and application integration.

5 / 17

System-on-Chip (SoC) Designs

[Chart: Total global shipments 2014 (million units), with series Android, Other mobile, and PC total; values shown: 1241.664, 853.829, 314.065. Caption: Majority of Internet and communication devices are Android Linux-based tablets or smart phones.]

System-on-Chip (SoC) designs integrate all the necessary components of a computing application on a single chip.

Mobile electronics such as (smart) phones and tablets are built on SoCs. Also found in Internet-of-Things (IoT) appliances, modems, routers, home media, cars, etc.

Security of transmitted and stored data is even more relevant to mobile devices than to traditional PC systems.

Limited CPU performance.
Energy efficiency critical.

Coprocessors: Audio and video codecs, RF processing, 3D display rendering, M7/CCP motion, natural language, etc.
→ Our evaluation target.

6 / 17

Zynq-7000 FPGA Artix 7 / ARM Cortex A9 SoC

[Block diagram of the Zynq-7000. Processing System: two Cortex-A9 MPCore CPUs (32/32 KB I/D caches, NEON DSP/FPU engines), Snoop Control Unit, 512 Kbyte L2 cache, 256 Kbyte on-chip memory, General Interrupt Controller, watchdog timer, DMA, timers, configuration, ARM CoreSight multi-core debug and trace, AMBA interconnects, flash controller (NOR, NAND, SRAM, Quad SPI), multiport DRAM controller (DDR3, DDR3L, DDR2), and a processor I/O mux for 2x SPI, 2x I2C, 2x CAN, 2x UART, GPIO, 2x SDIO with DMA, 2x USB with DMA, 2x GigE with DMA. Programmable Logic (system gates, DSP, RAM): XADC (2x ADC, mux, thermal sensor), PCIe Gen2 1-8 lanes, security (AES, SHA, RSA), multi-gigabit transceivers, multi-standard I/Os (3.3V and high-speed 1.8V). Links between the two: EMIO, ACP, general purpose AXI ports, high performance AXI ports.]

On a single chip:
▶ Dual-core ARM Cortex A9 CPU @ 650 MHz.
▶ Artix-7 or Kintex-7 type FPGA logic fabric.
▶ Can run Linux and Android.
▶ Realistic target for SoC implementations.
▶ Full devkit under $200.

We Study:
▶ Hardware-assisted implementations vs. software vs. hardware implementations.
▶ FPGA and software footprint, speed, power.
▶ Integration in applications e.g. via OpenSSL.

7 / 17

Implementation 1: Keyak (SHA3 Keccak) Core

[Schematic: k1600 round module with ports clk, rnd[4:0], in[1599:0], out[1599:0]; a round-constant ROM (keccak_rc) feeding arrays of 64-bit RTL_XOR and RTL_AND cells.]

Single round of Keccak/Keyak 1600-bit core permutation drawn with 64-bit data paths. Mainly XORs visible.

SHA3 algorithm Keccak and Keyak AEADs:
▶ Keccak is a sponge hash with a 1600-bit core permutation, selected as SHA3 in 2012.
▶ Designed by Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche.
▶ Same team proposed the Keyak family of AEADs that utilize the same permutation in the CAESAR project.

We implemented:
▶ The 1600-bit core in Verilog for the Artix-7 FPGA core of Zynq 7000.
▶ The module can be utilized for both hashing and authenticated encryption.

8 / 17

Implementation 2: WhirlBob / Whirlpool Core

[Figure: WhirlBob Core.]

StriBob, WhirlBob, and Whirlpool:
▶ StriBob is a CAESAR proposal by Markku-Juhani O. Saarinen.
▶ WhirlBob is a 2nd round tweak proposed by M.-J. O. Saarinen and Billy Bob Brumley (submitted to INSCRYPT ’14).
▶ WhirlBob is based on the permutation of the (ISO) Standardized Whirlpool 3.0 hash by Paulo Barreto and Vincent Rijmen.

We implemented:
▶ The 512-bit, 1 cycle per round core permutation in Verilog for the Artix-7 FPGA core of Zynq 7000.
▶ The module can be utilized for both Whirlpool hashing and WhirlBob authenticated encryption.

9 / 17

Hardware Performance

With one “extra” reloading cycle per block, the maximum theoretical throughput of these implementations is:

Parameter          WhirlBob   Keyak
Rounds                   12      12
Cycles                   13      13
Rate (bits)             256    1344
Speed (bit/clk)        19.7   103.4

Processing speeds are significantly slower when the Keccak core is used in the 24-round SHA3 hashing mode. Speed ranges from 23.0 (SHA3-512) to 47.5 (SHA3-224) bits/clock. Whirlpool, in comparison, is slightly faster than WhirlBob.

10 / 17

CAESAR Software API vs. Hardware API

A simple C API was specified by the CAESAR secretariat for reference software implementations of the first round candidates.

    int crypto_aead_encrypt(
        uint8_t *c, uint64_t *clen,            // Ciphertext
        const uint8_t *m, uint64_t mlen,       // Message
        const uint8_t *ad, uint64_t adlen,     // Associated Data
        const uint8_t *nsec,                   // (Secret IV)
        const uint8_t *npub,                   // Nonce
        const uint8_t *k);                     // Secret Key

Decryption and integrity verification can be performed with crypto_aead_decrypt(), which has an equivalent interface.

SÆHI utilizes the same software API and a simple memory-mapped hardware API. The software side is essentially a driver suitable for bare metal implementation.

11 / 17

Proposed Baseline Hardware API

Our cryptographic coprocessor has a simple, almost universal memory-mapped interface. The module or hardware pin interface is the same as for generic single port RAM (with optional interrupt request line).

Signal   Dir   Purpose
ADDR     In    Address
DI       In    Data Write
WE       In    Write enable
EN       In    Enable/Select
CLK      In    Clock
DO       Out   Data Read
IRQ      Out   Interrupt Req.

[Diagram: AEAD Core block with inputs ADDR, DI, WE, EN, CLK and outputs DO, IRQ.]

The signaling between software component and this API is defined by the driver.

Faster (DMA, AXI) alternatives can be used – this is just the baseline interface.

12 / 17

Comparing Implementations

Code lines in our WhirlBob (StriBob) and Keyak reference implementations:

Component            WhirlBob   Keyak
Interface Verilog          99     114
Round Verilog             228     129
Driver C                   60      60
API Interface C           261   ≈ 250
Total code                639     553

Post synthesis and route utilization within the Artix-7 FPGA fabric of the Xilinx Zynq 7010:

Logic         WhirlBob   Keyak
LUTs             3,795   4,574
Flip-Flops       1,060   3,237
MUXs                90     159
Other                1       2
Total logic      4,946   7,972

13 / 17

Implementations

We first developed the implementations with a homemade VGA module (not utilizing the CPU at all). The implementations were then integrated into Xillinux and made accessible to user space daemons.

14 / 17

What to test?

We hope to measure for each candidate:

A  Area. FPGA Slices or ASIC Gate Equivalents.
W  Power. Power consumption (Watts = J/s).
R  Speed. Ideal throughput (Bytes/Second).

One key goal is to maximize e = R / W.

Note that doubling A for factor 2 parallelism will approximately double both R and W, and e will remain constant.

The same is true for doubling the clock frequency, since power consumption is almost linearly dependent on clock frequency for most (CMOS) circuits.

Hence Bytes/Joule is perhaps the most relevant metric for mobile devices.

15 / 17

Integration path for Linux/Android Testing

[Diagram: user space processes and the System-on-Chip. Applications (browser, SSH application, command-line utilities) use the TLS API (libssl.so) and SSH API (libssh.so), which sit on the OpenSSL Crypto API (libcrypto.so). An AEAD plugin “engine” (libmyaead.so) either uses a software cipher (SÆHI daemon not available) or talks to the SÆHI daemons via interprocess communication. Below the kernel and CPU core, the SoC carries the SÆHI AEAD 1 and SÆHI AEAD 2 cores. Layer labels: apps, protocols, ciphers.]

▶ The dominant underlying API for Linux is based on OpenSSL: libcrypto, libssl. Supported by browsers etc.
▶ OpenSSL supports configurable plugin “engines”.
▶ After recent bugs (Heartbleed), new forks:
    ▶ Google: BoringSSL.
    ▶ OpenBSD group: LibreSSL (upcoming ressl API).
▶ Since the hardware accelerator is a shared resource, implement as a user space daemon.
▶ Utilize experimental ciphersuite identifiers in applications and TLS, SSH, IPSec. Plug in CAESAR ciphers to replace AES-GCM.
▶ Measure utilization, power, time, throughput with realistic usage profiles.

16 / 17

Conclusions

▶ CAESAR is a project to find next-generation Authenticated Encryption algorithms.
▶ Proposed SÆHI, a simple memory-mapped hardware API for CAESAR ciphers.
▶ Realistic hardware target: System-on-Chip with FPGA logic and ARM Cortex A9.
▶ FPGA implementations of Keyak and WhirlBob algorithms.
▶ Integration path for Applications in Android.

Next... a little demo!

17 / 17