
Specifying Instructions’ Semantics Using λ-RTL

(Interim Report)

Norman Ramsey and Jack W. Davidson

Department of Computer Science

University of Virginia

Charlottesville, VA 22903

August 5, 2003

Abstract

The Zephyr project is part of an effort to build a National Compiler Infrastructure, which will support research in compiling techniques and high-performance computing. Compilers work with source code, abstract syntax, intermediate forms, and machine instructions. By using high-level descriptions of the representations and semantics of these forms, we expect to be able to create compiler components that will be usable with different source languages, front ends, and target machines.

To help deal with multiple machines, we are developing a family of Computer Systems Description Languages (CSDL) to describe properties that are relevant to the construction of compilers and other systems software. The languages describe properties of a machine’s instructions or its mutable state, or both. Of particular interest is the description of the semantics of instructions, i.e., their effects on the state of the machine. This report describes our design of λ-RTL, a CSDL language for specifying instructions’ semantics.

We describe the effects of instructions using register transfer lists (RTLs). A register transfer list is a collection of assignments to locations, which represent registers, memory, and all other mutable state. We prescribe a form of RTLs that makes it explicit how to compute the values assigned and on what state the computation depends. The form also makes byte order explicit and provides for instructions whose effects may be undefined.

Because our form of RTLs contains so much information, it is convenient for use by tools, but it would be tedious to write RTLs by hand. λ-RTL, which is based on the λ-calculus and on register transfer lists, is a metalanguage designed to make it easier for people to write RTLs and to associate them with machine instructions. It enables us to omit substantial information from hand-written specifications; the λ-RTL translator infers the missing information and puts the resulting RTLs into canonical form. λ-RTL also provides a “grouping” construct designed to help specify large groups of similar instructions.

This report presents a short overview of λ-RTL, followed by examples. The examples include a description of the SPARC and excerpts from a description of the Pentium. The report also includes λ-RTL definitions of a library of basic operators, which we believe will be useful for describing a wide variety of machines.

Contents

1 Overview of CSDL
    Machine descriptions for machine-level tools
    Core aspects for CSDL
        Instructions
        Storage
    Combining instructions and storage
    Languages in the CSDL family

2 Register Transfer Lists
    Form of RTLs
    Types in RTLs
    RTL languages and translation
    Interpretations of RTLs

3 Using λ-RTL to specify register transfer lists
    Design considerations
    Restrictions eased in λ-RTL
        Implicit fetches
        Slices
        Implicit aggregation
    Overview of λ-RTL
        Top-level declarations
        Inner declarations
        Expressions
        Patterns
        Grouping
        Lexical conventions
        The initial basis
        Style
        Differences between λ-RTL and Standard ML

4 A Library of standard RTL operators
    4.1 Building RTLs
    4.2 Booleans
    4.3 Integer arithmetic and comparisons
    4.4 Logical operators
    4.5 Undefined effects
    4.6 Floating-point arithmetic

5 The meanings of the standard RTL operators
    5.1 Known algebraic laws
    5.2 More SPARC description
        More locations
        More Loads and Stores
        More integer ALU instructions
        More control flow
        State Registers
        Miscellaneous instructions
        Floating point

6 Describing the Intel Pentium
    6.1 Storage spaces
        Registers
        Effective addresses
        Traps
        Instructions
        Load effective address
    6.2 Real instructions, for serious

7 Describing the MIPS R?000
    7.1 Overview
    7.2 Storage spaces
        Integer Registers
        Integer Unit control/status registers
        Aggregating cells in storage spaces
    7.3 MIPS instructions
        Types of MIPS operands
        MIPS addresses and operands
        Load instructions
        Store Instructions
        Arithmetic
        Multiply and Divide
        Logical Instructions
        Shift Instructions
        Set instructions
        Load upper immediate
        Jump and branch instructions
    7.4 Floating Point
        Architecture
        compare
        Branching
        Conversions
        Rounding
        Moving to and from the FPU
        computation
    7.5 Coprocessor instructions
    7.6 Miscellaneous
    7.7 Putting it all together
    7.8 What this specification does not do...
    7.9 SLED specification

8 Describing the Alpha
    8.1 Storage
        Memory
        Explicit Size Depiction
        Integer Registers
        Floating Point Registers
        Special-Purpose Registers
        Program Counter
    8.2 ALPHA instructions
        Types of ALPHA operands
        ALPHA addresses and operands
        A Useful Composition Function
        Load Instructions
        Store Instructions
        Logical Operations
        Add instructions
    8.3 Lambda RTL File

9 Describing the PDP-8
    9.1 Introduction
    9.2 Storage Spaces
    9.3 Operations
    9.4 Putting it all together

10 Describing MMIX
    10.1 Storage spaces: memory and registers
    10.2 MMIX instructions
        Trips and traps
        Integer overflow
        Operands
        Addresses
        Loading and storing
        Arithmetic
    10.3 Putting it all together

11 Describing the PowerPC (by Matthew Fluet)
    11.1 Overview
    11.2 Storage spaces
        Memory
        General-purpose registers
        Floating-point registers
        Condition register
        Floating-point status and control register
        XER register
        Link register
        Count register
        Instruction Addresses
        Time base registers
        Bogus
    11.3 Operands
    11.4 Instruction semantics
        Utilities
        Integer arithmetic instructions
        Integer compare instructions
        Integer logical instructions
        Integer rotate instructions
        Integer shift instructions
        Floating-point arithmetic instructions
        Floating-point multiply-add instructions
        Floating-point rounding and conversion instructions
        Floating-point compare instructions
        Floating-point status and control register instructions
        Integer load instructions
        Integer store instructions
        Integer load and store multiple instructions
        Integer load and store string instructions
        Memory synchronization instructions
        Floating-point load instructions
        Floating-point store instructions
        Floating-point move instructions
        Branch instructions
        Condition register logical instructions
        System linkage instructions
        Trap instructions
        Processor control instructions
        Cache management instructions
        Segment register manipulation instructions
        Lookaside buffer management instructions
        External control instructions

12 Index
    12.1 List of chunks
    12.2 Identifier index

References

Chapter 1

Overview of CSDL

Machine descriptions for machine-level tools

Special-purpose and general-purpose computers are designed at a rapid pace, and software tools and technology don’t always keep up. The tools we need include not only the compilers, assemblers, linkers, and debuggers familiar to all programmers, but also less familiar tools, like profilers, tracers, test-coverage analyzers, and general code-modification tools. Most of these tools must work with machine instructions.

Machine-dependent detail makes it hard to build machine-level tools. For many years, compilers have used machine descriptions to capture such detail. Machine descriptions isolate target-specific information so that it can easily be examined and changed. Despite their successful use in compilers, machine descriptions are seldom used to build other systems software. Descriptions used in compilers are hard to reuse because they typically combine information about the target machine with information about the compiler. For example, “machine descriptions” written using tools like BEG (Emmelmann, Schroer, and Landwehr 1989) and BURG (Fraser, Henry, and Proebsting 1992) are actually descriptions of code generators, and they depend not only on the target machine but also on a particular intermediate language. In extreme cases (e.g., gcc’s md files), the description formalism itself depends on the compiler.

We believe we can simplify construction of compilers and other machine-level tools by developing description techniques that separate machine properties from compiler concerns. We expect this capability to be useful in a National Compiler Infrastructure because it will lead to a back-end infrastructure that should be usable not only with many different target machines, but also with different source languages, front ends, and intermediate languages.

Some existing languages for machine description, like VHDL (Lipsett, Schaefer, and Ussery 1993) and Verilog (Thomas and Moorby 1995), do describe only properties of the machine, but they are at too low a level, describing implementations as much as architectures. These description languages require too much detail that is not needed to build systems software.

We are designing a family of Computer Systems Description Languages (CSDL) to support a variety of machine-level tools while remaining independent of any one in particular. We have several goals for CSDL:

• Descriptions should be composed from simple components. Each component should describe, as much as possible, a single aspect of the target machine. Such aspects might include calling conventions, representations of instructions, pipeline implementations, memory hierarchy, or other properties. It should not always be necessary to describe aspects completely. For example, most compilers use only a subset of the instructions available on a particular machine, and a compiler writer should be required to describe only that subset.

• An application writer should be able to derive useful tools not only when the components describing individual aspects are incomplete, but also when not all aspects are described. An application writer should not have to describe aspects of the machine unless the information is needed to build his application. For example, an application writer working entirely at the assembly-language level or above should not have to describe binary representations of instructions. Someone writing a garbage collector might need to describe the stack-frame layouts determined by a calling convention, but he should not have to describe the instructions used in the calling sequences that establish those layouts.

• Components, especially those describing “core aspects,” should be reusable. For example, writers of all specifications should benefit from having a common formalization of what it means to be a “SPARC instruction supported by the bare hardware.”

CSDL forms a family of languages because all family members have the same view of two core aspects of machines: instructions and state. This report explains the CSDL view of these core aspects, and it presents a rationale for and examples of the most important member of the CSDL family, which describes the semantics of machine instructions.

Core aspects for CSDL

To identify core aspects to be used throughout the CSDL family, we examined descriptions used to help retarget a variety of systems-level tools. These tools included an optimizer (Benitez and Davidson 1988), a debugger (Ramsey and Hanson 1992), an instruction scheduler (Proebsting and Fraser 1994), a call-sequence generator (Bailey and Davidson 1995), a linker (Fernandez 1995), and an executable editor (Larus and Schnarr 1995). Cursory inspection showed no single common aspect of a machine used in descriptions for all of these tools, but a closer look revealed that all the descriptions refer either to the machine’s instruction set or to its storage locations, and some to both. For example, the specifications used by the scheduler and linker refer only to the machine’s instructions and the properties thereof. The specifications used in the call-sequence generator and in the debugger’s stack walker refer only to storage, explaining in detail how values move between registers and memory. The specifications used in the optimizer and the executable editor refer both to instructions and to storage, and in particular, to how instructions change the contents of storage. From these observations, we have chosen to require that languages in the CSDL family refer to instructions, storage, or both, and that they use the models of instructions and storage presented below.

Instructions

The CSDL model of an instruction set is a list of instructions together with some identifying information about their operands. Although instruction names in assembly languages are typically overloaded, CSDL requires instructions to have unique names, because tools often need uniquely named code for each instruction in an instruction set. For example, an assembler might use a unique C procedure to encode each instruction, or an executable editor might use a unique element of a C union to represent an instance of each instruction.

An individual instruction is viewed as a function or constructor that, when applied to operands, produces something interesting: a binary representation, an assembly-language string, a semantics, etc. Instruction descriptions that use the CSDL core will include the names and types of the operands of each instruction. Types will include integers of various sizes, but it will also be possible to introduce new types to define such machine-dependent concepts as effective addresses. Values of these new types may be created by applying suitable constructors; for example, Chapter 6 shows how a Pentium effective address may be formed by applying any of 9 constructors, each one representing a different addressing mode.

The structure defined by CSDL constructors and their operands can be viewed as a discriminated-union type. It is also equivalent to an ASDL grammar (Wang et al. 1997), without recursion, in which the start symbol is “instruction,” other nonterminals refer to machine-level concepts like “effective address” or “integer-instruction operand,” and terminal symbols are defined by integers or addresses. This structure is a simplification of the “constructor” from the Specification Language for Encoding and Decoding (SLED) of the New Jersey Machine-Code Toolkit (Ramsey and Fernandez 1997). It is determined by the machine, independent of any tool, so it should be useful in any specification language that deals with individual machine instructions. For example, the nML machine-description language (Fauth, Praet, and Freericks 1995) uses this structure, although nML is otherwise quite different from SLED.
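The discriminated-union view can be illustrated with a small Python sketch. It is not CSDL or SLED syntax. The class names, the two addressing modes (far fewer than a real Pentium description would need), and the `Load` and `AddImm` instructions are all hypothetical, chosen only to show uniquely named constructors with named, typed operands.

```python
from dataclasses import dataclass, fields
from typing import Union

# Hypothetical constructors for an "effective address" type.
@dataclass(frozen=True)
class Indirect:
    base: int              # register number holding the address

@dataclass(frozen=True)
class Displacement:
    base: int              # register number
    offset: int            # signed displacement added to the base

# The union of constructors plays the role of a nonterminal.
EffectiveAddress = Union[Indirect, Displacement]

# Hypothetical instruction constructors; each has a unique name.
@dataclass(frozen=True)
class Load:
    dest: int              # destination register number
    addr: EffectiveAddress

@dataclass(frozen=True)
class AddImm:
    dest: int
    src: int
    imm: int

Instruction = Union[Load, AddImm]

def operand_signature(insn):
    """List (operand name, constructor or type name) for an instruction value."""
    return [(f.name, type(getattr(insn, f.name)).__name__) for f in fields(insn)]

example = Load(dest=1, addr=Displacement(base=2, offset=8))
print(operand_signature(example))
```

Because the structure carries only names and operand types, the same values could be handed to an encoder, a printer, or a semantics function without change.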

Properties of instructions

Experience with the New Jersey Machine-Code Toolkit shows that many applications can be built from specifications that discuss only instructions and their properties, with no reference to storage. Such applications include assemblers and disassemblers, as well as code generators that work by pattern matching on the names of instructions.

We specify properties of instructions in a compositional style, so instructions’ properties are functions of the properties of their operands. We formalize that style using attribute grammars. For example, assembly-language representations of instructions can be computed as synthesized attributes, where the attributes are strings. Binary representations can also be computed as attributes, where the attributes are SLED “patterns.” Attributes readily support machines like the 68000 family, in which the representations of effective addresses depend on context.
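As a rough illustration of a synthesized attribute, the following Python sketch computes an assembly-language string for an instruction from the strings of its operands. The node types `Reg`, `Imm`, and `Add`, and the SPARC-like output syntax, are assumptions made for the example; they are not taken from any CSDL description.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reg:
    number: int

@dataclass(frozen=True)
class Imm:
    value: int

@dataclass(frozen=True)
class Add:
    dest: Reg
    src1: Reg
    src2: object           # either a Reg or an Imm

def asm(node):
    """Synthesized attribute: a node's string is built from its children's strings."""
    if isinstance(node, Reg):
        return f"%r{node.number}"
    if isinstance(node, Imm):
        return str(node.value)
    if isinstance(node, Add):
        return f"add {asm(node.src1)}, {asm(node.src2)}, {asm(node.dest)}"
    raise TypeError(f"unknown node: {node!r}")

print(asm(Add(dest=Reg(3), src1=Reg(1), src2=Imm(4))))
```

A context-dependent representation, such as the 68000 effective addresses mentioned above, would need an extra argument passed down from the parent; this sketch covers only the purely synthesized case.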

Storage

The CSDL model of a storage space is a sequence of mutable cells. A storage space is like an array; cells are all the same size, and they are indexed by integers. For example, a typical microprocessor has a memory made up of 8-bit cells (bytes) and a register file made up of 32-bit cells. The number of cells in a storage space may be left unspecified.

The state of a machine can be described as the contents of a collection of storage spaces. We use storage spaces to model main memory, general-purpose registers, special-purpose registers, condition codes, and so on.

Experience with CCL, a Calling Convention Language (Bailey and Davidson 1995), shows that applications can be built from specifications that discuss only storage, with no reference to instructions. For example, calling conventions can be described by discussing the placement of parameters in storage cells and the effects of calls and returns on storage. From a CCL description one can generate procedure prologs and epilogs in a compiler, and one can also generate code to test compilers’ implementations of calling conventions (Bailey and Davidson 1996). One might be able to derive stack walkers or exception-handling code from similar descriptions. A garbage collector may need to know which locations in storage can contain pointers; this is a property of an application, not of a machine, but it can be described using the CSDL storage core. (We want to describe application properties as well as machine properties, but we want to keep them separate.)

Languages in the CSDL family may refer to individual locations. Ways of writing locations may vary, but each one must resolve to a name of a storage space and an integer offset identifying a cell within that storage space.
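The storage model can be sketched in a few lines of Python. The `StorageSpace` class, the space names `"m"` and `"r"`, the choice that unwritten cells read as zero, and the truncation of stored values to the cell width are all assumptions made for this illustration, not part of CSDL.

```python
class StorageSpace:
    """A sequence of mutable cells, all the same width, indexed by integers."""

    def __init__(self, name, cell_bits, size=None):
        self.name = name
        self.cell_bits = cell_bits
        self.size = size          # None means the number of cells is unspecified
        self._cells = {}          # sparse; unwritten cells read as 0 (assumption)

    def _check(self, index):
        if index < 0 or (self.size is not None and index >= self.size):
            raise IndexError(f"{self.name}[{index}] is out of range")

    def fetch(self, index):
        self._check(index)
        return self._cells.get(index, 0)

    def store(self, index, value):
        self._check(index)
        # Keep only the low cell_bits bits (assumption for this sketch).
        self._cells[index] = value & ((1 << self.cell_bits) - 1)

# Machine state as a collection of storage spaces.
mem = StorageSpace("m", cell_bits=8)              # byte-addressed memory
regs = StorageSpace("r", cell_bits=32, size=32)   # 32 registers of 32 bits
state = {space.name: space for space in (mem, regs)}

def resolve(location):
    """Resolve a (space name, offset) pair to a storage space and a cell index."""
    space_name, offset = location
    return state[space_name], offset

space, index = resolve(("r", 3))
space.store(index, 42)
print(regs.fetch(3))
```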

Combining instructions and storage

An architecture manual tells programmers what constitutes the state of the processor, and it lists instructions and their operands, but the most important thing it does is explain the semantics of each instruction in terms of that instruction’s effects on the state of the processor. Many applications can be built based on this information, but some require more detail than others. For example,

• Information about control flow is enough to build control-flow graphs.

• Information about calling conventions may be enough to recognize procedure calls and returns.

• Information about locations read and written is enough to build data-flow graphs. Such graphs, together with the ability to match individual instructions, may suffice to build code-editing tools like EEL (Larus and Schnarr 1995) or ATOM (Srivastava and Eustace 1994).

• Information about register-transfer semantics is enough to build code improvers in the style of PO (Davidson and Fraser 1980), vpo (Benitez and Davidson 1988), and gcc (Stallman 1992). These code improvers work by pattern matching, so they need not know what all of the register-transfer operators do. They do, however, need at least a partial semantics, to perform strength reduction, constant folding, and similar transformations.

• To build a code generator, one needs to know more about the operators used in the register transfers. In particular, one needs to know enough so that one can find a sequence of instructions to implement each operation in the compiler’s intermediate code. It is sufficient to know that the intermediate code and the instructions compute the same operation; one need not know exactly what the operations do to the bits.

• To build an emulator like SPIM (Larus 1990) or a binary translator like FX!32 (Thompson 1996), one needs enough information about the operations in the register transfers to interpret the effect of each register transfer on each bit of the processor’s state.

We believe that these applications and more can be served by providing register-transfer semantics for instructions. The effect of a particular instruction can be specified as a register-transfer list (RTL), which modifies storage cells. As with other properties of instructions, the RTL is computed using an attribute grammar.

In principle, a single register-transfer descriptioncould support all of the different uses above. Asingle RTL could be given different interpretations,depending on its intended use. We believe we cansupport this mode of specification by prescribinga rigid form for RTLs, while supplying suitable ab-straction mechanisms for use within that form. Ab-straction mechanisms support our goals of enablingapplication writers to leave irrelevant facts unspeci-fied. For example, we want to make it easy to writean RTL that means “register rd is assigned an un-specified function of registers rs and rt.”

Languages in the CSDL family

We consider SLED, for specifying representations of instructions (Ramsey and Fernandez 1997), and CCL, for specifying calling conventions (Bailey and Davidson 1995), to be the first languages in the CSDL family. We are developing a new language, λ-RTL, for specifying instruction semantics. We expect that CSDL will expand to include languages for specifying properties of memory hierarchies and of pipelines. Indeed, the simple functional-resource languages of Proebsting and Fraser (1994) and Bala and Rubin (1995) fit nicely into the CSDL framework.

This report focuses on λ-RTL. The report does not give a specification or definition of λ-RTL, because λ-RTL is not sufficiently developed for a specification or definition to be worthwhile. Instead, this report presents some essential properties of λ-RTL, and it gives examples of machine specifications using the current, imperfect version. Chapter 2 describes the form of register transfer lists used in λ-RTL. Chapter 3 presents λ-RTL and discusses its translation into RTLs. Chapter 4 describes some basic content which fits into that form, and which we believe will be useful for describing many machines. Chapter ?? presents a λ-RTL description of the SPARC processor. Chapter 6 presents descriptions of a few aspects of the Pentium processors that have no analogs on the SPARC, like addressing modes whose meanings depend on context.

Chapter 2

Register Transfer Lists

Computer scientists have used register transfers in many different forms. For λ-RTL, we have chosen a form designed for use by tools, not by people. We have therefore insisted that as much information as possible be explicit in the RTL itself. Under our current plan,

• RTLs are represented as trees.

• All operators are fully disambiguated, e.g., as to type and size.

• There is no aliasing of locations.

• Fetches are explicit, as are changes in the size or type of data.

• Stores are annotated with the size of the data stored.

• Explicit tree nodes specify byte order. More generally, they specify how to transfer data between storage spaces of different granularity.

• RTL is a typed representation. We plan to generate RTL type checkers from CSDL specifications. Optimizations and other transformations that are intended to preserve semantics should also preserve the property of being well typed. Typing is a good way to catch bugs in optimizers (Morrisett 1995).

The form of RTLs proposed here may be suitable not just for specification, but also for use in the implementations of compilers, binary translators, and other tools.

〈ASDL specification of the form of RTLs〉≡
ty       = (int) -- size of a value, in bits
exp      = CONST (const)
         | FETCH (location, ty)
         | APP (operator, exp*)
location = AGG (aggregation, cell)
cell     = CELL (space, exp)
effect   = STORE (location dst, exp src, ty)
         | KILL (location)
         | KILLALL (space)
guarded  = GUARD (exp, effect)
rtl      = RTL (guarded*)

Figure 2.1: ASDL specification of the form of RTLs

Form of RTLs

Figure 2.1 uses the Zephyr Abstract Syntax Description Language (Wang et al. 1997) to show the form of RTLs. A register transfer list is a list of guarded effects. Each effect represents the transfer of a value into a storage location, i.e., a store operation. The transfer takes place only if the guard (an expression) evaluates to true. Locations may be single cells or aggregates of consecutive cells within a storage space. Values are computed by expressions without side effects, which simplifies analysis and transformation. These expressions include integer constants, fetches from locations, and applications of RTL operators to lists of expressions. Effects in a list take place simultaneously, as in Dijkstra’s multiple-assignment statement, so one RTL represents one change of state. This decision makes it possible to specify swap instructions without having to introduce temporary locations.
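The simultaneous semantics can be illustrated with a minimal sketch in Python (a language chosen here only for illustration; it is not part of the report). The encoding of a guarded effect as a `(guard, dst, src)` triple, the dictionary-based state, and the name `apply_rtl` are all assumptions. The point is only that every guard and every source expression is evaluated against the old state before any store is committed.

```python
# Hypothetical sketch: an RTL as a list of guarded effects applied simultaneously.
# Locations are modeled as (space, index) keys in a dict; this is an assumption
# made for illustration, not the representation used by the λ-RTL tools.

def apply_rtl(state, guarded_effects):
    """Return a new state after applying every guarded effect at once.

    Each guarded effect is (guard, dst, src), where guard and src are
    functions of the *old* state and dst is a location key.
    """
    # Phase 1: evaluate every guard and source against the old state.
    pending = [(dst, src(state))
               for guard, dst, src in guarded_effects
               if guard(state)]
    # Phase 2: commit all stores; no store can observe another store's result.
    new_state = dict(state)
    for dst, value in pending:
        new_state[dst] = value
    return new_state

always = lambda state: True
r1, r2 = ("r", 1), ("r", 2)

# Swap r1 and r2 in a single RTL, with no temporary location.
swap = [
    (always, r1, lambda s: s[r2]),
    (always, r2, lambda s: s[r1]),
]

before = {r1: 10, r2: 20}
after = apply_rtl(before, swap)
```

Had the two stores been applied one after the other, the second would have read the value written by the first, and a temporary location would have been needed.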

Not every effect is a true assignment. A kill effect changes the value in a storage location in an undefined way. Kill effects are needed to model such architectural specifications as “the effect of a logical instruction on the AF flag is undefined.”

The “kill all” effect in Figure 2.1 kills all of the cells in the given storage space. It is needed to give a semantics to certain phrases of λ-RTL. We haven’t demonstrated a need for it yet, but it may be useful to model stores into arbitrary memory locations.

As an example of a typical RTL, consider a SPARC load instruction using the displacement addressing mode, written in the SPARC assembly language as

〈sample SPARC instruction〉≡
ld [%sp-12], %i0

Although we would not want to specify just a single instance of a single instruction, the effect of this load instruction might be notated in λ-RTL as follows:¹

〈λ-RTL for sample instruction〉≡
$r[24] <-- $m[$r[14]+sx(~12)]

because the stack pointer is register 14 and register i0 is register 24. The corresponding RTL is much more verbose, with the sizes of all quantities identified explicitly, as a fully disambiguated tree:

STORE #32
    AGG B #32 #32
        CELL ’r’
            CONST #5 24
    FETCH #32
        AGG B #8 #32
            CELL ’m’
                ADD #32
                    FETCH #32
                        AGG B #32 #32
                            CELL ’r’
                                CONST #5 14
                    SX #13 #32
                        CONST #13 -12

The various constants labeled with hash marks, like #32, indicate the number of bits in arguments, results, or data being transferred. Such constants will fit into a generalization of the Hindley-Milner type system (Milner 1978).

¹ The ~ in ~12 is a unary minus.

Figure 2.2 shows the operators used in this tree. The left child of the STORE is a subtree representing the location consisting of the single register i0, which is register 24. The right-hand child represents a 32-bit word (a big-endian aggregation of four bytes) fetched from memory at the address given by the subtree rooted at ADD. This node adds the contents of the stack pointer (register 14) to the constant −12. The constant is a 13-bit constant, and the SX operator sign-extends it to 32 bits, so it can be added to the stack pointer.
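To make the reading of the tree concrete, the following Python sketch interprets it against a toy machine state. The nested-tuple encoding, the dictionaries `regs` and `mem`, and the function names are assumptions introduced for illustration; they are not the report's representation, and only the operators that appear in the example are handled.

```python
# Toy interpreter for the example tree. The tuple encoding and the machine
# state (dicts named regs and mem) are illustrative assumptions.

def mask(n):
    return (1 << n) - 1

def sx(value, n, w):
    """Sign-extend an n-bit value to w bits."""
    value &= mask(n)
    if value >> (n - 1):          # sign bit set
        value -= 1 << n
    return value & mask(w)

def eval_exp(e, regs, mem):
    op = e[0]
    if op == "CONST":             # ("CONST", width, k)
        _, width, k = e
        return k & mask(width)
    if op == "ADD":               # ("ADD", width, a, b); carry is ignored
        _, width, a, b = e
        return (eval_exp(a, regs, mem) + eval_exp(b, regs, mem)) & mask(width)
    if op == "SX":                # ("SX", n, w, a)
        _, n, w, a = e
        return sx(eval_exp(a, regs, mem), n, w)
    if op == "FETCH":             # ("FETCH", width, location)
        return fetch(e[2], regs, mem)
    raise ValueError(op)

def fetch(loc, regs, mem):
    # location = ("AGG_B", n, w, ("CELL", space, address_exp))
    _, n, w, (_, space, addr_exp) = loc
    addr = eval_exp(addr_exp, regs, mem)
    if space == "r":
        return regs[addr]
    # Big-endian: the first (lowest-addressed) cell is most significant.
    value = 0
    for i in range(w // n):
        value = (value << n) | mem[addr + i]
    return value

def store(effect, regs, mem):
    # ("STORE", width, location, exp); only register destinations are handled.
    _, width, (_, _, _, (_, space, addr_exp)), src = effect
    assert space == "r"
    regs[eval_exp(addr_exp, regs, mem)] = eval_exp(src, regs, mem) & mask(width)

def reg(k):
    return ("AGG_B", 32, 32, ("CELL", "r", ("CONST", 5, k)))

tree = ("STORE", 32, reg(24),
        ("FETCH", 32,
         ("AGG_B", 8, 32,
          ("CELL", "m",
           ("ADD", 32,
            ("FETCH", 32, reg(14)),
            ("SX", 13, 32, ("CONST", 13, -12)))))))

regs = {14: 0x1000, 24: 0}
mem = {0x1000 - 12 + i: b for i, b in enumerate([0xDE, 0xAD, 0xBE, 0xEF])}
store(tree, regs, mem)   # regs[24] now holds the word at address %sp-12
```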

Types in RTLs

RTL is a typed format. The type system includes the following types:

#n bits    A value that is n bits wide.

#n loc     A location containing an n-bit value.

#n cell    One of a sequence of n-bit storage cells, which can be aggregated together to make a larger location, as by the AGG B nodes in the example tree.

bool       A Boolean condition.

effect     A side effect on storage.

Figure 2.2 uses these types to show the types of the operators used above. We have extended Milner’s type inference to this system, so that those writing specifications in λ-RTL can omit types and widths. Unlike in ML, type inference alone does not guarantee that terms make sense; in general, it will be necessary to check additional constraints. For example, in the RTL shown above, it would be necessary to check that the signed integer −12 can be represented using 13 bits, and that 32 is a multiple of both 8 and 32. We are currently investigating ways of formalizing and checking such constraints.
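The two side conditions mentioned for the example can be written down directly. The Python sketch below is only an illustration of what such checks compute; the function names are hypothetical, and the report does not say how the λ-RTL translator will formalize or perform them.

```python
def fits_signed(k, n):
    """True if integer k is representable as an n-bit two's-complement value."""
    return -(1 << (n - 1)) <= k < (1 << (n - 1))

def valid_aggregate(n, w):
    """True if a w-bit location can be built from a whole number of n-bit cells."""
    return n > 0 and w % n == 0

# The constraints arising from the example tree.
checks = [
    fits_signed(-12, 13),     # CONST #13 -12
    valid_aggregate(8, 32),   # AGG B #8 #32: four bytes of memory
    valid_aggregate(32, 32),  # AGG B #32 #32: one register
]
```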

RTL languages and translation

Although we have prescribed the form of RTLs, we can’t write any RTLs until we choose a set of locations and a set of RTL operators. These choices define an RTL language. Any specification written in λ-RTL can introduce new storage spaces (and therefore new locations) and new RTL operators, determining an RTL language for use in that specification.

STORE : ∀#n. #n loc × #n bits → effect
    Store an n-bit value in a given location. The type indicates that for any n, STORE #n takes an n-bit location and an n-bit value and produces an effect.

FETCH : ∀#n. #n loc → #n bits
    For any n, FETCH #n takes an n-bit location and returns the n-bit value stored in that location.

AGG B : ∀#n. ∀#w. #n cell → #w loc
    For any n and w, AGG B #n #w aggregates an integral number of n-bit cells into a w-bit location, making the first cell the most significant part of the new location, i.e., using big-endian byte order. w must be a multiple of n. (w and n are mnemonic for wide and narrow.)

CELL ’m’ : #32 bits → #8 cell
    Given a 32-bit address, CELL ’m’ returns the 8-bit cell in memory referred to by that address.

CELL ’r’ : #5 bits → #32 cell
    Given a 5-bit register number, CELL ’r’ returns the corresponding 32-bit register (a mutable cell).

ADD : ∀#n. #n bits × #n bits → #n bits
    For any n, ADD #n takes two n-bit values and returns their n-bit sum. Carry and overflow are ignored.

SX : ∀#n. ∀#w. #n bits → #w bits
    For any n and w, SX #n #w takes an n-bit value, interprets it as a two’s-complement signed integer, and sign-extends it to produce a w-bit representation of the same value. w must be greater than n.

CONST : ∀#n. 〈constant〉 → #n bits
    For any n, CONST #n k represents the n-bit constant k. k must be representable in n bits. The same k could be used with different ns.

Figure 2.2: Some RTL operators and their types

For use during compilation, we restrict RTLs to a subset of a full RTL language. Each RTL used in a back end based on the vpo optimizer (Benitez and Davidson 1988) must satisfy the vpo invariant for the target machine. An RTL satisfies that invariant if and only if it can be represented in a single instruction on the target machine. All RTLs manipulated by vpo satisfy this invariant, so vpo can stop and emit machine code at any time. We use the name X-RTLs to refer to the set of RTLs satisfying the vpo invariant for machine X.

Interpretations of RTLs

One reason we restrict the form of RTLs is to limit their possible meanings or interpretations. For example, access to the mutable state of a machine is available only through the fetch and store operations built into the RTL form, so we can easily tell what state is changed by an RTL and how that change depends on the previous state. Researchers and tools working with our form of RTLs have freedom to define interpretations of only two parts of the RTL form: aggregations and operators.

RTL aggregations specify byte order. More generally, aggregations make it possible to write an RTL that stores a w-bit value in (or fetches a w-bit value from) k consecutive n-bit locations, provided that w = kn. Such an aggregation has type #n cell → #w loc, and its interpretation must be a bijection between a single w-bit value and k n-bit values. Moreover, when w = n, the bijection must be the identity function. Storing uses the bijection, and fetching uses its inverse, making it possible to combine RTLs using forward substitution. Little-endian and big-endian aggregations will be built into λ-RTL, as will an “identity aggregation” that is defined only when w = n. We imagine that users could define other aggregations by giving systems of equations, as in Ramsey (1996).
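The bijection requirement can be illustrated with a short Python sketch of big-endian and little-endian aggregations. The function names are hypothetical. Each `split_*` function plays the role of the bijection used when storing, and the matching `join_*` function plays the role of the inverse used when fetching.

```python
def split_big_endian(value, n, k):
    """Map one (k*n)-bit value to k n-bit cells, most significant cell first."""
    mask = (1 << n) - 1
    return [(value >> (n * (k - 1 - i))) & mask for i in range(k)]

def join_big_endian(cells, n):
    """Inverse of split_big_endian: rebuild the wide value from its cells."""
    value = 0
    for c in cells:
        value = (value << n) | c
    return value

def split_little_endian(value, n, k):
    """Same cells as the big-endian split, least significant cell first."""
    return split_big_endian(value, n, k)[::-1]

def join_little_endian(cells, n):
    return join_big_endian(cells[::-1], n)

word = 0xDEADBEEF
be_cells = split_big_endian(word, 8, 4)      # cells written by a big-endian store
le_cells = split_little_endian(word, 8, 4)   # cells written by a little-endian store
```

When k = 1 (that is, w = n), both splits return the single value unchanged, which matches the requirement that the bijection be the identity in that case.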

RTL operators must be interpreted as pure functions on bit vectors. Consequently, the result of applying an RTL operator must not depend on processor state; the operator must give the same answer every time. Chapter 4 presents a collection of RTL operators that we expect can be used to describe many different machines. We don’t expect this set to be complete; on the contrary, we expect that any machine description written in λ-RTL will introduce a handful of new RTL operators which will be unique to that machine.

Chapter 3

Using λ-RTL to specify register transfer lists

Bare RTLs are both spartan and verbose. Expressions do not include if-then-else, so conditionals must be represented by using guards on effects. There is no expression meaning “undefined;” assignments of undefined values must be specified using a kill effect. These restrictions, and the requirement that operations be fully disambiguated, make RTLs a form that is good for manipulation by tools but not so good for writing specifications.

λ-RTL is a metalanguage that enables specification writers to attach RTL trees to SLED-like constructors without having to write everything explicitly. λ-RTL is a higher-order, strongly typed, polymorphic, pure functional language based largely on Standard ML (Milner, Tofte, and Harper 1990). A variation on the Hindley-Milner type system makes it possible to write flexible, type-safe functions without having to write types explicitly.

Design considerations

We have drawn on our experience with SLED (Ramsey and Fernandez 1997) to identify mechanisms and properties that are desirable in any CSDL language, including λ-RTL. These include:

• Use of constructors to provide an abstract view of the machine’s instruction set, and use of attributes to specify properties of instructions. These mechanisms are close kin to attribute grammars and to the denotational approach to semantics, both of which have long histories of utility.

• Use of default, unnamed attributes (for example, binary representation or default constructor type), sometimes supported by special syntax. By providing a default case that does not have to be named, we can unclutter specifications. For example, when we refer to an operand of type Address, the location designated by that effective address should be the default meaning.

• Use of a specialized sublanguage for defining attributes. SLED uses patterns to specify binary representations; λ-RTL uses the prescribed forms of RTLs, to be augmented by a sophisticated type system.

• Language constructs that help eliminate repetition by permitting simultaneous description of many instructions at once. Such constructs reduce the number of opportunities for errors, and they help keep specifications concise and readable. λ-RTL includes two such constructs: first-class functions, and a generalization of SLED’s grouping construct, which itself is based on Icon generators (Griswold 1982).

• Formal notation that matches informal descriptions found in architecture manuals. The architecture manual is the standard model of the machine that is shared by all tool builders, so this close match not only helps people read specifications, it helps them write correct specifications. λ-RTL enables users to define their own infix operators, and Chapters 4, ??, and 6 show how we have used this capability to create an ISP-like notation for RTLs.

We expect to be able to analyze λ-RTL specifications for internal consistency and “plausibility.” For example, it should be possible to identify cases in which a register number is mistakenly used as an immediate value instead of as an offset into the storage space modeling the register file.

Restrictions eased in λ-RTL

λ-RTL descriptions are easier to write than bare RTLs in three ways: grouping and higher-order functions can help eliminate repetition, the type system largely eliminates the need to write sizes explicitly, and λ-RTL relaxes several of the restrictions on the form of RTLs. In particular,

• In λ-RTL, it is rarely necessary to write fetches explicitly.

• λ-RTL gives the illusion that bit slices (subfields) are locations that can be assigned to.

• λ-RTL gives the illusion that aggregates of cells are locations that can be assigned to, and it is not usually necessary to write aggregations explicitly.

Implicit fetches

Most programmers are used to writing x := x + 1 and having the x on the left denote a location while the x on the right denotes the value stored in that location. Typical compilers identify “lvalue contexts” and “rvalue contexts” and automatically insert fetches in rvalue contexts. We do the same in λ-RTL, but instead of using syntax to identify the contexts, we use types.

We use types and not syntax because λ-RTL has no special syntax for writing RTL assignments. Instead of an assignment syntax, λ-RTL provides a built-in store function that accepts a location and a value and produces an effect. Any user-defined function might result in a call to the built-in store, so we have to recognize the right and left contexts by their types. Thus, if a location is used where a value is expected, we insert a fetch.
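A rough Python sketch of type-directed fetch insertion follows. It is an analogy only: the classes `Loc` and `Fetch`, the helper `as_value`, and the use of a runtime `isinstance` test in place of static type inference are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Loc:
    """A location, identified by storage space and cell index."""
    space: str
    index: int

@dataclass(frozen=True)
class Fetch:
    """An explicit fetch from a location, as required in bare RTLs."""
    loc: Loc

def as_value(x):
    """Coerce to a value: a location in value position gets an explicit fetch."""
    return Fetch(x) if isinstance(x, Loc) else x

def store(dst, src):
    """Built-in store: takes a location and a value, produces an effect."""
    if not isinstance(dst, Loc):
        raise TypeError("store destination must be a location")
    return ("STORE", dst, as_value(src))

def add(a, b):
    """An operator on values; its arguments are value positions."""
    return ("ADD", as_value(a), as_value(b))

x = Loc("r", 3)
effect = store(x, add(x, 1))   # x := x + 1, with the fetch of x made explicit
```

The first argument of `store` stays a location, while the same `x` used as an operand of `add` is wrapped in a `Fetch`.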

λ-RTL does almost everything with functions, not syntax, so the writer of a specification can normally redefine the meaning of a notation by defining a new function with the same name. This strategy doesn’t work with implicit fetches because there is no explicit notation associated with a fetch. We want users to be able to control the meanings of these fetches, however, because many machines have resources that are almost, but not quite, sequences of mutable cells. For example, SPARC registers can be viewed as a collection of 32 mutable cells, except that register 0 is not mutable and always contains 0. We would like users to be able to define special meanings for “fetch from register 0” and “store into register 0” so the rest of the specification can pretend that the registers are an ordinary storage space. We do so by permitting users to attach fetch and store methods to each storage space. Methods not given explicitly default to the standard fetch and store operators. Sample fetch and store methods for the SPARC are shown on page ??. If we wanted, we could use fetch and store methods to describe the true implementation of SPARC registers, in which “registers” 8 through 31 denote locations accessed indirectly through the register-window pointer (CWP).
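The idea of per-space fetch and store methods can be sketched in Python as follows. This is not the λ-RTL notation for methods (which the report shows elsewhere); the class `StorageSpace` and the functions `sparc_fetch` and `sparc_store` are hypothetical names used only to show how overriding the two methods makes register 0 read as zero and ignore writes.

```python
class StorageSpace:
    """A storage space with optional user-supplied fetch and store methods."""

    def __init__(self, count, width, fetch=None, store=None):
        self.cells = [0] * count
        self.mask = (1 << width) - 1
        # Methods not given explicitly default to the standard operations.
        self.fetch_method = fetch or StorageSpace.std_fetch
        self.store_method = store or StorageSpace.std_store

    def std_fetch(self, i):
        return self.cells[i]

    def std_store(self, i, value):
        self.cells[i] = value & self.mask

    def fetch(self, i):
        return self.fetch_method(self, i)

    def store(self, i, value):
        self.store_method(self, i, value)

def sparc_fetch(space, i):
    # Register 0 always reads as 0.
    return 0 if i == 0 else space.std_fetch(i)

def sparc_store(space, i, value):
    # Stores into register 0 are discarded.
    if i != 0:
        space.std_store(i, value)

r = StorageSpace(32, 32, fetch=sparc_fetch, store=sparc_store)
r.store(0, 99)   # no effect
r.store(1, 7)
```

The rest of a specification can then call `r.fetch` and `r.store` uniformly, without special cases for register 0.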

Slices

Many machine instructions manipulate fragments of a word stored in a mutable cell. For example, some machines represent condition codes as individual bits within a program status word. Many machines have instructions that, for example, assign to the least-significant 8 bits of a 32-bit register. To make it easy to specify such instructions, λ-RTL creates the illusion that a sub-range or “slice” of a cell can be a location in its own right, one that is a suitable argument for a fetch or store operation. This illusion helps keep machine descriptions readable; for example, an effect that sets the SPARC overflow bit simply assigns to it, hiding the fact that it is buried in a program status word that has to be fetched, modified, and stored.

λ-RTL uses a special syntax for slices, because we hope eventually to overload the notation to refer either to locations or to values. Until we get this overloading figured out, however, uses have to be distinguished by a ty suffix that must be either loc or bits. Examples of the syntax include

x@ty[k]              Bit k of x. By default, bit 0 is the least significant bit. A future version of λ-RTL may make it possible to change the numbering.

x@ty[k1..k2]         Bits k1 through k2 of x, inclusive.

x@ty[k bits at e]    A k-bit slice of x, with the least significant bit at e.

k’s denote integer constants, e’s denote expressions, and x’s may denote values or locations. In all cases the size of the slice is known statically, so its type can be computed automatically. We use the Greek letter σ to stand for any of these slice specifications.

Given a slice specification σ, SLICE LOCσ maps locations to locations and SLICE BITSσ maps values to values. The illusion that slices can be applied to locations is implemented by rewriting, according to the following rules:

FETCH(SLICE LOCσ l) ↦ SLICE BITSσ(FETCH l)

SLICE LOCσ l ← n ↦ l ← bitInsertσ(FETCH l, n)

where l is a location and n is a value. After the rewriting, all slices operate on bit vectors, and all fetches and stores operate on true locations. Invocation of user-defined fetch and store methods takes place after the rewriting of slices. This ordering makes it possible to use fetch and store methods to define cell-like abstractions, while ensuring that the meaning of slicing is always consistent with respect to such abstractions.
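The two rewriting rules can be mirrored in a small Python sketch. Here a slice σ is assumed to be given by a low bit position `lo` and a `width`, with bit 0 the least significant; the dictionary `psr` stands in for a whole-cell location. These names and this encoding are assumptions made for illustration.

```python
def slice_bits(value, lo, width):
    """SLICE BITS: extract `width` bits of `value` starting at bit `lo`."""
    return (value >> lo) & ((1 << width) - 1)

def bit_insert(old, new, lo, width):
    """bitInsert: replace bits lo..lo+width-1 of `old` with the low bits of `new`."""
    mask = ((1 << width) - 1) << lo
    return (old & ~mask) | ((new << lo) & mask)

psr = {"value": 0xF0F0F0F0}   # stand-in for a true (whole-cell) location

def fetch_slice(loc, lo, width):
    # FETCH(SLICE LOC l)  becomes  SLICE BITS(FETCH l)
    return slice_bits(loc["value"], lo, width)

def store_slice(loc, lo, width, n):
    # SLICE LOC l <- n  becomes  l <- bitInsert(FETCH l, n)
    loc["value"] = bit_insert(loc["value"], n, lo, width)

store_slice(psr, 0, 8, 0xAB)   # assign to the least-significant 8 bits
```

After the rewriting, the only fetch and store are on the whole cell, so any user-defined fetch or store method for the space would see ordinary whole-cell accesses.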

Implicit aggregation

λ-RTL provides the special syntax $space[offset] for references to mutable cells. The offset can be an arbitrary expression, but the space must be a literal name, so that λ-RTL can identify the storage space and use appropriate fetch and store methods. To make this cell a location, λ-RTL applies an aggregation, which is also associated with the storage space as a method. The default method is the identity aggregation, which permits only “aggregates” of a single cell.

When little-endian, big-endian, or other aggregations are used, λ-RTL will infer the size of the aggregate. We have not yet determined the rules to be used for the inference, but we hope at least to support the automatic inference that four 8-bit bytes must be aggregated to form a value that goes into a 32-bit register.

Overview of λ-RTL

From the point of view of external tools, the output of a λ-RTL specification is a set of bindings of values to attributes of constructors. The part of λ-RTL used to specify values is a pure, typed functional language without recursion. Many features of λ-RTL are inspired by Standard ML.

To describe syntax, we use an EBNF grammar with standard metasymbols for {sequences}, [optional constructs], and (alternative ∣∣ choices). We use large metasymbols to help distinguish them from literals. Terminal symbols given literally appear in typewriter font. Other terminal symbols and all nonterminals appear in italic font. Excerpts from the grammar begin with the name of a nonterminal followed by the ⇒ (“produces”) symbol. When referring to terminals and nonterminals, we sometimes precede the name of the symbol with a qualifier; for example, argument-pattern refers to the nonterminal symbol pattern used as an argument, and module-name refers to the terminal symbol name, which is the name of a module.

A λ-RTL specification is a sequence of modules. λ-RTL uses modules to organize the name space of values. A value x declared in a module S is visible within S as just x, but outside S it can only be referred to as S.x. Modules are written

module ⇒ module module-name is {import} {declaration} end

Modules may include declarations of nested modules as well as values.

At top level, modules provide a kind of separate compilation. λ-RTL requires explicit import directives for references between top-level modules.

import ⇒ import (module-name ∣∣ [ {module-name} ])
       | from module-id import (name ∣∣ [ {name} ])

module-id ⇒ module-name
          | module-id . module-name

For example, before module A can use values defined in module B, module A must include the directive import B. This directive makes B visible within A, as well as qualified names beginning with B. The from. . . import directive imports identifiers directly into module A. Not only can these identifiers be used without qualification; they carry the same fixity (infix nature, operator precedence, and associativity) that they had in module B. For example, if the module StdOperators defines infix identifiers + and - to denote addition and subtraction, then if module A includes the directive

from StdOperators import [+ -]

then these identifiers are also infix in module A, where it is legal to write, e.g., the function definition fun inc x is x + 1.

Except as changed by an import directive, the scopes of names are limited to the modules in which they are defined. Values and modules share a single name space within λ-RTL. Constructors and storage spaces occupy their own individual name spaces, which are flat. RTL operators occupy a separate, flat name space, but within λ-RTL they are not distinguished from other kinds of values. Eventually, the λ-RTL translator will check that the same name is not used to declare distinct RTL operators in two different modules.

Within a module, declarations are either top-level or inner declarations. Inner declarations may appear anywhere, but top-level declarations may appear only outside of function and value definitions. Top-level declarations appear only in contexts that guarantee they will be evaluated exactly once.

Top-level declarations

λ-RTL’s top-level declarations introduce nested modules, locations, storage spaces, RTL operators, or bindings to attributes of constructors.

Storage spaces

Storage spaces are introduced by storage; they name the storage space, give its granularity (width of an individual cell) and size (number of cells), and possibly fetch, store, or aggregation methods.

declaration ⇒ storage character-literal {storage-alias} is {storage-property}

storage-alias ⇒ alias name

storage-property ⇒ storage-method using expression
                 | [cell-count] cells of cell-width bits
                 | called documentation-string

storage-method ⇒ fetch ∣∣ store ∣∣ aggregate

RTL operators

RTL operators are introduced by:

declaration ⇒ rtlop (operator-name ∣∣ [ {operator-name} ]) : type

Figure 3.1 shows λ-RTL types. For example,

rtlop bit : bool -> #1 bits
rtlop [= <>] : #n bits * #n bits -> bool

introduces three new RTL operators. The #n bits in the definition of = and <> introduces parametric polymorphism; the equality and inequality operators can be applied to values of any size. This size is normally implicit in λ-RTL, but it is made explicit in the translated RTL form.

In the current implementation of λ-RTL, rtlop introduces an arbitrary, abstract value. Because it is the only abstraction mechanism available, it is sometimes put to odd uses.

Attributes

Attribute bindings are introduced as follows:

declaration ⇒ attribute of {constructor is expression}

attribute ⇒ default attribute
          | attribute (attribute-name ∣∣ default)

constructor ⇒ opcode ([operand-name {, operand-name}]) [: type-name]

Each expression must be terminated by a newline; if an expression doesn’t fit on one line, it must contain an open parenthesis, brace, or let, so that the λ-RTL compiler knows to continue to the closing delimiter.

For example,

default attribute of nop() is RTL.SKIP

defines the effect of a no-op. The longer definition

attribute trap of
  taddcctv (rs1, reg_or_imm, rd) is
    (tag_overflows($r[rs1], reg_or_imm) orelse
     add_overflows($r[rs1], reg_or_imm, 0)
       --> trap(tag_overflow))

requires that the attribute be parenthesized, because it occupies more than one line.

In order to compute the type of an attribute, the λ-RTL translator needs to know the types of each constructor’s operands. The operand directive is used to tell λ-RTL about the types of constructors’ operands.

declaration ⇒ operand (operand-name ∣∣ [ {operand-name} ]) : type

Note that constructors may be typed. Because the operand directive asserts the type a constructor application has when it is used as an operand, that same directive can be used to check the type of that constructor’s attributes.

The types of simple integer operands are determined by the machine architecture, and in fact, the λ-RTL translator can extract these types automatically from a SLED specification.¹ The types of other operands are up to the author of the λ-RTL specification. For example, the Pentium specification in Chapter 6 represents an effective address as a triple, which contains locations of three different sizes.

operand addr :
  { b : #8 loc, w : #16 loc, d : #32 loc }

Defining attributes of constructors differs from defining functions in significant ways.

• Constructors occupy their own name space. Their names don’t conflict with names of λ-RTL functions or values, and the constructors cannot be used as functions in later λ-RTL definitions.

• The arguments to attributes are constructor operands, not patterns. They must be referred to by name, and their names determine their types, according to the operand declarations currently in effect.

The attribute values will be represented in C, so they are required to have monomorphic types, i.e., their types may not have free type or width variables, and the types may not include functions.

The RTLs used in a λ-RTL description are not pure RTLs, but rather RTL templates. An RTL template becomes a fragment of RTL when suitable fragments of RTL are substituted for the constructor’s operands. An RTL template is an RTL, except that

• Names of constructor operands may be used to stand for RTL expressions. (Eventually we hope to use names of operands to stand for locations as well as values, but the current implementation does not support this usage.)

• Tuple or record selections from constructor operands may be used to stand for RTL expressions.

• Vectors of RTL expressions subscripted by constructor operands may be used in place of RTL expressions.

¹ Try, e.g., lrtl -operands sparc.sled.

type ⇒ type {* type}                                      Tuple
     | { member-name : type {, member-name : type} }      Record
     | type -> type                                        Function
     | type-constant (bits ∣∣ loc ∣∣ cell)                   Unary constructors
     | bool ∣∣ rtl                                          Nullary constructors
     | type vector                                         Vector

type-constant ⇒ #variable-name ∣∣ #integer-literal

Figure 3.1: λ-RTL types

A tuple or record containing RTL templates is also considered an RTL template. The most notable consequence of these rules is that no expression whose normal form contains a λ-abstraction is an RTL template; λ-abstractions must be applied to arguments.

Eventually there will be a precise, formal definition of RTL template.

Inner declarations

Inner declarations may appear anywhere a top-level declaration may appear, and they may also appear in let-expressions, where they are the only kind of declaration permitted. Their typical usage is to bind local values in function bodies. Inner declarations include value bindings, function bindings, and fixity declarations.

Value, location, and function bindings

Value, location, and function bindings are written as

declaration ⇒ val value-spec is expression
            | locations value-spec is expression
            | fun value-spec {argument-pattern}+ is expression

value-spec ⇒ result-pattern ∣∣ [ {name} ]

A location binding is just like a value binding, except the value is forced to be a location. The square brackets in value-spec represent λ-RTL’s grouping mechanism, in which the expression actually denotes a sequence of values, and each value in the sequence is bound to the corresponding name on the left.

As examples, consider

val do_nothing is RTL.SKIP
fun bool n is n <> 0
val [eq ne] is [(=) (<>)]
fun triple x y z is (x, y, z)
val triple is \x.\y.\z.(x, y, z)

As in ML, a function definition in λ-RTL may include more than one argument-pattern, so the two definitions of triple are equivalent.

Infix operators

λ-RTL supports user-defined infix operators with arbitrarily many levels of precedence. Precedence can be any real number, with arbitrary precision. Real numbers are written in decimal notation, with optional minus sign. If the precedence value is an integer, no decimal point is required. Note that ~0 is different from 0.

declaration ⇒ (infixl ∣∣ infixr ∣∣ infixn) precedence value-spec
            | nonfix value-spec

Infix operators must be declared to be left-associative, right-associative, or non-associative with other operators of the same precedence. For example, we can make equality testing infix and non-associative using

infixn 5 [= <>]

The standard library (Chapter 4) contains this and a number of other infix declarations for the standard RTL operators.

A single instance of an infix operator can be made “nonfix” (treated as an ordinary value) by enclosing it in parentheses. The nonfix declaration makes this behavior permanent.

Expressions

λ-RTL values include tuples, records, vectors, and functions (λ-abstractions), in addition to the cells, locations, and values of the underlying RTLs. λ-RTL provides expressions to create all of these kinds of values, as well as conditional expressions, let bindings, selection, slicing, function application, and type casting. As in ML, function application is written simply by writing one expression next to another, e.g., f x or g (x, y). Function application has higher precedence than any infix operator, but lower precedence than dot selection and slicing. Figure 3.2 shows the syntax, precedence, and associativity of λ-RTL expressions.

Most of λ-RTL’s expressions are standard and require no explanation, but type casts are unusual. Type casts are particularly important when dealing with references to cells in storage spaces (bytes in memory or registers in a register file). Like some type casts in C, but unlike type casts in ML, a λ-RTL type cast may involve computation. For example, casting a location to a value forces the insertion of a fetch operator (the fetch method of the underlying storage space). Casting a cell to a location forces the insertion of an aggregation. Function, record, and tuple values may also be cast, but users of λ-RTL will rarely need to do so.²

Patterns

Patterns create bindings of identifiers to values. Syntactically, they are subsets of expressions, but every occurrence of an identifier in a pattern is a binding instance. They are used to bind names to the arguments of functions or the values in val declarations.

pattern ⇒ ( pattern {, pattern} )

| { member-name [is pattern] {, member-name [is pattern]} }

| identifier

| identifier as pattern

The as syntax may be used to assign names both to a whole pattern and to its constituents, e.g., as in

val result as {sum, carry} is add(r1, r2, c)

Grouping

A group may appear in place of an expression anywhere in λ-RTL. Whereas an expression denotes a single value, a group denotes a sequence of values. All operations except binding distribute over the group, so if the expression in a declaration contains a group, then that expression denotes a sequence of values, no matter how deeply nested the group is. The most common operations to distribute over a group are function application and tuple formation.

The usual syntax for a group is a list of expressions in square brackets. The expressions need not be of the same type. The space separating elements of a group has precedence higher than function application (but lower than dot selection). Therefore, the group [f x] is a group containing two values, as is [f (x)]. To use function application or infix operators within a group, one must include the whole application in parentheses, e.g., [(f x)]

² For the adventurous, the usual rules for covariant and contravariant subtyping apply.


expression ⇒ ( expression {, expression} )                                  Tuple
  | { member-name is expression {, member-name is expression} }            Record
  | [. expression {, expression} .]                                        Vector
  | let {declaration} in expression end                                    Local declarations
  | if expression then expression else expression fi                       Conditional
  | \ argument-pattern . expression                                        Function
  | expression.member-name                                                 Selection
  | $ space-identifier [ expression ]                                      Location
  | expression @bits[slice-specification]                                  Slice of value
  | expression @loc[slice-specification]                                   Slice of location
  | expression expression                                                  Function application
  | expression : type                                                      Type conversion
  | identifier                                                             Named value
  | literal-constant                                                       Constant

Expression              Precedence         Fixity or Associativity
Dot selection           Highest            Postfix
Slice                   Highest            Postfix
Grouping                Next highest
Function application    Above all infix    Left associative
infixx operators        As declared        As declared
\-abstraction           Lowest             Right associative

Figure 3.2: λ-RTL expressions

or [(f(x))]. In addition to the usual syntax, λ-RTL provides two kinds of special syntax to create groups of integers. All use square brackets:

expression ⇒ group

group ⇒ [ {expression} ]

| [ expression .. expression ]

| [ expression to expression [by expression ∣∣ columns expression] ]

In the latter two cases, the constituent expressions must be compile-time constants.

Groups are evaluated left to right, last-in, first-out, so for example the expression

([a b], [1 2])

denotes the same sequence of four expressions as

[(a, 1) (a, 2) (b, 1) (b, 2)]

The evaluation rule and the design of groups in general are based on the generator mechanism of the Icon programming language (Griswold 1982).
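The enumeration order described above can be illustrated outside λ-RTL. The following Python sketch is only a model, not part of the λ-RTL translator: it assumes a group can be represented as a Python list, and the helper name enumerate_groups is hypothetical. It reproduces the ([a b], [1 2]) example, with the rightmost group varying fastest.

```python
from itertools import product


def enumerate_groups(*groups):
    """Enumerate a tuple of groups left to right.

    The leftmost group is fixed first and the rightmost group varies
    fastest, which matches the four-element sequence shown above.
    """
    return list(product(*groups))


# Model of the expression ([a b], [1 2]); names are plain strings here.
pairs = enumerate_groups(["a", "b"], [1, 2])
print(pairs)
```

Because tuple formation distributes over both groups, a tuple of a two-element group and a two-element group yields four tuples.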

The primary role of groups is to make it easy to specify multiple machine instructions in a single attribute binding. This is done by using value-spec (either a single name or a sequence of names in square brackets) to form the opcodes of constructors. Multiple value-specs may be used to combine, for example, a group of operations with a group of suffixes. Chapters ?? and 6 have examples.

opcode ⇒ value-spec {^value-spec}

In some cases, the default left-to-right, first-in, last-out rule for enumerating groups may be inconvenient or confusing. λ-RTL therefore provides some syntactic sugar.

expression ⇒ {foreach ident in group}+ group expression end

Before enumerating groups, the λ-RTL translator rewrites expressions of the form foreach x1 in e1 · · · foreach xn in en group e′ end into function applications of the form (\x1.· · ·\xn.e′) (e1) · · · (en). Because of this rewriting, using groups within the body of the foreach is likely to produce unexpected results.
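One way to read the rewriting is that the order of enumeration follows the order of the foreach clauses, not the order in which the bound names occur in the body. The Python sketch below models that reading under stated assumptions: groups are Python lists, the body is an ordinary function, and the helper name foreach is hypothetical. It is not the translator's algorithm.

```python
from itertools import product


def foreach(groups, body):
    """Model `foreach x1 in e1 ... foreach xn in en group body end`.

    The groups are enumerated in the order the clauses are written
    (the last clause varies fastest), and `body` is applied to each
    combination, mirroring (\\x1. ... \\xn. body) (e1) ... (en).
    """
    return [body(*combo) for combo in product(*groups)]


# The body mentions y before x, but enumeration follows the clause order.
result = foreach([["a", "b"], [1, 2]], lambda x, y: (y, x))
print(result)
```

Writing the tuple (y, x) directly with groups in place of the names would instead enumerate the group standing for y first.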

Lexical conventions

Like identifiers in ML, λ-RTL identifiers may be alphanumeric or symbolic. An alphanumeric identifier begins with a letter and continues with letters, digits, underscores, and primes. A symbolic identifier is a sequence of the symbols

!&+/\:<=>?@~|*‘%;-^#.

Such an identifier may begin with any symbol except #.

Identifiers consisting solely of two or more dashes introduce comments; the comment includes the dashes and continues to the end of the line. For example, -- and --- introduce comments, but --> and --*-- do not. A comment is syntactically equivalent to a newline.

An integer literal is a sequence of decimal digits; λ-RTL has no floating-point literals. λ-RTL also supports hexadecimal literals beginning with 0x, octal literals beginning with 0, and binary literals beginning with 0b.
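The comment rule depends on symbolic identifiers being read as maximal runs of symbol characters. The Python sketch below illustrates that rule only; it is not the λ-RTL lexer. It assumes the quote-like character in the symbol list is a backquote, ignores the restriction on a leading #, and the name strip_comment is hypothetical.

```python
# Symbol characters from the list above; the backquote is an assumption.
SYMBOL_CHARS = set("!&+/\\:<=>?@~|*`%;-^#.")


def strip_comment(line):
    """Return `line` with any trailing comment removed.

    A comment starts at a symbolic identifier (a maximal run of symbol
    characters) that consists solely of two or more dashes.
    """
    i = 0
    while i < len(line):
        if line[i] in SYMBOL_CHARS:
            j = i
            while j < len(line) and line[j] in SYMBOL_CHARS:
                j += 1
            token = line[i:j]
            if len(token) >= 2 and set(token) == {"-"}:
                return line[:i]      # comment runs to end of line
            i = j                    # skip an ordinary symbolic identifier
        else:
            i += 1
    return line


print(repr(strip_comment("val := is RTL.STORE -- type is #n loc")))
print(repr(strip_comment("val --> is RTL.GUARD")))
```

The second call leaves the line intact because --> is a single identifier that is not made up solely of dashes.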

Literals (and identifiers) preceded by # represent widths in λ-RTL’s type system. RTL operators and functions that are polymorphic in width may be specialized by applying them to a width literal in a width application. For example, four 8-bit bytes can be aggregated into a 32-bit word by

RTL.AGGB #8 #32

and the sx sign-extension operator can be specialized to extend a 13-bit value to 32 bits by

sx #13 #32

It is rarely necessary to supply these constants explicitly; λ-RTL almost always infers them from context.
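To make the role of the two widths concrete, here is a Python sketch of what the specialized operators compute on unsigned integers standing for bit vectors. It is an illustration under assumptions, not the definition of the RTL operators: widths are passed as ordinary arguments, and the function names aggb and sx merely echo the operator names.

```python
def aggb(cells, n, m):
    """Big-endian aggregation of n-bit cells into one m-bit value."""
    assert len(cells) * n == m, "cells must exactly fill the result"
    result = 0
    for cell in cells:               # first cell is most significant
        result = (result << n) | (cell & ((1 << n) - 1))
    return result


def sx(value, n, m):
    """Sign-extend the low n bits of `value` to m bits (requires m > n)."""
    assert 0 < n < m
    value &= (1 << n) - 1
    sign = 1 << (n - 1)
    return ((value ^ sign) - sign) & ((1 << m) - 1)


print(hex(aggb([0x12, 0x34, 0x56, 0x78], 8, 32)))
print(hex(sx(0x1FFF, 13, 32)))
```

The first call corresponds to the widths in RTL.AGGB #8 #32 and the second to sx #13 #32.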

The initial basis

Most of the RTL-specific content of λ-RTL is in the initial basis, i.e., the collection of predefined functions and values. Figure 3.3 shows λ-RTL’s initial basis. Of course, access to values within the RTL or Vector modules requires that the modules be imported.

Style

λ-RTL provides expressive power with few restrictions. Writers of specifications use the functions in λ-RTL’s initial basis to create RTLs that correspond to the form prescribed in Chapter 2. Writers can also introduce new RTL operators. This freedom gives us ample scope for experimenting with different styles of description, as well as ample rope with which to hang ourselves. Our initial experiments have led to a “standard library” of useful RTL operators, as well as some useful λ-RTL functions. Chapter 4 presents this library.

λ-RTL provides no looping or recursion constructs. Loops whose sizes are known in advance can be simulated by using Vector.foldr and Vector.spanning. Because this style will be familiar only to those well versed in functional programming, we expect eventually to provide some syntactic sugar for it.

Differences between λ-RTL and Standard ML

Although λ-RTL is inspired in large part by Standard ML, there are significant differences, which may interest ML programmers. Syntactic differences include:
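For readers less familiar with this style, the following Python sketch shows how a fold over a spanning vector stands in for a loop of known size. It models the two functions on Python lists; the argument order of foldr follows Standard ML's Vector.foldr, and the register-clearing strings are purely illustrative, not λ-RTL output.

```python
def spanning(x, y):
    """Model of Vector.spanning x y: the vector [x, x + 1, ..., y]."""
    return list(range(x, y + 1))


def foldr(f, init, vec):
    """Right fold: f(v0, f(v1, ... f(vn, init) ...))."""
    acc = init
    for v in reversed(vec):
        acc = f(v, acc)
    return acc


# A "loop" over registers 0..3 that collects one effect per register.
effects = foldr(lambda i, acc: ["r%d := 0" % i] + acc, [], spanning(0, 3))
print(effects)
```

The loop bound is fixed by the arguments to spanning, so the whole fold can be unrolled at compile time.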


Boolean constants:

true               Truth
false              Falsehood

RTL operators and other functions used to create RTLs:

RTL.STORE          Takes location and value, produces effect.
RTL.FETCH          Fetches value from location.
RTL.AGGB           Big-endian aggregation.
RTL.AGGL           Little-endian aggregation.
RTL.SKIP           The empty RTL; an effect that does nothing.
RTL.GUARD          Takes a Boolean expression and an effect and produces an effect.
RTL.PAR            Takes two effects (RTLs) and composes them so they take place simultaneously (list append on list of effects).
RTL.SEQ            Takes two effects (RTLs) and composes them by forward substitution (sequential composition of effects).

Functions on vectors:

sub                Vector subscript, a left-associative infix operator with precedence 5.
Vector.spanning    A curried function of two arguments. Vector.spanning x y produces the vector [.x, x + 1, . . . , y.].
Vector.foldr       A higher-order function used to visit every element of a vector. Like Vector.foldr in Standard ML ’97.

RTL.FETCH and RTL.STORE use the user’s fetch and store methods. To bypass fetch and store methods, and when defining fetch and store methods, use RTL.TRUE_FETCH and RTL.TRUE_STORE.

Figure 3.3: λ-RTL’s initial basis

• The defining connective is is, not =.

• if expressions must be terminated by fi.

• Identifiers declared infix must be explicitly identified as left-, right-, or non-associative. Arbitrarily many levels of precedence are available.

• Dot notation is used to select elements from records as well as from modules.

• Modules must be imported explicitly, and the from . . . import directive replaces ML’s open.

• Comments begin with an identifier consisting of n dashes, where n is at least 2. Comments end at the end of a line.

There are also some semantic restrictions:

• There are no side effects (mutation, exceptions), so order of evaluation does not matter.

• Functions may not be recursive, so the normal form of all expressions can be computed at compile time.

• There are no datatype constructors. As a corollary, the only patterns used in function definitions are those that match tuples or records.

Chapter 4

A Library of standard RTL operators

This chapter describes RTL operators and functions that we expect to be useful for describing many different machines. As we do throughout this document, we use the noweb literate-programming tool (Ramsey 1994) to present examples of λ-RTL. Noweb extracts the code from the same files used to produce this document, so we can run the examples through the λ-RTL compiler and make sure everything compiles.

A noweb file contains explanatory text interleaved with named “code chunks,” which contain source code and references to other code chunks. The names of chunks appear italicized and in angle brackets, as in the following C code:

20a 〈summarize the problem 20a〉≡ 20c ⊲

printf("Inputs are: ");

〈for every i that is an input, print i and a blank 20b〉

printf("\n");

20b 〈for every i that is an input, print i and a blank 20b〉≡ (20a)

for (i = 1; i < argc; i++)

printf("%s ", argv[i]);

The ≡ sign indicates the definition of a chunk. Definitions of a chunk can be continued in a later chunk; noweb concatenates their contents. Such a concatenation is indicated by a +≡ sign in the definition:

20c 〈summarize the problem 20a〉+≡ ⊳ 20a

printf("Must process %d files\n", argc-1);

noweb adds navigational aids to the document. Each chunk name ends with the number of the page on which the chunk’s definition begins. When more than one definition appears on a page, they are distinguished by appending lower-case letters to the page number. When a chunk’s definition is continued, noweb includes pointers to the previous and next definitions, written “⊳??” and “?? ⊲.” The notation “(?0—1)” shows where a chunk is used.
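The way continued definitions are concatenated and chunk references are expanded can be sketched in a few lines of Python. This is an illustrative model only, not noweb's implementation: the dictionary layout and the function name tangle are assumptions, chunk references use noweb's source-file spelling with double angle brackets, and indentation of nested chunks is not preserved.

```python
import re


def tangle(chunks, root):
    """Expand chunk `root` into a flat list of code lines.

    `chunks` maps a chunk name to a list of definitions; each definition
    is a list of lines. Continued definitions are concatenated in order.
    """
    out = []
    for definition in chunks[root]:
        for line in definition:
            ref = re.fullmatch(r"\s*<<(.+)>>\s*", line)
            if ref:
                out.extend(tangle(chunks, ref.group(1)))  # chunk reference
            else:
                out.append(line)
    return out


chunks = {
    "summarize the problem": [
        ['printf("Inputs are: ");',
         "<<for every i that is an input, print i and a blank>>",
         r'printf("\n");'],
        [r'printf("Must process %d files\n", argc-1);'],  # continuation
    ],
    "for every i that is an input, print i and a blank": [
        ["for (i = 1; i < argc; i++)", '  printf("%s ", argv[i]);'],
    ],
}

for line in tangle(chunks, "summarize the problem"):
    print(line)
```

The output is the C fragment from the example above with the inner chunk spliced in and the continued definition appended.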

The layout of the module is as follows:

20d 〈basic.rtl 20d〉≡

module StdOperators is

import RTL

〈definitions of standard RTL operators and functions 21a〉

end


Because λ-RTL does not yet have modules, we use noweb to include the chunk 〈definitions of standard RTL operators and functions 21a〉 in our descriptions of the SPARC and the Pentium.

4.1 Building RTLs

λ-RTL is intended for building RTLs, not analyzing them, so it doesn’t expose all the structure of RTLs; single effects, guarded effects, and full RTLs (lists of guarded effects) all have the same type effect. The initial basis provides ways to create and combine effects. We define a more compact, infix notation for parts of the basis.

21a 〈definitions of standard RTL operators and functions 21a〉≡ (20d) 21b ⊲

val do_nothing is RTL.SKIP : effect

val --> is RTL.GUARD : bool * effect -> effect

val := is RTL.STORE -- type is #n loc * #n bits -> effect

infixr 1 -->

infixn 2 :=

In the underlying representation, the semantics of these operations is extended to full RTLs in the obvious way. For example, RTL.STORE produces an RTL containing a single effect whose guard is true.

We don’t need an explicit FETCH, because fetches are inserted automatically, and we don’t need an explicit CONST, because λ-RTL compiles literal constants into RTL CONST nodes.

We use a vertical bar to combine simultaneous effects, and a semicolon for sequential composition. One day, the λ-RTL translator will translate sequential composition to simultaneous composition, using forward substitution.

21b 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 21a 21c ⊲

val | is RTL.PAR : effect * effect -> effect

val ; is RTL.SEQ : effect * effect -> effect

infixl 0 |

infixl ~1 ;

4.2 Booleans

true and false are values in λ-RTL’s initial basis. We use not for Boolean negation, and conjoin and disjoin for conjunction and disjunction. (The bitwise operations on bit vectors are com, and, and or.)

21c 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 21b 22c ⊲

rtlop not : bool -> bool

rtlop [conjoin disjoin] : bool * bool -> bool

For writing other Boolean functions λ-RTL has only if-then-else-fi, which λ-RTL rewrites into suitable guards. We would like to have connectives andalso and orelse, but introducing them presents a choice about level of detail: should they be introduced as new RTL operators, whose semantics must be defined outside of λ-RTL, or should they be defined in terms of the existing λ-RTL primitives? We would like to try both alternatives, because they provide different levels of detail in the generated RTLs.

Eventually, λ-RTL will have mechanisms to support multiple levels of abstraction, probably by using some sort of parameterized modules. For now, we resort to some literate-programming tricks that are equivalent to conditional compilation. noweb can extract this code in two versions, designated simple and full. Code chunks tagged [[simple]] and [[full]] appear only in the corresponding versions, and untagged code chunks appear in both versions.

In the simple version, the Boolean connectives are left abstract. In the full version, we give them their standard definitions. The simple version has the advantage that the RTLs are smaller and easier to read, but the disadvantage that the semantics have to be specified elsewhere. In the full version, these connectives are expanded to primitives, so they automatically get their proper semantics.

22a 〈definitions of standard RTL operators and functions [[simple]] 22a〉≡ 22e ⊲

val [andalso orelse] is [conjoin disjoin]

Conditional definition.

22b 〈definitions of standard RTL operators and functions [[full]] 22b〉≡ 22f ⊲

fun andalso (p, q) is if p then q else false fi

fun orelse (p, q) is if p then true else q fi


No matter what their definitions, we want these connectives to be left-associative infix operators.

22c 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 21c 22d ⊲

infixl 3 orelse

infixl 4 andalso

In the long run, equality and inequality will probably have to have some built-in semantics, but for now, we leave them abstract.

22d 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 22c 23a ⊲

rtlop [eq ne] : #n bits * #n bits -> bool

val [= <>] is [eq ne]

infixn 5 [= <>]

These operators are shown in TEX as = and ≠. We have chosen to make inequality a primitive RTL operator, although we could have defined it in terms of equality by writing val <> is \(x,y). not (x = y).

We can convert between booleans and bits. Again we have the choice of making things abstract or concrete.

22e 〈definitions of standard RTL operators and functions [[simple]] 22a〉+≡ ⊳ 22a

rtlop bit : bool -> #1 bits -- turn boolean into bit

rtlop bool : #1 bits -> bool -- turn bit into boolean

22f 〈definitions of standard RTL operators and functions [[full]] 22b〉+≡ ⊳ 22b

fun bit p is if p then 1 else 0 fi

fun bool n is n <> 0


4.3 Integer arithmetic and comparisons

These operations represent integer arithmetic performed on the bit-vector representation. Here are addition and subtraction. We have to define addition and carry separately, because λ-RTL won’t permit RTL operators that return records or tuples.¹

23a 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 22d 23b ⊲

local

rtlop [add sub] : #n bits * #n bits -> #n bits

in

val [+ -] is [add sub]

end

rtlop [carry borrow] : #n bits * #n bits * #1 bits -> #1 bits

rtlop neg : #n bits -> #n bits -- two’s-complement negation

Note that this definition hides the names add and sub, using + and - instead.

To define combined versions add and subtract, we need sign extension and zero extension. These operators are polymorphic in the sizes of the arguments and results.


rtlop [sx zx] : #n bits -> #m bits -- must have m > n

Now we can define add and subtract to combine result and carry/borrow.

23c 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 23b 23d ⊲

infixl 6 [+ -]

fun add (n, m, c) is { result is n + m + zx c, carry is carry (n, m, c) }

fun subtract (n, m, b) is { result is n - m - zx b, borrow is borrow (n, m, b) }
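As a rough model of what these two functions denote, the Python sketch below computes the result and the carry or borrow on unsigned integers standing for bit vectors. It is an assumption-laden illustration: the width is passed explicitly (λ-RTL infers it), and the carry and borrow bits are computed from the arithmetic rather than left as abstract RTL operators.

```python
def add(n, m, c, width):
    """Add with carry-in c; returns the width-bit sum and the carry out."""
    mask = (1 << width) - 1
    total = (n & mask) + (m & mask) + (c & 1)
    return {"result": total & mask, "carry": total >> width}


def subtract(n, m, b, width):
    """Subtract with borrow-in b; returns the difference and borrow out."""
    mask = (1 << width) - 1
    diff = (n & mask) - (m & mask) - (b & 1)
    return {"result": diff & mask, "borrow": 1 if diff < 0 else 0}


print(add(0xFF, 0x01, 0, 8))
print(subtract(0x00, 0x01, 0, 8))
```

The record-like dictionaries mirror the { result, carry } and { result, borrow } records built by the λ-RTL definitions.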

The intended semantics of add and subtract is “add with carry” and “subtract with borrow.”

Integer multiplication and division may be signed or unsigned. There are two sets of signed division operators because division may truncate towards −∞ (div and mod) or towards 0 (quot and rem). There is, of course, no such distinction for unsigned division.

23d 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 23c 24a ⊲

rtlop [mul mulu] : #n bits * #n bits -> #m bits -- m = 2 * n

rtlop [div divu quot] : #m bits * #n bits -> #m bits -- m = n or m = 2n

rtlop [mod modu rem] : #m bits * #n bits -> #n bits -- m = n or m = 2n

The λ-RTL type system is unable to express the constraints on the types of multiplication and division. We could play some games with the definitions (e.g., return two words and merge them with bitInsert), but it’s not worth it at this time.

This code highlights the unpleasant question of naming RTL operators. As of July 1999, the names used in rtlop declarations are consistent with the names used in the VPOI code-generation interfaces. These names are, however, re-bound to other names, which are used in the actual machine descriptions. Eventually, it is hoped that VPO, λ-RTL, C--, and the Princeton proof-carrying code project will agree on a set of names for operators.

¹ This restriction is not strictly necessary, but we believe it should simplify construction of automatic RTL-based tools.
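The difference between the two families of signed division shows up only when the operands have opposite signs. The Python sketch below works on mathematical integers rather than bit vectors, so it illustrates the rounding behaviour only; the function names echo the operator names but are not the RTL operators themselves.

```python
def div(a, b):
    """Quotient rounded toward minus infinity."""
    return a // b


def mod(a, b):
    """Remainder paired with div; takes the sign of the divisor."""
    return a % b


def quot(a, b):
    """Quotient rounded toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def rem(a, b):
    """Remainder paired with quot; takes the sign of the dividend."""
    return a - b * quot(a, b)


print(div(-7, 2), mod(-7, 2))
print(quot(-7, 2), rem(-7, 2))
```

In both families the quotient and remainder satisfy a = b * q + r; they differ only in which way the quotient is rounded.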


Two’s-complement signed integer comparison.

24a 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 23d 24b ⊲

local

rtlop [lt le gt ge] : #n bits * #n bits -> bool

in

val [< <= > >=] is [lt le gt ge]

end

Comparisons are left abstract instead of being defined in terms of < and =.

Unsigned comparison.


rtlop [ltu leu gtu geu] : #n bits * #n bits -> bool

The arithmetic operators obey the usual rules of precedence.

24c 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 24b 24d ⊲

infixl 7 [div mod quot rem divu modu]

infixl 6 [+ -]

infixn 5 [< <= > >=]

4.4 Logical operators

Most processors offer a variety of bitwise operations on bit vectors, including those shown here.

24d 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 24c 24e ⊲

rtlop [and or xor] : #n bits * #n bits -> #n bits

rtlop com : #n bits -> #n bits -- ‘complement’ because ‘not’ is boolean

rtlop bitInsert : {lsb : #k bits, wide : #w bits} * #n bits -> #w bits

rtlop bitExtract : {lsb : #k bits, wide : #w bits} -> #n bits

val bitInsert is \x.\y.bitInsert (x, y) -- curry

bitInsert inserts into a w-bit (“wide”) value. The thing inserted is n bits (“narrow”), and the widths of both wide and narrow values are normally inferred by the type system. bitExtract is a generalization of λ-RTL’s built-in slicing operator. The difference is that the width of the narrow value returned by bitExtract is implicit in its (specialized) type, and that width can be a width variable, not just a constant.

There are instructions that manipulate bit fields whose widths aren’t known statically. These computations can’t be expressed with bitInsert and slicing, because we can’t assign a type (width) to the results. But we can define bitTransfer as a combination of extraction and insertion, and in fact, this is the way such instructions must behave. Even an instruction that extracts k bits from a 32-bit word, where k is not known until run time, must still insert the result into some value of known size, typically a 32-bit word of all zeros.

24e 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 24d 25a ⊲

rtlop bitTransfer :

{ src : {value : #w bits, lsb : #k bits}

, dst : {value : #w bits, lsb : #k bits}

, width : #k bits

} -> #w bits

-- extract width bits from src at lsb, insert into dst at lsb


                                        src.lsb
                                           ↓
src.value  s11 s10 s9  s8  s7  s6  s5  s4  s3  s2  s1  s0

                        dst.lsb
                           ↓
dst.value  d11 d10 d9  d8  d7  d6  d5  d4  d3  d2  d1  d0

result     d11 d10 s5  s4  s3  d6  d5  d4  d3  d2  d1  d0

result = bitTransfer { src is { value is s, lsb is 3}
                     , dst is { value is d, lsb is 7}
                     , width is 3
                     }

Figure 4.1: Example of bitTransfer

bitTransfer {src, dst, width} extracts width bits from src.value, with the least significant bit being src.lsb, and it inserts those bits into dst.value, with the least significant bit inserted at position dst.lsb. The width and lsb specifiers are assumed to have the same type. Figure 4.1 shows an example.

Most processors have shift instructions. We make the shift operations primitive RTL operators, even though they can be defined in terms of bit transfer.
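The behaviour described above can be modelled with shifts and masks. The following Python sketch is one plausible reading, not the operator's official semantics: values are unsigned integers, the word width w is passed explicitly, bits that would fall beyond w are discarded (an assumption), and the name bit_transfer and its flat argument list are hypothetical stand-ins for the record argument.

```python
def bit_transfer(src_value, src_lsb, dst_value, dst_lsb, width, w):
    """Copy `width` bits of src_value starting at src_lsb into
    dst_value starting at dst_lsb; all values are w bits wide."""
    mask = (1 << width) - 1
    field = (src_value >> src_lsb) & mask          # extract from source
    cleared = dst_value & ~(mask << dst_lsb)       # make room in destination
    return (cleared | (field << dst_lsb)) & ((1 << w) - 1)


# The situation of Figure 4.1 with 12-bit values: src.lsb = 3,
# dst.lsb = 7, width = 3. Source bits s5 s4 s3 are 1 0 1; d is all ones.
s = 0b000000101000
d = 0b111111111111
print(bin(bit_transfer(s, 3, d, 7, 3, 12)))
```

Bits 9 through 7 of the result come from the source field, and every other bit is taken unchanged from the destination.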

25a 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 24e 25b ⊲

rtlop [shl shrl shra] : #n bits * #k bits -> #n bits

At least one processor, the IA64, has a “population count” operation that examines a value and returns the number of bits that are set. It’s not clear how polymorphic it should be; for the moment, I make the type of the return value arbitrary, but it might be more convenient to make it the same as the type of the argument.


rtlop popcnt : #n bits -> #k bits

Bit transfer can also be used to implement rotation, but it’s possible rotation should also be primitive.

25c 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 25b 26a ⊲

fun rotl (w, n, k) is

let val w' is bitTransfer { src is {value is n, lsb is w-k} -- top k bits of n

, dst is {value is 0, lsb is 0} -- become the low k bits

, width is k

}

in bitTransfer { src is {value is n, lsb is 0} -- low w-k bits of n

, dst is {value is w', lsb is k} -- move up by k positions

, width is w - k

}

end


26a 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 25c 26b ⊲

fun rotr (w, n, k) is

let val w' is bitTransfer { src is {value is n, lsb is 0} -- low k bits of n

, dst is {value is 0, lsb is w-k} -- become the top k bits

, width is k

}

in bitTransfer { src is {value is n, lsb is k} -- top w-k bits of n

, dst is {value is w', lsb is 0} -- move down by k positions

, width is w - k

}

end
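Rotation as two bit transfers can be checked with a small Python model. The sketch below is an illustration under assumptions: it uses a hypothetical bit_transfer helper on unsigned integers with an explicit word width, and in rotl the low w − k bits are inserted at position k, which is where a left rotation by k places them.

```python
def bit_transfer(src_value, src_lsb, dst_value, dst_lsb, width, w):
    """Copy `width` bits from src_value at src_lsb into dst_value at dst_lsb."""
    mask = (1 << width) - 1
    field = (src_value >> src_lsb) & mask
    cleared = dst_value & ~(mask << dst_lsb)
    return (cleared | (field << dst_lsb)) & ((1 << w) - 1)


def rotl(w, n, k):
    """Rotate the w-bit value n left by k bits."""
    top = bit_transfer(n, w - k, 0, 0, k, w)       # top k bits go to the bottom
    return bit_transfer(n, 0, top, k, w - k, w)    # low w-k bits move up by k


def rotr(w, n, k):
    """Rotate the w-bit value n right by k bits."""
    low = bit_transfer(n, 0, 0, w - k, k, w)       # low k bits go to the top
    return bit_transfer(n, k, low, 0, w - k, w)    # top w-k bits move down by k


print(bin(rotl(8, 0b10010011, 3)))
print(bin(rotr(8, 0b10010011, 3)))
```

Rotating left and then right by the same amount returns the original value, which is a convenient sanity check for both definitions.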

4.5 Undefined effects

As of July 1999, the λ-RTL translator does not support kill effects. In a later implementation, we plan to use the question mark to stand for an “undefined” value. All λ-RTL functions and RTL operators will be strict in this value, and RTL.STORE(loc, ?) will become RTL.KILL loc. Because we would like to use the question mark in our example descriptions, even though λ-RTL doesn’t yet have the machinery to give it its eventual meaning, we introduce it as a constant. (It used to be a nullary RTL operator, but Andrew Appel argued it shouldn’t be an operator, so I make it a value.)


local

-- rtlop undefined : #k bits

val undefined is 0

in

val ? is undefined

end

4.6 Floating-point arithmetic

Floating-point operations need to come in different families, so for example we could distinguish IEEE 754 floating point from VAX floating point. It therefore seems reasonable to put these operators in their own nested module.

26c 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 26b

module IEEE754 is

〈IEEE 754 floating point 26d〉

end

Many floating-point operations that we normally think of as binary can’t be treated as binary in λ-RTL, because their results depend on the rounding mode, which must be passed as an additional argument.

26d 〈IEEE 754 floating point 26d〉≡ (26c) 27a ⊲

rtlop [fneg fabs] : #k bits -> #k bits -- exact (no rounding)

rtlop [fsqrt] : #k bits * #2 bits -> #k bits -- rounded

rtlop [fadd fsub fmul fdiv] : #k bits * #k bits * #2 bits -> #k bits

rtlop [fmulx] : #k bits * #k bits -> #n bits -- where #n = 2 #k

All these operators are polymorphic in the size of their operands, so they can be used for single-, double-, and extended-precision floats. fmulx (for “exact”) produces an exact, double-size result, without rounding.

There are four rounding modes required by the standard. It’s unfortunate that we seem to have to pick integer representations for these rounding modes, instead of leaving them more abstract, but λ-RTL does not currently permit users to introduce abstract types.

27a 〈IEEE 754 floating point 26d〉+≡ (26c) ⊳ 26d 27b ⊲

module Round is

local

rtlop [round_nearest round_zero round_down round_up] : #2 bits

-- what to round toward

in

val [nearest zero down up] is [round_nearest round_zero round_down round_up]

end

end

Conversions are polymorphic in two widths: the argument and result widths. The argument of type #2 bits represents the rounding mode.

27b 〈IEEE 754 floating point 26d〉+≡ (26c) ⊳ 27a 27c ⊲

rtlop [i2f f2i f2f] : #k bits * #2 bits -> #n bits

There are the special values ±0 and ±∞.

27c 〈IEEE 754 floating point 26d〉+≡ (26c) ⊳ 27b 27d ⊲

rtlop [pzero mzero] : #n bits

rtlop [pinf minf] : #n bits

There are NaNs.

27d 〈IEEE 754 floating point 26d〉+≡ (26c) ⊳ 27c 27e ⊲

rtlop NaN : #k bits -> #n bits -- function from significand to value

A floating-point comparison produces one of four results: less, equal, greater, or unordered. Because we don’t want to fix particular 2-bit representations of these results, we create RTL operators to describe them.

27e 〈IEEE 754 floating point 26d〉+≡ (26c) ⊳ 27d

module Comparison is

rtlop [float_lt float_eq float_gt unordered] : #2 bits

val [< = >] is [float_lt float_eq float_gt]

end

rtlop fcmp : #k bits * #k bits -> #2 bits
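The four-way result can be illustrated with a Python sketch that keeps the outcomes abstract, in the spirit of the chunk above. It is a model under assumptions: Python floats stand in for the bit patterns, an Enum stands in for the four #2 bits constants, and the class name Comparison simply echoes the module name.

```python
import math
from enum import Enum


class Comparison(Enum):
    """Abstract results of a floating-point comparison."""
    FLOAT_LT = "float_lt"
    FLOAT_EQ = "float_eq"
    FLOAT_GT = "float_gt"
    UNORDERED = "unordered"


def fcmp(x, y):
    """Compare two floats; any NaN operand makes the result unordered."""
    if math.isnan(x) or math.isnan(y):
        return Comparison.UNORDERED
    if x < y:
        return Comparison.FLOAT_LT
    if x > y:
        return Comparison.FLOAT_GT
    return Comparison.FLOAT_EQ


print(fcmp(1.0, 2.0))
print(fcmp(float("nan"), 1.0))
```

Callers test the result against the named constants and never depend on a particular 2-bit encoding.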

Chapter 5

The meanings of the standard RTL operators

This chapter is just a big table that lists all the RTL operators with their suggested standard names, types, and meanings.

Although this specification accounts for the rounding modes used in IEEE floating-point arithmetic, it doesn’t account for other state, such as the state controlling whether denormalized numbers should be used.

Operator (with type)    Meaning

IEEE 754 Floating-point operations

fcmp : ∀#n.#n bits × #n bits → #2 bits
    Standard comparison of IEEE 754 floating-point values. Produces one of the four 2-bit results specified below. For convenience, we also provide (below) eight of the sixteen possible Boolean functions of the results of a floating-point comparison.

float_lt : #2 bits
    Result of floating-point comparison fcmp (x, y) was “less than”.

float_eq : #2 bits
    Result of floating-point comparison fcmp (x, y) was “equal”.

float_gt : #2 bits
    Result of floating-point comparison fcmp (x, y) was “greater than”.

unordered : #2 bits
    Result of floating-point comparison fcmp (x, y) was “unordered”.

feq : ∀#n.#n bits × #n bits → bool
    Floating equals.

fge : ∀#n.#n bits × #n bits → bool
    Floating greater than or equal.

fgt : ∀#n.#n bits × #n bits → bool
    Floating greater than.

fle : ∀#n.#n bits × #n bits → bool
    Floating less than or equal.



flt : ∀#n.#n bits × #n bits → bool
    Floating less than.

fne : ∀#n.#n bits × #n bits → bool
    Floating not equals.

fordered : ∀#n.#n bits × #n bits → bool
    Floating ordered. Compares floating-point numbers to see if they are ordered by any of {<, =, >}.

funordered : ∀#n.#n bits × #n bits → bool
    Floating unordered. Compares floating-point numbers to see if they are not ordered by any of {<, =, >}.

NaN : ∀#n.∀#m.#n bits → #m bits
    NaN n is an IEEE 754 NaN with significand n.

round_down : #2 bits
    IEEE 754 rounding mode for rounding down.

round_up : #2 bits
    IEEE 754 rounding mode for rounding up.

round_nearest : #2 bits
    IEEE 754 rounding mode for rounding to nearest.

round_zero : #2 bits
    IEEE 754 rounding mode for rounding toward zero.

f2f : ∀#n.∀#m.#n bits × #2 bits → #m bits
    Converts an IEEE 754 floating-point value of n bits to a floating-point value of m bits, using the rounding mode given as the second argument.

f2i : ∀#n.∀#m.#n bits × #2 bits → #m bits
    Converts an IEEE 754 floating-point value of n bits to a two’s-complement integer value of m bits, using the rounding mode given as the second argument.

i2f : ∀#n.∀#m.#n bits × #2 bits → #m bits
    Converts a two’s-complement integer of n bits into an IEEE 754 floating-point value of m bits, using the rounding mode given as the second argument.

fadd : ∀#n.#n bits × #n bits × #2 bits → #n bits
    Floating-point add with explicit rounding mode.

fsub : ∀#n.#n bits × #n bits × #2 bits → #n bits
    Floating-point subtract with explicit rounding mode.

fdiv : ∀#n.#n bits × #n bits × #2 bits → #n bits
    Floating-point divide with explicit rounding mode.

fmul : ∀#n.#n bits × #n bits × #2 bits → #n bits
    Floating-point multiply with explicit rounding mode.

fmulx : ∀#n.∀#m.#n bits × #n bits → #m bits
    Exact floating-point multiply, with no rounding, where m = 2n.



fabs : ∀#n.#n bits → #n bits
    Absolute value of a floating-point value.

fneg : ∀#n.#n bits → #n bits
    Negation of a floating-point value.

fsqrt : ∀#n.#n bits × #2 bits → #n bits
    Square root of a floating-point value, with explicit rounding mode.

minf : ∀#n.#n bits
    IEEE 754 representation of minus infinity.

mzero : ∀#n.#n bits
    IEEE 754 representation of minus zero.

pinf : ∀#n.#n bits
    IEEE 754 representation of plus infinity.

pzero : ∀#n.#n bits
    IEEE 754 representation of plus zero.

Other operations, alphabetical by name

add : ∀#n.#n bits × #n bits → #n bits
    Two’s-complement integer addition.

add_overflows : ∀#n.#n bits × #n bits → bool
    Tells whether addition overflows. If x and y are considered as signed, two’s-complement integers, add_overflows(x, y) iff x and y have the same sign and add(x, y) has the opposite sign. Overflow of unsigned addition may be checked using carry.

and : ∀#n.#n bits × #n bits → #n bits
    Bitwise and. (Boolean conjunction is conjoin.)

bit : bool → #1 bits
    Convert a Boolean to a 1-bit value. May not be primitive in all systems, e.g., bit x = if x then 1 else 0.

bitExtract : ∀#w.∀#n.{lsb : #w bits, wide : #w bits} → #n bits
    bitExtract #w #n {lsb, wide} returns the n-bit (“narrow”) value extracted from the w-bit value wide, with least significant bit at bit lsb of wide. (lsb is taken to be a w-bit value, even though only ⌈log2 w⌉ bits are needed.) It is unusual in that the width of the value extracted, #n, is part of the type to which bitExtract is specialized, not a run-time argument.

bitInsert : ∀#w.∀#n.{lsb : #w bits, wide : #w bits} × #n bits → #w bits
    bitInsert inserts into a w-bit (“wide”) value. The thing inserted is n bits (“narrow”), and the result is a new w-bit value.

bitTransfer : ∀#w.{dst : {lsb : #w bits, value : #w bits}, src : {lsb : #w bits, value : #w bits}, width : #w bits} → #w bits



    There are instructions that manipulate bit fields whose widths aren’t known statically. These computations can’t be expressed with bitInsert and slicing, because we can’t assign a type (width) to the results. But we can define bitTransfer as a combination of extraction and insertion, and in fact, this is the way such instructions must behave. Even an instruction that extracts k bits from a 32-bit word, where k is not known until run time, must still insert the result into some value of known size, typically a 32-bit word of all zeros.

bool : #1 bits → bool
    Convert a 1-bit value to Boolean. May not be primitive in all systems, e.g., bool x ≡ (x ≠ 0).

borrow : ∀#n.#n bits × #n bits × #1 bits → #1 bits
    The bit borrow(x, y, bin) is the “borrow out” bit needed when computing the subtraction x − y − bin, where x and y are two’s-complement n-bit integers, and bin is the “borrow in” bit.

carry : ∀#n.#n bits × #n bits × #1 bits → #1 bits
    The bit carry(x, y, cin) is the “carry out” bit produced by adding x + y + cin, where x and y are two’s-complement n-bit integers, and cin is the “carry in” bit.

com : ∀#n.#n bits → #n bits
    Bitwise complement of a vector of n bits. (The operator not is used for complementing Booleans.)

conjoin : bool × bool → bool
    Boolean conjunction. May not be primitive in all systems, e.g., x conjoin y ≡ if x then y else false.

disjoin : bool × bool → bool
    Boolean disjunction. May not be primitive in all systems, e.g., x disjoin y ≡ if x then true else y.

div : ∀#n.#n bits × #n bits → #n bits
    Divides an n-bit, two’s-complement, signed integer by an n-bit, two’s-complement, signed integer, rounding toward minus infinity, and producing an n-bit result.



divu : ∀#n.#n bits × #n bits → #n bits Divides an n-bit, unsigned integer by an n-bit, unsigned integer, rounding down (toward both zero and minus infinity), and producing an n-bit result.

eq : ∀#n.#n bits × #n bits → bool Equals. Compares bit vectors for equality.

ge : ∀#n.#n bits × #n bits → bool Greater than or equal. Compares two’s-complement signed integers.

geu : ∀#n.#n bits × #n bits → bool Greater than or equal, unsigned. Compares unsigned integers.

gt : ∀#n.#n bits × #n bits → bool Greater than. Compares two’s-complement signed integers.

gtu : ∀#n.#n bits × #n bits → bool Greater than, unsigned. Compares unsigned integers.

le : ∀#n.#n bits × #n bits → bool Less than or equal. Compares two’s-complement signed integers.

leu : ∀#n.#n bits × #n bits → bool Less than or equal, unsigned. Compares unsigned integers.

lt : ∀#n.#n bits × #n bits → bool Less than. Compares two’s-complement signed integers.

ltu : ∀#n.#n bits × #n bits → bool Less than, unsigned. Compares unsigned integers.

mod : ∀#n.#n bits × #n bits → #n bits Signed modulus, satisfying x mod y = x − y mul_trunc (x div y) for all y ≠ 0.

modu : ∀#n.#n bits × #n bits → #n bits Unsigned modulus, satisfying x modu y = x − y mul_trunc (x divu y) for all y ≠ 0.

mul : ∀#n.∀#m.#n bits × #n bits → #m bits Multiply two n-bit, two’s-complement, signed integers to produce an m-bit, two’s-complement, signed result, where m = 2n.

mul_overflows : ∀#n.#n bits × #n bits → bool Tells whether signed multiplication overflows n bits. If x and y are considered as signed, two’s-complement integers, mul_overflows(x, y) iff any of the most significant n bits of mul(x, y) differs from bit n − 1 of mul(x, y).

mulu : ∀#n.∀#m.#n bits × #n bits → #m bits Multiply two n-bit, unsigned integers to produce an m-bit, unsigned result, where m = 2n.

mulu_overflows : ∀#n.#n bits × #n bits → bool Tells whether unsigned multiplication overflows n bits. If x and y are considered as unsigned integers, mulu_overflows(x, y) iff any of the most significant n bits of mulu(x, y) is nonzero.

mul_trunc : ∀#n.∀#m.#n bits × #n bits → #n bits Multiply two n-bit integers to produce a 2n-bit result and return only the least significant n bits. Could be written using mul and bitExtract. We don’t need separate signed and unsigned versions because they are identical in the least significant n bits. For a proof, write x = x_{0..n−1} + s · x_{n−1} · 2^{n−1} and similarly for y, where s is the sign bit and is ±1. Then show that (x · y) mod 2^n is independent of s. The only tricky bit is to realize that for any k, −s · k · 2^{n−1} = s · k · 2^{n−1} (mod 2^n).

ne : ∀#n.#n bits × #n bits → bool Comparison of bit vectors (inequality).

neg : ∀#n.#n bits → #n bits Two’s-complement negation.

not : bool → bool Logical complement.

or : ∀#n.#n bits × #n bits → #n bits Bitwise or. (Boolean disjunction is disjoin.)

popcnt : ∀#n.#n bits → #n bits Computes the number of 1 bits in its argument.

quot : ∀#n.#n bits × #n bits → #n bits Divides an n-bit, two’s-complement, signed integer by an n-bit, two’s-complement, signed integer, rounding toward zero, and producing an n-bit result.

rem : ∀#n.#n bits × #n bits → #n bits Signed remainder, satisfying x rem y = x − y mul_trunc (x quot y) for all y ≠ 0.

rotl : ∀#n.#n bits × #n bits → #n bits Rotate left. rotl(x, d) shifts value x d bits to the left, shifting in the most significant d bits of x at the bottom. Or, more precisely, the least significant d bits of the result are the most significant d bits of x, and the remaining bits are the least significant n − d bits of x. (When d ≥ n, read d mod n for d and (n − d) mod n for n − d.) Can be expressed using bitTransfer, but the equation is not for the faint of heart: the n in #n must be part of the equation.

rotr : ∀#n.#n bits × #n bits → #n bits Rotate right. rotr(x, d) shifts value x d bits to the right, shifting in the least significant d bits of x at the top. Or, more precisely, the most significant d bits of the result are the least significant d bits of x, and the remaining bits are the most significant n − d bits of x. (When d ≥ n, read d mod n for d and (n − d) mod n for n − d.) Can also be expressed using bitTransfer.

shl : ∀#n.#n bits × #n bits → #n bits Shift left. shl(x, d) shifts value x d bits to the left, shifting in zeroes at the bottom. Or, more precisely, the least significant d bits of the result are zero, and the remaining bits are taken from the least significant n − d bits of x. (If d ≥ n, the result is zero.) Or, alternatively, shl(x, d) = (x mulu 2^d) modu 2^n. Can also be expressed using bitTransfer.

shra : ∀#n.#n bits × #n bits → #n bits Shift right, arithmetic. shra(x, d) shifts value x d bits to the right, shifting in copies of x’s sign bit at the top. Or, more precisely, the most significant d bits of the result are copies of x’s most significant bit, and the remaining bits are taken from the most significant n − d bits of x. (If d ≥ n, the result is all zeroes or all ones.) Or, alternatively, shra(x, d) = x div 2^d. Can also be expressed using bitTransfer.

shrl : ∀#n.#n bits × #n bits → #n bits Shift right, logical. shrl(x, d) shifts value x d bits to the right, shifting in zeroes at the top. Or, more precisely, the most significant d bits of the result are zeroes, and the remaining bits are taken from the most significant n − d bits of x. (If d ≥ n, the result is zero.) Or, alternatively, shrl(x, d) = x divu 2^d. Can also be expressed using bitTransfer.

sub : ∀#n.#n bits × #n bits → #n bits Two’s-complement integer subtraction.

sub_overflows : ∀#n.#n bits × #n bits → bool Tells whether subtraction overflows. If x and y are considered as signed, two’s-complement integers, sub_overflows(x, y) iff x and y have opposite signs and sub(x, y) has the same sign as y. Overflow of unsigned subtraction may be checked using borrow.

sx : ∀#n.∀#m.#n bits → #m bits Sign-extend an n-bit, two’s-complement, signed integer to produce an m-bit, two’s-complement, signed integer, both argument and result denoting the same integer value when interpreted as two’s-complement integers. Requires m > n.

xor : ∀#n.#n bits × #n bits → #n bits Bitwise exclusive or.

zx : ∀#n.∀#m.#n bits → #m bits Zero-extend an n-bit value to produce an m-bit value, both argument and result denoting the same integer value when interpreted as unsigned integers. Requires m > n.
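
The differences among div, quot, mod, and rem, and the claim that mul_trunc needs no separate signed and unsigned versions, can be checked exhaustively at a small width. The following is a minimal Python sketch, not part of λ-RTL: bit vectors are modelled as non-negative Python integers, the function names merely mirror the table, and the 8-bit width used by check is an arbitrary assumption that keeps the search small.

```python
import math

def mask(n):
    return (1 << n) - 1

def to_signed(x, n):
    """Interpret the n-bit vector x as a two's-complement integer."""
    return x - (1 << n) if x >> (n - 1) else x

def div(x, y, n):
    """Signed division rounding toward minus infinity."""
    return (to_signed(x, n) // to_signed(y, n)) & mask(n)

def quot(x, y, n):
    """Signed division rounding toward zero."""
    a, b = to_signed(x, n), to_signed(y, n)
    q = abs(a) // abs(b)
    return (q if (a < 0) == (b < 0) else -q) & mask(n)

def mul_trunc(x, y, n):
    """Least significant n bits of the product."""
    return (x * y) & mask(n)

def mod(x, y, n):
    return (x - mul_trunc(y, div(x, y, n), n)) & mask(n)

def rem(x, y, n):
    return (x - mul_trunc(y, quot(x, y, n), n)) & mask(n)

def add_overflows(x, y, n):
    """True iff x and y share a sign and add(x, y) has the opposite sign."""
    s = (x + y) & mask(n)
    sign = lambda v: v >> (n - 1)
    return sign(x) == sign(y) and sign(s) != sign(x)

def check(n=8):
    """Compare the definitions against ordinary integer arithmetic."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    for x in range(1 << n):
        for y in range(1 << n):
            a, b = to_signed(x, n), to_signed(y, n)
            # Signed and unsigned products agree in the low n bits.
            assert mul_trunc(x, y, n) == (a * b) & mask(n)
            assert add_overflows(x, y, n) == (not lo <= a + b <= hi)
            if y != 0:
                # mod takes the sign of the divisor, rem that of the dividend.
                assert to_signed(mod(x, y, n), n) == a % b
                assert to_signed(rem(x, y, n), n) == int(math.fmod(a, b))
    return True
```

For example, with x = −7 and y = 2 at 8 bits, div yields −4 and mod yields 1, whereas quot yields −3 and rem yields −1.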

5.1 Known algebraic laws

Here’s a smattering of algebraic laws. The derivation of a more complete list would be worthwhile.

35 〈algebraic laws 35〉≡

AC [add and or xor]

x add 0 = x

x and 0 = 0

x and (com 0) = x


com (com x) = x

x eq y = not (x ne y)

x ge y = not (x lt y)

x geu y = not (x ltu y)

... and so on ...

x ge y = y le x

x gt y = y lt x

x geu y = y leu x

x gtu y = y ltu x

... and so on ...

0 le x conjoin x lt y = x ltu y

not (not p) = p

neg x = add (com x, 1)

x rotl k = x rotr (neg k)

x rotr k = x rotl (neg k)

rotl #n #k (x, y) = rotl #n #k (x, y modu n)
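
Laws such as these lend themselves to mechanical checking at a small width. The Python sketch below is an illustration only: the encoding of bit vectors as integers and the choice N = 8 are our assumptions. Note that the two rotate laws involving neg depend on n dividing 2^n, that is, on n being a power of two, which is one reason for choosing 8 here.

```python
N = 8                      # assumed width; must be a power of two (see above)
MASK = (1 << N) - 1

def add(x, y):
    return (x + y) & MASK

def com(x):
    return ~x & MASK

def neg(x):
    return -x & MASK

def rotl(x, d):
    d %= N                 # the table reads d mod n for d when d >= n
    return ((x << d) | (x >> (N - d))) & MASK

def rotr(x, d):
    d %= N
    return ((x >> d) | (x << (N - d))) & MASK

def laws_hold():
    """Exhaustively test several of the listed laws on N-bit values."""
    for x in range(1 << N):
        assert add(x, 0) == x
        assert x & 0 == 0
        assert x & com(0) == x
        assert com(com(x)) == x
        assert neg(x) == add(com(x), 1)
        for k in range(1 << N):
            assert rotl(x, k) == rotr(x, neg(k))
            assert rotr(x, k) == rotl(x, neg(k))
            # Holds by construction here, since rotl reduces d mod N.
            assert rotl(x, k) == rotl(x, k % N)
    return True
```

Calling laws_hold() runs all 2^16 combinations of x and k and raises AssertionError at the first counterexample.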


5.2 More SPARC description

The preceding sections illuminate the issues involved in describing the SPARC. Here, we describe the rest of the instruction set, with minimal commentary.

More locations

Naming the registers

37a 〈SPARC basics 37a〉≡ 37b ⊲

locations

[ g0 g1 g2 g3 g4 g5 g6 g7

o0 o1 o2 o3 o4 o5 o6 o7

l0 l1 l2 l3 l4 l5 l6 l7

i0 i1 i2 i3 i4 i5 i6 i7 ] is $r[[0..31]]

[sp fp] is [o6 i6]

Floating-point registers

37b 〈SPARC basics 37a〉+≡ ⊳ 37a 38a ⊲

storage

’f’ is 32 cells of 32 bits called "floating-point registers"

aggregate using RTL.AGGB

’F’ is 1 cell of 32 bits called "floating-point state register (FSR)"

locations

FSR is $F[0]

RD is FSR@loc[30..31] --- rounding modes

fcc is FSR@loc[10..11]

We have chosen to use the floating-point rounding operator as rndRD, that is, we make it a function of the value of the rounding-mode bits. Strictly speaking, the proper machine-independent thing to do is to use two functions, by making rnd depend on the IEEE rounding modes, then use roundingMode to compute the rounding mode from the RD bits as follows:

37c 〈SPARC basics [[full]] 37c〉≡ 37d ⊲

locations

RD is FSR@loc[30..31]

val roundingMode is [. IEEE754.Round.nearest, IEEE754.Round.zero,

IEEE754.Round.up, IEEE754.Round.down .] sub RD


Similar remarks apply to floating-point compare.

37d 〈SPARC basics [[full]] 37c〉+≡ ⊳ 37c

val floatingCompare is [. IEEE754.Comparison.=, IEEE754.Comparison.<,

IEEE754.Comparison.>, IEEE754.Comparison.unordered .] sub fcc

For compilation, we can get away with the scheme we’re using by arranging for the RTL operators denoting the rounding modes and comparison results to have the right bit patterns for the machine we’re interested in.


Coprocessor registers The number of coprocessor registers (if any) is implementation-dependent.

38a 〈SPARC basics 37a〉+≡ ⊳ 37b 43d ⊲

storage

’c’ is cells of 32 bits called "coprocessor registers (implementation-dependent)"


’C’ is 1 cell of 32 bits called "coprocessor status register"

locations

CSR is $C[0]

More Loads and Stores

Load Floating-point Instructions

38b 〈instruction defaults 38b〉≡ 38c ⊲

ld^[f df] (address, fd) is $f[fd] := [word dword] $m[address]


The full semantics of the ldfsr instruction is quite dramatic. Luckily it’s irrelevant for many purposes.

38c 〈instruction defaults 38b〉+≡ ⊳ 38b 38e ⊲

ldfsr (address) is FSR := $m[address]

Here’s an imaginary picture of the full semantics. RTL operators of these types are flagrantly illegal.

38d 〈wishful thinking about ldfsr 38d〉≡

rtlop drain_fpu_pipeline : effect

-- ‘wait for all FPop instructions that have not finished execution

-- to complete’ (p92)

rtlop delay : #n bits * effect -> effect -- delayed completion of an effect

default attribute of

ldfsr (address) is

drain_fpu_pipeline; FSR := $m[address] | delay(3, branch_fcc := fcc)

Load coprocessor Note that the trap semantics is the same as for other loads, and they could be merged.

38e 〈instruction defaults 38b〉+≡ ⊳ 38c 38g ⊲

ld^[c dc] (address, cd) is $c[cd] := [word dword] $m[address]

ldcsr (address) is CSR := $m[address]

38f 〈trap semantics 38f〉≡ 39h ⊲

ld^[c dc] (address, cd) is alignTrap(address, [4 8])

ldcsr (address) is alignTrap(address, 4)

Store floating-point

38g 〈instruction defaults 38b〉+≡ ⊳ 38e 38h ⊲

st^[f df] (fd, address) is $m[address] := [word dword] $f[fd]

stfsr (address) is $m[address] := FSR

Store coprocessor

38h 〈instruction defaults 38b〉+≡ ⊳ 38g 39a ⊲

st^[c dc] (cd, address) is $m[address] := [word dword] $c[cd]

stcsr (address) is $m[address] := CSR


Load and store floating-point pair In calling sequences, it can be necessary to do loads and stores of unaligned doubles. This can be done with a pair of load instructions that refer to the two registers of the pair needed to hold the double. Unfortunately there’s a problem: if the load instructions refer to temporary registers, there’s no way to guarantee that the adjacent pair of temporaries map to an adjacent pair of real registers. To solve this problem, I define the synthetic instructions ldnext and stnext, which take a register number but load and store the adjacent register.

To refer to the floating-point register pair (fd, fd + 1), I simply cast $f[fd] into a 64-bit location.

39a 〈instruction defaults 38b〉+≡ ⊳ 38h 39b ⊲

ldnext(fd, address) is ($f[fd] : #64 loc)@loc[0..31] := $m[address]

stnext(fd, address) is $m[address] := ($f[fd] : #64 loc)@bits[0..31]

I must temporarily hack ldnext because I can’t generate C code that assigns to a slice. The hack must be voided because we can’t deal with temporaries here.

39b 〈instruction defaults 38b〉+≡ ⊳ 39a 39c ⊲

-- ldnext(fd, address) is $f[fd+1] := $m[address]

More integer ALU instructions

SETHI SETHI sets the most significant 22 bits, a computation we represent by taking the val argument and forcing the least significant 10 bits to 0.

39c 〈instruction defaults 38b〉+≡ ⊳ 39b 39d ⊲

sethi(n, rd) is $r[rd] := bitInsert {wide is n, lsb is 0} (0 : #10 bits)
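
To make the use of bitInsert here concrete, the following Python sketch models bitExtract and bitInsert on integers and then expresses the value assigned by sethi. It is an illustration under stated assumptions: widths are passed as ordinary arguments rather than carried in types, the names bit_extract, bit_insert, and sethi_value are ours, and n is taken to be a 32-bit value.

```python
def bit_extract(wide, lsb, n):
    """Return the n-bit field of `wide` whose least significant bit is bit `lsb`."""
    return (wide >> lsb) & ((1 << n) - 1)

def bit_insert(wide, lsb, narrow, n, w=32):
    """Return the w-bit value `wide` with the n-bit value `narrow` stored at bit `lsb`."""
    field = ((1 << n) - 1) << lsb
    return ((wide & ~field) | ((narrow << lsb) & field)) & ((1 << w) - 1)

def sethi_value(n):
    """Value assigned to $r[rd]: n with its least significant 10 bits forced to 0."""
    return bit_insert(n, 0, 0, 10)
```

For instance, sethi_value(0x12345FFF) is 0x12345C00: the upper 22 bits survive and the low 10 bits are cleared.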

Shift instructions Note there is no change to condition codes.

39d 〈instruction defaults 38b〉+≡ ⊳ 39c 39g ⊲

[sll srl sra] (rs1, reg_or_imm, rd) is

$r[rd] := [shl shrl shra]($r[rs1], reg_or_imm@bits[5 bits at 0])

Tagged add “A tag overflow occurs if bit 1 or bit 0 of either operand is nonzero, or if the addition generates an arithmetic overflow.” Again, this information could be left abstract.

39e 〈SPARC utilities [[simple]] 39e〉≡ 40a ⊲

rtlop tag_overflows : #32 bits * #32 bits -> bool


39f 〈SPARC utilities [[full]] 39f〉≡ 40b ⊲

fun tag_overflows(left, right) is

left@bits[0..1] <> 0 orelse right@bits[0..1] <> 0

orelse add_overflows(left, right, 0)
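
The quoted rule is easy to exercise on concrete values. Below is a hedged Python rendering of the full version of tag_overflows; the three-argument add_overflows is our own stand-in with a carry-in parameter, written to match the way it is called in the chunk above, and operands are assumed to be 32-bit values held in Python integers.

```python
MASK32 = 0xFFFFFFFF

def add_overflows(x, y, carry_in=0):
    """Signed overflow of the 32-bit sum x + y + carry_in."""
    s = (x + y + carry_in) & MASK32
    # Overflow iff both operands differ in sign from the sum.
    return ((x ^ s) & (y ^ s)) >> 31 == 1

def tag_overflows(left, right):
    """True if either operand has a nonzero 2-bit tag or the addition overflows."""
    return (left & 3) != 0 or (right & 3) != 0 or add_overflows(left, right, 0)
```

With these definitions, adding 4 and 8 raises no tag overflow, whereas 5 and 8 does (bit 0 of the first operand is set), and so does 0x7FFFFFFC plus 4 (arithmetic overflow).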


39g 〈instruction defaults 38b〉+≡ ⊳ 39d 40d ⊲

[taddcc taddcctv] (rs1, reg_or_imm, rd) is

add_instruction(rs1, reg_or_imm, 0, rd, {set_codes is true})

39h 〈trap semantics 38f〉+≡ ⊳ 38f 40f ⊲

taddcctv (rs1, reg_or_imm, rd) is

(tag_overflows($r[rs1], reg_or_imm) orelse add_overflows($r[rs1], reg_or_imm, 0)



Subtract Similar stuff. The near-duplication with the machinery for addition is unfortunate.

40a 〈SPARC utilities [[simple]] 39e〉+≡ ⊳ 39e

rtlop sub_overflows : #32 bits * #32 bits * #1 bits -> bool

40b 〈SPARC utilities [[full]] 39f〉+≡ ⊳ 39f

fun sub_overflows (x, y, b) is

let val {result, borrow} is subtract(x, y, b)

in x@bits[31] <> y@bits[31] andalso x@bits[31] <> result@bits[31]

end
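
As a cross-check of the sign test used in sub_overflows, here is a small Python model. The subtract function is our assumption about what the λ-RTL subtract returns (a result and a borrow-out), and the width parameter n is added only so that the rule can be tested exhaustively at small sizes.

```python
def subtract(x, y, b, n=32):
    """Return (result, borrow_out) for the n-bit subtraction x - y - b."""
    diff = x - y - b
    return diff & ((1 << n) - 1), 1 if diff < 0 else 0

def sub_overflows(x, y, b, n=32):
    """Signed overflow: operands differ in sign and the result's sign differs from x's."""
    result, _ = subtract(x, y, b, n)
    sign = lambda v: (v >> (n - 1)) & 1
    return sign(x) != sign(y) and sign(x) != sign(result)

def to_signed(v, n):
    return v - (1 << n) if v >> (n - 1) else v

def agrees_with_integers(n=6):
    """Compare sub_overflows with exact integer arithmetic at width n."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    for x in range(1 << n):
        for y in range(1 << n):
            for b in (0, 1):
                exact = to_signed(x, n) - to_signed(y, n) - b
                assert sub_overflows(x, y, b, n) == (not lo <= exact <= hi)
    return True
```

The exhaustive comparison covers both values of the borrow-in bit, which the two-argument entry in the operator table does not mention.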

40c 〈SPARC utilities 40c〉≡ 41b ⊲

fun subtract_instruction (rs1, operand2, borrow, rd, {set_codes}) is

let val {result, borrow is borrow’} is subtract($r[rs1], operand2, borrow)

in $r[rd] := result

| set_codes -->

set_cc(result, bit(sub_overflows($r[rs1], operand2, borrow)), borrow’)

end


The explicit fetch is here because of a flaw in the type-inference algorithm currently used by λ-RTL. This algorithm is slated for replacement.

40d 〈instruction defaults 38b〉+≡ ⊳ 39g 40e ⊲

[sub subcc subx subxcc] (rs1, reg_or_imm, rd) is

subtract_instruction(rs1, reg_or_imm, [0 carryBit], rd, {set_codes is [false true]})

Tagged subtract

40e 〈instruction defaults 38b〉+≡ ⊳ 40d 41c ⊲

[tsubcc tsubcctv] (rs1, reg_or_imm, rd) is

subtract_instruction(rs1, reg_or_imm, 0, rd, {set_codes is true})

40f 〈trap semantics 38f〉+≡ ⊳ 39h 43a ⊲

tsubcctv (rs1, reg_or_imm, rd) is

(tag_overflows($r[rs1], reg_or_imm) orelse sub_overflows($r[rs1], reg_or_imm, 0)



Multiply step Here’s a synopsis of the semantics in the manual.

1. Establish the multiplier.

2. Compute a value by shifting N xor V into $r[rs1].

3. Add either the value or 0 to the multiplier.

4. Put the sum in $r[rd].

5. Update the condition codes according to the addition.

6. Update the Y register by shifting in the least significant bit of the original $r[rs1].

We use an auxiliary shiftIn function to shift a bit into a 32-bit register.

41a 〈instruction defaults [[full]] 41a〉≡ 42a ⊲

mulscc(rs1, reg_or_imm, rd) is

let fun shiftIn (n, bit) is bitInsert {wide is shrl(n, 1), lsb is 31} bit

val multiplier is reg_or_imm -- step 1

val v2 is shiftIn($r[rs1], xor(icc.N, icc.V)) -- step 2

val add_args is

(multiplier, if Y@bits[0] = 0 then 0 else v2 fi, 0) -- step 3

val {result, carry} is add add_args

in $r[rd] := result | -- step 4

set_cc(result, bit(add_overflows add_args), carry) | -- step 5

Y := shiftIn(Y, $r[rs1]@bits[0]) -- step 6

end


Divide instructions We haven’t attempted to specify the overflow semantics exactly; see page 116 of the V8 manual. Instead, we introduce machine-dependent RTL operators to stand for this semantics.

41b 〈SPARC utilities 40c〉+≡ ⊳ 40c

rtlop [sparc_udiv_overflow sparc_sdiv_overflow] : #64 bits * #32 bits -> #1 bits

fun divide({signed}, rs1, r_o_i, rd, {set_codes}) is

let val (operator, overflow) is if signed then ((quot), sparc_sdiv_overflow)

else ((divu), sparc_udiv_overflow)

fi

val result is (operator(Reg64.get rs1, zx r_o_i))@bits[32 bits at 0] -- ??

val V is overflow(Reg64.get rs1, r_o_i)

in $r[rd] := result | set_codes --> set_cc (result, V, 0)

end

41c 〈instruction defaults 38b〉+≡ ⊳ 40e 42c ⊲

[u s]^div^["" cc] (rs1, reg_or_imm, rd) is

divide({signed is [false true]}, rs1, reg_or_imm, rd, {set_codes is [false true]})

Extra parentheses are needed because the division operators are normally infix.


More control flow

Branch on floating condition codes To define the floating-point branches, we emulate the specification in the manual. One day we’ll define extra attributes to cope with the fact that a delay is required between setting condition codes and testing them with this sort of instruction.

42a 〈instruction defaults [[full]] 41a〉+≡ ⊳ 41a

fb^[a n u g ug l ul lg ne e ue ge uge le ule o] (target, annul) is

let val [L E G U] is

fcc = [(IEEE754.Comparison.<) (IEEE754.Comparison.=)

(IEEE754.Comparison.>) (IEEE754.Comparison.unordered)]

val or is (orelse)

infixl 2 or

in [true false U G

(G or U) L (L or U) (L or G)

(L or G or U) E (E or U) (E or G)

(E or G or U) (E or L) (E or L or U) (E or L or G)

] --> nPC := target

end
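
The sixteen guards above form a small truth table over the four possible comparison results. The Python sketch below tabulates it; the numeric encoding of fcc (0 for =, 1 for <, 2 for >, 3 for unordered) is an assumption taken from the order of the floatingCompare array, and the names FB_TAKEN and fb_next_pc are ours. Like the chunk, the sketch ignores the annul argument.

```python
# Assumed fcc encoding, following the order of the floatingCompare array.
E, L, G, U = 0, 1, 2, 3

# Mnemonic suffix -> set of fcc values for which the branch is taken.
FB_TAKEN = {
    "a":   {E, L, G, U},  "n":   set(),
    "u":   {U},           "g":   {G},
    "ug":  {G, U},        "l":   {L},
    "ul":  {L, U},        "lg":  {L, G},
    "ne":  {L, G, U},     "e":   {E},
    "ue":  {E, U},        "ge":  {E, G},
    "uge": {E, G, U},     "le":  {E, L},
    "ule": {E, L, U},     "o":   {E, L, G},
}

def fb_next_pc(suffix, fcc, npc, target):
    """Model of the guarded assignment: nPC changes only if the guard holds."""
    return target if fcc in FB_TAKEN[suffix] else npc
```

Written this way, it is easy to see that the conditions come in complementary pairs, for example fbne with fbe and fbu with fbo.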

If we don’t want to see the details of the tests, we can make them abstract RTL operators.

42b 〈instruction defaults [[simple]] 42b〉≡

fb^[a n u g ug l ul lg ne e ue ge uge le ule o] (target, annul) is

let rtlop [fba fbn fbu fbg fbug fbl fbul fblg

fbne fbe fbue fbge fbuge fble fbule fbo] : #2 bits -> bool

in [fba fbn fbu fbg fbug fbl fbul fblg fbne fbe fbue fbge fbuge fble fbule fbo] fcc

--> nPC := target

end


Branch on coprocessor condition codes Omitted. (It’s more of the same.)

Return from trap rett is slightly different from restore in that there’s no destination register, so none of the assignments need be guarded.

42c 〈instruction defaults 38b〉+≡ ⊳ 41c 42d ⊲

rett (address) is

let fun restoreOut n is Reg.out n := Reg.in’ n

fun restoreLocal n is Reg.local’ n := $w[winptr-8 +zx n]

fun restoreIn n is Reg.in’ n := $w[winptr-16+zx n]

in do8 restoreOut | do8 restoreLocal | do8 restoreIn | winptr := winptr - 16 |

S := PS | ET := 1 | nPC := address

end

Trap on integer condition codes The traps use the same conditional tests as the integer branch instructions.

42d 〈instruction defaults 38b〉+≡ ⊳ 42c 43b ⊲

t^[a n ne e g le ge l gu leu cc cs pos neg vc vs] (address) is RTL.SKIP


43a 〈trap semantics 38f〉+≡ ⊳ 40f 43g ⊲

t^[a n ne e g le ge l gu leu cc cs pos neg vc vs] (address) is

let val t is IccTest.tests

in [t.A t.N t.NE t.E t.G t.LE t.GE t.L

t.GU t.LEU t.CC t.CS t.POS t.NEG t.VC t.VS

] --> trap(trap_instruction address@bits[7 bits at 0])

end

State Registers

Read State Register

43b 〈instruction defaults 38b〉+≡ ⊳ 42d 43c ⊲

rd^[y psr wim tbr] (rd) is $r[rd] := [Y PSR WIM TBR]

Write State Register

43c 〈instruction defaults 38b〉+≡ ⊳ 43b 43e ⊲

wr^[y psr wim tbr] (rs1, reg_or_imm, rd) is [Y PSR WIM TBR] := xor($r[rs1], reg_or_imm)

Miscellaneous instructions

Store Barrier STBAR can’t really be modeled by an ordinary effect. Instead, we do just what the manual does, which is model it as an assignment to a special bit in the processor state.

43d 〈SPARC basics 37a〉+≡ ⊳ 38a

storage ’b’ is 1 cell of 1 bit called "store-barrier-pending flag"

locations store_barrier_pending is $b[0]

43e 〈instruction defaults 38b〉+≡ ⊳ 43c 43f ⊲

stbar () is store_barrier_pending := 1

Unimplemented instruction

43f 〈instruction defaults 38b〉+≡ ⊳ 43e 44a ⊲

unimp (imm22) is RTL.SKIP

43g 〈trap semantics 38f〉+≡ ⊳ 43a

unimp(imm22) is trap(illegal_instruction)

Floating point

The SPARC uses IEEE floating point, as imported from StdOperators.IEEE754


Conversions Because the conversion operators can change the sizes as well as the formats of their arguments, it’s easiest to use the casting functions defined earlier.

44a 〈instruction defaults 38b〉+≡ ⊳ 43f 44b ⊲

f ^[s d q]^toi (fs2, fd) is $f[fd] := f2i([word dword qword] $f[fs2],

IEEE754.Round.zero)

fito^[s d q] (fs2, fd) is $f[fd] := [word dword qword] (i2f($f[fs2], RD))

fsto^[d q] (fs2, fd) is $f[fd] := [dword qword] (f2f(word $f[fs2], RD))

fdto^[s q] (fs2, fd) is $f[fd] := [word qword] (f2f(dword $f[fs2], RD))

fqto^[s d] (fs2, fd) is $f[fd] := [word dword] (f2f(qword $f[fs2], RD))

Moves

44b 〈instruction defaults 38b〉+≡ ⊳ 44a 44c ⊲

fmovs (fs2, fd) is $f[fd] := $f[fs2]

fnegs (fs2, fd) is $f[fd] := fneg $f[fs2]

fabss (fs2, fd) is $f[fd] := fabs $f[fs2]

Square root Here, it’s easiest just to instantiate the square-root operator with the proper size.

44c 〈instruction defaults 38b〉+≡ ⊳ 44b 44d ⊲

fsqrt^[s d q] (fs2, fd) is $f[fd] := fsqrt [#32 #64 #128] ($f[fs2], RD)

Arithmetic The SPARC manual is silent about whether the results of addition and subtraction are rounded. We assume they are.

44d 〈instruction defaults 38b〉+≡ ⊳ 44c 44e ⊲

f^[add sub mul div]^[s d q] (fs1, fs2, fd) is

$f[fd] := [fadd fsub fmul fdiv] [#32 #64 #128] ($f[fs1], $f[fs2], RD)

The “exact multiply” instructions do not round, and it is easiest to give the size of arguments and results explicitly.

44e 〈instruction defaults 38b〉+≡ ⊳ 44d 44f ⊲

fsmuld (fs1, fs2, fd) is $f[fd] := fmulx #32 #64 ($f[fs1], $f[fs2])

fdmulq (fs1, fs2, fd) is $f[fd] := fmulx #64 #128 ($f[fs1], $f[fs2])

Comparison The fcmp* and fcmpe* families have the same standard semantics, but different trap semantics. We haven’t specified the trap semantics yet.

44f 〈instruction defaults 38b〉+≡ ⊳ 44e

fcmp ^[s d q] (fs1, fs2) is fcc := fcmp [#32 #64 #128] ($f[fs1], $f[fs2])

fcmpe^[s d q] (fs1, fs2) is fcc := fcmp [#32 #64 #128] ($f[fs1], $f[fs2])

Chapter 6

Describing the Intel Pentium

This chapter shows how we use λ-RTL to describe some aspects of the Pentium instruction set that don’t have obvious analogs on the SPARC. Most Pentium instructions come in byte, word (16-bit), and doubleword variants, and these variants treat parts of the Pentium registers as first-class locations. We show how to work with these locations and how to give them their usual names. A single effective address can refer to different parts of a register depending on the opcode with which it is used, so we specify the meanings of effective addresses as triples from which one element is selected depending on context.

By a mixture of opcodes and instruction prefixes, the Pentium offers many variants of each basic operation, so factoring the specification is a concern. We use the logical instructions to illustrate aggressive factoring, different notations for instructions that implement binary operations, and a slightly different treatment of condition codes.

The large-scale structure of the Pentium description resembles that of the SPARC description, except that we add some examples that use infix and.

46a 〈intel.rtl 46a〉≡

module Pentium is

import [RTL StdOperators]

from StdOperators import

[ --> + - := bit < > <= >= | = <> ?

add and bitInsert conjoin disjoin div divu mod modu mul mulu or popcnt

xor com shrl shl subtract sx zx

]

infixl 1.2 conjoin

infixl 1.1 disjoin

val * is mul

infixl 7 *

〈basics 46b〉

〈new RTL operators 58f〉

〈effective-address utilities 49b〉

〈effective addresses 50a〉

val [OF CF AF SF ZF PF] is [Reg.OF Reg.CF Reg.AF Reg.SF Reg.ZF Reg.PF]

〈utilities 49a〉

local infixl 3 and

in default attribute of 〈instruction defaults using infix and 52c〉

end

default attribute of 〈instruction defaults 55c〉

end

6.1 Storage spaces

We begin with the basics: registers and memory. The Intel registers require no special fetch or store methods. The machine uses little-endian byte order.

46b 〈basics 46b〉≡ (46a) 46c ⊲

storage

’r’ is 8 cells of 32 bits called "registers"

’m’ is cells of 8 bits called "memory" aggregate using RTL.AGGL

Registers

In the Intel architecture, parts of registers have their own names. Figure 6.1 shows the various parts of the general-purpose registers and the names used to refer to them. We put the definitions for the registers into their own λ-RTL sub-structure so we can more easily manage the name space.

46c 〈basics 46b〉+≡ (46a) ⊳ 46b 51c ⊲

module Reg is

〈registers 48a〉

end

Register   32 bits   Bits 0..15   Bits 8..15   Bits 0..7
0          EAX       AX           AH           AL
1          ECX       CX           CH           CL
2          EDX       DX           DH           DL
3          EBX       BX           BH           BL
4          ESP       SP
5          EBP       BP
6          ESI       SI
7          EDI       DI

Figure 6.1: Intel Pentium registers


It is straightforward to define these locations in λ-RTL.

48a 〈registers 48a〉≡ (46c) 48b ⊲

locations

[ EAX ECX EDX EBX ESP EBP ESI EDI ] is $r[[0..7]]

-- EAX is $r[0], etc...

[ AX CX DX BX SP BP SI DI ] is $r[[0..7]] @ loc [0..15]

[ AL CL DL BL ] is [EAX ECX EDX EBX] @ loc [8 bits at 0]

[ AH CH DH BH ] is [EAX ECX EDX EBX] @ loc [8 bits at 8]

Dealing with the instruction encoding is more complex. Registers are referred to by number, but as can be seen from Figure 6.1, the number doesn’t uniquely determine the location. In fact, a register number maps to one of three locations, depending on context. These three arrays specify which register, or part thereof, is denoted by which number in which context:

48b 〈registers 48a〉+≡ (46c) ⊳ 48a 48c ⊲

val byte is [. AL, CL, DL, BL, AH, CH, DH, BH .]

val word is [. AX, CX, DX, BX, SP, BP, SI, DI .]

val dword is [. EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI .]

To find out which location is denoted by a register number, we use the number to index the array chosen to represent the proper context.
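
The idea of a register number selecting different sub-locations can be mimicked in a few lines of Python. This is only a sketch: a location is modelled as a (cell, least significant bit, width) triple, and the names Regs, BYTE, WORD, and DWORD are ours, standing in for the byte, word, and dword arrays above.

```python
class Regs:
    """Eight 32-bit cells, accessed through (cell, lsb, width) locations."""
    def __init__(self):
        self.r = [0] * 8

    def fetch(self, loc):
        cell, lsb, width = loc
        return (self.r[cell] >> lsb) & ((1 << width) - 1)

    def store(self, loc, value):
        cell, lsb, width = loc
        field = ((1 << width) - 1) << lsb
        self.r[cell] = (self.r[cell] & ~field) | ((value << lsb) & field)

# Index each array by register number to get the location for that context.
DWORD = [(n, 0, 32) for n in range(8)]                 # EAX .. EDI
WORD = [(n, 0, 16) for n in range(8)]                  # AX .. DI
BYTE = ([(n, 0, 8) for n in range(4)] +                # AL CL DL BL
        [(n, 8, 8) for n in range(4)])                 # AH CH DH BH
```

For example, after storing 0x11223344 through DWORD[0], fetching BYTE[0] gives 0x44 (AL) while fetching BYTE[4] gives 0x33 (AH): number 4 means ESP only in the doubleword context.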

Status and Control Registers

The Pentium has two 32-bit status and control registers, EFLAGS and EIP. EIP is the instruction pointer. EFLAGS is a collection of 1-bit status flags and 1- or 2-bit control and system flags.

48c 〈registers 48a〉+≡ (46c) ⊳ 48b 48d ⊲

storage

’c’ is 2 cells of 32 bits called "control registers"

locations

[ EFLAGS EIP ] is $c[[0..1]]

What are commonly called condition codes are referred to as status flags in Intel documentation. The Pentium has six.

48d 〈registers 48a〉+≡ (46c) ⊳ 48c

locations

[ CF PF AF ZF SF OF ] is EFLAGS @ loc [1 bit at [0 2 4 6 7 11]]

-- carry parity auxiliary-carry zero sign overflow


Most instructions set flags in stylized ways, as shown in Table 3-2 on page 3-14 of the Pentium manual (Intel 1993). In particular, the values of SF, ZF, and PF are determined by simple tests of the result of an operation. We capture this common semantics in the λ-RTL function set_flags. OF, AF, and CF are set based on conditions that are different for different kinds of instructions, so we pass their values to set_flags. We pass a record, not a tuple, to set_flags, so the order in which values are given does not matter.

49a 〈utilities 49a〉≡ (46a) 52a ⊲

fun parity n is (popcnt n)@bits[0] -- (number of bits set) mod 2

fun set_flags {result, o, a, c} is

SF := bit (result < 0) |

ZF := bit (result = 0) |

PF := bit (parity (result@bits[0..7]) = 0) |

OF := o | AF := a | CF := c

Defining the other flags within EFLAGS is straightforward, and they are omitted from this example.
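
For readers who want to try set_flags on concrete values, here is a hedged Python transliteration. It returns the six flags as a dictionary instead of assigning to locations, and the width parameter is our addition; everything else follows chunk 49a, including the use of keyword-style arguments in place of a record.

```python
def popcnt(x):
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")

def parity(x):
    """Bit 0 of the population count, as in chunk 49a."""
    return popcnt(x) & 1

def set_flags(result, o, a, c, width=32):
    """Return the status flags implied by `result`; o, a, c are passed through."""
    result &= (1 << width) - 1
    return {
        "SF": result >> (width - 1),             # result < 0 when read as signed
        "ZF": int(result == 0),
        "PF": int(parity(result & 0xFF) == 0),   # even number of ones in the low byte
        "OF": o, "AF": a, "CF": c,
    }
```

Because the arguments can be given by name, as in set_flags(result=0, c=1, a=0, o=0), their order does not matter, which is the point of passing a record in the λ-RTL version.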

Effective addresses

Within an effective address, a register number may denote one of three locations, depending on context. We define functions regb, regw, and regd to be used in these contexts.

49b 〈effective-address utilities 49b〉≡ (46a) 51b ⊲

---- now hack me!

fun hacksub8 (a, n) is

if n = 0 then a sub 0

else if n = 1 then a sub 1






else a sub 7

fi fi fi fi fi fi fi

val (sub) is (hacksub8)

infixn 0 [sub hacksub8]

fun regb n is Reg.byte sub n -- returns an 8-bit location

fun regw n is Reg.word sub n -- returns a 16-bit location

fun regd n is Reg.dword sub n -- returns a 32-bit location


The context is usually determined by the opcode, not within the effective address itself. We would nevertheless like an effective address to have a denotation (a default attribute) independent of any context. Our solution to this problem is to let the denotation be a record with three elements, one for each context. The R function creates such a record from a register number. We name the elements b, w, and d to stand for the byte, word, and doubleword locations.

50a 〈effective addresses 50a〉≡ (46a) 50b ⊲

fun R n is {b is regb n, w is regw n, d is regd n}

An address in memory also stands for one of three locations, depending on context. (Because a location includes size as well as address, we get one byte, two bytes, or four bytes, all at the same address.) The M function does for memory what R does for registers: turn a number into a collection of locations, one of which will be used based on context. We use type casts to give the sizes of the locations.

50b 〈effective addresses 50a〉+≡ (46a) ⊳ 50a 50c ⊲

fun M address is { b is $m[address] : #8 loc

, w is $m[address] : #16 loc

, d is $m[address] : #32 loc

}

Finally, in order to make immediate operands look like register or memory operands, we define an I function, which creates a record that provides the same immediate value, no matter what the context. There’s a little problem with polymorphism, in that the argument immed cannot simultaneously have types #8 bits, #16 bits, and #32 bits. We work around the problem by inserting a specious zero extension (zx). Eventually, we’ll probably need different functions for each type of immediate value.

50c 〈effective addresses 50a〉+≡ (46a) ⊳ 50b 51a ⊲

fun I immed is { b is zx immed : #8 bits

, w is zx immed : #16 bits

, d is zx immed : #32 bits

}

fun Is immed is { b is sx immed : #8 bits

, w is sx immed : #16 bits

, d is sx immed : #32 bits

}


Effective addresses denote registers or memory locations. In order to support LEA (load effective address), effective addresses for memory operands are represented by the address alone. These have constructor type Mem. The true effective addresses have constructor type addr.

51a 〈effective addresses 50a〉+≡ (46a) ⊳ 50c

operand Mem : #32 bits

operand addr : { b : #8 loc, w : #16 loc, d : #32 loc }


Indir (r) : Mem is regd r

Disp8 (d8, r) : Mem is sx d8 + regd r

Disp32 (d32, r) : Mem is d32 + regd r

Index ( base, index, ss) : Mem is regd base + regd index << ss

Index8 (d8, base, index, ss) : Mem is regd base + sx d8 + regd index << ss

Index32 (d32, base, index, ss) : Mem is regd base + d32 + regd index << ss

ShortIndex (d8, index, ss) : Mem is sx d8 + regd index << ss

Abs32 (a) : Mem is a

E (Mem) : addr is M Mem

Reg (r) : addr is R r

We define C notation for 32-bit shifts.

51b 〈effective-address utilities 49b〉+≡ (46a) ⊳ 49b

val [>> <<] is [shrl shl]

infixl 8 [>> <<]
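
A few of the Mem constructors can be traced numerically with the Python sketch below. It rests on assumptions that should be checked against the full specification: the register file is a plain list of eight 32-bit integers, sums wrap modulo 2^32, and the shift binds more tightly than + (as the infixl 8 declaration suggests, given that * is declared at level 7). The function names are ours.

```python
M32 = 0xFFFFFFFF

def sx8(d8):
    """Sign-extend an 8-bit displacement to 32 bits."""
    return (d8 - 0x100 if d8 & 0x80 else d8) & M32

def indir(regs, r):
    """Indir (r): the address is the contents of register r."""
    return regs[r]

def disp8(regs, d8, r):
    """Disp8 (d8, r): sign-extended displacement plus register."""
    return (sx8(d8) + regs[r]) & M32

def index8(regs, d8, base, index, ss):
    """Index8 (d8, base, index, ss): base + displacement + scaled index."""
    # Python's << binds more loosely than +, so the shift is parenthesized
    # to obtain the reading assumed for the lambda-RTL expression.
    return (regs[base] + sx8(d8) + (regs[index] << ss)) & M32
```

With regs[3] = 0x1000 and regs[6] = 3, index8(regs, 0xFC, 3, 6, 2) computes 0x1000 − 4 + 12 = 0x1008.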

We can’t have constructors without naming the types of the operands. Here are the types recovered automatically from the SLED specification.

51c 〈basics 46b〉+≡ (46a) ⊳ 46c 51d ⊲

operand ss : #2 bits

operand [ base index r16 r32 r8 sr16] : #3 bits

operand i8 : #8 bits



And here are additional types written in by hand.

51d 〈basics 46b〉+≡ (46a) ⊳ 51c 51e ⊲

operand d8 : #8 bits

operand [a d32] : #32 bits

operand [l r reg] : #3 bits

Traps

A simple, cheezy model.

51e 〈basics 46b〉+≡ (46a) ⊳ 51d

module Trap is

val [DE DB NMI DP OF BR UD NM DF _ TS NP SS GP PF _ MF AC MC] is [0..18]

storage ’t’ is 1 cell of 8 bits called "trap code"

locations trap_code is $t[0]

fun trap k is $t[0] := k

val exn is trap

end


Instructions

The Pentium architecture provides more opportunities for factoring than does the SPARC. Most instructions come in three size variants, and each variant uses one of the three contexts. These variants would be tedious to specify repetitiously, so we use λ-RTL’s higher-order functions and generators to reduce repetition. The rest of this chapter offers several alternative ways to handle this kind of variation, using the logical instructions as a running example.

A very common pattern of execution is the “two-operand instruction,” which we can write l ← l ⊕ r

where ⊕ is a generic binary operator, l is an effective address dependent on context, and r could be a context-dependent effective address or an immediate value that has been “put in context” by the I function.

The λ-RTL function llr embodies this pattern of computation. It takes two (curried) arguments: the triple (l, ⊕, r), and a size function that determines the context by choosing the b, w, or d element from each operand record. Actually, we wind up making size a pair of functions (sl and sr), because Milner’s type inference can’t infer a polymorphic type for a parameter, and therefore using a single size function would force both left and right to have the same type. We don’t want that, because we want freedom for right to be either a location, like left, or a value.

52a 〈utilities 49a〉+≡ (46a) ⊳ 49a 54b ⊲

fun llr (left, op, right) (size as (sl, sr)) is sl left := op((sl left), sr right)

fun [b w d] {b, w, d} is [b w d] -- b, w, d select correct size from record

val [b w d] is [(b,b) (w,w) (d,d)] -- b, w, d renamed to pairs

The function pairs b, w, and d are used below as size arguments.
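
The shape of llr and its size pairs can be imitated in an ordinary language. The Python sketch below is only an analogy: operand records are dictionaries, a left operand field holds a hypothetical (location name, current value) pair, a right operand field holds a plain value, and llr returns the assignment it would make instead of performing it. None of these representation choices come from λ-RTL.

```python
from operator import and_


def sel(field):
    """Build a size selector that picks one field out of an operand record."""
    return lambda record: record[field]


# Size pairs (left selector, right selector), mirroring b, w, and d.
b, w, d = [(sel(f), sel(f)) for f in ("b", "w", "d")]


def llr(left, op, right, size):
    """Describe  left := op(left, right)  at one size as (location, new value)."""
    sl, sr = size
    location, old_value = sl(left)
    return location, op(old_value, sr(right))


# Example operand records (values are made up for the illustration).
left = {"b": ("AL", 0xFC), "w": ("AX", 0x12FC), "d": ("EAX", 0x123412FC)}
right = {"b": 0x0F, "w": 0x0F, "d": 0x0F}
```

Calling llr(left, and_, right, w) selects the 16-bit view of both operands, which is the role the size pair plays in the specification. Keeping the two selectors separate is what lets the right operand be a value while the left is a location.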

Logical instructions

The Pentium has 42 instructions that implement the three logical operations AND, OR, and XOR. (λ-RTL counts different size variants as different instructions.) All these instructions have side effects on the status flags (condition codes) as well as the l ← l ⊕ r effect. This section shows progressively more aggressive ways to use λ-RTL to describe such large groups of similar instructions.

In λ-RTL, one can declare any identifier to be an infix operator. Infix operators can be convenient when specifying single instructions, as shown in the specification of these three variants of AND, which operate on the A register.

52b 〈junk: instruction defaults using infix and 52b〉≡ 53a ⊲

ANDiAL (i8) is Reg.AL := Reg.AL and i8

ANDiAX (i16) is Reg.AX := Reg.AX and i16

ANDiEAX (i32) is Reg.EAX := Reg.EAX and i32

Infix notation is less convenient when describing groups of instructions. Here we use the llr function defined above to define some variants that work on two or three sizes of values. The parentheses around (and) convert it from an infix operator to an ordinary value that can be passed as a parameter to llr.

52c 〈instruction defaults using infix and 52c〉≡ (46a)

ANDi ^[b w d] (addr, i) is llr (addr, (and), I i) [b w d]

ANDio^[w d]^b (addr, i) is llr (addr, (and), I (sx i)) [ w d]

ANDmr^[b ow od] (addr, reg) is llr (addr, (and), R reg) [b w d]

ANDrm^[b ow od] (addr, reg) is llr (R reg, (and), addr) [b w d]


The I and R functions create three-element records that are selected from by the size functions also passed to llr.¹ Groups in square brackets are used to define multiple variants at once. Each variant uses either b, w, or d as a size function.

Note that the parameter addr also represents a three-element record. This type is inferred.

The llr function helps factor the specification, but the ordinary function-application syntax is not very evocative of what is going on. Here, we use the infix operators <==, <, and > with non-standard meanings, creating a sort of “mixfix” notation that is more suggestive of the semantics:

dst <== l <⊕> r.

We use the notation first, then define it.

53a 〈junk: instruction defaults using infix and 52b〉+≡ ⊳ 52b

ANDiAL (i8) is Reg.AL := Reg.AL and i8

ANDiAX (i16) is Reg.AX := Reg.AX and i16

ANDiEAX (i32) is Reg.EAX := Reg.EAX and i32

ANDi^[b w d](addr, i) is bin (addr <== addr <(and)> I i) [b w d]

ANDio^[ w d]^b (addr, i) is bin (addr <== addr <(and)> I (sx i)) [ w d]

ANDmr^[b ow od] (addr, reg) is bin (addr <== addr <(and)> R reg) [b w d]

ANDrm^[b ow od] (addr, reg) is bin (R reg <== R reg <(and)> addr) [b w d]

We implement the notation by defining <==, <, and > to build up records, then defining bin to create the effect describing the assignment. Since < and > have existing definitions from Chapter 4, we save those definitions in lt and gt.

53b 〈junk: utilities 53b〉≡

nonfix [< >]

val [lt gt] is [< >] -- save these for later

fun < (l, operator) is { l is l, operator is operator }

fun > ({l, operator}, r) is { l is l, operator is operator, r is r }

fun <== (dst, {l, operator, r}) is { dst is dst, l is l, operator is operator, r is r}

infixn ~10 <

infixn ~11 >

infixn ~12 <==

fun bin {dst, l, operator, r} (sdl, sr) is sdl dst := operator (sdl l, sr r)

Note that dst and l must have the same type. This restriction is just fine, since on the Pentium they always have the same value; the reason both dst and l are parameters is purely notational.

¹ N.B. we probably are going to have to split up the immediate instructions because the types of their second operands are different.


It turns out that OR and XOR have the same structure as AND, so we factor the description to include those instructions as well. Once we do that, there’s no point in making the operators infix.

54a 〈junk: instruction defaults 54a〉≡ 55a ⊲

[AND OR XOR]^iAL (i8) is Reg.AL := [and or xor] (Reg.AL, i8)

[AND OR XOR]^iAX (i16) is Reg.AX := [and or xor] (Reg.AX, i16)

[AND OR XOR]^iEAX (i32) is Reg.EAX := [and or xor] (Reg.EAX, i32)

[AND OR XOR]^i^[b w d](addr, i) is

bin (addr <== addr <[and or xor]> I i) [b w d]

[AND OR XOR]^i^[ow od]^b (addr, i) is

bin (addr <== addr <[and or xor]> I (sx i)) [ w d]

[AND OR XOR]^mr^[b ow od] (addr, reg) is

bin (addr <== addr <[and or xor]> R reg) [b w d]

[AND OR XOR]^rm^[b ow od] (addr, reg) is

bin (R reg <== R reg <[and or xor]> addr) [b w d]

Now that we have machinery in place, a minor extension will enable us also to specify the effects of these instructions on the status flags (condition codes). Section 4.4.1 of the Pentium manual says “The AND, OR, and XOR instructions clear the OF and CF flags, leave the AF flag undefined, and update the SF, ZF, and PF flags.” The function logical_flags implements this behavior.

54b 〈utilities 49a〉+≡ (46a) ⊳ 52a 54c ⊲

fun logical_flags n is set_flags {result is n, o is 0, c is 0, a is ?}

In our description of the SPARC, we defined a “main effect” function like bin and passed it a function that set the condition code. Here, we use a slightly different approach: we define bin’ to return both the main effect and the result of the instruction, and we pass bin’ and its arguments to a function that sets the condition code.

54c 〈utilities 49a〉+≡ (46a) ⊳ 54b 54d ⊲

fun bin’ {dst, l, operator, r} (sdl, sr) is

let val result is operator (sdl l, sr r)

in {result is result, effect is sdl dst := result}

end

logical combines the “main effect” and the effects on the flags.

54d 〈utilities 49a〉+≡ (46a) ⊳ 54c 55b ⊲

fun logical main arg size is

let val {result, effect} is main arg size

in effect | logical_flags result

end
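
The combination of a main effect with the flag effects can be sketched in Python. This is an illustration under stated assumptions, not the specification: the flag record is a dictionary, None stands for λ-RTL’s undefined value (?), and PF is taken to be even parity of the low byte of the result, as on the x86. The width parameter is our addition; in λ-RTL the width is inferred.

```python
from operator import and_, or_, xor


def logical_flags(result: int, width: int) -> dict:
    """Flags after AND/OR/XOR on a width-bit result; None marks 'undefined'."""
    low_byte = result & 0xFF
    return {
        "OF": 0,                                        # cleared
        "CF": 0,                                        # cleared
        "AF": None,                                     # left undefined
        "SF": (result >> (width - 1)) & 1,              # sign bit of the result
        "ZF": int(result == 0),
        "PF": int(bin(low_byte).count("1") % 2 == 0),   # even parity of low byte
    }


def logical(op, left: int, right: int, width: int):
    """Return (result, flags): the value assigned plus the flag effects."""
    result = op(left, right) & ((1 << width) - 1)
    return result, logical_flags(result, width)
```

As in the λ-RTL version, the result is computed once and shared between the assignment and the flag computation.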


Here’s the final definition of 42 Pentium instructions, including their effects on the flags. We have re-cast the first three groups so we can apply logical. Because Reg.AL represents a single location, not a triple, we use (\x.x,\x.x) (a pair of identity functions) as a “size selector.” The same trick applies to Reg.AX and Reg.EAX.

55a 〈junk: instruction defaults 54a〉+≡ ⊳ 54a

[AND OR XOR]^iAL (i8) is

logical bin’ (Reg.AL <== Reg.AL <[and or xor]> i8) (\x.x,\x.x)

[AND OR XOR]^iAX (i16) is

logical bin’ (Reg.AX <== Reg.AX <[and or xor]> i16) (\x.x,\x.x)

[AND OR XOR]^iEAX (i32) is

logical bin’ (Reg.EAX <== Reg.EAX <[and or xor]> i32) (\x.x,\x.x)

[AND OR XOR]^i^[b w d](addr, i) is

logical bin’ (addr <== addr <[and or xor]> I i) [b w d]

[AND OR XOR]^i^[ow od]^b (addr, i) is

logical bin’ (addr <== addr <[and or xor]> I (sx i)) [ w d]

[AND OR XOR]^mr^[b ow od] (addr, reg) is

logical bin’ (addr <== addr <[and or xor]> R reg) [b w d]

[AND OR XOR]^rm^[b ow od] (addr, reg) is

logical bin’ (R reg <== R reg <[and or xor]> addr) [b w d]

Finally, we can try for aggressive factoring without the infix notation. The function llr’ is curried, so that applications won’t be cluttered with parentheses and commas, and it takes an extra argument for an additional effect.

55b 〈utilities 49a〉+≡ (46a) ⊳ 54d 55e ⊲

fun llr’ l op r (sl, sr) extra is sl l := op(sl l, sr r) | extra (op(sl l, sr r))

fun logical’ l op r size is llr’ l op r size logical_flags

val idsize is (\x.x, \x.x)

55c 〈instruction defaults 55c〉≡ (46a) 55d ⊲

[AND OR XOR]^iAL (i8) is logical’ Reg.AL [and or xor] i8 idsize

[AND OR XOR]^iAX (i16) is logical’ Reg.AX [and or xor] i16 idsize

[AND OR XOR]^iEAX (i32) is logical’ Reg.EAX [and or xor] i32 idsize

[AND OR XOR]^i^[b w d](addr, i) is logical’ addr [and or xor] (I i) [b w d]

[AND OR XOR]^i^[ow od]^b (addr, i) is logical’ addr [and or xor] (Is i) [w d]

[AND OR XOR]^mr^[b ow od] (addr, reg) is logical’ addr [and or xor] (R reg) [b w d]

[AND OR XOR]^rm^[b ow od] (addr, reg) is logical’ (R reg) [and or xor] addr [b w d]

The remaining logical instruction, NOT, comes in only three variants, and it does not affect the condition codes. Even though NOT is unary, we still pass a pair of size functions, so that we can reuse b, w, and d as defined above.

55d 〈instruction defaults 55c〉+≡ (46a) ⊳ 55c 55f ⊲

NOT^[b ow od] (addr) is unary (com, addr) [b w d]

55e 〈utilities 49a〉+≡ (46a) ⊳ 55b 56a ⊲

fun unary (op, v) (size, _) is size v := op (size v)

Load effective address

55f 〈instruction defaults 55c〉+≡ (46a) ⊳ 55d 56b ⊲

LEAod (reg, Mem) is regd reg := Mem

LEAow (reg, Mem) is regw reg := Mem @bits[16 bits at 0]


6.2 Real instructions, for serious

Now following the SLED spec, more or less.

The arithmetic group is the first challenge. We’ve already done the logical instructions. The utilities below compute the overflow and adjust (auxiliary carry) conditions for addition and set the flags accordingly.

56a 〈utilities 49a〉+≡ (46a) ⊳ 55e 57a ⊲

fun add_overflows (x, y, c) is

let val {result, carry} is add(x, y, c)

in bit (x < 0) = bit(y < 0) conjoin (bit(x < 0) <> bit(result < 0))

end

fun add_adjust carry_in x y is

StdOperators.carry(x@bits[0..3], y@bits[0..3], carry_in)

fun allr l op r (sl, sr) extra is sl l := op(sl l, sr r) | extra (sl l, sr r)

fun add’ carry_in adj (x, y) is

let val {result, carry} is add(x, y, carry_in)

in set_flags { result is result, a is adj x y , c is carry,

o is bit(add_overflows (x, y, carry_in)) }

end

fun adc (x, y) is (add(x, y, CF)).result

val adcx is add’ CF (add_adjust CF)

val addx is add’ 0 (add_adjust 0)
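
For readers who want to check the flag conditions numerically, the following Python sketch computes the result, carry, adjust, and overflow bits of a width-bit addition with carry in. The function name add_with_flags and the explicit width parameter are assumptions of the sketch; the overflow test is the same sign comparison used in add_overflows, and the adjust bit is the carry out of the low four bits, as in add_adjust.

```python
def add_with_flags(x: int, y: int, carry_in: int, width: int):
    """Return (result, flags) for x + y + carry_in on width-bit operands."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    x &= mask
    y &= mask
    total = x + y + carry_in
    result = total & mask
    carry = total >> width                                # carry out of the top bit
    adjust = ((x & 0xF) + (y & 0xF) + carry_in) >> 4      # carry out of bit 3
    same_sign = (x & sign) == (y & sign)
    # Signed overflow: operands agree in sign, result disagrees with them.
    overflow = int(same_sign and (x & sign) != (result & sign))
    return result, {"CF": carry, "AF": adjust, "OF": overflow}
```

Note that carry and overflow are independent: 0x7F + 0x01 overflows without a carry, while 0xFF + 0x01 carries without overflowing.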

56b 〈instruction defaults 55c〉+≡ (46a) ⊳ 55f 56c ⊲

ADC^iAL (i8) is allr Reg.AL adc i8 idsize adcx

ADC^iAX (i16) is allr Reg.AX adc i16 idsize adcx

ADC^iEAX (i32) is allr Reg.EAX adc i32 idsize adcx

ADC^i^[b w d](addr, i) is allr addr adc (I i) [b w d] adcx

ADC^i^[ow od]^b (addr, i) is allr addr adc (Is i) [w d] adcx

ADC^mr^[b ow od] (addr, reg) is allr addr adc (R reg) [b w d] adcx

ADC^rm^[b ow od] (addr, reg) is allr (R reg) adc addr [b w d] adcx

Add is nearly identical, but there is no carry in; add still carries out.

56c 〈instruction defaults 55c〉+≡ (46a) ⊳ 56b 57b ⊲

ADD^iAL (i8) is allr Reg.AL (+) i8 idsize addx

ADD^iAX (i16) is allr Reg.AX (+) i16 idsize addx

ADD^iEAX (i32) is allr Reg.EAX (+) i32 idsize addx

ADD^i^[b w d](addr, i) is allr addr (+) (I i) [b w d] addx

ADD^i^[ow od]^b (addr, i) is allr addr (+) (Is i) [w d] addx

ADD^mr^[b ow od] (addr, reg) is allr addr (+) (R reg) [b w d] addx

ADD^rm^[b ow od] (addr, reg) is allr (R reg) (+) addr [b w d] addx


Add and subtract, with and without carry or borrow, and also cmp, all fit mostly the same template:

l ← l ± (r + c) | set-flags.

The bit c is either the CF bit or zero. Compare is a subtract that does not store the result.

57a 〈utilities 49a〉+≡ (46a) ⊳ 56a 58c ⊲

fun sub_overflows (x, y, c) is

let val {result, borrow} is subtract(x, y, c)

in bit (x < 0) = bit(y >= 0) conjoin (bit(x < 0) <> bit(result < 0))

end

fun sub_adjust carry_in x y is

StdOperators.borrow(x@bits[0..3], y@bits[0..3], carry_in)

fun sub’ asgn l r c (sl, sr) is

let val {result, borrow} is subtract(sl l, sr r, c)

in asgn --> sl l := result |

set_flags {result is result, a is sub_adjust c (sl l) (sr r), c is borrow,

o is bit(sub_overflows(sl l, sr r, c))}

end

57b 〈instruction defaults 55c〉+≡ (46a) ⊳ 56c 57c ⊲

-- arith is ADD ADC AND XOR OR SBB SUB CMP

[SUB SBB]^iAL (i8) is sub’ true Reg.AL i8 [0 CF] idsize

[SUB SBB]^iAX (i16) is sub’ true Reg.AX i16 [0 CF] idsize

[SUB SBB]^iEAX (i32) is sub’ true Reg.EAX i32 [0 CF] idsize

[SUB SBB]^i^[b w d](addr, i) is sub’ true addr (I i) [0 CF] [b w d]

[SUB SBB]^i^[ow od]^b (addr, i) is sub’ true addr (Is i) [0 CF] [w d]

[SUB SBB]^mr^[b ow od] (addr, reg) is sub’ true addr (R reg) [0 CF] [b w d]

[SUB SBB]^rm^[b ow od] (addr, reg) is sub’ true (R reg) addr [0 CF] [b w d]

CMP^iAL (i8) is sub’ false Reg.AL i8 0 idsize

CMP^iAX (i16) is sub’ false Reg.AX i16 0 idsize

CMP^iEAX (i32) is sub’ false Reg.EAX i32 0 idsize

CMP^i^[b w d](addr, i) is sub’ false addr (I i) 0 [b w d]

CMP^i^[ow od]^b (addr, i) is sub’ false addr (Is i) 0 [w d]

CMP^mr^[b ow od] (addr, reg) is sub’ false addr (R reg) 0 [b w d]

CMP^rm^[b ow od] (addr, reg) is sub’ false (R reg) addr 0 [b w d]

And now, Alphabetman.

57c 〈instruction defaults 55c〉+≡ (46a) ⊳ 57b 57d ⊲

AAA is

if and(Reg.AL, 0xf) > 9 disjoin AF = 1 then

Reg.AL := and(Reg.AL + 6, 0xf) | Reg.AH := Reg.AH + 1 | AF := 1 | CF := 1

else

Reg.AL := and(Reg.AL, 0xf) | AF := 0 | CF := 0

fi
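
The AAA definition above transcribes directly into Python, which makes it easy to try on sample values. The function name aaa and the dictionary of updated locations are assumptions of this sketch; the 8-bit wraparound of AH is made explicit with a mask.

```python
def aaa(al: int, ah: int, af: int) -> dict:
    """ASCII adjust after addition, following the two branches of the definition."""
    if (al & 0xF) > 9 or af == 1:
        return {"AL": (al + 6) & 0xF, "AH": (ah + 1) & 0xFF, "AF": 1, "CF": 1}
    return {"AL": al & 0xF, "AH": ah, "AF": 0, "CF": 0}
```

For instance, adding the unpacked BCD digits 9 and 8 leaves 0x11 in AL with AF set; the adjustment produces AH = 1 and AL = 7, that is, decimal 17.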

N.B. At least on the Pentium Pro, these instructions can be used with 8-bit immediate values other than 10. The specification should be changed.

57d 〈instruction defaults 55c〉+≡ (46a) ⊳ 57c 58a ⊲

AAD is Reg.AL := (mulu (Reg.AH, 10))@bits[0..7] + Reg.AL | Reg.AH := 0

AAM is Reg.AH := Reg.AL divu 10 | Reg.AL := Reg.AL modu 10


58a 〈instruction defaults 55c〉+≡ (46a) ⊳ 57d 58b ⊲

AAS is

if and(Reg.AL, 0xf) > 9 disjoin AF = 1 then

Reg.AL := and(Reg.AL - 6, 0xf) | Reg.AH := Reg.AH - 1 | AF := 1 | CF := 1

else

Reg.AL := and(Reg.AL, 0xf) | AF := 0 | CF := 0

fi

58b 〈instruction defaults 55c〉+≡ (46a) ⊳ 58a 58d ⊲

ARPL (addr, r) is

if addr.w@bits[0..1] < (regw r)@bits[0..1] then

ZF := 1 | addr.w@loc[0..1] := (regw r)@bits[0..1]

else

ZF := 0

fi

In the bounds check, size and offset must be different because size is a 32-bit value and offset may be 16 or 32 bits.

58c 〈utilities 49a〉+≡ (46a) ⊳ 57a 58e ⊲

fun bound index addr size offset is

let val (lower, upper) is ($m[addr], $m[addr+size])

in index < lower disjoin index > (upper + offset) --> Trap.exn Trap.BR

end

58d 〈instruction defaults 55c〉+≡ (46a) ⊳ 58b 58g ⊲

BOUNDow (r, Mem) is bound (regw r) Mem 2 2

BOUNDod (r, Mem) is bound (regd r) Mem 4 4

set_zf sets ZF only and leaves other flags undefined. set_cf is similar.

58e 〈utilities 49a〉+≡ (46a) ⊳ 58c 59a ⊲

fun set_zf z is ZF := z | SF := ? | PF := ? | OF := ? | AF := ? | CF := ?

fun set_cf z is CF := z | SF := ? | PF := ? | OF := ? | AF := ? | ZF := ?

The RTL operator lsbset n finds the smallest k such that bit k is set in n (i.e., n and 1 << k <> 0). The operator msbset works similarly but for the most significant bit.

58f 〈new RTL operators 58f〉≡ (46a)

rtlop [lsbset msbset] : #n bits -> #k bits
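
As a reference point for what these two operators are meant to compute, here is a small Python sketch. It reuses the names lsbset and msbset for convenience, but the definitions are ours; like BSF and BSR, they are only meaningful for a nonzero argument, which the assertions enforce.

```python
def lsbset(n: int) -> int:
    """Index of the least significant set bit of n (n must be positive)."""
    assert n > 0
    return (n & -n).bit_length() - 1   # n & -n isolates the lowest set bit


def msbset(n: int) -> int:
    """Index of the most significant set bit of n (n must be positive)."""
    assert n > 0
    return n.bit_length() - 1
```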

58g 〈instruction defaults 55c〉+≡ (46a) ⊳ 58d 58h ⊲

-- this desperately needs foreach

[BSF BSR]^[od ow] (reg, addr) is

(\scan. \(sl, sr) .

if sr addr = 0 then set_zf 1 | sl (R reg) := ?

else set_zf 0 | sl (R reg) := scan (sr addr)

fi

) [lsbset msbset] [d w]

58h 〈instruction defaults 55c〉+≡ (46a) ⊳ 58g 59c ⊲

BSWAP (reg) is

let fun b k is (regd reg)@bits[8 bits at k]

fun ins lsb byte word is bitInsert {lsb is lsb, wide is word} byte

in regd reg := ins 0 (b 24) (ins 8 (b 16) (ins 16 (b 8) (ins 24 (b 0) 0)))

end
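
The byte shuffling in BSWAP can be checked with a small Python sketch (ours, not part of the specification). The function name bswap32 is an assumption; the loop mirrors the four ins steps above by moving the byte found at bit offset k to bit offset 24 - k.

```python
def bswap32(value: int) -> int:
    """Reverse the order of the four bytes of a 32-bit value."""
    value &= 0xFFFFFFFF
    result = 0
    for k in (0, 8, 16, 24):
        byte = (value >> k) & 0xFF        # byte currently at bit offset k
        result |= byte << (24 - k)        # goes to the mirrored position
    return result
```

Applying the function twice returns the original value, which is a convenient sanity check on any definition of byte swapping.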


The location of a bit is according to the computations in the manual (Table 11-3 in the Pentium Pro manual). bitAt touches only a byte; the others touch words or doublewords.

59a 〈utilities 49a〉+≡ (46a) ⊳ 58e 59b ⊲

fun bitAt {base, offset} is

$m[base + offset div 8]@loc[1 bit at offset mod 8]

fun bit16At {base, offset} is

($m[base + 2 * (offset div 16)] : #16 loc)@loc[1 bit at offset mod 16]

fun bit32At {base, offset} is

($m[base + 4 * (offset div 32)] : #32 loc)@loc[1 bit at offset mod 32]
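
The address arithmetic in these three functions can be tried out in Python. The sketch below returns the pair (address of the containing cell, bit position within it) rather than a location; the function names are ours. It assumes that div and mod round toward negative infinity, which is how Python’s // and % behave; whether λ-RTL’s div and mod agree for negative offsets is not stated here.

```python
def bit_at(base: int, offset: int):
    """Byte-granular: (byte address, bit index 0..7)."""
    return base + offset // 8, offset % 8


def bit16_at(base: int, offset: int):
    """Word-granular: (address of the 16-bit word, bit index 0..15)."""
    return base + 2 * (offset // 16), offset % 16


def bit32_at(base: int, offset: int):
    """Doubleword-granular: (address of the 32-bit word, bit index 0..31)."""
    return base + 4 * (offset // 32), offset % 32
```

With floor rounding, a bit offset of -1 names the top bit of the cell just below base, so negative offsets address memory below the base address.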

There are a bunch of things we could do to a bit.

59b 〈utilities 49a〉+≡ (46a) ⊳ 59a

fun bt bit is set_cf bit

fun btc bit is set_cf bit | bit := com bit

fun btr bit is set_cf bit | bit := 0

fun bts bit is set_cf bit | bit := 1

Perhaps surprisingly, when the right operand is immediate, only the low-order 3 or 5 bits are used (according to the processor manual). The number 3 (@bits[0..2]) is particularly alarming; we should test to see if the fourth bit is used in word mode.

59c 〈instruction defaults 55c〉+≡ (46a) ⊳ 58h

-- [BT BTC BTR BTS]^ow (Reg(l), r) is [bt btc btr bts] ((regw l)@loc[1 bit at (regw r)@bits[0..3]])

-- [BT BTC BTR BTS]^od (Reg(l), r) is [bt btc btr bts] ((regd l)@loc[1 bit at (regd r)@bits[0..4]])

-- [BT BTC BTR BTS]^ow (E(a), r) is [bt btc btr bts] (bit16At {base is a, offset is regw r})

-- [BT BTC BTR BTS]^od (E(a), r) is [bt btc btr bts] (bit32At {base is a, offset is regd r})

-- [BT BTC BTR BTS]^iow (Reg(l), i8) is [bt btc btr bts] ((regw l)@loc[1 bit at i8@bits[0..3]])

-- [BT BTC BTR BTS]^iod (Reg(l), i8) is [bt btc btr bts] ((regd l)@loc[1 bit at i8@bits[0..4]])

-- [BT BTC BTR BTS]^iow (E(a), i8) is [bt btc btr bts] (bit16At {base is a, offset is i8@bits[0..2]})

-- [BT BTC BTR BTS]^iod (E(a), i8) is [bt btc btr bts] (bit32At {base is a, offset is i8@bits[0..4]})

Chapter 7

Describing the MIPS R?000

This note uses λ-RTL to specify the MIPS instruction set. Page numbers listed here come from the MIPS R4000 reference manual by Joe Heinrich.

7.1 Overview

The MIPS is a RISC architecture. Like most such architectures, it uses loads and stores and register-to-register instructions. It uses delayed branches and delayed loads. Unlike the SPARC, it has (almost) no condition codes. Users may either test and branch (see Section 7.3) or use register set instructions to capture the effect of a test (see Section 7.3). The one condition code the MIPS does have is Condition, which is associated with the floating point coprocessor, as described in Section ??.

7.2 Storage spaces

Storage locations manipulated by MIPS instructions include memory and several kinds of registers. Memory in the MIPS is byte addressed. It uses either big-endian or little-endian byte order. This is selectable by the user. We will illustrate big-endian.

60a 〈MIPS basics 60a〉≡ (70b) 60b ⊲

storage

’m’ is cells of 8 bits called "memory" aggregate using RTL.AGGB

Integer Registers

The machine’s registers are modeled as a simple collection of mutable cells. The aggregation attribute means that registers can be fetched and stored in chunks larger than one register:

60b 〈MIPS basics 60a〉+≡ (70b) ⊳ 60a 61b ⊲

storage

’r’ is 32 cells of 32 bits called "registers"

〈MIPS register fetch and store methods 61a〉

aggregate using RTL.AGGB



On the MIPS, register 0 is hardwired to the value zero. Dealing with register 0 is fairly simple; there are only two reasonable models. One is the full semantics; the other is a model which specifies only that register 0 is immutable. We make register 0 immutable by ensuring that attempts to store into it have no effect. The full semantics also shows that fetches return 0.

61a 〈MIPS register fetch and store methods 61a〉≡ (60b)

fetch using \n. if n = 0 then 0 else RTL.TRUE_FETCH ($r[n]) fi

store using \(n, v). n <> 0 --> RTL.TRUE_STORE ($r[n], v)
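
The effect of these fetch and store methods can be mimicked with a small register-file class. The Python sketch below is an analogy only; the class name RegisterFile and its method names are assumptions, and the underlying list plays the role of the “true” fetch and store.

```python
class RegisterFile:
    """32 registers of 32 bits in which register 0 always reads as zero."""

    def __init__(self):
        self._cells = [0] * 32

    def fetch(self, n: int) -> int:
        # Full semantics: a fetch from register 0 yields 0 by definition.
        return 0 if n == 0 else self._cells[n]

    def store(self, n: int, value: int) -> None:
        # A store to register 0 is guarded away and has no effect.
        if n != 0:
            self._cells[n] = value & 0xFFFFFFFF
```

Guarding the store alone would already make register 0 immutable; the special fetch additionally records that its value is zero.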

In this straightforward model of registers, we use these special fetch and store methods, and we also permit registers to be aggregated into pairs, so we can describe instructions like .

Integer Unit control/status registers

This specification gives the names of the special registers attached to the integer unit. CR is the Config Register. The BE field of CR indicates whether the kernel and memory are big-endian or little-endian. The Status Register (SR) is also of interest because of the RE field. RE (Reverse-endianness) inverts the user-mode endianness (see page 110).

61b 〈MIPS basics 60a〉+≡ (70b) ⊳ 60b 61c ⊲

storage ’i’ is 6 cells of 32 bits called "Integer-unit control/status register"

locations

[CR SR HI LO PC nPC] is $i[[0..5]]

[RE] is SR@loc[1 bit at [25]]

[BE] is CR@loc[1 bit at [15]]

Aggregating cells in storage spaces

A reference to a cell like $m[a] normally means the cell by itself (in this case, a single byte in memory located at address a). But sometimes we want to talk about a 32-bit word at address a, or even a 16-bit halfword, a doubleword, or a quad-word. Often, λ-RTL can figure out what is meant just from the context in which $m[a] is used, but when it can’t, an explicit cast can be used to make the size explicit, thus: $m[a] : #32 bits. Because the syntax of the casts can be awkward, we define functions that do the casting.

61c 〈MIPS basics 60a〉+≡ (70b) ⊳ 61b 61d ⊲

fun byte x is x : #8 bits

fun hword x is x : #16 bits

fun word x is x : #32 bits

fun dword x is x : #64 bits

We can also use some casts for floating point.

61d 〈MIPS basics 60a〉+≡ (70b) ⊳ 61c 67a ⊲

fun single x is x : #32 bits

fun double x is x : #64 bits

Given these functions, the result of, e.g., dword is guaranteed to be a 64-bit value, even if dword is applied to an 8-bit byte in memory.


7.3 MIPS instructions

Types of MIPS operands

These types have been gleaned automatically straight from the SLED specification.

62a 〈address modes and operands 62a〉≡ (70b) 62b ⊲

operand [ base fd fs ft rd rs rt shamt]: #5 bits

operand offset : #16 bits

operand reloc : #32 bits

operand breakcode: #20 bits

MIPS addresses and operands

The MIPS has a simple structure for effective addresses and operands. As far as the hardware is concerned, there is only one addressing mode: the address is the value of register base plus the sign-extended 16-bit offset.

62b 〈address modes and operands 62a〉+≡ (70b) ⊳ 62a

fun dispA (base, offset) is $r[base] + sx offset

Load instructions

The specifications of the load-integer instructions give us our first opportunity to factor λ-RTL descriptions. On the left-hand side, the phrases [s u] and [b h] are like for loops, and the carets (^) join parts of constructor names, so the opcode on the left-hand side expands to the list of constructors lb, lh and lw. Corresponding to [b h w] is the expression group [byte hword word]. These functions, defined above, produce aggregations of 8, 16 and 32 bits.

62c 〈instruction defaults 62c〉≡ (70b) 63a ⊲


lw (rt, offset, base) is $r[rt] := $m[dispA(base,offset)]

l^[b h] (rt, offset, base) is $r[rt] := sx ([byte hword] $m[dispA(base, offset)])

Loading unaligned words

“Load word left instruction adds the sign-extended offset to the contents of base to form the virtual address. It reads bytes only from the doubleword in memory which contains the specified byte.” We implement this by calculating a virtual address vaddr, aligning it by dropping the two least significant bits of the address (alignAddr) and making an aligned fetch into memword:

62d 〈address twiddling and fetch 62d〉≡ (63)

let val vaddr is $r[base] + sx offset

val memword is $m[and(vaddr, com 3)] -- word containing address

Then, we use the bitTransfer operator to insert the addressed bytes into the register word. The source (src) is, of course, the fetched word. The number of bytes to transfer is Byte, which is derived from the two low-order bits (tByte) of the address. The offset into the fetched word is calculated in lsbit:

62e 〈number of bytes to shift 62e〉≡ (63a)

val tByte is vaddr@bits[0..1] -- pick out the two lsb’s of the address

val Byte is xor(tByte,3)

val lsbit is shl(Byte,3) -- Byte*8


The destination is target register $r[rt].

63a 〈instruction defaults 62c〉+≡ (70b) ⊳ 62c 63c ⊲

lwl(rt,offset,base) is

〈address twiddling and fetch 62d〉〈number of bytes to shift 62e〉

in

$r[rt] := bitTransfer{src is {value is memword,lsb is 0},

dst is {value is $r[rt] ,lsb is lsbit},

width is 7 + lsbit}

end

Load-word right, in combination with load-word left, allows for the load of an unaligned word. It differs from the lwl instruction in that the information is “right aligned” in the register.

63b 〈number of bytes to shift into right 63b〉≡ (63c)



val lsbit is shl(Byte,3)

63c 〈instruction defaults 62c〉+≡ (70b) ⊳ 63a 63d ⊲

lwr(rt,offset,base) is

〈address twiddling and fetch 62d〉〈number of bytes to shift into right 63b〉

in

$r[rt] := bitTransfer{src is {value is memword,lsb is 32-lsbit},

dst is {value is $r[rt] ,lsb is 0},

width is 32- lsbit}

end
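
The intended behavior of the lwl/lwr pair may be easier to see in executable form. The following Python sketch follows our reading of the big-endian LWL and LWR descriptions in the MIPS manual rather than the bitTransfer formulation above, so it should be taken as a cross-check, not as a transcription. The function names and the use of a bytes object for memory are assumptions.

```python
def aligned_word(memory: bytes, vaddr: int) -> int:
    """Fetch the aligned big-endian 32-bit word that contains vaddr."""
    base = vaddr & ~3
    return int.from_bytes(memory[base:base + 4], "big")


def lwl(reg: int, memory: bytes, vaddr: int) -> int:
    """Load word left: fill the high-order register bytes from vaddr onward."""
    shift = 8 * (vaddr & 3)
    keep = (1 << shift) - 1                       # low-order bytes left unchanged
    word = aligned_word(memory, vaddr)
    return ((word << shift) & 0xFFFFFFFF) | (reg & keep)


def lwr(reg: int, memory: bytes, vaddr: int) -> int:
    """Load word right: fill the low-order register bytes up to vaddr."""
    b = vaddr & 3
    loaded = (1 << (8 * (b + 1))) - 1             # low-order bytes that are replaced
    word = aligned_word(memory, vaddr)
    return (reg & ~loaded & 0xFFFFFFFF) | (word >> (8 * (3 - b)))
```

To load an unaligned word at address a, one applies lwl at a and then lwr at a + 3; together the two instructions overwrite all four bytes of the register.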

Store Instructions

Stores work on bytes, halfwords and words. They can also be used to store unaligned data, but it must be done in two parts.

It is tempting to try to factor the number of bits [8 16 32] in the store instructions, but this does not parse. Notice the use of bit extraction to show what part of the register is being used.

63d 〈instruction defaults 62c〉+≡ (70b) ⊳ 63c 64a ⊲

sb(rt,offset,base) is $m[dispA(base,offset)] := $r[rt]@bits[8 bits at 0]

sh(rt,offset,base) is $m[dispA(base,offset)] := $r[rt]@bits[16 bits at 0]

sw(rt,offset,base) is $m[dispA(base,offset)] := $r[rt]


Storing unaligned words

These are the store analogues to the unaligned loads.

64a 〈instruction defaults 62c〉+≡ (70b) ⊳ 63d 65c ⊲

swr (rt, offset, base) is

let val vaddr is ($r[base] + sx offset)

val memword is $m[and(vaddr, com 3)]




val lssrcbit is shl(tByte,3) -- tByte*8

in

memword := bitTransfer{src is {value is $r[rt], lsb is 0},

dst is {value is memword,lsb is lsbit},

width is 8 + lssrcbit}

end

swl (rt, offset, base) is

let val vaddr is $r[base] + sx offset

val memword is $m[and(vaddr, com 3)]




val lssrcbit is shl(tByte,3) -- tByte*8

in

memword := bitTransfer{src is {value is $r[rt], lsb is lssrcbit},

dst is {value is memword, lsb is 0},

width is 8 + lsbit}

end

Arithmetic

Adds and subtracts can be on signed and unsigned numbers and can be on immediates, too. When an overflow occurs during adds and subtracts, the result is undefined. The function no_overflow checks whether there is an overflow and returns true if there is none.

64b 〈mips utilities [[full]] 64b〉≡

fun op_overflows (op,x,y) is

let val result is op(x, y)

in x@bits[31] = y@bits[31] andalso x@bits[31] <> result@bits[31]

end

fun no_overflow(op,x,y) is

let val result is op(x,y)

in (x@bits[31] <> y@bits[31] orelse x@bits[31] = result@bits[31])

end



65a 〈mips utilities [[simple]] 65a〉≡

local rtlop mips_no_overflow : #32 bits * #32 bits -> bool

in fun no_overflow (_, x, y) is mips_no_overflow (x, y)

end


65b 〈mips utilities 65b〉≡ (70b) 66a ⊲

fun op_function(op, x,y,z,unsigned) is

unsigned orelse no_overflow(op,y,z) --> x := op(y, z)
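
The guarded assignment in op_function can be imitated in Python as follows. This is a sketch under stated assumptions: registers are a dictionary, and no_overflow is computed by doing the operation exactly on signed integers and checking the 32-bit range, a deliberately different technique from the sign-bit comparison in the “full” utilities. The range check has the advantage of being correct for subtraction as well as addition, whereas a sign test written for addition would need its operand condition reversed for subtraction.

```python
from operator import add, sub


def to_signed32(value: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def no_overflow(op, y: int, z: int) -> bool:
    """True when op(y, z) fits in a signed 32-bit word."""
    exact = op(to_signed32(y), to_signed32(z))
    return -(1 << 31) <= exact <= (1 << 31) - 1


def op_function(op, regs: dict, dest: int, y: int, z: int, unsigned: bool) -> None:
    """Assign op(y, z) to regs[dest] unless a signed operation overflows."""
    if unsigned or no_overflow(op, y, z):
        regs[dest] = op(y, z) & 0xFFFFFFFF
```

When the guard fails, the destination is simply left untouched, which corresponds to the guarded effect having no action.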

65c 〈instruction defaults 62c〉+≡ (70b) ⊳ 64a 65d ⊲

[sub add]^["" u] (rd,rs,rt) is

op_function([(-) (+)],$r[rd],$r[rs],$r[rt],[false true])

addi^["" u] (rt,rs,offset) is

op_function((+),$r[rt],$r[rs],sx offset,[false true])

Multiply and Divide

HI and LO registers

The MIPS has a special register pair called HI and LO. These registers are used for multiplication and division. Special move instructions (mfhi, mflo, mthi, mtlo) are used to read and write the registers. Parenthetically, these instructions have special scheduling attributes. mflo and mfhi cannot be immediately followed by instructions that modify LO or HI, respectively. In general, “correct operation requires separating reads of HI and LO by two or more instructions.” (A-57).

65d 〈instruction defaults 62c〉+≡ (70b) ⊳ 65c 65e ⊲

mf^[hi lo] (rd) is $r[rd] := [HI LO]

mt^[hi lo] (rs) is [HI LO] := $r[rs]

multiplies and divides

We have both signed and unsigned versions:

65e 〈instruction defaults 62c〉+≡ (70b) ⊳ 65d 66b ⊲

mult(rs,rt) is

let val temp is mul($r[rs],$r[rt])

in

HI := (temp: #64 bits)@bits[32 bits at 32]| LO := temp@bits[32 bits at 0]

end

multu(rs,rt) is

let val temp is mulu($r[rs],$r[rt])

in

HI := (temp: #64 bits)@bits[32 bits at 32] | LO := temp@bits[32 bits at 0]

end

div(rs,rt) is

LO := $r[rs] div $r[rt] | HI := $r[rs] mod $r[rt]

divu(rs,rt) is

LO := $r[rs] divu $r[rt] | HI := $r[rs] modu $r[rt]
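
The way the multiply definitions above split a 64-bit product across HI and LO, and how the signed and unsigned variants differ, can be seen in a short Python sketch. The function names and the dictionary result are assumptions made for the illustration.

```python
def to_signed32(value: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def multiply(rs: int, rt: int, signed: bool) -> dict:
    """Return the HI and LO halves of the 64-bit product of two 32-bit words."""
    if signed:
        rs, rt = to_signed32(rs), to_signed32(rt)
    else:
        rs, rt = rs & 0xFFFFFFFF, rt & 0xFFFFFFFF
    product = (rs * rt) & 0xFFFFFFFFFFFFFFFF      # 64-bit two's-complement pattern
    return {"HI": product >> 32, "LO": product & 0xFFFFFFFF}
```

The low half is the same for both variants; only HI depends on whether the operands are read as signed or unsigned.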


Logical Instructions

66a 〈mips utilities 65b〉+≡ (70b) ⊳ 65b

fun nor(x,y) is com (or(x,y))

66b 〈instruction defaults 62c〉+≡ (70b) ⊳ 65e 66c ⊲

[and or xor nor](rd,rs,rt) is $r[rd] := [and or xor nor]($r[rs],$r[rt])

[andi ori xori ](rt,rs,offset) is $r[rt] := [and or xor ]($r[rs],zx offset)

Shift Instructions

66c 〈instruction defaults 62c〉+≡ (70b) ⊳ 66b 66d ⊲

[sll srl sra] (rd,rt,shamt) is $r[rd] := [shl shrl shra] ($r[rt],shamt)

[sll srl sra]^v (rd,rt,rs) is $r[rd] := [shl shrl shra] ($r[rt],$r[rs]@bits[0..4])

Set instructions

The set instructions set register rd to 1 if register rs is less than register rt, and to 0 otherwise. There are signed and unsigned versions and corresponding immediate versions, too:


[slt sltu] (rd,rs,rt) is $r[rd] := zx (bit ([(<) ltu] ($r[rs], $r[rt])))

[slt sltu]^i (rt,rs,offset) is $r[rt] := zx (bit ([(<) ltu] ($r[rs], sx offset)))

Load upper immediate

Like the SETHI instruction on the SPARC. We shift the immediate left 16 bits, which concatenates it with 16 bits of zeros.

66e 〈instruction defaults 62c〉+≡ (70b) ⊳ 66d 66f ⊲

lui(rt,offset) is $r[rt] := shl(zx offset, 16)

Jump and branch instructions

66f 〈instruction defaults 62c〉+≡ (70b) ⊳ 66e 68a ⊲

jr(rs) is nPC := $r[rs]

j(reloc) is nPC := reloc

jalr(rd,rs) is nPC := $r[rs] | $r[rd] := PC

jal(reloc) is nPC := reloc | $r[31] := PC

b^[le gt lt ge eq]^z (rs,reloc) is

[(<=) (>) (<) (>=) (=)] ($r[rs],0 ) --> nPC := reloc

b^[eq ne] (rs,rt,reloc) is

[(=) (<>)] ($r[rs],$r[rt]) --> nPC := reloc

bltzal (rs, reloc) is $r[rs] < 0 --> (nPC := reloc | $r[31] := PC)

bgezal (rs, reloc) is $r[rs] >= 0 --> (nPC := reloc | $r[31] := PC)


7.4 Floating Point

Architecture

The MIPS FPU has 32 registers.

67a 〈MIPS basics 60a〉+≡ (70b) ⊳ 61d 67b ⊲

storage ’f’ is 32 cells of 32 bits called "Floating-Point Register"


The MIPS FPU has 32 control registers of which only FCR0 and FCR31 are usable.

67b 〈MIPS basics 60a〉+≡ (70b) ⊳ 67a 67c ⊲

storage ’c’ is 32 cells of 32 bits called "Floating-Point Control Registers"

locations

[FCR0 FCR31] is $c[[0 31]]

Following the SPARC supplement, we will put the floating-point condition codes in their own module.

67c 〈MIPS basics 60a〉+≡ (70b) ⊳ 67b 67d ⊲

module pcc is

locations [FS C Cause Enables Flags ] is FCR31@loc[1 bits at [24 23 12 7 2 ]]

locations RM is FCR31@loc[2 bits at 0]

end

val Condition is pcc.C

val RM is pcc.RM

There are signaling and non-signaling versions

compare

non-signaling

When a floating point compare is performed, four bits of information are generated: Greater Than, Less Than, Equal, and Unordered. Since comparisons come in two sizes, we parameterize over the comparison function.

67d 〈MIPS basics 60a〉+≡ (70b) ⊳ 67c 69f ⊲

module Fcmp is

fun [f t un or eq neq ueq ogl olt uge ult oge ole ugt ule ogt] cmp (fs,ft) is

let val or is (orelse)

infixl 2 or

val results is cmp ($f[fs], $f[ft])

val [L E G U] is

results = [(IEEE754.Comparison.float_lt) (IEEE754.Comparison.float_eq)

(IEEE754.Comparison.float_gt)(IEEE754.Comparison.unordered)]

in

Condition := bit [false (G or L or E or U) U (G or L or E)

E (G or L or U) (E or U) (G or L) L (G or E or U)

(L or U) (G or E) (L or E) (G or U) (L or E or U) G]

end

end
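As an illustration only, the table of sixteen predicates can be sketched in Python. This is not part of the λ-RTL specification: the names `compare` and `condition` are hypothetical, Python floats stand in for the IEEE754 operators, and the sketch assumes that `uge` is the complement of `olt`.

```python
import math

# Predicate name -> the outcomes (L, E, G, U) under which it is true.
PREDICATES = {
    "f": "", "t": "LEGU", "un": "U", "or": "LEG",
    "eq": "E", "neq": "LGU", "ueq": "EU", "ogl": "LG",
    "olt": "L", "uge": "GEU", "ult": "LU", "oge": "GE",
    "ole": "LE", "ugt": "GU", "ule": "LEU", "ogt": "G",
}

def compare(x: float, y: float) -> str:
    """Return exactly one of 'L', 'E', 'G', 'U' for the pair (x, y)."""
    if math.isnan(x) or math.isnan(y):
        return "U"          # unordered: at least one operand is a NaN
    if x < y:
        return "L"
    if x > y:
        return "G"
    return "E"

def condition(pred: str, x: float, y: float) -> bool:
    """Value the Condition bit would take for predicate pred."""
    return compare(x, y) in PREDICATES[pred]
```

Because exactly one of the four outcomes holds for any pair of operands, each predicate is just a set of outcomes, which is why the λ-RTL version can be written as a disjunction of L, E, G, and U.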


Note the manual says only the s and d formats are valid for these instructions.

68a 〈instruction defaults 62c〉+≡ (70b) ⊳ 66f 68b ⊲

c^"."^[f t un or eq neq ueq ogl olt uge ult oge ole ugt ule ogt]^"."^[s d] (fs,ft)

is [ Fcmp.f Fcmp.t Fcmp.un Fcmp.or Fcmp.eq Fcmp.neq Fcmp.ueq Fcmp.ogl

Fcmp.olt Fcmp.uge Fcmp.ult Fcmp.oge Fcmp.ole Fcmp.ugt Fcmp.ule Fcmp.ogt

] (fcmp [#32 #64]) (fs,ft)

signaling

There are signaling versions as well. All of them generate an exception if the operands are unordered.

Branching

The MIPS can branch on the truth or falsehood of the Condition field set above.

68b 〈instruction defaults 62c〉+≡ (70b) ⊳ 68a 68c ⊲

bc1t (reloc) is Condition = 1 --> nPC := reloc

bc1f (reloc) is Condition = 0 --> nPC := reloc

Conversions

We can convert from 32- and 64-bit fixed-point integers to single and double floats:

68c 〈instruction defaults 62c〉+≡ (70b) ⊳ 68b 68d ⊲
cvt^"."^[s d]^"."^[w l] (fd,fs) is

$f[fd] := [single double] (i2f( [word dword] $f[fs], RM))

The reverse conversions:

68d 〈instruction defaults 62c〉+≡ (70b) ⊳ 68c 68e ⊲
cvt^"."^[w l]^"."^[s d] (fd,fs) is

$f[fd] := [word dword] (f2i([single double] $f[fs], IEEE754.Round.zero))

We can also convert from doubles to singles, and vice versa:

68e 〈instruction defaults 62c〉+≡ (70b) ⊳ 68d 69a ⊲

cvt^"."^s^"."^d (fd,fs) is $f[fd] := single (f2f(double $f[fs],RM))

cvt^"."^d^"."^s (fd,fs) is $f[fd] := double (f2f(single $f[fs],RM))

Chris Milner reports that these are the correct floating point conversion constructors for the MIPS:

constructor cvt_s_l (fd, fs)

constructor cvt_s_w (fd, fs)

constructor cvt_s_d (fd, fs)

constructor cvt_d_l (fd, fs)

constructor cvt_d_w (fd, fs)

constructor cvt_d_s (fd, fs)

constructor cvt_w_s (fd, fs)

constructor cvt_w_d (fd, fs)

constructor cvt_l_d (fd, fs)

constructor cvt_l_s (fd, fs)


Rounding

Moving to and from the FPU

We can move registers back and forth...

69a 〈instruction defaults 62c〉+≡ (70b) ⊳ 68e 69b ⊲

mtc1 (rt,rd) is $f[rd] := $r[rt]

mfc1 (rt,rd) is $r[rt] := $f[rd]

How about some loads and stores:

69b 〈instruction defaults 62c〉+≡ (70b) ⊳ 69a 69c ⊲

lwc1 (ft,offset,base) is $f[ft] := single $m[$r[base] + sx offset]

"l.d"(ft,offset,base) is $f[ft] := double $m[$r[base] + sx offset]

swc1 (ft,offset,base) is $m[$r[base] + sx offset] := single $f[ft]

"s.d"(ft,offset,base) is $m[$r[base] + sx offset] := double $f[ft]

And the control word, too:

69c 〈instruction defaults 62c〉+≡ (70b) ⊳ 69b 69d ⊲
cfc1(rt,rd) is $r[rt] := $c[rd]

ctc1(rt,rd) is $c[rd] := $r[rt]

Computation

69d 〈instruction defaults 62c〉+≡ (70b) ⊳ 69c 69e ⊲
[add sub div mul]^"."^[s d] (fd,fs,ft) is

$f[fd] := [fadd fsub fdiv fmul] [#32 #64] ($f[fs],$f[ft],RM)

Let’s do those with two arguments

69e 〈instruction defaults 62c〉+≡ (70b) ⊳ 69d 69g ⊲

[abs neg]^"."^[s d](fd,fs) is $f[fd] := [fabs fneg] [#32 #64] $f[fs]

mov^"."^[s d](fd,fs) is $f[fd] := [single double] $f[fs]

7.5 Coprocessor instructions

Several instructions exist to help move, load, and store words to and from the various coprocessors. Coprocessor 0 is the system control processor.

69f 〈MIPS basics 60a〉+≡ (70b) ⊳ 67d

storage ’p’ alias CP0 is 32 cells of 32 bits

called "System Control Coprocessor Registers"

storage ’q’ alias CP2 is 32 cells of 32 bits called "Coprocessor 2"

storage ’v’ alias CP3 is 32 cells of 32 bits called "Coprocessor 3"

Its move, load and store instructions are:

69g 〈instruction defaults 62c〉+≡ (70b) ⊳ 69e

mfc0(rt,rd) is $r[rt] := $CP0[rd]

mtc0(rt,rd) is $CP0[rd] := $r[rt]

lwc0(ft,offset,base) is $CP0[ft] := $m[$r[base] + sx offset]

-- note: the virtual address should be word aligned for lw and sw

swc0(ft,offset,base) is $m[$r[base] + sx offset] := $CP0[ft]


There are other coprocessors. Coprocessor 1 is the FPU and is covered in Section 7.4. Coprocessors 2 and 3 could be defined in the same way, along with their corresponding loads, stores, and moves. We can’t easily factor the definitions, like this:

70a 〈imaginary code 70a〉≡

swc^["0" "2" "3"] is

$m[$r[base] + sx offset] := $CP^["0" "2" "3"][ft]

so we won’t bother.

7.6 Miscellaneous

Here we would describe syscall, the TLB instructions and Trap instruction.

7.7 Putting it all together

This code chunk is expanded into our λ-RTL description of the MIPS.

70b 〈mips.rtl 70b〉≡

module Mips is

import [RTL Vector]

from StdOperators import
[add and andalso bit bitInsert bool borrow carry com divu ltu mod

modu mul mulu ne not or orelse quot shl shrl shra subtract sx xor zx

--> := | ; = <> + - < <= > >= ? IEEE754 bitTransfer div

]

from StdOperators.IEEE754 import

[fadd fcmp fdiv f2f f2i fabs fmulx fsqrt i2f fmul fneg fsub NaN ]

from StdOperators.IEEE754.Comparison import

[unordered float_lt float_gt float_eq]

〈mips utilities 65b〉

〈MIPS basics 60a〉

〈address modes and operands 62a〉

〈instruction defaults 62c〉

end

Code written to file mips.rtl.


7.8 What this specification does not do...

So far, we are ignoring big/little endianness issues, such as those that arise in loads and unaligned loads. We also ignore the FR bit of the status register, which indicates whether the float registers are even-odd pairs (for doubles) or all addressable (singles).

7.9 SLED specification

Here is the SLED specification for the MIPS architecture:

71 〈mips.sled 71〉≡

operand [ base fd fs ft rd rs rt shamt ] : #5 bits

operand offset : #16 bits

operand breakcode : #20 bits

constructor lb (rt, offset, base)

constructor lbu (rt, offset, base)

constructor lh (rt, offset, base)

constructor lhu (rt, offset, base)

constructor lw (rt, offset, base)

constructor lwl (rt, offset, base)

constructor lwr (rt, offset, base)

constructor sb (rt, offset, base)

constructor sh (rt, offset, base)

constructor sw (rt, offset, base)

constructor swl (rt, offset, base)

constructor swr (rt, offset, base)

constructor addi (rt, rs, offset)

constructor addiu (rt, rs, offset)

constructor slti (rt, rs, offset)

constructor sltiu (rt, rs, offset)

constructor andi (rt, rs, offset)

constructor ori (rt, rs, offset)

constructor xori (rt, rs, offset)

constructor lui (rt, offset)

constructor add (rd, rs, rt)

constructor addu (rd, rs, rt)

constructor sub (rd, rs, rt)

constructor subu (rd, rs, rt)

constructor slt (rd, rs, rt)

constructor sltu (rd, rs, rt)

constructor and (rd, rs, rt)

constructor or (rd, rs, rt)

constructor xor (rd, rs, rt)

constructor nor (rd, rs, rt)

constructor sll (rd, rt, shamt)

constructor srl (rd, rt, shamt)

constructor sra (rd, rt, shamt)

constructor sllv (rd, rt, rs)

constructor srlv (rd, rt, rs)


constructor srav (rd, rt, rs)

constructor mult (rs, rt)

constructor multu (rs, rt)

constructor div (rs, rt)

constructor divu (rs, rt)

constructor mfhi (rd)

constructor mflo (rd)

constructor mthi (rs)

constructor mtlo (rs)

constructor syscall ()

constructor break (breakcode)

constructor lwc0 (ft, offset, base)




constructor swc0 (ft, offset, base)




constructor jr (rs)

constructor jalr (rd, rs)

constructor blez (rs, reloc)

constructor bgtz (rs, reloc)

constructor bltz (rs, reloc)

constructor bgez (rs, reloc)

constructor bltzal (rs, reloc)

constructor bgezal (rs, reloc)

constructor beq (rs, rt, reloc)

constructor bne (rs, rt, reloc)

constructor j (reloc)

constructor jal (reloc)

constructor add.s (fd, fs, ft)

constructor add.w (fd, fs, ft)

constructor add.d (fd, fs, ft)

constructor div.s (fd, fs, ft)

constructor div.w (fd, fs, ft)

constructor div.d (fd, fs, ft)

constructor mul.s (fd, fs, ft)

constructor mul.w (fd, fs, ft)

constructor mul.d (fd, fs, ft)

constructor sub.s (fd, fs, ft)

constructor sub.w (fd, fs, ft)

constructor sub.d (fd, fs, ft)

constructor abs.s (fd, fs)

constructor abs.w (fd, fs)

constructor abs.d (fd, fs)

constructor mov.s (fd, fs)

constructor mov.w (fd, fs)

constructor mov.d (fd, fs)

constructor neg.s (fd, fs)


constructor neg.w (fd, fs)

constructor neg.d (fd, fs)

constructor mfc1 (ft, rd)

constructor mtc1 (rt, fd)

constructor cfc1 (rt, fs)

constructor ctc1 (rt, fs)

constructor c.f.s (fs, ft)

constructor c.f.w (fs, ft)

constructor c.f.d (fs, ft)

constructor c.un.s (fs, ft)

constructor c.un.w (fs, ft)

constructor c.un.d (fs, ft)

constructor c.eq.s (fs, ft)

constructor c.eq.w (fs, ft)

constructor c.eq.d (fs, ft)

constructor c.ueq.s (fs, ft)

constructor c.ueq.w (fs, ft)

constructor c.ueq.d (fs, ft)

constructor c.olt.s (fs, ft)

constructor c.olt.w (fs, ft)

constructor c.olt.d (fs, ft)

constructor c.ult.s (fs, ft)

constructor c.ult.w (fs, ft)

constructor c.ult.d (fs, ft)

constructor c.ole.s (fs, ft)

constructor c.ole.w (fs, ft)

constructor c.ole.d (fs, ft)

constructor c.ule.s (fs, ft)

constructor c.ule.w (fs, ft)

constructor c.ule.d (fs, ft)

constructor c.sf.s (fs, ft)

constructor c.sf.w (fs, ft)

constructor c.sf.d (fs, ft)

constructor c.ngle.s (fs, ft)

constructor c.ngle.w (fs, ft)

constructor c.ngle.d (fs, ft)

constructor c.seq.s (fs, ft)

constructor c.seq.w (fs, ft)

constructor c.seq.d (fs, ft)

constructor c.ngl.s (fs, ft)

constructor c.ngl.w (fs, ft)

constructor c.ngl.d (fs, ft)

constructor c.lt.s (fs, ft)

constructor c.lt.w (fs, ft)

constructor c.lt.d (fs, ft)

constructor c.nge.s (fs, ft)

constructor c.nge.w (fs, ft)

constructor c.nge.d (fs, ft)

constructor c.le.s (fs, ft)

constructor c.le.w (fs, ft)


constructor c.le.d (fs, ft)

constructor c.ngt.s (fs, ft)

constructor c.ngt.w (fs, ft)

constructor c.ngt.d (fs, ft)

constructor cvt.s.s (fd, fs)

constructor cvt.s.w (fd, fs)

constructor cvt.s.d (fd, fs)

constructor cvt.d.s (fd, fs)

constructor cvt.d.w (fd, fs)

constructor cvt.d.d (fd, fs)

constructor cvt.w.s (fd, fs)

constructor cvt.w.w (fd, fs)

constructor cvt.w.d (fd, fs)

constructor bc1f (reloc)

constructor bc1t (reloc)

Code written to file mips.sled.

Chapter 8

Describing the Alpha

In this document, any reference to an “UNPREDICTABLE” effect uses the ALPHA Architecture Handbook’s definition of the word: the effect is implementation defined, where the implementation in question is that of the DEC ALPHA chip. UNPREDICTABLE further implies that the effects may change “moment to moment.” The only guarantee is that no one can use such an effect to create a security hole, hang the system, or halt the system. How we will express this in λ-RTL is yet to be determined.

8.1 Storage

Memory

Memory on the Alpha consists of byte-addressed cells, which are little-endian in the default case. It is also possible that someone will build a big-endian system using the Alpha’s capability of being set into big-endian mode at boot time. If so, we wish to permit a machine description file for a big-endian Alpha to be created merely by changing a single flag, littleEndian, to false. The ALPHA takes a purely hardware approach to switching endianness. Compilers always generate little-endian code. If the machine is booted as big-endian, the load and store instructions provide the starting address for any non-64-bit load or store as if the machine were little-endian, but the hardware changes the last three bits of the virtual address to reflect the true big-endian nature of the system. Thus, the 32-bit long words within a 64-bit quad word can be swapped merely by inverting the third bit from the right in the virtual address (i.e., adding 4 to or subtracting 4 from the virtual address). Similar reflections can be done for words within a quad word, or for bytes.

75 〈storage 75〉≡ (80c) 76a ⊲

storage

’m’ is cells of 8 bits called "memory"

aggregate using if littleEndian then RTL.AGGL else RTL.AGGB fi


Explicit Size Depiction

A reference to a cell like $m[a] normally means the cell by itself; in this case, a single byte in memory located at address a. But sometimes we want to talk about a 32-bit long word at address a, or even a 16-bit word or a quad word. Often, λ-RTL can figure out what is meant just from the context in which $m[a] is used, but when it can’t, an explicit cast can be used to make the size explicit, thus: $m[a] : #32 loc. Because the syntax of the casts can be awkward, we define functions that do the casting.

76a 〈storage 75〉+≡ (80c) ⊳ 75 76b ⊲

fun byte x is x : #8 loc

fun word x is x : #16 loc

fun long x is x : #32 loc

fun quad x is x : #64 loc

fun octa x is x : #128 loc

Given these functions, the result of, e.g., quad is guaranteed to be a 64-bit location, even if quad is applied to an 8-bit byte in memory.

This bit is unexplained.

76b 〈storage 75〉+≡ (80c) ⊳ 76a 76c ⊲

module Mem is

infixl 3 xor

fun byte a is byte $m[if littleEndian then a else a xor 0b111 fi]

fun word a is word $m[if littleEndian then a else a xor 0b110 fi]

fun long a is long $m[if littleEndian then a else a xor 0b100 fi]

fun quad a is quad $m[a]

end
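The address adjustment performed by module Mem can be sketched in Python as follows. This is only an illustration of the XOR trick described under Memory; the names `SWIZZLE` and `effective_address` are hypothetical, and the masks are copied from the module above.

```python
# XOR masks applied to the low three address bits in big-endian mode,
# keyed by access size in bytes (masks taken from module Mem).
SWIZZLE = {1: 0b111, 2: 0b110, 4: 0b100, 8: 0b000}

def effective_address(addr: int, size: int, little_endian: bool = True) -> int:
    """Address actually used for a size-byte access at addr."""
    return addr if little_endian else addr ^ SWIZZLE[size]
```

For a long word, the mask 0b100 swaps the two halves of the enclosing quad word: address 0x1000 maps to 0x1004 and vice versa. A quad-word access needs no adjustment, which is why `Mem.quad` uses the address unchanged.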

Integer Registers

There are 32 integer registers; each is 64 bits.

76c 〈storage 75〉+≡ (80c) ⊳ 76b 77a ⊲

storage

’r’ is 32 cells of 64 bits called "integer registers"

〈ALPHA register fetch and store methods 76d〉

Register r31 is always zero when read. A write to r31 is ignored. Whether the entire instruction generating the write to r31 is ignored, or just the write itself, is “unpredictable” [3.1.2 ALPHA Architecture Handbook]. For now our description doesn’t deal with the possibility that an exception might be generated by a store to r31. Instead we simply ignore stores to r31.

76d 〈ALPHA register fetch and store methods 76d〉≡ (76c)

fetch using \n. if n = 31 then 0 else RTL.TRUE_FETCH ($r[n]) fi

store using \(n, v). n <> 31 --> RTL.TRUE_STORE ($r[n], v)
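The fetch and store methods can be pictured with a small Python model. This is a sketch under the assumptions stated in the text (reads of r31 yield zero, writes to r31 are dropped); the class name `IntRegisters` is hypothetical and is not part of λ-RTL.

```python
MASK64 = (1 << 64) - 1

class IntRegisters:
    """32 cells of 64 bits; register 31 reads as zero and ignores writes."""

    def __init__(self):
        self._cells = [0] * 32

    def fetch(self, n: int) -> int:
        # Corresponds to: if n = 31 then 0 else TRUE_FETCH ($r[n])
        return 0 if n == 31 else self._cells[n]

    def store(self, n: int, v: int) -> None:
        # Corresponds to the guard: n <> 31 --> TRUE_STORE ($r[n], v)
        if n != 31:
            self._cells[n] = v & MASK64
```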


Floating Point Registers

There are 32 floating point registers; each is 64 bits.

77a 〈storage 75〉+≡ (80c) ⊳ 76c 77b ⊲

storage

’f’ is 32 cells of 64 bits called "floating point registers"

aggregate using RTL.AGGL

fetch using \n. if n = 31 then 0 else RTL.TRUE_FETCH ($f[n]) fi

store using \(n, v). n <> 31 --> RTL.TRUE_STORE ($f[n], v)

The semantics for storing and loading from $f[31] are the same for floating-point operations as for integer operations on $r[31].

Special-Purpose Registers

Lock Registers

The lock registers.

77b 〈storage 75〉+≡ (80c) ⊳ 77a 77c ⊲

storage ’l’ alias lockFlag is 1 cell of 1 bit called "Lock Flag"

storage ’a’ alias lockPAR is 1 cell of 64 bits called "Lock Physical Address Register"

Processor Cycle Counter

77c 〈storage 75〉+≡ (80c) ⊳ 77b 77d ⊲

storage ’c’ alias PCC is 1 cell of 64 bits called "Processor Cycle Counter"

locations

[PCC_CNT PCC_OFF] is $c[0]@loc[32 bits at [0 32]]

Program Counter

77d 〈storage 75〉+≡ (80c) ⊳ 77c

storage ’p’ alias PC is 1 cell of 64 bits called "Program Counter"

--- fetch using \n. and (RTL.TRUE_FETCH($p[0]), (neg 0x3))

8.2 ALPHA instructions

Types of ALPHA operands

Alpha instructions are three-operand instructions (two sources and one destination).

77e 〈address modes and operands 77e〉≡ (80c) 78a ⊲

operand [ fa fb fc ra rb rc ] : #5 bits

operand imm : #8 bits

operand mhint : #14 bits

operand mdisp : #16 bits

operand palfunc : #26 bits

operand target : #64 bits


ALPHA addresses and operands

The ALPHA displaced addressing mode adds a sign-extended 16-bit immediate field value to the contents of the register designated by the rb field.

78a 〈address modes and operands 77e〉+≡ (80c) ⊳ 77e 78b ⊲

operand address : #64 bits

fun dispA (rb, mdisp) is $r[rb] + sx mdisp

78b 〈address modes and operands 77e〉+≡ (80c) ⊳ 78a

operand reg_or_imm : #64 bits


rmode(rb) : reg_or_imm is $r[rb]

imode(imm) : reg_or_imm is sx imm

A Useful Composition Function

A composition function on two functions with a single argument will be of use in mixing type information with sign extension functions.

78c 〈composition function 78c〉≡ (80c)

fun o(f,g) is \x.f(g(x))

infixl 3 o

Load Instructions

Load Memory Data into Integer Register

There are load instructions from memory to registers in four sizes. The word and byte load instructions zero extend to the 64-bit register, while the long word variant sign extends.

78d 〈instructions 78d〉≡ (80c) 78e ⊲


ld^[q l wu bu] (ra, mdisp, rb) is

$r[ra] := [(Mem.quad) (sx o Mem.long) (zx o Mem.word) (zx o Mem.byte)

] (dispA(rb, mdisp))

Load the aligned included Quad Word

78e 〈instructions 78d〉+≡ (80c) ⊳ 78d 78f ⊲


ldq_u(ra, mdisp, rb) is

$r[ra] := Mem.quad (and (dispA(rb, mdisp), (com 7)))

Load address (computes effective address)

78f 〈instructions 78d〉+≡ (80c) ⊳ 78e 79c ⊲


lda (ra, mdisp, rb) is $r[ra] := dispA(rb, mdisp)

lda_h (ra, mdisp, rb) is $r[ra] := $r[rb] + shl (sx mdisp, 16)


The chunks named 〈instructions 78d〉 define the default attributes of instructions. We have chosen to use the default attributes to specify the “main effect” of these instructions, but there are circumstances in which the processor traps instead of executing this main effect. Traps are not relevant to all specifications, so we have chosen to give the trap semantics separately, by binding them to an attribute named trap. Again, with a proper modules system, trap semantics could be omitted from some specifications.

All loads except byte loads could trap if the address is improperly aligned.

79a 〈trap semantics 79a〉≡ (80c) 79d ⊲

ld^[q l wu] (ra, mdisp, rb) is alignTrap(dispA(rb, mdisp), [8 4 2])

79b 〈ALPHA utilities 79b〉≡ (80c)

fun alignTrap (address, k) is

address modu k <> 0 --> trap(mem_address_not_aligned)

The interaction of the trap semantics with the main semantics is not specified in λ-RTL.
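To show how the grouping in the trap chunk pairs each load with its alignment, here is a small Python sketch. The names `LOAD_ALIGNMENT` and `load_traps` are hypothetical, and the sketch only decides whether a trap condition holds; it does not model what the trap does.

```python
# ld^[q l wu] paired element-wise with [8 4 2], as in the trap chunk.
LOAD_ALIGNMENT = dict(zip(["ldq", "ldl", "ldwu"], [8, 4, 2]))

def load_traps(mnemonic: str, address: int) -> bool:
    """True when the load would raise mem_address_not_aligned."""
    k = LOAD_ALIGNMENT.get(mnemonic)
    # Byte loads have no entry, so they never trap on alignment.
    return k is not None and address % k != 0
```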

Store Instructions

Store to a Memory Location from a Register

79c 〈instructions 78d〉+≡ (80c) ⊳ 78f 79e ⊲

stq(ra, mdisp, rb) is Mem.quad (dispA(rb, mdisp)) := $r[ra]

stl(ra, mdisp, rb) is Mem.long (dispA(rb, mdisp)) := $r[ra]@bits[32 bits at 0]

stw(ra, mdisp, rb) is Mem.word (dispA(rb, mdisp)) := $r[ra]@bits[16 bits at 0]

stb(ra, mdisp, rb) is Mem.byte (dispA(rb, mdisp)) := $r[ra]@bits[8 bits at 0]

79d 〈trap semantics 79a〉+≡ (80c) ⊳ 79a

st^[q l w] (ra, mdisp, rb) is alignTrap(dispA(rb, mdisp), [8 4 2])

Store a Register to the Aligned Quad Word below the Unaligned Address

79e 〈instructions 78d〉+≡ (80c) ⊳ 79c 79f ⊲

stq_u(ra, mdisp, rb) is Mem.quad (and (dispA(rb, mdisp), com 7)) := $r[ra]

Logical Operations

Standard Logical Functions

79f 〈instructions 78d〉+≡ (80c) ⊳ 79e 80b ⊲

[and bis xor](ra, reg_or_imm, rc) is $r[rc] := [and or xor] ($r[ra], reg_or_imm)

[bic ornot eqv](ra, reg_or_imm, rc) is $r[rc] := [and or xor] ($r[ra], com reg_or_imm)

Add instructions

Our treatment of the add instructions is similar to our treatment of the logical instructions, except these instructions can cause overflow. Overflow semantics is a bit tricky, so in the [[simple]] version of the specification we leave it abstract.

79g 〈ALPHA utilities [[simple]] 79g〉≡

rtlop add_overflows : #n bits * #n bits -> bool



80a 〈ALPHA utilities [[full]] 80a〉≡

fun iff (p, q) is if p then q else not q fi

fun add_overflows (x, y) is

let val result is x + y

infixn 0 iff

in (x < 0 iff y < 0) andalso (x < 0 iff result >= 0)

end
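The overflow test in the [[full]] version can be checked against a Python sketch. This is an illustration, not the specification: the helper `to_signed` and the explicit width parameter `n` are assumptions introduced here, since Python integers are unbounded while λ-RTL values are n-bit vectors.

```python
def to_signed(x: int, n: int) -> int:
    """Interpret the low n bits of x as a two's-complement integer."""
    x &= (1 << n) - 1
    return x - (1 << n) if x >> (n - 1) else x

def add_overflows(x: int, y: int, n: int = 64) -> bool:
    sx, sy = to_signed(x, n), to_signed(y, n)
    result = to_signed(sx + sy, n)        # the n-bit sum, wrapped
    iff = lambda p, q: p == q
    # Operands have the same sign, and the sign of the result differs.
    return iff(sx < 0, sy < 0) and iff(sx < 0, result >= 0)
```

Addition can overflow only when both operands have the same sign; in that case the wrapped result has the opposite sign, which is exactly the second conjunct.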


The sum goes in rc, and there may also be an overflow trap. When the /V suffix is provided with an add instruction, the instruction signals overflow; otherwise overflow is ignored.

80b 〈instructions 78d〉+≡ (80c) ⊳ 79f

[addq addqv] (ra, reg_or_imm, rc) is $r[rc] := $r[ra] + reg_or_imm

[addl addlv] (ra, reg_or_imm, rc) is

$r[rc] := sx (($r[ra] + reg_or_imm)@bits[32 bits at 0])
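The difference between the quad-word and long-word adds can be seen in a short Python sketch. The function names mirror the instruction names but are otherwise hypothetical, and the sketch models only the main effect, not the overflow trap of the /V variants.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def sx32(v: int) -> int:
    """Sign extend the low 32 bits of v to a 64-bit value."""
    v &= MASK32
    return (v | ~MASK32) & MASK64 if v & 0x80000000 else v

def addq(a: int, b: int) -> int:
    return (a + b) & MASK64       # 64-bit sum, wrapping

def addl(a: int, b: int) -> int:
    return sx32(a + b)            # low 32 bits of the sum, sign extended
```

For example, `addl(0x7FFFFFFF, 1)` yields `0xFFFFFFFF80000000`, because bit 31 of the truncated sum is set and is copied into the upper half of the register.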

8.3 Lambda RTL File

Our λ-RTL file needs to import a set of λ-RTL operators.

80c 〈alpha.rtl 80c〉≡

module Alpha is

import [RTL Vector]

from StdOperators import
[add and andalso bit bitInsert bool borrow carry com divu

modu mul mulu ne not or orelse quot shl shrl shra subtract sx xor zx neg

--> := | ; = <> + - <= >= < > ? ]

val littleEndian is true

〈composition function 78c〉

〈storage 75〉

〈address modes and operands 77e〉

〈window dressing for trap semantics 80d〉

〈ALPHA utilities 79b〉

〈instructions 78d〉

attribute trap of

〈trap semantics 79a〉

end

All that is left to do is to define what we mean by trap. Ideally, we would like a λ-RTL abstraction mechanism that would let us define trap as “an unspecified function from a 7-bit code to an effect.” Because λ-RTL lacks suitable abstraction mechanisms, we have to specify a concrete effect. Rather than try to specify the complete semantics of traps, we model a trap as an assignment to a nonexistent location.

80d 〈window dressing for trap semantics 80d〉≡ (80c)

storage ’t’ is 1 cell of 7 bits called "trap code"

locations trap_code is $t[0]

fun trap k is $t[0] := k


These bindings include only the “trap type” of a handful of traps. We have omitted trap priorities. The trap instruction uses a family of codes starting at 0x80.

81a 〈window dressing for trap semantics [[full]] 81a〉≡

val [data_store_error

illegal_instruction

mem_address_not_aligned

tag_overflow

trap_instruction

] is [0x2b 0x02 0x07 0x0a 0x80] -- 7-bit constants to be passed to trap

val trap_instruction is \code.trap_instruction+code


In the simple version, we create an RTL operator for each trap code, so we can see them by name, not by number, in the processor supplement.

81b 〈window dressing for trap semantics [[simple]] 81b〉≡

rtlop [data_store_error

illegal_instruction

mem_address_not_aligned

tag_overflow

] : #7 bits

rtlop trap_instruction : #7 bits -> #7 bits


Chapter 9

Describing the PDP-8

9.1 Introduction

9.2 Storage Spaces

Main memory on the PDP-8 consists of 4096 12-bit words of magnetic core. The machine’s ISA is organized around a 12-bit accumulator that is involved in all arithmetic and logic operations. A 1-bit link register allows the programmer to extend the arithmetic precision beyond that available through single primitive operations. A 12-bit switch register is provided to capture values from the machine’s assortment of front panel switches. Last but not least, the PDP-8 has a 12-bit program counter.

Here is the λ-RTL required to model these storage spaces and attach to them the names provided by the architecture manual.

82 〈storage spaces 82〉≡ (85c)

storage

’m’ is 4096 cells of 12 bits called "memory"

’a’ is 1 cells of 12 bits called "accumulator"

’s’ is 1 cells of 12 bits called "switch register"

’p’ is 1 cells of 12 bits called "program counter"

’l’ is 1 cells of 1 bits called "link register"

’r’ is 1 cells of 1 bits called "run flip-flop"

locations

PC is $p[0]

AC is $a[0]

L is $l[0]

SR is $s[0]

RUN is $r[0]


Addressing modes on the PDP-8 are relatively simple. Memory references are either absolute or indirect. Certain indirect references automatically increment the pointer in memory used for indirection. This is called “auto-indexing” in the PDP-8 manual.

In λ-RTL we treat memory references as pairs. The first member of the pair is an address, and the second is an effect associated with the reference. This is used to handle “auto-indexing”. The effect for the other two addressing modes is RTL.SKIP, the null effect.

Here’s the λ-RTL describing memory references.

83a 〈memory references 83a〉≡ (85c)

operand addr : #12 bits

operand memref : #12 bits * effect


absolute(addr) is (addr, RTL.SKIP)

indirect(addr) is ($m[addr], RTL.SKIP)

autoinc(addr) is ($m[addr] + 1, $m[addr] := $m[addr] + 1)

We assume that addresses are within appropriate ranges for their context.
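The pairing of an address with an effect can be mimicked in Python, as a sketch only. Here the machine state is assumed to be a dictionary with a list `m` for memory, an effect is a function that mutates the state, and `skip` stands in for RTL.SKIP; none of these representation choices come from λ-RTL itself.

```python
WORD = 0o7777  # 12-bit mask

def skip(state):
    """Null effect, standing in for RTL.SKIP."""

def absolute(state, addr):
    return addr, skip

def indirect(state, addr):
    return state["m"][addr], skip

def autoinc(state, addr):
    def bump(st):
        # Deferred effect: increment the pointer word in memory.
        st["m"][addr] = (st["m"][addr] + 1) & WORD
    # The address used is the incremented pointer value.
    return (state["m"][addr] + 1) & WORD, bump
```

Computing the pair does not change the state; the pointer in memory changes only when the returned effect is applied.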

9.3 Operations

As with memory, the PDP-8 has a simple ISA. Operations can be divided into two categories. The first are unary operators involving memory references. The second are nullary operators that involve the accumulator, link register, or switch register.

We will describe the unary operators first. Each unary operator takes a memory reference. We can model these operations as a memory reference, which you will recall is a location and an effect, and a right-hand side which describes the operation’s primary effect.

Here is the λ-RTL for modeling a unary operation. Notice that the effect from the memory reference must take place before the primary effect of the operation. Also notice that the parameter rhs is not an effect, but rather a function taking a location and producing an effect.

83b 〈operations 83b〉≡ (85c) 83c ⊲

fun unary_op((loc,memaction), rhs) is

( memaction ; rhs loc )

The primary effect of all of the unary operations is to fetch a value from memory, perform an operation on this value and the value of the accumulator, and store the result in the accumulator. The following λ-RTL describes the effect that an operation on memory and the accumulator has on the machine state. Notice that we have written this function in curried style. The reason for this will be clearer in a moment.

83c 〈operations 83b〉+≡ (85c) ⊳ 83b 84a ⊲

fun accum_effect op loc is AC := op(AC,$m[loc])


Now, with our two functions in hand, we can make short work of the unary operations. One thing is worth mentioning here: unary_op takes a tuple whose second element is a function taking a location and producing an effect. It should now be obvious why we wrote accum_effect as a curried function.

We use the operation mnemonics from the PDP-8 manual wherever possible.

84a 〈operations 83b〉+≡ (85c) ⊳ 83c 84b ⊲


and(memref) is unary_op(memref,accum_effect and)

tad(memref) is unary_op(memref,accum_effect (+))

dca(memref) is unary_op(memref,\loc.$m[loc] := AC)
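The role of currying can be illustrated with a Python sketch. The state representation (a dictionary with `AC` and a memory list `m`), the helper `absolute`, and the 12-bit masking of the result are assumptions made for the illustration; only the shape of `unary_op` and `accum_effect` follows the λ-RTL above.

```python
from operator import add, and_

WORD = 0o7777  # 12-bit mask

def skip(state):
    """Null effect, standing in for RTL.SKIP."""

def absolute(addr):
    return addr, skip

def unary_op(memref, rhs):
    loc, memaction = memref
    def effect(state):
        memaction(state)       # effect of the memory reference comes first
        rhs(loc)(state)        # then the primary effect at that location
    return effect

def accum_effect(op):
    # Curried: op -> loc -> effect, so accum_effect(op) can be passed as rhs.
    return lambda loc: lambda state: state.update(
        AC=op(state["AC"], state["m"][loc]) & WORD)

def and_op(memref):
    return unary_op(memref, accum_effect(and_))

def tad(memref):
    return unary_op(memref, accum_effect(add))
```

Applying `accum_effect` to an operator alone yields exactly the kind of value `unary_op` expects for `rhs`: a function from a location to an effect.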

The next operation merits discussion. It increments the value given by the memory reference. If the resulting value is zero, the following instruction is skipped. We have used guarded assignment to the program counter to model this effect.

84b 〈operations 83b〉+≡ (85c) ⊳ 84a 84c ⊲

isz(memref) is

unary_op(memref,

\loc.let val v is $m[loc] + 1

in ($m[loc] := v | v = 0 --> PC := PC + 2)

end

)
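A Python sketch of the same behavior may help. It assumes, as an interpretation not stated in this chunk, that an instruction which does not skip advances the program counter by 1 and one which skips advances it by 2; the state dictionary and the function signature are likewise hypothetical.

```python
WORD = 0o7777  # 12-bit mask

def isz(state, loc):
    """Increment the word at loc and skip the next instruction if it wraps to zero."""
    v = (state["m"][loc] + 1) & WORD
    state["m"][loc] = v
    # Guarded update of the program counter: skip only when v is zero.
    state["PC"] = (state["PC"] + (2 if v == 0 else 1)) & WORD
```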

The next routines describe the behaviour of subroutine calls and unconditional jumps. The operation jms transfers control to the subroutine whose address is given by the memory reference. The return address is stored in the word at that address, and execution continues at the following memory location. This means that PDP-8 programs are not intended to recurse, and seems weird in general.

The effect of the unconditional jump should be obvious.

84c 〈operations 83b〉+≡ (85c) ⊳ 84b 84d ⊲

jms(memref) is

unary_op(memref,\loc.$m[loc]:=PC | PC:=loc+1)

jmp(memref) is

unary_op(memref,\loc.PC:=loc)

The remaining PDP-8 operations do not involve memory references. They are nullary operations that affect the accumulator, the link register, or the switch register.

Most of these operations have simple effects that should be obvious from the λ-RTL. The exceptions to this are the logical rotate instructions.

The PDP-8 specifies that bit 0 is the most significant bit in a 12-bit word. God only knows why this is the case, but it is. In λ-RTL, bit 0 is always the least significant bit. So the rotates below might seem backwards, but they’re not. Also notice that values are rotated through the link register. This means that we can write the rotations as the simultaneous composition of three assignment effects.

84d 〈operations 83b〉+≡ (85c) ⊳ 84c 85a ⊲

val rar is (L := AC@loc[0] | AC@loc[0..10] := AC@loc[1..11] | AC@loc[11] := L)

val ral is (L := AC@loc[11] | AC@loc[1..11] := AC@loc[0..10] | AC@loc[0] := L)
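The two rotations can be checked with a Python sketch that treats the link and accumulator as a 13-bit ring. The functions take and return the pair (AC, L) rather than mutating state, which is a choice made here for illustration; bit 0 is the least significant bit, as in λ-RTL.

```python
WORD = 0o7777  # 12-bit mask

def rar(ac: int, l: int):
    """Rotate right through the link: bit 0 goes to L, old L goes to bit 11."""
    return (ac >> 1) | (l << 11), ac & 1

def ral(ac: int, l: int):
    """Rotate left through the link: bit 11 goes to L, old L goes to bit 0."""
    return ((ac << 1) & WORD) | l, (ac >> 11) & 1
```

All three assignments read the old values of AC and L, which is what the simultaneous composition with `|` expresses; computing the new pair from the old pair in a single expression has the same effect. Thirteen rotations in either direction restore the original values.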


There is also a set of nullary skip operations that skip the next instruction when some boolean condition is true. We model these operations with a function skip_op that takes a condition and then does a guarded update to the program counter. Perhaps this is a good time to point out that we assume that each operation in this description involves the implicit increment of the program counter by 1, unless we specify otherwise. We could write this fact down in the description, but we won’t bother.

85a 〈operations 83b〉+≡ (85c) ⊳ 84d 85b ⊲

fun skip_op(cond) is cond --> PC := PC + 2

The rest of the nullary operators are simple and we won’t try to describe them here.

85b 〈operations 83b〉+≡ (85c) ⊳ 85a

val iac is AC := AC + 1

val cma is AC := com AC


iac is iac

nop is RTL.SKIP

rar is rar

rtr is rar; rar

ral is ral

rtl is ral; ral

cml is L := com L

cma is cma

cia is cma; iac

cll is L := 0

stl is L := 1

cla is AC := 0

sta is AC := 4095

osr is AC := or(AC,SR)

halt is RUN := 0

skp is skip_op(true)

snl is skip_op(L = 1)

szl is skip_op(L = 0)

sza is skip_op(AC = 0)

sna is skip_op(AC <> 0)

spa is skip_op(AC@loc[11] = 0)


9.4 Putting it all together

85c 〈pdp8.lrtl 85c〉≡ (86)

module PDP8 is

import [RTL Vector]

from StdOperators import [and or com | + = := --> ; <>]

〈storage spaces 82〉

〈memory references 83a〉

〈operations 83b〉

end


86 〈pdp8.rtl 86〉≡

〈pdp8.lrtl 85c〉

Chapter 10

Describing MMIX

This document attempts to describe the MMIX computer, as explained in Fascicle 1 of The Art of Computer Programming. This Fascicle, and other information, may be found at http://www-cs-faculty.stanford.edu/~knuth/mmix-news.html.

Describing this machine is a real challenge, not so much because of the features of the machine itself (though some of the register behavior is baroque), but because Knuth tends to specify all computation over the rationals, mapping from bit vectors to integers when taking values out of storage spaces, and mapping from integers to bit vectors when putting values into storage spaces. This style is inimical to λ-RTL, because λ-RTL specifies all computations over bit vectors. To consider just one example, Knuth specifies all additions using “God’s addition” over the integers. In λ-RTL, we must say how many bits are used in each addition, and our treatment of overflow must differ, as we must fool with the carry bit.1 One can imagine it might not be completely trivial to show that our specification matches his.

10.1 Storage spaces: memory and registers

MMIX is a big-endian machine. The λ-RTL translator has a bug that prevents it from swallowing large integer literals, so I’ve left out the size of the memory (2^64 cells).

87 〈MMIX basics 87〉≡ (91d) 88a ⊲

storage ’m’ is cells of 8 bits called "memory"


1 This mismatch is not the result of caprice or accident on either party’s part. The goal of Knuth’s specification is conciseness and clarity, and the notations and techniques of mathematics serve that goal. For example, several specifications can be clearer if intermediate values are allowed to lie in the rationals as well as the integers (never mind bit vectors). To take another example, all specifications are clearer if the + symbol has just one meaning: mathematical addition. But the purpose of a λ-RTL specification is to enable automatic generation of machine-dependent tools, and the tool generators want to know what operations to use without having to reason about rational-valued intermediate results. It must be easy to figure out which instructions do 16-bit adds and which do 32-bit adds, without having to study all of the contexts in which those adds appear.


Knuth provides special notations for aggregates. These notations ignore low address bits.

88a 〈MMIX basics 87〉+≡ (91d) ⊳ 87 88b ⊲

infixl 7 and

infixl 6 or

fun M a is $m[a] : #8 loc

fun M2 a is $m[a and (com 1)] : #16 loc



fun M1 a is M a

MMIX has 256 general-purpose registers.

88b 〈MMIX basics 87〉+≡ (91d) ⊳ 88a 88c ⊲

storage ’r’ is 256 cells of 64 bits called "registers"

MMIX has 32 special-purpose registers. The codes come from page 21 of the fascicle.

88c 〈MMIX basics 87〉+≡ (91d) ⊳ 88b 88d ⊲

storage ’s’ is 32 cells of 64 bits called "special registers"

locations

[ rA rB rC rD rE rF rG rH rI rJ rK rL rM rN rO rP rQ rR rS rT rU rV rW rX rY rZ ] is

$s[[21 0 8 1 2 22 19 3 12 4 15 20 5 9 10 23 16 6 11 13 17 18 24 25 26 27]]

[ rBB rTT rWW rXX rYY rZZ ] is $s[[ 7 14 28 29 30 31]]

The program counter appears to be distinct from all these other registers.

88d 〈MMIX basics 87〉+≡ (91d) ⊳ 88c 88e ⊲

storage ’p’ is 1 cell of 64 bits called "program counter"

locations PC is $p[0]

10.2 MMIX instructions

I can’t define a ready equivalent for the s($X) ← N notation, because in Knuth’s world, N is an integer, whereas in mine, N is a 64-bit value.

Trips and traps

It’s clear that λ-RTL is not capable of describing MMIX’s interrupts. Even the specification in section 34 of the detailed MMIX document is not entirely clear; I suspect it will be necessary to go look at the source code.

88e 〈MMIX basics 87〉+≡ (91d) ⊳ 88d 89a ⊲

module Interrupt is

val trip is

rX := ? -- current instruction, sometimes (e.g., M4 PC)

| rY := ? -- an operand, sometimes

| rZ := ? -- another operand, sometimes

| rW := ? -- the successor instruction, sometimes

end


Integer overflow

It’s not clear exactly what effect overflow has on the state of the MMIX machine. I note the following from trips and traps.

89a 〈MMIX basics 87〉+≡ (91d) ⊳ 88e 89b ⊲

module Event is

locations

event is rA @loc [8 bits at 0]

enable is rA @loc [8 bits at 8]

fun occur ? is

if enable and ? = 0 then

event := event or ?

else

Interrupt.trip

fi

module Mask is

val [D V W I O U Z X] is [0x80 0x40 0x20 0x10 0x08 0x04 0x02 0x01]

end

val [D V W I O U Z X] is occur [0x80 0x40 0x20 0x10 0x08 0x04 0x02 0x01]

end
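The behaviour of `occur` can be sketched in Python. This is an illustration only: it assumes that "8 bits at 0" denotes the least significant byte of rA and "8 bits at 8" the next byte, and it models a trip as a returned flag. The function name and return convention are hypothetical.

```python
def occur(rA: int, mask: int):
    """Return (new_rA, tripped) when the arithmetic event `mask` occurs."""
    enable = (rA >> 8) & 0xFF          # enable bits: 8 bits at 8 (assumed layout)
    if (enable & mask) == 0:
        # Event not enabled: record it in the event bits (8 bits at 0).
        return rA | mask, False
    # Event enabled: the machine trips and rA is left unchanged here.
    return rA, True


O = 0x08  # integer-overflow mask, as in module Mask

if __name__ == "__main__":
    print(occur(0x0000, O))
    print(occur(0x0800, O))
```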

We can now define some overflows.

89b 〈MMIX basics 87〉+≡ (91d) ⊳ 89a 89c ⊲

module Ovfl is

-- overflow because a value won’t fit

fun noFit n pow2 is n >= pow2 orelse n < neg pow2

fun o1 n is noFit n 0x100 --> Event.O




-- overflows for addition and subtraction

fun sign x is bit (x < 0)

fun add x y is

let val s is x + y

in sign x = sign y andalso sign x <> sign s --> Event.O

end

fun sub x y is

let val d is x - y

in sign x <> sign y andalso sign x <> sign d --> Event.O

end

end
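The sign-based overflow tests for addition and subtraction can be checked with a small Python model. This is a sketch, assuming 64-bit two's-complement values held as non-negative Python integers; the function names are illustrative.

```python
BITS = 64
MASK = (1 << BITS) - 1


def sign(x: int) -> int:
    """Top bit of a 64-bit value: 1 if it is negative as a signed number."""
    return (x >> (BITS - 1)) & 1


def add_overflows(x: int, y: int) -> bool:
    s = (x + y) & MASK
    # Operands agree in sign but the sum does not.
    return sign(x) == sign(y) and sign(x) != sign(s)


def sub_overflows(x: int, y: int) -> bool:
    d = (x - y) & MASK
    # Operands differ in sign and the difference has the wrong sign.
    return sign(x) != sign(y) and sign(x) != sign(d)
```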

Operands

The X , Y , and Z operands are all 8 bits.

89c 〈MMIX basics 87〉+≡ (91d) ⊳ 89b 90a ⊲

operand [X Y Z] : #8 bits


Addresses

We prefer to treat the address as a single operand, where it occurs.

90a 〈MMIX basics 87〉+≡ (91d) ⊳ 89c

operand A : #64 bits


Address (Y, Z) : A is $r[Y]+$r[Z]

-- IAddress (Y, Z) : A is $r[Y]+sx Z -- does this exist?

Loading and storing

90b 〈instruction defaults 90b〉≡ (91d) 90c ⊲

LD^[B W T O] (X, A) is $r[X] := sx ([M1 M2 M4 M8] A)

LD^[B W T O]^U (X, A) is $r[X] := zx ([M1 M2 M4 M8] A)

LDHT (X, A) is $r[X] := bitInsert {wide is 0, lsb is 32} (M4 A)

LDA (X, A) is $r[X] := A

Storing signed requires some overflow checks.

90c 〈instruction defaults 90b〉+≡ (91d) ⊳ 90b 90d ⊲

STB (X, A) is M1 A := $r[X] @bits [ 8 bits at 0] | Ovfl.o1 $r[X]

STW (X, A) is M2 A := $r[X] @bits [16 bits at 0] | Ovfl.o2 $r[X]

STT (X, A) is M4 A := $r[X] @bits [32 bits at 0] | Ovfl.o4 $r[X]

STO (X, A) is M8 A := $r[X]

Storing unsigned never overflows.

90d 〈instruction defaults 90b〉+≡ (91d) ⊳ 90c 90e ⊲

STBU (X, A) is M1 A := $r[X] @bits [ 8 bits at 0]

STWU (X, A) is M2 A := $r[X] @bits [16 bits at 0]

STTU (X, A) is M4 A := $r[X] @bits [32 bits at 0]

STOU (X, A) is M8 A := $r[X]

90e 〈instruction defaults 90b〉+≡ (91d) ⊳ 90d 90f ⊲

STHT (X, A) is M4 A := $r[X] @bits [32 bits at 32]

STCO (X, A) is M8 A := zx X

Arithmetic

90f 〈instruction defaults 90b〉+≡ (91d) ⊳ 90e 91a ⊲

ADD (X, Y, Z) is $r[X] := $r[Y] + $r[Z] | Ovfl.add $r[Y] $r[Z]

SUB (X, Y, Z) is $r[X] := $r[Y] - $r[Z] | Ovfl.sub $r[Y] $r[Z]

MUL (X, Y, Z) is

let val p is muls($r[Y], $r[Z])

in $r[X] := p@bits[64 bits at 0] | Ovfl.o8 p

end

DIV (X, Y, Z) is

if $r[Z] <> 0 then

$r[X] := $r[Y] divs $r[Z] | rR := $r[Y] mods $r[Z]

else

$r[X] := 0 | rR := $r[Y]

fi

10.3. PUTTING IT ALL TOGETHER 91

91a 〈instruction defaults 90b〉+≡ (91d) ⊳ 90f 91b ⊲

ADDU (X, Y, Z) is $r[X] := $r[Y] + $r[Z]

SUBU (X, Y, Z) is $r[X] := $r[Y] - $r[Z]

MULU (X, Y, Z) is

let val p is mulu($r[Y], $r[Z])

in $r[X] := p@bits[64 bits at 0] | rH := p@bits[64 bits at 64]

end

DIVU (X, Y, Z) is

if ugt($r[Z], rD) then

let val big is bitInsert {wide is zx $r[Y], lsb is 64} rD : #128 bits

in $r[X] := (big divu $r[Z])@bits[64 bits at 0] |

rR := (big modu $r[Z])@bits[64 bits at 0]

end

else

$r[X] := rD | rR := $r[Y]

fi
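The unsigned division with a 128-bit dividend can be modelled in Python, whose integers are unbounded. This is a sketch under the assumption that the `bitInsert` above places rD in the high 64 bits and `$r[Y]` in the low 64 bits of the dividend; the function name and tuple result are illustrative.

```python
M64 = (1 << 64) - 1


def divu(rY: int, rZ: int, rD: int):
    """Return (value for $r[X], value for rR) of the unsigned divide."""
    if rZ > rD:
        big = (rD << 64) | rY          # 128-bit dividend: rD high, $r[Y] low
        return (big // rZ) & M64, (big % rZ) & M64
    # Quotient would not fit in 64 bits (or rZ is 0): no division is performed.
    return rD, rY
```

Because the divisor exceeds rD in the first branch, the quotient always fits in 64 bits, so the masking there is a safeguard rather than a truncation.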

For address arithmetic, the definition using shifting is much easier than Knuth’s integers.

91b 〈instruction defaults 90b〉+≡ (91d) ⊳ 91a 91c ⊲

["2" "4" "8" "16"]^ADDU (X, Y, Z) is

$r[X] := shl(64, $r[Y], [1 2 3 4] : #64 bits) + $r[Z]

91c 〈instruction defaults 90b〉+≡ (91d) ⊳ 91b

NEG (X, Y, Z) is $r[X] := zx Y - $r[Z] | Ovfl.sub (zx Y) $r[Z]

NEGU (X, Y, Z) is $r[X] := zx Y - $r[Z]


10.3 Putting it all together

This code chunk is expanded into our λ-RTL description of the MMIX.

91d 〈mmix.rtl 91d〉≡

module MMix is

import [RTL Vector]


[add and andalso bit bitInsert bool borrow carry com divu divs

modu mods muls mulu ne neg not or orelse quot shl shrl shra subtract sx ugt xor zx

--> := | ; = <> + - < <= > >= ? IEEE754

]


[fadd fcmp fdiv f2f f2i fabs fmulx fsqrt i2f fmul fneg fsub]

〈MMIX basics 87〉

default attribute of

〈instruction defaults 90b〉

end

Code written to file mmix.rtl.

Chapter 11

Describing the PowerPC (by Matthew Fluet)

This document uses λ-RTL to specify the PPC instruction set.

11.1 Overview

The PPC module encapsulates the λ-RTL definition of the PPC instruction set.

92 〈ppc.rtl 92〉≡

module PPC is

〈imports 93a〉

〈extensions 93b〉

〈storage spaces 94a〉

〈operands 98c〉

〈instruction semantics 99b〉

end

Code written to file ppc.rtl.


The PPC module imports a standard set of RTL operators.

93a 〈imports 93a〉≡ (92)

import [RTL Vector]


[ := --> | ; ?

not conjoin disjoin

and or com xor

neg + carry - borrow mul mulu div divu mod quot rem

zx sx shl shrl shra rotl

bitInsert bitExtract bitTransfer

eq ne = >= > <= < <> geu gtu leu ltu

bit bool

]


[ Round

fadd fdiv fmul fsub fmulx fneg

f2f f2i i2f

]

We extend the set of imported operators with some standard functions. It seems more natural to have all of the arithmetic comparisons use the same naming convention.

93b 〈extensions 93b〉≡ (92) 93c ⊲

val ge is (>=)

val gt is (>)

val le is (<=)

val lt is (<)

The PPC has primitive bitwise operations for “AND with complement,” “OR with complement,” “NAND,” “NOR,” and equivalent.

93c 〈extensions 93b〉+≡ (92) ⊳ 93b 93d ⊲

fun [andc orc](a, b) is [and or](a, com(b))

fun [nand nor](a, b) is com([and or](a, b))

fun eqv (a, b) is com(xor(a, b))

One can also define boolean operations corresponding to XOR and equivalent.

93d 〈extensions 93b〉+≡ (92) ⊳ 93c

fun xjoin (a, b) is disjoin(conjoin(a, not b), conjoin(not a, b))

fun eqjoin (a, b) is not(xjoin(a, b))
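For readers unfamiliar with these operators, the following Python sketch mirrors the extensions on 32-bit values. It is an illustration only: it assumes registers are modelled as non-negative integers below 2^32 and booleans as Python `bool`.

```python
M32 = 0xFFFFFFFF


def com(a: int) -> int:
    """Bitwise complement within 32 bits."""
    return ~a & M32


def andc(a, b): return a & com(b)          # AND with complement
def orc(a, b):  return (a | com(b)) & M32  # OR with complement
def nand(a, b): return com(a & b)
def nor(a, b):  return com(a | b)
def eqv(a, b):  return com(a ^ b)          # bitwise equivalence


def xjoin(p: bool, q: bool) -> bool:
    """Boolean exclusive or, built from conjunction, disjunction and negation."""
    return (p and not q) or (not p and q)


def eqjoin(p: bool, q: bool) -> bool:
    return not xjoin(p, q)
```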


11.2 Storage spaces

Storage locations manipulated by PPC instructions include memory and several kinds of registers. λ-RTL limitation: Storage space identifiers must be one character and have strange scoping rules: two storage spaces in different modules cannot share the same identifier, but storage space identifiers must be accessed by module selection in expressions.

94a 〈storage spaces 94a〉≡ (92)

〈Memory 94b〉
〈General-Purpose Registers 94c〉
〈Floating-Point Registers 95a〉
〈Condition Register 95b〉
〈Floating-Point Status and Control Register 96a〉
〈XER Register 96b〉
〈Link Register 97a〉
〈Count Register 97b〉
〈Instruction Addresses 97c〉
〈Time Base Registers 98a〉
〈Bogus 98b〉

Memory

Memory in the PPC is byte addressed, using big-endian byte order. λ-RTL limitation: Storage methods (e.g., aggregate using) are not implemented. Hence, any aggregation of the memory storage space is given a bogus aggregation.

94b 〈Memory 94b〉≡ (94a)

module Memory is

storage

’m’ is

cells of 8 bits

called "Memory"


fun mem n is let val n is n : #32 bits in $m[n] end

end

val mem is Memory.mem

General-purpose registers

The PPC has thirty-two 32-bit general purpose registers.

94c 〈General-Purpose Registers 94c〉≡ (94a)

module GPR is

storage

’g’ is

32 cells of 32 bits

called "General-Purpose Registers"

fun gpr n is $g[n]

end

val gpr is GPR.gpr


Floating-point registers

The PPC has thirty-two 64-bit floating point registers.

95a 〈Floating-Point Registers 95a〉≡ (94a)

module FPR is

storage

’f’ is

32 cells of 64 bits

called "Floating-Point Registers"

fun fpr n is $f[n]

end

val fpr is FPR.fpr

Condition register

The PPC has one 32-bit condition register. This register is further divided into eight 4-bit fields. The first two fields can be the implicit results of integer and floating-point instructions, respectively.

95b 〈Condition Register 95b〉≡ (94a)

module CR is

storage

’c’ is

1 cell of 32 bits

called "Condition Register"

val cr is $c[0]

fun crf n is $c[0]@loc[4 bits at mul(4 : #3 bits, n : #3 bits)]

fun crb n is $c[0]@loc[1 bit at n : #5 bits]

locations

[LT GT EQ SO] is crb [0..3]

[FX FEX VX OX] is crb [4..7]

end

val cr is CR.cr

val crf is CR.crf

val crb is CR.crb


Floating-point status and control register

The PPC has one 32-bit floating-point status and control register.

96a 〈Floating-Point Status and Control Register 96a〉≡ (94a)

module FPSCR is

storage

’F’ is

1 cell of 32 bits

called "Floating-Point Status and Control Register"

fun fpsrcb n is $F[0]@loc[1 bit at n : #5 bits]

locations

[FX FEX VX OX UX ZX XX] is fpsrcb([0..6])

[VXSNAN VXISI VXIDI VXZDZ VXIMZ VXVC] is fpsrcb([7..12])

[FR FI] is fpsrcb([13..14])

[FPRF] is $F[0]@loc[5 bits at [15] : #5 bits]

[C] is fpsrcb([15])

[FPCC] is $F[0]@loc[4 bits at [16] : #5 bits]

[_] is fpsrcb([20])

[VXSOFT VXSQRT VXCVI] is fpsrcb([21..23])

[VE OE UE ZE XE] is fpsrcb([24..28])

[NI] is fpsrcb([29])

[RN] is $F[0]@loc[2 bits at [30] : #5 bits]

end

XER register

The PPC has one 32-bit XER register.

96b 〈XER Register 96b〉≡ (94a)

module XER is

storage

’x’ is

1 cell of 32 bits

called "XER Register"

val xer is $x[0]

fun xerf n is $x[0]@loc[4 bits at mul(4 : #3 bits, n : #3 bits)]

fun xerb n is $x[0]@loc[1 bit at n : #5 bits]

locations

XER is $x[0]

[SO OV CA] is xerb [0 1 2]

[byte_count] is $x[0]@loc[7 bits at [25] : #5 bits]

end

val xer is XER.xer

val xerf is XER.xerf


Link register

The PPC has one 32-bit link register.

97a 〈Link Register 97a〉≡ (94a)

module LR is

storage

’L’ is

1 cell of 32 bits

called "Link Register"

locations

LR is $L[0]

end

val lr is LR.LR

Count register

The PPC has one 32-bit count register.

97b 〈Count Register 97b〉≡ (94a)

module CTR is

storage

’C’ is 1 cell of 32 bits

called "Count Register"

locations

CTR is $C[0]

end

val ctr is CTR.CTR

Instruction Addresses

The PPC does not specifically define a PC register. The architecture manual specifies branching and linking instructions in terms of cia (current instruction address) and nia (new instruction address). This distinction is reflected by a storage space with two 32-bit cells.

97c 〈Instruction Addresses 97c〉≡ (94a)

module IA is

storage

’I’ is 2 cells of 32 bits

called "Instruction Addresses"

locations

CIA is $I[0]

NIA is $I[1]

end

val cia is IA.CIA

val nia is IA.NIA


Time base registers

The PPC virtual environment architecture defines an additional set of 32-bit time base registers.

98a 〈Time Base Registers 98a〉≡ (94a)

module TBR is

storage

’t’ is 2 cells of 32 bits

called "Time Base Registers"

locations

TBU is $t[0]

TBL is $t[1]

end

val tbu is TBR.TBU

val tbl is TBR.TBL

Bogus

The mtspr, mfspr, and mftb instructions select a special purpose register based on one of the instruction operands. In cases where the operand does not encode a valid register, a 32-bit location may still be required. Hence, we introduce a bogus storage space.

98b 〈Bogus 98b〉≡ (94a)

module Bogus is

storage

’b’ is

1 cell of 32 bits

called "Bogus"

val bogus is $b[0]

end

val bogus is Bogus.bogus

11.3 Operands

For the most part, the operands can be derived automatically from the SLED file by lrtl -operands ppc.sled.

98c 〈operands 98c〉≡ (92) 99a ⊲

operand L : #1 bits

operand [ crfD crfS ] : #3 bits

operand [ IMM SR ] : #4 bits

operand [ A B BI BO D MB ME NB S SH TO crbA crbB crbD fA fB fC fD fS ] : #5 bits

operand [ CRM FM ] : #8 bits

operand [ SIMM UIMM d ] : #16 bits


λ-RTL limitation: The following are not automatically generated. None of the operands appear in the fields of instruction (32) declaration in the SLED file. The reloc operand is a relocatable operand. The spr operand is a synthesized operand derived from the sprL and sprH fields (which, not explicitly used as operands, are also not generated by lrtl -operands ppc.sled). The tbr is also a synthesized operand.

99a 〈operands 98c〉+≡ (92) ⊳ 98c

operand [ reloc ] : #32 bits

operand [ spr ] : #32 bits

operand [ tbr ] : #32 bits

11.4 Instruction semantics

We group the PPC instructions by functional categories.

99b 〈instruction semantics 99b〉≡ (92)

〈utilities 100a〉
〈integer arithmetic instructions 102c〉
〈integer compare instructions 107〉
〈integer logical instructions 108〉
〈integer rotate instructions 113〉
〈integer shift instructions 114〉
〈floating-point arithmetic instructions 115〉
〈floating-point multiply-add instructions 116a〉
〈floating-point rounding and conversion instructions 116b〉
〈floating-point compare instructions 116c〉
〈floating-point status and control register instructions 116d〉
〈integer load instructions 117a〉
〈integer store instructions 119b〉
〈integer load and store multiple instructions 122b〉
〈integer load and store string instructions 123c〉
〈memory synchronization instructions 123d〉
〈floating-point load instructions 123e〉
〈floating-point store instructions 123f〉
〈floating-point move instructions 124a〉
〈branch instructions 124b〉
〈condition register logical instructions 126a〉
〈system linkage instructions 126c〉
〈trap instructions 126d〉
〈processor control instructions 126e〉
〈cache management instructions 128c〉
〈segment register manipulation instructions 128d〉
〈lookaside buffer management instructions 128e〉
〈external control instructions 129〉


Utilities

We define a number of auxiliary functions for handling common cases.

One common case is setting CR0 (field 0 of the condition register) corresponding to a result. λ-RTL limitation: In this case CR.* are slices of one bit into the condition register. In order to give the illusion that slices are locations that can be assigned to, Norman Ramsey suggests rewriting fetches and stores of slice locations. That seems intuitive, but in a case like the one below, it seems that one would also need to change the parallel assignment into a sequential assignment (and do some forward substitution); otherwise, one is left with a non-deterministic assignment of 4 values into the same location. Norman Ramsey’s reply: “Yes, this is a serious long-term question. At the moment, we’ve extended the translator so that slices are legitimate locations that can be assigned to. Exactly what we might do with the information, however, is anyone’s guess. For the time being, you can rely on seeing intermediate code that is more or less what you asked for.”

100a 〈utilities 100a〉≡ (99b) 100b ⊲

fun set_cr0 (res)

is (CR.LT := bit(res < 0) |

CR.GT := bit(res > 0) |

CR.EQ := bit(res = 0))

Another common case is setting the overflow, summary overflow, and carry bits of the CR and XER registers.

100b 〈utilities 100a〉+≡ (99b) ⊳ 100a 101b ⊲

fun set_ov ov

is (ov --> (XER.SO := bit(true) |

CR.SO := bit(true)) |

XER.OV := bit(ov))

fun set_ca ca

is (XER.CA := bit(ca))


Some rotate and shift instructions compute a mask, having ones in positions x through y (wrapping if x > y) and zeros elsewhere. The positions x and y can be expressed using 5 bits, but up to 32 bits might be transferred, so we zero extend the positions to 6 bits.

101a 〈utilities – broken 101a〉≡

fun mask(x, y)

is

let

val x is zx #5 #6 x

val y is zx #5 #6 y

val m

is

if x < y

then bitTransfer #6 #32 {dst is {lsb is x,

value is 0x0},

src is {lsb is 0,

value is com 0x0},

width is y - x + 1}

else bitTransfer #6 #32 {dst is {lsb is x + 1,

value is com 0x0},

src is {lsb is 0,

value is 0x0},

width is x - y - 1}

fi

in

m

end

Code written to file utilities -- broken.

λ-RTL limitation: For some reason, the “Computing normal forms” pass of lrtl does not terminate when the auxiliary function slw is defined using mask. The issue is with the expression mask(0, 31 - n). So, we define mask as an rtlop while investigating.

101b 〈utilities 100a〉+≡ (99b) ⊳ 100b 102a ⊲

rtlop mask : #5 bits * #5 bits -> #32 bits
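Since `mask` is left as an uninterpreted operator here, a reference model may help when reading the rotate and shift instructions. The Python sketch below assumes the PowerPC convention in which bit 0 is the most significant bit of the 32-bit word; it treats x = y as a one-bit mask. It is an illustration, not the translator's definition.

```python
M32 = 0xFFFFFFFF


def ones_from(b: int) -> int:
    """Ones in big-endian bit positions b..31 (b may be 32, giving 0)."""
    return M32 >> b


def mask(x: int, y: int) -> int:
    """Ones in positions x through y, wrapping around if x > y."""
    if x <= y:
        return ones_from(x) & ~ones_from(y + 1) & M32
    # Wrapped case: positions x..31 together with positions 0..y.
    return ones_from(x) | (~ones_from(y + 1) & M32)
```

For instance, `mask(24, 31)` selects the low byte and `mask(0, 7)` the high byte of the word.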


Setting the floating-point status word is horribly complicated.

102a 〈utilities 100a〉+≡ (99b) ⊳ 101b 102b ⊲

fun rounding_mode (x)

is

let

val rn is FPSCR.RN

in

if rn = 0

then Round.nearest

else if rn = 1

then Round.zero

else if rn = 2

then Round.up

else Round.down

fi fi fi

end

fun set_cr1(res) is RTL.SKIP

Some branch instructions automatically word align a value before assigning it to nia.

102b 〈utilities 100a〉+≡ (99b) ⊳ 102a

fun align v

is bitInsert #5 #32 #2 {lsb is 30, wide is v} 0

Integer arithmetic instructions

64-bit instructions: divdx, divdux, mulhdx, mulhdux, mulld

All of the addition, subtraction, and negation instructions are specified using variations of the auxiliary function add’. This function takes a destination location, two source values, a carry-in value and a carry flag, an overflow detection flag, and a result condition flag. The flags determine which condition register bits are set.

102c 〈integer arithmetic instructions 102c〉≡ (99b) 103 ⊲

fun add’ (d, a, b, (cai, caf), oe, rc)

is

let

val res is (+)((+)(a, b), zx cai)

val ca is carry(a, b, cai)

val ov is (conjoin(a@bits[31] = b@bits[31],

a@bits[31] <> res@bits[31]))

in

rc --> set_cr0 res |

oe --> set_ov ov |

caf --> set_ca (bool ca) |

d := res

end

fun subf’ (d, a, b, (cai, caf), oe, rc)

is

add’ (d, com a, b, (cai, caf), oe, rc)
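The arithmetic performed by add’ and subf’ can be sketched in Python. This is a model under stated assumptions: 32-bit values are non-negative integers, the carry is the bit carried out of the most significant position, and overflow is computed from the operand and result signs exactly as in `ov` above. The names `add32` and `subf32` are hypothetical.

```python
M32 = 0xFFFFFFFF


def add32(a: int, b: int, cai: int = 0):
    """Return (result, carry_out, overflow) for a + b + cai on 32-bit values."""
    total = a + b + cai
    res = total & M32
    ca = total >> 32                       # carry out of the 32-bit sum
    sa, sb, sr = a >> 31, b >> 31, res >> 31
    ov = (sa == sb) and (sa != sr)         # operands agree in sign, result differs
    return res, ca, ov


def subf32(a: int, b: int, cai: int = 1):
    """Subtract-from: b - a, computed as (complement of a) + b + carry-in."""
    return add32(~a & M32, b, cai)
```

Expressing subtraction as an addition of the complement is what lets one function serve both families of instructions: a carry-in of 1 completes the two's-complement negation.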


We instantiate additional auxiliary versions to choose values from general purpose registers or sign-extended immediates. These auxiliary functions additionally choose carry-in values and carry flags.

103 〈integer arithmetic instructions 102c〉+≡ (99b) ⊳ 102c 104a ⊲

fun [add subf](D, A, B, (cai, caf), oe, rc)

is

[add’ subf’](gpr(D), gpr(A), gpr(B), (cai, caf), oe, rc)

fun [addme subfme](D, A, oe, rc)

is

[add’ subf’](gpr(D), gpr(A), neg 1, (XER.CA, true), oe, rc)

fun [addze subfze](D, A, oe, rc)

is

[add’ subf’](gpr(D), gpr(A), 0, (XER.CA, true), oe, rc)

fun addi (D, A, SIMM)

is

add’(gpr(D),

if A = 0 then 0 else gpr(A) fi,

sx SIMM,

(0, false), false, false)

fun addic (D, A, SIMM, rc)

is

add’(gpr(D), gpr(A), sx SIMM, (XER.CA, true), false, rc)

fun addis (D, A, SIMM)

is

add’(gpr(D),

if A = 0 then 0 else gpr(A) fi,

shl(sx SIMM, 16 : #32 bits),

(0, false), false, false)

fun subfic (D, A, SIMM)

is

subf’(gpr(D), gpr(A), sx SIMM, (1, true), false, false)

fun neg’ (D, A, oe, rc)

is

add’ (gpr(D), com(gpr(A)), 1, (0, false), oe, rc)


The individual instructions choose settings for the overflow detection flag and the result condition flag.

104a 〈integer arithmetic instructions 102c〉+≡ (99b) ⊳ 103 104b ⊲


add^["" "c" "e"]^["" "o"]^["" "."] (D, A, B)

is add(D, A, B,

[(0, false) (0, true) (XER.CA, true)],

[false true],

[false true])

addi (D, A, SIMM)

is addi (D, A, SIMM)

addic^["" "."] (D, A, SIMM)

is addic (D, A, SIMM, [false true])

addis (D, A, SIMM)

is addis (D, A, SIMM)

addme^["" "o"]^["" "."] (D, A, B)

is addme(D, A, [false true], [false true])

addze^["" "o"]^["" "."] (D, A, B)

is addze(D, A, [false true], [false true])

neg^["" "o"]^["" "."] (D, A, B)

is neg’(D, A, [false true], [false true])

subf^["" "c" "e"]^["" "o"]^["" "."] (D, A, B)

is subf(D, A, B,

[(1, false) (1, true) (XER.CA, true)],

[false true],

[false true])

subfic (D, A, SIMM)

is subfic (D, A, SIMM)

subfme^["" "o"]^["" "."] (D, A, B)

is subfme(D, A, [false true], [false true])

subfze^["" "o"]^["" "."] (D, A, B)

is subfze(D, A, [false true], [false true])

In order to specify the correct overflow semantics for the multiply instructions, we introduce an auxiliary function sov that examines the low and high order 32 bits of a 64-bit value to determine if signed overflow would occur when truncating the value to 32 bits.

104b 〈integer arithmetic instructions 102c〉+≡ (99b) ⊳ 104a 105 ⊲

fun sov (resLo, resHi)

is

let

val signLo is bitExtract #5 #32 #1 {lsb is 0 : #5 bits, wide is resLo}

in

if resHi = 0

then signLo = 1

else if resHi = com 0

then signLo = 0

else true

fi

fi

end
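The test that sov is meant to perform, as described before this chunk, is whether the high word of the product is the sign extension of the low word. A Python sketch of that condition follows; it assumes the 64-bit product is split into two non-negative 32-bit halves, and `mul_overflows` is a hypothetical wrapper added for illustration.

```python
M32 = 0xFFFFFFFF


def sov(res_lo: int, res_hi: int) -> bool:
    """True if truncating the 64-bit value hi:lo to 32 bits loses information."""
    sign_lo = res_lo >> 31
    if res_hi == 0:
        return sign_lo == 1     # non-negative value needs a clear sign bit
    if res_hi == M32:
        return sign_lo == 0     # negative value needs a set sign bit
    return True                 # high word is not a sign extension at all


def mul_overflows(a: int, b: int) -> bool:
    """Signed 32x32 multiply overflow, for signed Python integers a and b."""
    p = (a * b) & ((1 << 64) - 1)
    return sov(p & M32, p >> 32)
```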


Using the sov function, we define an auxiliary function mul’ that factors out many of the common patterns in the multiplication instructions.

105 〈integer arithmetic instructions 102c〉+≡ (99b) ⊳ 104b 106 ⊲

fun mul’ (mul, d, a, b, lh, oe, rc)

is

let

val res is mul(a, b)

val resLo is bitExtract #6 #64 #32 {lsb is 32 : #6 bits, wide is res}

val resHi is bitExtract #6 #64 #32 {lsb is 0 : #6 bits, wide is res}

val res’ is if lh then resHi else resLo fi

in


oe --> set_ov (sov (resLo, resHi)) |

d := res’

end

fun mullw (D, A, B, oe, rc)

is mul’ (mul, gpr(D), gpr(A), gpr(B), false, oe, rc)

fun mulhw (mul, D, A, B, rc)

is mul’ (mul, gpr(D), gpr(A), gpr(B), true, false, rc)

fun mulli (D, A, SIMM)

is mul’ (mul, gpr(D), gpr(A), sx SIMM, false, false, false)


[mulhw mulhwu]^["" "."] (D, A, B)

is mulhw ([mul mulu], D, A, B, [false true])

mulli (D, A, SIMM)

is mulli (D, A, SIMM)

mullw^["" "o"]^["" "."] (D, A, B)

is mullw (D, A, B, [false true], [false true])


The division instructions are standard. The boolean ok is true if the result of the division can be expressed in 32 bits. When the result cannot be expressed in 32 bits, the contents of general purpose register D are undefined, as are the contents of CR.LT, CR.GT, and CR.EQ. λ-RTL limitation: Quoted from Specifying Instructions’ Semantics Using λ-RTL (Interim Report): “As of July 1999, the λ-RTL translator does not support the kill effects. In a later implementation, we plan to use the question mark to stand for an “undefined” value. All λ-RTL functions and RTL operators will be strict in this value, and RTL.STORE(loc, ?) will become RTL.KILL loc. Because we would like to use the question mark in our example descriptions, even though λ-RTL doesn’t yet have the machinery to give it its eventual meaning, we introduce it as a constant.”

106 〈integer arithmetic instructions 102c〉+≡ (99b) ⊳ 105

fun divw (D, A, B, oe, rc)

is

let

val a is gpr(A)

val b is gpr(B)

val dividend is (gpr(A))

val divisor is (gpr(B))

val res is (div)(dividend, divisor)

val ok is not (disjoin(conjoin(a = (0x80000000 : #32 bits),

b = neg (1 : #32 bits)),

b = (0 : #32 bits)))

val res’ is if ok then res else ? fi

in

rc --> set_cr0 res’ |

oe --> set_ov (not ok) |

gpr(D) := res’

end

fun divwu (D, A, B, oe, rc)

is

let

val a is gpr(A)

val b is gpr(B)

val dividend is (gpr(A))

val divisor is (gpr(B))

val res is (divu)(dividend, divisor)

val ok is not (b = 0)

val res’ is if ok then res else ? fi

in

rc --> set_cr0 res’ |

oe --> set_ov (not ok) |

gpr(D) := res’

end


divw^["" "o"]^["" "."] (D, A, B)

is divw(D, A, B, [false true], [false true])


divwu^["" "o"]^["" "."] (D, A, B)

is divwu(D, A, B, [false true], [false true])

Integer compare instructions

The arithmetic and logical comparison instructions can be concisely specified by an auxiliary function cmp’ that computes the appropriate condition field value from application of a less-than operator to the source general purpose register and a 32-bit source value, and stores the result in the destination condition register field. Further auxiliary functions compute the 32-bit source value either from a general purpose register or a signed or unsigned immediate value.

The L operand must be 0 for 32-bit implementations. This is enforced by the SLED file.

107 〈integer compare instructions 107〉≡ (99b)

fun cmp’ (lt, crfD, A, b)

is

let

val a is gpr(A) : #32 bits

val c is if lt(a, b)

then 8 -- 0b1000

else if lt(b, a)

then 4 -- 0b0100

else 2 -- 0b0010

fi

fi

val res is bitInsert #2 #4 #1 {lsb is 3, wide is c} XER.SO

in

crf(crfD) := res

end

fun cmp (lt, crfD, A, B)

is cmp’ (lt, crfD, A, gpr(B))

fun cmpi (crfD, A, SIMM)

is cmp’ (lt, crfD, A, sx SIMM)

fun cmpli (crfD, A, UIMM)

is cmp’ (ltu, crfD, A, zx UIMM)


cmp^["" "l"] (crfD, L, A, B)

is cmp([lt ltu], crfD, A, B)

cmpi (crfD, L, A, SIMM)

is cmpi (crfD, A, SIMM)

cmpli (crfD, L, A, UIMM)

is cmpli (crfD, A, UIMM)
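The value that cmp’ stores in a condition register field can be modelled in Python. The sketch assumes the field layout LT, GT, EQ, SO from most to least significant bit, so that the summary-overflow bit occupies the lowest position; the function names and the `signed` flag (standing in for the choice between lt and ltu) are illustrative.

```python
def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def cmp_field(a: int, b: int, so: int, signed: bool = True) -> int:
    """Return the 4-bit condition field for comparing a with b."""
    if signed:
        a, b = to_signed(a), to_signed(b)
    if a < b:
        c = 0b1000      # LT
    elif b < a:
        c = 0b0100      # GT
    else:
        c = 0b0010      # EQ
    return c | (so & 1)  # copy XER.SO into the low bit
```

Only a less-than operator is needed: greater-than is the same test with the operands exchanged, and equality is what remains.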


Integer logical instructions

64-bit instructions: cntlzdx, extswx

The standard logical instructions can be concisely specified by an auxiliary function log’ that computes the result of applying a logical operation to the source general purpose register and a 32-bit source value, and stores the result in the destination general purpose register. Further auxiliary functions compute the 32-bit source value either from a general purpose register or an unsigned immediate value.

108 〈integer logical instructions 108〉≡ (99b) 109 ⊲

fun log’ (A, S, b, OP, rc)

is

let

val res is OP(gpr(S), b)

in

rc --> set_cr0(res) |

gpr(A) := res

end

fun log (A, S, B, OP, rc)

is log’ (A, S, gpr(B), OP, rc)

fun logi (A, S, UIMM, OP, rc)

is log’ (A, S, zx UIMM, OP, rc)

fun logis (A, S, UIMM, OP, rc)

is log’ (A, S, shl(zx UIMM, 16), OP, rc)


[and andc eqv nand nor or orc xor]^["" "."] (A, S, B)

is log(A, S, B, [and andc eqv nand nor or orc xor], [false true])

[and]^"i"^["" "s"]^"." (A, S, UIMM)

is

foreach OP in [and]

foreach f in [logi logis]

group

f (A, S, UIMM, OP, true)

end

[or xor]^"i"^["" "s"] (A, S, UIMM)

is

foreach OP in [or xor]

foreach f in [logi logis]

group

f (A, S, UIMM, OP, false)

end


The extend sign byte and extend sign half word instructions are specified in a similar manner. The different sizes of the extracted value in the two instructions prevent further code-sharing.

109 〈integer logical instructions 108〉+≡ (99b) ⊳ 108 112 ⊲

fun ext’ (A, res, rc)

is

let

in


gpr(A) := res

end

fun extsb (A, S, rc)

is

let

val res is sx #8 #32 (bitExtract #5 #32 #8

{lsb is 24, wide is gpr(S)})

in

ext’(A, res, rc)

end

fun extsh (A, S, rc)

is

let

val res is sx #16 #32 (bitExtract #5 #32 #16

{lsb is 16, wide is gpr(S)})

in

ext’(A, res, rc)

end


exts^["b" "h"]^["" "."] (A, S)

is [extsb extsh](A, S, [false true])


The count leading zeros word instruction is not easily specified in λ-RTL. The architecture manual describes the instruction in pseudo-code as follows:

n ← 0
do while n < 32
    if rS[n] = 1 then leave
    n ← n + 1
rA ← n

However, λ-RTL does not permit the declaration of recursive functions; hence, a general while construct cannot be written. One way of specifying the cntlzwx instruction is with a Vector.foldr.
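To make the two formulations concrete, here is a Python sketch of the manual's loop next to a fold over the bit positions. It assumes big-endian bit numbering (rS[0] is the most significant bit). The fold is written in the same spirit as the Vector.foldr approach but is not a transliteration of the λ-RTL chunk that follows.

```python
from functools import reduce


def bit(rs: int, n: int) -> int:
    """rS[n] with bit 0 the most significant of 32."""
    return (rs >> (31 - n)) & 1


def cntlzw_loop(rs: int) -> int:
    n = 0
    while n < 32:
        if bit(rs, n) == 1:
            break           # "leave" in the manual's pseudo-code
        n += 1
    return n


def cntlzw_fold(rs: int) -> int:
    # The accumulator is (count, found); once a 1 bit is found it never changes.
    def step(acc, n):
        count, found = acc
        if not found and bit(rs, n) == 1:
            return (n, True)
        return acc

    count, _ = reduce(step, range(32), (32, False))
    return count
```

The fold visits all 32 positions unconditionally, which is why its expansion into an RTL is so much larger than the loop suggests.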

110 〈integer logical instructions – broken 110〉≡ 111 ⊲

fun cntlz (A, S, rc)

is

let

fun zero n is bool((gpr(S))@loc[1 bit at n : #5 bits] : #1 bits)

val (res, _)

is

Vector.foldr (\ (n, (c, b)). if conjoin(not b, zero n)

then (n, true)

else (c, b)

fi)

(0, false)

(Vector.spanning 0 31)

in


gpr(A) := res

end


cntlzw^["" "."] (A, S)

is cntlz (A, S, [false true])

Code written to file integer logical instructions -- broken.


λ-RTL limitation: However, λ-RTL fails to expand this in a concise manner. Furthermore, lrtl dies with Error: Toolkit bug: impossible value from attribute not by selection: of an if expression. It is unclear where and how the if expression was supposed to be eliminated.

An alternate specification of cntlzx can be given by nested if statements:

111 〈integer logical instructions – broken 110〉+≡ ⊳ 110


is

let

fun zero n is bool((gpr(S))@loc[1 bit at n : #5 bits] : #1 bits)

val res

is

if zero 0

then if zero 1

then if zero 2

then if zero 3

then if zero 4

then if zero 5

then if zero 6

then if zero 7

then if zero 8

then if zero 9

then if zero 10

then if zero 11

then if zero 12

then if zero 13

then if zero 14

then if zero 15

then if zero 16

then if zero 17

then if zero 18

then if zero 19

then if zero 20

then if zero 21

then if zero 22

then if zero 23

then if zero 24

then if zero 25

then if zero 26

then if zero 27

then if zero 28

then if zero 29

then if zero 30

then if zero 31

then 32

else 31 fi

else 30 fi

else 29 fi

else 28 fi

else 27 fi


else 26 fi

else 25 fi

else 24 fi

else 23 fi

else 22 fi

else 21 fi

else 20 fi

else 19 fi

else 18 fi

else 17 fi

else 16 fi

else 15 fi

else 14 fi

else 13 fi

else 12 fi

else 11 fi

else 10 fi

else 9 fi

else 8 fi

else 7 fi

else 6 fi

else 5 fi

else 4 fi

else 3 fi

else 2 fi

else 1 fi

else 0 fi : #32 bits

in


gpr(A) := res

end


cntlzw^["" "."] (A, S)


This specification generates a very large RTL. Instead, we choose to define an RTL operator for the instruction.

112 〈integer logical instructions 108〉+≡ (99b) ⊳ 109

rtlop cntlz: #n bits -> #k bits


is

let

val res is cntlz (gpr(S))

in


gpr(A) := res

end


cntlzw^["" "."] (A, S)



Integer rotate instructions

64-bit instructions: rldclx, rldcrx, rldicx, rldiclx, rldicrx, rldimix

The integer rotate instructions are standard.

113 〈integer rotate instructions 113〉≡ (99b)

fun rlwnm (A, S, n, MB, ME, rc)

is

let

val r is rotl (32, gpr(S), n)

val m is mask(MB, ME)

val res is and(r, m)

in


gpr(A) := res

end

fun rlwnmi (A, S, n, MB, ME, rc)

is

let


val r is rotl (32, gpr(S), n)

val m is mask(MB, ME)

val res is or(and(r, m), and(gpr(A), com m))

in


gpr(A) := res

end


rlwimi^["" "."] (A, S, SH, MB, ME)

is rlwnmi (A, S,

SH,

MB, ME, [false true])

rlwnm^["" "."] (A, S, B, MB, ME)

is rlwnm (A, S,

(bitExtract #5 #32 #5 {lsb is 27, wide is gpr(B)}),

MB, ME, [false true])

rlwinm^["" "."] (A, S, SH, MB, ME)

is rlwnm (A, S,

SH,

MB, ME, [false true])



Integer shift instructions

64-bit instructions: sldx, sradx, sradix, srdx

The integer shift instructions are standard. There may be more patterns that can be factored out, but they are not immediately obvious.

114 〈integer shift instructions 114〉≡ (99b)

fun slw (A, S, B, rc)

is

let

val n is (bitExtract #5 #32 #5 {lsb is 27, wide is gpr(B)})


val m is if bitExtract #5 #32 #1 {lsb is 26, wide is gpr(B)} = 0

then mask(0, 31 - n)

else 0x0

fi


in


gpr(A) := res

end

fun srw (A, S, B, rc)

is

let

val n is (bitExtract #5 #32 #5 {lsb is 27, wide is gpr(B)})

val r is rotl (32, gpr(S), 32 - n)

val m is if bitExtract #5 #32 #1 {lsb is 26, wide is gpr(B)} = 0

then mask(n, 31)

else 0x0

fi


in


gpr(A) := res

end

fun sraw (A, S, n, rc)

is

let

val r is rotl (32, gpr(S), 32 - n)

val m is if bitExtract #5 #32 #1 {lsb is 26, wide is r} = 0

then mask(0, 31)

else 0x0

fi

val S is bitExtract #5 #32 #1 {lsb is 31, wide is gpr(S)}

val S’ is if bool(S) then com 0x0 else 0x0 fi

val res is or(and(r, m), and(S’, com m))

val ca is and(S, bit(and(r, com m) <> 0))

in


XER.CA := ca |

gpr(A) := res


end


slw^["" "."] (A, S, B)

is slw (A, S, B, [false true])

srw^["" "."] (A, S, B)

is srw (A, S, B, [false true])

sraw^["" "."] (A, S, B)

is sraw (A, S,

bitExtract #5 #32 #5 {lsb is 27, wide is gpr(B)},

[false true])

srawi^["" "."] (A, S, SH)

is sraw (A, S,

SH,

[false true])

Floating-point arithmetic instructions

Unimplemented: all – setting of condition codes

Optional instructions: fresx, frsqrtex, fselx, fsqrtx, fsqrtsx

115 〈floating-point arithmetic instructions 115〉≡ (99b)

fun fbinop’ (fD, fA, fB, OP, cp, rc)

is

let

val rm is rounding_mode (0)

val res is OP (fpr(fA), fpr(fB), rm)

val res is cp (res, rm)

in


fpr(fD) := res

end

fun funop’ (fD, fA, fB, OP, cp, rc)

is

let


val res is OP (fpr(fA), fpr(fD), rm)


in


fpr(fD) := res

end

fun dp (x, rm) is x

fun sp (x, rm) is f2f #32 #64 (f2f #64 #32 (x, rm), rm)
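The single-precision variants round the result to 32-bit format and then widen it back to 64 bits, which is what `sp` expresses with two `f2f` conversions. A Python sketch of the same round trip follows. It uses the standard `struct` module, which performs an IEEE round-to-nearest conversion, so the rounding-mode argument of the λ-RTL version is not modelled here.

```python
import struct


def dp(x: float) -> float:
    """Double precision: the value is kept as is."""
    return x


def sp(x: float) -> float:
    """Round a double to single precision, then widen it back to double."""
    # Packing with 'f' narrows to 32 bits; unpacking widens exactly.
    # Values too large for single precision raise OverflowError in struct.pack.
    return struct.unpack("<f", struct.pack("<f", x))[0]


if __name__ == "__main__":
    print(sp(0.5) == 0.5)   # 0.5 is exactly representable in both formats
    print(sp(0.1) == 0.1)   # 0.1 is not, so the round trip changes it
```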


f^[add div sub]^["" "s"]^["" "."] (fD, fA, fB)

is fbinop’ (fD, fA, fB, [fadd fdiv fsub], [dp sp], [false true])

f^[mul]^["" "s"]^["" "."] (fD, fA, fC)

is fbinop’ (fD, fA, fC, [fmul], [dp sp], [false true])


Floating-point multiply-add instructions

Unimplemented: all – setting of condition codes

Use the exact floating-point multiply to compute a precise intermediate result, then use the convert-precision (cp) function to map the result down to the appropriate precision.

116a 〈floating-point multiply-add instructions 116a〉≡ (99b)

fun fmulop’ (fD, fA, fB, fC, neg, OP, cp, rc)

is

let


val res is fmulx #64 #128 (fpr(fA), fpr(fC))

val b is f2f #64 #128 (fpr(fB), Round.nearest)

val res is OP (res, b, rm)


val res is if neg then fneg res else res fi

in


fpr(fD) := res

end

fun dp (x, rm) is f2f #128 #64 (x, rm)

fun sp (x, rm) is f2f #32 #64 (f2f #128 #32 (x, rm), rm)


f^["" "n"]^m^[add sub]^["" "s"]^["" "."] (fD, fA, fB, fC)

is fmulop’ (fD, fA, fB, fC,

[false true], [fadd fsub], [dp sp], [false true])

Floating-point rounding and conversion instructions

Unimplemented: all

64-bit instructions: fcfidx, fctidx, fctidzx

116b 〈floating-point rounding and conversion instructions 116b〉≡ (99b)

Floating-point compare instructions

Unimplemented: all

116c 〈floating-point compare instructions 116c〉≡ (99b)

Floating-point status and control register instructions

Unimplemented: all

116d 〈floating-point status and control register instructions 116d〉≡ (99b)


Integer load instructions

64-bit instructions: ld, ldu, ldux, ldx, lwa, lwaux, lwax

All load instructions can be factored to use the following auxiliary function l’. d is a destination location, x is an extraction function (to extract and extend the correct number of bits from memory), and ea is the effective address from which the value is loaded. A and u are used to update general purpose register A with the effective address. λ-RTL limitation: It would be nice to have an alpha option datatype in λ-RTL, rather than passing both the register number and a flag indicating whether or not to update.

117a 〈integer load instructions 117a〉≡ (99b) 117b ⊲

fun l’ (d, x, ea, A, u)

is

let

in

d := x ea |

u --> gpr(A) := ea

end

Two more auxiliary functions distinguish between updating and non-updating loads. A non-updating load requires a special case when A = 0.

117b 〈integer load instructions 117a〉+≡ (99b) ⊳ 117a 118 ⊲

fun l (D, x, A, o)

is let val ea is if A = 0 then 0 else gpr(A) fi + o

in l’ (gpr(D), x, ea, A, false) end

fun lu (D, x, A, o)

is let val ea is gpr(A) + o

in l’ (gpr(D), x, ea, A, true) end
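The factoring into l’, l, and lu can be modeled in Python as follows. This is a sketch, not the specification: the register file is assumed to be a list of 32 integers, memory a big-endian bytearray, and the names load and word are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def word(mem, ea):
    """Extraction function: fetch a 32-bit big-endian word."""
    return int.from_bytes(mem[ea:ea + 4], "big")

def load(gpr, mem, D, extract, A, offset, update):
    """Model of l (update=False) and lu (update=True)."""
    if update:
        base = gpr[A]                    # lu always reads gpr(A)
    else:
        base = 0 if A == 0 else gpr[A]   # l treats A = 0 as the constant zero
    ea = (base + offset) & MASK32
    gpr[D] = extract(mem, ea)            # d := x ea
    if update:
        gpr[A] = ea                      # u --> gpr(A) := ea

gpr = [0] * 32
gpr[0] = 999                             # must be ignored by a non-updating load
mem = bytearray(16)
mem[4:8] = b"\x12\x34\x56\x78"
load(gpr, mem, 3, word, 0, 4, False)     # non-updating, A = 0
load(gpr, mem, 5, word, 2, 4, True)      # updating, base register 2
```

The sketch performs the two assignments one after the other, which matches the simultaneous RTL only when D and A are different registers.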


We specify various extraction functions, including half-word byte reverse and word byte-reverse.

118 〈integer load instructions 117a〉+≡ (99b) ⊳ 117b 119a ⊲

fun byte ea is zx #8 #32 (mem(ea))

fun hword ea is zx #16 #32 (mem(ea))

fun hwordr ea

is

let

val v1 is mem(ea + 1)


val v’ is 0 : #32 bits

val v’ is bitInsert #5 #32 #8 {lsb is 0, wide is v’} v1


in

v’

end

fun hworda ea is sx #16 #32 (mem(ea))

fun word ea is mem(ea)

fun wordr ea

is

let










in

v’

end
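The intended behavior of the extraction functions can be sketched in Python. This is a model under assumptions: memory is a big-endian byte sequence, results are 32-bit unsigned integers, and the Python function names simply mirror the λ-RTL ones.

```python
def byte(mem, ea):
    """Zero-extend one byte to 32 bits."""
    return mem[ea]

def hword(mem, ea):
    """Zero-extend a big-endian half-word to 32 bits."""
    return int.from_bytes(mem[ea:ea + 2], "big")

def hworda(mem, ea):
    """Sign-extend a big-endian half-word to 32 bits (algebraic load)."""
    v = hword(mem, ea)
    return v | 0xFFFF0000 if v & 0x8000 else v

def word(mem, ea):
    """Fetch a big-endian word."""
    return int.from_bytes(mem[ea:ea + 4], "big")

def hwordr(mem, ea):
    """Half-word with its two bytes reversed, zero-extended."""
    return int.from_bytes(mem[ea:ea + 2], "little")

def wordr(mem, ea):
    """Word with its four bytes reversed."""
    return int.from_bytes(mem[ea:ea + 4], "little")

sample = bytes([0x80, 0x01, 0x02, 0x03])
```

Reading the same bytes in little-endian order is one way to express byte reversal; the λ-RTL versions build the result with bitInsert instead.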


These load and extraction functions can be used to specify the majority of the load instructions.

119a 〈integer load instructions 117a〉+≡ (99b) ⊳ 118


l^[bz ha hz wz]^["" "u"] (D, d, A)

is

foreach x in [byte hworda hword word]

foreach f in [l lu]

group

f(D, x, A, sx #16 #32 d)

end

l^[bz ha hz wz]^["" "u"]^x (D, A, B)

is

foreach x in [byte hworda hword word]

foreach f in [l lu]

group

f(D, x, A, gpr(B))

end

l^[h w]^brx (D, A, B)

is

foreach x in [hwordr wordr]

group

l(D, x, A, gpr(B))

end
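The way a grouped definition expands into individual instructions can be illustrated with a small Python sketch. It assumes that each bracketed list in the instruction name is paired positionally with the corresponding foreach list in the body; expand_group is a hypothetical helper, not part of λ-RTL.

```python
from itertools import product

def expand_group(prefix, name_lists, value_lists, suffix=""):
    """Pair every generated instruction name with the values chosen for it."""
    names = [prefix + "".join(parts) + suffix for parts in product(*name_lists)]
    values = list(product(*value_lists))
    return dict(zip(names, values))

name_lists = [["bz", "ha", "hz", "wz"], ["", "u"]]
value_lists = [["byte", "hworda", "hword", "word"], ["l", "lu"]]

loads = expand_group("l", name_lists, value_lists)               # D-form loads
indexed = expand_group("l", name_lists, value_lists, suffix="x")  # X-form loads
```

Under that assumption, lhau is specified by the pair (hworda, lu), and the same eight combinations reappear with an x suffix for the indexed forms.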

Integer store instructions

64-bit instructions: std, stdu, stdux, stdx

A similar factoring technique expresses all store instructions using the following auxiliary function st’. v is the value to be stored, and ea is the effective address to which the value is stored. A and u are used to update general purpose register A with the effective address.

119b 〈integer store instructions 119b〉≡ (99b) 119c ⊲

fun st’ (v, ea, A, u)

is

let

in

mem(ea) := v |

u --> gpr(A) := ea

end

Two more auxiliary functions distinguish between updating and non-updating stores. A non-updating store requires a special case when A = 0.

119c 〈integer store instructions 119b〉+≡ (99b) ⊳ 119b 120 ⊲

fun st (v, A, o)

is st’ (v, if A = 0 then 0 else gpr(A) fi + o, A, false)

fun stu (v, A, o)

is st’ (v, gpr(A) + o, A, true)


We specify various extraction functions, including half-word byte reverse and word byte-reverse.

120 〈integer store instructions 119b〉+≡ (99b) ⊳ 119c 121 ⊲

fun byte S is bitExtract #5 #32 #8 {lsb is 24, wide is gpr(S)}

fun hword S is bitExtract #5 #32 #16 {lsb is 16, wide is gpr(S)}

fun hwordr S

is

let

val v1 is bitExtract #5 #32 #8 {lsb is 24, wide is gpr(S)}





in

v’

end

fun word S is gpr(S)

fun wordr S

is

let










in

v’

end


These store and extraction functions can be used to specify the majority of the store instructions.

121 〈integer store instructions 119b〉+≡ (99b) ⊳ 120


st^[b h w]^["" "u"] (S, d, A)

is

foreach x in [byte hword word]

foreach f in [st stu]

group

f(x S, A, sx #16 #32 d)

end

st^[b h w]^["" "u"]^x (S, B, A)

is

foreach x in [byte hword word]

foreach f in [st stu]

group

f(x S, A, gpr(B))

end

st^[h w]^brx (S, A, B)

is

foreach x in [hwordr wordr]

group

st(x S, A, gpr(B))

end
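A Python model of the store side may help. It is a sketch under assumptions: registers are a list of 32 integers, memory is a big-endian bytearray, and low_bytes and store are hypothetical names standing in for the extraction functions and for st and stu.

```python
MASK32 = 0xFFFFFFFF

def low_bytes(value, nbytes, reverse=False):
    """Low-order nbytes of a register value, optionally byte-reversed."""
    data = (value & ((1 << (8 * nbytes)) - 1)).to_bytes(nbytes, "big")
    return data[::-1] if reverse else data

def store(gpr, mem, data, A, offset, update):
    """Model of st (update=False) and stu (update=True)."""
    base = gpr[A] if (update or A != 0) else 0   # non-updating store: A = 0 means zero
    ea = (base + offset) & MASK32
    mem[ea:ea + len(data)] = data                # mem(ea) := v
    if update:
        gpr[A] = ea                              # u --> gpr(A) := ea

gpr = [0] * 32
gpr[4] = 0x11223344
gpr[1] = 8
mem = bytearray(16)
store(gpr, mem, low_bytes(gpr[4], 2), 1, 2, True)                   # like sthu
store(gpr, mem, low_bytes(gpr[4], 4, reverse=True), 0, 0, False)    # like stwbrx, A = 0
```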


Integer load and store multiple instructions

The PPC lmw instruction loads n consecutive words starting at the computed effective address into the general purpose registers D through 31. λ-RTL limitation: One would really like to specify val rs is Vector.spanning D 31 and eliminate the r >= D tests. However, D is an instantiation-time constant, not a compile-time constant. λ-RTL limitation: lrtl is prohibitively slow when compiling this instruction. Unrolling the Vector.foldr expression produces a very large expression. Furthermore, lrtl recognizes that the r >= D tests can be evaluated at instantiation time. However, it does not recognize the fact that 1 >= D ⇒ 2 >= D. Hence, cases for all 2^33 possible evaluations of r >= D and A = 0 when r ∈ {0, ..., 31} are generated. The architecture manual states that the instruction form is invalid if A is in the range of registers loaded (i.e., if A ≥ D). This is not checked by the SLED file.
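The size of the blow-up can be checked with a short Python sketch. It assumes that D and A are 5-bit register numbers and treats the 32 tests r >= D together with the test A = 0 as 33 booleans; feasible_outcomes is a hypothetical helper.

```python
def feasible_outcomes():
    """Collect every combination of test results that some (D, A) can produce."""
    outcomes = set()
    for d in range(32):
        for a in range(32):
            tests = tuple(r >= d for r in range(32))  # monotone in r
            outcomes.add((tests, a == 0))
    return outcomes

independent_cases = 2 ** 33               # if all 33 tests were unrelated
reachable_cases = len(feasible_outcomes())
```

Because r >= D implies r + 1 >= D, only 32 of the test vectors can occur, so 64 combinations are reachable out of the 2^33 that are enumerated when the tests are treated as independent.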

122a 〈integer load and store multiple instructions – broken 122a〉≡ 123a ⊲

fun lmw (D, ea)

is

let

val rs is Vector.spanning 0 31

val (eff,ea) is Vector.foldr

(\(r,(eff,ea)). (eff | r >= D --> gpr(r) := mem(ea),

if r >= D then ea + 4 else ea fi))

(RTL.SKIP, ea)

rs

in

eff

end


lmw (D, d, A)

is lmw(D, (if A = 0 then 0 else gpr(A) fi) + (sx #16 #32 d))

Code written to file integer load and store multiple instructions -- broken.
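For comparison, the intended effect of lmw is easy to state as a loop. The Python sketch below follows the description in the text rather than the fold; memory is assumed to be a dictionary from byte addresses to 32-bit words, and sx16 and lmw are hypothetical names.

```python
MASK32 = 0xFFFFFFFF

def sx16(d):
    """Sign-extend a 16-bit displacement."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d

def lmw(gpr, mem, D, d, A):
    """Load registers D through 31 from consecutive words starting at ea."""
    base = 0 if A == 0 else gpr[A]
    ea = (base + sx16(d)) & MASK32
    for r in range(D, 32):        # only the registers that are actually loaded
        gpr[r] = mem[ea]
        ea = (ea + 4) & MASK32

gpr = [0] * 32
gpr[1] = 0x1000
mem = {0x0FF8: 0xAAAA, 0x0FFC: 0xBBBB}
lmw(gpr, mem, 30, 0xFFF8, 1)      # displacement -8 from register 1
```

Iterating from D directly is what Vector.spanning D 31 would express if D were a compile-time constant.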

122b 〈integer load and store multiple instructions 122b〉≡ (99b) 123b ⊲


lmw (D, d, A) is RTL.SKIP


The PPC stmw instruction stores n consecutive words starting at the computed effective address from the general purpose registers D through 31. λ-RTL limitation: See above. The architecture manual states that the instruction form is invalid if A is in the range of registers loaded (i.e., if A ≥ D). This is not checked by the SLED file.

123a 〈integer load and store multiple instructions – broken 122a〉+≡ ⊳ 122a

fun stmw (S, ea)

is

let

val rs is Vector.spanning 0 31

val (eff,ea) is Vector.foldr

(\(r,(eff,ea)). (eff | r >= S --> mem(ea) := gpr(r),

if r >= S then ea + 4 else ea fi))

(RTL.SKIP, ea)

rs

in

eff

end


stmw (S, d, A)

is stmw(S, (if A = 0 then 0 else gpr(A) fi) + (sx #16 #32 d))

123b 〈integer load and store multiple instructions 122b〉+≡ (99b) ⊳ 122b


stmw (S, d, A) is RTL.SKIP

Integer load and store string instructions

Unimplemented: all

123c 〈integer load and store string instructions 123c〉≡ (99b)

Memory synchronization instructions

Unimplemented: eieio, isync, lwarx, stwcx., sync

64-bit instructions: ldarx, stdcx.

123d 〈memory synchronization instructions 123d〉≡ (99b)

Floating-point load instructions

Unimplemented: all

123e 〈floating-point load instructions 123e〉≡ (99b)

Floating-point store instructions

Unimplemented: all

Optional instructions: stfiwx

123f 〈floating-point store instructions 123f〉≡ (99b)


Floating-point move instructions

Unimplemented: all

124a 〈floating-point move instructions 124a〉≡ (99b)

Branch instructions

There are four types of branch instructions.

The first type simply assigns a relocatable operand to nia. It optionally saves cia in the link register. The SLED file computes reloc from the LI and AA operands. However, b and ba are considered distinct instructions, although they have identical semantics. Similar comments apply to bl and bla.

124b 〈branch instructions 124b〉≡ (99b) 124c ⊲

fun b (reloc, lk)

is

let

in

lk --> lr := cia + 4 |

nia := reloc

end


b^["" "l"] (reloc)

is b (reloc, [false true])

b^["" "l"]^"a" (reloc)

is b (reloc, [false true])

The remaining types of branch instructions branch conditionally on the values of the condition register, count register, and the operands BO and BI. The auxiliary function bc’ computes a thunk to assign a new value to the count register and two boolean values.

124c 〈branch instructions 124b〉+≡ (99b) ⊳ 124b 125a ⊲

fun bc’ (BO, BI)

is

let

fun bob n is BO@bits[1 bit at n : #3 bits]

val nctr is if bool(bob 2) then ctr else ctr - 1 fi

val ctr_ok

is bool(or(bob 2, xor(bit(nctr <> 0), bob 3)))

val cond_ok

is bool(or(bob 0, eqv(crb BI, bob 1)))

val set_ctr

is \(x). not (bool (bob 2)) --> ctr := nctr

in

(set_ctr, ctr_ok, cond_ok)

end
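The decoding performed by bc’ can be modeled in Python. This is a sketch under assumptions: BO is a 5-bit field and the condition register a 32-bit integer, both numbered with bit 0 as the most significant bit, and the function returns the new count value directly instead of a thunk. The names bo_bit, cr_bit, and bc_prime are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def bo_bit(BO, n):
    """Bit n of the 5-bit BO field, bit 0 being the most significant."""
    return (BO >> (4 - n)) & 1

def cr_bit(cr, n):
    """Bit n of the 32-bit condition register, bit 0 being the most significant."""
    return (cr >> (31 - n)) & 1

def bc_prime(BO, BI, cr, ctr):
    """Return (new ctr, ctr_ok, cond_ok) for a conditional branch."""
    decrement = not bo_bit(BO, 2)
    nctr = (ctr - 1) & MASK32 if decrement else ctr
    ctr_ok = bool(bo_bit(BO, 2) | ((nctr != 0) ^ bo_bit(BO, 3)))
    cond_ok = bool(bo_bit(BO, 0) | (cr_bit(cr, BI) == bo_bit(BO, 1)))
    return nctr, ctr_ok, cond_ok
```

For example, BO = 0b10100 branches always and leaves the count register alone, while BO = 0b10000 decrements the count and branches only if the new count is nonzero.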


The second type of branch instruction updates the count register and branches to reloc if both ctr_ok and cond_ok are true. It optionally saves cia in the link register. The SLED file computes reloc from the BD and AA operands. However, bc and bca are considered distinct instructions, although they have identical semantics. Similar comments apply to bcl and bcla.

125a 〈branch instructions 124b〉+≡ (99b) ⊳ 124c 125b ⊲

fun bc (BO, BI, reloc, lk)

is

let

val (set_ctr, ctr_ok, cond_ok) is bc’ (BO, BI)

in

set_ctr (0) |

conjoin(ctr_ok, cond_ok) --> b (reloc, lk)

end


bc^["" "l"] (BO, BI, reloc)

is bc (BO, BI, reloc, [false true])

bc^["" "l"]^"a" (BO, BI, reloc)

is bc (BO, BI, reloc, [false true])

The third type of branch instruction does not update the count register and branches to the aligned count register if cond_ok is true. It optionally saves cia in the link register. The architecture manual states that the instruction form is invalid if BO[2] = 0. This is not checked by the SLED file.

125b 〈branch instructions 124b〉+≡ (99b) ⊳ 125a 125c ⊲

fun bcctr (BO, BI, lk)

is

let

val (set_ctr, ctr_ok, cond_ok) is bc’ (BO, BI)

in

cond_ok --> b(align(ctr), lk)

end


bcctr^["" "l"] (BO, BI)

is bcctr (BO, BI, [false true])

The fourth type of branch instruction updates the count register and branches to the aligned link register if both ctr_ok and cond_ok are true. It optionally saves cia in the link register.

125c 〈branch instructions 124b〉+≡ (99b) ⊳ 125b

fun bclr (BO, BI, lk)

is

let

val (set_ctr, ctr_ok, cond_ok) is bc’ (BO, BI)

in

set_ctr (0) |

conjoin(ctr_ok, cond_ok) --> b(align(lr), lk)

end


bclr^["" "l"] (BO, BI)

is bclr (BO, BI, [false true])
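Because the effects in an RTL take place simultaneously, a branch to the link register that also saves the return address must read the old value of lr. The Python sketch below makes this explicit by taking a snapshot of the state. It assumes that align clears the two low-order bits, and it takes ctr_ok and cond_ok as inputs rather than decoding BO; the names and the dictionary-based state are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def align(addr):
    """Assumed meaning of align: clear the two low-order bits."""
    return addr & 0xFFFFFFFC

def bclr(state, ctr_ok, cond_ok, lk):
    """Branch to the aligned link register, optionally saving cia + 4 in lr."""
    old = dict(state)                  # every right-hand side reads the old state
    if ctr_ok and cond_ok:
        if lk:
            state["lr"] = (old["cia"] + 4) & MASK32
        state["nia"] = align(old["lr"])
    return state

state = {"cia": 0x2000, "nia": 0x2004, "lr": 0x1003}
bclr(state, True, True, True)          # like bclrl with the branch taken
```

After the call, nia holds the aligned old link register and lr holds the return address, which is the outcome the simultaneous assignment describes.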


Condition register logical instructions

The standard condition register logical instructions apply a bitwise operation to two source condition register fields, and store the result in the destination condition register field.

126a 〈condition register logical instructions 126a〉≡ (99b) 126b ⊲

fun crlog (D, A, B, OP) is crb(D) := OP(crb(A), crb(B))


cr^[and andc eqv nand nor or orc xor] (crbD, crbA, crbB)

is crlog (crbD, crbA, crbB, [and andc eqv nand nor or orc xor])
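The eight operations and their application to single condition register bits can be modeled in Python. This is a sketch that assumes the condition register is a 32-bit integer with bit 0 as the most significant bit; CR_OPS and crlog are hypothetical Python names.

```python
MASK32 = 0xFFFFFFFF

# One-bit versions of the eight logical operators.
CR_OPS = {
    "and":  lambda a, b: a & b,
    "andc": lambda a, b: a & (b ^ 1),
    "eqv":  lambda a, b: (a ^ b) ^ 1,
    "nand": lambda a, b: (a & b) ^ 1,
    "nor":  lambda a, b: (a | b) ^ 1,
    "or":   lambda a, b: a | b,
    "orc":  lambda a, b: a | (b ^ 1),
    "xor":  lambda a, b: a ^ b,
}

def crlog(cr, D, A, B, op):
    """Return cr with bit D replaced by op(bit A, bit B)."""
    a = (cr >> (31 - A)) & 1
    b = (cr >> (31 - B)) & 1
    shift = 31 - D
    return (cr & ~(1 << shift) & MASK32) | (CR_OPS[op](a, b) << shift)
```

A table keyed by operator name plays the role of the list [and andc eqv nand nor or orc xor] in the grouped definition.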

There is also an instruction to copy the contents of one source condition register field to a destination condition register field.

126b 〈condition register logical instructions 126a〉+≡ (99b) ⊳ 126a


mcrf (crfD, crfS)

is crf(crfD) := crf(crfS)

System linkage instructions

Unimplemented: sc

Supervisor-level instructions: rfi, rfid

64-bit instructions: rfid

Optional 64-bit bridge instruction: rfi

126c 〈system linkage instructions 126c〉≡ (99b)

Trap instructions

Unimplemented: tw, twi

64-bit instructions: td, tdi

126d 〈trap instructions 126d〉≡ (99b)

Processor control instructions

Supervisor-level instructions: mfmsr, mtmsr, mtmsrd

Supervisor- and user-level instructions: mfspr, mtspr

64-bit instructions: mtmsrd

Optional 64-bit bridge instructions: mtmsr

Two instructions that move values to and from the condition and XER registers are easily specified.

126e 〈processor control instructions 126e〉≡ (99b) 127a ⊲


mcrxr (crfD)

is (crf(crfD) := xerf(0) | xerf(0) := 0)

mfcr (D)

is gpr(D) := cr


The mfspr, mtspr, and mftb instructions use an integer encoding to select a special purpose register. The auxiliary function getSPR returns the location corresponding to the encoding. The architecture manual states that when the encoded value is any value other than those defined below, one of the following may occur:

• The system illegal instruction error handler is invoked.

• The system supervisor-level instruction error handler is invoked.

• The results are boundedly undefined.

These semantics are not captured by the λ-RTL specifications below. Instead, we return a bogus location for the operation.

127a 〈processor control instructions 126e〉+≡ (99b) ⊳ 126e 127b ⊲

fun getSPR spr

is

let

val dst is if spr = 1 then xer

else if spr = 8 then lr

else if spr = 9 then ctr

else if spr = 268 then tbl

else if spr = 269 then tbu

else bogus

fi fi fi fi fi : #32 loc

in

dst

end

Using the auxiliary function, we define the instructions.

127b 〈processor control instructions 126e〉+≡ (99b) ⊳ 127a 128b ⊲


mfspr (D, spr)

is gpr(D) := (getSPR spr)

mtspr (spr, S)

is (getSPR spr) := gpr(S)

mftb (D, tbr)

is gpr(D) := (getSPR tbr)
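The selection of a special purpose register by number, including the fallback to a bogus location, can be sketched in Python. The sketch assumes a dictionary-based machine state in which each special register, and the bogus location, is a named entry; the Python names are hypothetical.

```python
SPR_BY_NUMBER = {1: "xer", 8: "lr", 9: "ctr", 268: "tbl", 269: "tbu"}

def get_spr(number):
    """Name of the location selected by an SPR encoding, or 'bogus'."""
    return SPR_BY_NUMBER.get(number, "bogus")

def mfspr(state, D, spr):
    state["gpr"][D] = state[get_spr(spr)]

def mtspr(state, spr, S):
    state[get_spr(spr)] = state["gpr"][S]

state = {"gpr": [0] * 32, "xer": 0, "lr": 0x4000, "ctr": 0,
         "tbl": 0, "tbu": 0, "bogus": 0}
mfspr(state, 3, 8)        # gpr(3) := lr
state["gpr"][4] = 17
mtspr(state, 9, 4)        # ctr := gpr(4)
mtspr(state, 123, 4)      # undefined encoding lands in the bogus location
```

As in the specification, an undefined encoding neither traps nor yields an undefined result; it only touches the bogus location.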


The move to condition register fields instruction (mtcrf) copies the contents of a general purpose register to the condition register under the control of the field mask. The field mask is specified by an 8-bit operand that indicates which fields of the condition register should be updated. λ-RTL limitation: Many of the issues with Vector.foldr described above with the lmw instruction also apply.

128a 〈processor control instructions – broken 128a〉≡

default attribute of

mtcrf (CRM, S)

is

let

fun crmb n is CRM@bits[1 bit at n]

val mask is Vector.foldr

(\(n,mask). bitInsert #5 #32 #4

{lsb is mul(4 : #3 bits, n),

wide is mask}

(if bool(crmb(n))

then 0xF

else 0x0

fi))

(0 : #32 bits)

(Vector.spanning 0 7)

in

cr := or(and(gpr(S), mask), and(cr, com mask))

end

Code written to file processor control instructions -- broken.
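The expansion of the 8-bit field mask into a 32-bit mask is simple to express as a loop. The Python sketch below assumes that CRM bit 0 and condition register field 0 are the most significant ones; crm_mask and mtcrf are hypothetical Python names.

```python
MASK32 = 0xFFFFFFFF

def crm_mask(CRM):
    """Expand each CRM bit into four mask bits for the matching CR field."""
    mask = 0
    for n in range(8):
        if (CRM >> (7 - n)) & 1:            # CRM bit n selects CR field n
            mask |= 0xF << (4 * (7 - n))
    return mask

def mtcrf(cr, CRM, rs):
    """New condition register: selected fields from rs, the rest unchanged."""
    mask = crm_mask(CRM)
    return (rs & mask) | (cr & ~mask & MASK32)
```

The final expression mirrors cr := or(and(gpr(S), mask), and(cr, com mask)) in the chunk above.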

128b 〈processor control instructions 126e〉+≡ (99b) ⊳ 127b


mtcrf (CRM, S) is RTL.SKIP

Cache management instructions

Unimplemented: dcbf, dcbst, dcbt, dcbtst, dcbz, icbi

Supervisor-level instructions: dcbi

Optional instructions: dcba

128c 〈cache management instructions 128c〉≡ (99b)

Segment register manipulation instructions

Supervisor-level instructions: mfsr, mfsrin, mtsr, mtsrd, mtsrdin, mtsrin

Optional 64-bit bridge instructions: mfsr, mfsrin, mtsr, mtsrd, mtsrdin, mtsrin

128d 〈segment register manipulation instructions 128d〉≡ (99b)

Lookaside buffer management instructions

Supervisor-level instructions: slbia, slbie, tlbia, tlbie, tlbsync

Optional instructions: slbia, slbie, tlbia, tlbie, tlbsync

64-bit instructions: slbia, slbie

128e 〈lookaside buffer management instructions 128e〉≡ (99b)


External control instructions

Unimplemented: eciwx, ecowx

129 〈external control instructions 129〉≡ (99b)

Chapter 12

Index

The indices are split into two pieces, one for the SPARC and one for the Intel. We’ll combine them one day.

12.1 List of chunks

〈basic.rtl 20d〉〈definitions of standard RTL operators and functions 21a〉〈definitions of standard RTL operators and functions [[full]] 22b〉〈definitions of standard RTL operators and functions [[simple]] 22a〉〈for every i that is an input, print i and a blank 20b〉〈IEEE 754 floating point 26d〉〈summarize the problem 20a〉〈algebraic laws 35〉〈trap semantics 38f〉〈instruction defaults 38b〉〈instruction defaults [[full]] 41a〉〈instruction defaults [[simple]] 42b〉〈SPARC basics 37a〉〈SPARC basics [[full]] 37c〉〈SPARC utilities 40c〉〈SPARC utilities [[full]] 39f〉〈SPARC utilities [[simple]] 39e〉〈wishful thinking about ldfsr 38d〉〈basics 46b〉〈effective addresses 50a〉〈effective-address utilities 49b〉〈instruction defaults 55c〉〈instruction defaults using infix and 52c〉〈intel.rtl 46a〉〈junk: instruction defaults 54a〉〈junk: instruction defaults using infix and 52b〉


〈junk: utilities 53b〉〈new RTL operators 58f〉〈registers 48a〉〈utilities 49a〉〈address modes and operands 62a〉〈address twiddling and fetch 62d〉〈imaginary code 70a〉〈instruction defaults 62c〉〈MIPS basics 60a〉〈MIPS register fetch and store methods 61a〉〈mips utilities 65b〉〈mips utilities [[full]] 64b〉〈mips utilities [[simple]] 65a〉〈mips.rtl 70b〉〈mips.sled 71〉〈number of bytes to shift 62e〉〈number of bytes to shift into right 63b〉〈trap semantics 79a〉〈address modes and operands 77e〉〈ALPHA register fetch and store methods 76d〉〈ALPHA utilities 79b〉〈ALPHA utilities [[full]] 80a〉〈ALPHA utilities [[simple]] 79g〉〈alpha.rtl 80c〉〈composition function 78c〉〈instructions 78d〉〈storage 75〉〈window dressing for trap semantics 80d〉〈window dressing for trap semantics [[full]] 81a〉〈window dressing for trap semantics [[simple]] 81b〉〈memory references 83a〉〈operations 83b〉〈pdp8.lrtl 85c〉〈pdp8.rtl 86〉〈storage spaces 82〉〈instruction defaults 90b〉〈MMIX basics 87〉〈mmix.rtl 91d〉〈Bogus 98b〉〈branch instructions 124b〉〈cache management instructions 128c〉〈Condition Register 95b〉〈condition register logical instructions 126a〉〈Count Register 97b〉〈extensions 93b〉〈external control instructions 129〉


〈floating-point arithmetic instructions 115〉〈floating-point compare instructions 116c〉〈floating-point load instructions 123e〉〈floating-point move instructions 124a〉〈floating-point multiply-add instructions 116a〉〈Floating-Point Registers 95a〉〈floating-point rounding and conversion instructions 116b〉〈Floating-Point Status and Control Register 96a〉〈floating-point status and control register instructions 116d〉〈floating-point store instructions 123f〉〈General-Purpose Registers 94c〉〈imports 93a〉〈Instruction Addresses 97c〉〈instruction semantics 99b〉〈integer arithmetic instructions 102c〉〈integer compare instructions 107〉〈integer load and store multiple instructions 122b〉〈integer load and store multiple instructions – broken 122a〉〈integer load and store string instructions 123c〉〈integer load instructions 117a〉〈integer logical instructions 108〉〈integer logical instructions – broken 110〉〈integer rotate instructions 113〉〈integer shift instructions 114〉〈integer store instructions 119b〉〈Link Register 97a〉〈lookaside buffer management instructions 128e〉〈Memory 94b〉〈memory synchronization instructions 123d〉〈operands 98c〉〈ppc.rtl 92〉〈processor control instructions 126e〉〈processor control instructions – broken 128a〉〈segment register manipulation instructions 128d〉〈storage spaces 94a〉〈system linkage instructions 126c〉〈Time Base Registers 98a〉〈trap instructions 126d〉〈utilities 100a〉〈utilities – broken 101a〉〈XER Register 96b〉

12.2 Identifier index

+: 23a, 23c, 24c -: 20c, 23a, 23c, 24c, 25c, 26a


-->: 21a:=: 21a<: 20b, 24a, 24c, 27e, 46a, 49a, 53a, 53b, 54a, 55a,

56a, 57a, 58b, 58c<=: 24a, 24c<>: 22d, 22f>: 23b, 24a, 24c, 27e, 46a, 53a, 53b, 54a, 55a, 57a,

57c, 58a, 58c>=: 24a, 24c?: 26badd: 23a, 23c, 89b, 90f, 91d, 103, 104a, 115, 116aand: 24dandalso: 22a, 22b, 22cbit: 22e, 22fbitExtract: 24dbitInsert: 24d, 24dbitTransfer: 24e, 25c, 26abool: 21a, 21c, 22d, 22e, 22f, 24a, 24bborrow: 23a, 23ccarry: 23a, 23ccom: 24dconjoin: 21c, 22adisjoin: 21c, 22adiv: 23d, 24cdivu: 23d, 24cdo nothing: 21aeq: 22d, 66f, 67d, 68a, 71f2f: 27bf2i: 27bfabs: 26dfadd: 26dfcmp: 27efdiv: 26dfloat eq: 27efloat gt: 27efloat lt: 27efmul: 26dfmulx: 26dfneg: 26dfsqrt: 26dfsub: 26dge: 24a, 93bgeu: 24bgt: 24a, 53b, 93bgtu: 24bi2f: 27b

le: 24a, 93bleu: 24blt: 24a, 53b, 93b, 107ltu: 24bminf: 27cmod: 23d, 24cmodu: 23d, 24cmul: 23dmulu: 23dmzero: 27cNaN: 27dne: 22dneg: 23anot: 21cor: 23d, 24d, 66a, 66b, 67d, 68a, 70b, 71orelse: 22a, 22b, 22cpinf: 27cpopcnt: 25bpzero: 27cquot: 23d, 24crem: 23d, 24crotl: 25crotr: 26ashl: 25ashra: 25ashrl: 25asub: 23a, 89b, 90f, 91csubtract: 23csx: 23bundefined: 26bunordered: 27exor: 24dzx: 23b, 23cdelay: 38ddivide: 41b, 41cdrain fpu pipeline: 38dfloatingCompare: 37droundingMode: 37csparc sdiv overflow: 41bsparc udiv overflow: 41bsub overflows: 40a, 40b, 40c, 40f, 57asubtract instruction: 40c, 40d, 40etag overflows: 39e, 39f, 39h, 40f<: 20b, 24a, 24c, 27e, 46a, 49a, 53a, 53b, 54a, 55a,

56a, 57a, 58b, 58c<==: 53a, 53b, 54a, 55a


>: 23b, 24a, 24c, 27e, 46a, 53a, 53b, 54a, 55a, 57a,57c, 58a, 58c

>>: 51bAC: 51eadc: 56a, 56badcx: 56a, 56badd’: 56a, 102c, 103add adjust: 56aadd overflows: 56a, 79g, 80aaddx: 56a, 56cAF: 46a, 48d, 49a, 57c, 58a, 58eallr: 56a, 56b, 56cb: 50a, 50b, 50c, 51a, 52a, 52a, 52c, 53a, 54a, 55a,

55c, 55d, 56b, 56c, 57b, 58h, 93c, 93d, 98b, 102c,105, 106, 107, 108, 109, 110, 116a, 121, 124b,125a, 125b, 125c

bin: 53a, 53b, 54abin’: 54c, 55abit16At: 59a, 59cbit32At: 59a, 59cbitAt: 59abound: 58c, 58dBR: 51e, 58cbt: 59b, 59cbtc: 59b, 59cbtr: 59b, 59cbts: 59b, 59cbyte: 48b, 49b, 58h, 61c, 62c, 76a, 76b, 78d, 79c,

118, 119a, 120, 121CF: 46a, 48d, 49a, 56a, 57b, 57c, 58a, 58ed: 50a, 50b, 50c, 51a, 52a, 52a, 52c, 53a, 54a, 55a,

55c, 55d, 56b, 56c, 57b, 58gDB: 51eDE: 51eDF: 51eDP: 51edword: 48b, 49b, 61c, 68c, 68dexn: 51e, 58cGP: 51egt: 24a, 53b, 93bhacksub8: 49bI: 50c, 52c, 53a, 54a, 55a, 55c, 56b, 56c, 57b, 89aidsize: 55b, 55c, 56b, 56c, 57bIs: 50c, 55c, 56b, 56c, 57bllr: 52a, 52cllr’: 55b

logical: 54d, 55alogical’: 55b, 55clogical flags: 54b, 54d, 55blsbset: 58f, 58glt: 24a, 53b, 93b, 107M: 50b, 51a, 88aMC: 51eMF: 51emsbset: 58f, 58gNM: 51eNMI: 51eNP: 51eOF: 46a, 48d, 49a, 51e, 58eparity: 48d, 49aPF: 46a, 48d, 49a, 51e, 58eR: 50a, 51a, 52c, 53a, 54a, 55a, 55c, 56b, 56c, 57b,

58gregb: 49b, 50aregd: 49b, 50a, 51a, 55f, 58d, 58h, 59cregw: 49b, 50a, 55f, 58b, 58d, 59cset cf: 58e, 59bset flags: 49a, 54b, 56a, 57aset zf: 58e, 58gSF: 46a, 48d, 49a, 58eSS: 51esub’: 57a, 57bsub adjust: 57asub overflows: 40a, 40b, 40c, 40f, 57atrap: 51e, 79b, 80c, 80d, 81aTS: 51eUD: 51eunary: 55d, 55ew: 50a, 50b, 50c, 51a, 52a, 52a, 52c, 53a, 54a, 55a,

55c, 55d, 56b, 56c, 57b, 58gword: 48b, 49b, 58h, 61c, 62d, 68c, 68d, 69g, 76a,

76b, 78d, 79c, 118, 119a, 120, 121ZF: 46a, 48d, 49a, 58b, 58eByte: 62e, 63b, 64abyte: 48b, 49b, 58h, 61c, 62c, 76a, 76b, 78d, 79c,

118, 119a, 120, 121Condition: 67c, 67d, 68bdispA: 62b, 62c, 63d, 78a, 78d, 78e, 78f, 79a, 79c,

79d, 79edouble: 61d, 68c, 68d, 68e, 69b, 69edword: 48b, 49b, 61c, 68c, 68deq: 22d, 66f, 67d, 68a, 71


f: 67d, 68a, 68c, 68d, 68e, 69a, 69b, 69d, 69e, 71hword: 61c, 62c, 118, 119a, 120, 121lsbit: 62e, 63a, 63b, 63c, 64aneq: 67d, 68ano overflow: 64b, 65a, 65bnor: 66a, 66b, 71, 93c, 108, 126aoge: 67d, 68aogl: 67d, 68aogt: 67d, 68aole: 67d, 68a, 71olt: 67d, 68a, 71op function: 65b, 65cop overflows: 64bor: 23d, 24d, 66a, 66b, 67d, 68a, 70b, 71RM: 67c, 68c, 68e, 69dsingle: 61d, 68c, 68d, 68e, 69b, 69et: 67d, 68atByte: 62e, 63b, 64aueq: 67d, 68a, 71uge: 67d, 68augt: 67d, 68aule: 67d, 68a, 71ult: 67d, 68a, 71un: 67d, 68a, 71word: 48b, 49b, 58h, 61c, 62d, 68c, 68d, 69g, 76a,

76b, 78d, 79c, 118, 119a, 120, 121add overflows: 56a, 79g, 80aalignTrap: 79a, 79b, 79dbyte: 48b, 49b, 58h, 61c, 62c, 76a, 76b, 78d, 79c,

118, 119a, 120, 121data store error: 81a, 81bdispA: 62b, 62c, 63d, 78a, 78d, 78e, 78f, 79a, 79c,

79d, 79eiff: 80alittleEndian: 75, 76b, 80clong: 76a, 76b, 78d, 79co: 78c, 78docta: 76aquad: 76a, 76b, 78d, 78e, 79c, 79etrap: 51e, 79b, 80c, 80d, 81atrap instruction: 81a, 81bword: 48b, 49b, 58h, 61c, 62d, 68c, 68d, 69g, 76a,

76b, 78d, 79c, 118, 119a, 120, 121accum effect: 83c, 84acma: 85biac: 85b

ral: 84d, 85brar: 84d, 85bskip op: 85a, 85bunary op: 83b, 84a, 84b, 84cadd: 23a, 23c, 89b, 90f, 91d, 103, 104a, 115, 116aD: 89aI: 50c, 52c, 53a, 54a, 55a, 55c, 56b, 56c, 57b, 89aM: 50b, 51a, 88aM1: 88a, 90b, 90c, 90dM2: 88a, 90b, 90c, 90dM4: 88a, 88e, 90b, 90c, 90d, 90eM8: 88a, 90b, 90c, 90d, 90enoFit: 89bO: 89a, 89b, 90bo1: 89b, 90co2: 89b, 90co4: 89b, 90co8: 89b, 90foccur: 89asign: 89bsub: 23a, 89b, 90f, 91ctrip: 88e, 89aU: 89a, 90bV: 89aW: 89a, 90bX: 89a, 89c, 90b, 90c, 90d, 90e, 90f, 91a, 91b, 91cZ: 89a, 89c, 90a, 90f, 91a, 91b, 91cadd: 23a, 23c, 89b, 90f, 91d, 103, 104a, 115, 116aadd’: 56a, 102c, 103addi: 103, 104aaddic: 103, 104aaddis: 103, 104aaddme: 103, 104aaddze: 103, 104aalign: 102b, 125b, 125candc: 93c, 108, 126ab: 50a, 50b, 50c, 51a, 52a, 52a, 52c, 53a, 54a, 55a,

55c, 55d, 56b, 56c, 57b, 58h, 93c, 93d, 98b, 102c,105, 106, 107, 108, 109, 110, 116a, 121, 124b,125a, 125b, 125c

bc: 125abc’: 124c, 125a, 125b, 125cbcctr: 125bbclr: 125cbogus: 98b, 98b, 127a


byte: 48b, 49b, 58h, 61c, 62c, 76a, 76b, 78d, 79c,118, 119a, 120, 121

cia: 97c, 124bcmp: 107cmp’: 107cmpi: 107cmpli: 107cntlz: 110, 111, 112, 112cr: 95b, 95b, 126a, 126e, 128acrb: 95b, 95b, 124c, 126acrf: 95b, 95b, 107, 126b, 126ecrlog: 126actr: 97b, 124c, 125b, 127adivw: 106divwu: 106dp: 115, 116aeqjoin: 93deqv: 93c, 108, 124c, 126aext’: 109extsb: 109extsh: 109fbinop’: 115fmulop’: 116afpr: 95a, 95a, 115, 116afpsrcb: 96afunop’: 115ge: 24a, 93bgetSPR: 127a, 127bgpr: 94c, 94c, 103, 105, 106, 107, 108, 109, 110,

111, 112, 113, 114, 117a, 117b, 119a, 119b, 119c,120, 121, 122a, 123a, 126e, 127b, 128a

gt: 24a, 53b, 93bhword: 61c, 62c, 118, 119a, 120, 121hworda: 118, 119ahwordr: 118, 119a, 120, 121l: 107, 117b, 119a, 124b, 125a, 125b, 125cl’: 117a, 117ble: 24a, 93blmw: 122a, 122blog: 108log’: 108logi: 108logis: 108lr: 97a, 124b, 125c, 127a

lt: 24a, 53b, 93b, 107lu: 117b, 119amask: 101a, 101b, 113, 114, 128amem: 94b, 94b, 118, 119b, 122a, 123amul’: 105mulhw: 105mulli: 105mullw: 105nand: 93c, 108, 126aneg’: 103, 104ania: 97c, 124bnor: 66a, 66b, 71, 93c, 108, 126aorc: 93c, 108, 126arlwnm: 113rlwnmi: 113rounding mode: 102a, 115, 116aset ca: 100b, 102cset cr0: 100a, 102c, 105, 106, 108, 109, 110, 111,

112, 113, 114set cr1: 102a, 115, 116aset ov: 100b, 102c, 105, 106slw: 114sov: 104b, 105sp: 115, 116asraw: 114srw: 114st: 119c, 121st’: 119b, 119cstmw: 123a, 123bstu: 119c, 121subf: 103, 104asubf’: 102c, 103subfic: 103, 104asubfme: 103, 104asubfze: 103, 104atbl: 98a, 127atbu: 98a, 127aword: 48b, 49b, 58h, 61c, 62d, 68c, 68d, 69g, 76a,

76b, 78d, 79c, 118, 119a, 120, 121wordr: 118, 119a, 120, 121xer: 96b, 96b, 127axerb: 96bxerf: 96b, 96b, 126exjoin: 93d

