Tufts CSnr/toolkit/working/sml/rtl/specifying.pdfAbstract The Zephyr project is part of an effort...
Transcript of Tufts CSnr/toolkit/working/sml/rtl/specifying.pdfAbstract The Zephyr project is part of an effort...
Specifying Instructions’ Semantics Using λ-RTL
(Interim Report)
Norman Ramsey and Jack W. Davidson
Department of Computer Science
University of Virginia
Charlottesville, VA 22903
August 5, 2003
ii
Abstract
The Zephyr project is part of an effort to build a National Compiler Infrastructure,which will support research in compiling techniques and high-performance computing.Compilers work with source code, abstract syntax, intermediate forms, and machineinstructions. By using high-level descriptions of the representations and semantics ofthese forms, we expect to be able to create compiler components that will be usable withdifferent source languages, front ends, and target machines.
To help deal with multiple machines, we are developing a family of Computer SystemsDescription Languages (CSDL) to describe properties that are relevant to the construc-tion of compilers and other systems software. The languages describe properties of amachine’s instructions or its mutable state, or both. Of particular interest is the de-scription of the semantics of instructions, i.e., their effects on the state of the machine.This report describes our design of λ-RTL, a CSDL language for specifying instructions’semantics.
We describe the effects of instructions using register transfer lists (RTLs). A registertransfer list is a collection of assignments to locations, which represent registers, memory,and all other mutable state. We prescribe a form of RTLs that makes it explicit how tocompute the values assigned and on what state the computation depends. The form alsomakes byte order explicit and provides for instructions whose effects may be undefined.
Because our form of RTLs contains so much information, it is convenient for use bytools, but it would be tedious to write RTLs by hand. λ-RTL, which is based on theλ-calculus and on register transfer lists, is a metalanguage designed to make it easier forpeople to write RTLs and to associate them with machine instructions. It enables usto omit substantial information from hand-written specifications; the λ-RTL translatorinfers the missing information and puts the resulting RTLs into canonical form. λ-RTLalso provides a “grouping” construct designed to help specify large groups of similarinstructions.
This report presents a short overview of λ-RTL, followed by examples. The examplesinclude a description of the SPARC and excerpts from a description of the Pentium. Thereport also includes λ-RTL definitions of a library of basic operators, which we believewill be useful for describing a wide variety of machines.
ii
Contents
1 Overview of CSDL 1Machine descriptions for machine-level tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1Core aspects for CSDL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Combining instructions and storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3Languages in the CSDL family . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Register Transfer Lists 6Form of RTLs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6Types in RTLs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7RTL languages and translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7Interpretations of RTLs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3 Using λ-RTL to specify register transfer lists 10Design considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10Restrictions eased in λ-RTL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Implicit fetches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11Slices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11Implicit aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Overview of λ-RTL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12Top-level declarations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13Inner declarations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16Grouping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16Lexical conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18The initial basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18Style . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18Differences between λ-RTL and Standard ML . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4 A Library of standard RTL operators 204.1 Building RTLs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214.2 Booleans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
iii
iv CONTENTS
4.3 Integer arithmetic and comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234.4 Logical operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244.5 Undefined effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264.6 Floating-point arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5 The meanings of the standard RTL operators 285.1 Known algebraic laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355.2 More SPARC description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
More locations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37More Loads and Stores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38More integer ALU instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39More control flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42State Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43Miscellaneous instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43Floating point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
6 Describing the Intel Pentium 456.1 Storage spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46Effective addresses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49Traps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52Load effective address . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.2 Real instructions, for serious . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
7 Describing the MIPS R?000 607.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607.2 Storage spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Integer Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60Integer Unit control/status registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61Aggregating cells in storage spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
7.3 MIPS instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62Types of MIPS operands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62MIPS addresses and operands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62Load instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62Store Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64Multiply and Divide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65Logical Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66Shift Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66Set instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66Load upper immediate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66Jump and branch instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
7.4 Floating Point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67compare . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
CONTENTS v
Branching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68Conversions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68Rounding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69Moving to and from the FPU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
7.5 Coprocessor instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 697.6 Miscellaneous . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 707.7 Putting it all together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 707.8 What this specification does not do... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 717.9 SLED specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8 Describing the Alpha 758.1 Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75Explicit Size Depiction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76Integer Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76Floating Point Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77Special-Purpose Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77Program Counter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
8.2 ALPHA instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77Types of ALPHA operands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77ALPHA addresses and operands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78A Useful Composition Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78Load Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78Store Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79Logical Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79Add instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
8.3 Lambda RTL File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
9 Describing the PDP-8 829.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 829.2 Storage Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 829.3 Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 839.4 Putting it all together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
10 Describing MMIX 8710.1 Storage spaces: memory and registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8710.2 MMIX instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Trips and traps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88Integer overflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89Operands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89Addresses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90Loading and storing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
10.3 Putting it all together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
vi CONTENTS
11 Describing the PowerPC (by Matthew Fluet) 9211.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9211.2 Storage spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94General-purpose registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94Floating-point registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95Condition register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95Floating-point status and control register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96XER register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96Link register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97Count register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97Instruction Addresses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97Time base registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98Bogus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
11.3 Operands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9811.4 Instruction semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Utilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100Integer arithmetic instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102Integer compare instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107Integer logical instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108Integer rotate instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113Integer shift instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114Floating-point arithmetic instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115Floating-point multiply-add instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116Floating-point rounding and conversion instructions . . . . . . . . . . . . . . . . . . . . . . . 116Floating-point compare instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116Floating-point status and control register instructions . . . . . . . . . . . . . . . . . . . . . . 116Integer load instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117Integer store instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119Integer load and store multiple instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122Integer load and store string instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123Memory synchronization instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123Floating-point load instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123Floating-point store instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123Floating-point move instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124Branch instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124Condition register logical instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126System linkage instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126Trap instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126Processor control instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126Cache management instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128Segment register manipulation instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128Lookaside buffer management instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128External control instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
CONTENTS vii
12 Index 13012.1 List of chunks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13012.2 Identifier index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
References 137
Chapter 1
Overview of CSDL
Machine descriptionsfor machine-level tools
Special-purpose and general-purpose computers aredesigned at a rapid pace, and software tools andtechnology don’t always keep up. The tools we needinclude not only the compilers, assemblers, link-ers, and debuggers familiar to all programmers, butalso less familiar tools, like profilers, tracers, test-coverage analyzers, and general code-modificationtools. Most of these tools must work with machineinstructions.
Machine-dependent detail makes it hard to buildmachine-level tools. For many years, compilershave used machine descriptions to capture such de-tail. Machine descriptions isolate target-specific in-formation so that it can easily be examined andchanged. Despite their successful use in compilers,machine descriptions are seldom used to build othersystems software. Descriptions used in compilersare hard to reuse because they typically combineinformation about the target machine with infor-mation about the compiler. For example, “machinedescriptions” written using tools like BEG (Emmel-mann, Schroer, and Landwehr 1989) and BURG(Fraser, Henry, and Proebsting 1992) are actuallydescriptions of code generators, and they dependnot only on the target machine but also on a partic-ular intermediate language. In extreme cases (e.g.,gcc’s md files), the description formalism itself de-pends on the compiler.
We believe we can simplify construction of com-pilers and other machine-level tools by developingdescription techniques that separate machine prop-erties from compiler concerns. We expect this capa-bility to be useful in a National Compiler Infrastruc-ture because it will lead to a back-end infrastructurethat should be usable not only with many differenttarget machines, but also with different source lan-guages, front ends, and intermediate languages.
Some existing languages for machine description,like VHDL (Lipsett, Schaefer, and Ussery 1993)and Verilog (Thomas and Moorby 1995) do describeonly properties of the machine, but they are at toolow a level, describing implementations as much asarchitectures. These description languages requiretoo much detail that is not needed to build systemssoftware.
We are designing a family of Computer SystemsDescription Languages (CSDL) to support a varietyof machine-level tools while remaining independentof any one in particular. We have several goals forCSDL:
• Descriptions should be composed from simplecomponents. Each component should describe,as much as possible, a single aspect of thetarget machine. Such aspects might includecalling conventions, representations of instruc-tions, pipeline implementations, memory hier-archy, or other properties. It should not alwaysbe necessary to describe aspects completely.For example, most compilers use only a sub-set of the instructions available on a particular
1
2 CHAPTER 1. OVERVIEW OF CSDL
machine, and a compiler writer should be re-quired to describe only that subset.
• An application writer should be able to deriveuseful tools not only when the components de-scribing individual aspects are incomplete, butalso when not all aspects are described. Anapplication writer should not have to describeaspects of the machine unless the informationis needed to build his application. For exam-ple, an application writer working entirely atthe assembly-language level or above shouldnot have to describe binary representations ofinstructions. Someone writing a garbage col-lector might need to describe the stack-framelayouts determined by a calling convention, buthe should not have to describe the instruc-tions used in the calling sequences that estab-lish those layouts.
• Components, especially those describing “coreaspects,” should be reusable. For example,writers of all specifications should benefit fromhaving a common formalization of what itmeans to be a “SPARC instruction supportedby the bare hardware.”
CSDL forms a family of languages because all fam-ily members have the same view of two core aspectsof machines: instructions and state. This report ex-plains the CSDL view of these core aspects, and itpresents a rationale for and examples of the mostimportant member of the CSDL family, which de-scribes the semantics of machine instructions.
Core aspects for CSDL
To identify core aspects to be used throughoutthe CSDL family, we examined descriptions usedto help retarget a variety of systems-level tools.These tools included an optimizer (Benitez andDavidson 1988), a debugger (Ramsey and Han-son 1992), an instruction scheduler (Proebsting andFraser 1994), a call-sequence generator (Bailey andDavidson 1995), a linker (Fernandez 1995), and anexecutable editor (Larus and Schnarr 1995). Cur-sory inspection showed no single common aspect of
a machine used in descriptions for all of these tools,but a closer look revealed that all the descriptionsrefer either to the machine’s instruction set or toits storage locations, and some to both. For ex-ample, the specifications used by the scheduler andlinker refer only to the machine’s instructions andthe properties thereof. The specifications used inthe call-sequence generator and in the debugger’sstack walker refer only to storage, explaining in de-tail how values move between registers and mem-ory. The specifications used in the optimizer andthe executable editor refer both to instructions andto storage, and in particular, to how instructionschange the contents of storage. From these obser-vations, we have chosen to require that languages inthe CSDL family refer to instructions, storage, orboth, and that they use the models of instructionsand storage presented below.
Instructions
The CSDL model of an instruction set is a list ofinstructions together with some identifying infor-mation about their operands. Although instruc-tion names in assembly languages are typically over-loaded, CSDL requires instructions to have uniquenames, because tools often need uniquely namedcode for each instruction in an instruction set. Forexample, an assembler might use a unique C pro-cedure to encode each instruction, or an executableeditor might use a unique element of a C union torepresent an instance of each instruction.
An individual instruction is viewed as a functionor constructor that, when applied to operands, pro-duces something interesting: a binary representa-tion, an assembly-language string, a semantics, etc.Instruction descriptions that use the CSDL core willinclude the names and types of the operands of eachinstruction. Types will include integers of varioussizes, but it will also be possible to introduce newtypes to define such machine-dependent conceptsas effective addresses. Values of these new typesmay be created by applying suitable constructors;for example, Chapter 6 shows how a Pentium ef-fective address may be formed by applying any of
COMBINING INSTRUCTIONS AND STORAGE 3
9 constructors, each one representing a different ad-dressing mode.
The structure defined by CSDL constructors andtheir operands can be viewed as a discriminated-union type. It is also equivalent to an ASDL gram-mar (Wang et al. 1997), without recursion, in whichthe start symbol is “instruction,” other nontermi-nals refer to machine-level concepts like “effectiveaddress” or “integer-instruction operand,” and ter-minal symbols are defined by integers or addresses.This structure is a simplification of the “construc-tor” from the Specification Language for Encodingand Decoding (SLED) of the New Jersey Machine-Code Toolkit (Ramsey and Fernandez 1997). Itis determined by the machine, independent of anytool, so it should be useful in any specification lan-guage that deals with individual machine instruc-tions. For example, the nML machine-descriptionlanguage (Fauth, Praet, and Freericks 1995) usesthis structure, although nML is otherwise quite dif-ferent from SLED.
Properties of instructions
Experience with the New Jersey Machine-CodeToolkit shows that many applications can be builtfrom specifications that discuss only instructionsand their properties, with no reference to storage.Such applications include assemblers and disassem-blers, as well as code generators that work by pat-tern matching on the names of instructions.
We specify properties of instructions in a compo-sitional style, so instructions’ properties are func-tions of the properties of their operands. We formal-ize that style using attribute grammars. For exam-ple, assembly-language representations of instruc-tions can be computed as synthesized attributes,where the attributes are strings. Binary representa-tions can also be computed as attributes, where theattributes are SLED “patterns.” Attributes readilysupport machines like the 68000 family, in whichthe representations of effective addresses depend oncontext.
Storage
The CSDL model of a storage space is a sequenceof mutable cells. A storage space is like an array;cells are all the same size, and they are indexedby integers. For example, a typical microprocessorhas a memory made up of 8-bit cells (bytes) and aregister file made up of 32-bit cells. The number ofcells in a storage space may be left unspecified.
The state of a machine can be described as thecontents of a collection of storage spaces. We usestorage spaces to model main memory, general-purpose registers, special-purpose registers, condi-tion codes, and so on.
Experience with CCL, a Calling Convention Lan-guage (Bailey and Davidson 1995), shows that ap-plications can be built from specifications that dis-cuss only storage, with no reference to instructions.For example, calling conventions can be describedby discussing the placement of parameters in stor-age cells and the the effects of calls and returnson storage. From a CCL description one can gen-erate procedure prologs and epilogs in a compiler,and one can also generate code to test compilers’implementations of calling conventions (Bailey andDavidson 1996). One might be able to derive stackwalkers or exception-handling code from similar de-scriptions. A garbage collector may need to knowwhich locations in storage can contain pointers; thisis a property of an application, not of a machine,but it can be described using the CSDL storagecore. (We want to describe application propertiesas well as machine properties, but we want to keepthem separate.)
Languages in the CSDL family may refer to in-dividual locations. Ways of writing locations mayvary, but each one must resolve to a name of a stor-age space and an integer offset identifying a cellwithin that storage space.
Combining instructions
and storage
An architecture manual tells programmers whatconstitutes the state of the processor, and it lists
4 CHAPTER 1. OVERVIEW OF CSDL
instructions and their operands, but the most im-portant thing it does is explain the semantics ofeach instruction in terms of that instruction’s ef-fects on the state of the processor. Many applica-tions can be built based on this information, butsome require more detail than others. For example,
• Information about control flow is enough tobuild control-flow graphs.
• Information about calling conventions may beenough to recognize procedure calls and re-turns.
• Information about locations read and written isenough to build data-flow graphs. Such graphs,together with the ability to match individualinstructions, may suffice to build code-editingtools like EEL (Larus and Schnarr 1995) orATOM (Srivastava and Eustace 1994).
• Information about register-transfer semanticsis enough to build code improvers in the style ofPO (Davidson and Fraser 1980), vpo (Benitezand Davidson 1988), and gcc (Stallman 1992).These code improvers work by pattern match-ing, so they need not know what all of theregister-transfer operators do. They do, how-ever, need at least a partial semantics, to per-form strength reduction, constant folding, andsimilar transformations.
• To build a code generator, one needs to knowmore about the operators used in the regis-ter transfers. In particular, one needs to knowenough so that one can find a sequence of in-structions to implement each operation in thecompiler’s intermediate code. It is sufficientto know that the intermediate code and theinstructions compute the same operation; oneneed not know exactly what the operations doto the bits.
• To build an emulator like SPIM (Larus 1990)or a binary translator like FX!32 (Thomp-son 1996), one needs enough information aboutthe operations in the register transfers to inter-pret the the effect of each register transfer oneach bit of the processor’s state.
We believe that these applications and more can beserved by providing register-transfer semantics forinstructions. The effect of a particular instructioncan be specified as a register-transfer list (RTL),which modifies storage cells. As with other proper-ties of instructions, the RTL is computed using anattribute grammar.
In principle, a single register-transfer descriptioncould support all of the different uses above. Asingle RTL could be given different interpretations,depending on its intended use. We believe we cansupport this mode of specification by prescribinga rigid form for RTLs, while supplying suitable ab-straction mechanisms for use within that form. Ab-straction mechanisms support our goals of enablingapplication writers to leave irrelevant facts unspeci-fied. For example, we want to make it easy to writean RTL that means “register rd is assigned an un-specified function of registers rs and rt.”
Languages in the CSDL family
We consider SLED, for specifying representationsof instructions (Ramsey and Fernandez 1997), andCCL, for specifying calling conventions (Bailey andDavidson 1995), to be the first languages in theCSDL family. We are developing a new language,λ-RTL, for specifying instruction semantics. We ex-pect that CSDL will expand to include languagesfor specifying properties of memory hierarchies andof pipelines. Indeed, the simple functional-resourcelanguages of Proebsting and Fraser (1994) and Balaand Rubin (1995) fit nicely into the CSDL frame-work.
This report focuses on λ-RTL. The report doesnot give a specification or definition of λ-RTL, be-cause λ-RTL is not sufficiently developed for a spec-ification or definition to be worthwhile. Instead,this report presents some essential properties of λ-RTL, and it gives examples of machine specifica-tions using the current, imperfect version. Chap-ter 2 describes the form of register transfer lists usedin λ-RTL. Chapter 3 presents λ-RTL and discussesits translation into RTLs. Chapter 4 describe somebasic content which fits into that form, and which
LANGUAGES IN THE CSDL FAMILY 5
we believe will be useful for describing many ma-chines. Chapter ?? presents a λ-RTL descriptionsof the SPARC processor. Chapter 6 presents de-scriptions of a few aspects of the Pentium processorsthat have no analogs on the SPARC, like addressingmodes whose meanings depend on context.
Chapter 2
Register Transfer Lists
Computer scientists have used register transfersin many different forms. For λ-RTL, we have chosena form designed for use by tools, not by people. Wehave therefore insisted that as much information aspossible be explicit in the RTL itself. Under ourcurrent plan,
• RTLs are represented as trees.
• All operators are fully disambiguated, e.g., asto type and size.
• There is no aliasing of locations.
• Fetches are explicit, as are changes in the sizeor type of data.
• Stores are annotated with the size of the datastored.
• Explicit tree nodes specify byte order. Moregenerally, they specify how to transfer data be-tween storage spaces of different granularity.
• RTL is a typed representation. We plan to gen-erate RTL type checkers from CSDL specifi-cations. Optimizations and other transforma-tions that are intended to preserve semanticsshould also preserve the property of being welltyped. Typing is a good way to catch bugs inoptimizers (Morrisett 1995).
The form of RTLs proposed here may be suitablenot just for specification, but also for use in the im-plementations of compilers, binary translators, andother tools.
〈ASDL specification of the form of RTLs〉≡ty = (int) -- size of a value, in bits
exp = CONST (const)
| FETCH (location, ty)
| APP (operator, exp*)
location = AGG(aggregation, cell)
cell = CELL(space, exp)
effect = STORE(location dst, exp src, ty)
| KILL(location)
| KILLALL(space)
guarded = GUARD(exp, effect)
rtl = RTL (guarded*)
Figure 2.1: ASDL specification of the form of RTLs
Form of RTLs
Figure 2.1 uses the Zephyr Abstract Syntax De-scription Language (Wang et al. 1997) to show theform of RTLs. A register transfer list is a list ofguarded effects. Each effect represents the transferof a value into a storage location, i.e., a store opera-tion. The transfer takes place only if the guard (anexpression) evaluates to true. Locations may besingle cells or aggregates of consecutive cells withina storage space. Values are computed by expres-sions without side effects, which simplifies analy-sis and transformation. These expressions includeinteger constants, fetches from locations, and ap-plications of RTL operators to lists of expressions.Effects in a list take place simultaneously, as in Di-jkstra’s multiple-assignment statement, so one RTLrepresents one change of state. This decision makes
6
TYPES IN RTLS 7
it possible to specify swap instructions without hav-ing to introduce temporary locations.
Not every effect is a true assignment. A kill effectchanges the value in a storage location in an unde-fined way. Kill effects are needed to model sucharchitectural specifications as “the effect of a logi-cal instruction on the AF flag is undefined.”
The “kill all” effect in Figure 2.1 kills all of thecells in the given storage space. It is needed to givea semantics to certain phrases of λ-RTL. We haven’tdemonstrated a need for it yet, but it may be usefulto model stores into arbitrary memory locations.
As an example of a typical RTL, consider aSPARC load instruction using the displacement ad-dressing mode, written in the SPARC assembly lan-guage as
〈sample SPARC instruction〉≡ld [%sp-12], %i0
Although we would not want to specify just a sin-gle instance of a single instruction, the effect of thisload instruction might be notated in λ-RTL as fol-lows:1
〈λ-RTL for sample instruction〉≡$r[24] <-- $m[$r[14]+sx(~12)]
because the stack pointer is register 14 and registeri0 is register 24. The corresponding RTL is muchmore verbose, with the sizes of all quantities iden-tified explicitly, as a fully disambiguated tree:
STORE #32
AGG B #32 #32
CELL ’r’
CONST #5
24
FETCH #32
AGG B #8 #32
CELL ’m’
ADD #32
FETCH #32
AGG B #32 #32
CELL ’r’
CONST #5
14
SX #13 #32
CONST #13
-12
The various constants labeled with hash marks,like #32, indicate the number of bits in arguments,
1The ~ in ~12 is a unary minus.
results, or data being transferred. Such constantswill fit into a generalization of the Hindley-Milnertype system (Milner 1978).
Figure 2.2 shows the operators used in this tree.The left child of the STORE is a subtree represent-ing the location consisting of the single register i0,which is register 24. The right-hand child representsa 32-bit word (a big-endian aggregration of fourbytes) fetched from memory at the address givenby the subtree rooted at ADD. This node adds thecontents of the stack pointer (register 14) to theconstant −12. The constant is a 13-bit constant,and the SX operator sign-extends it to 32 bits, so itcan be added to the stack pointer.
Types in RTLs
RTL is a typed format. The type system includesthe following types:
#n bits A value that is n bits wide.
#n loc A location containing an n-bit value.
#n cell One of a sequence of n-bit storage cells,which can be aggregated together tomake a larger location, as by the AGG
B nodes in the example tree.
bool A Boolean condition.
effect A side effect on storage.Figure 2.2 uses these types to show the types ofthe operators used above. We have extended Mil-ner’s type inference to this system, so that thosewriting specifications in λ-RTL can omit types andwidths. Unlike in ML, type inference alone does notguarantee that terms make sense; in general, it willbe necessary to check additional constraints. Forexample, in the RTL shown above, it would be nec-essary to check that the signed integer −12 can berepresented using 13 bits, and that 32 is a multipleof both 8 and 32. We are currently investigatingways of formalizing and checking such constraints.
RTL languages and translation
Although we have prescribed the form of RTLs, wecan’t write any RTLs until we choose a set of lo-
8 CHAPTER 2. REGISTER TRANSFER LISTS
STORE : ∀#n.#n loc× #n bits→ effect Store an n-bit value in a given location. The type indicates thatfor any n, STORE #n takes an n-bit location and an n-bit valueand produces an effect.
FETCH : ∀#n.#n loc→ #n bits For any n, FETCH #n takes an n-bit location and returns the n-bitvalue stored in that location.
AGG B : ∀#n.∀#w.#n cell→ #w loc For any n and w, AGG B #n #w aggregates an integral number ofn-bit cells into a w-bit location, making the first cell the mostsignificant part of the new location, i.e., using big-endian byteorder. w must be a multiple of n. (w and n are mnemonic forwide and narrow.)
CELL ’m’ : #32 bits→ #8 cell Given a 32-bit address, CELL ’m’ returns the 8-bit cell in memoryreferred to by that address.
CELL ’r’ : #5 bits→ #32 cell Given a 5-bit register number, CELL ’r’ returns the correspond-ing 32-bit register (a mutable cell).
ADD : ∀#n.#n bits× #n bits→ #n bits For any n, ADD #n takes two n-bit values and returns their n-bitsum. Carry and overflow are ignored.
SX : ∀#n.∀#w.#n bits→ #w bits For any n and w, SX #n #w takes an n-bit value, interprets it as atwo’s-complement signed integer, and sign-extends it to producea w-bit representation of the same value. w must be greaterthan n.
CONST : ∀#n.〈constant〉 → #n bits For any n, CONST #n k represents the n-bit constant k. k mustbe representable in n bits. The same k could be used with dif-ferent ns.
Figure 2.2: Some RTL operators and their types
cations and a set of RTL operators. These choicesdefine an RTL language. Any specification writtenin λ-RTL can introduce new storage spaces (andtherefore new locations) and new RTL operators,determining an RTL language for use in that spec-ification.
For use during compilation, we restrict RTLs toa subset of a full RTL language. Each RTL used ina back end based on the vpo optimizer (Benitez andDavidson 1988) must satisfy the vpo invariant forthe target machine. An RTL satisfies that invari-ant if and only if it can be represented in a singleinstruction on the target machine. All RTLs manip-ulated by vpo satisfy this invariant, so vpo can stopand emit machine code at any time. We use thename X-RTLs refer to the set of RTLs satisfyingthe vpo invariant for machine X .
Interpretations of RTLs
One reason we restrict the form of RTLs is to limittheir possible meanings or interpretations. For ex-ample, access to the mutable state of a machineis available only through the fetch and store oper-ations built into the RTL form, so we can easilytell what state is changed by an RTL and how thatchange depends on the previous state. Researchersand tools working with our form of RTLs have free-dom to define interpretations of only two parts ofthe RTL form: aggregations and operators.
RTL aggregations specify byte order. More gen-erally, aggregations make it possible to write anRTL that stores a w-bit value in (or fetches a w-bit value from) k consecutive n-bit locations, pro-vided that w = kn. Such an aggregation has type#n cell → #w loc, and its interpretation must bea bijection between a single w-bit value and k n-
INTERPRETATIONS OF RTLS 9
bit values. Moreover, when w = n, the bijectionmust be the identity function. Storing uses the bi-jection, and fetching uses its inverse, making it pos-sible to combine RTLs using forward substitution.Little-endian and big-endian aggregations will bebuilt into λ-RTL, as will an “identity aggregation”that is defined only when w = n. We imagine thatusers could define other aggregations by giving sys-tems of equations, as in Ramsey (1996).
RTL operators must be interpreted as pure func-tions on bit vectors. Consequently, the result of ap-plying an RTL operator must not depend on proces-sor state; the operator must give the same answerevery time. Chapter 4 presents a collection of RTLoperators that we expect can be used to describemany different machines. We don’t expect this setto be complete; on the contrary, we expect thatany machine description written in λ-RTL will in-troduce a handful of new RTL operators which willbe unique to that machine.
Chapter 3
Using λ-RTL to specify registertransfer lists
Bare RTLs are both spartan and verbose. Ex-pressions do not include if-then-else, so condition-als must be represented by using guards on effects.There is no expression meaning “undefined;” as-signments of undefined values must be specified us-ing a kill effect. These restrictions, and the require-ment that operations be fully disambiguated, makeRTLs a form that is good for manipulation by toolsbut not so good for writing specifications.
λ-RTL is a metalanguage that enables specifica-tion writers to attach RTL trees to SLED-like con-structors without having to write everything explic-itly. λ-RTL is a higher-order, strongly typed, poly-morphic, pure functional language based largely onStandard ML (Milner, Tofte, and Harper 1990). Avariation on the Hindley-Milner type system makesit possible to write flexible, type-safe functionswithout having to write types explicitly.
Design considerations
We have drawn on our experience with SLED(Ramsey and Fernandez 1997) to identify mech-anisms and properties that are desirable in anyCSDL language, including λ-RTL. These include:
• Use of constructors to provide an abstract viewof the machine’s instruction set, and use of at-tributes to specify properties of instructions.These mechanisms are close kin to attribute
grammars and to the denotational approach tosemantics, both of which have long histories ofutility.
• Use of default, unnamed attributes (for exam-ple, binary representation or default construc-tor type), sometimes supported by special syn-tax. By providing a default case that does nothave to be named, we can unclutter specifi-cations. For example, when we refer to anoperand of type Address, the location desig-nated by that effective address should be thedefault meaning.
• Use of a specialized sublanguage for definingattributes. SLED uses patterns to specifybinary representations; λ-RTL uses the pre-scribed forms of RTLs, to be augmented by asophisticated type system.
• Language constructs that help eliminate repe-tition by permitting simultaneous descriptionof many instructions at once. Such constructsreduce the number of opportunities for er-rors, and they help keep specifications conciseand readable. λ-RTL includes two such con-structs: first class functions, and a generaliza-tion of SLED’s grouping construct, which itselfis based on Icon generators (Griswold 1982).
• Formal notation that matches informal descrip-tions found in architecture manuals. The ar-
10
RESTRICTIONS EASED IN λ-RTL 11
chitecture manual is the standard model of themachine that is shared by all tool builders, sothis close match not only helps people readspecifications, it helps them write correct spec-ifications. λ-RTL enables users to define theirown infix operators, and Chapters 4, ??, and 6show how we have used this capability to createan ISP-like notation for RTLs.
We expect to be able to analyze λ-RTL specifica-tions for internal consistency and “plausibility.” Forexample, it should be possible to identify cases inwhich a register number is mistakenly used as animmediate value instead of as an offset into the stor-age space modeling the register file.
Restrictions eased in λ-RTL
λ-RTL descriptions are easier to write than bareRTLs in three ways: grouping and higher-orderfunctions can help eliminate repetition, the typesystem largely eliminates the need to write sizesexplicitly, and λ-RTL relaxes several of the restric-tions on the form of RTLs. In particular,
• In λ-RTL, it is rarely necessary to write fetchesexplicitly.
• λ-RTL gives the illusion that bit slices (sub-fields) are locations that can be assigned to.
• λ-RTL gives the illusion that aggregates of cellsare locations that can be assigned to, and itis not usually necessary to write aggregationsexplicitly.
Implicit fetches
Most programmers are used to writing x := x + 1and having the x on the left denote a location whilethe x on the right denotes the value stored in thatlocation. Typical compilers identify “lvalue con-texts” and “rvalue” contexts and automatically in-sert fetches in rvalue contexts. We do the same inλ-RTL, but instead of using syntax to identify thecontexts, we use types.
We use types and not syntax because λ-RTL hasno special syntax for writing RTL assignments. In-stead of an assignment syntax, λ-RTL provides abuilt-in store function that accepts a location anda value and produces an effect. Any user-definedfunction might result in a call to the built-in store,so we have to recognize the right and left contextsby their types. Thus, if a location is used where avalue is expected, we insert a fetch.
λ-RTL does almost everything with functions,not syntax, so the writer of a specification can nor-mally redefine the meaning of a notation by defin-ing a new function with the same name. Thisstrategy doesn’t work with implicit fetches becausethere is no explicit notation associated with a fetch.We want users to be able to control the meaningsof these fetches, however, because many machineshave resources that are almost, but not quite, se-quences of mutable cells. For example, SPARC reg-isters can be viewed as a collection of 32 mutablecells, except that register 0 is not mutable and al-ways contains 0. We would like users to be able todefine special meanings for “fetch from register 0”and “store into register 0” so the rest of the specifi-cation can pretend that the registers are an ordinarystorage space. We do so by permitting users to at-tach fetch and store methods to each storage space.Methods not given explicitly default to the stan-dard fetch and store operators. Sample fetch andstore methods for the SPARC are shown on page ??.If we wanted, we could use fetch and store meth-ods to describe the true implementation of SPARCregisters, in which “registers” 8 through 31 denotelocations accessed indirectly through the register-window pointer (CWP).
Slices
Many machine instructions manipulate fragmentsof a word stored in a mutable cell. For example,some machines represent condition codes as indi-vidual bits within a program status word. Manymachines have instructions that, for example, as-sign to the least-significant 8 bits of a 32-bit register.To make it easy to specify such instructions, λ-RTLcreates the illusion that a sub-range or “slice” of a
12 CHAPTER 3. USING λ-RTL TO SPECIFY REGISTER TRANSFER LISTS
cell can be a location in its own right, one that isa suitable argument for a fetch or store operation.This illusion helps keep machine descriptions read-able; for example, an effect that sets the SPARCoverflow bit simply assigns to it, hiding the factthat it is buried in a program status word that hasto be fetched, modified, and stored.
λ-RTL uses a special syntax for slices, becausewe hope eventually to overload the notation to refereither to locations or to values. Until we get thisoverloading figured out, however, uses have to bedistinguished by a ty suffix that must be either locor bits. Examples of the syntax include
x@ty[k] Bit k of x. By default,bit 0 is the least signif-icant bit. A future ver-sion of λ-RTL may makeit possible to change thenumbering.
x@ty[k1..k2] Bits k1 through k2 of x,inclusive.
x@ty[k bits at e] A k-bit slice of x, with theleast significant bit at e.
k’s denote integer constants, e’s denote expressions,and x’s may denote values or locations. In all casesthe size of the slice is known statically, so its typecan be computed automatically. We use the Greekletter σ to stand for any of these slice specifications.
Given a slice specification σ, SLICE LOCσ mapslocations to locations and SLICE BITSσ maps valuesto values. The illusion that slices can be applied tolocations is implemented by rewriting, according tothe following rules:
FETCH(SLICE LOCσ l) 7→ SLICE BITSσ(FETCH l)
SLICE LOCσ l ← n 7→ l← bitInsertσ(FETCH l, n)
where l is a location and n is a value. After therewriting, all slices operate on bit vectors, and allfetches and stores operate on true locations. Invoca-tion of user-defined fetch and store methods takesplace after the rewriting of slices. This orderingmakes it possible to use fetch and store methods todefine cell-like abstractions, while ensuring that the
meaning of slicing is always consistent with respectto such abstractions.
Implicit aggregation
λ-RTL provides the special syntax $space[offset]for references to mutable cells. The offset can bean arbitrary expression, but the space must be a lit-eral name, so that λ-RTL can identify the storagespace and use appropriate fetch and store methods.To make this cell a location, λ-RTL applies an ag-gregation, which is also associated with the storagespace as a method. The default method is the iden-tity aggregation, which permits only “aggregates”of a single cell.
When little-endian, big-endian, or other aggre-gations are used, λ-RTL will infer the size of theaggregate. We have not yet determined the rulesto be used for the inference, but we hope at leastto support the automatic inference that four 8-bitbytes must be aggregated to form a value that goesinto a 32-bit register.
Overview of λ-RTL
From the point of view of external tools, the outputof a λ-RTL specification is a set of bindings of valuesto attributes of constructors. The part of λ-RTLused to specify values is a pure, typed functionallanguage without recursion. Many features of λ-RTL are inspired by Standard ML.
To describe syntax, we use an EBNF gram-mar with standard metasymbols for
{sequences
},
[optional constructs
], and
(alternative
∣∣ choices
).
We use large metasymbols to help distinguish themfrom literals. Terminal symbols given literally ap-pear in typewriter font. Other terminal symbolsand all nonterminals appear in italic font. Excerptsfrom the grammar begin with the name of a non-terminal followed by the ⇒ (“produces”) symbol.When referring to terminal and nonterminals, wesometimes precede the name of the symbol witha qualifier; for example argument-pattern refers tothe nonterminal symbol pattern used as an argu-
OVERVIEW OF λ-RTL 13
ment, and module-name refers to the terminal sym-bol name, which is the name of a module.
A λ-RTL specification is a sequence of modules.λ-RTL uses modules to organize the name space ofvalues. A value x declared in a module S is visiblewithin S as just x, but outside S it can only bereferred to as S.x. Modules are written
module ⇒
module module-name is{import
} {declaration
}end
Modules may include declarations of nested mod-ules as well as values.
At top level, modules provide a kind of separatecompilation. λ-RTL requires explicit import direc-tives for references between top-level modules.
import ⇒ import(module-name
∣∣ [
{module-name
}])
| from module-id import(name
∣∣ [
{name
}])
module-id ⇒ module-name
| module-id . module-name
For example, before module A can use values de-fined in module B, module A must include the di-rective import B. This directive makes B visi-ble within A, as well as qualified names beginningwith B. The from. . . import directive imports iden-tifiers directly into module A. Not only can theseidentifiers be used without qualification; they carrythe same fixity (infix nature, operator precedence,and associativity) that they had in module B. Forexample, if the module StdOperators defines infixidentifiers + and - to denote addition and subtrac-tion, then if module A includes the directive
from StdOperators import [+ -]
then these identifiers are also infix in module A,where it is legal to write, e.g., the function defi-nition fun inc x is x + 1.
Except as changed by an import directive, thescopes of names are limited to the modules in whichthey are defined. Values and modules share a sin-gle name space within λ-RTL. Constructors andstorage spaces occupy their own individual namespaces, which are flat. RTL operators occupy aseparate, flat name space, but within λ-RTL theyare not distinguished from other kinds of values.Eventually, the λ-RTL translator will check that
the same name is not used to declare distinct RTLoperators in two different modules.
Within a module, declarations are either top-levelor inner declarations. Inner declarations may ap-pear anywhere, but top-level declarations may ap-pear only outside of function and value definitions.Top-level declarations appear only in contexts thatguarantee they will be evaluated exactly once.
Top-level declarations
λ-RTL’s top-level declarations introduce nestedmodules, locations, storage spaces, RTL operators,or bindings to attributes of constructors.
Storage spaces
Storage spaces are introduced by storage; theyname the storage space, give its granularity (widthof an individual cell) and size (number of cells), andpossibly fetch, store, or aggregation methods.
declaration ⇒
storage character-literal{storage-alias
}
is{storage-property
}
storage-alias ⇒ alias name
storage-property ⇒
storage-method using expression
|[cell-count
]cells of cell-width bits
| called documentation-string
storage-method ⇒ fetch∣∣ store
∣∣ aggregate
RTL operators
RTL operators are introduced by:
declaration ⇒
rtlop(operator-name
∣∣ [
{operator-name
}]): type
Figure 3.1 shows λ-RTL types. For example,
rtlop bit : bool -> #1 bits
rtlop [= <>] : #n bits * #n bits -> bool
introduces three new RTL operators. The #n bits
in the definition of = and <> introduces parametricpolymorphism; the equality and inequality opera-tors can be applied to values of any size. This sizeis normally implicit in λ-RTL, but it is made ex-plicit in the translated RTL form.
14 CHAPTER 3. USING λ-RTL TO SPECIFY REGISTER TRANSFER LISTS
In the current implementation of λ-RTL, rtlopintroduces an arbitrary, abstract value. Because itis the only abstraction mechanism available, it issometimes put to odd uses.
Attributes
Attribute bindings are introduced as follows:
declaration ⇒
attribute of{constructor is expression
}
attribute ⇒
default attribute
| attribute(attribute-name
∣∣ default
)
constructor ⇒
opcode ([operand-name
{, operand-name
}])
[: type-name
]
Each expression must be terminated by a newline;if an expression doesn’t fit on one line, it must con-tain an open parenthesis, brace, or let, so that theλ-RTL compiler knows to continue to the closingdelimiter.
For example,
default attribute of nop() is RTL.SKIP
defines the effect of a no-op. The longer definition
attribute trap of
taddcctv (rs1, reg_or_imm, rd) is
(tag_overflows($r[rs1], reg_or_imm) orelse
add_overflows($r[rs1], reg_or_imm, 0)
--> trap(tag_overflow))
requires that the attribute be parenthesized, be-cause it occupies more than one line.
In order to compute the type of an attribute, theλ-RTL translator needs to know the types of eachconstructor’s operands. The operand directive isused to tell λ-RTL about the types of constructors’operands.
declaration ⇒
operand(operand-name
∣∣ [
{operand-name
}]): type
Note that constructors may be typed. Because theoperand directive asserts the type a constructor ap-plication has when it is used as an operand, that
same directive can be used to check the type of thatconstructor’s attributes.
The types of simple integer operands are deter-mined by the machine architecture, and in fact, theλ-RTL translator can extract these types automat-ically from a SLED specification.1 The types ofother operands are up to the author of the λ-RTLspecification. For example the Pentium specifica-tion in Chapter 6 represents an effective address asa triple, which contains locations of three differentsizes.
operand addr :
{ b : #8 loc, w : #16 loc, d : #32 loc }
Defining attributes of constructors differs fromdefining functions in significant ways.
• Constructors occupy their own name space.Their names don’t conflict with names of λ-RTL functions or values, and the constructorscannot be used as functions in later λ-RTL def-initions.
• The arguments to attributes are constructoroperands, not patterns. They must be referredto by name, and their names determine theirtypes, according to the operand declarationscurrently in effect.
The attribute values will be represented in C, sothey are required to have monomorphic types, i.e.,their types may not have free type or width vari-ables, and the types may not include functions.
The RTLs used in a λ-RTL description are notpure RTLs, but rather RTL templates. An RTLtemplate becomes a fragment of RTL when suitablefragments of RTL are substituted for the construc-tor’s operands. An RTL template is an RTL, exceptthat
• Names of constructor operands may be usedto stand for RTL expressions. (Eventually wehope to use names of operands to stand forlocations as well as values, but the current im-plementation does not support this usage.)
1Try, e.g., lrtl -operands sparc.sled.
OVERVIEW OF λ-RTL 15
type ⇒
type{* type
}Tuple
| { member-name : type{, member-name : type
}} Record
| type -> type Function
| type-constant(bits
∣∣ loc
∣∣ cell
)Unary constructors
| bool∣∣ rtl Nullary constructors
| type vector Vector
type-constant ⇒ #variable-name∣∣ #integer-literal
Figure 3.1: λ-RTL types
• Tuple or record selections from constructoroperands may be used to stand for RTL ex-pressions.
• Vectors of RTL expressions subscripted by con-structor operands may be used in place of RTLexpressions.
A tuple or record containing RTL templates is alsoconsidered an RTL template. The most notableconsequence of these rules is that no expressionwhose normal form contains a λ-abstraction is anRTL template; λ-abstractions must be applied toarguments.
Eventually there will be a precise, formal defini-tion of RTL template.
Inner declarations
Inner declarations may appear anywhere a top-leveldeclaration may appear, and they may also appearin let-expressions, where they are the only kindof declaration permitted. Their typical usage is tobind local values in function bodies. Inner declara-tions include value bindings, function bindings, andfixity declarations.
Value, location, and function bindings
Value, location, and function bindings are writtenas
declaration ⇒
val value-spec is expression
| locations value-spec is expression
| fun value-spec{argument-pattern
}+ is expression
value-spec ⇒ result-pattern∣∣ [
{name
}]
A location binding is just like a value binding, ex-cept the value is forced to be a location. The squarebrackets in value-spec represent λ-RTL’s groupingmechanism, in which the expression actually de-notes a sequence of values, and each value in thesequence is bound of the corresponding name onthe left.
As examples, consider
val do_nothing is RTL.SKIP
fun bool n is n <> 0
val [eq ne] is [(=) (<>)]
fun triple x y z is (x, y, z)
val triple is \x.\y.\z.(x, y, z)
As in ML, a function definition in λ-RTL may in-clude more than one argument-pattern, so the twodefinitions of triple are equivalent.
Infix operators
λ-RTL supports user-defined infix operators witharbitrarily many levels of precedence. Precedencecan be any real number, with arbitrary precision.Real numbers are written in decimal notation, withoptional minus sign. If the precedence value is aninteger, no decimal point is required. Note that ~0is different from 0.
16 CHAPTER 3. USING λ-RTL TO SPECIFY REGISTER TRANSFER LISTS
declaration ⇒(infixl
∣∣ infixr
∣∣ infixn
)precedence value-spec
| nonfix value-spec
Infix operators must be declared to be left-associa-tive, right-associative, or non-associative with otheroperators of the same precedence. For example, wecan make equality testing infix and non-associativeusing
infixn 5 [= <>]
The standard library (Chapter 4) contains this anda number of other infix declarations for the standardRTL operators.
A single instance of an infix operator can be made“nonfix” (treated as an ordinary value) by enclosingit in parentheses. The nonfix declaration makesthis behavior permanent.
Expressions
λ-RTL values include tuples, records, vectors, andfunctions (λ-abstractions), in addition to the cells,locations, and values of the underlying RTLs. λ-RTL provides expressions to create call of thesekinds of values, as well as conditional expressions,let bindings, selection, slicing, function applica-tion, and type casting. As in ML, function appli-cation is written simply by writing one expressionnext to another, e.g., f x or g (x, y). Functionapplication has higher precedence than any infix op-erator, but lower precedence than dot selection andslicing. Figure 3.2 shows the syntax, precedence,and associativity of λ-RTL expressions.
Most of λ-RTL’s expressions are standard andrequire no explanation, but type casts are unusual.Type casts are particularly important when dealingwith references to cells in storage spaces (bytes inmemory or registers in a register file). Like sometype casts in C, but unlike type casts in ML, a λ-RTL type cast may involve computation. For exam-ple, casting a location to a value forces the insertionof a fetch operator (the fetch method of the under-lying storage space). Casting a cell to a locationforces the insertion of an aggregation. Function,
record, and tuple values may also be cast, but usersof λ-RTL will rarely need to do so.2
Patterns
Patterns create bindings of identifiers to values.Syntactically, they are subsets of expressions, butevery occurence of an identifier in a pattern is abinding instance. They are used to bind names tothe arguments of functions or the values in val dec-larations.
pattern ⇒
( pattern{
, pattern}
)
| { member-name[is pattern
]
{, member-name
[is pattern
]}}
| identifier
| identifier as pattern
The as syntax may be used to assign names bothto a whole pattern and to its constituents, e.g, asin
val result as {sum, carry} is add(r1, r2, c)
Grouping
A group may appear in place of an expression any-where in λ-RTL. Whereas an expression denotes asingle value, a group denotes a sequence of values.All operations except binding distribute over thegroup, so if the expression in a declaration containsa group, then that expression denotes a sequence ofvalues, no matter how deeply nested the group is.The most common operations to distribute over agroup are function application and tuple formation.
The usual syntax for a group is a list of expres-sions in square brackets. The expressions need notbe of the same type. The space separating elementsof a group has precedence higher than function ap-plication (but lower than dot selection). There-fore, the group [f x] is a group containing twovalues, as is [f (x)]. To use function applicationor infix operators within a group, one must includethe whole application in parentheses, e.g., [(f x)]
2For the adventurous, the usual rules for covariant andcontravariant subtyping apply.
OVERVIEW OF λ-RTL 17
expression ⇒
( expression{, expression
}) Tuple
| { member-name is expression{, member-name is expression
}} Record
| [. expression{, expression
}.] Vector
| let{declaration
}in expression end Local declarations
| if expression then expression else expression fi Conditional
| \ argument-pattern . expression Function
| expression.member-name Selection
| $ space-identifier [ expression ] Location
| expression @bits[slice-specification] Slice of value
| expression @loc[slice-specification] Slice of location
| expression expression Function application
| expression : type Type conversion
| identifier Named value
| literal-constant Constant
Expression Precedence Fixity or Associativity
Dot selection Highest Postfix
Slice Highest Postfix
Grouping Next highest
Function application Above all infix Left associative
infixx operators As declared As declared
\-abstraction Lowest Right associative
Figure 3.2: λ-RTL expressions
or [(f(x))]. In addition to the usual syntax, λ-RTL provides two kinds of special syntax to creategroups of integers. All use square brackets:
expression ⇒ group
group ⇒
[{expression
}]
| [ expression .. expression ]
| [ expression to expression[by expression
∣∣ columns expression
]]
In the latter two cases, the constituent expressionsmust be compile-time constants.
Groups are evaluated left to right, last-in, first-out, so for example the expression
([a b], [1 2])
denotes the same sequence of four expressions as
[(a, 1) (a, 2) (b, 1) (b, 2)]
The evaluation rule and the design of groups in gen-eral are based on the generator mechanism of theIcon programming language (Griswold 1982).
The primary role of groups is to make it easyto specify multiple machine instructions in a sin-gle attribute binding. This is done by using value-spec (either a single name or a sequence of names insquare brackets) to form the opcodes of construc-tors. Multiple value-specs may be used to combine,for example, a group of operations with a group ofsuffixes. Chapters ?? and 6 have examples.
opcode ⇒ value-spec{^value-spec
}
18 CHAPTER 3. USING λ-RTL TO SPECIFY REGISTER TRANSFER LISTS
In some cases, the default left-to-right, first-in,last-out rule for enumerating groups may be in-convenient or confusing. λ-RTL therefore providessome syntactic sugar.
expression ⇒{foreach ident in group
}+ group expression end
Before enumerating groups, the λ-RTL transla-tor rewrites expressions of the form foreach x1
in e1 · · · foreach xn in en group e′ end intofunction applications of the form (\x1.· · ·\xn.e
′)
(e1)· · ·(en). Because of this rewriting, usinggroups within the body of the foreach is likely toproduce unexpected results.
Lexical conventions
Like identifiers in ML, λ-RTL identifiers may bealphanumeric or symbolic. An alphanumeric iden-tifier begins with a letter and continues with letters,digits, underscores, and primes. A symbolic identi-fier is a sequence of the symbols
!&+/\:<=>?@~|*‘%;-^#.
Such an identifier may begin with any symbol ex-cept #.
Identifiers consisting solely of two or more dashesintroduce comments; the comment includes thedashes and continues to the end of the line. Forexample, -- and --- introduce comments, but -->and --*-- do not. A comment is syntacticallyequivalent to a newline.
An integer literal is a sequence of decimal digits;λ-RTL has no floating-point literals. λ-RTL alsosupports hexadecimal literals beginning with 0x,octal literals beginning with 0, and binary literalsbeginning with 0b.
Literals (and identifiers) preceded by # representwidths in λ-RTL’s type system. RTL operators andfunctions that are polymorphic in width may bespecialized by applying them to a width literal ina width application. For example, four 8-bit bytescan be aggregrated into a 32-bit word by
RTL.AGGB #8 #32
and the sx sign-extension operator can be special-ized to extend a 13-bit value to 32 bits by
sx #13 #32
It is rarely necessary to supply these constants ex-plicitly; λ-RTL almost always infers them from con-text.
The initial basis
Most of the RTL-specific content of λ-RTL is in theinitial basis, i.e., the collection of predefined func-tions and values. Figure 3.3 shows λ-RTL’s initialbasis. Of course, access to values within the RTL
or Vector modules requires that the modules beimported.
Style
λ-RTL provides expressive power with few restric-tions. Writers of specifications use the functionsin λ-RTL’s initial basis to create RTLs that corre-spond to the form prescribed in Chapter 2. Writerscan also introduce new RTL operators. This free-dom gives us ample scope for experimenting withdifferent styles of description—as well as ample ropewith which to hang ourselves. Our initial exper-iments have led to a “standard library” of usefulRTL operators, as well as some useful λ-RTL func-tions. Chapter 4 presents this library.
λ-RTL provides no looping or recursion con-structs. Loops whose sizes are known in ad-vance can be simulated by using Vector.foldr andVector.spanning. Because this style will be famil-iar only to those well versed in functional program-ming, we expect eventually to provide some syntac-tic sugar for it.
Differences between λ-RTL and Stan-dard ML
Although λ-RTL is inspired in large part by Stan-dard ML, there are significant differences, whichmay interest ML programmers. Syntactic differ-ences include:
OVERVIEW OF λ-RTL 19
Boolean constants:true Truth
false Falsehood
RTL operators and other functions used to create RTLs:
RTL.STORE Takes location and value, produces effect.
RTL.FETCH Fetches value from location.
RTL.AGGB Big-endian aggregation.
RTL.AGGL Little-endian aggregation.
RTL.SKIP The empty RTL; an effect that does nothing.
RTL.GUARD Takes a Boolean expression and an effect and produces an effect.
RTL.PAR Takes two effects (RTLs) and composes them so they take place simultaneously(list append on list of effects).
RTL.SEQ Takes two effects (RTLs) and composes them by forward substitution (sequentialcomposition of effects).
Functions on vectors:
sub Vector subscript, a left-associative infix operator with precedence 5.
Vector.spanning A curried function of two arguments. Vector.spanning x y produces the vector[.x, x + 1, . . . , y.].
Vector.foldr A higher-order function used to visit every element of a vector. Like Vector.foldr
in Standard ML ’97.RTL.FETCH and RTL.STORE use the user’s fetch and store methods. To bypass fetch and store methods, andwhen defining fetch and store methods, use RTL.TRUE FETCH and RTL.TRUE STORE.
Figure 3.3: λ-RTL’s initial basis
• The defining connective is is, not =.
• if expressions must be terminated by fi.
• Identifiers declared infix must be explicitlyidentified as left-, right-, or non-associative.Arbitrarily many levels of precedence are avail-able.
• Dot notation is used to select elements fromrecords as well as from modules.
• Modules must be imported explicitly, and thefrom. . . import directive replaces ML’s open.
• Comments begin with an identifier consistingof n dashes, where n is at least 2. Commentsend at the end of a line.
There are also some semantic restrictions:
• There are no side effects (mutation, excep-tions), so order of evaluation does not matter.
• Functions may not be recursive, so the nor-mal form of all expressions can be computedat compile time.
• There are no datatype constructors. As acorollary, the only patterns used in functiondefinitions are those that match tuples orrecords.
Chapter 4
A Library of standard RTL operators
This chapter describes RTL operators and functions that we expect to be useful for describing many differentmachines. As we do throughout this document, we use the noweb literate-programming tool (Ramsey 1994)to present examples of λ-RTL. Noweb extracts the code from the same files used to produce this document,so we can run the examples through the λ-RTL compiler and make sure everything compiles.
A noweb file contains explanatory text interleaved with named “code chunks,” which contain source codeand references to other code chunks. The names of chunks appear italicized and in angle brackets, as in thefollowing C code:
20a 〈summarize the problem 20a〉≡ 20c ⊲
printf("Inputs are: ");
〈for every i that is an input, print i and a blank 20b〉printf("\n");
20b 〈for every i that is an input, print i and a blank 20b〉≡ (20a)
for (i = 1; i < argc; i++)
printf("%s ", argv[i]);
The ≡ sign indicates the definition of a chunk. Definitions of a chunk can be continued in a later chunk;noweb concatenates their contents. Such a concatenation is indicated by a +≡ sign in the definition:
20c 〈summarize the problem 20a〉+≡ ⊳ 20a
printf("Must process %d files\n", argc-1);
noweb adds navigational aids to the document. Each chunk name ends with the number of the page on whichthe chunk’s definition begins. When more than one definition appears on a page, they are distinguished byappending lower-case letters to the page number. When a chunk’s definition is continued, noweb includespointers to the previous and next definitions, written “⊳??” and “?? ⊲.” The notation “(?0—1)” shows wherea chunk is used.
The layout of the module is as follows:
20d 〈basic.rtl 20d〉≡module StdOperators is
import RTL
〈definitions of standard RTL operators and functions 21a〉end
20
4.1. BUILDING RTLS 21
Because λ-RTL does not yet have modules, we use noweb to include the chunk 〈definitions of standardRTL operators and functions 21a〉 in our descriptions of the SPARC and the Pentium.
4.1 Building RTLs
λ-RTL is intended for building RTLs, not analying them, so it doesn’t expose all the structure of RTLs;single effects, guarded effects, and full RTLs (lists of guarded effects) all have the same type effect. Theinitial basis provides ways to create and combine effects. We define a more compact, infix notation for partsof the basis.
21a 〈definitions of standard RTL operators and functions 21a〉≡ (20d) 21b ⊲
val do_nothing is RTL.SKIP : effect
val --> is RTL.GUARD : bool * effect -> effect
val := is RTL.STORE -- type is #n loc * #n bits -> effect
infixr 1 -->
infixn 2 :=
In the underlying representation, the semantics of these operations is extended to full RTLs in the obviousway. For example, RTL.STORE produces an RTL containing a single effect whose guard is true.
We don’t need an explicit FETCH, because fetches are inserted automatically, and we don’t need an explicitCONST, because λ-RTL compiles literal constants into RTL CONST nodes.
We use a vertical bar to combine simultaneous effects, and a semicolon for sequential composition. Oneday, the λ-RTL translator will translate sequential composition to simultaneous composition, using forwardsubstitution.
21b 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 21a 21c ⊲
val | is RTL.PAR : effect * effect -> effect
val ; is RTL.SEQ : effect * effect -> effect
infixl 0 |
infixl ~1 ;
4.2 Booleans
true and false are values in λ-RTL’s initial basis. We use not for Boolean negation, and conjoin anddisjoin for conjunction and disjunction. (The bitwise operations on bit vectors are com, and, and or.)
21c 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 21b 22c ⊲
rtlop not : bool -> bool
rtlop [conjoin disjoin] : bool * bool -> bool
For writing other Boolean functions λ-RTL has only if-then-else-fi, which λ-RTL rewrites into suitableguards. We would like to have connectives andalso and orelse, but introducing them presents a choiceabout level of detail: should they be introduced as new RTL operators, whose semantics must be definedoutside of λ-RTL, or should they be defined in terms of the existing λ-RTL primitives? We would like totry both alternatives, because they provide different levels of detail in the generated RTLs.
22 CHAPTER 4. A LIBRARY OF STANDARD RTL OPERATORS
Eventually, λ-RTL will have mechanisms to support multiple levels of abstraction, probably by using somesort of parameterized modules. For now, we resort to some literate-programming tricks that are equivalentto conditional compilation. noweb can extract this code in two versions, designated simple and full. Codechunks tagged [[simple]] and [[full]] appear only in the corresponding versions, and untagged code chunksappear in both versions.
In the simple version, the boolean connectives are left abstract. In the full version, we give them theirstandard definitions. The simple version has the advantage that the RTLs are smaller and easier to read,but the disadvantage that the semantics have to be specified elsewhere. In the full version, these connectivesare expanded to primitives, so they automatically get their proper semantics.
22a 〈definitions of standard RTL operators and functions [[simple]] 22a〉≡ 22e ⊲
val [andalso orelse] is [conjoin disjoin]
Conditional definition.
22b 〈definitions of standard RTL operators and functions [[full]] 22b〉≡ 22f ⊲
fun andalso (p, q) is if p then q else false fi
fun orelse (p, q) is if p then true else q fi
Conditional definition.
No matter what their definitions, we want these connectives to be left-associative infix operators.
22c 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 21c 22d ⊲
infixl 3 orelse
infixl 4 andalso
In the long run, equality and inequality will probably have to have some built-in semantics, but for now,we leave them abstract.
22d 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 22c 23a ⊲
rtlop [eq ne] : #n bits * #n bits -> bool
val [= <>] is [eq ne]
infixn 5 [= <>]
These operators are shown in TEX as = and 6=. We have chosen to make inequality a primitive RTL operator,although we could have defined it in terms of equality by writing val <> is \(x,y). not (x = y).
We can convert between booleans and bits. Again we have the choice of making things abstract or concrete.
22e 〈definitions of standard RTL operators and functions [[simple]] 22a〉+≡ ⊳ 22a
rtlop bit : bool -> #1 bits -- turn boolean into bit
rtlop bool : #1 bits -> bool -- turn bit into boolean
22f 〈definitions of standard RTL operators and functions [[full]] 22b〉+≡ ⊳ 22b
fun bit p is if p then 1 else 0 fi
fun bool n is n <> 0
4.3. INTEGER ARITHMETIC AND COMPARISONS 23
4.3 Integer arithmetic and comparisons
These operations represent integer arithmetic performed on the bit-vector representation. Here are addi-tion and subtraction. We have to define addition and carry separately, because λ-RTL won’t permit RTLoperators that return records or tuples.1
23a 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 22d 23b ⊲
local
rtlop [add sub] : #n bits * #n bits -> #n bits
in
val [+ -] is [add sub]
end
rtlop [carry borrow] : #n bits * #n bits * #1 bits -> #1 bits
rtlop neg : #n bits -> #n bits -- two’s-complement negation
Note that this definition hides the names add and sub, using + and - instead.To define combined versions add and subtract, we need sign extension and zero extension. These operators
are polymorphic in the sizes of the arguments and results.
23b 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 23a 23c ⊲
rtlop [sx zx] : #n bits -> #m bits -- must have m > n
Now we can define add and subtract to combine result and carry/borrow.
23c 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 23b 23d ⊲
infixl 6 [+ -]
fun add (n, m, c) is { result is n + m + zx c, carry is carry (n, m, c) }
fun subtract (n, m, b) is { result is n - m - zx b, borrow is borrow (n, m, b) }
The intended semantics of add and subtract is “add with carry” and “subtract with borrow.”Integer multiplication and division may be signed or unsigned. There are two sets of signed division
operators because division may truncate towards −∞ (div and mod) or towards 0 (quot and rem). Thereis, of course, no such distinction for unsigned division.
23d 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 23c 24a ⊲
rtlop [mul mulu] : #n bits * #n bits -> #m bits -- m = 2 * n
rtlop [div divu quot] : #m bits * #n bits -> #m bits -- m = n or m = 2n
rtlop [mod modu rem] : #m bits * #n bits -> #n bits -- m = n or m = 2n
The λ-RTL type system is unable to express the constraints on the types of multiplication and division. Wecould play some games with the definitions (e.g., return two words and merge them with bitInsert), butit’s not worth it at this time.
This code highlights the unpleasant question of naming RTL operators. As of July 1999, the names used inrtlop declarations are consistent with the names used in the VPOI code-generation interfaces. These namesare, however, re-bound to other names, which are used in the actual machine descriptions. Eventually, it ishoped that VPO, λ-RTL, C--, and the Princeton proof-carrying code project will agree on a set of namesfor operators.
1This restriction is not strictly necessary, but we believe it should simplify construction of automatic RTL-based tools.
24 CHAPTER 4. A LIBRARY OF STANDARD RTL OPERATORS
Two’s-complement signed integer comparison.
24a 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 23d 24b ⊲
local
rtlop [lt le gt ge] : #n bits * #n bits -> bool
in
val [< <= > >=] is [lt le gt ge]
end
Comparisons are left abstract instead of being defined in terms of < and =.Unsigned comparison.
24b 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 24a 24c ⊲
rtlop [ltu leu gtu geu] : #n bits * #n bits -> bool
The arithmetic operators obey the usual rules of precedence.
24c 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 24b 24d ⊲
infixl 7 [div mod quot rem divu modu]
infixl 6 [+ -]
infixn 5 [< <= > >=]
4.4 Logical operators
Most processors offer a variety of bitwise operations on bit vectors, including those shown here.
24d 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 24c 24e ⊲
rtlop [and or xor] : #n bits * #n bits -> #n bits
rtlop com : #n bits -> #n bits -- ‘complement’ because ‘not’ is boolean
rtlop bitInsert : {lsb : #k bits, wide : #w bits} * #n bits -> #w bits
rtlop bitExtract : {lsb : #k bits, wide : #w bits} -> #n bits
val bitInsert is \x.\y.bitInsert (x, y) -- curry
bitInsert inserts into a w-bit (“wide”) value. The thing inserted is n bits (“narrow”), and the widths ofboth wide and narrow values are normally inferred by the type system. The bitExtract is a generalizationof λ-RTL’s built-in slicing operator. The difference is the width of the narrow value returned by bitExtract
is implicit in its (specialized) type, and that width can be a width variable, not just a constant.There are instructions that manipulate bit fields whose widths aren’t known statically. These computations
can’t be expressed with bitInsert and slicing, because we can’t assign a type (width) to the results. Butwe can define bitTransfer as a combination extraction and insertion, and in fact, this is the way suchinstructions must behave. Even an instruction that extracts k bits from a 32-bit word, where k is not knownuntil run time, must still insert the result into some value of known size—typically a 32-bit word of all zeros.
24e 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 24d 25a ⊲
rtlop bitTransfer :
{ src : {value : #w bits, lsb : #k bits}
, dst : {value : #w bits, lsb : #k bits}
, width : #k bits
} -> #w bits
-- extract width bits from src at lsb, insert into dst at lsb
4.4. LOGICAL OPERATORS 25
src.lsb
↓
src.value s11s10s9s8 s7s6s5s4 s3s2s1s0
dst.lsb
↓
dst.value d11d10d9d8d7d6d5d4d3d2d1d0
result d11d10s5s4 s3d6d5d4d3d2d1d0
result = bitTransfer { src is { value is s, lsb is 3}
, dst is { value is d, lsb is 7}
, width is 3
}
Figure 4.1: Example of bitTransfer
bitTransfer {src, dst, width} extracts width bits from src.value, with the least significant bit beingsrc.lsb, and it inserts those bits into dst.value, with the least significant bit inserted at position dst.lsb.The width and lsb specifiers are assumed to have the same type. Figure 4.1 shows an example.
Most processors have shift instructions. We make the shift operations primitive RTL operators, eventhough they can be defined in terms of bit transfer.
25a 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 24e 25b ⊲
rtlop [shl shrl shra] : #n bits * #k bits -> #n bits
At least one processor, the IA64, has a “population count” operation that examines a value and returnsthe number of bits that are set. It’s not clear how polymorphic it should be; for the moment, I make thetype of the return value arbitrary, but it might be more convenient to make it the same as the type of theargument.
25b 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 25a 25c ⊲
rtlop popcnt : #n bits -> #k bits
Bit transfer can also be used to implement rotation, but it’s possible rotation should also be primitive.
25c 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 25b 26a ⊲
fun rotl (w, n, k) is
let val w’ is bitTransfer { src is {value is n, lsb is w-k}
, dst is {value is 0, lsb is 0}
, width is k
}
in bitTransfer { src is {value is n, lsb is 0}
, dst is {value is w’, lsb is w-k}
, width is w - k
}
end
26 CHAPTER 4. A LIBRARY OF STANDARD RTL OPERATORS
26a 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 25c 26b ⊲
fun rotr (w, n, k) is
let val w’ is bitTransfer { src is {value is n, lsb is 0}
, dst is {value is 0, lsb is w-k}
, width is k
}
in bitTransfer { src is {value is n, lsb is k}
, dst is {value is w’, lsb is 0}
, width is w - k
}
end
4.5 Undefined effects
As of July 1999, the λ-RTL translator does not support the kill effects. In a later implementation, we planto use the question mark to stand for an “undefined” value. All λ-RTL functions and RTL operators willbe strict in this value, and RTL.STORE(loc, ?) will become RTL.KILL loc. Because we would like to usethe question mark in our example descriptions, even though λ-RTL doesn’t yet have the machinery to giveit its eventual meaning, we introduce it as a constant. (It used to be a nullary RTL operator, but AndrewAppel argued it shouldn’t be an operator, so I make it a value.)
26b 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 26a 26c ⊲
local
-- rtlop undefined : #k bits
val undefined is 0
in
val ? is undefined
end
4.6 Floating-point arithmetic
Floating-point operations need to come in different families, so for example we could distinguish IEEE 754floating point from VAX floating point. It therefore seems reasonable to put these operators in their ownnested module.
26c 〈definitions of standard RTL operators and functions 21a〉+≡ (20d) ⊳ 26b
module IEEE754 is
〈IEEE 754 floating point 26d〉end
Many floating-point operations that we normally think of as binary can’t be treated as binary in λ-RTL,because their results depend on the rounding mode, which must be passed as an additional argument.
26d 〈IEEE 754 floating point 26d〉≡ (26c) 27a ⊲
rtlop [fneg fabs] : #k bits -> #k bits -- exact (no rounding)
rtlop [fsqrt] : #k bits * #2 bits -> #k bits -- rounded
rtlop [fadd fsub fmul fdiv] : #k bits * #k bits * #2 bits -> #k bits
rtlop [fmulx] : #k bits * #k bits -> #n bits -- where #n = 2 #k
4.6. FLOATING-POINT ARITHMETIC 27
All these operators are polymorphic in the size of their operands, so they can be used for single-, double-,and extended-precision floats. fmulx (for “exact”) produces an exact, double-size result, without rounding.
There are four rounding modes required by the standard. It’s unfortunate that we seem to have to pickinteger representations for these rounding modes, instead of leaving them more abstract, but λ-RTL doesnot currently permit users to introduce abstract types.
27a 〈IEEE 754 floating point 26d〉+≡ (26c) ⊳ 26d 27b ⊲
module Round is
local
rtlop [round_nearest round_zero round_down round_up] : #2 bits
-- what to round toward
in
val [nearest zero down up] is [round_nearest round_zero round_down round_up]
end
end
Conversions are polymorphic in two widths: the argument and result widths. The argument of type#2 bits represents the rounding mode.
27b 〈IEEE 754 floating point 26d〉+≡ (26c) ⊳ 27a 27c ⊲
rtlop [i2f f2i f2f] : #k bits * #2 bits -> #n bits
There are the special values ±0 and ±∞.
27c 〈IEEE 754 floating point 26d〉+≡ (26c) ⊳ 27b 27d ⊲
rtlop [pzero mzero] : #n bits
rtlop [pinf minf] : #n bits
There are NaNs.
27d 〈IEEE 754 floating point 26d〉+≡ (26c) ⊳ 27c 27e ⊲
rtlop NaN : #k bits -> #n bits -- function from significand to value
A floating-point comparison produces one of four results: less, equal, greater, or unordered. Because wedon’t want to fix particular 2-bit representations of these results, we create RTL operators to describe them.
27e 〈IEEE 754 floating point 26d〉+≡ (26c) ⊳ 27d
module Comparison is
rtlop [float_lt float_eq float_gt unordered] : #2 bits
val [< = >] is [float_lt float_eq float_gt]
end
rtlop fcmp : #k bits * #k bits -> #2 bits
Chapter 5
The meanings of the standard RTLoperators
This chapter is just a big table that lists all the RTL operators with their suggested standard names, types,and meanings.
Although this specification accounts for the rounding modes used in IEEE floating-point arithmetic, itdoesn’t account for other state, such as the state controlling whether denormalized numbers should be used.
Operator (with type) Meaning
IEEE 754 Floating-point operationsfcmp : ∀#n.#n bits × #n bits → #2 bits Standard comparison of IEEE 754
floating-point values. Produces one ofthe four 2-bit results specified below. Forconvenience, we also provide (below)eight of the sixteen possible Booleanfunctions of the results of afloating-point comparison.
float lt : #2 bits Result of floating-point comparisoncmpf (x, y) was “less than”
float eq : #2 bits Result of floating-point comparisoncmpf (x, y) was “equal”
float gt : #2 bits Result of floating-point comparisoncmpf (x, y) was “greater than”
unordered : #2 bits Result of floating-point comparisoncmpf (x, y) was “unordered”
feq : ∀#n.#n bits × #n bits → bool Floating equals.
fge : ∀#n.#n bits × #n bits → bool Floating greater than or equal.
fgt : ∀#n.#n bits × #n bits → bool Floating greater than.
fle : ∀#n.#n bits × #n bits → bool Floating less than or equal.
28
29
Operator (with type) Meaning
flt : ∀#n.#n bits × #n bits → bool Floating less than.
fne : ∀#n.#n bits × #n bits → bool Floating not equals.
fordered : ∀#n.#n bits × #n bits → bool Floating ordered. Comparesfloating-point numbers to see if they areordered by any of {<, =, >}.
funordered : ∀#n.#n bits × #n bits → bool Floating unordered. Comparesfloating-point numbers to see if they arenot ordered by any of {<, =, >}.
NaN : ∀#n.∀#m.#n bits → #m bits NaN n is an IEEE 754 Nan withsignificand n.
round down : #2 bits IEEE 754 rounding mode for roundingdown.
round up : #2 bits IEEE 754 rounding mode for roundingup.
round nearest : #2 bits IEEE 754 rounding mode for rounding tonearest.
round zero : #2 bits IEEE 754 rounding mode for roundingtoward zero.
f2f : ∀#n.∀#m.#n bits × #2 bits → #m bits Converts an IEEE 754 floating-pointvalue of n bits to a floating-point valueof m bits, using the rounding mode givenas the second argument.
f2i : ∀#n.∀#m.#n bits × #2 bits → #m bits Converts an IEEE 754 floating-pointvalue of n bits to a two’s-complementinteger value of m bits, using therounding mode given as the secondargument.
i2f : ∀#n.∀#m.#n bits × #2 bits → #m bits Converts a two’s-complement integer ofn bits into an IEEE 754 floating-pointvalue of m bits, using the rounding modegiven as the second argument.
fadd : ∀#n.#n bits × #n bits × #2 bits → #n bits Floating-point add with explicitrounding mode.
fsub : ∀#n.#n bits × #n bits × #2 bits → #n bits Floating-point subtract with explicitrounding mode.
fdiv : ∀#n.#n bits × #n bits × #2 bits → #n bits Floating-point divide with explicitrounding mode.
fmul : ∀#n.#n bits × #n bits × #2 bits → #n bits Floating-point multiply with explicitrounding mode.
fmulx : ∀#n.∀#m.#n bits × #n bits → #m bits Exact floating-point multiply, with norounding, where m = 2n.
30 CHAPTER 5. THE MEANINGS OF THE STANDARD RTL OPERATORS
Operator (with type) Meaning
fabs : ∀#n.#n bits → #n bits Absolute value of a floating-point value.
fneg : ∀#n.#n bits → #n bits Negation of a floating-point value.
fsqrt : ∀#n.#n bits × #2 bits → #n bits Square root of a floating-point value,with explicit rounding mode.
minf : ∀#n.#n bits IEEE 754 representation of minusinfinity.
mzero : ∀#n.#n bits IEEE 754 representation of minus zero.
pinf : ∀#n.#n bits IEEE 754 representation of plus infinity.
pzero : ∀#n.#n bits IEEE 754 representation of plus zero.
Other operations, alphabetical by nameadd : ∀#n.#n bits × #n bits → #n bits Two’s-complement integer addition.
add overflows : ∀#n.#n bits × #n bits → bool Tells whether addition overflows. Ifx and y are considered as signed,two’s-complement integers,add overflows(x, y) iff x and y have thesame sign and add(x, y) has the oppositesign. Overflow of unsigned addition maybe checked using carry.
and : ∀#n.#n bits × #n bits → #n bits Bitwise and. (Boolean conjunction isconjoin.)
bit : bool → #1 bits Convert a Boolean to a 1-bit value. Maynot be primitive in all systems, e.g.,bit x = if x then 1 else 0.
bitExtract : ∀#w.∀#n.{lsb : #w bits, wide : #w bits} → #n bits
bitExtract #w #n {lsb, wide}
returns the n-bit (“narrow”) valueextracted from the w-bit value wide,with least significant bit at bit lsb ofwide. (lsb is taken to be a w-bit value,even though only ⌈log2 w⌉ bits areneeded.) It is unusual in that the widthof the value abstracted, #n, is part ofthe type to which bitExtract isspecialized, not a run-time argument.
bitInsert : ∀#w.∀#n.{lsb : #w bits, wide : #w bits} × #n bits → #w bits
bitInsert inserts into a w-bit (“wide”)value. The thing inserted is n bits(“narrow”), and the result is a new w-bitvalue.
bitTransfer : ∀#w.{dst:{lsb:#w bits,value:#w bits},src:{lsb:#w bits,value:#w bits},width:#w bits}→#w bits
31
Operator (with type) Meaning
There are instructions that manipulatebit fields whose widths aren’t knownstatically. These computations can’t beexpressed with bitInsert and slicing,because we can’t assign a type (width)to the results. But we can definebitTransfer as a combinationextraction and insertion, and in fact, thisis the way such instructions must behave.Even an instruction that extracts k bitsfrom a 32-bit word, where k is notknown until run time, must still insertthe result into some value of knownsize—typically a 32-bit word of all zeros.
bool : #1 bits → bool Convert a 1-bit value to Boolean. Maynot be primitive in all systems, e.g.,bool x ≡ (x 6= 0).
borrow : ∀#n.#n bits × #n bits × #1 bits → #1 bits The bit borrow(x, y, bin) is the “borrowout” bit needed when computing thesubtraction x− y − bin, where x and y
are two’s-complement n-bit integers, andbin is the “borrow in” bit.
carry : ∀#n.#n bits × #n bits × #1 bits → #1 bits The bit carry(x, y, cin) is the “carryout” bit produced by adding x + y + cin,where x and y are two’s-complementn-bit integers, and cin is the “carry in”bit.
com : ∀#n.#n bits → #n bits Bitwise complement of a vector of n bits.(The operator not is used forcomplementing Booleans.)
conjoin : bool × bool → bool Boolean conjunction. May not beprimitive in all systems, e.g.,x conjoin y ≡ if x then y else false.
disjoin : bool × bool → bool Boolean disjunction. May not beprimitive in all systems, e.g.,x disjoin y ≡ if x then true else y.
div : ∀#n.#n bits × #n bits → #n bits Divides an n-bit, two’s-complement,signed integer by an n-bit,two’s-complement, signed integer,rounding toward minus infinity, andproducing an n-bit result.
32 CHAPTER 5. THE MEANINGS OF THE STANDARD RTL OPERATORS
Operator (with type) Meaning
divu : ∀#n.#n bits × #n bits → #n bits Divides an n-bit, unsigned integer by ann-bit, unsigned integer, rounding down(toward both zero and minus infinity),and producing an n-bit result.
eq : ∀#n.#n bits × #n bits → bool Equals. Compares bit vectors forequality.
ge : ∀#n.#n bits × #n bits → bool Greater than or equal. Comparestwo’s-complement signed integers.
geu : ∀#n.#n bits × #n bits → bool Greater than or equal, unsigned.Compares unsigned integers.
gt : ∀#n.#n bits × #n bits → bool Greater than. Comparestwo’s-complement signed integers.
gtu : ∀#n.#n bits × #n bits → bool Greater than, unsigned. Comparesunsigned integers.
le : ∀#n.#n bits × #n bits → bool Less than or equal. Comparestwo’s-complement signed integers.
leu : ∀#n.#n bits × #n bits → bool Less than or equal, unsigned. Comparesunsigned integers.
lt : ∀#n.#n bits × #n bits → bool Less than. Compares two’s-complementsigned integers.
ltu : ∀#n.#n bits × #n bits → bool Less than, unsigned. Compares unsignedintegers.
mod : ∀#n.#n bits × #n bits → #n bits Signed modulus, satisfyingx mod y = x− y mul trunc (x div y) forall y 6= 0.
modu : ∀#n.#n bits × #n bits → #n bits Unsigned modulus, satisfyingx mod y = x− y mul trunc (x divu y) forall y 6= 0.
mul : ∀#n.∀#m.#n bits × #n bits → #m bits Multiply two n-bit, two’s-complement,signed integers to produce an m-bit,two’s complement, signed result, wherem = 2n.
mul overflows : ∀#n.#n bits × #n bits → bool Tells whether signed multiplicationoverflows n bits. If x and y areconsidered as signed, two’s-complementintegers, mul overflows(x, y) iff any ofthe most significant n bits of mul(x, y)differs from bit n− 1 of mul(x, y).
mulu : ∀#n.∀#m.#n bits × #n bits → #m bits Multiply two n-bit, unsigned integers toproduce an m-bit, unsigned result, wherem = 2n.
33
Operator (with type) Meaning
mulu overflows : ∀#n.#n bits × #n bits → bool Tells whether unsigned multiplicationoverflows n bits. If x and y areconsidered as unsigned integers,mul overflows(x, y) iff any of the mostsignificant n bits of mul(x, y) is nonzero.
mul trunc : ∀#n.∀#m.#n bits × #n bits → #n bits Multiply two n-bit integers to produce a2n-bit result and return only the leastsignificant n bits. Could be written usingmul and bitExtract. We don’t needseparate signed and unsigned versionsbecause they are identical in the leastsignificant n bits. For a proof, writex = x0..n−1 + s · xn−1 · 2
n−1 and similarlyfor y, where s is the sign bit and is ±1.Then show that (x · y) mod 2n isindependent of s. The only tricky bit isto realize that for any k,−s · k · 2n−1 = s · k · 2n−1 (mod 2n).
ne : ∀#n.#n bits × #n bits → bool Comparison of bit vectors (inequality).
neg : ∀#n.#n bits → #n bits Two’s-complement negation.
not : bool → bool Logical complement
or : ∀#n.#n bits × #n bits → #n bits Bitwise or. (Boolean disjunction isdisjoin.)
popcnt : ∀#n.#n bits → #n bits Computes the number of 1 bits in itsargument.
quot : ∀#n.#n bits × #n bits → #n bits Divides an n-bit, two’s-complement,signed integer by an n-bit,two’s-complement, signed integer,rounding toward zero, and producing ann-bit result.
rem : ∀#n.#n bits × #n bits → #n bits Signed remainder, satisfyingx rem y = x− y mul trunc (x quot y) forall y 6= 0.
34 CHAPTER 5. THE MEANINGS OF THE STANDARD RTL OPERATORS
Operator (with type) Meaning
rotl : ∀#n.#n bits × #n bits → #n bits Rotate left. rotl(x, d) shifts value x
d bits to the left, shifting in the mostsignificant d bits of x at the bottom.Or, more precisely, the least significantd bits of the result are the mostsignificant d bits of x, and the remainingbits are the least significant n− d bitsof x. (When d >= n, read d mod n for d
and (n− d) mod n for n− d.) Can beexpressed using bitTransfer, but theequation is not for the faint of heart—then in #n must be part of the equation.
rotr : ∀#n.#n bits × #n bits → #n bits Rotate right. rotr(x, d) shifts value x
d bits to the right, shifting in the leastsignificant d bits of x at the top. Or,more precisely, the most significantd bits of the result are the leastsignificant d bits of x, and the remainingbits are the most significant n− d bitsof x. (When d >= n, read d mod n for d
and (n− d) mod n for n− d.) Can alsobe expressed using bitTransfer.
shl : ∀#n.#n bits × #n bits → #n bits Shift left. shrl(x, d) shifts value x d bitsto the left, shifting in zeroes at thebottom Or, more precisely, the leastsignificant d bits of the result are zero,and the remaining bits are taken fromthe least significant n− d bits of x. (Ifd ≥ n, the result is zero.) Or,alternatively,shl(x, d) = (x mulu 2d) modu 2n. Can alsobe expressed using bitTransfer.
shra : ∀#n.#n bits × #n bits → #n bits Shift right, arithmetic. shrl(x, d) shiftsvalue x d bits to the right, shifting incopies of x’s sign bit at the top. Or,more precisely, the most significantd bits of the result are copies of x’s mostsignificant bit, and the remaining bitsare taken from the most significantn− d bits of x. (If d ≥ n, the result is allzeroes or all ones.) Or, alternatively,shra(x, d) = x divs 2d. Can also beexpressed using bitTransfer.
5.1. KNOWN ALGEBRAIC LAWS 35
Operator (with type) Meaning
shrl : ∀#n.#n bits × #n bits → #n bits Shift right, logical. shrl(x, d) shiftsvalue x d bits to the right, shifting inzeros at the top. Or, more precisely, themost significant d bits of the result arezeroes, and the remaining bits are takenfrom the most significant n− d bits of x.(If d ≥ n, the result is zero.) Or,alternatively, shrl(x, d) = x divu 2d.Can also be expressed usingbitTransfer.
sub : ∀#n.#n bits × #n bits → #n bits Two’s-complement integer subtraction.
sub overflows : ∀#n.#n bits × #n bits → bool Tells whether substraction overflows. Ifx and y are considered as signed,two’s-complement integers,sub overflows(x, y) iff x and y haveopposite signs and sub(x, y) has thesame sign as y. Overflow of unsignedsubtraction may be checked usingborrow.
sx : ∀#n.∀#m.#n bits → #m bits Sign-extend an n-bit, two’s-complement,signed integer to produce an m-bit,two’s-complement, signed integer, bothargument and result denoting the sameinteger value when interpreted astwo’s-complement integers. Requiresm > n.
xor : ∀#n.#n bits × #n bits → #n bits Bitwise exclusive or.
zx : ∀#n.∀#m.#n bits → #m bits Zero-extend an n-bit value to produce anm-bit value, both argument and resultdenoting the same integer value wheninterpreted as unsigned integers.Requires m > n.
5.1 Known algebraic laws
Here’s a smattering of algebraic laws. The derivation of a more complete list would be worthwhile.
35 〈algebraic laws 35〉≡AC [add and or xor]
x add 0 = x
x and 0 = 0
x and (com 0) = x
36 CHAPTER 5. THE MEANINGS OF THE STANDARD RTL OPERATORS
com (com x) = x
x eq y = not (x ne y)
x ge y = not (x lt y)
x geu y = not (x ltu y)
... and so on ...
x ge y = y le x
x gt y = y lt x
x geu y = y leu y
x gtu y = y ltu x
... and so on ...
0 le x conjoin x lt y = x ltu y
not (not p) = p
neg x = add (com x, 1)
x rotl k = x rotr (neg k)
x rotr k = x rotl (neg k)
rotl #n #k (x, y) = rotl #n #k (x, y modu n)
Conditional definition.
5.2. MORE SPARC DESCRIPTION 37
5.2 More SPARC description
The preceding sections illuminate the issues involved in describing the SPARC. Here, we describe the restof the instruction set, with minimal commentary.
More locations
Naming the registers
37a 〈SPARC basics 37a〉≡ 37b ⊲
locations
[ g0 g1 g2 g3 g4 g5 g6 g7
o0 o1 o2 o3 o4 o5 o6 o7
l0 l1 l2 l3 l4 l5 l6 l7
i0 i1 i2 i3 i4 i5 i6 i7 ] is $r[[0..31]]
[sp fp] is [o6 i6]
Floating-point registers
37b 〈SPARC basics 37a〉+≡ ⊳ 37a 38a ⊲
storage
’f’ is 32 cells of 32 bits called "floating-point registers"
aggregate using RTL.AGGB
’F’ is 1 cell of 32 bits called "floating-point state register (FSR)"
locations
FSR is $F[0]
RD is FSR@loc[30..31] --- rounding modes
fcc is FSR@loc[10..11]
We have chosen to use the floating-point rounding operator as rndRD, that is, we make it a function ofthe value of the rounding-mode bits. Strictly speaking, the proper machine-independent thing to do is touse two functions, by making rnd depend on the IEEE rounding modes, then use roundingMode to computethe rounding mode from the RD bits as follows:
37c 〈SPARC basics [[full]] 37c〉≡ 37d ⊲
locations
RD is FSR@loc[30..31]
val roundingMode is [. IEEE754.Round.nearest, IEEE754.Round.zero,
IEEE754.Round.up, IEEE754.Round.down .] sub RD
Conditional definition.
Similar remarks apply to floating-point compare.
37d 〈SPARC basics [[full]] 37c〉+≡ ⊳ 37c
val floatingCompare is [. IEEE754.Comparison.=, IEEE754.Comparison.<,
IEEE754.Comparison.>, IEEE754.Comparison.unordered .] sub fcc
For compilation, we can get away with the scheme we’re using by arranging for the RTL operators denotingthe rounding modes and comparison results to have the right bit patterns for the machine we’re interestedin.
38 CHAPTER 5. THE MEANINGS OF THE STANDARD RTL OPERATORS
Coprocessor registers The number of coprocessor registers (if any) is implementation-dependent.
38a 〈SPARC basics 37a〉+≡ ⊳ 37b 43d ⊲
storage
’c’ is cells of 32 bits called "coprocessor registers (implementation-dependent)"
aggregate using RTL.AGGB
’C’ is 1 cell of 32 bits called "coprocessor status register"
locations
CSR is $C[0]
More Loads and Stores
Load Floating-point Instructions
38b 〈instruction defaults 38b〉≡ 38c ⊲
ld^[f df] (address, fd) is $f[fd] := [word dword] $m[address]
Conditional definition.
The full semantics of the ldfsr instruction is quite dramatic. Luckily it’s irrelevant for many purposes.
38c 〈instruction defaults 38b〉+≡ ⊳ 38b 38e ⊲
ldfsr (address) is FSR := $m[address]
Here’s an imaginary picture of the full semantics. RTL operators of these types are flagrantly illegal.
38d 〈wishful thinking about ldfsr 38d〉≡rtlop drain_fpu_pipeline : effect
-- ‘wait for all FPop instructions that have not finished execution
-- to complete’ (p92)
rtlop delay : #n bits * effect -> effect -- delayed completion of an effect
default attribute of
ldfsr (address) is
drain_fpu_pipeline; FSR := $m[address] | delay(3, branch_fcc := fcc)
Load coprocessor Note that the trap semantics is the same as for other loads, and they could be merged.
38e 〈instruction defaults 38b〉+≡ ⊳ 38c 38g ⊲
ld^[c dc] (address, cd) is $c[cd] := [word dword] $m[address]
ldcsr (address) is CSR := $m[address]
38f 〈trap semantics 38f〉≡ 39h ⊲
ld^[c dc] (address, cd) is alignTrap(address, [4 8])
ldcsr (address) is alignTrap(address, 4)
Store floating-point
38g 〈instruction defaults 38b〉+≡ ⊳ 38e 38h ⊲
st^[f df] (fd, address) is $m[address] := [word dword] $f[fd]
stfsr (address) is $m[address] := FSR
Store coprocessor
38h 〈instruction defaults 38b〉+≡ ⊳ 38g 39a ⊲
st^[c dc] (cd, address) is $m[address] := [word dword] $c[cd]
stcsr (address) is $m[address] := CSR
5.2. MORE SPARC DESCRIPTION 39
Load and store floating-point pair In calling sequences, it can be necessary to do loads and stores ofunaligned doubles. This can be done with a pair of load instructions that refer to the two registers of thepair needed to hold the double. Unfortunately there’s a problem: if the load instructions refer to temporaryregisters, there’s no way to guarantee that the adjacent pair of temporaries map to an adjacent pair of realregisters. To solve this problem, I define the synthetic instructions ldnext and stnext, which take a registernumber but load and store the adjacent register.
To refer to the floating-point register pair (fd, fd + 1), I simply cast $f[fd] into a 64-bit location.
39a 〈instruction defaults 38b〉+≡ ⊳ 38h 39b ⊲
ldnext(fd, address) is ($f[fd] : #64 loc)@loc[0..31] := $m[address]
stnext(fd, address) is $m[address] := ($f[fd] : #64 loc)@bits[0..31]
I must temporarily hack ldnext because I can’t generate C code that assigns to a slice. The hack must bevoided because we can’t deal with temporaries here.
39b 〈instruction defaults 38b〉+≡ ⊳ 39a 39c ⊲
-- ldnext(fd, address) is $f[fd+1] := $m[address]
More integer ALU instructions
SETHI SETHI sets the most significant 22 bits, a computation we represent by taking the val argumentand forcing the least significant 10 bits to 0.
39c 〈instruction defaults 38b〉+≡ ⊳ 39b 39d ⊲
sethi(n, rd) is $r[rd] := bitInsert {wide is n, lsb is 0} (0 : #10 bits)
Shift instructions Note there is no change to condition codes.
39d 〈instruction defaults 38b〉+≡ ⊳ 39c 39g ⊲
[sll srl sra] (rs1, reg_or_imm, rd) is
$r[rd] := [shl shrl shra]($r[rs1], reg_or_imm@bits[5 bits at 0])
Tagged add “A tag overflow occurs if bit 1 or bit 0 of either operand is nonzero, or if the additiongenerates an arithmetic overflow.” Again, this information could be left abstract.
39e 〈SPARC utilities [[simple]] 39e〉≡ 40a ⊲
rtlop tag_overflows : #32 bits * #32 bits -> bool
Conditional definition.
39f 〈SPARC utilities [[full]] 39f〉≡ 40b ⊲
fun tag_overflows(left, right) is
left@bits[0..1] <> 0 orelse right@bits[0..1] <> 0
orelse add_overflows(left, right, 0)
Conditional definition.
39g 〈instruction defaults 38b〉+≡ ⊳ 39d 40d ⊲
[taddcc taddcctv] (rs1, reg_or_imm, rd) is
add_instruction(rs1, reg_or_imm, 0, rd, {set_codes is true})
39h 〈trap semantics 38f〉+≡ ⊳ 38f 40f ⊲
taddcctv (rs1, reg_or_imm, rd) is
(tag_overflows($r[rs1], reg_or_imm) orelse add_overflows($r[rs1], reg_or_imm, 0)
--> trap(tag_overflow))
40 CHAPTER 5. THE MEANINGS OF THE STANDARD RTL OPERATORS
Subtract Similar stuff. The near-duplication with the machinery for addition is unfortunate.
40a 〈SPARC utilities [[simple]] 39e〉+≡ ⊳ 39e
rtlop sub_overflows : #32 bits * #32 bits * #1 bits -> bool
40b 〈SPARC utilities [[full]] 39f〉+≡ ⊳ 39f
fun sub_overflows (x, y, b) is
let val {result, borrow} is subtract(x, y, b)
in x@bits[31] <> y@bits[31] andalso x@bits[31] <> result@bits[31]
end
40c 〈SPARC utilities 40c〉≡ 41b ⊲
fun subtract_instruction (rs1, operand2, borrow, rd, {set_codes}) is
let val {result, borrow is borrow’} is subtract($r[rs1], operand2, borrow)
in $r[rd] := result
| set_codes -->
set_cc(result, bit(sub_overflows($r[rs1], operand2, borrow)), borrow’)
end
Conditional definition.
The explicit fetch is here because of a flaw in the type-inference algorithm currently used by λ-RTL. Thisalgorithm is slated for replacement.
40d 〈instruction defaults 38b〉+≡ ⊳ 39g 40e ⊲
[sub subcc subx subxcc] (rs1, reg_or_imm, rd) is
subtract_instruction(rs1, reg_or_imm, [0 carryBit], rd, {set_codes is [false true]})
Tagged subtract
40e 〈instruction defaults 38b〉+≡ ⊳ 40d 41c ⊲
[tsubcc tsubcctv] (rs1, reg_or_imm, rd) is
subtract_instruction(rs1, reg_or_imm, 0, rd, {set_codes is true})
40f 〈trap semantics 38f〉+≡ ⊳ 39h 43a ⊲
tsubcctv (rs1, reg_or_imm, rd) is
(tag_overflows($r[rs1], reg_or_imm) orelse sub_overflows($r[rs1], reg_or_imm, 0)
--> trap(tag_overflow))
5.2. MORE SPARC DESCRIPTION 41
Multiply step Here’s a synopsis of the semantics in the manual.
1. Establish the multiplier.
2. Compute a value by shifting N xor V into $r[rs1].
3. Add either the value or 0 to the multiplier.
4. Put the sum in $r[rd].
5. Update the condition codes according to the addition.
6. Update the Y register by shifting in the least significant bit of the original $r[rs1].
We use an auxiliary shiftIn function to shift a bit into a 32-bit register.
41a 〈instruction defaults [[full]] 41a〉≡ 42a ⊲
mulscc(rs1, reg_or_imm, rd) is
let fun shiftIn (n, bit) is bitInsert {wide is shrl(n, 1), lsb is 31} bit
val multiplier is reg_or_imm -- step 1
val v2 is shiftIn($r[rs1], xor(icc.N, icc.V)) -- step 2
val add_args is
(multiplier, if Y@bits[0] = 0 then 0 else v2 fi, 0) -- step 3
val {result, carry} is add add_args
in $r[rd] := result | -- step 4
set_cc(result, bit(add_overflows add_args), carry) | -- step 5
Y := shiftIn(Y, $r[rs1]@bits[0]) -- step 6
end
Conditional definition.
Divide instructions We haven’t attempted to specify the overflow semantics exactly—see page 116 ofthe V8 manual. Instead, we introduce machine-dependent RTL operators to stand for this semantics.
41b 〈SPARC utilities 40c〉+≡ ⊳ 40c
rtlop [sparc_udiv_overflow sparc_sdiv_overflow] : #64 bits * #32 bits -> #1 bits
fun divide({signed}, rs1, r_o_i, rd, {set_codes}) is
let val (operator, overflow) is if signed then ((quot), sparc_sdiv_overflow)
else ((divu), sparc_udiv_overflow)
fi
val result is (operator(Reg64.get rs1, zx r_o_i))@bits[32 bits at 0] -- ??
val V is overflow(Reg64.get rs1, r_o_i)
in $r[rd] := result | set_codes --> set_cc (result, V, 0)
end
41c 〈instruction defaults 38b〉+≡ ⊳ 40e 42c ⊲
[u s]^div^["" cc] (rs1, reg_or_imm, rd) is
divide({signed is [false true]}, rs1, reg_or_imm, rd, {set_codes is [false true]})
Extra parentheses are needed because the division operators are normally infix.
42 CHAPTER 5. THE MEANINGS OF THE STANDARD RTL OPERATORS
More control flow
Branch on floating condition codes To define the floating-point branches, we emulate the specificationin the manual. One day we’ll define extra attributes to cope with the fact that a delay is required betweensetting condition codes and testing them with this sort of instruction.
42a 〈instruction defaults [[full]] 41a〉+≡ ⊳ 41a
fb^[a n u g ug l ul lg ne e ue ge uge le ule o] (target, annul) is
let val [L E G U] is
fcc = [(IEEE754.Comparison.<) (IEEE754.Comparison.=)
(IEEE754.Comparison.>) (IEEE754.Comparison.unordered)]
val or is (orelse)
infixl 2 or
in [true false U G
(G or U) L (L or U) (L or G)
(L or G or U) E (E or U) (E or G)
(E or G or U) (E or L) (E or L or U) (E or L or G)
] --> nPC := target
end
If we don’t want to see the details of the tests, we can make them abstract RTL operators.
42b 〈instruction defaults [[simple]] 42b〉≡fb^[a n u g ug l ul lg ne e ue ge uge le ule o] (target, annul) is
let rtlop [fba fbn fbu fbg fbug fbl fbul fblg
fbne fbe fbue fbge fbuge fble fbule fbo] : #2 bits -> bool
in [fba fbn fbu fbg fbug fbl fbul fblg fbne fbe fbue fbge fbuge fble fbule fbo] fcc
--> nPC := target
end
Conditional definition.
Branch on coprocessor condition codes Omitted. (It’s more of the same.)
Return from trap rett is slightly different from restore in that there’s no destination register, so noneof the asssignments need be guarded.
42c 〈instruction defaults 38b〉+≡ ⊳ 41c 42d ⊲
rett (address) is
let fun restoreOut n is Reg.out n := Reg.in’ n
fun restoreLocal n is Reg.local’ n := $w[winptr-8 +zx n]
fun restoreIn n is Reg.in’ n := $w[winptr-16+zx n]
in do8 restoreOut | do8 restoreLocal | do8 restoreIn | winptr := winptr - 16 |
S := PS | ET := 1 | nPC := address
end
Trap on integer condition codes The traps use the same conditional tests as the integer branch in-structions.
42d 〈instruction defaults 38b〉+≡ ⊳ 42c 43b ⊲
t^[a n ne e g le ge l gu leu cc cs pos neg vc vs] (address) is RTL.SKIP
5.2. MORE SPARC DESCRIPTION 43
43a 〈trap semantics 38f〉+≡ ⊳ 40f 43g ⊲
t^[a n ne e g le ge l gu leu cc cs pos neg vc vs] (address) is
let val t is IccTest.tests
in [t.A t.N t.NE t.E t.G t.LE t.GE t.L
t.GU t.LEU t.CC t.CS t.POS t.NEG t.VC t.VS
] --> trap(trap_instruction address@bits[7 bits at 0])
end
State Registers
Read State Register
43b 〈instruction defaults 38b〉+≡ ⊳ 42d 43c ⊲
rd^[y psr wim tbr] (rd) is $r[rd] := [Y PSR WIM TBR]
Write State Register
43c 〈instruction defaults 38b〉+≡ ⊳ 43b 43e ⊲
wr^[y psr wim tbr] (rs1, reg_or_imm, rd) is [Y PSR WIM TBR] := xor($r[rs1], reg_or_imm)
Miscellaneous instructions
Store Barrier STBAR can’t really be modeled by an ordinary effect. Instead, we do just what the manualdoes, which is model it as an assignment to a special bit in the processor state.
43d 〈SPARC basics 37a〉+≡ ⊳ 38a
storage ’b’ is 1 cell of 1 bit called "store-barrier-pending flag"
locations store_barrier_pending is $b[0]
43e 〈instruction defaults 38b〉+≡ ⊳ 43c 43f ⊲
stbar () is store_barrier_pending := 1
Unimplemented instruction
43f 〈instruction defaults 38b〉+≡ ⊳ 43e 44a ⊲
unimp (imm22) is RTL.SKIP
43g 〈trap semantics 38f〉+≡ ⊳ 43a
unimp(imm22) is trap(illegal_instruction)
Floating point
The SPARC uses IEEE floating point, as imported from StdOperators.IEEE754
44 CHAPTER 5. THE MEANINGS OF THE STANDARD RTL OPERATORS
Conversions Because the conversion operators can change the sizes as well as the formats of their argu-ments, it’s easiest to use the casting functions defined earlier.
44a 〈instruction defaults 38b〉+≡ ⊳ 43f 44b ⊲
f ^[s d q]^toi (fs2, fd) is $f[fd] := f2i([word dword qword] $f[fs2],
IEEE754.Round.zero)
fito^[s d q] (fs2, fd) is $f[fd] := [word dword qword] (i2f($f[fs2], RD))
fsto^[d q] (fs2, fd) is $f[fd] := [dword qword] (f2f(word $f[fs2], RD))
fdto^[s q] (fs2, fd) is $f[fd] := [word qword] (f2f(dword $f[fs2], RD))
fqto^[s d] (fs2, fd) is $f[fd] := [word dword] (f2f(qword $f[fs2], RD))
Moves
44b 〈instruction defaults 38b〉+≡ ⊳ 44a 44c ⊲
fmovs (fs2, fd) is $f[fd] := $f[fs2]
fnegs (fs2, fd) is $f[fd] := fneg $f[fs2]
fabss (fs2, fd) is $f[fd] := fabs $f[fs2]
Square root Here, it’s easiest just to instantiate the square-root operator with the proper size.
44c 〈instruction defaults 38b〉+≡ ⊳ 44b 44d ⊲
fsqrt^[s d q] (fs2, fd) is $f[fd] := fsqrt [#32 #64 #128] ($f[fs2], RD)
Arithmetic The SPARC manual is silent about whether the results of addition and subtraction arerounded. We assume they are.
44d 〈instruction defaults 38b〉+≡ ⊳ 44c 44e ⊲
f^[add sub mul div]^[s d q] (fs1, fs2, fd) is
$f[fd] := [fadd fsub fmul fdiv] [#32 #64 #128] ($f[fs1], $f[fs2], RD)
The “exact multiply” instructions do not round, and it is easiest to give the size of arguments and resultsexplicitly.
44e 〈instruction defaults 38b〉+≡ ⊳ 44d 44f ⊲
fsmuld (fs1, fs2, fd) is $f[fd] := fmulx #32 #64 ($f[fs1], $f[fs2])
fdmulq (fs1, fs2, fd) is $f[fd] := fmulx #64 #128 ($f[fs1], $f[fs2])
Comparison The fcmp* and fcmpe* family have the same standard semantics, but different trap seman-tics. We haven’t specified the trap semantics yet.
44f 〈instruction defaults 38b〉+≡ ⊳ 44e
fcmp ^[s d q] (fs1, fs2) is fcc := fcmp [#32 #64 #128] ($f[fs1], $f[fs2])
fcmpe^[s d q] (fs1, fs2) is fcc := fcmp [#32 #64 #128] ($f[fs1], $f[fs2])
Chapter 6
Describing the Intel Pentium
This chapter shows how we use λ-RTL to describe some aspects of the Pentium instruction set that don’thave obvious analogs on the SPARC. Most Pentium instructions come in byte, word (16-bit), and doublewordvariants, and these variants treat parts of the Pentium registers as first-class locations. We show how towork with these locations and how to give them their usual names. A single effective address can refer todifferent parts of a register depending on the opcode with which it is used, so we specify the meanings ofeffective addresses as triples from which one element is selected depending on context.
By a mixture of opcodes and instruction prefixes, the Pentium offers many variants of each basic operation,so factoring the specification is a concern. We use the logical instructions to illustrate aggressive factoring,different notations for instructions that implement binary operations, and a slightly different treatment ofcondition codes.
45
46 CHAPTER 6. DESCRIBING THE INTEL PENTIUM
The large-scale structure of the Pentium description resembles that of the SPARC description, except thatwe add some examples that use infix and.
46a 〈intel.rtl 46a〉≡module Pentium is
import [RTL StdOperators]
from StdOperators import
[ --> + - := bit < > <= >= | = <> ?
add and bitInsert conjoin disjoin div divu mod modu mul mulu or popcnt
xor com shrl shl subtract sx zx
]
infixl 1.2 conjoin
infixl 1.1 disjoin
val * is mul
infixl 7 *
〈basics 46b〉〈new RTL operators 58f〉〈effective-address utilities 49b〉〈effective addresses 50a〉val [OF CF AF SF ZF PF] is [Reg.OF Reg.CF Reg.AF Reg.SF Reg.ZF Reg.PF]
〈utilities 49a〉local infixl 3 and
in default attribute of 〈instruction defaults using infix and 52c〉end
default attribute of 〈instruction defaults 55c〉end
6.1 Storage spaces
We begin with the basics—registers and memory. The Intel registers require no special fetch or store methods.The machine uses little-endian byte order.
46b 〈basics 46b〉≡ (46a) 46c ⊲
storage
’r’ is 8 cells of 32 bits called "registers"
’m’ is cells of 8 bits called "memory" aggregate using RTL.AGGL
Registers
In the Intel architecture, parts of registers have their own names. Figure 6.1 shows the various parts of thegeneral-purpose registers and the names used to refer to them. We put the definitions for the registers intotheir own λ-RTL sub-structure so we can more easily manage the name space.
46c 〈basics 46b〉+≡ (46a) ⊳ 46b 51c ⊲
module Reg is
〈registers 48a〉end
6.1. STORAGE SPACES 47
Register
AX︷ ︸︸ ︷
0 AH AL32 16 8 0
︸ ︷︷ ︸
EAXCX
︷ ︸︸ ︷
1 CH CL32 16 8 0
︸ ︷︷ ︸
ECXDX
︷ ︸︸ ︷
2 DH DL32 16 8 0
︸ ︷︷ ︸
EDXBX
︷ ︸︸ ︷
3 BH BL32 16 8 0
︸ ︷︷ ︸
EBX
4 SP32 16 0
︸ ︷︷ ︸
ESP
5 BP32 16 0
︸ ︷︷ ︸
EBP
6 SI32 16 0
︸ ︷︷ ︸
ESI
7 DI32 16 0
︸ ︷︷ ︸
EDI
Figure 6.1: Intel Pentium registers
48 CHAPTER 6. DESCRIBING THE INTEL PENTIUM
It is straightforward to define these locations in λ-RTL.
48a 〈registers 48a〉≡ (46c) 48b ⊲
locations
[ EAX ECX EDX EBX ESP EBP ESI EDI ] is $r[[0..7]]
-- EAX is $r[0], etc...
[ AX CX DX BX SP BP SI DI ] is $r[[0..7]] @ loc [0..15]
[ AL CL DL BL ] is [EAX ECX EDX EBX] @ loc [8 bits at 0]
[ AH CH DH BH ] is [EAX ECX EDX EBX] @ loc [8 bits at 8]
Dealing with the instruction encoding is more complex. Registers are referred to by number, but as canbe seen from Figure 6.1, the number doesn’t uniquely determine the location. In fact, a register numbermaps to one of three locations, depending on context. These three arrays specify which register, or partthereof, is denoted by which number in which context:
48b 〈registers 48a〉+≡ (46c) ⊳ 48a 48c ⊲
val byte is [. AL, CL, DL, BL, AH, CH, DH, BH .]
val word is [. AX, CX, DX, BX, SP, BP, SI, DI .]
val dword is [. EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI .]
To find out which location is denoted by a register number, we use the number to index the array chosen torepresent the proper context.
Status and Control Registers
The Pentium has two 32-bit status and control registers, EFLAGS and EIP. EIP is the instruction pointer.EFLAGS is a collection of 1-bit status flags and 1- or 2-bit control and system flags.
48c 〈registers 48a〉+≡ (46c) ⊳ 48b 48d ⊲
storage
’c’ is 2 cells of 32 bits called "control registers"
locations
[ EFLAGS EIP ] is $c[[0..1]]
What are commonly called condition codes are referred to as status flags in Intel documentation. ThePentium has six.
48d 〈registers 48a〉+≡ (46c) ⊳ 48c
locations
[ CF PF AF ZF SF OF ] is EFLAGS @ loc [1 bit at [0 2 4 6 7 11]]
-- carry parity auxiliary-carry zero sign overflow
6.1. STORAGE SPACES 49
Most instructions set flags in stylized ways, as shown in Table 3-2 on page 3-14 of the Pentium manual(Intel 1993). In particular, the values of SF, ZF, and PF are determined by simple tests of the result of anoperation. We capture this common semantics in the λ-RTL function set flags. OF, AF, and CF are setbased on conditions that are different for different kinds of instructions, so we pass their values to set flags.We pass a record, not a tuple, to set flags, so the order in which values are given does not matter.
49a 〈utilities 49a〉≡ (46a) 52a ⊲
fun parity n is (popcnt n)@bits[0] -- (number of bits set) mod 2
fun set_flags {result, o, a, c} is
SF := bit (result < 0) |
ZF := bit (result = 0) |
PF := bit (parity (result@bits[0..7]) = 0) |
OF := o | AF := a | CF := c
Defining the other flags within EFLAGS is straightforward, and they are omitted from this example.
Effective addresses
Within an effective address, a register number may denote one of three locations, depending on context. Wedefine functions regb, regw, and regd to be used in these contexts.
49b 〈effective-address utilities 49b〉≡ (46a) 51b ⊲
---- now hack me!
fun hacksub8 (a, n) is
if n = 0 then a sub 0
else if n = 1 then a sub 1
else if n = 2 then a sub 2
else if n = 3 then a sub 3
else if n = 4 then a sub 4
else if n = 5 then a sub 5
else if n = 6 then a sub 6
else a sub 7
fi fi fi fi fi fi fi
val (sub) is (hacksub8)
infixn 0 [sub hacksub8]
fun regb n is Reg.byte sub n -- returns an 8-bit location
fun regw n is Reg.word sub n -- returns a 16-bit location
fun regd n is Reg.dword sub n -- returns a 32-bit location
50 CHAPTER 6. DESCRIBING THE INTEL PENTIUM
The context is usually determined by the opcode, not within the effective address itself. We would never-theless like an effective address to have a denotation (a default attribute) independent of any context. Oursolution to this problem is to let the denotation be a record with three elements—one for each context. TheR function creates such a record from a register number. We name the elements b, w, and d to stand for thebyte, word, and doubleword locations.
50a 〈effective addresses 50a〉≡ (46a) 50b ⊲
fun R n is {b is regb n, w is regw n, d is regd n}
An address in memory also stands for one of three locations, depending on context. (Because a locationincludes size as well as address, we get one byte, two bytes, or four bytes, all at the same address.) TheM function does for memory what R does for registers: turn a number into a collection of locations, one ofwhich will be used based on context. We use type casts to give the sizes of the locations.
50b 〈effective addresses 50a〉+≡ (46a) ⊳ 50a 50c ⊲
fun M address is { b is $m[address] : #8 loc
, w is $m[address] : #16 loc
, d is $m[address] : #32 loc
}
Finally, in order to make immediate operands look like register or memory operands, we define an I
function, which creates a record that provides the same immediate value, no matter what the context.There’s a little problem with polymorphism, in that the argument immed cannot simultaneously have types#8 bits, #16 bits, and #32 bits. We work around the problem by inserting a specious zero extension(zx). Eventually, we’ll probably need different functions for each type of immediate value.
50c 〈effective addresses 50a〉+≡ (46a) ⊳ 50b 51a ⊲
fun I immed is { b is zx immed : #8 bits
, w is zx immed : #16 bits
, d is zx immed : #32 bits
}
fun Is immed is { b is sx immed : #8 bits
, w is sx immed : #16 bits
, d is sx immed : #32 bits
}
6.1. STORAGE SPACES 51
Effective addresses denote registers or memory locations. In order to support LEA (load effective address),effective addresses for memory operands are represented by the address alone. These have constructor typeMem. The true effective addresses have constructor type addr.
51a 〈effective addresses 50a〉+≡ (46a) ⊳ 50c
operand Mem : #32 bits
operand addr : { b : #8 loc, w : #16 loc, d : #32 loc }
default attribute of
Indir (r) : Mem is regd r
Disp8 (d8, r) : Mem is sx d8 + regd r
Disp32 (d32, r) : Mem is d32 + regd r
Index ( base, index, ss) : Mem is regd base + regd index << ss
Index8 (d8, base, index, ss) : Mem is regd base + sx d8 + regd index << ss
Index32 (d32, base, index, ss) : Mem is regd base + d32 + regd index << ss
ShortIndex (d8, index, ss) : Mem is sx d8 + regd index << ss
Abs32 (a) : Mem is a
E (Mem) : addr is M Mem
Reg (r) : addr is R r
We define C notation for 32-bit shifts.
51b 〈effective-address utilities 49b〉+≡ (46a) ⊳ 49b
val [>> <<] is [shrl shl]
infixl 8 [>> <<]
We can’t have constructors without naming the types of the operands. Here are the types recoveredautomatically from the SLED specification.
51c 〈basics 46b〉+≡ (46a) ⊳ 46c 51d ⊲
operand ss : #2 bits
operand [ base index r16 r32 r8 sr16] : #3 bits
operand i8 : #8 bits
operand i16 : #16 bits
operand i32 : #32 bits
And here are additional types written in by hand.
51d 〈basics 46b〉+≡ (46a) ⊳ 51c 51e ⊲
operand d8 : #8 bits
operand [a d32] : #32 bits
operand [l r reg] : #3 bits
Traps
A simple, cheezy model.
51e 〈basics 46b〉+≡ (46a) ⊳ 51d
module Trap is
val [DE DB NMI DP OF BR UD NM DF _ TS NP SS GP PF _ MF AC MC] is [0..18]
storage ’t’ is 1 cell of 8 bits called "trap code"
locations trap_code is $t[0]
fun trap k is $t[0] := k
val exn is trap
end
52 CHAPTER 6. DESCRIBING THE INTEL PENTIUM
Instructions
The Pentium architecture provides more opportunities for factoring than does the SPARC. Most instructionscome in three size variants, and each variant uses one of the three contexts. These variants would be tedious tospecify repetitiously, so we use λ-RTL’s higher-order functions and generators to reduce repetition. The restof this chapter offers several alternative ways to handle this kind of variation, using the logical instructionsas a running example.
A very common pattern of execution is the “two-operand instruction,” which we can write l ← l ⊕ r
where ⊕ is a generic binary operator, l is an effective address dependent on context, and r could be acontext-dependent effective address or an immediate value that has been “put in context” by the I function.
The λ-RTL function llr embodies this pattern of computation. It takes two (curried) arguments: thetriple (l,⊕, r), and a size function that determines the context by choosing the b, w, or d element from eachoperand record. Actually, we wind up making size a pair of functions (sl and sr), because Milner’s typeinference can’t infer a polymorphic type for a parameter, and therefore using a single size function wouldforce both left and right to have the same type. We don’t want that, because we want freedom for rightto be either a location like left or for it to be a value.
52a 〈utilities 49a〉+≡ (46a) ⊳ 49a 54b ⊲
fun llr (left, op, right) (size as (sl, sr)) is sl left := op((sl left), sr right)
fun [b w d] {b, w, d} is [b w d] -- b, w, d select correct size from record
val [b w d] is [(b,b) (w,w) (d,d)] -- b, w, d renamed to pairs
The function pairs b, w, and d are used below as size arguments.
Logical instructions
The Pentium has 42 instructions that implement the three logical operations AND, OR, and XOR. (λ-RTLcounts different size variants as different instructions.) All these instructions have side effects on the statusflags (condition codes) as well as the l ← l⊕ r effect. This section shows progressively more aggressive waysto use λ-RTL to describe such large groups of similar instructions.
In λ-RTL, one can declare any identifier to be an infix operator. Infix operators can be convenient whenspecifying single instructions, as shown in the specification of these three variants of AND, which operate onthe A register.
52b 〈junk: instruction defaults using infix and 52b〉≡ 53a ⊲
ANDiAL (i8) is Reg.AL := Reg.AL and i8
ANDiAX (i16) is Reg.AX := Reg.AX and i16
ANDiEAX (i32) is Reg.EAX := Reg.EAX and i32
Infix notation is less convenient when describing groups of instructions. Here we use the llr functiondefined above to define some variants that work on two or three sizes of values. The parentheses around(and) convert it from an infix operator to an ordinary value that can be passed as a parameter to llr.
52c 〈instruction defaults using infix and 52c〉≡ (46a)
ANDi ^[b w d] (addr, i) is llr (addr, (and), I i) [b w d]
ANDio^[w d]^b (addr, i) is llr (addr, (and), I (sx i)) [ w d]
ANDmr^[b ow od] (addr, reg) is llr (addr, (and), R reg) [b w d]
ANDrm^[b ow od] (addr, reg) is llr (R reg, (and), addr) [b w d]
6.1. STORAGE SPACES 53
The I and R functions create three-element records that are selected from by the size functions also passedto llr.1 Groups in square brackets are used to define multiple variants at once. Each variant uses either b,w, or d as a size function.
Note that the parameter addr also represents a three-element record. This type is inferred.The llr function helps factor the specification, but the ordinary function-application syntax is not very
evocative of what is going on. Here, we use the infix operators <==, <, and > with non-standard meanings,creating a sort of “mixfix” notation that is more suggestive of the semantics:
dst <== l <⊕> r.
We use the notation first, then define it.
53a 〈junk: instruction defaults using infix and 52b〉+≡ ⊳ 52b
ANDiAL (i8) is Reg.AL := Reg.AL and i8
ANDiAX (i16) is Reg.AX := Reg.AX and i16
ANDiEAX (i32) is Reg.EAX := Reg.EAX and i32
ANDi^[b w d](addr, i) is bin (addr <== addr <(and)> I i) [b w d]
ANDio^[ w d]^b (addr, i) is bin (addr <== addr <(and)> I (sx i)) [ w d]
ANDmr^[b ow od] (addr, reg) is bin (addr <== addr <(and)> R reg) [b w d]
ANDrm^[b ow od] (addr, reg) is bin (R reg <== R reg <(and)> addr) [b w d]
We implement the notation by defining <==, <, and > to build up records, then defining bin to createthe effect describing the assignment. Since < and > have existing definitions from Chapter 4, we save thosedefinitions in lt and gt.
53b 〈junk: utilities 53b〉≡nonfix [< >]
val [lt gt] is [< >] -- save these for later
fun < (l, operator) is { l is l, operator is operator }
fun > ({l, operator}, r) is { l is l, operator is operator, r is r }
fun <== (dst, {l, operator, r}) is { dst is dst, l is l, operator is operator, r is r}
infixn ~10 <
infixn ~11 >
infixn ~12 <==
fun bin {dst, l, operator, r} (sdl, sr) is sdl dst := operator (sdl l, sr r)
Note that dst and l must have the same type. This restriction is just fine, since on the Pentium they alwayshave the same value—the reason both dst and l are parameters is purely notational.
1N.B. we probably are going to have to split up the immediate instructions because the types of their second operands aredifferent.
54 CHAPTER 6. DESCRIBING THE INTEL PENTIUM
It turns out that OR and XOR have the same structure as AND, so we factor the description to includethose instructions as well. Once we do that, there’s no point in making the operators infix.
54a 〈junk: instruction defaults 54a〉≡ 55a ⊲
[AND OR XOR]^iAL (i8) is Reg.AL := [and or xor] (Reg.AL, i8)
[AND OR XOR]^iAX (i16) is Reg.AX := [and or xor] (Reg.AX, i16)
[AND OR XOR]^iEAX (i32) is Reg.EAX := [and or xor] (Reg.EAX, i32)
[AND OR XOR]^i^[b w d](addr, i) is
bin (addr <== addr <[and or xor]> I i) [b w d]
[AND OR XOR]^i^[ow od]^b (addr, i) is
bin (addr <== addr <[and or xor]> I (sx i)) [ w d]
[AND OR XOR]^mr^[b ow od] (addr, reg) is
bin (addr <== addr <[and or xor]> R reg) [b w d]
[AND OR XOR]^rm^[b ow od] (addr, reg) is
bin (R reg <== R reg <[and or xor]> addr) [b w d]
Now that we have machinery in place, a minor extension will enable us also to specify the effects of theseinstructions on the status flags (condition codes). Section 4.4.1 of the Pentium manual says “The AND, OR,and XOR instructions clear the OF and CF flags, leave the AF flag undefined, and update the SF, ZF, andPF flags.” The function logical flags implements this behavior.
54b 〈utilities 49a〉+≡ (46a) ⊳ 52a 54c ⊲
fun logical_flags n is set_flags {result is n, o is 0, c is 0, a is ?}
In our description of the SPARC, we defined a “main effect” function like bin and passed it a functionthat set the condition code. Here, we use a slightly different approach—we define bin’ to return both themain effect and the result of the instruction, and we pass bin’ and its arguments to a function that sets thecondition code.
54c 〈utilities 49a〉+≡ (46a) ⊳ 54b 54d ⊲
fun bin’ {dst, l, operator, r} (sdl, sr) is
let val result is operator (sdl l, sr r)
in {result is result, effect is sdl dst := result}
end
logical combines the “main effect” and the effects on the flags.
54d 〈utilities 49a〉+≡ (46a) ⊳ 54c 55b ⊲
fun logical main arg size is
let val {result, effect} is main arg size
in effect | logical_flags result
end
6.1. STORAGE SPACES 55
Here’s the final definition of 42 Pentium instructions, including their effects on the flags. We have re-castthe first three groups so we can apply logical. Because Reg.AL represents a single location, not a triple, weuse (\x.x,\x.x) (the identity function) as a “size selector.” The same trick applies to Reg.AX and Reg.EAX.
55a 〈junk: instruction defaults 54a〉+≡ ⊳ 54a
[AND OR XOR]^iAL (i8) is
logical bin’ (Reg.AL <== Reg.AL <[and or xor]> i8) (\x.x,\x.x)
[AND OR XOR]^iAX (i16) is
logical bin’ (Reg.AX <== Reg.AX <[and or xor]> i16) (\x.x,\x.x)
[AND OR XOR]^iEAX (i32) is
logical bin’ (Reg.EAX <== Reg.EAX <[and or xor]> i32) (\x.x,\x.x)
[AND OR XOR]^i^[b w d](addr, i) is
logical bin’ (addr <== addr <[and or xor]> I i) [b w d]
[AND OR XOR]^i^[ow od]^b (addr, i) is
logical bin’ (addr <== addr <[and or xor]> I (sx i)) [ w d]
[AND OR XOR]^mr^[b ow od] (addr, reg) is
logical bin’ (addr <== addr <[and or xor]> R reg) [b w d]
[AND OR XOR]^rm^[b ow od] (addr, reg) is
logical bin’ (R reg <== R reg <[and or xor]> addr) [b w d]
Finally, we can try for aggressive factoring without the infix notation. The function llr’ is curried, sothat applications won’t be cluttered with parentheses and commas, and it takes an extra argument for anadditional effect.
55b 〈utilities 49a〉+≡ (46a) ⊳ 54d 55e ⊲
fun llr’ l op r (sl, sr) extra is sl l := op(sl l, sr r) | extra (op(sl l, sr r))
fun logical’ l op r size is llr’ l op r size logical_flags
val idsize is (\x.x, \x.x)
55c 〈instruction defaults 55c〉≡ (46a) 55d ⊲
[AND OR XOR]^iAL (i8) is logical’ Reg.AL [and or xor] i8 idsize
[AND OR XOR]^iAX (i16) is logical’ Reg.AX [and or xor] i16 idsize
[AND OR XOR]^iEAX (i32) is logical’ Reg.EAX [and or xor] i32 idsize
[AND OR XOR]^i^[b w d](addr, i) is logical’ addr [and or xor] (I i) [b w d]
[AND OR XOR]^i^[ow od]^b (addr, i) is logical’ addr [and or xor] (Is i) [w d]
[AND OR XOR]^mr^[b ow od] (addr, reg) is logical’ addr [and or xor] (R reg) [b w d]
[AND OR XOR]^rm^[b ow od] (addr, reg) is logical’ (R reg) [and or xor] addr [b w d]
The remaining logical instruction, NOT, comes in only three variants, and it does not affect the conditioncodes. Even though NOT is unary, we still pass a pair of size functions, so that we can reuse b, w, and d asdefined above.
55d 〈instruction defaults 55c〉+≡ (46a) ⊳ 55c 55f ⊲
NOT^[b ow od] (addr) is unary (com, addr) [b w d]
55e 〈utilities 49a〉+≡ (46a) ⊳ 55b 56a ⊲
fun unary (op, v) (size, _) is size v := op (size v)
Load effective address
55f 〈instruction defaults 55c〉+≡ (46a) ⊳ 55d 56b ⊲
LEAod (reg, Mem) is regd reg := Mem
LEAow (reg, Mem) is regw reg := Mem @bits[16 bits at 0]
56 CHAPTER 6. DESCRIBING THE INTEL PENTIUM
6.2 Real instructions, for serious
Now following the SLED spec, more or less.The arithmetic group are the first challenge. We’ve already done the logical.Section 4.4.1 of the Pentium manual says “The AND, OR, and XOR instructions clear the OF and CF
flags, leave the AF flag undefined, and update the SF, ZF, and PF flags.” The function logical flags
implements this behavior.
56a 〈utilities 49a〉+≡ (46a) ⊳ 55e 57a ⊲
fun add_overflows (x, y, c) is
let val {result, carry} is add(x, y, c)
in bit (x < 0) = bit(y < 0) conjoin (bit(x < 0) <> bit(result < 0))
end
fun add_adjust carry_in x y is
StdOperators.carry(x@bits[0..3], y@bits[0..3], carry_in)
fun allr l op r (sl, sr) extra is sl l := op(sl l, sr r) | extra (sl l, sr r)
fun add’ carry_in adj (x, y) is
let val {result, carry} is add(x, y, carry_in)
in set_flags { result is result, a is adj x y , c is carry,
o is bit(add_overflows (x, y, carry_in)) }
end
fun adc (x, y) is (add(x, y, CF)).result
val adcx is add’ CF (add_adjust CF)
val addx is add’ 0 (add_adjust 0)
56b 〈instruction defaults 55c〉+≡ (46a) ⊳ 55f 56c ⊲
ADC^iAL (i8) is allr Reg.AL adc i8 idsize adcx
ADC^iAX (i16) is allr Reg.AX adc i16 idsize adcx
ADC^iEAX (i32) is allr Reg.EAX adc i32 idsize adcx
ADC^i^[b w d](addr, i) is allr addr adc (I i) [b w d] adcx
ADC^i^[ow od]^b (addr, i) is allr addr adc (Is i) [w d] adcx
ADC^mr^[b ow od] (addr, reg) is allr addr adc (R reg) [b w d] adcx
ADC^rm^[b ow od] (addr, reg) is allr (R reg) adc addr [b w d] adcx
Add is nearly identical, but no carry in—add still carries out.
56c 〈instruction defaults 55c〉+≡ (46a) ⊳ 56b 57b ⊲
ADD^iAL (i8) is allr Reg.AL (+) i8 idsize addx
ADD^iAX (i16) is allr Reg.AX (+) i16 idsize addx
ADD^iEAX (i32) is allr Reg.EAX (+) i32 idsize addx
ADD^i^[b w d](addr, i) is allr addr (+) (I i) [b w d] addx
ADD^i^[ow od]^b (addr, i) is allr addr (+) (Is i) [w d] addx
ADD^mr^[b ow od] (addr, reg) is allr addr (+) (R reg) [b w d] addx
ADD^rm^[b ow od] (addr, reg) is allr (R reg) (+) addr [b w d] addx
6.2. REAL INSTRUCTIONS, FOR SERIOUS 57
The add, subtract, with and without carry/borrow, and also cmp, all fit mostly the same template:
l ← l ± (r + c)|set-flags .
The bit c is either the CF bit or zero. Compare is a subtract that doesn’t set the result.
57a 〈utilities 49a〉+≡ (46a) ⊳ 56a 58c ⊲
fun sub_overflows (x, y, c) is
let val {result, borrow} is subtract(x, y, c)
in bit (x < 0) = bit(y > 0) conjoin (bit(x < 0) <> bit(result < 0))
end
fun sub_adjust carry_in x y is
StdOperators.borrow(x@bits[0..3], y@bits[0..3], carry_in)
fun sub’ asgn l r c (sl, sr) is
let val {result, borrow} is subtract(sl l, sr r, c)
in asgn --> sl l := result |
set_flags {result is result, a is sub_adjust c (sl l) (sr r), c is borrow,
o is bit(sub_overflows(sl l, sr r, c))}
end
57b 〈instruction defaults 55c〉+≡ (46a) ⊳ 56c 57c ⊲
-- arith is ADD ADC AND XOR OR SBB SUB CMP
[SUB SBB]^iAL (i8) is sub’ true Reg.AL i8 [0 CF] idsize
[SUB SBB]^iAX (i16) is sub’ true Reg.AX i16 [0 CF] idsize
[SUB SBB]^iEAX (i32) is sub’ true Reg.EAX i32 [0 CF] idsize
[SUB SBB]^i^[b w d](addr, i) is sub’ true addr (I i) [0 CF] [b w d]
[SUB SBB]^i^[ow od]^b (addr, i) is sub’ true addr (Is i) [0 CF] [w d]
[SUB SBB]^mr^[b ow od] (addr, reg) is sub’ true addr (R reg) [0 CF] [b w d]
[SUB SBB]^rm^[b ow od] (addr, reg) is sub’ true (R reg) addr [0 CF] [b w d]
CMP^iAL (i8) is sub’ false Reg.AL i8 0 idsize
CMP^iAX (i16) is sub’ false Reg.AX i16 0 idsize
CMP^iEAX (i32) is sub’ false Reg.EAX i32 0 idsize
CMP^i^[b w d](addr, i) is sub’ false addr (I i) 0 [b w d]
CMP^i^[ow od]^b (addr, i) is sub’ false addr (Is i) 0 [w d]
CMP^mr^[b ow od] (addr, reg) is sub’ false addr (R reg) 0 [b w d]
CMP^rm^[b ow od] (addr, reg) is sub’ false (R reg) addr 0 [b w d]
And now, Alphabetman.
57c 〈instruction defaults 55c〉+≡ (46a) ⊳ 57b 57d ⊲
AAA is
if and(Reg.AL, 0xf) > 9 disjoin AF = 1 then
Reg.AL := and(Reg.AL + 6, 0xf) | Reg.AH := Reg.AH + 1 | AF := 1 | CF := 1
else
Reg.AL := and(Reg.AL, 0xf) | AF := 0 | CF := 0
fi
N.B. At least on the Pentium Pro, these instructions can be used with 8-bit immediate values other than10. The specification should be changed.
57d 〈instruction defaults 55c〉+≡ (46a) ⊳ 57c 58a ⊲
AAD is Reg.AL := (mulu (Reg.AH, 10))@bits[0..7] + Reg.AL | Reg.AH := 0
AAM is Reg.AH := Reg.AL divu 10 | Reg.AL := Reg.AL modu 10
58 CHAPTER 6. DESCRIBING THE INTEL PENTIUM
58a 〈instruction defaults 55c〉+≡ (46a) ⊳ 57d 58b ⊲
AAS is
if and(Reg.AL, 0xf) > 9 disjoin AF = 1 then
Reg.AL := and(Reg.AL - 6, 0xf) | Reg.AH := Reg.AH - 1 | AF := 1 | CF := 1
else
Reg.AL := and(Reg.AL, 0xf) | AF := 0 | CF := 0
fi
58b 〈instruction defaults 55c〉+≡ (46a) ⊳ 58a 58d ⊲
ARPL (addr, r) is
if addr.w@bits[0..1] < (regw r)@bits[0..1] then
ZF := 1 | addr.w@loc[0..1] := (regw r)@bits[0..1]
else
ZF := 0
fi
In the bounds check, size and offset must be different because size is a 32-bit value and offset maybe 16 or 32 bits.
58c 〈utilities 49a〉+≡ (46a) ⊳ 57a 58e ⊲
fun bound index addr size offset is
let val (lower, upper) is ($m[addr], $m[addr+size])
in index < lower disjoin index > (upper + offset) --> Trap.exn Trap.BR
end
58d 〈instruction defaults 55c〉+≡ (46a) ⊳ 58b 58g ⊲
BOUNDow (r, Mem) is bound (regw r) Mem 2 2
BOUNDod (r, Mem) is bound (regd r) Mem 4 4
set zf sets ZF only and leaves other flags undefined. set cf is similar.
58e 〈utilities 49a〉+≡ (46a) ⊳ 58c 59a ⊲
fun set_zf z is ZF := z | SF := ? | PF := ? | OF := ? | AF := ? | CF := ?
fun set_cf z is CF := z | SF := ? | PF := ? | OF := ? | AF := ? | ZF := ?
The RTL operator lsbset n finds the smallest k such that bit k is set in n (i.e., n and 1 << k <> 0).The operator msbset works similarly but for the most significant bit.
58f 〈new RTL operators 58f〉≡ (46a)
rtlop [lsbset msbset] : #n bits -> #k bits
58g 〈instruction defaults 55c〉+≡ (46a) ⊳ 58d 58h ⊲
-- this desperately needs foreach
[BSF BSR]^[od ow] (reg, addr) is
(\scan. \(sl, sr) .
if sr addr = 0 then set_zf 1 | sl (R reg) := ?
else set_zf 0 | sl (R reg) := scan (sr addr)
fi
) [lsbset msbset] [d w]
58h 〈instruction defaults 55c〉+≡ (46a) ⊳ 58g 59c ⊲
BSWAP (reg) is
let fun b k is (regd reg)@bits[8 bits at k]
fun ins lsb byte word is bitInsert {lsb is lsb, wide is word} byte
in regd reg := ins 0 (b 24) (ins 8 (b 16) (ins 16 (b 8) (ins 24 (b 0) 0)))
end
6.2. REAL INSTRUCTIONS, FOR SERIOUS 59
The location of a bit is according to the computations in the manual (Table 11-3 in the Pentium Promanual). bitAt touches only a byte; the others touch words or doublewords.
59a 〈utilities 49a〉+≡ (46a) ⊳ 58e 59b ⊲
fun bitAt {base, offset} is
$m[base + offset div 8]@loc[1 bit at offset mod 8]
fun bit16At {base, offset} is
($m[base + 2 * (offset div 16)] : #16 loc)@loc[1 bit at offset mod 16]
fun bit32At {base, offset} is
($m[base + 4 * (offset div 32)] : #32 loc)@loc[1 bit at offset mod 32]
There are a bunch of things we could do to a bit.
59b 〈utilities 49a〉+≡ (46a) ⊳ 59a
fun bt bit is set_cf bit
fun btc bit is set_cf bit | bit := com bit
fun btr bit is set_cf bit | bit := 0
fun bts bit is set_cf bit | bit := 1
Perhaps surprisingly, when the right operand is immediate, only the low-order 3 or 5 bits are used (ac-cording to the processor manual). The number 3 (@bits[0..2]) is particularly alarming; we should test tosee if the fourth bit is used in word mode.
59c 〈instruction defaults 55c〉+≡ (46a) ⊳ 58h
-- [BT BTC BTR BTS]^ow (Reg(l), r) is [bt btc btr bts] ((regw l)@loc[1 bit at (regw r)@bits[0..3]])
-- [BT BTC BTR BTS]^od (Reg(l), r) is [bt btc btr bts] ((regd l)@loc[1 bit at (regd r)@bits[0..4]])
-- [BT BTC BTR BTS]^ow (E(a), r) is [bt btc btr bts] (bit16At {base is a, offset is regw r})
-- [BT BTC BTR BTS]^od (E(a), r) is [bt btc btr bts] (bit32At {base is a, offset is regd r})
-- [BT BTC BTR BTS]^iow (Reg(l), i8) is [bt btc btr bts] ((regw l)@loc[1 bit at i8@bits[0..3]])
-- [BT BTC BTR BTS]^iod (Reg(l), i8) is [bt btc btr bts] ((regd l)@loc[1 bit at i8@bits[0..4]])
-- [BT BTC BTR BTS]^iow (E(a), i8) is [bt btc btr bts] (bit16At {base is a, offset is i8@bits[0..2]})
-- [BT BTC BTR BTS]^iod (E(a), i8) is [bt btc btr bts] (bit32At {base is a, offset is i8@bits[0..4]})
Chapter 7
Describing the MIPS R?000
This note uses λ-RTL to specify the MIPS instruction set. Page numbers listed here come from the MIPSR4000 reference manual by Joe Heinrich.
7.1 Overview
The MIPS is a RISC architecture. Like most such architectures, it uses load and stores and register-to-register instructions. It uses delayed branches and delayed loads. Unlike the SPARC, it has (almost) nocondition codes. Users may either test and branch (see Section 7.3) or use register set instructions to capturethe effect of a test (see Section 7.3.) The one condition code the MIPS does have is Condition which isassociated with the floating point coprocessor, as described in Section ??.
7.2 Storage spaces
Storage locations manipulated by MIPS instructions include memory and several kinds of registers. Memoryin the MIPS is byte addressed. It uses either big-endian or little-endian byte order. This is selectable by theuser. We will illustrate big-endian.
60a 〈MIPS basics 60a〉≡ (70b) 60b ⊲
storage
’m’ is cells of 8 bits called "memory" aggregate using RTL.AGGB
Integer Registers
The machine’s registers are modeled as a simple collection of mutable cells. The aggregation attribute meansthat registers can be fetched and stored to in chunks larger than one register:
60b 〈MIPS basics 60a〉+≡ (70b) ⊳ 60a 61b ⊲
storage
’r’ is 32 cells of 32 bits called "registers"
〈MIPS register fetch and store methods 61a〉aggregate using RTL.AGGB
60
7.2. STORAGE SPACES 61
On the MIPS register 0 is hardwired to value zero. Dealing with register 0 is fairly simple; there are onlytwo reasonable models. One is the full semantics; the other is a model which specifies only that register 0is immutable. We make register 0 immutable by ensuring that attempts to store into it have no effect. Thefull semantics also shows that fetches return 0.
61a 〈MIPS register fetch and store methods 61a〉≡ (60b)
fetch using \n. if n = 0 then 0 else RTL.TRUE_FETCH ($r[n]) fi
store using \(n, v). n <> 0 --> RTL.TRUE_STORE ($r[n], v)
In this straightforward model of registers, we use these special fetch and store methods, and we also permitregisters to be aggregated into pairs, so we can describe instructions like .
Integer Unit control/status registers
This specification gives the names of the special registers attached to the integer unit. CR is the ConfigRegister. The BE field of CR indicates if the kernel and memory are big-endian or little-endian. The StatusRegister (SR) is also of interest because of the RE field. RE (Reverse-endianness) inverts the user modeendianness. (see page 110).
61b 〈MIPS basics 60a〉+≡ (70b) ⊳ 60b 61c ⊲
storage ’i’ is 6 cells of 32 bits called "Integer-unit control/status register"
locations
[CR SR HI LO PC nPC] is $i[[0..5]]
[RE] is SR@loc[1 bit at [25]]
[BE] is CR@loc[1 bit at [15]]
Aggregating cells in storage spaces
A reference to a cell like $m[a] normally means the cell by itself—in this case, a single byte in memorylocated at address a. But sometimes we want to talk about a 32-bit word at address a, or even a 16-bithalfword, a doubleword, or a quad-word. Often, λ-RTL can figure out what is meant just from the contextin which $m[a] is used—but when it can’t, an explicit cast can be used to make the size explicit, thus: $m[a]: #32 bits. Because the syntax of the casts can be awkward, we define functions that do the casting.
61c 〈MIPS basics 60a〉+≡ (70b) ⊳ 61b 61d ⊲
fun byte x is x : #8 bits
fun hword x is x : #16 bits
fun word x is x : #32 bits
fun dword x is x : #64 bits
we can also use some casts for floating point
61d 〈MIPS basics 60a〉+≡ (70b) ⊳ 61c 67a ⊲
fun single x is x : #32 bits
fun double x is x : #64 bits
Given these functions, the result of, e.g., dword is guaranteed to be a 64-bit value, even if dword is appliedto an 8-bit byte in memory.
62 CHAPTER 7. DESCRIBING THE MIPS R?000
7.3 MIPS instructions
Types of MIPS operands
These types have been gleaned automatically straight from the SLED specification.
62a 〈address modes and operands 62a〉≡ (70b) 62b ⊲
operand [ base fd fs ft rd rs rt shamt]: #5 bits
operand offset : #16 bits
operand reloc : #32 bits
operand breakcode: #20 bits
MIPS addresses and operands
The MIPS has a simple structure for effective addresses and operands. As far as the hardware is concerned,there is only one addressing mode, the address is the value of register base, plus the value of sign-extendinga 16-bit offset.
62b 〈address modes and operands 62a〉+≡ (70b) ⊳ 62a
fun dispA (base, offset) is $r[base] + sx offset
Load instructions
The specifications of the load-integer instructions give us our first opportunity to factor λ-RTL descriptions.On the left-hand side, the phrases [s u] and [b h] are like for loops, and the carets (^) join parts ofconstructor names, so the opcode on the left-hand side expands to the list of constructors lb, lh and lw.Corresponding to [b h w] is the expression group [byte hword word]. These functions, defined above,produce aggregations of 8, 16 and 32 bits.
62c 〈instruction defaults 62c〉≡ (70b) 63a ⊲
default attribute of
lw (rt, offset, base) is $r[rt] := $m[dispA(base,offset)]
l^[b h] (rt, offset, base) is $r[rt] := sx ([byte hword] $m[dispA(base, offset)])
Loading unaligned words
“Load word left instruction adds the sign-extended offset to the contents of base to form the virtual address.It reads bytes only from the doubleword in memory which contains the specified byte.” We implement thisby calculating a virtual address vaddr, aligning it by dropping the two least significant bits of the address(alignAddr) and making an aligned fetch into tmp:
62d 〈address twiddling and fetch 62d〉≡ (63)
let val vaddr is $r[base] + sx offset
val memword is $m[and(vaddr, com 3)] -- word containing address
Then, we use the bitTransfer operator to insert the addressed bytes into the register word. The source(src) is, of course, the fetched word. The number of bytes to transfer is Byte, which is derived from the twolow order bits (tByte) of the address. The offset into the fetched word is calculated in lsbit:
62e 〈number of bytes to shift 62e〉≡ (63a)
val tByte is vaddr@bits[0..1] -- pick out the two lsb’s of the address
val Byte is xor(tByte,3)
val lsbit is shl(Byte,3) -- Byte*8
7.3. MIPS INSTRUCTIONS 63
The destination is target register $r[rt].
63a 〈instruction defaults 62c〉+≡ (70b) ⊳ 62c 63c ⊲
lwl(rt,offset,base) is
〈address twiddling and fetch 62d〉〈number of bytes to shift 62e〉
in
$r[rt] := bitTransfer{src is {value is memword,lsb is 0},
dst is {value is $r[rt] ,lsb is lsbit},
width is 7 + lsbit}
end
Load-word right, in combination with load-word left, allows for the load of an unaligned word. It differsfrom the lwl instruction in that the information is “right aligned” in the register
63b 〈number of bytes to shift into right 63b〉≡ (63c)
val tByte is vaddr@bits[0..1] -- pick out the two lsb’s of the address
val Byte is xor(tByte,3)
val lsbit is shl(Byte,3)
63c 〈instruction defaults 62c〉+≡ (70b) ⊳ 63a 63d ⊲
lwr(rt,offset,base) is
〈address twiddling and fetch 62d〉〈number of bytes to shift into right 63b〉
in
$r[rt] := bitTransfer{src is {value is memword,lsb is 32-lsbit},
dst is {value is $r[rt] ,lsb is 0},
width is 32- lsbit}
end
Store Instructions
Stores work on bytes, halfwords and words. They can also be used to store unaligned data, but it must bedone in two parts.
It is tempting to try to factor the number of bits [8 16 32] in the store instructions, but this does notparse. Notice the use of bit extraction to show what part of the register is being used.
63d 〈instruction defaults 62c〉+≡ (70b) ⊳ 63c 64a ⊲
sb(rt,offset,base) is $m[dispA(base,offset)] := $r[rt]@bits[8 bits at 0]
sh(rt,offset,base) is $m[dispA(base,offset)] := $r[rt]@bits[16 bits at 0]
sw(rt,offset,base) is $m[dispA(base,offset)] := $r[rt]
64 CHAPTER 7. DESCRIBING THE MIPS R?000
Storing unaligned words
These are the store analogues to the unaligned loads
64a 〈instruction defaults 62c〉+≡ (70b) ⊳ 63d 65c ⊲
swr (rt, offset, base) is
let val vaddr is ($r[base] + sx offset)
val memword is $m[and(vaddr, com 3)]
val tByte is vaddr@bits[0..1] -- pick out the two lsb’s of the address
val Byte is xor(tByte,3)
val lsbit is shl(Byte,3) -- Byte*8
val lssrcbit is shl(tByte,3) -- tByte*8
in
memword := bitTransfer{src is {value is $r[rt], lsb is 0},
dst is {value is memword,lsb is lsbit},
width is 8 + lssrcbit}
end
swl (rt, offset, base) is
let val vaddr is $r[base] + sx offset
val memword is $m[and(vaddr, com 3)]
val tByte is vaddr@bits[0..1] -- pick out the two lsb’s of the address
val Byte is xor(tByte,3)
val lsbit is shl(Byte,3) -- Byte*8
val lssrcbit is shl(tByte,3) -- tByte*8
in
memword := bitTransfer{src is {value is $r[rt], lsb is lssrcbit},
dst is {value is memword, lsb is 0},
width is 8 + lsbit}
end
Arithmetic
Adds and subtracts can be on signed and unsigned numbers and can be on immediates, too.When an overflow occurs during adds and subtracts, the result is undefined. if no overflow checks to
see if there is an overflow and if not, then it returns true.
64b 〈mips utilities [[full]] 64b〉≡fun op_overflows (op,x,y) is
let val result is op(x, y)
in x@bits[31] = y@bits[31] andalso x@bits[31] <> result@bits[31]
end
fun no_overflow(op,x,y) is
let val result is op(x,y)
in (x@bits[31] <> y@bits[31] orelse x@bits[31] = result@bits[31])
end
Conditional definition.
7.3. MIPS INSTRUCTIONS 65
65a 〈mips utilities [[simple]] 65a〉≡local rtlop mips_no_overflow : #32 bits * #32 bits -> bool
in fun no_overflow (_, x, y) is mips_no_overflow (x, y)
end
Conditional definition.
65b 〈mips utilities 65b〉≡ (70b) 66a ⊲
fun op_function(op, x,y,z,unsigned) is
unsigned orelse no_overflow(op,y,z) --> x := op(y, z)
65c 〈instruction defaults 62c〉+≡ (70b) ⊳ 64a 65d ⊲
[sub add]^["" u] (rd,rs,rt) is
op_function([(-) (+)],$r[rd],$r[rs],$r[rt],[false true])
addi^["" u] (rt,rs,offset) is
op_function((+),$r[rt],$r[rs],sx offset,[false true])
Multiply and Divide
HI and LO registers
The MIPS has a special register pair called HI and LO. These registers are used for multiplication and division.Special move instructions (mfhi,mflo,mthi,mtlo) are used to read and write the registers. Parenthetically,these instructions have special scheduling attributes. mflo and mfhi cannot be immediately followed byinstructions that modify the LO/HI resp. registers. In general, “correct operation requires separating readsof HI and LO by two or more instructions.”(A-57).
65d 〈instruction defaults 62c〉+≡ (70b) ⊳ 65c 65e ⊲
mf^[hi lo] (rd) is $r[rd] := [HI LO]
mt^[hi lo] (rs) is [HI LO] := $r[rs]
multiplies and divides
We have both signed and unsigned versions:
65e 〈instruction defaults 62c〉+≡ (70b) ⊳ 65d 66b ⊲
mult(rs,rt) is
let val temp is mul($r[rs],$r[rt])
in
HI := (temp: #64 bits)@bits[32 bits at 32]| LO := temp@bits[32 bits at 0]
end
multu(rs,rt) is
let val temp is mulu($r[rs],$r[rt])
in
HI := (temp: #64 bits)@bits[32 bits at 32] | LO := temp@bits[32 bits at 0]
end
div(rs,rt) is
HI := $r[rs] div $r[rt] | LO := $r[rs] mod $r[rt]
divu(rs,rt) is
HI := $r[rs] divu $r[rt] | LO := $r[rs] modu $r[rt]
66 CHAPTER 7. DESCRIBING THE MIPS R?000
Logical Instructions
66a 〈mips utilities 65b〉+≡ (70b) ⊳ 65b
fun nor(x,y) is com (or(x,y))
66b 〈instruction defaults 62c〉+≡ (70b) ⊳ 65e 66c ⊲
[and or xor nor](rd,rs,rt) is $r[rd] := [and or xor nor]($r[rs],$r[rt])
[andi ori xori ](rt,rs,offset) is $r[rt] := [and or xor ]($r[rs],sx offset)
Shift Instructions
66c 〈instruction defaults 62c〉+≡ (70b) ⊳ 66b 66d ⊲
[sll srl sra] (rd,rt,shamt) is $r[rd] := [shl shrl shra] ($r[rt],shamt)
[sll srl sra]^v (rd,rt,rs) is $r[rd] := [shl shrl shra] ($r[rt],$r[rs]@bits[0..4])
Set instructions
The set instructions set register rd to 1 if register rs is less than register rt. There are signed and unsignedversions and corresponding immediate versions, too:
66d 〈instruction defaults 62c〉+≡ (70b) ⊳ 66c 66e ⊲
[slt sltu] (rd,rs,rt) is $r[rd] := sx (bit ([(<) ltu] ($r[rs], $r[rt])))
[slt sltu]^i (rt,rs,offset) is $r[rt] := sx (bit ([(<) ltu] ($r[rs], sx offset)))
Load upper immediate
Like the SETHI instruction on the SPARC. We shift the instruction left 16 bits and concatenate to 16 bitsof zeros.
66e 〈instruction defaults 62c〉+≡ (70b) ⊳ 66d 66f ⊲
lui(rt,offset) is $r[rt] := shl(zx offset, 16)
Jump and branch instructions
66f 〈instruction defaults 62c〉+≡ (70b) ⊳ 66e 68a ⊲
jr(rs) is nPC := $r[rs]
j(reloc) is nPC := reloc
jalr(rd,rs) is nPC := $r[rs] | $r[rd] := PC
jal(reloc) is nPC := reloc | $r[31] := PC
b^[le gt lt ge eq]^z (rs,reloc) is
[(<=) (>) (<) (>=) (=)] ($r[rs],0 ) --> nPC := reloc
b^[eq ne] (rs,rt,reloc) is
[(=) (<>)] ($r[rs],$r[rt]) --> nPC := reloc
bltzal (rs, reloc) is $r[rs] < 0 --> (nPC := reloc | $r[31] := PC)
bgezal (rs, reloc) is $r[rs] >= 0 --> (nPC := reloc | $r[31] := PC)
7.4. FLOATING POINT 67
7.4 Floating Point
Architecture
The MIPS FPU has 32 registers.
67a 〈MIPS basics 60a〉+≡ (70b) ⊳ 61d 67b ⊲
storage ’f’ is 32 cells of 32 bits called "Floating-Point Register"
aggregate using RTL.AGGB
The MIPS FPU has 32 control registers of which only FCR0 and FCR31 are usable.
67b 〈MIPS basics 60a〉+≡ (70b) ⊳ 67a 67c ⊲
storage ’c’ is 32 cells of 32 bits called "Floating-Point Control Registers"
locations
[FCR0 FCR31] is $c[[0 31]]
following the sparc supplement, we will put the floating-point condition codes in their own module
67c 〈MIPS basics 60a〉+≡ (70b) ⊳ 67b 67d ⊲
module pcc is
locations [FS C Cause Enables Flags ] is FCR31@loc[1 bits at [24 23 12 7 2 ]]
locations RM is FCR31@loc[2 bits at 0]
end
val Condition is pcc.C
val RM is pcc.RM
There are signaling and non-signaling versions
compare
non-signaling
When a floating point compare is performed, four bits of information are generated: Greater Than, LessThan, Equal, and Unordered. Since comparisons come in two sizes, we parameterize over the comparisonfunction.
67d 〈MIPS basics 60a〉+≡ (70b) ⊳ 67c 69f ⊲
module Fcmp is
fun [f t un or eq neq ueq ogl olt uge ult oge ole ugt ule ogt] cmp (fs,ft) is
let val or is (orelse)
infixl 2 or
val results is cmp ($f[fs], $f[ft])
val [L E G U] is
results = [(IEEE754.Comparison.float_lt) (IEEE754.Comparison.float_eq)
(IEEE754.Comparison.float_gt)(IEEE754.Comparison.unordered)]
in
Condition := bit [false (G or L or E or U) U (G or L or E)
E (G or L or U) (E or U) (G or L) L (G or E or L)
(L or U) (G or E) (L or E) (G or U) (L or E or U) G]
end
end
68 CHAPTER 7. DESCRIBING THE MIPS R?000
Note the manual says only the s and d formats are valid for these instructions.
68a 〈instruction defaults 62c〉+≡ (70b) ⊳ 66f 68b ⊲
c^"."^[f t un or eq neq ueq ogl olt uge ult oge ole ugt ule ogt]^"."^[s d] (fs,ft)
is [ Fcmp.f Fcmp.t Fcmp.un Fcmp.or Fcmp.eq Fcmp.neq Fcmp.ueq Fcmp.ogl
Fcmp.olt Fcmp.uge Fcmp.ult Fcmp.oge Fcmp.ole Fcmp.ugt Fcmp.ule Fcmp.ogt
] (fcmp [#32 #64]) (fs,ft)
signaling
There are signalling versions, as well. All will generate an exception if the operands are unordered.
Branching
The MIPS can perform branches on the truth or falsehood of the above set Condition field.
68b 〈instruction defaults 62c〉+≡ (70b) ⊳ 68a 68c ⊲
bc1t (reloc) is Condition = 1 --> nPC := reloc
bc1f (reloc) is Condition = 0 --> nPC := reloc
Conversions
We can convert from 32 and 64 bit fixed point integers to single and double floats:
68c 〈instruction defaults 62c〉+≡ (70b) ⊳ 68b 68d ⊲
cvt^"."^[s d]^"."^[w l] (fd,fs) is
$f[fd] := [single double] (i2f( [word dword] $f[fs], RM))
the reverse conversions:
68d 〈instruction defaults 62c〉+≡ (70b) ⊳ 68c 68e ⊲
cvt^"."^[w l]^"."^[s d] (fd,fs) is
$f[fd] := [word dword] (f2i([single double] $f[fs], IEEE754.Round.zero))
We can also convert from doubles to singles, and vice versa:
68e 〈instruction defaults 62c〉+≡ (70b) ⊳ 68d 69a ⊲
cvt^"."^s^"."^d (fd,fs) is $f[fd] := single (f2f(double $f[fs],RM))
cvt^"."^d^"."^s (fd,fs) is $f[fd] := double (f2f(single $f[fs],RM))
Chris Milner reports that these are the correct floating point conversion constructors for the MIPS:
constructor cvt_s_l (fd, fs)
constructor cvt_s_w (fd, fs)
constructor cvt_s_d (fd, fs)
constructor cvt_d_l (fd, fs)
constructor cvt_d_w (fd, fs)
constructor cvt_d_s (fd, fs)
constructor cvt_w_s (fd, fs)
constructor cvt_w_d (fd, fs)
constructor cvt_l_d (fd, fs)
constructor cvt_l_s (fd, fs)
7.5. COPROCESSOR INSTRUCTIONS 69
Rounding
Moving to and from the FPU
We can move registers back and forth...
69a 〈instruction defaults 62c〉+≡ (70b) ⊳ 68e 69b ⊲
mtc1 (rt,rd) is $f[rd] := $r[rt]
mfc1 (rt,rd) is $r[rt] := $f[rd]
how about some loads and stores
69b 〈instruction defaults 62c〉+≡ (70b) ⊳ 69a 69c ⊲
lwc1 (ft,offset,base) is $f[ft] := single $m[$r[base] + sx offset]
"l.d"(ft,offset,base) is $f[ft] := double $m[$r[base] + sx offset]
swc1 (ft,offset,base) is $m[$r[base] + sx offset] := single $f[ft]
"s.d"(ft,offset,base) is $m[$r[base] + sx offset] := double $f[ft]
and the control word, too
69c 〈instruction defaults 62c〉+≡ (70b) ⊳ 69b 69d ⊲
cfc1(rt,rd) is $r[rt] := $c[rd]
ctc1(rt,rd) is $c[rd] := $r[rt]
computation
69d 〈instruction defaults 62c〉+≡ (70b) ⊳ 69c 69e ⊲
[add sub div mul]^"."^[s d] (fd,fs,ft) is
$f[fd] := [fadd fsub fdiv fmul] [#32 #64] ($f[fs],$f[ft],RM)
Let’s do those with two arguments
69e 〈instruction defaults 62c〉+≡ (70b) ⊳ 69d 69g ⊲
[abs neg]^"."^[s d](fd,fs) is $f[fd] := [fabs fneg] [#32 #64] $f[fs]
mov^"."^[s d](fd,fs) is $f[fd] := [single double] $f[fs]
7.5 Coprocessor instructions
Several instructions exist to help move/load/store words to and from the various coprocessors. Coprocessor0 is the system control processor.
69f 〈MIPS basics 60a〉+≡ (70b) ⊳ 67d
storage ’p’ alias CP0 is 32 cells of 32 bits
called "System Control Coprocessor Registers"
storage ’q’ alias CP2 is 32 cells of 32 bits called "Coprocessor 2"
storage ’v’ alias CP3 is 32 cells of 32 bits called "Coprocessor 3"
Its move, load and store instructions are:
69g 〈instruction defaults 62c〉+≡ (70b) ⊳ 69e
mfc0(rt,rd) is $r[rt] := $CP0[rd]
mtc0(rt,rd) is $CP0[rt] := $r[rd]
lwc0(ft,offset,base) is $CP0[ft] := $m[$r[base] + sx offset]
-- note: the virtual address should be word aligned for lw and sw
swc0(ft,offset,base) is $m[$r[base] + sx offset] := $CP0[ft]
70 CHAPTER 7. DESCRIBING THE MIPS R?000
There are other coprocessors. Coprocessor 1 is the FPU and will be covered later. Coprocessor 2 and 3could be defined in the same way, along with their corresponding loads/stores and moves. We can’t factorthis easily,like this:
70a 〈imaginary code 70a〉≡swc^["0" "2" "3"] is
$m[$r[base] + sc offset] := $CP^["0" "2" "3"][ft]
so we won’t bother.
7.6 Miscellaneous
Here we would describe syscall, the TLB instructions and Trap instruction.
7.7 Putting it all together
This code chunk is expanded into our λ-RTL description of the MIPS.
70b 〈mips.rtl 70b〉≡module Mips is
import [RTL Vector]
from StdOperators import
[add and andalso bit bitInsert bool borrow carry com divu ltu mod
modu mul mulu ne not or orelse quot shl shrl shra subtract sx xor zx
--> := | ; = <> + - < <= > >= ? IEEE754 bitTransfer div
]
from StdOperators.IEEE754 import
[fadd fcmp fdiv f2f f2i fabs fmulx fsqrt i2f fmul fneg fsub NaN ]
from StdOperators.IEEE754.Comparison import
[unordered float_lt float_gt float_eq]
〈mips utilities 65b〉〈MIPS basics 60a〉〈address modes and operands 62a〉〈instruction defaults 62c〉
end
Code written to file mips.rtl.
7.8. WHAT THIS SPECIFICATION DOES NOT DO... 71
7.8 What this specification does not do...
So far, we are ignoring big/little endianness issues such as loads and unaligned loads.We also ignore the FR bit of the status register - this indicates if the float registers are even-odd pairs
(for doubles) or all addressible (singles).
7.9 SLED specification
Here is the SLED specification for the MIPS architecture:
71 〈mips.sled 71〉≡operand [ base fd fs ft rd rs rt shamt ] : #5 bits
operand offset : #16 bits
operand breakcode : #20 bits
constructor lb (rt, offset, base)
constructor lbu (rt, offset, base)
constructor lh (rt, offset, base)
constructor lhu (rt, offset, base)
constructor lw (rt, offset, base)
constructor lwl (rt, offset, base)
constructor lwr (rt, offset, base)
constructor sb (rt, offset, base)
constructor sh (rt, offset, base)
constructor sw (rt, offset, base)
constructor swl (rt, offset, base)
constructor swr (rt, offset, base)
constructor addi (rt, rs, offset)
constructor addiu (rt, rs, offset)
constructor slti (rt, rs, offset)
constructor sltiu (rt, rs, offset)
constructor andi (rt, rs, offset)
constructor ori (rt, rs, offset)
constructor xori (rt, rs, offset)
constructor lui (rt, offset)
constructor add (rd, rs, rt)
constructor addu (rd, rs, rt)
constructor sub (rd, rs, rt)
constructor suburd, rs, rt)
constructor slt (rd, rs, rt)
constructor sltu (rd, rs, rt)
constructor and (rd, rs, rt)
constructor or (rd, rs, rt)
constructor xor (rd, rs, rt)
constructor nor (rd, rs, rt)
constructor sll (rd, rt, shamt)
constructor srl (rd, rt, shamt)
constructor sra (rd, rt, shamt)
constructor sllv (rd, rt, rs)
constructor srlv (rd, rt, rs)
72 CHAPTER 7. DESCRIBING THE MIPS R?000
constructor srav (rd, rt, rs)
constructor mult (rs, rt)
constructor multu (rs, rt)
constructor div (rs, rt)
constructor divu (rs, rt)
constructor mfhi (rd)
constructor mflo (rd)
constructor mthi (rs)
constructor mtlo (rs)
constructor syscall ()
constructor break (breakcode)
constructor lwc0 (ft, offset, base)
constructor lwc1 (ft, offset, base)
constructor lwc2 (ft, offset, base)
constructor lwc3 (ft, offset, base)
constructor swc0 (ft, offset, base)
constructor swc1 (ft, offset, base)
constructor swc2 (ft, offset, base)
constructor swc3 (ft, offset, base)
constructor jr (rs)
constructor jalr (rd, rs)
constructor blez (rs, reloc)
constructor bgtz (rs, reloc)
constructor bltz (rs, reloc)
constructor bgez (rs, reloc)
constructor bltzal (rs, reloc)
constructor bgezal (rs, reloc)
constructor beq (rs, rt, reloc)
constructor bne (rs, rt, reloc)
constructor j (reloc)
constructor jal (reloc)
constructor add.s (fd, fs, ft)
constructor add.w (fd, fs, ft)
constructor add.d (fd, fs, ft)
constructor div.s (fd, fs, ft)
constructor div.w (fd, fs, ft)
constructor div.d (fd, fs, ft)
constructor mul.s (fd, fs, ft)
constructor mul.w (fd, fs, ft)
constructor mul.d (fd, fs, ft)
constructor sub.s (fd, fs, ft)
constructor sub.w (fd, fs, ft)
constructor sub.d (fd, fs, ft)
constructor abs.s (fd, fs)
constructor abs.w (fd, fs)
constructor abs.d (fd, fs)
constructor mov.s (fd, fs)
constructor mov.w (fd, fs)
constructor mov.d (fd, fs)
constructor neg.s (fd, fs)
7.9. SLED SPECIFICATION 73
constructor neg.w (fd, fs)
constructor neg.d (fd, fs)
constructor mfc1 (ft, rd)
constructor mtc1 (rt, fd)
constructor cfc1 (rt, fs)
constructor ctc1 (rt, fs)
constructor c.f.s (fs, ft)
constructor c.f.w (fs, ft)
constructor c.f.d (fs, ft)
constructor c.un.s (fs, ft)
constructor c.un.w (fs, ft)
constructor c.un.d (fs, ft)
constructor c.eq.s (fs, ft)
constructor c.eq.w (fs, ft)
constructor c.eq.d (fs, ft)
constructor c.ueq.s (fs, ft)
constructor c.ueq.w (fs, ft)
constructor c.ueq.d (fs, ft)
constructor c.olt.s (fs, ft)
constructor c.olt.w (fs, ft)
constructor c.olt.d (fs, ft)
constructor c.ult.s (fs, ft)
constructor c.ult.w (fs, ft)
constructor c.ult.d (fs, ft)
constructor c.ole.s (fs, ft)
constructor c.ole.w (fs, ft)
constructor c.ole.d (fs, ft)
constructor c.ule.s (fs, ft)
constructor c.ule.w (fs, ft)
constructor c.ule.d (fs, ft)
constructor c.sf.s (fs, ft)
constructor c.sf.w (fs, ft)
constructor c.sf.d (fs, ft)
constructor c.ngle.s (fs, ft)
constructor c.ngle.w (fs, ft)
constructor c.ngle.d (fs, ft)
constructor c.seq.s (fs, ft)
constructor c.seq.w (fs, ft)
constructor c.seq.d (fs, ft)
constructor c.ngl.s (fs, ft)
constructor c.ngl.w (fs, ft)
constructor c.ngl.d (fs, ft)
constructor c.lt.s (fs, ft)
constructor c.lt.w (fs, ft)
constructor c.lt.d (fs, ft)
constructor c.nge.s (fs, ft)
constructor c.nge.w (fs, ft)
constructor c.nge.d (fs, ft)
constructor c.le.s (fs, ft)
constructor c.le.w (fs, ft)
74 CHAPTER 7. DESCRIBING THE MIPS R?000
constructor c.le.d (fs, ft)
constructor c.ngt.s (fs, ft)
constructor c.ngt.w (fs, ft)
constructor c.ngt.d (fs, ft)
constructor cvt.s.s (fd, fs)
constructor cvt.s.w (fd, fs)
constructor cvt.s.d (fd, fs)
constructor cvt.d.s (fd, fs)
constructor cvt.d.w (fd, fs)
constructor cvt.d.d (fd, fs)
constructor cvt.w.s (fd, fs)
constructor cvt.w.w (fd, fs)
constructor cvt.w.d (fd, fs)
constructor bc1f (reloc)
constructor bc1t (reloc)
Code written to file mips.sled.
Chapter 8
Describing the Alpha
In this document any reference to an “UNPREDICTABLE” effect refers to the ALPHA Architecture Hand-book definition of the word, which is that the effect is implementation defined. Implementation definedrefers to the implementation of the DEC ALPHA chip. UNPREDICTABLE further implies that the effectsmay change ’moment to moment.’ It guarantees that no-one may create a security hole, hang the system,or halt the system. That’s all it guarantees. As for how we will define this in a lambda-RTL is yet to bedefined.
8.1 Storage
Memory
Memory storage on the alpha consists of little endian byte addressed cells in the default case. There is alsothe possibility that someone will build a big endian system using the Alpha’s capability of being set intobig endian mode at boot time. If so, we wish to permit a machine description file for a big-endian alpha tobe created merely by changing a single flag, littleEndian, to FALSE. The ALPHA takes a purely hardwareapproach to switching endianness. Compilers always generate little endian code. If the machine is booted asbig endian, the load and store instructions provide the starting address for any non-64-bit load or store as ifthe machine were little endian, but the hardware changes the last three bits of the virtual address to reflectthe true big endian nature of the system. Thus, the 32-bit long words within a 64-bit quad word can beswapped merely by inverting the third bit from the right in the virtual address (i.e. adding or decrementingthe virtual address by 4.) Similar reflections can be done for words within a quad word, or for bytes.
75 〈storage 75〉≡ (80c) 76a ⊲
storage
’m’ is cells of 8 bits called "memory"
aggregate using if littleEndian then RTL.AGGL else RTL.AGGB fi
75
76 CHAPTER 8. DESCRIBING THE ALPHA
Explicit Size Depiction
A reference to a cell like $m[a] normally means the cell by itself—in this case, a single byte in memorylocated at address a. But sometimes we want to talk about a 32-bit long word at address a, or even a 16-bitword, a quad-word. Often, λ-RTL can figure out what is meant just from the context in which $m[a] isused—but when it can’t, an explicit cast can be used to make the size explicit, thus: $m[a] : #32 loc.Because the syntax of the casts can be awkward, we define functions that do the casting.
76a 〈storage 75〉+≡ (80c) ⊳ 75 76b ⊲
fun byte x is x : #8 loc
fun word x is x : #16 loc
fun long x is x : #32 loc
fun quad x is x : #64 loc
fun octa x is x : #128 loc
Given these functions, the result of, e.g., quad is guaranteed to be a 64-bit location, even if quad is appliedto an 8-bit byte in memory.
This bit is unexplained.
76b 〈storage 75〉+≡ (80c) ⊳ 76a 76c ⊲
module Mem is
infixl 3 xor
fun byte a is byte $m[if littleEndian then a else a xor 0b111 fi]
fun word a is word $m[if littleEndian then a else a xor 0b110 fi]
fun long a is long $m[if littleEndian then a else a xor 0b100 fi]
fun quad a is quad $m[a]
end
Integer Registers
There are 32 integer registers, each is 64 bits
76c 〈storage 75〉+≡ (80c) ⊳ 76b 77a ⊲
storage
’r’ is 32 cells of 64 bits called "integer registers"
〈ALPHA register fetch and store methods 76d〉
Register r31 is always zero when read. A write to r31 is ignored. Whether the entire instruction generatingthe write to r31 is ignored, or just the write to r31 effect, is “unpredictable.” [3.1.2 ALPHA ArchitectureHandbook]. For now our description doesn’t deal with the possibility that an exception might be generatedby a store to r31. Instead we simply ignore stores to r31.
76d 〈ALPHA register fetch and store methods 76d〉≡ (76c)
fetch using \n. if n = 31 then 0 else RTL.TRUE_FETCH ($r[n]) fi
store using \(n, v). n <> 31 --> RTL.TRUE_STORE ($r[n], v)
8.2. ALPHA INSTRUCTIONS 77
Floating Point Registers
There are 32 floating point registers; each is 64 bits.
77a 〈storage 75〉+≡ (80c) ⊳ 76c 77b ⊲
storage
’f’ is 32 cells of 64 bits called "floating point registers"
aggregate using RTL.AGGL
fetch using \n. if n = 31 then 0 else RTL.TRUE_FETCH ($f[n]) fi
store using \(n, v). n <> 31 --> RTL.TRUE_STORE ($f[n], v)
The semantics for storing and loading from $f[31] are the same for floating point operations as for integeroperations on r[31].
Special-Purpose Registers
Lock Registers
The lock registers.
77b 〈storage 75〉+≡ (80c) ⊳ 77a 77c ⊲
storage ’l’ alias lockFlag is 1 cell of 1 bit called "Lock Flag"
storage ’a’ alias lockPAR is 1 cell of 64 bits called "Lock Physical Address Register"
Processor Cycle Counter
77c 〈storage 75〉+≡ (80c) ⊳ 77b 77d ⊲
storage ’c’ alias PCC is 1 cell of 64 bits called "Processor Cycle Counter"
locations
[PCC_OFF PCC_CNT] is $c[0]@loc[32 bits at [0 32]]
Program Counter
77d 〈storage 75〉+≡ (80c) ⊳ 77c
storage ’p’ alias PC is 1 cell of 64 bits called "Program Counter"
--- fetch using \n. and (RTL.TRUE_FETCH($p[0]), (neg 0x3))
8.2 ALPHA instructions
Types of ALPHA operands
Alpha instructions consist of 3-tuple form instructions (2 source, one destination).
77e 〈address modes and operands 77e〉≡ (80c) 78a ⊲
operand [ fa fb fc ra rb rc ] : #5 bits
operand imm : #8 bits
operand mhint : #14 bits
operand mdisp : #16 bits
operand palfunc : #26 bits
operand target : #64 bits
78 CHAPTER 8. DESCRIBING THE ALPHA
ALPHA addresses and operands
The ALPHA displaced addressing mode adds a sign extended 16 bit immediate field value to the contentsof register designated by the rb field.
78a 〈address modes and operands 77e〉+≡ (80c) ⊳ 77e 78b ⊲
operand address : #64 bits
fun dispA (rb, mdisp) is $r[rb] + sx mdisp
78b 〈address modes and operands 77e〉+≡ (80c) ⊳ 78a
operand reg_or_imm : #64 bits
default attribute of
rmode(rb) : reg_or_imm is $r[rb]
imode(imm) : reg_or_imm is sx imm
A Useful Composition Function
A composition function on two functions with a single argument will be of use in mixing type informationwith sign extension functions.
78c 〈composition function 78c〉≡ (80c)
fun o(f,g) is \x.f(g(x))
infixl 3 o
Load Instructions
Load Memory Data into Integer Register
There are four size load instructions from memory to registers. The word and byte load instructions zeroextend to the 64 bit register, while the long word variant sign extends.
78d 〈instructions 78d〉≡ (80c) 78e ⊲
default attribute of
ld^[q l wu bu] (ra, mdisp, rb) is
$r[ra] := [(Mem.quad) (sx o Mem.long) (zx o Mem.word) (zx o Mem.byte)
] (dispA(rb, mdisp))
Load the aligned included Quad Word
78e 〈instructions 78d〉+≡ (80c) ⊳ 78d 78f ⊲
default attribute of
ldq_u(ra, mdisp, rb) is
$r[ra] := Mem.quad (and (dispA(rb, mdisp), (com 7)))
Load address (computes effective address)
78f 〈instructions 78d〉+≡ (80c) ⊳ 78e 79c ⊲
default attribute of
lda (ra, mdisp, rb) is $r[ra] := dispA(rb, mdisp)
lda_h (ra, mdisp, rb) is $r[ra] := $r[rb] + shl (sx mdisp, 16)
8.2. ALPHA INSTRUCTIONS 79
The chunks named 〈instructions 78d〉 define the default attributes of instructions. We have chosen to usethe default attributes to specify the “main effect” of these instructions, but there are circumstances in whichthe processor traps instead of executing this main effect. Traps are not relevant to all specifications, so wehave chosen to give the trap semantics separately, by binding them to an attribute named trap. Again,with a proper modules system, trap semantics could be omitted from some specifications.
All loads except byte loads could trap if the address is improperly aligned.
79a 〈trap semantics 79a〉≡ (80c) 79d ⊲
ld^[q l wu] (ra, mdisp, rb) is alignTrap(dispA(rb, mdisp), [8 4 2])
79b 〈ALPHA utilities 79b〉≡ (80c)
fun alignTrap (address, k) is
address modu k <> 0 --> trap(mem_address_not_aligned)
The interaction of the trap semantics with the main semantics is not specified in λ-RTL.
Store Instructions
Store to a Memory Location from a Register
79c 〈instructions 78d〉+≡ (80c) ⊳ 78f 79e ⊲
stq(ra, mdisp, rb) is Mem.quad (dispA(rb, mdisp)) := $r[ra]
stl(ra, mdisp, rb) is Mem.long (dispA(rb, mdisp)) := $r[ra]@bits[32 bits at 0]
stw(ra, mdisp, rb) is Mem.word (dispA(rb, mdisp)) := $r[ra]@bits[16 bits at 0]
stb(ra, mdisp, rb) is Mem.byte (dispA(rb, mdisp)) := $r[ra]@bits[8 bits at 0]
79d 〈trap semantics 79a〉+≡ (80c) ⊳ 79a
st^[q l wu] (ra, mdisp, rb) is alignTrap(dispA(rb, mdisp), [8 4 2])
Store a Register to the Aligned Quad Word below the Unaligned Address
79e 〈instructions 78d〉+≡ (80c) ⊳ 79c 79f ⊲
stq_u(ra, mdisp, rb) is Mem.quad (and (dispA(rb, mdisp), com 7)) := $r[ra]
Logical Operations
Standard Logical Functions
79f 〈instructions 78d〉+≡ (80c) ⊳ 79e 80b ⊲
[and bis xor](ra, reg_or_imm, rc) is $r[rc] := [and or xor] ($r[ra], reg_or_imm)
[bic ornot eqv](ra, reg_or_imm, rc) is $r[rc] := [and or xor] ($r[ra], com reg_or_imm)
Add instructions
Our treatment of the add instructions is similar to our treatment of the logical instructions, except theseinstructions can cause overflow. Overflow semantics is a bit tricky, so in the [[simple]] version of thespecification we leave it abstract.
79g 〈ALPHA utilities [[simple]] 79g〉≡rtlop add_overflows : #n bits * #n bits -> bool
Conditional definition.
80 CHAPTER 8. DESCRIBING THE ALPHA
80a 〈ALPHA utilities [[full]] 80a〉≡fun iff (p, q) is if p then q else not q fi
fun add_overflows (x, y) is
let val result is x + y
infixn 0 iff
in (x < 0 iff y < 0) andalso (x < 0 iff result >= 0)
end
Conditional definition.
The sum goes in rc, plus there may be an overflow trap. When the /V suffix is provided with an addinstruction, it signals overflow, else overflow is ignored.
80b 〈instructions 78d〉+≡ (80c) ⊳ 79f
[addq addqv] (ra, reg_or_imm, rc) is $r[rc] := $r[ra] + reg_or_imm
[addl addlv] (ra, reg_or_imm, rc) is
$r[rc] := sx (($r[ra] + reg_or_imm)@bits[32 bits at 0])
8.3 Lambda RTL File
Our lambda rtl file needs to import a set of lambda rtl operators
80c 〈alpha.rtl 80c〉≡module Alpha is
import [RTL Vector]
from StdOperators import
[add and andalso bit bitInsert bool borrow carry com divu
modu mul mulu ne not or orelse quot shl shrl shra subtract sx xor zx neg
--> := | ; = <> + - <= >= < > ? ]
val littleEndian is true
〈composition function 78c〉〈storage 75〉〈address modes and operands 77e〉〈window dressing for trap semantics 80d〉〈ALPHA utilities 79b〉〈instructions 78d〉attribute trap of
〈trap semantics 79a〉
end
All that is left to do is to define what we mean by trap. Ideally, we would like a λ-RTL abstractionmechanism that would let us define trap as “an unspecified function from a 7-bit code to an effect.” Becauseλ-RTL lacks suitable abstraction mechanisms, we have to specify a concrete effect. Rather than try to specifythe complete semantics of traps, we model a trap as an assignment to a nonexistent location.
80d 〈window dressing for trap semantics 80d〉≡ (80c)
storage ’t’ is 1 cell of 7 bits called "trap code"
locations trap_code is $t[0]
fun trap k is $t[0] := k
8.3. LAMBDA RTL FILE 81
These bindings include only the “trap type” of a handful of traps. We have omitted trap priorities. Thetrap instruction uses a family of codes starting at 0x80.
81a 〈window dressing for trap semantics [[full]] 81a〉≡val [data_store_error
illegal_instruction
mem_address_not_aligned
tag_overflow
trap_instruction
] is [0x2b 0x02 0x07 0x0a 0x80] -- 7-bit constants to be passed to trap
val trap_instruction is \code.trap_instruction+code
Conditional definition.
In the simple version, we create an RTL operator for each trap code, so we can see them by name, not bynumber, in the processor supplement.
81b 〈window dressing for trap semantics [[simple]] 81b〉≡rtlop [data_store_error
illegal_instruction
mem_address_not_aligned
tag_overflow
] : #7 bits
rtlop trap_instruction : #7 bits -> #7 bits
Conditional definition.
Chapter 9
Describing the PDP-8
9.1 Introduction
9.2 Storage Spaces
Main memory on the PDP-8 consists of 4096 12-bit words of magnetic core. The machine’s ISA is organizedaround a 12-bit accumulator that is involved in all arithmetic and logic operations. A 1-bit link registerallows the programmer to extend the arithmetic precision beyond that available through single primitiveoperations. A 12-bit switch register is provided to capture values from the machine’s assortment of frontpanel switches. Last but not least, the PDP-8 has a 12-bit program counter.
Here is the λ-RTL required to model these storage spaces and attach to them the names provided by thearchitecture manual.
82 〈storage spaces 82〉≡ (85c)
storage
’m’ is 4096 cells of 12 bits called "memory"
’a’ is 1 cells of 12 bits called "accumulator"
’s’ is 1 cells of 12 bits called "switch register"
’p’ is 1 cells of 12 bits called "program counter"
’l’ is 1 cells of 1 bits called "link register"
’r’ is 1 cells of 1 bits called "run flip-flop"
locations
PC is $p[0]
AC is $a[0]
L is $l[0]
SR is $s[0]
RUN is $r[0]
82
9.3. OPERATIONS 83
Addressing modes on the PDP-8 are relatively simple. Memory references are either absolute or indirect.Certain indirect references automatically increment the pointer in memory used for indirection. This iscalled “auto-indexing” in the PDP-8 manual.
In λ − RTL we treat memory references as pairs. The first member of the pair is an address, and thesecond is an effect associated with the reference. This is used to handle “auto-indexing”. The effect for theother two addressing modes is RTL.SKIP or the null effect.
Here’s the λ−RTL describing memory references.
83a 〈memory references 83a〉≡ (85c)
operand addr : #12 bits
operand memref : #12 bits * effect
default attribute of
absolute(addr) is (addr, RTL.SKIP)
indirect(addr) is ($m[addr], RTL.SKIP)
autoinc(addr) is ($m[addr] + 1, $m[addr] := $m[addr] + 1)
We assume that addresses are within appropriate ranges for their context.
9.3 Operations
As with memory, the PDP-8 has a simple ISA. Operations can be divided into two categories. The first areunary operators involving memory references. The second are nullary operators that involve the accumulator,link register, or switch register.
We will describe the unary operators first. Each unary operator takes a memory reference. We can modelthese operations as a memory reference, which you will recall is a location and an effect, and a right handside which describes the operation’s primary effect.
Here is the λ-RTL for modeling a unary opertion. Notice that the effect from the memory reference musttake place before the primary effect of the operation. Also notice that the parameter rhs is not an effect,but rather a function taking a location and producing an effect.
83b 〈operations 83b〉≡ (85c) 83c ⊲
fun unary_op((loc,memaction), rhs) is
( memaction ; rhs loc )
The primary effect of all of the unary operations is to fetch a value from memory, perform an operation onthis value and the value of the accumulator, and store the result in the accumulator. The following λ-RTLdescribes the effect that an operation on memory and the accumulator, has on the machine state. Noticethat we have written this function in curried style. The reason for this will be clearer in a moment.
83c 〈operations 83b〉+≡ (85c) ⊳ 83b 84a ⊲
fun accum_effect op loc is AC := op(AC,$m[loc])
84 CHAPTER 9. DESCRIBING THE PDP-8
Now, with our two functions in hand, we can make short work of the unary operations. There are a coupleof things worth mentioning here. First is that unary op takes a tuple whose second element is a functiontaking a location and producing an effect. It should now be obvious why we wrote accum effect as a curriedfunction.
We use the operation mnemonics from the PDP-8 manual wherever possible.
84a 〈operations 83b〉+≡ (85c) ⊳ 83c 84b ⊲
default attribute of
and(memref) is unary_op(memref,accum_effect and)
tad(memref) is unary_op(memref,accum_effect (+))
dca(memref) is unary_op(memref,\loc.$m[loc] := AC)
The next operation merits discussion. It increments the value given by the memory reference. If theresulting value is zero, the following instruction is skipped. We have used guarded assignment to theprogram counter to model this effect.
84b 〈operations 83b〉+≡ (85c) ⊳ 84a 84c ⊲
isz(memref) is
unary_op(memref,
\loc.let val v is $m[loc] + 1
in ($m[loc] := v | v = 0 --> PC := PC + 2)
end
)
The next routines describe the behaviour of subroutine calls and unconditional jumps. The operation jmstransfers control to the subroutine whose address is given by the memory reference. The return address isstored in the following memory location. This means that PDP-8 programs are not intended to recurse, andseems weird in general.
The effect of the unconditional jump should be obvious.
84c 〈operations 83b〉+≡ (85c) ⊳ 84b 84d ⊲
jms(memref) is
unary_op(memref,\loc.$m[loc]:=PC | PC:=loc+1)
jmp(memref) is
unary_op(memref,\loc.PC:=loc)
The remaining PDP-8 operations do not involve memory references. They are nullary operations thateffect the accumulator, the link register or the switch register.
Most of these operations have simple effects that should be obvious from the λ-RTL. The exceptions tothis are the logical rotate instructions.
The PDP-8 specifies that bit 0 is the most significant bit in a 12-bit word. God only knows why thisis the case, but it is. In λ-RTL, bit 0 is always the least significant bit. So the rotates below might seembackwards, but they’re not. Also notice that values are rotated through the link register. This means thatwe can write the rotations as the simultaneouse composition of three assignment effects.
84d 〈operations 83b〉+≡ (85c) ⊳ 84c 85a ⊲
val rar is (L := AC@loc[0] | AC@loc[0..10] := AC@loc[1..11] | AC@loc[11] := L)
val ral is (L := AC@loc[11] | AC@loc[1..11] := AC@loc[0..10] | AC@loc[0] := L)
9.4. PUTTING IT ALL TOGETHER 85
There are also a set of nullary skip operations that skip the next instruction when some boolean conditionis true. We model these operations with a function skip op that takes a condition and then does a guardedupdate to the program counter. Perhaps this is a good time to point out that we assume that each operationin this description involves the implicit increment of the program counter by 1, unless we specify otherwise.We could write this fact down in the description, but we won’t bother.
85a 〈operations 83b〉+≡ (85c) ⊳ 84d 85b ⊲
fun skip_op(cond) is cond --> PC := PC + 2
The rest of the nullary operators are simple and we won’t try to describe them here.
85b 〈operations 83b〉+≡ (85c) ⊳ 85a
val iac is AC := AC + 1
val cma is AC := com AC
default attribute of
iac is iac
nop is RTL.SKIP
rar is rar
rtr is rar; rar
ral is ral
rtl is ral; ral
cml is L := com L
cma is cma
cia is cma; iac
cll is L := 0
stl is L := 1
cla is AC := 0
sta is AC := 4095
osr is AC := or(AC,SR)
halt is RUN := 0
skp is skip_op(true)
snl is skip_op(L = 1)
szl is skip_op(L = 0)
sza is skip_op(AC = 0)
sna is skip_op(AC <> 0)
spa is skip_op(AC@loc[0] = 0)
9.4 Putting it all together
85c 〈pdp8.lrtl 85c〉≡ (86)
module PDP8 is
import [RTL Vector]
from StdOperators import [and or com | + = := --> ; <>]
〈storage spaces 82〉〈memory references 83a〉〈operations 83b〉
end
86 CHAPTER 9. DESCRIBING THE PDP-8
86 〈pdp8.rtl 86〉≡〈pdp8.lrtl 85c〉
Chapter 10
Describing MMIX
This document attempts to describe the MMIX computer, as explained in Fasciscle 1 of TheArt of Computer Programming. This Fascicle, and other information, may be found athttp://www-cs-faculty.stanford.edu/~knuth/mmix-news.html.
Describing this machine is a real challenge, not so much because of the features of the machine itself(though some of the register behavior is baroque), but because Knuth tends to specify all computation overthe rationals, mapping from bit vectors to integers when taking values out of storage spaces, and mappingfrom integers to bit vectors when putting values into storage spaces. This style is inimical to λ-RTL, becauseλ-RTL specifies all computations over bit vectors. To consider just one example, Knuth specifies all additionsusing “God’s addition” over the integers. In λ-RTL, we must say how many bits are used in each addition,and our treatment of overflow must differ, as we must fool with the carry bit.1 One can imagine it mightnot be completely trivial to show that our specification matches his.
10.1 Storage spaces: memory and registers
MMIX is a big-endian machine. The λ-RTL translator has a bug that prevents it from swallowing largeinteger literals, so I’ve left out the size of the memory (264 cells).
87 〈MMIX basics 87〉≡ (91d) 88a ⊲
storage ’m’ is cells of 8 bits called "memory"
aggregate using RTL.AGGB
1This mismatch is not the result of caprice or accident on either party’s part. The goal of Knuth’s specification is concisenessand clarity, and the notations and techniques of mathematics serve that goal. For example, several specifications can be clearerif intermediate values are allowed to lie in the rationals as well as the integers (never mind bit vectors). To take anotherexample, all specifications are clearer if the + symbol has just one meaning: mathematical addition. But the purpose of aλ-RTL specification is to enable automatic generation of machine-dependent tools, and the tool generators want to know whatoperations to use without having to reason about rational-valued intermediate results. It must be easy to figure out whichinstructions do 16-bit adds and which do 32-bit adds, without having to study all of the contexts in which those adds appear.
87
88 CHAPTER 10. DESCRIBING MMIX
Knuth provides special notations for aggregates. These notations ignore low address bits.
88a 〈MMIX basics 87〉+≡ (91d) ⊳ 87 88b ⊲
infixl 7 and
infixl 6 or
fun M a is $m[a] : #8 loc
fun M2 a is $m[a and (com 1)] : #16 loc
fun M4 a is $m[a and (com 3)] : #32 loc
fun M8 a is $m[a and (com 7)] : #64 loc
fun M1 a is M a
MMIX has 256 general-purpose registers.
88b 〈MMIX basics 87〉+≡ (91d) ⊳ 88a 88c ⊲
storage ’r’ is 256 cells of 64 bits called "registers"
MMIX has 32 special-purpose registers. The codes come from page 21 of the fascicle.
88c 〈MMIX basics 87〉+≡ (91d) ⊳ 88b 88d ⊲
storage ’s’ is 32 cells of 64 bits called "special registers"
locations
[ rA rB rC rD rE rF rG rH rI rJ rK rL rM rN rO rP rQ rR rS rT rU rV rW rX rY rZ ] is
$s[[21 0 8 1 2 22 19 3 12 4 15 20 5 9 10 23 16 6 11 13 17 18 24 25 26 27]]
[ rBB rTT rWW rXX rYY rZZ ] is $s[[ 7 14 28 29 30 31]]
The program counter appears to be distinct from all these other registers
88d 〈MMIX basics 87〉+≡ (91d) ⊳ 88c 88e ⊲
storage ’p’ is 1 cell of 64 bits called "program counter"
locations PC is $p[0]
10.2 MMIX instructions
I can’t define a ready equivalent for the s($X) ← N notation, because in Knuth’s world, N is an integer,whereas in mine, N is a 64-bit value.
Trips and traps
It’s clear that λ-RTL is not capable of describing MMIX’s interrupts. Even the specification in section 34of the detailed MMIX document is not entirely clear; I suspect it will be necessary to go look at the sourcecode.
88e 〈MMIX basics 87〉+≡ (91d) ⊳ 88d 89a ⊲
module Interrupt is
val trip is
rX := ? -- current instruction, sometimes (e.g., M4 PC)
| rY := ? -- an operand, sometimes
| rZ := ? -- another operand, sometimes
| rW := ? -- the successor instruction, sometimes
end
10.2. MMIX INSTRUCTIONS 89
Integer overflow
It’s not clear exactly what effect overflow has on the state of the MMIX machine. I note the following fromtrips and traps.
89a 〈MMIX basics 87〉+≡ (91d) ⊳ 88e 89b ⊲
module Event is
locations
event is rA @loc [8 bits at 0]
enable is rA @loc [8 bits at 8]
fun occur ? is
if enable and ? = 0 then
event := event or ?
else
Interrupt.trip
fi
module Mask is
val [D V W I O U Z X] is [0x80 0x40 0x20 0x10 0x08 0x04 0x02 0x01]
end
val [D V W I O U Z X] is occur [0x80 0x40 0x20 0x10 0x08 0x04 0x02 0x01]
end
We can now define some overflows.
89b 〈MMIX basics 87〉+≡ (91d) ⊳ 89a 89c ⊲
module Ovfl is
-- overflow because a value won’t fit
fun noFit n pow2 is n >= pow2 orelse n < neg pow2
fun o1 n is noFit n 0x100 --> Event.O
fun o2 n is noFit n 0x10000 --> Event.O
fun o4 n is noFit n 0x100000000 --> Event.O
fun o8 n is noFit n 0x10000000000000000 --> Event.O
-- overflows for addition and subtraction
fun sign x is bit (x < 0)
fun add x y is
let val s is x + y
in sign x = sign y andalso sign x <> sign s --> Event.O
end
fun sub x y is
let val d is x - y
in sign x <> sign y andalso sign x <> sign d --> Event.O
end
end
Operands
The X , Y , and Z operands are all 8 bits.
89c 〈MMIX basics 87〉+≡ (91d) ⊳ 89b 90a ⊲
operand [X Y Z] : #8 bits
90 CHAPTER 10. DESCRIBING MMIX
Addresses
We prefer to treat the address as a single operand, where it occurs.
90a 〈MMIX basics 87〉+≡ (91d) ⊳ 89c
operand A : #64 bits
default attribute of
Address (Y, Z) : A is $r[Y]+$r[Z]
-- IAddress (Y, Z) : A is $r[Y]+sx Z -- does this exist?
Loading and storing
90b 〈instruction defaults 90b〉≡ (91d) 90c ⊲
LD^[B W T O] (X, A) is $r[X] := sx ([M1 M2 M4 M8] A)
LD^[B W T O]^U (X, A) is $r[X] := zx ([M1 M2 M4 M8] A)
LDHT (X, A) is $r[X] := bitInsert {wide is 0, lsb is 32} (M4 A)
LDA (X, A) is $r[X] := A
Storing signed requires some overflow checks.
90c 〈instruction defaults 90b〉+≡ (91d) ⊳ 90b 90d ⊲
STB (X, A) is M1 A := $r[X] @bits [ 8 bits at 0] | Ovfl.o1 $r[X]
STW (X, A) is M2 A := $r[X] @bits [16 bits at 0] | Ovfl.o2 $r[X]
STT (X, A) is M4 A := $r[X] @bits [32 bits at 0] | Ovfl.o4 $r[X]
STO (X, A) is M8 A := $r[X]
Storing unsigned never overflows.
90d 〈instruction defaults 90b〉+≡ (91d) ⊳ 90c 90e ⊲
STBU (X, A) is M1 A := $r[X] @bits [ 8 bits at 0]
STWU (X, A) is M2 A := $r[X] @bits [16 bits at 0]
STTU (X, A) is M4 A := $r[X] @bits [32 bits at 0]
STOU (X, A) is M8 A := $r[X]
90e 〈instruction defaults 90b〉+≡ (91d) ⊳ 90d 90f ⊲
STHT (X, A) is M4 A := $r[X] @bits [32 bits at 32]
STCO (X, A) is M8 A := zx X
Arithmetic
90f 〈instruction defaults 90b〉+≡ (91d) ⊳ 90e 91a ⊲
ADD (X, Y, Z) is $r[X] := $r[Y] + $r[Z] | Ovfl.add $r[Y] $r[Z]
SUB (X, Y, Z) is $r[X] := $r[Y] - $r[Z] | Ovfl.sub $r[Y] $r[Z]
MUL (X, Y, Z) is
let val p is muls($r[Y], $r[Z])
in $r[X] := p@bits[64 bits at 0] | Ovfl.o8 p
end
DIV (X, Y, Z) is
if $r[Z] <> 0 then
$r[X] := $r[Y] divs $r[Z] | rR := $r[Y] mods $r[Z]
else
$r[X] := 0 | rR := $r[Y]
fi
10.3. PUTTING IT ALL TOGETHER 91
91a 〈instruction defaults 90b〉+≡ (91d) ⊳ 90f 91b ⊲
ADDU (X, Y, Z) is $r[X] := $r[Y] + $r[Z]
SUBU (X, Y, Z) is $r[X] := $r[Y] - $r[Z]
MULU (X, Y, Z) is
let val p is muls($r[Y], $r[Z])
in $r[X] := p@bits[64 bits at 0] | rH := p@bits[64 bits at 64]
end
DIV (X, Y, Z) is
if ugt($r[Z], rD) then
let val big is bitInsert {wide is zx $r[Y], lsb is 64} rD : #128 bits
in $r[X] := (big divu $r[Z])@bits[64 bits at 0] |
rR := (big modu $r[Z])@bits[64 bits at 0]
end
else
$r[X] := rD | rR := $r[Y]
fi
For address arithmetic, the definition using shifting is much easier than Knuth’s integers.
91b 〈instruction defaults 90b〉+≡ (91d) ⊳ 91a 91c ⊲
["2" "4" "8" "16"]^ADDU (X, Y, Z) is
$r[X] := shl(64, $r[Y], [1 2 3 4] : #64 bits) + $r[Z]
91c 〈instruction defaults 90b〉+≡ (91d) ⊳ 91b
NEG (X, Y, Z) is $r[X] := zx Y - $r[Z] | Ovfl.sub (zx Y) $r[Z]
NEGU (X, Y, Z) is $r[X] := zx Y - $r[Z]
10.3 Putting it all together
This code chunk is expanded into our λ-RTL description of the MMIX.
91d 〈mmix.rtl 91d〉≡module MMix is
import [RTL Vector]
from StdOperators import
[add and andalso bit bitInsert bool borrow carry com divu divs
modu mods muls mulu ne neg not or orelse quot shl shrl shra subtract sx ugt xor zx
--> := | ; = <> + - < <= > >= ? IEEE754
]
from StdOperators.IEEE754 import
[fadd fcmp fdiv f2f f2i fabs fmulx fsqrt i2f fmul fneg fsub]
〈MMIX basics 87〉default attribute of
〈instruction defaults 90b〉end
Code written to file mmix.rtl.
Chapter 11
Describing the PowerPC (byMatthew Fluet)
This document uses λ-RTL to specify the PPC instruction set.
11.1 Overview
The PPC module encapsulates the λ-RTL definition of the PPC instruction set.
92 〈ppc.rtl 92〉≡module PPC is
〈imports 93a〉〈extensions 93b〉
〈storage spaces 94a〉〈operands 98c〉〈instruction semantics 99b〉
end
Code written to file ppc.rtl.
92
11.1. OVERVIEW 93
The PPC module imports a standard set of RTL operators.
93a 〈imports 93a〉≡ (92)
import [RTL Vector]
from StdOperators import
[ := --> | ; ?
not conjoin disjoin
and or com xor
neg + carry - borrow mul mulu div divu mod quot rem
zx sx shl shrl shra rotl
bitInsert bitExtract bitTransfer
eq ne = >= > <= < <> geu gtu leu ltu
bit bool
]
from StdOperators.IEEE754 import
[ Round
fadd fdiv fmul fsub fmulx fneg
f2f f2i i2f
]
We extend the set of imported operators with some standard functions.It seems more natural to have all of the arithmetic comparisons use the same naming convention.
93b 〈extensions 93b〉≡ (92) 93c ⊲
val ge is (>=)
val gt is (>)
val le is (<=)
val lt is (<)
The PPC has primitive bitwise operations for “AND with complement,” “OR with complement,”“NAND,” “NOR,” and equivalent.
93c 〈extensions 93b〉+≡ (92) ⊳ 93b 93d ⊲
fun [andc orc](a, b) is [and or](a, com(b))
fun [nand nor](a, b) is com([and or](a, b))
fun eqv (a, b) is com(xor(a, b))
One can also define boolean operations corresponding to XOR and equivalent.
93d 〈extensions 93b〉+≡ (92) ⊳ 93c
fun xjoin (a, b) is disjoin(conjoin(a, not b), conjoin(not a, b))
fun eqjoin (a, b) is not(xjoin(a, b))
94 CHAPTER 11. DESCRIBING THE POWERPC (BY MATTHEW FLUET)
11.2 Storage spaces
Storage locations manipulated by PPC instructions include memory and several kinds of registers. λ-RTL
limitation: Storage space identifiers must be one character and have strange scoping rules: two storagespaces in different modules cannot share the same identifier, but storage space identifiers must be accessedby module selection in expressions.
94a 〈storage spaces 94a〉≡ (92)
〈Memory 94b〉〈General-Purpose Registers 94c〉〈Floating-Point Registers 95a〉〈Condition Register 95b〉〈Floating-Point Status and Control Register 96a〉〈XER Register 96b〉〈Link Register 97a〉〈Count Register 97b〉〈Instruction Addresses 97c〉〈Time Base Registers 98a〉〈Bogus 98b〉
Memory
Memory in the PPC is byte addressed, using big-endian byte order. λ-RTL limitation: Storage methods(e.g., aggregate using) are not implemented. Hence, any aggregation of the memory storage space is givena bogus aggregation.
94b 〈Memory 94b〉≡ (94a)
module Memory is
storage
’m’ is
cells of 8 bits
called "Memory"
aggregate using RTL.AGGB
fun mem n is let val n is n : #32 bits in $m[n] end
end
val mem is Memory.mem
General-purpose registers
The PPC has thirty-two 32-bit general purpose registers.
94c 〈General-Purpose Registers 94c〉≡ (94a)
module GPR is
storage
’g’ is
32 cells of 32 bits
called "General-Purpose Registers"
fun gpr n is $g[n]
end
val gpr is GPR.gpr
11.2. STORAGE SPACES 95
Floating-point registers
The PPC has thirty-two 64-bit floating point registers.
95a 〈Floating-Point Registers 95a〉≡ (94a)
module FPR is
storage
’f’ is
32 cells of 64 bits
called "Floating-Point Registers"
fun fpr n is $f[n]
end
val fpr is FPR.fpr
Condition register
The PPC has one 32-bit condition register. This register is further divided into eight 4-bit fields. The firsttwo fields can be the implicit results of integer and floating-point instructions respectively.
95b 〈Condition Register 95b〉≡ (94a)
module CR is
storage
’c’ is
1 cell of 32 bits
called "Condition Register"
val cr is $c[0]
fun crf n is $c[0]@loc[4 bits at mul(4 : #3 bits, n : #3 bits)]
fun crb n is $c[0]@loc[1 bit at n : #5 bits]
locations
[LT GT EQ SO] is crb [0..3]
[FX FEX VX OX] is crb [4..7]
end
val cr is CR.cr
val crf is CR.crf
val crb is CR.crb
96 CHAPTER 11. DESCRIBING THE POWERPC (BY MATTHEW FLUET)
Floating-point status and control register
The PPC has one 32-bit floating-point status and control register.
96a 〈Floating-Point Status and Control Register 96a〉≡ (94a)
module FPSCR is
storage
’F’ is
1 cell of 32 bits
called "Floating-Point Status and Control Register"
fun fpsrcb n is $F[0]@loc[1 bit at n : #5 bits]
locations
[FX FEX VX OX UX ZX XX] is fpsrcb([0..6])
[VXSNAN VXISI VXIDI VXZDZ VXIMZ VXVC] is fpsrcb([7..12])
[FR FI] is fpsrcb([13..14])
[FPRF] is $F[0]@loc[5 bits at [15] : #5 bits]
[C] is fpsrcb([15])
[FPCC] is $F[0]@loc[4 bits at [16] : #5 bits]
[_] is fpsrcb([20])
[VXSOFT VXSQRT VXCVI] is fpsrcb([21..23])
[VE OE UE ZE XE] is fpsrcb([24..28])
[NI] is fpsrcb([29])
[RN] is $F[0]@loc[2 bits at [30] : #5 bits]
end
XER register
The PPC has one 32-bit XER register.
96b 〈XER Register 96b〉≡ (94a)
module XER is
storage
’x’ is
1 cell of 32 bits
called "XER Register"
val xer is $x[0]
fun xerf n is $x[0]@loc[4 bits at mul(4 : #3 bits, n : #3 bits)]
fun xerb n is $x[0]@loc[1 bit at n : #5 bits]
locations
XER is $x[0]
[SO OV CA] is xerb [0 1 2]
[byte_count] is $x[0]@loc[7 bits at [25] : #5 bits]
end
val xer is XER.xer
val xerf is XER.xerf
11.2. STORAGE SPACES 97
Link register
The PPC has one 32-bit link register.
97a 〈Link Register 97a〉≡ (94a)
module LR is
storage
’L’ is
1 cell of 32 bits
called "Link Register"
locations
LR is $L[0]
end
val lr is LR.LR
Count register
The PPC has one 32-bit count register.
97b 〈Count Register 97b〉≡ (94a)
module CTR is
storage
’C’ is 1 cell of 32 bits
called "Count Register"
locations
CTR is $C[0]
end
val ctr is CTR.CTR
Instruction Addresses
The PPC does not specifically define a PC register. The architecture manual specifies branching and linkinginstructions in terms of cia (current instruction address) and nia (new instruction address). This distinctionis reflected by a storage space with two 32-bit cells.
97c 〈Instruction Addresses 97c〉≡ (94a)
module IA is
storage
’I’ is 2 cells of 32 bits
called "Instruction Addresses"
locations
CIA is $I[0]
NIA is $I[1]
end
val cia is IA.CIA
val nia is IA.NIA
98 CHAPTER 11. DESCRIBING THE POWERPC (BY MATTHEW FLUET)
Time base registers
The PPC virtual environment architecture defines an addition set of 32-bit time base registers.
98a 〈Time Base Registers 98a〉≡ (94a)
module TBR is
storage
’t’ is 2 cells of 32 bits
called "Time Base Registers"
locations
TBU is $t[0]
TBL is $t[1]
end
val tbu is TBR.TBU
val tbl is TBR.TBL
Bogus
The mtspr, mfspr, and mftb select a special purpose register based on one of the instruction operands. Incases where the operand does not encode a valid register, a 32-bit location may still be required. Hence, weintroduce a bogus storage space.
98b 〈Bogus 98b〉≡ (94a)
module Bogus is
storage
’b’ is
1 cell of 32 bits
called "Bogus"
val bogus is $b[0]
end
val bogus is Bogus.bogus
11.3 Operands
For the most part, the operands can be automatically derived automatically from the SLED file by lrtl
-operands ppc.sled.
98c 〈operands 98c〉≡ (92) 99a ⊲
operand L : #1 bits
operand [ crfD crfS ] : #3 bits
operand [ IMM SR ] : #4 bits
operand [ A B BI BO D MB ME NB S SH TO crbA crbB crbD fA fB fC fD fS ] : #5 bits
operand [ CRM FM ] : #8 bits
operand [ SIMM UIMM d ] : #16 bits
11.4. INSTRUCTION SEMANTICS 99
λ-RTL limitation: The following are not automatically generated. None of the operands appear in thefields of instruction (32) declaration in the SLED file. The reloc operand is a relocatable operand.The spr operand is a synthesized operand derived from the sprL and sprH fields (which, not explicitly usedas operands, are also not generated by lrtl -operands ppc.sled). The tbr is also a synthesized operand.
99a 〈operands 98c〉+≡ (92) ⊳ 98c
operand [ reloc ] : #32 bits
operand [ spr ] : #32 bits
operand [ tbr ] : #32 bits
11.4 Instruction semantics
We group the PPC instructions by functional categories.
99b 〈instruction semantics 99b〉≡ (92)
〈utilities 100a〉〈integer arithmetic instructions 102c〉〈integer compare instructions 107〉〈integer logical instructions 108〉〈integer rotate instructions 113〉〈integer shift instructions 114〉〈floating-point arithmetic instructions 115〉〈floating-point multiply-add instructions 116a〉〈floating-point rounding and conversion instructions 116b〉〈floating-point compare instructions 116c〉〈floating-point status and control register instructions 116d〉〈integer load instructions 117a〉〈integer store instructions 119b〉〈integer load and store multiple instructions 122b〉〈integer load and store string instructions 123c〉〈memory synchronization instructions 123d〉〈floating-point load instructions 123e〉〈floating-point store instructions 123f〉〈floating-point move instructions 124a〉〈branch instructions 124b〉〈condition register logical instructions 126a〉〈system linkage instructions 126c〉〈trap instructions 126d〉〈processor control instructions 126e〉〈cache management instructions 128c〉〈segment register manipulation instructions 128d〉〈lookaside buffer management instructions 128e〉〈external control instructions 129〉
100 CHAPTER 11. DESCRIBING THE POWERPC (BY MATTHEW FLUET)
Utilities
We define a number of auxilary functions for handling common cases.One common case is setting CR0 (field 0 of the condition register) corresponding to a result. λ-RTL
limitation: In this case CR.* are slices of one bit into the condition register. In order to give the illusionthat slices are locations that can be assigned to, Norman Ramsey suggests rewriting fetches and stores ofslice locations. That seems intuitive, but in a case like the one above, it seems that one would also need tochange the parallel assignment into a sequential assignement (and do some forward substitution); otherwise,one is left with a non-deterministic assigment of 4 values into the same location. Norman Ramsey’s reply“Yes, this is a serious long-term question. At the moment, we’ve extended the translator so that slices arelegitimate locations that can be assigned to. Exactly what we might do with the information, however, isanyone’s guess. For the time being, you can rely on seeing intermediate code that is more or less what youasked for.”
100a 〈utilities 100a〉≡ (99b) 100b ⊲
fun set_cr0 (res)
is (CR.LT := bit(res < 0) |
CR.GT := bit(res > 0) |
CR.EQ := bit(res = 0))
Another common case is setting the overflow, summary overflow, and carry bits of the CR and XERregisters.
100b 〈utilities 100a〉+≡ (99b) ⊳ 100a 101b ⊲
fun set_ov ov
is (ov --> (XER.SO := bit(true) |
CR.SO := bit(true)) |
XER.OV := bit(ov))
fun set_ca ca
is (XER.CA := bit(ca))
11.4. INSTRUCTION SEMANTICS 101
Some rotate and shift instructions compute a mask, having ones in positions x through y (wrapping ifx > y) and zeros elsewhere. The positions x and y can be expressed using 5 bits, but up to 32 bits might betransferred, so we zero extend the positions to 6 bits.
101a 〈utilities – broken 101a〉≡fun mask(x, y)
is
let
val x is zx #5 #6 x
val y is zx #5 #6 y
val m
is
if x < y
then bitTransfer #6 #32 {dst is {lsb is x,
value is 0x0},
src is {lsb is 0,
value is com 0x0},
width is y - x + 1}
else bitTransfer #6 #32 {dst is {lsb is x + 1,
value is com 0x0},
src is {lsb is 0,
value is 0x0},
width is x - y - 1}
fi
in
m
end
Code written to file utilities -- broken.
λ-RTL limitation: For some reason, the “Computing normal forms” pass of lrtl does not terminatewhen the auxilary function slw is defined using mask. The issue is with the expression mask(0, 31 - n).So, we define mask as an rtlop while investigating.
101b 〈utilities 100a〉+≡ (99b) ⊳ 100b 102a ⊲
rtlop mask : #5 bits * #5 bits -> #32 bits
102 CHAPTER 11. DESCRIBING THE POWERPC (BY MATTHEW FLUET)
Setting the floating-point status word is horribly complicated.
102a 〈utilities 100a〉+≡ (99b) ⊳ 101b 102b ⊲
fun rounding_mode (x)
is
let
val rn is FPSCR.RN
in
if rn = 0
then Round.nearest
else if rn = 1
then Round.zero
else if rn = 2
then Round.up
else Round.down
fi fi fi
end
fun set_cr1(res) is RTL.SKIP
Some branch instructions automatically word align a value before assigning it to nia.
102b 〈utilities 100a〉+≡ (99b) ⊳ 102a
fun align v
is bitInsert #5 #32 #2 {lsb is 30, wide is v} 0
Integer arithmetic instructions
64-bit instructions: divdx, divdux, mulhdx, mulhdux, mulldAll of the addition, subtraction, and negation instructions are specified using variations of the auxilary
function add’. This function takes a destination location, two source values, a carry-in value and a carryflag, an overflow detection flag, and a result condition flag. The flags determine which condition registerbits are set.
102c 〈integer arithmetic instructions 102c〉≡ (99b) 103 ⊲
fun add’ (d, a, b, (cai, caf), oe, rc)
is
let
val res is (+)((+)(a, b), zx cai)
val ca is carry(a, b, cai)
val ov is (conjoin(a@bits[31] = b@bits[31],
a@bits[31] <> res@bits[31]))
in
rc --> set_cr0 res |
oe --> set_ov ov |
caf --> set_ca (bool ca) |
d := res
end
fun subf’ (d, a, b, (cai, caf), oe, rc)
is
add’ (d, com a, b, (cai, caf), oe, rc)
11.4. INSTRUCTION SEMANTICS 103
We instantiate additional auxilary versions to choose values from general purpose registers or sign-extendedimmediates. These auxilary functions additionally choose carry-in values and carry flags.
103 〈integer arithmetic instructions 102c〉+≡ (99b) ⊳ 102c 104a ⊲
fun [add subf](D, A, B, (cai, caf), oe, rc)
is
[add’ subf’](gpr(D), gpr(A), gpr(B), (cai, caf), oe, rc)
fun [addme subfme](D, A, oe, rc)
is
[add’ subf’](gpr(D), gpr(A), neg 1, (XER.CA, true), oe, rc)
fun [addze subfze](D, A, oe, rc)
is
[add’ subf’](gpr(D), gpr(A), 0, (XER.CA, true), oe, rc)
fun addi (D, A, SIMM)
is
add’(gpr(D),
if A = 0 then 0 else gpr(A) fi,
sx SIMM,
(0, false), false, false)
fun addic (D, A, SIMM, rc)
is
add’(gpr(D), gpr(A), sx SIMM, (XER.CA, true), false, rc)
fun addis (D, A, SIMM)
is
add’(gpr(D),
if A = 0 then 0 else gpr(A) fi,
shl(sx SIMM, 16 : #32 bits),
(0, false), false, false)
fun subfic (D, A, SIMM)
is
subf’(gpr(D), gpr(A), sx SIMM, (1, true), false, false)
fun neg’ (D, A, oe, rc)
is
add’ (gpr(D), com(gpr(A)), 1, (0, false), oe, rc)
104 CHAPTER 11. DESCRIBING THE POWERPC (BY MATTHEW FLUET)
The individual instructions choose settings for the overflow detection flag and the result condition flag.
104a 〈integer arithmetic instructions 102c〉+≡ (99b) ⊳ 103 104b ⊲
default attribute of
add^["" "c" "e"]^["" "o"]^["" "."] (D, A, B)
is add(D, A, B,
[(0, false) (0, true) (XER.CA, true)],
[false true],
[false true])
addi (D, A, SIMM)
is addi (D, A, SIMM)
addic^["" "."] (D, A, SIMM)
is addic (D, A, SIMM, [false true])
addis (D, A, SIMM)
is addis (D, A, SIMM)
addme^["" "o"]^["" "."] (D, A, B)
is addme(D, A, [false true], [false true])
addze^["" "o"]^["" "."] (D, A, B)
is addze(D, A, [false true], [false true])
neg^["" "o"]^["" "."] (D, A, B)
is neg’(D, A, [false true], [false true])
subf^["" "c" "e"]^["" "o"]^["" "."] (D, A, B)
is subf(D, A, B,
[(1, false) (1, true) (XER.CA, true)],
[false true],
[false true])
subfic (D, A, SIMM)
is subfic (D, A, SIMM)
subfme^["" "o"]^["" "."] (D, A, B)
is subfme(D, A, [false true], [false true])
subfze^["" "o"]^["" "."] (D, A, B)
is subfze(D, A, [false true], [false true])
In order to specify the correct overflow semantics for the multiply instructions, we introduce an auxilaryfunction sov that examines the low and high order 32-bits of a 64-bit value to determine if signed overflowwould occur when truncating the value to 32-bits.
104b 〈integer arithmetic instructions 102c〉+≡ (99b) ⊳ 104a 105 ⊲
fun sov (resLo, resHi)
is
let
val signLo is bitExtract #5 #32 #1 {lsb is 0 : #5 bits, wide is resLo}
in
if resHi = 0
then signLo = 0
else if resHi = com 0
then signLo = 1
else true
fi
fi
end
11.4. INSTRUCTION SEMANTICS 105
Using the sov function, we define an auxilary function mul’ that factors out many of the common patternsin the multiplication instructions.
105 〈integer arithmetic instructions 102c〉+≡ (99b) ⊳ 104b 106 ⊲
fun mul’ (mul, d, a, b, lh, oe, rc)
is
let
val res is mul(a, b)
val resLo is bitExtract #6 #64 #32 {lsb is 32 : #6 bits, wide is res}
val resHi is bitExtract #6 #64 #32 {lsb is 0 : #6 bits, wide is res}
val res’ is if lh then resHi else resLo fi
in
rc --> set_cr0 res |
oe --> set_ov (sov (resLo, resHi)) |
d := res’
end
fun mullw (D, A, B, oe, rc)
is mul’ (mul, gpr(D), gpr(A), gpr(B), false, oe, rc)
fun mulhw (mul, D, A, B, rc)
is mul’ (mul, gpr(D), gpr(A), gpr(B), true, false, rc)
fun mulli (D, A, SIMM)
is mul’ (mul, gpr(D), gpr(A), sx SIMM, false, false, false)
default attribute of
[mulhw mulhwu]^["" "."] (D, A, B)
is mulhw ([mul mulu], D, A, B, [false true])
mulli (D, A, SIMM)
is mulli (D, A, SIMM)
mullw^["" "o"]^["" "."] (D, A, B)
is mullw (D, A, B, [false true], [false true])
106 CHAPTER 11. DESCRIBING THE POWERPC (BY MATTHEW FLUET)
The division instructions are standard. The boolean ok is true if the result of the division can be expressedin 32 bits. When the result cannot be expressed in 32 bits, the contents of general purpose register D areundefined, as are the contents of CR.LT, CR.GT, and CR.EQ. λ-RTL limitation: Quoted from SpecifyingInstructions’ Semantics Using λ-RTL (Interim Report): “As of July 1999, the λ-RTL translator does notsupport the kill effects. In a later implementation, we plan to use the question mark to stand for an “unde-fined” value. All λ-RTL functions and RTL operators will be strict in this value, and RTL.STORE(loc, ?)
will become RTL.KILL loc. Because we would like to use the question mark in our example descriptions,even though λ-RTL doesn’t yet have the machinery to give it its eventual meaning, we introduce it as aconstant.”
106 〈integer arithmetic instructions 102c〉+≡ (99b) ⊳ 105
fun divw (D, A, B, oe, rc)
is
let
val a is gpr(A)
val b is gpr(B)
val dividend is (gpr(A))
val divisor is (gpr(B))
val res is (div)(dividend, divisor)
val ok is not (disjoin(conjoin(a = (0x80000000 : #32 bits),
b = neg (1 : #32 bits)),
b = (0 : #32 bits)))
val res’ is if ok then res else ? fi
in
rc --> set_cr0 res’ |
oe --> set_ov (not ok) |
gpr(D) := res’
end
fun divwu (D, A, B, oe, rc)
is
let
val a is gpr(A)
val b is gpr(B)
val dividend is (gpr(A))
val divisor is (gpr(B))
val res is (divu)(dividend, divisor)
val ok is not (b = 0)
val res’ is if ok then res else ? fi
in
rc --> set_cr0 res’ |
oe --> set_ov (not ok) |
gpr(D) := res’
end
default attribute of
divw^["" "o"]^["" "."] (D, A, B)
is divw(D, A, B, [false true], [false true])
11.4. INSTRUCTION SEMANTICS 107
divwu^["" "o"]^["" "."] (D, A, B)
is divwu(D, A, B, [false true], [false true])
Integer compare instructions
The arithmetic and logical comparision instructions can be concisely specified by an auxilary function cmp’
that computes the appropriate condition field value from application of a less-than operator to the sourcegeneral purpose register and a 32-bit source value, and stores the result in the destination condition registerfield. Further auxilary functions compute the 32-bit source value either from a general purpose register oran signed or unsigned immediate value.
The L operand must be 0 for 32-bit implementations. This is enforced by the SLED file.
107 〈integer compare instructions 107〉≡ (99b)
fun cmp’ (lt, crfD, A, b)
is
let
val a is gpr(A) : #32 bits
val c is if lt(a, b)
then 8 -- 0b1000
else if lt(b, a)
then 4 -- 0b0100
else 2 -- 0b0010
fi
fi
val res is bitInsert #2 #4 #1 {lsb is 3, wide is c} XER.SO
in
crf(crfD) := res
end
fun cmp (lt, crfD, A, B)
is cmp’ (lt, crfD, A, gpr(B))
fun cmpi (crfD, A, SIMM)
is cmp’ (lt, crfD, A, sx SIMM)
fun cmpli (crfD, A, UIMM)
is cmp’ (ltu, crfD, A, zx UIMM)
default attribute of
cmp^["" "l"] (crfD, L, A, B)
is cmp([lt ltu], crfD, A, B)
cmpi (crfD, L, A, SIMM)
is cmpi (crfD, A, SIMM)
cmpli (crfD, L, A, UIMM)
is cmpli (crfD, A, UIMM)
108 CHAPTER 11. DESCRIBING THE POWERPC (BY MATTHEW FLUET)
Integer logical instructions
64-bit instructions: cntlzdx, extswxThe standard logical instructions can be concisely specified by an auxilary function log’ that computes
the result of applying a logical operation to the source general purpose register and a 32-bit source value,and stores the result in the destination general purpose register. Further auxilary functions compute the32-bit source value either from a general purpose register or an unsigned immediate value.
108 〈integer logical instructions 108〉≡ (99b) 109 ⊲
fun log’ (A, S, b, OP, rc)
is
let
val res is OP(gpr(S), b)
in
rc --> set_cr0(res) |
gpr(A) := res
end
fun log (A, S, B, OP, rc)
is log’ (A, S, gpr(B), OP, rc)
fun logi (A, S, UIMM, OP, rc)
is log’ (A, S, zx UIMM, OP, rc)
fun logis (A, S, UIMM, OP, rc)
is log’ (A, S, shl(zx UIMM, 16), OP, rc)
default attribute of
[and andc eqv nand nor or orc xor]^["" "."] (A, S, B)
is log(A, S, B, [and andc eqv nand nor or orc xor], [false true])
[and]^"i"^["" "s"]^"." (A, S, UIMM)
is
foreach OP in [and]
foreach f in [logi logis]
group
f (A, S, UIMM, OP, true)
end
[or xor]^"i"^["" "s"] (A, S, UIMM)
is
foreach OP in [or xor]
foreach f in [logi logis]
group
f (A, S, UIMM, OP, false)
end
11.4. INSTRUCTION SEMANTICS 109
The extend sign byte and extend sign half word instructions are specified in a similar manner. Thedifferent sizes of the extracted value in the two instructions prevents further code-sharing.
109 〈integer logical instructions 108〉+≡ (99b) ⊳ 108 112 ⊲
fun ext’ (A, res, rc)
is
let
in
rc --> set_cr0(res) |
gpr(A) := res
end
fun extsb (A, S, rc)
is
let
val res is sx #8 #32 (bitExtract #5 #32 #8
{lsb is 24, wide is gpr(S)})
in
ext’(A, res, rc)
end
fun extsh (A, S, rc)
is
let
val res is sx #16 #32 (bitExtract #5 #32 #16
{lsb is 16, wide is gpr(S)})
in
ext’(A, res, rc)
end
default attribute of
exts^["b" "h"]^["" "."] (A, S)
is [extsb extsh](A, S, [false true])
110 CHAPTER 11. DESCRIBING THE POWERPC (BY MATTHEW FLUET)
The count leading zeros word instruction is not easily specified in λ-RTL. The architecture manual de-scribes the instruction in pseudo-code as follows:
n ← 0do while n < 32
if rS[n] = 1 then leave
n ← n + 1rA ← n
However, λ-RTL does permit the declaration of recursive functions; hence, a general while construct cannotbe written. One way of specifying the cntlzwx instruction is with a Vector.foldr.
110 〈integer logical instructions – broken 110〉≡ 111 ⊲
fun cntlz (A, S, rc)
is
let
fun zero n is bool((gpr(S))@loc[1 bit at n : #5 bits] : #1 bits)
val (res, _)
is
Vector.foldr (\ (n, (c, b)). if conjoin(not b, zero n)
then (n, true)
else (c, b)
fi)
(0, false)
(Vector.spanning 0 31)
in
rc --> set_cr0(res) |
gpr(A) := res
end
default attribute of
cntlzw^["" "."] (A, S)
is cntlz (A, S, [false true])
Code written to file integer logical instructions -- broken.
11.4. INSTRUCTION SEMANTICS 111
λ-RTL limitation: However, λ-RTL fails to expand this in a concise manner. Furthermore, lrtl dieswith Error: Toolkit bug: impossible value from attribute not by selection: of an if expres-sion. It is unclear where and how the if expression was supposed to be eliminated.
An alternate specification of cntlzx can be given by nested if statements:
111 〈integer logical instructions – broken 110〉+≡ ⊳ 110
fun cntlz (A, S, rc)
is
let
fun zero n is bool((gpr(S))@loc[1 bit at n : #5 bits] : #1 bits)
val res
is
if zero 0
then if zero 1
then if zero 2
then if zero 3
then if zero 4
then if zero 5
then if zero 6
then if zero 7
then if zero 8
then if zero 9
then if zero 10
then if zero 11
then if zero 12
then if zero 13
then if zero 14
then if zero 15
then if zero 16
then if zero 17
then if zero 18
then if zero 19
then if zero 20
then if zero 21
then if zero 22
then if zero 23
then if zero 24
then if zero 25
then if zero 26
then if zero 27
then if zero 28
then if zero 29
then if zero 30
then if zero 31
then 32
else 31 fi
else 30 fi
else 29 fi
else 28 fi
else 27 fi
112 CHAPTER 11. DESCRIBING THE POWERPC (BY MATTHEW FLUET)
else 26 fi
else 25 fi
else 24 fi
else 23 fi
else 22 fi
else 21 fi
else 20 fi
else 19 fi
else 18 fi
else 17 fi
else 16 fi
else 15 fi
else 14 fi
else 13 fi
else 12 fi
else 11 fi
else 10 fi
else 9 fi
else 8 fi
else 7 fi
else 6 fi
else 5 fi
else 4 fi
else 3 fi
else 2 fi
else 1 fi
else 0 fi : #32 bits
in
rc --> set_cr0(res) |
gpr(A) := res
end
default attribute of
cntlzw^["" "."] (A, S)
is cntlz (A, S, [false true])
This specification generates a very large RTL. Instead, we choose to define an RTL operator for theinstruction.
112 〈integer logical instructions 108〉+≡ (99b) ⊳ 109
rtlop cntlz: #n bits -> #k bits
fun cntlz (A, S, rc)
is
let
val res is cntlz (gpr(S))
in
rc --> set_cr0(res) |
gpr(A) := res
end
default attribute of
cntlzw^["" "."] (A, S)
is cntlz (A, S, [false true])
11.4. INSTRUCTION SEMANTICS 113
Integer rotate instructions
64-bit instructions: rldclx, rldcrx, rldicx, rldiclx, rldicrx, rldimixThe integer rotate instructions are standard.
113 〈integer rotate instructions 113〉≡ (99b)
fun rlwnm (A, S, n, MB, ME, rc)
is
let
val r is rotl (32, gpr(S), n)
val m is mask(MB, ME)
val res is and(r, m)
in
rc --> set_cr0 res |
gpr(A) := res
end
fun rlwnmi (A, S, n, MB, ME, rc)
is
let
val r is rotl (32, gpr(S), n)
val m is mask(MB, ME)
val res is or(and(r, m), and(gpr(A), com m))
in
rc --> set_cr0 res |
gpr(A) := res
end
default attribute of
rlwimi^["" "."] (A, S, SH, MB, ME)
is rlwnm (A, S,
SH,
MB, ME, [false true])
rlwnm^["" "."] (A, S, B, MB, ME)
is rlwnm (A, S,
(bitExtract #5 #32 #5 {lsb is 27, wide is gpr(B)}),
MB, ME, [false true])
rlwinm^["" "."] (A, S, SH, MB, ME)
is rlwnmi (A, S,
SH,
MB, ME, [false true])
114 CHAPTER 11. DESCRIBING THE POWERPC (BY MATTHEW FLUET)
Integer shift instructions
64-bit instructions: sldx, sradx, sradix, srdxThe integer shift instructions are standard. There may be more patterns that can be factored out, but it
is not immediately obvious.
114 〈integer shift instructions 114〉≡ (99b)
fun slw (A, S, B, rc)
is
let
val n is (bitExtract #5 #32 #5 {lsb is 27, wide is gpr(B)})
val r is rotl (32, gpr(S), n)
val m is if bitExtract #5 #32 #1 {lsb is 26, wide is gpr(B)} = 0
then mask(0, 31 - n)
else 0x0
fi
val res is and(r, m)
in
rc --> set_cr0 res |
gpr(A) := res
end
fun srw (A, S, B, rc)
is
let
val n is (bitExtract #5 #32 #5 {lsb is 27, wide is gpr(B)})
val r is rotl (32, gpr(S), 32 - n)
val m is if bitExtract #5 #32 #1 {lsb is 26, wide is gpr(B)} = 0
then mask(n, 31)
else 0x0
fi
val res is and(r, m)
in
rc --> set_cr0 res |
gpr(A) := res
end
fun sraw (A, S, n, rc)
is
let
val r is rotl (32, gpr(S), 32 - n)
val m is if bitExtract #5 #32 #1 {lsb is 26, wide is r} = 0
then mask(0, 31)
else 0x0
fi
val S is bitExtract #5 #32 #1 {lsb is 31, wide is gpr(S)}
val S’ is if bool(S) then com 0x0 else 0x0 fi
val res is or(and(r, m), and(S’, com m))
val ca is and(S, bit(and(r, com m) = 0))
in
rc --> set_cr0 res |
XER.CA := ca |
gpr(A) := res
11.4. INSTRUCTION SEMANTICS 115
end
default attribute of
slw^["" "."] (A, S, B)
is slw (A, S, B, [false true])
srw^["" "."] (A, S, B)
is srw (A, S, B, [false true])
sraw^["" "."] (A, S, B)
is sraw (A, S,
bitExtract #5 #32 #5 {lsb is 27, wide is gpr(B)},
[false true])
srawi^["" "."] (A, S, SH)
is sraw (A, S,
SH,
[false true])
Floating-point arithmetic instructions
Unimplemented: all – setting of condition codesOptional instructions: fresx, frsqrtex, fselx, fsqrtx, fsqrtsx
115 〈floating-point arithmetic instructions 115〉≡ (99b)
fun fbinop’ (fD, fA, fB, OP, cp, rc)
is
let
val rm is rounding_mode (0)
val res is OP (fpr(fA), fpr(fD), rm)
val res is cp (res, rm)
in
rc --> set_cr1(res) |
fpr(fD) := res
end
fun funop’ (fD, fA, fB, OP, cp, rc)
is
let
val rm is rounding_mode (0)
val res is OP (fpr(fA), fpr(fD), rm)
val res is cp (res, rm)
in
rc --> set_cr1(res) |
fpr(fD) := res
end
fun dp (x, rm) is x
fun sp (x, rm) is f2f #32 #64 (f2f #64 #32 (x, rm), rm)
default attribute of
f^[add div sub]^["" "s"]^["" "."] (fD, fA, fB)
is fbinop’ (fD, fA, fB, [fadd fdiv fsub], [dp sp], [false true])
f^[mul]^["" "s"]^["" "."] (fD, fA, fC)
is fbinop’ (fD, fA, fC, [fmul], [dp sp], [false true])
116 CHAPTER 11. DESCRIBING THE POWERPC (BY MATTHEW FLUET)
Floating-point multiply-add instructions
Unimplemented: all – setting of condition codesUse the exact floating-point multiply to compute a precise intermediate result, then use the convert-
precision (cp) function to map the result down to the appropriate precision.
116a 〈floating-point multiply-add instructions 116a〉≡ (99b)
fun fmulop’ (fD, fA, fB, fC, neg, OP, cp, rc)
is
let
val rm is rounding_mode (0)
val res is fmulx #64 #128 (fpr(fA), fpr(fC))
val b is f2f #64 #128 (fpr(fB), Round.nearest)
val res is OP (res, b, rm)
val res is cp (res, rm)
val res is if neg then fneg res else res fi
in
rc --> set_cr1(res) |
fpr(fD) := res
end
fun dp (x, rm) is f2f #128 #64 (x, rm)
fun sp (x, rm) is f2f #32 #64 (f2f #128 #32 (x, rm), rm)
default attribute of
f^["" "n"]^m^[add sub]^["" "s"]^["" "."] (fD, fA, fB, fC)
is fmulop’ (fD, fA, fB, fC,
[false true], [fadd fsub], [dp sp], [false true])
Floating-point rounding and conversion instructions
Unimplemented: all64-bit instructions: fcfidx, fctidx, fctidzx
116b 〈floating-point rounding and conversion instructions 116b〉≡ (99b)
Floating-point compare instructions
Unimplemented: all
116c 〈floating-point compare instructions 116c〉≡ (99b)
Floating-point status and control register instructions
Unimplemented: all
116d 〈floating-point status and control register instructions 116d〉≡ (99b)
11.4. INSTRUCTION SEMANTICS 117
Integer load instructions
64-bit instructions: ld, ldu, ldux, ldx, lwa, lwaux, lwaxAll load instructions can be factored to use the following auxilary function l’. d is a destination location,
x is an extraction function (to extract and extend the correct number of bits from memory), and ea is theeffective address from which the value is loaded. A and u are used to update general purpose register A withthe effective address. λ-RTL limitation: It would be nice to have an alpha option datatype in λ-RTL,rather than passing both the register number and a flag indicating whether or not to update.
117a 〈integer load instructions 117a〉≡ (99b) 117b ⊲
fun l’ (d, x, ea, A, u)
is
let
in
d := x ea |
u --> gpr(A) := ea
end
Two more auxilary functions distinguish between updating and non-updating loads. A non-updating loadrequires a special case when A = 0.
117b 〈integer load instructions 117a〉+≡ (99b) ⊳ 117a 118 ⊲
fun l (D, x, A, o)
is let val ea is if A = 0 then 0 else gpr(A) fi + o
in l’ (gpr(D), x, ea, A, false) end
fun lu (D, x, A, o)
is let val ea is gpr(A) + o
in l’ (gpr(D), x, ea, A, true) end
118 CHAPTER 11. DESCRIBING THE POWERPC (BY MATTHEW FLUET)
We specify various extraction functions, including half-word byte reverse and word byte-reverse.
118 〈integer load instructions 117a〉+≡ (99b) ⊳ 117b 119a ⊲
fun byte ea is zx #8 #32 (mem(ea))
fun hword ea is zx #16 #32 (mem(ea))
fun hwordr ea
is
let
val v1 is mem(ea + 1)
val v2 is mem(ea + 0)
val v’ is 0 : #32 bits
val v’ is bitInsert #5 #32 #8 {lsb is 0, wide is v’} v1
val v’ is bitInsert #5 #32 #8 {lsb is 8, wide is v’} v2
in
v’
end
fun hworda ea is sx #16 #32 (mem(ea))
fun word ea is mem(ea)
fun wordr ea
is
let
val v1 is mem(ea + 3)
val v2 is mem(ea + 2)
val v3 is mem(ea + 1)
val v4 is mem(ea + 0)
val v’ is 0 : #32 bits
val v’ is bitInsert #5 #32 #8 {lsb is 0, wide is v’} v1
val v’ is bitInsert #5 #32 #8 {lsb is 8, wide is v’} v2
val v’ is bitInsert #5 #32 #8 {lsb is 16, wide is v’} v3
val v’ is bitInsert #5 #32 #8 {lsb is 24, wide is v’} v4
in
v’
end
11.4. INSTRUCTION SEMANTICS 119
These load and extraction functions can be used to specify the majority of the load instructions.
119a 〈integer load instructions 117a〉+≡ (99b) ⊳ 118
default attribute of
l^[bz ha hz wz]^["" "u"] (D, d, A)
is
foreach x in [byte hworda hword word]
foreach f in [l lu]
group
f(D, x, A, sx #16 #32 d)
end
l^[bz ha hz wz]^["" "u"]^x (D, A, B)
is
foreach x in [byte hworda hword word]
foreach f in [l lu]
group
f(D, x, A, gpr(B))
end
l^[h w]^brx (D, A, B)
is
foreach x in [hwordr wordr]
group
lu(D, x, A, gpr(B))
end
Integer store instructions
64-bit instructions: std, stdu, stdux, stdxA similar factoring technique expresses all store instructions using the following auxilary function st’. v
is the value to be stored, ea is the effective address to which the value is stored. A and u are used to updategeneral purpose register A with the effective address.
119b 〈integer store instructions 119b〉≡ (99b) 119c ⊲
fun st’ (v, ea, A, u)
is
let
in
mem(ea) := v |
u --> gpr(A) := ea
end
Two more auxilary functions distinguish between updating and non-updating stores. A non-updatingstore requires a special case when A = 0.
119c 〈integer store instructions 119b〉+≡ (99b) ⊳ 119b 120 ⊲
fun st (v, A, o)
is st’ (v, if A = 0 then 0 else gpr(A) fi + o, A, false)
fun stu (v, A, o)
is st’ (v, gpr(A) + o, A, true)
120 CHAPTER 11. DESCRIBING THE POWERPC (BY MATTHEW FLUET)
We specify various extraction functions, including half-word byte reverse and word byte-reverse.
120 〈integer store instructions 119b〉+≡ (99b) ⊳ 119c 121 ⊲
fun byte S is bitExtract #5 #32 #8 {lsb is 24, wide is gpr(S)}
fun hword S is bitExtract #5 #32 #16 {lsb is 16, wide is gpr(S)}
fun hwordr S
is
let
val v1 is bitExtract #5 #32 #8 {lsb is 24, wide is gpr(S)}
val v2 is bitExtract #5 #32 #8 {lsb is 16, wide is gpr(S)}
val v’ is 0 : #16 bits
val v’ is bitInsert #5 #16 #8 {lsb is 0, wide is v’} v1
val v’ is bitInsert #5 #16 #8 {lsb is 8, wide is v’} v2
in
v’
end
fun word S is gpr(S)
fun wordr S
is
let
val v1 is bitExtract #5 #32 #8 {lsb is 24, wide is gpr(S)}
val v2 is bitExtract #5 #32 #8 {lsb is 16, wide is gpr(S)}
val v3 is bitExtract #5 #32 #8 {lsb is 8, wide is gpr(S)}
val v4 is bitExtract #5 #32 #8 {lsb is 0, wide is gpr(S)}
val v’ is 0 : #32 bits
val v’ is bitInsert #5 #32 #8 {lsb is 0, wide is v’} v1
val v’ is bitInsert #5 #32 #8 {lsb is 8, wide is v’} v2
val v’ is bitInsert #5 #32 #8 {lsb is 16, wide is v’} v3
val v’ is bitInsert #5 #32 #8 {lsb is 24, wide is v’} v4
in
v’
end
11.4. INSTRUCTION SEMANTICS 121
These store and extraction functions can be used to specify the majority of the store instructions.
121 〈integer store instructions 119b〉+≡ (99b) ⊳ 120
default attribute of
st^[b h w]^["" "u"] (S, d, A)
is
foreach x in [byte hword word]
foreach f in [st stu]
group
f(x S, A, sx #16 #32 d)
end
st^[b h w]^["" "u"]^x (S, B, A)
is
foreach x in [byte hword word]
foreach f in [st stu]
group
f(x S, A, gpr(B))
end
st^[h w]^brx (S, A, B)
is
foreach x in [hwordr wordr]
group
stu(x S, A, gpr(B))
end
122 CHAPTER 11. DESCRIBING THE POWERPC (BY MATTHEW FLUET)
Integer load and store multiple instructions
The PPC lmw instruction loads n consecutive words starting at the computed effective address into thegeneral purpose registers D through 31. λ-RTL limitation: One would really like to specify val rs is
Vector.spanning D 31 and eliminate the r >= D tests. However, D is an instantiation-time constant,not a compile-time constant. λ-RTL limitation: lrtl is prohibitively slow when compiling this instruc-tion. Unrolling the Vector.foldr expression produces a very large expression. Furthermore, lrtl recog-nizes that at instantiation-time, the r >= D can be evaluated. However, it does not recognize the fact that1 >= D⇒ 2 >= D. Hence, cases for all 233 possible evaluations of r >= D and A = 0 when r ∈ {0, . . . , 31}are generated. The architecture manual states that the instruction form is invalid if A is in the range ofregisters loaded (i.e., if A ≥ D). This is not checked by the SLED file.
122a 〈integer load and store multiple instructions – broken 122a〉≡ 123a ⊲
fun lmw (D, ea)
is
let
val rs is Vector.spanning 0 31
val (eff,ea) is Vector.foldr
(\(r,(eff,ea)). (eff | r >= D --> gpr(r) := mem(ea),
if r >= D then ea + 4 else ea fi))
(RTL.SKIP, ea)
rs
in
eff
end
default attribute of
lmw (D, d, A)
is lmw(D, (if A = 0 then 0 else gpr(A) fi) + (sx #16 #32 d))
Code written to file integer load and store multiple instructions -- broken.
122b 〈integer load and store multiple instructions 122b〉≡ (99b) 123b ⊲
default attribute of
lmw (D, d, A) is RTL.SKIP
11.4. INSTRUCTION SEMANTICS 123
The PPC stmw instruction stores n consecutive words starting at the computed effective address from thegeneral purpose registers D through 31. λ-RTL limitation: See above. The architecture manual states thatthe instruction form is invalid if A is in the range of registers loaded (i.e., if A ≥ D). This is not checked bythe SLED file.
123a 〈integer load and store multiple instructions – broken 122a〉+≡ ⊳ 122a
fun stmw (S, ea)
is
let
val rs is Vector.spanning 0 31
val (eff,ea) is Vector.foldr
(\(r,(eff,ea)). (eff | r >= S --> mem(ea) := gpr(r),
if r >= S then ea + 4 else ea fi))
(RTL.SKIP, ea)
rs
in
eff
end
default attribute of
stmw (S, d, A)
is stmw(S, (if A = 0 then 0 else gpr(A) fi) + (sx #16 #32 d))
123b 〈integer load and store multiple instructions 122b〉+≡ (99b) ⊳ 122b
default attribute of
stmw (S, d, A) is RTL.SKIP
Integer load and store string instructions
Unimplemented: all
123c 〈integer load and store string instructions 123c〉≡ (99b)
Memory synchronization instructions
Unimplemented: eieio, isync, lwarx, stwcx., sync64-bit instructions: ldarx, stdcx.
123d 〈memory synchronization instructions 123d〉≡ (99b)
Floating-point load instructions
Unimplemented: all
123e 〈floating-point load instructions 123e〉≡ (99b)
Floating-point store instructions
Unimplemented: allOptional instructions: stfiwx
123f 〈floating-point store instructions 123f〉≡ (99b)
124 CHAPTER 11. DESCRIBING THE POWERPC (BY MATTHEW FLUET)
Floating-point move instructions
Unimplemented: all
124a 〈floating-point move instructions 124a〉≡ (99b)
Branch instructions
There are four types of branch instructions.The first type simply assigns a relocatable operand to nia. It optionally saves cia in the link register.
The SLED file computes reloc from the LI and AA operands. However, b and ba are considered distinctinstructions, although they have identical semantics. Similar comments apply to bl and bla.
124b 〈branch instructions 124b〉≡ (99b) 124c ⊲
fun b (reloc, lk)
is
let
in
lk --> lr := cia + 4 |
nia := reloc
end
default attribute of
b^["" "l"] (reloc)
is b (reloc, [false true])
b^["" "l"]^"a" (reloc)
is b (reloc, [false true])
The remaining types of branch instructions branch conditionally on the values of the condition register,count register, and the operands BO and BI. The auxilary function bc’ computes a thunk to assign a newvalue to the count register and two boolean values.
124c 〈branch instructions 124b〉+≡ (99b) ⊳ 124b 125a ⊲
fun bc’ (BO, BI)
is
let
fun bob n is BO@bits[1 bit at n : #3 bits]
val nctr is if bool(bob 2) then ctr else ctr - 1 fi
val ctr_ok
is bool(or(bob 2, xor(bit(nctr <> 0), bob 3)))
val cond_ok
is bool(or(bob 0, eqv(crb BI, bob 1)))
val set_ctr
is \(x). not (bool (bob 2)) --> ctr := nctr
in
(set_ctr, ctr_ok, cond_ok)
end
11.4. INSTRUCTION SEMANTICS 125
The second type of branch instruction updates the count register and branches to reloc if both ctr ok
and cond ok are true. It optionally saves cia in the link register. The SLED file computes reloc from theBD and AA operands. However, bc and bca are considered distinct instructions, although they have identicalsemantics. Similary comments apply to bcl and bcla.
125a 〈branch instructions 124b〉+≡ (99b) ⊳ 124c 125b ⊲
fun bc (BO, BI, reloc, lk)
is
let
val (set_ctr, ctr_ok, cond_ok) is bc’ (BO, BI)
in
set_ctr (0) |
conjoin(ctr_ok, cond_ok) --> b (reloc, lk)
end
default attribute of
bc^["" "l"] (BO, BI, reloc)
is bc (BO, BI, reloc, [false true])
bc^["" "l"]^"a" (BO, BI, reloc)
is bc (BO, BI, reloc, [false true])
The third type of branch instruction does not update the count register and branches to the aligned countregister if cond ok is true. It optionally saves cia in the link register. The architecture manual states thatthe instruction form is invalid if BO[2] = 0. This is not checked by the SLED file.
125b 〈branch instructions 124b〉+≡ (99b) ⊳ 125a 125c ⊲
fun bcctr (BO, BI, lk)
is
let
val (set_ctr, ctr_ok, cond_ok) is bc’ (BO, BI)
in
cond_ok --> b(align(ctr), lk)
end
default attribute of
bcctr^["" "l"] (BO, BI)
is bcctr (BO, BI, [false true])
The fourth type of branch instruction updates the count register and branches to the aligned link registerif both ctr ok and cond ok are true. It optionally saves cia in the link register.
125c 〈branch instructions 124b〉+≡ (99b) ⊳ 125b
fun bclr (BO, BI, lk)
is
let
val (set_ctr, ctr_ok, cond_ok) is bc’ (BO, BI)
in
set_ctr (0) |
conjoin(ctr_ok, cond_ok) --> b(align(lr), lk)
end
default attribute of
bclr^["" "l"] (BO, BI)
is bclr (BO, BI, [false true])
126 CHAPTER 11. DESCRIBING THE POWERPC (BY MATTHEW FLUET)
Condition register logical instructions
The standard condition register logical instructions applies a bitwise operation to two source conditionregister fields, and stores the result in the destination condition register field.
126a 〈condition register logical instructions 126a〉≡ (99b) 126b ⊲
fun crlog (D, A, B, OP) is crb(D) := OP(crb(A), crb(B))
default attribute of
cr^[and andc eqv nand nor or orc xor] (crbD, crbA, crbB)
is crlog (crbD, crbA, crbB, [and andc eqv nand nor or orc xor])
There is also an instruction to copy the contents of one source condition register field to a destinationcondition register field.
126b 〈condition register logical instructions 126a〉+≡ (99b) ⊳ 126a
default attribute of
mcrf (crfD, crfS)
is crf(crfD) := crf(crfS)
System linkage instructions
Unimplemented: scSupervisor-level instructions: rfi, rfid64-bit instructions: rfidOptional 64-bit bridge instruction: rfi
126c 〈system linkage instructions 126c〉≡ (99b)
Trap instructions
Unimplemented: tw, twi64-bit instructions: td, tdi
126d 〈trap instructions 126d〉≡ (99b)
Processor control instructions
Supervisor-level instructions: mfmsr, mtmsr, mtmsrdSupervisor- and user-level instructions: mfspr, mtspr64-bit instructions: mtmsrdOptional 64-bit bridge instructions: mtmsr
Two instructions that move values to and from the condition and XER registers are easily specified.
126e 〈processor control instructions 126e〉≡ (99b) 127a ⊲
default attribute of
mcrxr (crfD)
is (crf(crfD) := xerf(0) | xerf(0) := 0)
mfcr (D)
is gpr(D) := cr
11.4. INSTRUCTION SEMANTICS 127
The mfspr, mtspr, mftb instructions use an integer encoding to select a special purpose register. Theauxilary function getSPR returns the location corresponding to the encoding. The architecture manual statesthat when the encoded value is any value other than those defined below, then one of the following mayoccur:
• The system illegal instruction error handler is invoked.
• The system supervisor-level instruction error handler is invoked.
• The results are boundedly undefined.
These semantics are not captured by the λ-RTL specifications below. Instead, we return a bogus locationfor the operation.
127a 〈processor control instructions 126e〉+≡ (99b) ⊳ 126e 127b ⊲
fun getSPR spr
is
let
val dst is if spr = 1 then xer
else if spr = 8 then lr
else if spr = 9 then ctr
else if spr = 268 then tbl
else if spr = 269 then tbu
else bogus
fi fi fi fi fi : #32 loc
in
dst
end
Using the auxilary function, we define the instructions.
127b 〈processor control instructions 126e〉+≡ (99b) ⊳ 127a 128b ⊲
default attribute of
mfspr (D, spr)
is gpr(D) := (getSPR spr)
mtspr (spr, S)
is (getSPR spr) := gpr(S)
mftb (D, tbr)
is gpr(D) := (getSPR tbr)
128 CHAPTER 11. DESCRIBING THE POWERPC (BY MATTHEW FLUET)
The move to condition register fields copies the contents of a general purpose register to the conditionregister under the control of the field mask. The field mask is specified by an 8-bit operand which indicateswhich fields of the condition register should be updated. λ-RTL limitation: Many of the issues withVector.foldr described above with the lmw instruction also apply.
128a 〈processor control instructions – broken 128a〉≡default attribute of
mtcrf (CRM, S)
is
let
fun crmb n is CRM@bits[1 bit at n]
val mask is Vector.foldr
(\(n,mask). bitInsert #5 #32 #4
{lsb is mul(4 : #3 bits, n),
wide is mask}
(if bool(crmb(n))
then 0xF
else 0x0
fi))
(0 : #32 bits)
(Vector.spanning 0 7)
in
cr := or(and(gpr(S), mask), and(cr, com mask))
end
Code written to file processor control instructions -- broken.
128b 〈processor control instructions 126e〉+≡ (99b) ⊳ 127b
default attribute of
mtcrf (CRM, S) is RTL.SKIP
Cache management instructions
Unimplemented: dcbf, dcbst, dcbt, dcbtst, dcbz, icbiSupervisor-level instructions: dcbiOptional instructions: dcba
128c 〈cache management instructions 128c〉≡ (99b)
Segment register manipulation instructions
Supervisor-level instructions: mfsr, mfsrin, mtsr, mtsrd, mtsrdin, mtsrinOptional 64-bit bridge instructions: mfsr, mfsrin, mtsr, mtsrd, mtsrdin, mtsrin
128d 〈segment register manipulation instructions 128d〉≡ (99b)
Lookaside buffer management instructions
Supervisor-level instructions: slbia, slbie, tlbia, tlbie, tlbsyncOptional instructions: slbia, slbie, tlbia, tlbie, tlbsync64-bit instructions: slbia, slbie
128e 〈lookaside buffer management instructions 128e〉≡ (99b)
11.4. INSTRUCTION SEMANTICS 129
External control instructions
Unimplemented: eciwx, ecowx
129 〈external control instructions 129〉≡ (99b)
Chapter 12
Index
The indices are split into two pieces, one for the SPARC and one for the Intel. We’ll combine them one day.
12.1 List of chunks
〈basic.rtl 20d〉〈definitions of standard RTL operators and functions 21a〉〈definitions of standard RTL operators and functions [[full]] 22b〉〈definitions of standard RTL operators and functions [[simple]] 22a〉〈for every i that is an input, print i and a blank 20b〉〈IEEE 754 floating point 26d〉〈summarize the problem 20a〉〈algebraic laws 35〉〈trap semantics 38f〉〈instruction defaults 38b〉〈instruction defaults [[full]] 41a〉〈instruction defaults [[simple]] 42b〉〈SPARC basics 37a〉〈SPARC basics [[full]] 37c〉〈SPARC utilities 40c〉〈SPARC utilities [[full]] 39f〉〈SPARC utilities [[simple]] 39e〉〈wishful thinking about ldfsr 38d〉〈basics 46b〉〈effective addresses 50a〉〈effective-address utilities 49b〉〈instruction defaults 55c〉〈instruction defaults using infix and 52c〉〈intel.rtl 46a〉〈junk: instruction defaults 54a〉〈junk: instruction defaults using infix and 52b〉
130
12.1. LIST OF CHUNKS 131
〈junk: utilities 53b〉〈new RTL operators 58f〉〈registers 48a〉〈utilities 49a〉〈address modes and operands 62a〉〈address twiddling and fetch 62d〉〈imaginary code 70a〉〈instruction defaults 62c〉〈MIPS basics 60a〉〈MIPS register fetch and store methods 61a〉〈mips utilities 65b〉〈mips utilities [[full]] 64b〉〈mips utilities [[simple]] 65a〉〈mips.rtl 70b〉〈mips.sled 71〉〈number of bytes to shift 62e〉〈number of bytes to shift into right 63b〉〈trap semantics 79a〉〈address modes and operands 77e〉〈ALPHA register fetch and store methods 76d〉〈ALPHA utilities 79b〉〈ALPHA utilities [[full]] 80a〉〈ALPHA utilities [[simple]] 79g〉〈alpha.rtl 80c〉〈composition function 78c〉〈instructions 78d〉〈storage 75〉〈window dressing for trap semantics 80d〉〈window dressing for trap semantics [[full]] 81a〉〈window dressing for trap semantics [[simple]] 81b〉〈memory references 83a〉〈operations 83b〉〈pdp8.lrtl 85c〉〈pdp8.rtl 86〉〈storage spaces 82〉〈instruction defaults 90b〉〈MMIX basics 87〉〈mmix.rtl 91d〉〈Bogus 98b〉〈branch instructions 124b〉〈cache management instructions 128c〉〈Condition Register 95b〉〈condition register logical instructions 126a〉〈Count Register 97b〉〈extensions 93b〉〈external control instructions 129〉
132 CHAPTER 12. INDEX
〈floating-point arithmetic instructions 115〉〈floating-point compare instructions 116c〉〈floating-point load instructions 123e〉〈floating-point move instructions 124a〉〈floating-point multiply-add instructions 116a〉〈Floating-Point Registers 95a〉〈floating-point rounding and conversion instructions 116b〉〈Floating-Point Status and Control Register 96a〉〈floating-point status and control register instructions 116d〉〈floating-point store instructions 123f〉〈General-Purpose Registers 94c〉〈imports 93a〉〈Instruction Addresses 97c〉〈instruction semantics 99b〉〈integer arithmetic instructions 102c〉〈integer compare instructions 107〉〈integer load and store multiple instructions 122b〉〈integer load and store multiple instructions – broken 122a〉〈integer load and store string instructions 123c〉〈integer load instructions 117a〉〈integer logical instructions 108〉〈integer logical instructions – broken 110〉〈integer rotate instructions 113〉〈integer shift instructions 114〉〈integer store instructions 119b〉〈Link Register 97a〉〈lookaside buffer management instructions 128e〉〈Memory 94b〉〈memory synchronization instructions 123d〉〈operands 98c〉〈ppc.rtl 92〉〈processor control instructions 126e〉〈processor control instructions – broken 128a〉〈segment register manipulation instructions 128d〉〈storage spaces 94a〉〈system linkage instructions 126c〉〈Time Base Registers 98a〉〈trap instructions 126d〉〈utilities 100a〉〈utilities – broken 101a〉〈XER Register 96b〉
12.2 Identifier index
+: 23a, 23c, 24c -: 20c, 23a, 23c, 24c, 25c, 26a
12.2. IDENTIFIER INDEX 133
-->: 21a:=: 21a<: 20b, 24a, 24c, 27e, 46a, 49a, 53a, 53b, 54a, 55a,
56a, 57a, 58b, 58c<=: 24a, 24c<>: 22d, 22f>: 23b, 24a, 24c, 27e, 46a, 53a, 53b, 54a, 55a, 57a,
57c, 58a, 58c>=: 24a, 24c?: 26badd: 23a, 23c, 89b, 90f, 91d, 103, 104a, 115, 116aand: 24dandalso: 22a, 22b, 22cbit: 22e, 22fbitExtract: 24dbitInsert: 24d, 24dbitTransfer: 24e, 25c, 26abool: 21a, 21c, 22d, 22e, 22f, 24a, 24bborrow: 23a, 23ccarry: 23a, 23ccom: 24dconjoin: 21c, 22adisjoin: 21c, 22adiv: 23d, 24cdivu: 23d, 24cdo nothing: 21aeq: 22d, 66f, 67d, 68a, 71f2f: 27bf2i: 27bfabs: 26dfadd: 26dfcmp: 27efdiv: 26dfloat eq: 27efloat gt: 27efloat lt: 27efmul: 26dfmulx: 26dfneg: 26dfsqrt: 26dfsub: 26dge: 24a, 93bgeu: 24bgt: 24a, 53b, 93bgtu: 24bi2f: 27b
le: 24a, 93bleu: 24blt: 24a, 53b, 93b, 107ltu: 24bminf: 27cmod: 23d, 24cmodu: 23d, 24cmul: 23dmulu: 23dmzero: 27cNaN: 27dne: 22dneg: 23anot: 21cor: 23d, 24d, 66a, 66b, 67d, 68a, 70b, 71orelse: 22a, 22b, 22cpinf: 27cpopcnt: 25bpzero: 27cquot: 23d, 24crem: 23d, 24crotl: 25crotr: 26ashl: 25ashra: 25ashrl: 25asub: 23a, 89b, 90f, 91csubtract: 23csx: 23bundefined: 26bunordered: 27exor: 24dzx: 23b, 23cdelay: 38ddivide: 41b, 41cdrain fpu pipeline: 38dfloatingCompare: 37droundingMode: 37csparc sdiv overflow: 41bsparc udiv overflow: 41bsub overflows: 40a, 40b, 40c, 40f, 57asubtract instruction: 40c, 40d, 40etag overflows: 39e, 39f, 39h, 40f<: 20b, 24a, 24c, 27e, 46a, 49a, 53a, 53b, 54a, 55a,
56a, 57a, 58b, 58c<==: 53a, 53b, 54a, 55a
134 CHAPTER 12. INDEX
>: 23b, 24a, 24c, 27e, 46a, 53a, 53b, 54a, 55a, 57a,57c, 58a, 58c
>>: 51bAC: 51eadc: 56a, 56badcx: 56a, 56badd’: 56a, 102c, 103add adjust: 56aadd overflows: 56a, 79g, 80aaddx: 56a, 56cAF: 46a, 48d, 49a, 57c, 58a, 58eallr: 56a, 56b, 56cb: 50a, 50b, 50c, 51a, 52a, 52a, 52c, 53a, 54a, 55a,
55c, 55d, 56b, 56c, 57b, 58h, 93c, 93d, 98b, 102c,105, 106, 107, 108, 109, 110, 116a, 121, 124b,125a, 125b, 125c
bin: 53a, 53b, 54abin’: 54c, 55abit16At: 59a, 59cbit32At: 59a, 59cbitAt: 59abound: 58c, 58dBR: 51e, 58cbt: 59b, 59cbtc: 59b, 59cbtr: 59b, 59cbts: 59b, 59cbyte: 48b, 49b, 58h, 61c, 62c, 76a, 76b, 78d, 79c,
118, 119a, 120, 121CF: 46a, 48d, 49a, 56a, 57b, 57c, 58a, 58ed: 50a, 50b, 50c, 51a, 52a, 52a, 52c, 53a, 54a, 55a,
55c, 55d, 56b, 56c, 57b, 58gDB: 51eDE: 51eDF: 51eDP: 51edword: 48b, 49b, 61c, 68c, 68dexn: 51e, 58cGP: 51egt: 24a, 53b, 93bhacksub8: 49bI: 50c, 52c, 53a, 54a, 55a, 55c, 56b, 56c, 57b, 89aidsize: 55b, 55c, 56b, 56c, 57bIs: 50c, 55c, 56b, 56c, 57bllr: 52a, 52cllr’: 55b
logical: 54d, 55alogical’: 55b, 55clogical flags: 54b, 54d, 55blsbset: 58f, 58glt: 24a, 53b, 93b, 107M: 50b, 51a, 88aMC: 51eMF: 51emsbset: 58f, 58gNM: 51eNMI: 51eNP: 51eOF: 46a, 48d, 49a, 51e, 58eparity: 48d, 49aPF: 46a, 48d, 49a, 51e, 58eR: 50a, 51a, 52c, 53a, 54a, 55a, 55c, 56b, 56c, 57b,
58gregb: 49b, 50aregd: 49b, 50a, 51a, 55f, 58d, 58h, 59cregw: 49b, 50a, 55f, 58b, 58d, 59cset cf: 58e, 59bset flags: 49a, 54b, 56a, 57aset zf: 58e, 58gSF: 46a, 48d, 49a, 58eSS: 51esub’: 57a, 57bsub adjust: 57asub overflows: 40a, 40b, 40c, 40f, 57atrap: 51e, 79b, 80c, 80d, 81aTS: 51eUD: 51eunary: 55d, 55ew: 50a, 50b, 50c, 51a, 52a, 52a, 52c, 53a, 54a, 55a,
55c, 55d, 56b, 56c, 57b, 58gword: 48b, 49b, 58h, 61c, 62d, 68c, 68d, 69g, 76a,
76b, 78d, 79c, 118, 119a, 120, 121ZF: 46a, 48d, 49a, 58b, 58eByte: 62e, 63b, 64abyte: 48b, 49b, 58h, 61c, 62c, 76a, 76b, 78d, 79c,
118, 119a, 120, 121Condition: 67c, 67d, 68bdispA: 62b, 62c, 63d, 78a, 78d, 78e, 78f, 79a, 79c,
79d, 79edouble: 61d, 68c, 68d, 68e, 69b, 69edword: 48b, 49b, 61c, 68c, 68deq: 22d, 66f, 67d, 68a, 71
12.2. IDENTIFIER INDEX 135
f: 67d, 68a, 68c, 68d, 68e, 69a, 69b, 69d, 69e, 71hword: 61c, 62c, 118, 119a, 120, 121lsbit: 62e, 63a, 63b, 63c, 64aneq: 67d, 68ano overflow: 64b, 65a, 65bnor: 66a, 66b, 71, 93c, 108, 126aoge: 67d, 68aogl: 67d, 68aogt: 67d, 68aole: 67d, 68a, 71olt: 67d, 68a, 71op function: 65b, 65cop overflows: 64bor: 23d, 24d, 66a, 66b, 67d, 68a, 70b, 71RM: 67c, 68c, 68e, 69dsingle: 61d, 68c, 68d, 68e, 69b, 69et: 67d, 68atByte: 62e, 63b, 64aueq: 67d, 68a, 71uge: 67d, 68augt: 67d, 68aule: 67d, 68a, 71ult: 67d, 68a, 71un: 67d, 68a, 71word: 48b, 49b, 58h, 61c, 62d, 68c, 68d, 69g, 76a,
76b, 78d, 79c, 118, 119a, 120, 121add overflows: 56a, 79g, 80aalignTrap: 79a, 79b, 79dbyte: 48b, 49b, 58h, 61c, 62c, 76a, 76b, 78d, 79c,
118, 119a, 120, 121data store error: 81a, 81bdispA: 62b, 62c, 63d, 78a, 78d, 78e, 78f, 79a, 79c,
79d, 79eiff: 80alittleEndian: 75, 76b, 80clong: 76a, 76b, 78d, 79co: 78c, 78docta: 76aquad: 76a, 76b, 78d, 78e, 79c, 79etrap: 51e, 79b, 80c, 80d, 81atrap instruction: 81a, 81bword: 48b, 49b, 58h, 61c, 62d, 68c, 68d, 69g, 76a,
76b, 78d, 79c, 118, 119a, 120, 121accum effect: 83c, 84acma: 85biac: 85b
ral: 84d, 85brar: 84d, 85bskip op: 85a, 85bunary op: 83b, 84a, 84b, 84cadd: 23a, 23c, 89b, 90f, 91d, 103, 104a, 115, 116aD: 89aI: 50c, 52c, 53a, 54a, 55a, 55c, 56b, 56c, 57b, 89aM: 50b, 51a, 88aM1: 88a, 90b, 90c, 90dM2: 88a, 90b, 90c, 90dM4: 88a, 88e, 90b, 90c, 90d, 90eM8: 88a, 90b, 90c, 90d, 90enoFit: 89bO: 89a, 89b, 90bo1: 89b, 90co2: 89b, 90co4: 89b, 90co8: 89b, 90foccur: 89asign: 89bsub: 23a, 89b, 90f, 91ctrip: 88e, 89aU: 89a, 90bV: 89aW: 89a, 90bX: 89a, 89c, 90b, 90c, 90d, 90e, 90f, 91a, 91b, 91cZ: 89a, 89c, 90a, 90f, 91a, 91b, 91cadd: 23a, 23c, 89b, 90f, 91d, 103, 104a, 115, 116aadd’: 56a, 102c, 103addi: 103, 104aaddic: 103, 104aaddis: 103, 104aaddme: 103, 104aaddze: 103, 104aalign: 102b, 125b, 125candc: 93c, 108, 126ab: 50a, 50b, 50c, 51a, 52a, 52a, 52c, 53a, 54a, 55a,
55c, 55d, 56b, 56c, 57b, 58h, 93c, 93d, 98b, 102c,105, 106, 107, 108, 109, 110, 116a, 121, 124b,125a, 125b, 125c
bc: 125abc’: 124c, 125a, 125b, 125cbcctr: 125bbclr: 125cbogus: 98b, 98b, 127a
136 CHAPTER 12. INDEX
byte: 48b, 49b, 58h, 61c, 62c, 76a, 76b, 78d, 79c,118, 119a, 120, 121
cia: 97c, 124bcmp: 107cmp’: 107cmpi: 107cmpli: 107cntlz: 110, 111, 112, 112cr: 95b, 95b, 126a, 126e, 128acrb: 95b, 95b, 124c, 126acrf: 95b, 95b, 107, 126b, 126ecrlog: 126actr: 97b, 124c, 125b, 127adivw: 106divwu: 106dp: 115, 116aeqjoin: 93deqv: 93c, 108, 124c, 126aext’: 109extsb: 109extsh: 109fbinop’: 115fmulop’: 116afpr: 95a, 95a, 115, 116afpsrcb: 96afunop’: 115ge: 24a, 93bgetSPR: 127a, 127bgpr: 94c, 94c, 103, 105, 106, 107, 108, 109, 110,
111, 112, 113, 114, 117a, 117b, 119a, 119b, 119c,120, 121, 122a, 123a, 126e, 127b, 128a
gt: 24a, 53b, 93bhword: 61c, 62c, 118, 119a, 120, 121hworda: 118, 119ahwordr: 118, 119a, 120, 121l: 107, 117b, 119a, 124b, 125a, 125b, 125cl’: 117a, 117ble: 24a, 93blmw: 122a, 122blog: 108log’: 108logi: 108logis: 108lr: 97a, 124b, 125c, 127a
lt: 24a, 53b, 93b, 107lu: 117b, 119amask: 101a, 101b, 113, 114, 128amem: 94b, 94b, 118, 119b, 122a, 123amul’: 105mulhw: 105mulli: 105mullw: 105nand: 93c, 108, 126aneg’: 103, 104ania: 97c, 124bnor: 66a, 66b, 71, 93c, 108, 126aorc: 93c, 108, 126arlwnm: 113rlwnmi: 113rounding mode: 102a, 115, 116aset ca: 100b, 102cset cr0: 100a, 102c, 105, 106, 108, 109, 110, 111,
112, 113, 114set cr1: 102a, 115, 116aset ov: 100b, 102c, 105, 106slw: 114sov: 104b, 105sp: 115, 116asraw: 114srw: 114st: 119c, 121st’: 119b, 119cstmw: 123a, 123bstu: 119c, 121subf: 103, 104asubf’: 102c, 103subfic: 103, 104asubfme: 103, 104asubfze: 103, 104atbl: 98a, 127atbu: 98a, 127aword: 48b, 49b, 58h, 61c, 62d, 68c, 68d, 69g, 76a,
76b, 78d, 79c, 118, 119a, 120, 121wordr: 118, 119a, 120, 121xer: 96b, 96b, 127axerb: 96bxerf: 96b, 96b, 126exjoin: 93d
References
Bailey, Mark W. and Jack W. Davidson. 1995 (January).A formal model and specification language for procedure calling conventions.In Conference Record of the 22nd Annual ACM Symposium on Principles of Programming Languages,pages 298–310, San Francisco, CA.
. 1996 (May).Target-sensitive construction of diagnostic programs for procedure calling sequence generators.Proceedings of the ACM SIGPLAN ’96 Conference on Programming Language Design and Implemen-tation, in SIGPLAN Notices, 31(5):249–257.
Bala, Vasanth and Norman Rubin. 1995 (November 29–December 1,).Efficient instruction scheduling using finite state automata.In Proceedings of the 28th Annual International Symposium on Microarchitecture, pages 46–56, AnnArbor, Michigan.
Benitez, Manuel E. and Jack W. Davidson. 1988 (July).A portable global optimizer and linker.Proceedings of the ACM SIGPLAN ’88 Conference on Programming Language Design and Implemen-tation, in SIGPLAN Notices, 23(7):329–338.
Davidson, Jack W. and Christopher W. Fraser. 1980 (April).The design and application of a retargetable peephole optimizer.ACM Transactions on Programming Languages and Systems, 2(2):191–202.
Emmelmann, Helmut, Friedrich-Wilhelm Schroer, and Rudolf Landwehr. 1989 (July).BEG — a generator for efficient back ends.Proceedings of the ACM SIGPLAN ’89 Conference on Programming Language Design and Implemen-tation, in SIGPLAN Notices, 24(7):227–237.
Fauth, Andreas, Johan Van Praet, and Markus Freericks. 1995 (March).Describing instruction set processors using nML.In The European Design and Test Conference, pages 503–507.
Fernandez, Mary F. 1995 (November).A Retargetable Optimizing Linker.PhD thesis, Dept of Computer Science, Princeton University.
Fraser, Christopher W., Robert R. Henry, and Todd A. Proebsting. 1992 (April).BURG—fast optimal instruction selection and tree parsing.SIGPLAN Notices, 27(4):68–76.
137
138 REFERENCES
Griswold, Ralph E. 1982 (October).The evaluation of expressions in Icon.ACM Transactions on Programming Languages and Systems, 4(4):563–584.
Intel Corporation. 1993.Architecture and Programming Manual. Vol. 3 of Pentium Processor User’s Manual.Mount Prospect, IL.
Larus, James R. and Eric Schnarr. 1995 (June).EEL: machine-independent executable editing.Proceedings of the ACM SIGPLAN ’95 Conference on Programming Language Design and Implemen-tation, in SIGPLAN Notices, 30(6):291–300.
Larus, James R. 1990 (September).SPIM S20: A MIPS R2000 simulator.Technical Report 966, Computer Sciences Department, University of Wisconsin, Madison, WI.
Lipsett, R., C. Schaefer, and C. Ussery. 1993.VHDL: Hardware Description and Design. 12 edition.Kluwer Academic Publishers.
Milner, Robin, Mads Tofte, and Robert W. Harper. 1990.The Definition of Standard ML.Cambridge, Massachusetts: MIT Press.
Milner, Robin. 1978 (December).A theory of type polymorphism in programming.Journal of Computer and System Sciences, 17:348–375.
Morrisett, Greg. 1995 (December).Compiling with Types.PhD thesis, Carnegie Mellon.Published as technical report CMU–CS–95–226.
Proebsting, Todd A. and Christopher W. Fraser. 1994 (January).Detecting pipeline structural hazards quickly.In Conference Record of the 21st Annual ACM Symposium on Principles of Programming Languages,pages 280–286, Portland, OR.
Ramsey, Norman and Mary F. Fernandez. 1997 (May).Specifying representations of machine instructions.ACM Transactions on Programming Languages and Systems, 19(3):492–524.
Ramsey, Norman and David R. Hanson. 1992 (July).A retargetable debugger.ACM SIGPLAN ’92 Conference on Programming Language Design and Implementation, in SIGPLANNotices, 27(7):22–31.
Ramsey, Norman. 1994 (September).Literate programming simplified.IEEE Software, 11(5):97–105.
REFERENCES 139
. 1996 (April).A simple solver for linear equations containing nonlinear operators.Software—Practice & Experience, 26(4):467–487.
SPARC International. 1992.The SPARC Architecture Manual, Version 8.Englewood Cliffs, NJ: Prentice Hall.
Srivastava, Amitabh and Alan Eustace. 1994 (June).ATOM: A system for building customized program analysis tools.Proceedings of the ACM SIGPLAN ’94 Conference on Programming Language Design and Implemen-tation, in SIGPLAN Notices, 29(6):196–205.
Stallman, Richard M. 1992 (February).Using and Porting GNU CC (Version 2.0).Free Software Foundation.
Thomas, Donald and Philip Moorby. 1995.The Verilog Hardware Description Language. 2nd edition.Norwell, USA: Kluwer Academic Publishers.
Thompson, T. 1996 (February).An Alpha in PC clothing.Byte, pages 195–196.
Wang, Daniel C., Andrew W. Appel, Jeff L. Korn, and Christopher S. Serra. 1997 (October).The Zephyr abstract syntax description language.In Proceedings of the 2nd USENIX Conference on Domain-Specific Languages, pages 213–227, SantaBarbara, CA.