RyuJIT: Porting to different platforms

2019-02-05 00:00:19

What is a Platform?

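Target instruction set and pointer size
Target calling convention
Runtime data structures (outside the scope of this document)
GC encoding
- Currently there are two kinds: JIT32_GCENCODER and everything else
Debug info (currently almost the same for every target?)
EH info (outside the scope of this document)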

One advantage of the CLR is that the VM (mostly) hides OS differences (other than the ABI).

The Very High-Level View

32-bit vs. 64-bit
- This work is not yet complete in the backend, but it should be shareable
Instruction set architecture:
- instrsXXX.h, emitXXX.cpp, and targetXXX.cpp
- lowerXXX.cpp
- codeGenXXX.cpp and simdcodegenXXX.cpp
- unwindXXX.cpp
Calling convention: all over the place

Front-end changes

Calling Convention
- Struct args and returns seem to be the most complex differences
  - Importer and morph are highly aware of these
    - E.g. fgMorphArgs(), fgFixupStructReturn(), fgMorphCall(), fgPromoteStructs() and the various struct assignment morphing methods
- HFAs on ARM
Tail calls are target-dependent, but probably should be less so
Intrinsics: each platform recognizes different methods as intrinsics (e.g. Sin only for x86, Round everywhere BUT amd64)
Target-specific morphs such as for mul, mod and div

Backend Changes

Lowering: fully expose control flow and register requirements
Code Generation: traverse blocks in layout order, generating code (InstrDescs) based on register assignments on nodes
- Then, generate prolog & epilog, as well as GC, EH and scope tables
ABI changes:
- Calling convention register requirements
  - Lowering of calls and returns
  - Code sequences for prologs & epilogs
- Allocation & layout of frame

Target ISA “Configuration”

Conditional compilation (set in jit.h, based on incoming define, e.g. #ifdef X86)

_TARGET_64BIT_ (a 32-bit target is just !_TARGET_64BIT_)
_TARGET_XARCH_, _TARGET_ARMARCH_
_TARGET_AMD64_, _TARGET_X86_, _TARGET_ARM64_, _TARGET_ARM_

target.h
instrsXXX.h
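
The layering of these macros (one specific target, a family macro, and a 64-bit macro) might be sketched as follows. This is not the contents of jit.h: the SKETCH_* names are hypothetical stand-ins for the _TARGET_* macros, and the compiler's predefined architecture macros play the role of the incoming define.

```cpp
#include <cassert>
#include <cstring>

// Pick exactly one specific target from the compiler's predefined macros.
#if defined(__x86_64__) || defined(_M_X64)
#define SKETCH_TARGET_AMD64 1
#elif defined(__i386__) || defined(_M_IX86)
#define SKETCH_TARGET_X86 1
#elif defined(__aarch64__) || defined(_M_ARM64)
#define SKETCH_TARGET_ARM64 1
#elif defined(__arm__) || defined(_M_ARM)
#define SKETCH_TARGET_ARM 1
#endif

// Family macros are derived from the specific target.
#if defined(SKETCH_TARGET_AMD64) || defined(SKETCH_TARGET_X86)
#define SKETCH_TARGET_XARCH 1
#endif
#if defined(SKETCH_TARGET_ARM64) || defined(SKETCH_TARGET_ARM)
#define SKETCH_TARGET_ARMARCH 1
#endif

// Only the 64-bit macro exists; "32-bit" is its absence.
#if defined(SKETCH_TARGET_AMD64) || defined(SKETCH_TARGET_ARM64)
#define SKETCH_TARGET_64BIT 1
#endif

inline const char* sketchTargetFamily() {
#if defined(SKETCH_TARGET_XARCH)
    return "xarch";
#elif defined(SKETCH_TARGET_ARMARCH)
    return "armarch";
#else
    return "unknown"; // architecture not covered by this sketch
#endif
}

inline int sketchTargetPointerSize() {
#ifdef SKETCH_TARGET_64BIT
    return 8;
#else
    return 4; // also the fallback for architectures the sketch does not know
#endif
}
```

Code shared by both x86 flavors would then test the family macro, and only genuinely target-specific code would test one of the four specific macros.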

Instruction Encoding

The instrDesc is the data structure used for encoding
- It is initialized with the opcode bits, and has fields for immediates and register numbers.
- instrDescs are collected into groups
- A label may only occur at the beginning of a group
The emitter is called to:
- Create new instructions (instrDescs), during CodeGen
- Emit the bits from the instrDescs after CodeGen is complete
- Update GC info (live GC vars & safe points)
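
The two-phase use of the emitter described above (record descriptors during CodeGen, emit bits afterwards) can be illustrated with a toy model. Everything here is an assumption made for illustration: ToyInstrDesc, ToyInsGroup, ToyEmitter and the 6-byte encoding are invented, and GC info tracking is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for an instrDesc: opcode bits plus operand fields.
struct ToyInstrDesc {
    uint8_t opcode; // opcode bits, fixed when the descriptor is created
    uint8_t reg;    // register number
    int32_t imm;    // immediate value
};

// A group of descriptors; a label can only name the start of a group.
struct ToyInsGroup {
    std::vector<ToyInstrDesc> instrs;
    std::size_t offset = 0; // byte offset, known only after emission
};

class ToyEmitter {
public:
    // Starts a new group and returns its index, which serves as a label.
    std::size_t newLabel() {
        groups.emplace_back();
        return groups.size() - 1;
    }

    // Phase 1 (during codegen): record a descriptor; no bytes are produced.
    void emitIns_R_I(uint8_t opcode, uint8_t reg, int32_t imm) {
        if (groups.empty()) {
            groups.emplace_back();
        }
        groups.back().instrs.push_back({opcode, reg, imm});
    }

    // Phase 2 (after codegen): encode every descriptor into bytes.
    // Toy encoding: opcode byte, register byte, 4-byte little-endian immediate.
    std::vector<uint8_t> emitAll() {
        std::vector<uint8_t> code;
        for (ToyInsGroup& g : groups) {
            g.offset = code.size();
            for (const ToyInstrDesc& id : g.instrs) {
                code.push_back(id.opcode);
                code.push_back(id.reg);
                uint32_t bits = static_cast<uint32_t>(id.imm);
                for (int i = 0; i < 4; i++) {
                    code.push_back(static_cast<uint8_t>((bits >> (8 * i)) & 0xFF));
                }
            }
        }
        return code;
    }

    // Valid only after emitAll() has run.
    std::size_t labelOffset(std::size_t label) const { return groups[label].offset; }

private:
    std::vector<ToyInsGroup> groups;
};
```

Because a label's offset is only known once the preceding groups have been encoded, restricting labels to group boundaries keeps that bookkeeping per group rather than per instruction.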

Adding Encodings

The instruction encodings are captured in instrsXXX.h. These are the opcode bits for each instruction
The structure of each instruction’s encoding is target-dependent
An “instruction” is just the representation of the opcode
An instance of “instrDesc” represents the instruction to be emitted
For each “type” of instruction, emit methods need to be implemented. These follow a pattern but a target may have unique ones, e.g.

emitter::emitInsMov(instruction ins, emitAttr attr, GenTree* node)
emitter::emitIns_R_I(instruction ins, emitAttr attr, regNumber reg, ssize_t val)
emitter::emitInsTernary(instruction ins, emitAttr attr, GenTree* dst, GenTree* src1, GenTree* src2) (currently Arm64 only)
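
One common way to keep a per-target instruction table in a single header is to write it as a macro list and expand it several times. The sketch below shows that pattern with made-up instructions and opcode values; the TOY_* macros and the 16-bit encoding are hypothetical and are not copied from any instrsXXX.h.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical table: one row per instruction (id, name, opcode bits).
#define TOY_INSTRUCTIONS(INST) \
    INST(mov, "mov", 0x10)     \
    INST(add, "add", 0x20)     \
    INST(sub, "sub", 0x30)

// Expand the table into an enum: an "instruction" is just an opcode id.
enum toyInstruction {
#define TOY_ENUM(id, name, bits) TOY_INS_##id,
    TOY_INSTRUCTIONS(TOY_ENUM)
#undef TOY_ENUM
    TOY_INS_count
};

// Expand it again into parallel tables of names and opcode bits.
static const char* const toyInsNames[] = {
#define TOY_NAME(id, name, bits) name,
    TOY_INSTRUCTIONS(TOY_NAME)
#undef TOY_NAME
};

static const uint8_t toyInsBits[] = {
#define TOY_BITS(id, name, bits) bits,
    TOY_INSTRUCTIONS(TOY_BITS)
#undef TOY_BITS
};

// An emit helper for one "type" of instruction (register, immediate).
// Toy layout: opcode in the high byte, then a 4-bit register and a 4-bit immediate.
inline uint16_t toyEncode_R_I(toyInstruction ins, uint8_t reg, uint8_t imm4) {
    return static_cast<uint16_t>((toyInsBits[ins] << 8) | ((reg & 0xF) << 4) | (imm4 & 0xF));
}
```

Adding an instruction then means adding one row to the table, plus a new emit helper only if the instruction has an operand shape that no existing helper covers.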

Lowering

Lowering ensures that all register requirements are exposed for the register allocator
- Use count, def count, “internal” reg count, and any special register requirements
- Does half the work of code generation, since all computation is made explicit
  - But it is NOT necessarily a 1:1 mapping from lowered tree nodes to target instructions
- Its first pass does a tree walk, transforming the instructions. Some of this is target-independent. Notable exceptions:
  - Calls and arguments
  - Switch lowering
  - LEA transformation
- Its second pass walks the nodes in execution order
  - Sets register requirements
    - sometimes changes the register requirements of children (which have already been traversed)
  - Sets the block order and node locations for LSRA
    - LinearScan::startBlockSequence() and LinearScan::moveToNextBlock()
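
The second pass can be pictured with a toy model in which a parent node revisits an operand that was already processed and marks it "contained". The node layout, the names (ToyNode, toySetRegisterRequirements) and the rule that constants in the range -128..127 fit an immediate are all assumptions for this sketch, not the real Lowering code.

```cpp
#include <cassert>
#include <vector>

enum ToyOper { TOY_CONST, TOY_LCL_VAR, TOY_ADD };

struct ToyNode {
    ToyOper oper;
    int     op1;       // index of first operand, or -1
    int     op2;       // index of second operand, or -1
    int     value;     // constant value, used by TOY_CONST only
    int     srcCount;  // registers consumed by this node
    int     dstCount;  // registers defined by this node
    bool    contained; // folded into its parent; gets no register of its own

    ToyNode(ToyOper o, int a = -1, int b = -1, int v = 0)
        : oper(o), op1(a), op2(b), value(v), srcCount(0), dstCount(0), contained(false) {}
};

// Visits nodes in execution order (operands always precede their users)
// and records how many registers each node uses and defines.
inline void toySetRegisterRequirements(std::vector<ToyNode>& nodes) {
    for (ToyNode& n : nodes) {
        switch (n.oper) {
            case TOY_CONST:
            case TOY_LCL_VAR:
                n.srcCount = 0;
                n.dstCount = 1;
                break;
            case TOY_ADD: {
                ToyNode& rhs = nodes[n.op2];
                // Assumed target rule: small constants can be encoded as an
                // immediate, so the already-visited child is updated here.
                if (rhs.oper == TOY_CONST && rhs.value >= -128 && rhs.value <= 127) {
                    rhs.contained = true;
                    rhs.dstCount  = 0;
                }
                n.srcCount = rhs.contained ? 1 : 2;
                n.dstCount = 1;
                break;
            }
        }
    }
}
```

For `x + 5` the constant ends up contained and the add consumes one register; for `x + 1000` the constant keeps its own register and the add consumes two.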

Register Allocation

Register allocation is largely target-independent
- The second phase of Lowering does nearly all the target-dependent work
Register candidates are determined in the front-end
- Local variables or temps, or fields of local variables or temps
- Not address-taken, plus a few other restrictions
- Sorted by lvaSortByRefCount(), and marked “lvTracked”
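
As a rough picture of candidate selection, the sketch below sorts locals by reference count and marks the eligible ones as tracked. It is not lvaSortByRefCount(): the ToyLocal type, the toySelectCandidates function and the cap on the number of tracked locals are assumptions added for illustration, and the "few other restrictions" are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ToyLocal {
    std::string name;
    unsigned    refCount;     // how often the local is referenced
    bool        addressTaken; // address-taken locals cannot be candidates
    bool        tracked;      // set by toySelectCandidates
};

// Sorts locals by descending reference count, then marks up to
// 'maxTracked' locals that are not address-taken as tracked.
inline void toySelectCandidates(std::vector<ToyLocal>& locals, std::size_t maxTracked) {
    std::stable_sort(locals.begin(), locals.end(),
                     [](const ToyLocal& a, const ToyLocal& b) { return a.refCount > b.refCount; });
    std::size_t trackedCount = 0;
    for (ToyLocal& l : locals) {
        l.tracked = !l.addressTaken && trackedCount < maxTracked;
        if (l.tracked) {
            trackedCount++;
        }
    }
}
```

Sorting first means that, if there is a limit, the most heavily used locals are the ones that remain candidates.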

Addressing Modes

The code to find and capture addressing modes is particularly poorly abstracted
genCreateAddrMode(), in CodeGenCommon.cpp, traverses the tree looking for an addressing mode, then captures its constituent elements (base, index, scale & offset) in “out parameters”
- It optionally generates code
- For RyuJIT, it NEVER generates code, and is only used by gtSetEvalOrder, and by lowering
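
The "out parameter" style can be shown with a much-simplified matcher that recognizes base + index*scale + offset in a toy tree. This is only a sketch: ToyTree and toyCreateAddrMode are hypothetical, the matcher generates no code, and it accepts far fewer tree shapes than genCreateAddrMode() has to handle.

```cpp
#include <cassert>

struct ToyTree {
    enum Kind { LCL, CNS, ADD, MUL } kind;
    int            value; // constant value (CNS) or local number (LCL)
    const ToyTree* op1;
    const ToyTree* op2;
};

inline bool toyIsScale(int v) { return v == 1 || v == 2 || v == 4 || v == 8; }

// Returns true if 'addr' has the form base [+ index*scale] [+ offset].
// The out parameters are meaningful only when the function returns true.
inline bool toyCreateAddrMode(const ToyTree* addr, const ToyTree** base,
                              const ToyTree** index, int* scale, int* offset) {
    *base = nullptr;
    *index = nullptr;
    *scale = 1;
    *offset = 0;
    if (addr->kind != ToyTree::ADD) {
        return false;
    }
    // Peel off a trailing constant offset, if there is one.
    if (addr->op2->kind == ToyTree::CNS) {
        *offset = addr->op2->value;
        addr = addr->op1;
        if (addr->kind != ToyTree::ADD) {
            *base = addr; // plain base + offset
            return true;
        }
    }
    // What remains is ADD(base, index) or ADD(base, MUL(index, scale)).
    const ToyTree* rhs = addr->op2;
    if (rhs->kind == ToyTree::CNS) {
        return false; // a second constant is not handled by this sketch
    }
    if (rhs->kind == ToyTree::MUL && rhs->op2->kind == ToyTree::CNS && toyIsScale(rhs->op2->value)) {
        *index = rhs->op1;
        *scale = rhs->op2->value;
    } else {
        *index = rhs;
    }
    *base = addr->op1;
    return true;
}
```

A caller such as lowering would use the four captured pieces to build a single address node instead of separate add and multiply nodes.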

Code Generation

For the most part, the code generation method structure is the same for all architectures
- Most code generation methods start with “gen”
Theoretically, CodeGenCommon.cpp contains code “mostly” common to all targets (this factoring is imperfect)
- Method prolog and epilog
genCodeForBBList
- walks the trees in execution order, calling genCodeForTreeNode, which needs to handle all nodes that are not “contained”
- generates control flow code (branches, EH) for the block
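
The shape of that walk can be sketched as follows: blocks are visited in layout order, contained nodes are skipped because their parent emits them, and control flow is emitted at the end of each block. The types and names (ToyGenNode, ToyBlock, toyGenCodeForBBList) and the textual "instructions" are invented for this sketch, and EH is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ToyGenNode {
    std::string text;      // the instruction this node would produce
    bool        contained; // true if the parent node emits it instead
};

struct ToyBlock {
    std::vector<ToyGenNode> nodes; // in execution order
    int jumpTarget;                // index of the successor block, or -1 to return
};

// Stand-in for a per-node dispatch routine: emits one line per node.
inline void toyGenCodeForTreeNode(const ToyGenNode& n, std::vector<std::string>& out) {
    out.push_back(n.text);
}

// Walks the blocks in layout order and produces a flat instruction list.
inline std::vector<std::string> toyGenCodeForBBList(const std::vector<ToyBlock>& blocks) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < blocks.size(); i++) {
        out.push_back("L" + std::to_string(i) + ":");
        for (const ToyGenNode& n : blocks[i].nodes) {
            if (n.contained) {
                continue; // handled as part of its parent
            }
            toyGenCodeForTreeNode(n, out);
        }
        // Control flow for the block: return, jump, or fall through.
        int target = blocks[i].jumpTarget;
        if (target < 0) {
            out.push_back("ret");
        } else if (static_cast<std::size_t>(target) != i + 1) {
            out.push_back("jmp L" + std::to_string(target));
        }
    }
    return out;
}
```

Because the blocks are processed in layout order, a jump to the block that immediately follows can simply be omitted.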

matarillo.com
