Microarchitecture

Fact There can be many implementations of the same (Instruction Set Architecture)ISA ISA|350

Process Instruction

Def Processing an Instruction Processing an instruction is to transforming state AS to state AS’ according to the ISA specification of the instruction, where ISA specifies abstractly what AS’ should be, given an instruction and AS(i.e. transition function in an abstract finite state machine), and the microarchitecture implements how AS is transformed to AS’. For example, it can transform AS to AS’ in a single clock cycle, or take multiple clock cycles to transform AS to AS’.

Fact Almost every high-performance computer processes instructions out of order (OOO). In-program-order instruction processing (execution) is an illusion

Fact Stages in “Instruction Processing”:

Fetch
Decode/RF-Read
Execute
Memory Access
Writeback e.g. For the ARM assembly code LDR R2, [R0, #40], the processing would be:
Fetch the instruction(envolv reading PC)
Decode the instruction and RF-Read R0 and #40
Execute R0 + #40
Memory access, retriving the value in memory address R0+#40
Write back to R2 e.g. For the ARM assembly code AND R5, R12, R13, the processing would be:
Fetch the instruction(envolv reading PC)
Decode the instruction and RF-Read R12 and R13
Execute R12 & R13
Memory access, retriving the value in memory address R12 & R13
Write back to R5

Def Single-cycle machines is the machine such that:

Each instruction takes a single clock cycle
All state updates made at the end of an instruction’s execution
Big disadvantage: The slowest instruction determines cycle time

Def Multi-cycle machine is the machine such that:

Instruction processing broken into multiple cycles/stages
State updates can be made during an instruction’s execution
Architectural state updates made at the end of an instruction’s execution
Advantage over single-cycle: The slowest “stage” determines cycle time

Instruction Processing Engine

Fact An instruction processing engine consists of two components, datapath and control logic

Def Datapath Datapath consists of hardware elements that deal with and transform data signals.

Functional units that operate on data
Hardware structures (e.g., wires, muxes, decoders, tri-state bufs) that enable the flow of data into the functional units and registers
Storage units that store data (e.g., registers)

Def Control Logic Control logic consists of hardware elements that determine control signals, i.e., signals that specify what the datapath elements should do to the data.

Pipelined Microarchitecture

Def Pipelined Microarchitecture Pipelined Microarchitecture has following properties:

Multiple instructions (up to 5) can be in the pipeline in any cycle
Each instruction can be in a different stage(i.e. maximize utilization of hardware resources)
Stages must be isolated from one another using pipelined register(non-arch. registers), referred to as “PPR”
The work of a stage should be preserved in a PPR each cycle
PPR acts as a source of data the next stage needs in a subsequent cycle

Prop Instruction latency with pipelining:

Pipelining does not help to reduce the latency of a single instruction.
- Latency of a single instruction increases as sequencing overhead of pipeline registers and clock cycle time is decided by slowest pipeline stage
Pipelining helps increase the throughput of an entire workload as long as number of instructions is sufficiently large

Def Timing Diagrams To visualize the execution of many instructions in a pipeline we can use where time is on the horizontal axis, instructions are on the vertical axis. cycle|400

Balanced Pipeline

Pipeline Hazards

Fact When multiple instructions are handled concurrently there is a danger of hazard.

Structural hazard
- When two instructions want to use the same resource
- Memory for instructions (F) and data (M)
- Register file is accessed in two different stages
Data hazard
- When a dependent instruction wants the result of an earlier instruction
Control hazard
- When a PC-changing instruction is in the pipeline

Def Data Dependency Data dependency in computer science is a situation in which a program statement (instruction) refers to the data of a preceding statement. There are three types of data dependencies: true, anti, and output.

True dependence (also known as flow dependence or read-after-write) occurs when an instruction depends on the result of a previous instruction. For example, B = A + 1 is truly dependent on A = 3.
Anti-dependence (also known as False-dependence or write-after-read) occurs when an instruction requires a value that is later updated. For example, A = B + 1 is anti-dependent on B = 7.
Output dependence (also known as write-after-write) occurs when both instructions write the same memory location. For example, A = 3 and A = 4 are output dependent.

Solutions to Pipeline Hazards

Summary

Read-after-Write Hazards Load-Use Hazards Control Hazards
Software Interlocking ✅ ✅ ✅
Forwarding or Bypassing ✅
Stall ✅ ✅
Branch Prediction ✅

	Read-after-Write Hazards	Load-Use Hazards	Control Hazards
Software Interlocking	✅	✅	✅
Forwarding or Bypassing	✅
Stall		✅	✅
Branch Prediction			✅

Def Software Interlocking Software interlocking is a solution to pipeline hazards such that insert NOPS in code at compile time.

Prop Software Interlocking solution has two main weaknesses:

Programming is complicated
Speed is degraded

Def Forwarding and Bypassing Forwarding or bypassing is a solution to read-after-write hazards that needs extra hardware to send the result from the Memory or Writeback stage to a dependent instruction in the Execute stage. i.e. We can bypass the register file and get results early from pipeline register. Concretely, it checks if register read in EX stage matches register written in MEM or WB stage, if so, it will forward the result. fb|500

Def Stall Stall is a solution for Load-Use Hazards such that stall the dependent instruction in Decode stage. Stalling a stage requires disabling the pipeline register, so that the contents do not change. Stalls degrade performance so must be used only when needed. stall|500

Def Bubble The used stage propagating through the pipeline is called a bubble. bubble|500

Def Control Hazards Control hazards are due to changes to sequential control flow.

Branch (B) instructions
Writes to PC (R15) by regular instructions

Prop Stall the pipeline on a branch instruction can resolve control hazards.

Instruction is fetched in the first stage
Branch is resolved in the last (fifth) stage
Four stall cycles is a very high penalty for a branch

Def Branch Prediction Branch prediction is a solution exclusively for control hazards.

If the outcome is correct, continue execution (zero penalty)
If the outcome is wrong (branch misprediction), clean up the pipeline, and restart from the correct target instruction

Science Done Right

Explorer

Microarchitecture

Process Instruction

Instruction Processing Engine

Pipelined Microarchitecture

Balanced Pipeline

Pipeline Hazards

Solutions to Pipeline Hazards

Graph View

Table of Contents

Backlinks