Different instructions take different number of cycles (latencies). If instruction B depends on instruction A’s result, and A takes e.g. 3 cycles, then B must wait. The scheduler reorders instructions to fill those wait cycles with useful work.

Dependence Graph

Nodes = instructions. We draw an edge from A to B if B depends on A (uses A’s result). We label each edge with A’s latency. There are three types of dependencies:

  • True dependence (RAW): A writes x, B reads x
  • Anti dependence (WAR): A reads x, B writes x
  • Output dependence (WAW): A writes x, B writes x

Critical Path

The critical path of a node is the longest weighted path from that node to any leaf. We compute bottom-up using DFS. Leaf nodes have a critical path = 0. For a node n with edges to children, critical path(n) = max over children c of latency(n c) + critical path(c).

List Scheduling

Use the critical path as priority, where a higher critical path means higher priority meaning that we schedule it first. Then, for each cycle:

  • Look at the ready list - instructions whose predecessors are all done and latencies have elapsed
  • Pick the ready instruction with the highest critical path
  • Issue it and add to the active list
  • If nothing is ready, stall (insert nop)

Example

A: loadI 5 → r1       (latency 1)
B: load x → r2        (latency 3)
C: add r1, r2 → r3    (latency 1)
D: loadI 7 → r4       (latency 1)
E: load y → r5        (latency 3)
F: mult r4, r5 → r6   (latency 4)
G: add r3, r6 → r7    (latency 1)

Naively, this order has 14 cycles with a lot of stalls. However, after list scheduling, we can get it down to 8 cycles. We do this by moving E and D earlier to fill stall slots while waiting for B’s load to complete.