IR forms the bridge between the front-end and back-end. The middle-end analyzes and transforms the IR to optimize code. IRs fall into three categories:
- Structural (graph-based) - ASTs, DAGs. Good for source-to-source translation, but tend to be large.
- Linear (pseudo-code) - Three-address code (TAC) in the format op, x, y, z, which reads as x <- y op z. Each instruction has at most one def and uses temp variables (t1, t2, …) that are generated by the compiler. This is a lot more compact than a structural IR and easier to rearrange.
- Hybrid - Control-flow graphs (CFGs) + static single assignment (SSA). A CFG has basic blocks as nodes and control-flow transfers as edges; a basic block is a maximal sequence of straight-line code, delimited by labels and branches. SSA adds the constraint that every variable is defined exactly once, using phi-functions at merge points to reconcile names.
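
To make the linear and hybrid forms concrete, here is a minimal Python sketch. It assumes an AST encoded as nested tuples (op, left, right) with strings as leaves, and it invents the opcodes label, br, and cbr for control flow; none of these names come from a real compiler. The first part lowers an expression to TAC quadruples (op, x, y, z), and the second part splits a TAC listing into basic blocks by cutting at labels and after branches.

```python
from itertools import count


def lower(expr, code, temps):
    """Lower a nested-tuple AST to TAC; return the name holding expr's value."""
    if isinstance(expr, str):        # leaf: a variable name or constant
        return expr
    op, left, right = expr           # interior node: (op, left, right)
    y = lower(left, code, temps)
    z = lower(right, code, temps)
    x = f"t{next(temps)}"            # fresh compiler-generated temporary
    code.append((op, x, y, z))       # quadruple meaning x <- y op z
    return x


def to_tac(expr):
    """Return the list of TAC quadruples for one expression."""
    code = []
    lower(expr, code, count(1))
    return code


# Hypothetical control-flow opcodes, assumed for this sketch only.
BRANCHES = {"br", "cbr"}


def basic_blocks(code):
    """Split a TAC listing into maximal straight-line sequences."""
    blocks, current = [], []
    for instr in code:
        if instr[0] == "label" and current:  # a label starts a new block
            blocks.append(current)
            current = []
        current.append(instr)
        if instr[0] in BRANCHES:             # a branch ends the current block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


# a * b + c * d
ast = ("+", ("*", "a", "b"), ("*", "c", "d"))
for op, x, y, z in to_tac(ast):
    print(f"{x} <- {y} {op} {z}")

prog = [
    ("<", "t1", "i", "n"),
    ("cbr", "t1", "L1", "L2"),
    ("label", "L1"),
    ("+", "i", "i", "1"),
    ("br", "L2"),
    ("label", "L2"),
    ("*", "t2", "i", "2"),
]
for block in basic_blocks(prog):
    print(block)
```

Each interior AST node becomes exactly one quadruple with a fresh temporary as its single def, which is why the linear form is flatter and easier to reorder than the tree. The block splitter only finds the CFG nodes; adding edges (from each branch to its target labels, plus fall-through) and renaming variables into SSA are left out of this sketch.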