The intermediate representation (IR) forms the bridge between the front end and the back end. The middle end analyzes and transforms the IR to optimize the code. IRs fall into three categories:

  • Structural (graph-based) - ASTs, DAGs. Well suited to source-to-source translation, but these representations tend to be large.
  • Linear (pseudo-code) - Three-address code (TAC), written as op, x, y, z, which reads as x <- y op z. Each instruction has at most one def (one result) and uses temporary variables (t1, t2, …) generated by the compiler. This form is much more compact than a graph and easier to rearrange.
  • Hybrid - Control-flow graphs (CFGs) + static single assignment (SSA). The nodes of a CFG are basic blocks, which are maximal sequences of straight-line code delimited by labels and branches; the edges represent control flow between blocks. SSA adds the constraint that every variable is defined exactly once, inserting phi-functions at merge points to reconcile the names that arrive along different paths.
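
To make the linear and hybrid forms concrete, here is a minimal Python sketch. It lowers a small expression tree to TAC tuples in the (op, x, y, z) layout described above, then splits a linear instruction list into basic blocks. All names (Num, Var, BinOp, lower, to_tac, split_blocks) and the "label", "br", "jmp", and "ret" opcodes are hypothetical, chosen only for illustration; it does not build CFG edges or SSA form.

```python
from dataclasses import dataclass
from itertools import count
from typing import Union


# A tiny structural IR: an expression tree (hypothetical node types).
@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, BinOp]


def lower(expr, code, temps):
    """Append TAC for expr to code; return the name or constant holding its value."""
    if isinstance(expr, Num):
        return str(expr.value)
    if isinstance(expr, Var):
        return expr.name
    y = lower(expr.left, code, temps)
    z = lower(expr.right, code, temps)
    x = f"t{next(temps)}"          # fresh compiler-generated temporary
    code.append((expr.op, x, y, z))  # (op, x, y, z) means x <- y op z
    return x


def to_tac(expr):
    """Lower a whole expression; temporaries are numbered from t1."""
    code = []
    result = lower(expr, code, count(1))
    return code, result


def split_blocks(code):
    """Partition linear code into basic blocks.

    Assumed convention: a "label" starts a new block, and a "br" or "jmp"
    ends the current one.
    """
    blocks, current = [], []
    for instr in code:
        if instr[0] == "label" and current:
            blocks.append(current)
            current = []
        current.append(instr)
        if instr[0] in ("br", "jmp"):
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


# (a * b) + 2 lowered to TAC
tree = BinOp("+", BinOp("*", Var("a"), Var("b")), Num(2))
tac, result = to_tac(tree)
for op, x, y, z in tac:
    print(f"{x} <- {y} {op} {z}")

# A small loop written directly as linear code, then split into blocks
loop = [
    ("label", "entry"),
    ("<", "t1", "i", "n"),
    ("br", "t1", "body", "exit"),
    ("label", "body"),
    ("+", "i", "i", "1"),
    ("jmp", "entry"),
    ("label", "exit"),
    ("ret", "i"),
]
blocks = split_blocks(loop)
```

Each BinOp node produces exactly one instruction with one def, which is why the linear form stays compact. In the loop example, i is assigned in the body and read at entry, so an SSA conversion would need a phi-function at the entry block to merge the two incoming definitions.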