Based on their research at Carnegie Mellon University, the authors
argue for billion-transistor uniprocessors. They divide the important
implementation problems into three components: instruction flow,
register dataflow, and memory dataflow. They also argue for trace caches
and advanced branch prediction. Their article, however, focuses on using
massive speculation at all levels to improve
performance. They claim
that without this much speculation, future processors will be limited by
true data dependences, and will be unable to harvest enough
instruction-level parallelism (ILP) to improve performance
satisfactorily. Their investigations found large speedups on code
that has traditionally not been amenable to finding ILP.