Please use this identifier to cite or link to this item: http://hdl.handle.net/10553/50487
DC Field | Value | Language
dc.contributor.author | Ramirez, Alex | en_US
dc.contributor.author | Falcón, Ayose J. | en_US
dc.contributor.author | Santana, Oliverio J. | en_US
dc.contributor.author | Valero, Mateo | en_US
dc.date.accessioned | 2018-11-24T16:25:22Z | -
dc.date.available | 2018-11-24T16:25:22Z | -
dc.date.issued | 2014 | en_US
dc.identifier.isbn | 9781450328401 | en_US
dc.identifier.uri | http://hdl.handle.net/10553/50487 | -
dc.description.abstract | Copyright © 2014 by the Association for Computing Machinery, Inc. (ACM). The Software Trace Cache is a compiler transformation, or a post-compilation binary optimization, that extends the seminal work of Pettis and Hansen (PLDI '90) to reorder the dynamic instruction stream into sequential memory locations using profile information from previous executions. The major advantage compared to the trace cache is that it is a software optimization: it requires neither additional hardware to capture the dynamic instruction stream nor additional memories to store it. Instructions are still captured in the regular instruction cache. The major disadvantage is that it can capture only one of the dynamic instruction sequences and store it sequentially. Any other control flow will result in taken branches and interruptions of the fetch stream. Our results show that fetch width using the STC was competitive with that obtained with the hardware TC, and that the STC was applicable to a wide range of superscalar architectures, because it does not require hardware changes to enable fetching from multiple basic blocks in a single cycle, even if only one branch prediction can be issued. Any front-end architecture built on a BTB that ignores branches as long as they are not taken will automatically treat such branches as NOP instructions in terms of fetch [5]. This was only the beginning. The impact of this optimization on fetch and superscalar processor architectures went much further than the original ICS paper in 1999. In superscalar processors, capable of issuing and executing multiple instructions per cycle, fetch performance represents an upper bound on overall processor performance. Unless there is some form of instruction reuse mechanism, you cannot execute instructions faster than you can fetch them. Instruction Level Parallelism, represented by wide-issue, out-of-order superscalar processors, was the trending topic in the late 1990s and early 2000s. It is indeed the most promising way to continue improving processor performance without affecting application development, unlike current multicore architectures, which require parallelizing the applications (a process that is still far from being automated in the general case). Widening superscalar processor issue was the promise of never-ending improvements to single-thread performance, as identified by Yale N. Patt et al. in the 1997 special issue of IEEE Computer about "Billion transistor processors" [1]. However, instruction fetch performance is limited by the control flow of the program. The basic fetch stage implementation can read instructions from a single cache line, starting from the current fetch address and up to the next control flow instruction. That is one basic block per cycle at most. Given that the typical basic block size in SPEC integer benchmarks is 4-6 instructions, fetch performance was limited to those same 4-6 instructions per cycle, making 8-wide and 16-wide superscalar processors impractical. It became imperative to find mechanisms to fetch more than 8 instructions per cycle, and that meant fetching more than one basic block per cycle. The Trace Cache [2] [3] [4] quickly established itself as the state of the art in high-performance instruction fetch. The trace cache relies on a trace-building mechanism that dynamically reorders the control flow of the program and stores the dynamic instruction sequences in sequential storage, increasing fetch width. However, it is a complex hardware structure that not only adds another cache memory to the on-chip storage hierarchy, but also requires a branch predictor capable of issuing multiple predictions per cycle to index the contents of the trace cache and to distinguish between multiple dynamic sequences of instructions.
dc.language | eng | en_US
dc.relation.ispartof | Proceedings of the International Conference on Supercomputing | en_US
dc.source | Proceedings of the International Conference on Supercomputing, p. 45-47 | en_US
dc.subject | 330406 Computer architecture | en_US
dc.title | Author retrospective for Software trace cache | en_US
dc.type | info:eu-repo/semantics/conferenceObject | es
dc.type | ConferenceObject | es
dc.relation.conference | 25th ACM International Conference on Supercomputing, ICS 2014 |
dc.identifier.doi | 10.1145/2591635.2594508 |
dc.identifier.scopus | 84907969513 | -
dc.contributor.authorscopusid | 55837529000 | -
dc.contributor.authorscopusid | 9733156400 | -
dc.contributor.authorscopusid | 7003605046 | -
dc.contributor.authorscopusid | 24475914200 | -
dc.description.lastpage | 47 | -
dc.description.firstpage | 45 | -
dc.investigacion | Engineering and Architecture | en_US
dc.type2 | Conference proceedings | en_US
dc.date.coverdate | January 2014 |
dc.identifier.conferenceid | events121528 |
dc.identifier.ulpgc | | es
dc.description.ggs | 2 |
item.grantfulltext | none | -
item.fulltext | No full text | -
crisitem.author.dept | GIR SIANI: Inteligencia Artificial, Robótica y Oceanografía Computacional | -
crisitem.author.dept | IU Sistemas Inteligentes y Aplicaciones Numéricas | -
crisitem.author.dept | Departamento de Informática y Sistemas | -
crisitem.author.orcid | 0000-0001-7511-5783 | -
crisitem.author.parentorg | IU Sistemas Inteligentes y Aplicaciones Numéricas | -
crisitem.author.fullName | Santana Jaria, Oliverio Jesús | -
crisitem.event.eventsstartdate | 10-06-2014 | -
crisitem.event.eventsenddate | 13-06-2014 | -
Appears in Collections: Actas de congresos
Items in accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.