Author retrospective for Software trace cache

Ramirez, Alex; Falcón, Ayose J.; Santana, Oliverio J.; Valero, Mateo

Please use this identifier to cite or link to this item: https://accedacris.ulpgc.es/jspui/handle/10553/50487

DC Field	Value	Language
dc.contributor.author	Ramirez, Alex	en_US
dc.contributor.author	Falcón, Ayose J.	en_US
dc.contributor.author	Santana, Oliverio J.	en_US
dc.contributor.author	Valero, Mateo	en_US
dc.date.accessioned	2018-11-24T16:25:22Z	-
dc.date.available	2018-11-24T16:25:22Z	-
dc.date.issued	2014	en_US
dc.identifier.isbn	9781450328401	en_US
dc.identifier.uri	https://accedacris.ulpgc.es/handle/10553/50487	-
dc.description.abstract	© 2014 by the Association for Computing Machinery, Inc. (ACM). The Software Trace Cache (STC) is a compiler transformation, or a post-compilation binary optimization, that extends the seminal work of Pettis and Hansen (PLDI '90) to reorder the dynamic instruction stream into sequential memory locations using profile information from previous executions. Its major advantage over the hardware trace cache (TC) is that it is a software optimization: it requires neither additional hardware to capture the dynamic instruction stream nor additional memories to store it. Instructions are still captured in the regular instruction cache. Its major disadvantage is that it can capture only one of the dynamic instruction sequences and store it sequentially. Any other control flow will result in taken branches and interruptions of the fetch stream.
Our results show that fetch width using STC was competitive with that obtained with the hardware TC, and that STC was applicable to a wide range of superscalar architectures, because it does not require hardware changes to enable fetching from multiple basic blocks in a single cycle, even if only one branch prediction can be issued. Any front-end architecture built on a BTB that ignores branches as long as they are not taken will automatically treat such branches as NOP instructions in terms of fetch [5].
This was only the beginning. The impact of this optimization on fetch and superscalar processor architectures went much further than the original ICS paper in 1999.
In superscalar processors, capable of issuing and executing multiple instructions per cycle, fetch performance represents an upper bound on overall processor performance. Unless there is some form of instruction reuse mechanism, you cannot execute instructions faster than you can fetch them.
Instruction Level Parallelism, represented by wide-issue, out-of-order superscalar processors, was the trending topic during the late 1990s and early 2000s. It is indeed the most promising way to continue improving processor performance in a way that does not impact application development, unlike current multicore architectures, which require parallelizing the applications (a process that is still far from being automated in the general case). Widening superscalar processor issue was the promise of never-ending improvements to single-thread performance, as identified by Yale N. Patt et al. in the 1997 special issue of IEEE Computer about "Billion transistor processors" [1].
However, instruction fetch performance is limited by the control flow of the program. The basic fetch stage implementation can read instructions from a single cache line, starting from the current fetch address and up to the next control flow instruction. That is one basic block per cycle at most. Given that the typical basic block size in SPEC integer benchmarks is 4-6 instructions, fetch performance was limited to those same 4-6 instructions per cycle, making 8-wide and 16-wide superscalar processors impractical. It became imperative to find mechanisms to fetch more than 8 instructions per cycle, and that meant fetching more than one basic block per cycle.
The Trace Cache [2] [3] [4] quickly established itself as the state of the art in high-performance instruction fetch. The trace cache relies on a trace-building mechanism that dynamically reorders the control flow of the program and stores the dynamic instruction sequences in sequential storage, increasing fetch width. However, it is a complex hardware structure that not only adds another cache memory to the on-chip storage hierarchy, but also requires a branch predictor capable of issuing multiple predictions per cycle to index the contents of the trace cache and distinguish between multiple dynamic sequences of instructions.
dc.language	eng	en_US
dc.relation.ispartof	Proceedings of the International Conference on Supercomputing	en_US
dc.source	Proceedings of the International Conference on Supercomputing, p. 45-47	en_US
dc.subject	330406 Computer architecture	en_US
dc.title	Author retrospective for Software trace cache	en_US
dc.type	info:eu-repo/semantics/conferenceObject	es
dc.type	ConferenceObject	es
dc.relation.conference	25th ACM International Conference on Supercomputing, ICS 2014
dc.identifier.doi	10.1145/2591635.2594508
dc.identifier.scopus	84907969513	-
dc.contributor.authorscopusid	55837529000	-
dc.contributor.authorscopusid	9733156400	-
dc.contributor.authorscopusid	7003605046	-
dc.contributor.authorscopusid	24475914200	-
dc.description.lastpage	47	-
dc.description.firstpage	45	-
dc.investigacion	Engineering and Architecture	en_US
dc.type2	Conference proceedings	en_US
dc.date.coverdate	January 2014
dc.identifier.conferenceid	events121528
dc.identifier.ulpgc	Yes	es
dc.description.ggs	2
item.grantfulltext	none	-
item.fulltext	No full text	-
crisitem.event.eventsstartdate	10-06-2014	-
crisitem.event.eventsenddate	13-06-2014	-
crisitem.author.dept	GIR SIANI: Inteligencia Artificial, Robótica y Oceanografía Computacional	-
crisitem.author.dept	IU de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería	-
crisitem.author.dept	Departamento de Informática y Sistemas	-
crisitem.author.orcid	0000-0001-7511-5783	-
crisitem.author.parentorg	IU de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería	-
crisitem.author.fullName	Santana Jaria, Oliverio Jesús	-
Appears in Collections:	Conference proceedings
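
The abstract describes profile-guided code reordering only in prose. As a rough illustration, the sketch below shows a greedy basic-block chaining pass in the spirit of Pettis and Hansen: the hottest control-flow edges are turned into fall-through paths, so the most frequent dynamic sequence ends up in sequential memory. This is not the authors' STC algorithm. The function names, the edge-count dictionary format, and the four-block example profile are all assumptions made for illustration.

```python
def layout_blocks(edge_counts, entry):
    """Greedy chain layout (hypothetical helper, not the STC algorithm).

    edge_counts maps (source_block, target_block) -> profiled execution count.
    Returns the blocks in the order they would be placed in memory.
    """
    # Every block starts as its own single-block chain.
    chain_of = {entry: [entry]}
    for src, dst in edge_counts:
        chain_of.setdefault(src, [src])
        chain_of.setdefault(dst, [dst])

    # Visit edges from hottest to coldest (ties keep their input order).
    hottest_first = sorted(edge_counts.items(), key=lambda kv: kv[1], reverse=True)
    for (src, dst), _count in hottest_first:
        head, tail = chain_of[src], chain_of[dst]
        # Merge only if src ends one chain and dst starts a different one,
        # so that dst becomes the fall-through successor of src.
        # The entry block is never placed behind another block.
        if head is not tail and head[-1] == src and tail[0] == dst and dst != entry:
            head.extend(tail)
            for block in tail:
                chain_of[block] = head

    # Emit the entry chain first, then the remaining chains in first-seen order.
    layout = list(chain_of[entry])
    placed = set(layout)
    for block in chain_of:
        if block not in placed:
            chain = chain_of[block]
            layout.extend(chain)
            placed.update(chain)
    return layout


def taken_transitions(layout, edge_counts):
    """Count profiled transitions that do NOT fall through in this layout."""
    position = {block: i for i, block in enumerate(layout)}
    return sum(
        count
        for (src, dst), count in edge_counts.items()
        if position[dst] != position[src] + 1
    )


# Assumed example profile: an if/else diamond where the A -> B -> D path is hot.
profile = {("A", "B"): 90, ("A", "C"): 10, ("B", "D"): 90, ("C", "D"): 10}
original_layout = ["A", "B", "C", "D"]
optimized_layout = layout_blocks(profile, "A")
```

On this assumed profile the pass yields the layout A, B, D, C, which cuts the taken transitions from 100 to 20: the hot path becomes a single sequential run, while the cold path through C still incurs taken branches, matching the limitation the abstract notes about capturing only one dynamic sequence.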
