By John M. Levesque
Contents: Introduction; Supercomputer architecture; Fortran; Vectorization of Fortran programs. Index. This book explains in detail both the underlying architecture of modern-day supercomputers and the way in which a compiler maps Fortran code onto that architecture. Most important, the constructs that prevent full optimization are described, and specific techniques for restructuring a program are provided.
Read or Download A Guidebook to Fortran on Supercomputers PDF
Similar software books
This book constitutes the refereed proceedings of the 15th International Conference on Formal Engineering Methods, ICFEM 2013, held in Queenstown, New Zealand, in October/November 2013. The 28 revised full papers, together with 2 keynote speeches, were carefully reviewed and selected from 88 submissions.
The LNCS journal Transactions on Aspect-Oriented Software Development is devoted to all facets of aspect-oriented software development (AOSD) techniques in the context of all phases of the software life cycle, from requirements and design to implementation, maintenance and evolution. The focus of the journal is on approaches for systematic identification, modularization, representation and composition of crosscutting concerns, i.
This book constitutes the refereed proceedings of the 14th International Conference on Formal Engineering Methods, ICFEM 2012, held in Kyoto, Japan, in November 2012. The 31 revised full papers, together with 3 invited talks, were carefully reviewed and selected from 85 submissions. The papers address all current issues in formal methods and their applications in software engineering.
Additional info for A Guidebook to Fortran on Supercomputers
Since contiguous vectors are always handled more efficiently than noncontiguous vectors, it is good practice to vary the left-most subscript in inner loops where possible. It is also important (if possible) to have the longest dimension of an array as the left-most to achieve long-vector processing in inner-loop references. Finally, note the diagonal processing, shown in loop 3020, has a stride of five, which is the length of a column plus one. We shall see in a later section that when nonunit strides are unavoidable, as in diagonal processing, it is sometimes important to adjust the dimension of the arrays to avoid memory-bank conflicts.
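The stride arithmetic above can be checked with a small sketch. Loop 3020 itself is not reproduced in this excerpt, so the program below is only an illustration under stated assumptions: a hypothetical 4-by-4 array A (column length 4, which gives the stride of five mentioned in the text), filled so that each element holds its own position in storage order. It varies the left-most subscript in the inner loop and then confirms that the diagonal elements sit five storage locations apart.

```fortran
program stride_demo
  implicit none
  ! Assumption: columns of length 4, so the diagonal stride is n + 1 = 5.
  integer, parameter :: n = 4
  real :: a(n, n)
  real :: flat(n*n)
  integer :: i, j, k, stride
  logical :: ok

  ! Fill A so that each element holds its position in storage order.
  ! The left-most subscript varies in the inner loop (unit stride).
  k = 0
  do j = 1, n
     do i = 1, n
        k = k + 1
        a(i, j) = real(k)
     end do
  end do

  ! Copy A out in array-element (column-major) order.
  flat = reshape(a, [n*n])

  ! Walk the diagonal as a strided access into the flat copy.
  stride = n + 1
  ok = .true.
  do i = 1, n
     if (a(i, i) /= flat(1 + (i - 1)*stride)) ok = .false.
  end do

  print *, 'diagonal stride =', stride
  print *, 'diagonal matches strided access:', ok
end program stride_demo
```

The names `stride_demo`, `flat`, and `ok` are illustrative only and do not come from the book.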
An additional device, the "stream unit," performs many special-purpose data-motion operations, among which are the "gather-periodic" and "scatter-periodic" instructions, which specifically handle strided data. These instructions can be used to vectorize the following loop:

      DO 2090 I = 1, 10000, 10
        A(I) = B(I) + C(I)
 2090 CONTINUE

Here the vector pipeline cannot directly fetch or store every tenth item of data. So the vector stream unit issues gather-periodic instructions to fetch the necessary data from the B and C arrays and stores the data into temporary contiguous arrays in memory.
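The effect of the gather step can be imitated in ordinary Fortran. The sketch below is a software emulation only, not the hardware instructions: it copies every tenth element of B and C into contiguous temporaries, adds them at unit stride, and stores the results back at every tenth position of A. The temporaries `tb`, `tc`, and `ta`, the test data, and the final store-back step are assumptions made for illustration; the excerpt describes only the gather into temporary arrays.

```fortran
program gather_periodic_sketch
  implicit none
  integer, parameter :: n = 10000, step = 10
  integer, parameter :: m = (n - 1)/step + 1   ! number of strided elements
  real :: a(n), b(n), c(n), a_ref(n)
  real :: tb(m), tc(m), ta(m)                  ! hypothetical contiguous temporaries
  integer :: i
  logical :: ok

  ! Arbitrary test data (an assumption, not from the text).
  do i = 1, n
     b(i) = real(i)
     c(i) = 2.0*real(i)
  end do
  a = 0.0
  a_ref = 0.0

  ! Reference: the strided loop from the text.
  do i = 1, n, step
     a_ref(i) = b(i) + c(i)
  end do

  ! "Gather": copy every tenth element into contiguous temporaries.
  tb = b(1:n:step)
  tc = c(1:n:step)

  ! Contiguous (unit-stride) vector add.
  ta = tb + tc

  ! "Scatter": store the results back at every tenth position.
  a(1:n:step) = ta

  ok = all(a == a_ref)
  print *, 'elements processed:', m
  print *, 'matches strided loop:', ok
end program gather_periodic_sketch
```

Splitting the work this way keeps the arithmetic itself on contiguous data, which is the point of the gather described above; the cost is the extra data motion into and out of the temporaries.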
On the Cray computers there is only one of each kind of functional unit in a CPU: one adder, one multiplier, and so forth. [...] For these machines, the multiple pipelines act in a manner similar to those of the CYBER 205; that is, all duplicate functional units work on the same vectors, each taking a separate segment of the data. In effect, it is as if the vector length were divided by the number of pipelines, with the time to complete a vector operation reduced by about the same factor. Chaining.