| Machine type | Distributed-memory multi-vector processor. |
| Models | XD1. |
| Operating system | Linux (kernel 2.4.21 with Cray HPC enhancements). |
| Connection structure | Variable (see remarks). |
| Compilers | Fortran 90, C, C++. |
| Vendor's information Web page | www.cray.com/products/xd1/ |
| Year of introduction | 2004. |
System parameters:
| Model | Cray XD1 |
| Clock cycle | 2.2 GHz |
| Theor. peak performance | |
|   Per Chassis (see remarks) | 52.8+ Gflop/s |
|   Per Rack (see remarks) | 633.6+ Gflop/s |
| Memory | |
|   Per Chassis | 96 GB |
|   Per Rack | 1.2 TB |
| No. of processors | |
|   Per Chassis | 12 |
|   Per Rack | 144 |
| Communication bandwidth | |
|   Point-to-point | ≤ 2.9 GB/s |
|   Aggregate per Chassis | 96 GB/s |
Remarks:
The Cray XD1 was originally developed by OctigaBay, a company that was later
taken over by Cray. A distinctive feature of the OctigaBay systems was the
option to add FPGAs (see Glossary) to the compute boards to accelerate
algorithms that are of special interest to the user, like massive FFTs or DNA
sequence alignments. Hence the plus symbols in the entries for the Theoretical
Peak Performance in the System Parameters list above. Cray turned the system
into a product by adding its special communication networking capability, the
proprietary RapidArray network, which connects the compute boards and the
nodes (called “chassis” by Cray).
The general structure of an XD1 is as follows: one chassis houses up to 6
compute cards. Each compute card has 2 AMD Opterons at 2.2 GHz and one or two
RapidArray Processors (RAPs) that handle the communication. The two Opterons on
a card are connected via AMD's HyperTransport with a bandwidth of 3.2 GB/s,
forming a 2-way SMP. Because of the high bandwidth of the HyperTransport bus,
memory access does not suffer from using two processors on a board, unlike
in most clusters with 2 processors per node. Optionally, an application
acceleration processor (FPGA) can be put onto a compute board. With 2 RAPs/board
a bandwidth of 8 GB/s (4 GB/s bi-directional) between boards is available via a
RapidArray switch. This switch has 48 links, half of which are used to connect
to the RAPs on the compute boards within the chassis, while the others can be
used to connect to other chassis. Twelve chassis fit into a standard rack and,
because of the number of free links per RapidArray switch, the chassis in two
racks may be connected directly. Larger configurations can be put together by
connecting the chassis in a more sparsely connected network, like a 3-D torus
or a fat tree.
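The arithmetic behind these configuration figures can be sketched in a few
lines of Python. This is only an illustration: the value of 2 floating-point
results per clock cycle per Opteron and the choice of one switch link per pair
of directly connected chassis are assumptions, not statements from Cray, and
all names in the sketch are hypothetical.

```python
# Figures taken from the description above.
CARDS_PER_CHASSIS = 6
CPUS_PER_CARD = 2
CLOCK_GHZ = 2.2
CHASSIS_PER_RACK = 12
SWITCH_LINKS = 48

# Assumption: each Opteron delivers 2 floating-point results per cycle.
FLOPS_PER_CYCLE = 2


def processors_per_chassis():
    return CARDS_PER_CHASSIS * CPUS_PER_CARD


def processors_per_rack():
    return CHASSIS_PER_RACK * processors_per_chassis()


def chassis_peak_gflops():
    """Peak of the Opterons only; FPGAs (the '+' in the table) are ignored."""
    return processors_per_chassis() * CLOCK_GHZ * FLOPS_PER_CYCLE


def rack_peak_gflops():
    return CHASSIS_PER_RACK * chassis_peak_gflops()


def free_links_per_chassis():
    """Half of the switch links stay inside the chassis, half are free."""
    return SWITCH_LINKS // 2


def max_directly_connected_chassis(links_per_pair=1):
    """Largest fully connected set of chassis.

    Assumption: every pair of chassis is joined by `links_per_pair` links,
    so each chassis spends that many free links on every other chassis.
    """
    return free_links_per_chassis() // links_per_pair + 1


print(f"processors per chassis: {processors_per_chassis()}")
print(f"processors per rack:    {processors_per_rack()}")
print(f"peak per chassis:       {chassis_peak_gflops():.1f} Gflop/s")
print(f"peak per rack:          {rack_peak_gflops():.1f} Gflop/s")
print(f"free links per chassis: {free_links_per_chassis()}")
print(f"max. direct chassis:    {max_directly_connected_chassis()}")
```

Under these assumptions the 24 chassis of two racks need 23 links each for a
direct connection, which fits within the 24 free links per switch.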
The RAPs offload communication tasks from the Opteron processors and have
hardware support for MPI, Cray-style shmem and Global Arrays (a
virtual shared memory system). The communication characteristics for MPI via
the RapidArray network as stated by Cray are impressive: 2.9 GB/s for long
messages and a 1.6 µs latency for small messages.
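To get a feeling for what these two numbers mean for messages of intermediate
size, one can use the common first-order model in which the transfer time is
the latency plus the message size divided by the asymptotic bandwidth. The
sketch below is such a model, not a measurement and not a formula given by
Cray; it assumes that 1 GB/s means 10^9 bytes per second, and the function
names are hypothetical.

```python
# Vendor-stated MPI figures from the text.
LATENCY_S = 1.6e-6          # small-message latency in seconds
BANDWIDTH_BYTES_S = 2.9e9   # long-message bandwidth; assumes 1 GB = 1e9 bytes


def transfer_time(nbytes):
    """First-order model: fixed latency plus size over asymptotic bandwidth."""
    return LATENCY_S + nbytes / BANDWIDTH_BYTES_S


def effective_bandwidth(nbytes):
    """Bandwidth actually seen for a message of `nbytes` bytes (bytes/s)."""
    return nbytes / transfer_time(nbytes)


def half_bandwidth_size():
    """Message size at which half of the asymptotic bandwidth is reached."""
    return LATENCY_S * BANDWIDTH_BYTES_S


for size in (64, 1024, 65536, 1048576):
    gbs = effective_bandwidth(size) / 1e9
    print(f"{size:>8d} bytes: {gbs:.3f} GB/s")

print(f"half-bandwidth message size: {half_bandwidth_size():.0f} bytes")
```

In this model, messages well below a few kilobytes are dominated by the
latency, while the stated 2.9 GB/s is only approached for messages of hundreds
of kilobytes or more.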
An extra feature of the Cray-enhanced Linux OS is the synchronisation of tasks
in the system. The random scheduling of tasks within the system (by the OS or
otherwise) can result in large latencies (see [31]) that may be detrimental to
the MPI performance. Task synchronisation avoids this problem.
Measured Performances:
The Cray XD1 is still new. A 6-chassis test system is presently being
evaluated at ORNL. See the website of Tom Dunigan at [12].
Aad van der Steen
Tue Mar 8 12:10:07 CET 2005