Guidelines for Writing System Software and Tools Requirements
Specification of Baseline Development Environment
Version 2.0 (November 14, 1995)
This portion of the document lays out the capabilities that are needed by an
overwhelming majority of HPC user sites. The recommendation of the task force
is that this entire Baseline Development Environment be included on all
procurements for parallel and clustered machines. Most sites will have
additional requirements for system software and tools; these are addressed in
other sections of the document.
Terminology
The following terms have been defined for specific use in this document:
platform, PE (Processing Element), API (Application
Programming Interface), standard API, published API, current
standard, fully-supported implementations, XXX-compatible
software, and single-point control interfaces.
See the Terms and Definitions section for details.
Contents
Components of the Baseline Development Environment (BDE)
Shells and Utilities
- BDE-1a Fully supported implementation of sh, as specified in POSIX
1003.2.
- BDE-1b Fully supported implementation of csh
(compatible with version 4 of System V UNIX).
- BDE-1c Fully supported implementation of grep, egrep, sed,
and diff, version 2.0.
- BDE-1d Fully supported implementation of vi
(compatible with version 4 of System V UNIX, including
ex), supporting line-oriented as well as full-screen mode.
- BDE-1e Fully supported make, conforming to the
GNU make interface (version 3.74).
(See Make Problems)
- BDE-1f Fully supported implementation of an object module linker.
Language Support
- BDE-1g Fully supported implementations of Fortran77 (ANSI standard
plus MIL standard extensions) and C (current ANSI standard).
These will be referred to hereafter as the baseline
languages.
- BDE-1h Fully supported implementation of mechanisms for
mixed-language applications (i.e., allowing inter-language
subprocedure invocations), for the baseline languages.
Stack Traceback Utilities
- BDE-2a
Fully supported implementation of a feature whereby
critical information is generated to stderr upon interruption
of a process/thread involving any trap for which the application
has not defined a handler. The information will include a
source-level stack traceback
(indicating the approximate
location of the process/thread in terms of source routine and
line number) and an indication of the interrupt type.
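The behavior described in BDE-2a can be illustrated with a minimal sketch, written in Python purely for illustration; a vendor implementation would live in the runtime system and would also cover hardware traps. The names format_trap_report, report_and_exit, and install_default_reporting are hypothetical. The reporter is installed only for signals that the application has left at the default disposition, and it writes the interrupt type and a source-level traceback to stderr.

```python
import signal
import sys
import traceback


def format_trap_report(signum, frame):
    """Build the report: interrupt type plus a source-level stack traceback."""
    name = signal.Signals(signum).name
    lines = [
        f"interrupt: {name} ({int(signum)})\n",
        "stack traceback (most recent call last):\n",
    ]
    # Each entry names the source file, line number, and routine.
    lines.extend(traceback.format_stack(frame))
    return "".join(lines)


def report_and_exit(signum, frame):
    """Fallback handler: write the report to stderr, then exit."""
    sys.stderr.write(format_trap_report(signum, frame))
    sys.exit(128 + int(signum))


def install_default_reporting(signums=(signal.SIGTERM,)):
    """Install the reporter only where the application defined no handler."""
    installed = []
    for signum in signums:
        if signal.getsignal(signum) is signal.SIG_DFL:
            signal.signal(signum, report_and_exit)
            installed.append(signum)
    return installed
```

For fatal signals such as SIGSEGV, Python's standard faulthandler module provides comparable reporting.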
Interactive Debugger
- BDE-2b Fully supported implementation of an interactive parallel
debugger providing single-point control for debugging both
sequential and parallel applications (multiple debugger invocations
to control individual processes are not acceptable). At least
the following functionality must be supported:
- control of parallel processes: start/stop
processes, set/list/remove breakpoints and data watchpoints,
single-step into/over subprocedure invocations
- examination of program state: stack traceback(s) for
processes, contents of variables, aggregates, and blocks of
memory, current states and source locations of processes
- modification of program state: change contents of
variables, aggregates, and blocks of memory
The debugger must report information at the level of
application source code (before preprocessing) for all baseline
languages, including support for mixed-language applications.
Both a full-screen, window-based interface and a command-line
interface must be fully supported, although they need not be
functionally identical.
(Some Debugging Scenarios)
- BDE-2c Where the programming model supports it, fully supported
implementation of some mechanism for
viewing and controlling MPMD (multiple executable) as well as SPMD
applications.
- BDE-2d In the presence of code optimization, fully supported
implementation of some mechanism for reporting at least
minimal information on program state (stack traceback, access to
variables that have not been eliminated) and some degree of
functionality (breakpoints where possible, single-stepping at some
level, stepping over subroutines).
- BDE-2e Fully supported implementation of some mechanism for invoking
the debugger for examining the final state of an application
that failed
("postmortem debugging"). Facilities for modifying program state
and/or continuing execution need not be available in this mode.
If the code was not compiled for debugging, it is understood that
access to source-level information may be limited.
Performance Tuning Tools
- BDE-2f Fully supported implementation of a tool for profiling
CPU time distribution from all processes/threads in a parallel
application, at the levels of subprocedures and coarse blocks (e.g.,
large loops). Must include capability for statically restricting
the amount of profiling data collected
to certain portions of the source code (e.g., a specific
subset of procedures), through the use of compiler directives or
command-line switches. Must provide visual as well as textual
displays of tool output.
- BDE-2g Fully supported implementation of an event tracing tool.
Event records must include a timestamp and an event type
designator and must be formatted in SDDF (Self-Defining
Data Format). A published API (possibly platform-specific)
must be available
for dynamically activating and deactivating event monitoring
during execution. A single visual tool must be capable of
displaying the event data.
- BDE-2h For all message-passing libraries supported on the platform,
fully supported implementation of some mechanism
for tracing message sends, receives, and
synchronizations, at least to the level supported interactively
by the Parallel Tools Consortium's "message queue manager."
- BDE-2i Fully supported implementation of performance statistics
tool(s), whereby performance measures obtained for individual
PEs/processes are reported and summarized for the entire
application. There must be some mechanism for capturing the
statistics and storing them for later analysis/viewing. The
measures may be platform-specific, but must include a summary
of memory usage.
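The dynamic activation and deactivation required by BDE-2g can be sketched as follows. This is an illustration under stated assumptions, not a vendor API: the EventTracer class and its method names are hypothetical, and records are kept as plain in-memory tuples rather than being written in SDDF.

```python
import time
from dataclasses import dataclass, field


@dataclass
class EventTracer:
    """Hypothetical in-memory tracer; does not produce SDDF output."""

    active: bool = False
    records: list = field(default_factory=list)

    def activate(self):
        """Turn event monitoring on during execution."""
        self.active = True

    def deactivate(self):
        """Turn event monitoring off during execution."""
        self.active = False

    def record(self, event_type, **fields):
        """Store (timestamp, event type, extra fields) only while active."""
        if self.active:
            self.records.append((time.monotonic(), event_type, fields))
```

An application would call record() at each point of interest and bracket the region under study with activate() and deactivate(), so that events outside that region cost only a flag test.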
Programming Libraries
- BDE-3a Fully supported implementation of MPI at the current standard,
as defined by the most recent specification from the MPI Forum.
- BDE-3b Fully supported implementation of the dynamic process control
routines specified by the MPI Forum (released at Supercomputing
'95).
- BDE-3c Fully supported implementation of PVM (version 3.3.7).
- BDE-3d Fully supported implementation, as defined by the POSIX 1003.4
working group standard, of thread operations, in shared address
spaces.
Math Libraries
- BDE-3e Fully supported implementation of
a published API (may be platform-specific) for one-, two- and
three-dimensional FFTs for both radix-2 and mixed-radix arrays,
executed on a single PE. Must handle complex-to-complex,
real-to-complex, and complex-to-real formats.
- BDE-3f Fully supported implementation of
a published API (may be platform-specific) for one-, two- and
three-dimensional FFTs for both radix-2 and mixed-radix arrays,
in parallel form for execution across multiple PEs, handling
the same formats as BDE-3e.
- BDE-3g Fully supported implementation of levels 1, 2, and 3 of the
BLAS, executed on a single PE.
- BDE-3h Fully supported implementation of LAPACK (single-PE)
and ScaLAPACK (multiple-PE).
- BDE-3i Fully supported implementation of
a published API (may be platform-specific) for a parallel lagged
Fibonacci random number generator using Mascagni's seed selection
algorithm, so that
- the same seed for the same random number generator produces
the same (reproducible) sequence of random numbers on all
platforms; and
- there is a mathematically sound method of choosing seeds for
the capability of producing different sequences of random
numbers on different processors.
- BDE-3j Fully supported implementation of
a published API (may be platform-specific) for transposing
arrays among the PEs corresponding to all permutations of the
array's indices, including straightforward (blocked) distribution.
It must be possible for the user to specify which indices
correspond to data that is distributed.
- BDE-3k Fully supported implementation of
a published API (may be platform-specific) for converting
among standard data decompositions, including the ScaLAPACK
distribution, blocked distributions where up to three indices are
distributed (other indices are serial), and all other distributions
supported on the platform.
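The reproducibility property in BDE-3i can be illustrated with a small additive lagged Fibonacci generator. This is a sketch under stated assumptions: the class name, the lags (5, 17), and the 32-bit modulus are chosen for illustration, and the seeding shown is a naive placeholder. It is not Mascagni's seed selection algorithm and gives no guarantee that streams on different PEs are independent.

```python
class LaggedFibonacci:
    """Additive lagged Fibonacci: x[n] = (x[n-5] + x[n-17]) mod 2**32."""

    SHORT_LAG, LONG_LAG, MODULUS = 5, 17, 2**32

    def __init__(self, seed, stream=0):
        # Placeholder seeding (an assumption of this sketch): fill the lag
        # table from a 64-bit linear congruential generator keyed by
        # (seed, stream). NOT Mascagni's seed selection algorithm.
        state = (seed * 0x9E3779B97F4A7C15 + stream) % 2**64
        table = []
        for _ in range(self.LONG_LAG):
            state = (state * 6364136223846793005 + 1442695040888963407) % 2**64
            table.append(state >> 32)
        table[0] |= 1  # an all-even table would keep the lowest bit at zero
        self.table = table
        self.pos = 0

    def next(self):
        """Return the next 32-bit value and advance the generator."""
        k, j = self.LONG_LAG, self.SHORT_LAG
        # table[pos] holds x[n-17]; x[n-5] sits 12 slots further on
        # in the circular buffer.
        value = (self.table[self.pos] + self.table[(self.pos + k - j) % k]) % self.MODULUS
        self.table[self.pos] = value
        self.pos = (self.pos + 1) % k
        return value
```

Because the generator uses only integer arithmetic, the same (seed, stream) pair yields the same sequence on every platform; a PE would pass its own rank as the stream argument to obtain a different sequence.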
Performance Measurement Libraries
- BDE-3l Fully supported implementation of the standard API defined by
the Parallel Tools Consortium for interval wallclock timers local
to a thread/process. This must access the best available
wallclock timer on the platform, in terms of accuracy and
non-intrusiveness.
- BDE-3m Fully supported implementation of the standard API defined by
the Parallel Tools Consortium for interval CPU timers local
to a thread/process. This must provide access to both
user CPU time and system CPU time (where criteria for each may
be platform-dependent) and must access the best available timers
on the platform, in terms of accuracy and non-intrusiveness.
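The kind of interval measurement called for in BDE-3l and BDE-3m can be sketched with the Python standard library. This is not the Parallel Tools Consortium API; the interval_timers helper is a hypothetical name, and the CPU times it reports are per process rather than per thread.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def interval_timers(result):
    """Time one interval; fill result with wallclock, user, and system seconds."""
    wall_start = time.perf_counter()  # high-resolution wallclock timer
    cpu_start = os.times()            # process user and system CPU times
    try:
        yield result
    finally:
        cpu_end = os.times()
        result["wallclock"] = time.perf_counter() - wall_start
        result["user_cpu"] = cpu_end.user - cpu_start.user
        result["system_cpu"] = cpu_end.system - cpu_start.system
```

A region of interest is wrapped in "with interval_timers({}) as t:", after which t holds the three measurements in seconds.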
Parallel I/O
- BDE-3n Fully supported implementation of a published API (possibly
platform-specific) supporting four kinds of concurrent file I/O,
where it is the user's responsibility to ensure that all
participating processes open the logical file in the same mode:
- Sequential Read:
All participating processes
read from a logically shared file using a shared file
pointer. Each record will be read just once.
(Usage Scenario)
- Parallel Read:
All participating processes read
from a logically shared file using independent file pointers.
Thus, each process reads each record.
(Usage Scenario)
- Sequentialized Write:
All participating processes
write to a logically shared file using a shared file pointer.
Records are atomic and cannot be overwritten, but they may
be merged into the shared output file in any order.
(Usage Scenario)
- Direct Access Read/Write:
Each participating process
can read or write any specified record location of a logically
shared file. It is the user's responsibility to ensure that
records do not overlap. When a file is open for reading
and writing at the same time, the effect of reads and
writes into the same location is implementation dependent.
Updates are not guaranteed to take effect until the file
is closed.
(Usage Scenario)
- BDE-3o All processes in a parallel application must be capable of
performing the operations in BDE-3n, although there may be variation
in performance from one process to another.
- BDE-3p A process' buffers associated with the operations in BDE-3n must
be flushed automatically upon completion or failure of the
process.
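The Sequential Read mode of BDE-3n, in which a shared file pointer ensures that each record is read just once, can be sketched with threads standing in for the participating processes. The SharedRecordReader class and the drain function are hypothetical names, and fixed-size records are an assumption of the sketch.

```python
import threading


class SharedRecordReader:
    """Hypothetical shared-file-pointer reader for fixed-size records."""

    def __init__(self, path, record_size):
        self._file = open(path, "rb")
        self._record_size = record_size
        self._lock = threading.Lock()

    def read_record(self):
        """Return the next unread record, or None at end of file."""
        with self._lock:  # only one reader advances the shared pointer at a time
            data = self._file.read(self._record_size)
        return data if len(data) == self._record_size else None

    def close(self):
        self._file.close()


def drain(reader, out):
    """Worker loop: keep claiming records until the file is exhausted."""
    while (record := reader.read_record()) is not None:
        out.append(record)
```

In the Parallel Read mode, by contrast, each participant would open its own handle and keep an independent pointer, so every participant would see every record.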
Authentication/Security and Namespace Management
- BDE-4a Fully supported implementation of DCE-compatible (version 1.1)
authentication and access control services. (Note that Kerberos
version 5 satisfies the authentication portion of this
requirement.) Such facilities must be capable of processing
messages, and also the following UNIX commands:
- login and passwd on machines defined to the
authentication facility; and
- rcp, rlogin, rsh, rexec, telnet, and ftp coming
from machines defined to the authentication facility.
- BDE-4b Fully supported implementation of DCE-compatible RPCs.
- BDE-4c Fully supported implementation of a service for mapping service
names to service locations, so that clients are not required to know locations.
- BDE-4d Fully supported implementation of mechanisms for the name mapping
service specified in BDE-4c, so that it works for the RPCs specified
in BDE-4b, as well as any system messages supported (e.g., Mach
messages).
- BDE-4e Message delivery must be guaranteed; neither messages nor RPCs
may be discarded or ignored without notification to the sender.
- BDE-4f Fully supported TCP/IP suite.
File System
- BDE-4g Fully supported implementation of POSIX-compliant (version 1003)
file system, including long filename support.
- BDE-4h Fully supported implementation of file system larger than
4 gigabytes.
- BDE-4i Fully supported implementation of file system capable of
supporting files larger than 4 gigabytes.
Job Management and Scheduling
- BDE-4j Fully supported commands to manipulate a job as a single
entity, including kill, modify, query characteristics, and query
state (similar to the commands UNIX provides for processes); must
include mechanisms whereby the system can
fully kill all processes of any job and free all resources. Crash
recovery methods must clean up all cases of "partially dead" jobs,
taking special care to release locks on allocated resources.
- BDE-4k Fully supported implementation of a batch system interface
conforming to POSIX 1003.2d. (Note that PBS 2.1 satisfies this
requirement.)
- BDE-4l Capability for a single batch system to span any subset of
user-accessible PEs.
- BDE-4m Fully supported implementation of spacesharing, or tiling,
making it possible to allocate PEs as dedicated
resources to support non-overlapping jobs. This feature
is critical for benchmarking purposes and for special
resources that may suffer performance degradation if shared
among multiple jobs.
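The spacesharing requirement of BDE-4m amounts to handing out PEs as dedicated, non-overlapping sets. A minimal sketch follows, assuming PEs are identified by integers; the SpaceSharingAllocator class is hypothetical, and a production scheduler would also account for topology and PE type.

```python
class SpaceSharingAllocator:
    """Hypothetical allocator that dedicates disjoint PE sets to jobs."""

    def __init__(self, pe_count):
        self.free = set(range(pe_count))
        self.owned = {}  # job id -> set of PE numbers

    def allocate(self, job_id, count):
        """Dedicate `count` free PEs to a job, or raise if too few are free."""
        if job_id in self.owned:
            raise ValueError(f"job {job_id!r} already holds PEs")
        if count > len(self.free):
            raise RuntimeError("not enough free PEs")
        granted = set(sorted(self.free)[:count])
        self.free -= granted
        self.owned[job_id] = granted
        return granted

    def release(self, job_id):
        """Return all of a job's PEs to the free pool, e.g. after a kill."""
        self.free |= self.owned.pop(job_id)
```

Releasing a job's PEs in one step mirrors the BDE-4j requirement that killing a job frees all of its resources.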
Resource Management and Accounting
- BDE-4n Fully supported implementation of mechanisms to manage a minimal
set of resources, including number and type of PEs, plus per-PE as
well as aggregate CPU time, wallclock time, memory (high-water
allocation), network adapters, and temporary disk space.
- BDE-4o At least the minimal resource set, as defined in BDE-4n, must be
allocatable to individual jobs.
- BDE-4p At least the minimal resource set, as defined in BDE-4n, must be
allocatable to individual processes within a job.
- BDE-4q Availability of a published API (possibly platform-specific) for
getting and setting the status of at least the minimal resource
set, as defined in BDE-4n.
- BDE-4r Fully supported implementation of job accounting, where data
for all processes of a job is combined to provide an aggregate job
accounting record, for at least the minimal resource set defined
in BDE-4n.
- BDE-4s Fully supported implementation of mechanisms enforcing a hard
limit for at least the minimal resource set defined in BDE-4n.
- BDE-4t Fully supported implementation of mechanisms for detecting
and reporting failures of critical resources, including
PEs, network paths, and disks.
- BDE-5a Fully supported implementation of a single-point
system administration tool for parallel or clustered machines
administered as a single system to handle:
- file system mounts
- PE booting, where appropriate
- PE status, where appropriate
- PE consistency checks, where appropriate
- software installation
- resource administration
- BDE-6a Fully supported online versions, in a
non-proprietary format (preferably SGML, HTML, or PostScript),
of all documentation on baseline software.
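The aggregate job accounting record of BDE-4r, which combines data from all processes of a job, can be sketched as follows. The ProcessRecord fields and the combination rules (CPU time summed, wallclock taken as the longest process, memory high-water marks summed) are assumptions made for illustration; the requirement itself does not prescribe them.

```python
from dataclasses import dataclass


@dataclass
class ProcessRecord:
    """Hypothetical per-process accounting data."""

    pe: int
    cpu_seconds: float
    wallclock_seconds: float
    memory_high_water: int  # bytes


def aggregate_job_record(records):
    """Combine per-process records into one job-level record (assumed rules)."""
    per_pe_cpu = {}
    for r in records:
        per_pe_cpu[r.pe] = per_pe_cpu.get(r.pe, 0.0) + r.cpu_seconds
    return {
        "pe_count": len(per_pe_cpu),
        "per_pe_cpu_seconds": per_pe_cpu,
        "cpu_seconds": sum(per_pe_cpu.values()),
        "wallclock_seconds": max(
            (r.wallclock_seconds for r in records), default=0.0
        ),
        "memory_high_water": sum(r.memory_high_water for r in records),
    }
```

The result keeps both the per-PE and the aggregate CPU time, matching the minimal resource set listed in BDE-4n.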