Supercomputer history

In this post we share an overview of the history of supercomputers. Some of the information is drawn from wikipedia.org.

Colossus Computer Project in 1944

The supercomputers introduced in the 1960s were designed primarily by Seymour Cray at Control Data Corporation (CDC), and machines of this lineage dominated the market into the 1990s. When Cray left CDC to found his own company, Cray Research, his new designs took over the supercomputer market, and he remained the dominant force in supercomputing from 1985 to 1990. Cray himself never used the term "supercomputer"; he is remembered for insisting on simply calling his machines "computers."

In the 1980s, echoing the emergence of the minicomputer market a decade earlier, many small competitors entered the field, but most of them disappeared in the 1990s as the supercomputer market contracted. Today, supercomputers are custom-designed systems produced by established companies such as IBM and HP, which acquired many of the earlier firms to gain their experience. Cray's company still builds supercomputers professionally. The term "supercomputer" itself is not fixed: today's supercomputer may be tomorrow's ordinary computer.

The early CDC machines were simply very fast scalar processors, roughly ten times the speed of the fastest machines offered by other companies. In the 1970s, most supercomputers were built around vector processing, and many newer competitors entered the market by offering their own vector processors at lower prices. By the mid-1980s, machines combining a small number of vector processors working in parallel had become the standard, typically with four to sixteen vector processors per machine. In the late 1980s and 1990s, attention shifted from vector processors to massively parallel systems built from thousands of ordinary microprocessors, some off-the-shelf and some custom (a shift sometimes called "the attack of the killer micros"). Today, parallel designs are based on off-the-shelf server-class microprocessors such as the PowerPC, Itanium, and x86-64, and most modern supercomputers are highly tuned computer clusters that combine commodity processors with custom interconnects.

Software Tools

Software tools for distributed processing include standard APIs such as MPI and PVM, and open-source solutions such as Beowulf, Warewulf, and openMosix, which make it possible to build a supercomputer from a collection of ordinary servers or workstations. Technologies like Zeroconf (Rendezvous/Bonjour) can be used to create ad hoc compute clusters for specialized software such as Apple's Shake. An easy, general programming language for supercomputers remains an open problem in computer science. Such applications once cost thousands of dollars; today, thanks to the open-source community (which often produces fascinating technology), many of them are free.
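To give a feel for this kind of distributed-memory programming, here is a minimal sketch in C, assuming an MPI implementation such as Open MPI or MPICH is available; each process started across the cluster simply reports its rank:

```c
#include <mpi.h>
#include <stdio.h>

/* Minimal MPI example: each process learns its rank (ID) and the total
 * number of processes, then prints a line.  Build with "mpicc hello.c -o hello"
 * and run with e.g. "mpirun -np 4 ./hello". */
int main(int argc, char **argv) {
    int rank, size;

    MPI_Init(&argc, &argv);                  /* start the MPI runtime  */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's ID      */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total process count    */

    printf("Hello from process %d of %d\n", rank, size);

    MPI_Finalize();                          /* shut down cleanly      */
    return 0;
}
```

Real applications build on this same skeleton, adding domain decomposition and data exchange with calls such as MPI_Send and MPI_Recv or the collective operations.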

General Uses

Supercomputers, with their large memory capacity and raw computing power, are typically used for demanding computational workloads such as problems in quantum physics, weather forecasting, climate research (including global-warming studies), molecular modeling (the structures and properties of chemical compounds, biological macromolecules, polymers, and crystals), physical simulation (such as simulating aircraft in wind tunnels, simulating nuclear weapon detonations, and nuclear fusion research), cryptanalysis, and more. Large universities, military agencies, and major scientific research laboratories are the heaviest users. A particular class of problems, often called Grand Challenge problems, are those whose full solution requires semi-infinite computing resources.

A useful distinction in this context, examined by Graham and colleagues, is between capability computing and capacity computing. Capability computing means using maximum computing power to solve a single large problem in the shortest possible time; such a system can often tackle problems of a size or complexity that no other computer can handle. Capacity computing, in contrast, means using affordable, efficient computing power to solve a somewhat smaller or less demanding problem, or many small problems, or to prepare a run for execution on a capability system.

Hardware and Software Design

Supercomputers with custom processors traditionally gained their speed over conventional computers through innovative designs that allow many tasks to be performed in parallel, along with painstaking detail engineering. They tended to be specialized for certain kinds of computation, usually numerical calculation, and performed poorly at general computing tasks. Their memory hierarchy was carefully designed to keep the processors supplied with data and instructions at all times; indeed, much of the performance difference between supercomputers and slower computers comes down to the memory hierarchy. Their input/output systems were designed for high bandwidth, with latency less of a concern, because supercomputers are not used for transaction processing. As in any parallel system, Amdahl's law applies, and supercomputer designs go to great lengths to eliminate software serialization and to use hardware to accelerate the remaining bottlenecks.
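To make Amdahl's law concrete, here is a small illustrative sketch in C (not part of the original article) that evaluates the textbook formula 1 / ((1 - p) + p / n), where p is the parallelizable fraction of a program and n is the number of processors:

```c
#include <stdio.h>

/* Amdahl's law: if a fraction p of a program can be parallelized across
 * n processors, the best possible speedup is 1 / ((1 - p) + p / n).
 * Illustrative sketch only. */
static double amdahl_speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / (double)n);
}

int main(void) {
    /* Even with 1024 processors, a 5% serial fraction caps speedup near 20x. */
    printf("p=0.95, n=1024 -> speedup %.1f\n", amdahl_speedup(0.95, 1024));
    printf("p=0.99, n=1024 -> speedup %.1f\n", amdahl_speedup(0.99, 1024));
    return 0;
}
```

Even with a thousand processors, a five percent serial fraction caps the speedup near twenty, which is exactly why supercomputer designs work so hard to remove serialization.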

Supercomputer Technologies and Challenges

A Beowulf cluster

A supercomputer generates a great deal of heat and must be cooled; cooling most supercomputers is a major HVAC problem. Information cannot travel between parts of a computer faster than the speed of light, and since light covers roughly 0.3 meters per nanosecond, a supercomputer several meters across must tolerate latencies between its components on the order of tens of nanoseconds. This is why Seymour Cray's designs tried to keep cables as short as possible, which led to the cylindrical shape of his machines. In supercomputers with many CPUs working in parallel, latencies of one to five microseconds between processors are common. Supercomputers consume and produce enormous amounts of data in a very short time, and much of it must be passed between processors; as Ken Batcher put it, "a supercomputer is a device for turning compute-bound problems into I/O-bound problems." A great deal of work on external storage bandwidth is needed to ensure that this information can be transferred quickly and stored and retrieved correctly. Technologies developed for supercomputers include:

    • Vector processing

    • Liquid cooling

    • Non-uniform memory access (NUMA)

    • Striped disks (the first instance of what was later called RAID)

    • Parallel file systems

Processing Techniques

Vector processing techniques were first designed for supercomputers and are still used in specialized high-performance applications; they have also found their way into DSP architectures and SIMD computing. Modern video game consoles, for example, use SIMD extensively, which is the basis for some manufacturers' claims that their game machines are themselves supercomputers. In fact, some graphics cards can perform several teraFLOPS of computation. Early GPUs, however, were special-purpose units, which limited the applications that could benefit from that power. As video games advanced, GPUs evolved into more general-purpose vector processors, and an entire discipline in computer science, General-Purpose computing on GPUs (GPGPU), has grown up around exploiting this capability.
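As a small, hedged illustration of the SIMD idea, the loop below is the classic SAXPY kernel; with optimization enabled, a typical compiler vectorizes it so that a single instruction updates several array elements at once, which is the vector-processing model in miniature:

```c
#include <stddef.h>

/* SAXPY (y = a*x + y): a classic vectorizable kernel.  With -O3
 * (or -ftree-vectorize) a modern compiler emits SIMD instructions that
 * process several elements per instruction, the same idea pioneered by
 * vector supercomputers.  Illustrative sketch only. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y) {
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```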

Operating System

Supercomputer operating systems, today most often variants of Linux or Unix, are at least as complex as those of smaller machines. Their user interfaces, however, tend to be far less developed, because OS developers have limited resources to spend on parts of the system not directly related to getting the most out of the hardware. These machines cost millions of dollars but sell into a tiny market, so R&D budgets are often limited. The existence of Unix and Linux allows traditional desktop interfaces to be reused. Interestingly, this trend continues within the supercomputer industry itself, with older technology leaders such as Silicon Graphics being overtaken by newer companies such as NVIDIA, which can produce inexpensive, innovative products backed by the R&D funding that a large customer base provides.

Historically, until the early-to-mid 1980s, supercomputers usually sacrificed instruction-set compatibility and code portability for performance in processing and memory access. Many supercomputers still have operating systems quite different from those of high-end mainframes: the Cray-1 alone had at least six proprietary operating systems that were largely unknown to the wider computing community, and in a similar way there were many different, incompatible vectorizing and parallelizing Fortran compilers. This trend would have continued with the ETA-10 had it not been for the instruction-set compatibility between the Cray-1 and the Cray X-MP and the adoption of Unix-family operating systems such as Cray's UNICOS and, today, Linux. For this reason, the highest-performance systems of the future are likely to have a Unix flavor but with incompatible, system-specific features, especially at the very high end where secure data handling is required.

Programming

The parallel architecture of supercomputers requires special programming techniques to exploit their speed. Special-purpose Fortran compilers can usually generate faster code than C or C++ compilers, so Fortran remains the language of choice for scientific programming and hence for most programs that run on supercomputers. To exploit the parallelism of these machines, dedicated programming environments are used: PVM and MPI for loosely coupled, distributed-memory clusters, and OpenMP for tightly coupled shared-memory machines.
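As a hedged sketch of the shared-memory style that OpenMP supports (shown in C rather than Fortran for brevity), the loop below is divided among threads and the reduction clause combines their partial sums:

```c
#include <stdio.h>

#define N 1000000

/* Hedged sketch of shared-memory parallelism with OpenMP (compile with
 * e.g. "gcc -fopenmp sum.c"): the loop iterations are divided among
 * threads, and the reduction clause combines each thread's partial sum
 * into one result. */
int main(void) {
    static double x[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++)
        x[i] = 1.0 / (double)(i + 1);        /* fill with sample data */

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += x[i];

    printf("sum of %d terms: %f\n", N, sum);
    return 0;
}
```

This pattern covers a single shared-memory node; crossing node boundaries is where MPI-style message passing takes over.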

Modern Supercomputer Architecture

As the November 2006 TOP500 list shows, the top ten computers on the list (along with many others) share broadly similar high-level architectures: each is a cluster of MIMD multiprocessors, in which each processor is itself SIMD. Supercomputers differ mainly in the number of multiprocessors per cluster, the number of processors per multiprocessor, and the number of simultaneous operations each SIMD processor can perform. These systems typically combine:

    • A computer cluster: a collection of computers that are highly interconnected via a high-speed network or switching fabric, each running its own instance of an operating system.

    • A multiprocessing computer: a computer with more than one CPU running under a single operating system, where the application software is written to take advantage of the multiple processors. Symmetric multiprocessing (SMP) and non-uniform memory access (NUMA) are the two most common forms.

    • A SIMD processor: a processor that executes the same instruction on more than one set of data at the same time. It may be a general-purpose commodity processor or a special-purpose vector processor, and it may be a high-performance or a low-power part.

Supercomputers

As of November 2006, Moore's law and economies of scale are the dominant factors in supercomputer design. A modern desktop PC is now more powerful than a supercomputer of fifteen years ago, and the design techniques that once let supercomputers outpace desktop machines have found their way into ordinary PCs. Moreover, the cost of chip development makes it uneconomical to design custom chips for a small market; it is cheaper to mass-produce chips with broad enough appeal to cover their development cost. A 2.66 GHz quad-core Xeon workstation outperforms a multimillion-dollar Cray C90 supercomputer of the early 1990s, and most workloads that required such a supercomputer in the 1990s can now be handled by a workstation costing less than $4,000. In addition, many of the problems supercomputers tackle are particularly well suited to parallelization (splitting a large task into smaller pieces worked on simultaneously), and especially to coarse-grained parallelization that limits the amount of data exchanged between independent processing units. For this reason, traditional supercomputers can in many applications be replaced by clusters of standard-design computers programmed to act as one large machine.

Purpose-Built Supercomputers

IBM Blue Gene supercomputer at Argonne National Laboratory

A purpose-built supercomputer is a high-performance computing machine whose hardware architecture is tailored to a single problem. Such machines may use programmed FPGAs or custom VLSI chips, giving up generality in exchange for a better performance-to-cost ratio. They are used for applications such as astrophysical calculation and code breaking. A new purpose-built supercomputer has sometimes outperformed the fastest general-purpose supercomputer of its day in some respects, as the GRAPE-6 did against the Earth Simulator in 2002 on certain problems.

Examples of purpose-built supercomputers:

    • Deep Blue, for playing chess

    • Reconfigurable computing machines or machine parts

    • GRAPE for astrophysics and molecular dynamics

    • Deep Crack, for breaking the DES cipher

The Fastest Supercomputer Today

Calculating Supercomputer Speed

Supercomputer speed is measured in FLOPS (floating-point operations per second), usually combined with an SI prefix: teraFLOPS (TFLOPS, 10^12 FLOPS) or petaFLOPS (PFLOPS, 10^15 FLOPS). The figure is based on the LINPACK benchmark, which solves a dense system of linear equations by LU decomposition of a large matrix; this resembles real workloads but is considerably easier than most real-world problems.
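As a rough, hedged illustration of how such a figure is derived, the sketch below uses the standard ~(2/3)n^3 operation count for LU factorization and made-up values for the matrix size and runtime; the achieved rate is simply operations divided by wall-clock time:

```c
#include <stdio.h>

/* Rough sketch of how a LINPACK-style FLOPS figure is derived: factoring
 * an n-by-n matrix by LU decomposition costs about (2/3)*n^3 floating-point
 * operations, so dividing that count by the measured runtime gives the
 * achieved FLOPS rate.  The inputs below are made-up illustrative values. */
int main(void) {
    double n = 100000.0;        /* matrix dimension (hypothetical)  */
    double seconds = 3600.0;    /* measured solve time (hypothetical) */
    double flops = (2.0 / 3.0) * n * n * n;
    double rate = flops / seconds;

    printf("~%.3g floating-point operations\n", flops);
    printf("~%.3g FLOPS = %.2f teraFLOPS\n", rate, rate / 1e12);
    return 0;
}
```

The TOP500 list reports this measured rate (Rmax) alongside each machine's theoretical peak (Rpeak).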

Top 500 List

Since 1993, the TOP500 project has used LINPACK results to rank the 500 fastest supercomputers in the world. The list does not claim to be flawless, but it is the best available measure of computer speed at any given time.

Current Fastest Supercomputer

Tianhe-2, the successor to Tianhe-1A, is the Chinese giant that broke the speed record with a measured 33.86 petaFLOPS. It uses Intel Xeon processors of the Ivy Bridge generation together with Xeon Phi coprocessors, for a total of 3,120,000 processing cores. The system draws 17,808 kilowatts of power and has a theoretical peak of 54.9 petaFLOPS, so it has considerable headroom with which to defend its position if challenged.
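A short illustrative calculation, using only the figures quoted above, shows how the measured and theoretical numbers relate and what they imply for energy efficiency:

```c
#include <stdio.h>

/* Relating the Tianhe-2 figures quoted above: measured (Rmax) versus
 * theoretical peak (Rpeak) gives the LINPACK efficiency, and measured
 * FLOPS per watt gives a rough energy-efficiency estimate. */
int main(void) {
    double rmax_pflops  = 33.86;     /* measured LINPACK speed */
    double rpeak_pflops = 54.9;      /* theoretical peak speed */
    double power_kw     = 17808.0;   /* reported power draw    */

    double efficiency = rmax_pflops / rpeak_pflops * 100.0;
    double gflops_per_watt = (rmax_pflops * 1e6) / (power_kw * 1e3);

    printf("LINPACK efficiency: %.1f%% of peak\n", efficiency);
    printf("Energy efficiency: about %.2f GFLOPS per watt\n", gflops_per_watt);
    return 0;
}
```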

Quasi-Supercomputing

Some forms of large-scale distributed computing for embarrassingly parallel problems take the clustered supercomputing concept to an extreme. For example, the BOINC platform, which hosts a number of distributed computing projects, recorded an aggregate performance of over 530.7 teraFLOPS across roughly 1,797,000 active computers on March 27, 2007. Its largest project, SETI@home, ran at 276.3 teraFLOPS on about 1,390,000 active computers. Another distributed project, Folding@home, reported 1.3 petaFLOPS of performance in late September 2007, with roughly 1 petaFLOPS contributed by PlayStation 3 clients. GIMPS, the distributed Mersenne prime search, reached about 23 teraFLOPS as of October 2007. Google's search infrastructure, estimated at 126 to 316 teraFLOPS, is possibly the fastest of all.

Research and Development

On September 9, 2006, the US National Nuclear Security Administration (NNSA) selected IBM to design and build the world's first supercomputer based on the Cell Broadband Engine processor, aiming for a sustained performance of one petaFLOPS, or one thousand trillion calculations per second. Another IBM project is Cyclops64, which aims to put a supercomputer on a single chip. In India, Dr. Karmarkar is leading a project to build a one-petaFLOPS supercomputer, and CDAC is leading another project intended to reach one petaFLOPS by 2010. The NSF also has a roughly $200 million project to build a one-petaFLOPS supercomputer; NCSA at the University of Illinois at Urbana-Champaign is carrying it out, with completion estimated by 2011.

Timeline of Supercomputers

Here is a table of the fastest general-purpose supercomputers in the world, with the year each set its record. Entries before 1993 are drawn from various sources; from 1993 onward, the TOP500 list of the world's 500 fastest computers is used.

Year Supercomputer Peak Speed Location
1942 Atanasoff–Berry Computer (ABC) 30 OPS (Operations Per Second) Iowa State College (now University), Ames, Iowa, USA
1943 TRE Heath Robinson 200 OPS Bletchley Park, UK
1944 Flowers Colossus 5 kOPS (kilo Operations Per Second) Post Office Research Station, Dollis Hill, UK
1946 UPenn ENIAC (before 1948+ modifications) 100 kOPS Aberdeen Proving Ground, Maryland, USA
1954 IBM NORC 67 kOPS U.S. Naval Proving Ground, Dahlgren, Virginia, USA
1956 MIT TX-0 83 kOPS Massachusetts Institute of Technology, Lexington, Massachusetts, USA
1958 IBM AN/FSQ-7 400 kOPS 25 U.S. Air Force sites across the continental USA and 1 site in Canada (52 computers)
1960 UNIVAC LARC 250 kFLOPS (kilo Floating Point Operations Per Second) Lawrence Livermore National Laboratory, California, USA
1961 IBM 7030 “Stretch” 1.2 MFLOPS (MegaFLOPS) Los Alamos National Laboratory, New Mexico, USA
1964 CDC 6600 3 MFLOPS Lawrence Livermore National Laboratory, California, USA
1969 CDC 7600 36 MFLOPS Various locations
1974 CDC STAR-100 100 MFLOPS Various locations
1975 Burroughs ILLIAC IV 150 MFLOPS NASA Ames Research Center, California, USA
1976 Cray-1 250 MFLOPS Los Alamos National Laboratory, New Mexico, USA (80+ sold worldwide)
1981 CDC Cyber 205 400 MFLOPS Numerous sites worldwide
1983 Cray X-MP/4 941 MFLOPS Los Alamos National Laboratory; Lawrence Livermore National Laboratory; Battelle; Boeing
1984 M-13 2.4 GFLOPS (GigaFLOPS) Scientific Research Institute of Computer Complexes, Moscow, USSR
1985 Cray-2/8 3.9 GFLOPS Lawrence Livermore National Laboratory, California, USA
1989 ETA10-G/8 10.3 GFLOPS Florida State University, Florida, USA
1990 NEC SX-3/44R 23.2 GFLOPS NEC Fuchu Plant, Fuchu, Japan
1993 Thinking Machines CM-5/1024 65.5 GFLOPS Los Alamos National Laboratory; National Security Agency, USA
  Fujitsu Numerical Wind Tunnel 124.50 GFLOPS National Aerospace Laboratory, Tokyo, Japan
  Intel Paragon XP/S 140 143.40 GFLOPS Sandia National Laboratories, New Mexico, USA
1994 Fujitsu Numerical Wind Tunnel 170.40 GFLOPS National Aerospace Laboratory, Tokyo, Japan
1996 Hitachi SR2201/1024 220.4 GFLOPS University of Tokyo, Japan
  Hitachi/Tsukuba CP-PACS/2048 368.2 GFLOPS Center for Computational Physics, University of Tsukuba, Tsukuba, Ibaraki, Japan
1997 Intel ASCI Red/9152 1.338 TFLOPS (TeraFLOPS) Sandia National Laboratories, New Mexico, USA
1999 Intel ASCI Red/9632 2.3796 TFLOPS Sandia National Laboratories, New Mexico, USA
2000 IBM ASCI White 7.226 TFLOPS Lawrence Livermore National Laboratory, California, USA
2002 NEC Earth Simulator 35.86 TFLOPS Earth Simulator Center, Yokohama, Japan
2004 IBM Blue Gene/L 70.72 TFLOPS U.S. Department of Energy/IBM, USA
2005 IBM Blue Gene/L 136.8 TFLOPS U.S. Department of Energy/U.S. National Nuclear Security Administration, Lawrence Livermore National Laboratory, California, USA
2007 IBM Blue Gene/L 280.6 TFLOPS Various locations