GPU / GPGPU / TPU programming links / Supercomputing at your fingertips

OpenCL is an open, general-purpose GPU computing language and an open standard defined by the Khronos Group. It provides a cross-platform GPGPU platform that also supports data-parallel compute on CPUs, and it is supported on Intel, AMD, Nvidia, and ARM platforms. The Khronos Group is also involved in the development of SYCL, which has implementations such as ComputeCpp and SYCL STL.

The dominant proprietary framework is Nvidia CUDA. Nvidia launched CUDA in 2006 as a software development kit (SDK) and application programming interface (API) that allows algorithms written in the C programming language to execute on GeForce 8 series and later GPUs.

Close to Metal (CTM), later called Stream, is AMD's GPGPU technology for ATI Radeon-based GPUs. The AMD Stream SDK was released under the AMD EULA in December 2007, after the software stack had been rewritten. It provides both high-level and low-level tools for general-purpose access to AMD graphics hardware. Using GPUs to perform computations holds a lot of potential for some applications because GPU microarchitectures differ fundamentally from those of CPUs. GPUs achieve much greater throughput (calculations per second) by executing many programs in parallel and restricting flow control (the ability of one program to execute instructions independently of another). Modern GPUs also have addressable on-die memory and extremely high-performance multi-channel external memory. AMD subsequently switched from CTM to OpenCL.
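The throughput model described above (many instances of one program, each handling its own piece of data with the same control flow) can be sketched in a few lines. This is only a CPU-side emulation in plain Python, not GPU code: the names `saxpy_kernel` and `launch` are hypothetical and belong to no real GPU API, and the "threads" run one after another here, whereas a GPU would run them concurrently.

```python
def saxpy_kernel(thread_id, a, x, y, out):
    """Body of one 'thread': computes exactly one output element."""
    out[thread_id] = a * x[thread_id] + y[thread_id]


def launch(kernel, n_threads, *args):
    """Hypothetical launcher: invokes the kernel once per thread index.

    On a GPU these invocations would execute in parallel; this
    emulation simply loops over the indices sequentially.
    """
    for thread_id in range(n_threads):
        kernel(thread_id, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)

launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0, 48.0]
```

Because every invocation follows the same path and touches only its own index, the work can be spread over as many execution units as the hardware offers, which is where the throughput advantage comes from.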

Programming standards for parallel computing include OpenCL (vendor-independent), OpenACC, and OpenHMPP.

The Xcelerit SDK, created by Xcelerit, is designed to accelerate large existing C++ or C# code-bases on GPUs with minimal effort. It provides a simplified programming model, automates parallelisation, manages devices and memory, and compiles to CUDA binaries. Additionally, multi-core CPUs and other accelerators can be targeted from the same source code.

OpenVIDIA was developed at the University of Toronto between 2003 and 2005, in collaboration with Nvidia.

MATLAB supports GPGPU acceleration using the Parallel Computing Toolbox and MATLAB Distributed Computing Server, and third-party packages like Jacket.

GPGPU processing is also used by physics engines to simulate Newtonian physics. Commercial implementations include Havok Physics, FX and PhysX, both of which are typically used for computer and video games.

C++ Accelerated Massive Parallelism (C++ AMP) is a library that accelerates execution of C++ code by exploiting the data-parallel hardware on GPUs.

Altimesh Hybridizer by Altimesh compiles Common Intermediate Language to CUDA binaries. It supports generics and virtual functions. Debugging and profiling are integrated with Visual Studio and Nsight. It is available as a Visual Studio extension on the Visual Studio Marketplace.

Microsoft introduced the DirectCompute GPU computing API, released with the DirectX 11 API.

Alea GPU by QuantAlea introduces native GPU computing capabilities for the Microsoft .NET languages F# and C#. Alea GPU also provides a simplified GPU programming model based on GPU parallel-for and parallel aggregate using delegates and automatic memory management.
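To make the parallel-for and parallel-aggregate idea concrete, here is a minimal sketch in Python using a CPU thread pool. It is not the Alea GPU API: `parallel_for` and `parallel_aggregate` are hypothetical helpers written for this illustration, the callables passed in play the role of delegates, and the combining operation is assumed to be associative.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def parallel_for(n, body, workers=4):
    """Call body(i) for every i in range(n), spread over a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for completion and re-raises any exception from body.
        list(pool.map(body, range(n)))


def parallel_aggregate(values, op, workers=4):
    """Reduce a non-empty list with an associative op.

    Each worker reduces one chunk to a partial result; the partial
    results are then combined in order.
    """
    if not values:
        raise ValueError("values must not be empty")
    chunk = max(1, (len(values) + workers - 1) // workers)
    chunks = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: reduce(op, c), chunks))
    return reduce(op, partials)


data = list(range(1, 11))
squares = [0] * len(data)


def square_at(i):
    # Each index is written by exactly one call, so no locking is needed.
    squares[i] = data[i] * data[i]


parallel_for(len(data), square_at)
total = parallel_aggregate(squares, lambda a, b: a + b)
print(total)  # 385
```

The two-stage reduction (per-chunk partials, then a final combine) is the same shape a GPU reduction typically takes, which is why the operation has to be associative.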

GPU-Gems Part I http://developer.nvidia.com/content/gpu-gems-part-i-natural-effects

Pharr, M.: GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation. Addison-Wesley, Boston, MA, 2005 or http://http.developer.nvidia.com/GPUGems2/gpugems2_part01.html or http://www.addison-wesley.de

Nguyen, H.: GPU Gems 3: Programming Techniques for High-Performance Graphics and General-Purpose Computation. Addison-Wesley, Boston, MA, 2007 or http://www.addison-wesley.de or http://http.developer.nvidia.com/GPUGems3/gpugems3_part01.html

Some newer PDF articles stored locally:

GPU CLUSTER COMPUTING FOR MULTIGRID-FEM SOLVERS WITH APPLICATIONS IN CFD

GPU Simulation and Rendering of Volumetric Effects for Computer Games and Virtual Environments

Higher order FEM numerical integration on GPUs with OpenCL

Implicit FEM and Fluid Coupling on GPU for Interactive Multiphysics Simulation

Fluid–solid coupling on a cluster of GPU graphics cards for seismic wave propagation

Fast seismic modeling and reverse time migration on a GPU cluster

GPU Cluster Computing For Multigrid FEM-Solvers... (abstract)

Assembly of Finite Element Methods on Graphics Processors

Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers see also:

Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers(2)

Finite Element Multigrid Solvers for PDE Problems on GPUs and GPU Clusters Part 2: Applications on GPU Clusters

Exploring weak scalability for FEM calculations on a GPU-enhanced cluster

Accelerating Double Precision FEM Simulations with GPUs

Analyzing CUDA Workloads Using a Detailed GPU Simulator

Automated Finite Element Computations in the FEniCS Framework using GPUs

GPU Cluster Computing for Finite Element Applications

Finite Element Integration on GPUs

Efficient Implementation of Finite Element Operators on GPUs

Massively Parallel Micromagnetic FEM Calculations with Graphical Processing Units (GPUs)

Making Faster FEM Solvers, Faster

General GPU:

General Purpose Computation On Graphics Processing Units

Texts in German:

GPU-basierte Verfahren zur interaktiven Simulation und Darstellung von Fluid-Effekten

Implementierung von FEM-Methoden auf programmierbaren Grafikkarten

FFT auf der GPU, by Alexander Kubias

A little bit older (SOFA, see below):

Efficient nonlinear FEM for soft tissue modelling and its GPU implementation within the open source framework SOFA

Software:

NVIDIA Parallel Nsight or here

NVIDIA Parallel Nsight brings GPU Computing into Microsoft Visual Studio. Debug, profile and analyze GPGPU or graphics applications using CUDA C, OpenCL, DirectCompute, Direct3D, and OpenGL.

NVIDIA PhysX (2.X)

NVIDIA PhysX SDK 2.X provides game physics solutions for a variety of platforms including PC (in both software and GPU hardware-accelerated configurations), OS X, Linux, all current major game consoles (PS3, Xbox 360, and Wii), and key mobile computing platforms.

NVIDIA CUDA (Compute Unified Device Architecture), Nvidia's GPGPU technology for Nvidia GeForce-, Quadro- and Tesla-based GPUs (NVIDIA CUDA german)

Nvidia CUDA Programming Guide for CUDA Toolkit 3.2

http://developer.download.nvidia.com/compute/DevZone/C/html/featured_samples.html

Nvidia Developer Web Site

Nvidia Development Whitepapers and Presentations

Nvidia developer resources page

NVIDIA GPU Computing Developer Home Page

Nvidia Free GPU Computing Online Seminars

Nvidia GPU Programming Guide or http://developer.download.nvidia.com/GPU_Programming_Guide/GPU_Programming_Guide_G80.pdf

Nvidia Tesla C1060/C2050/M2090 (512 CUDA cores, up to 665 GFLOPS) (Overview, Specifications, Drivers & Downloads, ...) or http://www.nvidia.com/docs/IO/43395/tesla_technical_brief.pdf or http://www.nvidia.com/object/tesla_computing_solutions.html M2090: http://www.nvidia.com/docs/IO/43395/Tesla-M2090-Board-Specification.pdf The Next Generation CUDA Architecture, Code Named Fermi (up to 512 CUDA cores). pdf

Nvidia GTX 590 /580 / 570

https://stackoverflow.com/questions/10460742/how-do-cuda-blocks-warps-threads-map-onto-cuda-cores/10467342#10467342

ATI:

Stream, AMD/ATI's GPGPU technology for ATI Radeon-based GPUs

http://ati.amd.com/developer/index.html

AMD Accelerated Parallel Processing (APP) SDK (formerly ATI Stream)

AMD Accelerated Parallel Processing (APP) SDK OpenCL Programming Guide

AMD HD 6990, FireStream 9270 up to 1.2 TFLOPS (single prec.), AMD 5970 up to 928 GFLOPS in double precision

AMD ATI FirePro V7800 (overview, technical data, ...) or http://www.amd.com/us/products/workstation/graphics/ati-firepro-3d/v7800/pages/v7800.aspx

AMD APP SDK with OpenCL 1.1 Support

Which graphics card to choose: a "best" card does not exist. You have to choose: all, high end, or price performance (G3D Mark / price in $)
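The price-performance figure is simply the benchmark score divided by the price, so a ranking is easy to compute yourself. The sketch below uses made-up card names, scores, and prices purely for illustration; they are not real benchmark results.

```python
def price_performance(g3d_mark, price_usd):
    """Benchmark points per dollar (G3D Mark / price)."""
    return g3d_mark / price_usd


# Hypothetical cards: (name, G3D Mark score, price in US dollars).
cards = [
    ("Card A", 4000, 500.0),
    ("Card B", 3000, 250.0),
    ("Card C", 1500, 100.0),
]

ranked = sorted(
    cards,
    key=lambda card: price_performance(card[1], card[2]),
    reverse=True,
)

for name, score, price in ranked:
    print(f"{name}: {price_performance(score, price):.1f} points per dollar")
```

In this invented data set the fastest card (Card A) comes last by price performance, which is the point of the remark above: the ranking depends on which criterion you pick.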

There are descriptions on the net of how to flash a 465 to a 470 (not for the faint of heart: make a backup first, and not all cards can be flashed to a 470!) German description

Test your GPU: GPU-Z

FurMark 1.9.0

GPU Caps Viewer see also here

A supercomputer at your fingertips?

http://atlasfolding.com/?page_id=148 GPU supercomputer with 30 TFLOPS (German)

Supercomputer with the same performance as a supercomputer cluster consisting of hundreds of PCs

http://www.geek.com/articles/chips/new-fastest-supercomputer-uses-7168-nvidia-gpus-14336-intel-cpus-20101028/ Chinese supercomputer

http://www.dvhardware.net/article27538.html

Microsoft:

DirectCompute Microsoft's GPU Computing API - Initially released with the DirectX 11 API

Microsoft Accelerator

Microsoft DirectX / DirectCompute or http://www.microsoft.com/games/en-en/aboutgfw/pages/directx.aspx or http://www.nvidia.com/object/cuda_directcompute.html or http://www.nvidia.de/object/directcompute_de.html or http://developer.nvidia.com/category/zone/cuda-zone

Microsoft Parallel Computing Developer Center or here: http://msdn.microsoft.com/en-en/concurrency/default

Intel:

Intel OpenCL SDK (Windows 7 32/64) or http://software.intel.com/en-us/articles/intel-opencl-sdk

Intel C/C++ Compiler

Open source:

OpenCL (Open Computing Language) cross platform GPGPU language for GPUs (AMD/ATI/Nvidia) and general purpose CPUs
Apple's GPU utilization introduced in Mac OS X v10.6 ‘Snow Leopard’

Adventures in OpenCL: Part 1, Getting Started

Adventures in OpenCL: Part 1.5, C++ Bindings

Adventures in OpenCL Part 2: Particles with OpenGL

Brown Deer Technology: OpenCL Tutorial: N-Body Simulation.

Nvidia OpenCL

AMD OpenCL

OpenCL Programming Guide

OpenCL Quick Reference Card

OpenCL Specification

OpenCV / GpuCV see also: http://opencv.willowgarage.com/wiki

OpenCV / GpuCV links and downloads here

OpenGL and OpenCL Debugger

Open MPI: Open Source High Performance Computing.

OpenMP.org: OpenMP Application Program Interface. Version 3.0, May 2008. pdf: http://www.openmp.org/mp-documents/spec30.pdf

Sh, a GPGPU library for C++
BrookGPU is the Stanford University Graphics group's compiler and runtime implementation of the Brook stream programming language. See also here.
GLSL Shader Programming Resources


CBC Seminar on GPU Programming and Computing



General-Purpose Computation on Graphics Hardware



GNU Scientific Library (GSL) or http://www.gnu.org/software/gsl/manual/html_node



GPUcomputing.net: Research and development community.




TPU
https://www.sigarch.org/why-the-gpgpu-is-less-efficient-than-the-tpu-for-dnns/   Why the GPGPU is Less Efficient than the TPU for DNNs
https://www.quora.com/How-different-is-a-TPU-from-GPU
https://timdettmers.com/2018/10/17/tpus-vs-gpus-for-transformers-bert/
    


GPU Resources

GPUSort: High Performance Sorting using Graphics Processors or http://gamma.cs.unc.edu/GPUSORT/results.html


Mathematica GPU Computing see also: http://reference.wolfram.com/mathematica/ParallelTools/tutorial/Overview.html or here: http://www.nvidia.de/object/cuda-programming-mathematica-de.html

MATLAB GPU Computing or here http://www.mathworks.de/discovery/matlab-gpu.html or here http://developer.nvidia.com/object/matlab_cuda.html

MIT Open Courseware: Applied Parallel Computing.

MPI standard: The Message Passing Interface Standard. or here http://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf or here http://www-unix.mcs.anl.gov/mpi

Intel Xeon E7, Xeon E5000-series processors
AMD Llano, Bulldozer, FX8000 series, FX6000 series, FX4000 series (announcement)

FEA/FEM packages here

Some more GPU /FEM links:

GPU Floating-Point Paranoia

ATILA GPU simulator source code released - Beyond3D Forum ATILA

GPU simulator source code released 3D Technology & Algorithms gpuprogramming-project3-final - monkology

SOFA download, SOFA documentation, SOFA alternative link, ForceField

Tag: Finite Element Methods :: GPGPU.org

IEEE Xplore - GPU accelerated fast FEM deformation simulation

GPU acceleration of an unmodified parallel finite element Navier-Stokes solver

GPGPU - Wikipedia, the free encyclopedia

GPU accelerated FEM for simulation and segmentation - NAMIC

Graphics Processor Unit (GPU) acceleration of Time-Domain Finite Element Method (TD-FEM) algorithm (IEEE) | GPU Computing

wisc.edu: Simulation-Based Engineering Laboratory
DISCRETE ELEMENT METHODS: DEM5
SIGGPU: http://siggpu.org/
GPU Computing
Automated Finite Element Discretization and Solution of Nonlinear FEM Systems /Magma Dynamics

GMH: A Message Passing Toolkit for GPU Clusters

A MEMORY EFFICIENT AND FAST SPARSE MATRIX VECTOR PRODUCT ON A GPU

Aspect: Advanced Solver for Problems in Earth's ConvecTion (download here, svn here)

FEM Utils:

calc4fem Spreadsheet for Structural Engineering (FEM Analysis for beams, trusses, 2D-frames).

Meshgen Meshgen is designed to interactively generate 2D FEM meshes composed of triangular and quadrilateral elements.
fem_converter Conversion of data elements from one format to another (no files released)
Grid3D is a preprocessing tool for FEAST and its predecessor FEATFLOW. It provides a convenient graphical interface to create geometries and coarse grids, define boundary conditions, etc. Grid3D is implemented purely in JAVA. (Downloads and documentation...)

ALBERTA - An adaptive hierarchical finite element toolbox newer version: http://www.numa.uni-due.de/downloads/alberta/

The ALUGrid Library provides both hexahedral and tetrahedral grids which can be locally adapted and when used for parallel computations the decomposition of the domain can be recomputed.
The PARTY partitioning library provides a variety of partitioning methods in a simple way. Instead of implementing the methods directly, the user can take advantage of the ready-made implementations in the library (on demand).
PreView is a finite element (FE) preprocessor designed specifically to set up FE problems for FEBio.
PostView is a finite element post-processor designed to post-process the results from FEBio.
WinFiber3D is a program that allows you to visualize MicroVisu3D files.
WarpLAB is a finite element (FE) post-processing application specially designed to post-process warping problems.
OpenDX is a full-featured software package for the visualization of scientific, engineering and analytical data. Its open system design is built on standard interface environments, and its data model gives users great flexibility in creating visualizations.

Salome pre- & postprocessor

GMV GMV is no longer available for free and is being commercialized.

Tecplot not free, site licence

VTK The Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, image processing and visualization.

VTKEdge library of advanced visualization and data processing techniques that complement the Visualization Toolkit.

ParaView is an open-source, multi-platform data analysis and visualization application.

PovRay raytracer

Visit VisIt is a free interactive parallel visualization and graphical analysis tool for viewing scientific data on Unix and PC platforms

GeoMesh (131 KB). simple mesh generator
GenMesh (190 KB) more general mesh generator.
Casca mesh generator (no longer available? manual here). The casca program can be used to make a general finite element mesh. This can then be read into Geocrack2D.

Netgen is a multi-platform automatic mesh generation tool written in C++ capable of generating meshes in two and three dimensions. The program is open source

Tetgen Open source code for generating tetrahedral meshes. Volume mesh created from surface meshes.

Gmsh: a three-dimensional finite element mesh generator with built-in pre- and post-processing facilities

LaGriT is a library of user callable tools that provide mesh generation, mesh optimization and dynamic mesh maintenance.

List of mesh generators (public domain and commercial). Another one.

CUBIT (free for government use, otherwise commercial) http://www.csimsoft.com/

OpenCTM (last updated 2010-01-15) OpenCTM is a file format, a software library and a tool set for compression of 3D triangle meshes. The geometry is compressed to a fraction of the size of comparable file formats (3DS, STL, COLLADA...), and the format is accessible through a simple, portable API.

Some converters on the old ASME/Mecheng website may still be useful: README, FTP, short description of files