GPU programming links / SuperComputing at your fingertips

( What is GPU Computing? )


GPU-Gems Part I


Pharr, M.: GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation. Addison-Wesley, Boston, MA, 2005



Nguyen, H.: GPU Gems 3: Programming Techniques for High-Performance Graphics and General-Purpose Computation. Addison-Wesley, Boston, MA, 2007





Some newer PDF articles stored locally:


GPU Simulation and Rendering of Volumetric Effects for Computer Games and Virtual Environments

Higher order FEM numerical integration on GPUs with OpenCL

Implicit FEM and Fluid Coupling on GPU for Interactive Multiphysics Simulation

Fluid–solid coupling on a cluster of GPU graphics cards for seismic wave propagation

Fast seismic modeling and reverse time migration on a GPU cluster

GPU Cluster Computing For Multigrid FEM-Solvers...  (abstract)

Assembly of Finite Element Methods on Graphics Processors

Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers see also:

Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers(2)

Finite Element Multigrid Solvers for PDE Problems on GPUs and GPU Clusters Part 2: Applications on GPU Clusters

Exploring weak scalability for FEM calculations on a GPU-enhanced cluster

Accelerating Double Precision FEM Simulations with GPUs

Analyzing CUDA Workloads Using a Detailed GPU Simulator

Automated Finite Element Computations in the FEniCS Framework using GPUs

GPU Cluster Computing for Finite Element Applications

Finite Element Integration on GPUs

Efficient Implementation of Finite Element Operators on GPUs

Massively Parallel Micromagnetic FEM Calculations with Graphical Processing Units (GPUs)

Making Faster FEM Solvers, Faster


general gpu:

General Purpose Computation On Graphics Processing Units



Texts in German:

GPU-basierte Verfahren zur interaktiven Simulation und Darstellung von Fluid-Effekten

Implementierung von FEM-Methoden auf programmierbaren Grafikkarten

FFT auf der GPU, by Alexander Kubias

A little bit older (SOFA, see below):

Efficient nonlinear FEM for soft tissue modelling and its GPU implementation within the open source framework SOFA






(Some links point to the German pages of the respective company. If no English page link is given, you have to change the language specifier yourself; sorry for the inconvenience.)


NVIDIA Parallel Nsight or here

NVIDIA Parallel Nsight brings GPU Computing into Microsoft Visual Studio. Debug, profile and analyze GPGPU or graphics applications using CUDA C, OpenCL, DirectCompute, Direct3D, and OpenGL.


NVIDIA PhysX  (2.X)

NVIDIA PhysX SDK 2.X provides game physics solutions for a variety of platforms including PC (in both software and GPU hardware-accelerated configurations), OS X, Linux, all current major game consoles (PS3, Xbox 360, and Wii), and key mobile computing platforms.


NVIDIA CUDA  (Compute Unified Device Architecture), Nvidia's GPGPU technology for Nvidia GeForce-, Quadro- and Tesla-based GPUs (NVIDIA CUDA german)

Nvidia CUDA Programming Guide for CUDA Toolkit 3.2


Nvidia Developer Web Site

Nvidia Development Whitepapers and Presentations

Nvidia developer resources page

NVIDIA GPU Computing Developer Home Page

Nvidia Free GPU Computing Online Seminars

Nvidia GPU Programming Guide

Nvidia Tesla C1060/C2050/M2090 (512 CUDA cores, up to 665 GFLOPS) (Overview, Specifications, Drivers & Downloads, ...)
M2090: The Next Generation CUDA Architecture, Code Named Fermi (up to 512 CUDA cores) (PDF)

Nvidia GTX 590 /580 / 570





Stream, AMD/ATI's GPGPU technology for ATI Radeon-based GPUs

AMD Accelerated Parallel Processing (APP) SDK (formerly ATI Stream)

AMD Accelerated Parallel Processing (APP) SDK OpenCL Programming Guide

AMD HD 6990, FireStream 9270 up to 1.2 TFLOPS (single prec.),  AMD 5970 up to 928 GFLOPS in double precision

AMD ATI FirePro V7800 (Overview, Technical Data, ...)

AMD APP SDK with OpenCL 1.1 Support


Which graphics card to choose? A "best" card does not exist. You have to choose: all, high end, or price/performance (G3D Mark / $Price)

There are descriptions on the net of how to flash a 465 to a 470 (not for the faint of heart: make a backup first, and note that not all cards can be flashed to a 470!) German description

test your gpu: GPU-Z

    FurMark 1.9.0

    GPU Caps Viewer see also here

A SuperComputer at your fingertips?!

GPU supercomputer with 30 TFLOPS (German); original page:

SuperComputer with the same performance as a supercomputer cluster consisting of hundreds of PCs

        Chinese supercomputer (English):




DirectCompute Microsoft's GPU Computing API - Initially released with the DirectX 11 API

Microsoft Accelerator

Microsoft DirectX / DirectCompute

Microsoft Parallel Computing Developer Center






Intel OpenCL SDK (Windows 7, 32/64-bit)

Intel C/C++ Compiler


Open source:

OpenCL (Open Computing Language) cross platform GPGPU language for GPUs (AMD/ATI/Nvidia) and general purpose CPUs
Apple's GPU utilization introduced in Mac OS X v10.6 ‘Snow Leopard’

Adventures in OpenCL: Part 1, Getting Started

Adventures in OpenCL: Part 1.5, C++ Bindings

Adventures in OpenCL Part 2: Particles with OpenGL

Brown Deer Technology: OpenCL Tutorial: N-Body Simulation.

Nvidia  OpenCL


OpenCL Programming Guide

OpenCL Quick Reference Card

OpenCL Specification

OpenCV / GpuCV see also:

OpenCV / GpuCV links and downloads here

OpenGL and OpenCL Debugger

Open MPI: Open Source High Performance Computing.

OpenMP Application Program Interface, Version 3.0, May 2008 (PDF)

Sh, a GPGPU library for C++

BrookGPU is the Stanford University Graphics group's compiler and runtime implementation of the Brook stream programming language. See also here.
GLSL Shader Programming Resources

CBC Seminar on GPU Programming and Computing

General-Purpose Computation on Graphics Hardware

GNU Scientific Library (GSL) or Research and development community.

GPU Resources

GPUSort: High Performance Sorting using Graphics Processors or

Mathematica GPU Computing see also:
or here:

MATLAB GPU Computing or here or here

MIT Open Courseware: Applied Parallel Computing.

MPI standard: The Message Passing Interface Standard. or here or here

Intel Xeon E7, Xeon E5000-series processors
AMD Llano, Bulldozer, FX-8000 series, FX-6000 series, FX-4000 series (announcement)

FEA/FEM packages here

Some more GPU /FEM links:

GPU Floating-Point Paranoia

ATILA GPU simulator source code released - Beyond3D Forum

ATILA GPU simulator source code released - 3D Technology & Algorithms

gpuprogramming-project3-final - monkology


SOFA download, SOFA documentation, SOFA alternative link, ForceField

Tag: Finite Element Methods ::
IEEE Xplore - GPU accelerated fast FEM deformation simulation
GPU acceleration of an unmodified parallel finite element Navier-Stokes solver
GPGPU - Wikipedia, the free encyclopedia
GPU accelerated FEM for simulation and segmentation - NAMIC
Graphics Processor Unit (GPU) acceleration of Time-Domain Finite Element Method (TD-FEM) algorithm (IEEE) | GPU Computing Simulation-Based Engineering Laboratory
GPU Computing
Automated Finite Element Discretization and Solution of Nonlinear FEM Systems /Magma Dynamics

GMH: A Message Passing Toolkit for GPU Clusters

Aspect: Advanced Solver for Problems in Earth's ConvecTion download here, svn here
FEM Utils:

calc4fem Spreadsheet for Structural Engineering (FEM Analysis for beams, trusses, 2D-frames).
Meshgen  Meshgen is designed to interactively generate 2D FEM meshes composed of triangular and quadrilateral elements.

fem_converter Conversion of data elements from one format to another (no files released)

Grid3D is a preprocessing tool for FEAST and its predecessor FEATFLOW. It provides a convenient graphical interface to create geometries and coarse grids, define boundary conditions, etc. Grid3D is implemented purely in JAVA. (Downloads and documentation...)

ALBERTA - An adaptive hierarchical finite element toolbox newer version:

The ALUGrid Library provides both hexahedral and tetrahedral grids which can be locally adapted and when used for parallel computations the decomposition of the domain can be recomputed.
The PARTY partitioning library offers a variety of different partitioning methods in a very simple and easy way. Instead of implementing the methods directly, the user may take advantage of the ready-made methods of the library (on demand).
PreView is a Finite Element (FE) preprocessor that has been designed specifically to set up FE problems for FEBio.
PostView is a finite element post-processor that is designed to post-process the results from FEBio.
WinFiber3D is a program that allows you to visualize MicroVisu3D files.
WarpLAB is a finite element (FE) post-processing application that is specially designed to post-process warping problems.
OpenDX OpenDX is a full-featured software package for the visualization of scientific, engineering and analytical data. Its open system design is built on standard interface environments, and its data model provides users with great flexibility in creating visualizations.

Salome pre- & postprocessor

GMV GMV is no longer available for free and is being commercialized.

Tecplot (not free, site licence)

VTK The Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, image processing and visualization.

VTKEdge library of advanced visualization and data processing techniques that complement the Visualization Toolkit.

ParaView is an open-source, multi-platform data analysis and visualization application.

PovRay raytracer

Visit VisIt is a free interactive parallel visualization and graphical analysis tool for viewing scientific data on Unix and PC platforms

GeoMesh (131 KB). simple mesh generator
GenMesh (190 KB) more general mesh generator.
Casca mesh generator (no longer available? manual here). The Casca program can be used to make a general finite element mesh. This can then be read into Geocrack2D.

Netgen is a multi-platform automatic mesh generation tool written in C++ capable of generating meshes in two and three dimensions. The program is open source

Tetgen Open source code for generating tetrahedral meshes. Volume mesh created from surface meshes.

Gmsh: a three-dimensional finite element mesh generator with built-in pre- and post-processing facilities

LaGriT is a library of user callable tools that provide mesh generation, mesh optimization and dynamic mesh maintenance.

List of mesh generators (public domain and commercial). Another one.

CUBIT (free for governmental use, otherwise commercial)

OpenCTM (last updated 2010-01-15) OpenCTM is a file format, a software library and a tool set for compression of 3D triangle meshes. The geometry is compressed to a fraction of the size of comparable file formats (3DS, STL, COLLADA...), and the format is accessible through a simple, portable API.

Some converters on the old ASME/Mecheng website may still be useful: README, FTP, short description of files

Physics Engines: (mostly open source)

3D Physics Engine  simple physics engine (not updated since 2009)

ASCEND modelling environment

Box2D on Google Code:




Flave Flash-based, OOP verlet physics engine developed in AS3 and Flash

Frank Engine


InertiaEngine InertiaEngine is a 2D rigid body physics engine written in C++, not OS dependent


jinngine A 3-d Constraint-based multibody physics engine written entirely in Java

jME Physics System Java engine



ode4j Ode in Java


Pal (physics abstraction layer) see also: Open Physics Abstraction Layer

Planck Physics Engine Planck is a small physics engine  to simulate rigid bodies and particles.

Rage 3D Game Engine Open source game engine developed in Delphi using OpenGL 2.0

RockSpace Physics Engine  RockSpace is a real-time, three-dimensional Newtonian physics engine written in C++

SOFA (see above)

Tokamak or on SourceForge:



Scythe Physics Editor  


P4D - editor and environment  P4D is a low-polygon 3D editor

Ephydryne (aka Hyperion)  Ephydryne is a plug-in for physics engines that calculates surface deformations in real time.



no engine, but..

Astronomy  Astronomy is an open-source game engine for Windows supporting OpenGL (with modern hardware capabilities support), OpenAL, DirectInput, multithreading, scripting, physics.

General GPU links or code snippets:

Mesh-based Monte Carlo (MMC)
 Collins Brain Atlas FEM Mesh Version 2

Monte Carlo eXtreme (MCX)

GPU-based Interactive Simulation of Liver Resection (using SOFA)
Digimouse is a popular mouse atlas (Dogdas2007). FEM mesh Version 1 was created by Qianqian Fang using iso2mesh (Fang2009) version 1.0 and CGAL (CGAL2009).

What is GPU Computing?

CUDA, Supercomputing for the Masses: Part 1-20

Use your GPU for scientific computing


Using Graphics Processors for High Performance IR Query Processing

GPU-based Fast Analysis of Networks

Computational Intelligence Research Lab Graphics Processor Unit (GPU) Site or here
FLAGON is a library for programming NVIDIA CUDA from Fortran 95

GPU programming concepts

Installing the CUDA SDK

General Purpose GPU Programming

CUDA game of life:

Mandelbulb stereo anaglyph
CFD CUDA paper from the Rodinia benchmark

Radix sort for doubles: Sort doubles with two 32-bit radix sorts using similar tricks. Here are some performance results
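
The idea of sorting doubles with two 32-bit radix sorts can be sketched as follows. This is only a minimal CPU-side illustration in Python, not the code from the linked article: the bit-flip mapping in double_to_key is a commonly used trick and is assumed here, and all function names are hypothetical. Each double is mapped to a 64-bit unsigned key whose integer order matches the numeric order; the keys are then sorted by a stable radix sort on the low 32-bit word, followed by a stable radix sort on the high 32-bit word.

```python
import struct

ALL_ONES = 0xFFFFFFFFFFFFFFFF
SIGN_BIT = 1 << 63


def double_to_key(x):
    """Map a double to a 64-bit unsigned key that sorts in numeric order."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    if bits & SIGN_BIT:
        return bits ^ ALL_ONES  # negative: invert every bit
    return bits | SIGN_BIT      # non-negative: set the sign bit


def key_to_double(key):
    """Inverse of double_to_key."""
    bits = key ^ SIGN_BIT if key & SIGN_BIT else key ^ ALL_ONES
    (x,) = struct.unpack("<d", struct.pack("<Q", bits))
    return x


def radix_sort_32(items, key32):
    """Stable LSD radix sort on a 32-bit key, 8 bits per pass."""
    for shift in (0, 8, 16, 24):
        buckets = [[] for _ in range(256)]
        for item in items:
            buckets[(key32(item) >> shift) & 0xFF].append(item)
        items = [item for bucket in buckets for item in bucket]
    return items


def sort_doubles(values):
    """Sort doubles (no NaNs assumed) with two 32-bit radix sorts."""
    keys = [double_to_key(v) for v in values]
    keys = radix_sort_32(keys, lambda k: k & 0xFFFFFFFF)  # low word first
    keys = radix_sort_32(keys, lambda k: k >> 32)         # then high word
    return [key_to_double(k) for k in keys]
```

Because both passes are stable, ties on the high word keep the order established by the low-word pass, so the result is fully sorted. With this mapping -0.0 sorts before 0.0, and NaNs are not handled.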

Performance of 3D Deconvolution Algorithms on Multi-Core and Many-Core Architectures

Swarm-NG focuses on the integration of an ensemble of N-body systems evolving under Newtonian gravity.
8 CPUs vs. GPU: the code can be found here: Clarity Deconvolution Library 1.0, manual here:

Fuzzy Logic on the GPU in CUDA

C# Backpropagation library written for GPU
Slideshow for ATI GPGPU physics demonstration by Stanford grad student Mike Houston. See p. 13 for an overview of the mapping of conventional program tasks to GPU hardware.
Tech Report article: "ATI stakes claims on physics, GPGPU ground" by Scott Wasson
GPU economics or here:

Monte Carlo on GPUs or here:

Using 3D arrays in CUDA, with explanation
New Open Standard for Many-Core GPUs
SIGGRAPH 2005 GPGPU Course Notes
IEEE VIS 2005 GPGPU Course Notes
Jacket: GPU Engine for MATLAB
Ascalaph Liquid GPU see also molecular dynamics. GPGPU Publications, Videos and Software
GP-You Project
GPU accelerated Monte Carlo simulation of the 2D and 3D Ising model - porting a standard model to GPU hardware
GPGPU software catalog
GPGPU Computing @ Duke Statistical Science
Brahma - open-source library written for the .NET 3.5 framework (in C# 3.0). Focus on GPGPU.
Penumbra - open-source library for Clojure. Penumbra is a Clojure wrapper for LWJGL that includes s-expression representation of GLSL and GPGPU.
OpenCL Studio Integrated development environment for OpenCL.
Monte Carlo of diffuse light propagation (photon migration) CUDA-based codes for Monte Carlo simulation of light transport
GPGPU Programming in F# using the Microsoft Research Accelerator system.
ViennaCL scientific computing library compatible with uBLAS for GPUs and multi-core CPUs written in C++ and based on OpenCL.
GPGPU Image Post-Processing GPU accelerated examples of Paint.NET's blur effects with performance comparison.
VizExperts provide HPC solutions and training.
Intro to GPGPU featuring CUDA and OpenCL
GPGPU Review, European Physical Journal Special Topics 194, 87-119 (2011)
CUDAfy.NET Open source library for the .NET framework for programming CUDA GPUs. Supports device code in native .NET; and CURAND, CUBLAS and CUFFT.
Code that permutes indexes (or here, PDF; see also this page). This can be used for: Bandwidth intensive 3-D FFT kernel for GPUs using CUDA
Introduction to GPU Programming with GLSL: SIBGRAPI 2009, Ricardo Marroquim and André Maximo (motivation, architecture, language, examples, wrap-up). Tutorial project:

Introduction to GPU Programming with GLSL, Ricardo Marroquim, Istituto di Scienza e Tecnologie dell'Informazione, CNR, Pisa, Italy, Andr ...

GPU Programming and GLSL: 15-466 Computer Game Programming, Carnegie Mellon University, Spring 2007 (James Kuffner). Announcements: Lab2 posted at: …

GPU Programming using GLSL and VTK: vizNET Conference 2009. Graphics shaders: procedural graphics shaders have been around since the early days of computing. Developed by scientists ...

Release Notes for NVIDIA OpenGL Shading Language Support, November 9, 2006. These release notes explain the implementation status of the OpenGL Shading Language (GLSL) ...

GPU Christmas Tree Rendering: January 2007, Beta Release. This is the beta version of the Christmas tree rendering whitepaper. A final version will be released in a later SDK ...

AMD - Introduction to OpenGL 3.0. Introduction: OpenGL continues to evolve, growing alongside the hardware that supports it. With the release of the latest version of OpenGL ...

SiftGPU Manual, Changchang Wu, University of North Carolina at Chapel Hill. Introduction: SiftGPU is a GPU implementation of David Lowe's Scale Invariant Feature ...

The OpenGL Shading Language: Introduction. This document specifies only version 1.30 of the OpenGL Shading Language. It requires __VERSION__ to substitute 130, and requires #version to …

Speed-up of Algorithms With Graphics Processing Units (GPU): Part I of IV, Derek Anderson and Robert Luke, Electrical and Computer Engineering Department

Intro to OpenGL Shading Language (GLSL): Why should we care? We can do lots of really cool stuff in real-time, without overworking the CPU. Some Examples

Step-Through Debugging of GLSL Shaders, Mark Hilgart, School of Computer Science, DePaul University, Chicago, USA

Intro to GLSL (OpenGL Shading Language): Worcester Polytechnic Institute. Q: What is a Programmable GPU & Why do we need it?

RTSL: a Ray Tracing Shading Language, Steven G. Parker, Solomon Boulos, James Bigler, Austin Robison, SCI Institute, University of Utah, School of Computing

PyStream: Python Shaders on the GPU: PyStream vs. GLSL. class CompiledAmbientPass(pystreamruntime.BaseCompiledShader): def _bindUniforms(self, shader):

Ray Tracing on GPU: University of Applied Sciences Basel (FHBB), Diploma Thesis DA070405
OpenGL "Hello, world!" by Ian Romanick. This work is licensed under the Creative Commons Attribution Non-commercial Share Alike (by-nc-sa)

GLC_lib GLC_lib is a C++ library for high-performance 3D applications based on OpenGL and the Qt 4 GUI. Some GLC_lib features: supported file formats (3DS, OBJ, COLLADA, 3DXML, OFF, STL), easy view manipulation, level of detail, shaders.
Portuguese GPGPU


GPU Computing Gems Emerald Edition (Applications of GPU Computing Series) by Wen-mei W. Hwu (hardcover)

CUDA by Example: An Introduction to General-Purpose GPU Programming by Jason Sanders (paperback)

Programming Massively Parallel Processors: A Hands-on Approach (Applications of GPU Computing Series) by David B. Kirk (paperback)

GPU Pro 2

The Art of Multiprocessor Programming

Scientific Computing with Multicore and Accelerators



BLAS/PBLAS: (Parallel) Basic Linear Algebra Subprograms: or here

CUBLAS: CUDA/GPU accelerated BLAS:

LAPACK/PLAPACK: (Parallel) Linear Algebra Package: or here

CULA: CUDA/GPU accelerated LAPACK:

MAGMA: Matrix Algebra on GPU and Multicore Architectures:

CUDPP: CUDA Data Parallel Primitives Library:


- Open source scientific resources

GSL - GNU Scientific Library
- Boost (hardly needs mentioning; uBLAS...)

- Mathematics Libraries  Tools/libraries: analytical and numerical mathematical methods in ODE, PDE, Vector Calculus, Linear Algebra, Probability and Statistics, Numerical Methods, FEM, DSP. The current file release libham1.0 is for Geometric Integration of Hamiltonian Systems

General Programming

Wikipedia (select your preferred language):







FEA/FEM packages: Wikipedia article

3d Converter tools:



many more converters to come....


3d Modeller


(a little bit older..)



GLC-Player GLC_Player is an open-source OpenGL 3D viewer used to view 3D models (COLLADA, 3DXML, OBJ, 3DS, STL, OFF, COFF formats) and to navigate easily in these models.

The Open Source STL viewer (no update since 2004) Viewstl is an open source way to view Stereo Lithography files as shaded on-screen images. ASCII STL files and dynamic rotation, scaling, and panning are currently supported. Written in C using OpenGL, GLU and GLUT.

STL Viewer (last updated 2010-01-23) Display and manipulate the content of stereolithography or STL files.



particle engines
GPS software: (CodeProject gets slower and slower, be prepared to wait...)

Add GPS support to your desktop

Map Grabber on C#

Writing Your Own GPS Applications  part1 part3: Writing Mapping and GIS Software In .NET

Native DLL for GPS communication

A Simple Geo Fencing Using Polygon Method

GPS - Deriving British Ordnance Survey Grid Reference from NMEA data
Arbitrary precision math
C# has had BigInteger since version 4


Comparison of different arbitrary precision implementations: (strangely, some have timing information, some do not)


Some more Euler/Runge-Kutta/midpoint etc. solvers: euler

Universal-Framework-for-Science-and-Engineering     alternative address:

Astrophysics source code lib:




And, quite naturally, for years the de facto standard for everything concerning Fourier analysis:  (a complete page on wavelets will follow soon...)


NBody Links: This list aims to cover almost all resources on the net concerning n-body simulations (with emphasis on GPU). At the moment it is only a start...


last update: Aug. 2011
soon: many more physics engines, editors, modellers, converters, particle engines/packages, GPS data readers/converters, utilities ...