## Papers on hgpu.org (.txt-file)

An Implementation of the Discontinuous Galerkin Method on Graphics Processing Units

An Implementation of the Smooth Particle Mesh Ewald Method on GPU Hardware

An implementation of the tile QR factorization for a GPU and multiple CPUs

An implicit multigrid solver for high-order compressible flow simulations on GPUs

An implicit Tensor-Mass solver on the GPU for soft bodies simulation

An Improved CUDA-Based Implementation of Differential Evolution on GPU

An Improved Image Segmentation Algorithm Based on GPU Parallel Computing

An improved implementation of Preconditioned Conjugate Gradient Method on GPU

An Improved MAGMA GEMM For Fermi Graphics Processing Units

An Improved Monte Carlo Ray Tracing for Large-Scale Rendering in Hadoop

An Improved Parallel Algorithm using GPU for Siting Observers on Terrain

An improved parallel contrast-aware halftoning

An Improved Parallel Implementation of 3D DRIE Simulation on GPU

An improved scheme of an interactive finite element model for 3D soft-tissue cutting and deformation

An Improved Study of Physically Based Fluid Simulation on GPU

An improved study of real-time fluid simulation on GPU

An improved visual inspection system using visual servo

An in-depth performance analysis of irregular workloads on VLIW APU

An In-depth Performance Characterization of CPU- and GPU-based DNN Training on Modern Architectures

An Incompressible Navier-Stokes Equations Solver on the GPU Using CUDA

An initial performance review of software components for a heterogeneous computing platform

An innovative compilation tool-chain for embedded multi-core architectures

An instruction-systolic programmable shader architecture for multi-threaded 3D graphics processing

An Integrated Framework for Feature Extraction, Object Recognition and Stereo Vision with GPU support

An integrated GPU power and performance model

An intelligent semi-automatic application porting system for application accelerators

An Interest Point Based Illumination Condition Matching Approach to Photometric Registration Within Augmented Reality Worlds

An Interface for Halo Exchange Pattern

An Intermediate Library for Multi-GPUs Computing Skeletons

An Interrupt-Driven Work-Sharing For-Loop Scheduler

An Introduction to GPU Accelerated Surgical Simulation

An Introduction to High Performance Computing on AWS

An Introduction to the OpenCL Programming Model

An introductory tour of interactive rendering

An Investigation into Concurrent Expectation Propagation

An Investigation of Atomic Synchronization for Sort-Based Group-By Aggregation on GPUs

An investigation of GPU-based stiff chemical kinetics integration methods

An Investigation of the Performance Portability of OpenCL

An Investigation of Unified Memory Access Performance in CUDA

An MDE Approach for Automatic Code Generation from MARTE to OpenCL

An MPI-Based Python Framework for Distributed Training with Keras

An MPI-CUDA Implementation and Optimization for Parallel Sparse Equations and Least Squares (LSQR)

An MPI-CUDA Implementation for Massively Parallel Incompressible Flow Computations on Multi-GPU Clusters

An MPI-CUDA Implementation for the Compression of DEM

An MPI-CUDA implementation of an improved Roe method for two-layer shallow water systems

An N log N Parallel Fast Direct Solver for Kernel Matrices

An octree-based proxy for collision detection in large-scale particle systems

An On-Demand Fast Parallel Pseudo Random Number Generator with Applications

An open framework for rapid prototyping of signal processing applications

An open source finite-difference time-domain solver for room acoustics using graphics processing units

An open source MATLAB program for fast numerical Feynman integral calculations for open quantum system dynamics on GPUs

An Open-Source GPU-Accelerated Feature Extraction Tool

An OpenCL 3D FFT for Molecular Dynamics Simulations on Multiple FPGAs

An OpenCL design of the Bob Jenkins lookup3 hash function using the Xilinx SDAccel Development Environment

An OpenCL Fast Fourier Transformation

An OpenCL framework for heterogeneous multicores with local memory

An OpenCL implementation for the solution of TDSE on GPU and CPU architectures

An OpenCL implementation of a forward sampling algorithm for CP-logic

An OpenCL Method of Parallel Sorting Algorithms for GPU Architecture

An OpenCL Runtime and Scheduler for Embedded Multicore DSP Parallel Systems

An OpenCL-based Implementation of H.264 Encoder

An OpenCL-based Monte Carlo dose calculation engine (oclMC) for coupled photon-electron transport

An OpenCL(TM) Deep Learning Accelerator on Arria 10

An OpenMP Programming Environment on Mobile Devices

An optimal k-exclusion real-time locking protocol motivated by multi-GPU systems

An Optimal Offline Permutation Algorithm on the Hierarchical Memory Machine, with the GPU implementation

An optimised multi-baseline approach for on-line MR-temperature monitoring on commodity graphics hardware

An optimised radial basis function algorithm for fast non-rigid registration of medical images

An Optimization for Fast Generation of Digital Hologram

An optimized algorithm for discrete element system analysis using CUDA

An optimized GPU implementation of a 2D free surface simulation model on unstructured meshes

An Optimized GPU Memory Hierarchy Design for an OpenCL Kernel

An Optimized Large-Scale Hybrid DGEMM Design for CPUs and ATI GPUs

An Optimized Multiple Right-Hand Side Dslash Kernel for Intel Xeon Phi

An Optimized Parallel IDCT on Graphics Processing Units

An optimizing multi-platform source-to-source compiler framework for the NEURON MODeling Language

An Out-of-core GPU Approach for Accelerating Geostatistical Interpolation

An Overview of Miscellaneous Applications of GPU Computing

An Overview of Selected Hybrid and Reconfigurable Architectures

An overview of techniques for predicting the performance of GPU accelerated applications

An Overview on the Latest Nature-Inspired and Metaheuristics-Based Image Registration Algorithms

An Ultra-Fast, Optimized and Massively-Parallelized Curvelet Transform Algorithm on GP-GPUs

An Ultrafast Scalable Many-core Motif Discovery Algorithm for Multiple GPUs

An ultrasonic imaging system based on a new SAFT approach and a GPU beamformer

An unsupervised parallel genetic cluster algorithm for graphics processing units

Analysing Astronomy Algorithms for GPUs and Beyond

Analysing the Performance of GPU Hash Tables for State Space Exploration

Analysis & Design of Efficient Cryptographic Systems

Analysis Acceleration in TMVA for the ATLAS Experiment at CERN using GPU Computing

Analysis and implementation of a BLAST-Like algorithm for MIC architectures

Analysis and Implementation of eSTREAM and SHA-3 Cryptographic Algorithms

Analysis and Modeling of the Timing Behavior of GPU Architectures

Analysis and optimization of power consumption in the iterative solution of sparse linear systems on multi-core and many-core platforms

Analysis and Optimization Techniques for Massively Parallel Processors

Analysis and Parameter Prediction of Compiler Transformation for Graphics Processors

Analysis and performance estimation of the conjugate gradient method on multiple GPUs

Analysis and Review of Sorting Algorithms

Analysis of 3-dimensional electromagnetic fields in dispersive media using CUDA

Titles: 100

open PDFs: 92

packages: 12