This repository contains CPU (C) and GPU (CUDA) implementations of vector addition, to demonstrate and compare their performance characteristics. It contains the following files:
- `cpu_vector_add.c`: CPU implementation of vector addition
- `vector_add.cu`: GPU (CUDA) implementation of vector addition
- `toy.cu`: A toy CUDA program for testing and learning purposes
- `Makefile`: Compilation instructions for all programs
- GCC compiler for CPU code
- NVIDIA CUDA Toolkit for GPU code
- Make utility
Use the provided Makefile to compile the programs (a minimal sketch of such a Makefile is shown after this list):
- To compile all programs: `make` or `make all`
- To compile only the CPU version: `make cpu`
- To compile only the GPU version: `make gpu`
- To compile only the toy CUDA program: `make toy`
- To clean up compiled executables: `make clean`
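The repository's actual Makefile is not reproduced in this README. The following is only an assumed sketch of what a Makefile with these targets could look like, using the source file names listed above, the executable names used in the run commands below, and the `-O3` flag mentioned at the end of this document:

```makefile
# Hypothetical sketch of a Makefile providing the targets described above.
# Recipe lines must be indented with tabs.
CC      = gcc
NVCC    = nvcc
CFLAGS  = -O3
NVFLAGS = -O3

all: cpu gpu toy

cpu: cpu_vector_add.c
	$(CC) $(CFLAGS) -o cpu cpu_vector_add.c

gpu: vector_add.cu
	$(NVCC) $(NVFLAGS) -o gpu vector_add.cu

toy: toy.cu
	$(NVCC) $(NVFLAGS) -o toy toy.cu

clean:
	rm -f cpu gpu toy

.PHONY: all clean
```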
After compilation, you can run the programs as follows:
- CPU version: `./cpu`
- GPU version: `./gpu`
- Toy CUDA program: `./toy`
`cpu_vector_add.c` performs vector addition on the CPU. It includes timing measurements for the following stages (a minimal sketch of this structure follows the list):
- Memory allocation
- Array initialization
- Vector addition operation
- Total execution time
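For orientation, here is a minimal sketch of that structure. This is not the repository's actual source: the vector length `N`, the `seconds()` helper, and all variable names are assumptions.

```c
// Illustrative sketch only: a CPU vector add with per-stage timing.
// N, the timing helper, and all names are assumptions, not the repo's code.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)  /* assumed vector length */

static double seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    double t_start = seconds();

    /* Memory allocation */
    double t0 = seconds();
    float *a = malloc(N * sizeof(float));
    float *b = malloc(N * sizeof(float));
    float *c = malloc(N * sizeof(float));
    double t_alloc = seconds() - t0;

    /* Array initialization */
    t0 = seconds();
    for (int i = 0; i < N; i++) { a[i] = 1.0f; b[i] = 2.0f; }
    double t_init = seconds() - t0;

    /* Vector addition operation */
    t0 = seconds();
    for (int i = 0; i < N; i++) c[i] = a[i] + b[i];
    double t_add = seconds() - t0;

    /* Total execution time */
    double t_total = seconds() - t_start;
    printf("alloc %.6f s, init %.6f s, add %.6f s, total %.6f s\n",
           t_alloc, t_init, t_add, t_total);

    free(a); free(b); free(c);
    return 0;
}
```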
`vector_add.cu` performs vector addition on the GPU. It includes timing measurements for the following stages (a minimal sketch of this structure follows the list):
- Memory allocation (on GPU)
- Data transfer (Host to Device and Device to Host)
- Kernel execution (actual vector addition)
- Total execution time
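Again as a sketch rather than the repository's actual code, the structure described above could look roughly like this, using CUDA events for timing; `N`, the block size, and all names are assumptions.

```cuda
// Illustrative sketch only: a CUDA vector add with per-stage timing via CUDA events.
// N, the block size, and all names are assumptions, not the repo's code.
#include <cstdio>
#include <cuda_runtime.h>

#define N (1 << 24)  // assumed vector length

__global__ void vectorAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

static float elapsedMs(cudaEvent_t start, cudaEvent_t stop) {
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    return ms;
}

int main(void) {
    size_t bytes = N * sizeof(float);
    float *h_a = (float *)malloc(bytes), *h_b = (float *)malloc(bytes), *h_c = (float *)malloc(bytes);
    for (int i = 0; i < N; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    cudaEvent_t t0, t1, t2, t3, t4;
    cudaEventCreate(&t0); cudaEventCreate(&t1); cudaEventCreate(&t2);
    cudaEventCreate(&t3); cudaEventCreate(&t4);

    cudaEventRecord(t0);
    // Memory allocation on the GPU
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaEventRecord(t1);

    // Host-to-device transfer
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(t2);

    // Kernel execution (the actual vector addition)
    int threads = 256;
    int blocks = (N + threads - 1) / threads;
    vectorAdd<<<blocks, threads>>>(d_a, d_b, d_c, N);
    cudaEventRecord(t3);

    // Device-to-host transfer
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    cudaEventRecord(t4);
    cudaEventSynchronize(t4);

    printf("alloc %.3f ms, H2D %.3f ms, kernel %.3f ms, D2H %.3f ms, total %.3f ms\n",
           elapsedMs(t0, t1), elapsedMs(t1, t2), elapsedMs(t2, t3),
           elapsedMs(t3, t4), elapsedMs(t0, t4));

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

CUDA events are recorded in stream order on the GPU, so the final `cudaEventSynchronize` is needed before reading the elapsed times on the host.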
`toy.cu` is a simple CUDA program for learning and testing purposes. It may demonstrate basic CUDA concepts or serve as a template for further CUDA development.
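The contents of `toy.cu` are not described further here. Purely as an illustration of the kind of basic CUDA concept such a toy program might cover, a minimal kernel launch could look like this (entirely hypothetical, not the repository's code):

```cuda
// Hypothetical minimal toy kernel: each thread prints its global index.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void hello(void) {
    printf("Hello from thread %d\n", blockIdx.x * blockDim.x + threadIdx.x);
}

int main(void) {
    hello<<<2, 4>>>();            // launch 2 blocks of 4 threads
    cudaDeviceSynchronize();      // wait for the kernel and flush device printf output
    return 0;
}
```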
Run both the CPU and GPU versions and compare their execution times. Note that:
- The GPU version includes data transfer overhead, which may impact performance for smaller datasets.
- The GPU version is expected to perform better for larger datasets or more complex operations.
- Both CPU and GPU versions are compiled with the `-O3` optimization flag.