nsight

The MNIST classification problem is a fundamental machine learning task that involves recognizing handwritten digits (0-9) from a dataset of 70,000 grayscale images (28x28 pixels each). It serves as a benchmark for evaluating machine learning models, particularly neural networks.

benchmarking deep-learning parallel-computing cuda mnist neural-networks high-performance-computing gpu-acceleration profiling shared-memory openacc performance-optimization c-cpp nsight tensor-cores cuda-streams pinned-memory

Updated Sep 12, 2025
Cuda

Kulasus / APPS-2.0

Repository for the Architecture of Computers and Parallel Systems course at VŠB.

radio c leds lcd cpp assembly architecture assembler cuda assembly-language led assembly-x86 k64f mcuxpresso nsight

Updated May 20, 2020
C++

Juanx65 / yolov8test

Learning how to profile a YOLOv8 network using NVIDIA Nsight Compute.

profiling nsight yolov8

Updated Jul 5, 2023
Python

Chirag005 / CUDA-Kernel-project

Custom PyTorch CUDA kernel implementing an optimized ReLU activation with vectorization, performance profiling, and memory analysis on a Tesla T4 GPU, achieving 75% bandwidth efficiency.

gpu cuda pytorch cuda-kernels performance-analysis nsight cuda-programming kernel-profiler cuda-kernel

Updated Oct 27, 2025
Jupyter Notebook

itm-unipi / Parallelized-Nearest-Neighbor-Upscaler

University Project for "Computer Architecture" course (MSc Computer Engineering @ University of Pisa). Implementation of a Parallelized Nearest Neighbor Upscaler using CUDA.

gpu nvidia nvidia-cuda nvidia-gpu nsight image-upscaling parallelized nearest-neighbor-algorithm nsight-compute

Updated Dec 29, 2023
C

yasser1-0 / FP16-vs-FP32-A-GPU-Lab-in-Frames

🎬 Explore GPU training efficiency with FP32 vs FP16 in this modular lab, utilizing Tensor Core acceleration for deep learning insights.

performance-engineering deep-learning reproducible-research cuda pytorch fp16 cupy mixed-precision nsight gpu-benchmark nvtx fp32 tensor-core

Updated Sep 6, 2025
Python

K-Wu / HET_nsight_utils

cuda nvidia trace gspread profiling ncu nsight nsys

Updated Aug 12, 2024
Python

Dartayous / FP16-vs-FP32-A-GPU-Lab-in-Frames

A reproducible GPU benchmarking lab that compares FP16 vs FP32 training on MNIST using PyTorch, CuPy, and Nsight profiling tools. This project blends performance engineering with cinematic storytelling, featuring NVTX-tagged training loops, fused CuPy kernels, and a profiler-driven README that narrates the GPU's inner workings frame by frame.

performance-engineering deep-learning reproducible-research cuda pytorch fp16 cupy mixed-precision nsight gpu-benchmark nvtx fp32 tensor-core

Updated Sep 5, 2025
Python

UchihaIthachi / sssp-apsp-hpc-openmp-cuda

🚀 High-performance implementations and benchmarks of SSSP and APSP algorithms (Bellman–Ford, Dijkstra, Floyd–Warshall, Johnson) in Serial, OpenMP, CUDA, and Hybrid CPU+GPU variants. Includes profiling, speedup plots, and HPC notebooks.

graph-algorithms hpc openmp parallel-computing cuda performance-analysis gpu-computing johnson-algorithm nvcc apsp bellman-ford floyd-warshall-algorithm shortest-path-algorithm sssp nsight dijikstra-algorithm

Updated Oct 17, 2025
Jupyter Notebook

Here are 15 public repositories matching this topic...

BrainTwister / docker-devel-env

sharcnet / vscode-hpc

mnicely / computeWorks_examples

HROlive / Fundamentals-of-Accelerated-Computing-with-CUDA-C-Cpp

kayush2O6 / nsight-for-remote-gpu-server

salehjg / batch-matmul-cuda

Umer-Farooq-CS / MNIST-Classification

Kulasus / APPS-2.0

Juanx65 / yolov8test

Chirag005 / CUDA-Kernel-project

itm-unipi / Parallelized-Nearest-Neighbor-Upscaler

yasser1-0 / FP16-vs-FP32-A-GPU-Lab-in-Frames

K-Wu / HET_nsight_utils

Dartayous / FP16-vs-FP32-A-GPU-Lab-in-Frames

UchihaIthachi / sssp-apsp-hpc-openmp-cuda