Welcome to Deepankar’s corner on the internet. You can enjoy some of my writings below.

Vulkan Core Abstractions [AI Generated]
VkInstance
VkInstance is the very first object you must create. It represents the connection between your application and the Vulkan driver. Creating an instance initializes the Vulkan library and allows you to specify global configurations, such as which validation layers and instance-level extensions you want to enable. Validation layers are crucial for debugging, as they check for API misuse and provide detailed error messages. Extensions are used to enable functionality not included in the core Vulkan specification, such as support for drawing to a window surface, which is essential for any graphical application. Without an instance, you can’t query for physical devices or do anything else in Vulkan.
Learn Rust via MIR
Background
MIR stands for Mid-Level IR and is an intermediate representation that sits between Rust’s HIR and LLVM IR. An excellent source for learning more about MIR is Introducing MIR. I am a systems programmer whose prior systems languages are C and C++. As a systems programmer I am always curious to understand the cost of things and to have some idea of how they are implemented internally. Recently I have been learning Rust and have been looking to bootstrap my understanding of Rust semantics. One technique that worked well for me is to look at the MIR emitted by Rust for small snippets of code and try to understand what is going on. Given how readable and explicit MIR is, I found this approach a much faster way of piercing through syntax and implementation to the underlying semantics. I will share some examples that illustrate this process.
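As a flavour of the technique, here is the kind of small snippet I mean. Ownership of the vector moves into `take`, and the MIR makes explicit what the surface syntax hides: the move at the call site, and the drop of the vector at the end of `take` rather than in `main`.

```rust
// A tiny function whose MIR makes the implicit move and drop visible.
fn take(v: Vec<i32>) -> usize {
    v.len()
}

fn main() {
    let v = vec![1, 2, 3];
    // `v` moves into `take`; in the MIR, the drop of the Vec
    // appears inside `take`, not here.
    let n = take(v);
    println!("{n}"); // prints 3
}
```

You can dump the MIR for a file with `rustc --emit=mir main.rs`, or select "MIR" from the output options on the Rust Playground.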
GPGPU on AMD: Vector addition kernel using libhsa
Motivation
The following questions have been on my mind for some time now: Why has AMD not been able to replicate CUDA? Why is there no GPGPU (general-purpose GPU) programming stack that works across AMD’s entire product portfolio? Why do AMD SDKs typically work with only a few distros and kernel versions? If you squint a bit and think of shaders as general-purpose computation, what explains the disparity between AMD being able to run shader computation on GPUs out of the box on practically any unixen with OSS drivers, while being unable to do the same for CUDA-style compute tasks? After all, the adds and multiplies, and the primitives around loading compiled code onto the GPU, should be the same in either case, right? This series of blog posts will try to answer the above questions. We will start small and keep going until we run into unsolvable blockers. Who knows, we might get all the way to the end :D
Towards Fast IO on Linux using Rust
We will compare several different ways of reading a file using Rust. Apart from "wc -l", we will run each function 10 times using criterion and then take the mean. Code for the following benchmarks lives at Benchmark code for Linux IO using Rust. In the following code BUFFER_SIZE was 8192 and NUM_BUFFERS was 32.

Details about the machine
Framework 16 with a 7840HS and 64 GiB of RAM. Power plugged in and performance mode enabled.
SSD: WD_BLACK SN850X 4000GB. A test using GNOME Disks shows a read speed of 3.6 GB/s (sample size 1000 MB, 100 samples).
Filesystem: btrfs
Uname string: Linux fedora 6.8.8-300.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Apr 27 17:53:31 UTC 2024 x86_64 GNU/Linux

Details about the text file
Uncompressed size: 22G
Number of lines: 200,000,000
Compressed size after btrfs compression (zstd): 5.3G
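To make the setup concrete, here is a sketch of one of the simpler contenders: counting newline bytes through a fixed-size read buffer, avoiding per-line allocation. The BUFFER_SIZE matches the 8192 mentioned above; the file path and sample data are illustrative, not the 22G benchmark file.

```rust
use std::fs::File;
use std::io::{Read, Write};

// Matches the BUFFER_SIZE used in the benchmarks described above.
const BUFFER_SIZE: usize = 8192;

// Count lines by scanning raw chunks for b'\n', with no per-line allocation.
fn count_lines_buffered(path: &str) -> std::io::Result<u64> {
    let mut file = File::open(path)?;
    let mut buf = vec![0u8; BUFFER_SIZE];
    let mut count = 0u64;
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        count += buf[..n].iter().filter(|&&b| b == b'\n').count() as u64;
    }
    Ok(count)
}

fn main() -> std::io::Result<()> {
    // Write a small sample file so the example is self-contained.
    let path = "/tmp/lines_sample.txt";
    let mut f = File::create(path)?;
    for i in 0..1000 {
        writeln!(f, "line {i}")?;
    }
    println!("{}", count_lines_buffered(path)?); // prints 1000
    Ok(())
}
```

The benchmarked variants differ mainly in how bytes reach that buffer: buffered reads, raw reads, mmap, io_uring, and so on.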