<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Dev Blogs]]></title><description><![CDATA[Dev Blogs]]></description><link>https://blog.aaronxco.de</link><image><url>https://cdn.hashnode.com/uploads/logos/6998f011a20b74e093d808bc/9b29dd52-c0bc-400f-b247-bee47c9b8ffe.png</url><title>Dev Blogs</title><link>https://blog.aaronxco.de</link></image><generator>RSS for Node</generator><lastBuildDate>Wed, 27 May 2026 16:58:06 GMT</lastBuildDate><atom:link href="https://blog.aaronxco.de/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[A tryst with Claude Code to get whisper.cpp to run on an iGPU]]></title><description><![CDATA[I have a lot of meeting recordings that need transcribing. Cloud services work, but sending audio to a third party felt unnecessary when my laptop has a perfectly capable Intel Iris Xe GPU sitting mos]]></description><link>https://blog.aaronxco.de/a-tryst-with-claude-code-to-get-whisper-cpp-to-run-on-an-igpu</link><guid isPermaLink="true">https://blog.aaronxco.de/a-tryst-with-claude-code-to-get-whisper-cpp-to-run-on-an-igpu</guid><dc:creator><![CDATA[Aaron]]></dc:creator><pubDate>Wed, 27 May 2026 12:10:07 GMT</pubDate><content:encoded><![CDATA[<p>I have a lot of meeting recordings that need transcribing. Cloud services work, but sending audio to a third party felt unnecessary when my laptop has a perfectly capable Intel Iris Xe GPU sitting mostly idle. So I built a local transcription pipeline using whisper.cpp with OpenVINO acceleration — and wrapped it in a small Flask web app so I don't have to touch the terminal every time.</p>
<p>This is a writeup of how it works, what broke along the way, and a few implementation decisions worth documenting.</p>
<hr />
<h2>Why whisper.cpp + OpenVINO</h2>
<p><a href="https://github.com/ggerganov/whisper.cpp">whisper.cpp</a> is a C/C++ port of OpenAI's Whisper model. It's fast, runs entirely offline, and supports OpenVINO as a backend, which lets it run Whisper's encoder on Intel's iGPU; the device is picked with the <code>-oved</code> flag. On a 7-minute recording, transcription took around 7 minutes on CPU and under 3 on the iGPU. Not earth-shattering, but meaningful for batch work.</p>
<p>The catch is that getting OpenVINO to actually see the GPU takes a bit of setup.</p>
<hr />
<h2>Getting OpenVINO to see the GPU</h2>
<p>Three things needed fixing before <code>ov.Core().available_devices</code> would report anything beyond CPU:</p>
<p><strong>1. Level Zero registry.</strong> The Intel GPU runtime depends on Level Zero being properly registered with the system. If it's missing or misconfigured, OpenVINO silently falls back to CPU with no error.</p>
<p><strong>2. <code>LD_LIBRARY_PATH</code> must be a conda env variable, not a shell variable.</strong> This one is subtle. Setting it in <code>.bashrc</code> or before running a command doesn't carry through <code>conda run</code>. It has to live inside the environment itself:</p>
<pre><code class="language-bash">conda env config vars set LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu -n openvino
</code></pre>
<p><strong>3. <code>libstdcxx-ng</code> upgrade.</strong> The default version bundled with conda was too old for the OpenVINO shared libraries. Upgrading it inside the env fixed the import errors.</p>
<hr />
<h2>The Transcription Script</h2>
<p>With the environment working, <code>transcribe.sh</code> handles the full pipeline: detect available devices, extract audio from whatever file you hand it, and run whisper.</p>
<p>Device detection runs at startup and falls back to CPU if the GPU isn't available:</p>
<pre><code class="language-bash">AVAILABLE_DEVICES=$(conda run -n openvino python -c \
    "import openvino as ov; print(','.join(ov.Core().available_devices))" \
    2&gt;/dev/null | grep -v WARNING | grep -v overwriting | grep -E "^[A-Z]" || true)

if echo "$AVAILABLE_DEVICES" | grep -q "GPU"; then
    OVED_FLAG="-oved GPU"
else
    OVED_FLAG="-oved CPU"
fi
</code></pre>
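<p>The Flask backend needs the same decision in Python. A minimal sketch, assuming hypothetical helper names (<code>parse_devices</code>, <code>oved_device</code>, <code>detect_oved_device</code>) that aren't from the repo: it mirrors the grep filters above and defaults to CPU whenever nothing usable comes back, so an absent or unreported GPU just means CPU inference.</p>
<pre><code class="language-python">import subprocess

def parse_devices(raw):
    # Mirror the grep chain: skip WARNING/"overwriting" noise and keep only
    # lines that start with an uppercase letter, e.g. "CPU,GPU".
    devices = []
    for line in raw.splitlines():
        line = line.strip()
        if not line or not line[0].isupper():
            continue
        if "WARNING" in line or "overwriting" in line:
            continue
        devices.extend(d.strip() for d in line.split(",") if d.strip())
    return devices

def oved_device(raw):
    # Prefer the iGPU (matches "GPU" or "GPU.0"); otherwise fall back to CPU.
    devices = parse_devices(raw)
    return "GPU" if any("GPU" in d for d in devices) else "CPU"

def detect_oved_device(conda="conda", env="openvino"):
    # Ask OpenVINO which devices it sees; stderr is discarded, as in the shell version.
    probe = "import openvino as ov; print(','.join(ov.Core().available_devices))"
    result = subprocess.run(
        [conda, "run", "-n", env, "python", "-c", probe],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    return oved_device(result.stdout)
</code></pre>
<p>Returning a bare device name keeps it easy to splice into an argument list as two entries: <code>"-oved"</code> and the device.</p>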
<p>Audio extraction via ffmpeg handles both audio and video input: <code>-vn</code> drops the video track, and <code>-ar 16000 -ac 1 -c:a pcm_s16le</code> converts the audio to the 16 kHz mono, 16-bit PCM WAV that whisper.cpp expects:</p>
<pre><code class="language-bash">ffmpeg -y -i "$INPUT_FILE" -vn -ar 16000 -ac 1 -c:a pcm_s16le "$WAV_FILE"
</code></pre>
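<p>When a transcription comes out as garbage, it can help to confirm that ffmpeg actually produced the format above. A minimal sketch using Python's standard-library <code>wave</code> module; <code>check_whisper_wav</code> is a hypothetical helper, and it only inspects the header fields the ffmpeg flags control. It also returns the audio duration in seconds, which is one way to get the total needed for a progress percentage.</p>
<pre><code class="language-python">import wave

def check_whisper_wav(path):
    # Read the WAV header and list anything that doesn't match
    # 16 kHz, mono, 16-bit PCM (what -ar 16000 -ac 1 -c:a pcm_s16le produces).
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != 16000:
            problems.append(f"sample rate is {wf.getframerate()} Hz, expected 16000")
        if wf.getnchannels() != 1:
            problems.append(f"{wf.getnchannels()} channels, expected 1")
        if wf.getsampwidth() != 2:
            problems.append(f"{8 * wf.getsampwidth()}-bit samples, expected 16-bit")
        duration = wf.getnframes() / wf.getframerate()
    return problems, duration
</code></pre>
<p>An empty problem list means the file matches what the flags asked for.</p>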
<p>Then whisper runs through conda:</p>
<pre><code class="language-bash">conda run --no-capture-output -n openvino "$WHISPER_PATH" \
    -f "$WAV_FILE" \
    -m "$MODEL_PATH" \
    -l "$LANGUAGE" \
    -t 12 \
    $OVED_FLAG \
    -otxt \
    -of "${OUTPUT_DIR}/${BASENAME}"
</code></pre>
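<p>From Python, the same invocation is easier to build as an argument list and pass to <code>subprocess.Popen</code> without a shell, which avoids quoting problems with file names that contain spaces. A sketch under assumptions: <code>build_whisper_cmd</code> and its parameters are hypothetical, and the flags are exactly the ones in the script. Because <code>$OVED_FLAG</code> is left unquoted in the script, the shell splits it into two arguments; in a list those have to be two separate entries.</p>
<pre><code class="language-python">import os

def build_whisper_cmd(conda, whisper_path, wav_file, model_path, language,
                      output_dir, basename, device="CPU", threads=12):
    # Mirrors the conda run call in transcribe.sh, one list item per argument.
    return [
        conda, "run", "--no-capture-output", "-n", "openvino", whisper_path,
        "-f", wav_file,
        "-m", model_path,
        "-l", language,
        "-t", str(threads),
        "-oved", device,  # the unquoted $OVED_FLAG expands to these two words
        "-otxt",
        # -of takes a path without extension; -otxt makes whisper-cli add ".txt".
        "-of", os.path.join(output_dir, basename),
    ]
</code></pre>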
<h3>The <code>set -e</code> trap</h3>
<p>One early gotcha: the script uses <code>set -e</code>, which exits immediately on any non-zero return code. The <code>grep -E '^[A-Z]'</code> at the end of the device detection pipeline returns exit code 1 when there are no matches — which happens if conda produces no output at all. That silently killed the whole script before whisper ever ran.</p>
<p>The fix was being explicit about redirecting conda's noisy stderr and accepting that the grep might match nothing without treating it as a fatal error.</p>
<hr />
<h2>The Flask Web App</h2>
<p>Running <code>bash transcribe.sh recording.m4a</code> from the terminal works fine, but it gets tedious when you're doing it repeatedly. I wanted a browser interface: drop a file, pick a language and model, watch the output stream in, cancel if needed.</p>
<p>The app has five routes:</p>
<ul>
<li><p><code>GET /</code> — renders the UI with available models and local files pre-populated</p>
</li>
<li><p><code>POST /api/upload</code> — accepts files up to 2 GB</p>
</li>
<li><p><code>POST /api/transcribe</code> — spawns a background thread and returns a <code>job_id</code></p>
</li>
<li><p><code>GET /api/jobs/&lt;id&gt;/stream</code> — SSE endpoint that streams log output line by line</p>
</li>
<li><p><code>POST /api/jobs/&lt;id&gt;/cancel</code> — kills the job</p>
</li>
</ul>
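<p>The transcribe, stream, and cancel routes all revolve around a job that outlives the request that started it. A minimal, standard-library-only sketch of that bookkeeping, with hypothetical names (<code>JOBS</code>, <code>start_job</code>, <code>sse_events</code>) rather than the repo's actual code: a background thread reads the subprocess output into a queue, and a generator turns each line into a Server-Sent Events frame. In Flask the generator would typically be returned as <code>Response(sse_events(job_id), mimetype="text/event-stream")</code>.</p>
<pre><code class="language-python">import queue
import subprocess
import threading
import uuid

# job_id -> {"proc": Popen or None, "lines": Queue}; lives only in memory.
JOBS = {}
_DONE = object()  # sentinel pushed once the process has exited

def _pump(job_id, cmd):
    job = JOBS[job_id]
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        start_new_session=True,
    )
    job["proc"] = proc
    # Forward each line of combined stdout/stderr as soon as it appears.
    for line in proc.stdout:
        job["lines"].put(line.rstrip("\n"))
    proc.wait()
    job["lines"].put(_DONE)

def start_job(cmd):
    # Spawn the worker thread and hand back an id, like POST /api/transcribe.
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"proc": None, "lines": queue.Queue()}
    threading.Thread(target=_pump, args=(job_id, cmd), daemon=True).start()
    return job_id

def sse_events(job_id):
    # Each SSE frame is "data: ..." followed by a blank line.
    q = JOBS[job_id]["lines"]
    while True:
        item = q.get()
        if item is _DONE:
            yield "event: done\ndata: finished\n\n"
            return
        yield f"data: {item}\n\n"
</code></pre>
<p>An in-memory dict is enough for a single-user dev server; jobs simply disappear when the process restarts.</p>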
<h3>Finding conda from inside Flask</h3>
<p>The first runtime problem: the Flask process didn't see the same environment as my interactive shell, so <code>conda</code> wasn't on its PATH. Hardcoding <code>/home/aaron/anaconda3/bin/conda</code> would work on my machine but nowhere else. Instead, a small helper checks PATH first, then searches common install locations, and also respects a <code>CONDA_EXE</code> environment variable override:</p>
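<pre><code class="language-python">import os
import shutil

# Common install locations, checked in order if conda isn't on PATH.
_CONDA_SEARCH_PATHS = [
    os.path.expanduser("~/anaconda3/bin/conda"),
    os.path.expanduser("~/miniconda3/bin/conda"),
    "/opt/conda/bin/conda",
    "/usr/local/anaconda3/bin/conda",
]

def find_conda() -&gt; str:
    # Prefer whatever conda is already on PATH.
    found = shutil.which("conda")
    if found:
        return found
    # Otherwise fall back to the well-known install locations.
    for p in _CONDA_SEARCH_PATHS:
        if os.path.isfile(p):
            return p
    raise RuntimeError("conda executable not found. Set CONDA_EXE to override.")

# An explicit CONDA_EXE environment variable always wins.
CONDA = os.environ.get("CONDA_EXE") or find_conda()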
</code></pre>
<h3>Real progress from Whisper's output</h3>
<p>Whisper prints a timestamp range for each segment as it works through the audio: <code>[00:00:01.000 --&gt; 00:00:04.000] Some transcribed text</code>. The backend parses these timestamps and divides by the total audio duration to compute a real completion percentage, which gets pushed to the client over SSE. So the progress bar actually moves in proportion to how much has been transcribed, rather than pulsing indefinitely.</p>
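<p>A minimal sketch of that calculation. It assumes the end timestamp of each segment is the one compared against the total (the post only says the timestamps are divided by the duration), that the total duration in seconds is already known, for example from the WAV header, and that <code>progress_from_line</code> is a hypothetical name. It only looks inside the leading brackets, so timestamps that happen to appear in the transcribed text are ignored.</p>
<pre><code class="language-python">import re

# whisper.cpp formats timestamps as hh:mm:ss.mmm, e.g.
# "[00:00:01.000 --&gt; 00:00:04.000]  Some transcribed text"
TIMESTAMP_RE = re.compile(r"(\d+):(\d{2}):(\d{2})\.(\d{3})")

def to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def progress_from_line(line, total_seconds):
    # Non-segment lines (model loading, timings) carry no progress information.
    if not line.startswith("[") or not total_seconds:
        return None
    stamps = TIMESTAMP_RE.findall(line[1:].split("]", 1)[0])
    if len(stamps) != 2:
        return None
    end = to_seconds(*stamps[1])
    # Clamp so a segment ending past the measured duration can't exceed 100%.
    return min(100.0, 100.0 * end / total_seconds)
</code></pre>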
<h3>Model browser</h3>
<p>Downloading models normally means running <code>models/download-ggml-model.sh tiny</code> in the terminal. I added a model browser panel in the UI: it lists all 30 available models with their sizes, marks which ones are already downloaded, and lets you download any of them with progress streaming over SSE. Smaller quantized models like <code>small-q5_1</code> (182 MB) are a reasonable tradeoff against the full <code>small</code> (466 MB) if storage is a concern.</p>
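<p>The download side uses the same idea, measured in bytes instead of audio seconds. A sketch under assumptions: the model comes over HTTP with a known size (for example the <code>Content-Length</code> header of a <code>urllib.request.urlopen</code> response), the yielded percentages are what get sent over SSE, and <code>copy_with_progress</code> and <code>is_downloaded</code> are hypothetical helpers. The "already downloaded" check relies on <code>download-ggml-model.sh</code> saving models as <code>ggml-&lt;name&gt;.bin</code> in the models directory. Because it works on any file-like source, it can be tried without the network.</p>
<pre><code class="language-python">import os

def is_downloaded(models_dir, name):
    # e.g. models/ggml-small-q5_1.bin for name="small-q5_1"
    return os.path.isfile(os.path.join(models_dir, f"ggml-{name}.bin"))

def copy_with_progress(src, dst, total_bytes, chunk_size=1024 * 1024):
    # Copy a file-like src into dst in chunks, yielding an integer percentage
    # only when it changes, so the client isn't flooded with duplicate events.
    copied = 0
    last = -1
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        if total_bytes:
            pct = min(100, copied * 100 // total_bytes)
            if pct != last:
                last = pct
                yield pct
</code></pre>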
<h3>Cancellation and process groups</h3>
<p>The stop button was the most interesting bug. Clicking it would show "Stopping…" in the UI, but whisper kept running and the output kept streaming. The issue: <code>conda run</code> is a wrapper that spawns <code>whisper-cli</code> as a child process. Sending SIGTERM to the <code>conda run</code> process leaves the child alive and running.</p>
<p>The fix is to put the subprocess in its own session when spawning it, then kill the entire process group on cancel:</p>
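<pre><code class="language-python">import subprocess

# start_new_session=True: the child gets its own session and process group,
# so conda run and everything it spawns can be signalled together later.
proc = subprocess.Popen(
    cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    start_new_session=True,
)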
</code></pre>
<pre><code class="language-python">import os
import signal

# on cancel: signal every process in the group, not just conda run
os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
</code></pre>
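<p><code>start_new_session=True</code> makes the child call <code>setsid()</code> before it execs, so <code>conda run</code> becomes the leader of a new session and process group, and its descendants inherit that group. <code>os.killpg</code> sends the signal to every process in the group at once, so <code>whisper-cli</code> stops along with its wrapper.</p>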
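<p>A slightly more defensive variant, as a sketch rather than the repo's code: with <code>start_new_session=True</code> the child's process group ID equals its PID, so <code>proc.pid</code> can be passed to <code>os.killpg</code> directly. That keeps working after the <code>conda run</code> wrapper itself has exited and been reaped, when <code>os.getpgid(proc.pid)</code> would raise <code>ProcessLookupError</code>. The helper names <code>spawn</code> and <code>cancel</code>, the SIGKILL escalation after a timeout, and the small Python wrapper standing in for <code>conda run</code> are all assumptions for illustration. POSIX only.</p>
<pre><code class="language-python">import os
import signal
import subprocess
import sys

def spawn(cmd):
    # setsid() in the child: new session and process group, so pgid == pid.
    return subprocess.Popen(cmd, start_new_session=True)

def _signal_group(pgid, sig):
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass  # every process in the group has already exited

def cancel(proc, timeout=5.0):
    # SIGTERM the whole group so grandchildren (whisper-cli) stop too.
    _signal_group(proc.pid, signal.SIGTERM)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Escalate if something in the group ignored SIGTERM.
        _signal_group(proc.pid, signal.SIGKILL)
        return proc.wait()

# Stand-in for "conda run ... whisper-cli": a wrapper that starts its own child.
WRAPPER = [sys.executable, "-c", "import subprocess; subprocess.run(['sleep', '30'])"]

if __name__ == "__main__":
    job = spawn(WRAPPER)
    print("exit status:", cancel(job))  # negative: terminated by a signal
</code></pre>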
<hr />
<h2>The Result</h2>
<p>A self-hosted transcription app that runs entirely on local hardware, uses the integrated GPU when available, streams output in real time, and handles cancellation cleanly. Drop in a meeting recording, pick a language, get a text file. Nothing leaves the machine.</p>
<p>The code is on the <code>aaxa_openvino</code> branch of my whisper.cpp fork if you want to take a look.</p>
]]></content:encoded></item></channel></rss>