Driver crashes when loading a 7B model with 4-bit quantization. Solution: The driver’s memory scrubber may be too aggressive. Add siudi_npu.memory_scrub=0 to your kernel boot parameters.
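After rebooting, it can help to confirm that the parameter actually reached the kernel. The snippet below is a minimal sketch, not part of the driver's tooling: the helper name scrub_setting is hypothetical, and it assumes only that the running kernel exposes its command line at /proc/cmdline, as Linux normally does.

```shell
# Hypothetical helper: print the value of siudi_npu.memory_scrub found in a
# kernel command line string. Prints nothing if the parameter is absent.
scrub_setting() {
    # $1: a kernel command line, e.g. the contents of /proc/cmdline
    # Split on spaces, keep only the value after the parameter name,
    # and print the last occurrence if the parameter is repeated.
    printf '%s\n' "$1" | tr ' ' '\n' \
        | sed -n 's/^siudi_npu\.memory_scrub=//p' | tail -n 1
}

# Check the running kernel, if its command line is readable.
if [ -r /proc/cmdline ]; then
    scrub_setting "$(cat /proc/cmdline)"
fi
```

If the helper prints nothing, the parameter was not picked up, and the bootloader configuration is the first place to look.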
echo performance > /sys/class/siudi_npu/siudi0/power_governor

The driver allocates a ring buffer for the KV cache of the LLM. To increase the context window from 2048 to 8192 tokens:
While it is not a consumer-facing product (you won’t find it in your laptop’s app store), it is the silent workhorse powering the next generation of private, fast, and capable AI agents running in your pocket, your car, and your factory.