
#gpu


💾 Quick HomeLab Update 💾

As a follow-up to this morning's iGPU 512M GART debugging (re: drm-61-kmod memory leak on amdgpu.ko), here are some pics. Reluctantly, the "ok fine" method for testing the memory leak allocation involves using a discrete GPU that requires the same driver, so this low-TDP single-slot AMD W7500 was acquired. Sometime tomorrow it will be installed in my workstation, the third chassis from the bottom (4U), situated above the two 5U chassis (HCI private cloud for GPU compute VMs).

More to follow...

#gpu #amd #ai #homelab #freebsd #linux

Continued thread

Panic most recently used by lkpikmalloc ...

Well, that was fast... didn't even get a mouse cursor or a full MATE Desktop menu system load. Had yet to connect kgdb to COM1 (need to swap from minicom to do so)... makes me want a PCIe RS232 card (for "comconsole_pcidev") so that I have a few more COMs to play with on redirects. Gotta love these iGPU trash-bins eh? "It's better than not having a GPU right?" ... not really.

Closed bug report from drm-515-kmod discussing an amdgpu memory leak. So, maybe there's a new one in drm-61-kmod; would not be surprised.
- github.com/freebsd/drm-kmod/is

Short term revision of approach:
----

1. Today via post arrives, an AMD Radeon Pro W7500 (single slot 8GB, Navi-whatever gen)
2. I'll block off the iGPU during loader.conf sequence, using a "pptdev" blackhole (not for VM pt, but maybe an experiment for a 14.1 VM with the known-good amdgpu version).
3. Known as: throw money at the problem?
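
For step 2, a minimal loader.conf sketch of the "pptdev" blackhole idea, assuming the iGPU sits at PCI selector 12/0/0 (a made-up address; the real bus/slot/function comes from `pciconf -lv`). The ppt driver ships inside vmm.ko, so vmm has to load from the loader for the reservation to happen before amdgpu can attach:

```sh
# /boot/loader.conf (sketch, not the author's actual file)
vmm_load="YES"        # ppt(4) passthrough stub lives in vmm.ko
pptdevs="12/0/0"      # HYPOTHETICAL bus/slot/function of the iGPU; check pciconf -lv
```

After a reboot, `pciconf -l` should list the device as ppt0 rather than vgapci, which would confirm the iGPU is parked.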

Some hardware notes:
----

1. This is not an Nvidia GPU situation; there are several generations of cards in the room which have been cycled through the workstation during "hardware isolation" and "process of elimination" sequences. I know those are stable, and which gen cards require which nvidia driver versions for stability purposes.

2. This is not a FreeBSD kernel issue, nor an Xorg "Plain Jane FrameBuffer" situation. The kernel (14.0, 14.1, 14.2) is stable and fine, and the basic vt driver for non-4K display-port functionality works fine. I can work all day in a series of tmux windows with some fifty or so panes, but that's not quite the optimal experience.

3. The AMD iGPU (Raphael) defaults to a maximum of 512MB GART VRAM, and it can handle 240Hz @ 4K all day with no issues as long as that 512M doesn't get used up... that is, until the latest amdgpu drm kmod, which crashes whenever it feels like it.

Michael... yes yes, I do have a lot of hardware, but this issue has surpassed the Sunk Cost Fallacy and has become a consummate knowledge-requirement process. I must know where this is failing so horrendously, otherwise the operating rule of "if it doesn't fulfill its hardware destiny, it will get the hammer and flames" applies... and the hardware is too nice for that. Plus, I could involve Supermicro support since it's still in warranty, but a replacement motherboard or CPU for the iGPU isn't going to solve a kernel module issue.

In the interim, laptop life and tablet meetings are getting me by, mostly decently.

Debug items of interest:
----
intsmb0: <AMD FCH SMBus Controller> at device 20.0 on pci0
intsmb0: Could not allocate I/O space
device_attach: intsmb0 attach returned 6

drmn0: Fetched VBIOS from VFCT
amdgpu: ATOM BIOS: 102-RAPHAEL-008
drmn0: Trusted Memory Zone (TMZ) feature not supported
drmn0: PCIE atomic ops is not supported
drmn0: VRAM: 512M 0x000000F400000000 - 0x000000F41FFFFFFF (512M used)
[drm ERROR :amdgpu_bo_init] Unable to set WC memtype for the aperture base
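
A quick sanity check on the VRAM line above: the aperture range it prints works out to exactly 512 MiB, so "512M used" means the whole carve-out is already spoken for. A small sh sketch of the arithmetic (the function name `aperture_mib` is made up for illustration):

```sh
# aperture_mib FIRST LAST
# Size in MiB of an inclusive address range, both ends given as 0x-prefixed hex.
aperture_mib() {
    echo $(( ($2 - $1 + 1) / 1048576 ))
}

# Range copied from the drmn0 VRAM line in the dmesg output
aperture_mib 0x000000F400000000 0x000000F41FFFFFFF   # prints 512
```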

Loader items of usage:
----
# Multi-Console Output
# boot output primary: TTY, standard monitor via UEFI
# boot output secondary: COM1 RS232 Redirect (physical)
# boot output tertiary: COM2 RS232 Redirect (BMC SoL)
ipmi_load="YES"
boot_mute="NO"
boot_verbose="YES"
verbose_loading="YES"
boot_multicons="YES"
boot_serial="YES"
console="efi,comconsole,comconsole"
comconsole_port1="0x3F8"
comconsole_speed1="115200"
comconsole_port2="0x2F8"
comconsole_speed2="115200"
hw.uart.console="io:0x3f8,br:115200 io:0x2f8,br:115200"

#amd #gpu #drm616kmod

#AMD splits #ROCm toolkit into two parts – ROCm #AMDGPU drivers get their own branch under Instinct #datacenter #GPU moniker
The new #datacenter Instinct driver is a renamed version of the #Linux AMDGPU driver packages that are already distributed and documented with ROCm. Previously, everything related to ROCm (including the amdgpu driver) existed as part of the ROCm software stack.
tomshardware.com/pc-components

Tom's Hardware · AMD splits ROCm toolkit into two parts – ROCm AMDGPU drivers get their own branch under Instinct datacenter GPU moniker · By Aaron Klotz

Does anyone have a collection of .dds (and maybe also .ktx and/or .ktx2) textures I could use to test my texture viewer (github.com/DanielGibson/texvie)?

For DDS I only found test textures from GLI and unfortunately many of them are invalid/broken :-/
For KTX(2) I have the ones from libktx (and the broken ones from GLI).

Thanks in advance! :)

GitHub - DanielGibson/texview: crossplatform texture viewer

I stumbled upon an interesting blog today while searching for an image of the Silicon Graphics 3D cube logo. Named "Abort Retry Fail" by Bradford Morgan White, the blog's articles document computer history. I eventually wound up reading three of them.

#1 "The Rise and Fall of Silicon Graphics"

abortretry.fail/p/the-rise-and


💻 FreeBSD CUDA drm-61-kmod 💻

"Just going to test the current pkg driver, this will only take a second...", the old refrain goes. Surely, it will not punt away an hour or so of messing about in loader.conf on this EPYC system...

- Here are some notes to back-track a botched/crashing driver kernel panic situation.
- Standard stuff, nothing new over the years here with loader prompt.
- A few directives are specific to this system, though may provide a useful general reference.
- The server has an integrated GPU in addition to nvidia pcie, so a module blacklist for the "amdgpu" driver is necessary (EPYC 4564P).

Step 1: during boot-up, "exit to loader prompt"
Step 2: set/unset the values as needed at the loader prompt

unset nvidia_load
unset nvidia_modeset_load
unset hw.nvidiadrm.modeset
set module_blacklist=amdgpu,nvidia,nvidia_modeset
set machdep.hyperthreading_intr_allowed=0
set verbose_loading=YES
set boot_verbose=YES
set acpi_dsdt_load=YES
set audit_event_load=YES
set kern.consmsgbuf_size=1048576
set loader_menu_title=waffenschwester
boot

Step 3: login to standard tty shell
Step 4: edit /boot/loader.conf (and maybe .local)
Step 5: edit /etc/rc.conf (and maybe .local)
Step 6: debug the vast output from kern.consmsgbuf logs
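
For step 4, a sketch of what carrying the loader-prompt values over into /boot/loader.conf might look like, so the blacklist survives the next reboot. It only mirrors values already typed at the prompt above; which ones are worth keeping permanently is a judgment call for the system in question:

```sh
# /boot/loader.conf (sketch; same values as the loader-prompt session)
module_blacklist="amdgpu,nvidia,nvidia_modeset"   # stop the loader from pulling in either GPU driver
boot_verbose="YES"                                # verbose kernel boot messages
verbose_loading="YES"                             # show each module as the loader reads it
```

As far as I understand it, module_blacklist only covers modules loaded by the loader itself, which is why step 5 matters: a kld_list entry in /etc/rc.conf can still load the same driver later in boot.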

🌇 Blog Post Forthcoming for GPUs 🌇

ok, it's morning nap time starting 5min ago... but I'm decently pleased that this morning I've wrangled the (yes, linux) kernel into behaving more decently (a little like FreeBSD's sanity) when allocating similarly ID'd PCIe devices to VMs for GPU passthrough.

Short version:
- many GPUs of same PCI ID type in host
- want certain VMs to get certain GPUs
- want main host to get one GPU for compute (server, no display)

FreeBSD:
- the above requires two loader.conf lines and one rc.conf line

Linux:
- oh lordy, how many modprobe and modules-load.d files, and grub entries, and kmod blacklists, rebuild initramfs, and rebuild grub menu, and don't forget to purge the entire nouveau driver from the kernel so it doesn't try to backstab nvidia and steal your first born...
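
One way to tackle the "same PCI ID, different destinations" problem on Linux is the per-device driver_override file in sysfs, which selects by PCI address instead of vendor:device ID. This is only a sketch of that general mechanism, not necessarily what the forthcoming blog post does; the function name `bind_to_vfio` and the address in the usage comment are made up, and the optional second argument exists only so the logic can be tried against a fake sysfs tree:

```sh
# bind_to_vfio ADDR [SYSFS_ROOT]
# Hand one specific PCI device (by address) to vfio-pci, leaving its
# identically-ID'd siblings with their normal driver.
bind_to_vfio() {
    addr="$1"
    sysfs="${2:-/sys}"
    dev="$sysfs/bus/pci/devices/$addr"

    if [ ! -d "$dev" ]; then
        echo "no such PCI device: $addr" >&2
        return 1
    fi

    # Only vfio-pci may claim this one device from now on.
    echo vfio-pci > "$dev/driver_override"

    # Detach whatever driver currently holds it, if any.
    if [ -e "$dev/driver/unbind" ]; then
        echo "$addr" > "$dev/driver/unbind"
    fi

    # Re-probe so the override takes effect.
    echo "$addr" > "$sysfs/bus/pci/drivers_probe"
}

# Usage (as root, with the vfio-pci module loaded; address is hypothetical):
#   bind_to_vfio 0000:41:00.0
```

This runs after boot, so it does not replace the initramfs and blacklist work when the host driver grabs every card early; it just shows the per-address selection piece.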

#nvidia #gpu #linux

In honor of Trans Day of Visibility, here’s something really cool for you to know…

A computer scientist trans woman by the name of Sophie Mary Wilson was a co-creator of the ARM architecture.

More about Sophie:
en.wikipedia.org/wiki/Sophie_W

Great video on the history of ARM, which is computer science geeky stuff (sorry about YT):
youtube.com/watch?v=nIwdhPOVOU


#AMD Announces "#Instella" Fully #OpenSource 3B Language Models
AMD Instella represents "fully open state-of-the-art 3-billion-parameter language models (LMs)." These models were trained on AMD Instinct #MI300X #GPU hardware and, according to AMD's published data, deliver performance competitive with the likes of Llama 3.2 3B, Gemma-2 2B, and Qwen 2.5 3B.
phoronix.com/news/AMD-Intella-


📣 NEWS FOR #HAIKU: #NVIDIA GPU support coming soon! 🚀

Developer @X512 has successfully ported Nvidia kernel drivers to Haiku. The driver will support Turing+ GPUs and already includes Vulkan integration via Mesa's NVK.

Initial tests are working and show potential for future uses, including AI acceleration with llama.cpp.

A major step forward for the Haiku ecosystem and hardware compatibility!

#OpenSource #GPU #Vulkan #Driver #AlternativeOS

🔗 desktoponfire.com/haikuos/soft

👩🏻‍💻 Big Brain Computer Parts 👩🏻‍💻

- "Build Day Monday Funday, Yet Another Machine Intelligence System [YAMIS]"
- Season 2, Episode 4: Mostly Modern Systems and Hardware Engineering

Despite some goon sending this kinda pricey Ice Lake Xeon through international post, wrapped up in two thin layers of non/anti-static bubble wrap shoved into a Tyvek shipping bag (not a box!), it arrived yesterday and seems to be in decent shape after a trip from Australia to the middle of America.

So, what does one do with a NOS (new old stock) bit of hardware which needs to be perfect in order to function? Inspect it with macro photos and various image filtering methods to identify any potential flaws.

CPU: Intel Xeon 8370C
Spec: 32 cores, 64 threads, 2.8GHz base clock
Dimensions: my favorite general purpose lip gloss (sorta) tube for scale