#benchmarks


Hello clever Fediverse, I’m currently diving into a completely absurd #Rabbithole, and thanks to #LlmStudio and #Ollama my M1 MacBook has rediscovered its fan… Right now, local #LLM use tops out at 8-12b models for me (32GB RAM). Are there #Benchmarks anywhere that would please talk me out of the idea that an M4 with >48GB RAM will be drastically better? Or would something else entirely be smarter? Or a different hobby? It has to be mobile (reachable), because my lifestyle is too unsettled for a desktop. Recommendations welcome in the comments.

🔔 New Essay 🔔

"The Intelligent AI Coin: A Thought Experiment"

Open Access here: seanfobbe.com/posts/2025-02-21

Recent years have seen a concerning trend towards normalizing decisionmaking by Large Language Models (LLM), including in the adoption of legislation, the writing of judicial opinions and the routine administration of the rule of law. AI agents acting on behalf of human principals are supposed to lead us into a new age of productivity and convenience. The eloquence of AI-generated text and the narrative of super-human intelligence invite us to trust these systems more than we have trusted any human or algorithm ever before.

It is difficult to know whether a machine is actually intelligent because of problems with construct validity, plagiarism, reproducibility and transferability in AI benchmarks. Most people will either have to personally evaluate the usefulness of AI tools against the benchmark of their own lived experience or be forced to trust an expert.

To explain this conundrum I propose the Intelligent AI Coin Thought Experiment and discuss four objections: the restriction of agents to low-value decisions, making AI decisionmakers open source, adding a human-in-the-loop and the general limits of trust in human agents.

@histodons @politicalscience


Ansible annoyances when writing compliance-as-code checks with it:

* When a value can be set in multiple places
* When you want to check for the required absence of a service
* Dealing with situations when you want to handle multiple files in a single location
* Where one or more options are acceptable
* Handling read-only file systems
* Error handling on shell module

It's a shame, as Ansible is still the best general-purpose fit I've found, but the native modules are frustrating to use: they always assume you want to deploy a given state, rather than simply compare the current state against a set of potentially valid options.
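The usual workaround is to run a state module with `check_mode: true` and `failed_when: result.changed`, but that still phrases every check as a desired state. As a thought experiment, here's a minimal sketch of a compare-only custom module; the module name, option names, and the first-match parsing rule are all invented for illustration:

```python
#!/usr/bin/python
# Hypothetical compare-only Ansible module sketch: it reports compliance
# instead of enforcing state. Name and options are invented for illustration.
from ansible.module_utils.basic import AnsibleModule

def main():
    module = AnsibleModule(
        argument_spec=dict(
            path=dict(type='str', required=True),
            key=dict(type='str', required=True),
            allowed=dict(type='list', elements='str', required=True),
        ),
        supports_check_mode=True,  # the module never changes anything anyway
    )
    path = module.params['path']
    key = module.params['key']
    try:
        with open(path) as f:
            rows = [line.split() for line in f
                    if line.strip() and not line.lstrip().startswith('#')]
    except OSError as exc:
        module.fail_json(msg=f"cannot read {path}: {exc}")
    # First occurrence wins, mirroring how sshd parses sshd_config;
    # other file formats may need a different rule (e.g. last wins).
    values = [row[1] for row in rows if len(row) >= 2 and row[0] == key]
    value = values[0] if values else None
    module.exit_json(changed=False,
                     compliant=value in module.params['allowed'],
                     value=value)

if __name__ == '__main__':
    main()
```

A playbook task would then assert on `result.compliant` instead of fighting a state-enforcing module.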

#compliance, #benchmarks, #hardening

CIS Benchmarks are awful.

Just a bunch of arbitrary commands (not even consistent across checks of the same type) dumped into randomly structured Markdown. Even if the commands themselves are any good (which they may or may not be), they're only usable by humans, because you have to parse the instructions on how to interpret their output by eye.

I'm writing a compliance-as-code framework, and I might as well rewrite the underlying benchmarks /except/ for the fact that they're recognisable, so people demand them over something more useful.

#compliance, #benchmarks, #hardening

Since the #Moonbit #JavaScript backend post (moonbitlang.com/blog/js-suppor) is trending, I thought I'd compare #PureScript backend optimizer (github.com/aristanetworks/pure) output to see how it fares. The results were pretty good!

With basically this PureScript code -
```
run = fromArray                            -- lift the input array into the stream type
>>> flatMapF (fromArray <<< _.members)     -- flatten each record's members array
>>> filterF _.gender                       -- keep members whose gender flag is set
>>> mapF (\x -> min 100 (x.score + 5))     -- add 5 to each score, capped at 100
>>> mapF grade                             -- map the score to a letter grade
>>> filterF (_ == 'A')                     -- keep only the 'A' grades
>>> foldF (\_ x -> x+1) 0                  -- count what remains
```

the benchmark results are as follows. PureScript is roughly 6x faster than plain JS, and roughly 7x slower than the Moonbit output:

```
┌─────────┬──────────────┬─────────────┬────────────────────┬──────────┬─────────┐
│ (index) │ Task Name    │ ops/sec     │ Average Time (ns)  │ Margin   │ Samples │
├─────────┼──────────────┼─────────────┼────────────────────┼──────────┼─────────┤
│ 0       │ 'Moonbit'    │ '3,467,542' │ 288.38869989829305 │ '±0.06%' │ 1733772 │
│ 1       │ 'Plain Js'   │ '74,816'    │ 13365.983827421464 │ '±0.54%' │ 37409   │
│ 2       │ 'Kotlin Js'  │ '190,241'   │ 5256.474017304151  │ '±0.38%' │ 95121   │
│ 3       │ 'PureScript' │ '499,456'   │ 2002.1768597161156 │ '±0.70%' │ 249729  │
└─────────┴──────────────┴─────────────┴────────────────────┴──────────┴─────────┘
```
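For anyone who doesn't read PureScript, here is a rough plain-Python rendering of what the pipeline computes; the record fields are taken from the snippet above, but `grade` and its thresholds are assumptions, not code from the benchmark repo:

```python
# Rough Python equivalent of the PureScript pipeline, only to illustrate
# what the benchmark computes; `grade` is an assumed helper.
def grade(score):
    # Assumed grading scheme, purely for illustration.
    return "A" if score >= 90 else "B"

def run(records):
    count = 0
    for record in records:
        for member in record["members"]:        # flatMapF (fromArray <<< _.members)
            if not member["gender"]:            # filterF _.gender
                continue
            score = min(100, member["score"] + 5)  # mapF: +5, capped at 100
            if grade(score) == "A":             # mapF grade >>> filterF (_ == 'A')
                count += 1                      # foldF: count what remains
    return count
```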


I finally have benchmarks for Mergeable Libraries! Here are the results on my iPhone 14 Pro. I took app startup measurements with 0–100 small frameworks in three configurations: plain old dynamic frameworks, directly merged dynamic frameworks, and frameworks merged through one intermediate framework. See the results for yourself. The second image is a close-up near the origin.

The results are measured from the time the app begins running (the process is created) to just before the UIKit initialization signpost. Process creation time varies wildly, but typically ranges from 100–400 ms.

Blog post here: humancode.us/2024/01/02/measur

EDIT: Added measurement using static frameworks instead of dynamic.

#ios #iPhone #Xcode

Wow, this is nuts. Intel must be suffering far more than I thought.

Check out this asinine marketing from Intel calling out AMD and other rivals. The message is crazy. They are literally using the term "snake oil" and showing sleazy used car salesmen, representing AMD. And this is public - from Intel.

youtube.com/watch?v=xUT4d5IVY0

#Intel #marketing #AMD

Is RISC-V ready for HPC prime-time: evaluating the 64-core Sophon SG2042 RISC-V CPU

The Sophon SG2042 is the world's first commodity 64-core RISC-V CPU for high-performance workloads, and an important question is whether the SG2042 has the potential to encourage the HPC community to embrace RISC-V.

In this paper we undertake a performance exploration…

osnews.com/story/138049/is-ris


::: Linux vs. Windows in 10 games - Linux 17% faster on average :afire:

Times have changed, have they not :thinkergunsunglasses:

With macOS seemingly dropping out of the gaming field altogether and Linux only rising, where might Linux be in a couple of years? :thinkhappy:

=> video.hardlimit.com/w/uZGK12oU

#Linux #vs #Windows #benchmarks #Peertube #performance #gaming @cosmic_happiness

Windows 11 vs. Ubuntu 23.10 performance on the Lenovo ThinkPad P14s Gen 4

Out of 72 benchmarks run in total on both operating systems with the Lenovo ThinkPad P14s Gen 4, Ubuntu 23.10 was the fastest about 64% of the time.

Taking the geometric mean of all the benchmark results, Ubuntu 23.10 comes out about 10% faster than the stock Windows 11 Pro install…

osnews.com/story/137528/window


Looking for a more powerful version of the ‘time’ tool on #Linux. Any suggestions?
I’m doing some simple #Benchmarks that I’m driving from a shell script. The ‘time’ command is a cheap and easy starting point, but I’d love to measure in more detail, including total IO (in bytes, not IO counts like ‘time’ reports) and peak filesystem space usage. I suspect some of this could be figured out from ‘strace’ logs, but that would need a bit more work.
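Not a full answer, but GNU time's verbose mode (`/usr/bin/time -v`) already adds peak RSS and filesystem input/output counters, and on Linux `/proc/<pid>/io` exposes per-process byte counts. A rough wrapper sketch along those lines (it polls, so the final numbers can slightly undercount, and peak filesystem space usage would still need something extra, like polling `du`):

```python
#!/usr/bin/env python3
# Hypothetical wrapper sketch: run a command, then report wall time,
# peak RSS, and total read/write bytes. Linux-only (/proc/<pid>/io).
import resource
import subprocess
import sys
import time

def main():
    start = time.monotonic()
    proc = subprocess.Popen(sys.argv[1:])
    io_stats = {}
    # Poll /proc/<pid>/io while the child runs; the file vanishes when the
    # process exits, so keep the last successful reading (may undercount).
    while proc.poll() is None:
        try:
            with open(f"/proc/{proc.pid}/io") as f:
                io_stats = dict(line.split(": ") for line in f.read().splitlines())
        except OSError:
            pass
        time.sleep(0.05)
    elapsed = time.monotonic() - start
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    print(f"wall time:   {elapsed:.3f} s", file=sys.stderr)
    print(f"peak RSS:    {usage.ru_maxrss} KiB", file=sys.stderr)  # KiB on Linux
    print(f"read bytes:  {io_stats.get('read_bytes', '?')}", file=sys.stderr)
    print(f"write bytes: {io_stats.get('write_bytes', '?')}", file=sys.stderr)

if __name__ == "__main__":
    main()
```

Usage would be something like `./measure.py ./my-benchmark --args`, mirroring how ‘time’ wraps a command.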