**Martin Owens** @doctormo@floss.social · 4d *

I've set up my new #inkscape website AI bot trap. It works by giving everyone a chance to not fall into it.

An anchor link that says "I am a bot" and links to /P3W-451/{datetime}/. It has a fixed position at top -100px, so it should never be seen.

The robots.txt says "Disallow: /P3W-451/" so if you were reading the robots, you'd know.

Then #nginx writes their IP addresses and browser strings to a log and sends them a 301 redirect to google.com

#ai #Scraping

1/2
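
A rough sketch of the trap logic described above, using only the Python standard library. It is not the author's nginx configuration. The `User-agent: *` line, the timestamp format in the trap URL, the exact redirect URL, and the names `load_rules` and `handle_request` are assumptions made for illustration; only the `Disallow: /P3W-451/` rule, the logging of IP and browser string, and the 301 to google.com come from the post.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the post quotes only the Disallow line; the User-agent line is added here.
ROBOTS_TXT = "User-agent: *\nDisallow: /P3W-451/\n"
TRAP_PREFIX = "/P3W-451/"
REDIRECT_TARGET = "https://google.com/"  # assumption: the exact URL is not given in the post


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text the way a well-behaved crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def handle_request(path: str, ip: str, user_agent: str, trap_log: list):
    """Hypothetical stand-in for the server side: log trap hits, then redirect."""
    if path.startswith(TRAP_PREFIX):
        trap_log.append((ip, user_agent))
        return 301, REDIRECT_TARGET
    return 200, None


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    trap_url = TRAP_PREFIX + "20250401T120000/"  # assumed {datetime} format

    # A crawler that honours robots.txt checks first and never requests the trap.
    print("polite crawler may fetch trap:", rules.can_fetch("ExampleBot", trap_url))

    # A crawler that ignores robots.txt follows the hidden link and gets logged.
    log = []
    print(handle_request(trap_url, "203.0.113.7", "BadBot/1.0", log), log)
```

The point of the design is that the check in `can_fetch` is available to every client, so only clients that skip it end up in the log.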

**Strypey** @strypey@mastodon.nzoss.nz · 4d

Joshua Yuvaraj, co-director of the New Zealand Centre for Intellectual Property, was interviewed on RNZ yesterday, about the degree to which copyright law might be used to prevent scraping of the open web by #MOLE Trainers;

https://www.rnz.co.nz/national/programmes/nights/audio/2018981590/what-can-writers-do-about-their-work-being-used-to-train-ai-models

As Cory Doctorow noted back in 2023;

"In privacy and labor fights, copyright is a clumsy tool at best."

https://pluralistic.net/2023/09/17/how-to-think-about-scraping/

RNZ · 4d · What can writers do about their work being used to train AI models? · Joshua Yuvaraj is the co-director of the New Zealand Centre for Intellectual Property, and a senior lecturer in law at the University of Auckland specialising in copyright and artificial intelligence.

#RNZ #NZCIP #JoshuaYuvaraj

Replied in thread

**sheislaurence** @sheislaurence@mastodon.social · 4d

@nimi @papuass @stefan @freediverx yeah except you can't force bad actors to use your commercial API if they still have an open route in that costs them next to nothing. It really doesn't matter that #scraping isn't elegant. It works, it's cheap. It's basically an arms race that #opensource #openknowledge were never designed to wage. My only hope is that the #cyberpunk spirit will reorganise itself along those faultlines and fight the good fight.

Replied in thread

**Petra van Cronenburg** @NatureMC@mastodon.online · Mar 28 *

@susankayequinn Here's another article by @brianmerchant : https://www.bloodinthemachine.com/p/openais-studio-ghibli-meme-factory
"AI giants are indeed eating away at the livelihoods and dignity of working artists, and this devouring, appropriating, and automation of the production of art, of culture, at a scale truly never seen before, should not be underestimated as a menace"

Blood in the Machine · Mar 27 · OpenAI's Studio Ghibli meme factory is an insult to art itself · By Brian Merchant

#AI #OpenAI #StudioGhibli

**Petra van Cronenburg** @NatureMC@mastodon.online · Mar 27

"GPT-4o is partly (aside from some licensed content) a product of a massive scrape of the Internet without regard to copyright or consent from artists ... GPT-4o's image generation model (and the technology behind it, once open source) feels like it further erodes trust in remotely produced media ... Everyone needs media literacy skills ..." https://arstechnica.com/ai/2025/03/openais-new-ai-image-generator-is-potent-and-bound-to-provoke/?utm_brand=arstechnica&utm_social-type=owned&utm_source=mastodon&utm_medium=social via @arstechnica

Ars Technica · Mar 27 · OpenAI’s new AI image generator is potent and bound to provoke · By Benj Edwards

#AI #generativeAI #imageGenerator

Replied in thread

**Petra van Cronenburg** @NatureMC@mastodon.online · Mar 24

@Garwboy As a friend of biodiversity I had nearly stopped reading until I got to this: "I like all of those creatures. I find them fascinating, and they occupy important roles in our society and ecosystem. I would never say that about Mark Zuckerberg."
But now I dream of writer troll farms using your inspiring idea to train #AI: https://theneuroscienceofeverydaylife.substack.com/p/an-article-for-meta-to-use-to-train Great! Made my day.
@writing @writers @writerscommunity

The Neuroscience of Everyday Life · Mar 21 · An article for Meta to use to train their AI · By Dean Burnett

#writers #authors #LLM

**Petra van Cronenburg** @NatureMC@mastodon.online · Mar 23

Yesterday I made a test, warned against this account with a hashtag of the name and a certain bird, and promptly got the #scam again. It's the sign that this paragon of a #troll factory or a narcissistic bot tinkerer hopping instances is not reacting randomly. Don't just block it, it's important to #report it so that it finally comes to an end. Don't click the links. If it's #scraping, a joke, or an attack on the Fediverse: a #fediblock would be fine! The phrase pattern could be filtered.

Screenshot of the well-known fake account which is neither this woman nor anything real.

Continued thread

**Sumana Harihareswara** @brainwane@social.coop · Feb 4

And now @leonardr has "released the biggest update to Beautiful Soup in many years." Upgrade to 4.13.1 to enjoy better warnings, type hints, generated API docs, and more.

https://wandering.shop/@leonardr/113935996322582259

The Wandering Shop · Leonard Richardson (@leonardr@wandering.shop) · After a beta process lasting nearly a year, I've released Beautiful Soup 4.13.0. Adds type hints, generated API docs, and access to the low-level matching API. Highlights: https://www.crummy.com/2025/02/02/0 Full changelog: https://git.launchpad.net/beautifulsoup/tree/CHANGELOG

#Python #civictech #opensource

**AlgorithmWatch** @algorithmwatch@chaos.social · Feb 2 *

The EU’s #AIAct prohibitions are now in effect! But gaps remain. Learn more: https://algorithmwatch.org/en/ai-act-prohibitions-february-2025/

Now banned in the EU: #ManipulativeAI, AI that exploits people's vulnerabilities, #SocialScoring, #Scraping of facial images on the internet, Live #FaceRecognition in Public Spaces. Others are partially banned, like #PredictivePolicing, #EmotionRecognition, and more.

**FiXato** @FiXato@toot.cat · Jan 25 *

It's interesting how big companies used to whine and complain about users #scraping their content to provide innovative mashups and alternative interfaces, but now they are turning the tables to scrape user content in the name of #AI / #LLM

**Dobody** @dobody@mastodon.design · Jan 23

How would one theoretically use #scraping from different sites of #events organizers and generate an #icalendar file to easily get notified of events in their city or region for themselves or their community?

This is to avoid using #meta as a source that many rely on for lack of alternatives (that are actually invested).

#webscraping #quitmeta
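
One possible shape for the second half of that question, as a minimal sketch: once events have been scraped into a common structure, a small function can serialise them as an iCalendar (RFC 5545) file that calendar apps can subscribe to. The `Event` class, `build_ics`, and the `PRODID` value are hypothetical names chosen for illustration, and the scraping step itself is left out because it depends on each organiser's site. Folding of lines longer than 75 octets is also omitted for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Event:
    """Hypothetical common structure for events scraped from different sites."""
    uid: str          # must be globally unique, e.g. "<id>@<organiser-domain>"
    title: str
    start: datetime   # timezone-aware
    end: datetime     # timezone-aware
    url: str = ""


def _escape(text: str) -> str:
    """Escape TEXT values as RFC 5545 requires (backslash, semicolon, comma, newline)."""
    return (text.replace("\\", "\\\\")
                .replace(";", "\\;")
                .replace(",", "\\,")
                .replace("\n", "\\n"))


def _utc(dt: datetime) -> str:
    """Format a timezone-aware datetime as an iCalendar UTC timestamp."""
    return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def build_ics(events, now=None) -> str:
    """Serialise events into a single VCALENDAR with CRLF line endings."""
    now = now or datetime.now(timezone.utc)
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//example//event-scraper//EN"]
    for ev in events:
        lines += [
            "BEGIN:VEVENT",
            f"UID:{ev.uid}",
            f"DTSTAMP:{_utc(now)}",
            f"DTSTART:{_utc(ev.start)}",
            f"DTEND:{_utc(ev.end)}",
            f"SUMMARY:{_escape(ev.title)}",
        ]
        if ev.url:
            lines.append(f"URL:{ev.url}")
        lines.append("END:VEVENT")
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    demo = Event(
        uid="42@example.org",
        title="Repair cafe, spring edition",
        start=datetime(2025, 5, 3, 10, 0, tzinfo=timezone.utc),
        end=datetime(2025, 5, 3, 12, 0, tzinfo=timezone.utc),
    )
    print(build_ics([demo]))
```

Writing the result to a file served over HTTP would let a community subscribe to it from any calendar client, with one scraper per organiser feeding the same `Event` list.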

Replied in thread

**Toni Aittoniemi** @gimulnautti@mastodon.green · Dec 27, 2024 *

@khobochka We need an international co-operative system of making these parties pay for scraping. It includes legislative changes. At the same time it can become a real-time pricing market for ”rights to scrape” and for creators to get paid.

Here’s my whitepaper for a solution. Absolutely no cryptocurrency involved.

#ai #scraping #copyright #technology #whitepaper

https://docs.google.com/document/d/18cz-ZX1copCYiC4C2ReY8GLJjuhG2IH0MEBGaoSJhP4/edit

Google Docs · Commit-to-paying-by-scraping: A market-based model of re-introducing value feedback into an AI-based information economy · The next few years are likely to become an important turning point in the history of humankind and our technology. The coming years might very well determine whether we build t...

**Farooq Karimi Zadeh** @farooqkz@blackrock.city · Dec 19, 2024

TIL

"Earlier this year, Microsoft-owned LinkedIn came under similar scrutiny for toggling on a feature that allows the company to scrape user data for AI training. The UK's Information Commissioner's Office forced LinkedIn to stop doing that with UK user data. LinkedIn still scrapes US user data by default; disable it by visiting Settings > Data Privacy > Data for Generative AI Improvement."

https://www.pcmag.com/news/microsoft-we-dont-use-your-word-excel-data-for-ai-training

#Microsoft #LLM #AI

Replied in thread

**JdeBP** @JdeBP@mastodon.scot · Dec 19, 2024

For those (like me!) looking for what @cstross is referring to:

https://www.gov.uk/government/consultations/copyright-and-artificial-intelligence

GOV.UK · Copyright and Artificial Intelligence · This consultation seeks views on how the government can ensure the UK’s legal framework for AI and copyright supports the UK creative industries and AI sector together.

#Copyright #UKLaw #AI

**anorax** @l_inadapte@framapiaf.org · Dec 17, 2024

I'm trying to scrape a page of a website with FreshRSS. And, well… one thing is certain:
I'm hopeless.

Is there an online app to (semi-)automate this?

#WebsiteScraping #scraping

**Erik Jonker** @ErikJonker@mastodon.social · Oct 21, 2024

"What he discovered seems simple on its surface, but the quality of the result has deeper implications for the future of AI assistants, which may soon be able to see and interact with what we're doing on our computer screens."
https://arstechnica.com/ai/2024/10/cheap-ai-video-scraping-can-now-extract-data-from-any-screen-recording/
#AI #video #scraping

Ars Technica · Oct 17, 2024 · Cheap AI “video scraping” can now extract data from any screen recording · By Benj Edwards

**Toni Aittoniemi** @gimulnautti@mastodon.green · Oct 20, 2024 *

Here’s a top pin!

My #market-based, publicly underpinned model for determining copyright liability payments in real-time for an information economy with #AI #scraping.

We have a choice of either a healthy #economy where being scraped pays those who produce the best information, or no economy at all where only lies, propaganda & bs are openly visible.

We can avoid creatives hiding their content behind closed doors out of fear of being scraped, but only if we act now!

https://docs.google.com/document/d/18cz-ZX1copCYiC4C2ReY8GLJjuhG2IH0MEBGaoSJhP4/edit

**Marcus "MajorLinux" Summers** @majorlinux@toot.majorshouse.com · Aug 7, 2024

Digital Colonialism strikes again!

NVIDIA’s AI team reportedly scraped YouTube, Netflix videos without permission

https://www.engadget.com/ai/nvidias-ai-team-reportedly-scraped-youtube-netflix-videos-without-permission-204942022.html?src=rss&utm_source=press.coop

#Nvidia #AI #YouTube

**jbz** @jbz@indieweb.social · Jun 12, 2024

Maven Imported 1.12 Million Fediverse Posts - @news

｢ A recent investigation by Liaizon Wakest revealed that Maven, a new social network founded by former OpenAI Team Lead Ken Stanley, has been importing a vast amount of statuses from Mastodon without anyone’s consent. Additionally, it’s pulling in Bluesky statuses connected via Bridgy Fed ｣

https://wedistribute.org/2024/06/maven-mastodon-posts/

We Distribute · Jun 12, 2024 · Maven Imported 1.12 Million Fediverse Posts (Updated) - We Distribute
