helvede.net is one of the many independent Mastodon servers you can use to participate in the fediverse.
Welcome to Helvede (Hell), the fediverse's hottest instance! We are a DK-based queerfeminist server that shitposts in the 9th circle. Read our server rules!

Server stats: 171 active users

#scraping

3 posts · 3 participants · 0 posts today

I've set up my new #inkscape website AI bot trap. It works by giving everyone a chance to not fall into it.

An anchor link that says "I am a bot" links to /P3W-451/{datetime}/. It has a fixed position at top: -100px, so it should never be seen.

The robots.txt says "Disallow: /P3W-451/" so if you were reading the robots, you'd know.

Then #nginx logs each request's IP address and browser (user-agent) string to a dedicated log and sends a 301 redirect to google.com.

#ai #Scraping

1/2
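The trap described above can be modelled in a few lines. The following is a minimal Python sketch of the logic, not the author's nginx configuration: the `/P3W-451/` prefix, the "I am a bot" link text, the `top: -100px` fixed positioning and the 301 to google.com come from the post, while the function names, the timestamp format used for `{datetime}`, the `https://` scheme on the redirect, and the in-memory list standing in for the nginx log are all hypothetical.

```python
from datetime import datetime, timezone

# Prefix taken from the post; all names below are hypothetical illustrations.
TRAP_PREFIX = "/P3W-451/"
# Assumption: the post only says "google.com", the scheme and trailing slash are guesses.
REDIRECT_TARGET = "https://google.com/"


def robots_txt() -> str:
    """robots.txt body that tells well-behaved crawlers to stay out of the trap."""
    return f"User-agent: *\nDisallow: {TRAP_PREFIX}\n"


def trap_link_html(now: datetime) -> str:
    """Hidden 'I am a bot' anchor, pushed off-screen with a fixed position."""
    # Assumption: the post only says {datetime}; this compact format is illustrative.
    stamp = now.strftime("%Y%m%dT%H%M%S")
    return (
        f'<a href="{TRAP_PREFIX}{stamp}/" '
        'style="position: fixed; top: -100px;">I am a bot</a>'
    )


def handle_request(path: str, ip: str, user_agent: str, log: list[str]) -> tuple[int, dict[str, str]]:
    """Return (status, headers); trap hits are appended to `log` (stand-in for the nginx log)."""
    if path.startswith(TRAP_PREFIX):
        # Record who ignored robots.txt and followed the invisible link.
        log.append(f"{ip}\t{user_agent}\t{path}")
        return 301, {"Location": REDIRECT_TARGET}
    # Anything else is served normally.
    return 200, {}


if __name__ == "__main__":
    hits: list[str] = []
    print(robots_txt())
    print(trap_link_html(datetime(2025, 3, 1, 12, 0, tzinfo=timezone.utc)))
    print(handle_request("/P3W-451/20250301T120000/", "203.0.113.7", "ExampleBot/1.0", hits))
    # → (301, {'Location': 'https://google.com/'})
    print(hits)
```

In a real deployment this logic lives in the web server itself (the post uses nginx). The sketch only shows the idea: a crawler that ignores robots.txt and follows the invisible link gets logged and bounced, while humans and compliant bots never reach the trap.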

Replied in thread

@nimi @papuass @stefan @freediverx yeah, except you can't force bad actors to use your commercial API if they still have an open route in that costs them next to nothing. It really doesn't matter that #scraping isn't elegant. It works, it's cheap. It's basically an arms race that #opensource and #openknowledge were never designed to wage. My only hope is that the #cyberpunk spirit will reorganise itself along those fault lines and fight the good fight.

Replied in thread

@susankayequinn Here's another article by @brianmerchant : bloodinthemachine.com/p/openai
"AI giants are indeed eating away at the livelihoods and dignity of working artists, and this devouring, appropriating, and automation of the production of art, of culture, at a scale truly never seen before, should not be underestimated as a menace"

Blood in the Machine · OpenAI's Studio Ghibli meme factory is an insult to art itself · By Brian Merchant

"GPT-4o is partly (aside from some licensed content) a product of a massive scrape of the Internet without regard to copyright or consent from artists ... GPT-4o's image generation model (and the technology behind it, once open source) feels like it further erodes trust in remotely produced media ... Everyone needs media literacy skills ..." arstechnica.com/ai/2025/03/ope via @arstechnica

Ars Technica · OpenAI’s new AI image generator is potent and bound to provoke · By Benj Edwards

Replied in thread

@Garwboy As a friend of biodiversity, I had nearly stopped reading until I got to this: "I like all of those creatures. I find them fascinating, and they occupy important roles in our society and ecosystem. I would never say that about Mark Zuckerberg."
But now I dream of writer troll farms using your inspiring idea to train #AI: theneuroscienceofeverydaylife. Great! Made my day. 😂
@writing @writers @writerscommunity

The Neuroscience of Everyday Life · An article for Meta to use to train their AI · By Dean Burnett

Yesterday I ran a test: I warned against this account using a hashtag of its name and a certain bird, and promptly got the #scam again. That's a sign that this paragon of a #troll factory, or a narcissistic bot tinkerer hopping between instances, is not reacting randomly. Don't just block it; it's important to #report it so that this finally comes to an end. Don't click the links. Whether it's #scraping, a joke, or an attack on the Fediverse, a #fediblock would be fine! The phrase pattern could also be filtered.

It's interesting how big companies used to whine and complain about users #scraping their content to build innovative mashups and alternative interfaces, but now they are turning the tables and scraping user content in the name of #AI / #LLM.

TIL

"Earlier this year, Microsoft-owned LinkedIn came under similar scrutiny for toggling on a feature that allows the company to scrape user data for AI training. The UK's [Information] Commissioner's Office forced LinkedIn to stop doing that with UK user data. LinkedIn still scrapes US user data by default; disable it by visiting Settings > Data Privacy > Data for Generative AI Improvement."

pcmag.com/news/microsoft-we-do

#Microsoft #LLM #AI

"What he discovered seems simple on its surface, but the quality of the result has deeper implications for the future of AI assistants, which may soon be able to see and interact with what we're doing on our computer screens."
arstechnica.com/ai/2024/10/che
#AI #video #scraping

Ars Technica · Cheap AI “video scraping” can now extract data from any screen recording · By Benj Edwards

Here’s a top pin!

My #market-based, publicly underpinned model for determining copyright liability payments in real time for an information economy with #AI #scraping.

We have a choice between a healthy #economy, where being scraped pays those who produce the best information, and no economy at all, where only lies, propaganda & bs are openly visible.

We can avoid creatives hiding their content behind closed doors out of fear of being scraped, but only if we act now!

docs.google.com/document/d/18c

Google Docs · Commit-to-paying-by-scraping: A market-based model of re-introducing value feedback into an AI-based information economy
The next few years are likely to become an important turning point in the history of humankind and our technology. The coming years might very well determine whether we build t...

🚨 Maven Imported 1.12 Million Fediverse Posts - @news

「 A recent investigation by Liaizon Wakest revealed that Maven, a new social network founded by former OpenAI Team Lead Ken Stanley, has been importing a vast amount of statuses from Mastodon without anyone’s consent. Additionally, it’s pulling in Bluesky statuses connected via Bridgy Fed 」

wedistribute.org/2024/06/maven

We Distribute · Maven Imported 1.12 Million Fediverse Posts (Updated) - We Distribute