Shiny new torrents are up!!
5 more torrents added to the antifa torrent seeding server this morning.
If you have or know of a good torrent for government data preservation, please send me a link!! The server has plenty of TB.
Found my first DVD that just straight up will not read because of delamination of the disc. I have "a" copy of these episodes transcoded, but I'm working on re-transcoding to AV1, adding the "No laugh track" audio I missed last time, etc., and am planning on keeping ISOs of these discs for long-term use. I may have to re-order a copy of just this season so I can rip it properly.
Just discovered ArchiveBox — FOSS, self-hosted internet archiving.
The way the web is going, with the US government redacting and outright erasing historic content, publishers segmenting content by region (and also sometimes redacting/censoring it), and CloudFlare shitting all over everything, I think it's time for me to start my #archiving and #DataHoarding journey.
Amazon will remove the ability to download ebooks for Kindle at the end of the month. So if you ever close your Amazon account, you'll no longer be able to access the books you bought.
Let's fix that
1. Bulk Exporter: https://github.com/treetrum/amazon-kindle-bulk-downloader
2. Calibre to manage books https://calibre-ebook.com/download
3. Calibre plugin to remove DRM: https://github.com/noDRM/DeDRM_tools/releases
Source: https://bsky.app/profile/remysharp.com/post/3lihtiq2rqc22
#TechHelp please (calling any #DataHoarder out here):
is there a way I can archive web pages (not the entire website, just single pages) locally en masse, kind of like Adobe Encoder? Just give it a list of links and off it goes?
Bonus points for:
Retaining all of the media therein (like video, images, links, audio...)
Either being able to get around paywalls, or letting me know which pages have them (so, I guess including that info in any error messages)
Being able to omit ads (but I don't care too much about this)
Having some sort of UI (Windows user; extremely uncomfortable with command line-type stuff)
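For anyone who is comfortable running a script, the "give it a list of links and off it goes" part can be sketched in a few lines of Python using only the standard library. This is a rough sketch under stated assumptions, not a finished tool: the names `url_to_filename`, `fetch`, and `archive_pages` are made up for illustration, it saves only the raw HTML of each page (no embedded video, images, or audio), it has no UI, and it does not get around paywalls. It only records which links failed and why, so an HTTP 401 or 403 in the failure list is a hint that a page may be blocked or paywalled.

```python
import hashlib
import re
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def url_to_filename(url: str) -> str:
    """Build a filesystem-safe file name for a URL (hypothetical naming scheme)."""
    parsed = urlparse(url)
    # Collapse every run of non-alphanumeric characters into a single dash.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", parsed.netloc + parsed.path).strip("-")
    # A short hash of the full URL keeps similar links from overwriting each other.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:8]
    return f"{slug[:80]}-{digest}.html"


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download the raw bytes of one page (HTML only, no embedded media)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def archive_pages(urls, out_dir, fetcher=fetch):
    """Save each URL into out_dir and return (saved, failed) lists."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved, failed = [], []
    for url in urls:
        try:
            data = fetcher(url)
        except Exception as exc:
            # Keep going on errors and remember the reason for the report.
            failed.append((url, str(exc)))
            continue
        path = out / url_to_filename(url)
        path.write_bytes(data)
        saved.append((url, path))
    return saved, failed
```

To use it, read the links from a text file (one per line), pass them to `archive_pages(links, "archive")`, and print the `failed` list afterwards to see which pages need a second look.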
I now have my own fully searchable mirror of #Kiwix hosted on my home server, including #Wikipedia. You can check it out at:
Content may change as time goes on since I literally "just" got it set up and working, and you should definitely prioritize the original resources and donate to the folks hosting it. But IF some of these resources become unavailable from their original source, feel free to use my mirror as long as it's up.
Adding the drives makes the fans spin up for about 30 seconds. Ignore the ancient Sun stuff.
#datahoarder #homelab
In case you want to follow along, I'm adding two 24TB drives as a mirror. The existing pool is four 18TB drives as two mirrors. The new pool will be 18 + 18 + 24 as three mirrors.
#datahoarder #homelab
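The capacity math in the post above can be checked with a tiny sketch. It assumes the pool is a stripe of two-way mirrors, where each mirror contributes the size of its smallest drive; the post does not name the filesystem, so that layout and the helper name `usable_tb` are assumptions. The figures are raw marketing terabytes, before any filesystem overhead.

```python
def usable_tb(mirrors):
    """Usable capacity of a pool of striped mirrors, in TB.

    Each mirror is a tuple of drive sizes and contributes only its smallest drive.
    """
    return sum(min(drives) for drives in mirrors)


existing = [(18, 18), (18, 18)]   # four 18TB drives as two mirrors
expanded = existing + [(24, 24)]  # plus two 24TB drives as a third mirror

print(usable_tb(existing))  # 36
print(usable_tb(expanded))  # 60
```

That matches the 18 + 18 + 24 layout described: the new mirror adds 24TB of usable space on top of the existing 36TB.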
My server is full. You know what that means?!
(I add lines to this spreadsheet every time I do a price check. The last actual purchase was 2021)
I'm going to add 2 24TB drives.
EDIT: ARCHIVE.ORG sets have been fixed. Let the downloading commence!
My US government data hoarding page is up and ready with links and torrents. The torrents are all being seeded by my junkbox torrent server. I will continue to add torrents as I download things.
Due to the current data preservation emergency, I'm pretty sure I'm going to find at least 20 terabytes to stuff into an old 8 drive tower that I'm building out of junk box parts and then I will host government data mirrors that the fascists have wiped out. #datahoarder #datapreservation #fascists
And now @leonardr has "released the biggest update to Beautiful Soup in many years." Upgrade to 4.13.1 to enjoy better warnings, type hints, generated API docs, and more.
I usually leave a new laptop sticker-free for a few months. I've had this MacBook Air for over a year and have finally broken it in with a #sticker from @molly0xfff, which arrived last week and is even more timely now than when I ordered it. They are available from https://store.mollywhite.net/collections/stickers
#Archives #DigitalPreservation #DataHoarder
Are you a fellow data hoarder? Have some spare terabytes? Start here:
https://commoncrawl.org/blog/january-2025-crawl-archive-now-available
https://meta.wikimedia.org/wiki/Data_dump_torrents#English_Wikipedia
https://github.com/end-of-term/eot2024
https://github.com/internetarchive/dweb-mirror
https://archive.org/details/20250128-cdc-datasets
#TIL about the Internet History Initiative (@IHI). It's a website that focuses on historically relevant public data sets. As a #datanerd and #datahoarder of #internet data, I appreciate that something like this has spun up.
However, I'm shocked that I hadn't heard of it until now, even though it has been online since January 2024! Will definitely start keeping an eye on it.
Edit: Forgot to link the website: internethistoryinitiative.org
As I'm sure many of you are already aware, the ReproductiveRights.gov website has already vanished.
If you want to save a copy for yourself and others, for now archive.org still has a copy.
http://web.archive.org/web/20250114100235/https%3A%2F%2Freproductiverights%2Egov/
Repeating an important PSA for the 10th time. And now, this is more important than ever. Is your data in the US Cloud? If it is, get it downloaded local. See something neat online? Download that shit.
I've always been a data hoarder. Hence, my Rip Tower 2000 #datahoarder #digitalpreservation https://youtu.be/iQwO5UNRtl0