Archiving like a boss! (UPDATE)

Journal Entry: Mon Mar 25, 2013, 8:37 AM
Update: Current as of 8/29/13. New download link: docs.google.com/file/d/0B38kVw…

Most of the description below still applies. The download is now in 7z format (www.7-zip.org/) to avoid breaking some poor computer that tries to open it with the clunky Windows zipper.

Addendum: This time, the archive rip consists of 83,784 files. No, they haven't written quite that many new fics in six months; I just didn't prune the smallest entries this time. Previously, I deleted everything below 3 KB. These small files are either fics where no more than the title has been written, EPUBs that are broken for one reason or another (I suspect they're the very first stage of pre-publishing), or a few empty placeholders. Whatever they are, you can safely ignore them unless you're specifically looking for one and you know it's supposed to be complete, in which case you can download a replacement manually. I kept everything so I'd have an exhaustive title reference while I work through the five-stars and four-stars, and so I don't run into anything that is completely missing. It's a bit less work now, since these come *almost* fully tagged, but I still have to gather them manually.
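
For anyone who wants to repeat the pruning step on their own copy, here is a minimal Python sketch that lists the EPUBs below the size cutoff. It is not the method used for this rip. The function name `find_small_epubs` is made up for illustration, and reading "3 KB" as 3 × 1024 bytes is an assumption.

```python
from pathlib import Path

# ASSUMPTION: "3 KB" in the journal is taken to mean 3 * 1024 bytes.
SIZE_THRESHOLD = 3 * 1024


def find_small_epubs(rip_dir, threshold=SIZE_THRESHOLD):
    """Return the .epub files under rip_dir that are smaller than threshold bytes."""
    return sorted(
        path
        for path in Path(rip_dir).rglob("*.epub")
        if path.is_file() and path.stat().st_size < threshold
    )
```

Calling `find_small_epubs("rip")` on an unpacked copy (the folder name is hypothetical) returns the candidates as a sorted list of paths, so they can be reviewed before anything is deleted.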

Also, remember... This has EVERYTHING on Fimfiction. Use caution unless you want to be scarred for life.

So I've managed to pull a copy of (as far as I can tell) every single story on FimFiction in EPUB format. That's forty thousand files! Even with the best download manager I could find, using FimFiction's sequential numbering scheme, queueing 40,000 files was almost more than my poor computer could handle.

It's a raw rip, so the file names are a bit messy (for example, "its-always-sunny-in-fillydelphia-story=20936.epub"), though the Title and Author tags, at least, are intact. It weighs 1.13 GB, is current as of February 9, and may take up to 20 minutes to decompress due to the sheer number of files. If you are feeling adventurous, or are a fellow archive warrior or fanfic connoisseur, here's the download link: docs.google.com/file/d/0B38kVw… (Warning: this has EVERY fanfic on FimFiction, including the ones I'd rather not think about, and the ones that aren't much more than a title.)

I'm also in the process of making a much more organized and fully tagged Kindle Collection using only the fanfics that were good enough to make it to Equestria Daily. I am tagging each one, including its description, completely by hand, so it's obviously taking a while and isn't ready for release yet. After 5 months and 900 fanfics, I've finished tagging the entirety of Nallar's Collection (nallar.me/fics/?order=length). Since that collection only contains fics that are hosted on Google Docs, I still have more work to do to get the others, hence the crazy FimFiction rip above. If that number on EqD is right, I have about 1500 more to go.

Obviously this endeavor will never really be complete, since new fics are being written all the time, but if I get enough interest, I will release the collection as it currently exists on my Kindle (just the stuff from Nallar).

(You may also be interested in my BGM and Everfree collections. fav.me/d5vrqn4)

(In case you were wondering, yes, I am indeed completely insane.)

  • Mood: Dumbfounded
  • Listening to: Everfree Radio
  • Reading: Twilight October
  • Watching: MLP-FIM (as usual)
  • Playing: Half Life 2
  • Eating: Hay fries
  • Drinking: Zap Apple Cider

Darkfur18, Mar 23, 2014
I have every story of any importance (that is, anything with at least 1 chapter) archived from FIMfiction, plus story data and images, ready to be hosted on an offline server.
It's a hefty 3.5 gigs.

Lahirien (Hobbyist General Artist), Mar 23, 2014
When is it from?

Darkfur18, Mar 23, 2014
I built a Linux program that does it automatically. It goes through the website like a boss.

Lahirien (Hobbyist General Artist), Mar 23, 2014
Oh awesome! You sound like Nallar! nallar.me/fics/?order=length

I'm into archiving, so what I need is a script that can download all the EPUBs pointed at by download_epub.php?story=[1-nnnnnn] (not just the PHP pages, like wget seems to do), ignore the HTML files that mean there's no story for that number, and then run as a daily cron job to top itself off. Is this something that can be done in a Linux environment?

Also, where are all the views coming from on this old journal?
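
The script asked for above could be sketched roughly as follows in Python, using only the standard library. This is not Darkfur18's program, just an illustration of the idea under stated assumptions. Only `download_epub.php?story=N` appears in the thread, so the host in `URL_TEMPLATE` is a guess. The names `top_up`, `fetch`, and `looks_like_epub` are made up for this example. The "is it really an EPUB?" check relies on EPUB files being ZIP archives, which start with the bytes `PK`.

```python
import time
import urllib.error
import urllib.request
from pathlib import Path

# ASSUMPTION: the thread only gives "download_epub.php?story=N"; the scheme and
# host below are guesses and may not match the live site.
URL_TEMPLATE = "https://www.fimfiction.net/download_epub.php?story={story_id}"


def looks_like_epub(data):
    """EPUB files are ZIP archives, and ZIP data starts with the bytes 'PK'."""
    return data[:2] == b"PK"


def fetch(url, timeout=30):
    """Return the response body as bytes, or None on an HTTP or network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (urllib.error.URLError, OSError):
        return None


def top_up(out_dir, first_id, last_id, fetch=fetch, delay=1.0):
    """Download stories first_id..last_id that are not already in out_dir.

    Returns the list of story IDs saved on this run.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for story_id in range(first_id, last_id + 1):
        target = out / f"story={story_id}.epub"
        if target.exists():
            continue  # already archived on an earlier run
        data = fetch(URL_TEMPLATE.format(story_id=story_id))
        if data and looks_like_epub(data):
            target.write_bytes(data)
            saved.append(story_id)
        # Anything else (an HTML "no such story" page, an empty body) is skipped.
        time.sleep(delay)  # pause between requests to go easy on the server
    return saved
```

Because existing files are skipped, running this daily only fetches IDs that are still missing, which is the "top itself off" behaviour. It will not notice stories that were edited after they were first saved. To schedule it, a small wrapper script that calls `top_up(...)` could be run from a crontab line such as `0 3 * * * python3 /path/to/topup.py` (the path and file name are hypothetical).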

Darkfur18, Mar 23, 2014
First, yes. With a little modification my script will do exactly what you want.
Second, the Library of Equestria thread on /mlp/.

Lahirien (Hobbyist General Artist), Mar 23, 2014
Would it be possible for me to get a copy of your script for personal use? I have a little Linux box and NAS I've been playing around with that I think would be perfect for that. I can probably do the modifications myself, it's just getting something working from scratch that I never seem to have time for (thus my having to resort to DownloadStudio).

Darkfur18, Mar 23, 2014
Here: www.dropbox.com/s/c7dqw2mthcm9…

Open a terminal in the folder containing the files and type ./miner to run it.
Super simple 2-question prompt!

Check the .epub files before you really get started because I don't have an epub reader, and tell me how it works.

Darkfur18, Mar 23, 2014
Whoops, wrong one.
Here: www.dropbox.com/s/c7dqw2mthcm9…

Shadowflares, Mar 23, 2014
God tier archiving 

sevensix1, Mar 22, 2014
Did you check for fics that were edited or deleted between the two rips?
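
One way such a check could be done, sketched in Python: index each rip by the story number at the end of the file name (as in "its-always-sunny-in-fillydelphia-story=20936.epub") and compare content hashes. The function names and the idea of keeping both rips in separate folders are assumptions for illustration, not something described in the journal. A byte-level difference does not prove the story text changed, since a regenerated EPUB may differ in metadata only.

```python
import hashlib
import re
from pathlib import Path

# Matches the numeric ID in names like "some-title-story=20936.epub".
STORY_ID = re.compile(r"story=(\d+)\.epub$")


def index_rip(rip_dir):
    """Map story ID -> SHA-256 hex digest for every matching file under rip_dir."""
    index = {}
    for path in Path(rip_dir).rglob("*.epub"):
        match = STORY_ID.search(path.name)
        if match:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            index[int(match.group(1))] = digest
    return index


def compare_rips(old_dir, new_dir):
    """Return (missing, changed, added) lists of story IDs between two rips."""
    old, new = index_rip(old_dir), index_rip(new_dir)
    missing = sorted(old.keys() - new.keys())  # in the old rip only
    added = sorted(new.keys() - old.keys())    # in the new rip only
    changed = sorted(i for i in old.keys() & new.keys() if old[i] != new[i])
    return missing, changed, added
```

IDs in `missing` are candidates for stories deleted between the two rips, and IDs in `changed` are candidates for edits. Matching is by ID rather than file name, so a retitled story still lines up.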