Tuesday, 6 July 2010

FanFiction.Net story totals

Good news!

Our research venture has completed gathering data about site-wide story numbers. This post explains how many stories FanFiction.Net (FFN) really has.

The data in our evaluations is based on the total number of stories posted as of June 25th, 2010. The gathered data was processed from June 25th to June 30th, 2010. We treat it as a single point-in-time collection: 5 days = 1 instance.

At the time of collection, there was a total of 6,085,534 registered story entries, based on the newest registered story number in the Just In section on June 25th, 2010.[1]

However, we understand that some stories are deleted, and their ID numbers are not returned to the database to be recycled for new stories.[2] Instead, the list carries on, and every newly posted story receives a number higher than the previous one.

(An additional explanation for younger readers: you submit a story, and it gets an ID number in FFN's database so that everyone can easily find it. Let's say your ID is 123. If you know that, you can easily make a link without having to copy anything, because all stories on FFN have an address of the form http://www.fanfiction.net/s/*your story ID*. When someone posts a story right after you, their ID is 124, then 125, 126 and so on. Say the site got to story number 140, but story 128 was deemed illegal because it was about living actors, and the FFN staff deleted it so nobody would sue. What do we have? We have numbers from 1 to 140, but 128 has been deleted. You can't know it has been deleted, by the way, because you're not the writer, and the only way to find out is to check. There are now 139 stories on the site even though it looks like there are 140. Thing is, on a site as big as FFN, you can't just guess how many numbers are 'blank' like that.)
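For readers who prefer code, here is a minimal Python sketch of the toy example above. The names (`highest_id`, `deleted_ids`, `story_url`) are made up for illustration and have nothing to do with how FFN actually stores its stories.

```python
# Toy model of the example: IDs are handed out in order and never reused.
highest_id = 140       # newest story ID on the imaginary site
deleted_ids = {128}    # IDs whose stories have been removed


def story_url(story_id: int) -> str:
    """Build a story link from its ID, using the address pattern quoted above."""
    return f"http://www.fanfiction.net/s/{story_id}"


# Every ID from 1 to highest_id that has not been deleted is a live story.
live_ids = [i for i in range(1, highest_id + 1) if i not in deleted_ids]

print(highest_id)      # what the newest ID suggests: 140
print(len(live_ids))   # what is really there: 139
print(story_url(123))
```

On the real site the set of deleted IDs is unknown, which is exactly why the count has to be measured rather than read off the newest ID.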

This is the main reason for this analysis: the number FanFiction.Net presents to you is not the total number of stories it has at the moment, but the sum of all fanworks that were ever available to the public on it. The key term is 'available to the public' because FFN, according to their ToS, keeps server copies of user submissions. It is reasonable to assume that the real number of stories we can see now (as of June 25th, 2010) is not over 6 million.

We used two methods to get at the data. The first is a count of all stories present in all ten top categories and crossovers such as this. Admittedly, it is a lot of very repetitive and dull labour, but it gives us the exact number, which is: 3,256,278 stories.

As of June 25th, 2010 there are 3,256,278 stories noted as accessible to the public on FanFiction.Net.

This is an accurate number, but it is not 100% exact. Why? We counted at the top category level, without rummaging through every single fandom and opening it like this. Why is this important? The number of stories shown in the top category window is always greater than (or equal to, when the fandom is inactive [has fewer than 50 stories]) the real number of fictions one can browse inside the category. The researchers cannot give you a firm answer on the cause of this discrepancy, but it may be attributed to stories being removed from the count at a slower rate than they are added (for example, if you upload a story by mistake and delete it, you raise the top category's story count, and it stays above the real number even though you can no longer find the story: a server delay).

It has been determined that, depending on the fandom, the real number is from 0.19% to 5% smaller than the one provided. In large categories, whose weight forces the researchers to consider them, this figure sits closer to the first value. Now, it might not seem substantial, but Twilight with its 150,000 uploads may have up to 2,000 dead stories counted as alive every day. To be completely fair to the estimate, we multiply the number by an arbitrary coefficient of 0.987, which best describes the current number of stories, as seen in the ten most popular (by story count) subcategories of Books, Anime and others, except crossovers. Since they make up the trending bulk of FFN, their averages have been used.

Here is a better estimate, statistically no different from the first, but more exact for the human eye: 3,213,946.
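The arithmetic behind that figure is a single multiplication. A short Python sketch, using only the two numbers given in this post (the variable names are ours):

```python
raw_count = 3_256_278   # stories counted across the top categories and crossovers
coefficient = 0.987     # the arbitrary correction factor described above

# Scale the raw count down and round to a whole number of stories.
adjusted_count = round(raw_count * coefficient)

print(adjusted_count)              # 3213946
print(raw_count - adjusted_count)  # stories assumed to be counted but already gone
```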

What does that say to you? FanFiction.Net is only 54%-53% (without/with the 0.987 coefficient) of what it appears to be to the layman, with the remaining 46%-47% being deleted content. As such, you may take it that every second story is destined to be deleted, and out of every two stories you post only one will survive (statistically).
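To check the percentages yourself, divide each story count by the newest story ID. The Python sketch below uses only figures from this post; the variable names are ours. To one decimal place the shares come out a little finer than the whole-number 54% and 53% quoted above.

```python
total_ids = 6_085_534        # newest story ID on June 25th, 2010
raw_count = 3_256_278        # counted stories, no correction
adjusted_count = 3_213_946   # counted stories after the 0.987 coefficient

share_raw = raw_count / total_ids
share_adjusted = adjusted_count / total_ids

print(f"without coefficient: {share_raw:.1%}")       # 53.5%
print(f"with coefficient:    {share_adjusted:.1%}")  # 52.8%
print(f"deleted share:       {1 - share_raw:.1%} to {1 - share_adjusted:.1%}")
```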

What about the second method? Aside from these real numbers taken raw, the research includes a sample of 1,100 randomly generated story IDs in the range [1; 6085534], which allows the research to continue as a case study with a 3% error margin and a 95.34% confidence level. The survivability estimate taken from the sample is 55%, which is within the 3% acceptable error and statistically identical to the 54%-53% obtained from the raw data. For future studies, this means our method of sampling follows the general population's characteristics.
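For the curious, here is a rough Python sketch of how such a sample could be drawn and how the confidence level relates to the sample size and error margin. It rests on assumptions that are not spelled out in this post: that IDs are drawn without repeats, and that the confidence level comes from the usual normal approximation with the worst-case proportion of 0.5. The seed is arbitrary and the names are ours; this is not the actual procedure or the actual list of IDs used in the study.

```python
import math
import random

MAX_ID = 6_085_534   # newest story ID at the time of collection
SAMPLE_SIZE = 1100   # number of IDs in the sample
MARGIN = 0.03        # acceptable error margin (3%)

# Draw distinct story IDs uniformly from [1, MAX_ID]; the seed is arbitrary.
rng = random.Random(2010)
sample_ids = rng.sample(range(1, MAX_ID + 1), SAMPLE_SIZE)

# Worst-case standard error of a proportion (p = 0.5) for this sample size.
std_error = math.sqrt(0.5 * 0.5 / SAMPLE_SIZE)

# z-score that corresponds to the chosen margin, and its two-sided confidence.
z = MARGIN / std_error
confidence = math.erf(z / math.sqrt(2))

print(f"z = {z:.2f}, confidence = {confidence:.2%}")

# The sample estimate (55%) against the raw share of surviving stories.
raw_share = 3_256_278 / MAX_ID
print(abs(0.55 - raw_share) <= MARGIN)
```

Under those assumptions the confidence level lands very close to the 95.34% quoted above, and the 55% sample estimate sits inside the 3% margin around the raw share.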

In conclusion, there are 3,213,946 stories on FanFiction.Net at the time of our study, and nearly half of all stories posted will sooner or later disappear. How soon? Come back later to find out!

Should you require additional data, requests can be made in the comments, emailed to Lord Kelvin or posted in the Literate Union forum. The list used in our sample can be found here:
http://www.usbupload.com/23228_FFNstatsdatadoc.usb
http://www.usbupload.com/23227_samplelinksFFNdoc.usb

2 comments:

  1. Wow. I'm shocked — I would've never guessed that that many fics have been deleted.
    I think a good question here is: How can half of the stories ever posted to FFN be deleted and yet quality control still is by far the largest issue FFN has? It's kind of absurd, really.

  2. Just goes to show you how bad the problem is. The ones that are still there must be the tip of the iceberg when it comes to quality...

    That, or a lot of people (tried to) write for Twilight.
