GIF vs. WebM on 4chan – Part 2: Method & Quantitative Analysis

This article is part of a series of three that presents the outcome of a study of GIF and WebM usage on 4chan. All data and graphs that the following text refers to can be found in part one of the series.

PART 1: The Data
PART 2: Quantitative Analysis & Method (this article)
PART 3: Qualitative Analysis (will follow)

 

Introduction

The image board 4chan introduced WebM support in April 2014. For the first time, video files could be attached to posts on what is otherwise an image board. Certain limitations, however, made WebMs behave much like GIFs: files may not be larger than 3 MB or 2048×2048 pixels, nor longer than 120 seconds. Initially they also could not contain sound. That changed at the end of January 2015, when 4chan began to allow WebMs with sound.

Moot, the former head of 4chan, announced the new WebM support and its restrictions in a blog post in April 2014. He stated that the limitations were set because the new file format was meant to provide better animations than GIF, not actual videos. Given this intention, it is an obvious step to examine how WebM and GIF are used alongside each other. Two boards provide a good field for such a survey: /gif and /wsg. The first was founded in 2005 as a place to share any GIF, but it was used mainly for a growing number of images that are not safe for work. Consequently, /wsg was opened in June 2012 as a place for worksafe GIFs, as moot himself explained in the first post on that board: “Board title says it all. Rules: 1. Post GIFs. 2. Keep it worksafe.” Until then, /gif had been the only board focused on exchanging animated images and had no regulations regarding image content. As rumour has it, an alternative board without explicit content was demanded because of the growing share of pornographic images on /gif.

So, 4chan provides good conditions for a comparison of GIF and WebM usage for two reasons. First, the limitations of WebM made these files very similar to GIFs on the image boards, at least from April 2014 to January 2015. Second, two different use cases (pornographic and non-pornographic) appear in separate but almost identical environments. The design of the two boards /gif and /wsg is basically the same, as are the board rules, except for the “worksafe” rule. Even a short first observation of the two boards gives the impression that /gif is indeed mainly used to share pornographic material and is also more frequented than /wsg. These two assumptions lead to the hypotheses that H1) significantly more files are shared on /gif than on /wsg and H2) the percentage of WebM files is higher on /gif than on /wsg. The second hypothesis derives from the assumption that users prefer to watch pornographic content in high image quality, for which WebM is better suited than GIF. Besides those specific points of interest, the general intention of this study was to analyse the overall development of GIF and WebM posting behaviour on the /gif and /wsg boards of 4chan.

This study was conducted in two steps. First, data was collected for the years 2013 and 2014, and the results were presented at the 31st annual Chaos Communication Congress in Hamburg. In fact, this presentation was the reason for starting the survey in the first place. The first step focussed on the development of file usage since the introduction of WebM in April 2014, compared with the time before it. Because archive data for the /gif board was lacking, the survey was continued after the presentation to gather a broader database. This second step lasted from December 2014 to September 2015 and is summarised in this article. During this period (on 31st January, to be precise), 4chan decided to allow WebMs with sound on its boards. The effects of this change are the focus of the second step.

Method and Database

Although this study depends on rather basic data, the architecture of 4chan makes it difficult to extract the relevant information. The image board is designed to keep only a certain number of submissions and comments. In a way, the site offers built-in obsolescence as a feature. Older threads (lists of comments that all refer to one first submission, the “original post”) are no longer available once the thread limit of the board is exceeded. How long a thread stays on the board depends on the activity of the board and of the thread itself. A post on a highly frequented board might disappear within minutes, while it may stay for months or years on less active boards. Under these circumstances, data for the current point in time could be collected directly on 4chan, but not for a period that reaches further into the past. Yet exactly this is the main interest of this study.

Fortunately, there are user-maintained archives of several 4chan boards that contain threads that are months or years old. These archives are based on the “Asagi” framework, which regularly gathers all new posts from 4chan and stores a copy of them on external archive servers. A variety of these archives exist, but each of them covers only certain boards. This study uses data from imcute.yt and archive.moe, which hosted archives of the /gif and /wsg boards. These archives provided a very good database for this study, but some aspects must be kept in mind. They are not complete mirrors of 4chan. Technically, there is no restriction that would leave out certain posts while archiving a board. But of course the archives only contain data from the day they began collecting it, which was after 4chan was founded, so very old threads cannot be accessed this way. Also, if a copyright owner demands that certain images or other content be taken offline (DMCA takedown), they are deleted from the archives. Some of the archives publish lists of these requests. Fortunately, deleted content is vastly outnumbered by the entries that are still available, so it is barely noticeable in the sheer mass of data. Another problem was that both of the mentioned archives went offline during the study; more details on that follow later.

Only very specific data points are of interest for this survey. Image files and comment texts were not collected. The following information was extracted for each post: 1) whether it contains a GIF or a WebM, 2) when it was posted (yy-mm for the first phase of the study, yy-mm-dd for the second phase), 3) the size of the file in bytes and 4) the name of the file. Points 1 and 2 are essential for the survey, while 3 and 4 are additional data for possible future research.
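The four data points can be pictured as one small record per attached file. The following is only a minimal sketch and not the script used in the study: the names `FileRecord` and `classify` and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class FileRecord:
    """One attached file, reduced to the four data points of the survey."""
    kind: str         # 1) "gif" or "webm"
    posted: date      # 2) posting date (daily resolution)
    size_bytes: int   # 3) file size in bytes
    filename: str     # 4) file name


def classify(filename: str) -> Optional[str]:
    """Return "gif" or "webm" based on the file extension, else None."""
    suffix = PurePosixPath(filename).suffix.lower()
    return {".gif": "gif", ".webm": "webm"}.get(suffix)


# Hypothetical example values, not taken from the collected data.
record = FileRecord(
    kind=classify("example.webm"),
    posted=date(2015, 1, 31),
    size_bytes=2_500_000,
    filename="example.webm",
)
print(record)
```

For the first phase, which only used monthly resolution, the day of `posted` would simply be ignored.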

These data points were collected with a Python script written for this very purpose. First, it scans the indexes of the /gif and /wsg archives for the URLs of every thread in the selected time period and thereby creates a link list for the next step. These links are then opened one after another, and the script searches the HTML of every thread for the four data points mentioned above and collects them in Excel tables. In the last step, the data was merged into an overview table that organises the values by month (first step of the study, June 2012 to December 2014) and by day (second step, December 2014 to August 2015). This procedure was necessary because, upon request, the maintainers of the archives either could not provide these statistics or did not respond at all.
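The parsing step of such a script could look roughly like the sketch below. It is not the original script: the sample markup, the class `FileLinkParser` and the function `extract_file_links` are assumptions, and the real archive pages may well be structured differently. The example works on an inline HTML string, so it runs without network access.

```python
from html.parser import HTMLParser

# Hypothetical thread markup; the real archive pages may look different.
SAMPLE_THREAD_HTML = """
<div class="post"><a href="/files/1413/cat.gif">cat.gif</a></div>
<div class="post"><a href="/files/1413/clip.webm">clip.webm</a></div>
<div class="post"><a href="/files/1413/photo.jpg">photo.jpg</a></div>
<div class="post"><a href="/files/1413/cat.gif">view same</a></div>
"""


class FileLinkParser(HTMLParser):
    """Collects href values of <a> tags that point to GIF or WebM files."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # Attribute values can be None, so fall back to an empty string.
        href = dict(attrs).get("href") or ""
        if href.lower().endswith((".gif", ".webm")):
            self.links.append(href)


def extract_file_links(html):
    """Return the GIF/WebM links of one thread page, without repeats."""
    parser = FileLinkParser()
    parser.feed(html)
    parser.close()
    # dict.fromkeys drops repeated links while keeping first-seen order.
    return list(dict.fromkeys(parser.links))


links = extract_file_links(SAMPLE_THREAD_HTML)
print(links)
```

Removing repeated links assumes that a page may link the same file more than once, for example from a thumbnail and from the file name. Whether that applies depends on the archive.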

After the outcome of the first phase of the study was presented at the end of December 2014, it became obvious that the further development needed to be examined by continuing the survey and building up a larger database. This made it possible to check whether the trends that had already shown up during the first period would continue or change in the following months. Also, to provide a more detailed data set, the data was no longer compared on a monthly basis but with daily data points. Thus, it was possible to analyse whether and how the number of shared GIF/WebM files correlates with certain 4chan-related events such as server downtime or changes in regulations.
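Turning daily data points into one value per month, such as a median number of files per day, could be done roughly as follows. This is a sketch under assumptions: the function names and the sample posts are invented, and days without any file of a given kind are simply absent here rather than counted as zero.

```python
from collections import Counter, defaultdict
from datetime import date
from statistics import median


def daily_counts(posts):
    """Count files per (day, kind) from an iterable of (date, kind) pairs."""
    counts = Counter()
    for day, kind in posts:
        counts[(day, kind)] += 1
    return counts


def monthly_median(counts, kind):
    """Median of the daily counts of one file kind, grouped by (year, month)."""
    per_month = defaultdict(list)
    for (day, file_kind), n in counts.items():
        if file_kind == kind:
            per_month[(day.year, day.month)].append(n)
    return {month: median(values) for month, values in per_month.items()}


# Invented sample posts, not taken from the collected data.
posts = [
    (date(2015, 1, 1), "webm"), (date(2015, 1, 1), "webm"),
    (date(2015, 1, 1), "gif"),
    (date(2015, 1, 2), "webm"), (date(2015, 1, 2), "webm"),
    (date(2015, 1, 2), "webm"), (date(2015, 1, 2), "webm"),
    (date(2015, 1, 3), "webm"), (date(2015, 1, 3), "webm"),
    (date(2015, 1, 3), "webm"),
]

counts = daily_counts(posts)
print(monthly_median(counts, "webm"))
print(monthly_median(counts, "gif"))
```

Keeping the daily counts also makes it possible to look at single days, for instance the day of a rule change, before they are summarised per month.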

Data collection for the first part of the study was conducted on 26th December 2014 for the whole period from January 2013 up to that date. Data for the second part was collected separately for each month, during the first days of the following month. In addition, the December data was collected again at the beginning of January, both to include the last days of the year and to obtain daily data points instead of a single monthly value. Going further into the past, the period from June 2012 (when /wsg was opened) to December 2012 was also included, though only for /wsg and with one data point per month. “Asagi” archives for /gif that would go back that far had already been deleted by then.

An exception to this procedure was March. As mentioned in the beginning, this study is a leisure-time project, and for several reasons there was no time for data collection in April, so the data for both March and April was collected at the beginning of May. Unfortunately, imcute.yt, which delivered the data for the /gif board, went offline sometime in April. Therefore there is no data for March and parts of April. From the middle of April, archive.moe, which had previously been used only as the data source for the /wsg board, also provided archived threads from the /gif board, so from then on the data is based completely on that archive. Even after intensive research into every other 4chan archive that could be found, none of them provided an archived /gif board. (Comments or recommendations on that issue are much appreciated, in case someone knows another archive or can provide a private mirror of the March/April archives.)

imcute.yt was not the first 4chan archive to close its doors. Before it, there were 4chandata.org and 4chanarchive.net, for example. Their websites are still online but no longer active, and most of the content has been deleted. These alternatives were taken into consideration because imcute.yt did not cover the time before October 2014, so the database for the /gif board is limited to only a few months. Eventually, even archive.moe was taken offline because of technical problems and data loss: “effective immediately we are shutting down archive.moe”. This occurred in early September 2015 and marked the preliminary end point of this study. Meanwhile, another Asagi-based archive has been founded. It is called Desustorage and aims to provide more stable and sustainable access to the archives.

Collected from about 30000 threads, the database consists of the same number of Excel tables. 86275 GIFs and 133469 WebMs were inspected in total, 64855/98262 of them on /gif and 21420/38207 of them on /wsg. The /wsg data covers the period from June 2012 until August 2015, the /gif data the period from October 2014 to February 2015 and, after one and a half months of lost data, from the middle of April until August 2015.

Findings (Quantitative Perspective)

For reasons already outlined, the data covers different time periods for the two image boards. Thus, the development of file usage is first examined separately for each board and then compared.

For /wsg, the graphs show several phases over the last years in which the number of posted images rose and fell. When the board was opened in June 2012, it had its highest activity, with an all-time high of 498.5 GIFs per day (median). In the following months, this early enthusiasm fell to a lower level of roughly 150 GIFs per day. That amount was relatively stable for three months before the board entered a period of higher activity in the first half of 2013, peaking at 287.7 GIFs per day in June. Beginning in August of that year, the amount decreased rapidly until the end of the year. In November, the median daily amount fell below the mark of 100 GIFs for the first time. 2014 started with a small peak, but beginning in March the graph of the number of GIFs enters a phase of ongoing decrease. Meanwhile, WebM entered as a new player in April 2014, but was at first used rather sparingly, with about 35 WebMs per day for about nine months. This changed when WebM sound support was introduced on 4chan in January 2015. The effect was visible immediately: the huge number of WebMs in the week after the announcement of the new feature shows up as a large peak in the graph, with 1374 WebMs on 31st January alone. From that date on, the number of WebMs stayed above the number of GIFs. While the number of GIFs continues to decline slowly, WebM usage is growing steadily.

The data for /gif does not provide any insights into the time before October 2014. Thus, it is not possible to examine the changes on this board when WebM was introduced on 4chan in April 2014. In contrast to /wsg, the share of WebMs was already relatively high in late 2014, and on several days in December it was even higher than the share of GIFs. The monthly median, however, was lower than for GIFs. In October this gap was still quite obvious, with 335.5 GIFs and 260.2 WebMs per day. In January 2015, WebMs outnumbered GIFs over a whole month for the first time, but only by a small margin: 343.9 GIFs and 354.4 WebMs per day (345.1 if the peak on 31st January is not included). Then a second phase began. Until then, the GIF and WebM graphs had run at about the same level, but beginning in February they went in opposite directions. Unfortunately, data for March and April could not be recovered from the 4chan archives, resulting in a gap in the graphs during that time. Nonetheless, the later development indicates a very clear trend: the number of GIFs on /gif is falling, while the number of WebMs keeps rising. In August, the median daily amount of 266.2 GIFs was already less than half of the 653.9 WebMs.

The most notable difference between the /gif and the /wsg graphs is their time span. Thus, a comparison of how WebM made its first appearance in April 2014 is not possible. However, the absolute numbers of GIF and WebM files on both boards provide a good enough basis for discussing the hypotheses. As the graphs clearly show, the number of files posted on /gif is of a much higher magnitude than on /wsg. The median number of files per day for each month makes the different dimensions clear. In 2015, this value changed on /gif from 343.9 GIFs and 354.4 WebMs in January to 266.2 GIFs and 653.9 WebMs in August. During the same period on /wsg, it changed from 120.6 GIFs and 83.3 WebMs in January to 58.9 GIFs and 221.3 WebMs in August. Even the rising number of WebMs on /wsg had not yet caught up with the constantly decreasing number of GIFs on /gif, which illustrates the different levels of activity on these two boards.
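The per-day values quoted above can also be expressed as a WebM share per board and month. The sketch below uses only the four pairs of numbers from this paragraph; the names `MEDIAN_FILES_PER_DAY` and `webm_share` are assumptions. Note that a share calculated from two monthly medians is only an approximation and not the same as a share calculated from all individual files.

```python
# (GIFs per day, WebMs per day) as quoted in the text for 2015.
MEDIAN_FILES_PER_DAY = {
    ("/gif", "2015-01"): (343.9, 354.4),
    ("/gif", "2015-08"): (266.2, 653.9),
    ("/wsg", "2015-01"): (120.6, 83.3),
    ("/wsg", "2015-08"): (58.9, 221.3),
}


def webm_share(gifs_per_day, webms_per_day):
    """Fraction of WebMs among all GIF and WebM files per day."""
    return webms_per_day / (gifs_per_day + webms_per_day)


shares = {
    key: webm_share(gifs, webms)
    for key, (gifs, webms) in MEDIAN_FILES_PER_DAY.items()
}

for (board, month), share in shares.items():
    print(f"{board} {month}: {share:.1%} WebM")
```

Looking at shares next to absolute numbers separates the two questions of the hypotheses: how much is posted on a board, and in which format.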

[To be continued in the third part of this article series, with some qualitative analysis and the summary.]
