GIF library updated

The GIF library on this blog is up to date again. Besides adding several research papers and links, I reorganised the list of articles a bit. Academic papers are now organised in sub-categories:

  • GIFs and Film Studies / Pre-Cinéma
  • Emotions and self-expression in GIFs
  • GIF user behaviour / appropriation / Memes
  • Technical Aspects / Hacking GIF files
  • Other Research Papers

Enjoy!


GIF vs. WebM on 4chan – Part 2: Method & Quantitative Analysis

This is a series of three articles that present the outcome of a study of GIF and WebM usage on 4chan. All data and graphs that the following text refers to can be found in part one of the series.

PART 1: The Data
PART 2: Quantitative Analysis & Method (this article)
PART 3: Qualitative Analysis (will follow)

 

Introduction

The image board 4chan introduced WebM support in April 2014. This was the first time that video files could be attached to posts on what is, after all, an image board. However, certain limitations made WebMs behave in a GIF-like way: they could not be larger than 3 MB or 2048×2048 pixels, or longer than 120 seconds. They also had no sound. At least they didn't for a few months: since February 2015, 4chan has also allowed WebMs with sound.
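The restrictions listed above amount to a simple rule set. The following minimal sketch checks a file against them; the `WebM` class and `is_allowed` function are hypothetical illustrations, not 4chan's code. Reading "3 MB" as 3 × 1024 × 1024 bytes is an assumption, and so is modelling the sound rule as outright rejection of files with an audio track.

```python
from dataclasses import dataclass

# Limits as listed in the text. Reading "3 MB" as 3 * 1024 * 1024 bytes is an assumption.
MAX_BYTES = 3 * 1024 * 1024
MAX_SIDE_PX = 2048
MAX_SECONDS = 120


@dataclass
class WebM:
    """Hypothetical description of an uploaded WebM file."""
    size_bytes: int
    width: int
    height: int
    duration_s: float
    has_audio: bool


def is_allowed(f: WebM, sound_allowed: bool) -> bool:
    """Check a file against the size, resolution and length limits, plus the audio rule."""
    within_limits = (
        f.size_bytes <= MAX_BYTES
        and f.width <= MAX_SIDE_PX
        and f.height <= MAX_SIDE_PX
        and f.duration_s <= MAX_SECONDS
    )
    # Assumption: before sound support, a file with an audio track was rejected.
    return within_limits and (sound_allowed or not f.has_audio)


clip = WebM(size_bytes=2_500_000, width=1280, height=720, duration_s=45.0, has_audio=True)
print(is_allowed(clip, sound_allowed=False))  # False: audio not yet permitted
print(is_allowed(clip, sound_allowed=True))   # True
```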

moot (the former head of 4chan) announced the newly added WebM support and its restrictions in a blog post in April 2014. He stated that the limitations were set because the new file format was meant to provide better animation than GIFs, not actual videos. Given this intention, it is an obvious idea to examine how WebM and GIF are used alongside each other. Two boards provide a good field for a survey like this: /gif and /wsg. The first was founded in 2005 as a place to share any GIF, though it was increasingly used for images that are not safe for work. Consequently, /wsg was opened in June 2012 as a place for worksafe GIFs, as moot himself explained in the first post on that board: “Board title says it all. Rules: 1. Post GIFs. 2. Keep it worksafe.” At that time, /gif was the only board focused on exchanging animated images, and it had no regulations regarding image content. Rumour has it that an alternative board without explicit content was demanded because of the growing percentage of pornographic images on /gif.

4chan therefore offers good conditions for comparing GIF and WebM usage, for two reasons. First, the limitations on WebM made these files very similar to GIFs on the image boards, at least from April 2014 to January 2015. Second, two different use cases (pornographic and non-pornographic) appear in separate but almost identical environments: the design of /gif and /wsg is basically the same, as are the board rules, except for the “worksafe” rule. Even a short initial observation of the two image boards gives the impression that /gif is indeed mainly used to share pornographic material and is also more frequented than /wsg. These two assumptions lead to the hypotheses that file format usage differs between the boards in that H1) significantly more files are shared on /gif than on /wsg and H2) the percentage of WebM files is higher on /gif than on /wsg. The second hypothesis derives from the assumption that users prefer to watch pornographic content in high image quality, for which WebM is better suited than GIF. Beyond these specific points of interest, the general aim of this study was to analyse the overall development of GIF and WebM posting behaviour on the /gif and /wsg boards of 4chan.

This study was conducted in two steps. First, data was collected for the years 2013 and 2014, and the results were presented at the 31st Chaos Communication Congress in Hamburg. In fact, this presentation was the reason for starting the survey in the first place. This first step focused on how file usage behaviour developed after the introduction of WebM in April 2014, compared with the period before. Due to a lack of archive data for the /gif board, the survey was continued after the presentation to gather a broader database. This second step lasted from December 2014 to September 2015 and is summarised in this article. During this period (on 31st January 2015, to be precise), 4chan decided to allow WebMs with sound on its boards. The effects of this change are the focus of the second step of the survey.

Method and Database

Although this study relies on rather basic data, the architecture of 4chan makes it difficult to extract the relevant information. The image board is designed to retain only a certain number of submissions and comments. In a way, the site offers built-in obsolescence as a feature. Older threads (lists of comments that all refer to one first submission, the “original post”) are no longer available once the board's thread limit is exceeded. How long a thread stays on the board depends on the activity of the board and of the thread itself. A post on a highly frequented board might disappear within minutes, while it may stay for months or years on less active boards. Under these circumstances, data for the current point in time could be collected directly from 4chan, but not for a period reaching further into the past. Yet this is exactly the main interest of this study.
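The pruning behaviour can be illustrated with a toy model. This sketch assumes a board that keeps at most a fixed number of threads, lets every reply bump its thread to the top, and prunes the least recently bumped thread when a new thread pushes the board over its limit. The `Board` class and the limit of three threads are hypothetical simplifications, not 4chan's actual implementation.

```python
from collections import OrderedDict


class Board:
    """Toy board keeping at most `max_threads` live threads (simplified, hypothetical rules)."""

    def __init__(self, max_threads: int) -> None:
        self.max_threads = max_threads
        # thread id -> reply count; the last entry is the most recently bumped thread
        self.threads: "OrderedDict[int, int]" = OrderedDict()

    def new_thread(self, thread_id: int) -> list:
        """Add a thread and return the ids of threads pruned to stay within the limit."""
        self.threads[thread_id] = 0
        pruned = []
        while len(self.threads) > self.max_threads:
            oldest_id, _ = self.threads.popitem(last=False)  # least recently bumped
            pruned.append(oldest_id)
        return pruned

    def reply(self, thread_id: int) -> None:
        """Count a reply and bump the thread to the top of the board."""
        self.threads[thread_id] += 1
        self.threads.move_to_end(thread_id)


board = Board(max_threads=3)
for t in (1, 2, 3):
    board.new_thread(t)
board.reply(1)               # thread 1 is bumped, so thread 2 is now the oldest
print(board.new_thread(4))   # [2]
print(list(board.threads))   # [3, 1, 4]
```

In this model, a busy board (many new threads) prunes quickly, while an active thread (many replies) survives longer, matching the two factors named above.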

Fortunately, there are user-maintained archives of several 4chan boards that contain threads that are months or even years old. These archives are based on the “Asagi” framework, which regularly and automatically gathers all new posts from 4chan and stores a copy of them on external archive servers. Several such archives exist, but each of them only covers certain boards. This study uses data from imcute.yt and archive.moe, which hosted archives of the /gif and /wsg boards. These archives provided a very good database for this study, but some aspects must nonetheless be kept in mind. The archives are not complete mirrors of 4chan. Technically, there is no restriction that would leave out certain posts while a board is archived. But of course, the archives only contain data from the day they began collecting it, which was after 4chan was founded, so very old threads cannot be accessed this way. Also, if a copyright owner demands that certain images or other content be taken offline (DMCA takedown), they are deleted from the archives. Some of the archives publish lists of these requests. Fortunately, the deleted items are far outnumbered by the entries that are still available, so they are barely noticeable in the sheer mass of data. Another problem was that both of the archives mentioned went offline during the study; more details on that later.
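The caveats above follow from how such an archive accumulates data. The following is a deliberately simplified model, not Asagi's actual code: each poll copies posts the archive has not seen yet, so posts survive after the live board prunes them, nothing from before the first poll is ever captured, and a takedown removes an entry for good. The function names and the post field `"no"` are hypothetical.

```python
def archive_poll(archive: dict, live_posts: list) -> int:
    """Copy every post not yet in the archive; return the number of new posts.

    Posts are keyed by a post number stored under the hypothetical key "no".
    """
    new_posts = 0
    for post in live_posts:
        if post["no"] not in archive:
            archive[post["no"]] = dict(post)  # store a copy, independent of the live board
            new_posts += 1
    return new_posts


def takedown(archive: dict, post_no: int) -> None:
    """Remove a post from the archive, e.g. after a DMCA request."""
    archive.pop(post_no, None)


archive: dict = {}
archive_poll(archive, [{"no": 1, "file": "a.gif"}, {"no": 2, "file": "b.webm"}])
# Later poll: post 1 has been pruned from the live board, post 3 is new.
archive_poll(archive, [{"no": 2, "file": "b.webm"}, {"no": 3, "file": "c.gif"}])
print(sorted(archive))  # [1, 2, 3]
takedown(archive, 2)
print(sorted(archive))  # [1, 3]
```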

Only very specific data points are of interest for this survey; neither image files nor comment texts were collected. The only information extracted was: 1) whether a post contains a GIF or a WebM, 2) the time and date it was posted (yy-mm for the first phase of the study, yy-mm-dd for the second phase), 3) the size of the file (in bytes) and 4) the name of the file. Points 1 and 2 are essential for the survey, while 3 and 4 are additional data for possible further research in the future.
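The four data points and the two levels of date granularity can be expressed as a small record type. This is a minimal sketch: `FileRecord` and `period_key` are hypothetical names, not taken from the original script.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FileRecord:
    """The four data points kept per post (field names are illustrative)."""
    file_type: str       # 1) "gif" or "webm"
    posted_at: datetime  # 2) date and time of the post
    size_bytes: int      # 3) file size in bytes
    filename: str        # 4) file name


def period_key(rec: FileRecord, phase: int) -> str:
    """Return the bucket a record is counted in: yy-mm in phase 1, yy-mm-dd in phase 2."""
    fmt = "%y-%m" if phase == 1 else "%y-%m-%d"
    return rec.posted_at.strftime(fmt)


rec = FileRecord("webm", datetime(2015, 1, 31, 14, 5), 2_048_576, "example.webm")
print(period_key(rec, phase=1))  # 15-01
print(period_key(rec, phase=2))  # 15-01-31
```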

These data points were collected using a Python script written for this very purpose. First, it scans the indexes of the /gif and /wsg archives for the URLs of every thread in the selected time period, thereby creating a link list for the rest of the procedure. These links are then opened one after another, and the script searches the HTML of every thread for the four data points mentioned above and collects them in Excel tables. In the last step, the data was merged into an overview table that organises the values for each month (first step of the study, June 2012 to December 2014) and each day (second step, from December 2014 to August 2015). This procedure was necessary because, when asked, the archive maintainers either could not provide these statistics or did not respond at all.
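The parsing and counting steps of such a script might look roughly like the following. This is not the original script: the HTML markup it expects (a `data-time` attribute on each post, a `file` link carrying a `data-size` attribute) is a hypothetical stand-in for the real archive pages, which would need their own selectors, and fetching the index pages and thread URLs is left out. Only the standard library's `html.parser` is used.

```python
from collections import Counter
from datetime import datetime
from html.parser import HTMLParser
from typing import Optional


class ThreadParser(HTMLParser):
    """Collect (type, timestamp, size, filename) for each GIF/WebM in a thread page.

    Assumes hypothetical markup: <article data-time="ISO timestamp"> per post and
    <a class="file" data-size="bytes">name.ext</a> for an attached file.
    """

    def __init__(self) -> None:
        super().__init__()
        self.records: list = []
        self._time: Optional[datetime] = None
        self._size = 0
        self._in_file_link = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "article" and "data-time" in a:
            self._time = datetime.fromisoformat(a["data-time"])
        elif tag == "a" and a.get("class") == "file":
            self._size = int(a.get("data-size", "0"))
            self._in_file_link = True

    def handle_data(self, data):
        if self._in_file_link and self._time is not None:
            name = data.strip()
            ext = name.rsplit(".", 1)[-1].lower()
            if ext in ("gif", "webm"):
                self.records.append((ext, self._time, self._size, name))

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_file_link = False


def count_per_day(pages: list) -> Counter:
    """Count GIFs and WebMs per calendar day across thread pages."""
    counts: Counter = Counter()
    for html in pages:
        parser = ThreadParser()
        parser.feed(html)
        parser.close()
        for ext, posted_at, _size, _name in parser.records:
            counts[(posted_at.date().isoformat(), ext)] += 1
    return counts


page = (
    '<article data-time="2015-01-31T14:05:00"><a class="file" data-size="2048576">a.webm</a></article>'
    '<article data-time="2015-01-31T15:00:00"><a class="file" data-size="999">b.gif</a></article>'
    '<article data-time="2015-02-01T09:30:00"><p>text-only reply</p></article>'
)
print(sorted(count_per_day([page]).items()))
# [(('2015-01-31', 'gif'), 1), (('2015-01-31', 'webm'), 1)]
```

Counting per calendar day yields the daily series; monthly values can then be derived from it.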

After the results of the first phase were presented at the end of December 2014, it became obvious that the survey needed to be continued to follow further developments and build up a larger database. This way it could be checked whether the trends that had already shown up during the first period would continue or change in the following months. Also, to provide a more detailed data set, the data was compared not on a monthly basis but with daily data points. This made it possible to analyse whether and how the number of shared GIF/WebM files correlates with certain 4chan-related events such as server downtime or changes in regulations.
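Daily data points make short dips visible that a single monthly value would blur. A minimal sketch of one way to spot such days, assuming a simple rule of thumb (a day counts as unusually low if it falls below half of the median daily count); the rule, the function name and the counts are invented for illustration and are not the study's method.

```python
from statistics import mean, median


def flag_low_days(daily: dict, fraction: float = 0.5) -> list:
    """Return the days whose count is below `fraction` times the median daily count."""
    threshold = fraction * median(daily.values())
    return [day for day, n in sorted(daily.items()) if n < threshold]


# Invented counts for five days, one of them affected by an outage.
daily = {"day-1": 310, "day-2": 295, "day-3": 40, "day-4": 320, "day-5": 305}
print(mean(daily.values()))   # 254: the average alone hides the single dip
print(flag_low_days(daily))   # ['day-3']
```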

Data collection for the first part of the study was conducted on 26th December 2014 for the whole period from January 2013 up to that date. Data for the second part was collected separately for each month, during the first days of the following month. In addition, the December data was collected again at the beginning of January, both to include the last days of the year and to obtain daily data points instead of a single monthly value. Going further into the past, the period from June 2012 (when /wsg was opened) to December 2012 was also included, though only for /wsg and only with one data point per month: “Asagi” archives of /gif reaching back that far had already been deleted by then.

March was an exception to this procedure. As mentioned before, this study is a spare-time project, and for several reasons there was no time for data collection in April, so the data for both March and April was collected at the beginning of May. Unfortunately, imcute.yt, which had delivered the data for the /gif board, went offline sometime in April. Therefore there is no data for March and parts of April. From the middle of April, archive.moe, which had previously been used only as the data source for the /wsg board, also provided archived threads from the /gif board, so from then on the data is based entirely on that archive. Even after an extensive search of every other 4chan archive that could be found, none of them provided an archived /gif board. (Comments or recommendations on this issue are much appreciated, in case someone knows another archive or can provide a private mirror of the March/April archives.)

imcute.yt was not the first 4chan archive to close its doors. Earlier examples include 4chandata.org and 4chanarchive.net. Their websites are still online, but they are no longer active and most of their content has been deleted. These alternatives were considered because imcute.yt did not cover the time before October 2014, which limits the /gif database to only a few months. Eventually, even archive.moe was taken offline because of technical problems and data loss: “effective immediately we are shutting down archive.moe”. This occurred in early September 2015 and marked the preliminary end point of this study. Meanwhile, another Asagi-based archive has been founded. It is called Desustorage and aims to provide more stable and sustainable access to the archives.

Collected from about 30000 threads, the database consists of the same number of Excel tables. In total, 86275 GIFs and 133469 WebMs were inspected: 64855 GIFs and 98262 WebMs on /gif, and 21420 GIFs and 38207 WebMs on /wsg. The /wsg data covers the period from June 2012 to August 2015; the /gif data covers October 2014 to February 2015 and, after one and a half months of lost data, the middle of April to August 2015.

Findings (Quantitative Perspective)

For the reasons outlined above, the data covers different time periods for the two image boards. Thus, the development of file usage is first examined separately for each board and then compared.

For /wsg, the graphs show several phases over the last years, with the number of posted images rising and falling. When the board was opened in June 2012, it had its highest activity, with an all-time high of 498.5 GIFs per day (median). In the following months, this early enthusiasm dropped to a lower level of roughly 150 GIFs per day. That level was relatively stable for three months, before a period of higher activity in the first half of 2013 that peaked at 287.7 GIFs per day in June. Beginning in August of that year, the number decreased rapidly until the end of the year. In November, the median daily amount fell below the 100 GIFs mark for the first time. 2014 started with a small peak, but from March on the GIF graph enters a phase of ongoing decrease. Meanwhile, WebM entered as a new player in April 2014, but for about nine months it was used only sparingly, at about 35 WebMs per day. This changed when WebM sound support was introduced on 4chan in January 2015. The effect was visible immediately: the surge of WebMs after the announcement of the new feature shows up as a huge peak in the graph, with 1374 WebMs on 31st January alone. From this date on, the number of WebMs stayed above the number of GIFs. As the number of GIFs continues to decline slowly, WebM usage is steadily growing.
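The per-day figures for each month summarise a series of daily counts. A minimal sketch of how such monthly summaries could be derived, computing both the mean and the median of the daily counts; the function name and the counts are invented for illustration.

```python
from collections import defaultdict
from statistics import mean, median


def monthly_summary(daily: dict) -> dict:
    """Group daily counts ("YYYY-MM-DD" -> count) by month; return (mean, median) per month.

    Only days present in `daily` are used; a day without posts must be
    included with a count of 0 to affect the result.
    """
    by_month = defaultdict(list)
    for day, n in daily.items():
        by_month[day[:7]].append(n)
    return {month: (mean(v), median(v)) for month, v in sorted(by_month.items())}


# Invented counts: one exceptional day at the end of the first month.
daily = {
    "2015-01-29": 100, "2015-01-30": 120, "2015-01-31": 900,
    "2015-02-01": 400, "2015-02-02": 380,
}
for month, (avg, med) in monthly_summary(daily).items():
    print(f"{month}: mean {avg:.1f}, median {med:.1f}")
# 2015-01: mean 373.3, median 120.0
# 2015-02: mean 390.0, median 390.0
```

A single exceptional day, like the peak on 31st January described above, pulls the mean up strongly but barely moves the median, which makes the median a more robust summary of bursty posting activity.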

The data for /gif does not provide any insights into the time before October 2014. Thus, it is not possible to inspect the changes that happened to this board when WebM was introduced on 4chan in April 2014. In contrast to /wsg, the percentage of WebMs was already relatively high in late 2014, and on several days in December it was even higher than the GIF percentage. The monthly median, however, was lower than for GIFs. In October this gap was still quite obvious, with 335.5 GIFs and 260.2 WebMs per day. In January 2015, the total number of WebMs for the month exceeded that of GIFs for the first time, but only by a small margin: 343.9 GIFs and 354.4 WebMs per day (345.1 WebMs if the peak on 31st January is not included). Then a second phase began. Before, the GIF and WebM graphs ran at about the same level, but from February on they moved in opposite directions. Unfortunately, data for March and April could not be recovered from the 4chan archives, resulting in a gap in the graphs during that time. Nonetheless, their later development indicates a very clear trend: the number of GIFs on /gif is decreasing, while the number of WebMs keeps rising. In August, the median daily amount of 266.2 GIFs was already less than half of the 653.9 WebMs.

The most notable difference between the /gif and /wsg graphs is their different time spans. Thus, it is not possible to compare how WebM made its first appearance on the two boards in April 2014. However, the absolute numbers of GIF and WebM files on both boards provide a good enough basis for discussing the hypotheses. As the graphs clearly show, the number of files posted on /gif is of a much higher magnitude than on /wsg. The median number of files per day for each month makes the difference in scale clear. In 2015 on /gif, this value changed from 343.9 GIFs and 354.4 WebMs in January to 266.2 GIFs and 653.9 WebMs in August. During the same period on /wsg, it changed from 120.6 GIFs and 83.3 WebMs in January to 58.9 GIFs and 221.3 WebMs in August. Even the rising number of WebMs on /wsg had not yet caught up with the steadily decreasing number of GIFs on /gif, which illustrates well the different levels of activity on the two boards.
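The figures quoted above can be combined into two simple indicators, one per hypothesis: the ratio of daily file volume on /gif to that on /wsg (H1) and the WebM share of all files on each board (H2). A short sketch using only the numbers from the text; the names are hypothetical. Note that adding a monthly GIF figure to a monthly WebM figure only approximates the daily total, since the median of a sum is not in general the sum of the medians.

```python
# Monthly per-day figures from the text: (GIFs, WebMs).
FIGURES = {
    "/gif": {"2015-01": (343.9, 354.4), "2015-08": (266.2, 653.9)},
    "/wsg": {"2015-01": (120.6, 83.3), "2015-08": (58.9, 221.3)},
}


def webm_share(gifs: float, webms: float) -> float:
    """Fraction of all animations that are WebMs (relevant to H2)."""
    return webms / (gifs + webms)


def volume_ratio(month: str) -> float:
    """Approximate files per day on /gif divided by files per day on /wsg (relevant to H1)."""
    return sum(FIGURES["/gif"][month]) / sum(FIGURES["/wsg"][month])


for month in ("2015-01", "2015-08"):
    gif_share = webm_share(*FIGURES["/gif"][month])
    wsg_share = webm_share(*FIGURES["/wsg"][month])
    print(f"{month}: WebM share /gif {gif_share:.1%}, /wsg {wsg_share:.1%}, "
          f"/gif:/wsg volume {volume_ratio(month):.2f}")
# 2015-01: WebM share /gif 50.8%, /wsg 40.9%, /gif:/wsg volume 3.42
# 2015-08: WebM share /gif 71.1%, /wsg 79.0%, /gif:/wsg volume 3.28
```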

[To be continued in the third part of this article series, with some qualitative analysis and the summary.]

GIF vs. WebM on 4chan – Part 1: The Data

One year ago I conducted a study about GIF and WebM usage on 4chan and presented the first results at 31C3. Since then, I have continued to collect data for eight more months, until August 2015*.

And now I am finally publishing the findings. As this is a spare-time project, I hope you understand that it took some time to finish. It is also divided into three separate articles to make it more accessible:

PART 1: The Data (this article)

PART 2: Quantitative Analysis & Method

PART 3: Qualitative Analysis (will follow)

Okay, before you scroll down for all the shiny graphs and tables, take a second for some basic information about the study:

What’s the idea?
The hypothesis was that the 4chan /gif board, which contains material considered “not safe for work”, is more active in posting images than the “worksafe” /wsg board. I also assumed that /gif would use a higher share of WebM files relative to GIFs than /wsg, because WebMs provide better image quality at a smaller file size, which, as I thought, might be highly appreciated for sharing (mostly) pornographic content.

*Why this time period?
I wanted to inspect the changes since April 2014, when WebM could be used on 4chan for the first time. A second paradigm shift came at the end of January 2015, when WebMs on 4chan were allowed to have sound. And to examine a change, it helps to look at older data, too. That's why I included data back to 2012 (at least for the /wsg board). The study ends in August 2015 due to the circumstances explained in the next paragraph…

Why are there holes in the dataset?
I extracted the data from 4chan archives, because 4chan itself deletes threads after a while. But so far, every single one of the archives I worked with has eventually gone offline. So I wasn't able to find a data source for the /gif board for the time before fall 2014, and there is also a gap in the data in spring 2015. Meanwhile, however, I have found a new archive for the period after August 2015, but I will deal with that later.

How have the data been extracted?
Using a Python script that read them from the HTML of the threads in the /gif and /wsg boards (in the archives, not on 4chan itself). More details will follow in part 2.

Ok, ready?


First, there are two tables for the monthly number of image files on the /wsg and /gif boards. Note the gaps in the data, as explained earlier. It is very obvious which board is more populated by animations.

absolute amount of images on /wsg

absolute amount of images on /gif

The next two graphs represent the same dataset, this time not as absolute numbers but as the average number of files per day in each month.

EDIT: The next two images have been updated; previously, older, incorrect versions were shown here. Sorry!

median_wsg

median_gif

These next graphs show the data from December to August on a daily basis. This way, the details of the development become visible, for example the huge peak at the end of January or some low points during 4chan's server downtime. More details on that will follow in part 3.
You can see two versions of each graph. The first ones give an overview, and the others (flat and greyscale) are thumbnails of very large graphs that show the exact data for each day in that time period. Click them to see all the details.

/wsg from January to August

/gif from January to August


A first summary:

  • Users on /gif share more animations than those on /wsg.
  • On /gif, WebM files quickly began to outnumber GIF files, while on /wsg this trend started later and the difference grows more slowly.
  • The total number of animations grows slowly but steadily on both boards.

To be continued :)

Major Update of “The GIF Library”

I just added a ton of links to the “Library” section on this blog, mainly research papers. Some of them focus on technical aspects of the GIF format, others on certain phenomena of GIF culture. I tried to only include articles that focus on GIFs, but I have a lot of stuff on a wider range of topics with a more or less strong connection to GIFs (banner ads, meme culture, methods of internet research, animation in general and more technical stuff). If you are interested in one of these topics, feel free to contact me. Right now I don’t want to overload the publication list.

And now, take a day off to read, read and read:)

The Aesthetics of Community GIFs

As a seminar paper for university, I recently carried out an analysis of GIFs from Dan Harmon's series “Community” that are shared in the subreddit Communitygifs. Since the text is rather long, it is available as a PDF instead of a blog article. Specifically, it is a sequence analysis of GIFs and of the scenes from the series on which they are based. In it, I try to identify the main stylistic devices applied in this conversion.

Some of this will be old hat for experienced GIF users, but now it is also available as an academic analysis. It also includes a few pages in which I outline the current state of research on GIFs (perhaps this part will appear again, in somewhat more detail, as a separate article).

Enjoy reading.

A small preview on my talk at 31C3

On December 28th, the second day of the Chaos Communication Congress, I'll talk about current and past format wars that GIFs have been involved in. The core of the presentation will be a small survey I conducted over the last few weeks. It compares the usage of GIFs and WebMs on the two GIF boards on 4chan (/gif/ and /wsg/). I won't reveal everything in advance, but here's a graphic to make you curious:

wsg-stats

And by the way, I'll be speaking German there, but thanks to the great translation team there will be subtitles and live translation.

When the congress is over I will finally have time again to take better care of this blog:)

The GIF-Library


One of the things I realised very quickly while writing my thesis was: there is no literature about GIFs. At least there wasn't back in 2012. Of course there are books giving an overview of graphic file formats, or even design guides for GIF animation. Furthermore, there are many books about older animated images (Zoetrope, Mutoscope, flip books and so on, which were an important part of my thesis). And, of course, there was tons of media coverage of the file format's 25th anniversary. But apart from that, I had the very nice feeling of being the very first one to write a paper about GIFs from a media studies perspective.

I was able to keep this feeling for some months, but now I am very happy to see that some others started examining this cultural phenomenon at about the same time as I did. Accordingly, several papers on certain aspects of GIF culture can now be found. Unfortunately, many of them are hidden in journals or can only be found with very specific search terms.

Anyway, I put quite some effort into looking for GIF research papers and other resources. That's why I would like to share this list of links. You can find it on this blog on the page “The Library”. Right now it is a rather short list, but it's only the beginning. If you know any other good articles, papers, studies or other resources, I would be glad if you sent me a link.

And now, have fun reading!