
Started: September 4th, 2013
Last update: August 13th, 2015
Written by Peter B., Hermann Lewetz and Marion Jaks

Topics:

Video codecs
Video Containers
Video codec performance tests (speed and size)

Q: What are the pros and cons regarding these video formats for archiving?

	container	video-codec	audio-codec
1.	MXF	JPEG2000	PCM (uncompressed)
2.	MXF	Uncompressed	PCM (uncompressed)
3.	MOV	ProRes	PCM (uncompressed)
4.	MKV	FFV1	PCM (uncompressed)
5.	AVI	FFV1	PCM (uncompressed)

First of all, we think it is a good idea to discuss the audio/video codecs separately from the container in this comparison, especially because there are numerous possible combinations of codecs and containers.
For audio, we assume uncompressed PCM and focus on the pros and cons of the containers and video codecs.

Video codecs

There are many video codecs in existence and in use in the real world. Some are more widespread and better known than others; others are less common or can even be considered "exotic".

Lossy vs lossless vs uncompressed?

Fear, Uncertainty and Doubt (FUD)

First of all, I'd like to note that there is currently some confusion in the video domain regarding the naming and properties of artefacts introduced by storing video in digital form. As one example, the product site for Blackmagic's Decklink video cards talked about "unrivaled video quality", but wrongly equated "compressed" with "lossy". It also showed an image suggesting that uncompressed is the opposite of subsampling. Here is a screenshot from December 2013.

Even though Blackmagic could be considered to address only semi-professional users, this shows that even well-known vendors in the video domain may sometimes, unfortunately, communicate erroneous and/or confusing information.

Archivists might want to keep this in mind when evaluating which products or solutions to use for their purposes.

Filesizes

Digital video produces vast amounts of data. Take, for example, PAL-SD material (image only, no audio), stored as YUV422 with 8 bits per component (= 16 bits per pixel on average, since each pair of neighbouring pixels shares one Cb and one Cr sample) and a resolution of 720x576 pixels.
Applying no compression at all (=uncompressed) results in:

720 x 576 x 16 bits x 25 frames/s x 60 s ≈ 1.159 GiB/minute

For YUV422, 10 bpc (10+5+5 = 20 bits per pixel), this would be:

720 x 576 x 20 bits x 25 frames/s x 60 s ≈ 1.448 GiB/minute

So you could fit no more than 4 minutes of video on a whole 4.7 GB (about 4.38 GiB) single-layer DVD. Since these values are already uncomfortable (even for 4:2:2-subsampled PAL), imagine HD and beyond.
Therefore, you will hardly find uncompressed video in the wild, except for temporary editing use cases.
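The figures above can be reproduced with a few lines of Python. This is a minimal sketch. It counts image data only (no audio, container or line padding) and uses binary GiB (2^30 bytes) as in the formulas above. It takes the DVD capacity as 4.7 decimal gigabytes. The function name is our own.

```python
def uncompressed_gib_per_minute(width, height, bits_per_pixel, fps):
    """Image data rate of uncompressed video in GiB (2**30 bytes) per minute."""
    bits_per_second = width * height * bits_per_pixel * fps
    bytes_per_minute = bits_per_second / 8 * 60
    return bytes_per_minute / 2**30


DVD_GIB = 4.7e9 / 2**30  # single-layer DVD: 4.7 GB (decimal), about 4.38 GiB

# PAL-SD, YUV422: 8 bpc = 16 bits per pixel, 10 bpc = 20 bits per pixel
for label, bpp in (("8 bpc", 16), ("10 bpc", 20)):
    rate = uncompressed_gib_per_minute(720, 576, bpp, 25)
    print(f"{label}: {rate:.3f} GiB/minute, {DVD_GIB / rate:.1f} minutes per DVD")
```

With these inputs it reports about 1.159 and 1.448 GiB/minute, i.e. roughly 3.8 and 3.0 minutes of video per DVD.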

Currently, most video is stored "lossy":

Even most digital cameras already use some form of compression before saving their data.
The majority of video codecs currently in use perform so-called "lossy compression". As the name implies, data is lost.
Currently, this is even the normal case for professional video production, broadcasting, digital cinemas, etc.

Examples of lossy compression used for professional production are:

Digital Betacam "DigiBeta"
ProRes
DNxHD
Motion JPEG2000-lossy (used currently for most Digital Cinema Packages (DCP))
DVCPRO, DVCPRO50, etc.
HDCAM
IMX (MPEG D10)
AVC-Intra
…

Even for archiving, it is currently still common to see lossy video compression being used. This may be because many archives follow the example of large and well-known broadcasting archives, sometimes overlooking that the latter focus on the needs of production rather than long-term preservation.

Extending the above list, here are additional formats, usually more common on the consumer market, which some people use to store their videos:

MPEG-2: this is the format used for DVDs
MPEG-4 (Part 2): Better known as "DivX/XviD".
MPEG-4 (Part 10): Better known as "H.264" or "AVC".
DV / HDV
…

All of the aforementioned video codecs are lossy and should not be used for archiving, unless this is already the format the original source was in.

IASA's technical guidelines for archiving video (TC-06) are still a work in progress, but IASA's guidelines for preserving audio (TC-03), chapter 11 (p. 8), state quite clearly:

"[...] formats employing data reduction [...] based on perceptual coding (“lossy codecs”) must not be used. Transfers employing such data reduction result in the irretrievable loss of parts of the primary information. The results of such “lossy” data reduction may sound identical or very similar to the unreduced (linear) signal, but the further use of the data reduced signal will be severely restricted. These archival principles should also be applied, whenever possible, to the creation of original recordings made with the intention of being archived."

In the online version of the TC03 paragraph "11. Data reduction", an additional comment clarifies the reason for this even further:

"Its use is, however, counter to the ethical principle of preserving as much of the primary information as possible. Data reduction does not permit the restoration of the signal to its original acoustic condition and will, in addition, limit the further use of the recording because of the artefacts generated when cascading perceptually coded material - for example, in the making of a new programme incorporating the original [recordings]."

So, for long-term preservation of video, lossy is not an option. As we have seen above, storing video "uncompressed" is currently still difficult to handle because of its file size. See details about "uncompressed" video below.

Interesting for archiving: "lossless compression".

Uncompressed is not the only option for avoiding loss of image information when storing digital video. There are compression algorithms which reduce the file size but preserve every bit in its original state, even after being decompressed and recompressed any number of times.
This is called "lossless compression".
For generic data, most people use lossless compression in their daily routine when creating a "Zip file (.zip)".
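The following sketch illustrates the "lossless" property on generic data. It uses Python's zlib module as a stand-in: zlib implements DEFLATE, the compression method most commonly used inside Zip files. It is not a video codec, and the test data is arbitrary. Decompressing and recompressing many times still returns exactly the original bytes.

```python
import zlib

original = bytes(range(256)) * 64  # 16 KiB of arbitrary test data

data = original
for _ in range(100):
    # compress and decompress again, like one more migration "generation"
    data = zlib.decompress(zlib.compress(data, 9))

assert data == original  # bit-identical after 100 generations
print(f"{len(original)} bytes -> {len(zlib.compress(original, 9))} bytes compressed")
```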

There are several codecs providing lossless compression for video. In practice, however, most of them are unsuitable for long-term archiving, for various reasons. So although the list below may be longer than expected, it boils down to only a handful of potential candidates for long-term preservation.

For example, the "MSU Graphics & Media Lab" of the Department of Computer Science at Lomonosov Moscow State University published test results for different lossless codecs in 2007.
They compared the following lossless codecs:

Alpary
ArithYuv
AVIzlib
CamStudio GZIP
CorePNG
FastCodec
FFV1
Huffyuv
Lagarith
LOCO
LZO
MSU Lab
PICVideo
Snow
x264
YULS

NOTE: It might be interesting to point out that JPEG2000-lossless is missing from this list, although the same group had published a JPEG2000 (lossy) comparison for still images in 2005, two years earlier. BBC's Dirac codec is also missing, but only because it was released about one year after their tests.

For long-term archiving, the most important factors for choosing a codec are the ability to preserve the content adequately and in a stable manner, and its vendor-independent accessibility over time.
Quoting the online page of chapter 11 of IASA's TC03, this excludes proprietary formats:

"In the case of recordings originated in data reduced formats, a major problem with obsolescence of equipment may arise when the format of origination is of a proprietary character [...]"

Most of the lossless codecs on that list did not meet the above-mentioned criteria: some could not preserve the input pixel format and colorspace (e.g. supporting only YUV or only RGB), others were limited to proprietary implementations or even bound to certain operating systems.

After adding Uncompressed, JPEG2000-lossless and Dirac, only a small number of lossless codecs remain for closer evaluation for video archiving purposes:

Uncompressed
JPEG2000-lossless
Dirac
FFV1
H.264-lossless (x264)

Apple ProRes

Lossy encoding must not be used for archiving.
Therefore, Apple ProRes is not an option.

Uncompressed

Storing video data uncompressed is definitely the most straightforward approach.

For the time being, the vast amount of data required by uncompressed video still poses a big challenge for many institutions. This is not only a matter of storage costs: uncompressed video impacts all parts of the workflow:

Network bandwidth
Storage speed (especially with data-tapes. LTO, for example)
Local disk throughput (on all clients and servers involved)

Uncompressed is not one specific codec, nor one specific format. It simply means that the stored video image information is not processed by any additional compression method. Every pixel's color information is simply stored "as-is".

In the case of digital video, however, "as-is" means that the resulting video file can be stored in completely different bitstream layouts, depending on the "pix_fmt" (= pixel format), i.e. the colorspace, bit depth, subsampling, etc.

For example, Apple's Core Video developer documentation lists more than 15 pixel formats supported by Core Video.
If stored "uncompressed", each entry in this list would result in a different video bitstream, i.e. more than 15 different formats.
Each of these variants would be considered "uncompressed", and each would require a matching uncompressed-codec implementation.

Dave Rice, archivist at City University of New York, published a short video showing how a raw, uncompressed video stream can be misinterpreted in several different ways by a video player (Quicktime, in his example).
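The sketch below gives a small illustration of such misinterpretation. It reads the same four bytes under two common 8-bit 4:2:2 layouts, assuming the usual definitions of UYVY (Cb Y0 Cr Y1) and YUYV/YUY2 (Y0 Cb Y1 Cr). The byte values are made up (16 and 235 being video-range black and white). With the wrong layout, chroma samples are read as luma, and nothing in the raw bytes reveals the mistake.

```python
# Two 4:2:2 pixels stored in UYVY byte order: Cb, Y0, Cr, Y1
raw = bytes([128, 16, 128, 235])  # a black pixel next to a white pixel


def luma_as_uyvy(buf):
    """Y samples sit at odd byte offsets in UYVY."""
    return list(buf[1::2])


def luma_as_yuyv(buf):
    """Y samples sit at even byte offsets in YUYV/YUY2."""
    return list(buf[0::2])


print(luma_as_uyvy(raw))  # correct interpretation
print(luma_as_yuyv(raw))  # wrong layout: chroma values misread as luma
```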

When working with uncompressed video, one should clarify which uncompressed layout is supported or requested.
See the links below for some names/types of uncompressed video to choose from, for example "v210" for YUV422 at 10 bpc.

Interesting side note:
"v210" is used as the (codec) name for uncompressed video with 10 bits per component. Since some bytes are shared by different components to save space, it could actually be seen as compression, too ;) Illustrating how confusing digital video terminology can be, Apple refers to it as "'v210' 4:2:2 Compression Type" in its technical note about "uncompressed" in Quicktime (TN2162).

Bits used: Multiples of 8

Computers store data in whole bytes, i.e. in multiples of 8 bits (byte boundary).
Therefore, values whose bit depth is not a multiple of 8 (e.g. 10 bpc) are either padded with zero bits or grouped together to use the bytes more efficiently.

There are different ways to lay out bits across multiple bytes, so in practice, bit depths (bits per component, bpc) that are not a multiple of 8 usually come with a few bits of overhead in their bits per pixel (bpp).
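The overhead of padding versus packing can be quantified with a short calculation. This is a sketch. "padded" means each sample is rounded up to whole bytes, and "packed32" means as many samples as fit go into each 32-bit word. Both are our own labels for two generic strategies, not names of actual formats. For 12 ten-bit components, the packed variant gives the 128 bits stated for v210 in this section, whereas byte padding needs 192 bits.

```python
def stored_bits(num_components, bpc, layout):
    """Bits needed to store num_components samples of bpc bits each."""
    if not 1 <= bpc <= 32:
        raise ValueError("bpc must be between 1 and 32")
    if layout == "padded":
        # each sample rounded up to whole bytes (ceiling division), e.g. 10 -> 16 bits
        bytes_per_sample = -(-bpc // 8)
        return num_components * bytes_per_sample * 8
    if layout == "packed32":
        # as many samples as fit into one 32-bit word; leftover bits are zero padding
        per_word = 32 // bpc
        words = -(-num_components // per_word)
        return words * 32
    raise ValueError(f"unknown layout: {layout}")


payload = 12 * 10  # 12 ten-bit components = 6 pixels in 4:2:2
for layout in ("padded", "packed32"):
    bits = stored_bits(12, 10, layout)
    print(f"{layout}: {bits} bits, {bits / payload - 1:.1%} overhead")
```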

In "v210", for example, 12 components of 10 bits each (= 6 pixels in 4:2:2) are packed into four 32-bit words. Each word contains 3 * 10 bits, padded with 2 zero bits. So in v210, 6 image pixels actually require 128 bits instead of 120:

The actual data:

3 * 10 bits = 30 bits
4 * 30 bits = 120 bits

Packed and zero-padded:

3 * 10 bits + 2 zero bits = 32 bits
4 * 32 bits = 128 bits

Details about how the Y'CbCr data is laid out within v210 can be found in Apple's Technical Note TN2162.
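The packing described above can be sketched in Python for a single group of 6 pixels. The component order is our reading of TN2162 and of FFmpeg's v210 encoder: Cb0 Y0 Cr0 | Y1 Cb1 Y2 | Cr1 Y3 Cb2 | Y4 Cr2 Y5. Each word is little-endian, with the first component in the lowest bits. Please verify this order against TN2162 before relying on it. Real v210 frames also pad each line, which this sketch ignores. The function names are hypothetical.

```python
import struct


def pack_v210_group(y, cb, cr):
    """Pack 6 pixels of 10-bit 4:2:2 video (6 Y, 3 Cb, 3 Cr) into 16 bytes."""
    if (len(y), len(cb), len(cr)) != (6, 3, 3):
        raise ValueError("expected 6 Y, 3 Cb and 3 Cr samples")
    order = [cb[0], y[0], cr[0], y[1], cb[1], y[2],
             cr[1], y[3], cb[2], y[4], cr[2], y[5]]
    if any(not 0 <= v <= 0x3FF for v in order):
        raise ValueError("samples must be 10-bit values (0..1023)")
    # three 10-bit components per word; bits 30-31 of each word stay zero
    words = [order[i] | (order[i + 1] << 10) | (order[i + 2] << 20)
             for i in range(0, 12, 3)]
    return struct.pack("<4I", *words)  # four little-endian 32-bit words


def unpack_v210_group(block):
    """Inverse of pack_v210_group: return (y, cb, cr) lists."""
    values = []
    for word in struct.unpack("<4I", block):
        values += [word & 0x3FF, (word >> 10) & 0x3FF, (word >> 20) & 0x3FF]
    y = [values[i] for i in (1, 3, 5, 7, 9, 11)]
    cb = [values[i] for i in (0, 4, 8)]
    cr = [values[i] for i in (2, 6, 10)]
    return y, cb, cr


y, cb, cr = [64, 940, 512, 100, 200, 300], [512, 0, 1023], [512, 256, 768]
block = pack_v210_group(y, cb, cr)
print(len(block) * 8, "bits for 6 pixels")
print(unpack_v210_group(block) == (y, cb, cr))
```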


Pros

Nearly all serious video applications are able to handle some variants of uncompressed video.
Uncompressed is trivial to implement as a video codec. It is also easy to reverse engineer without any written documentation.
Data errors affect fewer pixels than with compressed formats.
NOTE: This is mentioned for the sake of completeness. Bit errors must not happen in archived material, and if they do, one could use a backup copy to restore the video completely.

Cons

Uncompressed video requires huge amounts of disk space.
For the cost of 1 uncompressed copy, one could have about 3 copies of the lossless version.
Dealing with uncompressed video stresses every part of your workflow (increased network load, disk I/O on client and server, etc.).
There is currently no integrity information within an uncompressed video codec. This means that data errors will go unnoticed by any decoder. This could be handled by the container (and supported by encoding and decoding applications), but currently this is not the case.
"Uncompressed" is not one unified format. There are different types of uncompressed bitstream variations, which are not necessarily compatible with each other, or supported by other applications.

JPEG2000-lossy

NOTE: In most cases, when the term "JPEG2000" is used, it refers to its lossy variant.
This is often a pitfall, because products claiming to support JPEG2000 as a codec might mistakenly be assumed to handle JPEG2000-lossless, too.

The recent spread of digital cinemas has increased the occurrence of the "DCP" (Digital Cinema Package) format, which uses JPEG2000-lossy, and has brought wider attention to this format.

Lossy encoding must not be used for archiving.
Therefore, JPEG2000-lossy is not an option.

JPEG2000-lossless

In theory, "JPEG2000-lossless" could be a viable codec for long-term archiving of digital video. It is a standardized format, based on the same codec used for still-image compression.

Regarding availability and support, it must not be confused with "JPEG2000-lossy".
For example, AVID tools support JPEG2000 out of the box, but only JPEG2000-lossy.

Reference implementation: At the moment, no common reference implementation is generally used to verify interoperability and practical standards conformance. So even if someone implements JPEG2000-lossless conforming to "the standard", this only means that the program satisfies the standard on paper. This may or may not be sufficient, depending on the environments and use cases these files will encounter.

There are only a few institutions already archiving in JPEG2000-lossless. Almost all of them require a lossy-compressed mezzanine copy to work with.

If you want, you can try it yourself:

Create a JPEG2000-lossless video, and/or:
Open, edit and transcode a JPEG2000-lossless video:
Ask a fellow archive (or vendor?) to provide you with an example file containing JPEG2000-lossless video. Try to open it with different programs or transcode it to another format.
Transcode losslessly if possible, because this is a prerequisite for an unlimited number of format migrations. This also means preserving the original pixel format/colorspace.

Some things to consider when evaluating the use of JPEG2000-lossless:

What choice of tools/products did you have to view/edit it?
Under which conditions (proprietary vs. open)?
Transcoding speed?
Colorspace/pixel formats supported?

Pros

Some of the big and popular video archiving products support JPEG2000-lossless variants.
Codec is standardized on paper.
Very good lossless compression ratio.

Cons

Slow performance (due to the wavelet algorithms).
Encoding 8-bit SD PAL video material in real time is barely feasible, even on a powerful state-of-the-art quad-core computer.
Therefore, some products require special expansion cards to support this codec.
Currently, existing implementations have limited support for colorspaces and pixel formats. In proprietary applications, extending this is at the discretion of the vendor.
Technical specifications of this standard are not freely available.
Very complex format specification.
Small number of tools supporting this codec, even fewer than for JPEG2000-lossy.
In many cases, existing implementations are not compatible between vendors/versions, because of the complexity of the paper specification and the lack of a common reference implementation.
Not specialized for lossless. Therefore, most applications only support lossy.

Dirac

"Dirac" is the name of the codec and its reference implementation. The Dirac project was started by the BBC in 2005.
Dirac "is designed to be simple, flexible, yet highly effective." (Dirac Specs, p.1)

The first version of the reference implementation was released under the name "Dirac" under several Free Software licenses, such as the Mozilla Public License, GNU GPLv2 and GNU LGPL.
The implementation used in practice for reading/writing Dirac is called "Schroedinger" (or "schro" for short).

Dirac was developed with an approach similar to that of JPEG2000: a general-purpose codec serving a wide range of use cases. It offers lossy compression for uses ranging from low bandwidth (mobile, low resolution) to very high bandwidth (production, high resolution), as well as lossless compression.

From an archival point of view, BBC's Dirac fulfills all requirements for long-term preservation, even more so than JPEG2000-lossless.
Its performance (libschroedinger v1.0.11), however, is currently quite slow. See the codec performance comparison test results for details.

Pros

Codec is standardized on paper (SMPTE VC-2).
Specification freely available.
Reference implementation exists, available under a Free License.
Backed by the BBC.

Cons

Slow performance (due to the wavelet algorithms).
Small number of tools supporting this codec, even fewer than for JPEG2000 (lossy and lossless).
Currently, existing implementations have limited support for colorspaces and pixel formats. In proprietary applications, extending this is at the discretion of the vendor.
Not specialized for lossless. Therefore, most applications only support lossy.

FFV1

"FFV1" stands for "FF video codec 1" and is a lossless video codec based on entropy-coding algorithms, including arithmetic coding. The arithmetic coder of FFV1 is very similar to (and based on) that of H.264. FFV1 is implemented as part of the free, open-source libraries of the FFmpeg project. FFmpeg is one of the most widely used program libraries for processing digital video.

It was designed and implemented in 2003 by FFmpeg developer and maintainer Michael Niedermayer. Because it is included in FFmpeg and LibAV by default, many applications and devices support FFV1 out of the box (for example, VLC player or Sorenson Squeeze).
Video applications supporting FFV1 can be used to edit lossless material directly (without transcoding or proxy copies), even over the network, and without special hardware.

Comparison tests have shown that FFV1 currently produces the smallest files at the fastest speed. It can be used to capture SD material in realtime, and HD possibly, too.


Pros

Very good lossless compression ratio (comparable to JPEG2000-lossless).
Low CPU resource requirements compared to JPEG2000-lossy or Dirac (FFV1 uses simpler algorithms).
Encoding 8-bit SD PAL video material can be done in real time on a regular, off-the-shelf, state-of-the-art PC, using only a single CPU core (@3.3 GHz).
Therefore, no special encoding/decoding hardware is required.
Large number of pixel formats and colorspaces natively supported. The list can be extended on demand.
Technical specifications of this format are freely available to everyone.
Source code available and released under a Free Software (Open Source) license.
Relatively small program code = simple implementation.
Large number of tools supporting this codec, due to its inclusion in FFmpeg's libraries.
No interoperability issues with different applications.
Larger user base than most other lossless codecs.
Option to enable CRC checksums in the video bitstream to detect data errors.

Cons

Codec is not yet standardized on paper.
Not all vendors of currently popular video applications support it yet.
Larger areas of the image are affected by errors in the bitstream, compared to Uncompressed.
NOTE: This is mentioned for the sake of completeness. Bit errors must not happen in archived material, and if they do, one can easily restore the video completely by using one of the backup copies.
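The helper below sketches how FFV1's CRC option might be enabled with FFmpeg. It only builds a command line and does not run FFmpeg. To our understanding, "-level 3" selects FFV1 version 3, "-slicecrc 1" stores a CRC per slice, and "-g 1" makes every frame a keyframe. Please confirm these options with "ffmpeg -h encoder=ffv1" for your FFmpeg version. The file names and the function name are hypothetical.

```python
import shlex


def ffv1_archive_command(source, target):
    """Build (but do not run) an FFmpeg command for FFV1 version 3 with slice CRCs."""
    return [
        "ffmpeg", "-i", source,
        "-c:v", "ffv1",
        "-level", "3",      # FFV1 version 3
        "-slicecrc", "1",   # store a CRC per slice so damage can be detected
        "-g", "1",          # every frame is a keyframe
        "-c:a", "copy",     # keep the (uncompressed PCM) audio as-is
        target,
    ]


print(shlex.join(ffv1_archive_command("capture.mov", "archive.mkv")))
```

To perform the transcode, the returned list can be passed to subprocess.run(cmd, check=True).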

H.264-lossless

H.264 is part of the MPEG-4 standard and also known as "MPEG-4 AVC" (Advanced Video Coding). The first version of the standard definition was completed in May 2003.

Due to its use in Blu-ray, cameras, mobile environments, the web, etc., it has become quite popular in recent years. H.264 is widely supported by different applications and devices from different vendors. Its most widely used encoder implementation is x264.

Lossless support

In spite of H.264's widespread application support, most implementations only support its lossy variant. Lossless H.264 video is rarely implemented and hardly tested.
As noted in FFmpeg's x264 encoding guide, interoperability is not guaranteed, even among different applications/devices using x264:

"if compatibility is an issue you should not use lossless [H.264]."

In performance tests, lossless H.264 was faster than some other codecs (Dirac, JPEG2000-lossless), but produced larger files than most other lossless codecs. It could, however, be used to capture SD material in real time.

Pros

Codec is standardized on paper.
Well known, established standard.
Well documented (format as well as use cases).

Cons

Faster than some other lossless codecs, but still slower than FFV1 version 3.
Not specialized for lossless. Therefore, most applications only support lossy.
Weak compression ratio, compared to other lossless codecs.

Video containers

There are many video container formats. The ones listed here are just a few of those currently most "talked about" in the archiving domain. Not all of the listed containers should actually be used for long-term preservation; some are listed here to counteract common misconceptions.

MXF (Material eXchange Format)

Technical specification


MOV (Quicktime)

Quicktime is a complex and powerful container format, which means that different applications/devices might implement only a certain subset of its features.


MKV (Matroska)

Matroska is an open standard for a free container format. It can hold an unlimited number of audio/video/picture/metadata streams. It was announced in 2002 and originated in the Open Source scene. Technically, Matroska is a very good container which offers additional features beyond just storing audio/video. It is based on the "Extensible Binary Meta Language" (EBML), a flexible, self-describing binary format, rather than a fixed binary layout. This allows Matroska to support new features and adapt to changing environments without breaking interoperability with existing applications.
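The following sketch shows why EBML is extensible. Every element consists of an ID, a size and a payload. ID and size are variable-length integers (VINTs), whose length is given by the number of leading zero bits in the first byte. Because each element announces its own size, a reader can skip elements it does not understand. This is a simplified sketch: top-level elements only, no "unknown size" elements, no schema. The example bytes are made up, except 0x1A45DFA3, the ID of the EBML header element. 0xEC is used here only as an arbitrary one-byte ID.

```python
def read_vint(buf, pos, keep_marker):
    """Read an EBML variable-length integer (VINT) starting at buf[pos].

    Returns (value, length_in_bytes). Element IDs are conventionally written
    with their length-marker bit kept (keep_marker=True); sizes drop it.
    """
    first = buf[pos]
    if first == 0:
        raise ValueError("VINT longer than 8 bytes")
    length, mask = 1, 0x80
    while not first & mask:  # each leading zero bit adds one byte
        length += 1
        mask >>= 1
    if pos + length > len(buf):
        raise ValueError("truncated VINT")
    value = first if keep_marker else first & (mask - 1)
    for byte in buf[pos + 1:pos + length]:
        value = (value << 8) | byte
    return value, length


def iter_elements(buf):
    """Yield (element_id, payload) for each top-level element in buf."""
    pos = 0
    while pos < len(buf):
        element_id, n = read_vint(buf, pos, keep_marker=True)
        pos += n
        size, n = read_vint(buf, pos, keep_marker=False)
        if size == (1 << (7 * n)) - 1:  # all value bits set means "unknown size"
            raise ValueError("elements of unknown size are not handled here")
        pos += n
        yield element_id, buf[pos:pos + size]
        pos += size  # jump over the payload, whether or not we understand it


data = bytes([0x1A, 0x45, 0xDF, 0xA3, 0x84, 1, 2, 3, 4,  # ID 0x1A45DFA3, size 4
              0xEC, 0x82, 0, 0])                         # 1-byte ID 0xEC, size 2
for element_id, payload in iter_elements(data):
    print(hex(element_id), payload)
```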

Due to its advanced features, good support and high interoperability, it became increasingly popular in the consumer video domain. MKV is also the basis of Google's WebM video format, a royalty-free, standardized subset of Matroska that is supported out of the box by the majority of web browsers today.

In recent years, the “non-production” industry has increasingly been implementing MKV support in software and hardware, such as digital TVs, set-top boxes, etc. Although Matroska's interoperability and sustainability are very good, the production industry has not yet picked it up, probably due to its origin in the Free Software domain and especially its use for distributing illegal copies on the Internet. For the time being, MOV and AVI are still supported by more professional production tools, and are therefore the easier choice at the moment. This could, of course, change, depending on the interest in MKV for professional use.

Within the archiving community, there seems to be increased interest in MKV as a container for long-term preservation. In 2014, an EU project called "PREFORMA" selected Matroska as a preferred format for storing video in a wide spectrum of memory institutions. In December 2014, the U.S. Library of Congress (LoC) released a comparison of video containers ("wrappers"), called "Digital File Formats for Videotape Reformatting", which includes MKV as a preservation option.

Altering/augmenting data in existing files

It seems that Matroska allows its embedded data to be arranged in a way that permits altering/augmenting metadata inside the container later on, without having to rewrite the whole file. This is a very important aspect when dealing with large preservation files, often several gigabytes in size.

Interoperability

Since Matroska's design and specification, including the reference implementation, were free and open from the very beginning, a large developer and user base implemented Matroska support in a variety of applications. As a result, interoperability issues surface in a wide variety of use cases and are usually resolved within a very short time.
Yet, Matroska is a complex and powerful container format, which means that different applications/devices might implement only a certain subset of its features.

Technical specification

The technical specifications for handling MKV files are easily available on the Internet, and freely accessible for everyone. Technical information is available on "matroska.org":

Pros

Supports storing many types of metadata.
Allows embedding of arbitrary files (XML, images, etc.).
Well known, established format.
Well documented (format as well as use cases).
Supports storing almost all audio/video codec streams.
Supports augmenting metadata without rewriting the whole file.
Supported by an archiving conformance checker (PREFORMA).

Cons

Not yet supported by certain video applications.
More complex than AVI.
Tainted image, due to its origin in the online video community.

AVI (Audio Video Interleaved)

AVI was introduced by Microsoft in November 1992 and is, along with MOV, one of the oldest video containers still widely in use today.
AVI files can contain both audio and video data for synchronous audio-with-video playback (see: AVI on Wikipedia). AVI is a derivative of RIFF, a generic file structure best known for its use in WAVE (.wav) files.
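A RIFF file is a sequence of chunks. Each chunk consists of a four-character code (FourCC), a 32-bit little-endian size and the data, padded to an even number of bytes. The sketch below lists the top-level chunks of a RIFF buffer such as an AVI file. It does not descend into nested LIST chunks, and the sample data is synthetic. The function name is our own.

```python
import struct


def parse_riff(buf):
    """Return (form_type, [(fourcc, data), ...]) for the top-level RIFF chunks."""
    if len(buf) < 12 or buf[:4] != b"RIFF":
        raise ValueError("not a RIFF file")
    (riff_size,) = struct.unpack_from("<I", buf, 4)  # counts the bytes after this field
    form_type = buf[8:12].decode("latin-1")          # e.g. "AVI " or "WAVE"
    end = min(8 + riff_size, len(buf))
    chunks, pos = [], 12
    while pos + 8 <= end:
        fourcc = buf[pos:pos + 4].decode("latin-1")
        (size,) = struct.unpack_from("<I", buf, pos + 4)
        chunks.append((fourcc, buf[pos + 8:pos + 8 + size]))
        pos += 8 + size + (size & 1)  # chunk data is padded to an even length
    return form_type, chunks


# Synthetic example: a "LIST" chunk (list type "hdrl" only) and an odd-sized "JUNK" chunk
body = (b"AVI "
        + b"LIST" + struct.pack("<I", 4) + b"hdrl"
        + b"JUNK" + struct.pack("<I", 3) + b"abc\x00")
riff = b"RIFF" + struct.pack("<I", len(body)) + body
print(parse_riff(riff))
```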

Almost every application that deals with video can handle AVI files, ranging from Free Software (Open Source) to proprietary tools, professional and consumer alike. AVI has a rather limited set of features, but this is also a main advantage for long-term archiving:
simple = robust.

Also: more features = more points of failure. In practice, we have had almost no interoperability issues using AVI across different tools from different vendors, even across different operating system platforms.

Well-known limitations of the AVI container format are that the original specification limited the file size to 2 GB, and that audio streams with a variable bitrate (VBR) are not officially supported in AVI and can lead to interoperability problems. The same applies to videos with a variable frame rate.

Filesize limitation

Originally, the AVI header could only support files up to 2 GB in size. This was solved by Matrox, who developed the "OpenDML" extensions for AVI in 1996.
In very rare cases, some applications might have issues with AVIs > 2GB in size.

Embedded metadata

AVI supports text metadata conforming to the "Exchangeable Image File Format" (EXIF). This set of metadata fields originates from its use for images.

Another reason why we store only the most vitally necessary metadata within the container is that, even with this simple set of metadata in such a popular container, most applications still don't support every available field. Nonetheless, compared to other video containers, this is still the most widely supported metadata, with the fewest interoperability issues.

Advanced features

Some archives want the container format to be able to store additional information, such as timecode, subtitles, still images, XML metadata, etc. AVI does not offer this functionality. Our experience with digital archive copies is that keeping everything in one video file increases the required complexity of the container, the video codec, or both.

It might look "simpler" to have just one file, but the choice of tools available to handle the embedded data is, by design, greatly reduced. In practice, this means that it can be harder (or even impossible) to view or edit the embedded data, especially if the programs used to create the file were rare or proprietary.

Technical specification

The technical specifications for handling AVI files are easily available on the Internet, and freely accessible for everyone. One source for example is Microsoft's Developer Network (MSDN):

NOTE: For uncompressed audio in AVI, the audio format is described by a WAVEFORMATEX structure, the same structure used in WAV files.
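This sketch packs the 18-byte WAVEFORMATEX structure for plain PCM audio with Python's struct module. It follows the field layout documented by Microsoft: wFormatTag, nChannels, nSamplesPerSec, nAvgBytesPerSec, nBlockAlign, wBitsPerSample, cbSize. It assumes a bit depth that is a multiple of 8. Microsoft recommends the WAVE_FORMAT_EXTENSIBLE variant for more than two channels or more than 16 bits per sample, which this sketch does not cover. The function name is our own.

```python
import struct

WAVE_FORMAT_PCM = 0x0001


def pcm_waveformatex(channels, sample_rate, bits_per_sample):
    """Pack a WAVEFORMATEX structure (18 bytes, little-endian) for plain PCM."""
    block_align = channels * bits_per_sample // 8  # bytes per sample frame
    avg_bytes_per_sec = sample_rate * block_align
    return struct.pack(
        "<HHIIHHH",
        WAVE_FORMAT_PCM,    # wFormatTag
        channels,           # nChannels
        sample_rate,        # nSamplesPerSec
        avg_bytes_per_sec,  # nAvgBytesPerSec
        block_align,        # nBlockAlign
        bits_per_sample,    # wBitsPerSample
        0,                  # cbSize: no extra format bytes
    )


fmt = pcm_waveformatex(channels=2, sample_rate=48000, bits_per_sample=24)
print(len(fmt), struct.unpack("<HHIIHHH", fmt))
```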

Pros

Supported by almost all applications handling video files.
Highly interoperable.
Minimalistic standard = robust.
Well known, established and stable standard.
Well documented (format as well as use cases).
Based on existing standards, such as RIFF (=WAV) and EXIF.

Cons

Does not support storing certain (time-based) metadata.

Codec comparison

This section shows a comparison of video codec speeds for encoding and decoding, as well as the resulting file sizes (in an AVI container). Others are highly encouraged to reproduce these tests, compare their results and provide feedback if possible.
All tests were run on the same hardware under the same conditions (see below).

All generated files were compared frame by frame with their original to verify that they are truly mathematically lossless. This was done using FFmpeg's "framemd5" checksum feature, which computes an MD5 hash for each decoded frame over all of its uncompressed pixel values and writes it as a hexadecimal number.
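A framemd5 comparison can be scripted. Each file's checksums can be written with a command such as "ffmpeg -i original.avi -an -f framemd5 original.framemd5", where "-an" skips audio so that only video frames are compared. Only the per-frame hashes are then compared, since timestamps may legitimately differ between containers. The parser below is a sketch that assumes FFmpeg's usual framemd5 text layout: comment lines start with "#", and each data line is comma-separated with the stream index first and the hash last. The sample hashes are fake.

```python
def read_framemd5(text):
    """Parse framemd5 output into {stream_index: [hash, hash, ...]}."""
    hashes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and blank lines
        fields = [field.strip() for field in line.split(",")]
        hashes.setdefault(int(fields[0]), []).append(fields[-1])
    return hashes


def is_lossless(original_text, transcoded_text):
    """True if every frame of every stream has an identical hash, in order."""
    return read_framemd5(original_text) == read_framemd5(transcoded_text)


original_log = """#tb 0: 1/25
0,          0,          0,        1,   829440, 3f6c4e2a1b0d9e8f7a6b5c4d3e2f1a0b
0,          1,          1,        1,   829440, 0123456789abcdef0123456789abcdef
"""
transcoded_log = original_log.replace("#tb 0: 1/25", "# written from the FFV1 copy")
damaged_log = original_log.replace("0123456789abcdef0123", "ffff456789abcdef0123")

print(is_lossless(original_log, transcoded_log))
print(is_lossless(original_log, damaged_log))
```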

More information can be found in Dave Rice's article, titled "Reconsidering the Checksum for Audiovisual Preservation: Detecting digital change in audiovisual data with decoders and checksums" in the IASA journal No.39 (June 2012).
Checksumming is also commonly used to verify the bit-exact integrity of whole files.
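Whole-file checksums can be computed with Python's hashlib. This sketch reads the file in chunks, so multi-gigabyte video files need not fit into memory. Unlike framemd5, such a checksum changes whenever any byte of the file changes, including container metadata, so it verifies the file rather than the decoded images. MD5 is used here only to match the method described above. The function name and the chunk size are our own choices.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Hex MD5 digest of a whole file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```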

Test setup

Hardware:
CPU:	Intel(R) QuadCore(TM) i7-2600K CPU @ 3.40GHz
RAM:	8 GB
Disk:	Intel SSDSA2CW080G3 (SSD)
Software:
Operating System:	GNU/Linux (Xubuntu 12.04.1, 64bit)
Transcoding tool:	FFmpeg (version git N-59183-g3e62654, Dec 17 2013)

NOTE: This is a consumer grade, off-the-shelf PC setup.

Results

Codec	Encoding	Decoding	Filesize	% of uncompressed	Implementation	Details
Video source file:	VQEG reference video "football" (NTSC-SD, 720x486px, 30fps, yuv422p, 8bpc)
Dirac	23 fps	29 fps	122 MiB	50.6%	libschroedinger	log, framemd5
FFV1 (version 1)	55 fps	285 fps	109 MiB	45.2%	libavcodec (FFmpeg)	log, framemd5
FFV1 (version 3)	216 fps	277 fps	111 MiB	46.1%	libavcodec (FFmpeg)	log, framemd5
H.264 lossless	94 fps	190 fps	118 MiB	49.0%	libx264	log, framemd5
JPEG2000 lossless	9.9 fps	51 fps	113 MiB	46.9%	libopenjpeg	log, framemd5

Codec	Encoding	Decoding	Filesize	% of uncompressed	Implementation	Details
Video source file:	SVT reference video "park joy" (full-HD/1080p, 1920x1080px, 50fps, yuv420p, 8bpc)
Dirac	4.6 fps	5.3 fps	942 MiB	61.3%	libschroedinger	log, framemd5
FFV1 (version 1)	11 fps	64 fps	874 MiB	56.9%	libavcodec (FFmpeg)	log, framemd5
FFV1 (version 3)	31 fps	63 fps	879 MiB	57.2%	libavcodec (FFmpeg)	log, framemd5
H.264 lossless	15 fps	31 fps	957 MiB	62.3%	libx264	log, framemd5
JPEG2000 lossless	1.8 fps	9.3 fps	888 MiB	57.8%	libopenjpeg	log, framemd5

Comparing video codecs and containers for archives
