Guidelines on File Formats for Transfer
1. Effective Date
These Guidelines have been approved by the Senior Director General and Chief Information Officer, Digital Services Sector, and take effect on March 4, 2025.
2. Application
These Guidelines provide advice on the file formats to be used when transferring digital material to Library and Archives Canada (LAC), specifically digital archival records, digital publications, and web resources.
These Guidelines apply to all persons and organizations transferring digital material to LAC (hereinafter referred to as "content providers").
These Guidelines do not contain information on creation, migration and capture standards. The Treasury Board Secretariat (TBS) has issued Guidance on Digital File Formats (2024), which addresses recommended file formats for the creation of digital material so that Government of Canada (GC) institutions comply with the requirements of the Policy on Service and Digital (2020), including Appendix J: Standard on Systems that Manage Information and Data (2022). LAC has also issued Digitization Guidelines that may be useful for GC institutions that are digitizing their analogue records.
These Guidelines do not give information on the generation of metadata during the record creation and management by GC institutions. See LAC's Operational Standard for Digital Archival Records' Metadata.
These Guidelines do not outline how to achieve the actual physical or electronic transfer of digital material. For this process, content providers are asked to discuss the logistics of the transfer with the LAC representative responsible for the transfer.
These Guidelines supersede the Guidelines on File Formats for Transferring Information Resources of Enduring Value (2014).
3. Definitions
See Appendix A.
4. Context
These Guidelines are part of LAC's Preservation Policy Framework (2022) and Policy on Digital Preservation (2024). The Framework and Policy mandate that digital material acquired and managed by LAC be accessible over time, and that consideration be given to digital preservation requirements and resource capacity. The sustainability of digital material should therefore be a consideration in all acquisition activities.
File formats are specific patterns or structures that organize and define data. Some formats contain only one stream of uncompressed data, others may contain codecs to encode and compress the data and others may support several streams of media.
In addition to file formats, there are also container or encapsulating formats. These formats can contain and support various types or layers of data and metadata. Each of these formats may be handled by different programs, processes, or hardware but for the data stream to be interpreted properly, the information must be wrapped together.
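As a rough illustration of what "specific patterns or structures" means in practice, many formats begin with a fixed byte signature, and container formats such as RIFF then declare the type of content they wrap. The sketch below is not an LAC tool or an LAC format list; the signature table and the function names are assumptions chosen for illustration, and it covers only a handful of well-known formats.

```python
from typing import Optional

# Leading byte signatures for a few well-known formats.
# Illustrative selection only; this is not a list taken from the Guidelines.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"RIFF", "RIFF container"),
]


def identify_by_signature(header: bytes) -> Optional[str]:
    """Return a format label if the header starts with a known signature."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return None


def riff_form_type(header: bytes) -> Optional[str]:
    """For a RIFF container, return the four-character form type (e.g. WAVE).

    A RIFF file starts with b"RIFF", a 4-byte size, then a 4-byte form type
    that names the kind of content wrapped inside the container.
    """
    if header[:4] == b"RIFF" and len(header) >= 12:
        return header[8:12].decode("ascii", errors="replace")
    return None


def identify_file(path: str) -> Optional[str]:
    """Read the first bytes of a file and identify it by signature."""
    with open(path, "rb") as handle:
        return identify_by_signature(handle.read(16))
```

A signature match only suggests a format. It does not show that the file is valid, nor which codec a container holds.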
The ability to preserve and use digital information is at risk if the computer hardware and software needed to access the information are no longer available or if the format specifications are not obtainable. The use of appropriate file formats is therefore critical to sustainable long-term preservation. Due to a mix of technical and practical issues, certain file formats are more suitable for preservation.
In accordance with sections 8 (2), and 10 of the Library and Archives of Canada Act, and section 2 (a) and (b) of the Legal Deposit of Publications Regulations, these Guidelines outline the appropriate file formats for submission to LAC of digital publications affected by Legal Deposit. While the Library and Archives of Canada Act section 10 (4) entitles LAC to collect all or any published versions and formats of a given title, LAC's preferred file formats for accessibility and preservation are defined within these Guidelines.
In accordance with sections 7, 12 and 13 of the Library and Archives of Canada Act, these Guidelines outline the appropriate digital formats that support any agreements between LAC and Government of Canada (GC) institutions for the transfer of digital archival records. Where such a transfer is governed by an existing records transfer agreement that specifies a digital format other than what is outlined in these Guidelines, GC institutions must consult with LAC prior to preparing the transfer.
These Guidelines will also apply to other acquisition agreements in which LAC representatives specify the file formats for transfer.
5. Purpose
These Guidelines limit the number and types of submittable file formats to those that LAC has reasonable confidence can be preserved and made accessible over time, thereby supporting the sustainability of preservation actions and long-term access.
Adherence to these Guidelines will allow LAC to achieve the following:
- Acquisition of file formats identified as being sustainable when feasible.
- Ensuring of long-term access to digital material in LAC's collection.
- Alignment with international good practice in digital preservation.
6. Approach
The following criteria are considered when evaluating the sustainability of a given format (Footnote 1):
Principle | Criteria |
Transparency and openness of format | The degree to which the format is proprietary vs. open and the degree to which the full file format specification is freely available. |
Adoption as a preservation standard | The extent to which the format has been formally adopted by national libraries, archives, and other memory institutions internationally. |
Stability and compatibility | The degree to which the format is backward and forward compatible and is protected against file corruption. The relative frequency of updated or replacement versions of the format over time. |
Format external dependencies and interoperability | The requirements for the use of the format. The degree to which the format relies on particular hardware or software. |
7. Instructions
These Guidelines identify broad content categories covering all digital material acquired by LAC and provide a listing of the "preferred" and "accepted" file formats for each category.
The file formats covered in this document have been divided into the following content categories (Footnote 2) and subcategories:
- Text
- Presentations
- Email
- Still images
  - Digital photographs
  - Scanned text
- Digital audio
- Digital moving images
  - Digital cinema
  - Digital video
- Geospatial
- Computer Aided Design
- Data sets
The transfer file formats are identified as either:
- Preferred for transfer; or
- Acceptable for transfer.
Preferred formats are those formats that are readily usable and have been identified by LAC as possessing a high degree of long-term sustainability. These formats require little or no immediate preservation action to achieve appropriate levels of preservation and to ensure the content remains accessible.
Acceptable formats are those that meet some but not all of the sustainability principles outlined in section 6. These formats may require LAC to perform some preservation actions on ingest to ensure their long-term sustainability.
As a general rule, LAC will only accept file formats for transfer that are listed in these Guidelines. Content providers are responsible for ensuring that digital material is in a preferred or acceptable file format at the time of transfer. LAC reserves the right to refuse any file that is not in a preferred or acceptable file format and to request the migration of the files to a preferred or acceptable format. Digital material may be exempted from compliance on a case-by-case basis after consultation with LAC representatives from the functional area responsible for acquisition and preservation.
8. Preferred and acceptable file formats for transfer
Please see Appendix C for a list of LAC's preferred and acceptable file formats for transfer.
File formats are organized by content category and listed as either acceptable or preferred. Formats are listed by name and include a reference to the relevant specification that defines appropriate encoding methods. Where required, the format category tables include a column that specifies the codec that must be used with each format. Content providers must submit files that comply with both the format and codec that are listed.
The formats in each section are organized alphabetically and do not imply an order of preference for any given format. LAC always prefers to receive a preferred file format over an acceptable file format if both exist; however, if only an acceptable file format exists, there is no need for the content provider to migrate the content to a preferred file format prior to transfer.
In some cases, the content provider must take additional steps to ensure that files are accessible for long-term preservation by:
- Deactivating file level encryption;
- Deactivating digital rights management technologies (Footnote 3);
- Embedding in each file all fonts necessary to interpret the information (Footnote 4);
- Providing a copy of special software and/or technical documentation needed to access the file (Footnote 5); and
- Providing metadata (Footnote 6), either embedded within the file itself or in an accompanying digital file.
9. Roles and responsibilities
Responsibility for administering and maintaining these Guidelines rests with the Director, Digital Collections Operations.
Directors and LAC staff involved in the acquisition and preservation of digital material are responsible for communicating, operationalizing, and facilitating understanding of these Guidelines for content providers.
Content providers are to follow these Guidelines and consult with LAC on any matters that may impede their ability to comply with these Guidelines.
10. Monitoring, evaluation and review
The Director, Digital Collections Operations Division, is responsible for maintaining these Guidelines, for monitoring their application, and for reporting on compliance.
Evaluation and review of these Guidelines will be undertaken every three (3) years by representatives of the branches responsible for acquisition and preservation, or earlier as required.
11. Consequences
While strongly recommended, compliance with these Guidelines remains optional unless otherwise stated. Non-compliance may adversely affect the sustainability, accessibility, and digital preservation of digital material transferred to LAC.
12. Information
Please address any questions about these Guidelines to:
Director, Digital Collections Operations Division
Digital Services Sector
Library and Archives Canada
550 de la Cité Boulevard
Gatineau, Québec
K1A 0N4
Appendix A: Definitions
- Acceptable format
-
A file format that meets some but not all of LAC’s sustainability principles. This format may require LAC to perform some preservation actions on ingest to ensure long-term sustainability.
- Access
-
Access occurs when clients can find, identify, view, obtain and use holdings.
- Accessible
-
Digital material is accessible when physical, technological and geographical barriers to the content are removed and when it can be used by as many people as possible.
- Acquisition
-
Acquisition is the process of adding publications and records to LAC’s documentary heritage collections. Acquisition occurs when LAC formally gains control over publications and records for their long-term preservation, and subsequently assumes responsibility for the management of their metadata and for their use by future generations. For clarity, documentary heritage acquired by LAC is Crown property.
- Bitmap
-
An image created from a series of bits and bytes that form pixels. Each pixel carries a value, stored in bits or bytes, that defines its colour or greyscale level. Such images are also known as raster images.
- Codec
-
Hardware or software capable of encoding and/or decoding a data stream for transmission. When used with digital audio or video, the term codec refers to the method used to encode the digital signal that is encapsulated in a wrapper.
- Container format
-
A format that can contain and support various types or layers of audio, video, still imagery and their associated metadata. For the data stream to be properly interpreted, the information must be encapsulated or wrapped together. The wrapper refers to a particular way of storing and synchronizing data content into a single file.
- Content providers
-
All persons and organizations transferring digital material to Library and Archives Canada, including publishers, Government of Canada institutions and private donors.
- Compression
-
The encoding of information using fewer bits than in the original. There are two forms of data compression – lossless and lossy. A lossless compression technique discards no information. It looks for more efficient ways to represent data, while making no compromises in accuracy. Lossy compression accepts some degradation in the data to achieve smaller file sizes. Because of this degradation in quality, lossy compression should be avoided.
- Computer Aided Design (CAD)
-
Vector programs used to create animations that represent two- and three-dimensional surfaces of inanimate objects. CAD and vector graphics programs can output binary and XML formats.
- Data sets
-
Data stored in defined fields such as databases and spreadsheets.
- Database formats
-
Organized collections of data that conform to a logical structure. Database formats are determined by data models that describe specific data structures used to model an application and generally include navigational, relational, and hybrid models.
- Digital audio
-
File formats that encode recorded sound as machine readable files by converting acoustic sound waves into digital signals. Digital audio formats are generally composed of both a wrapper format and an encoding method or codec. Audio file stream encodings are independent of the audio container file format.
- Digital cinema
-
Both born-digital cinematic productions and digital moving image files created by digitizing motion picture film.
- Digital material
-
A broad term encompassing digital surrogates created as a result of converting analogue materials to digital form (digitization), and "born digital" material for which there has never been and is never intended to be an analogue equivalent.
- Digital moving images
-
A sequence of bitmap digital images displayed in rapid succession at a constant rate, giving the appearance of movement. Digital moving image file formats function as containers or wrappers to provide storage areas for any moving image essence, associated audio essence (if present), as well as metadata. Moving image essence data contained within a given wrapper file format is encoded for playback using a specific codec. The parameters of the codec employed determine the presence and method of compression used to store the digital moving image data within the wrapper. This category includes two subcategories: digital cinema and digital video.
- Digital preservation
-
Digital preservation is all actions taken to slow deterioration of or prevent damage to the collections, and to ensure that their access, use and meaning, and their capacity to be accepted as evidence of what they purport to publish and record, are maintained over time.
- Digital photographs
-
Both still photographs produced by digital cameras as well as scanned images of photographic prints, slides, and negatives.
- Digital rights management technologies
-
Technologies to prevent unauthorized use or reproduction of digital content and devices.
- Digital video
-
Both born-digital video and digital files created by digitizing video from an analog source.
- Email
-
Electronic communication transmitted over the Simple Mail Transfer Protocol (SMTP) between two or more accounts. Email is composed of a header, message body and attachments. The header is structured metadata that establishes the provenance of the record. The following data must be present: sender name and address; names and addresses of all recipients; sent date; and received date. The message body is the intellectual content of the message. Attachments are any additional objects sent with the email.
- Encapsulating format
-
See container format.
- Encryption
-
The use of an algorithm to render a file unreadable. A decryption key is required to undo the work of the algorithm.
- End-of-record marker
-
The marker placed at the end of each record in a file. It varies with the operating system used to create the file. In a Mac OS environment, a carriage return (CR, ASCII code 0x0D) is placed at the end of a record. In a DOS or Windows OS environment, a CR followed by a line feed (LF, ASCII code 0x0A) is placed at the end. In UNIX, only an LF appears at the end.
- File format
-
Specific pattern or structure that organizes and defines data. Some formats contain only one stream of uncompressed data, others may contain codecs to encode and compress the data, and others may support several streams of media.
- Geospatial data
-
Geospatial data may be contained within a database to enable analysis across datasets (e.g. geo-database), united within a complex file format structure where one geospatial file comprises several distinct but related formats (e.g. shapefile), or contained within a single file (e.g. GML).
- Metadata
-
Information used to contextualize, manage, preserve and provide access to records.
- Migration
-
The movement of digital information from one software/hardware environment or storage medium to another as standards and technology evolve. Migration ensures continuity of the information contained in file formats over time.
- Preferred format
-
A file format that is readily usable and has been identified by LAC as possessing a high degree of long-term sustainability. This format requires little or no immediate management to achieve appropriate levels of preservation.
- Presentation format
-
A format that conveys graphical information to audiences as a slide show.
- Preservation
-
Preservation is all actions taken to slow deterioration of or prevent damage to the collections, and to ensure that their access, use and meaning, and their capacity to be accepted as evidence of what they purport to publish and record, are maintained over time.
- Raster image
-
See bitmap.
- Scanned text
-
A photograph of a printed page produced by either a digital camera or scanner.
- Spreadsheets
-
Tables made up of columns and rows that contain cells of data. Relationships between cells can be pre-defined as mathematical formulas.
- Still images
-
Files that are sampled and bitmapped as a grid of rectangular dots, picture elements (pixels) or points of color.
- Sustainability
-
Sustainability is the quality of meeting the needs of the collections and its current users without outstripping LAC’s resource capacity or compromising the needs of future users.
- Text
-
There are two general types of text: plain and formatted. Formatted text files contain encoded ASCII data and format definitions that display the information in a defined pattern. Plain text files contain encoded ASCII or Unicode data that has no formatting or layout code to influence the presentation of the data.
- Vector graphics
-
Digital images made up of objects that use the geometry of points, lines, curves and polygons to represent images.
- Wrapper
-
See container format.
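The distinction drawn in the Compression definition above can be shown with a small round trip: a lossless technique must return exactly the original bytes. This is a minimal sketch, assuming DEFLATE via Python's standard zlib module purely as an example of lossless compression. The Guidelines do not name this algorithm, and the function name is hypothetical.

```python
import zlib
from typing import Tuple


def lossless_round_trip(data: bytes) -> Tuple[int, int, bool]:
    """Compress and decompress data, reporting sizes and whether it survived intact.

    Returns (original size, compressed size, True if the restored bytes are
    identical to the input).
    """
    compressed = zlib.compress(data, level=9)
    restored = zlib.decompress(compressed)
    return len(data), len(compressed), restored == data


# Repetitive input compresses well, and no information is discarded.
original_size, compressed_size, identical = lossless_round_trip(
    b"Library and Archives Canada " * 100
)
```

A lossy codec would fail the identity check by design, which is why the definition advises against lossy compression.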
Appendix B: References
Library of Congress. Recommended Formats Statements. https://www.loc.gov/preservation/resources/rfs/. Accessed February 19, 2025.
Library of Congress. Sustainability of Digital Formats. http://www.digitalpreservation.gov/formats/. Accessed February 19, 2025.
National Archives and Records Administration. NARA Bulletin 2014-04 Format Guidance for the Transfer of Permanent Electronic Records. https://www.archives.gov/records-mgmt/bulletins/2014/2014-04.html. Accessed February 19, 2025.
National Archives (UK). File formats for transfer. https://www.nationalarchives.gov.uk/information-management/manage-information/digital-records-transfer/file-formats-transfer/. Accessed February 19, 2025.
Open Preservation Foundation. International Comparison of Recommended File Formats. https://openpreservation.org/resources/member-groups/international-comparison-of-recommended-file-formats/. Accessed February 19, 2025.
Smithsonian Institution Archives. Recommended Preservation Formats for Electronic Records. https://siarchives.si.edu/what-we-do/digital-curation/recommended-preservation-formats-electronic-records. Accessed February 19, 2025.
Appendix C: List of preferred and acceptable file formats for transfer
C.1 Text Formats
C.2 Presentation Formats
C.4 Formats for Still Images
This content category contains two subcategories: digital photographs and scanned text.
C.4.1 Digital Photographs
C.4.2 Scanned Text
C.5 Digital Audio Formats
C.6 Formats for Digital Moving Images
This content category contains two subcategories: digital cinema and digital video.
C.6.1 Digital Cinema
Acceptable Formats | Acceptable Codecs | Format Specifications |
Digital Cinema Package (DCP), unencrypted, Interop or SMPTE compliant | JPEG 2000 (as outlined by the DCI specifications) | Digital Cinema Initiatives, DCI Specification, DCSS Version 1.4.3, 2023 |
C.6.2 Digital Video
C.7 Geospatial Formats
Preferred Formats | Format Specifications |
Band Interleaved by Line (BIL) | BIL, BIP, and BSQ raster files |
Band Interleaved by Pixel (BIP) | BIL, BIP, and BSQ raster files |
Band Interleaved Sequential (BSQ) | BIL, BIP, and BSQ raster files |
Digital Elevation Model (DEM) | USGS - National Geospatial Data Standards - Digital Elevation Model Standards (archive.org) |
Environmental Systems Research Institute (ESRI) Arc/Info ASCII Grid | ESRI ASCII Raster Format: https://desktop.arcgis.com/en/arcmap/latest/manage-data/raster-and-images/esri-ascii-raster-format.htm and http://webhelp.esri.com/arcgisdesktop/9.1/index.cfm?id=886&pid=885&topicname=ASCII%20to%20Raster%20(Conversion) |
Environmental Systems Research Institute (ESRI) Shapefile (SHP) | ESRI Shapefile Technical Description |
GeoTIFF | GeoTIFF Format Specification, Version 1.8.2, Revision 1.0, 2000 |
Geography Markup Language (GML) | ISO 19136-1:2020, Geographic information — Geography Markup Language (GML) — Part 1: Fundamentals; Geography Markup Language - Open Geospatial Consortium (ogc.org) |
Keyhole Markup Language (KML) | KML - Open Geospatial Consortium (ogc.org) |
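A content provider preparing a geospatial transfer might triage files against the preferred formats listed above before contacting LAC. The sketch below is one hypothetical way to do that. The format names come from the table, but the mapping from file extensions to formats is an assumption based on common usage and is not part of the Guidelines.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Tuple

# Format names are from the preferred geospatial list; the extensions are
# assumed conventions, not something the Guidelines specify.
PREFERRED_GEOSPATIAL: Dict[str, str] = {
    ".bil": "Band Interleaved by Line (BIL)",
    ".bip": "Band Interleaved by Pixel (BIP)",
    ".bsq": "Band Interleaved Sequential (BSQ)",
    ".dem": "Digital Elevation Model (DEM)",
    ".asc": "ESRI Arc/Info ASCII Grid",
    ".shp": "ESRI Shapefile (SHP)",
    ".tif": "GeoTIFF",
    ".tiff": "GeoTIFF",
    ".gml": "Geography Markup Language (GML)",
    ".kml": "Keyhole Markup Language (KML)",
}


def preferred_geospatial_format(filename: str) -> Optional[str]:
    """Return the preferred format suggested by the extension, or None."""
    return PREFERRED_GEOSPATIAL.get(Path(filename).suffix.lower())


def triage(filenames: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split filenames into (likely preferred formats, files needing review)."""
    listed: Dict[str, str] = {}
    needs_review: List[str] = []
    for name in filenames:
        label = preferred_geospatial_format(name)
        if label is None:
            needs_review.append(name)
        else:
            listed[name] = label
    return listed, needs_review
```

An extension is only a hint. A `.tif` file, for example, may be a plain TIFF without georeferencing, so files flagged as likely preferred still need to be checked against the referenced specification.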
C.8 Computer Aided Design Formats
C.9 Formats for Data Sets
Tabular data from databases and spreadsheets must meet the following requirements:
- Each record must contain an end-of-record marker;
- Each field within a file must be defined with the same fixed width;
- Each record must be defined with the same logical record length;
- All fields within a record in a database, or tuples in a relational database, should have the same logical format;
- A record should not contain nested repeating groups of data;
- Every file must be accompanied by documentation that specifies the field names and the field definitions (Footnote 8).
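Several of the requirements above can be checked mechanically before transfer. The following is a minimal sketch, assuming a fixed-width file whose field widths are known from the accompanying documentation. It reads the fixed-width requirement as meaning that each field has the same width in every record. The function names and the sample layout are hypothetical, and the sketch covers only the end-of-record marker and record length checks.

```python
from typing import List


def detect_marker(data: bytes) -> bytes:
    """Guess the end-of-record marker: CR+LF, LF, or CR."""
    if b"\r\n" in data:
        return b"\r\n"
    if b"\n" in data:
        return b"\n"
    return b"\r"


def check_fixed_width(data: bytes, field_widths: List[int]) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: List[str] = []
    marker = detect_marker(data)
    if not data.endswith(marker):
        problems.append("last record has no end-of-record marker")
    record_length = sum(field_widths)
    records = data.split(marker)
    if records and records[-1] == b"":
        records.pop()  # empty piece that follows the final marker
    for number, record in enumerate(records, start=1):
        if len(record) != record_length:
            problems.append(
                f"record {number}: length {len(record)}, expected {record_length}"
            )
    return problems


def split_fields(record: bytes, field_widths: List[int]) -> List[bytes]:
    """Cut one record into its fixed-width fields."""
    fields: List[bytes] = []
    start = 0
    for width in field_widths:
        fields.append(record[start:start + width])
        start += width
    return fields


# Hypothetical layout: a 4-byte identifier followed by an 8-byte place name.
WIDTHS = [4, 8]
good = b"0001Ottawa  \r\n0002Gatineau\r\n"
bad = b"0001Ottawa\n0002Gatineau"
```

Here `check_fixed_width(good, WIDTHS)` returns an empty list, while the `bad` sample is reported for an unterminated final record and a short first record. Nested repeating groups and missing documentation cannot be detected this way and still require manual review.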