Digitization guidelines

Introduction

Many Government of Canada institutions are choosing to scan analogue records in order to save on physical storage space or make their records more accessible to users.

However, just scanning records is not enough. The goal is to create digitized records that are:

  • Authoritative;
  • Legally admissible in place of the original source records; and
  • Accessible for as long as required.

Library and Archives Canada strives to receive the best archival records, records that will serve as witness to the history of Canada and be accessible to the public over the long term. This includes digitized archival records.

Like original source records, authoritative digitized records provide evidence and serve as historical proof. However, the best technology and the highest resolution are of no benefit if the authority of the digitized record is not established. The International Organization for Standardization (ISO), in ISO 15489-1:2016 Information and documentation — Records management — Part 1: Concepts and principles, describes an authoritative record as one that has authenticity, reliability, integrity, and useability.

Producing authoritative records is not only about the technical specifications of digitization, but also about ensuring that the process of digitizing is documented and auditable.

The digitization process involves more than just capturing images. It also includes planning, assessing, preparing, digitizing, compiling metadata, running quality assurance checks, and storing and managing the digitized records. It is necessary to have in place policies and procedures, and to fully plan and document the digitization process. Each of the sections below relates to one aspect of the digitization process, and sets out best practices to follow and requirements to consider.

Purpose

These guidelines are written to help Government of Canada (GC) institutions ensure they have defensible digitization processes. This means that institutions should be able to demonstrate that a digitized record is a true and accurate version of the source record. It also means that the digitization process is documented in order to show that institutions have adhered to all requirements for producing an authoritative digitized record.

These guidelines set out best practices for digitization activities. For some institutions, digitization activities mean an occasional project to digitize records; for others, this means a formal digitization program. In both cases, decisions should be made regarding all considerations before digitization work begins. This is necessary to ensure the authenticity, reliability, integrity, and useability of every digitized record. Following these guidelines will ensure that institutions’ digitized records fulfill the same ongoing business needs, and meet the same future requirements, as the source records did.

These guidelines do not presume to be a comprehensive method for the digitization process. Rather, they are a starting point to guide GC institutions in establishing digitization programs and projects that will ensure authoritative digitized records. If your institution does not plan to dispose of the source originals (i.e., the scanned version will be an access copy used only for reference), these guidelines do not apply since the authoritativeness of the digitized version is not as critical when the original version still exists as the official copy.

Policy and planning

All GC institutions should have a broad internal policy in place to guide all digitization projects and programs. The fundamental policy principles should include the following:

  • A digitized record must be useable, have integrity, be deemed authentic and reliable, support all business activities, and be able to withstand legal scrutiny;
  • A digitized record must be generated under set policies and practices, be fully documented, and be maintained within an official corporate repository;
  • Disposal of source analogue records may be carried out only by authorization of the Librarian and Archivist of Canada (for example, through an institution-specific Disposition Authorization or MIDA 2018/013 Disposition Authorization for the Destruction of Source Records following Digitization).

The policy should state the purpose of digitization, specify when digitization is appropriate, and set the institution’s criteria for document selection. The policy should outline what criteria need to be set, approved, and documented for each digitization project.

Best practice is to create and implement a departmental procedures manual so that all digitization projects adhere to the same process. Additionally, a comprehensive plan should be put in writing for each digitization project. This includes an outline of the digitization plan, details regarding the project-specific benchmarks, and a list of the required approvals. As well, all steps taken must be documented.

It is important to distinguish between the policy, which will apply to all projects (i.e., your digitization program), and the requirements that will be specific to a particular project. For example, the policy should state that each digitization project must have set quality assurance criteria and that the criteria must be project-specific, based on business needs, and recorded in a digitization manual.

Document selection

Criteria for the selection of records suitable for digitization should be included in your institutional policy. Consider user requirements and document attributes when selecting documents.

Records are suitable for digitization if:

  • They are frequently used;
  • They are essential to the provision of services;
  • They are needed to action files through a workflow;
  • They are needed by multiple users;
  • The users are geographically dispersed;
  • The users require immediate access;
  • The original format makes it difficult to access the record (e.g., large map); and
  • The record is fragile. Scanning will allow access to a copy and will thus protect the fragile original from damage that can result from handling.

Some records will require more time and effort, and will be more costly, to scan: those in a very large or very small format; files containing a variety of documents; records needing extra preparation because they are stapled, bound, rolled, etc.; and fragile records requiring specialized handling and care.

Records NOT suitable for digitization include the following:

  • Transitory records;
  • Records with a short retention period, which may not warrant the cost of scanning; and
  • Records that have intrinsic value, or that have been identified as archival and that are to be transferred to LAC in their original state. Always confirm with your LAC archivist whether the records have intrinsic value before proceeding with digitization.

Outsourcing digitization

The choice of in-house or outsourced digitization will be based on many factors, including whether digitization will be a one-time project or an ongoing program requirement.

Outsourcing provides many benefits: institutions do not pay the up-front cost of technology; the budget is more contained; economies of scale can be achieved from the volume that vendors manage; and vendors may have expertise that the institution lacks.

However, when digitization is outsourced, consideration needs to be given to the security of the records during transport and digitization, and ongoing communication between the GC institution and the vendor is highly recommended throughout the process.

When choosing an outsourced vendor, other considerations include (but are not limited to): a vendor’s ability to scan according to digitization or evidence standards; use of appropriate technology; and ability to handle particular formats.

Additionally, the vendor should adhere to quality assurance practices, and provide certification of assurance for all digitization activities.

Public Services and Procurement Canada (PSPC) offers a document imaging service for all levels of government. Please consult the PSPC webpage for more information.

Roles and responsibilities

Governance should be defined for all digitization projects. Documenting approvals and accountability helps establish the authenticity and reliability of the record.

Digitization projects require a combination of skills from staff with different areas of expertise. Clear roles and responsibilities, well-defined reporting lines, and detailed communication plans will ensure that the project runs smoothly and that the final product is authentic and reliable.

Remember: Authorization to destroy source records once they have been digitized can be given only through LAC’s disposition authorizations (for example, an institution-specific Disposition Authorization or MIDA 2018/013 Disposition Authorization for the Destruction of Source Records following Digitization). Your LAC archivist can assist with related questions.

Risk assessment

GC institutions should assess the risks involved in digitizing (or not digitizing) records before they undertake any digitization work. Institutions should document the level of risk they are prepared to accept and specify any planned mitigation. The following are examples of risk:

  • Risk of destroying source records
    • In some cases, the digitized version of records may not be accepted as having evidentiary value. Digitized records need to be complete, accurate, reliable, accessible, and authentic, and must meet legal requirements. GC institutions should consider the following:
      • Whether it is feasible to digitize the source record so that it can have these characteristics;
      • Whether issues such as missing metadata or the inability to meet quality standards will affect the digitization process and render the digitized image unsatisfactory as a substitute for the original;
      • The extent of loss or degree of change in record characteristics between the source record and the digitized version that the institution finds acceptable. This determination must take into consideration the legal requirements applicable to the records.
    • Despite precautions, there is a risk that source records with intrinsic value could accidentally be destroyed.
    • There is a risk that institutions may destroy a record before quality assurance is performed.
    • There may be citizen expectations attached to the records. Citizens may want the original records preserved for their perceived intrinsic value (a value not shared by the institution or LAC). Examples may include old ledgers or old property deeds that are no longer valid.
  • Cost of digitization
    • Institutions should compare the cost of digitizing versus the cost of maintaining the source records.
    • It is important to consider the cost of continued maintenance of digital records over time as hardware and software become obsolete and records require migration.
  • Risks involved in not digitizing
    • Lack of efficiency: Digital records make collaboration easier and are more easily accessible, especially over geographical distance.
    • High cost of long-term analogue record storage.
    • Deterioration of source records over time. Fragile records may need to be digitized so that there can be an access copy and the original can thereby be protected.

Format, indexing, and metadata requirements

The final format of the digitized image should be determined on the basis of business needs and legal requirements. Annex A provides a chart of recommended technical requirements, including best practice choices for format. Records identified as archival will need to be in a format acceptable for transfer to Library and Archives Canada. Please see LAC’s Guidelines on File Formats for further information.

There are three types of digitization of paper records, each allowing a different degree of access:

  1. Page images—the digitized record is static. It cannot be changed, and its contents cannot be searched.
  2. Full text—the digitized record is transformed into machine-readable text through either manual keying or use of an Optical Character Recognition (OCR) program.
  3. Encoded text, or full text with mark-up—the digitized record has the same options as the full-text, with further annotations to increase search functionality.

The choice of format and the type of access needed will determine how much indexing is required or possible.

The type of digitization chosen for other formats will vary. For example, for spatial information you may use manual processes such as tablet digitizing or heads-up digitizing, or an automated process such as scanning and vectorization. Each institution should determine the best way to proceed with digitization based on the specific format of the record.

Indexing ensures that records will be reliable in the future, that records are accessible and retrievable, and that they are appropriately stored and managed throughout their lifecycle. Digitized records must be indexed; otherwise, they will not be found by users.

Indexing can occur at several points in the digitization process: image capture and recapture, quality assurance, and transfer into a designated corporate repository. All previous audits and metadata/indexing associated with the source system should be preserved so that the integrity of the record can always be established. The indexes created should be retained for at least as long as the records to which they relate.

Bibliographic indexing relates to the contents of the record and the management thereof; this should align with the metadata required for any electronic records. For further information about GC metadata requirements, please see the GC Standard on Metadata and LAC Minimum Metadata Set for Digital Archival Government Records.

Biographic indexing refers to the digitization process and needs to be captured at the time of digitization. Institutions should determine the biographic metadata they need.

Biographic metadata can include the following:

  • During capture (scan) (and re-capture if necessary)
    • Image reference (number of pages in original);
    • Digitization date and time;
    • Number of pages digitized;
    • Digitization equipment operator and device name;
    • Cross referencing information about the image;
  • During quality assurance
    • Batch reference;
    • Quality assurance operator;
    • Quality assurance check approval date;
  • Data transfer
    • Transfer date (date that image moves into repository);
    • Transfer title;
    • Transfer (method) description;
    • Transfer reason;
    • Transfer receiving (name of entity).
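
The biographic metadata listed above could be captured as one structured entry per digitized item. The following is a minimal sketch only: the class name, field names, and sample values are hypothetical and are not prescribed by these guidelines or by the GC Standard on Metadata.

```python
from dataclasses import dataclass, asdict
from datetime import date, datetime
from typing import Optional
import json


@dataclass
class BiographicMetadata:
    """Hypothetical record of the digitization process for one item."""
    # Capture (scan) stage
    image_reference: str
    pages_in_original: int
    pages_digitized: int
    digitized_at: datetime
    operator: str
    device_name: str
    cross_reference: Optional[str] = None
    # Quality assurance stage (filled in once the batch is checked)
    batch_reference: Optional[str] = None
    qa_operator: Optional[str] = None
    qa_approval_date: Optional[date] = None
    # Transfer stage (filled in when the image moves into the repository)
    transfer_date: Optional[date] = None
    transfer_title: Optional[str] = None
    transfer_method: Optional[str] = None
    transfer_reason: Optional[str] = None
    transfer_receiving_entity: Optional[str] = None

    def is_complete_capture(self) -> bool:
        """True when every page of the original was digitized."""
        return self.pages_digitized == self.pages_in_original

    def to_json(self) -> str:
        """Serialize for storage with the image; dates become ISO 8601 text."""
        return json.dumps(asdict(self), default=lambda v: v.isoformat(), indent=2)


# Illustrative values only.
entry = BiographicMetadata(
    image_reference="IMG-0001",
    pages_in_original=12,
    pages_digitized=12,
    digitized_at=datetime(2024, 1, 15, 9, 30),
    operator="operator-01",
    device_name="scanner-A",
)
print(entry.is_complete_capture())
print(entry.to_json())
```

Keeping the quality assurance and transfer fields optional lets the same entry be completed stage by stage rather than recreated.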

Preparation of records

The proper preparation of documents is necessary to ensure the highest-quality digitized images. The amount of preparation will depend on the condition and format of the documents being digitized. Typically, only basic preparation of source records is required to ensure efficient digitization processes. However, in some cases, such as for folded, rolled or fragile documents, or for damaged tapes, more extensive work may be required. In some circumstances (such as large format items requiring specialized equipment), it may be necessary to scan related records separately, and to ensure their original order is documented in the metadata.

Personnel who prepare the documents should identify potential issues with the documents so the person operating the scanner can make the appropriate adjustments.

Quality assurance

Quality assurance is the process of verifying whether the digitized record meets requirements. It involves checking the operation and output of digitization processes against agreed benchmarks to ensure that these benchmarks have been met and that the digitized images are acceptable as a substitute for the original record. The quality of the image is subjective, and the criteria for whether an image is acceptable should be determined prior to digitization and documented for each project. The required image quality is based on the purpose of the digitized record (i.e., whether the digitized record will serve as the official version of the record), though it is recommended to always scan to the highest possible quality to ensure that the record will be accepted as authentic.

As well as determining the criteria for an acceptable image, each project or program should determine:

  • How many errors are acceptable in a sample;
  • The sample size of the records to be examined; and
  • Who will perform that assessment.

Quality assurance activities should be logged, and each batch of digitized images should be certified as having passed quality control. It is recommended that someone other than the equipment operator perform the quality assessment.
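
The sampling decisions described above (sample size and acceptable number of errors) can be sketched as a simple batch inspection. This is an illustration under assumptions: the function name, the sample size of 20, and the acceptance number of 0 are placeholders, not values taken from these guidelines or from AIIM TR34. Real values should come from the project's documented criteria.

```python
import random
from typing import Callable, Optional, Sequence


def inspect_batch(
    image_ids: Sequence[str],
    has_error: Callable[[str], bool],
    sample_size: int,
    max_errors: int,
    seed: Optional[int] = None,
) -> dict:
    """Draw a random sample from a batch and decide whether the batch passes."""
    if sample_size > len(image_ids):
        raise ValueError("sample size cannot exceed batch size")
    # Recording the seed makes the drawn sample reproducible for audit purposes.
    rng = random.Random(seed)
    sample = rng.sample(list(image_ids), sample_size)
    failed = [image_id for image_id in sample if has_error(image_id)]
    return {
        "sample": sample,
        "failed": failed,
        "passed": len(failed) <= max_errors,
    }


# Hypothetical batch of 200 images; known_bad stands in for an inspector's findings.
batch = [f"IMG-{n:04d}" for n in range(1, 201)]
known_bad = {"IMG-0007", "IMG-0150"}
result = inspect_batch(batch, lambda i: i in known_bad, sample_size=20, max_errors=0, seed=42)
print(result["passed"], result["failed"])
```

The returned dictionary can be written to the quality assurance log so that the sample, the findings, and the pass or fail decision are all on record.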

The quality control process should at a minimum verify:

  • Image accuracy;
  • Metadata quality and accuracy; and
  • The completeness of the digitized version.

Errors can be major or minor, and include such things as the image being skewed, insufficient contrast, illegibility of characters, and speckle on the image. Quality control should be performed at several intervals throughout the digitization process: image capture, recapture, indexing, quality assurance and transfer of images. Quality control inspections should also be performed regularly on all the equipment being used to ensure proper function and calibration.

AIIM TR34-1996, Sampling Procedures for Inspection by Attributes of Images in Electronic Image Management (EIM) and Micrographics Systems provides further guidance on sample size and acceptable error ratio.

Classification and integration into the electronic document and records management system (EDRMS)

Classification ensures that the digitized images can be integrated into the file classification system and the corporate electronic document and records management system (EDRMS). The digitized records need to be identified with the file number and the retention and disposition information. These specifications should be aligned for analogue, digital and digitized records. It is the contents of a record, not the format, that determines retention and disposition, sensitivity, and access permissions, and the digitized versions should have the same characteristics as the source records.

For various reasons, derivative copies may be created along with the official version of the record. Derivative copies may include:

  • The same image prepared for different output intents (e.g., a compressed version suitable for posting online);
  • Versions with additional edits (e.g., contrast may be sharpened in a photo, specks may be removed from a textual document).

These should be clearly identified as copies through the use of naming conventions in the title of the file.
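
As an illustration of such a naming convention, the sketch below appends a suffix to the master file name. The suffixes and the sample file names are hypothetical; each institution would define its own convention.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical suffixes; an institution would define its own convention.
DERIVATIVE_SUFFIXES = {"web": "_copy-web", "enhanced": "_copy-enhanced"}


def derivative_name(master_filename: str, kind: str, extension: Optional[str] = None) -> str:
    """Build a file name that marks a derivative as a copy of the master image."""
    master = PurePath(master_filename)
    suffix = DERIVATIVE_SUFFIXES[kind]  # raises KeyError for an undefined kind of copy
    ext = extension if extension is not None else master.suffix
    return f"{master.stem}{suffix}{ext}"


print(derivative_name("1234-56_report.tif", "web", ".jpg"))   # 1234-56_report_copy-web.jpg
print(derivative_name("1234-56_report.tif", "enhanced"))      # 1234-56_report_copy-enhanced.tif
```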

Security requirements

Institutions should plan for both the physical security of the records during the digitization process and the security of the information. Records that have a sensitivity of Protected B or higher should be digitized in an environment that protects the records against unauthorized access, disclosure and removal.

Identification of protected and secret information should be added into the metadata of digitized records. It should be possible to apply the same access restrictions to the digitized image as were applied to the source record. If the source records are being destroyed, they should be destroyed in a manner consistent with their security level.

Follow any GC policy instruments and procedures regarding the security of electronic information.

Transfer of source records and digitized images

The digitization process necessitates physically moving records, either within the department itself for in-house digitization or to an off-site vendor. Departments should define procedures for the transfer of the source documents and digitized files to ensure that records remain secure and that their authenticity has been maintained during transfer.

For every transfer, document information such as the date of transfer, the name of the courier, the location, and the receipt of records. Document the same information for the return of the source records and digitized files.

Whatever method is used for transferring digitized records, it is important to ensure that the digitized files are not altered during transfer by using fixity information, such as a checksum.
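
A checksum-based fixity check can be sketched as follows. This is a minimal example, assuming SHA-256 as the checksum algorithm (these guidelines do not name one); the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> Dict[str, str]:
    """Record a checksum for every file under a folder before transfer."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def verify_manifest(folder: Path, manifest: Dict[str, str]) -> List[str]:
    """Return the relative paths that are missing or whose checksum changed."""
    problems = []
    for relative_path, expected in manifest.items():
        target = folder / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(relative_path)
    return problems
```

In this sketch, build_manifest would be run before the transfer and verify_manifest on receipt; an empty list means that no listed file was altered or lost. It does not detect files added during transfer, which would need a separate comparison.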

Storage and preservation

If active, records need to be stored in a designated corporate repository that meets all requirements for the management of the records throughout the full lifecycle.

If the digitized records are being put in dormant storage, departments should ensure that the storage solution, whether online or on physical carriers, has search and access capacity, can manage the records over the long term, including their disposition, and protects the authenticity of the records. Digitized records, like all other records, are subject to ATIP and litigation requests, even when in storage. When choosing storage solutions for dormant digitized records, consider the response times required by end users if access is needed.

The life expectancy of the technology may be much shorter than the required retention period of the records. As a result, the records will require active management and migration over time. Departments should plan for the necessary storage and a schedule for migration and/or conversion of the digital records as technology changes; this applies to both active and dormant records.

The costs associated with the medium- to long-term maintenance and accessibility of digital records, including for responding to ATIP requests, are often overlooked and should be included in planning. These costs may exceed the cost of physically storing the original source records, and should be considered as a factor when making the decision to digitize.

Disposition of source records

Source records may be kept after digitization. More frequently, however, they are destroyed. Remember: GC institutions may dispose of original source records only as prescribed in their institution-specific Disposition Authorization or MIDA 2018/013 (Disposition Authorization for the Destruction of Source Records following Digitization). Source records must not be destroyed if LAC has identified them as archival records that must be transferred in their original format. Always consult with your LAC archivist before disposition of archival source records.

There is a risk in destroying source records, even those records not identified as archival. Institutions should ensure that risk has been assessed and documented.

Failure to articulate a policy and procedures for the disposal of source records may give the appearance that these records have been disposed of in bad faith.

Some records may need to be maintained in their original format for legal reasons, even if the digitized version is sufficient for business needs. GC institutions should seek legal advice before destroying source records.

Original source records subject to a preservation order relating to litigation (including records that are scheduled for destruction) should not be destroyed while the preservation order is in place.

Before destroying source records, sufficient time should be allowed to ensure that quality control and indexing are fully completed and that the digitized records have been accurately and completely transferred to the EDRMS or secure storage.

As with all records, disposition actions should be fully documented.

If the source records are being kept, they should have the same retention and disposition actions as the digitized version and their accessibility and preservation over time should be managed.

Documentation requirements

Adequate documentation is key to defensibility. If the authenticity of the digitized record is challenged, institutional policy and procedures, together with evidence that they were followed, contribute to proving the authenticity of the records.

Most directives on digitization recommend the creation of a project manual for each digitization initiative, which should include all the necessary information, documented in one place. While institutional policy and procedures on digitization should outline criteria for making decisions on requirements, such as metadata, quality control, and format, the manual should document the specific choices made for the records in the current project.

The necessary documentation includes the following:

  • Reason for records selection;
  • Risk assessment and mitigation, including for destruction of source record;
  • Legal reviews;
  • Security and privacy protection assessment and management;
  • Internal approvals;
  • Project requirements for document preparation, metadata, format, technical specifications, error tolerance, sampling, and quality control standards;
  • Any enhancements should be documented (enhancements should never be applied to the master image; they are for use on derivative copies for researcher use only);
  • Activity and audit logs should be maintained during the digitization process, with a view to tracking the work being done and keeping a record of the technicians doing the work. Logs should contain sufficient information to provide evidence of the authenticity of the digitized records;
  • Quality control logs and reports for images, metadata and machinery;
  • Documentation of the destruction of source records;
  • For outsourced digitization, maintain all documentation relating to the vendor, including contracts, agreements, progress reports, monthly volume and costs, invoices, and error reporting;
  • Chain of custody logs for the transfer of both source records and digitized copies.
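
The activity and audit logs mentioned above could be kept as an append-only file with one timestamped entry per action. The sketch below is one possible approach, not a prescribed one; the JSON Lines layout, the field names, and the sample values are assumptions made for illustration.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import List


def log_activity(log_path: Path, operator: str, action: str, item: str, detail: str = "") -> dict:
    """Append one timestamped entry to a JSON Lines activity log and return it."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "operator": operator,
        "action": action,
        "item": item,
        "detail": detail,
    }
    # Mode "a" only ever adds to the end of the file; earlier entries are not rewritten.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def read_log(log_path: Path) -> List[dict]:
    """Load every entry from the log, oldest first."""
    with log_path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


if __name__ == "__main__":
    # Illustrative run in a temporary folder; operators and items are made up.
    log_file = Path(tempfile.mkdtemp()) / "digitization_log.jsonl"
    log_activity(log_file, "operator-01", "capture", "IMG-0001", "scanned 12 pages")
    log_activity(log_file, "qa-02", "quality-check", "IMG-0001", "passed")
    for logged in read_log(log_file):
        print(logged["timestamp"], logged["operator"], logged["action"], logged["item"])
```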

Conclusion

These guidelines were written to provide a high-level overview of best practices for digitization activities. Producing authoritative records is not only about the technical specifications of digitization; it is also about ensuring that the process of digitizing is documented and auditable for as long as needed for business, legal, and regulatory reasons.

Institutions are encouraged to consult the sources listed in Annex C for additional information. For clarification about the value of records, including archival value, please consult your LAC archivist.

Annex A: Technical specifications
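Textual documents – Black-and-white
  • Resolution: 300 ppi to 600 ppi; 4000 pixels across longest dimension
  • Scanning ratio: 1:1
  • Colour profile: greyscale
  • Bit depth: 8
  • Compression: lossless
  • Format: Tagged Image File Format (TIFF); PDF/A

Textual documents – Colour
  • Resolution: 300 ppi to 600 ppi; 4000 pixels across longest dimension
  • Scanning ratio: 1:1
  • Colour profile: red-green-blue (RGB)
  • Bit depth: 24
  • Compression: lossless
  • Format: TIFF; PDF/A

Photographs – Black-and-white
  • Resolution:
    • 35 mm: 2700 ppi
    • 4 x 5; 5 x 7: 800 ppi
    • 8 x 10: 400 ppi
    • 4000 pixels across longest dimension
  • Scanning ratio: 1:1
  • Colour profile: greyscale
  • Bit depth: 8
  • Compression: lossless
  • Format: TIFF

Photographs – Colour
  • Resolution:
    • 35 mm: 2700 ppi
    • 4 x 5; 5 x 7: 800 ppi
    • 8 x 10: 400 ppi
    • 4000 pixels across longest dimension
  • Scanning ratio: 1:1
  • Colour profile: RGB
  • Bit depth: 24
  • Compression: lossless
  • Format: TIFF

Maps, architectural plans, blueprints
  • Resolution: 300 ppi to 600 ppi; 6000 pixels to 8000 pixels across longest dimension
  • Scanning ratio: 1:1
  • Colour profile: greyscale or RGB
  • Bit depth: 16 (greyscale); 24 (RGB)
  • Compression: lossless
  • Format: TIFF; PDF/A; GeoTIFF

Microfilm and microfiches
  • Resolution: 300 ppi to 600 ppi
  • Scanning ratio: 1:1
  • Colour profile: greyscale
  • Bit depth: 8
  • Compression: lossless
  • Format: TIFF; PDF/A; JPEG 2000

Negatives – Black-and-white
  • Resolution:
    • 35 mm: 2400 ppi
    • 4 x 5; 5 x 7: 600 ppi
    • 8 x 10: 300 ppi
    • 3000 pixels across longest dimension
  • Scanning ratio: 1:1
  • Colour profile: greyscale
  • Bit depth: 8
  • Compression: lossless
  • Format: TIFF; PDF/A; JPEG 2000

Negatives – Colour
  • Resolution:
    • 35 mm: 2400 ppi
    • 4 x 5; 5 x 7: 600 ppi
    • 8 x 10: 300 ppi
    • 3000 pixels across longest dimension
  • Scanning ratio: 1:1
  • Colour profile: RGB
  • Bit depth: 24
  • Compression: lossless
  • Format: TIFF; PDF/A; JPEG 2000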
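
The resolution figures in Annex A combine pixels per inch (ppi) with a pixel count across the longest dimension. At a 1:1 scanning ratio the two are related by simple multiplication: pixels = length in inches × ppi. The short sketch below works through that arithmetic. It assumes that the print sizes (4 x 5, 5 x 7, 8 x 10) are in inches, which Annex A does not state, and the function names are hypothetical.

```python
def pixels_across(length_inches: float, ppi: int) -> int:
    """Pixels along one dimension when scanning at a 1:1 ratio."""
    return round(length_inches * ppi)


def meets_minimum(width_px: int, height_px: int, minimum_longest_px: int) -> bool:
    """Check a scanned image against a minimum pixel count on its longest side."""
    return max(width_px, height_px) >= minimum_longest_px


# Assumption: print sizes in Annex A are in inches.
print(pixels_across(10, 400))            # 8 x 10 photograph at 400 ppi -> 4000
print(pixels_across(7, 800))             # 5 x 7 photograph at 800 ppi -> 5600
print(pixels_across(10, 300))            # 8 x 10 negative at 300 ppi -> 3000
print(meets_minimum(3200, 4000, 4000))   # True
```

Under that assumption, an 8 x 10 photograph scanned at 400 ppi yields 4000 pixels on its longest side, and an 8 x 10 negative at 300 ppi yields 3000, which matches the pixel figures listed for those record types.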

Annex B: Terminology

Access copy: A file that captures the minimum amount of information in order to meet basic demands to view the informational content of a record. Source: Operational Standards for Digitization, Library and Archives Canada, Digital Operations and Preservation Branch.

Authenticity: An authentic record is one that can be proven to be what it purports to be; has been created or sent by the agent purported to have created or sent it; and was created or sent when purported. Source: International Organization for Standardization (2016). ISO 15489-1:2016 Information and documentation — Records management — Part 1: Concepts and principles.

Bibliographic Information: Information regarding the content and context of a document. It is created by the organization (possibly obtained from the Source Record) and aids in the retrieval of an image. Source: CAN/CGSB-72.11-93 Microfilm and Electronic Images as Documentary Evidence.

Biographic Information: Information regarding image capture that may include the date captured, the time, the operator identification, the capture device identification and location and details of modification, if any. Source: CAN/CGSB-72.11-93 Microfilm and Electronic Images as Documentary Evidence.

Digitization: The process of converting analog records into digital format. The process broadly includes: selection, assessment, prioritization, project management and tracking, preparation of originals for digitization, metadata creation, collection and management, digitizing (the creation of digital objects from physical originals), quality management, submission of digital resources to delivery systems and into a repository environment, and assessment and evaluation of the digitization effort. Source: Operational Standards for Digitization, Library and Archives Canada, Digital Operations and Preservation Branch.

Digitization project: Retrospective, back-capture of existing sets of non-digital records to enhance accessibility and maximize re-use. Note 1 to entry: In such projects, the business action has been completed on non-digital form of the record prior to digitization and for ongoing management purposes the non-digital record on which the business action took place, or which evidences the action, remains the official record of action. Note 2 to entry: The non-digital source records for both forms of digitization should be subject to an assessment process to determine whether there are good reasons to retain them prior to any consideration of disposition. Once non-digital records are converted into digital records, many of the management and preservation issues for born-digital records apply. Source: International Organization for Standardization (2010). ISO/TR 13028:2010 Information and documentation - Implementation guidelines for digitization of records.

File format: Specific pattern or structure, which organizes and defines data. Some formats contain only one stream of uncompressed data, others may contain codecs to encode and compress the data, and others still may support several streams of media. Source: Operational Standards for Digitization, Library and Archives Canada, Digital Operations and Preservation Branch.

Format: The arrangement of information. Source: Operational Standards for Digitization, Library and Archives Canada, Digital Operations and Preservation Branch.

Integrity: A record that has integrity is one that is complete and unaltered. Source: International Organization for Standardization (2016). ISO 15489-1:2016 Information and documentation — Records management — Part 1: Concepts and principles.

Intrinsic value: The usefulness or significance of a record derived from its physical or material qualities, inherent in its original form and generally independent of its content, that are integral to its nature and would be lost in reproduction. Intrinsic value is often associated to the rarity or age of the support as well as its artistic or esthetic quality. Source: Destruction of Source Records following Digitization, Library and Archives Canada, MIDA 2018/013.

Quality assurance: Part of quality management focused on providing confidence that quality requirements will be fulfilled. Source: International Organization for Standardization (2015). ISO 9000:2015 Quality management systems — Fundamentals and vocabulary.

Reliability: A reliable record is one whose contents can be trusted as a full and accurate representation of the transactions, activities or facts to which they attest, and which can be depended upon in the course of subsequent transactions or activities. Source: International Organization for Standardization (2016). ISO 15489-1:2016 Information and documentation — Records management — Part 1: Concepts and principles.

Source Record: A record from which a digitized version has been created. Source: Destruction of Source Records following Digitization. Library and Archives Canada, 2018/013.

Transitory records: Records that are not of business value. They may include records that serve solely as convenience copies of records held in a government institution repository, but do not include any records that are required to control, support, or document the delivery of programs, to carry out operations, to make decisions, or to provide evidence to account for the activities of government at any time. Source: Disposition Authorization for Transitory Records, Library and Archives Canada, 2016/001.

Useability: A useable record is one that can be located, retrieved, presented and interpreted within a time period deemed reasonable by stakeholders. Source: International Organization for Standardization (2016). ISO 15489-1:2016 Information and documentation — Records management — Part 1: Concepts and principles.

Annex C: Bibliography

Alberta. Digitization Process Standard (2015).

AIIM TR34-1996, Sampling Procedures for Inspection by Attributes of Images in Electronic Image Management (EIM) and Micrographics Systems.

ANSI/AIIM TR15-1997, Planning Considerations Addressing Preparation of Documents for Image Capture.

Archives of Manitoba. Digitizing Records (2018).

Canadian General Standards Board. CAN/CGSB-72.34-2017, National Standard of Canada. Electronic Records as Documentary Evidence. This standard is available at no cost from the Canadian General Standards Board.

Government of the Northwest Territories. Office of the Chief Information Officer (2018). Guideline – Digitization.

International Organization for Standardization (2010). ISO/TR 13028:2010 Information and documentation — Implementation guidelines for digitization of records.

International Organization for Standardization (2012). ISO 13008:2012 Information and documentation — Digital records conversion and migration process.

International Organization for Standardization (2016). ISO 15489-1:2016 Information and documentation — Records management — Part 1: Concepts and principles.

Library and Archives Canada. 2018/013 Destruction of Source Records following Digitization.

Library and Archives Canada. Operational Standards for Digitization. Library and Archives Canada, Digital Operations and Preservation Branch.

Newfoundland and Labrador. Office of the Chief Information Officer. Guideline – Record Imaging Services (2015).

Provincial Archives of New Brunswick. Digitization Standard (2019).