A new version of Collection search

We're giving Collection search a new search engine.

Online search tools are powered by a back-end technology called a search engine. We are switching the search engine that powers Collection search. This will help us fix known limitations with Collection search, improve the algorithm that delivers your search results, and offer new features and an improved interface. Learn what to expect as we make this change.

What to expect

A new look and feel

With the change in search engine, we are taking the opportunity to make improvements to the appearance and usability of Collection search in response to your feedback. These changes include:

  • a new approach to advanced search, with customizable keyword search fields
  • a search form that is separate from the search results, so you can easily modify your search criteria or start a new search
  • an improved item display, with new sections for Hierarchy and Finding aid at the top of the page

Two options for searching

For now, you can try the new Collection search or keep using the old one while we adjust the new version to make sure it works for you. We will not make any further changes or improvements to the old product. Our efforts are focused on implementing the new search engine and making the new Collection search work well for all researchers.

Please give your feedback on the new Collection search, so we can get it right.

A difference in your search results

More relevant results

The old search engine had a built-in dictionary of synonyms, so it returned extra results that matched synonyms of your search term. The new search engine returns only results that match the term you entered or words that share its stem.

We can create and update a dictionary of synonyms in the new search engine to better serve our context and our clients. For example, we can build a list of synonyms for Indigenous research terms.
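As a rough illustration of how a custom synonym dictionary can widen a search, here is a minimal sketch in Python. The dictionary entries and the `expand_query` function are hypothetical and are not taken from the Collection search configuration.

```python
# Hypothetical synonym dictionary; the entries are illustrative only.
SYNONYMS = {
    "railway": {"railroad"},
    "photo": {"photograph", "picture"},
}


def expand_query(terms, synonyms=SYNONYMS):
    """Return the query terms plus their synonyms, in order, without duplicates."""
    expanded = []
    for term in terms:
        key = term.lower()
        # Look at the term itself first, then its synonyms in alphabetical order.
        for candidate in [key, *sorted(synonyms.get(key, ()))]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query(["railway", "station"]))
# ['railway', 'railroad', 'station']
```

A term with no entry in the dictionary, such as "station" here, is searched as entered.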

Results that are not restricted based on accented characters

The new search engine treats accented characters neutrally and will not include or exclude results based on whether your search terms had accents.

For example, searching "Québec" (with an accent) returns the records that spell the word with an accent first, followed by the records that spell it "Quebec" (without the accent).
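The behaviour described above can be sketched in Python as follows. This is a simplified illustration that assumes matching is done on accent-stripped text and that records spelled exactly as typed are listed first. The `fold_accents` and `search` functions and the sample records are hypothetical, and the real engine's ranking is more involved.

```python
import unicodedata


def fold_accents(text):
    """Lowercase the text and strip combining accent marks (for example, é becomes e)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search(query, records):
    """Match on the accent-folded form, then list same-spelling matches first."""
    folded_query = fold_accents(query)
    matches = [r for r in records if folded_query in fold_accents(r).split()]
    # sorted() is stable: records containing the query as typed (key False) come first.
    return sorted(matches, key=lambda r: query.lower() not in r.lower().split())


records = ["Quebec City map", "Ville de Québec", "Montréal harbour"]
print(search("Québec", records))
# ['Ville de Québec', 'Quebec City map']
```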

Search features

The old search engine had search and access limitations. Here are some of the enhancements you can expect with the new search engine.

New search engine

  •   Up to 30,000 results per search
  •   5,000 results can be listed in the export
  •   A customizable algorithm
  •   Accents are treated neutrally
  •   Result count in "Limit to" is exact
  •   Unlimited number of "Limit to" filters can be applied
  •   We can customize the synonym dictionary
  •   No duplicate results
  •   Keywords can be highlighted in the search results
  •   All records searchable regardless of size

Old search engine

  •   Up to 5,000 browsable results per search
  •   3,000 results can be listed in the export
  •   An algorithm that we can't modify
  •   Accents returned inconsistent results
  •   Result count in "Limit to" was inaccurate
  •   Maximum of 100 "Limit to" filters could be applied
  •   Proprietary synonym list, with no ability to customize it
  •   Duplicate search results
  •   Unable to highlight keywords in the search results
  •   Some records could not be searched due to size

About the algorithm

The new search engine uses a ranking algorithm called Okapi Best Match 25 (BM25). It estimates which documents are most relevant to your search by combining three factors: how often your search terms appear in a document, how rare those terms are across the collection, and how long the document is.
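For readers who want to see how these three factors fit together, here is a minimal Python sketch of the standard BM25 scoring formula. It is not the code used by Collection search. The parameter values `k1=1.2` and `b=0.75` are common defaults assumed for illustration, and the sample documents are invented.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query with the standard BM25 formula.

    k1 and b are assumed defaults, not the Collection search settings.
    """
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Number of documents that contain each term (how rare the term is).
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        term_counts = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_counts[term]  # how often the term appears in this document
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Documents longer than average are penalized, shorter ones are boosted.
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    ["fishing", "boat", "fishing"],
    ["fishing", "licence", "records", "ottawa", "archives", "map"],
    ["railway", "map"],
]
print(bm25_scores(["fishing"], docs))
```

In this example the first document scores highest because "fishing" appears twice in a short document, the second scores lower because the term appears once in a longer document, and the third scores zero because it does not contain the term.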

Advantages

  • Ranks documents based on how terms are spread across the collection, making it flexible for different documents and queries.
  • Works well for longer queries because it handles term repetition and takes document length into account.

Disadvantages

  • Doesn't consider the meaning of the query terms or documents. Because the new search engine also treats accented characters neutrally, your search results will include words that look alike but have different meanings. For example, searching for the word "pêche" (the French word for fishing) will return records that contain the word "péché" (the French word for sin).
  • Doesn't personalize results, treating all user queries the same, so it may not deliver results tailored to individual users.

Stemming

Stemming is the process of reducing words to their root or base form, called the "stem." This helps improve the accuracy of search results by treating variations of a word as equivalent. For example, the words "running," "runner," and "ran" might all be reduced to the stem "run," or "fishing" and "fisherman" might be reduced to the stem "fish." It helps computers understand different forms of a word as the same thing, making it easier to search for information.
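To make the idea concrete, here is a deliberately naive stemmer in Python. It only strips a few English suffixes and is not the stemming approach used by the new search engine. The `toy_stem` and `group_by_stem` functions and the suffix list are hypothetical.

```python
# Suffixes are checked in this order; only the first match is removed.
SUFFIXES = ("ing", "ers", "er", "ed", "s")


def toy_stem(word):
    """Reduce a word to a rough stem by stripping one common English suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Collapse a doubled final consonant, for example "runn" becomes "run".
    if len(word) >= 3 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word


def group_by_stem(words):
    """Group words that share the same toy stem."""
    groups = {}
    for word in words:
        groups.setdefault(toy_stem(word), []).append(word)
    return groups


print(group_by_stem(["running", "runner", "fishing", "fished", "ran"]))
# {'run': ['running', 'runner'], 'fish': ['fishing', 'fished'], 'ran': ['ran']}
```

Note that a suffix-based approach like this one leaves the irregular form "ran" on its own. Handling such forms, and handling French as well as English, requires language-specific rules or dictionaries.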

The stemming approach of the old search engine was not publicly disclosed, and we couldn't adjust it to researchers' needs. With the new search engine, we know the stemming approach and have more control over its impact on your search results. One challenge we face is making sure the stemming approach works for records in both English and French.

What this means for your research: because the new search engine matches words that share the same root, you will likely get more results than in the old Collection search. With your continued feedback, we can adjust this approach to work better for you over time.