Item – Theses Canada
OCLC number
1057419802
Link(s) to full text
LAC copy
Author
Dallas, Fraser,
Title
Math information retrieval using a text search engine
Degree
M. Math -- University of Waterloo, 2018
Publisher
Waterloo, Ontario, Canada : University of Waterloo, 2018.
Description
1 online resource (xi, 67 pages) : illustrations (some colour)
Notes
"A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Mathematics in Computer Science."
Includes bibliographical references (pages 61-65).
Abstract
Combining text and mathematics when searching in a corpus with extensive mathematical notation remains an open problem. Recent results for math information retrieval systems on the math and text retrieval task at NTCIR-12, for example, show room for improvement, even though formula retrieval appears to be fairly successful. This thesis explores how to adapt the state-of-the-art BM25 text ranking method to work well when searching for math and text together. Symbol layout trees are used to represent math formulas, and features extracted from the trees are used as search terms for BM25. This thesis examines various features of symbol layout trees and their effects on retrieval performance. Based on the results, a set of features is recommended that can be used effectively in a conventional text-based retrieval engine. The feature set is validated using various NTCIR math-only benchmarks. Various proximity measures show that math and text are closer together in documents deemed relevant than in documents deemed non-relevant for NTCIR queries. It would therefore seem that proximity could improve ranking for math information retrieval systems when searching for both math and text. Nevertheless, two attempts to include proximity when scoring matches were unsuccessful in improving retrieval effectiveness. Finally, the BM25 ranking of both math and text using the feature set designed for formula retrieval is validated by various NTCIR math and text benchmarks.
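The idea in the abstract of treating formula features as ordinary search terms can be illustrated with a minimal sketch. It is not the thesis's implementation: the `m:`-prefixed tokens are hypothetical stand-ins for features extracted from symbol layout trees, the function name `bm25_scores` is invented for illustration, and the parameter values `k1=1.2` and `b=0.75` are common defaults rather than values taken from the thesis. The scoring itself is standard Okapi BM25 with the non-negative IDF variant used by Lucene.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document (a list of tokens) against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Non-negative IDF variant, as used by Lucene's BM25 similarity.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalisation.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical documents: plain words mixed with made-up math feature tokens.
docs = [
    ["pythagorean", "theorem", "m:a^2", "m:b^2", "m:c^2"],
    ["quadratic", "formula", "m:b^2", "m:4ac"],
    ["history", "of", "geometry"],
]
query = ["theorem", "m:a^2", "m:b^2"]
scores = bm25_scores(query, docs)
```

Because math features and words share one index and one scoring function, a document matching both the text and the formula parts of the query ranks above one that matches only a single formula feature, and a document with no matching terms scores zero.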
Other link(s)
hdl.handle.net
uwspace.uwaterloo.ca
Subject
MathML (Document markup language)
Information retrieval.
Text processing (Computer science)
Web search engines.
Mathematical notation -- Data processing.
MathML (Langage de balisage)
Recherche de l'information.
Traitement de texte.
Moteurs de recherche sur Internet.
Mathematics information retrieval
MIR
Mathematical content representation
MathML
Okapi BM25
Lucene
Date modified:
2022-09-01