Item – Theses Canada
OCLC number
1335713641
Link(s) to full text
LAC copy
Author
Steeg, Evan W.
Title
Automated motif discovery in protein structure prediction.
Degree
Ph.D. -- University of Toronto, 1997.
Publisher
[Toronto, Ontario] : University of Toronto, 1997
Description
1 online resource
Abstract
The protein structure prediction problem (PSP) is one of the central problems in molecular and structural biology. A computational method that could produce a correct detailed three-dimensional structural model for a protein, given its linear sequence of amino acids, would greatly accelerate progress in the biomedical sciences and industries. This thesis presents PSP as a combinatorial optimization problem, the most straightforward formulations of which require search of an exponentially-large conformation space and are known to be NP-Hard. This otherwise intractable search can in practice be reduced or eliminated through the discovery and use of motifs. Motifs are abstractions of observed patterns that encode structurally important relationships among constituent parts of a complex object like a protein tertiary structure. Motif discovery is accomplished by particular combinatorial search and statistical estimation methods. This thesis explores in detail two particular motif discovery subproblems, and discusses how their solutions can be applied to the overall structure prediction problem: (1) For a complex multi-stage prediction task, what makes a good intermediate representation language? We address this question by presenting and analyzing methods for the discovery of protein secondary structure classes that are more predictable from amino acid sequence than the standard classes of $\alpha$-helix, $\beta$-sheet, and "random coil". (2) Given a database of $M$ objects, each characterized by values $a_{ij} \in \mathcal{A}_j$ for each of $N$ discrete variables $\{c_j\}_{j=1}^{N}$, return the list of "most interesting" higher-order features $\gamma_l$, i.e., sets of $k_l$ variables with highest estimated correlation, for any $2 \le k_l \le N$. In the PSP context, the problem is the detection of correlations between amino acid residues in an aligned set of evolutionarily-related protein sequences.
We present and analyze a fast procedure, based on multinomial sampling and a novel coding scheme, that avoids the exhaustive search, prior limits on the order k, and exponentially large parameter space of other methods. The focus of this thesis is PSP, but the techniques and analysis are also aimed at wider application to other hard, multi-stage prediction problems.
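As a rough illustration of subproblem (2) at its lowest order (k = 2), the sketch below scores every pair of columns in a toy alignment by estimated mutual information and ranks the pairs. This is not the thesis's procedure: it is the kind of exhaustive pairwise baseline that the multinomial-sampling method is described as avoiding, and it does not extend to higher orders. The function names, the toy alignment, and the choice of mutual information as the correlation measure are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column_mutual_information(alignment, i, j):
    """Estimate mutual information (in bits) between alignment columns i and j.

    Uses plain frequency counts over the M aligned sequences, with no
    small-sample correction (an assumption made for brevity).
    """
    m = len(alignment)
    pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
    i_counts = Counter(seq[i] for seq in alignment)
    j_counts = Counter(seq[j] for seq in alignment)
    mi = 0.0
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / m
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log2(n_ab * m / (i_counts[a] * j_counts[b]))
    return mi


def rank_column_pairs(alignment):
    """Exhaustively score all column pairs (k = 2), highest score first."""
    n = len(alignment[0])
    scored = [
        ((i, j), column_mutual_information(alignment, i, j))
        for i, j in combinations(range(n), 2)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy alignment: columns 0 and 1 co-vary, column 2 varies
# independently of them, and column 3 is fully conserved.
toy_alignment = ["ACDA", "ACEA", "GTDA", "GTEA"]
ranking = rank_column_pairs(toy_alignment)
```

The exhaustive loop visits all N(N-1)/2 column pairs, and the number of candidate sets grows combinatorially with the order k, which is why a method that avoids exhaustive search and prior limits on k matters for the problem as stated in the abstract.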
Other link(s)
hdl.handle.net
www.collectionscanada.ca
tspace.library.utoronto.ca
Date modified:
2022-09-01