Thesis
Knowledge representation for curation and analysis of crop phenotypic data underpinning nutritional security
Southern Cross University
Doctor of Philosophy (PhD), Southern Cross University
2021
DOI:
https://doi.org/10.25918/thesis.157
Abstract
Achieving global and regional nutritional security presents many challenges, including maximizing
dietary inputs from crop plants. Comparison of nutritional composition between crops, cultivars
and associated plant genetic resources requires access to datasets that provide insights into genetic
and environmental sources of variation. At present, the ability to access and compare crop
phenotypic datasets is limited, particularly for traits relating to dietary nutrition. To date, the use of
common trait descriptors or formal systems of knowledge representation such as controlled
vocabularies or ontologies has been fragmentary.
The primary aim of this thesis is to describe the evaluation and development of ontologies that
represent information about crop phenotypic traits in compliance with the Findable, Accessible,
Interoperable, and Reusable (FAIR) principles. This provides the opportunity to compare
approaches in generating a generalised trait ontology applicable to many crops. Generic efforts to
standardize the description and management of crop trait data are critically reviewed, and several
issues are identified, such as syntactic and semantic inconsistencies in crop trait descriptions that
currently constrain their exchange and comparison.
Bambara groundnut (Vigna subterranea) was used as an exemplar underutilised minor crop to
assess available formal frameworks to curate trait datasets so that they are accessible for
comparative analysis. The challenges in assembling and using a crop-specific Trait Dictionary (TD)
are outlined, and the extent to which any Crop Ontology (CO) derives knowledge from existing
ontological definitions and relationships is quantified. Evaluation of syntactic and semantic
cohesiveness of trait descriptors within the Crop Ontology (CO) system helped to identify generic
issues limiting reuse of trait names for comparative analysis of major and minor crops. These
included inconsistencies in trait names assigned within the 28 published CO:TDs, a relative lack
of cross-referencing to other ontologies, and a flat ontological structure for classifying traits.
The Trait Dictionary (CO_366) was generated for bambara groundnut, and formal terms were
associated with trait descriptors and experimental datasets. These were curated to be compliant
with the Minimum Information About a Plant Phenotyping Experiment (MIAPPE) metadata
standard, and made available within the publicly accessible CropStoreDB database.
For the representation of dietary nutritional information, the Crop Dietary Nutrition Ontology
(CDNO) was first proposed and then developed to provide a more comprehensive vocabulary
compatible with terms and concepts widely used within national and international Food
Composition Databases (FCDBs). The CDNO represents a hierarchical structured vocabulary with
two major classes, ‘dietary nutritional components’ and ‘nutritional component concentration’,
developed by reusing terms from the Chemical Entities of Biological Interest (ChEBI) ontology, the
Environment Ontology (ENVO), the Phenotype and Trait Ontology (PATO), the Relation Ontology
(RO) and the Plant Ontology (PO). The reuse of terms from existing ontologies created terms
categorised as pre-composed (pre-coordinated), in line with the Open Biological and Biomedical
Ontologies (OBO) principles for establishment of logically well-formed, scientifically accurate and
interoperable controlled vocabularies. The CDNO is open, and new versions will be released
regularly, with updates tracked in the GitHub repository.
The CDNO involved collaboration and co-development of terms with the Food Ontology (FoodOn)
for the representation of organismal materials terms, so that they could be assigned in different
combinations as post-composed terms with the CDNO ‘nutritional component concentration’
terms. To demonstrate how the CDNO can assist in management and navigation of crop datasets
that quantify concentration of chemical components, specific use case implementations are
presented. The use case describes the association of existing trait descriptors for bambara groundnut
with CDNO terms incorporated into an axiom that reuses terms from FoodOn, PO and the NCBI
organismal classification ontology (NCBITaxon).
A more systematic and inclusive approach was proposed for a second-generation crop phenotypic
trait ontology system that would meet the challenges of data integration and downstream analysis,
by ensuring consistency in knowledge representation and implementation of machine-readable
axiomatic definitions. This would require establishing a universal, systematic and deeper set of
ontology sub-classes reusing terms from the Plant Trait Ontology (TO) and other relevant
ontologies.
In conclusion, generic systems of knowledge representation for crop plant phenotypes are expected
to facilitate curation of datasets according to FAIR principles. Systems such as CDNO should help
increase informed dialogue between crop scientists, breeders and other stakeholders in agricultural
production and food supply, including dieticians and nutritionists.
Details
- Title
- Knowledge representation for curation and analysis of crop phenotypic data underpinning nutritional security
- Creators
- Liliana Andrés Hernández
- Contributors
- Graham King (Supervisor) - Southern Cross University
- Ramil Mauleon (Supervisor) - Southern Cross University
- Awarding Institution
- Southern Cross University; Doctor of Philosophy (PhD)
- Theses
- Doctor of Philosophy (PhD), Southern Cross University
- Publisher
- Southern Cross University
- Number of pages
- xx, 231
- Identifiers
- 991012956600502368
- Copyright
- © Liliana Andrés Hernández 2021
- Academic Unit
- Faculty of Science and Engineering; Southern Cross Plant Science
- Resource Type
- Thesis