The “Unknome”: A Database of Human Genes We Know Almost Nothing About

Researchers from the UK have developed a publicly accessible database, the “unknome”, which lists thousands of understudied proteins encoded by human genes. By assigning a “knownness” score to each protein based on existing scientific knowledge, the platform aids researchers in exploring these proteins’ functions, many of which play critical roles in cellular processes.

Accelerating research by sharpening the focus on unknown proteins.

UK researchers have developed a new publicly accessible database, and they hope to see it shrink over time. That’s because it is a compendium of the thousands of understudied proteins encoded by genes in the human genome, whose existence is known but whose functions are mostly not.

The database, dubbed the “unknome,” is the work of Matthew Freeman of the Dunn School of Pathology, University of Oxford, England, and Sean Munro of the MRC Laboratory of Molecular Biology in Cambridge, England, and colleagues, and is described in the open access journal PLOS Biology. Their own investigations of a subset of proteins in the database reveal that a majority contribute to important cellular functions, including development and resilience to stress.

The sequencing of the human genome has made it clear that it encodes thousands of likely protein sequences whose identities and functions are still unknown. There are multiple reasons for this, including the tendency to focus scarce research dollars on already-known targets, and the lack of tools, including antibodies, to interrogate cells about the function of these proteins.

But the risks of ignoring these proteins are significant, the authors argue, since it is likely that some, perhaps many, play important roles in critical cell processes and may provide both insight and targets for therapeutic intervention.

To promote more rapid exploration of such proteins, the authors created the unknome database, which assigns to every protein a “knownness” score reflecting the information in the scientific literature about function, conservation across species, subcellular compartmentalization, and other elements.

Based on this system, there are many thousands of proteins whose knownness is near zero. Proteins from model organisms are included, along with those from the human genome. The database is open to all and is customizable, allowing the user to provide their own weights to different elements, thereby generating their own set of knownness scores to prioritize their own research.
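To make the idea of user-supplied weights concrete, here is a minimal Python sketch. It assumes the score is a simple weighted sum over evidence categories; the article does not give the database's actual formula, and the category names, counts, and weights below are hypothetical, invented only for illustration.

```python
from typing import Dict


def knownness(evidence_counts: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of evidence counts; categories without a weight count as 0.

    Assumption: a plain weighted sum. The real unknome scoring may differ.
    """
    return sum(
        weights.get(category, 0.0) * count
        for category, count in evidence_counts.items()
    )


# Hypothetical evidence for one protein (not real data).
protein_evidence = {"function": 2, "conservation": 1, "localization": 0}

# Two hypothetical weighting schemes a user might choose between.
default_weights = {"function": 1.0, "conservation": 0.5, "localization": 0.5}
function_only = {"function": 1.0}

default_score = knownness(protein_evidence, default_weights)
custom_score = knownness(protein_evidence, function_only)
print(default_score, custom_score)
```

Changing the weights changes the ranking, which is how a researcher could in principle tailor the list of least-known proteins to the kinds of evidence they care about.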

To test the utility of the database, the authors chose 260 genes in humans for which there were comparable genes in flies, and which had knownness scores of 1 or less in both species, indicating that almost nothing was known about them. For many of them, a complete knockout of the gene was incompatible with life in the fly; partial knockdowns or tissue-specific knockdowns led to the discovery that a large fraction contributed to essential functions influencing fertility, development, tissue growth, protein quality control, or stress resistance.
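The selection step described above (a knownness score of 1 or less in both human and fly) can be sketched as a simple filter. This is an illustrative sketch only, not the authors' pipeline; the gene labels and scores are hypothetical placeholders.

```python
from typing import Dict, List, Tuple


def select_candidates(
    scores: Dict[str, Tuple[float, float]], threshold: float = 1.0
) -> List[str]:
    """Return genes whose human and fly knownness are both <= threshold.

    `scores` maps a gene label to (human_score, fly_ortholog_score).
    """
    return sorted(
        gene
        for gene, (human, fly) in scores.items()
        if human <= threshold and fly <= threshold
    )


# Hypothetical gene labels and scores (not real data).
example_scores = {
    "GENE_A": (0.0, 0.5),   # poorly known in both species: selected
    "GENE_B": (0.5, 3.0),   # fly ortholog is well studied: excluded
    "GENE_C": (1.0, 1.0),   # exactly at the threshold: selected
}

candidates = select_candidates(example_scores)
print(candidates)
```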

The results suggest that, despite decades of detailed study, there are thousands of fly genes that remain to be understood at even the most basic level, and the same is clearly true for the human genome. “These uncharacterized genes have not deserved their neglect,” Munro said. “Our database provides a powerful, versatile, and efficient platform to identify and select important genes of unknown function for analysis, thereby accelerating the closure of the gap in biological knowledge that the unknome represents.”

Munro adds, “The role of thousands of human proteins remains unclear and yet research tends to focus on those that are already well understood. To help address this we created an Unknome database that ranks proteins based on how little is known about them, and then performed functional screens on a selection of these mystery proteins to demonstrate how ignorance can drive biological discovery.”

Reference: “Functional unknomics: Systematic screening of conserved genes of unknown function” by João J. Rocha, Satish Arcot Jayaram, Tim J. Stevens, Nadine Muschalik, Rajen D. Shah, Sahar Emran, Cristina Robles, Matthew Freeman and Sean Munro, 8 August 2023, PLOS Biology.
DOI: 10.1371/journal.pbio.3002222

This work was supported by the Medical Research Council, as part of United Kingdom Research and Innovation. RDS was funded by the Engineering and Physical Sciences Research Council and by the Alan Turing Institute through a Turing Fellowship. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Source: SciTechDaily