Over the past few years, the number of published osteoporosis studies has risen year by year. However, the abundant scientific information about previously identified osteoporosis-related genes is dispersed across thousands of publications. To address this need, we built the OsteoporosAtlas database 1.0 (http://biokb.ncpsb.org/OsteoporosAtlas/), an online database provided by the National Center for Protein Science (Beijing). It contains a comprehensive list of osteoporosis-related genes with their corresponding literature evidence, derived by text mining and manual curation.
To make the database easier to use, we provide three query approaches for searching OsteoporosAtlas: search by gene name, search by nucleotide sequence and search by protein sequence.
To query by gene name, users enter a gene name in the "Gene Symbol" search box and see a drop-down list of auto-completed gene symbols present in OsteoporosAtlas. After a symbol is selected and the 'Search' button is clicked, the search engine returns a table showing the queried gene and its supporting literature evidence. Clicking the 'Reset' button clears all current search terms.
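The auto-completion step can be pictured with a minimal sketch. It assumes case-insensitive prefix matching against the stored symbols; the function name `autocomplete` and the sample symbols are illustrative assumptions, not the site's actual code or data.

```python
def autocomplete(prefix, symbols, limit=10):
    """Return up to `limit` symbols that start with `prefix` (case-insensitive)."""
    needle = prefix.strip().lower()
    if not needle:
        return []
    matches = [s for s in symbols if s.lower().startswith(needle)]
    return sorted(matches)[:limit]


# Illustrative symbols only; the real list comes from the database.
SAMPLE_SYMBOLS = ["ESR1", "ESR2", "LRP5", "SOST", "TNFSF11", "TNFRSF11B"]

suggestions = autocomplete("tnf", SAMPLE_SYMBOLS)
print(suggestions)
```

Prefix matching is only one plausible choice; a drop-down could equally match anywhere inside the symbol.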
Users can also search for a gene by nucleotide sequence or protein sequence. The sequence identity score from BLAST is listed in parentheses after the description of each match. Users can then select the matched gene symbol and click “continue” to go to the result page.
By clicking the hyperlinked gene symbol of an individual osteoporosis-related gene in the 'Gene' column, users can see detailed annotation of the gene and cross-references to external databases (i.e. Ensembl, NCBI Gene, UniProtKB, neXtProt and Antibodypedia).
Users can also type words into the text box, or select a term from its drop-down list, to filter the listed genes or diseases in the results by substring match. In addition, clicking the small triangle in a column header re-sorts the table in ascending or descending order by that column.
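As a rough illustration of the filtering and sorting described above, the sketch below applies a case-insensitive substring filter and then sorts by one column. The row layout, column names and counts are made-up placeholders, not actual database content.

```python
def filter_rows(rows, column, text):
    """Keep rows whose `column` value contains `text` (case-insensitive)."""
    needle = text.lower()
    return [r for r in rows if needle in str(r[column]).lower()]


def sort_rows(rows, column, descending=False):
    """Return rows sorted by `column` in ascending or descending order."""
    return sorted(rows, key=lambda r: r[column], reverse=descending)


# Made-up example rows; the sentence counts are placeholders.
rows = [
    {"gene": "LRP5", "disease": "osteoporosis", "sentences": 12},
    {"gene": "SOST", "disease": "postmenopausal osteoporosis", "sentences": 30},
    {"gene": "ESR1", "disease": "age-related bone loss", "sentences": 7},
]

hits = filter_rows(rows, "disease", "osteoporosis")
ranked = sort_rows(hits, "sentences", descending=True)
print([r["gene"] for r in ranked])
```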
Clicking the number of evidence abstracts or sentences displays a table containing the gene, the disease, the PubMed ID, the evidence sentence and the manual validation information. Users can click on an evidence sentence to see the original abstract with the key words (i.e. gene symbols and disease terms) highlighted.
The "Browse & Download" page offers three ways to browse detailed information on osteoporosis-related genes: browse by encoding gene, browse by microRNA and browse by biomarker. All the information on osteoporosis-related genes and their supporting literature evidence can be downloaded for further analysis.
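Keyword highlighting of this kind can be sketched with a regular expression. The example assumes whole-word, case-insensitive matching and HTML `<mark>` tags as the highlight markup; both are assumptions for illustration, and the sentence is an invented example rather than a database record.

```python
import re


def highlight(text, terms, open_tag="<mark>", close_tag="</mark>"):
    """Wrap every whole-word occurrence of any term in highlight tags."""
    if not terms:
        return text
    # Try longer terms first so that "bone loss" is preferred over "bone".
    ordered = sorted(set(terms), key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    return pattern.sub(lambda m: f"{open_tag}{m.group(0)}{close_tag}", text)


sentence = "Mutations in LRP5 are associated with osteoporosis."
marked = highlight(sentence, ["LRP5", "osteoporosis"])
print(marked)
```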
(1) Data collection and curation
To obtain a complete list of publications on osteoporosis-related genes, we performed a comprehensive search for osteoporosis-related literature abstracts in PubMed. Gene names of candidate human osteoporosis-related genes were recognized in and extracted from these abstracts by our self-developed ontology-based bio-entity recognizer. A list of human osteoporosis-related genes, together with their evidence from PubMed abstracts, was compiled in the following three steps:
First, data collection: we collected 79,947 sentences in 7,358 abstracts containing the keywords “osteoporosis”, “osteoporoses”, “bone loss, age-related”, “age-related bone loss”, and “perimenopausal bone loss”.
Second, candidate data extraction: 1,656 osteoporosis-related candidate genes and 13,292 candidate evidence sentences were extracted using our self-developed ontology-based bio-entity recognizer.
Third, manual curation: all candidate genes and evidence sentences were manually curated by our experts, and 527 encoding genes and 131 miRNAs were finally confirmed as human osteoporosis-related genes. In addition, we integrated osteoporosis-related genes confirmed in DisGeNET, a comprehensive integrated platform with information on human disease-associated genes and variants. In total, our database consists of 617 osteoporosis-related encoding genes and 131 osteoporosis-regulated microRNAs.
The biomarker roles of certain genes/proteins were recognized and marked. All extracted osteoporosis-related genes/proteins and their biomarker roles were loaded into a MySQL database.
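The three steps above can be sketched in simplified form. This is not the ontology-based recognizer used for the database: a plain dictionary lookup of gene symbols stands in for it, the sentence splitter is naive, and all function names and sample texts are hypothetical.

```python
import re

# Keywords quoted from the data-collection step.
KEYWORDS = [
    "osteoporosis", "osteoporoses", "bone loss, age-related",
    "age-related bone loss", "perimenopausal bone loss",
]


def abstract_matches(abstract):
    """Step 1: keep an abstract if it contains any keyword."""
    lowered = abstract.lower()
    return any(k in lowered for k in KEYWORDS)


def split_sentences(abstract):
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", abstract) if s.strip()]


def candidate_evidence(abstracts, gene_symbols):
    """Step 2 (stand-in): pair each known gene symbol with its sentences."""
    if not gene_symbols:
        return []
    ordered = sorted(gene_symbols, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, ordered)) + r")\b")
    pairs = []
    for abstract in abstracts:
        if not abstract_matches(abstract):
            continue
        for sentence in split_sentences(abstract):
            for gene in sorted(set(pattern.findall(sentence))):
                pairs.append((gene, sentence))
    return pairs


def merge_gene_lists(curated, external):
    """Step 3 (after curation): union of curated genes and an external list."""
    return sorted(set(curated) | set(external))


# Invented sample abstracts, for illustration only.
abstracts = [
    "Osteoporosis is a common skeletal disorder. "
    "Variants in LRP5 influence bone mineral density.",
    "SOST is expressed by osteocytes.",
]
pairs = candidate_evidence(abstracts, {"LRP5", "SOST"})
merged = merge_gene_lists(["LRP5"], ["LRP5", "ESR1"])
print(pairs, merged)
```

The second abstract is skipped because it contains none of the keywords. In the real workflow, manual curation sits between the extraction and the merge.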
(2) Performance evaluation of the bio-entity recognizer
Entity recognition is usually evaluated with precision (the number of correct entities identified divided by the number of entities identified), recall (the number of correct entities identified divided by the number of entities in the sample) and F-measure (the harmonic mean of precision and recall).
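The three measures follow directly from these definitions. The sketch below computes them from a set of predicted entities and a set of gold-standard entities; the `(start, end, symbol)` tuples are an assumed representation with made-up values.

```python
def precision_recall_f(predicted, gold):
    """Return (precision, recall, F-measure) for two collections of entities."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Made-up annotations: (start offset, end offset, symbol).
gold = {(0, 4, "LRP5"), (10, 14, "SOST"), (20, 24, "ESR1"), (30, 37, "TNFSF11")}
predicted = {(0, 4, "LRP5"), (10, 14, "SOST"), (40, 44, "BMP2")}

p, r, f = precision_recall_f(predicted, gold)
print(p, r, f)
```

Here two of the three predicted entities are correct and two of the four gold entities are found, so precision is 2/3, recall is 1/2 and the F-measure is 4/7.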
The Colorado Richly Annotated Full Text (CRAFT 2.0) corpus, a linguistically annotated resource of 97 full-text biomedical publications, is a robust evaluation tool for revealing differences in the performance of biomedical natural language processing tools. The abstract data of the CRAFT 2.0 corpus were used as an independent test set to evaluate how well our bio-entity recognizer identifies genes/proteins based on the Protein Ontology. The precision, recall and F-measure of our bio-entity recognizer for identifying genes/proteins were 0.959, 0.802 and 0.874, respectively.
CRAFT 2.0-independent test result
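As a quick consistency check, the reported F-measure can be recomputed as the harmonic mean of the reported precision and recall. Only the two input values come from the text; the variable names are illustrative.

```python
precision, recall = 0.959, 0.802  # values reported for the CRAFT 2.0 test

# Harmonic mean of precision and recall.
f_measure = 2 * precision * recall / (precision + recall)
print(round(f_measure, 3))
```

Rounded to three decimals, the result agrees with the reported 0.874.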
Three rounds of strict manual curation of the candidate osteoporosis-related genes were carried out to guarantee data quality. First, two experienced researchers independently checked all candidate genes and supporting evidence. Second, the selected genes with their corresponding evidence were submitted to an internal review, in which each selected gene name was manually reviewed by a team of three experts. Third, all co-authors were asked to randomly check osteoporosis-related genes on our website to make sure that all genes stored in our database are highly credible osteoporosis-related genes. Finally, we obtained 617 osteoporosis-related encoding genes and 134 regulated microRNAs. All evidence sentences regarded as validated evidence have been manually curated and are shown at the top of each osteoporosis-related gene page.
Login is required only when users perform manual curation. Users can log in by clicking here.
We will monitor and validate community curation feedback before inclusion in the database.
Here we offer a trial account for the curation function (Account: "test@test.com", Password: "test").
Note: All the other functions of OsteoporosAtlas including data retrieval, browsing and downloading do not require any login or registration.
Because it is difficult to validate the large number of sentences in which osteoporosis-related genes and human disease terms co-occur, we leave part of the manual validation to logged-in users. We added a manual curation function to the sentences, with which users can give feedback by simply clicking the "Yes" or "No" button to confirm or reject an evidence sentence. In "Yes(1)", the number is how many times the sentence was considered "right"; in "No(0)", the number is how many times it was considered "error".
Functional role refers to the biological function of an osteoporosis-related gene in the molecular mechanism underlying the occurrence and development of osteoporosis.
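The "Yes"/"No" counters can be modelled with a small tally. This is a hypothetical sketch, not the site's implementation: the class name `SentenceVotes` is invented, and it assumes that each logged-in user has at most one vote per sentence and may change it later.

```python
class SentenceVotes:
    """Tally of Yes/No feedback for one evidence sentence."""

    def __init__(self):
        self._votes = {}  # user -> "yes" or "no" (assumed: one vote per user)

    def vote(self, user, choice):
        """Record or overwrite the vote of `user`."""
        if choice not in ("yes", "no"):
            raise ValueError("choice must be 'yes' or 'no'")
        self._votes[user] = choice

    def counts(self):
        """Return (number of yes votes, number of no votes)."""
        values = list(self._votes.values())
        return values.count("yes"), values.count("no")

    def label(self):
        """Format the counters in the style shown on the page."""
        yes, no = self.counts()
        return f"Yes({yes}) No({no})"


votes = SentenceVotes()
votes.vote("test@test.com", "yes")
label_text = votes.label()
print(label_text)
```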
A biomarker is a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention.
A drug target is a molecule in the body, usually a protein, that is intrinsically associated with a particular disease process and that could be addressed by a drug to produce a desired therapeutic effect.
The sentences or terms matching the key words for genes or for osteoporosis are extracted from PubMed abstracts by our bio-entity recognizer and highlighted in our database.
Gene names are the formal gene symbols and their synonyms provided by RefSeq.
Biological processes are the standardized biological process names and their common synonyms provided by Gene Ontology.
MicroRNA names are the formal microRNA symbols and their synonyms provided by miRBase.
Yes. A "Feedback" feature is provided on our website, with which users can manually submit new genes to our database; the database will be updated periodically. Users can also email us with further questions or about potential collaborations.