AAgAtlas is an online database provided by the National Center for Protein Science (Beijing). It contains a comprehensive list of human autoantigens with their corresponding literature evidence, derived by text mining and manual curation.
Two query approaches are provided for searching autoantigens: query by gene symbol and query by disease term.
For the query by gene symbol, the user can enter a gene symbol in the "Gene Symbol" search box, and a drop-down menu will offer auto-completed gene symbols present in AAgAtlas. After the user selects one of them and clicks the 'Search' button, the search engine returns a table showing the autoantigen gene, the related diseases and the supporting literature evidence. Clicking the 'Reset' button clears all current search terms.
For the query by disease term, the user can enter a disease term in the "Disease Term" search box, and a drop-down menu will offer auto-completed disease terms present in AAgAtlas. After the user selects one of them and clicks the 'Search' button, the search engine returns a table showing the autoantigen gene, the related diseases and the supporting literature evidence. Clicking the 'Reset' button clears all current search terms.
By clicking the hyperlink of the gene symbol for an individual autoantigen in the 'Gene' column, users can see the basic information of the autoantigen gene and the cross references to external databases (i.e. Ensembl, NCBI Gene, UniProtKB, neXtProt and Antibodypedia).
When the user clicks the small triangle in the header of a column, the results in the table are re-sorted in ascending or descending order by that column. The user can also type words into the text box, or select a term from its drop-down list, to filter the listed genes or diseases by sub-string match.
When the user clicks the number of the evidence abstracts or sentences, a table containing gene, disease, the PubMed ID, the evidence sentence and the manual validation information will be displayed. The user can click on the evidence to see the original abstract in which the key words, i.e. gene and disease names, are highlighted.
The complete list of human autoantigens or autoantigen-related diseases can be browsed by clicking the ‘Browse’ button in the navigation bar. All the information on autoantigen genes and their supporting literature evidence can be downloaded (red arrow) for further analysis.
Our data collection relies on text mining of autoantigen-related PubMed abstracts with subsequent manual validation. Bio-entity recognition and extraction of human autoantigen candidates from these abstracts were performed by a custom ontology-based bio-entity recognizer. Evaluated against the CRAFT corpus for gene/protein recognition based on the Protein Ontology (PR), our recognition tool achieves a precision of 0.810, a recall of 0.883 and an F-measure of 0.845, which is on par with current state-of-the-art biomedical annotation systems such as BeCAS [Nunes T, et al. Bioinformatics, 2013, 29(15): 1915.].
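As a quick check on how the three figures above relate, the sketch below recomputes the F-measure from the reported precision and recall. It assumes the reported F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the text does not state explicitly; the function name `f_measure` is only illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the F-measure reported for the CRAFT evaluation is F1.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the CRAFT evaluation.
score = f_measure(0.810, 0.883)
print(round(score, 3))  # 0.845
```

Under that assumption the recomputed value rounds to the reported 0.845.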
In detail, we compiled a list of human autoantigens together with their related diseases and evidence from PubMed abstracts in three steps. First, all PubMed abstracts were fetched through the NCBI E-utilities API, and the autoantigen-related abstracts were extracted by our custom bio-entity recognizer using the keywords 'autoantigen' or 'autoantibody' or their lexical variants such as 'auto-antigen', 'auto antigen', 'autoantigens', 'auto-antigens', 'auto antigens', 'auto-antibody', 'auto antibody', 'autoantibodies', 'auto-antibodies' or 'auto antibodies'. This yielded 45,830 autoantigen-related abstracts and 94,313 sentences. Second, a dictionary of human gene/protein names and synonyms was built by integrating all the HGNC-mapped terms and their synonyms in the Protein Ontology. Using this dictionary, our bio-entity recognizer recognized and extracted a list of human genes from the sentences. Gene symbols that co-occurred with the autoantigen keywords within a single sentence were considered candidate autoantigens. Third, all 3,984 candidate genes, drawn from 25,520 original abstracts and 43,253 evidence sentences in which a gene and a keyword co-occurred, were manually curated by five experienced researchers over three rounds, confirming 1,126 genes and 1,072 related diseases.
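The keyword matching and sentence-level co-occurrence in the first two steps can be sketched as follows. This is a minimal illustration, not the custom bio-entity recognizer itself: the regular expression, the naive sentence splitter, the function names, the toy abstract and the gene symbol `GENEX` are all assumptions made for the example, and the real pipeline matches against an ontology-derived dictionary with synonyms rather than plain symbols.

```python
import re

# Covers 'autoantigen' / 'autoantibody' and the lexical variants listed above
# (hyphenated, space-separated, plural), case-insensitively.
KEYWORD_RE = re.compile(r"\bauto[- ]?(?:antigens?|antibod(?:y|ies))\b", re.IGNORECASE)


def split_sentences(abstract: str) -> list[str]:
    """Naive splitter on sentence-final punctuation (an assumption for the sketch)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", abstract) if s.strip()]


def candidate_genes(abstract: str, gene_dictionary: set[str]) -> dict[str, list[str]]:
    """Map each gene symbol to the sentences where it co-occurs with a keyword."""
    hits: dict[str, list[str]] = {}
    for sentence in split_sentences(abstract):
        if not KEYWORD_RE.search(sentence):
            continue  # only keyword-bearing sentences can serve as evidence
        for symbol in sorted(gene_dictionary):
            # Whole-word, case-sensitive match on the gene symbol.
            if re.search(rf"\b{re.escape(symbol)}\b", sentence):
                hits.setdefault(symbol, []).append(sentence)
    return hits


# Toy abstract invented for illustration; GENEX is a hypothetical symbol.
toy_abstract = (
    "Autoantibodies against B2M were detected in patients. "
    "GENEX expression was unchanged."
)
print(candidate_genes(toy_abstract, {"B2M", "GENEX"}))
```

In the toy abstract only B2M shares a sentence with a keyword, so only B2M is returned as a candidate. Such candidates are exactly what the manual curation rounds then confirm or reject.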
To address this issue, we performed three rounds of strict manual curation of the autoantigens. First, all sentences with autoantigen names were checked and selected by two experienced researchers independently. Second, the selected sentences were submitted to an internal review, in which all autoantigen names were manually reviewed and approved one by one by a panel of three experts. Third, we asked all co-authors to randomly check autoantigens on our website to make sure that all genes loaded into our database are bona fide autoantigens. Finally, we obtained 1,126 genes and 1,072 related diseases. All the evidence sentences that have been manually curated are regarded as validated evidence and are shown at the top of each autoantigen page. In addition, we added a manual curation function to the evidence phrases, with which users can give feedback by simply clicking the "Yes" or "No" button to confirm or reject an evidence phrase. B2M is used as an example in the figure shown below:
Yes, as described above, we provide a manual annotation function at the end of each evidence phrase, with which the user can confirm or reject the evidence by simply clicking “Yes” or “No”. We will update our database periodically to incorporate user feedback.
In addition, we added a “Feedback” feature to our website, with which users can manually submit new genes to our database; the database will be updated periodically in the future.
Users can also email us with further questions or about potential collaborations.
Yes, users must log in to perform manual curation or to download data from our database. We will monitor and validate community curation feedback before including it in the database.