Functional annotation: Using the public protein databases TAIR10 (Reiser et al., 2017), TrEMBL, Swiss-Prot, COG, and nr, we annotated 13984, 23070, 9040, 3419, and 24003 genes, respectively. KEGG Orthology (KO) annotation of 7435 genes was performed with the online GhostKOALA tool (Kanehisa et al., 2016) from the KEGG database. Gene ontology (GO) terms were assigned to 19094 genes with InterProScan (Jones et al., 2014). In addition, 25346 functional domains were annotated with a local installation of PfamScan (El-Gebali et al., 2019).
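The per-database counts above could be tallied from each tool's output with a short script such as the one below. It is a minimal sketch, not the authors' pipeline, and it rests on several assumptions. First, the GhostKOALA result is taken as a two-column, tab-separated list of gene ID and KO, with the KO column empty for unassigned genes. Second, the PfamScan result is taken as pfam_scan.pl text output, where comment lines start with "#" and the sequence ID is the first whitespace-separated column. The function names `ko_assigned_genes`, `pfam_domain_hits`, and `summarize` are hypothetical.

```python
from collections import Counter


def ko_assigned_genes(ko_lines):
    """Return gene IDs that received a KO in a GhostKOALA-style list.

    Assumed layout (hypothetical): one gene per line, tab-separated,
    gene ID in column 1 and KO (possibly empty) in column 2.
    """
    genes = set()
    for line in ko_lines:
        fields = line.rstrip("\n").split("\t")
        # Keep the gene only if the KO column exists and is non-empty.
        if len(fields) >= 2 and fields[1].strip():
            genes.add(fields[0])
    return genes


def pfam_domain_hits(pfamscan_lines):
    """Count Pfam domain hits per gene in pfam_scan.pl-style output.

    Comment lines start with '#'; blank lines are skipped; the sequence
    ID is assumed to be the first whitespace-separated column.
    """
    hits = Counter()
    for line in pfamscan_lines:
        if not line.strip() or line.startswith("#"):
            continue
        hits[line.split()[0]] += 1
    return hits


def summarize(annotations):
    """annotations: mapping of database name -> set of annotated gene IDs.

    Returns per-database counts and the number of genes annotated by at
    least one database (the union of all sets).
    """
    per_db = {db: len(genes) for db, genes in annotations.items()}
    union = set().union(*annotations.values())
    return per_db, len(union)


if __name__ == "__main__":
    # Toy inputs with hypothetical gene and domain identifiers.
    ko = ["g1\tK00001", "g2\t", "g3\tK00002"]
    pfam = [
        "# pfam_scan.pl output (toy example)",
        "",
        "g1 1 50 1 52 PF00001.1 dom_A Family 1 50 268 40.1 1e-10 1 No_clan",
        "g1 60 120 58 122 PF00002.1 dom_B Domain 1 60 70 35.0 2e-08 1 No_clan",
        "g4 5 90 3 92 PF00001.1 dom_A Family 1 85 268 50.2 3e-14 1 No_clan",
    ]
    kegg_genes = ko_assigned_genes(ko)
    pfam_counts = pfam_domain_hits(pfam)
    per_db, n_any = summarize({"KEGG": kegg_genes, "Pfam": set(pfam_counts)})
    print(sorted(kegg_genes), sum(pfam_counts.values()), per_db, n_any)
```

With these toy inputs, g2 is dropped because its KO column is empty, and three domain hits fall on two genes. The union step shows how the per-database figures differ from the count of genes with any annotation.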

Gene Families: Using iTAK as described previously (Zheng et al., 2016), we predicted 1382 protein kinases (PKs), 422 transcription regulators (TRs), and 1382 transcription factors (TFs). To identify ubiquitin-system proteins, we searched L. japonica proteins with the hidden Markov model (HMM) profiles of conserved ubiquitin-related protein domains from UUCD (Gao et al., 2013). This search identified 1517 members of the ubiquitin protein family in L. japonica. We also identified 203 CYP450 genes and 328 genes encoding ethylene-responsive element binding factor-associated amphiphilic repression (EAR) motif-containing proteins. These were assigned from orthologous relationships between known CYP450 or EAR motif-containing proteins and L. japonica proteins.
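The two identification steps above, an HMM profile search and an orthology-based assignment, can be sketched as follows. This is a hedged illustration only, because the text names neither the search program nor the orthology criterion. The sketch assumes that the UUCD profiles were run with HMMER `hmmsearch --tblout`, where the full-sequence E-value is the fifth column. It also assumes an arbitrary E-value cutoff of 1e-5, which is a hypothetical value. For orthology, it swaps in reciprocal best hits (RBH) computed from BLAST tabular output (`-outfmt 6`, bit score in column 12). RBH is one common criterion and not necessarily the one used here. All function names are hypothetical.

```python
def hmmsearch_hits(tblout_lines, max_evalue=1e-5):
    """Map target sequence -> set of matching profiles from hmmsearch --tblout.

    Keeps hits whose full-sequence E-value (column 5) is <= max_evalue.
    The 1e-5 default is a hypothetical cutoff, not taken from the source.
    """
    hits = {}
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split()
        target, profile, evalue = cols[0], cols[2], float(cols[4])
        if evalue <= max_evalue:
            hits.setdefault(target, set()).add(profile)
    return hits


def best_hits(blast_lines):
    """Return query -> best-scoring subject from BLAST -outfmt 6 lines.

    Default columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore (bit score is column 12).
    """
    best = {}
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Pairs (a, b) where b is a's best hit and a is b's best hit.

    forward: L. japonica proteins searched against known family members;
    reverse: known family members searched against L. japonica proteins.
    """
    fwd, rev = best_hits(forward), best_hits(reverse)
    return {(a, b) for a, b in fwd.items() if rev.get(b) == a}


if __name__ == "__main__":
    # Toy hmmsearch --tblout lines (hypothetical IDs); only the first passes.
    tbl = [
        "# target name accession query name accession E-value ...",
        "Lj_g1 - UBQ PF00240.1 1.2e-30 105.3 0.1 1.5e-30 105.0 0.1 1.0 1 0 0 1 1 1 1 -",
        "Lj_g2 - UBQ PF00240.1 0.5 5.1 0.2 0.7 4.8 0.2 1.1 1 0 0 1 1 1 0 -",
    ]
    print(hmmsearch_hits(tbl))

    # Toy BLAST tabular lines: 12 tab-separated fields each.
    row = lambda q, s, bits: "\t".join([q, s, "80", "100", "20", "0",
                                        "1", "100", "1", "100", "1e-40", bits])
    fwd = [row("Lj1", "CYP_A", "200"), row("Lj1", "CYP_B", "90"),
           row("Lj2", "CYP_B", "150")]
    rev = [row("CYP_A", "Lj1", "200"), row("CYP_B", "Lj1", "160"),
           row("CYP_B", "Lj2", "150")]
    print(reciprocal_best_hits(fwd, rev))
```

In the toy data, Lj2's best hit is CYP_B, but CYP_B's best hit is Lj1, so only (Lj1, CYP_A) survives as a reciprocal pair. This one-to-one check is what distinguishes RBH from a simple best-hit assignment.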

References:
Reiser, L., Subramaniam, S., Li, D., and Huala, E. (2017). Using the Arabidopsis Information Resource (TAIR) to Find Information About Arabidopsis Genes. Current Protocols in Bioinformatics 60: 1.11.1-1.11.51.
Kanehisa, M., Sato, Y., and Morishima, K. (2016). BlastKOALA and GhostKOALA: KEGG Tools for Functional Characterization of Genome and Metagenome Sequences. Journal of Molecular Biology 428: 726-731.
Jones, P., Binns, D., Chang, H.Y., Fraser, M., Li, W.Z., McAnulla, C., McWilliam, H., Maslen, J., Mitchell, A., Nuka, G., Pesseat, S., Quinn, A.F., Sangrador-Vegas, A., Scheremetjew, M., Yong, S.Y., Lopez, R., and Hunter, S. (2014). InterProScan 5: genome-scale protein function classification. Bioinformatics 30: 1236-1240.
El-Gebali, S., Mistry, J., Bateman, A., Eddy, S.R., Luciani, A., Potter, S.C., Qureshi, M., Richardson, L.J., Salazar, G.A., Smart, A., Sonnhammer, E.L.L., Hirsh, L., Paladin, L., Piovesan, D., Tosatto, S.C.E., and Finn, R.D. (2019). The Pfam protein families database in 2019. Nucleic Acids Research 47: D427-D432.
Zheng, Y., Jiao, C., Sun, H., Rosli, H.G., Pombo, M.A., Zhang, P., Banf, M., Dai, X., Martin, G.B., Giovannoni, J.J., Zhao, P.X., Rhee, S.Y., and Fei, Z. (2016). iTAK: A Program for Genome-wide Prediction and Classification of Plant Transcription Factors, Transcriptional Regulators, and Protein Kinases. Molecular Plant 9: 1667-1670.
Gao, T., Liu, Z., Wang, Y., Cheng, H., Yang, Q., Guo, A., Ren, J., and Xue, Y. (2013). UUCD: a family-based database of ubiquitin and ubiquitin-like conjugation. Nucleic Acids Research 41: D445-D451.