Standard Information Model for NLP

This work was part of an “NLP standards” project initiated by the VA and Qing Zheng and carried out through multiple collaborations. The project includes an “ontology” of NLP and an information model. We developed the information model by combining two existing standards: the HL7 Clinical Document Architecture (CDA) and GrAF (Graph Annotation Format), an XML serialization of LAF (Linguistic Annotation Framework), the ISO 24612 standard. Together with the “ontology” of NLP, this information model eases the sharing and combining of annotated corpora and NLP applications. We called it CDA+GrAF. It can represent all kinds of text annotations and can serve as a pivot data model for exchanging and combining annotations.
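
To make the idea of a graph-based, stand-off annotation model more concrete, here is a minimal Python sketch. It is only an illustration in the spirit of GrAF (regions anchored in the text, nodes that point to regions, and annotations that attach a label and features to a node). All class and field names are hypothetical, and the sketch shows neither the actual CDA+GrAF schema nor how the annotations are embedded in a CDA document.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Region:
    """A span of the source text, as character offsets [start, end)."""
    id: str
    start: int
    end: int


@dataclass
class Node:
    """A graph node that points to one or more regions of the text."""
    id: str
    region_ids: list[str]


@dataclass
class Annotation:
    """A label plus feature/value pairs attached to a node."""
    id: str
    label: str
    node_id: str
    features: dict[str, str] = field(default_factory=dict)


@dataclass
class AnnotationGraph:
    """Stand-off annotations: the text itself is never modified."""
    text: str
    regions: dict[str, Region] = field(default_factory=dict)
    nodes: dict[str, Node] = field(default_factory=dict)
    annotations: list[Annotation] = field(default_factory=list)

    def annotate(self, start: int, end: int, label: str, **features: str) -> Annotation:
        """Create a region, a node linked to it, and an annotation on the node."""
        n = len(self.annotations) + 1
        region = Region(f"r{n}", start, end)
        node = Node(f"n{n}", [region.id])
        ann = Annotation(f"a{n}", label, node.id, dict(features))
        self.regions[region.id] = region
        self.nodes[node.id] = node
        self.annotations.append(ann)
        return ann

    def covered_text(self, ann: Annotation) -> str:
        """Return the text covered by the regions behind an annotation."""
        node = self.nodes[ann.node_id]
        parts = [self.text[self.regions[r].start:self.regions[r].end]
                 for r in node.region_ids]
        return " ".join(parts)


# Hypothetical example sentence and annotation, for illustration only.
graph = AnnotationGraph(text="Patient denies chest pain.")
problem = graph.annotate(15, 25, "Problem", negated="true")
print(graph.covered_text(problem))
```

Because annotations reference the text only through offsets, several annotation layers from different tools can coexist over the same document, which is what makes such a model usable as a pivot format.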

We also evaluated CDA+GrAF for the Shared Annotated Resources (ShARe) project, to ease sharing and enable interoperability. We tested it successfully with two data model conversion tools that we created, which convert annotations from the Knowtator tool to CDA+GrAF and vice versa.
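
The round-trip idea behind such conversion tools can be sketched as follows. This is a simplified illustration, not the converters described above: the input is a hypothetical flat list of span annotations standing in for a tool export (it is not the actual Knowtator file format), and the target is a small region/node/annotation structure rather than real CDA+GrAF XML. All function and key names are assumptions.

```python
def spans_to_graph(spans):
    """Convert flat span annotations into a stand-off graph structure."""
    graph = {"regions": {}, "nodes": {}, "annotations": []}
    for i, span in enumerate(spans, start=1):
        region_id, node_id = f"r{i}", f"n{i}"
        graph["regions"][region_id] = (span["start"], span["end"])
        graph["nodes"][node_id] = [region_id]
        graph["annotations"].append({
            "label": span["label"],
            "ref": node_id,
            "features": dict(span["features"]),
        })
    return graph


def graph_to_spans(graph):
    """Convert the stand-off graph back into flat span annotations."""
    spans = []
    for ann in graph["annotations"]:
        for region_id in graph["nodes"][ann["ref"]]:
            start, end = graph["regions"][region_id]
            spans.append({
                "start": start,
                "end": end,
                "label": ann["label"],
                "features": dict(ann["features"]),
            })
    return spans


# Hypothetical annotations over "Patient denies chest pain."
original = [
    {"start": 8, "end": 14, "label": "NegationCue", "features": {}},
    {"start": 15, "end": 25, "label": "Problem", "features": {"negated": "true"}},
]

round_tripped = graph_to_spans(spans_to_graph(original))
print(round_tripped == original)
```

Checking that a round trip returns the original annotations unchanged is one simple way to test that a pair of converters loses no information.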



  • Meystre, S. M., Boonsirisumpun, N., Elhadad, N., Savova, G. K., & Chapman, W. W. (2014). Standards-Based Data Model for Clinical Documents and Information in the Shared Annotated Resources (ShARe) Project (pp. 1–1). AMIA Summits Transl Sci Proc, CRI.
  • Meystre, S. M., Lee, S., Jung, C. Y., & Chevrier, R. D. (2012). Common data model for natural language processing based on two existing standard information models: CDA+GrAF. Journal of Biomedical Informatics, 45(4), 703–710.