
DocLang

DocLang is the AI-native markup format for unstructured content, including documents, images, and more. It maps cleanly to LLM tokens while preserving structure, semantics, layout, and geometry in a single, unambiguous representation.

This repository is the home of the normative specification and the reference toolkit for DocLang. If you build with LLMs and VLMs on real-world content, this is where the standard lives.

Specification

The specification source is in spec.md. Exports to other formats are in the exports/ directory.

Reference Toolkit

The commands below illustrate basic scenarios. For advanced installation and usage options (minimal install, platform notes, custom Schematron backends, Python API), see the toolkit README.

Installation

pip install "doclang[schematron-saxon]"

Validation

doclang validate -n my_document.dclg

Packaging

doclang pack my_document.dclg
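To validate and pack many documents in one pass, you can script the two commands above. The following is a minimal Python sketch and is not part of the DocLang toolkit. It calls the CLI instead of the toolkit's Python API, which is described in the toolkit README. The names `validate_and_pack`, `validate_cmd`, `pack_cmd`, and `run_cli` are hypothetical. The sketch assumes that the `doclang` command is on your PATH and that `doclang validate` exits with a non-zero status when a document is invalid. The `-n` flag is copied unchanged from the validation example.

```python
"""Hypothetical batch helper: validate every .dclg file in a folder, then pack the valid ones.

Assumptions (not confirmed by the DocLang docs): the `doclang` CLI is on PATH,
and `doclang validate` returns a non-zero exit code for invalid documents.
"""
import subprocess
import sys
from pathlib import Path
from typing import Callable, List, Sequence

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def run_cli(argv: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def validate_cmd(doc: Path) -> List[str]:
    # Mirrors the README example: doclang validate -n my_document.dclg
    return ["doclang", "validate", "-n", str(doc)]


def pack_cmd(doc: Path) -> List[str]:
    # Mirrors the README example: doclang pack my_document.dclg
    return ["doclang", "pack", str(doc)]


def validate_and_pack(folder: Path, runner: Runner = run_cli) -> List[Path]:
    """Validate each *.dclg file in `folder` and pack only those that pass.

    Returns the documents whose pack command succeeded.
    """
    packed: List[Path] = []
    for doc in sorted(folder.glob("*.dclg")):
        if runner(validate_cmd(doc)) != 0:
            print(f"invalid, skipping: {doc}", file=sys.stderr)
            continue
        if runner(pack_cmd(doc)) == 0:
            packed.append(doc)
        else:
            print(f"pack failed: {doc}", file=sys.stderr)
    return packed
```

For example, `validate_and_pack(Path("docs"))` would check every `.dclg` file in `docs/` and return the files it packed. Because the command runner is a parameter, you can pass in a dry-run function that only prints each argv list.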

Citation

If you use DocLang in academic or technical work, please cite the specification:

@misc{doclang_2026,
  title        = {DocLang: Universal AI Document Format},
  author       = {{DocLang Project}},
  year         = {2026},
  version      = {main},
  howpublished = {\url{https://github.com/doclang-project/doclang}},
}

Development

To work on this repository (setup, tests, reference generation, and releases), see CONTRIBUTING.md.

We ❤️ Open Source AI

DocLang is developed in the open and supported by the LF AI & Data Foundation. Learn more about the project at doclang-project.

License

DocLang is licensed under the Apache License 2.0. See LICENSE for details.
