Own Your Data: Unlocking Documents with Docling
Docling is rapidly becoming the de facto standard in open-source document AI. The project has achieved remarkable adoption, with over 45K GitHub stars, more than 1.5 million monthly downloads, and multiple top rankings on the global GitHub and HuggingFace trending leaderboards. Incubated as a Linux Foundation AI & Data project, Docling provides local-first, enterprise-grade capabilities: it excels at parsing complex layouts, extracting tables, and converting unstructured documents into AI-ready structured formats.
In this hands-on session, you'll get a chance to:
- ingest and parse multiple document formats, including PDF, DOCX and more
- convert complex tables into usable formats
- extract and prepare images for AI processing
- preserve metadata for visual grounding
- explore AI integration with frameworks like LangChain to power RAG and model training
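
As a rough preview of the first step in the list above, here is a minimal sketch of converting a local file to Markdown with Docling's Python API. It assumes the `docling` package is installed. The `SUFFIXES` allow-list and the `pick_documents` helper are hypothetical names introduced only for this illustration, and the file name in the usage note is a placeholder; the session's own exercises may be organized differently.

```python
from pathlib import Path

# Hypothetical allow-list for this sketch only; Docling handles more formats.
SUFFIXES = {".pdf", ".docx"}


def pick_documents(paths):
    """Keep only the paths whose suffix is in SUFFIXES (case-insensitive)."""
    return [p for p in paths if Path(p).suffix.lower() in SUFFIXES]


def convert_to_markdown(source):
    """Convert one document with Docling and return it as Markdown text."""
    # Imported lazily so pick_documents works even without Docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)  # accepts a local path or a URL
    return result.document.export_to_markdown()
```

A call such as `convert_to_markdown("report.pdf")` would return the parsed document as a Markdown string, which could then be chunked and passed to a RAG pipeline. The first conversion may take a while, since Docling typically downloads its layout and table models before running.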