Markdown and Plain Text Chunking Tool

Reverse-engineered prompt

Build me a small Python tool that takes either a Markdown file or plain text and turns it into clean chunk objects I can hand off to an indexing step later.

For Markdown, I want it to understand heading structure, ignore hash signs inside fenced code blocks, and build a document tree with a root, nested sections, and chunk nodes under the section they belong to. For plain text, I want a simpler flat document where the root directly contains the chunks. Each chunk should include its text plus an embedding, and the whole tree should be easy to flatten into JSON-shaped records.
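Here is one way to read the Markdown and plain-text requirements, sketched in Python. It makes a few assumptions:

- Only ATX headings (`#` through `######`) are recognized.
- Fences are runs of three or more backticks or tildes.
- Each section's body text becomes a single chunk.
- Plain text is split on blank lines.

The names `Node`, `parse_markdown`, `parse_plain`, `walk`, `flatten` and `count_kinds` are hypothetical, as are the record fields `id`, `section_path`, `text` and `embedding`. The `embedding` field stays `None` until a real embedder fills it in.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass(frozen=True)
class Node:
    """Immutable tree node; kind is "root", "section", or "chunk" (hypothetical schema)."""
    kind: str
    text: str = ""                 # heading title for sections, body text for chunks
    level: int = 0                 # heading depth; 0 for root and chunks
    embedding: Optional[tuple[float, ...]] = None  # placeholder until an embedder runs
    children: tuple[Node, ...] = ()

# ATX heading; an optional closing run of '#' must be preceded by whitespace.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")
# Opening fence: up to 3 spaces of indent, then ``` or ~~~ (or longer).
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")

def _new(kind: str, text: str = "", level: int = 0) -> dict:
    return {"kind": kind, "text": text, "level": level, "children": [], "buf": []}

def _flush(sec: dict) -> None:
    # Turn the buffered body lines into one chunk under the current section.
    body = "\n".join(sec["buf"]).strip()
    if body:
        sec["children"].append(_new("chunk", body))
    sec["buf"] = []

def _freeze(d: dict) -> Node:
    # Convert the mutable scratch tree into immutable Nodes.
    return Node(d["kind"], d["text"], d["level"], None,
                tuple(_freeze(c) for c in d["children"]))

def parse_markdown(md: str) -> Node:
    root = _new("root")
    stack = [root]                # open sections, deepest last
    fence: Optional[str] = None   # opening fence marker while inside a code block
    for line in md.splitlines():
        stripped = line.strip()
        if fence is not None:
            # Closing fence: same character, at least as long, nothing else on the line.
            if set(stripped) == {fence[0]} and len(stripped) >= len(fence):
                fence = None
            stack[-1]["buf"].append(line)   # '#' lines in here stay as body text
            continue
        m = FENCE.match(line)
        if m:
            fence = m.group(1)
            stack[-1]["buf"].append(line)
            continue
        h = HEADING.match(line)
        if h is None:
            stack[-1]["buf"].append(line)
            continue
        level = len(h.group(1))
        _flush(stack[-1])
        # Pop siblings and deeper sections so the new section nests under its parent.
        while len(stack) > 1 and stack[-1]["level"] >= level:
            stack.pop()
        sec = _new("section", h.group(2), level)
        stack[-1]["children"].append(sec)
        stack.append(sec)
    _flush(stack[-1])
    return _freeze(root)

def parse_plain(text: str) -> Node:
    # Flat document: the root holds one chunk per blank-line-separated paragraph.
    paras = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return Node("root", children=tuple(Node("chunk", p) for p in paras))

def walk(node: Node) -> Iterator[Node]:
    # Pre-order traversal.
    yield node
    for child in node.children:
        yield from walk(child)

def count_kinds(root: Node) -> dict[str, int]:
    return dict(Counter(n.kind for n in walk(root)))

def flatten(root: Node) -> list[dict]:
    # JSON-shaped records, one per chunk, tagged with the heading path above it.
    records: list[dict] = []
    def visit(node: Node, path: tuple[str, ...]) -> None:
        if node.kind == "section":
            path = path + (node.text,)
        elif node.kind == "chunk":
            records.append({"id": len(records), "section_path": list(path),
                            "text": node.text, "embedding": node.embedding})
        for child in node.children:
            visit(child, path)
    visit(root, ())
    return records
```

Freezing the dataclasses and storing children as tuples keeps the tree immutable. Any later "update", such as attaching embeddings, would therefore build new nodes, for example with `dataclasses.replace`.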

Please support two chunking modes: a smarter semantic mode that uses embeddings to find topic boundaries, and a basic fixed-size mode as a faster fallback. Keep the tree immutable and give me simple helpers to walk it, flatten it, and count node types. A basic command-line entry point with demo input is enough, plus tests that cover both the Markdown and plain-text paths. Do not add search or Solr integration. Look up current docs online if you need to.
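The two chunking modes could look roughly like this. The semantic mode uses one common technique, which is assumed here rather than taken from the source:

1. Split the text into sentences.
2. Embed each sentence.
3. Start a new chunk wherever the cosine similarity between neighboring sentences drops below a threshold.

`toy_embed` is a hypothetical hashed bag-of-words stand-in for a real embedding model. The regex sentence splitter and the values `threshold=0.3`, `max_words=100` and `overlap=20` are illustrative, not tuned.

```python
import math
import re
import zlib
from typing import Callable, Sequence

Embedder = Callable[[str], list[float]]

def toy_embed(text: str, dim: int = 64) -> list[float]:
    # Hypothetical stand-in for a real embedding model: hashed bag of words, L2-normalized.
    vec = [0.0] * dim
    for word in re.findall(r"\w+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def fixed_size_chunks(text: str, max_words: int = 100, overlap: int = 20) -> list[str]:
    # Fast fallback: sliding word windows, each sharing `overlap` words with the previous one.
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

def split_sentences(text: str) -> list[str]:
    # Naive split after ., ! or ? followed by whitespace (an assumption, not a real tokenizer).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def semantic_chunks(text: str, embed: Embedder = toy_embed, threshold: float = 0.3) -> list[str]:
    # Start a new chunk wherever adjacent-sentence similarity drops below `threshold`.
    sentences = split_sentences(text)
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    chunks, current = [], [sentences[0]]
    for prev, vec, sent in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev, vec) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks

def chunk_text(text: str, mode: str = "semantic") -> list[tuple[str, list[float]]]:
    # Returns (text, embedding) pairs so every chunk carries its vector.
    if mode == "semantic":
        pieces = semantic_chunks(text)
    elif mode == "fixed":
        pieces = fixed_size_chunks(text)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [(p, toy_embed(p)) for p in pieces]
```

`chunk_text` only dispatches on `mode`. A fuller tool might switch to `"fixed"` automatically when no embedding model is available, which is one reading of "faster fallback".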


santabasnet/hybrid-graph-rag-chunker: reverse-engineered prompt
