Microsoft open-sourced a Python tool for converting files and office documents to Markdown

cm0002@lemmy.world · 10 days ago

refalo@programming.dev · edit-2 10 days ago

If, like me, you were wondering whether MS actually provided its own parsers for the Office file formats… it did not.

It seems to just be a bunch of random pyxyz 3rd-party support libraries all mashed together.

mormund@feddit.org · 10 days ago

What do you mean by parser? Office docs are just zipped XML files. They are trivial to parse. The hard part is all the quirks the document renderers have, which make it impossible to perfectly match the output. But Markdown can't handle any complex formatting anyway.
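
To make the "zipped XML" point concrete, here is a minimal sketch using only the Python standard library. It assumes the usual .docx layout, where the main text sits in `word/document.xml` inside the archive. The helper names `docx_paragraphs` and `make_sample_docx` are made up for illustration, and the sample archive is a stripped-down stand-in, not a file Word would open. It only pulls out plain paragraph text and ignores styles, tables, and every renderer quirk.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the plain text of each paragraph in a .docx path or file-like object."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Text lives in <w:t> elements; join all of them within one paragraph
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


def make_sample_docx():
    """Build a tiny in-memory archive holding only word/document.xml (demo only)."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t> world</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for line in docx_paragraphs(make_sample_docx()):
        print(line)
```

Passing a path to a real .docx file instead of the in-memory sample should work the same way, since `zipfile.ZipFile` accepts both.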

Sibbo@sopuli.xyz · 10 days ago

Maybe the people that wrote their parser have left the company? Typical big software corp problem.

GissaMittJobb@lemmy.ml · 10 days ago

I mean, the parser would still be there even if the people left the company, right? The source code remains.

Creat@discuss.tchncs.de · 10 days ago

It might also be somewhere, but nobody knows where.

Phoenix3875@lemmy.world · 10 days ago

Reading through the source code, it’s more of a repackaging of other open source libraries, probably for its AI effort.

Chais@sh.itjust.works · 8 days ago

So Pandoc but worse?

GitHub - microsoft/markitdown: Python tool for converting files and office documents to Markdown.