Talk to MS Word files?

Can C++ be used to talk to MS Word files, or even to Word VBA? I mean, do things like extract text, get data from document properties, etc. If anybody has any links or helpful pages, please share. Thanks so much!
https://duckduckgo.com/?t=ffab&q=microsoft+word+document+c%2B%2B

Newer DOCX files (Word 2007 and later) are relatively easy.
They're really just ZIP files containing a bunch of XML files.
Yes, C++ can "talk" to MS Word files, but it has to speak real slow.

Data is data, and MS Word files are data files; working with them requires understanding how the data is structured. Why expend the time and effort when other programmers have already done the work for you? You can create, read and write DOCX files using a 3rd party library.

https://github.com/amiremohamadi/DuckX

I'd bet there are other libraries available that do much the same for other MS Office formats; a 'net search would be rewarding.

On a side note, DuckX is one of the libraries available through vcpkg. vcpkg is a package manager that makes it really easy to obtain, update and use 3rd party libraries, especially if you use MS Visual Studio. vcpkg can also be used with other compilers via CMake, and it isn't just for Windows.

https://vcpkg.io/en/index.html

Understanding the internals of a data format like DOCX is a worthwhile endeavor, but if all you want is for your programs to access the data, using a 3rd party library is a good idea. Why reinvent a flat wheel?
Thank you, salem c and George P. Your answers are super helpful. I will look into these resources.
I am interested in making a commercial plugin that creates a Word file (*.docx) from another application. Does anyone know if I will have to get licensing from Microsoft to do so? Thanks
> Does anyone know if I will have to get licensing from Microsoft to do so?
If you're just using published APIs to access the data, you should be fine.

https://visualstudio.microsoft.com/license-terms/vs2022-ga-community/
This surprised me TBH. At one point, resale of s/w made through the freebie version was not allowed. Apparently, it now is.
I think there are (at least) two different approaches:

1. It is possible to "embed" Microsoft Word (or other Office components) into your own application via an ActiveX control. This way, your application effectively "remote controls" Word. Clearly, though, this requires Microsoft Word to be installed on the system where your application runs, and the user will need a license for Microsoft Word in addition to your application. On the plus side, all the gory details of reading and writing DOCX files are "hidden" inside Word, and your application only needs to send the required commands.

https://www.codeproject.com/Articles/764/Using-MS-Office-in-an-MFC-Application


2. The other approach is to make your application read/write DOCX files directly. This does not require Microsoft Word to be installed at all, and it does not require an Office license from Microsoft. But, of course, it is now totally up to your application to parse/generate the DOCX files! The DOCX format is specified in the Office Open XML (OOXML) standard, which is an "open" standard, so, in theory, it is totally possible to create your own DOCX parser/generator. After all, DOCX (and friends) are really just XML wrapped in a ZIP file. But then again, Office Open XML has often been criticized as being so complex that only Microsoft can fully implement it... Unzipping and parsing the XML is only the very first step. Making sense of the data contained in the XML structure is the real challenge! 😏

https://en.wikipedia.org/wiki/Office_Open_XML

Using third-party libraries (e.g. "DuckX") for parsing and/or writing DOCX files may be an option to make approach (2) much easier. But then you have to figure out whether the third-party library provides all the features (API) that you need, and whether the license of the third-party library is compatible with your project! For example, "DuckX" uses the MIT license, which is relatively liberal - good for you.


Does it really have to be DOCX, though? Most GUI frameworks (e.g. Qt) support some form of "rich text" documents out-of-the-box:
https://het.as.utexas.edu/HET/Software/html/richtext.html
https://het.as.utexas.edu/HET/Software/html/qtextdocument.html#details
> I am interested in making a commercial plugin that creates a Word file (*.docx) from another application


One of the best available is Aspose.Words for C++. However, it is a commercial product and is not free.
https://products.aspose.com/words/cpp/