Read Word Docx File Using POI

To read a Word .docx file with Apache POI, we use the XWPFWordExtractor class. This class lives in the org.apache.poi.xwpf.extractor package and has a getText() method that returns the text content of the file as a plain String.

We will look at examples of XWPFWordExtractor with simple and complex data in a Word .docx file.

1. Read simple data from Docx

Let's start with a Word file like the one below.
[Image: screenshot of the sample Word file]

Now let's read it.
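The reading step could look like the following minimal sketch. It assumes the poi-ooxml library is on the classpath. The class name ReadDocxText and the helper methods sampleDocx() and extractText() are hypothetical names chosen for illustration; the sample document is built in memory only so that the sketch runs without a file on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class ReadDocxText {

    // Hypothetical helper: builds a tiny .docx in memory with two paragraphs.
    static byte[] sampleDocx() {
        try (XWPFDocument doc = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            doc.createParagraph().createRun().setText("Hello from a docx file.");
            doc.createParagraph().createRun().setText("This is the second paragraph.");
            doc.write(out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Opens the document from the stream and returns its text as one String.
    static String extractText(InputStream in) {
        try (XWPFDocument doc = new XWPFDocument(in);
             XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            return extractor.getText();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // To read a real file, pass a FileInputStream for its path instead.
        String text = extractText(new ByteArrayInputStream(sampleDocx()));
        System.out.println(text);
    }
}
```

To read a file from disk, the same extractText() helper can be given a java.io.FileInputStream opened on the .docx path.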

Output
[Image: screenshot of the console output]

2. Read table from Docx file

Now let's try to read a file that contains table data. We will add a table to the file above and run the same code again to see the output.
[Image: screenshot of the Word file with a table added]

Output
[Image: screenshot of the console output]

As you can see, XWPFWordExtractor.getText() always returns whatever it reads as a plain String, table contents included.
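The table case can be tried with a sketch like the one below, again assuming poi-ooxml is on the classpath. The class name ReadDocxTable, the helper methods, and the sample cell values are all hypothetical and stand in for the Word file used in this post. The exact whitespace that getText() puts between cells and rows is not asserted here, so print the result to see how your POI version lays it out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;

public class ReadDocxTable {

    // Hypothetical helper: builds a .docx in memory with a paragraph and a 2x2 table.
    static byte[] sampleDocxWithTable() {
        try (XWPFDocument doc = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            doc.createParagraph().createRun().setText("Employee list");
            XWPFTable table = doc.createTable(2, 2);
            table.getRow(0).getCell(0).setText("Name");
            table.getRow(0).getCell(1).setText("City");
            table.getRow(1).getCell(0).setText("Alice");
            table.getRow(1).getCell(1).setText("Pune");
            doc.write(out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same extraction call as for a plain document: the table is not treated specially.
    static String extractText(InputStream in) {
        try (XWPFDocument doc = new XWPFDocument(in);
             XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            return extractor.getText();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = extractText(new ByteArrayInputStream(sampleDocxWithTable()));
        // The paragraph and the cell values all arrive in a single String.
        System.out.println(text);
    }
}
```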
In upcoming posts we will share examples of reading other parts of a Docx file, such as headers, footers, paragraphs, and tables.