To read a Word .docx file using Apache POI, we can use the XWPFWordExtractor class. This class lives in the package org.apache.poi.xwpf.extractor, and its getText() method returns all the text content of the file as a single String.
Let's look at examples of XWPFWordExtractor with both simple and complex data in a Word docx file.
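If you want to try XWPFWordExtractor without a Word file on disk, a minimal sketch like the one below builds a small .docx in memory and then reads it back. It assumes a recent poi-ooxml artifact (4.x or 5.x) is on the classpath. The class name DocxTextRoundTrip, its helper methods and the sample text are hypothetical, made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class DocxTextRoundTrip {

    // Hypothetical helper: builds a .docx in memory with one paragraph per line
    static byte[] createSampleDocx(String... lines) throws IOException {
        try (XWPFDocument doc = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            for (String line : lines) {
                doc.createParagraph().createRun().setText(line);
            }
            doc.write(out);
            return out.toByteArray();
        }
    }

    // Reads any .docx stream and returns all of its text as one String
    static String extractText(InputStream in) throws IOException {
        // Closing the document is enough; the extractor holds no extra resources here
        try (XWPFDocument document = new XWPFDocument(in)) {
            XWPFWordExtractor extractor = new XWPFWordExtractor(document);
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] docx = createSampleDocx("Hello from Apache POI", "Second paragraph");
        System.out.println(extractText(new ByteArrayInputStream(docx)));
    }
}
```

Because extractText accepts any InputStream, the same method works unchanged with a FileInputStream that points at a real file.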
1. Read simple data from Docx
Let's take a simple Word file named SimpleFileToRead.docx, as shown below.
Now let's read it with XWPFWordExtractor:
```java
package com.kscodes.test;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class ReadDocUsingPOI {

    public static void main(String[] args) {
        File fileToBeRead = new File("C:\\kscodes_temp\\SimpleFileToRead.docx");

        // try-with-resources closes the document and the input stream automatically,
        // replacing the manual null checks in a finally block
        try (FileInputStream fileInputStream = new FileInputStream(fileToBeRead);
             XWPFDocument document = new XWPFDocument(fileInputStream)) {

            // The extractor walks the whole document and returns its text as one String
            XWPFWordExtractor extractor = new XWPFWordExtractor(document);

            System.out.println("The Contents of the Word File are ::");
            System.out.println("--------------------------------------");
            System.out.println(extractor.getText());
        } catch (IOException e) {
            // Report the cause instead of silently swallowing it
            System.out.println("We had an error while reading the Word Doc: " + e.getMessage());
        }
    }
}
```
Output
2. Read table from Docx file
Now let's try to read a file that contains table data. We will add a table to the above file and run the same code again to see the output.
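To reproduce the table case without editing a file by hand, the sketch below adds a 2x2 table after a paragraph and passes the document to the same getText() call. It assumes poi-ooxml 4.x or 5.x on the classpath; the class name TableTextExtraction, its helpers and the sample data are hypothetical.

```java
import java.io.IOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;

public class TableTextExtraction {

    // Hypothetical sample: one paragraph followed by a 2x2 table, built in memory
    static XWPFDocument createDocWithTable() {
        XWPFDocument doc = new XWPFDocument();
        doc.createParagraph().createRun().setText("Employee list");
        XWPFTable table = doc.createTable(2, 2);
        table.getRow(0).getCell(0).setText("Name");
        table.getRow(0).getCell(1).setText("City");
        table.getRow(1).getCell(0).setText("John");
        table.getRow(1).getCell(1).setText("London");
        return doc;
    }

    // Same call as for a plain document: getText() covers paragraphs and tables alike
    static String extractText(XWPFDocument doc) {
        XWPFWordExtractor extractor = new XWPFWordExtractor(doc);
        return extractor.getText();
    }

    public static void main(String[] args) throws IOException {
        try (XWPFDocument doc = createDocWithTable()) {
            System.out.println(extractText(doc));
        }
    }
}
```

In the POI sources, cells of a table row are joined with tab characters and each row ends with a newline, but treat the exact layout of the returned String as version-dependent rather than something to parse.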
Output
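As you can see, XWPFWordExtractor.getText() always returns the content as a single plain String: the table's rows and cells are flattened into lines of text, and the table structure itself is not preserved.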
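When you need the table structure rather than a flat String, the XWPF usermodel lets you walk the tables yourself. This is a minimal sketch assuming poi-ooxml 4.x or 5.x: getTables(), getRows(), getTableCells() and getText() are POI methods, while ReadDocxTables, readTables, createSampleDoc and the sample data are hypothetical names for illustration. Only top-level tables are visited; tables nested inside a cell are not.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

public class ReadDocxTables {

    // Returns every top-level table as a list of rows, each row a list of cell texts
    static List<List<List<String>>> readTables(XWPFDocument document) {
        List<List<List<String>>> tables = new ArrayList<>();
        for (XWPFTable table : document.getTables()) {
            List<List<String>> rows = new ArrayList<>();
            for (XWPFTableRow row : table.getRows()) {
                List<String> cells = new ArrayList<>();
                for (XWPFTableCell cell : row.getTableCells()) {
                    cells.add(cell.getText());
                }
                rows.add(cells);
            }
            tables.add(rows);
        }
        return tables;
    }

    // Hypothetical sample: one 2x2 table built in memory
    static XWPFDocument createSampleDoc() {
        XWPFDocument doc = new XWPFDocument();
        XWPFTable table = doc.createTable(2, 2);
        table.getRow(0).getCell(0).setText("Name");
        table.getRow(0).getCell(1).setText("City");
        table.getRow(1).getCell(0).setText("John");
        table.getRow(1).getCell(1).setText("London");
        return doc;
    }

    public static void main(String[] args) throws IOException {
        try (XWPFDocument doc = createSampleDoc()) {
            for (List<List<String>> table : readTables(doc)) {
                System.out.println(table);
            }
        }
    }
}
```

Keeping rows and cells as nested lists makes it easy to map a table onto your own objects, for example treating the first row as column headers.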
In upcoming posts, we will share examples of reading other parts of a docx file, such as headers, footers, paragraphs and tables.
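As a small preview of that direction, here is one way to read body paragraphs and header/footer text separately, assuming poi-ooxml 4.x or 5.x. getParagraphs(), getHeaderList(), getFooterList() and getText() are existing POI methods; the class ReadDocxParts, its helpers and the sample text are hypothetical. It is only a sketch, not necessarily how the upcoming posts will approach it.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFFooter;
import org.apache.poi.xwpf.usermodel.XWPFHeader;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public class ReadDocxParts {

    // Text of each body-level paragraph (paragraphs inside table cells are not included)
    static List<String> readParagraphs(XWPFDocument document) {
        List<String> texts = new ArrayList<>();
        for (XWPFParagraph paragraph : document.getParagraphs()) {
            texts.add(paragraph.getText());
        }
        return texts;
    }

    // Text of every header part, followed by every footer part, in the document
    static List<String> readHeadersAndFooters(XWPFDocument document) {
        List<String> texts = new ArrayList<>();
        for (XWPFHeader header : document.getHeaderList()) {
            texts.add(header.getText());
        }
        for (XWPFFooter footer : document.getFooterList()) {
            texts.add(footer.getText());
        }
        return texts;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample document built in memory; it has no headers or footers
        try (XWPFDocument doc = new XWPFDocument()) {
            doc.createParagraph().createRun().setText("First paragraph");
            doc.createParagraph().createRun().setText("Second paragraph");
            System.out.println("Paragraphs: " + readParagraphs(doc));
            System.out.println("Headers/footers: " + readHeadersAndFooters(doc));
        }
    }
}
```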