C#: How to read (parse the text content of) a Microsoft Word document

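This example gets the text out of a Microsoft Word file from C# by automating Word through Microsoft.Office.Interop.Word. Word must be installed on the machine that runs the code.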

using System;
using System.Text;

namespace readDOC
{
    class Program
    {
        static void Main(string[] args)
        {
            // Requires a project reference to Microsoft.Office.Interop.Word.
            Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
            Microsoft.Office.Interop.Word.Document docs = null;
            object miss = System.Reflection.Missing.Value;
            object path = @"C:\DOC\myDocument.docx";
            object readOnly = true;
            try
            {
                docs = word.Documents.Open(ref path, ref miss, ref readOnly, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss);
                StringBuilder totaltext = new StringBuilder();
                // Word collections are 1-based, so count from 1 to Count inclusive.
                for (int i = 1; i <= docs.Paragraphs.Count; i++)
                {
                    totaltext.Append(" \r\n ").Append(docs.Paragraphs[i].Range.Text);
                }
                Console.WriteLine(totaltext.ToString());
            }
            finally
            {
                // Close the document and quit Word even if reading fails,
                // so that no hidden Word process is left running.
                if (docs != null) docs.Close();
                word.Quit();
            }
        }
    }
}
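
The text printed by the code above can contain stray line breaks, because each paragraph's Range.Text ends with a paragraph mark ('\r'), and the text of a table cell ends with an additional end-of-cell mark ('\a'). As a minimal sketch, the helper below strips those trailing marks. The names WordTextHelper and StripParagraphMark are made up for this illustration and are not part of the interop library.

```csharp
using System;

static class WordTextHelper
{
    // Hypothetical helper: removes the trailing paragraph mark ('\r') and,
    // for table cells, the end-of-cell mark ('\a') from a Range.Text value.
    public static string StripParagraphMark(string rangeText)
    {
        if (rangeText == null) return string.Empty;
        return rangeText.TrimEnd('\r', '\n', '\a');
    }
}
```

Calling it on each paragraph's Range.Text before appending, and adding Environment.NewLine yourself, should give one clean line per paragraph.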

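The interop approach above only works where Word is installed, which is often not the case on a web server. For .docx files (not the older binary .doc format), one alternative is to read the file directly: a .docx is a ZIP archive whose main text lives in word/document.xml. The following is a rough sketch under that assumption; DocxText and ReadAllText are names invented for this example. It ignores tabs, line breaks inside a paragraph, headers, footers and footnotes, and text inside text boxes may appear twice.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

static class DocxText
{
    // WordprocessingML namespace used by the elements in word/document.xml.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns the document text, one line per paragraph.
    // The stream must be seekable; it is left open for the caller to dispose.
    public static string ReadAllText(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, true))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; is this a .docx file?");

            using (Stream xml = entry.Open())
            {
                XDocument doc = XDocument.Load(xml);
                var sb = new StringBuilder();
                // Each w:p element is a paragraph; its text is stored in w:t elements.
                foreach (XElement p in doc.Descendants(W + "p"))
                {
                    foreach (XElement t in p.Descendants(W + "t"))
                        sb.Append(t.Value);
                    sb.AppendLine();
                }
                return sb.ToString();
            }
        }
    }
}
```

To try it, open the file with File.OpenRead(@"C:\DOC\myDocument.docx") and pass the stream to DocxText.ReadAllText. On .NET Framework this needs a reference to the System.IO.Compression assembly.
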
20 thoughts on “C#: How to read (parse the text content of) a Microsoft Word document”

  1. micheal

    Thank you for your article. This is a good way to read a Word document in C#, but it may not be usable in ASP.NET. I used Spire.Doc to read documents in ASP.NET.

    1. liycn

      NeoOne, the library in your post works for me because 1. It works with docx files; 2. you do not need Word installed on the user’s machine as you mentioned :P

      It is a shame that there is no source code and the last update from the author is Sep 2012.

      Anyway the dll does what it says on the tin. Thank you.

  2. Matthew

    How about

    string totaltext = docs.Content.Text;

    to get all the text from a Word document?

  3. Mirza

    Can anyone tell me how to upload a Word file to a website, extract some fields from the document, and display those fields in different text boxes, the way resume uploads work on job portal sites?

  4. Navanath Navaskar

    I used the above code, but it gives these errors:

    Error 1 The type or namespace name ‘Office’ does not exist in the namespace ‘Microsoft’ (are you missing an assembly reference?) c:\users\nagendra\documents\visual studio 2012\Projects\ConsoleApplication1\ConsoleApplication1\Program.cs 13 23 ConsoleApplication1
    Error 2 The type or namespace name ‘Office’ does not exist in the namespace ‘Microsoft’ (are you missing an assembly reference?) c:\users\nagendra\documents\visual studio 2012\Projects\ConsoleApplication1\ConsoleApplication1\Program.cs 13 76 ConsoleApplication1
    Error 3 The type or namespace name ‘Office’ does not exist in the namespace ‘Microsoft’ (are you missing an assembly reference?) c:\users\nagendra\documents\visual studio 2012\Projects\ConsoleApplication1\ConsoleApplication1\Program.cs 17 23 ConsoleApplication1

    I want to read an Office file in a Windows Forms application.
    Please reply soon.

    1. Azhar

      Add a COM reference to your project by following these steps:

      Right-click your project name in the Solution Explorer window.
      Click Add Reference.
      Then select the COM tab.
      Search for Microsoft Word 12.0 Object Library (or 9.0, whichever is available).
      Click OK.
      Your code should now build.

  5. George

    ^^
    1. Add a reference to Microsoft.Office.Interop.Word to your project. You can find it in the .NET section.

    2. In your code, add using Microsoft.Office.Interop.Word;
    Regards,
    George

  6. pravin

    If I want a particular paragraph from the Word document, how can I access it using the above code?

  7. Andy F

    Thank you very much for some great code. I have been struggling with this very activity, but you have made it clean and clear. Very nice! Again, Thanks.

  8. Alexey

    You can read the text of your .doc file by using this .NET API for Word. It is not free, but it offers a free trial so you can try it. You can also get sample code from their documentation page, as I do, since I have subscribed to their website.

  9. Ashish

    I have a structured Word template and want to retrieve its content in JSON format. Can anyone explain to me how this can be achieved?

  10. aishvarya

    {“Creating an instance of the COM component with CLSID {000209FF-0000-0000-C000-000000000046} using CoCreateInstanceFromApp failed due to the following error: 80040154 Class not registered (Exception from HRESULT: 0x80040154 (REGDB_E_CLASSNOTREG)). Please make sure your COM object is in the allowed list of CoCreateInstanceFromApp.”}

    I got this error on the line Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();

    I have already included Microsoft.Office.Interop.Word in the project references,

    and there is no COM option available in the solution window.

    Please help me solve this. I have been stuck on this error for the last two months. I have tried every solution I could find, but none of them worked for me.

    I am working with VS2015 and have Word 2016 installed, and I want to read a Word file from a given path.

  11. Ankit

    I am not able to read a table in its original form.
    Normal text is read well, but tables are not printed properly.


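Two of the comments above fit together: Matthew's docs.Content.Text returns the whole document as one string, and pravin asks how to get a particular paragraph. With interop, docs.Paragraphs[n].Range.Text (1-based, as in the article's loop) reads paragraph n directly. As an alternative sketch, assuming the paragraphs in the Content.Text string are separated by '\r', a paragraph can also be picked out of that string. ParagraphPicker and GetParagraph are names invented for this illustration.

```csharp
using System;

static class ParagraphPicker
{
    // Hypothetical helper: returns paragraph number oneBasedIndex from text in
    // which paragraphs are separated by '\r', or null if the index is out of range.
    public static string GetParagraph(string contentText, int oneBasedIndex)
    {
        if (string.IsNullOrEmpty(contentText) || oneBasedIndex < 1) return null;
        // Trailing paragraph marks are dropped first, so empty paragraphs
        // at the very end of the text are not counted.
        string[] paragraphs = contentText.TrimEnd('\r').Split('\r');
        return oneBasedIndex <= paragraphs.Length ? paragraphs[oneBasedIndex - 1] : null;
    }
}
```

For example, ParagraphPicker.GetParagraph(docs.Content.Text, 2) would return the second paragraph. Text inside tables also contains '\a' cell marks, which this sketch does not handle.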