C#: How to (parse the text content)/ read from Microsoft Word Document doc

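Get the text out of a Microsoft Word file from C# by automating Word through the Microsoft.Office.Interop.Word library. The project needs a reference to that assembly, and Word itself must be installed on the machine that runs the code.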

using System;
using System.Text;
using Word = Microsoft.Office.Interop.Word;

namespace readDOC
{
    class Program
    {
        static void Main(string[] args)
        {
            // Starts a Word process in the background; Word must be installed.
            Word.Application word = new Word.Application();
            Word.Document docs = null;
            object miss = System.Reflection.Missing.Value;
            object path = @"C:\DOC\myDocument.docx";
            object readOnly = true;
            try
            {
                // Open the file read-only; all other optional arguments are left as "missing".
                docs = word.Documents.Open(ref path, ref miss, ref readOnly, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss);
                // StringBuilder avoids allocating a new string for every paragraph.
                StringBuilder totaltext = new StringBuilder();
                // Word collections are 1-based, so iterate from 1 to Count inclusive.
                int count = docs.Paragraphs.Count;
                for (int i = 1; i <= count; i++)
                {
                    totaltext.Append(" \r\n ").Append(docs.Paragraphs[i].Range.Text);
                }
                Console.WriteLine(totaltext.ToString());
            }
            finally
            {
                // Close the document and quit Word even if reading failed,
                // otherwise the WINWORD.EXE process keeps running.
                if (docs != null) docs.Close();
                word.Quit();
            }
        }
    }
}
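
The loop above collects the text one paragraph at a time. When only the full text is needed, the document's Content range exposes it in a single property. The following is a minimal sketch rather than the author's method. It assumes C# 4 or later, so that the optional arguments of the COM call can be omitted and ReadOnly passed by name. It reuses the same example path, and the class name ReadWholeDocument is made up for illustration.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical class name, used only for this sketch.
class ReadWholeDocument
{
    static void Main()
    {
        // Starts Word in the background; Word must be installed on this machine.
        Word.Application word = new Word.Application();
        Word.Document doc = null;
        try
        {
            // Same example path as in the listing above.
            // Named argument: only ReadOnly is specified, the rest use their defaults.
            doc = word.Documents.Open(@"C:\DOC\myDocument.docx", ReadOnly: true);

            // Content is a Range covering the main body of the document;
            // its Text property returns the unformatted text of that range.
            Console.WriteLine(doc.Content.Text);
        }
        finally
        {
            // Close without saving and shut Word down even if an exception occurred.
            if (doc != null) doc.Close(SaveChanges: false);
            word.Quit();
        }
    }
}
```

Word ends each paragraph in the returned string with a carriage return (\r) rather than \r\n, so the line endings may need to be normalized before the text is printed or saved.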

13 thoughts on “C#: How to (parse the text content)/ read from Microsoft Word Document doc”

  1. micheal

    Thank you for your article. It is a good way to read a Word document in C#, but this approach may not be usable in ASP.NET. I used Spire.Doc to read documents in ASP.NET.

    1. liycn

      NeoOne, the library in your post works for me because (1) it works with docx files, and (2) you do not need Word installed on the user’s machine, as you mentioned :P

      It is a shame that there is no source code and the last update from the author is Sep 2012.

      Anyway the dll does what it says on the tin. Thank you.

  2. Mirza

    Can anyone tell me how to load a Word file on a website, extract some fields from the document, and display those fields in separate text boxes, the way resume uploads work on job portal sites?

  3. Navanath Navaskar

    I used the code above, but it gives these errors:

    Error 1 The type or namespace name ‘Office’ does not exist in the namespace ‘Microsoft’ (are you missing an assembly reference?) c:\users\nagendra\documents\visual studio 2012\Projects\ConsoleApplication1\ConsoleApplication1\Program.cs 13 23 ConsoleApplication1
    Error 2 The type or namespace name ‘Office’ does not exist in the namespace ‘Microsoft’ (are you missing an assembly reference?) c:\users\nagendra\documents\visual studio 2012\Projects\ConsoleApplication1\ConsoleApplication1\Program.cs 13 76 ConsoleApplication1
    Error 3 The type or namespace name ‘Office’ does not exist in the namespace ‘Microsoft’ (are you missing an assembly reference?) c:\users\nagendra\documents\visual studio 2012\Projects\ConsoleApplication1\ConsoleApplication1\Program.cs 17 23 ConsoleApplication1

    I want to read an Office file in a Windows Forms application.
    Please reply quickly.

    1. Azhar

      Add a COM reference to your project. Follow these steps:

      right-click your project name in the Solution Explorer window
      click Add Reference
      then select COM
      search for the Microsoft Word 12.0 (or 9.0) Object Library, whichever is available
      click OK
      now your code will compile and run…

  4. George

    ^^
    1. Add a reference to Microsoft.Office.Interop.Word to your project. You can find it in the .NET section.

    2. In your code file, add using Microsoft.Office.Interop.Word;
    Regards,
    George

  5. Andy F

    Thank you very much for some great code. I have been struggling with this very activity, but you have made it clean and clear. Very nice! Again, Thanks.

  6. Alexey

    You can get the text of, or read, your doc file by using this .NET API for Word. It is not a free API, but it offers a free trial, so you can try it. You can also get sample code from their documentation page, as I do, because I have subscribed to their website.
