C#: Parse a Sentence Containing a Word from Text using Regular Expressions.

I recently received an email from someone asking how to obtain all sentences containing a specific word, so I wrote this quick post. The code below uses regular expressions to split text into sentences, then checks whether each sentence contains the word.

using System;
using System.Text.RegularExpressions;

//Look for sentences containing the word "bank"
string word = "bank";
//Text to search
string fulltext = @"Starting in the early 1960s federal banking regulators interpreted provisions of the Glass–Steagall Act to permit commercial banks and especially commercial bank affiliates to engage in an expanding list and volume of securities activities. By the time the affiliation restrictions in the Glass–Steagall Act were repealed through the Gramm–Leach–Bliley Act of 1999 (GLBA), many commentators argued Glass–Steagall was already “dead.” Most notably, Citibank’s 1998 affiliation with Salomon Smith Barney, one of the largest US securities firms, was permitted under the Federal Reserve Board’s then existing interpretation of the Glass–Steagall Act. President Bill Clinton publicly declared ""the Glass–Steagall law is no longer appropriate."" Many commentators have stated that the GLBA’s repeal of the affiliation restrictions of the Glass–Steagall Act was an important cause of the late-2000s financial crisis.  Some critics of that repeal argue it permitted Wall Street investment banking firms to gamble with their depositors' money that was held in affiliated commercial banks. Others have argued that the activities linked to the financial crisis were not prohibited (or, in most cases, even regulated) by the Glass–Steagall Act. Commentators, including former President Clinton in 2008 and the American Bankers Association in January 2010, have also argued that the ability of commercial banking firms to acquire securities firms (and of securities firms to convert into bank holding companies) helped mitigate the financial crisis.";
 
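//Match collection holding every sentence: a capital letter, then any run of
//characters other than . ! ?, then one terminating . ! or ?
MatchCollection matchSentences = 
    Regex.Matches(fulltext, @"([A-Z][^\.!?]*[\.!?])");
//Alternative pattern (terminator must be followed by whitespace or end of text):
//    @"(\S.+?[.!?])(?=\s+|$)"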
 
//Number of sentences found that contain the word.
int foundSentenceWithWord = 0;
//Regex.Escape keeps any special regex characters in the word literal.
//To skip words like 'bank'er or 'bank'ing, add the word boundary "\b":
//    @"\b" + Regex.Escape(word) + @"\b"
string wordPattern = Regex.Escape(word);
foreach (Match sFound in matchSentences)
{
    //Each match holds exactly one sentence, so no inner loop over Captures is needed.
    string current_sentence = sFound.Value;
    if (Regex.IsMatch(current_sentence, wordPattern, RegexOptions.IgnoreCase))
    {
        Console.WriteLine("Sentence Found Containing '" + word + "' :");
        Console.WriteLine(current_sentence);
        Console.WriteLine();
        foundSentenceWithWord++;
    }
}
Console.WriteLine();
Console.WriteLine("Found " + foundSentenceWithWord 
    + " Sentences Containing the word '" + word + "'");
Console.WriteLine();
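
The two sentence patterns mentioned in the code behave differently when a period appears inside a sentence, for example in a decimal number. The sketch below is only an illustration: the sample text and the helper name `SplitSentences` are made up for this example and are not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every substring of `text` matched by `pattern`.
static List<string> SplitSentences(string text, string pattern)
{
    var sentences = new List<string>();
    foreach (Match m in Regex.Matches(text, pattern))
    {
        sentences.Add(m.Value);
    }
    return sentences;
}

// Made-up sample text with a period inside a number.
string decimalSample = "The rate rose to 3.5 percent. Banks reacted.";

// Pattern used in the post: stops at the first '.', '!' or '?'.
List<string> simple = SplitSentences(decimalSample, @"[A-Z][^.!?]*[.!?]");
// Alternative pattern: the terminator must be followed by whitespace or end of text.
List<string> alternative = SplitSentences(decimalSample, @"(\S.+?[.!?])(?=\s+|$)");

Console.WriteLine(simple[0]);      // The rate rose to 3.
Console.WriteLine(alternative[0]); // The rate rose to 3.5 percent.
```

Neither pattern understands abbreviations: text such as "Mr. Smith left." is still split after "Mr." by both, so treat them as rough sentence splitters and not as a full tokenizer.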

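The word-boundary variant suggested in the comments can be wrapped in a small reusable function. This is a minimal sketch under stated assumptions: `FindSentences` and its `wholeWord` parameter are hypothetical names introduced here, and the sample text is invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the original code): returns the sentences
// in `text` that contain `word`. When wholeWord is true, the word is wrapped
// in \b anchors so "bank" does not match "banking" or "banker".
static List<string> FindSentences(string text, string word, bool wholeWord)
{
    // Regex.Escape keeps characters such as '.' or '+' in the word literal.
    string wordPattern = Regex.Escape(word);
    if (wholeWord)
    {
        wordPattern = @"\b" + wordPattern + @"\b";
    }

    var result = new List<string>();
    // Same sentence pattern as in the post.
    foreach (Match sentence in Regex.Matches(text, @"[A-Z][^.!?]*[.!?]"))
    {
        if (Regex.IsMatch(sentence.Value, wordPattern, RegexOptions.IgnoreCase))
        {
            result.Add(sentence.Value);
        }
    }
    return result;
}

// Made-up sample text.
string boundarySample = "Banking rules changed. The bank closed early! Nobody noticed.";

Console.WriteLine(FindSentences(boundarySample, "bank", false).Count); // 2 (substring match)
Console.WriteLine(FindSentences(boundarySample, "bank", true).Count);  // 1 (whole-word match)
```

With `wholeWord` set to false, "Banking" counts as a hit because the search is a case-insensitive substring match. With it set to true, only the sentence containing the standalone word "bank" is returned.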