Monthly Archives: April 2015

Words used in Reddit’s usernames (430,000 users)

A word cloud of English words found in 430K Reddit usernames.

The data is from here. It was uploaded by reddit user Phycoz in response to my previous post about Tumblr. Only dictionary words longer than 4 characters were searched for and counted. The same C# code from the previous post, Words used in Tumblr’s usernames (380,000 users), was used, and Wordle rendered the image.
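The post does not show how the counts were handed to Wordle. Below is a minimal sketch, assuming the counts are already in a `Dictionary<string, int>` and that Wordle's advanced page accepts one `word:weight` pair per line (an assumption worth checking against the page). The hypothetical `WordleExport.ToWeightedLines` helper keeps the `topN` most frequent words and formats them that way.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordleExport
{
    // Returns the topN most frequent words as "word:weight" lines.
    // ASSUMPTION: Wordle's advanced input takes one "word:weight" pair per line.
    public static List<string> ToWeightedLines(IDictionary<string, int> counts, int topN)
    {
        return counts
            .OrderByDescending(kv => kv.Value)              // most frequent first
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)   // deterministic order for ties
            .Take(topN)
            .Select(kv => kv.Key + ":" + kv.Value)
            .ToList();
    }
}
```

For example, `File.WriteAllLines("wordle.txt", WordleExport.ToWeightedLines(counts, 150));` writes a file that can be pasted into Wordle. The file name and the value 150 are arbitrary.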

Words used in Tumblr’s usernames (380,000 users)

The data is a parsed list of 380,000 Tumblr usernames. Link here.

I wrote C# code to:

  • parse an English dictionary for words. Link here
  • parse all Tumblr usernames
  • search each username for every English word of 5 or more characters
  • store each English word found, and count its frequency across all usernames

Wordle was then used to generate a word cloud of the most frequent words.
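The original C# for these steps is not shown. Below is a minimal sketch of the search and counting steps, assuming the dictionary and the usernames are already loaded as string sequences. Testing every dictionary word against every username would be slow, so the hypothetical `UsernameWordCounter.CountWords` instead tries each sufficiently long substring of a username and looks it up in a `HashSet`. It counts every occurrence, including overlapping ones ("butterflykid" adds one to both "butter" and "butterfly"). Whether the original code did the same is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class UsernameWordCounter
{
    // Counts how often each dictionary word of at least minLength characters
    // appears inside the usernames. Every occurrence counts, so "pandapanda"
    // adds 2 to "panda".
    public static Dictionary<string, int> CountWords(
        IEnumerable<string> dictionaryWords,
        IEnumerable<string> usernames,
        int minLength = 5)
    {
        // Keep lowercase dictionary words that meet the length limit;
        // the HashSet gives constant-time membership checks.
        var dictionary = new HashSet<string>(StringComparer.Ordinal);
        int maxLength = 0;
        foreach (string w in dictionaryWords)
        {
            string word = w.Trim().ToLowerInvariant();
            if (word.Length >= minLength)
            {
                dictionary.Add(word);
                maxLength = Math.Max(maxLength, word.Length);
            }
        }

        var counts = new Dictionary<string, int>(StringComparer.Ordinal);
        foreach (string raw in usernames)
        {
            string name = raw.ToLowerInvariant();
            // Try every substring whose length could match a dictionary word.
            for (int start = 0; start < name.Length; start++)
            {
                int longest = Math.Min(maxLength, name.Length - start);
                for (int len = minLength; len <= longest; len++)
                {
                    string candidate = name.Substring(start, len);
                    if (dictionary.Contains(candidate))
                    {
                        counts.TryGetValue(candidate, out int c);   // c is 0 if absent
                        counts[candidate] = c + 1;
                    }
                }
            }
        }
        return counts;
    }
}
```

With `minLength = 5` this matches the "5 or greater" rule above, which is also the "longer than 4 characters" limit used for the Reddit run. Each username of length n costs at most n × (maxLength − minLength + 1) lookups, independent of the dictionary size.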

Most Frequent English Words Found in Tumblr’s Usernames

C#: Tumblr Blog Description Top Word Frequency List Visualization

While browsing reddit, I found a parsed dataset of Tumblr blog descriptions. Link here.
I used code from an older post to obtain a list of unique words and their frequencies.
Next, I used the default TermCloud sample from the Google Charts Additional Charts Gallery to render this image in a web browser.
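The code from that older post is not reproduced here. Below is a minimal sketch of the same idea. It assumes that words are runs of ASCII letters and apostrophes, lowercased, and that everything else separates words. The hypothetical `WordFrequency.Count` tallies each word across all descriptions. `ToLines` then writes one `word: count` line per unique word, which is the format the program below reads back by splitting on `:`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordFrequency
{
    // ASSUMPTION: a word is a run of ASCII letters and apostrophes; anything else separates words.
    private static readonly Regex WordPattern = new Regex("[a-z']+", RegexOptions.Compiled);

    public static Dictionary<string, int> Count(IEnumerable<string> descriptions)
    {
        var counts = new Dictionary<string, int>(StringComparer.Ordinal);
        foreach (string text in descriptions)
        {
            foreach (Match m in WordPattern.Matches(text.ToLowerInvariant()))
            {
                string word = m.Value.Trim('\'');   // drop quote marks around a word
                if (word.Length == 0) continue;     // a lone apostrophe is not a word
                counts.TryGetValue(word, out int c);
                counts[word] = c + 1;
            }
        }
        return counts;
    }

    // One "word: count" line per unique word, most frequent first.
    public static List<string> ToLines(Dictionary<string, int> counts)
    {
        return counts
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .Select(kv => kv.Key + ": " + kv.Value)
            .ToList();
    }
}
```

Writing the result with `File.WriteAllLines(path, WordFrequency.ToLines(counts))` produces a file in the layout the program below expects. `path` stands in for whatever location is used.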

Tumblr blog descriptions: words with a frequency greater than 5,000, ordered from most to least frequent.

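using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

namespace ordertumblrwords
{
    class Program
    {
        static void Main(string[] args)
        {
            // Word frequencies from the earlier step, one "word: count" pair per line.
            var hashfreq = new Dictionary<string, int>();
            foreach (string line in File.ReadLines("C:\\Book\\New folder\\OUTPUT.txt"))
            {
                // Split on the last ':' so a word containing ':' does not break parsing.
                int sep = line.LastIndexOf(':');
                if (sep <= 0) continue;   // skip blank or malformed lines
                string word = line.Substring(0, sep).Trim();
                int freq;
                if (!int.TryParse(line.Substring(sep + 1).Trim(), out freq)) continue;

                // Dictionary.Add throws on a duplicate key, so sum repeated words instead.
                int existing;
                hashfreq.TryGetValue(word, out existing);
                hashfreq[word] = existing + freq;
            }

            // Emit Google Charts setValue calls (0-based rows) for words seen more than 5,000 times.
            int q = 0;
            var masterstring = new StringBuilder();
            foreach (KeyValuePair<string, int> item in hashfreq.OrderByDescending(kv => kv.Value))
            {
                // Sorted by descending frequency, so stop at the first word at or below the threshold.
                if (item.Value <= 5000) break;

                // Escape backslashes and single quotes so the JavaScript string literal stays valid.
                string label = item.Key.Replace("\\", "\\\\").Replace("'", "\\'");
                string labelLine = "data.setValue(" + q + ", 0, '" + label + "');";
                string valueLine = "data.setValue(" + q + ", 1, " + item.Value + ");";
                Console.WriteLine(labelLine);
                Console.WriteLine(valueLine);
                masterstring.Append(labelLine).Append("\r\n");
                masterstring.Append(valueLine).Append("\r\n");

                // Increment only after both lines are built, so the console and the file use the same row index.
                q++;
            }

            // Append the generated JavaScript; "using" closes the file even if writing fails.
            using (StreamWriter streamWrite = File.AppendText("C:\\Book\\MANTASMAIN.txt"))
            {
                streamWrite.WriteLine(masterstring.ToString());
            }
        }
    }
}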
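The `setValue` calls above assume the page has already given the `DataTable` two columns and enough rows. Below is a minimal sketch that emits those steps as well, using the Google Charts `DataTable` methods `addColumn`, `addRows` and `setValue`. The column labels `'Label'` and `'Value'` and the hypothetical `TermCloudScript.Build` helper are illustrative. The sketch also assumes the page declares `data` as `new google.visualization.DataTable()` and passes it to the TermCloud afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class TermCloudScript
{
    // Builds JavaScript that fills a DataTable named `data` with (label, value) rows
    // for words above minFrequency, most frequent first.
    // ASSUMPTION: the page creates `data` beforehand and draws the TermCloud afterwards.
    public static string Build(IDictionary<string, int> counts, int minFrequency)
    {
        var rows = counts
            .Where(kv => kv.Value > minFrequency)
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .ToList();

        var sb = new StringBuilder();
        sb.AppendLine("data.addColumn('string', 'Label');");
        sb.AppendLine("data.addColumn('number', 'Value');");
        sb.AppendLine("data.addRows(" + rows.Count + ");");   // allocate rows before setValue
        for (int row = 0; row < rows.Count; row++)
        {
            // Escape backslashes and single quotes for the JavaScript string literal.
            string label = rows[row].Key.Replace("\\", "\\\\").Replace("'", "\\'");
            sb.AppendLine("data.setValue(" + row + ", 0, '" + label + "');");
            sb.AppendLine("data.setValue(" + row + ", 1, " + rows[row].Value + ");");
        }
        return sb.ToString();
    }
}
```

Calling `TermCloudScript.Build(hashfreq, 5000)` with the dictionary built above yields the same rows as the program, plus the column and row setup.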