
LOL! Big Data Analytics!

So what’s the best way to compare data?

Say you have a bunch of different data, of all different types and sizes.
You’ve got Square Data and Circle Data.


And Square Data is Different from Circle Data.
You know they are both Shapes, but one has 4 corners and the other has 0 corners.
“And that’s ridiculous!” from either perspective.

Also, if we look closer we can see that Circle Data looks like this.

And Square Data looks like this.

Well, I can’t compare this “J” with “l”! >:O

WHAT AN OUTRAGE!  This is very frustrating! >:<

. . .

But WHAT IF!  I had even more granular data about each J and l?

And ‘J’ looked like this.

And ‘l’ looked like this.

:D!   Fantastic!  They have something in common!  COLORS !


Meh, so now what? Well, we can record how often each color appears in each individual ‘J’ or ‘l’.
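Here is a minimal sketch of that counting step in Python. The color lists are made-up sample data, and the names `color_frequencies`, `j_colors` and `l_colors` are hypothetical, just there to show the idea.

```python
from collections import Counter

# Made-up sample data: the colors found inside one 'J' and one 'l'.
j_colors = ["red", "blue", "red", "green", "red", "blue"]
l_colors = ["blue", "yellow", "blue", "red"]


def color_frequencies(colors):
    """Count how many times each color appears in one item."""
    return Counter(colors)


j_freq = color_frequencies(j_colors)
l_freq = color_frequencies(l_colors)

print(j_freq)
print(l_freq)
```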

“so what?”

And then we can sort the color data of each J or l from most frequent to least.



Then we can take the top x most frequent colors and call that a set (one set of colors for each J and each l). So now, by putting the emphasis on frequency, we can treat "most frequent" as a stand-in for "most relevant"!
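The sort-then-take-the-top-x step might look something like this in Python. It is only a sketch: `top_colors` is a hypothetical helper name, the sample colors are invented, and ties between equally frequent colors are broken by whichever color showed up first, which is an assumption and not something the approach requires.

```python
from collections import Counter


def top_colors(colors, x):
    """Return the set of the x most frequent colors in one item."""
    # most_common(x) sorts from most frequent to least and keeps the first x.
    ranked = Counter(colors).most_common(x)
    return {color for color, _count in ranked}


# Made-up sample data for one 'J'.
j_colors = ["red", "blue", "red", "green", "red", "blue", "pink"]
j_top2 = top_colors(j_colors, 2)

print(j_top2)
```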


We can now play with Data!

We can experiment and see which algorithm works best for DataSet Comparison! Is it the top 5 most frequent terms of “J” compared with the top 5 most frequent terms of “l”, where at least 2 matches make a relation between “J” and “l”? Maybe it’s the top 15 compared with the top 50 with x matches? Maybe 5 vs. 15 with x matches?
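To play with those knobs, a rough Python sketch could look like the one below. The function names `top_terms` and `related`, the parameter names `top_a`, `top_b` and `min_matches`, and the sample data are all hypothetical; the point is only that the two "top N" sizes and the match threshold are things you can tune.

```python
from collections import Counter


def top_terms(items, n):
    """Set of the n most frequent terms in one item."""
    return {term for term, _count in Counter(items).most_common(n)}


def related(a_items, b_items, top_a=5, top_b=5, min_matches=2):
    """True if the two top-term sets share at least min_matches terms."""
    shared = top_terms(a_items, top_a) & top_terms(b_items, top_b)
    return len(shared) >= min_matches


# Made-up sample data for one 'J' and one 'l'.
j = ["red"] * 5 + ["blue"] * 4 + ["green"] * 3 + ["pink"] * 2 + ["gray"]
l = ["blue"] * 5 + ["red"] * 4 + ["black"] * 3 + ["white"] * 2 + ["teal"]

# Try a few settings and see which ones call 'J' and 'l' related.
for top_a, top_b, min_matches in [(5, 5, 2), (2, 2, 2), (1, 1, 1)]:
    print(top_a, top_b, min_matches, related(j, l, top_a, top_b, min_matches))
```

Swapping in different values for the three settings is the whole experiment: there is no built-in right answer, so the data has to tell you which combination gives sensible relations.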

If the colors were “special” words, I find that top 10 vs. top 10 with 4 or more matches works best. But this could change at any moment! I could wake up tomorrow and decide differently. There is no absolute truth here, teh absolute truth is in teh data!