LOL! Big Data Analytics!

So what’s the best way to compare data?

Say you have a bunch of different data, of all different types and sizes.
You’ve got Square Data and Circle Data.

And Square Data is Different from Circle Data.
You know they are both Shapes, but one has 4 corners and the other has 0 corners.
“and that’s ridiculous!” from either perspective.

Also, if we look closer we can see that circle data looks like this.

And Square Data looks like this.

Well, I can’t compare this “J” with “l” ! >:O

WHAT AN OUTRAGE!  This is very frustrating! >:<

. . .

But WHAT IF!  I had even more granular data about each J and l?

And ‘J’ looked like this.

And  ‘l’ looked like this.

:D!   Fantastic!  They have something in common!  COLORS !

“boring!”

meh, so now what? Well, we can record the frequency of each color related to each individual ‘J’ or ‘l’.

“so what?”

And then we can sort the color data of each J or l from most frequent to least.
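Here is a minimal sketch of those two steps (count, then sort), assuming each J or l is just a list of color names. The sample colors and the `color_frequencies` helper are made up for illustration.

```python
from collections import Counter

# Hypothetical color samples for one 'J' and one 'l' (made-up illustration data).
j_colors = ["red", "blue", "red", "green", "red", "blue"]
l_colors = ["blue", "blue", "yellow", "red", "blue", "yellow"]


def color_frequencies(colors):
    """Return (color, count) pairs sorted from most to least frequent."""
    return Counter(colors).most_common()


j_ranked = color_frequencies(j_colors)
l_ranked = color_frequencies(l_colors)

print(j_ranked)  # [('red', 3), ('blue', 2), ('green', 1)]
print(l_ranked)  # [('blue', 3), ('yellow', 2), ('red', 1)]
```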

“lame…”

“._.”

Then we can take the top x most frequent colors and call that a set (one set of colors for each J and l). So now, by putting the emphasis on frequency, we can attempt to approximate relevance!
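A rough sketch of the "top x" set, again assuming a J is a plain list of color names. The `top_colors` name and the sample list are hypothetical, and ties are simply broken by whichever color showed up first.

```python
from collections import Counter


def top_colors(colors, x):
    """Return the set of the x most frequent colors.

    Ties are broken by first appearance, which is how Counter.most_common
    orders elements with equal counts.
    """
    return {color for color, _count in Counter(colors).most_common(x)}


# Hypothetical color samples for one 'J' (made-up illustration data).
j_colors = ["red", "blue", "red", "green", "red", "blue", "black"]

j_top2 = top_colors(j_colors, 2)
print(sorted(j_top2))  # ['blue', 'red']
```

Using a set throws away the exact counts on purpose: once a color makes the cut, it only matters that it is there.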

“Whatever!”

We can now play with Data!

We can figure out which algorithm for DataSet Comparison works best! Is it the top 5 most frequent terms of “J” compared with the top 5 most frequent terms of “l”, where at least 2 matches make a relation between “J” and “l”? Maybe it’s the top 15 compared with the top 50 with x matches? Maybe 5 vs. 15 with x matches?
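To play with those knobs, something like the sketch below could work. The `top_terms` and `is_related` helpers, their parameter names, and the sample data are all assumptions for illustration, not a fixed recipe.

```python
from collections import Counter


def top_terms(items, x):
    """Return the set of the x most frequent items."""
    return {item for item, _count in Counter(items).most_common(x)}


def is_related(items_a, items_b, top_a=5, top_b=5, min_matches=2):
    """Call two data sets related if their top sets share enough items."""
    shared = top_terms(items_a, top_a) & top_terms(items_b, top_b)
    return len(shared) >= min_matches


# Hypothetical color samples (made-up illustration data).
j = ["red"] * 5 + ["blue"] * 4 + ["green"] * 3 + ["pink"] * 2 + ["gray"]
l = ["blue"] * 5 + ["yellow"] * 4 + ["red"] * 3 + ["black"] * 2 + ["white"]

# Try a few (top_a, top_b, min_matches) settings and see which ones relate J and l.
results = {}
for top_a, top_b, min_matches in [(5, 5, 2), (2, 2, 2), (5, 5, 3)]:
    results[(top_a, top_b, min_matches)] = is_related(j, l, top_a, top_b, min_matches)
    print(top_a, "vs.", top_b, "with", min_matches, "matches:",
          results[(top_a, top_b, min_matches)])
```

With this made-up data, only the 5 vs. 5 with 2 matches setting relates J and l, because red and blue are the only colors the two top sets share.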

If the colors were “special” words, I find that top 10 vs. top 10 with 4 or more matches works best. But this could change at any moment! I could wake up tomorrow and decide differently. There is no absolute truth here; teh absolute truth is in teh data!

:D

One thought on “LOL! Big Data Analytics!”

1. Raimundas

It must be pointed out that a tremendous amount of overlapping quantitative and qualitative data is produced in our world. Nowadays, neither ICT people nor scientists, experts or decision makers from different applied fields use so-called raw data. And in general, the object of computerized system analysis, decision making processes and information processing shifts from raw data towards more sophisticated computing according to the following scheme:
Data -> Information -> Knowledge -> Wisdom.
As a matter of fact, each step in this transformation scheme is performed on the basis of certain operations. For example, the transformation of raw data into information includes the procedure of mapping the data onto some context.
Indeed, let us say that a datum +37 °C presents entirely different information in different contexts: when we speak about someone’s health (APPROXIMATELY OK); when we speak about the weather in Lisbon (HOT FOR TOURISTS); or when we speak about metal melting (REAL DISASTER).
More sophisticated operations are involved in the process when information is to be transformed into knowledge. Such a mapping is usually based on a test feedback paradigm: information used in a closed feedback control loop to reach a certain goal becomes the knowledge which guarantees a positive or negative result with some degree of uncertainty.
The last step in this scheme (transformation of knowledge into wisdom) is still under discussion and thorough investigation, and we do not know any suitable formalized procedures except for knowledge implementation and summarizing of obtained results.
Properly speaking, the general transformation scheme should be described as follows:
Data -> Information = Contextual Data (CD);
Information -> Knowledge = Meta Information (MI) (or Theory);
Knowledge -> Wisdom = Cognitive Knowledge (CK).