Data is just an abstraction of associations - or: A philosophical interlude

In my previous post I described the Pile view of the "world of data": focus on associations, not data. Today I wanted to put meat on this theoretical skeleton and show you code I wrote. But first let me insert a little philosophical digression. It might help to distinguish Pile from other (more or less novel) approaches to "data management" - and either draw more fire from the RDBMS camp or drive them off altogether :-) We'll see...

Over the past few days, while trying to explain Pile to others, I stumbled upon two insights I'd like to share with you. I hope they help you as much as they helped me. Please give them a chance.

Elementary particles instead of bricks

We're all used to seeing the world through data-centric glasses. First come tables and rows and objects and files. Then come relationships between those data entities. That kind of thinking is so ingrained in our brains (and our education) that it's hard to shed. But do we need to? Well, since we can solve so many problems looking through this pair of glasses, it seems to be sufficient. And sure, it's ok to be content with a conceptual tool because it solves problems. That's the purpose of any world view, theory or model.

But the ability to solve problems does not make a world view the only possible interpretation of the world, nor does it give it any inherent "truth".

So once you hit a wall with an established world view (or theory or model) it seems prudent to question it. This questioning must be allowed at any time. Whoever denies this is dogmatic - that's how I understand scientific thinking. Asking questions all the time is at the very heart of science.

One of the questions I asked the established data models was: What's your granularity? And my impression is: the granularity when talking of objects and tables/rows is very, very coarse. Tables/rows and objects and files are bricks to build a data world with. And that's great! So many buildings can be constructed using bricks.

But let's be honest: in the end you can only do so much with bricks. There are inherent limits to what you can do with them. You can try to ignore those limitations, but eventually you will succumb to them.

Now look at the world. Is it built from bricks? No, it's not bricks, but something much, much smaller. Elementary particles are the building blocks of our world. And even though there are not that many different kinds of particles (loosely speaking: the protons, neutrons and electrons that constitute atoms), the world is so, so, so rich in materials and forms. How can that be? Well, that's because these particles can be related to each other in a practically unlimited number of ways. It's the associations between the few different particles that make the variety of the universe possible.

Now, if I say objects and tables/rows are bricks, what are the elementary particles of data structures? They are associations, or relations as in Pile. With objects or files you can build only so much. But when modelling (data) structures with just Pile relations, you can build everything. There's no limit to what you can model.

Or look at the field of 3D graphics: there is only so much you can model with spheres, tori or cubes. But you can approximate any shape using triangles! They are the most basic building blocks for geometric structures/bodies; they are the elementary particles of 3D models:


(Source: http://www.computing-objects.com/en/meshtools_gallery_3dr.html)

Now look at Pile's relations. They are triangles too! A relation connects two "things" and is itself a new "thing". I'd say you cannot get more basic than that and still be equally simple and powerful.

You might say, "Hey, why do I need AB in this picture? Why not draw a straight line between A and B? Wouldn't that be even simpler?" You're right, that would look simpler, but it would introduce a data/relation dichotomy. "Data" (A and B) would be different from the relation (the connecting line) between them. So I'd say that in the end this would not be simpler, because it would need more concepts than Pile, which only knows relations.
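To make the triangle idea a bit more tangible, here is a minimal Python sketch. It is not Pile's actual data structure or API; the `Relation` class, the `relate` function and the `name` field are names I made up purely for illustration. The only point it shows is that the result of relating two things is of the same kind as the things it relates, so there is no separate "data" concept.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relation:
    """Hypothetical 'thing': a root has no parents, anything else links two things."""
    left: Optional["Relation"] = None
    right: Optional["Relation"] = None
    name: str = ""  # label for printing only, not part of the model


def relate(a: Relation, b: Relation) -> Relation:
    """Connect two things; the result is itself a thing that can be related again."""
    return Relation(a, b, a.name + b.name)


a = Relation(name="A")
b = Relation(name="B")
ab = relate(a, b)                      # the third corner of the triangle
abc = relate(ab, Relation(name="C"))   # AB takes part in a further relation

print(ab.name, abc.name)
```

With a plain connecting line, `ab` would need a type of its own and could not simply be passed to `relate` again.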

My bottom line: Pile describes the most fundamental model for informational structures there is. So in essence it would be wrong to say "Hey, you can model a Pile using an RDBMS!". Rather you should think "I can model an RDBMS or a file system using Pile relations!" That's a fundamental shift in thinking!

As important as this shift of focus is, it is of course also important to have concepts and structures on a higher level of abstraction. You want to have bricks to build houses and not have to arrange elementary particles. So having RDBMSs and files and objects is a good thing. But it's also important to know they are just convenient abstractions. It's like looking at a fridge and using a fridge - but knowing that in the end it's just made up of elementary particles.

Sometimes it's good to see things as high level entities - and sometimes it's good to see them as aggregations of low level entities. The ability to switch perspective is valuable, I'd say.

Data is an abstraction

Once I had this picture of associative triangles in my mind, I thought about the nature of data. What is data anyway, if it's so important to us? If you look at Wikipedia's definition you read: "data are numbers, word, images etc." or "statement[s] accepted at face value". And I'd add: data are units of meaning, since a statement is supposed to mean something. (Let's leave context dependency out of the picture for the moment.) A number is a "unit of meaning", so is a letter, and so is an image file.

Now, what is data made of? Just 1 and 0. Numbers, letters, files are just sequences of bits. You determine where a unit of meaning starts and ends. You determine the interpretation of a sequence of bits.

1/0, On/Off are not data by themselves. They only become data when you interpret them. By themselves they are only "signals" (or a signal and a missing signal). So a number or a letter is a sequence of signals. Data emerges from this sequence only if you assign it meaning, e.g. by taking 8 signals and saying "This is a byte and it means 'A'."

Datum "A" thus is this sequence of signals: 01000001 (hex 41). And interpreting these signals as "A" is moving to a higher level of abstraction. You give up detail to get a more manageable entity, e.g. a letter instead of a sequence of signals.
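A tiny Python illustration of that step up in abstraction: the same eight signals read once as a number and once as a letter. The variable names are mine; the interpretation as an ASCII character is just one possible choice, which is exactly the point.

```python
signals = "01000001"  # eight signals, nothing more

as_number = int(signals, 2)                     # interpret as an unsigned binary number
as_letter = bytes([as_number]).decode("ascii")  # interpret the same byte as an ASCII character

print(as_number, hex(as_number), as_letter)
```

Nothing in `signals` says which reading is "right". The meaning comes from whoever does the interpreting.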

So I'd say: There is no data. Data is just a necessary and useful abstraction. What there is are only signals and associations of signals.

With regard to Pile that means: There need not even be any data outside a Pile. The very "atoms" of a Pile are the only two signal values 1 and 0. From them anything you want can be built.

Only 1 and 0 are left outside the (world of) Pile. And since they are so basic, you could even get rid of them and make them axiomatic concepts of Pile as the only really needed Terminal values.

Can you see now why I think Pile is so fundamental? Data is just an abstraction, and Pile's relations are enough to tie together the basic signals 1 and 0 to form larger units, which can then be interpreted as data in any way you like.
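Here is a rough Python sketch of what "building everything from 1 and 0" could look like. Please read it as a thought experiment, not as how Pile is implemented: the `Store` class, its methods and the pairwise (binary tree) grouping of signals are all assumptions of mine. The only things outside the store are the two terminal values 0 and 1; everything else is a relation between two existing things. For brevity the sketch assumes the number of signals is a power of two.

```python
class Store:
    """Hypothetical store in which everything except 0 and 1 is a relation."""

    def __init__(self):
        self.pairs = {}    # (left, right) -> id of the relation
        self.parents = {}  # id of the relation -> (left, right)
        self.next_id = 2   # ids 0 and 1 are reserved for the two terminal values

    def relate(self, left, right):
        """Return the id of the relation between two things, creating it if needed."""
        key = (left, right)
        if key not in self.pairs:
            self.pairs[key] = self.next_id
            self.parents[self.next_id] = key
            self.next_id += 1
        return self.pairs[key]

    def encode(self, bits):
        """Relate neighbouring things pairwise until a single thing is left."""
        level = [int(b) for b in bits]
        while len(level) > 1:
            level = [self.relate(level[i], level[i + 1])
                     for i in range(0, len(level), 2)]
        return level[0]

    def decode(self, thing):
        """Unfold a thing back into its sequence of signals."""
        if thing in (0, 1):
            return str(thing)
        left, right = self.parents[thing]
        return self.decode(left) + self.decode(right)


store = Store()
letter_a = store.encode("01000001")  # one "thing" standing for the whole byte
signals = store.decode(letter_a)

print(signals, chr(int(signals, 2)))
```

Note that identical pairs of signals are stored only once: the eight signals of "A" need just five relations in this sketch, because the pairs (0, 1) and (0, 0) are reused.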

Ok, now, enough with philosophy! Next time I'm gonna show you down-to-earth C# code. How's that?

3 Comments

  • Holy spectacles, Ralf-man!

    It's like the wall we're climbing is really the floor!

    Oh, wait - it is the floor.

    Um, so why are we still walking funny? Is it because we've gotten so used to it?

  • @Udi: Yes, I guess we're walking funny because we're so used to it (and it helped us to solve many problems). But quite a few problems are still unsolved, and my feeling is we need some fundamental shift in how we view the "data management" world.

    It's like discovering the world is made up of atoms instead of... whatever else there might be, like stone and wood and iron.

    Many times a new view will not change how we act in our everyday lives as programmers. Just like we are not concerned with atoms when we buy some bread. But at other times it's important to know there are atoms, in order to build completely new things.

  • @RalfB: I'm not suggesting 1 and 0 should be the only Terminal values. I just wanted to show it would work.

    As I said in my latest post: The designer of a Pile agent (or Pile application) just needs to carefully think about which data units to choose as Terminal values (Tvs), because whatever you choose becomes opaque. Tvs are black boxes without any structure to relate to.
