Data Enlightenment

Yehonathan Sharvit 22 May, 2020 | 5 min read

The Journey

The purpose of this article is to guide you toward data enlightenment by illustrating the advantages of programming with data instead of objects.

Data enlightenment is a 3-step journey:

First step: Awareness

You code in a language that supports only objects, like C++, Java or C#.

You code, you suffer, everything is complicated... You don't understand why...

One day, you become aware of your suffering.

Second step: Choice

You code in a hybrid language like JavaScript, Ruby or Python.

That's much more fun than before, but objects are still there and they cause you suffering.

You choose to write as much code as you can using only data.

Last step: Gratitude

You code in a data language like Clojure.

There is nothing to say. No words can really express your feelings.

Your heart is full of gratitude. You are fully enlightened.

Objects and Data

Let me start by clarifying what I mean by data and objects.

An object is an entity made of:

  • members usually managed by setters and getters
  • methods

In this article, we focus the discussion around members and we do not deal at all with polymorphism and inheritance.

Here is an example of a common object in Java - a Product with name and price:

class Product {
  String name;
  int price;

  Product(String name, int price) {
    this.name = name;
    this.price = price;
  }

  String getName() {
    return this.name;
  }

  void setName(String name) {
    this.name = name;
  }

  int getPrice() {
    return this.price;
  }

  void setPrice(int price) {
    this.price = price;
  }
}

Product pencil = new Product("pencil", 2);

Remark: The code is not always as verbose as in this example, as there are various ways to avoid the verbosity of setters and getters, even in Java (see Lombok).

In the context of this article, by Data I mean a dictionary (a.k.a. hash map) with arbitrary keys and values (think of JSON).

Here is how we create a piece of Data in JavaScript:

var pencil = {
  name: "pencil",
  price: 2
}

Universality of dictionaries

The biggest issue with using objects to represent data is that one has to create a class for each piece of data.

Usually, we have different classes for similar entities in different modules. The fact that similar entities share similar fields is not easy to leverage in the object realm, and there is no generic way to instantiate an object of class A from an object of class B, even when the two classes have the same fields. By a generic way, I mean a piece of code that doesn't depend on classes A and B.

In a typical e-commerce application, we would have classes for users, customers, products etc... Even worse, we would create separate classes to represent a product depending on what module handles the product. For instance:

  • ProductInApp for the representation of the product when handled in the application module
  • ProductInDb for the representation of the same information in a way that can be handled by our DB driver

ProductInApp and ProductInDb might have the exact same fields, maybe with different names, but that doesn't save us from creating two classes. In addition, there is no generic way in the realm of objects to convert from ProductInApp to ProductInDb. One has to write a specific ProductInDb constructor that receives a ProductInApp as an argument (and another UserInDb constructor that receives a UserInApp as an argument, etc.).

On the other hand, in the realm of data, we manipulate dictionaries. Dictionaries are universal. We can write generic functions to manipulate them. For instance, one can clone a dictionary without any knowledge of the fields in the dictionary. One can also add fields to a dictionary. The only things required are the name of the field and the value to be associated with it.

Imagine, for instance, that before sending our data to the database, we want to add a timeStamp field with the current timestamp.

This is what it might look like in JavaScript:

function addTimeStamp(data) {
   // Shallow copy, so the caller's dictionary is not mutated
   var res = Object.assign({}, data);
   res.timeStamp = new Date();
   return res;
}

Remark: Plain JavaScript objects have no built-in clone method. For deep cloning, recent runtimes provide structuredClone, and several libraries provide implementations (see e.g. cloneDeep in lodash).

addTimeStamp is a generic function: it works with any kind of data: users, products etc... It doesn't matter.

We can even generalize our addTimeStamp function by passing to it the name of the field for the timestamp (we might prefer created_at over timeStamp in some cases). The code is still quite trivial:

function addCustomTimeStamp(data, field_name) {
   // Shallow copy, so the caller's dictionary is not mutated
   var res = Object.assign({}, data);
   res[field_name] = new Date();
   return res;
}

Imagine writing something like that in a standard Object Oriented language: it would involve advanced tricks like reflection, whereas in a data language it's a simple generic function.
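The conversion between ProductInApp and ProductInDb discussed earlier also becomes generic in the data realm. Here is a minimal sketch, assuming the two representations differ only in their field names. The renameKeys helper, the appToDb mapping and the database field names are hypothetical, introduced for illustration only:

```javascript
// Hypothetical helper: returns a new dictionary whose keys are renamed
// according to `mapping`; keys absent from `mapping` are kept as-is.
function renameKeys(data, mapping) {
  var res = {};
  Object.keys(data).forEach(function (key) {
    var hasMapping = Object.prototype.hasOwnProperty.call(mapping, key);
    var newKey = hasMapping ? mapping[key] : key;
    res[newKey] = data[key];
  });
  return res;
}

// Assumed field names, for illustration only
var appToDb = { name: "product_name", price: "product_price" };

var productInApp = { name: "pencil", price: 2 };
var productInDb = renameKeys(productInApp, appToDb);
// productInDb is { product_name: "pencil", product_price: 2 }
```

The same renameKeys function would work for users, customers or any other entity: only the mapping, which is itself a piece of data, changes.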

Serialization without reflection

Communication between a web frontend and a backend, or between HTTP services over REST, is string based. Usually, we don't pass objects over the wire.

In order to represent the information stored in an object as a string, one has to serialize the object. In order to serialize an object, one has to either:

  • depend on the class of the object to serialize
  • use reflection

Both are quite cumbersome.

Remark: libraries like Jackson for Java make it easier to serialize objects.

In the data realm, serialization comes for free and it works with any piece of data. For instance, JavaScript provides a JSON.stringify method:

var pencil = {
  name: "pencil",
  price: 2
};
var pencilStr = JSON.stringify(pencil); 
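
Deserialization is just as generic. Here is a small sketch of the round trip using the standard JSON.parse; the user dictionary and its fields are made up for illustration:

```javascript
var pencil = { name: "pencil", price: 2 };
var user = { firstName: "Jane", roles: ["admin"] }; // hypothetical fields

// The same two calls work for any dictionary, whatever its fields
var pencilStr = JSON.stringify(pencil);  // '{"name":"pencil","price":2}'
var pencilCopy = JSON.parse(pencilStr);  // a new dictionary with the same content
var userCopy = JSON.parse(JSON.stringify(user));
```

Keep in mind that the round trip preserves only what JSON can represent: a Date value, for instance, is serialized as a string and comes back as a string.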

Testability with no mocks

What about testing?

Imagine you want to use Amazon EC2 API to create machine instances programmatically.

Let's take a look at a code sample in Java using the AWS SDK for Java:

RunInstancesRequest runRequest = RunInstancesRequest.builder()
        .imageId(amiId)
        .instanceType(InstanceType.T1_MICRO)
        .maxCount(1)
        .minCount(1)
        .build();

RunInstancesResponse response = ec2.runInstances(runRequest);

How can you write unit tests for this code? How can you make sure that the various methods (imageId, instanceType, maxCount and minCount) are called with the correct arguments?

Usually, it involves mocking, and the code for the unit tests rapidly becomes very complicated.

Let's compare it with a similar code sample in JavaScript using the AWS SDK for JavaScript:

var instanceParams = {
   ImageId: 'AMI_ID', 
   InstanceType: 't2.micro',
   MinCount: 1,
   MaxCount: 1
};

var response = ec2.runInstances(instanceParams);

The big difference is that now we are in the data realm: instead of passing an object to runInstances, we pass a dictionary.

Writing a unit test that checks the dictionary has correct keys and values is trivial. It doesn't require any mocking and the code for it is quite simple.

Conclusion

There is still a lot to cover, and there are definitely advantages of Object Oriented programming (like type checking and refactoring tools) that are difficult to achieve with Data Oriented programming. That might be the topic of a future article.

We have illustrated three main advantages of the data oriented approach:

  • We use universal dictionaries instead of a class for each kind of data
  • We serialize data for free without reflection
  • We write unit tests that validate the keys and values of our dictionaries instead of mocking methods

I hope that I was able to motivate you to take a step forward in your data enlightenment journey.

  • If you code in OO, move forward to a hybrid language
  • If you code in a hybrid language, write as much code as you can with data
  • If you code in a data language, fill your heart with gratitude

I wish you a happy Data Enlightenment journey!

Yehonathan Sharvit
Believe in Elegance and Simplicity. Loves Clojure.