Scala and the '@transient lazy val' pattern

Suppose you have a Scala object holding some data that you want to store or send around by serializing it. The object can also perform some complex calculations, and it caches their results in its fields. While it may be efficient to keep the results in memory for later lookup, serializing these fields as well can be a bad idea: they consume storage space you do not want to sacrifice, or they increase network traffic (e.g., in Spark) to the point where transferring them takes longer than recalculating them. One could write a custom serializer for this task, but let us be honest: that's not really what we want to spend our time on.

This is where the @transient lazy val pattern comes in. In Scala, lazy val denotes a field that is calculated only when it is accessed for the first time and is then stored for future reference. @transient, on the other hand, marks a field that shall not be serialized.

Putting this together, we can now write our "recalculate rather than serialize" logic:

class Foo(val bar: String) extends Serializable {
  @transient lazy val baz: String = {
    println("Calculate baz")
    bar + " world"
  }
}

Here the baz field of Foo is calculated at most once for the original object and at most once for each deserialized copy:

// Create object of class Foo
val foo = new Foo("Hello")

// baz is calculated on the first access only ("Calculate baz" is printed once)
foo.baz
foo.baz

// Serialize foo
import java.io._
val bo = new ByteArrayOutputStream
val o = new ObjectOutputStream(bo)
o.writeObject(foo)
val bytes = bo.toByteArray

// Deserialize foo
val bi = new ByteArrayInputStream(bytes)
val i = new ObjectInputStream(bi)
val foo2 = i.readObject.asInstanceOf[Foo]

// baz was not serialized, so the copy recalculates it once and only once
foo2.baz
foo2.baz
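
To see what @transient contributes on top of lazy val, the following sketch counts evaluations instead of printing. The names EvalCounter, TransientFoo, PlainFoo and RoundTrip are made up for this illustration and are not part of the example above. It assumes plain Java serialization and relies on the fact that an already evaluated lazy val without @transient is written to the byte stream together with its value.

```scala
import java.io._
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical helper: the counters live outside the serialized objects
object EvalCounter {
  val withTransient = new AtomicInteger(0)
  val withoutTransient = new AtomicInteger(0)
}

// Same shape as Foo above, but counting evaluations of baz
class TransientFoo(val bar: String) extends Serializable {
  @transient lazy val baz: String = {
    EvalCounter.withTransient.incrementAndGet()
    bar + " world"
  }
}

// Identical, except that baz is not marked @transient
class PlainFoo(val bar: String) extends Serializable {
  lazy val baz: String = {
    EvalCounter.withoutTransient.incrementAndGet()
    bar + " world"
  }
}

// Hypothetical helper: serialize to a byte array and read the object back
object RoundTrip {
  def apply[A <: AnyRef](value: A): A = {
    val bo = new ByteArrayOutputStream
    val o = new ObjectOutputStream(bo)
    try o.writeObject(value) finally o.close()
    val i = new ObjectInputStream(new ByteArrayInputStream(bo.toByteArray))
    try i.readObject.asInstanceOf[A] finally i.close()
  }
}
```

If baz is accessed on an instance of each class, the instance is passed through RoundTrip, and baz is accessed again on the copy, the counter for TransientFoo should end up at 2 (one evaluation per instance), while the counter for PlainFoo should stay at 1, because the cached value travelled with the object.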

This pattern is also a way to handle fields that contain objects which simply are not serializable but which you do not want to recreate too often (like IO streams). As long as your class carries the information needed to reconstruct them (like filenames for streams), it can still be made serializable.
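
A minimal sketch of this idea, assuming a text file that can simply be reopened from its path. LineSource and LineSourceDemo are hypothetical names chosen for this illustration. Only the path is serialized, and the non-serializable BufferedReader is rebuilt lazily on each instance.

```scala
import java.io._

// Hypothetical example class: only `path` ends up in the byte stream
class LineSource(val path: String) extends Serializable {
  // BufferedReader is not Serializable. @transient keeps it out of the
  // serialized form, and lazy val reopens the file on first access.
  @transient lazy val reader: BufferedReader =
    new BufferedReader(new FileReader(path))

  // Returns the next line of the file, or null at end of file
  def nextLine(): String = reader.readLine()
}

object LineSourceDemo {
  // Serialize to a byte array and read the object back
  def roundTrip[A <: AnyRef](value: A): A = {
    val bo = new ByteArrayOutputStream
    val o = new ObjectOutputStream(bo)
    try o.writeObject(value) finally o.close()
    val i = new ObjectInputStream(new ByteArrayInputStream(bo.toByteArray))
    try i.readObject.asInstanceOf[A] finally i.close()
  }
}
```

Note that a deserialized copy gets its own fresh reader, so it starts reading from the beginning of the file again. State such as the current read position is not carried over unless the class stores it in a regular field. Without @transient, serializing an instance whose reader has already been opened would fail with a NotSerializableException.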
