jose.omg.lol

Turn your JSON column data into rich objects

Sometimes, your domain model will require ancillary data that isn't important or structured enough to be a top-level concern of your application. A common solution for storing such data is using PostgreSQL JSON and JSONB columns. One annoyance I've had dealing with these columns, though, is that they spread knowledge of the data's structure across your application.

For this post, let's work with the following domain model:

Say you're creating a website builder. You'll have a Website model owned by a User. Users can add arbitrary, unstructured styles to their websites. Let's not worry too much about what a style is. We could be referring to CSS styles or some other construct. What does matter is that different styles don't share the same structure, but they do carry more weight than simple key-value pairs (e.g., a style can be valid or invalid). We'll persist these styles in a JSONB column in our PostgreSQL database. Websites will also have a string as a title.

To summarize, here's a portion of our schema:

ActiveRecord::Schema[7.0].define(version: XXXX_XX_XX_XXXXXX) do
  # ...
  create_table "websites" do |t|
    # ...
    t.uuid "owner_id", null: false
    t.string "title", null: false
    t.jsonb "styles", default: {}, null: false
  end
end

Currently, anything that calls #styles on a Website instance needs to know that the method returns a hash. This understanding differs from knowing that #owner returns a User instance or that #title returns a string because hashes stand in a weird spot between these two. They're richer than strings and other "primitives" but less robust than objects. Let me elaborate.

When you receive a hash, there's an implicit assumption you'll know what to do with its contents. You'll have to know what it holds, how to iterate it, how to access nested fields, etc. This assumption isn't problematic when hashes are shallow and narrow. But hashes can grow to a point where callers need to know too much about them. Once that happens, we'll likely want to attach behavior to them and create POROs that take these hashes as arguments. Eventually, we might refactor away from using hashes at all and make their contents core models backed by actual database tables.

It's sometimes advisable to get ahead of these inconveniences by having methods such as #styles return objects instead of hashes from the very beginning. Two scenarios can happen at that point: if your data gets complex, you'll be able to migrate away from JSONB with ease, and you'll have a place to put behavior until then. If it never does, you'll still reap the benefits of object-oriented design.
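To make the difference concrete, here's a minimal plain-Ruby sketch of the same check done against a hash and against an object. The `RAW_STYLE` payload, the `hash_style_valid?` helper, and the validity rule itself are assumptions I made up for illustration; they aren't part of the Website model.

```ruby
# Hypothetical payload, shaped like something a JSONB styles column might hold.
RAW_STYLE = { "name" => "color", "value" => "#ff0000" }.freeze

# With a bare hash, every caller has to know the keys and re-implement the rules.
def hash_style_valid?(style)
  !style["name"].to_s.empty? && !style["value"].to_s.empty?
end

# With an object, the rules live next to the data they describe.
Style = Struct.new(:name, :value, keyword_init: true) do
  def valid?
    !name.to_s.empty? && !value.to_s.empty?
  end
end

style = Style.new(name: RAW_STYLE["name"], value: RAW_STYLE["value"])

puts hash_style_valid?(RAW_STYLE) # the caller needs to know about "name" and "value"
puts style.valid?                 # the caller only needs to know about #valid?
```

Both calls answer the same question, but only the second one keeps working untouched if the shape of the data changes.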

Here's how you might go about turning your JSON column data into rich objects.

Working with plain-old JSON

Let's start with the schema for our Website model, which defines our styles column as follows:

ActiveRecord::Schema[7.0].define(version: XXXX_XX_XX_XXXXXX) do
  # ...
  create_table "websites" do |t|
    # ...
    t.uuid "owner_id", null: false
    t.string "title", null: false
    t.jsonb "styles", default: {}, null: false
  end
end

Doing this will be enough to read and write JSON values. Now, let's use Rails's ActiveRecord::Attributes API to hook into the process that coerces the JSON data stored in the database into Ruby hashes.

First, we'll need to create a class that implements #deserialize (the method called by ActiveRecord when deserializing our JSON data). In practice, it's recommended you inherit from the existing type class that's closest in behavior to your new custom type. This way, you won't miss out on new behavior implemented in future versions of Rails. You can choose from a number of classes defined in ActiveRecord::Type and ActiveModel::Type, or in ActiveRecord::ConnectionAdapters, if it's a database-specific construct.

In this case, we're going to inherit from ActiveRecord::ConnectionAdapters::PostgreSQL::OID::Jsonb, which, in turn, inherits from ActiveRecord::Type::Json:

class Websites::Styles < ActiveRecord::ConnectionAdapters::PostgreSQL::OID::Jsonb
  Base = Struct.new(:name, :value, keyword_init: true) do
    def valid?
      # do work
    end
  end

  def deserialize(value)
    hash = super
    Base.new(**hash)
  end
end

Ignore the Base = Struct.new block for now and look at our #deserialize method. By calling super, it leverages the original implementation in ActiveRecord::Type::Json to turn JSON data into a hash. Then, we use that hash to instantiate our own class.

Now, let's talk about what we're instantiating. It could be anything! In this case, it's a Struct. Please don't pay too much attention to the details of my Struct. It's just a placeholder implementation.
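If you'd like to poke at this flow without booting Rails, here's a rough plain-Ruby sketch of it. `JsonTypeStandIn` is a hypothetical stand-in I wrote for this example; it only loosely mimics the parsing step of ActiveRecord::Type::Json and isn't the real class. `StylesType` is likewise just an illustrative name.

```ruby
require "json"

# Hypothetical stand-in for ActiveRecord::Type::Json (assumption: we only
# mimic "parse the value if it's a string, otherwise hand it back as-is").
class JsonTypeStandIn
  def deserialize(value)
    return value unless value.is_a?(String)

    JSON.parse(value)
  end
end

class StylesType < JsonTypeStandIn
  Base = Struct.new(:name, :value, keyword_init: true)

  def deserialize(value)
    hash = super          # the parent turns the JSON string into a hash
    return if hash.nil?   # nothing stored, nothing to wrap

    Base.new(**hash.transform_keys(&:to_sym))
  end
end

style = StylesType.new.deserialize('{"name":"color","value":"red"}')
puts style.name # => color
```

The shape is the same as in the Rails version: the parent class does the parsing, and the subclass only decides what to wrap the result in.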

I do encourage using Structs here. Remember, Rails is still using its underlying type system behind the scenes. And that includes serializing your objects before saving them.

Structs behave similarly to hashes in many ways. One of those ways is that ActiveSupport::JSON.encode can take either of them and turn it into a JSON string:

ActiveSupport::JSON.encode(Struct.new(:name).new("Jose"))
# => "{\"name\":\"Jose\"}"

Rails's internal type system will still be expecting to deal with hashes. So maybe try not to stray too far away from them. I've been using this in production for months now and haven't run into any issues. But my implementations are purposely simple. I'd advise you to follow the same principle.
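One way to check that your objects "stay close to hashes" is to make sure they survive a round trip to JSON and back. Here's a small sketch using only Ruby's standard library. The `dump_style` and `load_style` helpers are hypothetical names for this example, and note that plain Ruby needs an explicit #to_h where ActiveSupport::JSON.encode would handle the Struct on its own.

```ruby
require "json"

Style = Struct.new(:name, :value, keyword_init: true)

# Struct -> JSON string. The stdlib needs the explicit #to_h here.
def dump_style(style)
  JSON.generate(style.to_h)
end

# JSON string -> Struct.
def load_style(json)
  Style.new(**JSON.parse(json, symbolize_names: true))
end

original = Style.new(name: "color", value: "red")
json     = dump_style(original) # => "{\"name\":\"color\",\"value\":\"red\"}"
restored = load_style(json)

puts restored == original # => true
```

If an object of yours can't make this trip without losing information, that's probably a sign it has strayed too far from a hash.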

Finally, we just have to instruct Website to use our new class when deserializing styles:

class Website < ApplicationRecord
  attribute :styles, Websites::Styles.new, default: {}
end

You can also register your type in an initializer to get a nicer DSL.

Working with JSON arrays

Our implementation needs to change a tiny bit when dealing with JSON arrays. Say our schema looks like this now:

ActiveRecord::Schema[7.0].define(version: XXXX_XX_XX_XXXXXX) do
  # ...
  create_table "websites" do |t|
    # ...
    t.uuid "owner_id", null: false
    t.string "title", null: false
    t.jsonb "styles", default: [], null: false
  end
end

Our "type" class would be mostly the same, except we'd have to iterate through the array and instantiate a class for each JSON object:

class Websites::Styles < ActiveRecord::ConnectionAdapters::PostgreSQL::OID::Jsonb
  # Base = ... omitted for brevity

  def deserialize(value)
    hash_array = super
    hash_array.map { Base.new(**_1.symbolize_keys) }
  end
end

Notice how calling super will now return an array of hashes instead of a hash (assuming you saved your data as an array in the first place, of course).
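Here's the array case as a plain-Ruby sketch, again without Rails. `deserialize_styles` is a hypothetical helper standing in for the type class, and returning an empty array for nil is a choice I made for this example rather than something Rails does for you.

```ruby
require "json"

Style = Struct.new(:name, :value, keyword_init: true)

# Turns a JSON array of objects into an array of Style structs.
def deserialize_styles(json)
  return [] if json.nil? # assumption: treat "nothing stored" as "no styles"

  JSON.parse(json).map do |attrs|
    Style.new(**attrs.transform_keys(&:to_sym))
  end
end

styles = deserialize_styles('[{"name":"color","value":"red"},{"name":"margin","value":"0"}]')
puts styles.map(&:name).inspect # => ["color", "margin"]
```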

Finally, we tell our Website model to use this class when deserializing styles and give the attribute an empty array as its default value:

class Website < ApplicationRecord
  attribute :styles, Websites::Styles.new, default: []
end

Working with polymorphism

Here's one last trick. You can instantiate different classes if you give the JSON objects some knowledge about your domain model. I'm sure there are better ways to do this, but lately I've been hooking into ActiveRecord callbacks like so:

class Website < ApplicationRecord
  before_validation :add_type_to_styles

  private

  def add_type_to_styles
    # Merge the type into the existing styles instead of overwriting them with it.
    self[:styles] = self[:styles].to_h.merge(type: something_that_determines_type)
  end
end

or, if it's an array:

class Website < ApplicationRecord
  before_validation :add_type_to_styles

  private

  def add_type_to_styles
    self[:styles] = self[:styles].map do |style|
      style[:type] = something_that_determines_type
      style
    end
  end
end

You can avoid using callbacks if you pass the type along whenever you're building each object, assuming it's available to you.

Then, we can instantiate different objects depending on what they contain as their type, like so:

class Websites::Styles < ActiveRecord::ConnectionAdapters::PostgreSQL::OID::Jsonb
  # Base = ... omitted for brevity

  def deserialize(value)
    hash = super
    klass = "Styles::#{hash["type"]}".safe_constantize || Base
    klass.new(**hash)
  end
end
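Here's a plain-Ruby sketch of the same idea that you can run outside of Rails. To be clear about what I swapped in: instead of safe_constantize, it uses an explicit lookup table, so a stored string can only ever map to a class you listed on purpose. The `Color` and `Spacing` classes, their validity rules, `REGISTRY`, and `Styles.build` are all hypothetical names for this example, and I've assumed that Base has a type member so the stored type has somewhere to go.

```ruby
module Styles
  # Assumption: Base carries a :type member alongside the other fields.
  Base = Struct.new(:type, :name, :value, keyword_init: true) do
    def valid?
      !name.to_s.empty?
    end
  end

  class Color < Base
    def valid?
      super && value.to_s.match?(/\A#\h{6}\z/)
    end
  end

  class Spacing < Base
    def valid?
      super && value.to_s.match?(/\A\d+px\z/)
    end
  end

  # Explicit lookup table, used here in place of safe_constantize.
  REGISTRY = { "Color" => Color, "Spacing" => Spacing }.freeze

  # Picks a class from the "type" key and falls back to Base when it's unknown.
  def self.build(attrs)
    klass = REGISTRY.fetch(attrs["type"], Base)
    klass.new(**attrs.transform_keys(&:to_sym))
  end
end

color = Styles.build("type" => "Color", "name" => "background", "value" => "#ff0000")
other = Styles.build("type" => "Mystery", "name" => "whatever", "value" => "?")

puts color.class # => Styles::Color
puts other.class # => Styles::Base
```

Whether you go with a lookup table or safe_constantize is mostly a matter of taste. The table is more typing, but it keeps the list of allowed classes in one visible place.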

Conclusion

Returning hashes from JSON column accessors is acceptable in many cases. But, sometimes, you know your JSON data is likely to become deeper and wider with time.

In those cases, I'd invite you to consider returning objects from your JSON column accessors instead of hashes. That way, if your data ever gets more complex, you'll be able to migrate away from JSON with ease, and you'll have an appropriate place to put behavior until then.

I hope this post was helpful to you. Till next time!
