Tips for Counter Caches
When working on database powered application, often performance issues come from:
N+1 queries. Where we have a list of records, and we make extra queries for each record.
Performing expensive count queries. Often, those queries need to do the whole table reads.
A good illustrative example is this list of posts on a site like Product Hunt.
How many queries do we need to make to implement this? 🤔
1 query for a list of posts (by some conditions)
then for every post following queries are needed
1. comments count - "SELECT COUNT(*) FROM comments WHERE post_id = $1"
2. likes count "SELECT COUNT(*) FROM likes WHERE post_id = $1"
3. list of topics - "SELECT ..."
4. has the current user liked "SELECT ...
This makes it 4 queries for every post. If we have 100 posts, this is 401 queries. 🫣 😬
In this post, I'm going to focus on two count queries - "comments count" and "likes count". How to handle "list of topics" and "has current user liked" is something I'm going to cover in the future.
Counter Cache
The way to solve these queries is simple:
Don't do those queries, store the "count" value in the "post" table.
This technique is called "counter cache".
It is quite popular in the Ruby on Rails world. It is even built in to Active Record.
The way the counter cache works is as follows
"[parent]" table table have column named "[child]_count"
when rows from "[child]" tables
are inserted, the "[parent].[child]_count" column is incremented
are deleted, the "[parent].[child]_count" column is decremented
The following code in Ruby on Rails will automatically handles this:
class Comment < ActiveRecord::Base
belongs_to :post, counter_cache: true
end
Of course, we can implement this using pure SQL 👉 This is how can be done in PostgreSQL.
Surprisingly, most Node.js ORM libraries don't have this built-in, and even Elixir's Ecto doesn't have it 🤯
Drawbacks
Counter caches are great. So, I should have one for every association. 🤨
Of course not. As with every technique, it has its tradeoffs.
1) Wrong counters
As with every type of cache, keeping it up-to-date and correct is important.
2) Making write operations slow
Counter cache causes additional workload on every write, update, or delete operation, thereby slowing down these operations.
For instance, when millions of likes are created on a single post at the same time, the post-likes counter will be updated excessively, which can result in database lock issues with post rows and cause slowdowns in vote insert operations. and slows those operations.
One solution is to use the Slotted Counter Pattern.
Another solution is moving counters from the database and in something like Redis.
3) Increase storage costs
Storage is cheap, yes, but being wasteful is expensive.
Counters are often needed for the most important tables in the application. Those are also the most inner-connected tables. So, if counter cache columns are hastily added, those already big tables get many extra columns.
Every extra column adds weight when there are tables with billions of records.
4) Tooling for them outside of Active Record needs to be improved.
Outside of Ruby on Rails, I mostly work with Node.js, and the ORM database handling situation isn't pretty.
Tips for using counter caches with Active Record
This section is about Active Record because this is where I have the most experience and because it has the best tooling. 🙈
There are a couple of different ways to have a counter cache; when I add a counter cache, I start from the following list one by one and pick the first that works. 😎
1/ The built-in Active Record counter cache option for the `belongs_to` association. This is the most simple and fastest option, but it has limited flexibility.
class Post < ApplicationRecord
belongs :category, counter_cache: true
# we explicitly the column name
belongs :category, counter_cache: :posts_count
end
2/ The counter_culture gem with its wide range of options.
It supports switching between columns, increment/decrement with different values and etc.
class Post < ApplicationRecord
belongs :category
# track only the published posts count in category
counter_culture(
:category,
column_name: -> { :published_post_count if _1.published? }
)
# track the sum of all words in articles in the category
counter_culture(
:category,
column_name: :total_words_count,
delta_magnitude: -> { _1.words_count }
)
end
3/ Custom counter cache code.
We need custom code to handle the counters when logic is so specific. For those, I use callbacks because this logic should be encapsulated.
Example: Having category -> post -> author relationship, where we aim for each category to have a "unique_authors_count", which hows the unique number of authors who have posts in this category. Implemented something like this following
class Post < ActiveRecord::Base
belongs_to :post
belongs_to :author
after_save :update_category_unique_authors_count
after_destroy :update_category_unique_authors_count
private
def update_category_unique_authors_count
category.update_columns(
unique_authors_count: category.posts.count('DISTINCT(user_id)'),
updated_at: Time.current
)
end
end
4/ Use slotted_counters where the database table with counter cache is under heavy write load.
The idea is to split the counter column in to multiple tables. To prevent deadlocks. I used this very rarely.
class Post < ActiveRecord::Base
has_slotted_counter :votes
end
Showcase: Not only for counts
In Angry Building, I have a lot of tables with numbers like like the following.
Imagine doing an extra query for every number in this screenshot. This will be too much.
Our building tables have the following columns
apartments_count - number of apartments
balance_amount - counter of how much money is in the building
monthly_fee_amount - how much fees in total pay all apartments
obligations_amount - are how much fees are not paid
debtors_count - how many apartments have obligations
open_issues_count - number opened issues in the issue tracker, not total
For all those counts, I use the counter_culture gem
class Issue < ActiveRecord::Base
belongs_to :building
# only update "open_issues_count" when issue is opened
counter_culture(
:building,
column_name: -> { :open_issues_count if _1.opened?
}
end
class MoneyTranfer < ActiveRecord::Base
belongs_to :building
# counter cache is sum of amount of the money transfer
counter_culture(
:building,
column_name: :balance_amount,
delta_magnitude: -> { _1.amount }
)
end
class Apartment < ActiveRecord::Base
# use default rails counter for "apartments_count"
belongs_to :building, counter_cache: true
# if apartment has obligations update counter
counter_culture(
:building,
column_name: -> { :debtors_count if _1.debtor? }
)
# sum all monthly fees for all apartments
counter_culture(
:building,
column_name: :monthly_fee_amount,
delta_magnitude: -> { _1.monthly_fee_amount }
)
end
It is useful to have counts at the record level because it allows me to display them in the app UI without having to access the database. For example, I can decide whether or not to show a new payment button based on an apartment's unpaid fees.
Conclusion
Every time I create new tables, I think about how the records from those tables will be used, whether I need counter caches, and so on. I reach for counter cache columns very often.
If you have any questions or comments, you can ping me on Threads, LinkedIn, Mastodon, Twitter or just leave a comment below ðŸ“