Integral Review

Welcome to my personal blog! I use it to share what I'm currently learning or thinking about, usually on topics related to technology, business, and health.

Using embeddings to display related articles with Supabase in Ruby on Rails

To improve this blog, I wanted to suggest further reading by showing related articles at the end of each post. A good solution would be to use embeddings (which are representations of the text of an article as numerical vectors) to find the closest neighbors. However, this blog uses SQLite and therefore can't store vectors natively.

Considering this feature is small and non-essential, I decided to store it on Supabase, which provides managed PostgreSQL databases with the pgvector addon. Here's how I set that up.

First, create your Supabase account. Their free plan will be more than enough. You'll get credentials to help you set up your database.yml file:

default: &default
  adapter: sqlite3
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  timeout: 5000

development:
  primary:
    <<: *default
    database: storage/development.sqlite3
  queue:
    <<: *default
    database: storage/development_queue.sqlite3
    migrations_paths: db/queue_migrate
  vector:
    <<: *default
    adapter: postgresql
    migrations_paths: db/vector_migrate
    database: postgres
    username: <%= Rails.application.credentials.dig(:supabase, :username) %>
    password: <%= Rails.application.credentials.dig(:supabase, :password) %>
    host: aws-0-eu-west-3.pooler.supabase.com
    port: 6543
    migrations_paths: db/vector_migrate
    prepared_statements: false
lang-yaml

Rails handles multiple databases effectively and manages migrations just as it does with your primary database. In this case, we define the migration path as db/vector_migrate. We can now enable the pgvector extension and add the table that will store our embeddings:

class CreatePostEmbeddings < ActiveRecord::Migration[8.0]
  def change
    enable_extension "vector"

    create_table :post_embeddings do |t|
      t.bigint :blog_id, null: false
      t.bigint :post_id, null: false

      t.column :embedding, :vector

      t.timestamps
    end

    add_index :post_embeddings, [:blog_id, :post_id], unique: true
  end
end
lang-ruby

You can now run the migration: rails db:migrate and create the associated model:

class PostEmbedding < VectorRecord
  belongs_to :post
  belongs_to :blog

  has_neighbors :embedding
endlang-ruby

As you can see in the model, we'll be using the neighbor gem to add nearest neighbor search to ActiveRecord. In particular, the gem defines nearest_neighbors method that we'll use to find related articles:

Post.first.post_embeddings.nearest_neighbors(:embedding, distance: "euclidean").where(blog: blog).first(5)
lang-ruby

Of course, we'll need to generate the embeddings first. A simple POST request to the OpenAI API will get us our vector. Right now, the text-embedding-3-small model is both very inexpensive and provides decent results, so we'll use that. In any case, it's easy enough to regenerate the embeddings later if we need to.

def insert_embedding
  embedding = generate_openai_embedding
  PostEmbedding.find_or_create_by!(post: self, blog: blog).update!(embedding: embedding)
end

private

def generate_openai_embedding
  response = HTTParty.post(
    "https://api.openai.com/v1/embeddings",
    headers: {
      "Authorization" => "Bearer #{Rails.application.credentials.openai_api_key}",
      "Content-Type" => "application/json"
    },
    body: {
      model: "text-embedding-3-small",
      input: title.to_s + "\n" + content&.to_plain_text.to_s
    }.to_json
  )

  response["data"][0]["embedding"]
end
lang-ruby

We're ready to display related articles. Since PostEmbedding is a standard Rails model, there's nothing special to do here. However, considering that we're connecting to a remote database, the latency is going to be much higher than usual.

The solution I implemented is to move the related posts section to a turbo frame that gets eager-loaded. We then add a cache to the HTML response:

class RelatedController < BlogController
  def show
    @post = Post.find(params[:post_id])

    expires_in 1.day, public: true

    render "posts/related", layout: false, cached: true
  end
end
lang-ruby

And that's it! The page isn't slowed down, cost is negligible, and we can use vectors even with vanilla SQLite. 

#ai #development #ruby-on-rails #tutorial

💬 Comments
Subscribe to future posts