To improve this blog, I wanted to suggest further reading by showing related articles at the end of each post. A good solution would be to use embeddings (which are representations of the text of an article as numerical vectors) to find the closest neighbors. However, this blog uses SQLite and therefore can't store vectors natively.
Considering this feature is small and non-essential, I decided to store it on Supabase, which provides managed PostgreSQL databases with the pgvector addon. Here's how I set that up.
First, create your Supabase account. Their free plan will be more than enough. You'll get credentials to help you set up your database.yml file:
Considering this feature is small and non-essential, I decided to store it on Supabase, which provides managed PostgreSQL databases with the pgvector addon. Here's how I set that up.
First, create your Supabase account. Their free plan will be more than enough. You'll get credentials to help you set up your database.yml file:
default: &default adapter: sqlite3 pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %> timeout: 5000 development: primary: <<: *default database: storage/development.sqlite3 queue: <<: *default database: storage/development_queue.sqlite3 migrations_paths: db/queue_migrate vector: <<: *default adapter: postgresql migrations_paths: db/vector_migrate database: postgres username: <%= Rails.application.credentials.dig(:supabase, :username) %> password: <%= Rails.application.credentials.dig(:supabase, :password) %> host: aws-0-eu-west-3.pooler.supabase.com port: 6543 migrations_paths: db/vector_migrate prepared_statements: false lang-yaml
Rails handles multiple databases effectively and manages migrations just as it does with your primary database. In this case, we define the migration path as db/vector_migrate. We can now enable the pgvector extension and add the table that will store our embeddings:
class CreatePostEmbeddings < ActiveRecord::Migration[8.0] def change enable_extension "vector" create_table :post_embeddings do |t| t.bigint :blog_id, null: false t.bigint :post_id, null: false t.column :embedding, :vector t.timestamps end add_index :post_embeddings, [:blog_id, :post_id], unique: true end end lang-ruby
You can now run the migration: rails db:migrate and create the associated model:
class PostEmbedding < VectorRecord belongs_to :post belongs_to :blog has_neighbors :embedding endlang-ruby
As you can see in the model, we'll be using the neighbor gem to add nearest neighbor search to ActiveRecord. In particular, the gem defines nearest_neighbors method that we'll use to find related articles:
Post.first.post_embeddings.nearest_neighbors(:embedding, distance: "euclidean").where(blog: blog).first(5) lang-ruby
Of course, we'll need to generate the embeddings first. A simple POST request to the OpenAI API will get us our vector. Right now, the text-embedding-3-small model is both very inexpensive and provides decent results, so we'll use that. In any case, it's easy enough to regenerate the embeddings later if we need to.
def insert_embedding embedding = generate_openai_embedding PostEmbedding.find_or_create_by!(post: self, blog: blog).update!(embedding: embedding) end private def generate_openai_embedding response = HTTParty.post( "https://api.openai.com/v1/embeddings", headers: { "Authorization" => "Bearer #{Rails.application.credentials.openai_api_key}", "Content-Type" => "application/json" }, body: { model: "text-embedding-3-small", input: title.to_s + "\n" + content&.to_plain_text.to_s }.to_json ) response["data"][0]["embedding"] end lang-ruby
We're ready to display related articles. Since PostEmbedding is a standard Rails model, there's nothing special to do here. However, considering that we're connecting to a remote database, the latency is going to be much higher than usual.
The solution I implemented is to move the related posts section to a turbo frame that gets eager-loaded. We then add a cache to the HTML response:
class RelatedController < BlogController def show @post = Post.find(params[:post_id]) expires_in 1.day, public: true render "posts/related", layout: false, cached: true end end lang-ruby
And that's it! The page isn't slowed down, cost is negligible, and we can use vectors even with vanilla SQLite.
💬 Comments