If you've ever wanted super minimal analytics for your website that you could host yourself, I might be able to help. That's what I was looking for as well. There are some options out there, but they all seemed like overkill for my requirements: knowing the number of unique pageviews for each page, what my main traffic sources are, and which countries drive the most traffic. Given how simple that was, I figured I could build it in a few hours. The result is what I call Potato Analytics 🥔.
What makes Potato unique: Super low resource requirements
Traditional analytics track every single page view, which means processing and storing a lot of data for each action your users take on the website. When you access historical data, you process all of it again. There are powerful databases built for this purpose, but all that data also takes more effort to maintain and analyze.
Since I only wanted the number of unique pageviews for each path, I took another approach: HyperLogLog (HLL). HLL is everything I love about math; it's a data structure that estimates the cardinality of a set, that is, the number of distinct elements in it. This means we don't need to store every request to estimate how many unique visitors a page had. That's a game-changer, as the storage requirements become much more reasonable, especially if your website is very popular. It also saves us from having to do any kind of sampling, and once written to disk, the data won't change. Finally, HLL sketches can be combined with a union, so daily counts can be merged to get the number of unique visitors over a month.
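To make the idea more concrete, here is a toy HyperLogLog in Python. It's a minimal sketch for illustration only, not the code Potato uses and not how the PostgreSQL extension implements it; the class name, the precision `p=12`, the SHA-1-based hash, and the `visitor-N` IDs are my own assumptions. Each item is hashed, the first `p` bits pick one of 2^p registers, and that register keeps the largest "rank" (position of the first 1-bit in the remaining bits) it has seen. A union of two sketches is simply the element-wise maximum of their registers, which is what makes merging daily counts into a monthly count possible.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog: 2**p registers, each holding the max rank seen (illustrative only)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p                  # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item: str) -> int:
        # Any well-mixed 64-bit hash works; SHA-1 truncated to 8 bytes is an assumption.
        return int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")

    def add(self, item: str) -> None:
        x = self._hash64(item)
        width = 64 - self.p
        idx = x >> width                 # first p bits choose the register
        rest = x & ((1 << width) - 1)    # remaining bits feed the rank
        rank = width - rest.bit_length() + 1  # 1-based position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def union(self, other: "HyperLogLog") -> "HyperLogLog":
        # Merging is lossless: element-wise max of the registers.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        z = sum(2.0 ** -r for r in self.registers)
        estimate = alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting) for small sets.
            estimate = self.m * math.log(self.m / zeros)
        return estimate


monday, tuesday = HyperLogLog(), HyperLogLog()
for i in range(1000):
    monday.add(f"visitor-{i}")           # hypothetical visitor IDs
for i in range(500, 1500):
    tuesday.add(f"visitor-{i}")          # 500 of these overlap with Monday
print(round(monday.count()), round(monday.union(tuesday).count()))
```

Each sketch stays a fixed 4,096 small integers no matter how many visitors it absorbs, and the two printed estimates should land within a few percent of 1,000 and 1,500. Adding a visitor who was already counted leaves the registers unchanged, which is exactly the "unique" behavior we want.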
Thankfully, there's an HLL extension for PostgreSQL (and here's the Docker image with it installed). I would have preferred SQLite to make it even easier to work with, but I couldn't find a suitable extension. In any case, PostgreSQL is easy to run, fast, and stable, so it's fine for our purpose.
Because we only care about unique visitors, we don't need to send a request to our analytics server for every single page view. Instead, the client keeps track of the pages visited recently and only sends tracking calls for new URLs. As you would expect, this dramatically reduces the number of requests sent.
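The client-side deduplication can be modeled in a few lines. This is a hypothetical Python sketch of the logic, not Potato's actual tracking script, which presumably runs in the browser where this state would more likely live in `localStorage` or `sessionStorage`. The `RecentPages` name, the `should_track` method, and the 24-hour default window are assumptions, since the post only says "recently".

```python
import time
from typing import Callable, Dict


class RecentPages:
    """Remembers which URLs were reported recently so repeat views are not re-sent.

    Hypothetical model: the window length is an assumption, not Potato's setting.
    """

    def __init__(self, window_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self.window = window_seconds
        self.clock = clock
        self.last_sent: Dict[str, float] = {}

    def should_track(self, url: str) -> bool:
        now = self.clock()
        sent_at = self.last_sent.get(url)
        if sent_at is not None and now - sent_at < self.window:
            return False                 # seen recently: skip the network call
        self.last_sent[url] = now        # remember when we last reported this URL
        return True


fake_now = [0.0]
pages = RecentPages(window_seconds=3600, clock=lambda: fake_now[0])
print(pages.should_track("/blog/potato"))  # True: first visit
print(pages.should_track("/blog/potato"))  # False: already sent within the hour
print(pages.should_track("/about"))        # True: new URL
fake_now[0] = 7200.0
print(pages.should_track("/blog/potato"))  # True: window expired
```

Injecting the clock keeps the logic testable; the trade-off of any window is that a visitor returning after it expires may be reported again, which HLL then absorbs as a duplicate anyway.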
Together, this means we won't receive many requests, we never need to reprocess data once it has been written to the database, and the storage requirements stay very modest.
Privacy benefits for users
Finally, because the data is aggregated and hashed, we do not track users individually, which is better for them and means fewer cookie banners for us to set up. The downside of this approach is that we can't run advanced queries, but I find this need rarely arises for personal projects.
Setting it up for yourself
Setting up Potato Analytics is easy. I'm personally using Kamal with this configuration, but you could also go with Render or many other hosting services. Check out the README for more details.