We're here to share our key learnings from one year of running our own completely vertical Farcaster infra that powers far.quest & other Wield apps - with no downtime while scaling past 80K+ DAU!
We're here to cover core, digestible ideas that apply more to Farcaster than to infra in general - you should already believe in e.g. PagerDuty, alerting, monitoring, etc.! If you want to learn about general infra/SRE, I recommend Google's SRE books.
We went with MongoDB for its "pay to scale" enterprise infrastructure without downtime - you can easily shard all the way to 100TB+! Database choice is vital given one of Farcaster's goals is to hit 1B DAU - so a 100TB DB isn't a 0% chance. Unfortunately our choice was contrarian: we lose database compatibility with a lot of Farcaster tooling, such as shuttle, which we had to rewrite ourselves. Luckily LLMs & Cursor can do a lot of the heavy lifting 😀
If you want the path of least resistance with complete Farcaster tooling support, go with Postgres - we can't speak on behalf of others' infra, but most clients on Farcaster seem fine for now!
We use memcached, but you can use any other caching solution like Redis. Cache anything and everything that can be cached, but make sure to invalidate your cache as new data appears! Caching is critical for a social network, where the same data is read across many surfaces and queries - I've attached Zuckerberg's famous memcache talk.
I highly recommend over-provisioning your caching cluster to prevent network issues during a burst of requests - it's your first line of defense before hitting expensive database queries!
Farcaster Hubs are fickle beasts to tame into reliability - you might think 10% average CPU usage means you have enough vCPUs, but we've found overprovisioning to be the way to go given bursts of inter-network data such as prune events (shoutout Rish!).
Hubs can easily become unhealthy (and even crash the operating system!), and the last thing you want is hub downtime - so make sure to try scaling vertically before horizontally like you would with databases.
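Because hubs can go unhealthy quietly, it's worth alerting on more than just process liveness. Here's a hedged sketch of a health decision based on a hub's info endpoint - the field names (`isSyncing`, `numMessages`) mirror what a Hubble info call can report, but treat the exact response shape as an assumption and check your hub version's API:

```typescript
// Hypothetical health check logic for a Farcaster hub.
// `HubInfo` approximates a Hubble info response; verify fields for your version.
interface HubInfo {
  isSyncing: boolean;   // hub is still catching up to the network
  numMessages: number;  // total messages stored (should only grow)
}

function isHubHealthy(info: HubInfo | null, lastSeenMessageCount: number): boolean {
  if (info === null) return false;      // endpoint unreachable: page someone
  if (info.isSyncing) return false;     // behind the network: don't serve reads
  // A regressing message count suggests data loss or a bad restart.
  return info.numMessages >= lastSeenMessageCount;
}
```

Polling this every minute and tracking `numMessages` over time catches the "process is up but the hub is wedged" failure mode that CPU graphs miss.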
You also don't want manual configuration living as tribal knowledge in Notion - as of recently, our complete production infra is defined in Terraform, making it easy to add/remove containers, servers, hubs, load balancers, alerting, and more.
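As a flavor of what this looks like, here's a hypothetical Terraform fragment - the resource names, AMI id, and instance type are illustrative placeholders, not our actual config:

```hcl
# Hypothetical sketch: a hub server plus an alert, defined as code.
resource "aws_instance" "farcaster_hub" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI id
  instance_type = "m6i.2xlarge"           # over-provisioned for prune-event bursts

  tags = {
    Role = "hub"
  }
}

# Alerting lives next to the thing it monitors, in the same reviewable file.
resource "aws_cloudwatch_metric_alarm" "hub_cpu" {
  alarm_name          = "hub-cpu-high"
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  comparison_operator = "GreaterThanThreshold"
  threshold           = 70
  evaluation_periods  = 2
  period              = 300

  dimensions = {
    InstanceId = aws_instance.farcaster_hub.id
  }
}
```

Adding a hub becomes a code-reviewed diff instead of a runbook, and `terraform plan` shows exactly what will change before anything touches production.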
Another advantage of text-based infra is allowing LLMs like o1-pro to help you scale - even an AWS newbie with some grit can now learn to scale!
Pre-2020, the idea of running our own custom infrastructure as a small team would have seemed impossible - we'd have expected to immediately hire infra engineers for this. With Cursor and ChatGPT, it's amazing how far we've gotten on our own while still being able to focus on product. So with that said, start using Cursor and ChatGPT o1-pro today if you haven't already!
We hope these key learnings are helpful, and I hope to see more Farcaster companies sharing their tribal knowledge!
~ jcdenton.cast & nico.cast