When speaking with customers and other sysadmins, one of the “top ten” most frequent questions is: “How do I correctly size my Zimbra infrastructure?”
As you can imagine, the answer is not simple, and sizing has several aspects depending on which element we are talking about: storage, computation, redundancy, etc.
Let’s start with storage sizing.
The Art of Sizing
Sizing an infrastructure can be tough, especially if the rationale is not defined or the constraints are too strict.
Some people talk about “the art of sizing”, and I agree with them. The good news is that, as with any art, everyone can improve and use some practical tips to reach their goal.
It’s essential to understand and measure the environment in which our infrastructure will evolve.
The business email context is unique. Moreover, email is required for almost every operation involving any kind of online service, at least to register, log in and recover credentials.
The number of email users is still growing.
73% of individuals in the EU aged 16 to 74 used the internet to send/receive emails in 2018, compared with only 48% in 2007. (Eurostat, “Individuals using the internet for sending/receiving e-mails”, 2019)
72% of consumers say that email is their preferred channel for communicating with the companies they do business with. 61% say they like to receive promotional emails weekly, and 28% want them even more frequently. (MarketingSherpa, 2015)
86% of professionals prefer to use email when communicating for business purposes. (HubSpot, 2017)
Let’s look at the growth over the last few years. According to Radicati, the number of email users and the number of emails sent and received per day, both in billions, were:
2016 → 3.1 Users and 215.3 emails
2017 → 3.7 Users and 269.6 emails
2018 → 3.8 Users and 281.6 emails
2019 → 3.9 Users and 293.6 emails
The number of users and emails keeps growing, and so does the space required by our infrastructure.
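The growth behind those figures can be quantified as a compound annual growth rate (CAGR). CAGR is our choice of metric here, not something the Radicati report is quoted as using; the sketch simply plugs in the values listed above.

```python
# Radicati figures quoted above, in billions: (users, emails per day).
RADICATI = {
    2016: (3.1, 215.3),
    2017: (3.7, 269.6),
    2018: (3.8, 281.6),
    2019: (3.9, 293.6),
}


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


first, last = min(RADICATI), max(RADICATI)
span = last - first  # number of yearly steps between first and last entry
users_growth = cagr(RADICATI[first][0], RADICATI[last][0], span)
emails_growth = cagr(RADICATI[first][1], RADICATI[last][1], span)

print(f"users:  {users_growth:.1%} per year")
print(f"emails: {emails_growth:.1%} per year")
```

With these values, users grow by roughly 8% per year and daily email volume by roughly 11% per year. These are global figures; the growth of a single organization still has to be measured on its own servers.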
Scale-out can be tricky when we have to work with on-premises infrastructures that have limited resources or — worst case — a physical environment with physical storage.
The (apparently) easiest way
Simple maths suggests a brute-force approach: size the server to provide all the space allowed by the maximum user quota!
This approach can work for a small company, but it ignores many variables, and it is not applicable if, for example, you do not want to or cannot apply a quota.
Moreover, the larger the storage, the higher the cost or the worse the performance.
Limiting the user quota to a few GB might have been an option some years ago, but nowadays emails are more frequent, attachments are larger, and user requests are more demanding.
The average size of an email is estimated at around 75 KB (about 120 KB for business email), and the average sending rate is 75 emails/user/day.
This means that, for a business user, 75 × 120 KB ≈ 9 MB of raw message data per day, so you should budget roughly 10 to 15 MB/day for each user.
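The per-user figure follows directly from the two averages above. A minimal sketch, assuming decimal units (1 MB = 1000 KB) and that the quoted rate covers all the mail a user stores each day:

```python
AVG_BUSINESS_EMAIL_KB = 120   # average business email size quoted above
EMAILS_PER_USER_PER_DAY = 75  # average rate quoted above


def daily_growth_mb(avg_email_kb: float, emails_per_day: float) -> float:
    """Raw message data one user adds per day, in MB (decimal: 1 MB = 1000 KB)."""
    return avg_email_kb * emails_per_day / 1000


raw = daily_growth_mb(AVG_BUSINESS_EMAIL_KB, EMAILS_PER_USER_PER_DAY)
print(f"{raw:.1f} MB/user/day of raw message data")
```

The result is about 9 MB of raw message data per user per day; the 10 to 15 MB/day budget adds headroom on top of it, for example for metadata and index data.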
A different point of view
The “Quota” approach is not what we need.
Rather than asking “How much space do I need?”, we should ask “How much space do I need for the next 6 to 12 months, given how my users actually use email?”. Let’s look at an example.
- Quota approach: 1000 users with a 10 GB quota ⇒ 1000 × 10 GB = 10 TB
- Usage approach: 1000 users at 10 MB/day ⇒ 1000 × 10 MB = 10 GB/day; 10 GB/day × 365 days ≈ 3.65 TB/year
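Even if each user sends and receives 10 MB a day (about 10 GB/day for all users) and nobody ever deletes a message, reserving 5 TB, 50% of what the quota approach requires, covers a full year with margin. At the upper bound of 15 MB per user per day (about 15 GB/day), the same 5 TB still lasts about 11 months.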
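The comparison can be reproduced with a few lines of Python. This is a sketch, assuming decimal units (1 TB = 1000 GB = 1,000,000 MB), a constant daily growth and no deletions, as in the example; the function names are illustrative.

```python
def quota_approach_tb(users: int, quota_gb: float) -> float:
    """Provision every user's full quota up front (1 TB = 1000 GB)."""
    return users * quota_gb / 1000


def usage_approach_tb(users: int, mb_per_user_per_day: float, days: int = 365) -> float:
    """Storage consumed over `days` if nobody ever deletes a message (1 TB = 1e6 MB)."""
    return users * mb_per_user_per_day * days / 1_000_000


def runway_days(capacity_tb: float, users: int, mb_per_user_per_day: float) -> float:
    """Number of days a given capacity lasts at a constant daily growth."""
    daily_tb = users * mb_per_user_per_day / 1_000_000
    return capacity_tb / daily_tb


USERS = 1000
print(f"quota approach: {quota_approach_tb(USERS, 10):.2f} TB")
for rate in (10, 15):
    print(
        f"{rate} MB/user/day: {usage_approach_tb(USERS, rate):.2f} TB/year, "
        f"5 TB lasts {runway_days(5, USERS, rate):.0f} days"
    )
```

Under these assumptions the quota approach needs 10 TB, while actual usage is about 3.65 TB/year at 10 MB/user/day (5 TB lasting about 500 days) and about 5.5 TB/year at 15 MB/user/day (5 TB lasting about 333 days).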
How big is the difference between 5 and 10 TB? Not much, if we only look at consumer-market prices.
But once we factor in $/GB and $/IOPS for an SSD stack, RAID arrays, redundancy and backups, the gap widens.
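To see why the gap widens, note that every usable terabyte is multiplied by redundancy and backup overheads. The sketch below assumes mirroring (a raw/usable ratio of 2.0, as in RAID 1 or RAID 10) and one full, uncompressed backup copy; both factors are illustrative assumptions, and pricing is deliberately left out.

```python
def raw_capacity_tb(usable_tb: float, raid_overhead: float, backup_copies: int) -> float:
    """Raw capacity to purchase for a given usable size.

    raid_overhead: raw/usable ratio of the primary array (2.0 for mirroring).
    backup_copies: full copies kept on backup storage, each as large as the data.
    """
    primary = usable_tb * raid_overhead
    backups = usable_tb * backup_copies
    return primary + backups


for usable in (5, 10):
    raw = raw_capacity_tb(usable, raid_overhead=2.0, backup_copies=1)
    print(f"{usable} TB usable -> {raw:.0f} TB raw")
```

Under these assumptions, 5 TB usable becomes 15 TB raw and 10 TB usable becomes 30 TB raw: the 5 TB gap turns into 15 TB before $/GB or $/IOPS even enter the picture.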
The real question behind this scenario is a different one: what do we do after one year?
For some installations, extending the volumes with LVM (Logical Volume Manager) may be enough.
It may require a short downtime, which is usually acceptable.
Another option is a storage layer that manages this complexity for us, but that raises the overall costs.
We also have to consider that users’ behavior can change rapidly: number of users, type of communication, number of messages.
Let’s take it one step further.
Hierarchical storage management
Hierarchical storage management (HSM) is the architectural approach to answer our initial question.
By definition, it is a data storage technique that automatically moves data between high-cost and low-cost storage media.
It separates the storage tiers used by the core application from those holding the stored data, and assigns data to a tier according to a “policy”. The policy is typically time-based, but that is just the usual choice, not a constraint.
This approach can be very flexible, and many storage vendors have built it into “intelligent” or “smart” caching systems. Unfortunately, these appliances are usually not cheap.
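The usual time-based policy can be sketched in a few lines of Python. This is a generic illustration of the technique, not the implementation of any storage vendor or of Zimbra: the `StoredItem` type, the tier names and the 30-day threshold are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class StoredItem:
    item_id: int
    last_modified: datetime
    tier: str = "primary"  # hypothetical tiers: "primary" (fast) or "secondary" (cheap)


def apply_age_policy(items: list[StoredItem], max_age: timedelta, now: datetime) -> list[int]:
    """Move items older than `max_age` from the primary to the secondary tier.

    Returns the ids of the moved items. Only the tier label changes here; a real
    system would also copy the data and update its pointers.
    """
    moved = []
    for item in items:
        if item.tier == "primary" and now - item.last_modified > max_age:
            item.tier = "secondary"
            moved.append(item.item_id)
    return moved


now = datetime(2020, 1, 31, tzinfo=timezone.utc)
items = [
    StoredItem(1, now - timedelta(days=2)),   # recent: stays on the fast tier
    StoredItem(2, now - timedelta(days=45)),  # older than 30 days: moved
]
moved = apply_age_policy(items, max_age=timedelta(days=30), now=now)
print(moved)
```

Swapping the age test for any other predicate (size, folder, flags) keeps the same structure, which is what makes the policy a choice rather than a constraint.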
What if the application itself used HSM to store its data? What would that mean for a Zimbra infrastructure?
Let’s take another example to clarify this point. We can split users’ emails into three groups:
- the emails sent and received in the last few days
- the emails that are “flagged”, “tagged” or “unread”, which the user will open soon or opens frequently
- all the other emails
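The three groups above can be expressed as a small classification function. This is a hypothetical sketch, not Zimbra’s own HSM policy syntax: `Message`, `classify`, the group names and the 7-day window used for “the last few days” are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Message:
    received: datetime
    flagged: bool = False
    unread: bool = False
    tags: set[str] = field(default_factory=set)


def classify(msg: Message, now: datetime, recent: timedelta = timedelta(days=7)) -> str:
    """Assign a message to one of the three groups (window is a hypothetical 7 days)."""
    if now - msg.received <= recent:
        return "recent"  # sent/received in the last few days
    if msg.flagged or msg.unread or msg.tags:
        return "active"  # flagged, tagged or unread: likely to be opened soon
    return "other"       # everything else: candidate for cheaper storage


now = datetime(2020, 1, 31)
samples = {
    "yesterday": Message(now - timedelta(days=1)),
    "old but flagged": Message(now - timedelta(days=90), flagged=True),
    "old and read": Message(now - timedelta(days=90)),
}
for label, msg in samples.items():
    print(label, "->", classify(msg, now))
```

Only the last group is a natural candidate for a slower, cheaper tier; the first two are the messages users are most likely to open.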
Zimbra stores data in three systems:
- MariaDB, which stores all the metadata related to a message
- Lucene, which stores all the information needed for full-text search
- the filesystem, which stores the “blob”, the raw data of the object
Metadata is used to render folders and their content, to run queries and searches, and to handle all interaction with users. Metadata takes only a few KB per message, regardless of message size, and typically accounts for about 10 to 15% of the total store size.
Blobs, instead, are read only when the user opens the message preview or the attachments.
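Applying the 10 to 15% figure to the 5 TB reservation from the earlier example gives a rough idea of how little of the store needs the fast tier. The sketch assumes that share applies uniformly to the whole store; real ratios depend on the mix of messages and attachments, and `split_store` is a hypothetical helper.

```python
def split_store(total_tb: float, metadata_share: float) -> tuple[float, float]:
    """Split a store into (metadata, blobs) using the given metadata share (0..1)."""
    if not 0 <= metadata_share <= 1:
        raise ValueError("metadata_share must be between 0 and 1")
    metadata = total_tb * metadata_share
    return metadata, total_tb - metadata


for share in (0.10, 0.15):
    meta, blobs = split_store(5, share)
    print(f"{share:.0%} metadata: {meta:.2f} TB fast tier, {blobs:.2f} TB blob tier")
```

For 5 TB that is roughly 0.5 to 0.75 TB of metadata that benefits from fast storage, and 4.25 to 4.5 TB of blobs that could live on cheaper media.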
For the end user, the fact that blobs live on a separate layer goes unnoticed: the latency and bandwidth of the user’s connection are several orders of magnitude worse than those of the storage, so the difference between tiers is hidden. Moreover, the availability of the secondary store does not affect the availability of the service itself: if the secondary store is unavailable, the user only gets a temporary error for the missing blob.
Thinking in terms of HSM makes us revisit our initial question. With HSM, we can:
- keep separate mount points for database, index and blobs
- use different mount points according to different policies
- use different storage systems, with different performance and costs
- compress data on storage, optimizing storage usage, space and I/O throughput
- keep blobs on a device that can be taken offline without impacting the mail service
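As a rough illustration, a layout following this list can be written down as data and checked against two of its rules: separate mount points, and only the secondary blob store allowed to go offline. The mount points, role names and the `Volume`/`layout_problems` helpers are hypothetical; in Zimbra itself, volumes are configured with the product’s own tools, not with this code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    role: str          # what the volume holds
    mount_point: str   # hypothetical path, adapt to your installation
    media: str         # storage class, e.g. "ssd" or "hdd"
    compressed: bool   # compress data written to this volume
    offline_ok: bool   # may the volume go offline while mail service keeps running?


# Hypothetical layout following the list above.
LAYOUT = [
    Volume("database", "/mnt/zimbra-db", "ssd", compressed=False, offline_ok=False),
    Volume("index", "/mnt/zimbra-index", "ssd", compressed=False, offline_ok=False),
    Volume("primary blobs", "/mnt/zimbra-blobs-primary", "ssd", compressed=False, offline_ok=False),
    Volume("secondary blobs", "/mnt/zimbra-blobs-secondary", "hdd", compressed=True, offline_ok=True),
]


def layout_problems(layout: list[Volume]) -> list[str]:
    """Return violations of the two rules checked here (empty list if none)."""
    problems = []
    mounts = [v.mount_point for v in layout]
    if len(set(mounts)) != len(mounts):
        problems.append("database, index and blobs should use separate mount points")
    for v in layout:
        if v.offline_ok and v.role != "secondary blobs":
            problems.append(f"{v.role} must stay online for the mail service to work")
    return problems


print(layout_problems(LAYOUT) or "layout OK")
```

Keeping the layout explicit like this makes it easy to see which volumes sit on expensive media and which one can be maintained or replaced without stopping the service.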