When you're spinning up your Amazon OpenSearch Service domain, you need to determine the storage, instance types, and instance count; decide the sharding strategy and whether to use a cluster manager; and enable zone awareness. Typically, we consider storage as a guideline for determining instance count, but not the other parameters. In this post, we offer some recommendations based on T-shirt sizing for log analytics workloads.
Log analytics and streaming workload characteristics
When you use OpenSearch Service for your streaming workloads, you send data from multiple sources into OpenSearch Service. OpenSearch Service indexes your data in an index that you define.
Log data naturally follows a time series pattern, and therefore a time-based indexing strategy (daily or weekly indexes) is recommended. For efficient management of log data, you should implement time-based index patterns and set retention periods. You further define time slicing and a retention period for the data to manage its lifecycle in your domain.
For illustration, imagine that you have a data source producing a continuous stream of log data, and you've configured a daily rolling index and set a retention period of three days. As the logs arrive, OpenSearch Service creates an index per day with names like stream1_2025.05.21, stream1_2025.05.22, and so on. The prefix stream1_* is what we call an index pattern, a naming convention that helps group related indexes.
The following diagram shows three primary shards for each daily index. These shards are deployed across three OpenSearch Service data instances, with one replica for each primary shard. (For simplicity, the diagram doesn't show that primary and replica shards are always placed on different instances for fault tolerance.)
When OpenSearch Service processes new log entries, they're sent to all relevant primary shards and their replicas in the active index, which in this example is just today's index because of the daily index configuration.
There are several important characteristics of how OpenSearch Service processes your new entries:
- Total shard count – Each index pattern will have D * P * (1 + R) total shards, where D represents retention in days, P represents primary shards, and R is the number of replicas. These shards are distributed across your data nodes.
- Active index – Time slicing means that new log entries are only written to today's index.
- Resource utilization – When sending a _bulk request with log entries, these are distributed across all shards in the active index. In our example with three primary shards and one replica per shard, that's a total of six shards processing new data concurrently, requiring 6 vCPUs to efficiently handle a single _bulk request.
Similarly, OpenSearch Service distributes queries across the shards for the indexes involved. If you query this index pattern across all 3 days, you'll engage 9 shards, and need 9 vCPUs to process the request.
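To make these numbers concrete, the following minimal Python sketch works through the shard and vCPU arithmetic described above, using the example values from this section (3 days of retention, three primary shards, one replica). The variable names are illustrative, not part of any API.

```python
# Illustrative arithmetic for the example index pattern in this section.
retention_days = 3   # D
primary_shards = 3   # P
replicas = 1         # R

# Total shards on disk for this index pattern: D * P * (1 + R)
total_shards = retention_days * primary_shards * (1 + replicas)
print(total_shards)   # 18

# A _bulk request only touches the active (today's) index, but it fans out
# to every primary and replica shard in that index.
active_shards = primary_shards * (1 + replicas)
print(active_shards)  # 6 shards -> roughly 6 vCPUs busy per bulk request

# A query across all 3 daily indexes engages one copy (primary or replica)
# of each shard.
query_shards = retention_days * primary_shards
print(query_shards)   # 9 shards -> roughly 9 vCPUs per query
```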
This can get even more complicated when you add more data streams and index patterns. For each additional data stream or index pattern, you deploy shards for each of the daily indexes and use vCPUs to process requests in proportion to the shards deployed, as shown in the preceding diagram. When you make concurrent requests to more than one index, every shard for all the indexes involved must process those requests.
Cluster capacity
As the number of index patterns and concurrent requests increases, you can quickly overwhelm the cluster's resources. OpenSearch Service includes internal queues that buffer requests and mitigate this concurrency demand. You can monitor these queues using the _cat/thread_pool API, which shows queue depths and helps you understand when your cluster is approaching capacity limits.
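As a rough illustration, the sketch below polls _cat/thread_pool and prints any thread pool with requests waiting or rejected. It assumes Python with the requests library and a domain reachable with basic-auth credentials; the endpoint and credentials are placeholders, and a production OpenSearch Service domain may require SigV4 request signing instead.

```python
import requests

# Placeholder endpoint and credentials for illustration only.
ENDPOINT = "https://my-domain.us-east-1.es.amazonaws.com"
AUTH = ("admin", "admin-password")

# Ask the cat API for queue-related columns in JSON form.
resp = requests.get(
    f"{ENDPOINT}/_cat/thread_pool",
    params={"format": "json", "h": "node_name,name,active,queue,rejected"},
    auth=AUTH,
    timeout=30,
)
resp.raise_for_status()

for pool in resp.json():
    # A non-empty queue means requests are waiting; rejected is a cumulative
    # counter of requests the pool has turned away.
    if int(pool["queue"]) > 0 or int(pool["rejected"]) > 0:
        print(f'{pool["node_name"]} {pool["name"]}: '
              f'active={pool["active"]} queue={pool["queue"]} rejected={pool["rejected"]}')
```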
Another complicating dimension is that the time to process your updates and queries depends on their contents. As requests come in, the queues fill at the rate you send them. They drain at a rate governed by the available vCPUs and the time each request takes to process. You can interleave many more requests if they clear in a millisecond than if they clear in a second. You can use the _nodes/stats OpenSearch API to monitor average load on your CPUs. For more information about the query phases, refer to A query, or There and Back Again on the OpenSearch blog.
If you see the queue depths increasing, you're moving into a "warning" area, where the cluster is handling the load but approaching its limits. If you continue, you can start to exceed the available queues and must scale to add more CPUs. If you start to see CPU load increasing, which is correlated with queue depth increasing, you're also in a "warning" area and should consider scaling.
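Along the same lines, a minimal sketch that reads per-node CPU utilization from _nodes/stats and flags nodes against the utilization targets given in the recommendations below (60% average, 80% maximum) could look like the following; the same placeholder endpoint and authentication assumptions apply. On a managed domain you can also watch the CPUUtilization and ThreadpoolWriteQueue CloudWatch metrics.

```python
import requests

# Placeholder endpoint and credentials for illustration only.
ENDPOINT = "https://my-domain.us-east-1.es.amazonaws.com"
AUTH = ("admin", "admin-password")

# The os section of _nodes/stats reports per-node CPU utilization.
resp = requests.get(f"{ENDPOINT}/_nodes/stats/os", auth=AUTH, timeout=30)
resp.raise_for_status()

for node_id, node in resp.json()["nodes"].items():
    cpu_pct = node["os"]["cpu"]["percent"]
    status = "ok"
    if cpu_pct >= 80:
        status = "scale"     # at or above the maximum utilization target
    elif cpu_pct >= 60:
        status = "warning"   # above the average utilization target
    print(f'{node["name"]}: cpu={cpu_pct}% -> {status}')
```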
Recommendations
For sizing a domain, consider the following steps:
- Determine the storage required – Total storage = (daily source data in bytes × 1.45) × (number_of_replicas + 1) × number of days retained. This accounts for the additional 45% overhead on daily source data, broken down as follows:
- 10% for larger index size than source data.
- 5% for operating system overhead (reserved by Linux for system recovery and disk defragmentation protection).
- 20% for OpenSearch reserved space per instance (segment merges, logs, and internal operations).
- 10% for additional storage buffer (minimizes the impact of node failures and Availability Zone outages).
- Define the shard count – Approximate number of primary shards = storage size required per index / desired shard size. Round up to the nearest multiple of your data node count to maintain even distribution. For more detailed guidance on shard sizing and distribution strategies, refer to "Amazon OpenSearch Service 101: How many shards do I need?" For log analytics workloads, consider the following:
- Recommended shard size: 30–50 GB
- Optimal target: 50 GB per shard
- Calculate CPU requirements – The recommended ratio is 1.25 vCPU:1 shard for lower data volumes; higher ratios are recommended for larger volumes. Target utilization is 60% average, 80% maximum.
- Choose the right instance type – Consider the following based on your nodes:
Let's look at an example of domain sizing. The initial requirements are as follows:
- Daily log volume: 3 TB
- Retention period: 3 months (90 days)
- Replica count: 1
We make the following instance calculation.
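The following is a minimal Python sketch of that calculation, applying the formulas from the recommendations above (45% overhead, 50 GB target shard size, 1.25 vCPU per shard) to these requirements. Treating "storage required per index" as the primary copy of one day's data is an assumption, so the resulting numbers are illustrative rather than authoritative.

```python
import math

# Illustrative sizing calculation for: 3 TB/day, 90 days retention, 1 replica.
daily_source_gb = 3 * 1024   # 3 TB of source data per day
replicas = 1
retention_days = 90

overhead = 1.45              # 45% overhead on daily source data
target_shard_gb = 50         # optimal shard size target for log analytics
vcpu_per_shard = 1.25        # recommended starting ratio

# Total storage = (daily source data x 1.45) x (replicas + 1) x days retained
total_storage_gb = daily_source_gb * overhead * (replicas + 1) * retention_days

# Primary shards per daily index = storage required per index / desired shard size
# (assumption: "per index" means the primary copy of one day's data, before
# rounding up to a multiple of the data node count)
primary_shards = math.ceil(daily_source_gb * overhead / target_shard_gb)

# Shards actively written in today's index, and the vCPUs to drive them
active_shards = primary_shards * (1 + replicas)
vcpus = math.ceil(active_shards * vcpu_per_shard)

print(f"Total storage ≈ {total_storage_gb / 1024:,.0f} TB")   # ≈ 783 TB
print(f"Primary shards per daily index ≈ {primary_shards}")    # ≈ 90
print(f"vCPUs for the active index ≈ {vcpus}")                 # ≈ 225
```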
The following table recommends instances, the amount of source data, the storage needed for 7 days of retention, and the active shards based on the preceding guidelines.
| T-Shirt Size | Data (Per Day) | Storage Needed (with 7 Days Retention) | Active Shards | Data Nodes | Cluster Manager Nodes |
| --- | --- | --- | --- | --- | --- |
| XSmall | 10 GB | 175 GB | 2 @ 50 GB | 3 * r7g.large.search | 3 * m7g.large.search |
| Small | 100 GB | 1.75 TB | 6 @ 50 GB | 3 * r7g.xlarge.search | 3 * m7g.large.search |
| Medium | 500 GB | 8.75 TB | 30 @ 50 GB | 6 * r7g.2xlarge.search | 3 * m7g.large.search |
| Large | 1 TB | 17.5 TB | 60 @ 50 GB | 6 * r7g.4xlarge.search | 3 * m7g.large.search |
| XLarge | 10 TB | 175 TB | 600 @ 50 GB | 30 * i4g.8xlarge.search | 3 * m7g.2xlarge.search |
| XXL | 80 TB | 1.4 PB | 2400 @ 50 GB | 87 * i4g.16xlarge.search | 3 * m7g.4xlarge.search |
As with all sizing recommendations, these guidelines represent a starting point and are based on assumptions. Your workload will differ, and so your actual needs will differ from these recommendations. Make sure to deploy, monitor, and adjust your configuration as needed.
For T-shirt sizing the workloads, an extra-small use case encompasses 10 GB or less of data per day from a single data stream to a single index pattern. A small use case falls between 10–100 GB per day of data, a medium use case between 100–500 GB of data, and so on. The default instance count per domain is 80 for most of the instance families. Refer to Amazon OpenSearch Service quotas for details.
Additionally, consider the following best practices:
Conclusion
This post provided comprehensive guidelines for sizing your OpenSearch Service domain for log analytics workloads, covering several critical aspects. These recommendations serve as a solid starting point, but every workload has unique characteristics. For optimal performance, consider implementing additional optimizations like data tiering and storage tiers. Evaluate cost-saving options such as reserved instances, and scale your deployment based on actual performance metrics and queue depths. By following these guidelines and actively monitoring your deployment, you can build a well-performing OpenSearch Service domain that meets your log analytics needs while maintaining efficiency and cost-effectiveness.