Today, AWS announced that Amazon Kinesis Data Streams now supports record sizes up to 10MiB, a tenfold increase from the previous limit. With this launch, you can now publish intermittent larger data payloads to your data streams while continuing to use existing Kinesis Data Streams APIs in your applications without additional effort. This launch is accompanied by a 2x increase in the maximum PutRecords request size from 5MiB to 10MiB, simplifying data pipelines and reducing operational overhead for IoT analytics, change data capture, and generative AI workloads.
In this post, we explore Amazon Kinesis Data Streams large record support, including key use cases, configuration of maximum record sizes, throttling considerations, and best practices for optimal performance.
Real-world use cases
As data volumes grow and use cases evolve, we've seen increasing demand for supporting larger record sizes in streaming workloads. Previously, when you needed to process records larger than 1MiB, you had two options:
- Split large records into multiple smaller records in producer applications and reassemble them in consumer applications
- Store large records in Amazon Simple Storage Service (Amazon S3) and send only metadata through Kinesis Data Streams
Both of these approaches work, but they add complexity to data pipelines, requiring additional code, increasing operational overhead, and complicating error handling and debugging, particularly when customers need to stream large records intermittently.
This enhancement improves ease of use and reduces operational overhead for customers handling intermittent data payloads across various industries and use cases. In the IoT analytics domain, connected vehicles and industrial equipment generate growing volumes of sensor telemetry data, with the size of individual telemetry records occasionally exceeding the previous 1MiB limit in Kinesis. This required customers to implement complex workarounds, such as splitting large records into multiple smaller ones or storing the large records separately and only sending metadata through Kinesis. Similarly, database change data capture (CDC) pipelines can produce large transaction records, especially during bulk operations or schema changes. In the machine learning and generative AI space, workflows increasingly require the ingestion of larger payloads to support richer feature sets and multi-modal data types like audio and images. The increased Kinesis record size limit, from 1MiB to 10MiB, removes the need for many of these complex workarounds, simplifying data pipelines and reducing operational overhead for customers in IoT, CDC, and advanced analytics use cases. Customers can now more easily ingest and process these intermittent large records using the same familiar Kinesis APIs.
How it works
To start processing larger records:
- Update your stream's maximum record size limit (maxRecordSize) through the AWS Management Console, AWS CLI, or AWS SDKs.
- Continue using the same PutRecord and PutRecords APIs for producers.
- Continue using the same GetRecords or SubscribeToShard APIs for consumers.
Your stream will be in Updating status for a few seconds before being ready to ingest larger records.
Getting started
To start processing larger records with Kinesis Data Streams, you can update the maximum record size using the AWS Management Console, CLI, or SDK.
On the AWS Management Console:
- Navigate to the Kinesis Data Streams console.
- Choose your stream and select the Configuration tab.
- Choose Edit (next to Maximum record size).
- Set your desired maximum record size (up to 10MiB).
- Save your changes.
Note: This setting only adjusts the maximum record size for this Kinesis data stream. Before increasing this limit, verify that all downstream applications can handle larger records.
Most common consumers, such as the Kinesis Client Library (starting with version 2.x), Amazon Data Firehose delivery to Amazon S3, and AWS Lambda, support processing records larger than 1 MiB. To learn more, refer to the Amazon Kinesis Data Streams documentation for large records.
You can also update this setting using the AWS CLI:
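The original code block did not survive extraction, so the following is an illustrative sketch only. The subcommand and parameter names (update-max-record-size, --max-record-size-in-ki-b) are assumptions inferred from the maxRecordSize setting named above; verify them against the current AWS CLI reference for Kinesis before use. The stream ARN is a placeholder.

```shell
# Hypothetical CLI names, inferred from the maxRecordSize setting above;
# check the current "aws kinesis" CLI reference before running.
aws kinesis update-max-record-size \
    --stream-arn arn:aws:kinesis:us-east-1:123456789012:stream/my-stream \
    --max-record-size-in-ki-b 10240   # 10MiB expressed in KiB

# The stream stays in Updating status briefly; poll until it is Active.
aws kinesis describe-stream-summary \
    --stream-arn arn:aws:kinesis:us-east-1:123456789012:stream/my-stream \
    --query 'StreamDescriptionSummary.StreamStatus'
```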
Or using the AWS SDK:
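Again, the original snippet is missing, so here is a minimal Python sketch. The update_max_record_size operation name and its StreamARN/MaxRecordSizeInKiB parameters are assumptions mirroring the hypothetical CLI form above; the only part this sketch pins down is the MiB-to-KiB conversion, so check the boto3 Kinesis reference for the real call.

```python
def set_max_record_size(client, stream_arn: str, size_mib: int) -> int:
    """Raise a stream's maximum record size to size_mib MiB.

    `client` is expected to behave like a boto3 Kinesis client. The
    operation and parameter names below are assumptions, not confirmed
    against the SDK reference.
    """
    size_kib = size_mib * 1024  # the setting is expressed in KiB
    client.update_max_record_size(
        StreamARN=stream_arn,
        MaxRecordSizeInKiB=size_kib,
    )
    return size_kib
```

With boto3 this would be invoked as something like set_max_record_size(boto3.client("kinesis"), stream_arn, 10) to request the full 10MiB.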
Throttling and best practices for optimal performance
Individual shard throughput limits of 1MiB/s for writes and 2MiB/s for reads remain unchanged with support for larger record sizes. To work with large records, let's understand how throttling works. In a stream, each shard has a throughput capacity of 1MiB per second. To accommodate large records, each shard temporarily bursts up to 10MiB/s, eventually averaging out to 1MiB per second. To help visualize this behavior, think of each shard as having a capacity tank that refills at 1MiB per second. After sending a large record (for example, a 10MiB record), the tank starts refilling immediately, allowing you to send smaller records as capacity becomes available. This capacity to support large records is continuously replenished. The rate of replenishment depends on the size of the large records, the size of the baseline records, the overall traffic pattern, and your chosen partition key strategy. When you process large records, each shard continues to serve baseline traffic while using its burst capacity to handle the larger payloads.
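The capacity-tank analogy above can be sketched as a simple token-bucket simulation. This is illustrative only: the 1MiB/s refill rate and 10MiB burst capacity come from the text, while the all-or-nothing throttling decision and single-shard scope are simplifications of the real service.

```python
def simulate_shard(events, refill_per_s=1.0, capacity_mib=10.0):
    """Simulate one shard's write capacity as a token bucket.

    events: list of (timestamp_seconds, record_size_mib), sorted by time.
    The bucket holds up to capacity_mib MiB and refills at refill_per_s
    MiB per second; a record is throttled when the bucket lacks room.
    Returns the timestamps of throttled records.
    """
    tokens = capacity_mib  # bucket starts full
    last_t = 0.0
    throttled = []
    for t, size in events:
        # Refill for the time elapsed since the previous record.
        tokens = min(capacity_mib, tokens + (t - last_t) * refill_per_s)
        last_t = t
        if size <= tokens:
            tokens -= size       # record accepted, capacity consumed
        else:
            throttled.append(t)  # record throttled, capacity unchanged
    return throttled
```

For example, a single 10MiB record at t=0 drains the bucket; a 2MiB record one second later is throttled because only 1MiB has refilled, but the same record at t=2 succeeds.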
To illustrate how Kinesis Data Streams handles different proportions of large records, let's examine the results of a simple test. For our test configuration, we set up a producer that sends records to an on-demand stream (which defaults to 4 shards) at a rate of 50 records per second. The baseline records are 10KiB in size, while large records are 2MiB each. We ran several test cases, progressively increasing the proportion of large records from 1% to 5% of the total stream traffic, along with a baseline case containing no large records. To ensure consistent test conditions, we distributed the large records uniformly over time; for example, in the 1% scenario, we sent one large record for every 100 baseline records. The following graph shows the results:

In the graph, horizontal annotations indicate throttling occurrence peaks. The baseline scenario, represented by the blue line, shows minimal throttling events. As the proportion of large records increases from 1% to 5%, the rate at which the stream throttles records rises, with a notable acceleration in throttling events between the 2% and 5% scenarios. This test demonstrates how Kinesis Data Streams manages an increasing proportion of large records.
We recommend keeping large records at 1-2% of your total record count for optimal performance. In production environments, actual stream behavior varies based on three key factors: the size of baseline records, the size of large records, and the frequency at which large records appear in the stream. We recommend that you test with your own traffic pattern to determine the exact behavior.
With on-demand streams, when the incoming traffic exceeds 500KB/s per shard, Kinesis splits the shard within 15 minutes. The parent shard's hash key values are redistributed evenly across child shards. Kinesis automatically scales the stream to increase the number of shards, enabling distribution of large records across a larger number of shards, depending on the partition key strategy employed.
For optimal performance with large records:
- Use a random partition key strategy to distribute large records evenly across shards.
- Implement backoff and retry logic in producer applications.
- Monitor shard-level metrics to identify potential bottlenecks.
If you still need to stream large records continuously, consider using Amazon S3 to store payloads and send only metadata references to the stream. Refer to Processing large records with Amazon Kinesis Data Streams for more information.
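The first two practices above can be sketched in producer code as follows. This is a minimal illustration under stated assumptions: the put_records callable stands in for the Kinesis PutRecords call (which returns per-record failure information), and the retry counts and delays are arbitrary placeholders.

```python
import random
import time
import uuid


def put_with_backoff(put_records, records, max_attempts=5, base_delay=0.1):
    """Send records with random partition keys, retrying throttled entries.

    put_records stands in for the Kinesis PutRecords call: it accepts a
    list of {"Data": ..., "PartitionKey": ...} entries and returns the
    indices of entries that were throttled and should be retried.
    Returns True once every record is accepted, False if attempts run out.
    """
    # Random partition keys spread records (including large ones) evenly
    # across shards, avoiding hot shards.
    entries = [{"Data": r, "PartitionKey": str(uuid.uuid4())} for r in records]
    for attempt in range(max_attempts):
        failed = put_records(entries)
        if not failed:
            return True
        # Keep only the throttled entries for the next attempt.
        entries = [entries[i] for i in failed]
        # Exponential backoff with jitter before retrying.
        time.sleep(base_delay * (2 ** attempt) * random.random())
    return False
```

With boto3, put_records would wrap client.put_records and extract the indices of entries whose response carries an error code.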
Conclusion
Amazon Kinesis Data Streams now supports record sizes up to 10MiB, a tenfold increase from the previous 1MiB limit. This enhancement simplifies data pipelines for IoT analytics, change data capture, and AI/ML workloads by eliminating the need for complex workarounds. You can continue using existing Kinesis Data Streams APIs without additional code changes and benefit from increased flexibility in handling intermittent large payloads.
- For optimal performance, we recommend keeping large records at 1-2% of the total record count.
- For best results with large records, implement a uniformly distributed partition key strategy to spread records evenly across shards, include backoff and retry logic in producer applications, and monitor shard-level metrics to identify potential bottlenecks.
- Before increasing the maximum record size, verify that all downstream applications and consumers can handle larger records.
We're excited to see how you'll use this capability to build more powerful and efficient streaming applications. To learn more, visit the Amazon Kinesis Data Streams documentation.

