Data retention strategy and tuning

Flow Monitor can process millions of NetFlow records per minute from NetFlow-enabled devices and the Flow Publisher, while also gathering interface data through direct SNMP polling of individual devices. The number of flow records retained in raw form directly affects the size of the Flow Monitor databases and the performance of data-intensive operations such as report generation and display. Flow Monitor uses data compression, culling, and archival strategies to minimize the impact that data retention has on system storage and operations. The following diagram illustrates the different stages of the data retention strategy and the relative impact of each stage on the number of flow records stored in the Flow Monitor databases.

Initial data compression

The first step of the data retention strategy is accomplished during the interval between collections of the raw data. Flow records with the same key data that occur during the interval between consecutive data collections are consolidated into a single flow record. This results in a small reduction in records, with a longer data collection interval creating a larger reduction. Use the Data collection interval option to adjust this interval.
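The consolidation step can be pictured with a minimal Python sketch. This is an illustration only, not Flow Monitor's implementation: the Flow record layout, the choice of key fields, and the consolidate function are assumptions made for the example.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    # Hypothetical record layout; the key fields Flow Monitor uses may differ.
    src: str
    dst: str
    src_port: int
    dst_port: int
    protocol: int
    start: float       # seconds since epoch
    end: float
    byte_count: int
    packets: int


def consolidate(flows):
    """Merge flows that share the same key within one collection interval."""
    merged = {}
    for f in flows:
        key = (f.src, f.dst, f.src_port, f.dst_port, f.protocol)
        if key not in merged:
            merged[key] = f
        else:
            m = merged[key]
            # Keep the earliest start and latest end, and add up the counters.
            merged[key] = m._replace(
                start=min(m.start, f.start),
                end=max(m.end, f.end),
                byte_count=m.byte_count + f.byte_count,
                packets=m.packets + f.packets,
            )
    return list(merged.values())
```

A longer collection interval gives flows with the same key more chances to arrive in the same batch, which is why it produces a larger reduction.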

Raw data compression

Raw data compression happens during the hourly roll-up. Each hour, an hour's worth of raw NetFlow records ages out of the raw data retention period and is compressed into a single record. While the start and stop times for individual flows may be lost, this compression provides an initial savings in data storage. Use the Retain raw data for x hours option to determine how long you want to maintain raw data before rolling it up into hourly data records.
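As a rough sketch of the hourly roll-up, the following Python function separates raw records that are still inside the retention window from those that have aged out, and totals the aged-out records per hour. The tuple layout, the per-key grouping, and the hourly_rollup name are assumptions for illustration; the product's actual roll-up logic is not documented here.

```python
from collections import defaultdict

SECONDS_PER_HOUR = 3600


def hourly_rollup(raw_records, retain_hours, now):
    """Split raw records into (kept_raw, hourly_totals).

    raw_records: iterable of (key, start_ts, byte_count) tuples (hypothetical layout).
    retain_hours: how long raw data is kept before it is rolled up.
    now: current time in seconds since epoch.
    """
    cutoff = now - retain_hours * SECONDS_PER_HOUR
    kept = []
    totals = defaultdict(int)
    for key, start_ts, byte_count in raw_records:
        if start_ts >= cutoff:
            # Still inside the raw retention window: keep full detail.
            kept.append((key, start_ts, byte_count))
        else:
            # Aged out: individual start times collapse to the hour bucket.
            hour = int(start_ts // SECONDS_PER_HOUR) * SECONDS_PER_HOUR
            totals[(key, hour)] += byte_count
    return kept, dict(totals)
```

The per-flow start times are replaced by the start of the hour bucket, which mirrors the loss of individual start and stop times described above.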

Culling flow data

The next step in the retention strategy is to cull the flow data so that the smallest flow records are removed from the data to be stored. This is done by ordering the flow records by size and retaining the largest records until they account for a set percentage of the total traffic, measured by the number of bytes reported by the flows. The system is configured to retain between 97 and 99 percent of the flow traffic by size (number of bytes), discarding the bottom 1-3 percent. While the discarded records represent only a small percentage of the total number of bytes in the flow data, they can amount to thousands of individual flow records in environments where there are many dropped connections, port scans, or other activity resulting in flows with small byte counts. Culling these records can produce a large reduction in storage requirements, and a corresponding increase in the performance of data-intensive operations, with only a minimal reduction in the data retained. Culling takes place when the collector writes raw records and during the roll-ups from raw to hourly data and from hourly data to daily data. Use the Percentage of traffic to retain option to set the percentage of the flow data you want to retain.
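The effect of culling by byte count can be shown with a short Python sketch. It assumes a simple (flow_id, byte_count) record and a cull function invented for this example; it illustrates the idea of keeping the largest flows up to a target share of traffic and is not the product's code.

```python
def cull(records, retain_percent=98.0):
    """Keep the largest flows until they cover retain_percent of total bytes.

    records: list of (flow_id, byte_count) tuples (hypothetical layout).
    """
    total = sum(byte_count for _, byte_count in records)
    target = total * retain_percent / 100.0
    kept = []
    running = 0
    # Walk the flows from largest to smallest.
    for rec in sorted(records, key=lambda r: r[1], reverse=True):
        if running >= target:
            break  # the retained flows already cover the target share
        kept.append(rec)
        running += rec[1]
    return kept


# Two large transfers plus 100 one-byte flows, such as port-scan probes.
sample = [("big1", 6000), ("big2", 3900)] + [(f"scan{i}", 1) for i in range(100)]
kept = cull(sample, retain_percent=98.0)
print(len(sample), "records in,", len(kept), "records kept")
```

In this sample the two large flows carry 9,900 of the 10,000 bytes, so retaining 98 percent of the traffic drops 100 of the 102 records.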

Daily flow data compression

Following this culling of data, a data compression takes place during the daily roll-up. Each day, a day's worth of hourly roll-ups is aged out of the daily retention period and compressed into a single record for the day. Use the Retain hourly data for x days option to determine how long you want the hourly roll-up records to be maintained in the Active Flow Monitor database before they are rolled up into a daily record.

Flow data archival

Finally, each day, daily data is archived. This archival removes daily data that has aged out of the daily retention period. Each day during the daily roll-up, a daily record is written to the NetFlow Archive and is removed from the NetFlow Active database. Use the Retain daily data for x days option to determine how long you want the daily roll-up records maintained in the Active Flow Monitor database before they are archived in the Flow Monitor Archive database.

Data retention tuning

Data retention can be tuned manually, by adjusting the Data collection interval, the Percentage of traffic to retain, and the retention periods for the various stages of the data retention strategy (raw flow data, hourly flow data, and daily flow data), or it can be tuned automatically by selecting the Auto tune flow data retention option.

When you have enabled auto tuning of the system (the Auto tune flow data retention option is selected), the system adjusts the data retention periods to keep the number of records within a normal range that optimizes data storage and system performance. Using information gathered from the database, Flow Monitor approximates the growth rate of the database and adjusts the retention settings to ensure that the total size of the database stays in the recommended band of a minimum of 1 million to a maximum of 10 million flow records. The recommended band is based on storage requirements for each stage in the data retention strategy.
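The exact auto-tuning algorithm is not described here. As a purely illustrative sketch, the following Python function shows one way a retention period could be nudged so that a projected record count stays within the 1 million to 10 million band. The tune_retention name, the records_per_day growth estimate, and the min_days and max_days limits are all assumptions made for the example.

```python
MIN_RECORDS = 1_000_000    # lower edge of the recommended band
MAX_RECORDS = 10_000_000   # upper edge of the recommended band


def tune_retention(current_days, records_per_day, min_days=1, max_days=365):
    """Return a retention period (in days) whose projected size fits the band.

    records_per_day is an estimated growth rate; projected size is
    days * records_per_day. All parameters are hypothetical.
    """
    days = current_days
    # Shorten retention while the projection exceeds the maximum.
    while days > min_days and days * records_per_day > MAX_RECORDS:
        days -= 1
    # Lengthen retention while the projection is below the minimum
    # and one more day would still fit under the maximum.
    while (
        days < max_days
        and days * records_per_day < MIN_RECORDS
        and (days + 1) * records_per_day <= MAX_RECORDS
    ):
        days += 1
    return days
```

For example, with an estimated 500,000 records per day, a 30-day setting projects to 15 million records, and the sketch shortens it to 20 days, which projects to exactly 10 million.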

When you manually adjust the Data Retention settings (the Auto tune flow data retention option is cleared), you are presented with guidance in the message area at the bottom of the dialog as you adjust each setting. This feedback tells you how the current or proposed setting affects the database size with respect to the maximum recommended database size (10 million records). For the raw data, hourly data, and daily data, the maximum recommended database size is compared against all of the data in these categories and is based on the size of the Flow Monitor Active database. For the Archive daily data after setting, the guidance is based on the size of the Flow Monitor Archive database.

See Also

Managing Flow Monitor Settings

Flow Monitor settings

Configure Flow Monitor to listen for NetFlow data

Setting the logging level

Configuring data retention settings