
RocksDB Properties

Property Description Example
ROCKSDB_MAX_BACKGROUND_JOBS Specifies the maximum number of concurrent background jobs (both flushes and compactions combined). ROCKSDB_MAX_BACKGROUND_JOBS: "2"
ROCKSDB_ALLOW_CONCURRENT_MEMTABLE_WRITE If true, allows multiple writers to update memtables in parallel. Only some memtable factories support concurrent writes; currently this is implemented only for SkipListFactory. Concurrent memtable writes are not compatible with inplace_update_support or filter_deletes. ROCKSDB_ALLOW_CONCURRENT_MEMTABLE_WRITE: "true"
ROCKSDB_ENABLE_PIPELINED_WRITE By default, a single write thread queue is maintained. The thread that reaches the head of the queue becomes the write batch group leader and is responsible for writing to the WAL and memtable for the batch group. If enablePipelinedWrite() is true, separate write thread queues are maintained for WAL writes and memtable writes. A write thread first enters the WAL writer queue and then the memtable writer queue. A pending thread on the WAL writer queue thus only has to wait for previous writers to finish their WAL writes, not their memtable writes. Enabling the feature may improve write throughput and reduce latency of the prepare phase of two-phase commit. ROCKSDB_ENABLE_PIPELINED_WRITE: "false"
ROCKSDB_DB_WRITE_BUFFER_SIZE Amount of data to build up in memtables across all column families before writing to disk. This is distinct from ColumnFamilyOptions.writeBufferSize(), which enforces a limit for a single memtable. This feature is disabled by default. Specify a non-zero value to enable it. ROCKSDB_DB_WRITE_BUFFER_SIZE: "0"
ROCKSDB_RANDOM_ACCESS_MAX_BUFFER_SIZE This is the maximum buffer size used by WinMmapReadableFile in unbuffered disk I/O mode. An aligned buffer must be maintained for reads. The buffer is allowed to grow up to the specified value; larger requests allocate one-shot buffers. In unbuffered mode the read-ahead buffer at ReadaheadRandomAccessFile is always bypassed. When read-ahead is required, the MutableDBOptionsInterface.compactionReadaheadSize() value is used and read-ahead is always attempted. With read-ahead, the buffer is always pre-allocated to that size instead of growing up to a limit. This option is currently honored only on Windows. Default: 1 MB. Special value: 0 means do not maintain a per-instance buffer; allocate a per-request buffer and avoid locking. ROCKSDB_RANDOM_ACCESS_MAX_BUFFER_SIZE: "0"
ROCKSDB_WRITABLE_FILE_MAX_BUFFER_SIZE This is the maximum buffer size used by WritableFileWriter. On Windows, an aligned buffer must be maintained for writes; the buffer is allowed to grow until its size reaches this limit. Default: 1 MB. ROCKSDB_WRITABLE_FILE_MAX_BUFFER_SIZE: "0"
ROCKSDB_ALLOW_MMAP_READS Allow the OS to mmap files for reading SST tables. ROCKSDB_ALLOW_MMAP_READS: "false"
ROCKSDB_ALLOW_MMAP_WRITES Allow the OS to mmap files for writing. ROCKSDB_ALLOW_MMAP_WRITES: "false"
ROCKSDB_BYTES_PER_SYNC Allows OS to incrementally sync files to disk while they are being written, asynchronously, in the background. Issue one request for every bytes_per_sync written. ROCKSDB_BYTES_PER_SYNC: "0"
ROCKSDB_WAL_BYTES_PER_SYNC Same as setBytesPerSync(long), but applies to WAL files. ROCKSDB_WAL_BYTES_PER_SYNC: "0"
ROCKSDB_RATELIMITER_RATE_BYTES_PER_SEC Sets rateBytesPerSecond, which is the only rate limiter parameter you need to set most of the time. It controls the total write rate of compaction and flush in bytes per second. Currently, RocksDB does not enforce the rate limit for anything other than flush and compaction; writes to the WAL, for example, are not limited. ROCKSDB_RATELIMITER_RATE_BYTES_PER_SEC: "0"
ROCKSDB_MAX_OPEN_FILES Number of open files that can be used by the DB. You may need to increase this if your database has a large working set. Value -1 means files opened are always kept open. You can estimate number of files based on target_file_size_base and target_file_size_multiplier for level-based compaction. For universal-style compaction, you can usually set it to -1. ROCKSDB_MAX_OPEN_FILES: "0"
ROCKSDB_CF_WRITE_BUFFER_SIZE Amount of data to build up in memory (backed by an unsorted log on disk) before converting to a sorted on-disk file. Larger values increase performance, especially during bulk loads. Up to max_write_buffer_number write buffers may be held in memory at the same time, so you may wish to adjust this parameter to control memory usage. Also, a larger write buffer will result in a longer recovery time the next time the database is opened. ROCKSDB_CF_WRITE_BUFFER_SIZE: "0"

ROCKSDB_CF_COMPRESSIONTYPE_LZ4COMPRESSION

ROCKSDB_CF_COMPRESSIONTYPE_ZSTDCOMPRESSION

ROCKSDB_CF_COMPRESSIONTYPE_ZLIBCOMPRESSION

Compress blocks using the specified compression algorithm. This parameter can be changed dynamically. Default: SNAPPY_COMPRESSION, which gives lightweight but fast compression.

ROCKSDB_CF_COMPRESSIONTYPE_LZ4COMPRESSION: "false"

ROCKSDB_CF_COMPRESSIONTYPE_ZSTDCOMPRESSION: "false"

ROCKSDB_CF_COMPRESSIONTYPE_ZLIBCOMPRESSION: "false"

 

ROCKSDB_CF_LEVEL_COMPACTION_DYNAMIC_LEVEL_BYTES With this option on, starting from an empty DB, the last level is made the base level, which means L0 data is merged into the last level until it exceeds max_bytes_for_level_base. The second-to-last level then becomes the base level, and L0 data is merged into it, with its target size set to 1/max_bytes_for_level_multiplier of the last level's extra size. As more data accumulates, the base level moves to the third-to-last level, and so on. ROCKSDB_CF_LEVEL_COMPACTION_DYNAMIC_LEVEL_BYTES: "false"
ROCKSDB_CF_BLOOMLOCALITY Controls the locality of bloom filter probes to improve the cache miss rate. This option only applies to the memtable prefix bloom and the plain table prefix bloom. It essentially limits the maximum number of cache lines each bloom filter check can touch. This optimization is turned off when set to 0. The number should never be greater than the number of probes. This option can boost performance for in-memory workloads but should be used with care, since it can cause a higher false positive rate. ROCKSDB_CF_BLOOMLOCALITY: "0"
ROCKSDB_CF_COMPRESSIONSTYLE_UNIVERSAL Set compaction style for DB. ROCKSDB_CF_COMPRESSIONSTYLE_UNIVERSAL: "false"
ROCKSDB_CF_COMPRESSIONSTYLE_UNIVERSAL_SIZERATIO Percentage flexibility while comparing file size. If the candidate file(s) size is 1% smaller than the next file's size, then include next file into this candidate set. ROCKSDB_CF_COMPRESSIONSTYLE_UNIVERSAL_SIZERATIO: "1"
ROCKSDB_CF_COMPRESSIONSTYLE_UNIVERSAL_MINMERGEWIDTH The minimum number of files in a single compaction run. ROCKSDB_CF_COMPRESSIONSTYLE_UNIVERSAL_MINMERGEWIDTH: "2"
ROCKSDB_CF_COMPRESSIONSTYLE_UNIVERSAL_MAXSIZEAMPPERCENT The size amplification is defined as the amount (in percent) of additional storage needed to store a single byte of data in the database. For example, a size amplification of 2% means that a database that contains 100 bytes of user data may occupy up to 102 bytes of physical storage. By this definition, a fully compacted database has a size amplification of 0%. RocksDB uses the following heuristic to calculate size amplification: it assumes that all files excluding the earliest file contribute to the size amplification. Default: 200, which means that a 100-byte database could require up to 300 bytes of storage. ROCKSDB_CF_COMPRESSIONSTYLE_UNIVERSAL_MAXSIZEAMPPERCENT: "200"
ROCKSDB_CF_COMPRESSIONSTYLE_FIFO   ROCKSDB_CF_COMPRESSIONSTYLE_FIFO: "false"
ROCKSDB_CF_COMPRESSIONSTYLE_FIFO_ALLOWCOMPACTION If true, try to do compaction to compact smaller files into larger ones. Minimum files to compact follows options.level0_file_num_compaction_trigger and compaction won't trigger if average compact bytes per del file is larger than options.write_buffer_size. This is to protect large files from being compacted again. ROCKSDB_CF_COMPRESSIONSTYLE_FIFO_ALLOWCOMPACTION: "false"
ROCKSDB_CF_COMPRESSIONSTYLE_FIFO_MAXTABLEFILESIZE Once the total sum of table files reaches this, we will delete the oldest table file ROCKSDB_CF_COMPRESSIONSTYLE_FIFO_MAXTABLEFILESIZE: "1024"
ROCKSDB_CF_COMPRESSIONSTYLE_NONE   ROCKSDB_CF_COMPRESSIONSTYLE_NONE: "false"
ROCKSDB_CF_LEVEL0FILENUMCOMPACTIONTRIGGER Number of files to trigger level-0 compaction. A value < 0 means that level-0 compaction will not be triggered by number of files at all. ROCKSDB_CF_LEVEL0FILENUMCOMPACTIONTRIGGER: "0"
ROCKSDB_CF_LEVEL0SLOWDOWNWRITESTRIGGER Soft limit on number of level-0 files. We start slowing down writes at this point. A value < 0 means that no writing slow down will be triggered by number of files in level-0. ROCKSDB_CF_LEVEL0SLOWDOWNWRITESTRIGGER: "0"
ROCKSDB_CF_LEVEL0STOPWRITESTRIGGER Maximum number of level-0 files. Writes are stopped at this point. ROCKSDB_CF_LEVEL0STOPWRITESTRIGGER: "0"
ROCKSDB_CF_MAX_WRITE_BUFFER_NUMBER The maximum number of write buffers that are built up in memory. The RocksDB default is 2, so that while one write buffer is being flushed to storage, new writes can continue to the other write buffer. ROCKSDB_CF_MAX_WRITE_BUFFER_NUMBER: "0"
ROCKSDB_CF_MAX_WRITE_BUFFER_NUMBER_TO_MAINTAIN The total maximum number of write buffers to maintain in memory including copies of buffers that have already been flushed. Unlike AdvancedMutableColumnFamilyOptionsInterface.maxWriteBufferNumber(), this parameter does not affect flushing. This controls the minimum amount of write history that will be available in memory for conflict checking when Transactions are used. When using an OptimisticTransactionDB: If this value is too low, some transactions may fail at commit time due to not being able to determine whether there were any write conflicts. When using a TransactionDB: If Transaction::SetSnapshot is used, TransactionDB will read either in-memory write buffers or SST files to do write-conflict checking. Increasing this value can reduce the number of reads to SST files done for conflict detection. Setting this value to 0 will cause write buffers to be freed immediately after they are flushed. If this value is set to -1, AdvancedMutableColumnFamilyOptionsInterface.maxWriteBufferNumber() will be used. Default: If using a TransactionDB/OptimisticTransactionDB, the default value will be set to the value of AdvancedMutableColumnFamilyOptionsInterface.maxWriteBufferNumber() if it is not explicitly set by the user. Otherwise, the default is 0. ROCKSDB_CF_MAX_WRITE_BUFFER_NUMBER_TO_MAINTAIN: "0"
ROCKSDB_CF_NUMLEVEL Sets the number of levels for this database. If level-style compaction is used, this number determines the total number of levels. ROCKSDB_CF_NUMLEVEL: "0"
ROCKSDB_CF_TARGETFILESIZEBASE The target file size for compaction. targetFileSizeBase determines the level-1 file size. The target file size for level L can be calculated as targetFileSizeBase * (targetFileSizeMultiplier ^ (L-1)). For example, if targetFileSizeBase is 2MB and targetFileSizeMultiplier is 10, then each file on level-1 will be 2MB, each file on level-2 will be 20MB, and each file on level-3 will be 200MB. ROCKSDB_CF_TARGETFILESIZEBASE: "0"
ROCKSDB_CF_MAXBYTESFORLEVELBASE The upper bound of the total size of level-1 files in bytes. The maximum number of bytes for level L can be calculated as maxBytesForLevelBase * (maxBytesForLevelMultiplier ^ (L-1)). For example, if maxBytesForLevelBase is 20MB and maxBytesForLevelMultiplier is 10, the total data size for level-1 will be 20MB, the total file size for level-2 will be 200MB, and the total file size for level-3 will be 2GB. ROCKSDB_CF_MAXBYTESFORLEVELBASE: "0"
ROCKSDB_CF_MULTIPLIER The ratio between the total size of level-(L+1) files and the total size of level-L files for all L. ROCKSDB_CF_MULTIPLIER: "0"
ROCKSDB_CF_TABLECONFIG_ENABLE Enable tableconfig for columnfamily ROCKSDB_CF_TABLECONFIG_ENABLE: "false"
ROCKSDB_CF_TABLECONFIG_BLOCKSIZE Approximate size of user data packed per block. Note that the block size specified here corresponds to uncompressed data. The actual size of the unit read from disk may be smaller if compression is enabled. This parameter can be changed dynamically. ROCKSDB_CF_TABLECONFIG_BLOCKSIZE: "4000"
ROCKSDB_CF_TABLECONFIG_CACHEINDEXANDFILTERBLOCKS Indicates whether index/filter blocks are put in the block cache. If not specified, each "table reader" object pre-loads the index/filter block during table initialization. ROCKSDB_CF_TABLECONFIG_CACHEINDEXANDFILTERBLOCKS: "false"
ROCKSDB_CF_TABLECONFIG_FORMATVERSION

There are currently five versions:

#0 - Written by default by all RocksDB versions. Can be read by very old RocksDB versions. Does not support changing the checksum type (default is CRC32).

#1 - Can be read by RocksDB versions since 3.0. Supports non-default checksums, such as xxHash. It is written by RocksDB when BlockBasedTableOptions::checksum is something other than kCRC32c (version 0 is silently upconverted).

#2 - Can be read by RocksDB versions since 3.10. Changes the way compressed blocks are encoded with LZ4, BZip2 and Zlib compression. If you don't plan to run RocksDB versions older than 3.10, you should probably use this.

#3 - Can be read by RocksDB versions since 5.15. Changes the way keys are encoded in index blocks. If you don't plan to run RocksDB versions older than 5.15, you should probably use this.

#4 - Can be read by RocksDB versions since 5.16. Changes the way values are encoded in index blocks. If you don't plan to run RocksDB versions older than 5.16 and you are using index_block_restart_interval > 1, you should probably use this, as it reduces the index size.

This option only affects newly written tables. When reading existing tables, the version information is read from the footer.

ROCKSDB_CF_TABLECONFIG_FORMATVERSION: "0"
ROCKSDB_CF_TABLECONFIG_PINL0FILTERANDINDEXBLOCKSINCACHE Indicates whether L0 index/filter blocks are pinned in the block cache. If not specified, defaults to false. ROCKSDB_CF_TABLECONFIG_PINL0FILTERANDINDEXBLOCKSINCACHE: "false"

ROCKSDB_CF_TABLECONFIG_INDEXTYPE_KHASHSEARCH

ROCKSDB_CF_TABLECONFIG_INDEXTYPE_KBINARYSEARCH

ROCKSDB_CF_TABLECONFIG_INDEXTYPE_KTWOLEVELINDEXSEARCH

Sets the index type to be used with this table.

ROCKSDB_CF_TABLECONFIG_INDEXTYPE_KHASHSEARCH: "false"

ROCKSDB_CF_TABLECONFIG_INDEXTYPE_KBINARYSEARCH: "false"

ROCKSDB_CF_TABLECONFIG_INDEXTYPE_KTWOLEVELINDEXSEARCH: "false"
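
The per-level sizing rules described for ROCKSDB_CF_TARGETFILESIZEBASE and ROCKSDB_CF_MAXBYTESFORLEVELBASE can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the function names are hypothetical and not part of RocksDB, and it assumes static level sizing (ROCKSDB_CF_LEVEL_COMPACTION_DYNAMIC_LEVEL_BYTES set to "false").

```python
MB = 1024 * 1024


def target_file_size(level, target_file_size_base, target_file_size_multiplier):
    """Target size in bytes of a single file on `level` (level >= 1).

    Implements targetFileSizeBase * (targetFileSizeMultiplier ^ (L-1)).
    """
    if level < 1:
        raise ValueError("the formula applies to level 1 and above")
    return target_file_size_base * target_file_size_multiplier ** (level - 1)


def max_bytes_for_level(level, max_bytes_for_level_base, max_bytes_for_level_multiplier):
    """Upper bound in bytes on the total size of all files on `level` (level >= 1).

    Implements maxBytesForLevelBase * (maxBytesForLevelMultiplier ^ (L-1)).
    """
    if level < 1:
        raise ValueError("the formula applies to level 1 and above")
    return max_bytes_for_level_base * max_bytes_for_level_multiplier ** (level - 1)


if __name__ == "__main__":
    # Reproduce the worked examples from the property descriptions.
    for level in (1, 2, 3):
        file_mb = target_file_size(level, 2 * MB, 10) // MB
        level_mb = max_bytes_for_level(level, 20 * MB, 10) // MB
        print(f"level-{level}: file target {file_mb} MB, level limit {level_mb} MB")
```

Both quantities grow geometrically with the level number, so a small change to either multiplier has a large effect on the deepest levels.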

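The compatibility notes for ROCKSDB_CF_TABLECONFIG_FORMATVERSION amount to a lookup: pick the highest format version that the oldest RocksDB release you still need to support can read. The following Python sketch encodes only the version thresholds listed in that description; the helper names are hypothetical, and release strings are assumed to have the form "major.minor" or "major.minor.patch".

```python
# format_version -> oldest RocksDB release (major, minor) able to read it,
# as listed in the ROCKSDB_CF_TABLECONFIG_FORMATVERSION description.
# Version 0 is readable by every release, modeled here as (0, 0).
FORMAT_VERSION_MIN_READER = {
    0: (0, 0),
    1: (3, 0),
    2: (3, 10),
    3: (5, 15),
    4: (5, 16),
}


def parse_release(text):
    """Turn '5.15' or '5.15.10' into the tuple (5, 15)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def highest_readable_format_version(oldest_release):
    """Highest format version that `oldest_release` can still read."""
    oldest = parse_release(oldest_release)
    return max(
        version
        for version, min_reader in FORMAT_VERSION_MIN_READER.items()
        if min_reader <= oldest
    )


if __name__ == "__main__":
    for release in ("2.8", "3.9", "3.10", "5.15.10", "6.2"):
        print(release, "->", highest_readable_format_version(release))
```

Comparing (major, minor) tuples rather than raw strings matters here, because "3.10" sorts before "3.9" as text but is the newer release.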

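The size amplification heuristic described for ROCKSDB_CF_COMPRESSIONSTYLE_UNIVERSAL_MAXSIZEAMPPERCENT can be illustrated as follows. This is a minimal sketch under stated assumptions, not RocksDB's implementation: the function names are hypothetical, file sizes are given oldest first, and reaching the limit exactly is assumed to count as exceeding it.

```python
def size_amplification_percent(file_sizes_oldest_first):
    """Estimate size amplification in percent.

    Follows the heuristic in the description: every file except the
    earliest one is counted as amplification relative to the earliest file.
    """
    if not file_sizes_oldest_first:
        raise ValueError("at least one file is required")
    earliest, *newer = file_sizes_oldest_first
    if earliest <= 0:
        raise ValueError("the earliest file must have a positive size")
    return 100.0 * sum(newer) / earliest


def exceeds_max_size_amp(file_sizes_oldest_first, max_size_amp_percent=200):
    """True when the estimate reaches the configured limit (assumed inclusive)."""
    return size_amplification_percent(file_sizes_oldest_first) >= max_size_amp_percent


if __name__ == "__main__":
    print(size_amplification_percent([100]))        # a single, fully compacted file
    print(size_amplification_percent([100, 2]))     # 100 bytes of data plus 2 extra bytes
    print(exceeds_max_size_amp([100, 150, 60]))     # 210 extra bytes against a limit of 200%
```

With the default limit of 200, a 100-byte earliest file tolerates up to 200 bytes of newer files, which matches the 300-byte total given in the description.
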
Last update: July 23, 2021