Where does leveldb's DBSequence come from, and where does it go?
(Owned by: 春夜喜雨 http://blog.csdn.net/chunyexiyu)
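Where does the DBSequence of a leveldb database come from, and where does it go?
Roughly speaking, the sequence of a leveldb database starts at 0, grows as records are written, and is persisted to files.
But what about the details?
How does the sequence increase step by step?
Do the records written in one WriteBatch share the same dbsequence?
If they do, how does the dbsequence change after a WriteBatch has been written?
Can DBSequence jump? Does it always reflect the number of records?
Where is the dbsequence recorded: in the sstables, in the manifest, in the WAL, or somewhere else?
Let's work through these questions step by step.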
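1. The first write into an empty database
This is the moment the first record, or the first WriteBatch, arrives.
A single-record write is also wrapped in a WriteBatch and goes through the same WriteBatch write path.
Structure of a WriteBatch:
The first 12 bytes of the WriteBatch's rep_ form a header: the first 8 bytes hold the db-sequence and the next 4 bytes hold the record count.
So a WriteBatch stores a single db-sequence for the whole batch. That value is the sequence of the first record in the batch; when the batch is applied to the memtable, each following record takes the next number, so the batch covers sequence, sequence + 1, ..., sequence + count - 1.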
// WriteBatch header has an 8-byte sequence number followed by a 4-byte count.
static const size_t kHeader = 12;

void WriteBatch::Clear() {
  rep_.clear();
  rep_.resize(kHeader);
}

int WriteBatchInternal::Count(const WriteBatch* b) {
  return DecodeFixed32(b->rep_.data() + 8);
}

void WriteBatchInternal::SetCount(WriteBatch* b, int n) {
  EncodeFixed32(&b->rep_[8], n);
}

SequenceNumber WriteBatchInternal::Sequence(const WriteBatch* b) {
  return SequenceNumber(DecodeFixed64(b->rep_.data()));
}

void WriteBatchInternal::SetSequence(WriteBatch* b, SequenceNumber seq) {
  EncodeFixed64(&b->rep_[0], seq);
}
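The 12-byte header layout can be tried out in isolation. The following is a minimal sketch, not leveldb's code: PutFixedLE, GetFixedLE, MakeHeader, HeaderSequence and HeaderCount are hypothetical names, and the sketch assumes the little-endian fixed-width encoding that leveldb's EncodeFixed32/EncodeFixed64 produce.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the 12-byte WriteBatch header (not leveldb code).
constexpr std::size_t kHeaderSize = 12;

// Store the low `bytes` bytes of `value` at `dst` in little-endian order.
inline void PutFixedLE(char* dst, std::uint64_t value, int bytes) {
  for (int i = 0; i < bytes; ++i) {
    dst[i] = static_cast<char>((value >> (8 * i)) & 0xff);
  }
}

// Read `bytes` little-endian bytes starting at `src`.
inline std::uint64_t GetFixedLE(const char* src, int bytes) {
  std::uint64_t value = 0;
  for (int i = 0; i < bytes; ++i) {
    value |= static_cast<std::uint64_t>(static_cast<unsigned char>(src[i]))
             << (8 * i);
  }
  return value;
}

// Build a header-only buffer: bytes 0..7 hold the sequence,
// bytes 8..11 hold the record count.
inline std::string MakeHeader(std::uint64_t sequence, std::uint32_t count) {
  std::string rep(kHeaderSize, '\0');
  PutFixedLE(&rep[0], sequence, 8);
  PutFixedLE(&rep[8], count, 4);
  return rep;
}

// Decode the sequence stored in the first 8 bytes.
inline std::uint64_t HeaderSequence(const std::string& rep) {
  assert(rep.size() >= kHeaderSize);
  return GetFixedLE(rep.data(), 8);
}

// Decode the record count stored in bytes 8..11.
inline std::uint32_t HeaderCount(const std::string& rep) {
  assert(rep.size() >= kHeaderSize);
  return static_cast<std::uint32_t>(GetFixedLE(rep.data() + 8, 4));
}
```

For example, MakeHeader(7, 3) yields a 12-byte string from which HeaderSequence reads back 7 and HeaderCount reads back 3; the records themselves would follow after byte 12.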
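2. Where DBSequence comes from
Every write stores a db-sequence in the WriteBatch, and the latest db-sequence is then advanced by the number of records in that WriteBatch.
So changes to DBSequence can be regarded as coming from record writes.
Note that the db-sequence in a WriteBatch is not filled in when the batch is created. It is assigned during the write in leveldb::DBImpl::Write, which reads the current value from versions_ (the VersionSet) and writes the updated value back.
The code below shows that:
- the db-sequence of the WriteBatch is taken from versions_;
- after the WriteBatch has been written, the new db-sequence is stored back into versions_;
- the db-sequence increases by the number of records in the WriteBatch.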
Status DBImpl::Write(const WriteOptions& options, WriteBatch* updates) {
  ...
  uint64_t last_sequence = versions_->LastSequence();
  ...
  WriteBatchInternal::SetSequence(updates, last_sequence + 1);
  last_sequence += WriteBatchInternal::Count(updates);
  ...
  versions_->SetLastSequence(last_sequence);
  ...
}
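The bookkeeping in this excerpt can be modelled in a few lines. This is a sketch under stated assumptions, not leveldb's implementation: SequenceAllocator and RecordSequences are hypothetical names, and the overflow guard is an addition for illustration (the 56-bit limit matches leveldb's kMaxSequenceNumber).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Largest sequence that fits in 56 bits, as in leveldb's kMaxSequenceNumber.
constexpr std::uint64_t kMaxSequence = (1ULL << 56) - 1;

// Hypothetical model of the sequence bookkeeping in DBImpl::Write.
struct SequenceAllocator {
  // Plays the role of versions_->LastSequence(); an empty database starts at 0.
  std::uint64_t last_sequence = 0;

  // Reserve `count` consecutive sequence numbers for one batch and return
  // the first one, which is the value stored in the batch header.
  std::uint64_t Allocate(std::uint32_t count) {
    assert(count <= kMaxSequence - last_sequence);  // illustrative guard only
    const std::uint64_t first = last_sequence + 1;
    last_sequence += count;
    return first;
  }
};

// Sequence numbers taken by the individual records of a batch whose header
// stores `first` and whose record count is `count`.
inline std::vector<std::uint64_t> RecordSequences(std::uint64_t first,
                                                  std::uint32_t count) {
  std::vector<std::uint64_t> out;
  for (std::uint32_t i = 0; i < count; ++i) {
    out.push_back(first + i);
  }
  return out;
}
```

Starting from an empty database, a one-record batch is assigned sequence 1, and a following three-record batch is assigned 2 in its header while its records take 2, 3 and 4, leaving last_sequence at 4.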
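3. Persisting DBSequence
Once data has been written, the latest sequence is recorded in files so that it survives a restart.
Every write into the database updates the dbsequence.
The update is persisted first in the WAL: during Write, the WriteBatch, which carries its db-sequence in the header, is appended to the WAL.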
Status DBImpl::Write(const WriteOptions& options, WriteBatch* updates) {
  ...
  uint64_t last_sequence = versions_->LastSequence();
  ...
  WriteBatchInternal::SetSequence(updates, last_sequence + 1);
  ...
  status = log_->AddRecord(WriteBatchInternal::Contents(updates));
  ...
}
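The DBSequence is also stored in each record's key. When the memtable is flushed by a minor compaction, these keys, dbsequence included, are written to the sstable.
The sequences in sstables are generally used only for lookups. They are not used for other purposes such as restoring the dbsequence.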
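How a sequence can ride along inside a key is easy to sketch. The code below assumes leveldb's internal key layout as I understand it: the user key followed by an 8-byte little-endian trailer holding (sequence << 8) | type. MakeInternalKey and ExtractSequence are hypothetical helper names, not leveldb functions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Value types as used in leveldb's internal keys.
enum ValueType : std::uint8_t { kTypeDeletion = 0x0, kTypeValue = 0x1 };

// The sequence must fit in 56 bits because the low 8 bits hold the type.
constexpr std::uint64_t kMaxSequence = (1ULL << 56) - 1;

// Hypothetical helper: append the packed (sequence, type) trailer to a user key.
inline std::string MakeInternalKey(const std::string& user_key,
                                   std::uint64_t sequence, ValueType type) {
  assert(sequence <= kMaxSequence);
  const std::uint64_t packed = (sequence << 8) | type;
  std::string key = user_key;
  for (int i = 0; i < 8; ++i) {
    key.push_back(static_cast<char>((packed >> (8 * i)) & 0xff));
  }
  return key;
}

// Hypothetical helper: recover the sequence from the last 8 bytes of the key.
inline std::uint64_t ExtractSequence(const std::string& internal_key) {
  assert(internal_key.size() >= 8);
  const char* p = internal_key.data() + internal_key.size() - 8;
  std::uint64_t packed = 0;
  for (int i = 0; i < 8; ++i) {
    packed |= static_cast<std::uint64_t>(static_cast<unsigned char>(p[i]))
              << (8 * i);
  }
  return packed >> 8;  // drop the type byte
}
```

With this layout, MakeInternalKey("foo", 5, kTypeValue) is 11 bytes long, and ExtractSequence on the result returns 5. Because the trailer is part of the key, two writes to the same user key produce two distinct internal keys.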
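Once a new sstable exists, the new file has to be recorded in the manifest file, and the current dbsequence is recorded along with it.
Every time LogAndApply appends an edit record to the manifest file, it records the latest dbsequence.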
Status VersionSet::LogAndApply(VersionEdit* edit, port::Mutex* mu) {
  ...
  edit->SetLastSequence(last_sequence_);
  ...
  std::string record;
  edit->EncodeTo(&record);
  s = descriptor_log_->AddRecord(record);
  ...
}
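To make the manifest record more concrete, here is a simplified sketch of how a last-sequence field could be encoded and decoded. It assumes, from my reading of version_edit.cc, that the field is written as a varint tag with value 4 (kLastSequence) followed by a varint64 value. A real VersionEdit record carries other fields as well, and EncodeLastSequence and DecodeLastSequence are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Assumed tag value for the last-sequence field (kLastSequence in leveldb).
constexpr std::uint64_t kLastSequenceTag = 4;

// Append `v` as a base-128 varint: 7 payload bits per byte, high bit = "more".
inline void PutVarint64(std::string* dst, std::uint64_t v) {
  assert(dst != nullptr);
  while (v >= 0x80) {
    dst->push_back(static_cast<char>((v & 0x7f) | 0x80));
    v >>= 7;
  }
  dst->push_back(static_cast<char>(v));
}

// Read a varint starting at *pos and advance *pos past it.
// Returns false if the input is truncated or longer than 10 bytes.
inline bool GetVarint64(const std::string& src, std::size_t* pos,
                        std::uint64_t* out) {
  std::uint64_t result = 0;
  for (int shift = 0; shift <= 63 && *pos < src.size(); shift += 7) {
    const std::uint64_t byte = static_cast<unsigned char>(src[(*pos)++]);
    result |= (byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) {
      *out = result;
      return true;
    }
  }
  return false;
}

// Hypothetical, simplified record that contains only the last-sequence field.
inline std::string EncodeLastSequence(std::uint64_t last_sequence) {
  std::string record;
  PutVarint64(&record, kLastSequenceTag);
  PutVarint64(&record, last_sequence);
  return record;
}

// Decode a record produced by EncodeLastSequence.
inline bool DecodeLastSequence(const std::string& record,
                               std::uint64_t* last_sequence) {
  std::size_t pos = 0;
  std::uint64_t tag = 0;
  return GetVarint64(record, &pos, &tag) && tag == kLastSequenceTag &&
         GetVarint64(record, &pos, last_sequence) && pos == record.size();
}
```

EncodeLastSequence(300) produces the three bytes 0x04 0xAC 0x02, and DecodeLastSequence reads 300 back from them.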
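4. Restoring DBSequence
The files above that store db-sequence values are the WAL files, the sstables, and the manifest file.
The ones normally used to restore DBSequence are the manifest file and the WAL files.
The last valid record in the manifest that carries a last sequence gives the db-sequence as of the latest manifest update, which covers the data already stored in sstables.
The last valid record in the WAL gives the newest db-sequence, that of the latest records not yet written to an sstable.
The WAL may also hold nothing, for example when all data has already been moved into sstables and the log is empty. In that case the value from the manifest is enough.
Overall, db-sequence is first restored from the manifest, and then, if WAL files exist, the newest db-sequence is recovered from them.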
Status DBImpl::Recover(VersionEdit* edit, bool* save_manifest) {
  ...
  s = versions_->Recover(save_manifest);
  SequenceNumber max_sequence(0);
  ...
  // Recover in the order in which the logs were generated
  std::sort(logs.begin(), logs.end());
  for (size_t i = 0; i < logs.size(); i++) {
    s = RecoverLogFile(logs[i], (i == logs.size() - 1), save_manifest, edit,
                       &max_sequence);
    if (!s.ok()) {
      return s;
    }
    // The previous incarnation may not have written any MANIFEST
    // records after allocating this log number. So we manually
    // update the file number allocation counter in VersionSet.
    versions_->MarkFileNumberUsed(logs[i]);
  }
  ...
  if (versions_->LastSequence() < max_sequence) {
    versions_->SetLastSequence(max_sequence);
  }
  ...
}
Restoring db-sequence from the manifest:
Status VersionSet::Recover(bool* save_manifest) {
  ...
  while (reader.ReadRecord(&record, &scratch) && s.ok()) {
    ...
    VersionEdit edit;
    s = edit.DecodeFrom(record);
    ...
    if (edit.has_last_sequence_) {
      last_sequence = edit.last_sequence_;
      have_last_sequence = true;
    }
  }
  ...
  last_sequence_ = last_sequence;
  ...
}
Restoring db-sequence from the WAL:
Status DBImpl::RecoverLogFile(uint64_t log_number, bool last_log,
                              bool* save_manifest, VersionEdit* edit,
                              SequenceNumber* max_sequence) {
  ...
  while (reader.ReadRecord(&record, &scratch) && status.ok()) {
    ...
    WriteBatchInternal::SetContents(&batch, record);
    ...
    // Sequence of the last record in this batch: header sequence + count - 1.
    const SequenceNumber last_seq = WriteBatchInternal::Sequence(&batch) +
                                    WriteBatchInternal::Count(&batch) - 1;
    if (last_seq > *max_sequence) {
      *max_sequence = last_seq;
    }
    ...
  }
  ...
}
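Putting the two recovery sources together, the final value is the larger of the manifest's last sequence and the highest sequence found in the WAL. The sketch below models only that arithmetic. BatchHeader and RecoverLastSequence are hypothetical names, and the WAL is represented as a plain list of (header sequence, count) pairs rather than real log records.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical summary of one WAL record: the batch's header sequence
// and its record count.
struct BatchHeader {
  std::uint64_t sequence;
  std::uint32_t count;
};

// Combine the manifest's last sequence with the batches replayed from the WAL.
inline std::uint64_t RecoverLastSequence(
    std::uint64_t manifest_last_sequence,
    const std::vector<BatchHeader>& wal_batches) {
  std::uint64_t max_sequence = 0;
  for (const BatchHeader& b : wal_batches) {
    // Illustrative guard: sequences handed out by Write start at 1.
    assert(b.sequence >= 1);
    // Same formula as in RecoverLogFile: the last record of the batch.
    const std::uint64_t last_seq = b.sequence + b.count - 1;
    max_sequence = std::max(max_sequence, last_seq);
  }
  // Mirrors the final comparison in DBImpl::Recover.
  return std::max(manifest_last_sequence, max_sequence);
}
```

If the manifest says 10 and the WAL holds a batch of two records starting at 11 followed by a single record at 13, the recovered value is 13. With an empty WAL the manifest value 10 is kept.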
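5. So where does dbsequence come from, and where does it go?
Back to the question: where does dbsequence come from, and where does it go?
Where dbsequence comes from:
dbsequence starts at 0 and is advanced on every write by the number of records in the batch, so after a batch of N records it moves forward by N. The latest dbsequence usually reflects the total number of records written so far (every put and delete counts, overwrites included), not the number of keys currently live in the database.
Where dbsequence goes:
dbsequence is stored in the persistent files: the WAL, the sstables, and the manifest.
dbsequence is also appended to every key as a hidden part of the key.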