Skip to content

Split cold and warm index writes#3618

Open
rallen090 wants to merge 64 commits into
masterfrom
ra/index-active-block-cold-write-batch
Open

Split cold and warm index writes#3618
rallen090 wants to merge 64 commits into
masterfrom
ra/index-active-block-cold-write-batch

Conversation

@rallen090

@rallen090 rallen090 commented Jul 22, 2021

Copy link
Copy Markdown
Collaborator

No description provided.

Comment thread src/dbnode/storage/index.go Outdated
// the active block. We ensure that can't happen by always writing the cold index in-mem data
// directly.
warmBatch := index.NewWriteBatch(batch.Options())
coldBatch := index.NewWriteBatch(batch.Options())

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the simplest implementation for splitting. Assuming this approach works I can fix this up to avoid needless allocs (e.g. if no cold writes, just use the original batch instead of newing two more, etc.)

@robskillington robskillington Jul 22, 2021

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there's only ever a single goroutine at a time ever calling writeBatchForBlockStart(...) can we actually just use a lock (like a new "writeBatchesLock") and then always reuse the same warm batches and cold batches struct and reset at the start of each write batch call? (not sure if it's reuseable but should technically be trivial to make reuseable).

// NB: users are expected to use `NewEntry` to construct these objects.
type Entry struct {
relookupAndIncrementReaderWriterCount func() (index.OnIndexSeries, bool)
queryableBlockRetriever series.QueryableBlockRetriever

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note @robskillington the way I'm exposing the way to get "block states" which we check against for RequiresColdFlushForBlockStart, is by having the Entry have a pointer to the dbShard which implements this series.QueryableBlockRetriever type. Let me know of any thoughts here.

Comment thread src/dbnode/storage/index/block.go Outdated
Comment on lines +322 to +323
// Don't pass stats back from insertion into a cold block,
// we only care about warm mutable segments stats.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, can't we make that decision in the caller not to use the stats? Or does the "Combine()" method do it automatically? Could we make it a "CombineWithoutStats()" so that at least this doesn't respond with just an empty datastructure that we don't really know should be empty or not from the viewpoint of the caller?

@codecov

codecov Bot commented Jul 22, 2021

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 46.0%. Comparing base (1032c31) to head (25894ae).
⚠️ Report is 487 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (1032c31) and HEAD (25894ae). Click for more details.

HEAD has 88 uploads less than BASE
Flag BASE (1032c31) HEAD (25894ae)
collector 16 4
aggregator 16 4
m3em 16 4
cluster 16 4
msg 16 4
metrics 16 4
dbnode 16 0
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #3618      +/-   ##
=========================================
- Coverage    57.3%   46.0%   -11.4%     
=========================================
  Files         551     213     -338     
  Lines       64021   19078   -44943     
=========================================
- Hits        36712    8784   -27928     
+ Misses      24115    9608   -14507     
+ Partials     3194     686    -2508     
Flag Coverage Δ
aggregator 57.2% <ø> (ø)
cluster ∅ <ø> (∅)
collector 58.4% <ø> (ø)
dbnode ?
m3em 46.4% <ø> (ø)
metrics 19.8% <ø> (ø)
msg 74.4% <ø> (+<0.1%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report in Codecov by Harness.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1032c31...25894ae. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

if !ok {
return false
}
coldBlocks := entry.Series.ColdFlushBlockStarts(v)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method unfortunately doesn't quite do what I think you might think it does, it only returns cold blocks that "need to be flushed":
https://github.com/m3db/m3/blob/2e48cd9759e228a92114eaf2b20d28e7f1ee19aa/src/dbnode/storage/series/buffer.go#L436:L445

I would go and implement a new method on src/dbnode/storage/series/series.go which calls src/dbnode/storage/series/buffer.go and returns True if a block start has a cold bucket that's not empty.

i.e. something like:

func (b *dbBuffer) ColdWritesAtBlockStartExist(blockStart xtime.UnixNano) bool {
	bucketVersions, ok := b.bucketsMap[blockStart]
	if !ok {
		return false
	}

	for _, bucket := range bucketVersions.buckets {
		if bucket.writeType == ColdWrite && bucket.streamsLen() > 0 {
			return true
		}
	}

	return false
}

Base automatically changed from r/index-active-block to master August 26, 2021 23:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants