DOC-13858 Server File-Based Rebalance for Data Service #4105

Open
rao-shwe wants to merge 9 commits into release/8.1 from DOC-13858-fbr-for-data-service
Conversation

@rao-shwe rao-shwe commented Apr 15, 2026

This is my reformatting/editing of Schewtha's draft of the FBR docs.

Important Note: This draft does not contain documentation for the Web Console's FBR settings. The UI wasn't ready when this draft was written and revised. This draft is mainly for review before inclusion in the early Totoro release. Other features are taking priority over getting the GUI documentation done.

Cribbing CoPilot's summary of these changes:

Adds documentation for the new Data Service File-Based Rebalance (FBR) feature in Couchbase Server Totoro, including a new REST API reference page, navigation entry, conceptual coverage in the rebalance learn page, a new bucket-level dataServiceRebalanceType parameter, and updates to general settings, node management, and the 8.1 new-features page. Also performs cosmetic cleanup (removing italic emphasis) in the rebalance learn page and updates the shared user/pwd/host/port REST parameter partial.

Changes (with links to the preview):

You will need the Docs Team credentials on Confluence to view the preview.

@ggray-cb ggray-cb force-pushed the DOC-13858-fbr-for-data-service branch from b86953c to 3a8583c Compare May 14, 2026 13:16

Copilot AI left a comment

Pull request overview

Adds documentation for the new Data Service File-Based Rebalance (FBR) feature in Couchbase Server 8.1, including a new REST API reference page, navigation entry, conceptual coverage in the rebalance learn page, a new bucket-level dataServiceRebalanceType parameter, and updates to general settings, node management, and the 8.1 new-features page. Also performs cosmetic cleanup (removing italic emphasis) in the rebalance learn page and updates the shared user/pwd/host/port REST parameter partial.

Changes:

  • New REST reference page file-based-data-rebalance.adoc documenting GET/POST /internalSettings usage for FBR, plus nav and rebalance-table entries.
  • New dataServiceRebalanceType bucket parameter documentation added to rest-bucket-create.adoc, with cross-references from general settings, add-node-and-rebalance, and the rebalance learn page.
  • Conceptual FBR section added to rebalance.adoc and an 8.1 new-features entry, plus formatting cleanup of italic markers across rebalance.adoc and rest-rebalance-overview.adoc.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| modules/ROOT/nav.adoc | Adds new FBR REST page to the navigation. |
| modules/rest-api/partials/user_pwd_host_port_params.adoc | Renames placeholders to uppercase (HOST, PORT, USER, PASSWORD). |
| modules/rest-api/partials/rest-rebalance-table.adoc | Adds GET/POST /internalSettings rows for FBR. |
| modules/rest-api/pages/rest-rebalance-overview.adoc | Removes italic emphasis from prose. |
| modules/rest-api/pages/rest-bucket-create.adoc | Documents new dataServiceRebalanceType bucket parameter with examples. |
| modules/rest-api/pages/file-based-data-rebalance.adoc | New REST reference page for configuring FBR via /internalSettings. |
| modules/manage/pages/manage-settings/general-settings.adoc | Adds description of the new FBR concurrent-moves UI setting. |
| modules/manage/pages/manage-nodes/add-node-and-rebalance.adoc | Adds note that FBR is used automatically during node addition (EE). |
| modules/learn/pages/clusters-and-availability/rebalance.adoc | Adds conceptual FBR section and removes italic emphasis throughout. |
| modules/introduction/partials/new-features-81.adoc | Adds the FBR entry under Data Service new features. |

Comment thread modules/rest-api/pages/file-based-data-rebalance.adoc Outdated
Comment thread modules/rest-api/pages/file-based-data-rebalance.adoc Outdated
Comment thread modules/rest-api/pages/file-based-data-rebalance.adoc Outdated
Comment thread modules/rest-api/pages/file-based-data-rebalance.adoc Outdated
Comment thread modules/rest-api/pages/file-based-data-rebalance.adoc Outdated
Comment thread modules/learn/pages/clusters-and-availability/rebalance.adoc Outdated
Comment thread modules/learn/pages/clusters-and-availability/rebalance.adoc Outdated
Comment thread modules/learn/pages/clusters-and-availability/rebalance.adoc Outdated
Comment thread modules/rest-api/pages/file-based-data-rebalance.adoc Outdated
Comment thread modules/ROOT/nav.adoc
ggray-cb and others added 5 commits May 19, 2026 16:00
Committing some of CoPilot's suggestions

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@ggray-cb ggray-cb self-assigned this Jun 24, 2026
@ggray-cb ggray-cb marked this pull request as ready for review June 24, 2026 19:51

@hyunjuV hyunjuV left a comment

  1. Information about the File-Based Rebalance throttle rate setting, snapshot_download_throttle_bytes, should be added, along with the metric kv_ep_snapshot_read_bytes, which can be monitored to view the throughput.

  2. The rest of the comments are mostly about places where the text suggests that the rebalance type decision is made for each vBucket (not true), and that the decision depends on scenarios such as whether file-based or DCP rebalance is likely to perform x percent faster than the other, or whether all data is 100% memory resident (not true as far as I'm aware, but should double check with Ben Huddleston).

While it is true that ephemeral buckets have no backfill phase (so, no file-based rebalance), for non-ephemeral buckets I do not believe that memory residency is considered in the decision to do file-based rebalance or DCP rebalance.

Comment on lines +31 to +32
* *Automatic rebalance type selection*: The server automatically determines whether FBR or DCP is more efficient for each vBucket move.
When FBR is not applicable or not expected to be faster, the server falls back to DCP automatically.

I do not believe that this is true the way it's written (where the server automatically determines which method is more efficient for each vBucket move).

When FBR is enabled (which it is, by default), the backfill is done using the same method for all vBuckets. Generally, it's always done using file-based rebalance unless a storage migration or eviction policy change is pending. So, the "Automatic rebalance type selection" does apply in scenarios where DCP rebalance is required for all phases, such as when a storage migration or eviction policy change is pending. If DCP rebalance is required for all phases, then that's what the Server will do (even with FBR enabled).

cc @BenHuddleston

The server does not know whether FBR or DCP is more efficient for any given vBucket move. The server will use FBR if possible for any given vBucket; otherwise DCP is used. FBR can only be used for new vBucket builds (i.e. the vBucket was not previously on the node on which it is being built), only if configured to do so (FBR settings are enabled), and only if no storage/eviction policy migration is currently being done.

Some of this confusion probably comes from calling this "file based rebalance", as the actual feature is file based backfill (one particular rebalance step).
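
The eligibility rule described in this comment can be sketched as a small predicate. This is only an illustration of the stated conditions, not the server's implementation: the `VBucketMove` type, its field names, and the `backfill_method` function are hypothetical, and the ephemeral-bucket condition comes from other comments in this review rather than from this one.

```python
from dataclasses import dataclass


@dataclass
class VBucketMove:
    """Hypothetical description of one vBucket move; field names are illustrative."""
    is_new_build: bool                 # vBucket was not previously on the destination node
    fbr_enabled: bool                  # FBR settings are enabled
    migration_in_progress: bool        # storage or eviction policy migration under way
    bucket_is_ephemeral: bool = False  # ephemeral buckets have no persistent files to copy


def backfill_method(move: VBucketMove) -> str:
    """Return "FBR" when the move is eligible, otherwise "DCP".

    Only eligibility is checked; no performance estimate is involved.
    """
    eligible = (
        move.is_new_build
        and move.fbr_enabled
        and not move.migration_in_progress
        and not move.bucket_is_ephemeral
    )
    return "FBR" if eligible else "DCP"
```

Note that nothing in the predicate compares the expected speed of the two methods, which is the point the comment makes.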

Suggest:

  • Automatic rebalance type selection: The server automatically determines whether FBR can be used for a given vBucket move.
    When FBR is not applicable, the server falls back to DCP automatically.

Thanks for the clarification @BenHuddleston .

The below summary is to correct any mistaken comments I may have made in my original review comments:

  • The server does automatically determine whether FBR can be used for a given vBucket move (or, as in the documentation phrasing, for each vBucket move).
  • However, this determination is not made based on perf considerations (like whether file copy or DCP would be faster).
  • Also, this determination is not made based on memory residency (like whether data is 100% memory resident).
  • But, for ephemeral buckets, DCP is always used (i.e. file copy is not used since there's no persistent storage).

So, as noted by Ben in his suggestion above, best to just say that the server automatically determines whether file-based or DCP rebalance is applicable and not go into specifics.


* *Separate vBucket move concurrency for FBR*: A new setting, `dataServiceFileBasedRebalanceMovesPerNode`, controls the maximum number of concurrent file-based vBucket moves per node.
This is independent of the existing `rebalanceMovesPerNode` setting, which applies to DCP rebalance.

There is also a new File-Based Rebalance throttle rate setting (snapshot_download_throttle_bytes), which sets the maximum rate at which file-based rebalance snapshots are transferred between nodes. Very high transfer rates mean that rebalance proceeds very quickly but may have a negative impact on KV operation latencies during rebalance. A value of 0 means that snapshot transfer is unthrottled.

Additional details:

  1. snapshot_download_throttle_bytes is an option in GET, POST /pools/default/settings/memcached/global

  2. In the UI (under Advanced Rebalance Settings for the Data Service), the setting of 150 MiB/s translates to snapshot_download_throttle_bytes=157286400. By default, the value is 0 (unthrottled).

  3. The rate of FBR snapshots transfer can be seen with a rate function applied to the metric kv_ep_snapshot_read_bytes.
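
As a worked check of the numbers above, the following sketch converts a MiB/s rate into the byte value used by snapshot_download_throttle_bytes, and shows the arithmetic behind a rate function. The helper names are hypothetical, and the rate helper assumes that kv_ep_snapshot_read_bytes is a cumulative byte counter sampled at two points in time, which the comment implies but does not state.

```python
MIB = 1024 * 1024  # bytes per mebibyte


def throttle_bytes(mib_per_second: int) -> int:
    """Convert a MiB/s rate to a snapshot_download_throttle_bytes value."""
    return mib_per_second * MIB


def transfer_rate(prev_bytes: int, curr_bytes: int, interval_seconds: float) -> float:
    """Average bytes per second between two samples of a cumulative byte counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (curr_bytes - prev_bytes) / interval_seconds


print(throttle_bytes(150))  # 157286400
```

In practice the rate would normally be computed by the monitoring system's own rate function rather than by hand.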

image::clusters-and-availability/replicaVbucketMove.png[,640,align=left]

The move has two principal phases. Phase 1 is _Backfill_. Phase 2 is _Book-keeping_.
The move has two principal phases. Phase 1 is Backfill. Phase 2 is Book-keeping.

The picture and the description are slightly different for file-based vs DCP rebalance.
The existing pictures are for DCP rebalance.

FYI -- this Google doc has info on how the picture would look different for file-based rebalance.


The move has four principal phases.
Phase 1, _Backfill_, and Phase 2, _Book-keeping_, are identical to those required for replica vBuckets; except that the _Book-keeping_ phase includes additional _Persistence Time_.
Phase 1, Backfill, and Phase 2, Book-keeping, are identical to those required for replica vBuckets; except that the Book-keeping phase includes additional Persistence Time.

The same applies in this section as in "Rebalance Phases for Replica vBuckets".
The picture and the description are slightly different for file-based vs DCP rebalance.
The existing pictures are for DCP rebalance.

FYI -- this Google doc has info on how the picture would look different for file-based rebalance.


Since vBucket moves are highly resource-intensive, Couchbase Server allows the concurrency of such moves to be _limited_: a setting is provided that determines the maximum number of concurrent vBucket moves permitted on any node.
Since vBucket moves are highly resource-intensive, Couchbase Server allows the concurrency of such moves to be limited: a setting is provided that determines the maximum number of concurrent vBucket moves permitted on any node.
The minimum value for the setting is `1`, the maximum `64`, the default `4`.

For DCP rebalance, the minimum value for the setting is 1, the maximum 64, the default 4.
For FBR, the minimum value for the setting is 1, the maximum 1024, the default 4. (I see that this info is presented in the file-based rebalance section.)

Both settings can be set as high as 1024.
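
A minimal sketch of the range check this implies, assuming the bounds given in this thread (minimum 1, default 4, and a maximum of 1024 for both `rebalanceMovesPerNode` and `dataServiceFileBasedRebalanceMovesPerNode`). The constant and function names are hypothetical, and the draft text under review still states a maximum of 64 for the DCP setting.

```python
DEFAULT_MOVES_PER_NODE = 4
MIN_MOVES_PER_NODE = 1
MAX_MOVES_PER_NODE = 1024  # per this thread; the draft text gives 64 for the DCP setting


def validate_moves_per_node(value: int = DEFAULT_MOVES_PER_NODE) -> int:
    """Return value unchanged if it lies in the accepted range, else raise ValueError."""
    if not MIN_MOVES_PER_NODE <= value <= MAX_MOVES_PER_NODE:
        raise ValueError(
            f"moves per node must be between {MIN_MOVES_PER_NODE} "
            f"and {MAX_MOVES_PER_NODE}, got {value}"
        )
    return value
```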

Comment on lines +191 to +192
Scenarios where DCP may be faster::
For example, DCP can be faster when the data resident ratio is 100%.

I'm not sure that the Server automatically does DCP rebalance (for the backfill) in this scenario -- should check with @BenHuddleston

We do not; FBR is used whenever it is possible to do so. It's quite hard to determine which type of backfill would be faster, as it likely depends on disk characteristics, storage backend, fragmentation, and item sizes.

==== Performance

The primary goal of FBR is to deliver significant improvements to rebalance speed for large datasets.
The target throughput is 1 TB of data movement in 30 minutes.

I don't think that we should be specific since the actual rebalance time depends on too many variables. Please remove line 198.

Changing one setting does not affect the other.

The setting may be established by means of the xref:manage:manage-settings/general-settings.adoc#rebalance-settings[Couchbase Web Console] or the xref:manage:manage-settings/general-settings.adoc#rebalance-settings-via-rest[REST API].

@hyunjuV hyunjuV Jul 2, 2026

Need to add info about FBR throttling option -- see info below.

=== File-Based Rebalance Throttle Rate Setting

Since file-based rebalance can increase network usage, the file transfer rate can be throttled if needed. By default, there is no throttling.
The File-Based Rebalance throttle rate setting is called snapshot_download_throttle_bytes, and it sets the maximum rate at which file-based rebalance snapshots are transferred between nodes. Very high transfer rates mean that rebalance proceeds very quickly but may have a negative impact on KV operation latencies during rebalance. A value of 0 means that snapshot transfer is unthrottled.

Additional details:

snapshot_download_throttle_bytes is an option in GET, POST /pools/default/settings/memcached/global

In the UI (under Advanced Rebalance Settings for the Data Service), the setting of 150 MiB/s translates to snapshot_download_throttle_bytes=157286400. By default, the value is 0 (unthrottled).

The rate of FBR snapshots transfer can be seen with a rate function applied to the metric kv_ep_snapshot_read_bytes.
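
For the documentation example, a request that applies the 150 MiB/s throttle might be built as sketched below. The endpoint path and parameter name come from the comment above; everything else is an assumption for illustration: the `build_throttle_request` helper is hypothetical, HOST, USER, and PASSWORD are placeholders, port 8091 is assumed, and the form-encoded body and basic authentication are assumed to match other cluster REST calls. The sketch only constructs the request and does not send it.

```python
import base64
import urllib.parse
import urllib.request


def build_throttle_request(host, port, user, password, throttle_bytes):
    """Build (but do not send) a POST that sets snapshot_download_throttle_bytes."""
    url = f"http://{host}:{port}/pools/default/settings/memcached/global"
    # Assumption: the endpoint accepts a form-encoded body.
    body = urllib.parse.urlencode(
        {"snapshot_download_throttle_bytes": throttle_bytes}
    ).encode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    # Assumption: HTTP basic authentication with administrator credentials.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request


# 150 MiB/s expressed in bytes, as in the comment above.
req = build_throttle_request("HOST", 8091, "USER", "PASSWORD", 157286400)
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Passing the request to `urllib.request.urlopen` would send it; a value of 0 in place of 157286400 would restore the unthrottled default.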

Therefore, replication has successfully distributed the contents of `travel-sample` across both nodes, providing a single replica vBucket for each active vBucket.

NOTE: By default, Couchbase Server Enterprise Edition automatically uses File-Based Rebalance (FBR) to move data for eligible vBuckets during node addition.
The server selects the optimal rebalance method for each vBucket move transparently.

Line 190 should be removed -- the rebalance method is not selected for each vBucket move.

See previous comment, it is selected on a per-vBucket basis.

Per Ben's comment, line 190 is OK.

IMO "optimal" implies "the most performant" which we do not consider at all (see other comments). I would remove line 190 or at least the word optimal.


The valid values are:

* `auto` (default): The server automatically selects File-Based Rebalance (FBR) or DCP for each vBucket move based on which is estimated to be at least 10% faster.

As far as I'm aware:

  • Not true that file-based rebalance or DCP is chosen for each vBucket move.
  • Also not true that file-based or DCP decision is made based on estimates of which is likely to be faster.

Should just say:
The server automatically selects File-Based Rebalance (FBR) or DCP.

cc @BenHuddleston

See previous, it is selected on a per-vBucket basis and we do not consider the efficiency of the move, only the eligibility.

Per Ben's comment:

The server does automatically select File-Based Rebalance (FBR) or DCP for each vBucket move, but the selection is not based on which is estimated to be faster (nor on any other performance estimate). So, it is best to just say that the server selects automatically, without going into details on how.
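
For reference, a bucket-creation form body that passes the parameter under discussion could look like the sketch below. Only the parameter name `dataServiceRebalanceType` and the value `auto` appear in this thread; the `bucket_create_form` helper, the bucket name, and the `ramQuota` value are illustrative assumptions, and other valid values of the parameter are not shown here.

```python
import urllib.parse


def bucket_create_form(name: str, ram_quota_mib: int, rebalance_type: str = "auto") -> str:
    """Build a form-encoded body for a bucket-creation request (illustrative only)."""
    return urllib.parse.urlencode(
        {
            "name": name,
            "ramQuota": ram_quota_mib,  # assumed quota parameter, in MiB
            "dataServiceRebalanceType": rebalance_type,
        }
    )


print(bucket_create_form("demo-bucket", 256))
# name=demo-bucket&ramQuota=256&dataServiceRebalanceType=auto
```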

Added FBR-specific versions of the rebalance phase diagrams.
5 participants