=====================
Full RGW Object Dedup
=====================

Full RGW object deduplication adds ``radosgw-admin`` commands to remove
duplicated RGW tail objects and to collect and report dedup statistics.


Admin Commands
==============

- ``radosgw-admin dedup estimate``:
   Starts a new dedup estimate session (first aborting any existing session).
   No changes are made to the existing system; only statistics are collected
   and reported.
- ``radosgw-admin dedup exec --yes-i-really-mean-it``:
   Starts a new dedup session (first aborting any existing session).
   Performs a full dedup, finding duplicated tail objects and removing them.

   This command can lead to **data loss** and should not be used on production
   data.
- ``radosgw-admin dedup pause``:
   Pauses an active dedup session (dedup resources are not released).
- ``radosgw-admin dedup resume``:
   Resumes a paused dedup session.
- ``radosgw-admin dedup abort``:
   Aborts an active dedup session, releasing all resources used by it.
- ``radosgw-admin dedup stats``:
   Collects and displays the statistics of the most recent dedup session.
- ``radosgw-admin dedup throttle --max-bucket-index-ops=<count>``:
   Sets the maximum number of bucket index read requests per second allowed
   for a single RGW server during dedup. A value of ``0`` means unlimited.
- ``radosgw-admin dedup throttle --stat``:
   Displays the current dedup throttle setting.


Skipped Objects
===============

The dedup estimate process skips the following objects:

- Objects smaller than :confval:`rgw_dedup_min_obj_size_for_dedup` (unless they
  are multipart).
- Objects with different placement rules.
- Objects with different pools.
- Objects with different storage classes.

The full dedup process skips all of the above and additionally skips
**compressed** and **user-encrypted** objects.
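
The per-object part of these rules can be sketched as a small predicate. This
is a minimal illustration, not the RGW implementation: the dictionary fields
(``size``, ``multipart``, ``compressed``, ``encrypted``) and the function name
``is_skipped`` are hypothetical, and the placement rule, pool, and storage
class conditions are not modeled here.

.. code-block:: python

   def is_skipped(obj, min_obj_size, full_dedup):
       """Return True if the object is excluded from dedup processing."""
       # Small objects are skipped unless they are multipart.
       if obj["size"] < min_obj_size and not obj["multipart"]:
           return True
       # Only the full dedup process excludes compressed and
       # user-encrypted objects; the estimate process keeps them.
       return full_dedup and (obj["compressed"] or obj["encrypted"])


   compressed_obj = {
       "size": 8 * 1024 * 1024,
       "multipart": False,
       "compressed": True,
       "encrypted": False,
   }
   counted_in_estimate = not is_skipped(compressed_obj, 64 * 1024, full_dedup=False)
   handled_by_full_dedup = not is_skipped(compressed_obj, 64 * 1024, full_dedup=True)

The ``64 * 1024`` threshold is an arbitrary value chosen for the example, not
the default of :confval:`rgw_dedup_min_obj_size_for_dedup`.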

The minimum object size for dedup is controlled by the following
configuration option:

.. confval:: rgw_dedup_min_obj_size_for_dedup


Estimate Processing
===================

The dedup estimate process collects all the information it needs directly from
the bucket indices, reading each bucket index object in full, 1000 entries at
a time.

The bucket index objects are sharded among the participating members so that
each bucket index object is read exactly once. The sharding allows processing
to scale almost linearly, splitting the load evenly among the participating
members.
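
The effect of this sharding can be pictured with a minimal sketch. The
round-robin rule and the names ``assign_shards``, ``num_shards``, and
``num_members`` are illustrative assumptions, not the actual RGW assignment
logic. The sketch only shows the two properties described above: every bucket
index object is assigned to exactly one member, and the load is spread evenly.

.. code-block:: python

   def assign_shards(num_shards, num_members):
       """Map each member to the bucket index objects it reads.

       Round-robin is an assumed rule used only for illustration.
       """
       assignment = {member: [] for member in range(num_members)}
       for shard in range(num_shards):
           assignment[shard % num_members].append(shard)
       return assignment


   plan = assign_shards(num_shards=10, num_members=3)
   for member, shards in plan.items():
       print(f"member {member} reads bucket index objects {shards}")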

The dedup estimate process does not access the objects themselves
(data/metadata), which means its processing time won't be affected by the
underlying media (SSD/HDD) storing the objects. The bucket indices are
virtually always accessed from a fast medium: placement on SSD
:ref:`is recommended <hardware-recommendations>` and they are cached heavily
in memory.

The administrator can throttle the estimate process by limiting the number of
bucket index reads per second per RGW server (each read returns 1000 object
entries) using:

.. prompt:: bash #

   radosgw-admin dedup throttle --max-bucket-index-ops=<count>

A typical RGW server performs about 100 bucket index reads per second (that
is, 100,000 object entries per second). For example, setting ``count`` to 50
would typically halve the scan rate, roughly doubling the time the estimate
process takes.
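
The arithmetic behind the throttle can be sketched as follows. This is a rough
back-of-the-envelope model, assuming that the throttle is the only bottleneck
and that the work is spread evenly across the RGW servers. The function name
``estimate_scan_seconds`` and its parameters are hypothetical and are not part
of ``radosgw-admin``.

.. code-block:: python

   ENTRIES_PER_READ = 1000  # object entries returned per bucket index read


   def estimate_scan_seconds(num_objects, num_rgw_servers, max_bucket_index_ops):
       """Rough time needed to list all bucket index entries under a throttle."""
       if max_bucket_index_ops <= 0:
           raise ValueError("0 means unlimited, so no throttle-based bound applies")
       reads_needed = -(-num_objects // ENTRIES_PER_READ)  # ceiling division
       reads_per_second = max_bucket_index_ops * num_rgw_servers
       return reads_needed / reads_per_second


   # 100 million objects scanned by 4 RGW servers.
   at_100_ops = estimate_scan_seconds(100_000_000, 4, 100)  # 250.0 seconds
   at_50_ops = estimate_scan_seconds(100_000_000, 4, 50)    # 500.0 seconds

Halving the per-server limit from 100 to 50 doubles the estimated scan time in
this model.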


Full Dedup Processing
=====================

The full dedup process begins by constructing a dedup table from the bucket
indices, similar to the estimate process above.

This table is then scanned linearly to purge objects without duplicates,
leaving only dedup candidates.
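
The table construction and the linear purge can be sketched as follows. The
table key is an assumption made for illustration: the example uses a
hypothetical ``(etag, size)`` pair taken from the bucket index entry, and the
actual key and table layout used by RGW may differ.

.. code-block:: python

   from collections import defaultdict


   def find_dedup_candidates(index_entries):
       """Group bucket index entries by key and keep keys seen more than once."""
       table = defaultdict(list)
       for name, key in index_entries:
           table[key].append(name)
       # Linear scan: purge keys that map to a single object (no duplicates).
       return {key: names for key, names in table.items() if len(names) > 1}


   entries = [
       ("obj-a", ("etag-1", 8388608)),
       ("obj-b", ("etag-2", 8388608)),
       ("obj-c", ("etag-1", 8388608)),
   ]
   candidates = find_dedup_candidates(entries)

Here only ``obj-a`` and ``obj-c`` remain as dedup candidates, because
``obj-b`` has no other object with the same key.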

Next, we iterate through these dedup candidate objects, reading their complete
information from the object metadata (a per-object RADOS operation). During
this step, we filter out **compressed** and **user-encrypted** objects.

Following this, we calculate a cryptographically strong hash of the candidate
object data. This involves a full-object read, which is a resource-intensive
operation. The hash ensures that the dedup candidates are indeed perfect
matches. If they are, we proceed with the deduplication:

- Increment the reference count on the source tail objects one by one.
- Copy the manifest from the source to the target.
- Remove all tail objects on the target.


Split Head Mode
===============

The dedup code can split a head object into two objects:

- a head object with attributes and no data, and
- a new tail object with only data.

The new tail object can then be deduplicated, unlike head objects, which
cannot be deduplicated. This feature is enabled only for RGW objects without
existing tail objects (in other words, objects of 4 MB or less).
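
A minimal sketch of the split, using plain dictionaries as stand-ins for RGW
objects. The field names (``attrs``, ``data``, ``tails``), the tail name
``tail.0``, and the function ``split_head`` are hypothetical and only
illustrate the eligibility rule and the resulting layout.

.. code-block:: python

   HEAD_MAX_SIZE = 4 * 1024 * 1024  # 4 MB limit stated above


   def split_head(head):
       """Return (new_head, new_tail), or None if the object is not eligible."""
       if head["tails"] or len(head["data"]) > HEAD_MAX_SIZE:
           return None  # only objects without existing tail objects are split
       # The head keeps its attributes but no data.
       new_head = {"attrs": dict(head["attrs"]), "data": b"", "tails": ["tail.0"]}
       # The new tail carries only the data and can be deduplicated.
       new_tail = {"name": "tail.0", "data": head["data"]}
       return new_head, new_tail


   head = {"attrs": {"content-type": "text/plain"}, "data": b"hello", "tails": []}
   result = split_head(head)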


Memory Usage
============

+------------------+----------+
| RGW Object Count | Memory   |
+==================+==========+
| 1M               | 8 MB     |
+------------------+----------+
| 4M               | 16 MB    |
+------------------+----------+
| 16M              | 32 MB    |
+------------------+----------+
| 64M              | 64 MB    |
+------------------+----------+
| 256M             | 128 MB   |
+------------------+----------+
| 1024M (1G)       | 256 MB   |
+------------------+----------+
| 4096M (4G)       | 512 MB   |
+------------------+----------+
| 16384M (16G)     | 1024 MB  |
+------------------+----------+
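
In the table above, memory doubles each time the object count quadruples, so
the listed values grow with the square root of the object count. The sketch
below reproduces the table rows under that observation. It is a fit to the
published values, not a documented formula, and the function name
``dedup_memory_mb`` is hypothetical.

.. code-block:: python

   import math


   def dedup_memory_mb(objects_in_millions):
       """Memory in MB matching the table: 8 MB at 1M objects, 2x per 4x objects."""
       return 8 * math.sqrt(objects_in_millions)


   for millions in (1, 4, 16, 64, 256, 1024, 4096, 16384):
       print(f"{millions}M objects -> {dedup_memory_mb(millions):.0f} MB")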