Cleaning is an important process on a Data Domain system because it reclaims the space held by data that is no longer live, rather than overwriting data in place. Unfortunately, cleaning can impact system performance, and in unusual cases it can take more than 24 hours to complete. This document helps identify what is happening during a particular phase. To get an idea of how long previous cleaning sessions have taken, search the messages log for "cleaning completed".
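The log search described above could be scripted roughly as follows. This is a minimal sketch, assuming the messages log has been saved as a plain text file; the path passed to `scan_log` and both function names are hypothetical, and only the search phrase comes from this document. The log line format is not described here, so matching lines are returned unparsed.

```python
import re
from typing import Iterable, List

# The search phrase comes from the article; matching is case-insensitive
# as a precaution, since the exact log wording is not documented here.
PATTERN = re.compile(r"cleaning completed", re.IGNORECASE)


def find_cleaning_completions(lines: Iterable[str]) -> List[str]:
    """Return every log line that mentions a completed cleaning run."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


def scan_log(path: str) -> List[str]:
    """Scan a saved copy of the messages log (the path is hypothetical)."""
    with open(path, "r", errors="replace") as handle:
        return find_cleaning_completions(handle)
```

Comparing the timestamps on the returned lines with the lines that mark the start of each run gives a rough duration for past cleaning sessions.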
DD OS 4.2 and earlier:
In DD OS 4.2 and earlier, cleaning has 6 phases. These phases are also present in DD OS 4.3 and later, so understanding them helps in understanding the later cleaning process.
- Candidate - the candidate phase is run to select a subset of data to clean and remember what is in the data.
- Enumerate - enumerate all the files in the logical space and remember what data is alive.
- Merge - do an index merge to flush index data to disk.
- Filter - if duplicate data has been written, find out where it is.
- Copy - copy live data forward and free the space it occupied.
- Summary - create a summary of the live data that's on the system.
DD OS 4.3 through DD OS 5.4:
Beginning in DD OS 4.3, the cleaning process (Full Cleaning) takes one of two paths, depending on the number of containers in use, because there is a limit on the number of containers that can be cleaned in a single cleaning run.
A filesystem that uses more containers than the limit requires sampling. In that case, cleaning focuses on the subset of containers with the most reclaimable space, and every phase listed below runs, including phases 5-8.
Phases 6-9 then restrict their working set to the candidate containers obtained in phase 5.
DDR models have different amounts of memory, and the amount of physical space that can be cleaned in a single cleaning run depends on the memory available. On fairly empty systems, where the number of containers in use is below 25-30% of the total container set, all of the physical space can be cleaned in a single run. Cleaning completes much more quickly on these systems because it skips directly from phase 4 to phase 9 (the copy phase), eliminating phases 5-8. Note that the skipped phases are displayed as 100% complete.
- Pre-enumerate (phase 1) - enumerate all the files in the logical space. This phase may sample only part of the data to help estimate where live data is located in physical space.
- Pre-merge (phase 2) - do an index merge to flush index data to disk.
- Pre-filter (phase 3) - if duplicate data has been written, find out where it is.
- Pre-select (phase 4) - select the physical space that has the most dead data. This is the space that cleaning should target.
At this point the cleaning process will follow one of the two paths described above, depending on the number of containers in the filesystem.
- Candidate (phase 5) - due to memory limitations, only a fraction of physical space can be cleaned in each cleaning run. The candidate phase is run to select a subset of data to clean and remember what is in the data.
- Enumerate (phase 6) - enumerate all the files in the logical space and remember what data is live.
- Merge (phase 7) - do an index merge to flush index data to disk.
- Filter (phase 8) - determine what duplicate data has been written and find out where it is.
- Copy (phase 9) - copy live data forward and free the space it used to occupy.
- Summary (phase 10) - create a summary of the live data that's on the system.
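The two paths through Full Cleaning can be sketched as a small decision function. This is an illustration only, not the DD OS implementation: the function name and the `container_limit` parameter are hypothetical, the real limit depends on the memory of the DDR model, and the phase numbers follow the order of the lists above.

```python
from typing import List

# Phase names in the order given in this document (phase 1 is first).
FULL_CLEANING_PHASES = [
    "pre-enumerate", "pre-merge", "pre-filter", "pre-select",
    "candidate", "enumerate", "merge", "filter", "copy", "summary",
]


def phases_executed(containers_in_use: int, container_limit: int) -> List[int]:
    """Return the 1-based phase numbers that do real work in one run.

    container_limit is a hypothetical stand-in for the per-run limit on
    the number of containers that can be cleaned.
    """
    if containers_in_use <= container_limit:
        # Everything fits in one run: skip from phase 4 straight to phase 9.
        skipped = {5, 6, 7, 8}
    else:
        # Sampling path: every phase runs, and phases 6-9 work only on the
        # candidate containers chosen in phase 5.
        skipped = set()
    total = len(FULL_CLEANING_PHASES)
    return [n for n in range(1, total + 1) if n not in skipped]
```

On the short path the skipped phases still appear in the progress display, reported as 100% complete.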
Other information about GC/cleaning:
Note that the phase numbers may be different for the version of DD OS on your system. For Full Cleaning, phase 1 (pre-enumeration) and phase 6 (enumeration) can take a long time when the following conditions are present:
- Poor Lp locality: Cleaning will be slowed if there is significant fragmentation across containers.
- Very high global compression: if two DDRs consume the same amount of physical space (i.e. the same number of containers) and one has 50x global compression while the other has 100x, enumeration takes longer on the second DDR because it has a much larger logical space to traverse.
- Many small files
- Replication is lagging behind.
- For Full Cleaning, the runtime of phase 1 and phase 6 depends on the logical size of the filesystem (i.e. Logical Bytes).
- The runtime of the other phases depends on the physical size of the filesystem (i.e. # of containers in use).
- Performance bottlenecks:
  - Before DD OS 4.5, index merge could be a performance bottleneck. This was fixed in DD OS 4.5 and later.
  - The pre-enumeration, enumeration, and copy phases are the most time-consuming phases in GC/cleaning.
- The copy phase (phase 9 in Full Cleaning or phase 11 in Physical Cleaning) can take a long time in the following cases:
  - High live percentage in the containers selected for copy forward: not enough physical data was deleted before GC/cleaning ran. It is possible that GC/cleaning is being run more often than needed.
  - Additional processing: re-encryption, recompression, features, sketching.
    - Features exist in DD OS 5.0 and later. A system upgraded from pre-5.0 to 5.0 or later will experience slowness in the first round of GC/cleaning because features are computed for each container.
    - Enabling delta replication requires sketches, which adds an extra cycle in GC/cleaning to recompute them during the copy phase.
    - GZ local compression is significantly more expensive than LZ.
    - The performance cost of encryption and key rotation (5.2) is significant.
- Note that the percentage complete in the enumeration phase (phase 6 in Full Cleaning or phase 9 in Physical Cleaning) can actually drop if new files are added to the system while cleaning is in progress, or if a new fastcopy or checkpoint is created.
- Copy phase (phase 9 in Full Cleaning or phase 11 in Physical Cleaning) generally takes the longest as this is where deleting/copying takes place. This is where the results of the cleaning can be observed with the "df" command.
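The global compression condition above comes down to simple arithmetic: for the same physical footprint, a higher compression factor means more logical data for the enumeration phases to traverse. The sketch below is a rough illustration. The function name and the numbers are made up, and it assumes local compression is the same on both systems, so it defaults to a factor of 1.

```python
def logical_size(physical_size: float, global_compression: float,
                 local_compression: float = 1.0) -> float:
    """Estimate logical size from physical size and compression factors.

    Assumption: logical size = physical size x global x local compression.
    Units are whatever physical_size is expressed in.
    """
    return physical_size * global_compression * local_compression


# Two hypothetical DDRs with the same physical footprint (100 units).
ddr_a = logical_size(100.0, 50.0)    # 50x global compression
ddr_b = logical_size(100.0, 100.0)   # 100x global compression

# The second system has twice the logical space to enumerate.
ratio = ddr_b / ddr_a
```

Under these assumptions the second DDR has twice as much logical space, so phases 1 and 6 have roughly twice as much to traverse, while the phases that depend on physical size are unaffected.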