June 04, 2015

DD Replication Sizing Guide


PURPOSE

Several factors must be taken into consideration when setting up replication. This article explains the following considerations:

  • Disk Space Capacity
  • Network Bandwidth
  • CPU Performance
APPLIES TO

  • All Data Domain systems
  • Software releases 4.7 and above
  • Replication
SOLUTION

When configuring two machines for replication, it is essential to ensure that both machines are adequately sized for the job. Check the file system sizes on the source and destination machines to confirm that the destination has sufficient capacity to handle the data load placed on it by the source.

In the case of directory replication, it is also necessary to ensure that the processing power of both machines is roughly equal, to reduce the possibility of significant lag between the source and destination.

Determine the system capacity. At the Data Domain system prompt type:
filesys show space

==========  SERVER USAGE   ==========  
Resource             Size GiB   Used GiB   Avail GiB   Use%   Cleanable GiB*  
------------------   --------   --------   ---------   ----   --------------  
/backup: pre-comp           -        0.0           -      -                -  
/backup: post-comp    32877.8        0.0     32877.8     0%              0.0  
/ddvar                  189.0        8.7       170.7     5%                -  
------------------   --------   --------   ---------   ----   --------------  
Figure 1

This command displays the system's disk usage in GiB.
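If the available capacity is needed programmatically, output like Figure 1 can be parsed with a short script. The following is an illustrative sketch only: the sample line is copied from Figure 1, and the whitespace-delimited field layout is an assumption about this output format.

```python
# Sketch: parse the /backup post-comp line from `filesys show space`
# output (format as in Figure 1) to recover size/used/avail in GiB.
# The sample line below is taken verbatim from Figure 1.

sample = "/backup: post-comp    32877.8        0.0     32877.8     0%              0.0"

def parse_post_comp(line):
    # Assumed field order: path, label, Size GiB, Used GiB, Avail GiB, Use%, Cleanable GiB
    fields = line.split()
    return {"size_gib": float(fields[2]),
            "used_gib": float(fields[3]),
            "avail_gib": float(fields[4])}

print(parse_post_comp(sample))
# → {'size_gib': 32877.8, 'used_gib': 0.0, 'avail_gib': 32877.8}
```

The parsed `avail_gib` value is the figure used in the sizing comparisons described in the next section.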

Replication System and Directory Size Restrictions

The Avail GiB value shows the system disk space that is available for backup data. In this example, the total system size of this machine is 32877.8 GiB. If this output were from a destination Data Domain system configured for collection replication (article 181589), the source Data Domain system could be any size up to, but not exceeding, 32877.8 GiB. If configured for directory replication (article 181594), the source Data Domain system can be larger than the destination Data Domain system; however, the directory to be replicated at the source cannot be larger than the destination directory. To summarize, the destination location for replication must be equal to, or larger than, the source replication context.

The following subsections describe the factors that affect the performance of collection and directory replication, and provide guidelines for sizing systems to meet performance goals.
TCP/IP Performance Considerations

Both collection and directory replication use TCP/IP for networking; therefore, their performance is also limited by the performance of TCP/IP. In particular, TCP/IP handles dropped packets poorly and has difficulty with high-bandwidth, high-delay networks. Packet drop rates as low as 0.1% severely degrade network throughput, particularly on high-bandwidth, high-delay networks. If the network drops packets, the only workaround is to use a WAN acceleration appliance such as those provided by Cisco, Silverpeak, Juniper, or Riverbed, among others.

  • Networks with bandwidth <= T2 and RTT (Round Trip Time) up to one second provide good throughput.
  • Networks >= T3 encounter significant throughput degradation starting at an RTT of 300-500 ms.
More generally, throughput under packet loss is approximately:

Throughput = MSS / (RTT * sqrt(p))

where:
  MSS = maximum segment size (typically 1460 bytes)
  RTT = round-trip time
  p   = probability of packet loss
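As a worked example of the formula above (the function name and the RTT/loss figures below are illustrative, not from any particular network):

```python
import math

def tcp_throughput_bytes_per_s(mss_bytes, rtt_s, loss_prob):
    """Approximate TCP throughput: MSS / (RTT * sqrt(p))."""
    return mss_bytes / (rtt_s * math.sqrt(loss_prob))

# 1460-byte MSS, 100 ms RTT, 0.1% packet loss:
bps = tcp_throughput_bytes_per_s(1460, 0.100, 0.001)
print(f"{bps / 1e6:.2f} MB/s")  # → 0.46 MB/s
```

This illustrates why even a 0.1% drop rate is damaging: on a 100 ms link it caps a single TCP connection at well under 1 MB/s regardless of the link's nominal bandwidth.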
Collection Replication Considerations

Collection replication can usually saturate the network link up to about 70MB/s in network throughput, and is generally insensitive to network RTT and load on the source and destination.

Collection replication replicates two types of containers:

  • Data containers generated by user writes
  • Recipe containers generated by cleaning
Cleaning copies live data from existing containers to new, more compact containers. Recipe containers do not contain any data, just lists of segment fingerprints, which are processed and reconstituted at the destination. Recipe processing is highly I/O intensive. When processing recipe containers, the network bandwidth used by collection replication drops significantly.

Directory Replication Considerations

Directory replication throughput can be limited by both the available network bandwidth and by the filtering/packing process. The filtering/packing overheads are proportional to the amount of logical data to be replicated. Directory replication, therefore, has two throughput limits to keep in mind. The first is the network or post-compressed (post-comp) throughput, and the second is the logical or pre-compressed (pre-comp) throughput. It is important to consider both limits when sizing systems.

In addition to network and filtering/packing limits, directory replication throughput is higher when using multiple contexts and can vary significantly depending on the level of compression, data locality, and load on the source and destination systems. The following shows the ideal single and multi-context pre-comp throughput by model:

Ideal single and multi-context pre-comp throughput by model:

Ideal single-context pre-comp throughput:
  • DD690 + 3 x ES20 => 137 MB/s
  • DD460 => 80 MB/s

Ideal multi-context pre-comp throughput:
  • DD690 + 3 x ES20 => 430 MB/s
  • DD580 => 260 MB/s
  • DD460 => 200 MB/s
Note: Sustained throughput for typical user environments is 25-50% lower.
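The derating in the note above can be sketched as a simple calculation (the model figure used here is the multi-context value from the table; the derating range is the 25-50% stated in the note):

```python
# Sketch: estimate sustained pre-comp throughput from an ideal figure,
# assuming sustained throughput runs 25-50% below ideal.

def sustained_range(ideal_mb_s):
    """Return (low, high) sustained throughput in MB/s."""
    return ideal_mb_s * 0.50, ideal_mb_s * 0.75

low, high = sustained_range(430)  # DD690 + 3 x ES20, multi-context
print(f"plan for roughly {low:.0f} to {high:.1f} MB/s sustained")
```

When sizing, use the low end of this range unless the environment's compression, locality, and load characteristics are known to be favorable.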

Due to characteristics of the directory replication protocol, the pre-comp throughput is also reduced by high RTT (Round Trip Time), particularly for high-bandwidth networks.

  • For >= T1 networks, significant throughput degradation starts at RTT >= 300 ms.
  • For >= T3 networks, significant throughput degradation starts at RTT >= 50 ms.

Because of the packing/compressing overhead on the source restorer, using a more CPU-intensive local compression algorithm such as gz or gzfast, instead of the default lz algorithm, can dramatically reduce replication throughput.

A good way to verify TCP/IP throughput is to do the following:

At the Data domain system prompt (4.7 and above) on the destination system type:
net iperf server port 2051

At the Data domain system prompt (4.7 and above) on the source system type:
net iperf client <destination-host-or-IP> port 2051

Using /dev/urandom compensates for WAN accelerators like Riverbed, but limits performance to about 50 Mbit/s (T3).

  • Replication uses TCP port 2051, which must be open through firewalls.
  • Verify that the available network bandwidth is sufficient to replicate the expected rate of post-comp changes.
  • Verify that the pre-comp throughput is sufficient to replicate the expected rate of logical (pre-comp) changes; to be safe, assume only 25-50% of the ideal throughput.
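The two verification steps above can be sketched as a quick feasibility calculation. All figures below are hypothetical examples, not measurements; substitute your own daily change rates, link speed, and model throughput:

```python
# Hypothetical sizing check: can the link and the system keep up with
# the daily change rate? All figures are example assumptions.

SECONDS_PER_DAY = 86400

def required_mb_s(daily_change_gib):
    """Average throughput needed to replicate a day's changes within a day."""
    return daily_change_gib * 1024 / SECONDS_PER_DAY

daily_precomp_gib = 5000     # logical (pre-comp) daily change (example)
daily_postcomp_gib = 250     # physical (post-comp) daily change (example)
link_mb_s = 12.5             # ~100 Mbit/s WAN link (example)
ideal_precomp_mb_s = 137     # e.g. the single-context figure above
safe_precomp_mb_s = ideal_precomp_mb_s * 0.5  # use 50% of ideal to be safe

print("network ok:", required_mb_s(daily_postcomp_gib) <= link_mb_s)
print("pre-comp ok:", required_mb_s(daily_precomp_gib) <= safe_precomp_mb_s)
```

If either check fails, the replication context will fall progressively further behind, so either the link, the system model, or the change rate must change.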


Forcing Replication Traffic Over a Specific Network Interface (NIC)


PURPOSE

This article explains how to force replication traffic over a specific Ethernet port. This may be necessary if one of the Data Domain systems moves to a different network and replication needs to go over an alternate route.

APPLIES TO

  • All Data Domain systems
  • Software Releases 4.1 and later
  • Replication

SOLUTION

Follow the steps below to change replication from ETH0 to ETH1. The steps must be completed on both the source and destination systems.
  1. Connect to both Data Domain systems using the Command Line Interface (CLI) 180649.
  2. Disable the file system (skip this step on DD OS version 5.0 and later).
    filesys disable
    sysadmin@test# filesys disable
    
    Please wait........
    
    The filesystem is now disabled.
    
    
  3. Modify the connection host of the partner system for each replication context:
    replication modify rctx://<n> connection-host <host-or-IP>
    Example:
    Using the IP address:
    replication modify rctx://1 connection-host 192.168.6.48
    Using the host name:
    replication modify rctx://1 connection-host neweth1
    Note: neweth1 is a hostname of the partner system on ETH1, which resolves to IP address 192.168.6.48 as shown above.
  4. If a route needs to be added to allow the Data Domain system to connect to the partner system over the appropriate interface, see Adding a Static Route 180764.
  5. Disable and re-enable replication (this temporarily disables all contexts). The replication service must be restarted for the changes to take effect.
    replication disable all
    replication enable all
  6. Verify replication network traffic is now using the new interface.
    iostat 2
    # iostat 2
    
    05/16 12:23:53
    
    INTERVAL: 2 secs
    
    "-" indicates that system is busy and unable to get recent data.
    
    -------------------------------------------------------------------------------------------------------------------------------------------  
    
    CPU  CPU  State    NFS   NFS  NFS  NFS  NFS  CIFS  eth0 MB/s   eth1 MB/s   eth2 MB/s   Disk KiB/s      Disk NVRAM KiB/s     Repl KB/s
    
    busy max  'CDBVMS' ops/s proc recv send idle ops/s in    out   in    out   in    out   read    write   busy read    write   in      out
    
    ---- ---- -------- ----- ---- ---- ---- ---- ----- ----- ----- ----- ----- ----- ----- ------- ------- ---- ------- ------- ------- -------  
    
      0%   0%              0   0%   0%   0%   0%     0     0     0     0    74     0     0       0       0   0%       0       0       0       0
    
      6%  18%              0   0%   0%   0% 100%     0     0     0     0    32     0     0       0       0   0%       0       0       0       0
    
      1%   8%              0   0%   0%   0% 100%     0     0     0     0    48     0     0       0     186   0%       0      76       0       0
    
      1%   6%              0   0%   0%   0% 100%     0     0     0     0    50     0     0       0    4137   1%       0    1685       0       0
    
    
    • The above example shows outbound network activity on ETH1 (the "eth1 MB/s out" column).
  7. Run the command filesys enable if your DD OS version is 4.9.x or below. For DD OS 5.0 and above, disabling the file system for this procedure is not required.
Note: If the problem persists after executing the steps in this article, contact your contracted support provider, gather an autosupport, and create a service request.


Adding a Static Net or Host Route

PURPOSE

A static route allows the administrator to specify the network interface (NIC) through which network traffic will be directed to a target. This allows the administrator to balance network load across the available NICs.

APPLIES TO

  • All Data Domain systems
  • All Software Releases
  • Replication

SOLUTION

At the Data Domain system command prompt type:
Adding a route for specifc network to go through a specific interface on DD
# route add -net netmask <> gw eth#
Example:
# route add -net 192.168.1.0 netmask 255.255.255.0 gw 192.168.1.2 eth0b
Note: This adds a network route for the 192.168.1.0 network using gateway 192.168.1.2 via the eth0b interface on the DDR.
Adding a route for a specific host on DD
route add -host <host-IP> gw <gateway-IP> [<interface>]
Example:
# route add -host 192.168.6.48 gw 10.10.10.1 eth1
Note: This adds a route for host 192.168.6.48 via gateway 10.10.10.1 on interface eth1 of the Data Domain system.