August 11, 2009

OS tuning parameters

  • Windows Registry Settings
    1. Open REGEDT32 and navigate to: HKEY_LOCAL_MACHINE\SYSTEM\CURRENTCONTROLSET\SERVICES\AFD\PARAMETERS
    2. Add a new DWORD value named DefaultSendWindow and set it to 65536 (decimal).
    3. Add a new DWORD value named DefaultReceiveWindow and set it to 65536 (decimal).
    4. Within REGEDT32, navigate to the following location: HKEY_LOCAL_MACHINE\SYSTEM\CURRENTCONTROLSET\SERVICES\TCPIP\PARAMETERS
    5. Add a new DWORD value named GlobalMaxTcpWindowSize and set it to 65536 (decimal).
    6. Add a new DWORD value named TcpWindowSize and set it to 65536 (decimal).
    7. Add a new DWORD value named Tcp1323Opts and set it to 3.
    8. Restart the Windows server.
  • Windows NetBackup Buffer Changes
    • {NB INSTALL PATH}/NetBackup/NET_BUFFER_SZ = 1048576
    • {NB INSTALL PATH}/NetBackup/db/config/SIZE_DATA_BUFFERS = 65536
    • {NB INSTALL PATH}/NetBackup/db/config/SIZE_DATA_BUFFERS_DISK = 262144
    • {NB INSTALL PATH}/NetBackup/db/config/NUMBER_DATA_BUFFERS = 16
    • {NB INSTALL PATH}/NetBackup/db/config/NUMBER_DATA_BUFFERS_DISK = 32
  • Linux Kernel Changes to /etc/sysctl.conf for most DD models
    • net.core.rmem_default = 262144
    • net.core.wmem_default = 262144
    • net.core.rmem_max = 262144
    • net.core.wmem_max = 262144
    • net.ipv4.tcp_rmem = 4096 262144 1048576
    • net.ipv4.tcp_wmem = 4096 262144 1048576
    • vm.lower_zone_protection = 250
    • Run sysctl -p
  • Linux Kernel Changes to /etc/sysctl.conf for DD565, DD580 and DD690 models
    • net.core.rmem_default = 262144
    • net.core.wmem_default = 262144
    • net.core.rmem_max = 2097152
    • net.core.wmem_max = 2097152
    • net.ipv4.tcp_rmem = 8192 524288 2097152
    • net.ipv4.tcp_wmem = 8192 524288 2097152
    • vm.lower_zone_protection = 250
    • Run sysctl -p
  • Solaris Kernel Changes
    • Create a file /etc/rc3.d/S90ddr. Enter the following two lines in the file:
      • ndd -set /dev/tcp tcp_recv_hiwat 131072
      • ndd -set /dev/tcp tcp_xmit_hiwat 131072
    • In the file /etc/system, add the following lines:
      • set nfs:nfs3_max_threads=16
      • set nfs:nfs3_async_clusters=4
      • set nfs:nfs3_nra=16
      • set rpcmod:clnt_max_conns=1
      • set fastscan=131072
      • set handspreadpages=131072
      • set maxpgio=65536
  • Unix NetBackup Buffer Changes
    • echo 1048576 > /usr/openv/netbackup/NET_BUFFER_SZ
    • echo 262144 > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS
    • echo 262144 > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS_DISK
    • echo 16 > /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS
    • echo 32 > /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS_DISK
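
For the Linux kernel settings listed above, one way to confirm that sysctl -p took effect is to compare the desired values with what the running kernel reports under /proc/sys. The Python sketch below is illustrative only: the helper names (parse_sysctl, read_live, find_mismatches) are not from the original notes, and the DESIRED text uses the values given for most DD models, so substitute the DD565/DD580/DD690 values where they apply.

```python
# Sketch: compare desired sysctl settings with the values the kernel reports.
# Helper names are illustrative assumptions, not part of any NetBackup tool.
from pathlib import Path

DESIRED = """
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 262144
net.core.wmem_max = 262144
net.ipv4.tcp_rmem = 4096 262144 1048576
net.ipv4.tcp_wmem = 4096 262144 1048576
vm.lower_zone_protection = 250
"""


def parse_sysctl(text):
    """Parse 'key = value' lines; skip blanks and # or ; comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Normalise whitespace so tabs and repeated spaces compare equal.
        settings[key.strip()] = " ".join(value.split())
    return settings


def read_live(key, root="/proc/sys"):
    """Read the running value for a key, or None if it is not available."""
    path = Path(root) / key.replace(".", "/")
    try:
        return " ".join(path.read_text().split())
    except OSError:
        return None


def find_mismatches(desired, reader=read_live):
    """Return {key: (wanted, actual)} for every setting that differs."""
    mismatches = {}
    for key, wanted in desired.items():
        actual = reader(key)
        if actual != wanted:
            mismatches[key] = (wanted, actual)
    return mismatches


if __name__ == "__main__":
    for key, (wanted, actual) in find_mismatches(parse_sysctl(DESIRED)).items():
        print(f"{key}: wanted {wanted!r}, running {actual!r}")
```

A key reported with a running value of None does not exist on that kernel, which is worth checking before assuming the setting was applied.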

Backup performance and NIC cards

If backup or restore jobs are running slowly, verify that the network interface cards (NICs) are set to full duplex. Half duplex often causes poor performance. For help viewing and resetting the duplex mode for a particular host or device, consult the manufacturer's documentation, or try the following.

1 Log in to the host that contains the network interface card(s).

2 Enter the following command to view the current duplex setting.

ifconfig -a

On some operating systems, this command is ipconfig. Example output from a NAS filer:

e0: flags=1948043 mtu 1500
inet 10.80.90.91 netmask 0xfffff800 broadcast 10.80.95.255
ether 00:a0:98:01:3c:61 (100tx-fd-up) flowcontrol full
e9a: flags=108042 mtu 1500
ether 00:07:e9:3e:ca:b4 (auto-unknown-cfg_down) flowcontrol full
e9b: flags=108042 mtu 1500
ether 00:07:e9:3e:ca:b5 (auto-unknown-cfg_down) flowcontrol full

In this example, the network interface that shows “100tx-fd-up” is running in full duplex. Only interface e0, the first in the list, is at full duplex.

Note: A setting of “auto” is not recommended, because devices can auto-negotiate to half duplex.

3 The duplex mode can be reset by using the ifconfig (or ipconfig) command. For example:
ifconfig e0 mediatype 100tx-fd

4 For most hosts, you can set full-duplex mode permanently, such as in the host’s /etc/rc files. Refer to the host’s documentation for more information.
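
When many interfaces are involved, the check in step 2 can be scripted. The sketch below is a rough illustration in Python that assumes the NAS-filer output format shown in the example, where the media type appears in parentheses on the "ether" line and full duplex is marked by "-fd". Other operating systems print duplex differently, and the function names are hypothetical.

```python
# Sketch: flag interfaces whose media type is not explicitly full duplex.
# Assumes the NAS-filer style "ifconfig -a" output from the example, where
# the media type is shown in parentheses, e.g. "(100tx-fd-up)".
import re

SAMPLE = """\
e0: flags=1948043 mtu 1500
inet 10.80.90.91 netmask 0xfffff800 broadcast 10.80.95.255
ether 00:a0:98:01:3c:61 (100tx-fd-up) flowcontrol full
e9a: flags=108042 mtu 1500
ether 00:07:e9:3e:ca:b4 (auto-unknown-cfg_down) flowcontrol full
e9b: flags=108042 mtu 1500
ether 00:07:e9:3e:ca:b5 (auto-unknown-cfg_down) flowcontrol full
"""

IFACE = re.compile(r"^(\S+): flags=")
MEDIA = re.compile(r"\bether\s+\S+\s+\(([^)]+)\)")


def media_types(output):
    """Map interface name -> media type string."""
    result, current = {}, None
    for line in output.splitlines():
        header = IFACE.match(line.strip())
        if header:
            current = header.group(1)
            continue
        media = MEDIA.search(line)
        if media and current:
            result[current] = media.group(1)
    return result


def not_full_duplex(output):
    """Return interfaces whose media type lacks the '-fd' marker."""
    return sorted(name for name, media in media_types(output).items()
                  if "-fd" not in media)


if __name__ == "__main__":
    print(not_full_duplex(SAMPLE))
```

On the sample output this reports e9a and e9b, the two interfaces left on "auto".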

How to Import Images in NetBackup

Step by step procedure on how to import NetBackup backup images via the NetBackup Administration Console GUI.

Details:
Importing media is a two-step process: a Phase 1 import must be run first, followed by a Phase 2 import.

- The Phase 1 import assigns the media to the Media Server in the EMM database and reads the media to create a HEADER file in the ImageDB for each backup found. If the backup image being imported spans media, Phase 1 must be performed on ALL media before running Phase 2.
NOTE: The Phase 1 option is referred to as "Initiate Import" in the NetBackup Administration GUI.

- The Phase 2 import reads the media more thoroughly and creates the FILES file in the images database. The FILES file lists all the files contained in the backup image.


The following steps walk through importing an image.

1. Under the Catalog section of the console, select Actions -> Initiate Import.


2. Enter the Media ID and Media Server in the Initiate Import box.


3. After Phase 1 has been completed on all media, select the criteria to use when searching for the backup image(s) produced during the Phase 1 import. The main search parameters are the media ID and/or a date/time range covering the backup date. Press Search Now to find the backup image(s) available for Phase 2 import.


4. Select/highlight one or more of the backup image(s) available for importing, right-click the backup image(s), and select Import (do not select "Initiate Import" again). This phase of the import may take as long as the original backup (perhaps longer if the backups were multiplexed).

Troubleshooting Status Code 58

STATUS CODE 58: Can't connect to client. Troubleshooting procedures for Status 58 errors.

Exact Error Message
status code 58: can't connect to client

Details:
Overview:
A status 58 error occurs when a TCP SYN request sent from the NetBackup (NBU) server to a client is not acknowledged, or when the client server was not resolvable, so no TCP SYN request was sent. Most cases are caused by the client not listening on the BPCD or VNETD ports, the master server being unable to resolve the client by hostname, or the client being unable to resolve the NBU server by its IP address. Although NBU clients running a 6.x release use the vnetd daemon for incoming connections, they still require bpcd to perform the hostname compare that authenticates the NBU server.

Troubleshooting:
Below are steps you can use to help isolate the issue.

When troubleshooting status 58 errors on a NetBackup client, first test whether you can access the client through the client Host Properties on the master server. Accessing the client host properties uses the same ports as backing up the client, so a successful connection proves the master server can reach the client without any problems. In that case, if you are using a storage unit on the master server, you should be able to back up the client without getting the status 58 error.

In 6.x environments you can use the bptestbpcd command to verify that you can connect to both the vnetd and bpcd ports on the client server.
e.g.
bptestbpcd -verbose -debug -client

See the tech note 277901 for additional details on use of that command. http://entsupport.symantec.com/docs/277901

If you still get status 58 errors after ensuring the master server can connect to the client, focus on the media server used during the backup. The problem is most likely with the client hostname lookup (the gethostbyname call made to DNS or the local hosts file) on the media server.

If the master server cannot connect to the client when trying to access the client host properties, or the point of failure appears to be the media server, use the steps below to troubleshoot further-

To test the master/media server resolution of the client server hostname run the following command-
/netbackup/bin/./bpclntcmd -hn

Since reverse lookups are part of the NBU server to client connection, make sure the client can also be resolved by its IP address-
/netbackup/bin/./bpclntcmd -ip

On the client, test the resolution of the NBU servers by issuing the same commands. Run these commands against all of the media servers that may try to back up the client server-
/netbackup/bin/./bpclntcmd -hn

/netbackup/bin/./bpclntcmd -ip
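
The forward and reverse checks above can also be approximated outside NetBackup. The following Python sketch uses the operating system resolver rather than bpclntcmd, so it is only a rough cross-check and may not match what NetBackup itself resolves. The function name check_resolution is hypothetical.

```python
# Sketch: check that a hostname resolves forward and back to the same name.
# Uses the OS resolver (DNS or hosts file), not bpclntcmd.
import socket


def check_resolution(hostname, forward=socket.gethostbyname,
                     reverse=socket.gethostbyaddr):
    """Return (ip, reverse_name, consistent) for hostname."""
    ip = forward(hostname)                 # like a gethostbyname call
    reverse_name = reverse(ip)[0]          # primary name for that address
    consistent = reverse_name.lower() == hostname.lower()
    return ip, reverse_name, consistent


if __name__ == "__main__":
    host = "localhost"                     # replace with the client or server name
    try:
        print(host, *check_resolution(host))
    except OSError as err:                 # covers socket.gaierror and socket.herror
        print(host, "lookup failed:", err)
```

A result where the reverse name differs from the name that was looked up (for example a short name versus a fully qualified name) is the kind of mismatch worth comparing against the server list on the client.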

Verify you are able to "ping" the client's IP address from the NBU server. If this fails consult with your Network Administrator and client server System Administrator to resolve the layer 3 or IP network connectivity. Double check the server's NIC's IP address and netmask to ensure they are configured correctly.

A very useful command for testing client and server resolution is bpclntcmd -pn, run from a NBU client or media server.
When run on a client, for example, the command does the following:
1. The client takes the first server in its server list and, knowing it is the master server, does a forward lookup of that hostname (a gethostbyname call to DNS or the hosts file).
2. Once that is successful you should see the message "expecting response from (master server hostname)". If the forward lookup fails, or the client does not have the bprd port 13720 defined (in the registry for Windows clients or in /etc/services for UNIX clients), you will not see that message. In that case, create a bplist log on the client at verbose 5 and rerun the command.
3. When the client connects to the bprd port on the master server, the master server performs two functions. It first does a simple reverse lookup of the incoming IP address; that is the first name returned by the command. It then looks in the policy database for the hostname it uses when backing up the client server; that is the second hostname returned by the command. See the example below-

In this example the client hostname is dotto.veritas.com and the master server is hal9000.veritas.com-

C:\Program Files\VERITAS\NetBackup\bin>bpclntcmd -pn
expecting response from server hal9000.veritas.com
dotto.veritas.com dotto.veritas.com 10.82.110.6 3412
  • Creating a bplist log on the client at verbose 5 should show the attempt to resolve the master server and make a connection request to the NBU server.
  • Creating a bprd log on the master server should show the incoming connection as well.
  • The same bprd log, when created on the master server, would show the attempt to connect to the client using the client host properties.
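
To compare the two names returned by bpclntcmd -pn across many clients, the result line can be split programmatically. This is a sketch in Python under stated assumptions: the first two fields are the reverse-lookup name and the policy name as described above, while treating the last two fields as an IP address and a port number is an interpretation of the example rather than something the original note states. The function name is hypothetical.

```python
# Sketch: interpret the result line of "bpclntcmd -pn".
# Field meanings for the two names follow the description above; reading the
# last two fields as IP address and port is an assumption.

def parse_pn_line(line):
    """Split '<reverse-name> <policy-name> <ip> <port>' into a dict."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    reverse_name, policy_name, ip, port = fields
    return {
        "reverse_name": reverse_name,   # from the master's reverse lookup
        "policy_name": policy_name,     # client name used in the policy
        "ip": ip,
        "port": int(port),
        "names_match": reverse_name.lower() == policy_name.lower(),
    }


if __name__ == "__main__":
    print(parse_pn_line("dotto.veritas.com dotto.veritas.com 10.82.110.6 3412"))
```

When names_match is False, the master server sees the client under a different name than the one configured in the policy, which is a common starting point for the resolution checks above.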

Resolution:
If the client still fails with a status 58 after you have verified with bpclntcmd that the servers can resolve each other, the next step is to ensure the client server has the bpcd port in listening mode-

For Windows clients-
Run netstat -a and verify that the BPCD and VNETD ports are listening-
e.g.
TCP dotto:vnetd dotto.veritas.com:0 LISTENING
TCP dotto:bpcd dotto.veritas.com:0 LISTENING

For UNIX clients, verify the ports are listening by issuing the same command-

# netstat -a |grep bpcd
*.bpcd *.* 0 0 49152 0 LISTEN

# netstat -a |grep vnetd
*.vnetd *.* 0 0 49152 0 LISTEN

For Windows clients, if the client is not listening on the ports, verify that bpinetd is running on the client using Task Manager.


For UNIX clients, verify that BPCD and VNETD are defined in /etc/services and /etc/inetd.conf-
# vi /etc/services
bpcd 13782/tcp bpcd
vnetd 13724/tcp vnetd

# vi /etc/inetd.conf
bpcd stream tcp nowait root /usr/openv/netbackup/bin/bpcd bpcd
vnetd stream tcp nowait root /usr/openv/bin/vnetd vnetd

For UNIX client servers, if the /etc/inetd.conf entry for bpcd shows it is using tcpd, then there is a TCP wrapper on the bpcd port 13782.
This example shows a TCP wrapper on the bpcd port as defined in inetd.conf-
bpcd stream tcp nowait root /usr/local/bin/tcpd /usr/openv/netbackup/bin/bpcd bpcd
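
The two file checks above (service definitions and TCP wrappers) lend themselves to a small script. The Python sketch below works on the text of /etc/services and /etc/inetd.conf and uses the port numbers from the entries shown earlier. The function names are hypothetical, and it assumes the classic inetd.conf field order in which the server program is the sixth field.

```python
# Sketch: check /etc/services and /etc/inetd.conf text for bpcd and vnetd.
# Port numbers come from the entries shown above; helper names are illustrative.

EXPECTED_PORTS = {"bpcd": "13782/tcp", "vnetd": "13724/tcp"}


def _entries(text):
    """Yield whitespace-split fields of each non-comment, non-blank line."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            yield line.split()


def missing_services(services_text):
    """Return service names not defined with the expected port/protocol."""
    found = {}
    for fields in _entries(services_text):
        if len(fields) >= 2:
            found.setdefault(fields[0], set()).add(fields[1])
    return sorted(name for name, port in EXPECTED_PORTS.items()
                  if port not in found.get(name, set()))


def wrapped_services(inetd_text):
    """Return services whose inetd.conf server program is tcpd."""
    wrapped = []
    for fields in _entries(inetd_text):
        # Fields: service, socket type, protocol, wait flag, user, program, args...
        if (len(fields) >= 6 and fields[0] in EXPECTED_PORTS
                and fields[5].endswith("/tcpd")):
            wrapped.append(fields[0])
    return wrapped


if __name__ == "__main__":
    sample = "bpcd stream tcp nowait root /usr/local/bin/tcpd /usr/openv/netbackup/bin/bpcd bpcd"
    print(wrapped_services(sample))
```

In practice the text would come from reading the two files on the client; the functions take strings so they can be tried on pasted excerpts first.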


Locally on the client create a bpcd log and increase the verbose level to 5.

To test on the client server whether the bpcd binary is executable and will generate a log, issue the following command-
UNIX- netbackup/bin/./bpcd

Windows- program files\VERITAS\netbackup\bin:bpcd

That should generate a BPCD log entry like the following, which proves the binary is executable-
16:47:45.986 [5296.3376] <2> bpcd main: offset to GMT 21600
16:47:45.986 [5296.3376] <2> bpcd main: Got socket for input 3
16:47:46.017 [5296.3376] <2> logconnections: getsockname(3) failed: 10038
16:47:46.017 [5296.3376] <16> bpcd setup_sockopts: setsockopt 1 failed: h_errno 10038
16:47:46.017 [5296.3376] <2> bpcd main: setup_sockopts complete
16:47:46.158 [5296.3376] <2> vauth_acceptor: ..\libvlibs\vauth_comm.c.332: Function failed: 17 0x00000011
16:47:46.158 [5296.3376] <16> bpcd main: authentication failed: 17

It is normal for the log to end with the <16> bpcd main: authentication failed: 17 error. This test still proves the binary is functioning correctly.

Then to test the bpcd port locally on the client server from the command prompt, run this telnet test using the loopback interface-

telnet localhost 13782 or

telnet 127.0.0.1 13782

That telnet to the loopback interface using the localhost hostname should generate a log that looks like this-

16:49:35.352 [3336.4360] <2> bpcd main: offset to GMT 21600
16:49:35.352 [3336.4360] <2> bpcd main: Got socket for input 376
16:49:35.352 [3336.4360] <2> logconnections: BPCD ACCEPT FROM 127.0.0.1.3845 TO 127.0.0.1.13782
16:49:35.352 [3336.4360] <2> bpcd main: setup_sockopts complete
16:49:35.414 [3336.4360] <2> bpcd peer_hostname: Connection from host localhost (127.0.0.1) port 3845
16:49:35.414 [3336.4360] <2> bpcd valid_server: comparing hal9000.veritas.com and localhost
16:49:35.414 [3336.4360] <4> bpcd valid_server: localhost is not a master server
16:49:35.414 [3336.4360] <16> bpcd valid_server: localhost is not a media server either
16:49:39.189 [3336.4360] <16> bpcd main: read failed: The operation completed successfully.

bpcd generates the <16> errors because it cannot interpret the telnet session and because it fails to authenticate localhost as a NBU server.

On Linux clients that are missing a library file required by bpcd or vnetd, you would get this type of error message-

telnet localhost bpcd
Trying 127.0.0.1...
Connected to clientname.domainname.com
Escape character is '^]'.
bpcd: error while loading shared libraries: libstdc++-libc6.2-2.so.3:
cannot open shared object file: No such file or directory

Contact the OS vendor to obtain the required library file.

If the telnet to localhost works try the same telnet test from the master server using the client hostname to the bpcd port-
telnet (client hostname) 13782

That should generate a log entry as seen below showing the master server being accepted-

16:52:46.077 [1160.5436] <2> bpcd main: offset to GMT 21600
16:52:46.077 [1160.5436] <2> bpcd main: Got socket for input 400
**The client sees the incoming connection from the master server using the IP address 10.82.105.254**
16:52:46.077 [1160.5436] <2> logconnections: BPCD ACCEPT FROM 10.82.105.254.44554 TO 10.82.110.6.13782
16:52:46.077 [1160.5436] <2> bpcd main: setup_sockopts complete
**Performs a reverse lookup of the incoming IP address and gets the hostname**
16:52:46.092 [1160.5436] <2> bpcd peer_hostname: Connection from host hal9000.veritas.com (10.82.105.254) port 44554
**Then compares the hostname to the server list on the client**
16:52:46.092 [1160.5436] <2> bpcd valid_server: comparing hal9000.veritas.com and hal9000.veritas.com
**The hostname compare succeeds**
16:52:46.092 [1160.5436] <4> bpcd valid_server: hostname comparison succeeded
16:52:49.476 [1160.5436] <16> bpcd main: read failed: The operation completed successfully.

If any of these telnet tests fails to generate a log entry, then something outside of NBU is preventing access to the client's port, most likely firewall software or a TCP wrapper placed on the ports. Good places to check next are the services running on Windows clients (try turning off the MS firewall software for a quick test) and the hosts.allow and hosts.deny files on UNIX clients.
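
Where telnet is not installed, the same reachability test can be approximated with a plain TCP connection attempt. The sketch below is a minimal Python illustration: the port numbers are the /etc/services values shown earlier, while the helper name and the five-second timeout are assumptions. A successful connection only shows that the port is reachable; the bpcd log is still needed to confirm the hostname comparison.

```python
# Sketch: attempt a TCP connection to the bpcd and vnetd ports, much as the
# telnet tests do. Helper name and timeout are illustrative assumptions.
import socket

PORTS = {"bpcd": 13782, "vnetd": 13724}


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port is accepted."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:                 # refused, timed out, unresolvable, ...
        return False


if __name__ == "__main__":
    host = "localhost"              # replace with the client hostname
    for name, port in PORTS.items():
        state = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{name} ({port}) on {host}: {state}")
```

Running it first on the client against localhost and then from the master server against the client hostname mirrors the order of the telnet tests.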

For UNIX clients you can try removing the BPCD port from inetd.conf and run the command "bpcd -standalone" to see if that gets the port listening.

Note: When testing connections to bpcd, note the time difference between the bpcd log message, e.g. "16:52:46.077 [1160.5436] <2> logconnections: BPCD ACCEPT FROM 10.82.105.254.44554 TO 10.82.110.6.13782", and the log message "16:52:46.092 [1160.5436] <4> bpcd valid_server: hostname comparison succeeded". Because of the hard-coded bpbrm timeout of 1 minute, this difference cannot exceed 60 seconds. If it does, try using a hosts file entry to avoid DNS latency.
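
The time difference described in the note can be computed from the log rather than read by eye. The following Python sketch assumes the bpcd log line format shown in the examples above (an HH:MM:SS.mmm time at the start of each line); the function name is hypothetical.

```python
# Sketch: measure the time between the "BPCD ACCEPT" line and the
# "hostname comparison succeeded" line in a bpcd log.
import re
from datetime import datetime, timedelta

STAMP = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) ")


def _stamp(line):
    """Return the leading time of a log line, or None for other lines."""
    match = STAMP.match(line)
    return datetime.strptime(match.group(1), "%H:%M:%S.%f") if match else None


def accept_to_compare_seconds(log_text):
    """Return seconds from BPCD ACCEPT to hostname comparison, or None."""
    accepted = None
    for line in log_text.splitlines():
        when = _stamp(line)
        if when is None:
            continue                       # annotation or wrapped line
        if "BPCD ACCEPT FROM" in line:
            accepted = when
        elif "hostname comparison succeeded" in line and accepted:
            delta = when - accepted
            if delta < timedelta(0):       # the log crossed midnight
                delta += timedelta(days=1)
            return delta.total_seconds()
    return None


if __name__ == "__main__":
    sample = (
        "16:52:46.077 [1160.5436] <2> logconnections: BPCD ACCEPT FROM "
        "10.82.105.254.44554 TO 10.82.110.6.13782\n"
        "16:52:46.092 [1160.5436] <4> bpcd valid_server: hostname comparison succeeded\n"
    )
    print(accept_to_compare_seconds(sample))
```

For the two example lines the difference is 15 milliseconds, far below the 60-second limit; values approaching a minute point at slow name lookups.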

If the client had been working at NBU 5.x release but now fails after upgrading to NBU 6.x, you can try using 5.x defaults for the client connection to see if that gets the failing client working again. To do so, from the Administration Console:
1. Launch the NetBackup Administration Console, connecting to the master server
2. Expand Host Properties in the left pane
3. Select Master Server in the left pane
4. Click the name of the master server in the right pane
5. Select the Client Attributes section
6. Add the name of the client in question if it isn't listed
7. In the Connect Options tab for the client, make the following changes:
BPCD connect back -> "Random port"
Ports -> "Reserved port"
Daemon connection port -> "Daemon port only"

Note: You DO NOT have to stop and restart NetBackup when making changes on the master server Client Attributes tab. Simply click Apply and OK to commit the changes.

If you are unable to resolve this issue, contact Symantec Technical Support for NetBackup for help with troubleshooting.

Troubleshooting Slow Backups

Some common things to check when backups are slow:

* Are you using compression? If so, is it hardware based or software based? If it is software based, consider turning it off for now and running your backups without compression to see what the speed is. It is likely that you are running into a group of files that are already compressed. NetBackup may be trying to compress a file that is already compressed, and this will slow it down.

* Are you backing up a server that has a LOT of small files? This will cause a drop in speed.

* Are you using the allow multiple streams option in the policy? If so, turn it off and try again. If not, turn it on and try again.

* Are there other processes running on the network at the time these backups take place? If so, consider adjusting your schedule.

* Are you setting your drives to multiplex? If so, make sure that you are not setting it too high.

* Is this the only client with this issue?

* Are you using the Open File Option? If so, disable it and test your speed.

* Is this server in more than one policy? If so, are these policies running during the same time window?

* If this is a GbE network, you may have a server with a setting problem on the NIC. Some NICs cannot do 1000 auto and must be set to 1000 full. Some older NICs cannot handle 1000 full and must be pulled and replaced with ones that can. This solved my issues on four separate systems and may be your issue; otherwise, look at the switches for defective performance.

February 20, 2009

Policy tracking

Overview:
NetBackup has a Policy Management utility to track and report when a policy is deleted, or when a schedule or client is removed. This feature does not track policy modifications or when items are added to a policy. It reports deletions and removals only if the policy was previously inventoried and tracked using this feature.

To enable this feature, use the following procedure:

1. Create the following touchfile \NetBackup\LOG_CLASS_QUERIES

2. Set the Inventory flag with the following command:
Windows: \netbackup\bin\admincmd\bppllist -inventory
Unix: /install_path/netbackup/bin/admincmd/bppllist -inventory

3. Enable Inventory Tracking by executing bppllist -inventory via a script or system schedule.

Examples:
a. Create an end notify script to execute after a job or a session.
b. Create a scheduled process (Windows Scheduled Task or UNIX cron) to execute bppllist -inventory.

For manual policy inventory tracking, simply execute bppllist -inventory from the command line or a batch file.

4. Monitor the bperror log, All Log Entries, or Problems report for deleted policies, schedules and clients.  Along with the bperror report, inventory tracking will produce an Application Event Log record on Windows when a policy is deleted.
Example entry: Policy inventory found deleted policy Test

5. Periodically truncate the PolicyQueries.log.  This resides in the logs directory.  The user is responsible for the administration of the log file (periodic truncation, etc.).
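
The periodic truncation in step 5 could be handled by a small scheduled script. The Python sketch below is one possible approach, not a NetBackup feature: the 10 MB threshold and the function name are arbitrary assumptions, and the path to PolicyQueries.log must be supplied for the local install.

```python
# Sketch: empty a log file once it grows past a size limit.
# The 10 MB default and the helper name are illustrative assumptions.
import os


def truncate_if_large(path, max_bytes=10 * 1024 * 1024):
    """Empty the file at path if it exceeds max_bytes; return True if truncated."""
    try:
        size = os.path.getsize(path)
    except OSError:
        return False                 # missing or unreadable: nothing to do
    if size <= max_bytes:
        return False
    with open(path, "r+b") as handle:
        handle.truncate(0)           # keep the file, discard its contents
    return True
```

Copying the log elsewhere before truncating it preserves the history if it is needed for later troubleshooting.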


Recommended best practices:
- Automate Inventory Tracking via a Windows Scheduled Task or UNIX cron to execute bppllist -inventory at least once a day.
- Review the NetBackup Problems report daily via the bperror command or the All Log Entries report.  Records will appear here that indicate when policies, schedules or clients have been removed.


How it works:
After the touchfile is created and bppllist -inventory is executed, policy information is written into the classinv file.

Windows:
\netbackup\db\config\classinv

Unix:
/install_path/netbackup/db/config/classinv

Please note that the very first time that bppllist -inventory is executed, Inventory Tracking is not accomplished. Inventory Tracking takes place when bppllist -inventory is run a second time and thereafter.

As many as three new folders are created in NetBackup to facilitate this feature.  If they do not exist and are required, they may be created the first time that bppllist -inventory is run.

Windows:
\netbackup\db\cltmp
\netbackup\db\cltmp_internal
\netbackup\db\cltmp_template

Unix:
/install_path/netbackup/db/cltmp
/install_path/netbackup/db/cltmp_internal
/install_path/netbackup/db/cltmp_template

The feature also creates a new debug log called PolicyQueries.log:

Windows:
\netbackup\logs\PolicyQueries.log

Unix:
/install_path/netbackup/logs/PolicyQueries.log

Inventory tracking will report the loss of a policy, schedule or client. The classinv file and PolicyQueries.log are generated after the inventory flag is set and policy tracking is enabled.  After the initial inventory and subsequent inventory tracking commands have completed, the changes will be recorded in the bperror log.

Note: The more often inventory tracking is run and the bperror reports are reviewed, the less likely it is that a deleted policy/schedule/client will go unnoticed for an extended period of time.   Please note that inventories that are run too frequently can have an adverse effect on performance, depending upon the NetBackup environment/configuration.


Output Examples:
Problems Report
10/16/2008 12:43:55   carpediem   carpediem   Error      0        General   Policy  inventory found deleted policy dummy
10/16/2008 12:51:58   carpediem   carpediem   Error      0        General   Policy  inventory found deleted client membrane for policy policy1234
10/16/2008 12:51:58   carpediem   carpediem   Error      0        General   Policy  inventory found deleted schedule full for policy policy1234
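
For daily review, the deletion records can be pulled out of the report text automatically. The Python sketch below matches only the three message forms shown in the output examples above. It assumes policy, schedule and client names contain no spaces, and the function name is hypothetical.

```python
# Sketch: extract deleted policies, clients and schedules from report text.
# Matches only the message forms shown in the examples above.
import re

DELETED = re.compile(
    r"inventory found deleted "
    r"(?:(?P<kind>client|schedule) (?P<name>\S+) for policy (?P<policy>\S+)"
    r"|policy (?P<only_policy>\S+))"
)


def deleted_items(report_text):
    """Return (kind, name, policy) tuples; for a deleted policy, name == policy."""
    items = []
    for line in report_text.splitlines():
        match = DELETED.search(line)
        if not match:
            continue
        if match.group("only_policy"):
            policy = match.group("only_policy")
            items.append(("policy", policy, policy))
        else:
            items.append((match.group("kind"), match.group("name"),
                          match.group("policy")))
    return items


if __name__ == "__main__":
    line = ("10/16/2008 12:51:58   carpediem   carpediem   Error      0        "
            "General   Policy  inventory found deleted client membrane for policy policy1234")
    print(deleted_items(line))
```

The report text itself would come from the Problems report or bperror output saved to a file; how that output is captured is left to the local setup.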


Troubleshooting:
Ensure that the dash is present when typing bppllist -inventory.  If the dash is omitted, an error will appear stating that the policy does not exist (Figure 1).

Figure 1
# ./bppllist inventory
the specified policy does not exist in the configuration database (230)