August 23, 2010

NDMP restore exits with Status 0, but nothing is restored

Network Data Management Protocol restore from a duplicated image exits with status 0, but nothing was restored to the filer.

Details:

The problem occurs when Network Data Management Protocol (NDMP) backup images are duplicated to a Media Manager Storage Unit whose Maximum Fragment Size is set to a specific value.

NDMP backups are sent to an NDMP Storage Unit, and backups to an NDMP Storage Unit do not fragment images. Duplicating NDMP backups (manually or through VERITAS NetBackup (tm) Vault) to a Media Manager Storage Unit with Maximum Fragment Size set to a specific value causes the duplicated images to be fragmented. During a restore from these duplicated images, the NDMP host does not know how to handle the fragmentation introduced by the backup tape manager (bptm) during duplication. The restore eventually completes with no error, but no files are restored. This has also been found to affect the Direct Access Restore (DAR) functionality.


Workaround:

To restore images from such tapes, it is necessary to reduplicate the images to a Media Manager Storage Unit without setting the Maximum Fragment Size. The process will create a new copy with a single fragment image that can be recognized by NDMP hosts.


Solution:

For NDMP duplications, select a Media Manager Storage Unit without setting the Maximum Fragment Size.

How to change and set the Network Data Management Protocol (NDMP) progress timeout value

DOCUMENTATION: How to change and set the Network Data Management Protocol (NDMP) progress timeout value.

Details:

Some Network Data Management Protocol (NDMP) operations (such as a non-Direct Access Recovery (DAR) restore) may take a long time. By default, Veritas NetBackup (tm) waits 8 hours for NDMP operations to complete before timing out. This timeout value can be modified by creating the NDMP_PROGRESS_TIMEOUT file on the NetBackup media server. The file is created in the /usr/openv/netbackup/db/config/ directory on a UNIX/Linux media server, and in the \veritas\netbackup\db\config directory on a Windows media server.
The file must contain only a single number, which is the desired timeout value in minutes.

To create the file, open a UNIX or Windows command prompt on the media server, change to the config directory, and run the following command:

echo 1440 > NDMP_PROGRESS_TIMEOUT

This creates the NDMP_PROGRESS_TIMEOUT file, and specifies a timeout value of 24 hours (1440 minutes).

Starting with NetBackup patch levels 6.0MP7 and 6.5.4, the file can specify a timeout above 1440 minutes.
The new maximum of 7 days (10080 minutes) applies to media servers at NetBackup patch levels 6.0MP7 / 6.5.4 and higher.

After adding or modifying the file, stop and restart the NetBackup daemons on the media server.
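The file creation step can be wrapped in a small check so that only a valid value is written. The following is a minimal sketch, not a NetBackup utility: write_ndmp_timeout is a hypothetical helper, the lower bound of 1 minute is an assumption, and the default upper bound of 10080 minutes only applies at patch levels 6.0MP7 / 6.5.4 and higher.

```shell
#!/bin/sh
# Sketch: validate a timeout (in minutes) and write it to NDMP_PROGRESS_TIMEOUT.
# write_ndmp_timeout is a hypothetical helper, not a NetBackup command.
# Usage: write_ndmp_timeout CONFIG_DIR MINUTES [MAX_MINUTES]
write_ndmp_timeout() {
    config_dir=$1
    minutes=$2
    max_minutes=${3:-10080}   # 7 days; valid on 6.0MP7 / 6.5.4 and higher

    # Accept only an unsigned whole number.
    case $minutes in
        ''|*[!0-9]*)
            echo "timeout must be a whole number of minutes" >&2
            return 1
            ;;
    esac

    # Assumption: anything below 1 minute is treated as invalid.
    if [ "$minutes" -lt 1 ] || [ "$minutes" -gt "$max_minutes" ]; then
        echo "timeout must be between 1 and $max_minutes minutes" >&2
        return 1
    fi

    # The file must contain a single number and nothing else.
    echo "$minutes" > "$config_dir/NDMP_PROGRESS_TIMEOUT"
}
```

For example, `write_ndmp_timeout /usr/openv/netbackup/db/config 1440` writes a 24-hour timeout. On media servers below the patch levels named above, passing 1440 as the third argument keeps the value within the older limit.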

January 26, 2010

Solaris 10 tuning

DOCUMENTATION: Tuning Solaris 10 shared memory for NetBackup Media Server processes (bptm)

Exact Error Message
EXIT STATUS 89: problems encountered during setup of shared memory

Details:
Introduction
Prior to Solaris 10, shared memory tuning was performed by adding or changing entries in the /etc/system file and rebooting the server - see TechNote 295295 (linked below) for advice on tuning Solaris 9 and earlier.

With the introduction of Solaris 10, Sun have deprecated the use of /etc/system settings and introduced the Resource Controls Facility. This allows projects to be created for applications and the resources tuned dynamically on a per-project basis.

As part of this change, Sun set the default amount of shared memory to be 25% of the total system memory. For most NetBackup media server configurations this default will be sufficient. If it is not, NetBackup jobs may fail with the error:

EXIT STATUS 89: problems encountered during setup of shared memory

If this happens, the instructions in this TechNote should be used.

This TechNote is divided into two sections:
  • First Time Setup
  • Modifying the Tuning

To check if the first time setup has already been performed, use the projects command. If first time setup has not been performed, a message similar to the following will be displayed:

# /bin/projects -l NetBackup
projects: project "NetBackup" does not exist

First Time Setup
Here are the steps to create a project for NetBackup and have the Solaris 10 Service Management Facility (SMF) launch NetBackup daemon processes in that project. This results in the bptm process being launched with the values tuned in the NetBackup project.

First create a NetBackup project (id = 1000) and "tune" it. The following example sets the maximum amount of shared memory to 8GB:

# /usr/sbin/projadd -U root -c "NetBackup resource project" -p 1000 NetBackup

# /usr/sbin/projmod -a -K 'project.max-shm-ids=(privileged,256,deny)' NetBackup

# projmod -a -K 'project.max-sem-ids=(privileged,1024,deny)' NetBackup

# projmod -a -K 'project.max-msg-ids=(privileged,256,deny)' NetBackup

# projmod -a -K 'project.max-shm-memory=(privileged,8589934592,deny)' NetBackup

Note: TechNote 183702 (linked below) contains advice on appropriate sizing for NetBackup shared memory requirements.
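The project.max-shm-memory value is given in bytes, so 8GB is 8 x 1024 x 1024 x 1024 = 8589934592. As a rough illustration, the conversion can be scripted; shm_memory_attr is a hypothetical helper name, and it only prints the attribute string without changing any project.

```shell
#!/bin/sh
# shm_memory_attr is a hypothetical helper: it converts a size in GB into the
# project.max-shm-memory attribute string passed to projmod -K.
shm_memory_attr() {
    bytes=$(( $1 * 1024 * 1024 * 1024 ))   # 1 GB = 1024^3 bytes
    echo "project.max-shm-memory=(privileged,${bytes},deny)"
}

shm_memory_attr 8   # prints project.max-shm-memory=(privileged,8589934592,deny)
```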

This may be checked with the projects command:
# projects -l NetBackup
NetBackup
projid : 1000
comment: "NetBackup resource project"
users : root
groups : (none)
attribs: project.max-msg-ids=(privileged,256,deny)
project.max-sem-ids=(privileged,1024,deny)
project.max-shm-ids=(privileged,256,deny)
project.max-shm-memory=(privileged,8589934592,deny)

Now arrange for SMF to launch the NetBackup daemon processes in the NetBackup project just created. This should be performed for both the vnetd and bpcd daemons.

Find the NetBackup vnetd service:
# /usr/sbin/svccfg list | grep vnetd
network/vnetd/tcp

See the current settings:
# svccfg -s network/vnetd/tcp listprop | grep project
inetd_start/project astring default

Change the service to run in the NetBackup project:
# svccfg -s network/vnetd/tcp setprop inetd_start/project=NetBackup

Check the change is applied:
# svccfg -s network/vnetd/tcp listprop | grep project
inetd_start/project astring NetBackup

Now do the same for the bpcd daemon:
# svccfg list | grep bpcd
network/bpcd/tcp
# svccfg -s network/bpcd/tcp listprop | grep project
inetd_start/project astring default
# svccfg -s network/bpcd/tcp setprop inetd_start/project=NetBackup
# svccfg -s network/bpcd/tcp listprop | grep project
inetd_start/project astring NetBackup
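Because the same two svccfg commands are repeated for each service, it can help to generate them before running anything. The sketch below is a dry run only: print_project_cmds is a hypothetical helper that prints the commands shown above for each service name and executes nothing.

```shell
#!/bin/sh
# Dry run: print (do not execute) the svccfg commands that place each listed
# SMF service in a project. print_project_cmds is a hypothetical helper.
# Usage: print_project_cmds PROJECT SERVICE...
print_project_cmds() {
    project=$1
    shift
    for service in "$@"; do
        echo "svccfg -s $service setprop inetd_start/project=$project"
        echo "svccfg -s $service listprop | grep project"
    done
}

print_project_cmds NetBackup network/vnetd/tcp network/bpcd/tcp
```

Reviewing the printed commands first makes it easier to catch a mistyped service name before it is applied.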


Check the effects of the tuning on a running process with the prctl command. In this example, the bpps command on the media server shows the process id of bptm to be 3428 and the output of the prctl command shows the shared memory setting to be 8GB:
# /usr/openv/netbackup/bin/bpps | grep bptm
root 3428 1 0 12:05:39 ? 0:00 bptm
# /bin/prctl -n project.max-shm-memory 3428
process: 3428: bptm
NAME PRIVILEGE VALUE FLAG ACTION RECIPIENT
project.max-shm-memory
privileged 8.00GB - deny -
system 16.0EB max deny -


Modifying the Tuning
Once first time setup has been performed, it is necessary to use different command options to modify the settings; repeating the first time settings can lead to unpredictable results.

First ensure the NetBackup daemons are being launched by SMF in the NetBackup project:
# svccfg -s network/vnetd/tcp listprop | grep project
inetd_start/project astring NetBackup
# svccfg -s network/bpcd/tcp listprop | grep project
inetd_start/project astring NetBackup

Now review the NetBackup project settings:
# projects -l NetBackup
NetBackup
projid : 1000
comment: "NetBackup resource project"
users : root
groups : (none)
attribs: project.max-msg-ids=(privileged,256,deny)
project.max-sem-ids=(privileged,1024,deny)
project.max-shm-ids=(privileged,256,deny)
project.max-shm-memory=(privileged,8589934592,deny)

To modify a project attribute, use the projmod command with the -s switch. For example, to increase the maximum shared memory setting to 10GB (10737418240), use the command:
# projmod -s -K 'project.max-shm-memory=(privileged,10737418240,deny)' NetBackup

Check the change was successful and that the project contains just one entry for the tunable attribute that you specified:
# projects -l NetBackup | grep max-shm-memory
project.max-shm-memory=(privileged,10737418240,deny)
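The "just one entry" check can also be done by counting lines rather than by eye. This is a minimal sketch: count_attr is a hypothetical helper, and it assumes that a duplicated attribute would show up as a separate line in the projects -l output.

```shell
#!/bin/sh
# count_attr is a hypothetical helper: it reads `projects -l` output on stdin
# and prints the number of lines that set the named attribute.
count_attr() {
    # -F treats the attribute name literally (the dots are not wildcards).
    # grep exits non-zero when the count is 0, so mask that status.
    grep -cF -- "$1=" || true
}
```

It would be used as `projects -l NetBackup | count_attr project.max-shm-memory`; any result other than 1 suggests the project should be reviewed.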

Catalog Recovery Procedure

DOCUMENTATION: Catalog recovery procedure: an example of disaster recovery.

Details:
Sample Catalog recovery procedure
This is only an example of procedures used. Not all steps are required and may not be applicable to all situations.

These directions assume that NetBackup has already been installed on the DR (disaster recovery) server.

Note: these instructions describe a full recovery of the master for redeployment into production.

1. Load media into DR site robot.
For safety, only load the catalog tape initially.

Alternatively, load all media as write protected.

2. Validate the bp.conf and vm.conf (if applicable) configuration files.
Additional things to check:
  • Check /usr/openv/volmgr/vm.conf for a MEDIA_ID_BARCODE_CHARS entry, if it is needed.
  • Check /usr/openv/netbackup/db/config for the touch files NUMBER_DATA_BUFFERS and SIZE_DATA_BUFFERS.
  • Check /usr/openv/netbackup for the touch file NET_BUFFER_SZ.
3. In the GUI, run the device configuration wizard.
Uncheck all servers except the master server.
Run the wizard to configure the devices.
On the first result dialog, check for any limitations.

To change from the default drive densities, if desired:
In the Drag and Drop Configuration dialog, select the drive and click on the Properties button. Verify the Drive Density.
Repeat this for each drive.

In the Configure Storage Unit dialog, select the storage unit and click Properties to verify the storage unit settings.

4. Run an inventory of the robot.
Preview the inventory and verify the media type is correct.

Make note of the barcodes and any changes from the production site, as different robots can return different barcodes for the same tape. For instance, LTO tapes can have "L#" on the end of the barcode. This can be disabled via the robot console option for short labels.

Update the volume configuration.
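When comparing barcodes between the production and DR robots, it can be useful to normalize them to the short form first. The sketch below assumes that the "L#" tag is the letter L followed by a single digit at the end of the barcode; short_label is a hypothetical helper and does not change anything in NetBackup.

```shell
#!/bin/sh
# short_label is a hypothetical helper: it strips a trailing "L" plus one digit
# from a barcode, and leaves barcodes without that suffix unchanged.
short_label() {
    echo "${1%L[0-9]}"
}

short_label ABC123L4   # prints ABC123
short_label ABC123     # prints ABC123
```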

5. Add the following line to bp.conf:
RESOURCE_MONITOR_INTERVAL = 3600

This will change media server polling from 10 minutes to 1 hour.
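If this procedure is scripted or repeated, the entry should not end up in bp.conf twice. A minimal sketch of an idempotent append follows; ensure_conf_line is a hypothetical helper, and it only detects an identical line, not an existing RESOURCE_MONITOR_INTERVAL entry with a different value.

```shell
#!/bin/sh
# ensure_conf_line is a hypothetical helper: append LINE to FILE only if an
# identical line is not already present, so reruns do not duplicate the entry.
# Usage: ensure_conf_line FILE LINE
ensure_conf_line() {
    file=$1
    line=$2
    # -x matches whole lines only, -F treats the text literally.
    grep -qxF -- "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}
```

For example: `ensure_conf_line /usr/openv/netbackup/bp.conf "RESOURCE_MONITOR_INTERVAL = 3600"`.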

6. Make copies of the DR environment bp.conf and vm.conf files.
# cd /usr/openv/netbackup
# cp bp.conf bp.conf.dr
# cd ../volmgr
# cp vm.conf vm.conf.dr

7. Recover the entire catalog.
This is performed from the GUI on the master server. Always log in to the Java GUI using the short hostname, as the fully qualified domain name may not match the production site.

Note: Optionally, the catalog recovery can be performed from the command line:
# /usr/openv/netbackup/bin/admincmd/bprecover -wizard

8. Manually deactivate all backup policies.
From the GUI, select all policies, right click and select Deactivate. This may take a while.

Be sure that all policies are deactivated before proceeding.

9. Shut down NetBackup.
# /usr/openv/netbackup/bin/bp.kill_all

Verify with bpps -x that only /opt/VRTSpbx/bin/pbx_exchange is running.
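The check on the bpps -x output can be reduced to a filter that lists anything still running apart from pbx_exchange. This is a sketch under assumptions: unexpected_procs is a hypothetical helper, and it treats any line with a numeric second column as a process line, which matches the ps-style lines bpps prints.

```shell
#!/bin/sh
# unexpected_procs is a hypothetical helper: it reads `bpps -x` output on stdin
# and prints every process line other than pbx_exchange.
# Assumption: process lines carry a numeric PID in the second column;
# section headings and separator lines do not, so they are skipped.
unexpected_procs() {
    awk '$2 ~ /^[0-9]+$/ && $0 !~ /\/pbx_exchange[ \t]*$/ { print }'
}
```

Running `/usr/openv/netbackup/bin/bpps -x | unexpected_procs` should print nothing once NetBackup is fully shut down.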

10. Prep the bp.conf and vm.conf configuration files.
Copy bp.conf and vm.conf to bp.conf.prod and vm.conf.prod:
# cd /usr/openv/netbackup
# cp bp.conf bp.conf.prod
# cd ../volmgr
# cp vm.conf vm.conf.prod

Then, copy back the bp.conf.dr and vm.conf.dr to bp.conf and vm.conf:
# cd /usr/openv/netbackup
# cp bp.conf.dr bp.conf
# cd ../volmgr
# cp vm.conf.dr vm.conf

Verify that the hostnames of any remote Windows console servers are included in the bp.conf with a SERVER entry.
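The two copy sequences in this step follow the same pattern for each file, so they can be expressed once. The sketch below is illustrative only: swap_to_dr is a hypothetical helper that saves the recovered file as .prod and then restores the .dr copy made in step 6, refusing to run if that copy is missing.

```shell
#!/bin/sh
# swap_to_dr is a hypothetical helper: in DIR, save the recovered NAME as
# NAME.prod, then put the saved NAME.dr copy back in place as NAME.
# Usage: swap_to_dr DIR NAME
swap_to_dr() {
    dir=$1
    name=$2
    # Refuse to touch anything if the DR copy from step 6 is missing.
    if [ ! -f "$dir/$name.dr" ]; then
        echo "missing $dir/$name.dr" >&2
        return 1
    fi
    cp "$dir/$name" "$dir/$name.prod" && cp "$dir/$name.dr" "$dir/$name"
}
```

It would be called as `swap_to_dr /usr/openv/netbackup bp.conf` and `swap_to_dr /usr/openv/volmgr vm.conf`.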

11. Make sure bp.conf and vm.conf are configured correctly to reflect the DR environment.
Append a FORCE_RESTORE_MEDIA_SERVER entry to bp.conf for each media server that was used for backups in production but is not present at the DR site. The syntax of these entries is as follows:

FORCE_RESTORE_MEDIA_SERVER =

12. Perform a partial startup of nbemm.
This will allow modification of the nbemm database without the job manager running and kicking off jobs.

# /usr/openv/netbackup/bin/nbdbms_start_stop start
# /usr/openv/netbackup/bin/nbemm

Run bpps -x to verify that nbemm and NB_dbsrv are running.

13. Deactivate all media servers not participating in the DR.
# /usr/openv/netbackup/bin/admincmd/nbemmcmd -updatehost -machinename -machinestateop set_admin_pause -machinetype media -masterserver

Be sure to execute this command against every unavailable media server.

14. Start nbevtmgr and bpdbm.
# /usr/openv/netbackup/bin/nbevtmgr
# /usr/openv/netbackup/bin/initbpdbm

Again, run bpps -x to verify that bpdbm and nbevtmgr are running.

15. Delete all storage units.
This step is optional, but will give a cleaner experience. Either use the GUI or the command line.

From the command line:
# /usr/openv/netbackup/bin/admincmd/bpstulist -go | cut -f 1 -d ' ' > /tmp/stu_groups
# /usr/openv/netbackup/bin/admincmd/bpstulist | cut -f 1 -d ' ' > /tmp/stu_list
# for i in `cat /tmp/stu_groups` ; do echo "/usr/openv/netbackup/bin/admincmd/bpstudel -group $i" ; done > /tmp/delete_stu_groups
# for i in `cat /tmp/stu_list` ; do echo "/usr/openv/netbackup/bin/admincmd/bpstudel -label $i" ; done > /tmp/delete_stus
# sh /tmp/delete_stu_groups
# sh /tmp/delete_stus

Note: Be sure to delete storage unit groups before deleting storage units!
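The command generation in this step can be sketched as a single filter that is reused for groups and for storage units. stu_delete_cmds is a hypothetical helper: like the commands above, it takes the first space-separated field of each listing line as the name, and it only prints bpstudel commands so they can be reviewed before being run.

```shell
#!/bin/sh
# stu_delete_cmds is a hypothetical helper: it reads storage unit listing lines
# on stdin, takes the first space-separated field of each line as the name,
# and prints (does not run) one bpstudel command per name.
# $1 is the bpstudel option to use: -group or -label.
stu_delete_cmds() {
    opt=$1
    cut -f 1 -d ' ' | while read -r name; do
        if [ -n "$name" ]; then
            echo "/usr/openv/netbackup/bin/admincmd/bpstudel $opt $name"
        fi
    done
}
```

For example, `bpstulist -go | stu_delete_cmds -group > /tmp/delete_stu_groups` followed by `bpstulist | stu_delete_cmds -label > /tmp/delete_stus` produces the two scripts, which are then run in that order.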

16. Delete all tape devices from the command line.
# /usr/openv/netbackup/bin/admincmd/nbemmcmd -deletealldevices -allrecords

Verify no devices are returned:
# /usr/openv/volmgr/bin/tpconfig -emm_dev_list -noverbose

17. Stop and restart NetBackup.
# /usr/openv/netbackup/bin/bp.kill_all

Use bpps -x to verify that only /opt/VRTSpbx/bin/pbx_exchange is running.

# /usr/openv/netbackup/bin/bp.start_all
...
# bpps -x
NB Processes
------------
root 13809 13808 0 15:05:07 ? 0:00 /usr/openv/netbackup/bin/nbproxy dblib nbjm
root 13775 1 0 15:05:03 ? 0:00 /usr/openv/netbackup/bin/bpcompatd
root 13786 1 1 15:05:04 ? 0:00 /usr/openv/netbackup/bin/bpdbm
root 13757 1 0 15:05:01 ? 0:00 /usr/openv/netbackup/bin/nbrb
root 13747 1 0 15:04:59 ? 0:00 /usr/openv/netbackup/bin/nbevtmgr
root 13795 13786 0 15:05:05 ? 0:00 /usr/openv/netbackup/bin/bpjobd
root 13856 1 0 15:05:12 ? 0:00 /usr/openv/netbackup/bin/nbsvcmon
root 13813 1 1 15:05:08 ? 0:01 /usr/openv/netbackup/bin/nbstserv
root 13811 13810 1 15:05:07 ? 0:01 /usr/openv/netbackup/bin/nbproxy dblib nbpem
root 13818 1 1 15:05:09 ? 0:01 /usr/openv/netbackup/bin/nbrmms
root 13752 1 2 15:05:00 ? 0:02 /usr/openv/netbackup/bin/nbemm
root 13808 13797 0 15:05:07 ? 0:00 sh -c "/usr/openv/netbackup/bin/nbproxy" dblib nbjm
root 13770 1 1 15:05:02 ? 0:01 /usr/openv/netbackup/bin/bprd
root 13844 1 0 15:05:11 ? 0:00 /usr/openv/netbackup/bin/nbsl
root 13797 1 0 15:05:05 ? 0:00 /usr/openv/netbackup/bin/nbjm
root 13810 13804 0 15:05:07 ? 0:00 sh -c "/usr/openv/netbackup/bin/nbproxy" dblib nbpem
root 13804 1 0 15:05:06 ? 0:00 /usr/openv/netbackup/bin/nbpem
root 13742 1 0 15:04:57 ? 0:02 /usr/openv/db/bin/NB_dbsrv


MM Processes
------------
root 13783 1 1 15:05:04 ? 0:01 vmd -v


Shared Symantec Processes
-------------------------
root 142 1 1 16:46:54 ? 0:52 /opt/VRTSpbx/bin/pbx_exchange

18. In the GUI, run the device configuration wizard to configure shared drives.
Uncheck all servers except the robot control host.
Run the wizard to configure the devices.
On the first result dialog, check for any limitations.

To change from the default drive densities, if desired:
In the Drag and Drop Configuration dialog, select the drive and click on the Properties button. Verify the Drive Density.
Repeat for each drive.

In the Configure Storage Unit dialog, select the storage unit and click Properties to verify the storage unit settings.

If needed, repeat the process for any additional servers.

Using the Device Monitor, make sure that no drives have the RESTART bit set. Restart ltid on the servers if needed.

19. Run the robot inventory.
Before running the inventory, use the GUI to verify all the recovery media are set to non-robotic.
If they are not, select all robotic media, right click and select Move. Make sure the "Volume is in a robotic library" option is unchecked.

Note: The volume group may be "---" - this is okay.

Click OK.

Make note of the barcodes and any changes from the production site (the presence of the "L#" tag). The hardware should be able to toggle the "L#" tag (long vs. short labels). If that is not possible, the barcode of the media can be changed with the following command:
# /usr/openv/volmgr/bin/vmchange -barcode -m

20. Verify that restores work.

More information on DR procedures can be found in Chapter 7 of the NetBackup Troubleshooting Guide (linked below).