Note that this alert is not the same as these alerts:
Space usage in Data Collection has exceeded 90%.
Space usage in /ddvar has exceeded 90%.
APPLIES TO
All Data Domain systems that are being monitored through the EMC Data Protection Advisor tool or through a custom script that logs into and out of the system excessively.
All Software Releases prior to 5.2.1.0.
PURPOSE
This article explains how to troubleshoot cases in which the customer receives the alert "Space usage in root has exceeded 90%." as a result of excessive logins and logouts by a custom script or the DPA tool.
CAUSE
Every successful login and logout on a DDR is logged in the /var/log/wtmp file. This file is rotated monthly, so a large number of logins and logouts over the course of a month can make it grow large enough to trigger this alert. A separate known bug can also cause root to fill (see "Separate potential cause" below).
SOLUTION
To determine if the login/wtmp file issue described herein is responsible for the alert:
Enter bash mode.
Check the space utilization in the root partition with the df -h command:
!!!! dd630-rtp1 YOUR DATA IS IN DANGER !!!! # df -h
Filesystem Size Used Avail Use% Mounted on
/dev/dd_dg00p15 2.0G 1.6G 377M 91% /
/dev/dd_dg00p14 5.0G 952M 3.8G 20% /ddr
/dev/dd_dg00p13 79G 29G 47G 38% /ddr/var
shm 3.9G 0 3.9G 0% /dev/shm
/dev/dd_dg00p7 13G 13G 0 100% /ddr/col1/repl
localhost:/data 7.7T 21G 7.7T 1% /data
This alert triggers when the Use% on the partition Mounted on / exceeds 90%.
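The threshold check above can also be scripted. The following is a minimal sketch, not part of the original procedure: root_use_pct is a hypothetical helper name, and it assumes awk is available in bash mode and that df accepts the POSIX -P flag (one line per filesystem).

```shell
# Hypothetical helper: print the Use% (digits only) of the filesystem
# mounted on the given path. Reads df-style output from stdin so it can
# be run against saved output as well as a live system.
root_use_pct() {
    awk -v mnt="${1:-/}" '
        # Skip the header; match on the last column (mount point).
        NR > 1 && $NF == mnt {
            sub(/%/, "", $(NF-1))   # strip the percent sign from Use%
            print $(NF-1)
        }'
}

# Possible use on a live system (assumption: df -P is supported):
#   pct=$(df -P / | root_use_pct /)
#   [ "$pct" -gt 90 ] && echo "root is above the 90% alert threshold"
```

Matching on the mount point rather than the device name avoids depending on the /dev/dd_dg00p15 naming shown in the sample output.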
Check the size of /var/log/wtmp and its rotations with the ls -lh /var/log/wtmp* command:
!!!! dd630-rtp1 YOUR DATA IS IN DANGER !!!! # ls -lh /var/log/wtmp*
-rw-rw-r-- 1 root utmp 89M Aug 2 14:57 /var/log/wtmp
-rw-rw-r-- 1 root utmp 92M Jul 31 21:06 /var/log/wtmp.1
If these files exceed 60M in size, they are very likely the cause of the alert.
Investigate what is causing the log to fill up by dumping it:
!!!! dd630-rtp2 YOUR DATA IS IN DANGER !!!! # last -f /var/log/wtmp | less
sysadmin pts/0 d3-ubuntu.datado Wed Aug 1 18:23 - 19:37 (01:13)
sysadmin ssh d3-ubuntu.datado Wed Aug 1 18:23 - 19:16 (00:53)
sysadmin pts/4 128.222.90.62 Wed Aug 1 14:42 - 15:34 (00:51)
sysadmin ssh 128.222.90.62 Wed Aug 1 14:42 - 14:43 (00:00)
(...)
The third column in this output shows the hostname or IP address of the host from which these logins are occurring. If this issue is the cause, a large number of lines will share the same hostname. Ask the customer whether DPA or their own custom monitoring script is running on that host.
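Paging through the full dump by eye can be slow when the log holds months of entries. The following is a minimal sketch, not part of the original procedure, that tallies records per source host: logins_by_host is a hypothetical helper name, and it assumes awk and sort are available in bash mode and that the host is the third whitespace-separated column, as in the sample output above.

```shell
# Hypothetical helper: count login records per source host from
# `last`-style output on stdin, busiest host first.
# Limitation: a record with an empty host field (for example a console
# login) would be counted under whatever lands in the third column.
logins_by_host() {
    awk '
        # Skip blank lines, the trailing "wtmp begins ..." line,
        # and reboot pseudo-records.
        NF >= 3 && $2 != "begins" && $1 != "reboot" { count[$3]++ }
        END { for (h in count) print count[h], h }
    ' | sort -rn
}

# Possible use on a live system:
#   last -f /var/log/wtmp | logins_by_host | head
```

The host at the top of the list is the one to ask the customer about.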
To apply the workaround:
Enter bash mode.
Truncate /var/log/wtmp and its rotations with the echo -n command:
for i in /var/log/wtmp*; do echo -n > "$i"; done
Check the size of /var/log/wtmp and its rotations with the ls -lh /var/log/wtmp* command:
!!!! dd630-rtp1 YOUR DATA IS IN DANGER !!!! # ls -lh /var/log/wtmp*
-rw-rw-r-- 1 root utmp 0 Aug 2 15:06 /var/log/wtmp
-rw-rw-r-- 1 root utmp 0 Aug 2 15:06 /var/log/wtmp.1
The files should have a size equal to or very close to zero.
Ensure the space utilization in the root partition is less than 90% with the df -h command:
!!!! dd630-rtp1 YOUR DATA IS IN DANGER !!!! # df -h
Filesystem Size Used Avail Use% Mounted on
/dev/dd_dg00p15 2.0G 1.6G 377M 81% /
/dev/dd_dg00p14 5.0G 952M 3.8G 20% /ddr
/dev/dd_dg00p13 79G 29G 47G 38% /ddr/var
shm 3.9G 0 3.9G 0% /dev/shm
/dev/dd_dg00p7 13G 13G 0 100% /ddr/col1/repl
localhost:/data 7.7T 21G 7.7T 1% /data
Check that the alert has cleared.
Set the rotation schedule of /var/log/wtmp to daily in /etc/logrotate.conf by changing the frequency on line 19 from monthly (the leading numbers below are line numbers, not part of the file):
17 # no packages own wtmp -- we'll rotate them here
18 /var/log/wtmp {
19 monthly
20 create 0664 root utmp
21 rotate 1
22 }
To daily:
17 # no packages own wtmp -- we'll rotate them here
18 /var/log/wtmp {
19 daily
20 create 0664 root utmp
21 rotate 1
22 }
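The same edit can be made non-interactively. The following is a minimal sketch, not part of the original procedure: set_wtmp_daily is a hypothetical helper name, and it assumes GNU sed (for the -i flag) is available in bash mode and that the stanza opens with "/var/log/wtmp {" at the start of a line. It matches the stanza by content rather than by line number, so it does not depend on the frequency sitting on line 19.

```shell
# Hypothetical helper: switch the wtmp stanza from monthly to daily.
# Only the "monthly" inside the /var/log/wtmp { ... } block is changed;
# any global frequency setting elsewhere in the file is left alone.
set_wtmp_daily() {
    local conf="${1:-/etc/logrotate.conf}"
    # Keep a copy of the original next to the file before editing.
    cp -p "$conf" "$conf.bak" &&
    sed -i '/^\/var\/log\/wtmp[[:space:]]*{/,/^[[:space:]]*}/ s/^\([[:space:]]*\)monthly/\1daily/' "$conf"
}

# Possible use on a live system:
#   set_wtmp_daily /etc/logrotate.conf
```

Comparing the file with its .bak copy afterwards (for example with diff) confirms that only the one line changed.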
To free additional space in root, apply the /var/upgrade/link_to_new_rpmfile.rpm change described in the document:
Moved link_to_new_rpmfile.rpm from root to ddr partition 181045
Separate potential cause: Bug 109239 causes a log file to be written to the /tmp directory instead of /ddvar. The system may have hit this bug if /tmp contains a large sub_kern.info.XXXXX file. The workaround is to move or delete that file. The bug is fixed in DD OS 5.4.3.0, 5.5.1.0, and 5.6 and later. See the bug notes for more details.
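To check for this second cause, the files in question can be listed with their sizes. The following is a minimal sketch, not part of the original article: list_sub_kern_files is a hypothetical helper name, and it assumes GNU find (for -maxdepth and -printf) and sort are available in bash mode.

```shell
# Hypothetical helper: list sub_kern.info.* files in a directory as
# "<size in bytes> <path>", largest first. Prints nothing if none exist.
list_sub_kern_files() {
    local dir="${1:-/tmp}"
    find "$dir" -maxdepth 1 -type f -name 'sub_kern.info.*' \
        -printf '%s %p\n' | sort -rn
}

# Possible use on a live system:
#   list_sub_kern_files /tmp
```

The helper only reports; moving or deleting a file it finds remains a manual decision, as described above.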