What's using all the disk space?

We had an issue a while ago where one of our Mongo instances had run out of available space on the root disk. This was concerning because we use a large separate volume mounted at /data that is dedicated entirely to Mongo:

[ec2-user@ip-10-5-4-206 ~]$ df -kPh
Filesystem      Size  Used Avail Use% Mounted on  
/dev/xvda1       32G   32G     0 100% /
devtmpfs        7.4G   60K  7.4G   1% /dev  
tmpfs           7.4G     0  7.4G   0% /dev/shm  
/dev/xvdf       2.0T  989G  880G  53% /data

Asking which sub-directory was using the most data did not help:

[ec2-user@ip-10-5-4-206 ~]$ sudo du -x -d1 / | sort -n
16  /lost+found  
804 /home  
7140    /bin  
7440    /etc  
11932   /sbin  
22820   /lib64  
40832   /boot  
43268   /opt  
88128   /lib  
100364  /root  
815224  /var  
1331308 /usr  
2469324 /  

All together, those values add up to 4938600 1k blocks (du's default unit), or 4.7G.

Note: using du -x means that du will not traverse other file systems. /data is mounted from a separate volume (as seen in the df output), and it was the root disk I cared about.

But, the root disk said that it was out of space, and this meant that 32G worth of files had to be present. So - what's using all the disk space?

The only place left to look was "under" the /data mount point - /data on the root disk, not the separate volume. This can be accomplished by using a bind mount of root:

[root@ip-10-5-4-206 ~]# mkdir /mnt/root/
[root@ip-10-5-4-206 ~]# mount --bind / /mnt/root/

Navigating to /mnt/root/data would then go to the root disk's /data, not the separate volume. Very quickly, I found what was consuming the disk space:

[root@ip-10-5-4-206 ~]# cd /mnt/root/data/mongo/
[root@ip-10-5-4-206 mongo]# ls -alh
total 26G  
drwxr-xr-x 3 mongod mongod 4.0K Aug 27 00:50 .  
drwxr-xr-x 3 mongod mongod 4.0K Aug 15 14:41 ..  
drwxr-xr-x 2 mongod mongod 4.0K Aug 27 00:50 journal  
-rw------- 1 mongod mongod  64M Oct  4 07:28 local.0
-rw------- 1 mongod mongod  16M Oct  5 20:37 local.ns
-rwxr-xr-x 1 mongod mongod    5 Aug 27 00:50 mongod.lock
-rw-r--r-- 1 mongod mongod  26G Sep 30 08:25 mongod.log
-rw-r--r-- 1 mongod mongod    5 Aug 27 00:50 mongod.pid
-rw-r--r-- 1 mongod mongod   69 Aug 15 14:43 storage.bson

mongod.log was using 26G!

First of all, this is an insane size for a log file.

Second of all, why was it using the root disk's /data?

At Vena, we use AMIs as base images, which we then customize for specific deployments - different datacenters, attached volume sizes, and so on. When we bring up a Mongo instance base image, we perform the following steps:

  1. stop mongod
  2. attach a separate volume
  3. format the volume as xfs
  4. mount the volume as /data
  5. start mongod

It makes sense then that there would be two mongod process IDs (one from before step 1, and one from after step 5):

[root@ip-10-5-4-206 mongo]# cat /mnt/root/data/mongo/mongod.pid
7371  
[root@ip-10-5-4-206 mongo]# cat /data/mongo/mongod.pid
8219  

What didn't make sense was that the root disk's PID corresponded to the running mongod:

[root@ip-10-5-4-206 mongo]# pgrep -l mongod
7371 mongod  

but /data (the separate volume) was where mongod was storing 989G of data, according to df.

Apart from the log file, the timestamps of other files in both directories also seemed suspiciously similar:

[root@ip-10-5-4-206 mongo]# ls -al /mnt/root/data/mongo/mongod*
-rwxr-xr-x 1 mongod mongod           5 Aug 27 00:50 /mnt/root/data/mongo/mongod.lock
-rw-r--r-- 1 mongod mongod 27732103168 Sep 30 08:25 /mnt/root/data/mongo/mongod.log
-rw-r--r-- 1 mongod mongod           5 Aug 27 00:50 /mnt/root/data/mongo/mongod.pid

[root@ip-10-5-4-206 mongo]# ls -al /data/mongo/mongod*
-rwxr-xr-x 1 mongod mongod    0 Aug 27 00:51 /data/mongo/mongod.lock
-rw-r--r-- 1 mongod mongod 4489 Aug 27 00:51 /data/mongo/mongod.log
-rw-r--r-- 1 mongod mongod    5 Aug 27 00:51 /data/mongo/mongod.pid

One way to free up the root disk is to force a log rotate - afterwards, /mnt/root/data/mongo/mongod.log can be deleted:

rsvena0:PRIMARY> use admin  
switched to db admin  
rsvena0:PRIMARY> db.runCommand( { logRotate : 1 } )  
{ "ok" : 1 }

[root@ip-10-5-4-206 mongo]# ls -al /data/mongo/mongod.log*
-rw-r--r-- 1 mongod mongod 309983 Oct  5 20:44 /data/mongo/mongod.log
-rw-r--r-- 1 mongod mongod   4489 Aug 27 00:51 /data/mongo/mongod.log.2016-10-05T20-44-07

[root@ip-10-5-4-206 mongo]# ls -al /mnt/root/data/mongo/mongod.log*
-rw-r--r-- 1 mongod mongod 27732103168 Oct  5 20:44 /mnt/root/data/mongo/mongod.log

It looks like the following happened:

  1. the running mongod (pointing to /mnt/root/data/mongo/mongod.log) did a final write of its log file, and closed the file pointer.

  2. mongod then rotated /data/mongo/mongod.log (instead of /mnt/root/data/mongo/mongod.log). The rotation only amounted to a file rename, so the timestamp did not change.

  3. mongod then opened a new mongod.log in /data/mongo/.

This would explain how after the logRotate occurred both mongod.log files had very similar timestamps.

The only theory that really fits is that mongod never stopped in the first place. This means that it never stopped using mongod.log from when it started the first time (under the mount). Mounting over /data would be transparent to mongod - it would still have open FILE pointers to files under the mount, and new files would be written to the separate volume.

Unfortunately this does not explain why other files like mongod.lock and mongod.pid had similar timestamps under both /mnt/root/data and /data. It also doesn't explain how there were two different PIDs.

Even if we can't explain why this bizarre scenario happened, these two changes prevent the conditions that caused it:

  • We now use Mongo's default configuration in the base image (which uses /var/lib/mongodb as the data directory) and have added a step to reconfigure mongo's root directory to /data/mongo after mongod has stopped. Using separate directories removes the possibility that directories can be mounted over.
  • We also now take additional steps to ensure that mongod is stopped before attaching the new volume and re-configuring it.

Discuss on Hacker News