[BBLISA] accounting for I/O

Daniel Feenberg feenberg at nber.org
Thu Sep 1 15:05:14 EDT 2016


Apparently heavy random I/O overloaded our fileserver last week, and 
response was very slow. We solved the problem with additional spindles, 
but we are curious to know which process is doing the random I/O. Perhaps 
we could approach that user with an offer to help improve their turnaround 
time by changing the code. Our users are mostly inexperienced students, 
so suboptimal code is certainly a possibility. Most usage is sequential 
access to very large files, which hardly loads the fileserver at all, so 
this has been a new experience for us.

We can easily track bytes/second, but a process doing random I/O may move 
very few bytes/second and still occupy much of the fileserver's capacity, 
so it hasn't been fruitful to identify the processes doing the most reads 
and writes. During the period of overload, few disks showed read or write 
rates above the kilobytes/second range, yet iostat revealed that several 
disks were continuously at 100% utilization.
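
As a rough illustration of why a disk can be saturated at such low byte 
rates, here is a minimal sketch. The 10 ms service time and 4 KiB request 
size are assumed round numbers for a rotating disk, not measurements from 
this fileserver, and the function name is hypothetical.

```python
def random_io_throughput(service_time_ms, request_bytes):
    """Return (requests/s, bytes/s) for a disk that is busy 100% of the time.

    Assumes each random request costs one full service time (seek plus
    rotational latency plus transfer) and requests are served one at a time.
    """
    requests_per_second = 1000.0 / service_time_ms
    return requests_per_second, requests_per_second * request_bytes


# Assumed figures: 10 ms per random request, 4 KiB per request.
iops, bytes_per_second = random_io_throughput(10.0, 4096)
print(f"{iops:.0f} requests/s, {bytes_per_second / 1024:.0f} KiB/s")
# prints "100 requests/s, 400 KiB/s"
```

Under those assumptions a fully busy disk moves only about 400 KiB/s, 
which is why ranking processes by bytes/second can miss the culprit.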

A program such as iostat will tell us which physical disk is busy; lsof 
will tell us which files are open by which process; netstat and nfsstat 
will give aggregate statistics over all processes; but I can't find a 
program that will tell us which process is occupying the fileserver's 
attention with expensive requests.
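
One possible partial approach, sketched below, is to rank processes by 
the number of read/write calls they issue rather than by bytes. This is 
only a sketch under several assumptions: the jobs run on Linux client 
machines, the script runs there (not on the fileserver) with enough 
privilege to read other users' /proc/<pid>/io, and a high call count 
with few bytes per call is taken as a hint of small scattered requests. 
It counts system calls, so memory-mapped I/O and requests satisfied from 
the client's cache are not distinguished. The function names are 
hypothetical.

```python
import os
import time


def parse_proc_io(text):
    """Parse the 'key: value' lines of a /proc/<pid>/io file into a dict."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def snapshot():
    """Return {pid: counters} for every process whose io file is readable."""
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/io") as fh:
                result[int(entry)] = parse_proc_io(fh.read())
        except OSError:
            continue  # process exited, or we lack permission
    return result


def rank_by_requests(before, after):
    """Return (calls, bytes, pid) tuples, most read/write calls first."""
    rows = []
    for pid, now in after.items():
        old = before.get(pid)
        if old is None:
            continue  # process started after the first snapshot
        calls = sum(now.get(k, 0) - old.get(k, 0) for k in ("syscr", "syscw"))
        nbytes = sum(now.get(k, 0) - old.get(k, 0) for k in ("rchar", "wchar"))
        rows.append((calls, nbytes, pid))
    rows.sort(reverse=True)
    return rows


def main(interval=10, top=10):
    """Sample twice, `interval` seconds apart, and print the busiest pids."""
    first = snapshot()
    time.sleep(interval)
    for calls, nbytes, pid in rank_by_requests(first, snapshot())[:top]:
        average = nbytes / calls if calls else 0
        print(f"pid {pid}: {calls} calls, {average:.0f} bytes/call")
```

Calling main() on a client during a slow period lists the processes 
issuing the most calls. The counters cover all files, not only those on 
the fileserver, so lsof would still be needed to confirm which files a 
suspect process has open.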

We couldn't replace all the disks with SSD, but might be able to provide 
SSD for some files, if we could identify the culprits.

Daniel Feenberg


