[BBLISA] accounting for I/O

Aaron Macks upelluri at gmail.com
Thu Sep 1 17:11:24 EDT 2016


Assuming Linux, there's some I/O data in /proc/<PID>/io that could be of
use, though you'll need to do a bit of math, since it only reports
cumulative counters of bytes, read/write system calls, and characters.
If you can pin down the offending process, even if it is the server-side
process, that might be enough to follow it with strace and see which
files are the bad ones.
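
A minimal sketch of that math, assuming Linux with /proc mounted and root (or equivalent) access so other users' /proc/<PID>/io files are readable: take two snapshots a few seconds apart and rank processes by read+write system calls per second instead of bytes per second, with the average bytes per call next to each. Many calls with few bytes each is roughly what small random requests would look like. The syscr/syscw and rchar/wchar counters include every read()/write(), pipes and sockets too, so treat the ranking as a lead to chase with strace, not a verdict. It has to run on the machine where the suspect process lives, and the function names here are hypothetical, not from an existing tool.

```python
#!/usr/bin/env python3
"""Rank processes by read/write system calls per second from /proc/<PID>/io.

Sketch, assuming Linux with /proc mounted. Reading other users'
/proc/<PID>/io files normally requires root.
"""
import os
import time


def parse_io(text):
    """Parse the 'name: value' lines of /proc/<PID>/io into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            stats[name.strip()] = int(value)
    return stats


def snapshot():
    """Return {pid: counters} for every process whose io file is readable."""
    try:
        entries = os.listdir("/proc")
    except FileNotFoundError:
        return {}  # not Linux, or /proc not mounted
    result = {}
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/io") as f:
                result[int(entry)] = parse_io(f.read())
        except OSError:
            continue  # process exited, or no permission to read its io file
    return result


def rank(before, after, interval):
    """Return (pid, syscalls/s, bytes per syscall) tuples, busiest first.

    Uses syscr/syscw and rchar/wchar, which count every read()/write()
    call, including ones on pipes and sockets.
    """
    rows = []
    for pid, new in after.items():
        old = before.get(pid)
        if old is None:
            continue  # process started during the interval
        calls = (new["syscr"] - old["syscr"]) + (new["syscw"] - old["syscw"])
        chars = (new["rchar"] - old["rchar"]) + (new["wchar"] - old["wchar"])
        if calls > 0:
            rows.append((pid, calls / interval, chars / calls))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def main(interval=5.0, top=20):
    first = snapshot()
    time.sleep(interval)
    second = snapshot()
    print(f"{'PID':>7} {'calls/s':>10} {'bytes/call':>11}")
    for pid, rate, avg in rank(first, second, interval)[:top]:
        print(f"{pid:>7} {rate:>10.1f} {avg:>11.0f}")


if __name__ == "__main__":
    main()
```

A process near the top with a small bytes/call figure is the one worth attaching strace to.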

Aaron

On 9/1/16 5:07 PM, Daniel Feenberg wrote:
> 
> 
> On Thu, 1 Sep 2016, Rob Taylor wrote:
> 
>> Have you tried iotop?
>> It will tell you which processes are moving the most disk I/O at any
>> given instant.
>> Still might not get you what you want, but it might make it easier to
>> narrow down.
> 
> iotop moves the sequential-access processes to the top of the list,
> because a process doing sequential access moves more kilobytes/second
> than one doing random access (because of cache hits, among other
> reasons). Our problem program is not "top" in iotop.
> 
> Actually, knowing the file name would probably be just as good as
> knowing the process, since we could find the owner of the file and
> contact them.
> 
> dan feenberg
> 
>>
>> rgt
>>
>> Whitehead Network/System Administrator
>>
>> ----- On Sep 1, 2016, at 3:05 PM, Daniel Feenberg feenberg at nber.org
>> wrote:
>>
>>> Apparently heavy random I/O overloaded our fileserver last week, and
>>> response was very slow. We solved the problem with additional
>>> spindles, but we are curious to know which process is doing the
>>> random I/O. Perhaps we could approach that user with an offer to help
>>> improve their turnaround time by changing the code. Our users are
>>> mostly inexperienced students, so the possibility of suboptimal code
>>> is certainly there. Most usage is sequential access to very large
>>> files, which does not load the fileserver much at all, so this has
>>> been a new experience for us.
>>>
>>> We can easily track bytes/second, but a process doing random I/O may
>>> use very few bytes/second yet still occupy much of the fileserver's
>>> capacity, so it hasn't been fruitful to identify the processes doing
>>> the most reads and writes. During the period of overload, few disks
>>> were showing more than a few kilobytes/second of read or write, yet
>>> iostat revealed that several disks were continuously at 100%
>>> utilization.
>>>
>>> A program such as iostat will tell us which physical disk is busy,
>>> lsof will tell us which file is open by which process, and netstat and
>>> nfsstat will give aggregate statistics over all processes, but I can't
>>> find a program that will tell us which process is occupying the
>>> fileserver's attention with expensive requests.
>>>
>>> We couldn't replace all the disks with SSD, but might be able to provide
>>> SSD for some files, if we could identify the culprits.
>>>
>>> Daniel Feenberg
>>>
>>> _______________________________________________
>>> bblisa mailing list
>>> bblisa at bblisa.org
>>> http://www.bblisa.org/mailman/listinfo/bblisa
>>

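On Dan's point that the file name would be as useful as the process: once a suspect PID is known, /proc/<PID>/fd and /proc/<PID>/fdinfo (Linux 2.6.22 and later) give each open file's path and current offset. A rough sketch, assuming Linux and permission to inspect the process, is to sample those offsets repeatedly and count how often each file's offset moves backwards, since a purely sequential reader or writer only moves forward. It misses pread()/pwrite() and mmap() access, which don't move the file offset, and because it samples it can only hint; strace remains the thorough check. The helper names are hypothetical.

```python
#!/usr/bin/env python3
"""Sample one process's file offsets to spot files it seeks around in.

Sketch, assuming Linux (for /proc/<PID>/fdinfo) and permission to inspect
the target process. pread()/pwrite() and mmap() don't move the file
offset, so that kind of access is invisible here.
"""
import os
import sys
import time


def offsets(pid):
    """Return {fd: (path, pos)} for the process's descriptors that have a path."""
    result = {}
    fd_dir = f"/proc/{pid}/fd"
    for fd in os.listdir(fd_dir):
        try:
            path = os.readlink(os.path.join(fd_dir, fd))
            if not path.startswith("/"):
                continue  # skip sockets, pipes, anonymous inodes
            with open(f"/proc/{pid}/fdinfo/{fd}") as f:
                for line in f:
                    if line.startswith("pos:"):
                        result[int(fd)] = (path, int(line.split()[1]))
                        break
        except OSError:
            continue  # descriptor closed between listdir() and reading it
    return result


def backward_jumps(samples):
    """Count, per path, how often its offset moved backwards between samples."""
    counts = {}
    for prev, cur in zip(samples, samples[1:]):
        for fd, (path, pos) in cur.items():
            old = prev.get(fd)
            # Same descriptor, still the same file, and the offset went down.
            if old is not None and old[0] == path and pos < old[1]:
                counts[path] = counts.get(path, 0) + 1
    return counts


def main(argv):
    if len(argv) != 2:
        print(f"usage: {argv[0]} PID", file=sys.stderr)
        return 2
    pid = int(argv[1])
    samples = []
    for _ in range(20):  # about ten seconds of sampling
        try:
            samples.append(offsets(pid))
        except FileNotFoundError:
            break  # the process exited
        time.sleep(0.5)
    ranked = sorted(backward_jumps(samples).items(),
                    key=lambda kv: kv[1], reverse=True)
    for path, n in ranked:
        print(f"{n:5d}  {path}")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Given a PID as its only argument, it samples for about ten seconds and lists the files with the most backward jumps first; from there, finding the file's owner is an ls -l away.
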
-- 
_______________________________________________________
Aaron Macks (aaronm at wiglaf.org) [http://www.wiglaf.org/~aaronm]
My sheep has seven gall bladders, that makes me the King of the Universe!