No subject


Mon Aug 9 12:12:47 EDT 2010


Hi Ian,

Though it may seem irrelevant, before answering your questions I'd like to
understand: what is your web content caching strategy?

With heavy-traffic web sites like the one you described, caching may
tremendously improve your performance at all tiers, and especially for
storage.

There are several caching options available: browser caching (Cache-Control
and Expires HTTP headers...), memory caching (mod_memcached, mod_mem_cache),
and caching proxies (nginx, varnish).

Static page caching is relatively easy to implement in the web server layer
without code modification; for dynamic resources, caching may require code
modification.

So, I'd strongly suggest exploring all your caching options and making a
decision on the storage upgrade after that.

To answer your questions:

> 1. Should we consider running a VM on this same server and host e.g. the
> web server on a VM which accesses files through the virtualization
> layer, rather than a physical network interconnect.

I'd recommend keeping storage separate from the web servers; it will let
your web tier scale, and it is also more secure that way.

> 2. What combination of network filesystem and local file system
> combination makes sense? (currently NFS + ext4 is on the cards)
> 3. Should we consider alternatives to GigE for interconnect.

It depends on several factors; one of the important ones is how the NFS
server is implemented on the storage side. On the client side, NFS + ext4
over a GigE interface is usually sufficient. There are several tools
available that let you simulate network traffic and measure performance, or
you can write your own script for basic measurements.

> 4. How can we estimate our IOPs and throughput requirements?

That's a tough one. Try collecting web server logs for several days and
calculating the total size of downloaded data per day. From those numbers
you can derive an approximate average throughput per hour, per second, etc.

That should give you some idea of your throughput requirements. Think about
expected web site usage growth to project potential throughput for the
future.
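If you go the write-your-own-script route, here is a rough sketch in Python.
It assumes an Apache-style common/combined access log, where the response
size in bytes is the 10th whitespace-separated field; adjust SIZE_FIELD if
your log format differs.

```python
#!/usr/bin/env python
# Rough throughput estimate from a web server access log.
# Assumes common/combined log format: the byte count is field 10
# (0-based index 9); a "-" in that field means no body was sent.
import sys

SIZE_FIELD = 9          # 0-based index of the response-size field
SECONDS_PER_DAY = 86400

def summarize(lines, days):
    """Return request count, total bytes, and per-second averages."""
    requests = 0
    total_bytes = 0
    for line in lines:
        fields = line.split()
        if len(fields) <= SIZE_FIELD:
            continue            # skip malformed lines
        requests += 1
        size = fields[SIZE_FIELD]
        if size.isdigit():      # "-" means no response body
            total_bytes += int(size)
    seconds = days * SECONDS_PER_DAY
    return {
        "requests": requests,
        "total_bytes": total_bytes,
        "avg_req_per_sec": requests / float(seconds),
        "avg_bytes_per_sec": total_bytes / float(seconds),
    }

if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: estimate.py access.log [days-covered-by-log]
    days = float(sys.argv[2]) if len(sys.argv) > 2 else 1.0
    with open(sys.argv[1]) as f:
        stats = summarize(f, days)
    print("%(requests)d requests, %(total_bytes)d bytes" % stats)
    print("avg %(avg_req_per_sec).2f req/s, %(avg_bytes_per_sec).1f bytes/s" % stats)
```

Keep in mind the result is an average; peak traffic can be an order of
magnitude higher, and you should size for the peaks, not the average.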

Regards,

- Eugene Gorelik

On Wed, Aug 11, 2010 at 1:55 PM, Ian Stokes-Rees <
ijstokes at crystal.harvard.edu> wrote:

>
> Diligent readers will recall the thread a few weeks ago on slow disk
> performance with a PATA XRaid system from Apple (HFS, RAID5).  Having
> evaluated the situation, we're looking to get a new file server that
> combines some fast disk with some bulk storage.  We have a busy web
> server that is mostly occupied with serving static content (read only
> access), some dynamic content (Django portal with mod_python/httpd), and
> then scientific compute users who do lots of writes (including a 100
> core cluster).
>
> We have about a $10k budget (ideally $8k).  The current plan looks
> roughly like this:
>
> AMD quad socket MB
> 1x12-core AMD CPU
> 8 GB RAM
> 2x160 GB 7200 RPM SATA drives for system software
> 11x300 GB 15000 RPM SAS2 fast storage (RAID10 + 1 hot swap, 1.5 TB volume)
> 5x2 TB 7200 RPM SATA drives (RAID10 + 1 hot swap, 4 TB volume)
>
> A 3U chassis will be filled, and the 4U chassis will have some empty bays.
>
> We can also upgrade processors and RAM as funds become available and the
> need arises.
>
> This will support a compute cluster (~100 cores), 10-20 users (typically
> 3-4 active), and a busy web server.
>
> Besides the obvious question of whether this setup is sensible/cost
> efficient (mixing two kinds of storage, etc.), the main unknowns we have
> are:
>
> 1. Should we consider running a VM on this same server and host e.g. the
> web server on a VM which accesses files through the virtualization
> layer, rather than a physical network interconnect.
>
> 2. What combination of network filesystem and local file system
> combination makes sense? (currently NFS + ext4 is on the cards)
>
> 3. Should we consider alternatives to GigE for interconnect.
>
> 4. How can we estimate our IOPs and throughput requirements?
>
> 5. Perspectives on SLC SSDs vs. SAS2 w/ 15k drives, since we could
> probably transfer the 11x300 GB SAS2 drive budget to a collection of
> SSDs and live with the reduced storage if that was expected to have a
> big performance benefit.
>
> Thanks in advance for any opinions on this.
>
> Ian
>
>
> _______________________________________________
> bblisa mailing list
> bblisa at bblisa.org
> http://www.bblisa.org/mailman/listinfo/bblisa
>
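P.S. One more concrete note on HTTP caching, since it can shave a lot of
read load off the storage tier: a minimal sketch for Apache httpd, assuming
mod_expires and mod_headers are loaded (the path and lifetime here are
illustrative only):

```apache
# httpd.conf fragment -- caching headers for static content
<Directory "/var/www/static">
    ExpiresActive On
    # Let browsers and proxies cache static assets for a week
    ExpiresDefault "access plus 7 days"
    Header append Cache-Control "public"
</Directory>
```

Anything under that directory is then served with Expires and Cache-Control
headers, so repeat visitors hit their browser cache instead of your disks.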



