[BBLISA] Help with .htaccess file

Doc Radium docradium at gmail.com
Tue Aug 3 21:34:01 EDT 2004


re: robots.txt,
cyveillance.com (a company in the business of finding violations of
service marks and trademarks in order to sell litigation opportunities
to rights holders),
netcraft.com (a company that will fingerprint your web server and its
content in order to build statistics on your web server software and
system architecture for others to market products at you),
nameprotect.com (similar to cyveillance),
iconsurf.com (no business model, they just show people icons harvested
off the web and present them in no particular context to draw people to
that web site willy nilly), and
Compass Communications (whois.sc? no idea what their business model is
other than possibly domain reselling) all ignore robots.txt.

Some of them say they honor robots.txt, but don't.
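
For the ones that ignore robots.txt, a server-side block in .htaccess is
one option. Below is a minimal sketch for Apache 1.3, assuming the host
allows AllowOverride FileInfo and Limit and has mod_setenvif loaded. The
User-Agent patterns and the variable name block_robot are placeholders I
made up, not the strings these crawlers actually send; check your access
logs for the real ones.

```apache
# Hypothetical .htaccess for a directory under public_html.
# Assumption: AllowOverride FileInfo Limit is in effect for this directory.
# The patterns below are placeholders, not verified crawler names.
SetEnvIfNoCase User-Agent "examplebot" block_robot
SetEnvIfNoCase User-Agent "anothercrawler" block_robot

# Allow is evaluated first, then Deny; a request matching both is refused.
Order Allow,Deny
Allow from all
Deny from env=block_robot
```

A crawler that lies about its User-Agent gets past this. In that case a
"Deny from" line with the address range seen in your logs can be added
under the same Order directive.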

On Tue, 3 Aug 2004 19:44:20 -0400 (EDT), Dean Anderson <dean at av8.com> wrote:
> On Mon, 2 Aug 2004, Scott Ehrlich wrote:
> 
> > I have my domain name hosted with pair.com, who uses apache 1.3.29 on
> > freebsd boxes.
> >
> > I want to either:
> >
> > 1) Use a .htaccess file to prevent any web crawlers/robots from gaining
> > access to one or more directories off my public_html folder
> 
> Web crawlers and robots can be cooperatively controlled with robots.txt.
> The format of robots.txt is standardized, but it is cooperative, in the
> sense that someone can ignore it. However, the search sites respect it.
> This isn't really security, so if you have something that must be
> strictly confidential, this isn't good enough. But if you just don't
> want the document showing up on Google, it's OK.
> 
> Caching is another issue. To stop web caches from caching documents you
> have to set an expiration date in the document's HTTP response headers
> (for example, Expires). I'd refer you to the Apache documentation on
> www.apache.org for instructions on how to do this.
> 
>                --Dean
> 
> 
> 



