Robots.txt Tutorial

IMPORTANT: be very careful when playing about with your robots.txt; it’s very easy to block off your entire site without realising!

What is a Robots.txt?

A robots.txt is a file used by webmasters to advise spiders and bots (e.g. Googlebot) where in the website they are allowed to crawl. The robots.txt is stored in a website’s root folder, for example: http://www.domain.com/robots.txt

When should I use Robots.txt?

Your robots.txt should be used to help bots like Googlebot crawl your website and to tell them where they aren’t supposed to go.

DO NOT use a robots.txt to try and block scrapers or other heavy-handed crawlers. At the end of the day it’s up to the bot whether it respects your robots.txt or not, so in all likelihood it won’t even get read by these crawlers.

Another important thing worth mentioning is that anyone can see your robots.txt. So bear this in mind when you are writing it; you don’t want to include anything like: Disallow: /my-secret-area/

How does it all work?

The best way to learn how it works is probably to look at some examples. In its simplest form the contents of a robots.txt might look like this:

User-agent: *
Disallow:

In this example the definition is saying: for ALL (*) User-agents, Disallow nothing, i.e. feel free to crawl anywhere you want.

To do the opposite and block EVERYTHING you would use:

User-agent: *
Disallow: /
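If you want to check how a parser reads these two files before uploading anything, here is a minimal sketch using Python’s standard urllib.robotparser module. The can_crawl helper, the "AnyBot" user agent and the page URL are made up for illustration, and Python’s parser is only one interpretation of the rules; real crawlers may differ.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, user_agent, url):
    """Parse robots.txt content from a string and ask whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The two examples from above, as strings.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# "AnyBot" and the URL are hypothetical values for the demonstration.
print(can_crawl(ALLOW_ALL, "AnyBot", "http://www.domain.com/page.html"))  # True
print(can_crawl(BLOCK_ALL, "AnyBot", "http://www.domain.com/page.html"))  # False
```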

You can also use the ALLOW property which works in the opposite way.

ALLOW EVERYTHING

User-agent: *
Allow: /

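AN EMPTY ALLOW DOES NOT BLOCK EVERYTHING

User-agent: *
Allow:

Be careful here: an empty Allow rule matches nothing, so this does NOT block your site. To block everything, use Disallow: / as shown earlier.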

Googlebot and My Robots.txt

You can specify a rule for just Googlebot by using the User-agent property.

BLOCK GOOGLEBOT FROM ACCESSING YOUR SITE

User-agent: Googlebot
Disallow: /

BLOCK GOOGLEBOT-IMAGE FROM ACCESSING YOUR SITE

User-agent: Googlebot-Image
Disallow: /

Google (and some other bots) respect the * wildcard character. This can be very helpful for blocking off areas which contain similar URL parameters. The robots example below tells Googlebot NOT to access any URL with a ? in it.

User-agent: Googlebot
Disallow: /*?
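To see which URLs a wildcard rule like this would catch, here is a rough sketch that turns a pattern into a regular expression. It assumes the simple reading where * matches any run of characters and the rule is matched from the start of the path. The function names are made up, and this is not how Googlebot itself is implemented.

```python
import re


def pattern_to_regex(pattern):
    """Convert a robots.txt path pattern into a regex, treating '*' as 'any characters'."""
    # Escape each literal piece, then join the pieces with '.*' where the '*' was.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body)


def is_blocked(disallow_pattern, url_path):
    """Return True if url_path matches the Disallow pattern from its first character."""
    return pattern_to_regex(disallow_pattern).match(url_path) is not None


print(is_blocked("/*?", "/shop?colour=red"))  # True, the URL contains a ?
print(is_blocked("/*?", "/shop/"))            # False, no ? anywhere
```

Note that Python’s own urllib.robotparser does not handle wildcards, which is why this sketch does the matching by hand.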

Please note, in the past I’ve noticed that defining a Googlebot user agent rule caused Googlebot to completely ignore all other User-agent: * rules… I don’t know whether they have changed this yet but it’s worth bearing in mind.

[UPDATE] WARNING: If you have both a ‘User-agent: *’ group AND a ‘User-agent: Googlebot’ group, Googlebot will ignore everything you defined in the * group! I don’t think many people realise this so be very careful. Remember, ALWAYS test your changes using Google Webmaster Tools’ robots.txt tool. If you don’t have an account for Google Webmaster Tools, GET ONE NOW!

[UPDATE END]
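The behaviour in the warning above can be reproduced with Python’s standard urllib.robotparser, which also picks the most specific matching group and only falls back to * when nothing else matches. This is a sketch only: the paths, the "OtherBot" name and the can_crawl helper are made up for the demonstration, and it shows how Python’s parser behaves rather than proving anything about Googlebot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a general group and a Googlebot-only group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /tmp/
"""


def can_crawl(user_agent, url):
    """Ask the parser whether user_agent may fetch url under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Googlebot only follows its own group, so the * rule for /private/ is ignored.
print(can_crawl("Googlebot", "http://www.domain.com/private/page.html"))  # True
print(can_crawl("Googlebot", "http://www.domain.com/tmp/page.html"))      # False

# Any other bot falls back to the * group.
print(can_crawl("OtherBot", "http://www.domain.com/private/page.html"))   # False
```

If you want Googlebot to stay out of /private/ as well, the rule has to be repeated inside the Googlebot group.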

The X-Robots-Tag HTTP header

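You can also define your robots rules at page level, either with a robots meta tag in the page or with the X-Robots-Tag HTTP header, which accepts the same values. The meta tag versions are shown below. The first of these examples is basically the default a bot would assume if no definition was provided…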

<meta content="index, follow" name="robots" /> (default - no robots tag)
<meta content="noindex, follow" name="robots" />
<meta content="index, nofollow" name="robots" />
<meta content="noindex, nofollow" name="robots" />

Please note, the NOFOLLOW in these examples should not be confused with the rel="nofollow" of links.
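If you want to check what a page is actually telling bots, here is a small sketch that pulls the robots meta tag out of some HTML using Python’s standard html.parser. The class and function names are made up, and it only understands the four combinations shown above (it ignores other values such as "none").

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from a <meta name="robots"> tag, if one is present."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives = {d.strip().lower() for d in content.split(",") if d.strip()}


def robots_policy(html):
    """Return the index/follow policy, defaulting to 'index, follow' when no tag is found."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return {
        "index": "noindex" not in parser.directives,
        "follow": "nofollow" not in parser.directives,
    }


print(robots_policy('<meta content="noindex, follow" name="robots" />'))
# {'index': False, 'follow': True}
```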

Enabling Short Tags in PHP

In this tutorial we look at how you enable short tags in PHP. Short tags, or short_open_tag as it is known in php.ini, give you the option to use short style opening tags for PHP code blocks e.g. <? instead of <?php.

To enable this feature on your server, open your php.ini file (this should be somewhere in your PHP install folder e.g.  /bin/php/phpx.x.x/php.ini) and change the short_open_tag setting to:

short_open_tag=On

Please note, it is generally not advised to use short tags, as this can lead to problems later when migrating to a server that doesn’t have short_open_tag enabled. Personally I don’t like this feature; it seems good at the time but trust me, there’ll be a time when it comes back to haunt you.

MySQL – SELECT * FROM table WHERE date = TODAY

The Quick answer is:

SELECT * FROM myTable WHERE DATE(myDate) = DATE(NOW())

I was tearing my hair out trying to figure this out the other morning and it’s really quite simple. I thought I might have to use a day, month and year match or possibly use PHP, but thankfully the guys at MySQL included this nice DATE() function, which means you don’t have to worry about the hours, minutes and seconds being different. Simples!

MySQL DATE / TIME Functions

The functions used in this MySQL query are:

* DATE() returns the date without time
* NOW() returns the current date & time (note we’ve used the DATE() function in this query to remove the time)

For more information, see the Date and Time functions documentation on the official MySQL site.
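One thing to be aware of: wrapping the column in DATE() generally stops MySQL using an ordinary index on myDate, so on a big table you may prefer to compare against the start of today and the start of tomorrow instead. Here is a minimal Python sketch of that idea. The today_bounds helper is made up, the %s placeholders assume a typical MySQL driver, and no database connection is made here.

```python
from datetime import date, datetime, time, timedelta


def today_bounds(today):
    """Return (start of the given day, start of the next day) as datetimes."""
    start = datetime.combine(today, time.min)
    return start, start + timedelta(days=1)


# Same table and column names as the query above; the range is half-open,
# so anything from 00:00:00 up to (but not including) midnight tomorrow matches.
QUERY = "SELECT * FROM myTable WHERE myDate >= %s AND myDate < %s"

start, end = today_bounds(date.today())
print(QUERY, (start, end))
```

You would pass QUERY and the (start, end) pair to your database driver’s execute call.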

CSS: Horizontal Centering Jumps in Firefox

This is one of those situations I think most developers find themselves running into somewhere down the line. You’ve created your CSS template using margin:0 auto and everything looks great in the different versions of IE, Firefox, Opera, Safari etc, but for some reason your centering starts to jump all over the place when you click between various pages in Firefox. What’s going on here then?…

The answer is quite simple: some browsers like Firefox don’t display vertical scrollbars unless required, and as a result your template’s centering jumps around (by the width of the scrollbar, around 16px) when you navigate between a page that is tall enough to need scrollbars and one that isn’t. It’s obvious now, I hear you cry! Don’t worry, I did the same… Doh!

The Solution to Firefox Centering Issue

Don’t worry, there’s a simple solution to this centering issue and as usual, CSS comes to the rescue. The solution is to force vertical scrollbars.

There are actually a few different ways to fix this scrollbar issue but here’s my favourite:

html {
	overflow-y: scroll;
}

Last time I checked, this centering fix worked in Firefox, Safari, IE6 and Opera.

Mozilla Bespin – aka Code in the Clouds

Mozilla Labs are currently experimenting with a new online code editor they are developing called Bespin.

Bespin, also known as ‘Code in the Clouds’, uses HTML 5 to provide an online code editor environment for developers to code, store and share projects they are working on. Put simply, the concept is to code and access your work from any machine, similar to Google Documents I guess.

I’ve been having a little play around to see what all the fuss is about and I must admit I like the idea. However, there’s a LONG way to go until we can hang up our Notepad, Vim or Dreamweaver boots.

Bespin Video

Try out Bespin

Wanna give Bespin a try? Go to the Bespin site and try the demo for yourself. Note, you’ll need Firefox 3 or another web browser that supports HTML 5.

Tom’s Thoughts on Bespin

I love the concept of the whole thing; coding and accessing development projects from anywhere would be a great thing to have. The idea of open source and sharing code with other developers also gives hope. However, we need better functionality and IntelliSense-style code completion in the interface for lazy people like me! Full credit to Mozilla for the insight and effort behind this project, keep it up guys!

Google Launches Chrome Browser

This evening Google released a BETA version of their open source browser ‘Chrome’.

The browser has speed, security and a stable browsing experience as core features. I’ve only had a quick play but I must admit I like what I see so far and of course, most importantly, my site renders perfectly in it! 🙂

The browser is built on the open source WebKit rendering engine and has Google Gears built in. It also features Google’s own JavaScript virtual machine ‘V8’ to help handle today’s more powerful web applications and make developing them easier.

Like all modern browsers, Chrome is based around a tabbed browsing experience. However, Chrome runs each tab in its own process with separate memory, which aims to make the browser more stable and less prone to crashing.

As an open source browser, Chrome is clearly out there to compete with the likes of Firefox (Mozilla’s browser), which currently has a 20% market share and is a big hit with the geeks and the tech-savvy alike. With a brand like Google’s it’s going to be an interesting competition. My bet is, if Google can impress the geeks, Firefox is in trouble.

My verdict so far:

Pros:

  • The browser looks slick and makes the most of the screen.
  • Seems very fast
  • Straightforward to use
  • Search incorporated into the address bar
  • No problems with rendering sites so far
  • Good history features on homepage (when you can find it!)
  • Task Manager / the ability to end processes
  • A developers section (basic at the moment but hopefully this will develop)

Cons:

  • I’m not liking the flash you seem to get when navigating through a website with a non-white background. Watch out epileptics!
  • Obviously no plugins at the moment so no contest with Firefox, yet…
  • Very bare bones browser options at the moment.
  • So far I haven’t been able to find a home button! (UPDATE – found it under OPTIONS)

Google Comic for Chrome

Check out the comic they released to read a little more about Google’s take on Chrome.

Radiohead Videos

As I’m a big fan of Radiohead and their videos I’ve put together a selection of them for you all to marvel at. If you have never heard of Radiohead or not seen any of their videos, click play NOW!…

Radiohead – House Of Cards

This is Radiohead’s latest video ‘House of Cards’, which is off the album In Rainbows. This video was made without cameras or lights and uses 3D capture technology from Geometric Informatics and Velodyne LIDAR. Pretty mad but stunning stuff. There’s a video on the making of it which is also worth a watch.

Radiohead – Just

‘Just’ is from Radiohead’s 1995 album ‘The Bends’. Apparently the song is about one of Thom Yorke’s friends. It was directed by Jamie Thraves and is notorious for being a tease! Watch the video and you’ll see what I mean…

Radiohead – Street Spirit (Fade Out)

‘Street Spirit (Fade Out)’ is also taken from the 1995 album ‘The Bends’. The song was inspired by REM, and the video was directed by Jonathan Glazer. One of my personal favourites and one of the first songs I learned to play on the guitar! 🙂

Radiohead – Paranoid Android

Paranoid Android is track 2 on the 1997 album OK Computer. The title of the track refers to Marvin the Paranoid Android from the book The Hitchhiker’s Guide to the Galaxy. The video was made by Magnus Carlsson and features his animated character Robin. It was censored by MTV, who removed all nipples from the video and the scene at the end where he chops his arms and legs off!

Funny Squirrels

Squirrels / Squirrel Adverts

A collection of funny squirrel clips and everything that is squirrel:

Air Vigorsol squirrel advert:

The classic Carling Black Label advert (1989):

The more recent Carlsberg football squirrel advert:

Bud Light squirrel commercial (what is it with squirrels and alcohol adverts?):

Drunk / Pissed Squirrel

Triumph the Insult Comic Dog

Triumph ‘The Insult Comic Dog’ attends the premiere night of Star Wars – Attack of the Clones, interviewing the die-hard fans.

This was the first Triumph clip I saw and it has to be my favourite. ROTFLMAO at the Dragon Master, the Darth Vader rip and the female genitalia joke. Triumph, we salute you.

Triumph at the Westminster Dog Show. I’ve got a feeling this was one of the first things Triumph did; how they didn’t get arrested or something I have no idea!

Triumph goes to New Jersey to watch Bon Jovi and interview the fans in their home town.

I just love the way he rips the crap out of the Bon Jovi fans; some of these guys are serious stereotypes. Watch to the end to see Triumph do a bong and perform on stage with Bon Jovi. Fair play to the band for taking it on the chin.

Triumph interviews the American Idol hopefuls in Hawaii. I guess some of these guys deserve what they get.

How to display .htaccess in CuteFTP

This tutorial provides a step by step guide on how to set up CuteFTP to display your .htaccess file.

By default CuteFTP’s filtering system can hide the .htaccess file so it doesn’t appear within the list of files on your server. I have no idea why this is exactly, but I’m guessing CuteFTP is just trying to protect the configuration file from accidental mishaps.

You have two ways to enable the viewing of htaccess in CuteFTP:

Method One: Apply ‘On-The-Fly’ Server Side Filtering

  1. Right click in the root folder
  2. Select Filter
  3. Enable server side filtering
  4. Type the following filter command: -L-a
  5. Apply, then .htaccess should appear in the list of files

Method Two: Setup Site Manager

  1. Right click the site in Site Manager
  2. Select Properties
  3. Select the Actions tab
  4. Click the Filter button
  5. Type the following filter command: -L-a
  6. Save, then .htaccess should appear from then on

Things to Consider

As always, be careful when you’re messing around with stuff like this. The .htaccess file is an important configuration file which, if modified incorrectly or deleted, can completely kill the site. If you’re not sure what you’re doing, leave it alone.

Please note, not all hosts allow users to configure their sites through .htaccess. If you upload your .htaccess file and it doesn’t do anything, this is more than likely the problem; contact your host for further advice.

Another thing: .htaccess is an Apache configuration file and therefore will not work in IIS. You’ll need something like ISAPI_Rewrite for IIS.