Sunday, October 16, 2011

HTTP responses

When the web server receives a request, it processes it and sends back an HTTP response. The response does not differ much from the request: it contains the data the browser asked for, plus additional info about the message itself. Like the request, an HTTP response is structured in 4 parts:

1. Status line - Reports the outcome of the request: whether it succeeded, was invalid, or caused an error on the server.
2. List of HTTP headers - Headers give additional info about the message, such as its type and length.
3. Empty line - An empty line that separates the headers from the message body.
4. Message body (optional) - Contains the returned web page or another web resource such as an image or a stream.
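
To make the four parts concrete, here is a minimal Python sketch that splits a raw response into them. The function name parse_response and the shortened sample bytes are my own, made up for illustration. It is only a sketch: it ignores repeated headers such as Set-Cookie (the last one wins) and chunked bodies.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into status line parts, headers and body."""
    # The first empty line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: protocol version, numeric code, optional reason phrase.
    parts = lines[0].split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""

    # Header names are case-insensitive, so store them in lower case.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, code, reason, headers, body


# Hypothetical, shortened response used only for this illustration.
raw = (
    b"HTTP/1.1 302 Found\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)
version, code, reason, headers, body = parse_response(raw)
print(version, code, reason)
print(headers["content-type"])
print(body)
```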

Here is an example taken from http://web-sniffer.net/

Status: HTTP/1.1 302 Found  
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=40bebf53f84a37a4:FF=0:TM=1318795218:LM=1318795218:S=Uj37tfkySeEueQGz; expires=Tue, 15-Oct-2013 20:00:18 GMT; path=/; domain=.google.com
Date: Sun, 16 Oct 2011 20:00:18 GMT
Server: gws
Content-Length: 218
X-XSS-Protection: 1; mode=block
Connection: close

<HTML> 
 <HEAD> 
   <meta http-equiv="content-type" content="text/html;charset=utf-8">
   <TITLE>302 Moved</TITLE> 
 </HEAD>
 <BODY>
   <H1>302 Moved</H1>
      The document has moved
      <A HREF="http://www.google.de/">here</A>.
 </BODY> 
</HTML>
 
So the status line carries the status code of the response. The most important status codes
that a browser can receive are:

200 - OK - The request was successful; the requested content follows.
301 - Moved Permanently - The requested resource is now at a different location. The new URL is
returned in the Location header, and the browser should use the new URL from now on.
302 - Found - The requested resource is temporarily at a different URL, which is returned in the
Location header. The browser should keep using the original URL for future requests.
400 - Bad Request - The request sent by the browser was invalid (for example, wrong syntax).
403 - Forbidden - The browser tried to access a resource it has no permission for, for example
a protected file that this user is not allowed to read.
404 - Not Found - The requested resource cannot be found on the server.
500 - Internal Server Error - A problem occurred on the server while processing the request.
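
If you want to look these codes up in code, Python's standard library knows the reason phrases. The sketch below uses http.HTTPStatus; the helper name describe and the grouping by first digit (1xx to 5xx) are my own additions for illustration.

```python
from http import HTTPStatus

# The first digit of a status code tells which class of response it is.
KINDS = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return the code, its standard reason phrase and its class."""
    phrase = HTTPStatus(code).phrase  # raises ValueError for unknown codes
    return f"{code} {phrase} ({KINDS[code // 100]})"


for code in (200, 301, 302, 400, 403, 404, 500):
    print(describe(code))
```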

Now for the response headers. The most important response headers that a web
server can return are:

Date - Date and time of the response.
Content-Length - Length of the returned contents in bytes.
Content-Type - MIME type of the contents that follow.
Location - Alternative URL of the requested content, usually used with 301 and 302 status codes.
Server - Info about the web server, like type, name and version.
Set-Cookie - Request that a cookie be set on the requesting browser.
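
As a rough illustration of reading these headers, the sketch below feeds a header block to Python's email parser, which understands the same "Name: value" format. The function name summarize_headers is hypothetical, and the Location line in the sample block is added by me for illustration only.

```python
from email import message_from_string


def summarize_headers(header_block: str) -> dict:
    """Pick a few well-known response headers out of a header block."""
    msg = message_from_string(header_block)
    return {
        "mime_type": msg.get_content_type(),       # e.g. "text/html"
        "charset": msg.get_content_charset(),      # lower-cased, or None
        "length": int(msg.get("Content-Length", "0")),
        "server": msg["Server"],                   # lookup ignores case
        "location": msg["Location"],               # None when absent
    }


# Sample header block; the Location header is an assumption for the example.
header_block = (
    "Date: Sun, 16 Oct 2011 20:00:18 GMT\r\n"
    "Server: gws\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "Content-Length: 218\r\n"
    "Location: http://www.google.de/\r\n"
    "\r\n"
)
print(summarize_headers(header_block))
```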

One web page can have several other resources connected to it, like JavaScript files, CSS files,
images etc. So when one page is requested, usually several HTTP requests/responses are
needed in order to display the whole web page.
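
To see why one page leads to several requests, here is a small sketch that lists the extra resources a page refers to. The class name ResourceCollector and the sample page are made up for the example, and a real browser looks at many more tags than these three.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect URLs of scripts, images and stylesheets found in a page."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "img") and attrs.get("src"):
            self.resources.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.resources.append(attrs["href"])


# Hypothetical page used only for this illustration.
page = """<html><head>
<link rel="stylesheet" href="/style.css">
<script src="/app.js"></script>
</head><body><img src="/logo.png"></body></html>"""

collector = ResourceCollector()
collector.feed(page)
# Each entry would need its own HTTP request/response pair.
print(collector.resources)
```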

HTTP requests

HTTP stands for Hypertext Transfer Protocol. It is an application-layer protocol in the TCP/IP network stack, i.e. one of a set of networking protocols that includes TCP, IP, SMTP, FTP, ICMP, IGMP and others. It is used as the main protocol for exchanging hypermedia content like web pages, video, audio and images, and also for file sharing. Every system that sends a request over HTTP acts as an HTTP client, and the system that returns the response is called an HTTP server. Usually the client is a browser (Firefox, IE, Chrome) and the server is a web server such as Apache HTTP Server (also Glassfish for Java, IIS for .NET and PHP).

Every HTTP request consists of 4 parts:

1. Request line - Tells the web server which resource (URL) the browser is requesting, i.e. which web page, video or audio file the browser wants to display.
2. List of HTTP headers - Additional info about the request and about what the browser can accept in return, for example cookies, accepted character sets, or the page the user came from.
3. Empty line - An empty line that separates the headers from the message body.
4. Message body (optional) - May contain additional data, for example form data sent via the POST method.

Note: The request line and every header line must end with a carriage return character followed by a line feed character (CRLF), i.e. with the same "enter" sequence that ends a line and starts a new one.
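
The structure and the CRLF rule can be sketched in a few lines of Python. The function name build_request and its parameters are my own, and this only builds the bytes; it does not send anything over the network.

```python
CRLF = "\r\n"


def build_request(method, path, host, headers=None, body=b""):
    """Assemble request line, headers, empty line and optional body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        # A body needs Content-Length so the server knows where it ends.
        lines.append(f"Content-Length: {len(body)}")
    # Every line ends with CRLF; one extra CRLF marks the end of the headers.
    head = CRLF.join(lines) + CRLF + CRLF
    return head.encode("ascii") + body


request = build_request("GET", "/", "www.google.com", {"Connection": "close"})
print(request)
```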

Here is an example of an HTTP request message taken from http://web-sniffer.net/. You can try it yourself.

Connect to xxx.xxx.xxx.xxx on port 80 ... ok


GET / HTTP/1.1[CRLF]
Host: www.google.com[CRLF]
Connection: close[CRLF]
User-Agent: Web-sniffer/1.0.37 (+http://web-sniffer.net/)[CRLF]
Accept-Encoding: gzip[CRLF]
Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7[CRLF]
Cache-Control: no-cache[CRLF]
Accept-Language: de,en;q=0.7,en-us;q=0.3[CRLF]
Referer: http://web-sniffer.net/[CRLF]
[CRLF]
 
The request line consists of three parts:
1. Request method - In this case it is GET, but it can also be POST, HEAD, PUT, DELETE, TRACE,
CONNECT etc. See the Wikipedia page on HTTP for more info.
2. The URL of the resource - In this case it is a forward slash ("/"). We don't know the index page of
www.google.com, so the server resolves it on its own. Usually it would be something like /index.php.
3. The version of the HTTP protocol - In this case it is 1.1, which is the default for most modern browsers.
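
As a small sketch of how a server might take the request line apart, assuming the three parts are separated by single spaces as in the example above (the function name parse_request_line is hypothetical):

```python
def parse_request_line(line: str):
    """Split a request line into method, URL and protocol version."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        raise ValueError(f"malformed request line: {line!r}")
    method, url, protocol = parts
    # Strip the "HTTP/" prefix so only the version number remains.
    return method, url, protocol[len("HTTP/"):]


print(parse_request_line("GET / HTTP/1.1\r\n"))
```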
 
Below is a list of the most common HTTP headers that can be sent in an HTTP request:

-------------------------------------------------------------------------------------
Accept - List of MIME types that the browser will accept for the returned content.
Example:  Accept: text/html, application/xml, ...
-------------------------------------------------------------------------------------
Accept-Charset - a list of charsets that the browser will accept with the returned
content. Example: Accept-Charset: ISO-8859-1, utf-8
-------------------------------------------------------------------------------------
Accept-Encoding - a list of compression methods that the browser will accept from
the response. Example: Accept-Encoding: gzip,deflate
-------------------------------------------------------------------------------------
Accept-Language - a list of languages that the browser will accept for the received
content. Example: Accept-Language: en-gb, pt-br
-------------------------------------------------------------------------------------
Cookie - HTTP cookies previously set by the server (via Set-Cookie) and sent back by
the browser. Example: Cookie: name=John; surname=Doe
--------------------------------------------------------------------------------------
Host - The Host header is the only mandatory header in HTTP/1.1. It is mandatory because
most modern web servers can host several websites on the same machine, so the Host
header is needed to determine which website the response should come from.
Example: Host: www.google.com
-------------------------------------------------------------------------------------
Referer - The URL of the referring page, i.e. the page from which the new page was
reached. Web servers often log this to see where their visitors come from.
Example: Referer: http://www.web-sniffer.net/
------------------------------------------------------------------------------------
User-Agent - It holds info about the browser such as the type and current version.
Example: User-Agent: Mozilla/5.0 ...Gecko/xxxxxxx  Firefox/3.0.5 
------------------------------------------------------------------------------------
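
The ";q=0.7" parts in the example request above are quality values: they tell the server how much the browser prefers each item, with 1.0 assumed when no q is given. Here is a minimal sketch of reading such a list, with a function name (parse_quality_list) of my own choosing; it ignores any parameters other than q.

```python
def parse_quality_list(value: str):
    """Return (item, q) pairs from an Accept-* value, most preferred first."""
    items = []
    for part in value.split(","):
        token, _, params = part.strip().partition(";")
        params = params.strip()
        # An item without an explicit q value counts as q=1.0.
        q = float(params[2:]) if params.startswith("q=") else 1.0
        items.append((token.strip(), q))
    # sorted() is stable, so items with equal q keep their original order.
    return sorted(items, key=lambda item: item[1], reverse=True)


print(parse_quality_list("de,en;q=0.7,en-us;q=0.3"))
print(parse_quality_list("ISO-8859-1,UTF-8;q=0.7,*;q=0.7"))
```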

So that covers the basics of HTTP requests. In the next post I will explain HTTP responses in the same
way.