Squid Proxy and Adzapper on Windows

Browsing the internet with a fast connection is a pleasure in itself for some people, including me. Opening a site without waiting for the page to load is always a dream. But sometimes we get into a bad mood when the website we want to open takes too long to load, because we have to wait for every element of the page (images, Flash (SWF) files, and other elements) to be downloaded to our computer.

A web page is made up of elements such as images, text, Flash (SWF) files, and others. When we open a web page, these elements are downloaded to our computer and placed in the temporary folder of the browser we use. We can enjoy the page fully only after all the elements that make it up have finished downloading.

We may be in the habit of visiting the same websites every day, especially now in the era of social communities: Facebook, Twitter, MySpace. We visit these websites just to post a status, comment on one, and so on. The same images and Flash files are downloaded every time we visit them. This is a routine we can build a strategy around.
A proxy is a tool we can use for that strategy. A caching proxy stores the images and other static content fetched from websites we have accessed before, so the browser no longer needs to download the same content from its origin. The proxy is also smart enough to recognize when a website has newer content; it then downloads the latest version and stores it in a place commonly called the cache.
Then there is advertising. Sometimes we are disturbed by ads floating on a website that we do not want to see, and those ads are usually built from large images. We can also use a proxy to block ads on a website. In addition, a proxy can be used to block whole sites.

On a company network, proxies are a necessity nowadays, at least to save bandwidth and to restrict access to the few websites considered likely to make employees forget about work.
Network administrators usually install the proxy on a Linux machine. If you want to do that, please search on Google; there are plenty of articles about proxies on Linux. This time we will try to install a proxy on Windows. OK, let's do it.

Installing Squid Proxy

1.    Download Squid from this link: http://squid.acmeconsulting.it/download/squid-2.6.STABLE23-bin.zip
2.    Extract it to C:\ so that the files end up in C:\Squid.
3.    In the C:\Squid\etc folder, rename the squid.conf.default file to squid.conf.
4.    In squid.conf, add the line http_access allow localhost directly under the line http_access allow localnet.
5.    Open the Windows console, go to C:\Squid\sbin\, and type the command squid.exe -z to create the cache directories, as in the picture below:

6.    Install Squid as a Windows service by typing this command in the console: squid.exe -i. Then check that Squid is listed as a Windows service, as in the picture below:

7.    Start the Squid service.

The next step is configuring the browser to use Squid as its web proxy. The steps are:
1.    In Firefox, click Tools and then Options. In the Options window that appears, open the Advanced menu, click the Network tab, and then click the Settings button, as in the picture below:


2.    In the Connection Settings window that appears after clicking the Settings button, change the connection settings as in this picture:


3.    Then browse as we usually do. The first visit to a site will feel the same as usual, because Squid is still storing the site's content in its cache; on later visits the browser will get the static content from the Squid cache.
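
To check that Squid is answering without relying on the browser, a request can also be sent through it from a small Java program. This is only a sketch under assumptions: it assumes Squid listens on localhost port 3128 (Squid's default http_port; adjust it if your squid.conf says otherwise), and the class name ProxyCheck and the test URL are made up for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

class ProxyCheck {

    // Builds an HTTP proxy description for the given host and port.
    static Proxy squidProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }

    // Sends one request through the proxy and returns the HTTP status code.
    static int fetchStatus(String address, Proxy proxy) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection(proxy);
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: Squid runs on this machine with its default port 3128.
        Proxy proxy = squidProxy("localhost", 3128);
        // Illustrative test URL; any plain HTTP page will do.
        String address = args.length > 0 ? args[0] : "http://squid-cache.org/";
        System.out.println("Status via proxy: " + fetchStatus(address, proxy));
    }
}
```

If the program prints a status code instead of a connection error, the request reached Squid, and it should also show up in Squid's access log.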

Installing Adzap
To block advertisements on the web we need a Squid plug-in, namely adzapper. The catch is that adzapper is a Perl script, which can only be executed in a Perl environment, so we need to install Perl on our Windows system. I chose Strawberry Perl as the Perl distribution for my Windows machine. Strawberry Perl can be downloaded from this URL: http://strawberry-perl.googlecode.com/files/strawberry-perl-5.12.1.0.msi
After the download finishes, install Strawberry Perl on your Windows machine. I installed Strawberry Perl in C:\Web\ on mine. Please note that every adzapper setting must match the location of your Perl installation.
The next step is to download the adzapper script from this URL: http://adzapper.sourceforge.net/scripts/squid_redirect. Open the link in your browser and, once the whole script text has appeared, save it to the C:\squid\etc\ folder under the name squid_redirect.pl.
The next step is hooking the adzapper plug-in into the Squid we installed earlier. Open the squid.conf file, located in the C:\Squid\etc\ folder, and add this line at the end of the file:

redirect_program C:/Web/strawberry/perl/bin/perl.exe c:/squid/etc/squid_redirect.pl

Please note that my Perl installation is in the C:\Web\ folder; adjust the setting to the path of your own Perl installation.
After that, restart Squid through the Windows service as in the earlier picture, or from the command line by typing squid.exe -k reconfigure. That completes the installation of the adzapper script in Squid. Now it is time to check whether adzapper is working properly. Open the browser and visit a website that you know is full of ads. If adzapper is working properly, it will block the ads, just like in the picture below.
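
To make it clearer what redirect_program does: Squid starts the program, writes one line per requested URL to its standard input, and reads one line back, either a replacement URL or an empty line meaning "leave the URL alone". The Java sketch below shows that idea only; it is not adzapper's actual logic. It assumes the classic helper line format in which the URL is the first whitespace-separated field, and the blocked host names and the placeholder URL are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class TinyRedirector {

    // Hypothetical ad hosts and placeholder image, for illustration only.
    static final Set<String> BLOCKED_HOSTS =
            new HashSet<String>(Arrays.asList("ads.example.com", "banners.example.net"));
    static final String PLACEHOLDER = "http://localhost/blank.gif";

    // Returns the replacement URL, or "" to keep the original URL.
    static String rewrite(String requestLine) {
        // Assumption: the requested URL is the first field of the line.
        String url = requestLine.trim().split("\\s+")[0];
        try {
            String host = URI.create(url).getHost();
            if (host != null && BLOCKED_HOSTS.contains(host.toLowerCase(Locale.ROOT))) {
                return PLACEHOLDER;
            }
        } catch (IllegalArgumentException e) {
            // Unparseable URL: leave it unchanged.
        }
        return "";
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(rewrite(line));
            System.out.flush(); // the caller waits for each answer
        }
    }
}
```

adzapper follows the same request-in, answer-out pattern, but with a long list of ad URL patterns instead of two fixed host names.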

There is actually a lot more we can do with Squid and adzapper to meet our web-caching needs. Please visit the official Squid site at http://squid-cache.org and the adzapper site at http://adzapper.sourceforge.net/ for guides on getting the most out of both.

Thanks

I hope this will be helpful.

 

Josescalia


Creating Java Download Application

In our internet activity, we often download or upload. Downloading is the activity of taking a file from a remote computer to our local computer.

Basically, when we download a file from a remote computer or server, our computer reads the file byte by byte. After all the bytes have been read, the computer packs them into a local file whose content is exactly the same as the file we wanted to download.

With this concept in mind, let's do a small experiment that implements the steps above. We can complete the experiment with further steps derived from what we know about the download process.

Let’s try to arrange the steps:

  1. URL identification.

To download a file we need a valid URL, including the path of the file we want to download. Example: http://www.wayofmuslim.com/ebook-islam/AlQuranDigital.zip. This means we want to download a file named AlQuranDigital.zip from the remote computer www.wayofmuslim.com, and the file is located under the path ebook-islam/.

  2. The size of the file.

We have to find out the size of each file we want to download. The purpose is to compare the number of bytes already read and stored on our local computer with the size of the file on the remote computer, so we can tell whether the downloaded bytes are corrupted or not before we pack them into a file.

  3. The content type of the file.

This one is optional; we can use it or not. But knowing the content type of a file is basic knowledge in internet technology.
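
As a small illustration of the first step, the sketch below splits the example URL into its host and path using java.net.URL. The class name UrlParts and the helper hostAndPath are made up for this example. The size and the content type are not part of the URL; the server reports them when we connect.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlParts {

    // Splits a URL into the remote host and the path of the file on that host.
    static String[] hostAndPath(String address) throws MalformedURLException {
        URL u = new URL(address);
        return new String[] { u.getHost(), u.getPath() };
    }

    public static void main(String[] args) throws MalformedURLException {
        String[] parts = hostAndPath("http://www.wayofmuslim.com/ebook-islam/AlQuranDigital.zip");
        System.out.println("host: " + parts[0]); // www.wayofmuslim.com
        System.out.println("path: " + parts[1]); // /ebook-islam/AlQuranDigital.zip
    }
}
```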

With those concepts we can create our own application that is able to download a file from a remote computer. Yes we can, because the download concept is that simple.

Now let's prove it by arranging a scenario based on the concepts above to create our own Java download application. The scenario is:

  1. Identifying the URL.

Check whether the supplied URL is valid or not; we will use the URL class that exists in Java.

  2. Valid URL.

If the URL is valid, go straight on to create an HTTP connection, which also tells us whether our computer is connected to the internet. If the URL is not valid, or the connection is not available, just print a message saying so and exit the application.

  3. Identifying the content type.

If the HTTP connection is good, meaning that the host or remote computer can be contacted, the next step is identifying the content type of the file we would like to download. In this try-it-out let's keep it limited: if the content type of the file is text/html, we don't download it. Why? When the file we want does not exist, the server usually answers with HTTP status 404 (Not Found) and an HTML error page, and we don't need that HTML page downloaded to our computer, right?

  4. Identifying the byte length (content length).

Because we will read the file byte by byte, it is very important to store the byte length of the file in a variable. Later this information is useful for comparing the number of downloaded bytes with the length of the file on the remote computer, to find out whether the downloaded file is corrupted or not.

  5. Reading byte by byte.

This is the essential part of our try-it-out. Our application must be able to read the file byte by byte and put the result into a variable of type byte array. The reading is done in a loop that runs from 0 up to the content length we stored in a variable earlier.

  6. Comparing the local file to the remote file.

The next step is comparing the number of bytes read with the content length stored earlier. If the two are not the same, the downloaded file is corrupted and the bytes will not be packed into a file.

  7. Packing the bytes that were read.

Once we know that the number of bytes read equals the byte length reported by the remote computer, it is time to pack them into a file. Let's keep it simple by giving the file exactly the same name as the file on the remote computer.

Using the above scenario, let's write the code, like the source code below:

package org.mojo.download.agent;

import java.io.IOException;
import java.io.InputStream;
import java.io.BufferedInputStream;
import java.io.FileOutputStream;
import java.net.URL;
import java.net.MalformedURLException;
import java.net.URLConnection;

/**
 * Created by IntelliJ IDEA.
 * User: Mojo
 * Date: May 2, 2009
 * Time: 11:39:49 AM
 * To change this template use File | Settings | File Templates.
 */
public class SingleDownloadAgent {

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("Usage : java SingleDownloadAgent <URL>" );
            return;
        } else {
            try {
                doDownload(args[0]);
            } catch (IOException e) {
                //report the actual cause instead of the literal text "Exception e"
                System.err.println("Download failed: " + e.getMessage());
            }
        }
    }
    public static void doDownload(String sURL) throws IOException {
       URL u = null;
        //try URL
        try {
            u = new URL(sURL);
        } catch (MalformedURLException ex) {
            System.err.println("Malformed URL : " + ex);
            return;
        }
        catch (IOException ex) {
            System.err.println("An Error Occured : " + ex);
            return;
        }
       //reading Connection
        URLConnection uc = null;
        try {
            uc = u.openConnection();
            //identifying connection
            uc.connect();
        } catch (IOException e) {
            System.out.println("Cannot Connect: Please Check Connection");
            return;
        }

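        String contentType = uc.getContentType();
        System.out.println("contentType :" + contentType);

        int contentLength = uc.getContentLength();
        //getContentLength() returns -1 when the server does not report the size
        if (contentLength == -1) {
            throw new IOException("Content length is unknown.");
        }
        //getContentType() may return null, so check before calling startsWith
        if (contentType != null && contentType.startsWith("text/html")) {
            throw new IOException("This is html file.");
        }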

        //collecting byte in var data
        InputStream raw = uc.getInputStream();
        InputStream in = new BufferedInputStream(raw);
        byte[] data = new byte[contentLength];
        int bytesRead = 0;
        int offset = 0;
        while (offset < contentLength) {
            bytesRead = in.read(data, offset, data.length - offset);
            if (bytesRead == -1) break;
            offset += bytesRead;
        }
        in.close();

        //file corrupted
        if (offset != contentLength) {
            throw new IOException("Only read " + offset + " bytes; Expected " + contentLength + " bytes. File Corrupted...");
        }

        //writing byte data to a file
        String filename = u.getFile();
        filename = filename.substring(filename.lastIndexOf('/') + 1);
        FileOutputStream fout = new FileOutputStream(filename);
        fout.write(data);
        fout.flush();
        fout.close();

    }
}

In the above source code we have two methods: the main method and the doDownload method, which takes the string sURL as its parameter. Let's discuss the source code piece by piece:

As usual, at the beginning of the code we declare the package where this class is located and import the classes we will need later. Then in the main method we check whether a URL parameter was supplied when the class was called. Indeed, this application is designed so that the URL parameter must be supplied when it is called.

package org.mojo.download.agent;

import java.io.IOException;
import java.io.InputStream;
import java.io.BufferedInputStream;
import java.io.FileOutputStream;
import java.net.URL;
import java.net.MalformedURLException;
import java.net.URLConnection;

/**
 * Created by IntelliJ IDEA.
 * User: Mojo
 * Date: May 2, 2009
 * Time: 11:39:49 AM
 * To change this template use File | Settings | File Templates.
 */
public class SingleDownloadAgent {

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("Usage : java SingleDownloadAgent <URL>" );
            return;
        } else {
            try {
                doDownload(args[0]);
            } catch (IOException e) {
                System.err.println("Download failed: " + e.getMessage());
            }
        }
    }
………

This main method shows that if the URL parameter is not supplied, the application prints its usage message and quits immediately. If the URL parameter is supplied, the application goes on to call the doDownload method with args[0] as its parameter.

Now let's take a look at the second method, doDownload:

   ………
   public static void doDownload(String sURL) throws IOException {
       URL u = null;
        //try URL
        try {
            u = new URL(sURL);
        } catch (MalformedURLException ex) {
            System.err.println("Malformed URL : " + ex);
            return;
        }
        catch (IOException ex) {
            System.err.println("An Error Occured : " + ex);
            return;
        }
    ………

In these lines we run the first step of our scenario, validating the URL, and we put it in a try-catch block. Why a try-catch block? Because if an error occurs, we want to know where it happened.
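
To see which inputs this validation actually rejects, here is a small sketch. The class name UrlCheck and the helper isWellFormed are made up for illustration. Note that java.net.URL only checks the syntax and the protocol; it does not contact the host, which is why the connection is tested separately in the next step.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlCheck {

    // True when java.net.URL accepts the string; the host is not contacted.
    static boolean isWellFormed(String address) {
        try {
            new URL(address);
            return true;
        } catch (MalformedURLException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Accepted: protocol, host and path are all present.
        System.out.println(isWellFormed("http://www.wayofmuslim.com/ebook-islam/AlQuranDigital.zip")); // true
        // Rejected: no protocol.
        System.out.println(isWellFormed("www.wayofmuslim.com/AlQuranDigital.zip")); // false
        // Rejected: unknown protocol "htp".
        System.out.println(isWellFormed("htp://www.wayofmuslim.com/AlQuranDigital.zip")); // false
    }
}
```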

…………
        //reading Connection
        URLConnection uc = null;
        try {
            uc = u.openConnection();
            //identifying connection
            uc.connect();
        } catch (IOException e) {
            System.out.println("Cannot Connect: Please Check Connection");
            return;
        }

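        String contentType = uc.getContentType();
        System.out.println("contentType :" + contentType);

        int contentLength = uc.getContentLength();
        //getContentLength() returns -1 when the server does not report the size
        if (contentLength == -1) {
            throw new IOException("Content length is unknown.");
        }
        //getContentType() may return null, so check before calling startsWith
        if (contentType != null && contentType.startsWith("text/html")) {
            throw new IOException("This is html file.");
        }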
…………

After that, in the lines above, we check the connection to the host or remote computer, once again inside a try-catch block. Next we find out the content type of the file we want to download and put it in a String variable called contentType. Then we get the content length of the file and put it in an int variable. The lines that follow branch on these values: if the content type of the file is text/html, or the content length is unknown (-1), the method throws an IOException and the download stops.
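
The decision can also be pulled out into a small helper that is easy to try with different values. This is a sketch, not part of the application: the names ContentCheck and shouldDownload are made up, and treating a missing content type as "go ahead" is an assumption of this example. It takes into account that URLConnection.getContentType() may return null and that getContentLength() returns -1 when the length is not known.

```java
class ContentCheck {

    // contentType and contentLength are the values returned by
    // URLConnection.getContentType() and URLConnection.getContentLength().
    static boolean shouldDownload(String contentType, int contentLength) {
        if (contentLength < 0) {
            return false; // the server did not report a size
        }
        if (contentType == null) {
            return true; // assumption of this sketch: no type reported, try anyway
        }
        // Case-insensitive check that the type starts with "text/html".
        String html = "text/html";
        return !contentType.trim().regionMatches(true, 0, html, 0, html.length());
    }

    public static void main(String[] args) {
        System.out.println(shouldDownload("application/zip", 1024));          // true
        System.out.println(shouldDownload("text/html; charset=UTF-8", 1024)); // false
        System.out.println(shouldDownload("application/zip", -1));            // false
    }
}
```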

    …………
        //collecting byte in var data
        InputStream raw = uc.getInputStream();
        InputStream in = new BufferedInputStream(raw);
        byte[] data = new byte[contentLength];
        int bytesRead = 0;
        int offset = 0;
        while (offset < contentLength) {
            bytesRead = in.read(data, offset, data.length - offset);
            if (bytesRead == -1) break;
            offset += bytesRead;
        }
        in.close();

        //file corrupted
        if (offset != contentLength) {
            throw new IOException("Only read " + offset + " bytes; Expected " + contentLength + " bytes. File Corrupted...");
        }
    …………

In the lines above we can see our application reading byte by byte. It opens the input stream and reads from it in a loop, using the content length as the limit, and puts what it reads into the data variable, which is a byte array. After reading is finished the input stream is closed, and then a statement compares the number of bytes read, stored in the offset variable, with the contentLength variable.
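
The loop is needed because a single read call may return fewer bytes than we asked for. The sketch below repeats the same loop against an in-memory stream so it can be tried without a network connection; the names ReadFully and readExactly are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadFully {

    // Reads exactly expectedLength bytes, or throws if the stream ends early.
    static byte[] readExactly(InputStream in, int expectedLength) throws IOException {
        byte[] data = new byte[expectedLength];
        int offset = 0;
        while (offset < expectedLength) {
            int bytesRead = in.read(data, offset, expectedLength - offset);
            if (bytesRead == -1) break; // end of stream
            offset += bytesRead;
        }
        if (offset != expectedLength) {
            throw new IOException("Only read " + offset + " bytes; expected " + expectedLength);
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = { 10, 20, 30, 40, 50 };
        // The stream holds 5 bytes and we expect 5: the read succeeds.
        System.out.println(readExactly(new ByteArrayInputStream(source), 5).length);
        // The stream holds 5 bytes but we expect 8: treated as a corrupted download.
        try {
            readExactly(new ByteArrayInputStream(source), 8);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```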

…………
        //writing byte data to a file
        String filename = u.getFile();
        filename = filename.substring(filename.lastIndexOf('/') + 1);
        FileOutputStream fout = new FileOutputStream(filename);
        fout.write(data);
        fout.flush();
        fout.close();

    }
…………

And this is the last piece of code we created. Here we pack the bytes that we read and stored in the data variable into a file. To name the file, we simply use the same file name as the file we downloaded.
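
One thing to keep in mind is that getFile() also returns the query string when the URL has one, and that a URL ending in a slash leaves an empty name. A possible variation, shown here only as a sketch, uses getPath() and a fallback name; the names FileNames and fileNameFor, the fallback "download.bin" and the example.com URLs are made up for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;

class FileNames {

    // Takes the last segment of the URL path, or the fallback when it is empty.
    static String fileNameFor(URL u, String fallback) {
        String path = u.getPath(); // unlike getFile(), this excludes the query string
        String name = path.substring(path.lastIndexOf('/') + 1);
        return name.isEmpty() ? fallback : name;
    }

    public static void main(String[] args) throws MalformedURLException {
        System.out.println(fileNameFor(
                new URL("http://www.wayofmuslim.com/ebook-islam/AlQuranDigital.zip"), "download.bin"));
        System.out.println(fileNameFor(
                new URL("http://example.com/files/report.pdf?version=2"), "download.bin"));
        System.out.println(fileNameFor(new URL("http://example.com/"), "download.bin"));
    }
}
```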

That's it for our try-it-out this time. There is still a lot to explore from this little experiment: we could build a GUI for it, add a progress bar animation, and so on. Please explore the concept further to create something more complete.
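
As one possible starting point for the progress idea, here is a sketch that copies a stream in fixed-size chunks and prints the percentage as it goes. It is a different technique from the listing above: instead of holding the whole file in a byte array, it writes each chunk out immediately, so large files do not need to fit in memory. The names ProgressCopy and copyWithProgress are made up; in a real download the input would be uc.getInputStream() and the output a FileOutputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class ProgressCopy {

    // Copies in to out in 8 KB chunks, printing progress; returns the bytes copied.
    static long copyWithProgress(InputStream in, OutputStream out, long expectedLength)
            throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int lastPercent = -1;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
            if (expectedLength > 0) {
                int percent = (int) (total * 100 / expectedLength);
                if (percent != lastPercent) { // print only when the value changes
                    System.out.println(percent + "%");
                    lastPercent = percent;
                }
            }
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for the network stream and the target file.
        byte[] source = new byte[20000];
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        long copied = copyWithProgress(new ByteArrayInputStream(source), target, source.length);
        System.out.println("Copied " + copied + " bytes");
    }
}
```

The returned byte count can be compared with the content length in the same way as before to detect an incomplete download.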

 

 

Hope it will be useful.

Menteng, March 3rd 2009

 

Josescalia.