PROJECT 3

 

                                                GROUP COLLABORATION TOOLS

 

PEER-TO-PEER NETWORKING - GNUTELLA      

 

                                               

Introduction

 

Our group was assigned “group collaboration” tools as our focus, including Yahoo! Groups, Groove, and the peer-to-peer networking programs Gnutella and Freenet.  This report focuses on Gnutella: what it is, its relation to the World Wide Web, how it works, and web sites offering additional information for further research.

 

What is Gnutella

 

Where did Gnutella come from? The evil geniuses at Nullsoft (creators of Winamp) first developed the protocol in late 1999. Since Nullsoft had recently been acquired by AOL (soon to be AOL-Time Warner), the problems that would arise were obvious. Nullsoft had to stop using the company’s resources to develop the technology because the record labels saw (and still see) it as a threat to their industry.

However, what the technology really accomplishes is not a threat to any industry; rather, it creates a revamped atmosphere on the Internet, enabling users to share information like never before. To put it simply, Gnutella puts the personal interaction back into the Internet through a protocol that allows peer-to-peer communication.  When you run Gnutella software and connect to the Gnutella network, you bring with you the information you want to make public, and you choose what information to share. You can share nothing, one file, a directory, or your entire hard drive (we do not recommend this option).  Because Gnutella runs over the Internet, you can connect directly with someone who is geographically far away just as easily as with your neighbor. Spreading the network across many independent hosts, with no central point of failure, is what makes it robust.


One of the major differences between Gnutella and Napster-like software is that the latter is centralized. Such applications depend on central servers, which give government agencies and others a single place to monitor what you search for.


Gnutella therefore lets you search for information with a degree of anonymity, and in a setting that differs from traditional search engines like Yahoo!: the information is not controlled or fed to you. Nothing is pushed at you; you control what you look for.

 

The Gnutella protocol restores the Web's original symmetry, enabling even transient computers to effectively participate as servers. It's far from a complete solution, and alternative systems may eclipse it. Nonetheless, this simple and idiosyncratic protocol is currently in the vanguard of the emergence of the transient Web. The transient Web has the potential to be every bit as disruptive as the conventional "permanent" Web, and possibly more so.

 

 

Gnutella’s Relation to the World Wide Web

 

With Gnutella, file transfer is accomplished via HTTP, the same protocol Web browsers and servers use to transfer Web pages and other data. In the background, each Gnutella application contains a Web-server component for serving files and a primitive browser-like element for retrieving them.

The relation between Gnutella and the Web is therefore quite simple: Gnutella hosts are Web sites, albeit transient ones, and downloading a file from a Gnutella host is technically equivalent to fetching a file from a web site. Most Gnutella applications combine server and client functionality into a package known as a servent.  Even users who share no files, and who instead simply use Gnutella to search and download, are running (empty) web sites while they run servents. This would be equivalent to running a web server each time you started your browser.
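As a rough illustration of the idea that each Gnutella application pairs a small web server with a small HTTP client, the following Python sketch serves an in-memory table of shared files and then fetches one back over HTTP. The `SHARED` table, the `/get/<name>` path, and the function names are assumptions made for this example; they are not the request format of any particular Gnutella application.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical shared files: file name -> contents.
SHARED = {"notes.txt": b"hello from a transient web site"}


class ServentHandler(BaseHTTPRequestHandler):
    """Server half: answers plain HTTP GET requests for shared files."""

    def do_GET(self):
        name = self.path.rsplit("/", 1)[-1]
        data = SHARED.get(name)
        if data is None:
            self.send_error(404, "not shared")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_servent(port=0):
    """Start the server half on a background thread (port 0 = any free port)."""
    server = HTTPServer(("127.0.0.1", port), ServentHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def download(host, port, name):
    """Client half: fetch a file from another host with an ordinary GET."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", f"/get/{name}")
        resp = conn.getresponse()
        if resp.status != 200:
            raise FileNotFoundError(name)
        return resp.read()
    finally:
        conn.close()


if __name__ == "__main__":
    server = start_servent()
    host, port = server.server_address
    print(download(host, port, "notes.txt"))
    server.shutdown()
    server.server_close()
```

Emptying the `SHARED` table gives the "empty web site" case described above: the server half still runs, but every request is answered with a 404.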

 

The transient nature of the sites is, of course, terribly important. As a protocol, Gnutella's specification of HTTP for file transfer is rather mundane. After all, users have always been able to run Web server programs on their PCs, but the problem is that these sites and their contents were impossible to discover by other users due to the transience of the PCs. What's novel is that the Gnutella protocol addresses the problems of how to discover and search transient Web sites.

 

Why these problems deserve addressing is an entirely different story. The key point is that Gnutella can be viewed as augmenting HTTP with an additional layer of intersite routing that enables transient Web sites to be found and searched.

 

This routing supports:

1.      the broadcast transmission of queries across transient sites and the routing back of responses;

2.      the broadcast transmission of "is anybody out there?" pings and the routing back of "I'm here" responses.

 

The Gnutella network is itself similar to the Web in many ways. Open and decentralized, there is no single responsible company, no central server and no single point of failure. Gnutella is a protocol for which many developers have created compatible code, and a Gnutella network exists only to the extent these programs are running and communicating with one another. There is a general public network, and private networks can co-exist in isolation or attached to the public one. Because of this, Gnutella application developers feel themselves akin to makers of Web browser, server, proxy and other applications. They are builders of interoperable software for a network larger than the sum of its parts. In fact, they are makers of transient-Web applications. The Gnutella network is simply one form of the transient web.

 

To better understand the transient web, let's contrast it with the conventional permanent Web on some key points. The web is “web-like” because of hyperlinks. Hyperlinks work under the assumptions that content remains accessible at a fixed URL and that a server specified by the URL is available to serve the content. Unfortunately, both assumptions fail on the transient web. The machine at a given IP address may not be there tomorrow, or in one hour or five minutes or one second. For this reason, with a few exceptions, you cannot browse your way from the permanent web to the transient web, nor will you find transient web sites indexed by conventional search engines.  In fact, the transient web is presently mostly devoid of the sense of "place" that dominates the vocabulary of the permanent web. Normally, we find sites specified by a location or address.  When we can't enter a fixed address or follow a static hyperlink, this sense of place disappears. An alternative sense of "medium" takes its place.

Instead of laboring to locate a particular site carrying a sought-after piece of content, we currently turn to the transient web primarily as a medium. A search goes out into the ether and answers come back. The simplicity of this is so desirable that search engines arose on the permanent web to provide precisely the same experience, although the execution is rather different.

 

 

Advantages and Disadvantages

 

On the transient web instantiated by Gnutella, a search engine is built into the infrastructure. Without it, no one would find anything. Another way to find transient sites and their content would be to maintain a resource registry - essentially a dynamic DNS that provides a basis for location-independent URLs instead of location-dependent URLs.  A single registry, however, implies centralization and an external dependency, which runs contrary to Gnutella's more decentralized approach with minimal reliance on outside systems.

 

The transient Web's "sense of medium" has a profound effect on marketing and distribution. As many dot-coms discovered on the conventional Web, just because you build it doesn't mean that they'll come - and if by good fortune they do come, you may not be able to handle the load. Promotional expense is required to expose your address and bring people to your site, and a key promotional tactic is to be listed as prominently as possible in search engines. In a system dominated by a sense of place, you must distribute as many signposts as possible, and reach doesn't come cheap.

Gnutella's search scheme makes the query stream a publicly accessible resource. By tuning in to the query signal on the ether, anyone can hear a torrent of broadcast searches and route back responses with results advertising their own content. In short, on the transient web enabled by Gnutella, reach is nearly free. Moreover, since content is more important than its source, users are willing to obtain it from almost any site.

 

Users who download content can easily become re-distributors, leading to the phenomenon known as "superdistribution." The upshot: On the transient web, distribution is almost free.

 

However, Gnutella's search capability is not perfect. From the point of view of the searcher, there is no guarantee your query will reach the sites holding what you seek, and the results that you do receive will arrive in a jumble. From the point of view of the content provider, there is no guarantee you will hear every query you're interested in hearing; maximum possible reach might still take some effort.

 

While Gnutella's search functionality makes queries public, it keeps them anonymous to a degree. Each query is assigned a unique ID at its source. As queries are handed from site to site across the network, each site keeps a temporary record in memory of which neighboring site handed it which query, but no record is passed of who originated the query. In this way, query responses have to route back through the chain, and only your immediate neighbors can correlate your IP address with your queries. The privacy of your queries is therefore dependent upon the hosts to which you are connected, which are likely to be operated by random users such as yourself. By contrast, on the permanent web the privacy of your queries is dependent upon the policies of the search engine you use.  The modest measure of anonymity afforded to queries does not extend to downloading. Some Gnutella applications, e.g. BearShare, keep logs in standard web server format of all file requests and downloads. This strengthens the concept of a running instance of BearShare being effectively equivalent to a transient web site, and it gives the BearShare operator as much visibility into his site's traffic as any web site operator has: time-stamped records of IP addresses and the files they requested and downloaded. Anonymity in downloading is strengthened to the degree transient web site operators do not keep or do not review logs, just as with conventional web sites.
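The temporary per-site record described above can be sketched as a small lookup table keyed by query ID. This is only an illustration under stated assumptions: the class and method names, the use of a random UUID as the unique ID, and the ten-minute retention time are all invented for the example. The point it shows is that a site stores only the neighbor that handed it a query, never the originator.

```python
import time
import uuid


def new_query_id():
    """Create a unique ID at the query's source (a random UUID is assumed)."""
    return uuid.uuid4().hex


class RouteTable:
    """Temporary, in-memory record of which neighbor handed us which query."""

    def __init__(self, ttl_seconds=600.0):
        self.ttl_seconds = ttl_seconds  # assumed retention time
        self._routes = {}               # query_id -> (neighbor, time recorded)

    def remember(self, query_id, neighbor, now=None):
        """Record the neighbor for a new query; return False if already seen."""
        now = time.monotonic() if now is None else now
        self._expire(now)
        if query_id in self._routes:
            return False
        self._routes[query_id] = (neighbor, now)
        return True

    def back_hop(self, query_id, now=None):
        """Return the neighbor a response should be handed back to, if known."""
        now = time.monotonic() if now is None else now
        self._expire(now)
        entry = self._routes.get(query_id)
        return entry[0] if entry else None

    def _expire(self, now):
        """Drop records older than the retention time."""
        stale = [q for q, (_, t) in self._routes.items()
                 if now - t > self.ttl_seconds]
        for q in stale:
            del self._routes[q]


if __name__ == "__main__":
    table = RouteTable()
    qid = new_query_id()
    table.remember(qid, "neighbor-1")
    print(table.back_hop(qid))
```

Because the table holds only the previous hop, a site several hops from the source cannot tell from its own records who originated a query; only the source's immediate neighbors see its address.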

 

Sharing files by hosting them on a transient web site is only somewhat more anonymous than doing so on a permanent web site. It is not difficult to detect and track new sites on the public network.  In fact, simply connecting a Gnutella application to the network will result in passive discovery of host addresses. BearShare, LimeWire and other servents support browsing of the content available on a given site, making it easy to see the entirety of a user's shared files. In the end, the anonymity enjoyed by the operator of a transient web site is no stronger than an ISP's records of and policies related to tracking which customers were assigned which IP addresses when. 

 

Gnutella applications could have security vulnerabilities - cookie and other files may be exposed due to bugs in servents - but no such faults have been identified thus far. This concern is a good argument for installing well-supported, well-tested Gnutella servents instead of unknown applications.

 

A scarcity of hyperlinks, a lack of sense of place, a built-in search engine, negligible marketing costs, negligible distribution costs, semi-anonymous broadcast querying, and downloading and sharing anonymity dependent on other users and ISPs: the transient web as realized through Gnutella certainly introduces some new wrinkles relative to the web we're used to.

 

 

How Gnutella Works

 

The first step in searching for transient web sites and sharing your files via Gnutella is to get connected to the Internet, whether through an Internet Service Provider, a local network, or other means.  Once connected, you have two choices for using the Gnutella protocol: download and install any of several available clients, or visit a Gnutella web search site like Gnutella.it.

 

Gnutella.it

 

Using this powerful search engine you will be able to search and download from the Gnutella network using a simple web interface.  Thanks to smart caching algorithms, the search results appear within seconds. 


The main window shows the results which are cached in the Gnutella.it database, and they represent the files found within the last few hours.

When you enter a query, a popup window opens and a real-time query is made.  The timeout of this query is set to 90 seconds in order to increase the likelihood of getting relevant results.  The "age" field indicates how long ago (in minutes) the file link was added to the cache. The smaller this value, the greater the probability that the link works.

 

The following is a view of the Gnutella.it main search screen, returning results for a search on “engineering” in alphabetical order.

[Figure: Gnutella.it main search screen showing results for “engineering”]

Gnutella Clients

 

There are several Gnutella clients available for free download; you can choose one based on your operating system and the level of functionality you want.  Each application contains a web server component for serving files and a primitive browser element for retrieving them.  Once a connection to the Internet is established, run the downloaded and installed application to begin using the Gnutella network.  The following is a list of client applications available via the Internet that use the Gnutella protocol and allow connection to the Gnutella network:

WINDOWS: Gnucleus, BearShare, Morpheus, Swapper, XoloX, LimeWire, Phex

MACINTOSH: LimeWire, Phex

UNIX: Gnewtellium, Gtk-Gnutella, Mutella, Qtella, LimeWire, Phex

An example of one of these client applications is Gnucleus.  Gnucleus provides four basic functions: a search page, an area to designate files to share over the Gnutella network, a file transfer menu, and an area to chat with others on the network.  Of the two screen prints below, the left figure shows the browser-like nature of this application, and the right figure shows the results of a search in Gnucleus.

[Figures: two Gnucleus screen prints; left, the browser-like interface; right, the results of a search]

Gnutella Searching/Query

 

The protocol for obtaining information over Gnutella is a kind of call-and-response that's more complex than simply pushing news or e-mail. Unlike a centralized network, the Gnutella network does not use a central server to keep track of all user files.  As a result, the Gnutella model enables file sharing without relying on intermediary servers that do not serve content themselves.

 

The figure below shows the operation of the protocol. Suppose site A asks site B for data matching "MP3." After passing back anything that might be of interest, site B passes the request on to its colleague at site C -- but unlike mail or news, site B keeps a record that site A has made the request. If site C has something matching the request, it gives the information to site B, which remembers that it is meant for site A and passes it through to that site.
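The exchange among sites A, B, and C can be sketched as a small in-process simulation. This is a simplified illustration, not the wire protocol: the `Site` class, its method names, the substring matching, and the direct method calls standing in for network messages are all assumptions made for the example.

```python
class Site:
    """One host in the sketch; neighbors are linked directly in memory."""

    def __init__(self, name, files):
        self.name = name
        self.files = files
        self.neighbors = []
        self.came_from = {}  # query_id -> neighbor that handed us the query
        self.results = {}    # query_id -> hits collected for our own queries

    def search(self, query_id, keyword):
        """Originate a query and hand it to every neighbor."""
        self.came_from[query_id] = None  # None marks "originated here"
        self.results[query_id] = []
        for neighbor in self.neighbors:
            neighbor.on_query(query_id, keyword, self)
        return self.results[query_id]

    def on_query(self, query_id, keyword, sender):
        """Answer with local matches, then pass the query onward."""
        if query_id in self.came_from:
            return  # already handled this query
        self.came_from[query_id] = sender  # remember only the previous hop
        hits = [f for f in self.files if keyword.lower() in f.lower()]
        if hits:
            sender.on_hits(query_id, self.name, hits)
        for neighbor in self.neighbors:
            if neighbor is not sender:
                neighbor.on_query(query_id, keyword, self)

    def on_hits(self, query_id, holder, hits):
        """Keep hits for our own query; otherwise hand them back one hop."""
        back = self.came_from.get(query_id)
        if back is not None:
            back.on_hits(query_id, holder, hits)
        elif query_id in self.results:
            self.results[query_id].append((holder, hits))


def connect(a, b):
    """Link two sites as neighbors in both directions."""
    a.neighbors.append(b)
    b.neighbors.append(a)


if __name__ == "__main__":
    a = Site("A", [])
    b = Site("B", ["talk.mp3"])
    c = Site("C", ["song.MP3", "paper.pdf"])
    connect(a, b)
    connect(b, c)
    for holder, hits in a.search("q1", "mp3"):
        print(holder, hits)
```

In this sketch site C never talks to site A: its record for the query points only at B, and B is the one that remembers the response is meant for A.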

 

The next diagram is another way of looking at the query “path” that Gnutella uses.  Although the first three computers do not have the information the user is looking for, they serve as a means of reaching the computer that does, using “pings” and “responses” from connected clients.

[Figures: operation of the Gnutella protocol among sites A, B, and C; the query path across connected clients]


Gnutella Searching Tips

 

Since the Gnutella protocol is different from a typical web search engine, there are a few guidelines to keep in mind when searching the Gnutella network:

 

o        use key words only

o        do not use asterisks, wild cards, dashes, commas, or periods

o        search results are immediate

o        queries are processed by everyone connected

o        new connections made while searching will also be searched and their results added

o        no re-search function is needed
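The first two tips above can be illustrated with a short Python sketch that reduces free-form input to plain key words before matching it against file names. How a real client matches queries is implementation specific; the `keywords` and `matches` functions, the set of characters stripped (including `?` as a wild card), and the substring matching rule are assumptions made for this example.

```python
import re


def keywords(query):
    """Reduce free-form input to plain lowercase key words."""
    # Replace asterisks, wild cards, dashes, commas, and periods with spaces.
    cleaned = re.sub(r"[*?,.\-]", " ", query)
    return [word.lower() for word in cleaned.split()]


def matches(filename, query):
    """True if every key word of the query appears in the file name."""
    words = keywords(query)
    name = filename.lower()
    return bool(words) and all(word in name for word in words)


if __name__ == "__main__":
    print(keywords("*.mp3, live-set"))
    print(matches("Live_Set_2001.MP3", "*.mp3, live-set"))
```

Entering key words directly avoids depending on this kind of cleanup, which a given client may or may not perform.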

 

 

Sites/Links/Resources and Additional Information

 

Gnutella Web Sites

 

www.gnutella.com

www.gnutella.it

www.openp2p.com

www.gnutella.wego.com

www.limewire.com/index.jsp/p2p

 

Gnutella FAQs

 

www.gnutellaforums.com

www.gnutellanews.com

www.gnutelliums.com

 

 

 

Conclusion

 

The key challenge of building wide area peer-to-peer systems is having a scalable and robust location service.  Gnutella provides this robustness as a broadcast-based, decentralized location service: a searching and discovery network that promotes free interpretation of and response to queries.  The easily available Gnutella clients and Gnutella-based web search sites return hits in the form of filenames, advertising, messages, URLs, graphics, and other arbitrary content, allowing discovery of the transient web.