Distribution

Peer networks can be used to deliver the services known as Content Distribution Networks (CDNs), essentially comprising the storage, retrieval and dissemination of information. Companies such as Akamai and Digital Harbour have already achieved significant success by installing their own proprietary models of this function at a global network level, yet the same functions can be delivered by networks of users, even where those users have only dial-up connections. Napster constituted the first instantiation of this potential, and subsequent generations of file-sharing technology have delivered important advances in the robustness and efficiency of such networks.

In order to understand the role that peers can play in this context, we must first examine the factors which determine data flow rates in the network in general. The slow roll-out of broadband connections to home users has concentrated much attention on the problem of the so-called 'last mile' of connectivity. Yet the connection between users and their ISP is but one of four crucial variables deciding the rate at which we access the data sought. Problems of capacity exist at multiple other points in the network, and as the penetration of high-speed lines into the 'consumer' population increases, these other bottlenecks will become more apparent.

If the desired information is stored at a central server, the first shackle on speed is the nature of the connection between that server and the internet backbone. Inadequate bandwidth, or attempts at access by an unexpected number of clients making simultaneous requests, will handicap transfer rates. This factor is known as the 'first mile' problem and is highlighted by instances such as the difficulty in accessing documentation released during the Clinton impeachment hearings and, more frequently, by the 'Slashdot effect'.

In order to reach its destination the data must flow across several networks, which are connected on the basis of what are known as 'peering' arrangements between the networks, facilitated by routers which serve as the interface. Link capacity tends to be under-provided relative to traffic, leading to router queuing delays. As the number of ISPs continues to grow, this problem is anticipated to persist, since whether links are established is essentially an economic question.

The third point of congestion is located at the level of the internet backbone, through which almost all traffic currently passes at some point. The backbone's capacity is a function of its cables and, more problematically, its routers. There is a mismatch between the growth of traffic and the pace of technological advance in router hardware and packet-forwarding software. As more data-intensive transfers proliferate, this discrepancy between demand and capacity is further exacerbated, leading to delays. Only after negotiating these three congestion points do we arrive at the delay imposed at the last mile.

What are the benchmarks used to evaluate Quality of Service? "Typically, QoS is characterized by packet loss, packet delay, time to first packet (time elapsed between a subscribe request send and the start of stream), and jitter. Jitter is effectively eliminated by a huge client side buffer [SJ95]." (Deshpande, Hrishikesh; Bawa, Mayank; Garcia-Molina, Hector, Streaming Live Media over a Peer-to-Peer Network)
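These four metrics are simple to compute given packet timestamps. A minimal sketch, using hypothetical send/receive times rather than data from any of the systems discussed here:

    # Illustrative computation of the four QoS metrics named above from
    # hypothetical packet timestamps (in seconds).
    def qos_metrics(sent, received, subscribe_time):
        """sent/received map packet sequence numbers to timestamps;
        packets absent from `received` are counted as lost."""
        loss = 1 - len(received) / len(sent)
        delays = [received[s] - sent[s] for s in sent if s in received]
        mean_delay = sum(delays) / len(delays)
        time_to_first_packet = min(received.values()) - subscribe_time
        # Jitter as mean deviation of packet delay; the large client-side
        # playout buffer mentioned above is what hides this variation.
        jitter = sum(abs(d - mean_delay) for d in delays) / len(delays)
        return loss, mean_delay, time_to_first_packet, jitter

    sent = {1: 0.00, 2: 0.02, 3: 0.04, 4: 0.06}
    received = {1: 0.11, 2: 0.16, 4: 0.15}   # packet 3 never arrives
    print(qos_metrics(sent, received, subscribe_time=0.0))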
Current Technologies

Current Implementations

1. Storage Service Providers

Notes: descriptions of Akamai FreeFlow; the hardware/software mix (algorithms plus machines); mapping server (fast check of hops to a region) and content server; http://www.wired.com/wired/archive/7.08/akamai_pr.html; Sandpiper applications.

Akamai: 13,000 edge servers located in network providers' data centers. Reported effects: click-through up 20%; abandonment rates 10-15%; order completion up 15%+; relief for overloaded web servers; reduced delays; first static content, now dynamic and customized (edge servers). fig. 1: traditional server vs. distributed server, illustrating the determinants of delivery speed: database/legacy ----- middleware ----- client browser, where the middle tier provides performance, security and simplification of client program operation. IRAC issue: cache management and TTL values. Issue: personalisation, cookie- and CMS-driven content.

Load Balancing: "Load balancing is a technique used to scale an Internet or other service by spreading the load of multiple requests over a large number of servers. Often load balancing is done transparently, using a so-called layer 4 router." [Wikipedia] LB appliances, LB software, LB intelligent switches, traffic distributors. (Supernodes: Gnucleus, BearShare and LimeWire are all compatible.) "Cisco (DistributedDirector), GTE Internetworking (which acquired BBN and with it Genuity's Hopscotch), and Resonate (Central Dispatch) have been selling such solutions as installable software or hardware. Digex and GTE Internetworking (Web Advantage) offer hosting that uses intelligent load balancing and routing within a single ISP. These work like Akamai's and Sandpiper's services, but with a narrower focus." - Wired

Data providers concerned to provide optimal delivery to end users are increasingly opting for specialist services such as Akamai to overcome these problems. Akamai delivers content faster through a combination of proprietary load-balancing and distribution algorithms and a network of machines installed across hundreds of networks, on which popularly requested data is cached (11,689 servers across 821 networks in 62 countries). This spread of servers obviates much congestion, as the data is provided from a server cache either on the requesting user's own network (bypassing the peering and backbone router problems and mitigating that of the first mile) or on the most efficient available network given load-balancing requirements.
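Akamai's mapping algorithms are proprietary, but the underlying problem of assigning requested objects to caches is classically illustrated by consistent hashing, a technique published by researchers associated with the company's founding. The sketch below is illustrative only (hypothetical server names, not Akamai's code); its virtue is that adding or removing an edge server remaps only a small fraction of URLs.

    import hashlib
    from bisect import bisect

    # Illustrative consistent-hash ring mapping URLs to edge caches.
    class CacheRing:
        def __init__(self, servers, points_per_server=100):
            self.ring = sorted(
                (int(hashlib.sha1(f"{s}#{i}".encode()).hexdigest(), 16), s)
                for s in servers
                for i in range(points_per_server))  # virtual points smooth load
            self.keys = [h for h, _ in self.ring]

        def server_for(self, url):
            h = int(hashlib.sha1(url.encode()).hexdigest(), 16)
            i = bisect(self.keys, h) % len(self.ring)  # next point clockwise
            return self.ring[i][1]

    ring = CacheRing(["edge-a.example", "edge-b.example", "edge-c.example"])
    print(ring.server_for("http://example.com/logo.gif"))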
File Sharing Technologies

Popular file-sharing utilities arose to satisfy a more worldly demand than the need to ameliorate infrastructural shortfalls. When Shawn Fanning released his Napster client, the intention was to allow end-users to share MP3 files by providing a centralised index of all songs available on the network at a given moment, together with the ability for users to connect to one another directly to receive the desired file. Essentially, popular file-sharing utilities enable content pooling. Napster's legal woes generated the publicity necessary to encourage user adoption and for new competitors to enter the market and innovate further. In the following section I describe some of the later generations of file-sharing software and chart the innovations which have brought them into a space of competition with Akamai et al.

The original Gnutella implementation has been credited to Justin Frankel and Tom Pepper, of a programming division of AOL (the then recently purchased Nullsoft Inc.), in 2000. On March 14th the program was made available for download on Nullsoft's servers, with the source code to be released later, supposedly under the GPL license. The event was announced on Slashdot, and thousands downloaded the program that day. The next day, AOL withdrew the program over legal concerns and restrained the Nullsoft division from doing any further work on the project. This did not stop Gnutella: after a few days the protocol had been reverse-engineered, and compatible open source clones started showing up. (from Wikipedia) [ENTER DESCRIPTION]

The greatest blind spot in McChesney's analysis, however, concerns his silence on the issue of intellectual property. He devotes a section of his internet chapter to examining the role played by traditional media manufacturers in determining the contours of the new landscape (their advertising forecasts, their partnerships for the distribution of music, their ownership of high-profile brands, etc.) without so much as mentioning the important evolution taking place in file-sharing technology, which is revolutionizing media distribution.

What began as a basically centralized model vulnerable to legal attack (Napster) has evolved through at least two further generations. The Gnutella network (BearShare/LimeWire) represents the first: a decentralized client-server application. This allows a much more robust network, in the sense that connectivity is not dependent on the legal health of a single operator. The trade-off is inefficiency in locating files and the problem of free-riding users, who actually impede the functionality of the system beyond simply failing to contribute material. LimeWire addresses this problem to some degree by providing the option to refuse downloads to users who do not share a threshold number of files (a minimal sketch of such a gate follows below). Unfortunately this cannot attenuate the problem of inefficient searches per se; it merely offers a disciplinary instrument to force users to contribute. In order to sharpen search capacities in the context of a problematic network design, these networks have taken recourse to nominating certain nodes as super-peers, by virtue of the large number of files they serve themselves. While essentially efficacious, the consequence is to undermine the legal robustness of the network. The threat is made clear in a paper published last year by researchers at PARC Xerox, which analyzed traffic patterns over the Gnutella network and found that one per cent of nodes were supplying over ninety per cent of the files. These users are vulnerable to criminal prosecution under the No Electronic Theft Act and the Digital Millennium Copyright Act. The music industry has been reluctant to invoke this form of action thus far, principally because of its confidence that the scaling problems of the Gnutella community limit the commercial harm it can inflict. As super-peering and the like become more effective, this may change. Another interesting attribute of the LimeWire system is the option it provides to set up virtual private networks, so that users can establish a perimetered community based upon their own social affinities. This is the nightmare of the IP police.
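The threshold rule mentioned above amounts to a one-line admission test at upload time. A minimal sketch of the policy (hypothetical names and default, not LimeWire's actual code):

    # Hypothetical sketch of a LimeWire-style upload gate: refuse uploads
    # to peers that advertise fewer than a threshold number of shared files.
    MIN_SHARED_FILES = 10  # assumed default; user-configurable in such clients

    def allow_upload(requester_shared_count, threshold=MIN_SHARED_FILES):
        """Serve the requesting peer only if it shares enough files."""
        return requester_shared_count >= threshold

    for peer, shared in [("leech", 0), ("contributor", 120)]:
        print(peer, "->", "serve" if allow_upload(shared) else "refuse")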
Third-generation file-sharing systems begin with the Freenet architecture outlined by Ian Clarke in 1999. Although the Freenet network has not achieved anything like the same adoption scale as other systems, its design characteristics set the standard which has been emulated by others, specifically those built on top of the FastTrack system. The crux of Freenet's genius is its adoption of 'small world' organization. This refers to the experiment carried out by Milgram in the 1960s, in which 160 people throughout the United States were given letters to be delivered to stockbrokers and asked to pass them on only through people they knew personally. 42 of the letters arrived, through an average of 5.5 intermediaries. The purpose was to illustrate the level of social interconnectivity, an experience with which most of us are familiar, as when one meets a stranger from a distant clime and discovers an acquaintance in common. It is not that everyone has such an expansive social sphere, but rather that there are individuals whose circle of acquaintance cuts across a wide range of social groups. Freenet utilizes this principle by giving its software a feature which allows it to retain knowledge of the content available on other nodes, information which is kept between sessions. The result is a search capability that makes for an extremely effective storage and retrieval system (a minimal routing sketch follows below). This feature has since been emulated by systems such as Audio Galaxy and Kazaa.

A crucial point in all of this is that both Gnutella and Freenet are open source/free software, thus allowing non-commercially motivated individuals and groups to take up the baton as the main players progressively move towards a rapprochement with industry. Napster died attempting to placate its erstwhile enemies, whilst Kazaa will not allow downloads of files encoded at more than 128 kbit/s in an attempt to appease the same industry, with whose representatives it is currently in negotiation for a license to move to a full commercial platform. These are both proprietary technologies, so that they can exclude any rivalrous non-compliant competitors. Audio Galaxy, however, is under the General Public License. AG deals with the 'tragedy of the commons' in a more determined manner(!): it only allows the user to transfer more than one file at a time if they are sharing a minimum of 25 files. Likewise, there is no option not to share; the only means of not sharing is to exit AG, which of course means that the user cannot download files either. Similar systems, such as Cloudcast (FastTrack) and Swarmcast, are now being offered by these companies to commercial media distributors, using technical devices to allow distributed downloads that automate transfer from other nodes when one user logs off. The intention here is clearly the development of software-based alternatives to the hardware offered by Akamai, the principal player in delivering accelerated downloads, used by CNN, Apple and ABC amongst others.
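The routing idea is easy to see in miniature: each node remembers which keys its neighbours have successfully answered before, forwards a request to the neighbour whose known keys are closest to the requested key, and caches results along the return path. The sketch below illustrates that idea only; it is not Freenet's actual algorithm (the numeric keys and topology are hypothetical):

    # Illustrative small-world routing with learned neighbour knowledge
    # and caching on the return path, in the spirit of Freenet.
    class Node:
        def __init__(self, name):
            self.name = name
            self.neighbours = []  # list of Node
            self.store = {}       # key -> data held locally
            self.seen = {}        # neighbour name -> keys it has answered

        def closeness(self, neighbour, key):
            keys = self.seen.get(neighbour.name)
            return min(abs(k - key) for k in keys) if keys else float("inf")

        def get(self, key, ttl=10):
            if key in self.store:
                return self.store[key]
            if ttl == 0 or not self.neighbours:
                return None
            best = min(self.neighbours, key=lambda n: self.closeness(n, key))
            result = best.get(key, ttl - 1)
            if result is not None:
                self.seen.setdefault(best.name, set()).add(key)  # learn
                self.store[key] = result  # cache along the return path
            return result

    a, b = Node("a"), Node("b")
    a.neighbours, b.store[42] = [b], "data"
    print(a.get(42))  # routes to b, then caches the result at a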
The point of all this is that there is a distribution system available now that can allow the global distribution of critical media. This network is not globally inclusive, being predicated upon access to a telephone line, a computer and (preferably) a high-speed network connection, but other, more powerful economic forces are driving the permeation of all these technologies, so this is a problem which will be progressively mitigated. In any case, exclusion is a fact of all media, whether one considers literacy (print) or purchase capacity (television/satellite). Radio is probably the most fundamentally democratic medium in an ideal sense, since the cost of acquiring a receiver is relatively low, and the spread of linguistic range in the content available is basically quite comprehensive.

Technical descriptions: Napster; Gnutella; FastTrack innovations; Freenet (search algorithms, Theodore Hong); the Milgram anecdote; open source vs. proprietary; commercial implementations (Swarmcast, Cloudcast, Uprizer).

The top four file-sharing systems -- FastTrack, Audiogalaxy, iMesh, and Gnutella -- were used to download 3.05 billion files during August, according to Webnoize. eDonkey: a client-server based sharing/chat network with sophisticated multi-source downloading (download from someone else even while he is still downloading the same file). FastTrack -- the technology used by Consumer Empowerment, one of the companies sued on Wednesday -- has seen traffic grow 60 percent a month over the course of the year. With 970 million files shared, it is the most used file-trading application on the Internet. The other three services -- Audiogalaxy, iMesh and Gnutella -- had 2.08 billion files swapped using the decentralized networks. While none of the systems tops Napster's peak performance of 2.79 billion files shared, industry experts believe it is only a matter of time before these services surpass Napster. eDonkey: ShareReactor, FileDonkey, FileNexus.

Economic Factors Influencing Peer Distribution

The motivation attracting participation in these networks remains that which inspired Napster's inventor: the opportunity to acquire practically unlimited content. Early in the growth of Napster's popularity, users realised that other types of files could be exchanged apart from music, as all that was required was a straightforward alteration of the naming protocol such that the file appeared to be an MP3 ('Unwrapper'). Later applications were explicitly intended to facilitate the sharing of other media, such that today huge numbers of films, television programs, books, animations, pornography of every description, games and software are available. The promise of such goodies is obviously an adequate incentive for users to search out, select and install a client application and to acquire the knowledge necessary to its operation. Intuitive graphical user interfaces enable a fairly rapid learning curve, in addition to which a myriad of user discussion forums, weblogs and newsgroups provide all that the curious or perplexed could demand. Internet access pricing plans are obviously the key determinant. Motivation: performance; access to goods in kind.

Whilst it is obvious why users utilise these tools to extract material, it is not so plain why they should also use them to provide material in turn to others and so avoid a tragedy of the commons. Key to the willingness to provide bandwidth has been the availability of cable and DSL lines, which provide capacity in excess of most individuals' needs at a flat-rate cost. There is thus no correlation between the amount of bandwidth used and the price paid; in brief, there is no obvious financial cost to the provider. In areas where there are total transfer caps, or where use is on a strictly metered basis, participation is lower for the same reason. For those on flat-pricing packages some costs are imposed, such as a slow-down in web access rates. A combination of these factors has given rise to free-riding problems, as evidenced by the study carried out by researchers at PARC Xerox on the composition of the Gnutella network [ENTER MORE DATA].
There is, however, a fairly high degree of consciousness of this problem (such users are referred to as 'leeches' and are the subject of endless vitriol on file-sharing boards), and many applications have implemented features to address the issue, a matter to which we will return below under the rubric of collective action mechanisms.

Dangers: appropriation. [Fill in story about Morpheus's switch from FastTrack to Gnutella.]

Free riding. Eytan Adar & Bernardo Huberman (2000), Free Riding on Gnutella: the return of the tragedy of the commons. Bandwidth, the crisis of P2P, the tragedy of the commons, Napster's coming difficulty with a business plan, and Mojo Karma; doing things the Freenet way. Hypothesis 1: A significant portion of Gnutella peers are free riders. Hypothesis 2: Free riders are distributed evenly across different domains (and by speed of their network connections). Hypothesis 3: Peers that provide files for download are not necessarily those from which files are downloaded.

"In a general social dilemma, a group of people attempts to utilize a common good in the absence of central authority. In the case of a system like Gnutella, one common good is the provision of a very large library of files, music and other documents to the user community. Another might be the shared bandwidth in the system. The dilemma for each individual is then to either contribute to the common good, or to shirk and free ride on the work of others. Since files on Gnutella are treated like a public good and the users are not charged in proportion to their use, it appears rational for people to download music files without contributing by making their own files accessible to other users. Because every individual can reason this way and free ride on the efforts of others, the whole system's performance can degrade considerably, which makes everyone worse off - the tragedy of the digital commons."

Figure 1 [rank-ordered distribution of files shared per peer] covers the 33,335 peers counted in the measurement, sorted by the number of files they offer. The results indicate that 22,084 peers, or approximately 66%, share no files, and that 24,347, or 73%, share ten or fewer files.

Table 1. Concentration of shared files among the top hosts:

    Top hosts            Files shared   As percent of the whole
    333 hosts (1%)       1,142,645      37%
    1,667 hosts (5%)     2,182,087      70%
    3,334 hosts (10%)    2,692,082      87%
    5,000 hosts (15%)    2,928,905      94%
    6,667 hosts (20%)    3,037,232      98%
    8,333 hosts (25%)    3,082,572      99%

And providing files actually downloaded? Again, a considerable amount of free riding was measured on the Gnutella network. Out of the sample set, 7,349 peers, or approximately 63%, never provided a query response. These were hosts that in theory had files to share but never responded to queries (most likely because they didn't provide "desirable" files). Figure 2 [rank ordering of sites versus the number of query responses each host provided] again shows a rapid decline in responses as a function of rank, indicating that very few sites do the bulk of the work. Of the 11,585 sharing hosts, the top 1 percent of sites provide nearly 47% of all answers, and the top 25 percent provide 98%.
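Figures of this kind are just rank-ordered cumulative sums. Given a list of per-peer share counts (the sample population below is invented for illustration), Table-1-style percentages can be reproduced as follows:

    # Reproducing Table-1-style concentration figures from per-peer
    # share counts; the toy population below is made up.
    def concentration(share_counts, fractions=(0.01, 0.05, 0.10, 0.25)):
        ranked = sorted(share_counts, reverse=True)  # most generous first
        total = sum(ranked)
        for f in fractions:
            top_n = max(1, int(len(ranked) * f))
            share = sum(ranked[:top_n])
            print(f"top {top_n} hosts ({f:.0%}): {share / total:.0%} of files")

    concentration([0] * 660 + [3] * 240 + [50] * 90 + [2000] * 10)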
Quality? The degree to which queries are concentrated was found through a separate set of experiments in which a set of 202,509 Gnutella queries was recorded. The top 1 percent of those queries accounted for 37% of the total queries on the Gnutella network, and the top 25 percent for over 75% of the total. In reality these values are even higher due to the equivalence of queries ("britney spears" vs. "spears britney").

Tragedy? First, peers that provide files are set to handle only a limited number of connections for file download. This limit can essentially be considered a bandwidth limitation of the hosts. Now imagine that only a few hosts provide responses to most file requests (as was illustrated in the results section). As the connections to these peers are limited, they will rapidly become saturated and remain so, thus preventing the bulk of the population from retrieving content from them. A second way in which quality of service degrades is through the impact of additional hosts on the search horizon. The search horizon is the farthest set of hosts reachable by a search request. For example, with a time-to-live of five, search messages will reach at most peers that are five hops away; any host six hops away is unreachable and therefore outside the horizon (see the sketch at the end of this subsection). As the number of peers in Gnutella increases, more and more hosts are pushed outside the search horizon, and the files held by those hosts pass beyond reach. Easily isolated providers are set up for litigation by the RIAA etc.

Solutions? i. In the "old days" of the modem-based bulletin board services (BBS), users were required to upload files to the bulletin board before they were able to download. ii. Freenet, for example, forces caching of downloaded files on various hosts. This allows for replication of data in the network, forcing those who are on the network to provide shared files. iii. Another possible solution to this problem is the transformation of what is effectively a public good into a private one. This can be accomplished by setting up a market-based architecture that allows peers to buy and sell computer processing resources, very much in the spirit in which Spawn was created. Trust: collective action mechanisms, hashing.

Security and privacy threats constitute further deterrents to participation, both for reasons relating to users' normative beliefs opposed to surveillance and for fear of system penetration by untrustworthy daemons. The security question has recently been scrutinised in light of the revelation that the popular application Kazaa had been packaging a utility for distributed processing known as Brilliant Digital in its installer. Although unused thus far, it emerged that there was the potential for it to be activated in the future without the knowledge of the end-user.

Viruses: .vbs and .exe files can be excluded from searches; MP3s etc. are data, not executables. A virus spreads via Kazaa (though the article wrongly identifies it as a worm): http://www.bitdefender.com/press/ref2706.php Audio Galaxy contains really ugly webHancer spyware that may make your Internet connection unusable.

Other Costs: CPU resources. A Kazaa supernode will use a maximum of 10% of total CPU resources, and Kazaa allows an opt-out.
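The horizon effect is easy to demonstrate: a TTL-limited flood simply stops forwarding after a fixed number of hops, so anything farther away is invisible. A minimal sketch over a hypothetical chain topology (the flood-with-TTL is the Gnutella-style mechanism described above):

    # Illustrative TTL-limited flood: hosts more than `ttl` hops away lie
    # beyond the search horizon. The topology is made up for illustration.
    from collections import deque

    def horizon(graph, start, ttl):
        """Return the set of hosts reachable within `ttl` hops."""
        reached, frontier = {start}, deque([(start, 0)])
        while frontier:
            node, hops = frontier.popleft()
            if hops == ttl:
                continue  # TTL exhausted: stop forwarding here
            for peer in graph[node]:
                if peer not in reached:
                    reached.add(peer)
                    frontier.append((peer, hops + 1))
        return reached

    # A chain a-b-c-d-e-f-g: with ttl=5, host 'g' (six hops away) is
    # outside the horizon and its files are unreachable.
    chain = {n: [] for n in "abcdefg"}
    for x, y in zip("abcdef", "bcdefg"):
        chain[x].append(y)
        chain[y].append(x)
    print(sorted(horizon(chain, "a", ttl=5)))  # no 'g' in the output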
Commercial Implementations

According to a study executed in early 2001 by Viant Consulting, there are now more than 500,000 television and film files being exchanged every day over file-sharing networks and through connections made in IRC [TCC p. 16 for stats and methodology]. That this is bad news for the copyright owners will not be explored here; the point is rather that this form of P2P provision of the archetypal data-heavy content is already taking place between users. In the same report the authors assert that content companies have themselves been experimenting with the distributional potential of networks such as Gnutella (Viant, The Copyright Crusade, see fn 47). There is an interesting comparison of acquisition times in TCC at p. 28.

http://www.badblue.com/w020408.htm
http://www.gnumarkets.com/

Commercial implementations: Swarmcast, Cloudcast, Uprizer; Mojo Nation's market in distributed CDN. Design considerations: impeding performance for the sake of other normative objectives. Freenet: censorship resistance as an impediment; Tangler. Kazaa/Morpheus: bitrate encoding limit for copyright reasons, easily hacked.

Open source or locked up? Closed: BearShare, Kazaa, Grokster, eDonkey. Open: LimeWire (GPL), Gnucleus.

Collective Action Mechanisms

LimeWire: slots and bandwidth throttling. Gnotella (Windows): an easy-to-use and very popular client written in VB, with many powerful features (search and response filtering, decent bandwidth regulation, multiple searches, private networks, skins...). LimeWire: upload slots represent the number of files other users can download from you at any one time. The default number of slots varies with the connection speed you set at installation, and the default bandwidth usage is set at 50 percent of your connection speed. You can configure your number of upload slots and percentage of bandwidth usage under tools > options > uploads. Gnucleus: another new feature is scheduling, which lets you tell Gnucleus to run on the Gnutella network only at certain times of day. This is useful for people who want to run Gnucleus only when the load on their local network is low: at a college, someone might configure Gnucleus to run at night so that academic use of the network would not be bogged down during the day, or at a company so that daytime business traffic would not be affected.

Storage: swarmed downloads. LimeWire 2: "First of all, we've allowed for 'swarmed' downloads. If the file you are looking for can be located at multiple hosts, LimeWire will attempt simultaneous downloads from those sources, spidering different portions of the file. Consequently, you'll get your files MUCH faster than what you are used to." Multi-source downloads. Q: What is multi-source downloading? A: A particular file may be available on more than one remote computer. With multi-source downloading these various sources are grouped together, and if one source fails for some reason then another host can take its place.

The importance of hashing, and CPU consumption: BearShare hashes all your existing files when launched. This is a one-time activity and should not consume more than 25% CPU utilization. Q: What is hashing? A: Hashing is a calculation done on each file to produce a small, unique "hash". BearShare compares hashes to determine if two files are identical. It is important to do this sort of comparison to guarantee that the files being compared are the same, especially when swarming.
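Hashing and swarming fit together: the hash establishes that several hosts hold byte-identical copies, after which distinct byte ranges can be fetched from each. A minimal sketch of both steps (hosts and sizes are hypothetical; a real client would issue ranged requests to peers):

    import hashlib

    # Sketch of hash-based file identity plus a swarmed download plan.
    def file_hash(path):
        """Hash a file in chunks; equal hashes identify identical files."""
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    def plan_swarm(size, hosts):
        """Split the byte range [0, size) into one slice per host."""
        step = size // len(hosts)
        return [(host,
                 i * step,
                 size if i == len(hosts) - 1 else (i + 1) * step)
                for i, host in enumerate(hosts)]

    print(plan_swarm(1_000_000, ["peer-a", "peer-b", "peer-c"]))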
Superpeering and the erosion of pure peer-to-peer: in early 2001 LimeWire introduced a new Gnutella hierarchy whereby high-performance machines become 'Ultrapeers'. These machines accept connections from many LimeWire clients while also connecting to the rest of the Gnutella network. Moreover, the Ultrapeer shields these 'regular' LimeWire clients from the CPU and bandwidth requirements associated with Gnutella, directing traffic to clients in an efficient manner. Any KaZaA Media Desktop can become a SuperNode if it runs on a modern computer accessing the Internet over a broadband connection. Being a SuperNode does not affect your performance noticeably. Other KaZaA users in your neighbourhood, using the same Internet Service Provider or located in the same region as you, will automatically upload to you a small list of the files they are sharing. When they search, they will send the search request to you as a SuperNode. The actual download then passes peer-to-peer, directly between the computer sharing the file and the person downloading it.

Retrieval: every connection costs bandwidth of approximately 0.5k per second. Smart downloading: smart downloading will retry a given download until it is successful. In other words, if you have tried to retrieve a file from a similar group of files, LimeWire will try to download from any of these sources until it is successful, and will also auto-resume if interrupted. Search considerations: search and response take place over the same route, which is why methodology is so important; the file itself is then transferred directly over an HTTP interface. Exclusion: if there are particular IP addresses you wish to ignore (if, for example, a particular IP address was sending you unsolicited results), click under Hosts, enter that IP address into the 'Ignore these hosts' window, and click Add.

Q: Is there any means for these networks to prioritize files stored on the same network, or on a network nearby, so as to minimize the need to travel over the backbone, through multiple peering interfaces, etc.? AG: the Satellite automatically selects the closest user with the file you want, reducing external bandwidth usage. Kazaa: automatically clusters traffic by network topology to provide the fastest download speed and minimal load on ISP backbones.

Sharing incentives: default sharing. LimeWire shares downloaded files automatically, and also allows you to require a minimum number of shared files before allowing a download to a given user. AG: you must first share at least 25 files to be able to increase the number of simultaneous transfers you can have.

Distribution and load balancing notes: existing solutions; dedicated servers; web caching; expanding hard disk memory size. Necessary preconditions: flat pricing? broadband figures [see TCC fn 9]; take-up; proportion of capacity utilised; dial-up connections; caching; server farms; mirrors; economic redundancy in an uncalibrated approach; cost of server collocation; cost of memory; speed of memory capacity growth.

eDonkey is a client-server-based file-sharing network. From the moment you start a download, you may already be uploading the very same file to someone else. This also mitigates the freeloader problem, since even if you don't share any files your bandwidth is always put to good use; you only tell the donkey how much bandwidth you want to use for uploads and for downloads. www.sharereactor.com cosmoed2k.da.ru

"The VCR is to the American film producer and the American public as the Boston Strangler is to the woman alone." - Jack Valenti, MPAA

Three commercial online music suppliers: Pressplay, MusicNet, FullAudio/Clear Channel. MusicMatch, MusicNet and FullAudio don't permit burning; Pressplay, EMusic and Rhapsody allow burning.

http://www.neo-modus.com/?page=News
http://www.neo-modus.com/?page=Help
http://www.climate-dynamics.rl.ac.uk/

FastTrack is the file-trading software being used by Consumer Empowerment, which licenses its technology to Kazaa.com, Grokster and MusicCity.
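The 'smart downloading' behaviour described above reduces to a failover loop over a group of equivalent sources, resuming from the bytes already received. A hypothetical sketch (the `attempt` helper stands in for a ranged request to a peer):

    import random

    # Hypothetical sketch of LimeWire-style smart downloading: keep
    # trying equivalent sources, resuming from bytes already received.
    def smart_download(sources, size):
        received = 0
        while received < size:
            host = random.choice(sources)  # any equivalent source will do
            try:
                received += attempt(host, offset=received, size=size)
            except ConnectionError:
                continue  # that source failed: retry with another

    def attempt(host, offset, size):
        """Stand-in for a ranged request to `host`; returns bytes got."""
        if random.random() < 0.3:  # simulate a flaky peer
            raise ConnectionError(host)
        return min(65536, size - offset)

    smart_download(["peer-a", "peer-b"], size=200_000)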
Fraud: several prolific warez kiddies figured out how to change their MAC address to bill their service to their neighbours, or even to our own router(!). We're still not sure exactly how that happened. Sure, we cut them off and connected their modems to a high-voltage source as punishment (our contract allowed it), but how many more are there whom we didn't catch? Billing issues: people who obviously ran up a very high bandwidth bill would call us and complain when they got their statements, asking us to lower their bills. Our position was that it wasn't our responsibility that they couldn't figure out how to close Napster or stop downloading porn. When they paid with credit card we would sometimes lose the dispute, but things were okay when they paid with cash or check. Expectation of quality: as you know, a cable modem is a shared medium, and cable companies are not at fault for your neighbours' downloading habits. However, it was considered a potential legal liability to be providing a service of varying quality.

Indicative transfer times by connection type:

    File                            Modem, 56 Kbps   Cable, 512 Kbps   T1, 2 Mbps
    Picture, 200 Kb                 40 seconds       2 seconds         2 seconds
    Music track, 4 Mb               13 min 30 sec    1 minute          15 seconds
    Full-length movie, 400 Mb       22 hours         1 hour 45 min     25 minutes
    Five-minute video clip, 20 Mb   1 hour           6 minutes         2 minutes
    200-page novel, 1 Mb            4 minutes        15 seconds        4 seconds

For example, a T3/DS3 connection has a capacity of 45 Mbps, while a stream at 30 fps and 320x240 pixels can have a rate of 1 Mbps. Under such conditions, only 45 clients can be provided with a maximum-resolution video stream.
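Both the table and the T3 example are straightforward capacity divisions. A small sketch reproducing the back-of-envelope arithmetic (sizes in megabytes, link rates in megabits per second):

    # Transfer time is size / rate; stream capacity is link rate divided
    # by the per-stream rate. Figures are back-of-envelope, as above.
    def transfer_seconds(size_mbytes, link_mbps):
        return size_mbytes * 8 / link_mbps  # 8 bits per byte

    # A full-length movie (400 Mb) over a 512 Kbps cable line:
    print(transfer_seconds(400, 0.512) / 3600, "hours")  # roughly 1.7 hours

    # A 45 Mbps T3/DS3 carrying 1 Mbps video streams:
    print(45 // 1, "simultaneous full-resolution clients")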