Technology

Mathematical Analysis of Gnutella

jrp2 sent in a paper written by one of Napster's founding engineers. It is a mathematical evaluation of Gnutella discussing why the network won't be able to scale up to any reasonable size. I have been impressed with Gnutella in the past, and have wondered along these same lines myself.
  • by BitterOak ( 537666 ) on Tuesday January 15, 2002 @07:32PM (#2845701)
    This has been discussed since shortly after Gnutella came out. It is essentially arguing that if each peer asks several other peers if a song is available, you get an exponential growth of search requests and the network clogs if you have too many users. This was basically the rationale for the Fasttrack type of systems, such as Morpheus, which most people use today.

    • Re:This is old news. (Score:4, Informative)

      by RovingSlug ( 26517 ) on Tuesday January 15, 2002 @08:33PM (#2845994)
      Speaking of the merits of Fasttrack, it also provides robust parallel download and auto-resume. Using Gnutella is painful by comparison.
      • Mod that man up.

        I have never had much luck using Gnutella. The main problem seems to be the lack of parallel download: if you have 20 users all with the same file you want, it is dismally painful to have to pick just one.

        Fasttrack on the other hand (Kazaa has a linux client that is IMHO better than the bloated windows offering) works very well in this regard. Choose a file and the client downloads it in parallel from as many clients as it can, which makes for a much quicker transfer.
        • Totally NOT true!!! (Score:5, Informative)

          by Jagasian ( 129329 ) on Tuesday January 15, 2002 @11:06PM (#2846521)
          Obviously you haven't used GNUtella for the past year. Xolox [zeropaid.com] is a GNUtella client that allows for parallel downloading, resuming, and Xolox will even look for other sources of the file that you are currently downloading, if the current sources are too slow or down. Basically, with Xolox, you search for a file that you want, and you get results with numbers by them depicting how many sources have the file. That way you don't have to decide which source you want to download from. You decide which file you want to download... and Xolox figures out the rest.

          My average download speeds on Xolox are around 160Mbs. Of course, I use the ever so crappy AT&T cable modem service... so other people on faster DSL lines will most likely experience faster downloads.

          Next thing you are going to tell me is that Windows is better than Linux because Linux doesn't have any good GUIs or desktop environments for it. Yeah, let's just ignore everything that's out there right now.

          Not only that, but LimeWire also supports multisource, segmented, or swarmed downloading. LimeWire has only recently gotten such functionality, though, while Xolox has had it for the past year.

          Oh, and GNUtella is free as in beer and as in speech.
          • I think most of us that are familiar with gnutella are using sLimeWire which sucks, or gnapster with the gnutella servers, which also sucks. I'm glad there's a good alternative.
          • Too late... (Score:3, Informative)

            by vrt3 ( 62368 )
            From http://www.xolox.nl [xolox.nl]:
            Dear XoloX-user, Taking into account the latest law suits against p2p clients based on Fasttrack-technology (such as Kazaa), we have decided to discontinue XoloX. As of the 1st of december, XoloX will be shut down and removed from distribution sites. We hope everybody has enjoyed XoloX as long as it has been around and we want to use this opportunity to thank everybody who made a contribution to its development. These last few days will give you some time to finish your downloads and we advise you not to start new transfers. Thanks again and goodbye! --Team XoloX--
          • a paper written by one of Napster's founding engineers

            Just when they launch their pay service. No, I assure you his/her analysis is totally and utterly impartial. Excuse me while I ask Bill Gates about the scalability of the Linux kernel.
    • what would be funny is if an economist worked on the gnutella project, and they wrote up a paper on how the new napster business model would never succeed.

      or maybe only i would find that funny.
        Well, I would find that funny. Of course, you don't really need to be an economist to know the new napster will fail, just like you don't need to be a computer scientist to know the Gnutella network was fucked (at least in its original conception)
  • by Tackhead ( 54550 ) on Tuesday January 15, 2002 @07:33PM (#2845707)
    Earth: Mostly harmless.

    Napster: Sucks ass.

    Gnutella: Doesn't scale.

    (Mod my ass as Flamebait for this, but didn't everyone know about Gnutella's scaling problems, and for-pay Napster sucking ass, based on Slashdot stories months and weeks before today?)

    • by Phexro ( 9814 ) on Tuesday January 15, 2002 @07:52PM (#2845801)
      Earth: Mostly harmless.

      you must have the new edition.
      • Earth: Mostly harmless.
        you must have the new edition.

        With the state of the world being as it is now, i'd say this is an outdated issue... Maybe "partially lethal" would be more appropriate.
  • old news (Score:4, Informative)

    by Silver A ( 13776 ) on Tuesday January 15, 2002 @07:33PM (#2845709)
  • It is a mathamatical evaluation of Gnutella

    Someone has not passed his grammatical evaluations at school :)
      mathamatical is a spelling error, not a grammatical one.
  • Ancient news. (Score:5, Informative)

    by RareHeintz ( 244414 ) on Tuesday January 15, 2002 @07:37PM (#2845722) Homepage Journal
    This story is over 11 months old [slashdot.org]!

    I mean, I know that none of us - including our fine moderators - are perfect, but are they at least paying attention?

    OK,
    - B

    • Worse, there is even an older article [slashdot.org] about this subject.
    • by hodeleri ( 89647 ) <drbrain@segment7.net> on Tuesday January 15, 2002 @08:38PM (#2846013) Homepage Journal
      If you read through this research paper it'll start with N=4 and T=5. As you continue to read through the paper he quotes bandwidth figures from his table using various other N and T values.

      For example, in the very last table (Bandwidth rates for 10qps) he says the bandwidth generated will be 8GB/s, which aligns with N=8, T=7. Were you to use the N and T values from the beginning, this would be 2.4MB/s, which is off by a factor of 3143 and one third.

      Going back to Joe User's Grateful Dead query, it only generates ~250KB, not 800MB.

      Remember, very very few people are going to modify their TTL or open connections. This "white paper" grossly misstates the amount of bandwidth Gnutella generates and seems to be an anti-Gnutella paper designed to mislead rather than an honest and fair judgment.
      • You have missed the point of the paper. The N=4 and T=5 values equate to a maximum reach of 484 clients for a query. To reach "Napster" sizes of 1 million you need T=7 and N=8.
        That explains the different numbers.

        However, I do agree that a couple of numbers seem to be plucked from mid-air, but the argument and maths seem fine :)
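        For anyone who wants to check the arithmetic in the comments above, here is a small sketch (Python; the usual back-of-the-envelope flood model, not code taken from the paper) of the reachable-hosts estimate. It reproduces the 484 figure for the defaults, the roughly one-million figure for N=8, T=7, and the 7,686,400 figure that turns up further down the thread for N=8, TTL=8.

```python
# Rough estimate of how many hosts a flooded Gnutella query can reach,
# assuming every node keeps N open connections and queries carry a TTL of T.
# Back-of-the-envelope model only, not code from the paper.

def reachable_hosts(n_connections: int, ttl: int) -> int:
    """Each newly reached host forwards the query to (N - 1) further hosts."""
    total = 0
    new_hosts = n_connections           # hop 1: your own N neighbours
    for _ in range(ttl):
        total += new_hosts
        new_hosts *= n_connections - 1  # every new host fans out again
    return total

print(reachable_hosts(4, 5))   # 484       -- the default N=4, T=5
print(reachable_hosts(8, 7))   # 1098056   -- roughly "Napster size"
print(reachable_hosts(8, 8))   # 7686400   -- the paper's largest scenario
```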
      • Additionally, either my eyes are deceiving me or the beer I am consuming is clouding my thought processes.

        Look at his first chart and the "N = 2" case. Why is this only being incremented by 2 each time (2, 4, 6, 8...)? Shouldn't it be multiplied by 2 each time (2, 4, 8, 16...) just as "N = 3" is multiplied by 3, "N = 4" is multiplied by 4, etc?
          I thought this was an error originally as well, but it is not. When you have 2 connections (N=2 globally) and a node sends a request to its first two nodes, each of those nodes has 2 connections: one to the original requesting node and one to a new node. Thus the second-level nodes can only send the request to 1 new node each.
      • Not only are they less likely to modify TTL, but the need to do so for popular content is not that necessary: the more popular the content, the more people will download it and share the download with others. Sort of a user-based way of propagating media across the internet (while Freenet does it automatically).
    • Heaven forbid that the /. staff should pack those braincells with friends, family and life rather than the last 24 months of ~= 10 submissions a day! Man, I'm gunna start modding these 'yawn, been there done that' posts as redundant!
    • Sorry, some of us don't remember stories from 11 months ago. They posted it again? Good. Give some of the other people a chance to read it.
    • Taco's not a patent examiner.
  • by DrSkwid ( 118965 ) on Tuesday January 15, 2002 @07:39PM (#2845734) Journal
    and on the same day someone from Napster says the non-pay Gnutella won't scale

    .
    • Oh come on! (Score:3, Insightful)

      by Sanity ( 1431 )
      If you are going to criticize a paper, do so on the basis of what they are claiming (there is no shortage of support for the claims he is making), not with conspiracy theories about the author's motivation.
  • by avalys ( 221114 ) on Tuesday January 15, 2002 @07:42PM (#2845752)
    Heh...somehow I read the title as "Mathematical Analysis of Guatemala", since this article has been posted before and Slashdot never posts anything twice.
  • by Khalid ( 31037 ) on Tuesday January 15, 2002 @07:43PM (#2845755) Homepage
    The problem is not that difficult: if you want Gnutella to scale, then you need to avoid the exponential explosion of the number of messages exchanged between the clients as their number grows. The only solution is to structure the network by using "super clients" or "servants" or "super nodes", call them what you want; the latter is what KaZaA and Morpheus have accomplished. This makes the number of messages exchanged grow in a logarithmic way (this is an outrageous simplification of course, but it gives an idea). There are many experiments with Gnutella too using those ideas; this is what BearShare is trying to do.
    • by zerocool^ ( 112121 ) on Tuesday January 15, 2002 @08:11PM (#2845898) Homepage Journal

      The only solution is to structure the network by using "super clients" or "servants" or "super nodes", call them what you want; the latter is what KaZaA and Morpheus have accomplished...

      This is exactly the point. This is the only way to properly distribute queries, as anyone who has set up a multi-homed ISP knows. It works on the same principle as BGP routing, i.e. there are routers (super-nodes, or whatever) that have a specific number (an ASN - or in P2P, the supernode address) but there are thousands of computers (casual modem users - p2p) on the internet that these routers have information about. If BGP routing worked the way flat Gnutella does, nothing would go anywhere. However, by having several nodes giving out information on who has what and how to get it, while the majority of users just download and give out their own info, not pass along info of others, things work much more smoothly. And with a correct implementation, everyone could have a route to everyone's file list at minimal bandwidth usage.
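      A minimal sketch of the supernode idea the two comments above describe (hypothetical structure, not FastTrack's or any client's actual protocol): leaves register their file lists with a supernode, and only supernodes exchange queries.

```python
# Toy illustration of supernode-style search: ordinary peers register their
# shared-file lists with a supernode, and queries travel only between
# supernodes.  Hypothetical structure, not any real client's protocol.

class Supernode:
    def __init__(self):
        self.index = {}   # filename -> set of leaf addresses sharing it
        self.peers = []   # other supernodes we forward queries to

    def register(self, leaf_addr, filenames):
        for name in filenames:
            self.index.setdefault(name, set()).add(leaf_addr)

    def search(self, keyword, forward=True):
        hits = {name: addrs for name, addrs in self.index.items()
                if keyword.lower() in name.lower()}
        if forward:                          # one supernode-to-supernode hop
            for peer in self.peers:
                hits.update(peer.search(keyword, forward=False))
        return hits

a, b = Supernode(), Supernode()
a.peers.append(b)
a.register("10.0.0.5:6346", ["grateful_dead_live.mp3"])
b.register("10.0.0.9:6346", ["dead_flowers.mp3"])
print(a.search("dead"))   # hits from both supernodes, no per-leaf flooding
```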

    • and I hate to say this, but take an idea from the windows networking world... Each machine holds an election to see who is going to be the master browser, based on average connected and up times (the clients that have been up and serving the longest, with the shortest down times historically, win). Then the next few build the same master browser database but sit dormant (just listening and caching) until the master browser disappears, at which point the next highest pipes up and says "ohhh lookie me!", thus keeping a master server up. That master server could even load balance with the sub-servers by just sending a "busy, use 127.0.0.2 or 127.0.0.3" back to the client.

      It could be fixed, and made powerful and self-scaling.
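      A toy version of the uptime-based election proposed above, with made-up numbers and names; no real Gnutella client elects supernodes this way.

```python
# Toy "master browser" election in the spirit of the comment above: the node
# with the best uptime record becomes master, the runners-up stay as warm
# standbys ready to take over.  Purely illustrative.

def elect(nodes, standbys=2):
    """nodes: list of (address, avg_uptime_hours, downtime_incidents)."""
    ranked = sorted(nodes, key=lambda n: (-n[1], n[2]))   # long uptime, few outages
    return ranked[0], ranked[1:1 + standbys]

nodes = [("10.0.0.2", 140.0, 3), ("10.0.0.7", 600.5, 1), ("10.0.0.9", 310.2, 0)]
master, backups = elect(nodes)
print("master:", master[0], "standbys:", [b[0] for b in backups])
```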
    • LimeWire currently implements a variation of this -- what we call "UltraPeers." UltraPeers establish a significantly greater horizon on the network, and there are other distributed protocols that do this in other creative ways, such as Chord out of MIT, which can be found at: http://www.pdos.lcs.mit.edu/chord/ That aside, there is significant evidence to show that a distributed network can scale far better than any centralized network. Remember that Napster had serious scaling problems as well -- you could only see the files from the hosts on whichever server you happened to be logged in to. The only solution to that problem is the brain-dead purchase of a yet faster multi-million dollar server. I would not call that scaling. As everyone else has pointed out, this discussion began and ended in the Gnutella community about a year ago.
    • by Patrick ( 530 ) on Tuesday January 15, 2002 @10:13PM (#2846363)
      The only solution is to structure the network by using "super clients" or "servants" or "super nodes", call them what you want; the latter is what KaZaA and Morpheus have accomplished; this makes the number of messages exchanged grow in a logarithmic way

      That's not logarithmic. If every client node connects to a "super node," and every "super node" connects to every other "super node," then what you have is a two-level tree. Growth at each level is O(sqrt(n)), not logarithmic.

      Chord [mit.edu], a p2p research project from MIT, is truly logarithmic. Go read their SIGCOMM'01 paper [mit.edu] for an explanation of how their system works.

      --Patrick
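      A quick numeric check of Patrick's point, using the idealised model where s supernodes each serve n/s leaves, so a supernode's load is roughly s + n/s, minimised at s = sqrt(n) (my numbers, chosen just to illustrate the growth rates):

```python
# Why a flat two-level supernode design costs O(sqrt(n)) per supernode:
# with s supernodes each serving n/s leaves, a supernode must track the
# other (s - 1) supernodes plus its n/s leaves; s + n/s is minimised at
# s = sqrt(n).  Compare with the O(log n) state per node of Chord.

import math

for n in (10_000, 1_000_000, 100_000_000):
    s = round(math.sqrt(n))                  # optimal number of supernodes
    two_level_load = (s - 1) + n // s        # neighbours per supernode
    chord_state = math.ceil(math.log2(n))    # finger-table entries in Chord
    print(f"n={n:>11,}  two-level ~{two_level_load:>6,} per supernode  "
          f"Chord ~{chord_state} per node")
```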

    • In addition to reducing the growth from exponential to sqrt or logarithmic or whatever, the other big advantage of supernodes in a Gnutella-like network is that you can limit supernodes to systems with fast network connections, while regular peers can sit on slower connections, which would otherwise be a serious bottleneck in a network that needs to send all queries to all peers to be successful.

      Of course, building an indexing system that scales arbitrarily is difficult, and building an indexing system that recognizes local topologies is also critical. A typical problem universities had with Napster was that if N people at the school wanted a given tune, most of them would be likely to fetch it across the school's limited outside bandwidth instead of most people fetching it from other sites on the fast LAN after the first one or two had downloaded it across the limited part. Napster was able to reduce this problem, at least at some schools, because having a centralized indexing service means that they can enforce more locality by making it easiest for people to find nearby peers. A decentralized system *may* be able to accomplish this, but it's a lot harder.

    • LimeWire [limewire.com] has attacked this problem by introducing "ultrapeers" [yahoo.com], which offload most of the bandwidth to a small subset of hosts. It works really well. Unlike FastTrack, this is an open protocol with an open-source implementation [limewire.org] available.

      The next step is to add more sophisticated routing protocols between ultrapeers. Many of the algorithms mentioned elsewhere in this post (Chord, CAN, etc.) are contenders for that, as is LimeWire's home-grown query-routing proposal [limewire.com].

      Christopher Rohrs
      LimeWire

  • Everyone knows that Napster was basically a glorified DCC engine rip-off from the IRC days of file trading. It made IRC file sharing easy for the average computer user. With the death of Napster as everyone knew it, you still see #mp3 and #mp3tunes and the like on IRC trading files person-to-person like Metallica never existed. I think that when something explodes in popularity you get too many bad people joining in, ruining things for the users that are not abusers. When so many people jump on a bandwagon, you get media attention for wrong-doing, and that is when the death knell sounds.


    Look at ICQ. It was fairly decent as an instant messaging client until the numbers hit one million or so and then it needed to control everything under the sun and companies could spam through it. File sharing happens through it all the time too.


    I don't care if Gnutella cannot scale to the levels that Napster saw. Smaller is better in my opinion!

    • So you'd be perfectly content to receive answers to your queries that originate from your own machine and download your own content back to yourself?

      Smaller is better, so just one user to search must be best of all! And the download rates are incredible!
      • Each request generates a GUID and clients check each request vs a recent history of GUIDs. It prevents any sort of looping.

        Having developed the first host caching application for gnutella, I can say that the author never fully understood how the network worked.

        His equations may be accurate based on how he thought requests and replies propagated through the network, but he assumed every request had a reply.

        It is true that the bandwidth overhead was large, but I rarely used more than 15KB/s during the times when there were 4000+ clients connected. He says that it might not be possible to reach all 4000 people, but in order for me to know how many users were out there, they all replied to my ping, and thus were searchable.

        Finally, the very nature of the network doesn't lend itself to protocol updates at all. The protocol was extremely limited, but once it caught on, not much could be done about updating it short of starting an entirely new protocol. You couldn't just shut it down, and that's the major problem.

        Many proposals were written on how to implement a system without the gnutella limitations, and you are seeing them in many different implementations.
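        A minimal sketch of the GUID-based duplicate suppression described at the top of this comment (assumed structure, not code from any real client):

```python
# Sketch of GUID-based loop prevention: every query descriptor carries a
# 16-byte GUID; a client remembers GUIDs it has already routed and silently
# drops repeats instead of re-forwarding them.  Illustrative only.

import uuid
from collections import OrderedDict

class SeenGuids:
    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self._seen = OrderedDict()             # GUID -> None, oldest first

    def should_forward(self, guid: bytes) -> bool:
        if guid in self._seen:
            return False                       # already routed: drop it
        self._seen[guid] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)     # evict the oldest entry
        return True

history = SeenGuids()
q = uuid.uuid4().bytes                         # a fresh 16-byte query GUID
print(history.should_forward(q))               # True  -- first sighting
print(history.should_forward(q))               # False -- looped back, dropped
```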
    • Smaller is better in my opinion!


      Unless you want obscure stuff. If i hear that XYZ indie punk band has a great album (Self, or The Proms, for example) I want to hear what they sound like before i buy it, because I don't want to order something from CDNnow or whatever and pay $20/cd and $7/shipping or whatever to get a crappy CD (the juliana theory - emotion is dead: thanks for nothin). But i do want to support indie music if it doesn't suck. So for me, it's morpheus, old-skool napster, gnutella, whatever, as long as it is big, i'll check it out.

      ~z
      • Unless you want obscure stuff. If i hear that XYZ indie punk band has a great album...I want to hear what they sound like before i buy it...


        I listen to punk music and have always enjoyed the openness of the companies that sell the music for non-fans and fans alike to listen before buying. Most indie labels have inexpensive samplers or online mp3 download segments from artists. I listen to many obscure punk bands, and almost always there was a venue to hear them before buying. Toxic shock had the Shock Report with floppy 7" recording samplers. Notes in Thrasher Magazine [thrashermagazine.com] was an excellent review resource. Flipside had samplers. Nowadays you have The Fat Club [fatwreck.com] or Punk-O-Rama [epitaph.com]. Cheap CD offerings where you get about 10 to 15 different bands showcased. Enjoy!

  • 20/20 Hindsight (Score:4, Insightful)

    by nadaou ( 535365 ) on Tuesday January 15, 2002 @07:48PM (#2845779) Homepage
    Yes, but..

    It's sort of like calculating the maximum hull speed for steam ships crossing the Atlantic Ocean and saying there is a theoretical maximum speed to intercontinental travel. Then someone comes along and invents airplanes.

    Gnutella will mutate and evolve, and will at some point in the future be replaced by something better when it starts to fall over.

    The demand for Ms. Spears and the Backstreet Boys is just too damn strong for things to stand still.

    I enjoyed that this post was next to the announcement that the new-and-not-so-improved preview of Napster was out.
    • No, the complaint is not about the concept, but the gross lack of understanding exhibited in the design.

      There are well known workable epidemic algorithms suitable for P2P that have been around for a long time. They generally provide statistical guarantees of success in return for scalable use of bandwidth.

      Epidemic distributed systems should not be attempted by people who do not grok exponential growth. Planning for somebody wiser to innovate around your mess is not responsible.
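      As a rough illustration of that bandwidth-for-probability trade-off, here is a toy gossip-style forwarder (a generic epidemic sketch, not any particular published algorithm): each node relays a query to a small random subset of neighbours instead of flooding all of them.

```python
# Toy epidemic-style forwarding: relay a query to a small random subset of
# neighbours (the "fanout") rather than all of them.  Coverage becomes
# probabilistic, but per-node traffic stays bounded.  Generic sketch only.

import random

def gossip_forward(neighbours, fanout=3, already_seen=frozenset()):
    """Pick at most `fanout` neighbours that haven't seen this query yet."""
    candidates = [n for n in neighbours if n not in already_seen]
    return random.sample(candidates, k=min(fanout, len(candidates)))

neighbours = [f"peer{i}" for i in range(20)]
print(gossip_forward(neighbours, fanout=3, already_seen={"peer4", "peer7"}))
```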
    • Sorry, but when Gnutella first came out, and I looked at the protocol, I thought to myself, "Gee, this is nice, but when that graph of connections starts getting highly connected and you have all those people spitting out queries and forwarding others there is going to be a humongous sucking sound as the bandwidth is taken." No, I didn't read a paper or do the math, but anyone with a basic grounding in graph theory and computer science would see the shortcomings immediately. Yeah, it will evolve and should since I like this kinda stuff... but it wasn't exactly rocket science. :-/
    • It's sort of like calculating the maximum hull speed for steam ships crossing the Atlantic Ocean and saying there is a theoretical maximum speed to intercontinental travel. Then someone comes along and invents airplanes.

      More to the point, it's like doing that TODAY, when airplanes already exist. Nobody is currently advocating flat p2p systems like the old gnutella over supernode systems like FastTrack or extended gnutella.

      Of course, this paper was written over a year ago, but it shouldn't be news to anyone now.

  • gnutella (Score:3, Interesting)

    by flynt ( 248848 ) on Tuesday January 15, 2002 @07:48PM (#2845782)
    On the topic of this program, a more current story running on msnbc.com right now tells how it is becoming a severe security risk for users of the program. Here [msnbc.com] is the article.
    • Re:gnutella (Score:3, Insightful)

      This isn't a case of hackers getting into people's systems, it's a case of people who don't understand their own computer's directory structures sharing a bunch of files they shouldn't, unless there's something I missed in this poorly done news story. The real security risk here is not Gnutella, it's ignorance. I know the manual for Win ** is very thin and sketchy, but directories are covered in it.

      It's depressing to think that a lot of people put their computers on a network without even understanding basic concepts like this. (It's even more depressing to call tech support at an ISP and realize you understand more about the problem than they do, but now I'm rambling.)
    • This is very interesting since I got into people's hotmail boxes because the cookies were on the network.

      It was quite simple. Search Gnutella for text files containing the @ sign.

      But one quick question: Would a Linux gnutella program let me share /etc ? I don't know, and I don't want to try it. I'm guessing as long as you are running the program as a regular user you may not be able to actually read the files.

      But still: search for the @! There are plenty of cookies on gnutella for download. The funny thing though is that most users seem to be on dial-up.
      • I'm guessing as long as you are running the program as a regular user you may not be able to actually read the files.

        On my Debian box (and my RedHat partition) just about everything in /etc is world-readable except for /etc/shadow. Not that people are really gonna be interested in a copy of my /etc/init.d/apache file anyway... It's /home you really have to worry about.

  • Gnutella's spawn (Score:5, Informative)

    by PureFiction ( 10256 ) on Tuesday January 15, 2002 @07:51PM (#2845797)
    What I find most interesting are the kinds of projects that have sprung up in Gnutella's wake. Many of these started out as attempts to improve Gnutella, and have since moved on (the Gnutella Next Generation working group never really materialized into anything)

    We had Napster at one extreme, Gnutella at the other, and in the middle are a number of partially centralized systems with super peers like FastTrack, such as:
    Open FT [sourceforge.net]
    JXTA Search [jxta.org]
    GNet [purdue.edu]
    NEShare [thecodefactory.org]

    and many others...

    Then there are the alternative projects that use an entirely different mechanism. For example, social discovery as implemented in:
    NeuroGrid [neurogrid.net]
    ALPINE [cubicmetercrystal.com]

    Or distributed keyword hash indexes like:
    Chord [mit.edu]
    Circle [monash.edu.au]
    GISP [jxta.org]
    JXTA Distributed Indexing [jxta.org]

    And many others as well.

    The coming year(s) will see a lot of maturity in these areas, and searching large peer networks will become ever more efficient over time. Gnutella showed us the possibilities of a fully decentralized model, and refinements of its underlying architecture can produce vastly better solutions.

    2002 will be an interesting year for peer networking applications...
    • It's worth noting that giFT/OpenFT [sourceforge.net] just entered its first stage of network testing--and with that in mind, they need as many people as possible to download and run the client so they can test the network. Complete instructions for so doing are given on the website.
  • Slashdotted? (Score:2, Informative)

    Are you feeling Slashdotted? Server got that not-so-fast feeling?

    Maybe you should be trying the Google cache [google.com]!

  • by Cheshire Cat ( 105171 ) on Tuesday January 15, 2002 @08:01PM (#2845851) Homepage
    Since numerous people above have pointed out this is a repeat, everyone should browse the older article and repost all the comments that were modded up to +5, and reap the benefits when that karma comes rollin' in! ;)
  • Choice (Score:2, Insightful)

    by Wheaty18 ( 465429 )
    There is a major flaw in all P2P software, and it has nothing to do with the coding. More people tend to want to take than to give. I remember seeing a line graph on LimeWire's page (I think?) that showed a monthly progression of the number of people sharing files compared to the number of people downloading files. The 'downloaders' were outweighing the 'uploaders' by a HUGE amount.

    If everyone was willing to share their files, then there would be no such problem with P2P programs.
    • Uh. Your logic is whack. People are downloading more than they are uploading. If people offer more files, it will increase the number of downloads. (More people want b-spears-naked.jpg than HAVE b-spears-naked.jpg)

      If this were not true, essentially, people would be offering files that nobody wanted, and that would just be stupid.
      • Uh. Your logic is whack. People are downloading more than they are uploading

        Logically, if one person is downloading then another person (peer to peer) is uploading.

        How do people download more than upload?
        • I believe that the point is that there are few people who make their machines available for others to download from. The number of downloads taking place will be equal to the number of uploads taking place, obviously.

          If more and more people use the server only to receive files, and do not make files they receive available to others, then in the end, the people who were making their files available to others will no longer be able to, or they will have to severely limit the bandwidth going out to those who are taking the files.

          The only way to avoid this would be to have nodes that are there simply to retrieve as many good quality files as possible and offer them up for download. But then, it's not really P2P anymore, is it?

          • Don't look at me, I always make my coupla gigs of stuff available for upload on the services I use.

            Problem is, though, the cable company caps me to a 128 kilobit per sec upstream, so there's an imbalance there that I can't do anything about.

            But I do what I can!
        • Re:Choice (Score:2, Informative)

          by Johnny00 ( 213878 )
          That's exactly how eDonkey2000 [edonkey2000.com] works!

          While you're downloading a file, it's immediately made available for upload from you. It uses resume download to download parts of the file you want from multiple sources, some of which don't have all of the file yet themselves.
    • Re:Choice (Score:2, Informative)

      by mshiltonj ( 220311 )
      When searching for content on the gnutella network, a lot of the results come back showing the host as 192.168.* or 10.0.*, which means they are behind a firewall or otherwise not directly routable (a quick check for such addresses is sketched below). In such a case, the user may be unable to correct it, or unaware of the issue entirely.

      I was like this for about a week before I realized why I wasn't getting any uploads. I had to open up port 6346 on my home network (linksys router). Also, Napshare lets me "force local ip" to my firewall/ external ip (assigned by RoadRunner). The linksys router does port forwarding on outside requests, so only one computer on my home network can share on that port.

      This thread reminded me that RoadRunner had expired my old IP address and assigned me another, and I had forgotten to update my gnutella client to reflect the new IP. So for the past few weeks or so, I had been one of the "non-sharing" people by simple oversight.

      I doubt most limewire/bearshare users know any of this stuff. When running a gnutella client from work, people couldn't do this even if they knew about it and wanted to.

      There is a major flaw in all P2P software, and it has nothing to do with the coding. More people tend to want to take than to give. I remember seeing a line graph on LimeWire's page (I think?) that showed a monthly progression of the number of people sharing files compared to the number of people downloading files. The 'downloaders' were outweighing the 'uploaders' by a HUGE amount.


      If everyone was willing to share their files, then there would be no such problem with P2P programs.
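      A quick way to spot the un-routable results mentioned above, using Python's standard library (the addresses here are just examples):

```python
# The "192.168.* / 10.*" hosts in search results are RFC 1918 private
# addresses: other peers cannot connect back to them unless the user sets up
# port forwarding (e.g. for Gnutella's default port 6346).

import ipaddress

for host in ("192.168.1.20", "10.0.3.7", "24.93.112.5"):
    private = ipaddress.ip_address(host).is_private
    label = "private, unreachable from outside" if private else "public, directly reachable"
    print(f"{host:<14} {label}")
```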
  • have been impressed with Gnutella in the past, and have wondered along these same lines myself.

    I think we could add:

    "... but since I was too busy doodling and writing dirty, hackish perl when I was in school, I'm glad someone else did the actual math."
  • by Omega ( 1602 ) on Tuesday January 15, 2002 @08:24PM (#2845952) Homepage
    The suggestions of this article are quite thought-provoking, but they also illustrate an interesting point: Napster really isn't P2P.

    In theory, a true Peer-to-Peer file transfer network would exist in a decentralized fashion where you would never have to query a central host for routing or file availability. Napster requires you to route through one of the Napster servers for information. Even introducing Napigator still doesn't alter the Napster model because all it does is allow you to route through a different central host. It seems that all Napster did was integrate a search engine and nameserving into one element (coming from only one provider).

    This isn't to knock the accomplishments of Napster, it was certainly an original idea to incorporate these areas and provide a GUI access client to boot. But it is apparent that Napster developers weren't all that revolutionary in their thinking either.

    The suggestion of true P2P is revolutionary, and the perfect implementation (should it ever arrive) will also be revolutionary. But the Napster model is no different than everyone providing their MP3 list to a website that maintains a list of links on where to download MP3s. Napster simply automated this process. Napster is no more P2P than any TCP/IP connection not operated through a proxy.

    Is http P2P? I'm talking directly to another system, and there is no moderator/mediator. Normally, I have to find out about that system from a 3rd party (e.g. a search engine) -- just like someone obtains a list of links from Napster.

    True, I'm being no better than the author of the original article; because I too am offering no solutions. I'm just holding out hope for true P2P in the future.

  • by eyefish ( 324893 ) on Tuesday January 15, 2002 @08:31PM (#2845980)
    I'm not very familiar with the deep technical details of Gnutella, but isn't there a limit on how far the "horizon" is (i.e., how many users nearby you can see)? If this is correct, all the mathematics presented here apply only in theory and not in practice, as what will happen is that (1) most queries will not be relayed past a "reasonable horizon", and (2) there exists a good (or high?) probability that, as long as you're searching for "popular" files, you will eventually find them.

    Because of this basic and simple observation, I do not foresee Gnutella dying anytime soon because of scalability reasons alone (however, copy-protection issues are another story).

    Again let me stress that my observation here is based on the strong assumption that the "search horizon" is reasonably sized, so as not to have to search the whole gnutella network.
    • The limit is his 'T' factor in the paper. By default it's set to 4, but as he points out it would have to be set to around 7 if you were ever going to see the whole network and what everyone has shared if there were a million users (ala Napster). Even considering the T=4 setting the article's results still show that the original Gnutella set up is a horrible bandwidth hog.
    • The other poster nearly got it right. The horizon is a combination of T (the number of times a query is repeated before it dies) and N (the number of connections each computer has). The values for the first client were T=5 and N=4. I'm fairly sure that most clients must have raised the value for T.

      His chart called "reachable users" describes how the horizon grows as T or N change.

      I think now that there are normally over 1000 people in your horizon, possibly up to 8000.

      The other thing about the article is that it was written before clients started caching replies and that changes your horizon around quite a bit.

      Quite frankly caching the replies probably helps but the Gnutella protocol is still awful.

      I'm more impressed with Morpheus as a decentralized file sharing network. There is an open source Morpheus client called "gift."

      The weird thing is that the only way to get documentation about how Morpheus works is to download the source tarball for gift and poke around in the READMEs. There is no other public documentation for it anywhere on the net.

      Basically it sets up tons of little mini servers that index songs for up to around 300 people. Clients have a list of these servers and query them to find files. If you want a "horizon" of 6000 computers then you only have to make 20 or 30 queries. In Gnutella (without caching) the same horizon would be 6000 queries. No one really knows what it would be with caching, and it changes depending on whether it's a popular query or not.

      Actually Gnutella in that case is much worse than just 6000 queries, because many computers have no songs shared and are still searched, whereas in Morpheus computers that don't share songs are not indexed. And another thing that makes Gnutella worse is that I think the replies are relayed multiple times instead of just once.

      I'm not a gift developer or user myself... But I would say it was a far better way to go than Gnutella.
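      The arithmetic behind that comparison, assuming roughly 300 hosts per index server as described above (a sketch using the comment's own round numbers, not giFT's real figures):

```python
# Back-of-the-envelope version of the comparison above: covering a 6000-host
# horizon needs one query per ~300-host index server, while an uncached
# flood touches every host.  Illustrative only.

import math

horizon = 6000
hosts_per_index = 300

print("index servers queried:", math.ceil(horizon / hosts_per_index))  # 20
print("hosts touched by an uncached flood:", horizon)                  # 6000
```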

  • by BeBoxer ( 14448 ) on Tuesday January 15, 2002 @08:47PM (#2846049)
    As I pointed out last time this was posted, this article is basically 100% FUD. Yes, the amount of traffic goes up. And no, gnutella doesn't scale very well. But the author goes out of his way to make the problem look worse than it actually is. You see, the article only computes the total amount of traffic in the entire network, a number which is both huge and meaningless. By this math, if I send a packet somewhere and it takes 10 hops, well, that's like sending 10 packets!

    At the end of the paper, the author coughs up the big scary number of 63GBps of traffic in the Gnutella network when the nodes each have 8 connections and are using a TTL of 8. Wow! That's a lot of traffic. That certainly isn't scaling! Well, what the author never points out is that, by his own math, the network has 7,686,400 users at this point! When we divide up the total traffic among all of those network links, we get a different view. If you do the math you discover that this is a whopping 72Kbps! Oh no! It's the end of the world! Well, no, it's not. True, it's more than a modem can handle. But it's well within the reach of most cable modem connections. Given that your computer is being expected to handle the search requests of over 7 million other people, it's not that much traffic.

    Don't get me wrong, I agree that Gnutella doesn't scale all that well. But this paper is just plain FUD. The only number that really matters to users is the total bandwidth load on their pipe. By carefully avoiding that number, which isn't very big and scary at all, the author is clearly lying by omission. Given all of the real problems networks like Gnutella encounter, it isn't interesting to read this sort of drivel. Why don't we drag out Mathematica and model how much bandwidth Napster wastes by transmitting the names of all the files being shared, even though most of them will never get searched for. Hmm, let's assume 7,000,000 users. Let's assume that they each share 1000 files with an average filename length of 32 characters. Why, that's 224 gigabytes of data, and we haven't even done any searches yet! Clearly, Napster doesn't scale. Ugh. This guy might know how to use Mathematica, but I still suspect he worked in the Marketing department. With the same guys who will tell you about their 200Mbps fast ethernet.
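    The division the parent comment is doing, spelled out (same figures as the comment; the unit conventions are a guess, which is why the per-user number lands near, rather than exactly on, the 72Kbps quoted above):

```python
# The arithmetic from the comment above, spelled out.

total_traffic_Bps = 63 * 2**30     # "63GBps" across the whole network
users = 7_686_400                  # reachable hosts at N=8, TTL=8

per_user_kbps = total_traffic_Bps / users * 8 / 1000
print(f"per-user share: ~{per_user_kbps:.0f} kbps")            # ~70 kbps

# And the Napster counter-example: 7M users each announcing 1000 filenames
# of ~32 characters is roughly 224 GB of listings before any search happens.
listing_bytes = 7_000_000 * 1000 * 32
print(f"Napster file listings: {listing_bytes / 1e9:.0f} GB")  # 224 GB
```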
    • by DarkEdgeX ( 212110 ) on Tuesday January 15, 2002 @10:37PM (#2846438) Journal
      Plus the author ignores (mostly due to the fact that they didn't exist back when it was written, this IS an old article) the innovations made with Gnutella (and other, newer competing technologies). Specifically, there are now 'search proxies' that exist on Gnutella that cache and return common queries, thus not saturating the network with redundant queries. For a modem user, this makes the network usable if they limit their connections to proxy servers, because the number of searches hitting their client directly shrinks as common queries are sifted through.

      Not to mention there's still room for improvement to the protocol itself -- there's no reason a proxy couldn't cache a list of all files shared by a connected client, then answer queries directly, NEVER sending a query directly to a client. (Ultimately, as people run proxies like this more and more, you'd end up having proxies talking directly to each other.) The ultimate Gnutella proxy would cache commonly requested files and make them available over a bigger pipe.

      No money in it, but for the Gnutella enthusiast, I could see them running this kind of thing from work off of a QA box, for example, or from their support desk at an ISP. =)
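      A minimal sketch of the query caching such a proxy might do (hypothetical code, not any real proxy's implementation): remember recent query strings with their hits and replay them instead of re-flooding.

```python
# Sketch of a search-proxy cache as described above: keep recent query
# strings with their result sets and a timestamp, and answer repeats from
# the cache instead of re-flooding the network.  Hypothetical code.

import time

class QueryCache:
    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}                       # query -> (timestamp, results)

    def lookup(self, query):
        entry = self._store.get(query.lower())
        if entry and time.time() - entry[0] < self.ttl:
            return entry[1]                    # cache hit: no flood needed
        return None

    def store(self, query, results):
        self._store[query.lower()] = (time.time(), results)

cache = QueryCache()
if cache.lookup("grateful dead") is None:
    hits = ["ripple.mp3", "truckin.mp3"]       # stand-in for a real flood
    cache.store("grateful dead", hits)
print(cache.lookup("grateful dead"))
```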
  • by mozkill ( 58658 ) <{moc.liamg} {ta} {tjnetsua}> on Tuesday January 15, 2002 @08:50PM (#2846061) Journal
    It's important to know that the author of this paper is Jordan Ritter, who is a co-founder of Napster.
  • by rmckeethen ( 130580 ) on Tuesday January 15, 2002 @09:11PM (#2846125)
    I find it disturbing that the author neglects to mention some critical, and to me anyway, obvious points. Let's talk about just two, bandwidth usage and client optimizations.

    First, if I understand what he's driving at correctly, the bandwidth numbers he gives are for the Gnutella network as a whole, not for each and every client connected to it. This is equivalent to saying "average HTTP usage generates n amount of bandwidth over the Internet", or "DNS traffic will consume x number of bytes on a given network". So what? Would anyone be really shocked if 7,000,000 web browsers generated HTTP and DNS traffic in the gigabyte range? Doesn't bother me. That might be an interesting number to your ISP, but as a user of Gnutella I couldn't care less about how much total bandwidth my query for 'The Grateful Dead' takes up. It sure sounds like a lot of traffic, but it's distributed over the entire Gnutella network. As long as the traffic isn't high enough to overwhelm individual clients I don't see the problem. These numbers just don't seem to be that important, or am I missing something here?

    The other item the author fails to consider (and I'm going to guess that, as one of the engineers behind Napster, he probably knows better) are client-side optimizations like search caching and differentiation of the clients. The caching argument goes like this:

    If client A sends out a query to client C looking for 'Grateful Dead' and client B sends out a very similar request to client C , say, 'The Grateful Dead', even basic caching would prevent client C from sending this request back out to the same hosts that responded to the first request made by client A. Again, am I missing something important here? I'm not sure that caching would reduce the traffic dramatically but I'd be willing to bet that it would improve matters significantly, especially for clients that remained 'up' for long periods of time (which is in itself another important factor that seems to be missing here). This just seems so obvious.

    There are bunches of optimizations like this that can be done with the Gnutella application to reduce the overall bandwidth. And this leads to the other half of my point, i.e. the author assumes that each and every client will be functionally the same. They aren't. The Gnutella FAQ tells you to reduce your N if you're on a slow connection. This means that not all Gnutella clients are exactly the same now anyway; some have higher N's than others. The FastTrack guys (i.e. KaZaA, Morpheus, et al.) have already shown that it makes sense from an efficiency standpoint to have some clients do more than others via 'supernodes' and the like. This seems like a fairly obvious development on the client side and I can't for the life of me understand why this isn't addressed. I mean, really, isn't the 'client-client' vs. 'client-server' approach really the underlying assumption behind why Napster will scale and Gnutella won't?

    I hate to say it but it looks to me like the author is showing just a little bias here. Hey, I suppose that if I worked on a competing standard I'd trash-talk the competition too, but I think his time would be better spent making the Napster approach work better. No matter how you slice it or dice it, Napster is pretty much dead while the Gnutella network is still alive and kicking. Maybe it will never scale to 'billions and billions' of hosts but at least it's still around and going strong.

  • How would a P2P network do with the kind of scaling IRC networks use?

    Since I believe IRC scales pretty well, why not build the Gnutella network like that?
  • by codemachine ( 245871 ) on Tuesday January 15, 2002 @09:28PM (#2846192)
    There were several responses to this article pointing out that the current Gnutella network is much more scalable than the one discussed in the article. Try looking here [openp2p.com] and here [limewire.com] for articles discussing the changes since early 2000.

    Come on Slashdot, it's 2002, not 2000. It looks pretty bad accepting this article right after the Napster one. Does Slashdot or VA own a stake in Napster or something?
  • Here's a really wacked-out thought I had that I've been working on.

    Gnutella clients can sometimes have more "potential" connections out to the network than MAX_CONNECT (because they open, say five, expecting two and get four). If so, why not do a traceroute to each of the hosts and crop out the one that is the most hops away? Iterate cropping until there are MAX_CONNECT active connections.

    This would tend to favor a network that closely reflected the underlying structure of the network - thus reducing any earth-shattering impact on the inet backbone?

    To further force a short-inet-distance perhaps clients should (optionally) not accept connections from far-flung hosts?

    Additionally, clients should count already-seen packets (which they are supposed to drop) against the goodness of a given link - thus reducing routing loops in the network and forcing it to flatten out instead of clump together.

    These might allow clients to have a higher TTL without increasing net net (har har) bandwidth - less duplicated, circularly-routed, lengthy-path, etc, data.

    I suspect (have not checked) that some clients do the latter (routing loop prevention), but I know of none doing the former ones.

    I will get around to coding this soon, unless somebody can tell me it's a stupid idea (for a good reason).

    --Nathan
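    A sketch of the pruning rule proposed above, with hypothetical weights and names (no shipping client does this, as the poster says):

```python
# Score each candidate connection by network distance (traceroute hop count)
# plus a penalty for already-seen packets observed on it, then crop the
# worst-scoring links until only MAX_CONNECT remain.  Hypothetical numbers.

MAX_CONNECT = 4
DUPLICATE_PENALTY = 2.0     # one duplicate "costs" as much as two extra hops

def prune(connections):
    """connections: list of (host, hop_count, duplicates_seen)."""
    scored = sorted(connections, key=lambda c: c[1] + DUPLICATE_PENALTY * c[2])
    return scored[:MAX_CONNECT]                # keep the closest, cleanest links

candidates = [("a.example", 7, 0), ("b.example", 12, 1), ("c.example", 5, 4),
              ("d.example", 9, 0), ("e.example", 16, 0)]
for host, hops, dups in prune(candidates):
    print(host, "score:", hops + DUPLICATE_PENALTY * dups)
```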
      Traceroute is very expensive in terms of time, requiring a lot of packets to be sent out and waited on for replies, and on Unix-based OSes it requires the program to have root privileges. It also doesn't really solve the problem: a host 7 hops from you can be on a congested link or running on a really slow server, while a host 9 hops from you can be a fast server with lots of bandwidth.
  • If what he says is true, that you could generate 14 megs' worth of responses, what's to stop me from forging my IP address to be YOUR IP address, querying for the string mp3, and sitting back and watching the carnage? There would be almost no way to trace this, and it would certainly generate a significant amount of traffic, so what's to stop me? Maybe his statistics are a bit inaccurate, but all the same, you could cause a lot of data to be sent somewhere, while not causing yourself any significant lag at all.
      Read the protocol spec and you would understand why you can't do this. You don't reply directly to a request. You send the reply back through your connections, and the clients you are connected to only accept replies with the correct information.

      If you had 8 connections and a request comes in on 1 of them, only that 1 connection would accept a reply with the request's GUID. The IP information is taken directly from your connection.
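      A sketch of the back-routing being described (illustrative, not any client's actual code): a node remembers which connection each query GUID arrived on, and a reply is only ever sent back down that connection, so a forged source address buys an attacker nothing.

```python
# Gnutella-style reply routing as described above: remember which connection
# a query GUID came in on; a QueryHit for that GUID goes back out only on
# that same connection, retracing the query's path hop by hop.

class ReplyRouter:
    def __init__(self):
        self.route = {}                  # query GUID -> connection it came from

    def on_query(self, guid, from_connection):
        self.route.setdefault(guid, from_connection)

    def on_query_hit(self, guid, hit):
        conn = self.route.get(guid)
        if conn is None:
            return False                 # unknown GUID: expired or forged, drop
        conn.send(hit)
        return True

class FakeConnection:                    # stand-in for a real TCP connection
    def __init__(self, name): self.name = name
    def send(self, msg): print(f"sending {msg!r} back via {self.name}")

router = ReplyRouter()
router.on_query(b"\x01" * 16, FakeConnection("upstream-peer"))
router.on_query_hit(b"\x01" * 16, "queryhit: dark_star.mp3")   # routed back
router.on_query_hit(b"\xff" * 16, "forged reply")              # silently dropped
```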
  • by Sanity ( 1431 ) on Tuesday January 15, 2002 @10:43PM (#2846454) Homepage Journal
    The scalability issues with Gnutella are clear to anyone who understands how it works. From day one, Freenet [freenetproject.org] was designed with scalability as a core goal. In Freenet, the number of nodes involved, and the time required to retrieve a piece of information, scales logarithmically as the size of the network increases.

    A good analogy might be a detective trying to find a suspect for a crime. The Gnutella approach is akin to going on TV and asking everyone in the area to let you know if they know who did it. It may work once, but the more you do it, the less effective it is. Freenet works the way detectives normally do: they gradually home in on their suspect by gathering information, and using that information to refine their search.

    Some say that Freenet only achieves this scalability because it doesn't do the type of "fuzzy" search Gnutella does; you need to know exactly what you are looking for in Freenet to find it. This isn't true: the Freenet searching algorithm can be generalised to allow fuzzy searching. While this has not yet been demonstrated in practice, it is definitely possible in theory.

    It always amazes me that people continue to lament flaws in many current P2P architectures when Freenet has incorporated solutions to those problems almost from its inception.

    Disclaimer: I am Freenet's architect and project coordinator, so you could be forgiven for thinking I am biased, but you are free to review our papers and research to decide for yourself.
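    A very rough illustration of that "homing in" idea (a generic greedy key-routing sketch under big simplifying assumptions, not Freenet's actual algorithm): forward a request to whichever neighbour is associated with keys closest to the target key, instead of asking everyone.

```python
# Generic greedy key-based routing: pick the neighbour whose known key is
# numerically closest to the requested key and forward the request there.
# A simplified sketch only -- NOT Freenet's real routing algorithm.

import hashlib

def key(name: str) -> int:
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:8], "big")

def closest_neighbour(target_key, neighbours):
    """neighbours: dict of address -> a key that node is known to hold."""
    return min(neighbours, key=lambda addr: abs(neighbours[addr] - target_key))

neighbours = {
    "node-a": key("some document"),
    "node-b": key("another document"),
    "node-c": key("yet another one"),
}
print("route request for 'example key' to:",
      closest_neighbour(key("example key"), neighbours))
```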

    • So... How come after... 2? years Freenet hasn't become a standard, or even well known in the file-sharing world? I'm not trolling, I'm curious. Napster has come and gone, gnutella has come and gone, now we have fasttrack... Meanwhile, the freenet site just chugs along...
        Simple - because it isn't ready for public consumption yet. 2 years isn't long for a project like Freenet - look at how long it took Linux to reach wide acceptance. In many ways Freenet is a more complex project since, unlike Linux, it isn't just a reimplementation of code that is already out there; it is a completely new concept.

        Secondly, Freenet isn't really a file-sharing app, despite receiving much inaccurate publicity as "the next Napster". It isn't well adapted to sharing mp3s, nor should it be given its goals.

        We will be releasing 0.5 soon, it will be a huge improvement.

    • Freenet is technically superior and very cool. The last time I tried it, though, it had a CGI web-page-based UI. That, and nothing worked. It was a while ago, but I had and still have a lot of hope for Freenet; it just does not need to be that complicated. The idea of dedicating space that is separate from the actual files is a cool idea and opens a lot of doors, but most people will just see it as wasted space. If I want to share 1000 oggs I am not going to want to dedicate a duplicate 5 gigs just to share them on Freenet, and that made me cringe because it is a weakness to getting content out there. I am off right now to check if changes have been made; like I said, those were the problems I saw a long time ago.
    • It always amazes me that people continue to lament flaws in many current P2P architectures when Freenet has incorporated solutions to those problems almost from its inception.

      There is one reason, and one reason only, why this occurs: There is no Freenetster. No P2P file-sharing app that allows you to easily search for and download music/movies/etc. As soon as there is one, Freenet will explode (assuming it really is as scalable as it is made out to be). You want Freenet to be popular? There's only one thing you have to do...

  • Seriously. The latest version (2.1) seems to have solved quite a few of the problems outlined in the 'study'. Anyone who is doubting the scalability of the protocol should give it a try. [limewire.com]
  • Transparent Proxy (Score:2, Interesting)

    by Uncle Dazza ( 51170 )
    Has anyone considered that a transparent proxy might be the solution, or at least a partial solution?

    The internet is more of a tree than a net, at least for the smaller ISPs. So a site can run a transparent proxy that aggregates all its gnutella clients, and only maintain a few outbound connections for the entire site, as opposed to a few per client. In addition, incoming gnutella connections are intercepted and directed at the proxy (which is essentially another gnutella node).

    This allows ISPs to limit the number of gnutella connections to the rest of the world. In fact, it would be best for them to connect only to other ISPs using a proxy as well.

    This would tend to greatly improve query response time for nodes that are close by, but on the other hand would make it harder to create connections to remote nodes, because that control has been moved from the client to the proxy.

    But an office or a net cafe or school could run the proxy and have a single link between it and the ISP's proxy, instantly connecting the site with all the ISP's users and cutting bandwidth considerably.

    Proxies can do other things to accelerate searches. If a request for "Grateful Dead" has been forwarded, then there is no need to forward the same query string in the immediate future (say 1 minute). And of course there is the option of caching the file transfers themselves...
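    A sketch of that one-minute suppression idea (hypothetical helper, not part of any real proxy):

```python
# "Don't re-forward the same query string within a minute", as suggested above.

import time

SUPPRESS_SECONDS = 60
_recently_forwarded = {}        # normalised query string -> last forward time

def should_forward(query: str) -> bool:
    now = time.time()
    last = _recently_forwarded.get(query.lower())
    if last is not None and now - last < SUPPRESS_SECONDS:
        return False            # an identical query just went out; drop this one
    _recently_forwarded[query.lower()] = now
    return True

print(should_forward("Grateful Dead"))   # True  -- forward it
print(should_forward("grateful dead"))   # False -- suppressed for the next minute
```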
  • From the article:

    From above, a whopping 1.2 gigabytes of aggregate data could potentially cross everyone's networks, just to relay an 18 byte search query. This is of course where Gnutella suffers greatly from being fully distributed.

    Actually, I think the RIAA suffers more, since there's no one to sue.
