Fourth generation peer-to-peer file sharing: my next project

Final Update

I have canceled the copyright-infringement-notice.com domain name and archived the text elsewhere on this blog. All of this content was written in 2012 and hasn’t been updated in years. I am keeping the post you’re currently reading for historical and entertainment purposes. If you follow any outdated advice or information given below, you do so entirely at your own risk. I am not a lawyer and only a fool would take anything I write as legal advice.


(WARNING: I make no promises here; my P2P software is vaporware until I get the details worked out. I don’t want anyone thinking there’s something coming until there actually IS something coming.)

First Generation: In the beginning, there was Napster. Napster was the first user-friendly MP3 sharing program. Sure, songs and media were shared via IRC and FTP sites before Napster, but Napster made it extremely simple and easy to share music with other people. The biggest problem with Napster was that the Napster servers ran everything: they maintained a master index of files and a list of users sharing those files, and connected users together to perform the actual transfer. When record labels got angry, they could easily point to Napster’s centralized catalog and say “there’s no reason you can’t block our songs from being downloaded, because you control the entire process!”

Second Generation: Ahh, yes…Morpheus, Grokster, LimeWire, and the infamous Kazaa. These networks dropped the central index by running searches directly from one computer to multiple other computers. In theory, this removed centralization and made it difficult to shut down the networks. Unfortunately, there was still centralization involved: someone had to tell the computers what other computers were on the network in the first place. The indexing of files was gone, but the network still largely relied on a parent company’s servers to operate. Some of this stuff is still around today with alternative servers being used, but they’re mostly defunct due to the third generation. Well, that and the fact that at least some of these networks had gaping security holes that were easily exploited to render them useless. It was easy as pie to flood the FastTrack network that powered Kazaa and Morpheus with corrupt data.

Third Generation: Simply put, BitTorrent and eMule. These systems are hybrids; they operate both through servers (in BitTorrent they’re called trackers) and through a fully decentralized second network known as DHT (distributed hash table, NOT dihydrotestosterone, for you chemistry nuts). Multiple servers are available and there is much less centralization involved, plus DHT doesn’t go through “servers” at all: computers find each other through other computers, in what is known as the DHT “overlay network.” Completely open BitTorrent trackers also exist, and they may be freely tacked onto existing torrents to prevent one tracker’s failure from killing the torrent.
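To make “computers find each other through other computers” a bit more concrete, here is a minimal sketch of the XOR distance metric used by Kademlia-style DHTs (the family that BitTorrent’s DHT belongs to). It is a toy, not a real client: the peer names, the `node_id` helper, and the choice of `k = 3` are all made up for illustration, and no networking happens.

```python
import hashlib


def node_id(label: str) -> int:
    """Derive a 160-bit ID from an arbitrary label (hypothetical helper)."""
    return int.from_bytes(hashlib.sha1(label.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance: the bitwise XOR of two IDs, read as an integer."""
    return a ^ b


def closest_peers(target: int, known: dict, k: int = 3) -> list:
    """Return the names of the k known peers whose IDs are nearest the target."""
    return sorted(known, key=lambda name: xor_distance(known[name], target))[:k]


# A node only knows a handful of peers (invented names, no real addresses).
known = {f"peer-{i}": node_id(f"peer-{i}") for i in range(20)}
target = node_id("some-file-key")

# Ask the nearest known peers; in a real DHT each would answer with peers
# that are nearer still, and the lookup repeats until it converges.
nearest = closest_peers(target, known)
print(nearest)
```

The point of the design is that no single machine holds the whole index: each node just keeps pointing the asker at peers that are closer to the key.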

However, one thing hasn’t changed since Napster: computers still communicate with each other directly, immediately revealing the IP address of the uploader and downloader to each other. Furthermore, the way that these networks’ servers operate means that hostile parties such as the RIAA, MPAA, porn production companies, etc. can simply connect to a server, request a list of peers for a supposedly infringing file of interest, and the server hands them a big batch of IP addresses that have that file. Even if the servers didn’t make it so easy, it’s trivial to extend a little more effort and scan the DHT networks for peers with that file, so elimination of the servers wouldn’t fix the issue. This is how content owners gather lists of IP addresses to threaten and sometimes drag into court.
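The harvesting described above is easy to picture with a toy in-memory stand-in for a tracker. This is only a sketch under loud assumptions: `ToyTracker`, the `abc123` file hash, and the addresses (taken from the reserved documentation ranges) are hypothetical, and the real tracker protocol is not modeled.

```python
from collections import defaultdict


class ToyTracker:
    """Hypothetical stand-in for a tracker: it maps a file hash to peer addresses."""

    def __init__(self):
        self._swarms = defaultdict(set)

    def announce(self, info_hash: str, addr: str) -> list:
        """Register addr for info_hash and return everyone else already in the swarm."""
        peers = sorted(self._swarms[info_hash] - {addr})
        self._swarms[info_hash].add(addr)
        return peers


tracker = ToyTracker()

# Two ordinary peers join the swarm for the same file.
tracker.announce("abc123", "192.0.2.10:6881")
tracker.announce("abc123", "198.51.100.7:6881")

# A monitoring party "joins" too, and is simply handed the other addresses.
harvested = tracker.announce("abc123", "203.0.113.99:6881")
print(harvested)
```

Nothing here requires the observer to upload or download anything; asking for the peer list is enough.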

Generation 3.5: MUTE file sharing. I’ve labeled this “generation 3.5” because it never caught enough momentum to grow, and because it still suffers from many of the security issues that have plagued P2P sharing since the beginning. My solution to the IP address revelation problem is more complicated than MUTE’s, but the essential idea is the same: pass data to peers who then pass it along to their peers, with the originating IP address not included. MUTE had the breakthrough idea for largely killing the IP address problem, but it seems that all effort went into the design of the routing scheme and algorithm, while tackling other logistical flaws was put on the back burner.
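The relay idea can be sketched in a few lines. To be clear, this is not MUTE’s actual routing algorithm and not the design hinted at in this post; it is a minimal illustration in which the `Node` class, the `link` helper, and the request ID `r1` are all hypothetical. Each node remembers only which neighbor handed it a request, and replies retrace those steps.

```python
class Node:
    """Toy relay node: it knows its direct neighbors and nothing else."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []
        self.files = {}      # key -> data this node can serve
        self.came_from = {}  # request_id -> neighbor that passed it to us
        self.received = {}   # request_id -> data for requests we started

    def request(self, request_id, key):
        """Start a request; the message carries no origin address."""
        self.came_from[request_id] = None  # None marks "this one is mine"
        for n in self.neighbors:
            n.handle_request(request_id, key, sender=self)

    def handle_request(self, request_id, key, sender):
        if request_id in self.came_from:
            return  # already seen this request; drop it to avoid loops
        self.came_from[request_id] = sender
        if key in self.files:
            self.handle_reply(request_id, self.files[key])
        else:
            for n in self.neighbors:
                if n is not sender:
                    n.handle_request(request_id, key, sender=self)

    def handle_reply(self, request_id, data):
        back = self.came_from[request_id]
        if back is None:
            self.received[request_id] = data     # we asked for this
        else:
            back.handle_reply(request_id, data)  # pass it one hop back


def link(x, y):
    """Connect two nodes as direct neighbors (hypothetical helper)."""
    x.neighbors.append(y)
    y.neighbors.append(x)


a, b, c = Node("a"), Node("b"), Node("c")
link(a, b)
link(b, c)
c.files["song"] = b"data"

a.request("r1", "song")
print(a.received["r1"], c.came_from["r1"].name)
```

In this chain, `c` serves the file but only ever talks to `b`; it has no record of `a` at all. A relay scheme like this hides addresses, but it does nothing about the other flaws such a network still faces.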

The most serious of these are the various forms of poisoning: index poisoning, where bogus index results come back, sometimes in huge enough quantities to make locating the intended data extremely difficult and frustrating; and file poisoning, where the “bogus” index results return real files that do not have the content expected. In the days of the FastTrack network, this became very common, with the worst example being MP3 files containing the first 20 seconds of a song looped repeatedly and cut off at the same track length as the original song, meaning that a cursory listen to the beginning of the MP3 to verify its content would “pass the test” while the MP3 would not actually be what was desired.
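One common defense against file poisoning is to compare a download against a cryptographic hash obtained from a source you trust (BitTorrent does something similar by checking pieces against hashes stored in the torrent). The sketch below assumes such a trusted hash is available, which is the genuinely hard part and is not solved here; the byte strings merely stand in for a real song and a looped fake of the same length.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest of the data using SHA-256 from the standard library."""
    return hashlib.sha256(data).hexdigest()


def is_authentic(data: bytes, expected_hex: str) -> bool:
    """True only if the data hashes to the value we were told to expect."""
    return sha256_hex(data) == expected_hex


# Stand-in for the real file, and the hash published by a trusted source.
genuine = bytes(range(256)) * 4
trusted_hash = sha256_hex(genuine)

# Stand-in for the looped fake: same opening bytes, same total length.
fake = (genuine[:64] * 16)[: len(genuine)]

print(is_authentic(genuine, trusted_hash), is_authentic(fake, trusted_hash))
```

A quick listen to the start of the fake would pass, and so would a file-size check, but the hash comparison fails.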

More Gen3-esque Software: Perfect Dark and Freenet. These programs have routing constructs similar to MUTE, and use encrypted caches on the hard drives of the network’s users as their “storage.” The only way to retrieve a file is to request it by its “key.” These networks add deniability to the storage of the data, since there’s no way for the user to know what’s in the encrypted data store. Unfortunately, these programs also suffer some issues; Freenet is designed to work like the Web rather than to share large files, and tends to be fairly slow and/or unreliable for that purpose (unpopular content in particular will slow down and eventually just vanish). Perfect Dark uses DHT, so it is no more secure for uploaders and downloaders than any other DHT implementation. Some users of Perfect Dark have been arrested in Japan for uploading popular television series, proving that anonymity is not protected by Perfect Dark in any meaningful way.

The next generation of file sharing programs has to fix the IP address issue completely, while also combating other major security problems (like poisoning, denial-of-service attacks) that have gone insufficiently addressed in previous peer-to-peer file sharing programs.

Don’t get too excited, but here’s where I am going with this: I am hesitant to announce vaporware, but given the amount of interest in my posts regarding copyright infringement notices and my own casual interest in the chilling effects of copyright trolling on free exchange of information and ideas, I have been working out the details of a fourth generation file sharing protocol that solves almost all of the issues surrounding file sharing’s general lack of anonymity and ease of censorship through lawsuits and settlement demands/threats.

I thought about how to fix the problems with torrents and DHT systems such as Kademlia. The solutions that came to mind seemed obvious, but the practical applications that I began to come up with were full of glaring holes. When I solved the problem of tracking down an uploader or downloader by IP address, which is the obvious problem with all current systems, as the lawsuits and settlement demands clearly show, I thought I was a genius and wondered why no one else had come up with the same solution…until I found programs like MUTE which work in a similar fashion. I thought about the problem in more depth, and realized that my perfect little system for losing the traceability of the IP addresses was merely the tip of the iceberg. DoS attacks, index and file poisoning, hash collisions, plausible deniability, man-in-the-middle attacks, and “Sybil attacks” are just a portion of the problems that have to be solved, and I think I’ve answered most (if not all) of these issues.

At some point, I’ll need help testing and implementing this, taking it cross-platform, and getting the word out about it once it’s confirmed to work as expected and stress tested in the real world. For now, I’m writing this to let my readers and the Internet at large know that the problem is being worked on. I look forward to the day that copyright trolls are, in a technical sense, neutered.

Here’s to my ideal P2P file sharing vaporware. When it’s more than an idea on paper, I’ll make a new post and link to it here. Stay tuned, everyone; this will be interesting.

9 thoughts on “Fourth generation peer-to-peer file sharing: my next project”

  1. Good luck with your software. I too would like to see these copyright infringement lawsuits go away, either through new technology (e.g., your idea turning into usable software), or by convincing copyright trolls that it is cheaper for them to police their own copyrights (via DMCA takedown notices) rather than suing individuals. Please let me know when you get anywhere. I’d be happy to publicize your work, and I have connections with many in the bittorrent world who would also be happy to spread the word.

    1. Thanks for the support! It’s proving to be significantly more difficult than I ever expected to engineer this stuff, because the number of possible attack vectors is staggering. When I’ve got something to show for all my efforts, I’ll definitely be broadcasting it to the world!

      1. Remember, you do not need to solve ALL the world’s problems with your software — as long as you address some key issues as you discuss in your article, it is worth it to create an executable version and publish the software (e.g., call it a Beta or Alpha version if you must). Then, as you tackle more problems, release updates.

        Sorry if this is obvious to you — you are certainly light years beyond me as far as programming and technology go. I just know many programmers to be perfectionists, and as a result, they never release their great idea [which ends up going to the grave with them], and I hate to see a good idea buried.

        Good luck to you, and kick butt!

        -Rob

        1. Unfortunately, part of the problem is that I’m primarily skilled in things like shell scripting, and not so much in C/C++ coding. I don’t intend to solve 100% of the problems (that can be part of the software evolution) but there are a lot of parameters involved that have to be considered throughout the entire design to keep it mostly functional while preserving anonymity AND the integrity and functionality of the network at large.

          For example, one of the fundamental problems with “flood searches” is that there must be a TTL on each search request or it’ll cause self-reinforcing infinite loops throughout the peer swarm; however, any time-to-live number could theoretically make it possible to decipher the node distance to the originator of an upload or download, potentially destroying the plausible deniability and anonymity of the system…and that’s just one aspect of the functionality as a whole.

          To be honest, if my programming skills don’t seem up to par, I’ll write up a specification for the protocol and operation of a node program, and let someone better at coding than myself take on the task. I’m currently working on the details of that anyway, so it can’t possibly hurt to formalize it all.

          1. Let me know if I can be of assistance. Many of my readers are the techy type, and I have come across many programmers over the past 2+ years of doing these kinds of lawsuits. Obviously no promises as far as results go, but I would be happy to Tweet or blog out what you have (or get you in touch with the people where you can write yourself), and then someone can pick up where you have left off. You can contact me offline when you’re ready to do this.
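The flood-search TTL trade-off raised in the comments above can be sketched as a small simulation. This is only an illustration under stated assumptions: the five-node graph, the `flood` function, and a fixed, publicly known starting TTL of 2 are all hypothetical, and real networks complicate the picture considerably.

```python
from collections import deque


def flood(graph, origin, initial_ttl):
    """Flood a search outward; return {node: TTL it saw on first arrival}."""
    seen = {origin: initial_ttl}
    queue = deque([(origin, initial_ttl)])
    while queue:
        node, ttl = queue.popleft()
        if ttl == 0:
            continue  # TTL exhausted: this node does not forward
        for neighbor in graph[node]:
            if neighbor not in seen:  # duplicate suppression stops loops
                seen[neighbor] = ttl - 1
                queue.append((neighbor, ttl - 1))
    return seen


# Hypothetical overlay: a triangle (a, b, c) with a tail (c - d - e).
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}

INITIAL_TTL = 2
observed = flood(graph, "a", INITIAL_TTL)

# If the starting TTL is known, each observer can compute its hop distance.
hops_from_origin = {node: INITIAL_TTL - ttl for node, ttl in observed.items()}
print(observed, hops_from_origin)
```

The TTL keeps the search from circulating forever (node `e` is never reached, and the triangle does not loop), but any node that knows the starting value can tell how many hops away the originator is. A neighbor that sees the full starting TTL minus one knows it is talking directly to the source.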
