source: router/doc/techintro.html @ 113fbc1d

2006-02-15 jrandom

  • Merged in the i2p_0_6_1_10_PRE branch to the trunk, so CVS HEAD is no longer backwards compatible (and should not be used until 0.6.1.1 is out)
<html>
<head>
 <title>Introducing I2P - a scalable framework for anonymous communication</title>
<style>
p { font-size: 10px; text-align: left; font-family: sans-serif }
h1 { font-size: 12px; font-family: sans-serif }
h2 { font-size: 10px; font-family: sans-serif }
h3 { font-size: 10px; font-family: sans-serif }
blockquote { font-size: 10px; font-family: monospace, sans-serif }
pre { font-size: 10px; font-family: sans-serif }
.title { font-size: 14px; font-family: sans-serif }
.subtitle { font-size: 12px; font-family: sans-serif }
</style>
</head>
<body>

<center>
<b class="title">Introducing I2P</b><br />
<span class="subtitle">a scalable framework for anonymous communication</span><br />
<i style="font-size: 8px">$Id: techintro.html,v 1.8.2.1 2006/02/13 07:13:35 jrandom Exp $</i>
<br />
<br />

<table border="0" width="50%">
<tr><td valign="top" align="left">
<pre>
* <a href="#intro">Introduction</a>
* <a href="#op">Operation</a>
  * <a href="#op.overview">Overview</a>
  * <a href="#op.tunnels">Tunnels</a>
  * <a href="#op.netdb">Network Database</a>
  * <a href="#op.transport">Transport protocols</a>
  * <a href="#op.crypto">Cryptography</a>
</pre>
</td>
<td valign="top" align="left">
<pre>
* <a href="#future">Future</a>
  * <a href="#future.restricted">Restricted routes</a>
  * <a href="#future.variablelatency">Variable latency</a>
  * <a href="#future.open">Open questions</a>
</pre>
</td>
<td valign="top" align="left">
<pre>
* <a href="#similar">Similar systems</a>
  * <a href="#similar.tor">Tor</a>
  * <a href="#similar.freenet">Freenet</a>
* <a href="#app">Appendix A: Application layer</a>
</pre>
</td>
</tr></table>
</center>

<hr />

<h1 id="intro">Introduction</h1>
<p>
I2P is a scalable, self organizing, resilient packet switched anonymous network layer,
upon which any number of different anonymity or security conscious applications
can operate.  Each of these applications may make its own anonymity, latency, and
throughput tradeoffs without worrying about the proper implementation of a free
route mixnet, allowing it to blend its activity with the larger anonymity set of
users already running on top of I2P.  Applications available already provide the full
range of typical Internet activities - anonymous web browsing, anonymous web hosting,
anonymous blogging and content syndication (with <a href="#app.syndie">Syndie</a>),
anonymous chat (via IRC or Jabber), anonymous swarming file transfers (with <a
href="#app.i2pbt">i2p-bt</a>, <a href="#app.i2psnark">I2PSnark</a>, and
<a href="#app.azneti2p">Azureus</a>), anonymous file sharing (with
<a href="#app.i2phex">I2Phex</a>), anonymous email (with <a href="#app.i2pmail">I2Pmail</a>
and <a href="#app.i2pmail">susimail</a>), anonymous newsgroups, as well as several
other applications under development.  Unlike web sites hosted within content
distribution networks like <a href="#similar.freenet">Freenet</a> or
<a href="http://www.ovmj.org/GNUnet/">GNUnet</a>, the services hosted on I2P are fully
interactive - there are traditional web-style search engines, bulletin boards, blogs
you can comment on, database driven sites, and bridges to query static systems like
Freenet without needing to install it locally.
</p>

<p>
With all of these anonymity enabled applications, I2P takes on the role of the message
oriented middleware - applications say that they want to send some data to a cryptographic
identifier (a "destination") and I2P takes care of making sure it gets there securely
and anonymously.  I2P also bundles a simple <a href="#app.streaming">streaming</a> library
to allow I2P's anonymous best-effort messages to transfer as reliable, in-order streams,
transparently offering a TCP based congestion control algorithm tuned for the high
bandwidth delay product of the network.  While there have been several simple SOCKS
proxies available to tie existing applications into the network, their value has been
limited as nearly every application routinely exposes what, in an anonymous context,
is sensitive information.  The only safe way to go is to fully audit an application to
ensure proper operation, and to assist in that we provide a series of APIs in various
languages which can be used to make the most out of the network.
</p>

<!-- commented out because "The details [...] are " *NOT* " given later" -->
<!--
<p>
The scope of I2P's anonymity protections varies upon the applications running on
top of them, as well as the choices that each user makes.  The aim is to provide
the options necessary so that a sufficient level of anonymity can be achieved while
exposing the functionality that people facing up to state level adversaries require.
At the same time, those facing less powerful adversaries are able to improve their
throughput and latency while reducing the resources required to provide the necessary
level of cover.  The details of the techniques available for facing adversaries who
are internal or external, passive or active, local, national, or global, are given
later.
</p>
-->

<p>
I2P is not a research project - academic, commercial, or governmental, but is instead
an engineering effort aimed at doing whatever is necessary to provide a sufficient
level of anonymity to those who need it.  It has been in active development since
early 2003 with one full time developer and a dedicated group of part time contributors
from all over the world.  All of the work done on I2P is open source and
freely available on the <a href="http://www.i2p.net/">website</a>, with the majority
of the code released outright into the public domain, though making use of a few
cryptographic routines under BSD-style licenses.  The people working on I2P do not
control what people release client applications under, and there are several GPL'ed
applications available (<a href="#app.i2ptunnel">I2PTunnel</a>,
<a href="#app.i2pmail">susimail</a>, <a href="#app.i2psnark">I2PSnark</a>, <a href="#app.azneti2p">Azureus</a>,
<a href="#app.i2phex">I2Phex</a>).  <a href="http://www.i2p.net/halloffame">Funding</a>
for I2P comes entirely from donations, and the project does not receive any tax breaks
in any jurisdiction at this time, as many of the developers are themselves anonymous.
</p>

<h1 id="op">Operation</h1>
<h2 id="op.overview">Overview</h2>

<p>
To understand I2P's operation, it is essential to understand a few key concepts.
First, I2P makes a strict separation between the software participating
in the network (a "router") and the anonymous endpoints ("destinations") associated
with individual applications.  The fact that someone is running I2P is not usually
a secret.  What is hidden is information on what the user is doing, if anything at
all, as well as what router a particular destination is connected to.  End users
will typically have several local destinations on their router - for instance, one
proxying in to IRC servers, another supporting the user's anonymous webserver ("eepsite"),
another for an I2Phex instance, another for torrents, etc.
</p>

<p>
Another critical concept to understand is the "tunnel" - a directed path through
an explicitly selected set of routers, making use of layered encryption so that
the messages sent into the tunnel's "gateway" appear entirely random at each hop
along the path until they reach the tunnel's "endpoint".  These unidirectional
tunnels can be seen as either "inbound" tunnels or "outbound" tunnels, referring
to whether they are bringing messages to the tunnel's creator or away from them,
respectively.  The gateway of an inbound tunnel can receive messages from any
peer and will forward them down through the tunnel until they reach the (anonymous)
endpoint (the creator).  On the other hand, the gateway of an outbound tunnel is
the tunnel's creator, and messages sent through that tunnel are encoded so that
when they reach the outbound tunnel's endpoint, that router has the instructions
necessary to forward the message on to the appropriate location.
</p>

<p>
A third critical concept to understand is I2P's "network database" (or "netDb")
- a pair of algorithms used to share network metadata.  The two types of metadata
carried are "routerInfo" and "leaseSets" - the routerInfo gives routers the data
necessary for contacting a particular router (its public keys, transport
addresses, etc.), while the leaseSet gives routers the information necessary for
contacting a particular destination.  Within each leaseSet, there are any number
of "leases", each of which specifies the gateway for one of that destination's
inbound tunnels as well as when that tunnel will expire.  The leaseSet also
contains a pair of public keys which can be used for layered garlic encryption.
</p>

<!--
<p>
I2P's operation can be understood by putting those three concepts together:
</p>

<p><img src="net.png"></p>
-->

<p>
When Alice wants to send a message to Bob, she first does a lookup in the
netDb to find Bob's leaseSet, giving her his current inbound tunnel gateways.
She then picks one of her outbound tunnels and sends the message
down it with instructions for the outbound tunnel's endpoint to forward the
message on to one of Bob's inbound tunnel gateways.  When the outbound
tunnel endpoint receives those instructions, it forwards the message as
requested, and when Bob's inbound tunnel gateway receives it, it is
forwarded down the tunnel to Bob's router.  If Alice wants Bob to be able
to reply to the message, she needs to transmit her own destination explicitly
as part of the message itself (taken care of transparently in the
<a href="#app.streaming">streaming</a> library).  Alice may also cut down on
the response time by bundling her most recent leaseSet with the message so
that Bob doesn't need to do a netDb lookup for it when he wants to reply, but this
is optional.
</p>

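<p>
As a rough illustration of the steps above, the following Python sketch models the
choices Alice's router makes.  All names in it (<code>Lease</code>, <code>LeaseSet</code>,
<code>plan_delivery</code>, and the dictionary standing in for the netDb) are hypothetical
and chosen only for this example; no encryption or real tunnel handling is shown.
</p>

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    gateway: str      # router acting as gateway of one inbound tunnel
    tunnel_id: int    # id of that tunnel on the gateway
    expires_at: int   # expiry time in seconds (hypothetical clock)


@dataclass(frozen=True)
class LeaseSet:
    destination: str
    leases: tuple


def plan_delivery(netdb, outbound_tunnels, destination, payload, now, rng=random):
    """Pick an outbound tunnel and one of the recipient's inbound gateways."""
    lease_set = netdb[destination]  # stands in for the netDb lookup
    live = [lease for lease in lease_set.leases if lease.expires_at > now]
    if not live:
        raise LookupError("no unexpired lease for " + destination)
    lease = rng.choice(live)
    tunnel = rng.choice(outbound_tunnels)
    return {
        "outbound_tunnel": tunnel,  # list of hops; the last one is the endpoint
        "instructions": {"forward_to": lease.gateway, "tunnel_id": lease.tunnel_id},
        "payload": payload,
    }
```

<p>
The returned "instructions" are what the outbound tunnel's endpoint would act on; in
this sketch an expired lease is simply skipped rather than triggering a fresh lookup.
</p>
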
<p>
While the tunnels themselves have layered encryption to prevent unauthorized
disclosure to peers inside the network (as the transport layer itself does to
prevent unauthorized disclosure to peers outside the network), it is necessary
to add an additional end to end layer of encryption to hide the message from the
outbound tunnel endpoint and the inbound tunnel gateway.  This
"<a href="#op.garlic">garlic encryption</a>" lets Alice's router wrap up multiple
messages into a single "garlic message", encrypted to a particular public key
so that intermediary peers cannot determine how many messages are within
the garlic, what those messages say, or where those individual cloves are
destined.  For typical end to end communication between Alice and Bob, the
garlic will be encrypted to the public key published in Bob's leaseSet,
allowing the message to be encrypted without giving out the public key to Bob's
own router.
</p>

<p>
Another important fact to keep in mind is that I2P is entirely message based
and that some messages may be lost along the way.  Applications using I2P
can use the message oriented interfaces and take care of their own congestion
control and reliability needs, but most would be best served by reusing the
provided <a href="#app.streaming">streaming</a> library to view I2P as a streams
based network.
</p>

<h2 id="op.tunnels">Tunnels</h2>

<p>
Both inbound and outbound tunnels work along similar principles - the tunnel
gateway accumulates a number of tunnel messages, eventually preprocessing them
into something for tunnel delivery.  Next, the gateway encrypts that preprocessed
data and forwards it to the first hop.  That peer and subsequent tunnel
participants verify that the message isn't a duplicate, add on a layer of
encryption, and forward it on to the next peer.  Eventually, the
message arrives at the endpoint where the messages are split out again and
forwarded on as requested.  The difference arises in what
the tunnel's creator does - for inbound tunnels, the creator is the endpoint
and they simply decrypt all of the layers added, while for outbound tunnels,
the creator is the gateway and they pre-decrypt all of the layers so that after
all of the layers of per-hop encryption are added, the message arrives in the
clear at the tunnel endpoint.
</p>

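<p>
The layering described above can be sketched in Python as follows.  This is only a toy
model under stated assumptions: the "cipher" is a SHA256 derived keystream added
byte-wise (a stand-in, not the AES256/CBC used by I2P), the function names are
hypothetical, and duplicate detection and message preprocessing are left out.
</p>

```python
import hashlib


def keystream(key, length):
    """Derive `length` pseudo-random bytes from `key` (toy construction)."""
    blocks = (length + 31) // 32
    stream = b"".join(
        hashlib.sha256(key + i.to_bytes(4, "big")).digest() for i in range(blocks)
    )
    return stream[:length]


def add_layer(key, data):
    """Toy 'encrypt': add the keystream byte-wise, modulo 256."""
    return bytes((d + k) % 256 for d, k in zip(data, keystream(key, len(data))))


def remove_layer(key, data):
    """Toy 'decrypt': subtract the keystream byte-wise, modulo 256."""
    return bytes((d - k) % 256 for d, k in zip(data, keystream(key, len(data))))


def outbound(hop_keys, message):
    """Creator is the gateway: pre-decrypt, then every hop adds a layer."""
    data = message
    for key in reversed(hop_keys):
        data = remove_layer(key, data)
    seen_on_wire = []
    for key in hop_keys:
        seen_on_wire.append(data)  # what this hop receives
        data = add_layer(key, data)
    return seen_on_wire, data      # data is what the endpoint ends up with


def inbound(hop_keys, message):
    """Creator is the endpoint: hops add layers, the creator strips them all."""
    data = message
    for key in hop_keys:
        data = add_layer(key, data)
    wrapped = data
    for key in reversed(hop_keys):
        data = remove_layer(key, data)
    return wrapped, data
```

<p>
In the outbound case the data only becomes readable once the last hop has added its
layer, while in the inbound case the creator removes every layer itself.
</p>
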
<p>
The choice of specific peers to pass on messages as well as their particular
ordering is important to understanding both I2P's anonymity and performance
characteristics.  While the network database (below) has its own criteria for
picking what peers to query and store entries on, tunnels may use any peers in
the network in any order (and even any number of times) in a single tunnel.  If
perfect latency and capacity data were globally known, selection and ordering
would be driven by the particular needs of the client in tandem with their threat
model.  Unfortunately, latency and capacity data is not trivial to gather
anonymously, and depending upon untrusted peers to provide this information has
its own serious anonymity implications.
</p>

<p>
From an anonymity perspective, the simplest technique would be to pick peers
randomly from the entire network, order them randomly, and use those peers
in that order for all eternity.  From a performance perspective, the simplest
technique would be to pick the fastest peers with the necessary spare capacity,
spreading the load across different peers to handle transparent failover, and
to rebuild the tunnel whenever capacity information changes.  While the former
is both brittle and inefficient, the latter requires inaccessible information
and offers insufficient anonymity.  I2P is instead working on offering a range
of peer selection strategies, coupled with anonymity aware measurement code to
organize the peers by their profiles.
</p>

<p>
As a base, I2P is constantly profiling the peers with which it interacts
by measuring their indirect behavior - for instance, when a peer responds to
a netDb lookup in 1.3 seconds, that round trip latency is recorded in the
profiles for all of the routers involved in the two tunnels (inbound and
outbound) through which the request and response passed, as well as the queried
peer's profile.  Direct measurement, such as transport layer latency or
congestion, is not used as part of the profile, as it can be manipulated and
associated with the measuring router, exposing it to trivial attacks.  While
gathering these profiles, a series of calculations are run on each to summarize
the peer's performance - its latency, its capacity to handle lots of activity,
whether it is currently overloaded, and how well integrated into the network it
seems to be.  These calculations are then compared for active peers to organize the
routers into four tiers - fast and high capacity, high capacity, not failing, and failing.
The thresholds for those tiers are determined dynamically, and while they
currently use fairly simple algorithms, alternatives exist.
</p>

<p>
Using this profile data, the simplest reasonable peer selection strategy is to
pick peers randomly from the top tier (fast and high capacity), and this is
currently deployed for client tunnels.  Exploratory tunnels (used for netDb
and tunnel management) pick peers randomly from the not failing tier (which
includes routers in 'better' tiers as well), allowing the peer to sample
routers more widely, in effect optimizing the peer selection through randomized
hill climbing.  These strategies alone do however leak information regarding the
peers in the router's top tier through predecessor and netDb harvesting attacks.
In turn, several alternatives exist which, while not balancing the load as evenly,
will address the attacks mounted by particular classes of adversaries.
</p>

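<p>
A minimal sketch of the tiered random strategy described above, in Python.  The tier
labels, the <code>peers_by_tier</code> mapping (each peer listed once, under its best
tier) and the function names are assumptions made for this example, and picking
distinct peers per tunnel is a simplification.
</p>

```python
import random

# Tiers from best to worst, as named in the text.
TIERS = ["fast_and_high_capacity", "high_capacity", "not_failing", "failing"]


def candidates(peers_by_tier, minimum_tier):
    """Peers in `minimum_tier` plus every better tier."""
    cutoff = TIERS.index(minimum_tier)
    pool = []
    for tier in TIERS[: cutoff + 1]:
        pool.extend(peers_by_tier.get(tier, []))
    return pool


def pick_hops(peers_by_tier, length, exploratory=False, rng=random):
    """Randomly choose `length` distinct peers for a new tunnel."""
    minimum = "not_failing" if exploratory else "fast_and_high_capacity"
    pool = candidates(peers_by_tier, minimum)
    if length > len(pool):
        raise ValueError("not enough peers in the eligible tiers")
    return rng.sample(pool, length)
```

<p>
Client tunnels draw only from the top tier, while exploratory tunnels draw from every
tier down to "not failing", which is what lets the router sample peers more widely.
</p>
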
<p>
By picking a random key and ordering the peers according to their XOR distance
from it, the information leaked is reduced in predecessor and harvesting attacks
according to the peers' failure rate and the tier's churn.  Another simple strategy
for dealing with netDb harvesting attacks is to simply fix the inbound tunnel
gateway(s) yet randomize the peers further on in the tunnels.  To deal with
predecessor attacks for adversaries which the client contacts, the outbound tunnel
endpoints would also remain fixed.  The selection of which peer to fix on the most
exposed point would of course need to have a limit to the duration, as all peers
fail eventually, so it could either be reactively adjusted or proactively avoided
to mimic a measured mean time between failures of other routers.  These two strategies
can in turn be combined, using a fixed exposed peer and an XOR based ordering within
the tunnels themselves.  A more rigid strategy would fix the exact peers and ordering
of a potential tunnel, only using individual peers if all of them agree to participate
in the same way each time.  This varies from the XOR based ordering in that the
predecessor and successor of each peer are always the same, while the XOR based
ordering only makes sure their order doesn't change.
</p>

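<p>
The XOR based ordering can be sketched as below.  The function names are hypothetical,
and treating peers as equal-length hashes (32 bytes, matching SHA256) is an assumption
of this example.
</p>

```python
import os


def xor_distance(a, b):
    """XOR distance between two equal-length byte strings, as an integer."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def order_peers(peer_hashes, key):
    """Order the chosen peers by their XOR distance from `key`."""
    return sorted(peer_hashes, key=lambda h: xor_distance(h, key))


def new_ordering_key(length=32):
    """Random key; the 32 byte length is an assumption matching SHA256 hashes."""
    return os.urandom(length)
```

<p>
For a fixed key, whichever subset of peers is selected always appears in the same
relative order, although a given peer's neighbours may differ from tunnel to tunnel.
</p>
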
<p>
As mentioned before, I2P currently (release 0.6.1.1) includes the tiered random
strategy above, but the others are planned for the 0.6.2 release.  A more detailed
discussion of the mechanics involved in tunnel operation, management, and peer
selection can be found in the
<a href="http://dev.i2p.net/cgi-bin/cvsweb.cgi/i2p/router/doc/tunnel-alt.html?rev=HEAD">tunnel spec</a>.
</p>

<h2 id="op.netdb">Network Database</h2>

<p>
As mentioned earlier, I2P's netDb works to share the network's metadata.  Two
algorithms are used to accomplish this - primarily, a small set of routers are
designated as "floodfill peers", while the rest of the routers participate in
the <a href="http://en.wikipedia.org/wiki/Kademlia">Kademlia</a> derived
distributed hash table for redundancy.  To integrate the two algorithms, each
router always uses the Kademlia style store and fetch, but acts as if the
floodfill peers are 'closest' to the key in question.  Additionally, when a
peer publishes a key into the netDb, after a brief delay they query another
random floodfill peer, asking them for the key, and if that peer does not have
it, they move on and republish the key.  Behind the scenes, when one of
the floodfill peers receives a new valid key, they republish it to the other
floodfill peers who then cache it locally.
</p>

<p>
Each piece of data in the netDb is self authenticating - signed by the
appropriate party and verified by anyone who uses or stores it.  In addition,
the data has liveness information within it, allowing irrelevant entries to be
dropped, newer entries to replace older ones, and, for the paranoid, protection
against certain classes of attack.  This is also why I2P bundles the necessary
code for maintaining the correct time, occasionally querying some SNTP servers
(the <a href="http://www.pool.ntp.org/">pool.ntp.org</a> round robin by default)
and detecting skew between routers at the transport layer.
</p>

<p>
The routerInfo structure itself contains all of the information that one router
needs to know to securely send messages to another router.  This includes their
identity (made up of a 2048bit ElGamal public key, a 1024bit DSA public key, and
a certificate), the transport addresses which they can be reached on, such as
an IP address and port, when the structure was published, and a set of arbitrary
uninterpreted text options.  In addition, there is a signature against all of
that data, generated with the private key matching the included DSA public key.
The key for this routerInfo structure in the netDb is the SHA256 hash of the
router's identity.  The options published are often filled with information helpful
in debugging I2P's operation, but when I2P reaches the 1.0 release, the options
will be disabled and kept blank.
</p>

<p>
The leaseSet structure is similar, in that it includes the I2P destination
(comprised of a 2048bit ElGamal public key, a 1024bit DSA public key, and a
certificate), a list of "leases", and a pair of public keys for garlic encrypting
messages to the destination.  Each of the leases specifies one of the destination's
inbound tunnel gateways by including the SHA256 of the gateway's identity, a 4
byte tunnel id on that gateway, and when that tunnel will expire.  The key for
the leaseSet in the netDb is the SHA256 of the destination itself.
</p>

<p>
As the router currently automatically bundles the leaseSet for the sender inside
a garlic message to the recipient, the leaseSets for destinations which will not
receive unsolicited messages do not need to be published in the netDb at all.  If
the destination itself is sensitive, the leaseSet could instead be transmitted
through other means without ever going into the netDb.
</p>

<p>
Bootstrapping the netDb itself is simple - once a router has at least one routerInfo
of a reachable peer, they query that router for references to other routers in the
network with the Kademlia healing algorithm.  Each routerInfo reference is stored in
an individual file in the router's netDb subdirectory, allowing people to easily
share their references to bootstrap new users.
</p>

<p>
Unlike traditional DHTs, the very act of conducting a search distributes the data
as well, since rather than passing Kademlia's standard IP+port pairs, references are given
to the routers that the peer should query next (namely, the SHA256 of those routers'
identities).  As such, iteratively searching for a particular destination's leaseSet
or router's routerInfo will also provide you with the routerInfo of the peers along
the way.  In addition, due to the time sensitivity of the data published, the information
doesn't often need to migrate between peers - since a tunnel is only valid for 10
minutes, the leaseSet can be dropped after that time has passed.  To take into
account Sybil attacks on the netDb, the Kademlia routing location used for any given
key varies over time.  For instance, rather than storing a routerInfo on the peers
closest to SHA256(routerInfo.identity), they are stored on the peers closest to
SHA256(routerInfo.identity + YYYYMMDD), requiring an adversary to remount the attack
daily so as to maintain their closeness to the current routing key.  As the
very fact that a router is making a lookup for a given key may expose sensitive data
(and the fact that a router is <i>publishing</i> a given key even more so), all netDb
messages are transmitted through the router's exploratory tunnels.
</p>

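<p>
The daily rotation of the routing location can be illustrated with a short Python
sketch.  The function names are hypothetical, and encoding the date as ASCII digits
appended to the identity bytes is an assumption; the text only gives the formula
SHA256(routerInfo.identity + YYYYMMDD).
</p>

```python
import hashlib
from datetime import date


def netdb_key(identity):
    """Key of a routerInfo in the netDb: SHA256 of the router's identity."""
    return hashlib.sha256(identity).digest()


def routing_key(identity, day):
    """Location the entry is stored near on `day`: SHA256(identity + YYYYMMDD)."""
    stamp = day.strftime("%Y%m%d").encode("ascii")  # date encoding is an assumption
    return hashlib.sha256(identity + stamp).digest()
```

<p>
The netDb key itself stays constant, but the peers considered "closest" to an entry
change whenever the date stamp changes, so positions an adversary has generated near
one day's routing key are of no use the next day.
</p>
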
<p>
The netDb plays a very specific role in the I2P network, and the algorithms have
been tuned towards our needs.  This also means that it hasn't been tuned to address the
needs we have yet to run into.  As the network grows, the primary floodfill algorithm
will need to be refined to exploit the capacity available, or perhaps replaced with
another technique for securely distributing the network metadata.
</p>

<h2 id="op.transport">Transport protocols</h2>

<p>
Communication between routers needs to provide confidentiality and integrity
against external adversaries while authenticating that the router contacted
is the one who should receive a given message.  The particulars of how routers
communicate with other routers aren't critical - three separate protocols have
been used at different points to provide those bare necessities.  To accommodate
the need for high degree communication (as a number of routers will end up
speaking with many others), I2P moved from a TCP based transport
to a UDP based one - "Secure Semireliable UDP", or "SSU".  As described in the
<a href="http://dev.i2p.net/cgi-bin/cvsweb.cgi/i2p/router/doc/udp.html?rev=HEAD">SSU spec</a>:</p>

<blockquote>
The goal of this protocol is to provide secure, authenticated,
semireliable, and unordered message delivery, exposing only a minimal amount of
data easily discernible to third parties. It should support high degree
communication as well as TCP-friendly congestion control, and may include
PMTU detection. It should be capable of efficiently moving bulk data at rates
sufficient for home users. In addition, it should support techniques for
addressing network obstacles, like most NATs or firewalls.
</blockquote>

<h2 id="op.crypto">Cryptography</h2>

<p>
A bare minimum set of cryptographic primitives is combined to provide I2P's
layered defenses against a variety of adversaries.  At the lowest level, inter-router
communication is protected by the transport layer security - SSU
encrypts each packet with AES256/CBC, using both an explicit IV and a MAC (HMAC-MD5-128),
after agreeing upon an ephemeral session key through a 2048bit Diffie-Hellman exchange
with station-to-station authentication against the other router's DSA key.  In addition,
each network message has its own hash for local integrity checking.
<a href="#op.tunnels">Tunnel</a> messages passed over the transports have their own
layered AES256/CBC encryption with an explicit IV and are verified at the tunnel endpoint
with an additional SHA256 hash.  Various other messages are passed along inside
"garlic messages", which are encrypted with ElGamal/AES+SessionTags (explained below).
</p>

<h3 id="op.garlic">Garlic messages</h3>

<p>
Garlic messages are an extension of "onion" layered encryption, allowing the contents
of a single message to contain multiple "cloves" - fully formed messages alongside
their own instructions for delivery.  Messages are wrapped into a garlic message whenever
the message would otherwise be passing in cleartext through a peer who should not have
access to the information - for instance, when a router wants to ask another router to
participate in a tunnel, they wrap the request inside a garlic, encrypt that garlic to
the receiving router's 2048bit ElGamal public key, and forward it through a tunnel.
Another example is when a client wants to send a message to a destination - the sender's
router will wrap up that data message (alongside some other messages) into a garlic,
encrypt that garlic to the 2048bit ElGamal public key published in the recipient's
leaseSet, and forward it through the appropriate tunnels.
</p>

<p>
The "instructions" attached to each clove inside the encryption layer include the
ability to request that the clove be forwarded locally, to a remote router, or to a
remote tunnel on a remote router.  There are fields in those instructions allowing a
peer to request that the delivery be delayed until a certain time or condition has
been met, though they won't be honored until the
<a href="#future.variablelatency">nontrivial delays</a> are deployed.  It is possible to
explicitly route garlic messages any number of hops without building tunnels, or even
to reroute tunnel messages by wrapping them in garlic messages and forwarding them a
number of hops prior to delivering them to the next hop in the tunnel, but those
techniques are not currently used in the existing implementation.
</p>

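<p>
One way to picture a garlic message and its per-clove delivery instructions is the
following Python sketch.  The class and field names are hypothetical and do not
reflect the actual wire format; encryption and the delay fields are omitted.
</p>

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Delivery(Enum):
    LOCAL = "local"    # hand the clove to the receiving router itself
    ROUTER = "router"  # forward to a remote router
    TUNNEL = "tunnel"  # forward to a tunnel on a remote router


@dataclass(frozen=True)
class Clove:
    message: bytes
    delivery: Delivery
    router_hash: Optional[bytes] = None
    tunnel_id: Optional[int] = None

    def __post_init__(self):
        needs_router = self.delivery in (Delivery.ROUTER, Delivery.TUNNEL)
        needs_tunnel = self.delivery is Delivery.TUNNEL
        if needs_router != (self.router_hash is not None):
            raise ValueError("router_hash does not match delivery type")
        if needs_tunnel != (self.tunnel_id is not None):
            raise ValueError("tunnel_id does not match delivery type")


@dataclass(frozen=True)
class Garlic:
    cloves: tuple  # encrypted as one unit, so peers cannot count or read them
```

<p>
A data message for the recipient would use local delivery, while a clove meant to
travel back to the sender would name a router and a tunnel id on that router.
</p>
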
<h3 id="op.sessiontags">Session tags</h3>

<p>
As an unreliable, unordered, message based system, I2P uses a simple combination of
asymmetric and symmetric encryption algorithms to provide data confidentiality and
integrity to garlic messages.  As a whole, the combination is referred to as
ElGamal/AES+SessionTags, but that is an excessively verbose way to describe the simple
use of 2048bit ElGamal, AES256, SHA256, and 32 byte nonces.
</p>

<p>
The first time a router wants to encrypt a garlic message to another router, they encrypt
the keying material for an AES256 session key with ElGamal and append the AES256/CBC
encrypted payload after that encrypted ElGamal block.  In addition to the encrypted
payload, the AES encrypted section contains the payload length, the SHA256 hash of the
unencrypted payload, as well as a number of "session tags" - random 32 byte nonces.  The
next time the sender wants to encrypt a garlic message to that router, rather than
ElGamal encrypt a new session key they simply pick one of the previously delivered session
tags and AES encrypt the payload like before, using the session key used with that
session tag, prepended with the session tag itself.  When a router receives a garlic encrypted
message, they check the first 32 bytes to see if it matches an available session tag - if
it does, they simply AES decrypt the message, but if it does not, they ElGamal decrypt the
first block.
</p>

<p>
Each session tag can be used only once so as to prevent internal adversaries from unnecessarily
correlating different messages as being between the same routers.  The sender of an
ElGamal/AES+SessionTag encrypted message chooses when and how many tags to deliver,
prestocking the recipient with enough tags to cover a volley of messages.  The sender
may detect successful tag delivery by bundling a small additional message as a clove (a
"delivery status message") - when the garlic message arrives at the intended recipient and
is decrypted successfully, this small delivery status message is one of the cloves exposed and
has instructions for the recipient to send the clove back to the original sender (through an
inbound tunnel, of course).  When the original sender receives this delivery status message,
they know that the session tags bundled in the garlic message were successfully delivered.
</p>

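<p>
The receiving side's decision can be sketched as follows.  <code>TagStore</code> and its
methods are hypothetical names for this example, no actual ElGamal or AES operation
is performed, and tag expiry and storage limits are not modelled.
</p>

```python
import os

TAG_LEN = 32  # session tags are 32 byte nonces


def new_tags(count):
    """Generate `count` random session tags."""
    return [os.urandom(TAG_LEN) for _ in range(count)]


class TagStore:
    """Maps each delivered session tag to the session key it belongs to."""

    def __init__(self):
        self._keys = {}

    def add_tags(self, session_key, tags):
        for tag in tags:
            self._keys[tag] = session_key

    def classify(self, message):
        """Decide how an incoming garlic message would have to be decrypted."""
        tag, rest = message[:TAG_LEN], message[TAG_LEN:]
        key = self._keys.pop(tag, None)  # pop: each tag is usable only once
        if key is not None:
            return ("aes", key, rest)
        return ("elgamal", None, message)
```

<p>
Because a matching tag is removed as soon as it is seen, replaying the same message
falls through to the ElGamal path instead of being accepted under the old tag.
</p>
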
<p>
Session tags themselves have a very short lifetime, after which they are discarded
if not used.  In addition, the quantity stored for each key is limited, as is the
number of keys themselves - if too many arrive, either new or old messages may be
dropped.  The sender keeps track of whether messages using session tags are getting
through, and if there isn't sufficient communication it may drop the ones previously
assumed to be properly delivered, reverting to the full expensive ElGamal
encryption.
</p>

531<p>
532One alternative is to transmit only a single session tag, and from that, seed a
533deterministic PRNG for determining what tags to use or expect.  By keeping this
534PRNG roughly synchronized between the sender and recipient (the recipient precomputes a
535window of the next e.g. 50 tags), the overhead of periodically bundling a large number
536of tags is removed, allowing more options in the space/time tradeoff, and perhaps
537reducing the number of ElGamal encryptions necessary.  However, it would depend
538upon the strength of the PRNG to provide the necessary cover against internal
539adversaries, though perhaps by limiting the amount of times each PRNG is used, any
540weaknesses can be minimized.  At the moment, there are no immediate plans to move
541towards these synchronized PRNGs.
542</p>
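<p>
As a rough illustration of the synchronized PRNG alternative, the sketch below derives
tags from a shared seed and lets the recipient precompute a window of 50.  Using
HMAC-SHA256 over a counter as the PRNG, the 32 byte tag size, and all class and function
names are assumptions made for this example; I2P does not implement this scheme.
</p>

```python
import hashlib
import hmac

TAG_SIZE = 32   # bytes per tag (assumed for this sketch)
WINDOW = 50     # how far ahead the recipient precomputes


def tag_at(seed, index):
    """Derive the index-th tag from the shared seed (HMAC-SHA256 as the PRNG)."""
    digest = hmac.new(seed, index.to_bytes(8, "big"), hashlib.sha256).digest()
    return digest[:TAG_SIZE]


class TagSender:
    def __init__(self, seed):
        self._seed = seed
        self._next = 0

    def next_tag(self):
        tag = tag_at(self._seed, self._next)
        self._next += 1
        return tag


class TagReceiver:
    def __init__(self, seed):
        self._seed = seed
        self._base = 0        # lowest index still acceptable
        self._expected = {}   # tag -> index
        self._fill()

    def _fill(self):
        """Keep WINDOW tags precomputed, starting at the current base."""
        for index in range(self._base, self._base + WINDOW):
            self._expected.setdefault(tag_at(self._seed, index), index)

    def accept(self, tag):
        """Return True once per valid tag, then slide the window past it."""
        index = self._expected.pop(tag, None)
        if index is None:
            return False
        # Drop tags older than the one just seen so that lost messages do
        # not pin the window in place, then top the window back up.
        self._base = index + 1
        self._expected = {t: i for t, i in self._expected.items()
                          if i >= self._base}
        self._fill()
        return True
```

<p>
Discarding every tag older than the last one accepted is a simplification: a message
that arrives out of order after a later one would be rejected here, whereas a real
design would probably tolerate some reordering.
</p>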
543
544<h1 id="future">Future</h1>
545<p>
While I2P is currently functional and sufficient for many scenarios, there are
several areas which require further improvement, both to meet the needs of those
facing more powerful adversaries and to substantially optimize the user experience.
549</p>
550
551<h2 id="future.restricted">Restricted route operation</h2>
552
553<p>
554I2P is an overlay network designed to be run on top of a functional packet switched
555network, exploiting the end to end principle to offer anonymity and security. 
556While the Internet no longer fully embraces the end to end principle, I2P does require a
557substantial portion of the network to be reachable - there may be a number of peers
along the edges operating over restricted routes, but I2P does not include an
appropriate routing algorithm for the degenerate case where most peers are
unreachable.  It would, however, work on top of a network employing such an
561algorithm.
562</p>
563
564<p>
565Restricted route operation, where there are limits to what peers are
566reachable directly, has several different functional and anonymity
567implications, dependent upon how the restricted routes are handled.  At the most
568basic level, restricted routes exist when a peer is behind a NAT or firewall which
569does not allow inbound connections.  This was largely addressed in I2P 0.6.0.6 by
570integrating distributed hole punching into the transport layer, allowing people
571behind most NATs and firewalls to receive unsolicited connections without any
572configuration.  However, this does not limit the exposure of the peer's IP address to
573routers inside the network, as they can simply get introduced to the peer through
574the published introducer.
575</p>
576
577<p>
578Beyond the functional handling of restricted routes, there are two levels of
579restricted operation that can be used to limit the exposure of one's IP address -
580using router-specific tunnels for communication, and offering 'client routers'.  For
581the former, routers can either build a new pool of tunnels or reuse their exploratory
582pool, publishing the inbound gateways to some of them as part of their routerInfo in
583place of their transport addresses.  When a peer wants to get in touch with them,
584they see those tunnel gateways in the netDb and simply send the relevant message to
585them through one of the published tunnels.  If the peer behind the restricted route
586wants to reply, it may do so either directly (if they are willing to expose their IP
587to the peer) or indirectly through their outbound tunnels.  When the routers that the
588peer has direct connections to want to reach it (to forward tunnel messages, for
589instance), they simply prioritize their direct connection over the published tunnel
590gateway.  The concept of 'client routers' simply extends the restricted route by not
591publishing any router addresses.  Such a router would not even need to publish their
592routerInfo in the netDb, merely providing their self signed routerInfo to the peers
593that it contacts (necessary to pass the router's public keys).  Both levels of
594restricted route operation are planned for I2P 2.0.
595</p>
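<p>
The contact rules in the paragraph above can be sketched as a single decision function.
The <code>RouterInfo</code> class here is a simplified, hypothetical stand-in for a
published routerInfo, and the field names and return values are invented for the
example.
</p>

```python
from dataclasses import dataclass, field


@dataclass
class RouterInfo:
    """Hypothetical, simplified stand-in for a published routerInfo."""
    router_id: str
    transport_addresses: list = field(default_factory=list)  # e.g. "udp://host:port"
    tunnel_gateways: list = field(default_factory=list)      # (gateway id, tunnel id)


def choose_route(target, direct_connections):
    """Decide how to deliver a message to target.

    Order of preference, following the text: an already established direct
    connection, then a published transport address, then a published
    inbound tunnel gateway.
    """
    if target.router_id in direct_connections:
        return ("direct", target.router_id)
    if target.transport_addresses:
        return ("address", target.transport_addresses[0])
    if target.tunnel_gateways:
        return ("tunnel", target.tunnel_gateways[0])
    return ("unreachable", None)
```

<p>
A 'client router' in this model is simply a <code>RouterInfo</code> with neither
addresses nor gateways: peers it has not contacted itself get "unreachable", while peers
it has connected to still take the "direct" branch.
</p>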
596
597<p>
598There are tradeoffs for those behind restricted routes, as they would likely
599participate in other people's tunnels less frequently, and the routers which
600they are connected to would be able to infer traffic patterns that would not
601otherwise be exposed.  On the other hand, if the cost of that exposure is less
602than the cost of an IP being made available, it may be worthwhile.  This, of course,
603assumes that the peers that the router behind a restricted route contacts are not
604hostile - either the network is large enough that the probability of using a hostile
605peer to get connected is small enough, or trusted (and perhaps temporary) peers are
606used instead.
607</p>
608
609<h2 id="future.variablelatency">Variable latency</h2>
610
611<p>
612Even though the bulk of I2P's initial efforts have been on low latency communication,
613it was designed with variable latency services in mind from the beginning.  At the
614most basic level, applications running on top of I2P can offer the anonymity of
615medium and high latency communication while still blending their traffic patterns
616in with low latency traffic.  Internally though, I2P can offer its own medium and
617high latency communication through the garlic encryption - specifying that the
618message should be sent after a certain delay, at a certain time, after a certain
619number of messages have passed, or another mix strategy.  With the layered encryption,
620only the router that the clove exposed the delay request would know that the message
621requires high latency, allowing the traffic to blend in further with the low latency
622traffic.  Once the transmission precondition is met, the router holding on to the
623clove (which itself would likely be a garlic message) simply forwards it as
624requested - to a router, to a tunnel, or, most likely, to a remote client destination.
625</p>
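<p>
A minimal sketch of how a router might hold a clove until its transmission precondition
is met.  The <code>DelayRequest</code> fields and the <code>HoldQueue</code> class are
hypothetical; a relative delay is assumed to be converted into an absolute
<code>not_before</code> time when the clove arrives.
</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DelayRequest:
    """Hypothetical delivery precondition carried inside a clove."""
    not_before: Optional[float] = None     # absolute time, in seconds
    after_messages: Optional[int] = None   # messages that must pass first


class HoldQueue:
    """Holds cloves until their precondition is met, then releases them."""

    def __init__(self):
        self._held = []   # (clove, request, messages seen at arrival)
        self._seen = 0    # count of ordinary messages forwarded so far

    def hold(self, clove, request):
        self._held.append((clove, request, self._seen))

    def message_passed(self):
        """Record that one ordinary low latency message was forwarded."""
        self._seen += 1

    def release_ready(self, now):
        """Return the cloves whose time and message-count conditions both hold."""
        ready, waiting = [], []
        for clove, request, seen_at_arrival in self._held:
            time_ok = request.not_before is None or now >= request.not_before
            count_ok = (request.after_messages is None
                        or self._seen - seen_at_arrival >= request.after_messages)
            target = ready if time_ok and count_ok else waiting
            target.append((clove, request, seen_at_arrival))
        self._held = waiting
        return [clove for clove, _, _ in ready]
```

<p>
Each released clove would then be forwarded as its own delivery instructions request,
whether to a router, a tunnel, or a remote client destination.
</p>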
626
627<p>
There are a substantial number of ways to exploit this capacity for high latency
communication in I2P, but for the moment, doing so has been scheduled for the I2P 3.0 release.
In the meantime, those requiring the anonymity that high latency communication can offer should
look towards the application layer to provide it.
632</p>
633
634<h2 id="future.open">Open questions</h2>
635<pre>
636How to get rid of the timing constraint?
637Can we deal with the sessionTags more efficiently?
638What, if any, batching/mixing strategies should be made available on the tunnels?
639What other tunnel peer selection and ordering strategies should be available?
640</pre>
641
642<h1 id="similar">Similar systems</h1>
643<p>
644I2P's architecture builds on the concepts of message oriented middleware, the topology
645of DHTs, the anonymity and cryptography of free route mixnets, and the adaptability of
packet switched networking.  The value comes not from novel concepts or algorithms
647though, but from careful engineering combining the research results of existing
648systems and papers.  While there are a few similar efforts worth reviewing, both for
649technical and functional comparisons, two in particular are pulled out here - Tor
650and Freenet.
651</p>
652
653<h2 id="similar.tor">Tor</h2>
654<p><i><a href="http://tor.eff.org/">website</a></i></p>
655
656<p>
657At first glance, Tor and I2P have many functional and anonymity related similarities.
658While I2P's development began before we were aware of the early stage efforts on Tor,
659many of the lessons of the original onion routing and ZKS efforts were integrated into
660I2P's design.  Rather than building an essentially trusted, centralized system with
661directory servers, I2P has a self organizing network database with each peer taking on
662the responsibility of profiling other routers to determine how best to exploit available
663resources.  Another key difference is that while both I2P and Tor use layered and
664ordered paths (tunnels and circuits/streams), I2P is fundamentally a packet switched
665network, while Tor is fundamentally a circuit switched one, allowing I2P to
666transparently route around congestion or other network failures, operate redundant
667pathways, and load balance the data across available resources.  While Tor offers
668the useful outproxy functionality by offering integrated outproxy discovery and
669selection, I2P leaves such application layer decisions up to applications running on
670top of I2P - in fact, I2P has even externalized the TCP-like streaming library itself
671to the application layer, allowing developers to experiment with different strategies,
672exploiting their domain specific knowledge to offer better performance.
673</p>
674
675<p>
676From an anonymity perspective, there is much similarity when the core networks are
677compared.  However, there are a few key differences.  When dealing with an internal
678adversary or most external adversaries, I2P's simplex tunnels expose half as much
679traffic data than would be exposed with Tor's duplex circuits by simply looking at
680the flows themselves - an HTTP request and response would follow the same path in
681Tor, while in I2P the packets making up the request would go out through one or
682more outbound tunnels and the packets making up the response would come back through
683one or more different inbound tunnels.  While I2P's peer selection and ordering
684strategies should sufficiently address predecessor attacks, I2P can trivially
685mimic Tor's non-redundant duplex tunnels by simply building an inbound and
686outbound tunnel along the same routers.</p>
687
688<p>
Another anonymity issue comes up in Tor's use of telescopic tunnel creation, as
simple packet counting and timing measurements taken as the cells in a circuit pass
through an adversary's node expose statistical information regarding where the
adversary is within the circuit.  I2P instead creates each unidirectional tunnel with a
single message so that this data is not exposed.  Protecting the position in a
tunnel is important, as an adversary would otherwise be able to mount a
series of powerful predecessor, intersection, and traffic confirmation attacks.
696</p>
697
698<p>
699Tor's support for a second tier of "onion proxies" does offer a nontrivial degree
700of anonymity while requiring a low cost of entry, while I2P will not offer this
701topology until <a href="#future.restricted">2.0</a>.
702</p>
703
704<p>
705On the whole, Tor and I2P complement each other in their focus - Tor works towards
706offering high speed anonymous Internet outproxying, while I2P works towards offering
707a decentralized resilient network in itself.  In theory, both can be used to achieve
708both purposes, but given limited development resources, they both have their
709strengths and weaknesses.  The I2P developers have considered the steps necessary to
710modify Tor to take advantage of I2P's design, but concerns of Tor's viability under
711resource scarcity suggest that I2P's packet switching architecture will be able to
712exploit scarce resources more effectively.
713</p>
714
715<h2 id="similar.freenet">Freenet</h2>
716<p><i><a href="http://www.freenetproject.org/">website</a></i></p>
717
718<p>
719Freenet played a large part in the initial stages of I2P's design - giving proof to
720the viability of a vibrant pseudonymous community completely contained within the
721network, demonstrating that the dangers inherent in outproxies could be avoided.
722The first seed of I2P began as a replacement communication layer for Freenet,
723attempting to factor out the complexities of a scalable, anonymous and secure point
724to point communication from the complexities of a censorship resistant distributed
725data store.  Over time however, some of the anonymity and scalability issues
726inherent in Freenet's algorithms made it clear that I2P's focus should stay strictly
727on providing a generic anonymous communication layer, rather than as a component of
728Freenet.  Over the years, the Freenet developers have come to see the weaknesses
729in the older design, prompting them to suggest that they will require a "premix"
730layer to offer substantial anonymity.  In other words, Freenet needs to run on top
731of a mixnet such as I2P or Tor, with "client nodes" requesting and publishing data
732through the mixnet to the "server nodes" which then fetch and store the data according
733to Freenet's heuristic distributed data storage algorithms.
734</p>
735
736<p>
737Freenet's functionality is very complementary to I2P's, as Freenet natively provides
738many of the tools for operating medium and high latency systems, while I2P natively
739provides the low latency mix network suitable for offering adequate anonymity.  The
740logic of separating the mixnet from the censorship resistant distributed data store
741still seems self evident from an engineering, anonymity, security, and resource
742allocation perspective, so hopefully the Freenet team will pursue efforts in that
743direction, if not simply reusing (or helping to improve, as necessary) existing
744mixnets like I2P or Tor.
745</p>
746
747<p>
748It is worth mentioning that there has recently been discussion and work by the
749Freenet developers on a "globally scalable darknet" using restricted routes between
750peers of various trust.  While insufficient information has been made publicly
751available regarding how such a system would operate for a full review, from what
752has been said the anonymity and scalability claims seem highly dubious.  In
753particular, the appropriateness for use in hostile regimes against state level
754adversaries has been tremendously overstated, and any analysis on the implications
755of resource scarcity upon the scalability of the network has seemingly been avoided.
756Further questions regarding susceptibility to traffic analysis, trust, and other topics
757do exist, but a more in-depth review of this "globally scalable darknet" will have
758to wait until the Freenet team makes more information available.
759</p>
760
761<h1 id="app">Appendix A: Application layer</h1>
762
763<p>
764I2P itself doesn't really do much - it simply sends messages to remote destinations
765and receives messages targeting local destinations - most of the interesting work
766goes on at the layers above it.  By itself, I2P could be seen as an anonymous and
767secure IP layer, and the bundled <a href="#app.streaming">streaming library</a> as
768an implementation of an anonymous and secure TCP layer on top of it.  Beyond that,
769<a href="#app.i2ptunnel">I2PTunnel</a> exposes a generic TCP proxying system for
770either getting into or out of the I2P network, plus a variety of network
771applications provide further functionality for end users.
772</p>
773
774<h2 id="app.streaming">Streaming library</h2>
775
776<p>
777The streaming library has grown organically for I2P - first mihi implemented the
778"mini streaming library" as part of I2PTunnel, which was limited to a window
779size of 1 message (requiring an ACK before sending the next one), and then it was
780refactored out into a generic streaming interface (mirroring TCP sockets) and the
781full streaming implementation was deployed with a sliding window protocol and
782optimizations to take into account the high bandwidth x delay product.  Individual
783streams may adjust the maximum packet size and other options, though the default
784of 4KB compressed seems a reasonable tradeoff between the bandwidth costs of
785retransmitting lost messages and the latency of multiple messages.
786</p>
787
788<p>
789In addition, in consideration of the relatively high cost of subsequent messages,
790the streaming library's protocol for scheduling and delivering messages has been optimized to
791allow individual messages passed to contain as much information as is available.
792For instance, a small HTTP transaction proxied through the streaming library can
793be completed in a single round trip - the first message bundles a SYN, FIN, and
794the small payload (an HTTP request typically fits) and the reply bundles the SYN,
795FIN, ACK, and the small payload (many HTTP responses fit).  While an additional
796ACK must be transmitted to tell the HTTP server that the SYN/FIN/ACK has been
797received, the local HTTP proxy can deliver the full response to the browser
798immediately. 
799</p>
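<p>
The bundling described above can be illustrated with a toy packet builder.  The
<code>Flags</code> and <code>Packet</code> types are invented for this sketch and are
not the streaming library's wire format; the payload limit mirrors the 4KB default
mentioned earlier but ignores compression.
</p>

```python
from dataclasses import dataclass
from enum import Flag, auto


class Flags(Flag):
    NONE = 0
    SYN = auto()
    ACK = auto()
    FIN = auto()


@dataclass
class Packet:
    flags: Flags
    payload: bytes = b""


MAX_PAYLOAD = 4 * 1024   # assumed per-message payload limit for the sketch


def build_request(data):
    """Open the stream, send the data, and close it in as few messages as possible."""
    if len(data) <= MAX_PAYLOAD:
        # Everything fits: one message carries SYN, FIN, and the payload.
        return [Packet(Flags.SYN | Flags.FIN, data)]
    chunks = [data[i:i + MAX_PAYLOAD] for i in range(0, len(data), MAX_PAYLOAD)]
    packets = [Packet(Flags.NONE, chunk) for chunk in chunks]
    packets[0].flags |= Flags.SYN
    packets[-1].flags |= Flags.FIN
    return packets


def build_reply(data):
    """Same as a request, but the first message also acknowledges the peer's SYN."""
    packets = build_request(data)
    packets[0].flags |= Flags.ACK
    return packets
```

<p>
For a small HTTP exchange both directions collapse to a single packet each, which is the
one round trip case the text describes; the trailing ACK from the client is not modeled.
</p>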
800
801<p>
802On the whole, however, the streaming library bears much resemblance to an
803abstraction of TCP, with its sliding windows, congestion control algorithms
804(both slow start and congestion avoidance), and general packet behavior (ACK,
805SYN, FIN, RST, rto calculation, etc). 
806</p>
807
808<h2 id="app.naming">Naming library and addressbook</h2>
809<p><i>Developed by: mihi, Ragnarok</i></p>
810
811<p>
812Naming within I2P has been an oft-debated topic since the very beginning with
813advocates across the spectrum of possibilities.  However, given I2P's inherent
814demand for secure communication and decentralized operation, the traditional
815DNS-style naming system is clearly out, as are "majority rules" voting systems.
816Instead, I2P ships with a generic naming library and a base implementation
817designed to work off a local name to destination mapping, as well as an optional
818add-on application called the "addressbook".  The addressbook is a web-of-trust
819driven secure, distributed, and human readable naming system, sacrificing only
820the call for all human readable names to be globally unique by mandating only
821local uniqueness.  While all messages in I2P are cryptographically addressed
822by their destination, different people can have local addressbook entries for
823"Alice" which refer to different destinations.  People can still discover new
824names by importing published addressbooks of peers specified in their web of trust,
825by adding in the entries provided through a third party, or (if some people organize
a series of published addressbooks using a first come, first served registration
827system) people can choose to treat these addressbooks as name servers, emulating
828traditional DNS.
829</p>
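<p>
Local uniqueness can be shown with a small merge routine.  This is a sketch of the idea
rather than the addressbook application's actual logic: the function name, the use of
plain dictionaries, and the placeholder destination strings are all assumptions.
</p>

```python
def merge_addressbook(local, published):
    """Import names from a published addressbook into the local one.

    Existing local entries always win, which is what keeps each name
    locally unique. Returns (added, conflicts) so the user can review
    names that a peer maps to a different destination.
    """
    added, conflicts = {}, {}
    for name, destination in published.items():
        current = local.get(name)
        if current is None:
            local[name] = destination
            added[name] = destination
        elif current != destination:
            conflicts[name] = destination
    return added, conflicts
```

<p>
Two users who import different published addressbooks first can therefore end up with
different destinations for the same name, which is the tradeoff the addressbook accepts
in exchange for not needing a global authority.
</p>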
830
831<p>
832I2P does not promote the use of DNS-like services though, as the damage done
833by hijacking a site can be tremendous - and insecure destinations have no
834value.  DNSsec itself still falls back on registrars and certificate authorities,
835while with I2P, requests sent to a destination cannot be intercepted or the reply
836spoofed, as they are encrypted to the destination's public keys, and a destination
837itself is just a pair of public keys and a certificate.  DNS-style systems on the
838other hand allow any of the name servers on the lookup path to mount simple denial
839of service and spoofing attacks.  Adding on a certificate authenticating the
840responses as signed by some centralized certificate authority would address many of
841the hostile nameserver issues but would leave open replay attacks as well as
842hostile certificate authority attacks.
843</p>
844
845<p>
846Voting style naming is dangerous as well, especially given the effectiveness of
847Sybil attacks in anonymous systems - the attacker can simply create an arbitrarily
848high number of peers and "vote" with each to take over a given name.  Proof-of-work
849methods can be used to make identity non-free, but as the network grows the load
850required to contact everyone to conduct online voting is implausible, or if the
851full network is not queried, different sets of answers may be reachable.
852</p>
853
854<p>
855As with the Internet however, I2P is keeping the design and operation of a
856naming system out of the (IP-like) communication layer.  The bundled naming library
857includes a simple service provider interface which alternate naming systems can
858plug into, allowing end users to drive what sort of naming tradeoffs they prefer.
859</p>
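<p>
A service provider interface of this kind might look roughly like the following.  The
class and method names are hypothetical and chosen for the example; they are not a
description of the bundled naming library's actual interface.
</p>

```python
from abc import ABC, abstractmethod


class NamingService(ABC):
    """Hypothetical provider interface: map a name to a destination."""

    @abstractmethod
    def lookup(self, name):
        """Return the destination for name, or None if it is unknown."""


class LocalMapNaming(NamingService):
    """Base implementation backed by a local name to destination mapping."""

    def __init__(self, mapping):
        self._mapping = dict(mapping)

    def lookup(self, name):
        return self._mapping.get(name)


class ChainedNaming(NamingService):
    """Ask each provider in order, so the user decides whose answer wins."""

    def __init__(self, services):
        self._services = list(services)

    def lookup(self, name):
        for service in self._services:
            destination = service.lookup(name)
            if destination is not None:
                return destination
        return None
```

<p>
Because applications only depend on <code>lookup</code>, an alternate naming system can
be swapped in or placed earlier in the chain without touching the communication layer.
</p>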
860
861<h2 id="app.syndie">Syndie</h2>
862
863<p>
864Syndie is a safe, anonymous blogging / content publication / content aggregation system.
865It lets you create information, share it with others, and read posts from those you're
866interested in, all while taking into consideration your needs for security and anonymity.
867Rather than building its own content distribution network, Syndie is designed to run on
868top of existing networks, syndicating content through eepsites, Tor hidden services,
Freenet freesites, normal websites, usenet newsgroups, email lists, RSS feeds, etc.  Data
is published with Syndie in a way that offers pseudonymous authentication to anyone
reading or archiving it.
872</p>
873
874<h2 id="app.i2ptunnel">I2PTunnel</h2>
875<p><i>Developed by: mihi</i></p>
876
877<p>
878I2PTunnel is probably I2P's most popular and versatile client application, allowing
879generic proxying both into and out of the I2P network.  I2PTunnel can be viewed as
880four separate proxying applications - a "client" which receives inbound TCP connections
881and forwards them to a given I2P destination, an "httpclient" (aka "eepproxy") which
882acts like an HTTP proxy and forwards the requests to the appropriate I2P destination
883(after querying the naming service if necessary), a "server" which receives inbound I2P
884streaming connections on a destination and forwards them to a given TCP host+port,
885and an "httpserver" which extends the "server" by parsing the HTTP request and
886responses to allow safer operation.  There is an additional "socksclient" application,
887but its use is not encouraged for reasons previously mentioned.
888</p>
889
890<p>
891I2P itself is not an outproxy network - the anonymity and security concerns inherent
892in a mix net which forwards data into and out of the mix have kept I2P's design focused
on providing an anonymous network which is capable of meeting the user's needs without
894requiring external resources.  However, the I2PTunnel "httpclient" application offers
895a hook for outproxying - if the hostname requested doesn't end in ".i2p", it picks a
896random destination from a user-provided set of outproxies and forwards the request to
897them.  These destinations are simply I2PTunnel "server" instances run by volunteers
898who have explicitly chosen to run outproxies - no one is an outproxy by default, and
899running an outproxy doesn't automatically tell other people to proxy through you.
900While outproxies do have inherent weaknesses, they offer a simple proof of concept for
901using I2P and provide some functionality under a threat model which may be sufficient
902for some users.
903</p>
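<p>
The outproxy hook amounts to one branch on the requested hostname.  The sketch below
assumes a <code>resolve</code> callable standing in for the naming service and a plain
list of outproxy destinations; the function name and the returned labels are invented
for illustration.
</p>

```python
import random


def route_http_request(hostname, resolve, outproxies, rng=random):
    """Decide where an "httpclient" style proxy would send a request.

    Hostnames ending in ".i2p" are resolved to a destination inside the
    network; anything else goes to a randomly chosen outproxy from the
    user-provided set, if there is one.
    """
    host = hostname.lower()
    if host.endswith(".i2p"):
        destination = resolve(host)
        if destination is None:
            return ("unknown", None)
        return ("i2p", destination)
    if not outproxies:
        return ("refused", None)
    return ("outproxy", rng.choice(outproxies))
```

<p>
With an empty outproxy set the proxy never leaves the network, which matches the point
that no outproxying happens unless the user has configured it.
</p>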
904
905<p>
906I2PTunnel enables most of the applications in use.  An "httpserver" pointing at a
907webserver lets anyone run their own anonymous website (or "eepsite") - a webserver
908is bundled with I2P for this purpose, but any webserver can be used.  Anyone may
run a "client" pointing at one of the anonymously hosted IRC servers, each of which
is running a "server" pointing at their local IRCd and communicating between IRCds
911over their own "client" tunnels.  End users also have "client" tunnels pointing at
912<a href="#app.i2pmail">I2Pmail's</a> POP3 and SMTP destinations (which in turn are
913simply "server" instances pointing at POP3 and SMTP servers), as well as "client"
914tunnels pointing at I2P's CVS server, allowing anonymous development.  At times people have
915even run "client" proxies to access the "server" instances pointing at an NNTP server.
916</p>
917
918<h2 id="app.i2pbt">i2p-bt</h2>
919<p><i>Developed by: duck, et al</i></p>
920
921<p>
922i2p-bt is a port of the mainline python BitTorrent client to run both the tracker and
923peer communication over I2P.  Tracker requests are forwarded through the eepproxy to
924eepsites specified in the torrent file while tracker responses refer to peers by their
925destination explicitly, allowing i2p-bt to open up a
926<a href="#app.streaming">streaming lib</a> connection to query them for blocks.
927</p>
928
929<p>
930In addition to i2p-bt, a port of bytemonsoon has been made to I2P, making a few
931modifications as necessary to strip any anonymity-compromising information from the
932application and to take into consideration the fact that IPs cannot be used for
933identifying peers. 
934</p>
935
936<h2 id="app.i2psnark">I2PSnark</h2>
<p><i>Developed by: jrandom, et al, ported from <a
href="http://www.klomp.org/mark/">mjw</a>'s <a
href="http://www.klomp.org/snark/">Snark</a> client</i></p>
940
941<p>
942Bundled with the I2P install, I2PSnark offers a simple anonymous bittorrent
943client with multitorrent capabilities, exposing all of the functionality through
944a plain HTML web interface.
945</p>
946
947<h2 id="app.azneti2p">Azureus/azneti2p</h2>
948<p><i>Developed by: parg, et al</i></p>
949
950<p>
951The developers of the <a href="http://azureus.sf.net/">Azureus</a> BitTorrent client
952have created an "azneti2p" plugin, allowing Azureus users to participate in anonymous
953swarms over I2P, or simply to access anonymously hosted trackers while contacting
954each peer directly.  In addition, Azureus' built in tracker lets people run their
955own anonymous trackers without running bytemonsoon (which has substantial prerequisites)
956or i2p-bt's tracker.  The plugin is currently (July 2005) fully functional, but is in early
957beta and has a fairly complicated configuration process, though it is hopefully going
958to be streamlined further.
959</p>
960
961<h2 id="app.i2phex">I2Phex</h2>
962<p><i>Developed by: sirup</i></p>
963
964<p>
965I2Phex is a fairly direct port of the Phex Gnutella filesharing client to run
966entirely on top of I2P.  While it has disabled some of Phex's functionality,
967such as integration with Gnutella webcaches, the basic file sharing and chatting
968system is fully functional.
969</p>
970
971<h2 id="app.i2pmail">I2Pmail/susimail</h2>
972<p><i>Developed by: postman, susi23, mastiejaner</i></p>
973
974<p>
975I2Pmail is more a service than an application - postman offers both internal and
976external email with POP3 and SMTP service through I2PTunnel instances accessing a
977series of components developed with mastiejaner, allowing people to use their
978preferred mail clients to send and receive mail pseudonymously.  However, as most
979mail clients expose substantial identifying information, I2P bundles susi23's
980web based susimail client which has been built specifically with I2P's anonymity
981needs in mind. The I2Pmail/mail.i2p service offers transparent virus filtering as
982well as denial of service prevention with hashcash augmented quotas.
983In addition, each user has control of their batching strategy prior to delivery
984through the mail.i2p outproxies, which are separate from the mail.i2p SMTP and
985POP3 servers - both the outproxies and inproxies communicate with the mail.i2p
986SMTP and POP3 servers through I2P itself, so compromising those non-anonymous
987locations does not give access to the mail accounts or activity patterns of the
user. At the moment the developers are working on a decentralized mail system, called
989"v2mail". More information can be found on the eepsite
990<a href="http://hq.postman.i2p/">hq.postman.i2p</a>.
991</p>
992
993</body>
994</html>