wiki:NetDB/NextBackend

Version 76 (modified by hottuna, 6 years ago)

Why?

Scalability: We currently lack O(log(N)) lookup scalability when more FloodFill nodes exist than the local NetDB contains.

Resilience: Our current FloodFill system is susceptible to attacks.

Alternatives

General P2P networks

General P2P networks have been considered, but have been found lacking in terms of search horizon*.


Name      | Search horizon*
Gnutella  | Limited
Gnutella2 | Limited


  • Search horizon describes how much of the network can be searched from a given position in the network graph. A limited search horizon means that a search from one part of the network won't necessarily find results stored in another part of the network.

DHTs

DHTs are a good alternative due to their O(log(N)) lookup time and unlimited search horizon, but they have serious issues when it comes to being robust against attacks.


Name     | Search horizon* | Lookup steps         | Mutable data | Comments
Kademlia | Unlimited       | O(log_{2^b}(N)) [2]  | No           | Susceptible to Sybil and Eclipse attacks.*
Freenet  | Unlimited       | O(log_2(N)) [3]      | No           | Vulnerable to swap-algorithm attacks, which wreck load balancing. [4]
Chord    | Unlimited       | O(log_2(N)*0.5) [8]  | No           | Highly susceptible to Sybil and Eclipse attacks.*
Pastry   | Unlimited       | O(log_{2^b}(N)) [7]  | No           | Highly susceptible to Sybil and Eclipse attacks.*


  • Kademlia is less susceptible to eclipse attacks. "For one thing, it is difficult to affect the routing tables of a Kademlia node, as each node tends to keep only highly available peers in its routing table. This increases the required costs for an attacker to convince honest nodes to link to the compromised nodes. Similarly, Kademlia uses iterative routing, exploring multiple nodes at each step, making routing less dependent on specific nodes and thus less vulnerable to attacks." [1]

Kademlia lookups can be optimized by enlarging the number of ID bits, b, that are considered for each bucket. With b > 1, lookup steps decrease from O(log_2(N)) to O(log_{2^b}(N)), but the expected number of routing-table entries increases to 2^b * log_{2^b}(N). [2]
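A rough sketch of this trade-off, assuming the expected table size 2^b * log_{2^b}(N) from [2] (function names are illustrative, not from any I2P code):

```python
import math

def lookup_steps(n, b):
    # Expected lookup steps: log base 2^b of the network size N.
    return math.log(n, 2 ** b)

def expected_table_entries(n, b):
    # Expected routing-table entries: 2^b * log_{2^b}(N), per [2].
    return (2 ** b) * math.log(n, 2 ** b)

# For a 65,536-node network, raising b from 1 to 4 quarters the
# lookup steps (16 -> 4) while doubling the table size (32 -> 64).
for b in (1, 2, 4):
    print(b, lookup_steps(65536, b), expected_table_entries(65536, b))
```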


Kademlia Defence Mechanisms

Sybil Defence

Sybil attacks are based on the idea of creating a large number of participating nodes. The Sybil attack does not damage the DHT by itself, but can be used as a vector to artificially create a majority of colluding malicious nodes in the overlay.


Name | Source | Description


Eclipse Defence

Eclipse attacks are attacks on the routing / routing tables.


Name | Source | Description
Random Recursive lookups | R5N [4] | Before initiating a recursive kad lookup, do a random walk in the network graph to determine the start of the kad lookup.
Control in/out-degrees | [5][1] | Control of the in-degree and out-degree of nodes via anonymous auditing, at the cost of slower avg. lookups.


Storage Defence

Storage attacks are attacks which attempt to provide bogus responses to queries.


Name | Source | Description
Recursive lookups | R5N [4] | Make FIND_VALUE requests recursive: each node forwards the query and returns the answer recursively to the previous requester in the chain. This yields a per-node reliability metric, which can be used in conjunction with the last_seen attribute of k-bucket entries to create a combined eviction policy.
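A minimal sketch of such a combined eviction policy, under the stated assumption that each entry tracks a lookup-success count alongside last_seen (all names here are hypothetical, not taken from any I2P implementation):

```python
import time
from dataclasses import dataclass, field

@dataclass
class BucketEntry:
    node_id: bytes
    last_seen: float = field(default_factory=time.time)
    lookups_ok: int = 0      # recursive lookups this node answered usefully
    lookups_total: int = 0   # recursive lookups routed through this node

    def success_rate(self):
        # Nodes with no lookup history get a neutral prior instead of zero.
        if self.lookups_total == 0:
            return 0.5
        return self.lookups_ok / self.lookups_total

def eviction_candidate(bucket):
    # Evict the entry with the worst lookup-success rate; among equal
    # rates, evict the node seen least recently (smallest last_seen).
    return min(bucket, key=lambda e: (e.success_rate(), e.last_seen))
```

The neutral prior matters: without it, every newly added node would be the first eviction candidate, starving the bucket of fresh peers.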


Kademlia Performance Improvements

Standard Kademlia performance can be improved by modifying it.


Name | Source | Description
Recursive lookups | [6] | Make FIND_VALUE requests recursive by forwarding the query recursively and returning the answer directly to the original source of the request.


Proposal

Use a Kademlia base and extend it with features that improve performance and/or reliability.

Why?

Kademlia is preferable to Freenet due to its lookup speed and extendibility.

Kademlia is preferable to Chord/Pastry due to being as fast or faster and more resilient against Eclipse attacks.

UCSB FloodFill-takeover attack is fixed by making everyone a node [9].

Eclipse attacks can be somewhat mitigated by aggressive STORE replication factors and Random Recursive lookups.

How?

1: Implement Kademlia

A full implementation of Kademlia will be used as a base for further enhancements.

  • Investigate the i2p.zzz.kademlia implementation
  • Make sure that i2p.zzz.kademlia implements Kademlia fully
    • Implement k-bucket merging

2: Improve performance

For Kademlia to be a viable alternative, lookup speed must be kept high. This means that lookup steps should be kept low and that as many round-trips as possible should be avoided.

  • Implement recursive lookups
    • Investigate performance

3: Attack resistance

The two main attack-resistance methods will be recursive and random recursive lookups. Recursive lookups are not only fast, but can be used to provide a lookup-success metric that is useful when doing k-bucket evictions. Random Recursive lookups allow FIND_VALUE requesters to eventually find a path to the data if one exists.

  • Implement recursive lookup metric
    • Change k-bucket eviction policy to also use recursive lookup success metric
  • Implement random recursive lookups

Lookup

The ultimate goal is to provide a Kademlia implementation that supports three types of FIND_VALUE: Recursive, Iterative and Random Recursive.
Recursive is the fastest means of lookup. However, it is very vulnerable to Sybil attacks, where a node in the recursion simply answers 'no result' instead of continuing the recursion.
Iterative is the standard Kademlia means of lookup. It is resistant to attacks since it queries multiple nodes at every iteration.
Random Recursive is the R5N [4] means of lookup. It is resistant to most if not all forms of attack, since it will eventually find a path to the data if one exists.


Method           | Lookup time (avg.)           | Reliability
Recursive        | (log_{2^b}(N) + 1)*RTT/2     | Low
Iterative        | log_{2^b}(N)*RTT             | Medium
Random Recursive | (2*log_{2^b}(N) + 1)*RTT/2   | High


The three lookup mechanisms could be used in sequential-failover mode, hybrid mode or parallel mode.
Sequential-failover mode would produce the lowest possible network load while still maintaining the reliability of Random Recursive lookups. However, lookup speed will suffer if failovers are needed.
Hybrid mode would use the success rates of the different lookup methods to decide which method to use. This might be a bad mechanism, since it would likely be susceptible to attacks that provide high success rates for most, but not all, keys.
Parallel mode would increase network load considerably, but provide the fastest possible lookup times at all times.
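The formulas in the table above can be evaluated directly. A sketch, assuming a uniform round-trip time RTT and the step count log_{2^b}(N) used throughout this page:

```python
import math

def steps(n, b):
    # Average lookup steps: log base 2^b of the network size N.
    return math.log(n, 2 ** b)

def recursive_time(n, b, rtt):
    # One-way hops down the chain plus the answer's return: (steps + 1) * RTT / 2.
    return (steps(n, b) + 1) * rtt / 2

def iterative_time(n, b, rtt):
    # A full round trip per iteration: steps * RTT.
    return steps(n, b) * rtt

def random_recursive_time(n, b, rtt):
    # A random walk of roughly `steps` hops prepended to the recursive lookup.
    return (2 * steps(n, b) + 1) * rtt / 2

# For N = 65,536, b = 4, RTT = 1: recursive 2.5, iterative 4.0, random recursive 4.5.
```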

Recursion

r FIND_VALUE requests are initiated at the originator node. Every node on the path should forward r_local_recursion FIND_VALUE requests.

Store

Stores should ideally always be performed using Random Recursion to increase the spread of nodes that contain the inserted data.

Iterative and Recursive STORE queries are possible, but both have limited spread of the inserted data. Additionally, the main benefit of non-random Recursive STOREs would be insertion speed, which likely isn't a very important factor.


Method           | Store time                        | Spread
Recursive        | (log_{2^b}(N) + 1)*RTT/2          | Low
Iterative        | log_{2^b}(N)*RTT                  | Medium
Random Recursive | (log_{2^b}(N) + rand + 1)*RTT/2   | High


Recursion

r STOREs are initiated at the originator node. Every node on the path should forward r_local_recursion STORE requests.

Implementation details

End of Recursion

Unbounded recursion can be used to mount a slow DoS against the network, so the recursion needs a termination condition.

Metric 1

Each node in the path of the recursion will stop forwarding the query if it is closer to the key than any node in its routing table. This is what is used by R5N.

Metric 2

A hops-travelled counter, h_travelled, can be used. The recursion continues while h_travelled ≤ 2*T and is terminated once the counter exceeds that bound.

Estimated bucket entries = 2^b * log_{2^b}(N) ⇔ N = (2^b)^(bucket_entries * 2^(-b))
T = log_{2^b}(N), which is the average number of steps needed to go anywhere in the network.
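Since bucket_entries = 2^b * log_{2^b}(N), T can be recovered from the local routing-table size without knowing N directly. A sketch of Metric 2 under that assumption (function names are illustrative):

```python
def estimate_T(bucket_entries, b):
    # bucket_entries = 2^b * log_{2^b}(N)  =>  T = log_{2^b}(N) = bucket_entries * 2^(-b)
    return bucket_entries / (2 ** b)

def metric2_terminate(h_travelled, T):
    # Metric 2: stop forwarding once the query has travelled more than 2*T hops.
    return h_travelled > 2 * T
```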

Metric 1 && 2

"(h > T && closest)
(h > 4T) right now (4T or 3T or 2T, not sure); that way, the random walk (hT) doesn't terminate at a local min by accident." — C. Grothoff regarding R5N implementation.

Length of Random path

Estimated bucket entries = 2^b * log_{2^b}(N) ⇔ N = (2^b)^(bucket_entries * 2^(-b))
rand_path_length = log_{2^b}(N), which is the average number of steps needed to go anywhere in the network [4].

Replication factor

In standard kad, k is used as the replication factor, i.e. how many STORE requests the originator will send. The replication factor, henceforth r, is a trade-off between FIND_VALUE performance and the cost of STORE. Since no data in the NetDB is permanent, a relatively high replication factor should be acceptable.

r is determined by r = sqrt(N/c), where c is a constant describing the number of nodes with a direct TCP/UDP connection to this node. For our purposes, a value for c would have to be chosen.
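A direct transcription of that formula; the value c = 100 below is purely illustrative, since the page leaves c open:

```python
import math

def replication_factor(n, c):
    # r = sqrt(N / c): N is the estimated network size, c the number of
    # nodes with a direct TCP/UDP connection to this node.
    return math.sqrt(n / c)

# 10,000-node network with c = 100  ->  r = 10 STORE requests.
```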

Recursive Replication factor

For recursive stores a replication factor has to be calculated for each node on the path of recursion.

r_local_recursion = 1 + (r-1) / (T + (r-1)/h), where T = log_{2^b}(N) and h is the number of hops travelled. Recursion is stopped when h > 2*T; 2*T was chosen to compensate for inaccuracies in determining T.
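A sketch of the per-hop replication factor (names are illustrative). Note that for r = 1 the formula correctly yields 1 at every hop, and as h grows it rises toward 1 + (r-1)/T:

```python
def local_replication(r, T, h):
    # r_local_recursion = 1 + (r - 1) / (T + (r - 1) / h)
    # r: overall replication factor, T = log_{2^b}(N), h: hops travelled so far.
    return 1 + (r - 1) / (T + (r - 1) / h)

# With r = 5 and T = 4: 1.5 copies at the first hop, approaching 2.0 deep in the path.
```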

Unresolved issues

k-bucket building for recursive lookups

k-buckets need to be continually updated. In the case of iterative searching, this is not a problem. But when doing recursive searches, the search originator will not get the usual information about nodes that are close to the queried key.

[6] suggests two solutions to this issue:
  • Direct Mode: all nodes on the routing path send k nodes that are close to the key directly back to the originator of the query.
  • Source-Routing Mode: all nodes on the routing path send k nodes that are close to the key back to the previous hop on the routing path, which appends its own k closest nodes and repeats.

However, Direct Mode may provide an excellent DDoS attack vector if the originator address is not verified.

Key-rotation

Key-rotation might be an interesting idea, in the form of hash(dest + low_res_timestamp). It may help against Eclipse attacks if the results of the UCSB paper [9] hold up. This will have to be verified before any implementation of it is meaningful.

It is only interesting if the time it takes for malicious Eclipse nodes to integrate into the DHT is significant.
nodes_needed_for_eclipse = (60/key_rot_interval) * eclipse_integration_time * attackers_per_eclipse
nodes_needed_for_eclipse = (60/10) * 24 * 20 = 2880, assuming a key-rotation interval of 10 minutes, a guessed integration time of 24 hrs for a new malicious node, and the 20 attackers per destination used in the UCSB paper [9] to Eclipse a key.

Performance/Memory trade-off for b

A performance/memory-usage trade-off exists for the b-value. Plotted below are the lookup steps and the memory usage for differing b-values. k is set to 10, which might be a reasonable setting, but more discussion is needed regarding that.

[Attached plots: DHT lookup steps; DHT memory usage]

Further work

Use the locally known nodes for routing

If we have nodes in our local NetDB whose distance, d_xor, is lower than any distance found in the Kademlia routing tables, they are good candidates for FIND_VALUE querying. However, to find such candidates the local NetDB has to be sorted, which is expensive.
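One way to sidestep the full sort: a single O(n) scan of the local NetDB for entries that beat the best routing-table distance. A sketch with hypothetical identifiers, treating node IDs as raw bytes:

```python
def xor_distance(a: bytes, b: bytes) -> int:
    # Kademlia's XOR metric over equal-length IDs.
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

def closer_local_candidates(local_netdb, key, best_table_distance):
    # Linear scan instead of sorting: keep any locally known node that is
    # strictly closer to the key than the best Kademlia routing-table entry.
    return [nid for nid in local_netdb
            if xor_distance(nid, key) < best_table_distance]
```

A scan touches each NetDB entry once, versus O(n log n) for sorting, and needs no extra memory; whether even that is cheap enough per lookup is the open question this section raises.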



[1] A Survey of DHT Security Techniques
[2] Kademlia: A Peer-to-peer Information System Based on the XOR Metric
[3] Searching in a Small World
[4] R5N: Randomized Recursive Routing for Restricted-Route Networks
[5] Eclipse Attacks on Overlay Networks: Threats and Defenses
[6] R/Kademlia: Recursive and Topology-aware Overlay Routing
[7] Pastry: Scalable, Decentralized Object Location and Routing for Large-Scale Peer-to-Peer Systems
[8] Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
[9] Practical Attacks Against The I2P Network. By Christoph Egger. Unpublished as of 13-06-02. Accepted for RAID.
