Changes between Version 62 and Version 63 of NetDB/NextBackend
 Timestamp:
 Jun 4, 2013 7:47:44 AM (7 years ago)
Legend:
 Unmodified
 Added
 Removed
 Modified

NetDB/NextBackend
v62 v63 1 = = Why? ==1 = Why? = 2 2 '''Scalability:''' Currently we lack O(log(n)) lookup scalabilty if more !FloodFill nodes than the local NetDB contains exist. 3 3 … … 6 6 7 7 8 === General P2P networks === 8 = Alternatives = 9 == General P2P networks == 9 10 10 11  '''Name'''  '''Search horizon*'''  '''Comments'''  … … 15 16 * Search horizon describes how much the network that can be searched from a certain position in the network graph. Limited search horizon means that a search from one part of the network won't necessarily find results from another part of the network. 16 17 17 == = DHTs ===18 == DHTs == 18 19 DHTs are a good alternative due to O(log n) lookup time and a unlimited search horizon. But have serious issues when it comes to being robust against attacks. 19 20 … … 63 64  Recursive lookups  [6]  Make FIND_VALUE request recursive by forwarding the query recursively and returning the answer directly to the original source of the request.  64 65 65 = = Proposal ==66 = Proposal = 66 67 '''This is a draft of the proposal.''' 67 68 68 == = Why? ===69 == Why? == 69 70 Kademlia is preferable to Freenet due to its lookup speed and extendibility. 70 71 … … 76 77 77 78 78 === ''Key'' === 79 ''Key'' rotation might be an interesting idea in the form of hash(dest+low_res_timestamp). It may help out against eclipse attacks if the results of the (unpublished) UCSB paper holds up. 80 81 82 === Store === 83 '''Idea: Random Recursive Stores:''' By making stores travel a short number of random steps (short to avoid sybil nodes) and then begin recursive store. Where a value is stored in the ~''k'' closest nodes for each node in the path of the recursive insertion. 84 85 ==== Recursion ==== 86 ''r'' STOREs are initiated at the originator node. Every node should forward r,,local_recursion,, STORE request. 87 88 === Lookup === 79 == Lookup == 89 80 90 81 The ultimate goal is to provide a Kademlia implementation that supports three types of FIND_VALUE. Recursive, Iterative and Random Recursive. [[BR]] … … 93 84 Recursive Random is the R5N[4] means of lookup. It is resistant to most if not all forms of attacks since it eventually will find a path to the data if one exists. 94 85 95  '''Method'''  '''Lookup time (avg.)'''  '''Reliability'''  '''Usage scenario''' 86  '''Method'''  '''Lookup time (avg.)'''  '''Reliability'''  96 87  Recursive  (log2^b^(n) + 1)*RTT/2  Low  97 88  Iterative  log2^b^(n)*RTT  Medium  … … 103 94 Parallel mode would increase network load considerably, but provide the fastest possible lookup times at all times.[[BR]] 104 95 105 === = Recursion ====96 === Recursion === 106 97 ''r'' FIND_VALUEs are initiated at the originator node. Every node should forward r,,local_recursion,, FIND_VALUEs requests. 107 98 108 99 109 === End of Recursion === 100 == Store == 101 Stores should ideally always be performed using Random Recursion to increase the spread of nodes that contain the inserted data. 102 Iterative and Recursive STORE queries are possible, however they both have limited spread of inserted data. Additionally the benefits of nonrandom Recursive STOREs would be insertion speed. Something that likely isn't a very important factor. 103 104  '''Method'''  '''Store time'''  '''Spread'''  105  Recursive  (log2^b^(n) + 1)*RTT/2  Low  106  Iterative  log2^b^(n)*RTT  Medium  107  Random Recursive  (log2^b^(n) + rand + 1)*RTT/2  High  108 109 === Recursion === 110 ''r'' STOREs are initiated at the originator node. Every node should forward r,,local_recursion,, STORE request. 111 112 113 == End of Recursion == 110 114 Recursion can be used to slow DOS the network. 111 115 112 === = Metric 1 ====116 === Metric 1 === 113 117 Each node in the path of the recursion will stop forwarding the query if it is closer to the ''key'' than any node in its routing table. 114 118 This is what is used by R5N. 115 119 116 === = Metric 2 ====120 === Metric 2 === 117 121 A Hopstravelled counter, ''h,,travelled,,'', can be used. The recursion can be terminated using ''h,,travelled,,'' ≤ 2*''T''. 118 122 … … 120 124 ''T'' = log2^b^(N), which is the average number of steps needed to go anywhere in the network. 121 125 122 === = Metric 1 && 2 ====126 === Metric 1 && 2 === 123 127 "(''h'' > ''T'' && closest)  (''h'' > 4''T'') right now (4''T'' or 3''T'' or 2''T'', not sure); that way, the random walk (''h'' <= ''T'') doesn't terminate at a local min by accident."  C. Grothoff [4] 124 128 125 129 126 130 127 == = Length of Random path ===131 == Length of Random path == 128 132 Estimated Buckets entries = 2^b^log2^b^(N) <=> N = (2^b^)^bucket_entries*pow(2,b)^ [[BR]] 129 133 rand_path_length = log2^b^(N), which is the average number of steps needed to go anywhere in the network [4]. 130 134 131 135 132 == = Replication factor ===136 == Replication factor == 133 137 In standard kad, ''k'' is used as the replication factor, or how many STORE request that will be sent by the originator. The replication factor, henceforth ''r'' is a tradeof between FIND_VALUE performance and the cost of STORE. Since no data is permanent in the NetDB a relatively high replication factor should be acceptable. 134 138 135 139 ''r'' is determined by, ''r'' = (N/c)^1/2^, where c is a constant describing the number of nodes with direct TCP/UDP connection to this node. For our purposes a value for c would be chosen. 136 140 137 === = Recursive Replication factor ====141 === Recursive Replication factor === 138 142 For recursive stores a replication factor has to be calculated for each node on the path of recursion. 139 143 … … 142 146 143 147 144 145 === How === 146 147 ==== 1: Implement Kademlia ==== 148 == ''Key''rotation == 149 ''Key''rotation might be an interesting idea in the form of hash(dest+low_res_timestamp). It may help out against eclipse attacks if the results of the (unpublished) UCSB paper holds up. This will have to verified before any implementation of it is meaningful. 150 151 It is only interesting if the time it takes for malicious Eclipsenodes to integrate into the DHT is significant. 152 nodes_needed_for_eclipse = (60/key_rot_interval)*eclipse_integration_time*attackers_per_eclipse[[BR]] 153 nodes_needed_for_eclipse = (60/10)*24*20 = 2880. A key rotation interval of 10 minutes is chosen, the time it takes to integrate a new malicious node into the DHT is guessed to be 24hrs and finally 20 attackers per destination is what was used in the UCSB paper to Eclipse a ''key''.[[BR]] 154 155 == How == 156 157 === 1: Implement Kademlia === 148 158 A full implementation of Kademlia will be used as a base for further enhancements. 149 159 … … 152 162 * Implement ''k''bucket merging 153 163 154 === = 2: Improve performance ====164 === 2: Improve performance === 155 165 For Kademlia to be a viable alternative, lookup speed must be kept high. This means that lookup steps should be kept low and that as many roundtrips as possibly should be avoided. 156 166 … … 158 168 * Investigate performance 159 169 160 === = 3: Attack resistance ====170 === 3: Attack resistance === 161 171 The two main attack resistance methods will be recursive and random recursive lookups. 162 172 Recursive lookups are not only fast, but can be be used to provide a lookup success metric that is useful when doing the ''k''bucket evictions. 163 Random recursive lookups allows FIND_VALUE requesters to eventually find a path to the data if one exists.173 Random Recursive lookups allows FIND_VALUE requesters to eventually find a path to the data if one exists. 164 174 165 175 * Implement recursive lookup metric … … 169 179 170 180 171 == = Unresolved issues ===172 === = ''k''bucket building for recursive lookups ====181 == Unresolved issues == 182 === ''k''bucket building for recursive lookups === 173 183 ''k''buckets need to be continually updated. 174 184 In the case of iterative searching, this is not a problem. But when doing recursive searches, the search originator will not get the usual information about nodes that are close to the queried key. [[BR]][[BR]] … … 179 189 However '''Source: Direct Mode''' may provide an excellent DDOS attack vector if the originator address is not verified. 180 190 181 === = !Performance/Memory tradeof for ''b'' ====191 === !Performance/Memory tradeof for ''b'' === 182 192 A performance/memory usage tradeof exists for the ''b''value. Plotted below are lookup steps and the memory usage for differing ''b''values. ''k'' is set to 10, which might be a reasonable setting. But more discussion is needed regarding that. 183 193 … … 186 196 [[Image(dht_mem_usage.svg, 450px)]] 187 197 188 == = Further work ===189 === = Use the locally known nodes as for routing ====198 == Further work == 199 === Use the locally known nodes as for routing === 190 200 If we have nodes in our local NetDB that have distance, d,,xor,, , that is lower than any distance found in the Kademlia routing tables they are good candidates for FIND_VALUE querying. However to find such candidates the local NetDB has to be sorted, which is expensive. 191 201