Opened 8 years ago

Closed 5 years ago

#510 closed defect (fixed)

NetDB refresh after keyspace rotation

Reported by: zzz Owned by: zzz
Priority: major Milestone: 0.9.10
Component: router/netdb Version: 0.8.7

Description

I don't have any data but I suspect that netdb lookups tend to fail after each keyspace rotation at midnight UTC.

Todo:

  • Gather data
  • Floodfills query other floodfills just before midnight based on upcoming routing key? or...
  • Peers republish routerinfos and leasesets just after midnight?
  • Congestion implications?
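The "upcoming routing key" in the todo list is what makes a pre-midnight push possible: tomorrow's key is already computable today. The sketch below assumes the routing key is SHA-256 over the 32-byte hash followed by the UTC date as ASCII `yyyyMMdd`, which is our understanding of the netDb rotation scheme. The function names `routing_key` and `upcoming_routing_key` are hypothetical and do not come from the router code.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def routing_key(ident_hash: bytes, when: datetime) -> bytes:
    """Keyspace position of ident_hash on the UTC day containing `when`.

    ASSUMPTION: key = SHA-256(hash || ASCII "yyyyMMdd" of the UTC date).
    """
    date_str = when.astimezone(timezone.utc).strftime("%Y%m%d")
    return hashlib.sha256(ident_hash + date_str.encode("ascii")).digest()


def upcoming_routing_key(ident_hash: bytes, now: datetime) -> bytes:
    """Key that takes effect at the next midnight UTC (hypothetical helper)."""
    tomorrow = now.astimezone(timezone.utc) + timedelta(days=1)
    return routing_key(ident_hash, tomorrow)
```

Every key moves to an unrelated point in the keyspace at 00:00 UTC, so the floodfills closest to it change all at once. This is the rotation the ticket suspects of causing lookup failures.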


Attachments (2)

midnight_peak.png (13.6 KB) - added by dg 5 years ago.
graph of peer.failedLookupRate showing a spike
midnight_peak_more.png (20.1 KB) - added by dg 5 years ago.
longer term (w/ multiple versions) graph of peer.failedLookupRate


Change History (9)

comment:1 Changed 8 years ago by zzz

  • Status changed from new to assigned

comment:2 Changed 7 years ago by zzz

  • Milestone 0.9 deleted
  • Priority changed from minor to maintenance

comment:3 Changed 5 years ago by zzz

  • Milestone set to 0.9.12
  • Priority changed from maintenance to minor

Needs review now that netdb has significantly expanded and floodfill redundancy has been repeatedly reduced.

I've been disconnected from IRC at 00:07 UTC two days in a row.

What can we do just before or after midnight without causing massive connection limit or floodfill load problems?

comment:4 Changed 5 years ago by zzz

Another factor: In 0.9.9, RI publish interval was extended from 15-25 minutes to 30-40 minutes.

Two separate issues here: RI and LS.

1) LS: published frequently and expire quickly, 10 minutes max. Hard to do much before midnight, and they should recover rapidly after midnight.

2) RI: 10x more RIs than LSs. Min expiration 1 hour, max publish interval 40 minutes. Perhaps the issue is the OBEP-to-IBGW transfer (outbound endpoint to inbound gateway) failing because the OBEP can't find the IBGW's RI.
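To put a rough number on the RI side: after midnight, a router's RI reaches its new closest floodfills only at the router's next publish. The sketch below assumes publish times form a renewal process with intervals drawn uniformly from the ranges in comment 4, and it ignores flooding and any other redistribution. Under those assumptions, the mean wait from a random instant such as midnight to the next publish is E[X^2] / (2 E[X]). `mean_residual_wait` is a hypothetical helper.

```python
def mean_residual_wait(lo: float, hi: float) -> float:
    """Mean minutes from a random instant until the next RI publish.

    ASSUMPTION: publish intervals X are i.i.d. Uniform(lo, hi) minutes
    (a renewal process), so the mean residual time is E[X^2] / (2 E[X]).
    """
    mean = (lo + hi) / 2
    variance = (hi - lo) ** 2 / 12
    second_moment = variance + mean ** 2
    return second_moment / (2 * mean)


before = mean_residual_wait(15, 25)  # interval before 0.9.9
after = mean_residual_wait(30, 40)   # interval from 0.9.9 on
```

Under this model the average post-midnight gap grows from about 10.2 to about 17.6 minutes, and the worst case grows from 25 to 40 minutes. This is consistent with the 0.9.9 change making the midnight problem worse for RIs.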

Seems like we could get the RIs to their new home before midnight. Possibilities:

a) Individual routers could also store to one or more new ffs in the 40 minutes before midnight, without flooding (i.e. token = 0), since the ff would otherwise flood to the wrong place

b) Routers could store w/ flood (token > 0) or without flood, just after midnight

c) ffs could store to the closest new ff as part of their flooding in the 40 minutes before midnight

d) ffs could redistribute in bulk in, say, the 5 minutes before midnight
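Option c) can be illustrated with the Kademlia-style XOR metric: a floodfill floods to the peers closest to today's key and, inside the pre-midnight window, also stores to the single floodfill closest to tomorrow's key. This is a minimal sketch, not the committed change. Floodfills are represented only by their 32-byte keyspace positions, and the sketch deliberately leaves open how those positions are derived. `flood_targets_option_c` and its parameters are hypothetical.

```python
def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia XOR distance between two equal-length keys."""
    return int.from_bytes(bytes(x ^ y for x, y in zip(a, b)), "big")


def closest(target: bytes, peers: list[bytes], n: int) -> list[bytes]:
    """The n peers nearest to target under the XOR metric."""
    return sorted(peers, key=lambda p: xor_distance(target, p))[:n]


def flood_targets_option_c(today_key: bytes, tomorrow_key: bytes,
                           floodfills: list[bytes], n_flood: int,
                           in_window: bool) -> list[bytes]:
    """Normal flood set, plus the closest ff for tomorrow's key when in the window."""
    targets = closest(today_key, floodfills, n_flood)
    if in_window and floodfills:
        next_home = closest(tomorrow_key, floodfills, 1)[0]
        if next_home not in targets:  # no extra store if it is already flooded to
            targets.append(next_home)
    return targets
```

Each entry adds at most one extra store per flood, and only during the window. That is why this option looks cheapest in terms of connection limits.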

So there are a few dimensions here: redistribution by all routers or only by ffs, before or after midnight, with or without flooding.

c) has the lowest impact and may be sufficient. Needs calculations on conn limits etc.

Also need some stats/graphs on lookup success/failure just after midnight.

Changed 5 years ago by dg

  • Attachment midnight_peak.png added

graph of peer.failedLookupRate showing a spike

Changed 5 years ago by dg

  • Attachment midnight_peak_more.png added

longer term (w/ multiple versions) graph of peer.failedLookupRate

comment:5 Changed 5 years ago by zzz

  • Milestone changed from 0.9.12 to 0.9.10
  • Priority changed from minor to major

comment:6 Changed 5 years ago by zzz

  • Status changed from assigned to testing

2c) pushed in 889d2240b3f865df64b1f8d0dab7590838b6b058 0.9.9-3

Does the same for LS within 10 minutes of midnight, i.e. a fix for 1) as well.

May cause some conn limit issues among the ffs, but I think this has the lowest conn limit impact of all the options.
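The timing described in this comment can be sketched as a simple predicate. Per the ticket, RIs are pushed toward their new home in the 40 minutes before midnight UTC, and LSs only in the last 10 minutes. The function names below are hypothetical, and this is not the code from the referenced commit.

```python
from datetime import datetime, timedelta, timezone

RI_WINDOW = timedelta(minutes=40)  # max RI publish interval, per the ticket
LS_WINDOW = timedelta(minutes=10)  # max LS lifetime, per the ticket


def until_midnight_utc(now: datetime) -> timedelta:
    """Time remaining until the next 00:00 UTC keyspace rotation."""
    now = now.astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return next_midnight - now


def should_store_to_next_keyspace(now: datetime, is_leaseset: bool) -> bool:
    """True inside the pre-midnight window for this entry type."""
    window = LS_WINDOW if is_leaseset else RI_WINDOW
    return until_midnight_utc(now) <= window
```

For example, at 23:30 UTC an RI qualifies but an LS does not. At 23:55 UTC both qualify.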

comment:7 Changed 5 years ago by zzz

  • Resolution set to fixed
  • Status changed from testing to closed

Presumed fixed
