Opened 6 years ago

Closed 4 years ago

#1198 closed enhancement (fixed)

More efficient InetAddress methods

Reported by: Zlatin Balevsky Owned by: Inondle
Priority: maintenance Milestone: 0.9.23
Component: router/general Version: 0.9.10
Keywords: review-needed Cc: zab@…, zzz@…, dg2@…
Parent Tickets: #1532 Sensitive: no

Description

The following two methods verify whether a string is a numerical IP address in a very inefficient and incorrect manner.

net.i2p.data.RouterAddress::getIP()
net.i2p.util.Addresses::getIP(String)

They should be rewritten to properly catch all invalid IP addresses and not create any temporary objects (garbage).

Also, JUnit tests should be provided for both.

Subtickets

Attachments (1)

1198.patch (7.3 KB) - added by Inondle 5 years ago.
updated patch

Download all attachments as: .zip

Change History (23)

comment:1 Changed 6 years ago by zzz

Oh. that replaceAll().length() == 0 stuff. Yeah. Must be a better way. Maybe a Matcher? One of those precompiled Pattern things may be helpful here.

Similar code in net.i2p.addressbook.Addressbook

I think I read that getByName() is horrifically slow too. But I may be misremembering.

So yeah it's ugly but incorrect? How? It's catching errors that getByName() accepts, like turning 1234 into an IP address.

comment:2 Changed 6 years ago by Zlatin Balevsky

So yeah it's ugly but incorrect? How?

"…", "0..11." etc. will all evaluate as valid

comment:3 Changed 6 years ago by zzz

Yes, but those will presumably fail in getByName(). The checks prior to getByName() are not intended to be comprehensive, but only to catch bogus things that getByName() allows, like "1234", which would come back as 0.0.4.210 .

comment:4 Changed 5 years ago by str4d

Priority: trivialmaintenance

net.i2p.data.RouterAddress has been moved to net.i2p.data.router.RouterAddress.

comment:5 Changed 5 years ago by Inondle

Hi, so I think I fixed the net.i2p.data.router.RouterAddress::getIP() and net.i2p.util.Addresses::getIP(String) methods. I replaced the "replaceAll().length() == 0" stuff and used a Pattern Matcher with regex's that check IPv4 and IPv6. I still need to learn about and write the JUnit tests for the IP checkers but I did some simple checking on my own and they seem to work well.

comment:6 Changed 5 years ago by Inondle

Ok, so I made a test build of i2p running my code and it works well, I'm having trouble writing JUnit tests though because I'm not quite sure how to use the getIP() method. In what context does this method run and what do I need to import to get it working?

Last edited 5 years ago by Inondle (previous) (diff)

comment:7 Changed 5 years ago by Inondle

Also as of right now only full IPv6 addresses (FE80:0000:0000:0000:0202:B3FF:FE1E:8329) will pass the regex I used, if I need to allow the abbreviated version (FE80::0202:B3FF:FE1E:8329) then just let me know.

comment:8 Changed 5 years ago by Inondle

Keywords: review-needed added; easy removed

Added my patch, changing to review-needed. Not sure if this is what you guys need, comment if you need more code

comment:9 Changed 5 years ago by str4d

Keywords: test-needed added; review-needed removed
Owner: set to Inondle
Status: newassigned

Thanks for the patch!

Some small pointers to begin with:

  • Please clean up the patch by stripping excess whitespace, and using spaces instead of tabs. You can check the full code style guide here.
  • The Pattern objects should be made static final constants of the Addresses class. That way, we aren't recompiling them every time Addresses.getIP() is called. RouterAddress should use those constants instead of defining its own.

As it is, the current patch is broken because it completely changes the behavior of the two methods. Take for example this part of the RouterAddress diff:

         if (host != null) {
             rv = Addresses.getIP(host);
-            if (rv != null &&
-                (host.replaceAll("[0-9\\.]", "").length() == 0 ||
-                 host.replaceAll("[0-9a-fA-F:]", "").length() == 0)) {
-                _ip = rv;
-            }
+            Matcher mIPv4 = ValidIPv4Pattern.matcher(rv.toString());
+            Matcher mIPv6 = ValidIPv6Pattern.matcher(rv.toString());
+            if(rv != null && (mIPv4.matches() || mIPv6.matches())) {
+                _ip = rv;
+            }
         }

The original code matches against host, but you are matching against rv. I think you are misunderstanding the purpose of the verification. We don't need to verify the output of InetAddress.getByName(host).getAddress(), because it is guaranteed to be an IPv4 or IPv6 address. But we do want to verify the host input, for two reasons:

  • We don't want to accept partial configured IPs (e.g. 1.2.3), which InetAddress accepts but we consider a misconfiguration. This is what the first return null is doing in Addresses.getIP().
  • If an IPv4 or IPv6 address is configured, we want to cache the slow InetAddress.getByName() call. This is done by the _IPAddress.put(host, rv) line in Addresses, and the _ip = rv line in RouterAddress. We explicitly don't want to cache when InetAddress.getByName() requires a DNS lookup (ie. when host is a hostname), because the result may change; if host is an IPv4 or IPv6 address, the result will always be the same.

Writing tests would definitely help here. I think it would be instructive (and a good idea in general) for you to write some JUnit tests for Addresses.getIP() and RouterAddress.getIP() that check for the expected behavior against valid and invalid input. For Addresses.getIP() you will want to mock the InetAddress.getByName(host) call, and for RouterAddress.getIP() you will want to mock the Addresses.getIP() call. The tests should pass for valid inputs against the current code, but fail against e.g. the invalid inputs mentioned in comment:2. The tests should also fail against your current patch. Once you have written the tests, you can then rework your patch to satisfy them.

Last edited 5 years ago by str4d (previous) (diff)

comment:10 Changed 5 years ago by Inondle

Ok, so I've made the IP patterns static final pre-compiled Patterns and made getter methods so RouterAddress.getIP() can use them. I've cleaned up both those methods and fixed that host checking mistake. I will work on the RouterAddress.getIP() JUnit tests. Was the Addresses.getIP() tests in IPAddressPatternTest ok? I think that one mocked the InetAddress.getByName(host) call pretty well, is there anything I need to improve in those tests?

Last edited 5 years ago by Inondle (previous) (diff)

comment:11 Changed 5 years ago by zzz

I will take a look at your patch during this dev cycle but after rereading this ticket I'm not sure there is anything worth fixing, or that it is sooo inefficient.

Also, in comment 10 above you say you've made a bunch of changes to your patch but haven't posted the new version, please do so.

comment:12 in reply to:  11 Changed 5 years ago by Inondle

Replying to zzz:

I will take a look at your patch during this dev cycle but after rereading this ticket I'm not sure there is anything worth fixing, or that it is sooo inefficient.

Checking the running times of my current test cases, the old code is pretty efficient but does hang every once in a while for around 5 seconds. The old code generally can run through all the tests at around .05-.06 seconds while the pattern matcher runs at around .025-.03 seconds. So realistically hundredths of a second isn't a concern but I did find that a few times the old code would stall for a few seconds, pattern matchers might be more reliable?

Last edited 5 years ago by Inondle (previous) (diff)

comment:13 Changed 5 years ago by Inondle

Keywords: review-needed added; test-needed removed

comment:14 Changed 5 years ago by Inondle

Keywords: test-needed added; review-needed removed

Double checking my work I found that I did the pattern checkers wrong and need to redo some code. Sorry about confusion

comment:15 Changed 5 years ago by Inondle

So will these methods be receiving actual website host names (eg. xyz.com) or just IP addresses?

comment:16 Changed 5 years ago by Inondle

Keywords: review-needed added; test-needed removed

Ok, so fixed that small bug and updated the patch. Sorry bout that

comment:17 Changed 5 years ago by zzz

The javadocs, code and comment make clear that hostnames are allowed. You have apparently broken the hostname resolution in your patch.

Please don't replace individual imports with e.g. java.util.*

The new public methods in Addresses should be e.g. isIPv4(String), nodoby else needs the Pattern

At line 264, no need to do the IPv6 match if IPv4 already matched. Don't always call both.

I'd like to see test cases, comments, or both on what your matchers allow or disallow. It's difficult to understand / review the IPv4 pattern in particular.

Doesn't look like IPv6 will allow e.g. aaaa:11::bbbb

comment:18 Changed 5 years ago by Inondle

Patterns now allow hostnames and block malformed IP addresses. Correctly diff'ed the test cases, fixed bad imports, made new public methods in Address return booleans rather than patterns.

I also wasn't sure if I should allow abbreviated versions of IPv6 addresses, they currently don't pass. Should they be allowed?

I'm also new to writing junit tests so if I need to change something about those let me know

Changed 5 years ago by Inondle

Attachment: 1198.patch added

updated patch

comment:19 Changed 5 years ago by zzz

While Java I2P doesn't set abbreviated IPv6 addresses, I don't think there's anything in our spec that prohibits them. I don't know what i2pd does. So we should allow them.

comment:20 Changed 5 years ago by Inondle

It appears that InetAddress.getByName(host).getAddress() throws an UnknownHostException? if it is given an abbreviated IPv6 address. Doesn't make sense to allow them if it'll just return null anyways.

comment:21 Changed 4 years ago by zzz

Parent Tickets: 1532

As a part of the SSL hostname verification work included in 0.9.20, we pulled in some Apache code, including v4/v6 pattern recognition similar to the attached patches here. The Apache code includes support for abbreviated (what they call "compressed") v6 addresses.

As a part of the work on #1532 I'll be speed testing the apache code vs. our existing code. I will also research comment 20 above that we don't in practice support abbreviated addresses.

comment:22 Changed 4 years ago by zzz

Milestone: 0.9.23
Resolution: fixed
Status: assignedclosed

Switched to Apache utils in ead6c223798e012a778305defe76d07be59db435 0.9.22-12

Apache utils are about 3-3.5x faster than replaceAll().length() for v4, and 2.5-3x faster for v6.

InetAddress?.getByName() does work for abbreviated addresses for me, in OpenJDK 7, which agrees with the javadocs referencing RFC 2373.

Total times when using Apache were about 700 ms for 1 million loops, and it did not significantly improve the /netdb page render time (see #1532).

Note: See TracTickets for help on using tickets.