Opened 4 years ago

Closed 4 years ago

#1616 closed defect (fixed)

eepget file issues

Reported by: Eche|on Owned by: z3r0fox
Priority: minor Milestone: 0.9.23
Component: api/utils Version: 0.9.20
Keywords: Cc: dg
Parent Tickets: Sensitive: no

Description

Hi

eepget http://tc73n4kivdroccekirco7rhgxdg5f3cjvbaapabupeyzrqwv5guq.b32.i2p/news.su3?lang=fr
<eche|on> ##
<eche|on> == Wed Jul 08 16:28:21 CEST 2015
<eche|on> == Transfer of http://tc73n4kivdroccekirco7rhgxdg5f3cjvbaapabupeyzrqwv5guq.b32.i2p/news.su3?lang=fr completed with 2468 bytes transferred
<eche|on> == Output saved to fr (2468 bytes)

eepget parses the output filename wrong.
It should be news.su3 instead of fr to be saved on harddrive.

Subtickets

Attachments (6)

eepget.diff (2.9 KB) - added by z3r0fox 4 years ago.
eepget patch 2015-10-07 19:54 EDT
eepget.2.diff (2.8 KB) - added by z3r0fox 4 years ago.
eepget.diff 2015-10-20 07:12
eepget.3.diff (2.6 KB) - added by z3r0fox 4 years ago.
eepget patch 2015-10-26 23:59 EDT
noTab.png (176.9 KB) - added by z3r0fox 4 years ago.
eepget.4.diff (3.0 KB) - added by z3r0fox 4 years ago.
eepget patch 2015-10-30 16:51 EDT
eepget.5.diff (3.1 KB) - added by z3r0fox 4 years ago.
eepget patch 2015-11-2 10:34 EDT

Download all attachments as: .zip

Change History (24)

comment:1 Changed 4 years ago by zzz

Component: apps/otherapi/utils
Keywords: easy added
Milestone: 0.9.21undecided

Affects command line eepget only. Workaround is to specify the output file with -o file.

See suggestName() function.

Another problematic case is when fetching e.g. http://stats.i2p/ the output file is http_stats.i2p_ .

Perhaps real URL parsing with the Java URI or URL class would enable a better result.

Longstanding issue, low priority.

Last edited 4 years ago by zzz (previous) (diff)

comment:2 Changed 4 years ago by z3r0fox

Owner: set to z3r0fox
Status: newaccepted

Assigning to self as test case to get my local dev builds working

Changed 4 years ago by z3r0fox

Attachment: eepget.diff added

eepget patch 2015-10-07 19:54 EDT

comment:3 Changed 4 years ago by z3r0fox

Here's a patch. Seems to work pretty good and catches a couple of additional edge cases. (But hardcoded fallback default filename, so if there's a string constants config file somewhere, do let me know.)

comment:4 Changed 4 years ago by z3r0fox

Keywords: review-needed added; easy removed

comment:5 Changed 4 years ago by zzz

Keywords: review-needed removed

The code isn't bad but the style is really strange and inconsistent, and there are bugs in there too:

  • Odd mix of tabs and spaces. use 4-spaces only. See our style guide.
  • An explicit check for null would be better than catching an NPE
  • Last System.out.println() appears to be for debugging, left in by mistake?
  • nameURL.getHost != "" is a bug, you must use equals(), not use != on references
  • if (last != path.length() - 1) { …} else if (last == path.length() - 1) { … } huh? why not just else?
  • whether to use else after a return in a previous if is a style choice, up to you , but I prefer not using else
  • put the } on the same line as the else
  • dies with a SIOOBE on eepget http://stats.i2p (this was my first or second test. Should never IOOBE, need to look at all the indexOf places to make sure you don't die if not found)

I suggest more testing and greater care in following our style guide and matching the style of the existing code.

comment:6 Changed 4 years ago by zzz

Also:

  • None of the "fallback" code inside the catch (MalformedURLException) block is effective, because nameURL remains null, so it's going to NPE below.

comment:7 Changed 4 years ago by dg

Cc: dg added

Adding to what zzz said, some comments from IRC earlier:

  • A lot of the values in your ifs can be checked once or stored in a variable for comparison later.
  • wrt tabs/spaces, there's a lot of different places in the code with different styles. Most editors will let you abide by what's already there — see if you can set it in yours?
  • The NPE catch is valid but we tend to check for null rather than catch. some people write with a style of "asking for forgiveness", but it's not what we do here
  • Don't bother with = null; for the nameURL object, either create it as a new URL or leave it as URL nameurl;
  • Ensure your editor flags up trailing whitespace too.

Be sure to ask for help if you need any.

Changed 4 years ago by z3r0fox

Attachment: eepget.2.diff added

eepget.diff 2015-10-20 07:12

comment:8 Changed 4 years ago by z3r0fox

Here's a much improved version.

One question to dg - why not initialize URL variables at the beginning of the method? I was always taught it improves readability if you describe them in a clump at the start. Same as with the preexisting Strings and the int.

comment:9 Changed 4 years ago by z3r0fox

Keywords: review-needed added

comment:10 Changed 4 years ago by zzz

Sill several issues:

  • You still have a mix of tabs and spaces
  • You don't need to declare variables at the beginning, and especially not to initialize them, if they will be set later. There are no code paths where nameURL, path, pattern, matcher, or last will be uninitialized. So declare them when they are set, e.g. Pattern pattern = pattern.compile(…)
  • You did not change your code so it accepts a URI without a trailing slash, only so that it doesn't IOOBE. While a URI like http://stats.i2p is not technically valid, eepget accepts it and so your change is still a regression.
  • Even worse, http://stats.i2p/ (i.e. with a trailing slash) now fails also. huge regression, and that's one of the cases this whole ticket was supposed to fix. See comment 1 above.
  • The whole pattern/matcher thing is equivalent to !path.startsWith("/") or ~path.contains("!") which is more readable and probably faster. But you're really just checking for an empty path. You can check the URL documentation to see if a path could ever start with something besides '/'.
Last edited 4 years ago by zzz (previous) (diff)

Changed 4 years ago by z3r0fox

Attachment: eepget.3.diff added

eepget patch 2015-10-26 23:59 EDT

comment:11 Changed 4 years ago by z3r0fox

Hi zzz!

Taking your comments in turn;

  • You still have a mix of tabs and spaces

Urgh, sorry, my IDE is ignoring it's own directive to replace tabs with spaces. So I will check manually from here on in, promise! :)

  • You don't need to declare variables at the beginning…

Ok, I'm accustomed to doing that for purposes of readability (as a summary of what to expect) but will omit. However, nameURL and name do still need it in this case because they appear inside a try and an if scope respectively so fail to resolve and throw compilation exceptions if not initialized.

  • The whole pattern/matcher thing…

The if ((path.equals("")) OR (matcher.matches())) { line is equivalent to
if (path.equals("") OR path.equals("/") OR path.equals(2slashes) OR path.equals(3slashes) … ) {
where it was my intention to check for an empty path, also an empty path that is just a slash, plus catch a wierd edge case where someone might possibly fat-finger too many slashes, which .getPath() does let through for some reason! This would then slip through a check for just "" and "/". So that's why I chose to go with the regex.

But probably the main source of confusion for me here was

  • Even worse, ​http://stats.i2p/ (i.e. with a trailing slash) now fails also. huge regression…

Ohhhh! On my last reading, the failure with that URL in the original code was that there's an output file name *at all* because a file (say, a pdf, or an mp3) isn't explicitly specified. Thought I was making a good catch, did not realize it's supposed to scrape the html/php/* of the website by default. We could consider updating the man page to say this. I guess for these cases we'll just use the domain name then?

So those things are patched up, hope it looks good this time around.

Last edited 4 years ago by z3r0fox (previous) (diff)

comment:12 Changed 4 years ago by tuna

I had a read through of eepget.3.diff and it all looks good to me. It's all neat and tidy.

I have _not_ tested the functionality however.

comment:13 Changed 4 years ago by zzz

still has tabs in it

comment:14 Changed 4 years ago by zzz

I wouldn't have mentioned it because it's a nasty test case, but you did say in comment 11 above that you're worried about fat-fingering too many slashes. So it's fair game to test, right? http://stats.i2p/foo// (two trailing slashes) attempts to save to an empty file name.

Changed 4 years ago by z3r0fox

Attachment: noTab.png added

Changed 4 years ago by z3r0fox

Attachment: eepget.4.diff added

eepget patch 2015-10-30 16:51 EDT

comment:15 Changed 4 years ago by z3r0fox

True. Good catch, code was only accounting for http://stats.i2p// before

Now tests ok with the two original test cases above, also http://stats.i2p/foo//, and various URLs picked off the web

Thanks again!

comment:16 Changed 4 years ago by zzz

Keywords: review-needed removed
Milestone: undecided0.9.23

ACK
congrats

comment:17 Changed 4 years ago by zzz

[javadoc] core/java/src/net/i2p/util/EepGet.java:323: warning - @return tag has no arguments.

Changed 4 years ago by z3r0fox

Attachment: eepget.5.diff added

eepget patch 2015-11-2 10:34 EDT

comment:18 Changed 4 years ago by zzz

Resolution: fixed
Status: acceptedclosed
Note: See TracTickets for help on using tickets.