Opened 5 years ago

Closed 4 years ago

#1425 closed enhancement (fixed)

Translated news feeds

Reported by: zzz
Owned by: zzz
Priority: minor
Milestone: 0.9.23
Component: apps/console
Version: 0.9.17
Keywords: release
Cc: Eche|on, psi
Parent Tickets:
Sensitive: no

Description

Atom spec (RFC 4287) allows xml:lang specification for each element, but does not allow multiple elements with different languages. See http://stackoverflow.com/questions/3595632/more-than-one-natural-languages-in-atom-xml

So we will need multiple news.su3 files, one for each language. If routers append ?lang=xx to the request, news servers that don't support it will ignore the parameter, while those that do can use it for content negotiation. Alternatively we could add an Accept-Language header, but that's harder in the router as the HTTP Client Proxy strips these by default and there's no current way to disable it per-request.

Need feedback from news ops (echelon and psi) on best way to do this.

This increases the work required for a release, and we already have to do release notes in several formats (news.xml, news.su3, website blog, forum.i2p, zzz.i2p, …) but I envision only doing it for the most popular languages in I2P, i.e. German and Russian.

Attachments (1)

console-news.patch (4.0 KB) - added by str4d 4 years ago.
Save news.atom.xml, limit entries displayed on /console to 3

Change History (11)

comment:1 Changed 5 years ago by str4d

Keywords: release added
Milestone: changed from 0.9.18 to soon

Transifex doesn't have any supported filetypes that could be used for Atom feeds, which is understandable, because Atom feeds are normally generated from another content type. Since we were already considering some kind of Atom feed generator script that enables the news ops to write news in a simpler format (and not risk an invalid Atom format), perhaps we could also leverage this for translation? This way, we could guarantee the most popular languages (by having dedicated I2P members translating the news), but also potentially gain other translations.

What we would do is pick one of the Transifex-supported formats to use for writing the news content (I would personally suggest a subset of Wiki markup), and upload this file to Transifex for translations. Then we have a script that takes the source and translation files, along with an XML file containing the <i2p:release> nodes, and generates the news.atom and news.su3 files for each language.

Re: requesting languages, another option is to use the hreflang attribute of the feed to list the other languages (which the Atom feed will probably do anyway), and the router could subsequently fetch the correct translated feed. This would require a second network hit, but the advantage to news operators would be that they could simply have a directory containing news.su3, news_fr.su3 etc. I personally think that ?lang=xx is a much simpler and cleaner method, but it does require server-side logic (although it could perhaps be done with a server rewrite rule from news.su3?lang=xx to news_xx.su3).
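As a rough illustration of the server-side logic mentioned above, here is a minimal Python sketch that maps news.su3?lang=xx to news_xx.su3 and falls back to news.su3. The function name, the set of available files, and the restriction to two- or three-letter lower-case codes are assumptions for illustration only; this is not taken from any existing news server.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumption: only plain lower-case language codes are accepted here.
_LANG_RE = re.compile(r"[a-z]{2,3}")


def resolve_news_file(url, available):
    """Map a request such as /news.su3?lang=fr to a static file name.

    `available` is the set of file names present in the news directory
    (hypothetical layout: news.su3, news_fr.su3, ...).
    """
    query = parse_qs(urlsplit(url).query)
    lang = query.get("lang", [""])[0]
    # Validate the value before using it in a file name.
    if _LANG_RE.fullmatch(lang):
        candidate = "news_%s.su3" % lang
        if candidate in available:
            return candidate
    # Unknown, missing, or malformed language: serve the default feed.
    return "news.su3"
```

A request for a language with no translated file simply gets news.su3, which matches the behavior of a server that ignores the parameter entirely.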

comment:2 Changed 4 years ago by zzz

Status: changed from new to infoneeded_new

Do you think a workflow that included Transifex would really work? We would have to write the news days before, and upload it to TX. And it would overwrite the strings from the previous release, so we couldn't get updates for the current release any more. And the news hosters would periodically download new translations and re-sign the news?

So I'm not sure TX is fast enough, but I'm also not sure anything else is scalable. Anything is possible if we're just asking ech to translate to Deutsch on release day, and maybe ask for .ru help on IRC… but to scale it is different.

re: requesting, Not in favor of hreflang doubling the requests. Already worried about news host scalability.

The benefit of a ?lang param or an Accept header is that it falls back to news.su3 for the hosts that don't support it. It's also more standard, and doesn't double the requests. Don't know which is easier for rewrite rules or other server config. ech/psi?

Need feedback from ech and psi on all of this.

comment:3 Changed 4 years ago by zzz

Notes from IRC conversation today:

  • ech is on Jetty but willing to switch to Apache
  • Apache has well-documented content negotiation support
  • Can't find any Jetty docs on content negotiation or Accept-Language-based rewriting… perhaps handled by frameworks?
  • Country variants complicate things, really don't want 3 fetches (e.g. pt_BR, pt, en)
  • Transifex our best hope
  • <psi> i have a small flask webapp reverse proxied behind nginx
  • psi prefers static files but could live with a query param
  • news authors (zzz) must move from hand-writing xml to having separate news entries in some format TBD, together with an Atom feed generator TBD. This is the key to a workflow that gives us enough time to get the release entry translated
  • news entries would be pushed to TX (and maybe mtn?) in format TBD after extracting strings from TBD-format entry files
  • news servers would have to run a script that periodically fetched from TX, rebuilt and resigned su3 files. Alternatively, a script that pulls files from somebody else building/signing
  • realistically we'll only get a few translations, but as long as we get Russian, it's worth it
  • separate feed for Android may be valuable, but could also be accomplished with Atom categories
  • Concern for server load, and as standard as possible to allow for future additional news servers, but ech has plenty of capacity on his, could handle 10-100x more traffic
  • zzz has ideas about how to get Accept-Language through the client proxy
  • str4d wants real feed, with old entries preserved, this requires enhancements to hide/show old entries in router console
  • need more research by ech and psi on their preferences
  • need more research on feed generators, content formats, tx formats

comment:4 Changed 4 years ago by str4d

First pass at a news generator pushed to i2p.newsxml in 874beeaaecbaa3cdeb00844e489c30435e84f535.

comment:5 Changed 4 years ago by str4d

Status: changed from infoneeded_new to new

I've chosen HTML as the Transifex format:

  • Transifex tags HTML fragment strings by paragraph.
  • We don't need to convert from some other Transifex-compatible format back into HTML for inclusion in the feed. News writers will get in the feed exactly what they write.

I've added a create_new_entry.sh script to simplify the process of adding a new entry.

Todo:

  • Translate feed title and subtitle.
  • Generate translated feeds.
  • Validate I2P elements and attributes.
  • Script to pull translations and sign generated feed.
  • Categories?

comment:6 Changed 4 years ago by str4d

Owner: set to str4d
Status: changed from new to accepted

Support for translated feeds implemented in bfc2156582da348c6d8ab242775a210930da966f.

comment:7 Changed 4 years ago by zzz

In d9d7f9940d61dfdeeea78b19743dd1d323659773 to be 0.9.20-7:

Router will append the language (and possibly country) to the URL.

param is ?lang=xx[x][_YY]

That is, a two- or three-letter lower-case language code, optionally followed by an underscore and a two-letter upper-case country code. The param will not be added if the language is "en". For example, for ?lang=pt_BR, the server should return Brazilian Portuguese, falling back to Portuguese, falling back to English.
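The parameter format and fallback order described above could be sketched as follows. This is an illustrative Python sketch only; the router itself is Java, and both function names are hypothetical.

```python
def lang_param(language, country=None):
    """Return the query string a router would append, or "" for English."""
    if language == "en":
        # Per the description above, no param is added for "en".
        return ""
    value = language if not country else "%s_%s" % (language, country)
    return "?lang=" + value


def fallback_chain(lang_value):
    """Return the order in which a server could try translations.

    For "pt_BR" this is country variant, then base language, then English.
    """
    chain = []
    if "_" in lang_value:
        chain.append(lang_value)                    # e.g. pt_BR
        chain.append(lang_value.split("_", 1)[0])   # e.g. pt
    elif lang_value:
        chain.append(lang_value)
    if "en" not in chain:
        chain.append("en")                          # final fallback
    return chain
```

Resolving the whole chain on the server keeps it to a single fetch per router, which avoids the three-fetch case (pt_BR, pt, en) raised in the IRC notes.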

On IRC the other day, str4d and I discussed limiting the display of old news on the console home page, and a possible /news page to see all news. The news feed wouldn't grow without bound, but would have several releases in it, with maybe a max of 10 or so entries.

This would require a more intelligent storage and display in the console. We could no longer convert the feed to an old-style news.xml file. We'd either have to store the news.atom.xml file as-received (and parse it at every startup), or store (i.e. cache) individual entries, as a traditional feed reader does. The latter option would give users the ability to store old news forever. There may be some complications with GUIDs and timestamps in the presence of updated translations to make this work well.

No decisions made yet.

comment:8 Changed 4 years ago by str4d

Here's a simple idea:

  • Each time news.atom.xml is fetched, generate the old-style news.xml file using the latest 3 entries.
  • On /console, display news.xml
  • On /news, display news.atom.xml, and link to it under news.xml on /console

This would only require minor changes to the existing code for /console, and a simple renderer for /news. We could then move to a more intelligent system later, if needed or desired.
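To make the "latest 3 entries" step concrete, here is a minimal Python sketch that picks the most recently updated entries out of an Atom document. It is not the attached patch (which targets the Java console code); the function name is hypothetical, and it assumes every entry carries an RFC 3339 `updated` element as Atom requires.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = "{http://www.w3.org/2005/Atom}"


def latest_entries(atom_xml, limit=3):
    """Return the titles of the `limit` most recently updated entries."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.findall(ATOM + "entry"):
        updated = entry.findtext(ATOM + "updated", default="")
        title = entry.findtext(ATOM + "title", default="")
        # fromisoformat() in older Python versions rejects a trailing "Z",
        # so normalize it to an explicit UTC offset first.
        when = datetime.fromisoformat(updated.strip().replace("Z", "+00:00"))
        entries.append((when, title))
    # Newest first, then truncate to the display limit.
    entries.sort(key=lambda item: item[0], reverse=True)
    return [title for _, title in entries[:limit]]
```

The same selection could feed the old-style news.xml for /console, while /news renders the full document.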

If we wanted to not generate after every fetch, we could generate based on the feed's updated datetime. It would require additional parsing in the news feed generator - currently the updated datetime is the time at which the feeds are generated, rather than the times that the entries and corresponding translations were updated. IMHO the saving would be insubstantial.

Changed 4 years ago by str4d

Attachment: console-news.patch added

Save news.atom.xml, limit entries displayed on /console to 3

comment:9 Changed 4 years ago by zzz

Milestone: changed from soon to 0.9.22
Owner: changed from str4d to zzz
Status: changed from accepted to assigned

Thanks for the patch. Will evaluate it (and the suggestion in comment 8), as well as alternatives, after Toronto, for inclusion in 0.9.22. It wasn't clear that you were proposing this last-minute for .21, but even if it were clear it was probably too late. Until we implement something, we'll have to keep the number of feed items low, which isn't a big deal.

Reassigning to myself.

comment:10 Changed 4 years ago by zzz

Milestone: changed from 0.9.22 to 0.9.23
Resolution: fixed
Status: changed from assigned to closed

In 0.9.22-2 692ca02289c553712d874fc7dd6313264925db6d and previous revs.

Decided to do it the hard way by storing news entries to disk individually by UUID. So didn't use the above patch. Implemented a "news manager" to handle it all, including migration from both news.xml and initialNews.xml formats. All is in net.i2p.router.news for use by Android as well if desired.
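The idea of persisting entries individually by UUID can be illustrated with a short Python sketch. This is not the net.i2p.router.news implementation (which is Java); the JSON file format, the file naming, and the function names are all assumptions made for illustration.

```python
import json
import os
import uuid


def store_entries(directory, entries):
    """Write each entry to its own file, named after its UUID.

    Each entry is a dict with at least "id" and "updated" keys
    (hypothetical format). Existing files for other UUIDs are left alone.
    """
    os.makedirs(directory, exist_ok=True)
    for entry in entries:
        # uuid.UUID() accepts the "urn:uuid:" form and rejects anything
        # that is not a UUID, so the result is safe as a file name.
        entry_id = str(uuid.UUID(entry["id"]))
        path = os.path.join(directory, entry_id + ".json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(entry, f)


def load_entries(directory):
    """Load all stored entries, newest first."""
    entries = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(".json"):
            with open(os.path.join(directory, name), encoding="utf-8") as f:
                entries.append(json.load(f))
    # Assumes "updated" timestamps share one format (e.g. UTC with "Z"),
    # so that string order matches chronological order.
    entries.sort(key=lambda e: e["updated"], reverse=True)
    return entries
```

Because a later fetch only overwrites files for the UUIDs it contains, an entry dropped from the feed stays on disk. An updated translation that keeps the same UUID replaces the earlier copy.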

Since we are persisting individual entries, they won't go away when removed from the feed. Enhancements may be required to the i2p.newsxml scripts to only include the last N entries in the feed while still keeping them for translation?

news.xml is still used to store metadata as before. News entries are included there as well but are no longer used in that format. We could store the metadata as real XML sometime in the future, but not necessary at this time.

The new /news page could use some CSS help still.