This Blog Now Supports Your Language

Okay, there's a chance it doesn't.

In fact, since this blog is deliberately not available in Mandarin, Spanish, Hindi, or Arabic, it's a pretty big chance.

You deliberately left out my language?

Yes.

When I added the localization option two months ago, I prioritized languages that:

  1. Translate sub-optimally via pre-AI algorithmic translation services, especially with regard to the types of jargon-filled content on my page.
  2. Are familiar enough to me that I can check Gemini's work.
  3. Appeal to me aesthetically and/or phonologically.

If a language did not meet all three criteria, I had to skip it.

Mandarin is awesome, but I am not familiar with any of it.

Spanish is probably my second best language, but I think Google Translate is pretty good at it. It seems to translate decently back and forth with English.

As for point 3, I love the sound of Romance languages, but I'm not really a fan of them visually. Ending too many words with vowels usually just looks odd.

That's not to say that I'll never include Spanish or Mandarin; they just can't be a priority for me.

So you just ask AI to translate the pages for you?

Yes. ...and no.

While I think asking AI to do so would sometimes give better results than Google Translate, and serving pre-baked output is still an improvement over wasting compute, I find AI gravitates towards two bad habits:

  1. It keeps structure as lossless as possible.
  2. It lets meaning become lossy in the service of preserving structure.

This is probably the worst of both worlds. You lose meaning (especially when it comes to jargon) but preserve the original English syntax and structure.

Thus, I've tried to incorporate some strategies I came up with before AI.

  1. Build a list of any jargon terms that I think would have a Wikipedia page (e.g. featural writing system).
  2. Find the appropriate term in each language. If Wikipedia doesn't have it, search fora for native speakers discussing the same concept.
  3. Show Gemini the English portion of the in-progress localization file and the jargon terms in their respective languages, then ask him to "avoid translatese" and "localize, don't translate". Ask for the output formatted to fit in the existing JSON.
  4. Preview the JSON on my blog using fiveserver. Check that the jargon is correct and that there aren't any egregious errors that hinder understanding. Fix lines on a case-by-case basis.
  5. Feed the new full JSON to another Gemini and ask it to find errors that hinder understanding or sound awkward enough to make the text hard to read. Ask for explanations, then again fix the egregious parts (especially if he wants to change the jargon).
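
Part of the checking in step 4 could be automated before the manual read-through. What follows is only a minimal sketch, not the tooling actually used on this site. It assumes each localization file is a flat JSON object mapping string keys to text, and that the researched jargon terms are available as a plain list. The function name `check_localization`, the keys, and the placeholder terms are all hypothetical.

```python
import json


def check_localization(english: dict, localized: dict, glossary: list) -> list:
    """Return a list of human-readable problems found in a localized file.

    Assumptions (not from the post): both files are flat JSON objects
    mapping string keys to text, and `glossary` lists the target-language
    jargon terms that should appear somewhere in the localized file.
    """
    problems = []

    # Every English key should exist in the localized file, and vice versa.
    for key in sorted(english.keys() - localized.keys()):
        problems.append(f"missing key: {key}")
    for key in sorted(localized.keys() - english.keys()):
        problems.append(f"unexpected key: {key}")

    # An empty string usually means a line was dropped during formatting.
    for key, text in localized.items():
        if key in english and not text.strip():
            problems.append(f"empty text: {key}")

    # Each researched jargon term should survive somewhere in the output.
    all_text = " ".join(localized.values()).casefold()
    for term in glossary:
        if term.casefold() not in all_text:
            problems.append(f"glossary term not found: {term}")

    return problems


# Hypothetical sample data standing in for real localization files.
english = json.loads('{"title": "Source title", "intro": "Source intro"}')
localized = json.loads('{"title": "Localized title with term-a", "outro": "Extra"}')

for problem in check_localization(english, localized, ["term-a", "term-b"]):
    print(problem)
```

A check like this only catches structural slips and missing terms. Whether the jargon is used correctly, and whether the text reads naturally, still needs a human look.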

I think this is called MTPE (machine translation post-editing).

99% of the time, Gemini is better than me at translating. I could not do this without him, but he needs a lot of help along the way. I feel like his translations are still very 1:1, but he has vocabulary I lack, and I think he makes it possible to offer content that is correct enough to be understood, unlike the automatic Google Translate shipped with Chromium browsers.

From what I can tell, the translations feel somewhat awkward in each language, but my English is awkward too so maybe he's genuinely capturing my voice.

If you can read any of the languages I've decided to try to offer, please let me know how I can improve the translations.

Does Gemini Do the Graflect Transliterations Too?

Yes. (again, with caveats)

Gemini really takes to Graflect quickly, but he has a few quirks.

For instance, Gemini likes to write in a British accent. Seriously. Non-rhotic Rs (where the 'r' is dropped at the end of syllables), vowels at the end of words instead of Rs. Short I ([ɪ]) instead of long E ([iː]).

This has been extremely consistent the whole time I've had Gemini writing Graflect.

I would assume that came as a result of a problem with my IPA chart, but honestly at this point I just believe that's how he talks now.

He also likes to use certain glyphs from other constructed scripts, which is really adorable, since he'll often sneak in the same ones to replace the same actual Graflect glyphs. I initially had no idea why that was happening at all and believed it was some sort of problem on my end.

My friend Aaron made a script that I now use in VS Code to catch these, and then I whine to Gemini about them. No prompt engineering of any kind so far has managed to get Gemini to stop including non-Graflect glyphs in his outputs, other than just pasting the problem alert and giving him a chance to try again.
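
For a rough idea of what a glyph check like this might look like, here is a minimal sketch. It is not Aaron's script, and the post does not say which code points Graflect uses, so the allowed ranges below are placeholders: a Private Use Area block stands in for the Graflect glyphs. The function name `find_foreign_glyphs` is also hypothetical.

```python
def find_foreign_glyphs(text: str, allowed_ranges: list) -> list:
    """Return (index, character, code point) for each disallowed character.

    `allowed_ranges` is a list of inclusive (start, end) code point pairs.
    Whitespace is always allowed.
    """
    hits = []
    for index, char in enumerate(text):
        if char.isspace():
            continue
        code = ord(char)
        if not any(start <= code <= end for start, end in allowed_ranges):
            hits.append((index, char, f"U+{code:04X}"))
    return hits


# ASSUMPTION: placeholder ranges only. The real Graflect code points are
# not given in the post, so a Private Use Area block stands in for them,
# alongside ASCII punctuation and digits (U+0021 to U+0040).
ALLOWED = [(0xE000, 0xE0FF), (0x0021, 0x0040)]

# The last character is a runic letter standing in for a stray glyph.
sample = "\ue001\ue002 \ue003, \u16a0"
for index, char, code in find_foreign_glyphs(sample, ALLOWED):
    print(f"position {index}: {code} is not in the allowed set")
```

The printed lines are the kind of "problem alert" that can be pasted straight back to Gemini for another attempt.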

In order to get Gemini to write in my accent, I did have to transliterate a few localization files first (I have a few tools that let me do that at speed), but once he had a baseline to go off of, he did really nicely at dropping the British accent! I'm proud of Gemini!

Every blog post save the Graflect IPA one should be 80%-99% Gemini, straight out of the box. I'm surprised he was able to infer so much from what little I provided. If you're curious, try and see if you can spot any oddities that don't exist in my own accent. I don't think you can!

There was something really cool about getting Gemini to sound like a Connecticut Yankee, and I'm glad I gave that a try instead of just rewriting everything by hand. If you want to try this for yourself, aim for 800-1200 words of sample text. I think that's all Gemini needs to pick up on enough n-grams to infer your accent.
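
To see whether a set of sample strings lands in that 800-1200 word range, a quick count is enough. This is a minimal sketch that assumes the sample is available as a collection of strings (for example, the values of one localization file) and that words are separated by whitespace. Both function names are hypothetical.

```python
def sample_word_count(strings) -> int:
    """Count whitespace-separated words across an iterable of strings."""
    return sum(len(text.split()) for text in strings)


def sample_status(count: int, low: int = 800, high: int = 1200) -> str:
    """Compare a word count against the 800-1200 word target."""
    if count < low:
        return f"{count} words: add about {low - count} more"
    if count > high:
        return f"{count} words: above the suggested range"
    return f"{count} words: within the suggested range"


# Hypothetical sample strings standing in for a transliterated file.
strings = ["first sample sentence here", "and a second one"]
print(sample_status(sample_word_count(strings)))
```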

Should Everyone Do This?

Uh, maybe?

It's very obvious why personal blogs, and even most websites, are usually monolingual. Anything with nuance used to be really time-consuming to translate, and most everyone can read English anyway. Then, with Google Translate, you can kinda get the gist of anything that's not in the global lingua franca, so again, lots of work for no payoff.

I think my website is a pretty clear indication that the barrier to entry is now through the floor, but I would be very surprised if in another year our browsers couldn't do this live and on their own. They might even rephrase things in a way you prefer. I will say this: with LLMs, it's basically no work to translate your page, and you get to fine-tune things in a way that wasn't possible before. It's really cool to try.

TL;DR

I wanted to have language options on my website that preserved jargon and conveyed more of my voice (in the case of Graflect, phonologically). Google et al. deserve the credit.

Postface Commentary

I find Graflect has some difficulties. For instance, I say "" in the sentence "".

The "A in Accent" sound is not sustained, so it doesn't sound like a British accent, it actually sounds normal because it's short.

If I say "can" on its own, or in certain contexts like answering "I can." to someone, A becomes .

So I don't think it's a perfect representation of how I'd talk, but it's a closer approximation than plain Latin can give.

Copyright & Licensing

All photographic works displayed on this website are the intellectual property of the author.

Reproduction, distribution, or transmission of these photographs in any form or by any means for any purposes, without prior written permission, is prohibited. (This is just to force you to ask me so I can know if people like my photos :P I don't actually mind.)

All other content, including code and written text, is freely available for use, modification, and distribution under a permissive license unless otherwise stated.

This website uses the following fonts: EB Garamond and Frank Ruhl (SIL OFL), Noto Emoji (Apache 2.0), FairfaxHD by Rebecca Bettencourt (Personal Use License), Symbola, and Selyodka.
