Tuesday, August 27, 2019

etymology - Reading and usage of 「垂オます」


A while ago I was chatting with a Japanese man, and when he introduced himself, instead of using 「申す」 or 「言う」 he used 「垂オます」, as in (His name)と垂オます.


Can this be used in the same way as 「申す」 and 「言う」, or is it reserved exclusively for introductions? How is it read? What's the origin? I initially assumed it was a typo, but it seems unusual to get those kanji from a typo for the above (unless I'm overlooking something).


I haven't talked to him since, so I haven't been able to ask him about it personally, but if you guys have seen it before or could clarify it, I would be very grateful.



Answer



It's not real Japanese. It's a munged version of 申します.




In Shift_JIS, only the first byte of a two-byte character is guaranteed to have the high bit set, which means the second byte can sometimes have the same value as a character in the ASCII range. This happens with 申 (U+7533), whose second byte is 0x5C, the same value as the ASCII backslash (\).
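If you'd like to check that byte claim without a shell, here is a minimal Python sketch. It assumes Python 3.8+ and its built-in `shift_jis` codec; the variable names are mine, purely for illustration.

```python
# Encode 申 (U+7533) as Shift_JIS and look at its two bytes.
encoded = "申".encode("shift_jis")
lead, trail = encoded  # unpacking bytes yields two integers

print(encoded.hex(" "))      # hex dump of the two bytes
print(lead >= 0x80)          # True if the lead byte has the high bit set
print(trail == ord("\\"))    # True if the trail byte equals ASCII backslash
```

The point is only that the trail byte, taken on its own, is indistinguishable from a backslash to any tool that looks at bytes rather than characters.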



If someone is using software that tries to strip backslashes in an encoding-unaware manner, that 0x5C \ will unfortunately go missing, munging the string and turning 申します into 垂オます.


Let's take a look at how 申します is encoded in Shift_JIS:


  $ echo -n '申します' | iconv -f UTF-8 -t SHIFT-JIS | hexdump -C
00000000 90 5c 82 b5 82 dc 82 b7 |.\......|

See? There's the backslash. Let's remove it with sed:


  $ echo -n '申します' | iconv -f UTF-8 -t SHIFT-JIS | sed 's,\\,,' | hexdump -C
00000000 90 82 b5 82 dc 82 b7 |.......|

And here's what the munged string looks like:



  $ echo '申します' | iconv -f UTF-8 -t SHIFT-JIS | sed 's,\\,,' | iconv -f SHIFT-JIS -t UTF-8
垂オます

So you're right, it's not a typo. Some software at some point down the line must have tried to remove backslashes while the string was Shift_JIS-encoded.
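For comparison, here is a rough Python sketch of the difference between stripping backslashes at the byte level and stripping them after decoding. Both helper functions are hypothetical names of my own, not taken from any particular piece of software, and I'm assuming Python's built-in `shift_jis` codec.

```python
def strip_backslashes_bytes(data: bytes) -> bytes:
    """Encoding-unaware: drops every 0x5C byte, even a trail byte of a two-byte character."""
    return data.replace(b"\x5c", b"")


def strip_backslashes_text(data: bytes, encoding: str = "shift_jis") -> bytes:
    """Encoding-aware: decode first, so only real backslash characters are removed."""
    text = data.decode(encoding)
    return text.replace("\\", "").encode(encoding)


raw = "申します".encode("shift_jis")

# Byte-level stripping eats the second byte of 申 and shifts everything after it.
munged = strip_backslashes_bytes(raw).decode("shift_jis")

# Character-level stripping finds no backslash character and leaves the text alone.
safe = strip_backslashes_text(raw).decode("shift_jis")

print(munged)
print(safe)
```

The first result is the munged greeting from the question; the second is the original 申します, because once the bytes are decoded there is no backslash character left to remove.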

