[Cialug] Dude, Where's My Sort?

Daniel A. Ramaley daniel.ramaley at drake.edu
Thu Mar 26 19:25:06 UTC 2020


The full-width characters were also added because when writing Asian
languages (where all characters are equal-width), if you have to include
a word in Roman letters or Arabic numerals it looks odd to have all
these half-width characters mixed in. The full-width characters make
text flow more naturally.
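
To show that the full-width digits really are separate characters from the ASCII ones, here is a small sketch that dumps their bytes. It assumes UTF-8 encoding and writes the digits as octal escapes so nothing depends on a mail client or terminal preserving them. The variable names are just for illustration.

```shell
# U+FF11 FULLWIDTH DIGIT ONE and U+FF12 FULLWIDTH DIGIT TWO, written as
# UTF-8 bytes in octal so the script itself stays pure ASCII.
fw_one=$(printf '\357\274\221')   # EF BC 91
fw_two=$(printf '\357\274\222')   # EF BC 92

# Hex-dump each character; tr strips the spacing that od adds.
fw_one_hex=$(printf '%s' "$fw_one" | od -An -tx1 | tr -d ' \n')
fw_two_hex=$(printf '%s' "$fw_two" | od -An -tx1 | tr -d ' \n')
ascii_one_hex=$(printf '%s' 1 | od -An -tx1 | tr -d ' \n')

printf 'fullwidth 1: %s\n' "$fw_one_hex"
printf 'fullwidth 2: %s\n' "$fw_two_hex"
printf 'ascii 1:     %s\n' "$ascii_one_hex"
```

Each full-width digit is three bytes in UTF-8, against one byte for the ASCII digit, and the two full-width digits differ only in the last byte.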

I have LC_COLLATE=C on all systems. I tried setting it to en_US.UTF-8 on
one system and sort still works correctly (as it should for the en_US
locale). I don't know what's wrong with the version of sort that you
have. Weird.
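
In case it helps when comparing machines, below is a rough sketch of how the collation locale gets picked from the environment. It follows the usual POSIX precedence (LC_ALL, then LC_COLLATE, then LANG). The function name effective_collate is made up for this example, and the "C" fallback is an assumption about the implementation default, which POSIX leaves up to the system.

```shell
# effective_collate LC_ALL LC_COLLATE LANG
# Prints the locale name that would govern collation for the given values.
effective_collate() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"      # LC_ALL overrides everything else
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"      # then the category-specific variable
    elif [ -n "$3" ]; then
        printf '%s\n' "$3"      # then the general LANG setting
    else
        printf '%s\n' 'C'       # assumed default when nothing is set
    fi
}

# Report for the current environment.
effective_collate "${LC_ALL:-}" "${LC_COLLATE:-}" "${LANG:-}"
```

Running the plain `locale` command on the machine itself is the more direct check; this only spells out the lookup order.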

Oddly, though, I have a RHEL 7 box which has the same version of sort as
your machine (8.22), and on that machine I get some weird results when
using en_US!

$ sort --version
sort (GNU coreutils) 8.22
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and Paul Eggert.
$ echo $LC_COLLATE
C
$ printf '%s\n' ２ １ | sort
１
２
$ export LC_COLLATE="en_US.UTF-8"
$ printf '%s\n' ２ １ | sort
２
１

It works correctly in the C locale, but not in en_US. Are you on RHEL 7
(or a variant such as CentOS or Oracle Linux)? I wonder if the locale
definitions are a little wonky on that system?

I tried it on both RHEL 6 and 8 systems and they work just fine. It
seems to be a bug in RHEL 7.
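
If anyone does get bitten by this, one possible workaround is to force the C locale for just the sort command, which makes it compare raw bytes. This is only a sketch: it assumes UTF-8 input, and the variable names are illustrative.

```shell
# Full-width digits written as UTF-8 octal escapes (U+FF11 and U+FF12).
fw_one=$(printf '\357\274\221')
fw_two=$(printf '\357\274\222')

# LC_ALL takes precedence over LC_COLLATE and LANG, so setting it for this
# one command forces byte-order comparison without touching the session.
sorted=$(printf '%s\n' "$fw_two" "$fw_one" | LC_ALL=C sort)

printf '%s\n' "$sorted"
```

Byte order is not the same thing as a proper language-aware collation, so this is only reasonable when plain code point order is good enough.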


On 3/26/20 1:28 PM, Todd Walton wrote:
> It's actually just a funkiness I found. It's not impacting what I'm doing.
> Turns out that once upon a time people using some Asian languages had to
> use more than 7 or 8 bits for their character encoding. But if they just
> used 16 or 24 bits for their characters and mixed those with the Western
> world's 8 bits, all sorts of formatting and display errors would happen. So
> they came up with "full-width" Latin characters, i.e. the regular Latin
> characters encoded with the same number of bits as used for Asian
> languages. These are full-width Latin characters.
> 
> But they're definitely different characters with a "natural" sorting order
> to them, and this is definitely a bug in the sort program... If I sort
> "2 1" and then sort "1 2" I should get the same output for both.
> 
> Like I said, it's not affecting me really. Just funny behavior.
> 
> --
> Todd
> 
> 
> On Thu, Mar 26, 2020 at 2:07 PM David Champion <dchamp1337 at gmail.com> wrote:
> 
>> Was your intent to sort non-ascii characters?
>>
>> Gotta take care when you cut and paste from Word or web pages, you'll get
>> goofy fancy quotes and other garbage that your shell script won't like.
>>
>> -dc
>>
>> On Thu, Mar 26, 2020, 12:57 PM Todd Walton <tdwalton at gmail.com> wrote:
>>
>>> LC_ALL=
>>> LC_COLLATE="en_US.UTF-8"
>>>
>>>
>>> On Thu, Mar 26, 2020 at 12:52 PM Daniel A. Ramaley <daniel.ramaley at drake.edu> wrote:
>>>
>>>> I did a bit more testing and tried it on some older machines that I have
>>>> access to. Sort works correctly with GNU sort versions 8.4 and 8.13.
>>>>
>>>> What are the locale settings on the machine where it doesn't work?
>>>> Particularly, what are LC_COLLATE and LC_ALL set to?
>>>>
>>>> On 3/26/20 11:34 AM, Daniel A. Ramaley wrote:
>>>>> The original message looked like it included the full-width versions of
>>>>> the numbers. Sort knows about Unicode and should sort those just fine.
>>>>> (Even if it doesn't know Unicode it should sort them, as I believe the
>>>>> binary representations of those would sort naturally.) And it does work
>>>>> on my machine (GNU sort 8.30).
>>>>>
>>>>> $ printf '%s\n' ２ １ | sort
>>>>> １
>>>>> ２
>>>>>
>>>>> On 3/26/20 11:23 AM, Scott Yates wrote:
>>>>>> Ya, those are not actual ascii numbers.
>>>>>>
>>>>>> On Thu, Mar 26, 2020 at 11:16 AM David Champion <dchamp1337 at gmail.com> wrote:
>>>>>>
>>>>>>> You've pasted some odd characters there for your numbers...
>>>>>>>
>>>>>>> Retyped and it works.
>>>>>>>
>>>>>>> $ printf '%s\n' 2 1 | sort
>>>>>>> 1
>>>>>>> 2
>>>>>>>
>>>>>>> -dc
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Mar 26, 2020 at 11:06 AM Todd Walton <tdwalton at gmail.com> wrote:
>>>>>>>
>>>>>>>> Check it out:
>>>>>>>>
>>>>>>>> [prompt]$ sort --version
>>>>>>>> sort (GNU coreutils) 8.22
>>>>>>>> Copyright (C) 2013 Free Software Foundation, Inc.
>>>>>>>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
>>>>>>>> This is free software: you are free to change and redistribute it.
>>>>>>>> There is NO WARRANTY, to the extent permitted by law.
>>>>>>>>
>>>>>>>> Written by Mike Haertel and Paul Eggert.
>>>>>>>>
>>>>>>>> [prompt]$ printf '%s\n' ２ １ | sort
>>>>>>>> ２
>>>>>>>> １
>>>>>>>>
>>>>>>>> It didn't sort.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Todd
>>>>>>>> _______________________________________________
>>>>>>>> Cialug mailing list
>>>>>>>> Cialug at cialug.org
>>>>>>>> https://www.cialug.org/cgi-bin/mailman/listinfo/cialug
>>>>>>>>
>>>>>
>>>>> __
>>>>> Daniel Ramaley
>>>>> Server Engineer 2, Information Technology Services
>>>>> Drake University
>>>>>
>>>>> T: +1-515-271-4540
>>>>> W: https://www.drake.edu/its

__
Daniel Ramaley
Server Engineer 2, Information Technology Services
Drake University

T: +1-515-271-4540
W: https://www.drake.edu/its

