[Cialug] Dude, Where's My Sort?

Todd Walton tdwalton at gmail.com
Thu Mar 26 19:39:01 UTC 2020


I'm on RHEL 7, yes. If it's just on 7 that's weird.
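
For anyone who wants to reproduce this without pasting odd characters out
of mail, here's a rough sketch. It assumes the characters in my original
message were the full-width digits U+FF11 and U+FF12 (that's Daniel's
reading of it), and the variable names are just mine for illustration.
LC_ALL=C overrides LC_COLLATE for that one command, so that run should be
comparing raw bytes rather than consulting the locale's collation tables.

```shell
# Full-width digits U+FF11 and U+FF12 (an assumption about what was pasted),
# written as UTF-8 octal escapes so this script itself stays pure ASCII.
# U+FF11 is the bytes EF BC 91, U+FF12 is EF BC 92.
one=$(printf '\357\274\221')
two=$(printf '\357\274\222')

# Dump the raw bytes to confirm these are not ASCII 0x31 / 0x32.
printf '%s\n' "$two" "$one" | od -An -tx1

# Byte-order collation, forced for this one command only.
sorted_c=$(printf '%s\n' "$two" "$one" | LC_ALL=C sort)

# Whatever collation the environment is configured for. On the RHEL 7
# box with en_US.UTF-8 this is the run that came back unsorted.
sorted_env=$(printf '%s\n' "$two" "$one" | sort)

printf 'C locale:\n%s\n' "$sorted_c"
printf 'current locale:\n%s\n' "$sorted_env"
```

If the two results differ, that points at the locale's collation data
rather than at the input.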

--
Todd


On Thu, Mar 26, 2020 at 3:25 PM Daniel A. Ramaley <daniel.ramaley at drake.edu>
wrote:

> The full-width characters were also added because when writing Asian
> languages (where all characters are equal-width), if you have to include
> a word in Roman letters or Arabic numerals it looks odd to have all
> these half-width characters mixed in. The full-width characters make
> text flow more naturally.
>
> I have LC_COLLATE=C on all systems. I tried setting it to en_US.UTF-8 on
> one system and sort still works correctly (as it should for the en_US
> locale). I don't know what's wrong with the version of sort that you
> have. Weird.
>
> Oddly though, i have a RHEL 7 box which has the same version of sort as
> your machine (8.22). And on that machine i get some weird results when
> using en_US!
>
> $ sort --version
> sort (GNU coreutils) 8.22
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>.
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
>
> Written by Mike Haertel and Paul Eggert.
> $ echo $LC_COLLATE
> C
> $ printf '%s\n' ２ １ | sort
> １
> ２
> $ export LC_COLLATE="en_US.UTF-8"
> $ printf '%s\n' ２ １ | sort
> ２
> １
>
> It works correctly in C, but not in en_US. Are you on RHEL 7 (or variant
> such as CentOS or Oracle)? I wonder if the locale definitions are a
> little wonky on that system?
>
> I tried it on both RHEL 6 and 8 systems and they work just fine. It
> seems to be a bug in RHEL 7.
>
>
> On 3/26/20 1:28 PM, Todd Walton wrote:
> > It's actually just a funkiness I found. It's not impacting what I'm doing.
> > Turns out that once upon a time people using some Asian languages had to
> > use more than 7 or 8 bits for their character encoding. But if they just
> > used 16 or 24 bits for their characters and mixed those with the Western
> > world's 8 bits, all sorts of formatting and display errors would happen. So
> > they came up with "full-width" Latin characters, i.e. the regular Latin
> > characters encoded with the same number of bits as used for Asian
> > languages. These are full-width Latin characters.
> >
> > But they're definitely different characters with a "natural" sorting order
> > to them, and this is definitely a bug in the sort program...... If I sort
> > "2 1" and then sort "1 2" I should get the same output for both.
> >
> > Like I said, it's not affecting me really. Just funny behavior.
> >
> > --
> > Todd
> >
> >
> > On Thu, Mar 26, 2020 at 2:07 PM David Champion <dchamp1337 at gmail.com>
> > wrote:
> >
> >> Was your intent to sort non-ascii characters?
> >>
> >> Gotta take care when you cut and paste from Word or web pages, you'll get
> >> goofy fancy quotes and other garbage that your shell script won't like.
> >>
> >> -dc
> >>
> >> On Thu, Mar 26, 2020, 12:57 PM Todd Walton <tdwalton at gmail.com> wrote:
> >>
> >>> LC_ALL=
> >>> LC_COLLATE="en_US.UTF-8"
> >>>
> >>>
> >>> On Thu, Mar 26, 2020 at 12:52 PM Daniel A. Ramaley
> >>> <daniel.ramaley at drake.edu> wrote:
> >>>
> >>>> I did a bit more testing and tried it on some older machines that i have
> >>>> access to. Sort works correctly with GNU sort versions 8.4 and 8.13.
> >>>>
> >>>> What are the locale settings on the machine where it doesn't work?
> >>>> Particularly, what are LC_COLLATE and LC_ALL set to?
> >>>>
> >>>> On 3/26/20 11:34 AM, Daniel A. Ramaley wrote:
> >>>>> The original message looked like it included the full-width versions of
> >>>>> the numbers. Sort knows about Unicode and should sort those just fine.
> >>>>> (Even if it doesn't know Unicode it should sort them, as i believe the
> >>>>> binary representations of those would sort naturally.) And it does work
> >>>>> on my machine (GNU sort 8.30).
> >>>>>
> >>>>> $ printf '%s\n' ２ １ | sort
> >>>>> １
> >>>>> ２
> >>>>>
> >>>>> On 3/26/20 11:23 AM, Scott Yates wrote:
> >>>>>> Ya, those are not actual ascii numbers.
> >>>>>>
> >>>>>> On Thu, Mar 26, 2020 at 11:16 AM David Champion
> >>>>>> <dchamp1337 at gmail.com> wrote:
> >>>>>>
> >>>>>>> You've pasted some odd characters there for your numbers...
> >>>>>>>
> >>>>>>> Retyped and it works.
> >>>>>>>
> >>>>>>> $ printf '%s\n' 2 1 | sort
> >>>>>>> 1
> >>>>>>> 2
> >>>>>>>
> >>>>>>> -dc
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, Mar 26, 2020 at 11:06 AM Todd Walton <tdwalton at gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Check it out:
> >>>>>>>>
> >>>>>>>> [prompt]$ sort --version
> >>>>>>>> sort (GNU coreutils) 8.22
> >>>>>>>> Copyright (C) 2013 Free Software Foundation, Inc.
> >>>>>>>> License GPLv3+: GNU GPL version 3 or later
> >>>>>>>> <http://gnu.org/licenses/gpl.html>.
> >>>>>>>> This is free software: you are free to change and redistribute it.
> >>>>>>>> There is NO WARRANTY, to the extent permitted by law.
> >>>>>>>>
> >>>>>>>> Written by Mike Haertel and Paul Eggert.
> >>>>>>>>
> >>>>>>>> [prompt]$ printf '%s\n' ２ １ | sort
> >>>>>>>> ２
> >>>>>>>> １
> >>>>>>>>
> >>>>>>>> It didn't sort.
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Todd
> >>>>>>>> _______________________________________________
> >>>>>>>> Cialug mailing list
> >>>>>>>> Cialug at cialug.org
> >>>>>>>> https://www.cialug.org/cgi-bin/mailman/listinfo/cialug
> >>>>>>>>
> >>>>>
> >>>>> __
> >>>>> Daniel Ramaley
> >>>>> Server Engineer 2, Information Technology Services
> >>>>> Drake University
> >>>>>
> >>>>> T: +1-515-271-4540
> >>>>> W: https://www.drake.edu/its

