[Cialug] Dude, Where's My Sort?

Todd Walton tdwalton at gmail.com
Thu Mar 26 18:28:16 UTC 2020


It's actually just a bit of funkiness I found. It's not impacting what I'm
doing. Turns out that, once upon a time, people using some Asian languages
had to use more than 7 or 8 bits for their character encoding. But if they
just used 16 or 24 bits for their characters and mixed those with the
Western world's 8 bits, all sorts of formatting and display errors would
happen. So they came up with "full-width" Latin characters, i.e. the regular
Latin characters encoded with the same number of bits as used for Asian
languages. The digits I pasted are full-width Latin characters.
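
For anyone who wants to see the difference, here is a rough sketch that dumps the bytes. It assumes the pasted digits were U+FF11 and U+FF12 (FULLWIDTH DIGIT ONE and TWO) encoded as UTF-8; the variable names are made up for illustration, and the digits are written as octal escapes so they survive copy and paste.

```sh
# Full-width digit one (U+FF11) and two (U+FF12) as UTF-8 octal escapes.
# These variable names are hypothetical, chosen only for this example.
fw_one='\357\274\221'
fw_two='\357\274\222'

# Each full-width digit is three bytes in UTF-8 (ef bc 91 / ef bc 92),
# while the ASCII digit "1" is the single byte 31.
printf "$fw_one" | od -An -tx1
printf "$fw_two" | od -An -tx1
printf '1' | od -An -tx1
```

The exact spacing of the od output varies between systems, but the hex bytes themselves should not.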

But they're definitely different characters with a "natural" sorting order
to them, and this is definitely a bug in the sort program. If I sort
"2 1" and then sort "1 2" I should get the same output for both.
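
A quick way to check that property, sketched under the same assumption that the digits are U+FF11 and U+FF12 in UTF-8: feed the two lines in both orders and compare the results. Setting LC_ALL=C here is my own addition, not something tried earlier in the thread; it takes locale collation out of the picture so sort compares raw bytes.

```sh
# Sort the same two full-width digits given in both input orders.
# LC_ALL=C forces byte-wise comparison instead of locale collation.
a=$(printf '\357\274\222\n\357\274\221\n' | LC_ALL=C sort)
b=$(printf '\357\274\221\n\357\274\222\n' | LC_ALL=C sort)

# With byte-wise comparison both runs should agree.
if [ "$a" = "$b" ]; then
    echo "same output"
else
    echo "different output"
fi
```

Dropping LC_ALL=C from both commands repeats the check under the current locale, where the result may depend on the system's collation data.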

Like I said, it's not affecting me really. Just funny behavior.

--
Todd


On Thu, Mar 26, 2020 at 2:07 PM David Champion <dchamp1337 at gmail.com> wrote:

> Was your intent to sort non-ascii characters?
>
> Gotta take care when you cut and paste from Word or web pages, you'll get
> goofy fancy quotes and other garbage that your shell script won't like.
>
> -dc
>
> On Thu, Mar 26, 2020, 12:57 PM Todd Walton <tdwalton at gmail.com> wrote:
>
> > LC_ALL=
> > LC_COLLATE="en_US.UTF-8"
> >
> >
> > On Thu, Mar 26, 2020 at 12:52 PM Daniel A. Ramaley <daniel.ramaley at drake.edu> wrote:
> >
> > > I did a bit more testing and tried it on some older machines that i
> > > have access to. Sort works correctly with GNU sort versions 8.4 and
> > > 8.13.
> > >
> > > What are the locale settings on the machine where it doesn't work?
> > > Particularly, what are LC_COLLATE and LC_ALL set to?
> > >
> > > On 3/26/20 11:34 AM, Daniel A. Ramaley wrote:
> > > > The original message looked like it included the full-width versions
> > > > of the numbers. Sort knows about Unicode and should sort those just
> > > > fine. (Even if it doesn't know Unicode it should sort them, as i
> > > > believe the binary representations of those would sort naturally.)
> > > > And it does work on my machine (GNU sort 8.30).
> > > >
> > > > $ printf '%s\n' 2 1 | sort
> > > > 1
> > > > 2
> > > >
> > > > On 3/26/20 11:23 AM, Scott Yates wrote:
> > > >> Ya, those are not actual ascii numbers.
> > > >>
> > > >> On Thu, Mar 26, 2020 at 11:16 AM David Champion <dchamp1337 at gmail.com> wrote:
> > > >>
> > > >>> You've pasted some odd characters there for your numbers...
> > > >>>
> > > >>> Retyped and it works.
> > > >>>
> > > >>> $ printf '%s\n' 2 1 | sort
> > > >>> 1
> > > >>> 2
> > > >>>
> > > >>> -dc
> > > >>>
> > > >>>
> > > >>> On Thu, Mar 26, 2020 at 11:06 AM Todd Walton <tdwalton at gmail.com> wrote:
> > > >>>
> > > >>>> Check it out:
> > > >>>>
> > > >>>> [prompt]$ sort --version
> > > >>>> sort (GNU coreutils) 8.22
> > > >>>> Copyright (C) 2013 Free Software Foundation, Inc.
> > > >>>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
> > > >>>> This is free software: you are free to change and redistribute it.
> > > >>>> There is NO WARRANTY, to the extent permitted by law.
> > > >>>>
> > > >>>> Written by Mike Haertel and Paul Eggert.
> > > >>>>
> > > >>>> [prompt]$ printf '%s\n' 2 1 | sort
> > > >>>> 2
> > > >>>> 1
> > > >>>>
> > > >>>> It didn't sort.
> > > >>>>
> > > >>>> --
> > > >>>> Todd
> > > >>>> _______________________________________________
> > > >>>> Cialug mailing list
> > > >>>> Cialug at cialug.org
> > > >>>> https://www.cialug.org/cgi-bin/mailman/listinfo/cialug
> > > >>>>
> > > >>>
> > > >>
> > > >
> > > > __
> > > > Daniel Ramaley
> > > > Server Engineer 2, Information Technology Services
> > > > Drake University
> > > >
> > > > T: +1-515-271-4540
> > > > W: https://www.drake.edu/its
> > > >
> > >
> > > __
> > > Daniel Ramaley
> > > Server Engineer 2, Information Technology Services
> > > Drake University
> > >
> > > T: +1-515-271-4540
> > > W: https://www.drake.edu/its
> > >
> >
>

