[Cialug] Columns of Data

jim kraai jimgkraai at gmail.com
Fri Jul 31 16:15:33 UTC 2020


fixed widths can help if they can be specified in the cli tool
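
A minimal Python sketch of slicing fixed-width columns, assuming the widths are already known. The WIDTHS values below are hypothetical, picked to fit one padded row. A tool like kubectl sizes its columns to the data, so real widths can change from run to run unless the tool lets you pin them. The last column is taken as "whatever is left", so a value with spaces in it survives.

```python
from typing import List

def split_fixed(line: str, widths: List[int]) -> List[str]:
    """Cut a line into fixed-width fields; the final field gets the remainder."""
    fields = []
    pos = 0
    for width in widths:
        fields.append(line[pos:pos + width].strip())
        pos += width
    # Whatever is left over is the (possibly space-containing) last column.
    fields.append(line[pos:].strip())
    return fields

# Hypothetical widths: 9, 10, 19 and 45 characters for the first four columns.
WIDTHS = [9, 10, 19, 45]

# Build one row padded to those widths, the way a tabular CLI tool might.
row = (f"{'9m47s':<9}{'Normal':<10}{'Updated':<19}"
       f"{'machine/oo-r6sr3-master-1':<45}Updated machine oo-r6sr3-master-1")

print(split_fixed(row, WIDTHS))
```

If a value is ever longer than its column, the slicing silently cuts it into two fields, so this is only safe when the tool guarantees the widths.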

if the order of columns can be specified and only one column is
variable-length, then put the variable column last for parsing
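
With the variable column last, splitting on whitespace a fixed number of times is enough: the remainder comes back as a single field with its internal spacing intact. That is the effect Todd was after with '{ print $5- }' (standard awk has no such field-range syntax). A minimal Python sketch; split_leading is a hypothetical helper name:

```python
from typing import List

def split_leading(line: str, n_fixed: int) -> List[str]:
    """Split off the first n_fixed whitespace-separated columns.

    Everything after them is returned untouched as one final field,
    i.e. "column n_fixed+1 onwards".
    """
    # str.split(None, n) does at most n splits; the rest stays as one string.
    return line.rstrip("\n").split(None, n_fixed)

line = ("Jun 28 02:50:42 ilm01-ll-ttwalto kernel: IPv6: ADDRCONF(NETDEV_UP): "
        "wlp1s0: link is not ready")
fields = split_leading(line, 4)
print(fields[:4])  # the predictable leading columns
print(fields[4])   # column 5 onwards, spaces intact
```

In plain bash the same idea is built into read: with `read -r a b c d rest`, the first four words go into a through d and the rest of the line, intervening whitespace included, lands in rest.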

if the tool options allow for a custom field separator, use something
weird, like '~~~', that won't normally appear in the data
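
If the tool can be told to emit something like '~~~' between fields, the parsing side reduces to a plain string split. A minimal Python sketch, assuming one record per line. The sample record is the clusterautoscaler row from Todd's output re-joined with '~~~' by hand, and the optional field-count check is an added safeguard in case the separator ever does turn up inside the data:

```python
from typing import List, Optional

SEP = "~~~"  # assumed custom separator; must never occur inside a field

def split_custom(line: str, sep: str = SEP, expected: Optional[int] = None) -> List[str]:
    """Split one record on a custom separator, optionally checking the field count."""
    fields = line.rstrip("\n").split(sep)
    if expected is not None and len(fields) != expected:
        # A wrong count usually means the separator leaked into the data.
        raise ValueError(f"expected {expected} fields, got {len(fields)}: {line!r}")
    return fields

record = ("59s~~~Normal~~~SuccessfulUpdate~~~clusterautoscaler/default~~~"
          "Updated ClusterAutoscaler deployment: machine-api/cluster-autoscaler-default")
print(split_custom(record, expected=5))
```

From the shell, awk can do the same split directly, since a multi-character FS is treated as a regular expression: `awk -F'~~~' '{ print $5 }'`.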

if the tool options allow for standard or field delimiters, use those
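
For the comma-inside-a-quoted-field problem Todd raised, a real CSV parser handles the quoting, and its output can then be re-emitted in a form that cut and awk cope with. A minimal Python sketch using the standard csv module; csv_to_tsv is a hypothetical helper name, and it assumes no field contains a tab or newline (quoted CSV fields legally can), rejecting rows that do:

```python
import csv
import io
from typing import TextIO

def csv_to_tsv(src: TextIO, dst: TextIO) -> None:
    """Parse quoted CSV from src and write tab-separated lines to dst."""
    for row in csv.reader(src):
        # A tab or newline inside a field would corrupt the TSV output.
        if any("\t" in field or "\n" in field for field in row):
            raise ValueError(f"field contains a tab or newline: {row!r}")
        dst.write("\t".join(row) + "\n")

# Hypothetical sample: the second field contains a comma, protected by quotes.
sample = 'host,note\nserver1,"up, but slow"\n'
out = io.StringIO()
csv_to_tsv(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

Saved as a small script that calls csv_to_tsv(sys.stdin, sys.stdout), it could sit in a pipeline ahead of `cut -f2` or `awk -F'\t'`.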

dive into the tool's source code and add whatever separators or
delimiters would be useful


On Fri, Jul 31, 2020 at 10:18 AM Todd Walton <tdwalton at gmail.com> wrote:

> Happy SysAdmin Day, everyone!
>
> Here is an example bit of text coughed up by a Kubernetes command-line
> tool:
>
> 9m49s    Normal    Updated            machine/oo-r6sr3-worker-us-east-1d-nvfzh     Updated machine oo-r6sr3-worker-us-east-1d-nvfzh
> 9m47s    Normal    Updated            machine/oo-r6sr3-master-1                    Updated machine oo-r6sr3-master-1
> 9m46s    Normal    Updated            machine/oo-r6sr3-worker-us-east-1b-hsmsx     Updated machine oo-r6sr3-worker-us-east-1b-hsmsx
> 9m46s    Normal    Updated            machine/oo-r6sr3-worker-us-east-1a-wk5cs     Updated machine oo-r6sr3-worker-us-east-1a-wk5cs
> 9m46s    Normal    Updated            machine/oo-r6sr3-worker-us-east-1e-z9xlb     Updated machine oo-r6sr3-worker-us-east-1e-z9xlb
> 9m44s    Normal    Updated            machine/oo-r6sr3-master-0                    Updated machine oo-r6sr3-master-0
> 9m43s    Normal    Updated            machine/oo-r6sr3-master-2                    Updated machine oo-r6sr3-master-2
> 9m43s    Normal    Updated            machine/oo-r6sr3-worker-us-east-1d-tfg6x     Updated machine oo-r6sr3-worker-us-east-1d-tfg6x
> 9m43s    Normal    Updated            machine/oo-r6sr3-worker-us-east-1c-6l42j     Updated machine oo-r6sr3-worker-us-east-1c-6l42j
> 59s      Normal    SuccessfulUpdate   clusterautoscaler/default                    Updated ClusterAutoscaler deployment: machine-api/cluster-autoscaler-default
> 4m7s     Normal    Pulled             pod/gateway-laravel-schedule-1296080-h43n6   Container image "dockerregistry:4567/group/gateway/master:alpine-nodejs-fpm" already present on machine
>
> For the purpose of this email, never mind the semantics. This could
> be anything. But do notice that the output is arranged into neat columns.
> The first four columns are strings of non-space characters. The fifth
> column, however, gives us trouble. It seeks to undermine the movement from
> within, throwing a wrench into the works. Fifth columns, amiright?
>
> Here's another example, this one taken from my /var/log/messages:
>
> Jun 28 02:50:42 ilm01-ll-ttwalto NetworkManager[2238]: <info>  device (wlp1s0): set-hw-addr: set MAC address to 9:7:8:9:2:F (scanning)
> Jun 28 02:50:42 ilm01-ll-ttwalto kernel: IPv6: ADDRCONF(NETDEV_UP): wlp1s0: link is not ready
> Jun 28 02:50:42 ilm01-ll-ttwalto NetworkManager[2238]: <info>  device (wlp1s0): supplicant interface state: inactive -> disabled
> Jun 28 02:50:42 ilm01-ll-ttwalto NetworkManager[2238]: <info>  device (wlp1s0): supplicant interface state: disabled -> inactive
> Jun 28 02:55:57 ilm01-ll-ttwalto NetworkManager[2238]: <info>  device (wlp1s0): set-hw-addr: set MAC address to 62:0F:6E:7A:B3:2C (scanning)
> Jun 28 02:55:57 ilm01-ll-ttwalto kernel: IPv6: ADDRCONF(NETDEV_UP): wlp1s0: link is not ready
> Jun 28 02:55:57 ilm01-ll-ttwalto NetworkManager[2238]: <info>  device (wlp1s0): supplicant interface state: inactive -> disabled
>
> Here again the fifth column is making things difficult. Also the first
> three could certainly stand to be one column, but at least they're
> standard, predictable, and manipulable. Manipulable being what I'm looking
> for.
>
> This happens frequently, where a command or log outputs text in columns
> that are not quite usefully arranged. How does one deal with columnar data
> like this? I can't use 'cut'. What would I cut on that would capture the
> first columns *and* keep the last one intact? I'm not sure how one would
> easily use awk for this. Is there something like '{ print $5- }'? Meaning,
> from column 5 onwards? I can't use "column -t" because that screws
> everything up royally.
>
> Another thing that trips me up. Sometimes I'll have a nice set of
> comma-separated values but there'll be a comma in one of the fields. The
> typical way of dealing with this in CSV files is to quote the entire field.
> But that doesn't help me, the bash scripter.
>
> Any suggestions for how to deal with stuff like this?
>
> --
> Todd
> _______________________________________________
> Cialug mailing list
> Cialug at cialug.org
> https://www.cialug.org/cgi-bin/mailman/listinfo/cialug
>

