[Cialug] Columns of Data

Jeffrey Ollie jeff at ocjtech.us
Fri Jul 31 16:28:11 UTC 2020


I'd use Python (or Perl, if that floats your boat) regular expressions to
split up the file. Notice that the "start" of each "line" is a timestamp
or duration with an easily recognized pattern, so a regular expression can
treat it as the boundary between records. See the code here:

https://gist.github.com/jcollie/02cd24dffb695210fdeaece31925c7bc

$ python3 test.py < test.txt
('9m49s', 'Normal', 'Updated', 'machine/oo-r6sr3-worker-us-east-1d-nvfzh', 'Updated machine\noo-r')
('9m47s', 'Normal', 'Updated', 'machine/oo-r6sr3-master-1', 'Updated machine oo-r')
('9m46s', 'Normal', 'Updated', 'machine/oo-r6sr3-worker-us-east-1b-hsmsx', 'Updated machine\noo-r')
('9m46s', 'Normal', 'Updated', 'machine/oo-r6sr3-worker-us-east-1a-wk5cs', 'Updated machine\noo-r')
('9m46s', 'Normal', 'Updated', 'machine/oo-r6sr3-worker-us-east-1e-z9xlb', 'Updated machine\noo-r')
('9m44s', 'Normal', 'Updated', 'machine/oo-r6sr3-master-0', 'Updated machine oo-r')
('9m43s', 'Normal', 'Updated', 'machine/oo-r6sr3-master-2', 'Updated machine oo-r')
('9m43s', 'Normal', 'Updated', 'machine/oo-r6sr3-worker-us-east-1d-tfg6x', 'Updated machine\noo-r')
('9m43s', 'Normal', 'Updated', 'machine/oo-r6sr3-worker-us-east-1c-6l42j', 'Updated machine\noo-r')
('59s', 'Normal', 'SuccessfulUpdate', 'clusterautoscaler/default', 'Updated ClusterAutoscaler deployment:\nmachine-api/cluster-autoscaler-default\n')
('4m7s', 'Normal', 'Pulled', 'pod/gateway-laravel-schedule-1296080-h43n6', 'Container image\n"dockerregistry:4567/group/gateway/master:alpine-nodejs-fpm" already\npresent on machine\n')
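
For anyone who can't reach the gist, here is a rough sketch of the same
idea. It is not the code from the gist, just one way it could be done.
It assumes every record starts at the beginning of a line with an age
like "59s" or "4m7s", and that no continuation line happens to start that
way. The names AGE, RECORD, and parse_events are made up for this example.
This version also folds wrapped message lines back into a single line.

```python
import re

# Assumption: each record starts at the beginning of a line with a
# Kubernetes-style age such as "59s", "4m7s" or "2d3h".
AGE = r"(?:\d+[dhms])+"

# Four whitespace-free fields, then a free-form message that runs until
# the next line that starts with an age (or the end of the input).
RECORD = re.compile(
    rf"^(?P<age>{AGE})\s+(?P<type>\S+)\s+(?P<reason>\S+)\s+(?P<object>\S+)"
    rf"\s+(?P<message>.*?)(?=^{AGE}\s|\Z)",
    re.MULTILINE | re.DOTALL,
)


def parse_events(text):
    """Yield (age, type, reason, object, message) tuples."""
    for m in RECORD.finditer(text):
        # Fold wrapped lines and runs of spaces into single spaces.
        message = " ".join(m["message"].split())
        yield m["age"], m["type"], m["reason"], m["object"], message


# Two records copied from the quoted example, wrapped as they arrived.
SAMPLE = """\
9m49s    Normal    Updated
 machine/oo-r6sr3-worker-us-east-1d-nvfzh    Updated machine
oo-r6sr3-worker-us-east-1d-nvfzh
59s      Normal    SuccessfulUpdate   clusterautoscaler/default
      Updated ClusterAutoscaler deployment:
machine-api/cluster-autoscaler-default
"""

for event in parse_events(SAMPLE):
    print(event)
```

Because "\s+" also matches newlines, the first four fields are picked up
correctly even when the mail client wrapped the line between them.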

On Fri, Jul 31, 2020 at 10:18 AM Todd Walton <tdwalton at gmail.com> wrote:

> Happy SysAdmin Day, everyone!
>
> Here is an example bit of text coughed up by a Kubernetes command-line
> tool:
>
> 9m49s    Normal    Updated
>  machine/oo-r6sr3-worker-us-east-1d-nvfzh    Updated machine
> oo-r6sr3-worker-us-east-1d-nvfzh
> 9m47s    Normal    Updated            machine/oo-r6sr3-master-1
>       Updated machine oo-r6sr3-master-1
> 9m46s    Normal    Updated
>  machine/oo-r6sr3-worker-us-east-1b-hsmsx    Updated machine
> oo-r6sr3-worker-us-east-1b-hsmsx
> 9m46s    Normal    Updated
>  machine/oo-r6sr3-worker-us-east-1a-wk5cs    Updated machine
> oo-r6sr3-worker-us-east-1a-wk5cs
> 9m46s    Normal    Updated
>  machine/oo-r6sr3-worker-us-east-1e-z9xlb    Updated machine
> oo-r6sr3-worker-us-east-1e-z9xlb
> 9m44s    Normal    Updated            machine/oo-r6sr3-master-0
>       Updated machine oo-r6sr3-master-0
> 9m43s    Normal    Updated            machine/oo-r6sr3-master-2
>       Updated machine oo-r6sr3-master-2
> 9m43s    Normal    Updated
>  machine/oo-r6sr3-worker-us-east-1d-tfg6x    Updated machine
> oo-r6sr3-worker-us-east-1d-tfg6x
> 9m43s    Normal    Updated
>  machine/oo-r6sr3-worker-us-east-1c-6l42j    Updated machine
> oo-r6sr3-worker-us-east-1c-6l42j
> 59s      Normal    SuccessfulUpdate   clusterautoscaler/default
>       Updated ClusterAutoscaler deployment:
> machine-api/cluster-autoscaler-default
> 4m7s     Normal    Pulled
> pod/gateway-laravel-schedule-1296080-h43n6  Container image
> "dockerregistry:4567/group/gateway/master:alpine-nodejs-fpm" already
> present on machine
>
> For the purpose of this email, never mind the semantics. This could
> be anything. But do notice that the output is arranged into neat columns.
> The first four columns are strings of non-space characters. The fifth
> column, however, gives us trouble. It seeks to undermine the movement from
> within, throwing a wrench into the works. Fifth columns, amiright?
>
> Here's another example, this one taken from my /var/log/messages:
>
> Jun 28 02:50:42 ilm01-ll-ttwalto NetworkManager[2238]: <info>  device
> (wlp1s0): set-hw-addr: set MAC address to 9:7:8:9:2:F (scanning)
> Jun 28 02:50:42 ilm01-ll-ttwalto kernel: IPv6: ADDRCONF(NETDEV_UP): wlp1s0:
> link is not ready
> Jun 28 02:50:42 ilm01-ll-ttwalto NetworkManager[2238]: <info>  device
> (wlp1s0): supplicant interface state: inactive -> disabled
> Jun 28 02:50:42 ilm01-ll-ttwalto NetworkManager[2238]: <info>  device
> (wlp1s0): supplicant interface state: disabled -> inactive
> Jun 28 02:55:57 ilm01-ll-ttwalto NetworkManager[2238]: <info>  device
> (wlp1s0): set-hw-addr: set MAC address to 62:0F:6E:7A:B3:2C (scanning)
> Jun 28 02:55:57 ilm01-ll-ttwalto kernel: IPv6: ADDRCONF(NETDEV_UP): wlp1s0:
> link is not ready
> Jun 28 02:55:57 ilm01-ll-ttwalto NetworkManager[2238]: <info>  device
> (wlp1s0): supplicant interface state: inactive -> disabled
>
> Here again the fifth column is making things difficult. Also the first
> three could certainly stand to be one column, but at least they're
> standard, predictable, and manipulable. Manipulable being what I'm looking
> for.
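
For syslog-style lines, a regular expression with named groups can fold
the three timestamp fields into one column and keep the message intact.
A rough sketch, assuming the traditional "Mon DD HH:MM:SS host tag[pid]:
message" layout and one record per line in the actual log file. SYSLOG
and parse_syslog are names invented for this example.

```python
import re

# Assumption: traditional syslog layout, one record per line, e.g.
#   Jun 28 02:50:42 host program[pid]: message text
SYSLOG = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<tag>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?:\s*"
    r"(?P<message>.*)$"
)


def parse_syslog(line):
    """Return a dict of fields, or None if the line does not match."""
    m = SYSLOG.match(line.rstrip("\n"))
    return m.groupdict() if m else None


line = (
    "Jun 28 02:50:42 ilm01-ll-ttwalto kernel: "
    "IPv6: ADDRCONF(NETDEV_UP): wlp1s0: link is not ready"
)
record = parse_syslog(line)
print(record["timestamp"], "|", record["tag"], "|", record["message"])
```

The pid group is optional, so lines from the kernel (which carry no pid)
come back with "pid" set to None instead of failing to match.
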
>
> This happens frequently, where a command or log outputs text in columns
> that are not quite usefully arranged. How does one deal with columnar data
> like this? I can't use 'cut'. What would I cut on that would capture the
> first columns *and* keep the last one intact? I'm not sure how one would
> easily use awk for this. Is there something like '{ print $5- }'? Meaning,
> from column 5 onwards? I can't use "column -t" because that screws
> everything up royally.
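
As far as I know awk has no '$5-' shorthand, but Python's str.split()
takes a maximum number of splits, which amounts to "everything from
column 5 onwards stays together". A minimal sketch, assuming columns are
separated by runs of whitespace and each record sits on one line.
split_columns is a made-up helper name.

```python
def split_columns(line, n):
    """Split a line into at most n columns on runs of whitespace.

    The last column keeps whatever is left, inner spaces included.
    """
    return line.strip().split(None, n - 1)


line = (
    "9m47s    Normal    Updated            machine/oo-r6sr3-master-1"
    "          Updated machine oo-r6sr3-master-1"
)
columns = split_columns(line, 5)
print(columns[4])
```

Passing None as the separator makes split() treat any run of spaces or
tabs as one delimiter, so the column padding does not produce empty
fields.
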
>
> Another thing that trips me up. Sometimes I'll have a nice set of
> comma-separated values but there'll be a comma in one of the fields. The
> typical way of dealing with this in CSV files is to quote the entire field.
> But that doesn't help me, the bash scripter.
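
For quoted CSV, Python's csv module already understands the quoting
rules, so one option is to let it do the parsing and hand the shell
something easier to cut, such as tab-separated text. A sketch only: the
sample data and the name csv_to_tsv are invented here, and it assumes
the fields themselves contain no tabs or newlines.

```python
import csv
import io

# Invented sample: the second field contains a comma and a quoted word.
SAMPLE = 'host,message,count\nweb01,"disk full, rotating ""old"" logs",3\n'


def csv_to_tsv(stream):
    """Parse CSV from a text stream and return tab-separated lines."""
    # Assumption: no field contains a tab or a newline.
    return ["\t".join(row) for row in csv.reader(stream)]


for tsv_line in csv_to_tsv(io.StringIO(SAMPLE)):
    print(tsv_line)
```

After that, something like cut -f2 works again, since the embedded comma
is no longer a delimiter.
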
>
> Any suggestions for how to deal with stuff like this?
>
> --
> Todd


-- 
Jeff Ollie
The majestik møøse is one of the mäni interesting furry animals in Sweden.

