I’m looking for a template-based way of reformatting data from a text form / mask into variables. The input is “beautiful”; the output should be just variable names and values. I want to un-format.
Does any #sysadmin have an idea how to approach this in 2024, with as few additional tools as possible?
I haven’t touched that territory in a long time, so I’m not coming up with a good idea except intuitively reaching for #awk, but I feel that there should be better approaches by now.
#admin
Exxo
in reply to chris@strafpla.net • • •
chris@strafpla.net
in reply to Exxo • • •@Exxo That’s about the same approach as using awk and I feel dirty approaching the task the same clumsy and unreadable way as I did ~30 years ago.
I’d like to take the text form I have, put some variable names and qualifiers where the interesting stuff is and have a command line tool transmogrify the form and the template to a nice list of variables and their values.
I can’t be the only one, can I?
Exxo
in reply to chris@strafpla.net • • •
EndlessMason
in reply to chris@strafpla.net • • •
chris@strafpla.net
in reply to EndlessMason • • •But yes. Are there good ways to solve this with templates in Perl?
EndlessMason
in reply to chris@strafpla.net • • •If that format has a name there might be something on metacpan.org
Otherwise just a for loop and either
- capture / nibble the start off lines with a regex - it looks like some columns end in : and some don't so
if (s/([^:]+)://) {...$1 }
- split also with a regex
If this is fixed width, then maybe
- scanf (printf backwards)
- match column widths (guess/check the numbers) @cols=/(.{5}):(.{15}).../
Then just use HTTP::Tiny / JSON / YAML / DBI+sqlite to stash it away
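A rough Python sketch of the two ideas above: nibbling a `label:` prefix off each line with a regex, and cutting a line into fixed-width columns. The sample lines, the column widths, and the function names are invented for illustration, since the actual form is not shown in this thread.

```python
import re

# Hypothetical sample lines; the real form is not shown in this thread.
SAMPLE = """\
Name: web01
State: running
Uptime: 12 days
"""

# Everything up to the first ':' is the name, the rest is the value.
LABEL_RE = re.compile(r"^\s*([^:]+):\s*(.*?)\s*$")


def parse_labelled(text):
    """Collect 'label: value' lines into a dict; other lines are skipped."""
    result = {}
    for line in text.splitlines():
        m = LABEL_RE.match(line)
        if m:
            result[m.group(1).strip()] = m.group(2)
    return result


def parse_fixed_width(line, widths):
    """Cut a line into columns of the given widths (guess and check the numbers)."""
    cols, pos = [], 0
    for w in widths:
        cols.append(line[pos:pos + w].strip())
        pos += w
    return cols
```

The resulting dict or list can then be handed to whatever storage is preferred (JSON, YAML, SQLite and so on).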
chris@strafpla.net
in reply to EndlessMason • • •I want to take a formatted source with some annotations and use this to match the source and extract the data.
What really really irks me is that I’m feeling as if I had briefly seen something like this many years ago.
Charlie Stross
in reply to chris@strafpla.net • • •perlform - Perl formats - Perldoc Browser
perldoc.perl.org
chris@strafpla.net
in reply to Charlie Stross • • •But both don’t seem to come with a way of doing the trivial magic I’m envisioning.
Maybe I’ll just have to implement it myself - I think rewriting a human readable template to simple regex that is then used further down would work.
(And then I’d find umpteen better implementations that already have been done by people who unlike me are proper programmers.)
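A minimal Python sketch of that "rewrite the template into a regex" idea, not an existing tool. It assumes a made-up `{{name}}` markup for the dynamic parts, values that contain no whitespace, and a hypothetical two-line form; all names here are illustrative.

```python
import re

# Assumed markup: {{name}} marks a value to capture in a copy of the form.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def _literal(text):
    """Escape literal text, but let any run of whitespace match any other run."""
    return r"\s+".join(re.escape(chunk) for chunk in re.split(r"\s+", text))


def compile_template(template):
    """Turn a marked-up copy of the form into one regex with named groups."""
    parts, pos = [], 0
    for m in PLACEHOLDER.finditer(template):
        parts.append(_literal(template[pos:m.start()]))
        parts.append("(?P<%s>\\S+)" % m.group(1))  # value = one non-space token
        pos = m.end()
    parts.append(_literal(template[pos:]))
    return re.compile("".join(parts))


def extract(template, text):
    """Return {variable: value} for the first match, or {} if nothing matches."""
    m = compile_template(template).search(text)
    return m.groupdict() if m else {}


# Hypothetical template and input, for illustration only.
TEMPLATE = """\
Running mode: {{running_mode}}   State: {{state}}
Power management: {{power_mode}}
"""

FORM = """\
Running mode: normal      State: up
Power management: auto
"""

if __name__ == "__main__":
    for name, value in extract(TEMPLATE, FORM).items():
        print(f"{name}={value}")
```

Because whitespace runs in the template match any whitespace run in the input, nobody has to count spaces; values containing spaces would need a richer placeholder syntax than this sketch offers.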
chris@strafpla.net
in reply to chris@strafpla.net • • •
Charlie Stross
in reply to chris@strafpla.net • • •
chris@strafpla.net
in reply to Charlie Stross • • •This is so simple that a solution must exist.
Claus Holm Christensen
in reply to chris@strafpla.net • • •GitHub - google/textfsm: Python module for parsing semi-structured text into python tables.
GitHub
chris@strafpla.net
in reply to Claus Holm Christensen • • •
pheller
in reply to chris@strafpla.net • • •
chris@strafpla.net
in reply to pheller • • •
Michael Knudsen %n%n%n%n
in reply to chris@strafpla.net • • •
chris@strafpla.net
in reply to Michael Knudsen %n%n%n%n • • •The data basically is pre formatted with annotated values in positions relative to the annotations.
There must be a way that is more accessible, maybe template based. Someone must have done this already in a better way than I can. Maybe I’m just missing a trivial way of using the tools I have.
DrScriptt
in reply to chris@strafpla.net • • •.@chris I’ve parsed a lot of things like this with awk / nawk / gawk.
Put multiple lines in an awk script (.awk) that match patterns and set variables.
Then after you’ve matched the last line, do what you want with the variables. E.g.
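# First line of a set: remember the values from known field positions.
($1 == "Running") {
    runningMode = $4
    state = $7
}
# Last line of a set: print the record, then reset the variables.
($1 == "Power") {
    powerManagementMode = $5
    print state, powerManagementMode, runningMode
    runningMode = ""
    state = ""
    powerManagementMode = ""
}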
The idea is to match lines that should have known value(s) in specific field(s) and set variables to the proper values on the matched line.
Once you see the last line (in a set) then use the variables and set the variables to default / control values.
I’ve had great success with this as a #sysadmin
This is just for an idea, all typose are my cats fault. Negative warranty provided. 😉
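For comparison, the same accumulate-then-flush pattern might look like this in Python. The field positions mirror the awk example (`$4`, `$7`, `$5`); the sample input is hypothetical and only shaped to fit those positions, and lines shorter than expected would raise an `IndexError` in this sketch.

```python
# Hypothetical input shaped to fit the field positions of the awk example.
SAMPLE = """\
Running mode is normal and state up
Power management mode is auto
Running mode is degraded and state down
Power management mode is manual
"""


def parse_records(lines):
    """Collect values per matched line; emit a record on the last line of a set."""
    records = []
    current = {}
    for line in lines:
        fields = line.split()  # whitespace splitting, like awk's default
        if not fields:
            continue
        if fields[0] == "Running":
            current["runningMode"] = fields[3]  # awk $4
            current["state"] = fields[6]        # awk $7
        elif fields[0] == "Power":
            current["powerManagementMode"] = fields[4]  # awk $5
            records.append(current)  # last line of the set: flush
            current = {}             # reset for the next set
    return records


if __name__ == "__main__":
    for rec in parse_records(SAMPLE.splitlines()):
        print(rec["state"], rec["powerManagementMode"], rec["runningMode"])
```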
chris@strafpla.net
in reply to DrScriptt • • •Usually we look at the source, then (mentally/ on paper) mark up what to extract and name it.
(This frequently is what we get from stakeholders!)
Then we translate this to an algorithm by counting whitespace and looking for unique regexes. This is what I’d like to avoid. (1/2)
chris@strafpla.net
in reply to chris@strafpla.net • • •I’d like to directly make use of the source and create an initial, human readable template by marking up dynamic parts in an effective way.
There are nice approaches to #templates for formatting #output. I’d like to find something similar for #input without reinventing a crutch and calling it a wheel.
(Yes, in an ideal parallel universe every source has a switch to output structured data.)
I’ll have to have a thorough look into existing ways of templating output, first. (2/2)
chris@strafpla.net
Unknown parent • • •
chris@strafpla.net
Unknown parent • • •It is inefficient, it breaks easily and it needs someone (me) to translate a task that is already quite well defined (“capture this value in the example output”) into a format that requires expertise without - in nearly all cases - adding any value at all.
It stinks, this can’t be the best solution in whatever userland.