I’m looking for a template based way of reformatting data from a text form / mask to some variables. The input is “beautiful”, the output should be just variable names and values. I want to un-format.
Does any #sysadmin have an idea how to approach this in 2024, with as little additional tools as possible?
I didn’t touch that territory for a long time so I’m not coming up with a good idea except intuitively using #awk but I feel that there should be better approaches by now.
#admin
This entry was edited (3 weeks ago)
in reply to chris@strafpla.net

For most of my use cases, I have been using regex with grep and sed for data transformation stuff
in reply to Exxo

@Exxo That’s about the same approach as using awk and I feel dirty approaching the task the same clumsy and unreadable way as I did ~30 years ago.
I’d like to take the text form I have, put some variable names and qualifiers where the interesting stuff is and have a command line tool transmogrify the form and the template to a nice list of variables and their values.

I can’t be the only one, can I?

@Exxo
in reply to EndlessMason

@EndlessMason Especially with perl that’s a philosophical question :-)
But yes. Are there good ways to solve this with templates in Perl?
in reply to chris@strafpla.net

If that format has a name there might be something on metacpan.org

Otherwise just a for loop and either
- capture / nibble the start off lines with a regex - it looks like some columns end in : and some don't so
if (s/([^:]+)://) {...$1 }
- split also with a regex
If this is fixed width, then maybe
- scanf (printf backwards)
- match column widths (guess/check the numbers) @cols=/(.{5}):(.{15}).../

Then just use HTTP::Tiny / JSON / YAML / DBI+sqlite to stash it away
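A minimal sketch of two of the options above (nibbling a "label:" off the start of a line, and cutting by column widths), written in Python rather than Perl purely for illustration. The function names, the sample lines and the widths are made up for this example; real widths would have to be guessed and checked against the actual input.

```python
import re

def split_labelled(line):
    """Return (label, value) for a 'label: value' line, or None if there is no colon."""
    m = re.match(r"\s*([^:]+?)\s*:\s*(.*?)\s*$", line)
    return (m.group(1), m.group(2)) if m else None

def slice_fixed(line, widths):
    """Cut a fixed-width line into stripped columns; widths are guessed by hand."""
    cols, pos = [], 0
    for w in widths:
        cols.append(line[pos:pos + w].strip())
        pos += w
    return cols

# Hypothetical sample input, built with explicit padding so the widths are visible.
fixed_line = f"{'eth0':<5}{'up':<10}{'1500':<4}"

print(split_labelled("Power mode:   eco"))
print(slice_fixed(fixed_line, [5, 10, 4]))
```

Either result can then be handed to whatever storage step is preferred.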

in reply to EndlessMason

@EndlessMason That’s exactly the - working - solution I want to avoid this time :-)
I want to take a formatted source with some annotations and use this to match the source and extract the data.
What really really irks me is that I’m feeling as if I had briefly seen something like this many years ago.
in reply to Charlie Stross

@cstross awk just is an option because it’s omnipresent - as is (some version of) Perl, so that’s fine, too.
But both don’t seem to come with a way of doing the trivial magic I’m envisioning.
Maybe I’ll just have to implement it myself - I think rewriting a human readable template to simple regex that is then used further down would work.
(And then I’d find umpteen better implementations that already have been done by people who unlike me are proper programmers.)
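The idea of rewriting a human-readable template into a regex could look roughly like the following Python sketch. The `{{name}}` marker syntax, the function names and the sample line are assumptions invented for this illustration, not an existing tool. Placeholder names must be valid, unique regex group names, and literal text (including spacing) has to match exactly.

```python
import re

# Hypothetical marker syntax: {{name}} marks a value to capture.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

def template_to_regex(template):
    """Turn literal text with {{name}} markers into a regex with named groups."""
    parts, pos = [], 0
    for m in PLACEHOLDER.finditer(template):
        parts.append(re.escape(template[pos:m.start()]))  # literal text before the marker
        parts.append(f"(?P<{m.group(1)}>.+?)")            # the captured value
        pos = m.end()
    parts.append(re.escape(template[pos:]))               # literal text after the last marker
    return re.compile("".join(parts) + r"\s*$")

def extract(template, line):
    """Return a dict of variable names and values, or None if the line does not match."""
    m = template_to_regex(template).match(line)
    return m.groupdict() if m else None

print(extract("Running mode: {{mode}}  State: {{state}}",
              "Running mode: turbo  State: ok"))
```

The template stays readable because it is just a copy of one sample line with the values replaced by names; the regex is generated and never edited by hand.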
in reply to chris@strafpla.net

Oh, I’m not sure if that’s obvious: I want to parse the pretty text and extract some variables, not the other way around.
This entry was edited (3 weeks ago)
in reply to chris@strafpla.net

ah! That wasn't obvious, no. (Perl regexps with whitespace and inline comments for your future self? 🤪)
in reply to Charlie Stross

That would be close but I can not shake off the feeling that some time in the past I did use a tool with this workflow: Save text once, substitute values for variable names and types, mark up some irregularities, get an output of variable names and values for each new text.
This is so simple that a solution must exist.
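A rough Python sketch of that workflow, under stated assumptions: the `<name:type>` marker syntax, the two type names and the sample form are all invented here and do not reproduce any particular tool. A saved copy of the text gets its values replaced by markers, runs of whitespace are allowed to vary, and each new text yields variable names and values.

```python
import re

# Hypothetical type names for this sketch, each mapped to a regex fragment.
TYPES = {"int": r"[-+]?\d+", "word": r"\S+"}
MARK = re.compile(r"<(\w+):(\w+)>")

def literal(text):
    """Escape literal text, letting any run of whitespace match any other run."""
    return r"\s+".join(re.escape(tok) for tok in re.split(r"\s+", text))

def compile_form(template):
    """Compile a multi-line form containing <name:type> markers into one regex."""
    template = template.strip()
    out, pos = [], 0
    for m in MARK.finditer(template):
        out.append(literal(template[pos:m.start()]))
        out.append(f"(?P<{m.group(1)}>{TYPES[m.group(2)]})")
        pos = m.end()
    out.append(literal(template[pos:]))
    return re.compile("".join(out))

def parse(template, text):
    """Return {name: value} with ints converted, or None if the form does not match."""
    m = compile_form(template).search(text)
    if not m:
        return None
    kinds = dict(MARK.findall(template))
    return {k: int(v) if kinds[k] == "int" else v for k, v in m.groupdict().items()}

form = "Running mode: <mode:word>   State: <state:word>\nUptime (days): <uptime:int>\n"
text = "Running mode:  turbo State:   ok\nUptime (days):    12\n"
for name, value in parse(form, text).items():
    print(f"{name}={value}")
```

Irregularities such as optional lines are not handled; they would need extra markup in the template.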
This entry was edited (3 weeks ago)
in reply to chris@strafpla.net

@claushc I’ve used textfsm for hundreds of formatted text parsing problems just like this. It’s definitely the way to go.
in reply to pheller

@pheller @claushc It’s not exactly what I was looking for but maybe it is what I needed to find. And it pointed me in the direction of a lot of additional tools I didn’t know. That will be really useful, thanks!
in reply to chris@strafpla.net

if the input is fairly well-formed and stable, I would just throw a bit of awk at it, unless you plan on doing a lot of or complex processing on the result.
in reply to Michael Knudsen %n%n%n%n

@mk The input is very stable and I have mangled these things countless times with awk/sed/perl/… But every time I did this for more than one value it felt wrong, ineffective and not reusable.
The data basically is pre formatted with annotated values in positions relative to the annotations.
There must be a way that is more accessible, maybe template based. Someone must have done this already in a better way than I can. Maybe I’m just missing a trivial way of using the tools I have.
in reply to chris@strafpla.net

.@chris I’ve parsed a lot of things like this with awk / nawk / gawk.

Put multiple lines in an awk script (.awk) that match patterns and set variables.

Then after you’ve matched the last line, do what you want with the variables. E.g.

($1 == "Running") {
    runningMode = $4
    state = $7
}

($1 == "Power") {
    powerManagementMode = $5
    print state, powerManagementMode, runningMode

    # Reset for the next set of lines.
    runningMode = ""
    state = ""
    powerManagementMode = ""
}

The idea is to match lines that should have known value(s) in specific field(s) and set variables to the proper values on the matched line.

Once you see the last line (in a set) then use the variables and set the variables to default / control values.

I’ve had great success with this as a #sysadmin

This is just for an idea, all typose are my cats fault. Negative warranty provided. 😉

in reply to DrScriptt

@drscriptt Thanks, this is a good, well-organized way that I have been using variations of, too. IMHO this still is more complicated than it should be:
Usually we look at the source, then (mentally/ on paper) mark up what to extract and name it.
(This frequently is what we get from stakeholders!)
Then we translate this to an algorithm by counting whitespace and looking for unique regexes. This is what I’d like to avoid. (1/2)
in reply to chris@strafpla.net

I’d like to directly make use of the source and create an initial, human readable template by marking up dynamic parts in an effective way.
There are nice approaches to #templates for formatting #output. I’d like to find something similar for #input without reinventing a crutch and calling it a wheel.
(Yes, in an ideal parallel universe every source has a switch to output structured data.)

I’ll have to have a thorough look into existing ways of templating output, first. (2/2)

Unknown parent

chris@strafpla.net
@DrGeraintLLannfrancheta I don’t want to scrape, I want to match. I’m tired of the old way, I have been hating it for decades but only now I understand why.
It is inefficient, it breaks easily and it needs someone (me) to translate a task that is already quite well defined (“capture this value in the example output”) into a format that requires expertise without - in nearly all cases - adding any value at all.
It stinks, this can’t be the best solution in whatever userland.