Friendica Social Network

chris@strafpla.net

1 day ago

I’m looking for a template based way of reformatting data from a text form / mask to some variables. The input is “beautiful”, the output should be just variable names and values. I want to un-format.
Does any #sysadmin have an idea how to approach this in 2024, with as little additional tools as possible?
I didn’t touch that territory for a long time so I’m not coming up with a good idea except intuitively using #awk but I feel that there should be better approaches by now.
#admin

Screenshot of the form in a terminal emulator:
ATU-R Info (hw: annex B, f/w: annex A/B/C) -----------
Running Mode : 17A State : SHOWTIME
DS Actual Rate : 63678000 bps US Actual Rate : 23360000 bps
DS Attainable Rate : 69037492 bps US Attainable Rate : 36406000 bps
DS Path Mode : Fast US Path Mode : Fast
DS Interleave Depth : 1 US Interleave Depth : 1
NE Current Attenuation : 19 dB Cur SNR Margin : 7 dB
DS actual PSD : 5. 4 dB US actual PSD : 10. 0 dB
NE CRC Count : 0 FE CRC Count : 40
NE ES Count : 0 FE ES Count : 6
Xdsl Reset Times : 0 Xdsl Link Times : 1
ITU Version[0] : fe004452 ITU Version[1] : 41590000
VDSL Firmware Version : 08-0B-02-06-00-07 [with Vectoring support]
Power Management Mode : DSL_G997_PMS_L0

This entry was edited (1 day ago)

Exxo

in reply to chris@strafpla.net • 1 day ago

For most of my use cases, I have been using regex with grep and sed for data transformation stuff

chris@strafpla.net

in reply to Exxo • 1 day ago

@Exxo That’s about the same approach as using awk and I feel dirty approaching the task the same clumsy and unreadable way as I did ~30 years ago.
I’d like to take the text form I have, put some variable names and qualifiers where the interesting stuff is and have a command line tool transmogrify the form and the template to a nice list of variables and their values.

I can’t be the only one, can I?

Exxo

in reply to chris@strafpla.net • 1 day ago

If you find something nice, let me know!

EndlessMason

in reply to chris@strafpla.net • 1 day ago

Does perl count as a single tool?

chris@strafpla.net

in reply to EndlessMason • 1 day ago

@EndlessMason Especially with perl that’s a philosophical question

But yes. Are there good ways to solve this with templates in Perl?

EndlessMason

in reply to chris@strafpla.net • 1 day ago (Received 20 hours ago)

If that format has a name there might be something on metacpan.org

Otherwise just a for loop and either
- capture / nibble the start off lines with a regex; it looks like some columns end in : and some don't, so
  if (s/([^:]+)://) { ... $1 }
- split, also with a regex
If this is fixed width, then maybe
- scanf (printf backwards)
- match column widths (guess/check the numbers): @cols = /(.{5}):(.{15}).../

Then just use HTTP::Tiny / JSON / YAML / DBI+sqlite to stash it away
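
The column-width idea above can be sketched in a few lines of Python (used here only for illustration; the post itself is about Perl). The offsets in `column_starts`, the helper names, and the padded sample line are assumptions for this sketch, not measured from the real device output, so they would need the same guess-and-check step.

```python
def split_pair(cell):
    """Split one 'Label : value' cell; return None if there is no label/colon."""
    label, sep, value = cell.partition(":")
    if not sep or not label.strip():
        return None
    return label.strip(), value.strip()


def parse_fixed_width(line, column_starts=(0, 40)):
    """Cut a line at assumed column offsets and parse each cell."""
    bounds = list(column_starts) + [None]
    pairs = []
    for start, end in zip(bounds, bounds[1:]):
        pair = split_pair(line[start:end])
        if pair is not None:
            pairs.append(pair)
    return pairs


# Hypothetical sample line, padded so the second column starts at offset 40.
line = f"{'Running Mode':<24}: {'17A':<14}{'State':<24}: SHOWTIME"
print(parse_fixed_width(line))
```

Single-column lines such as the firmware version would need `column_starts=(0,)`, otherwise a long value gets cut at the assumed second column.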

chris@strafpla.net

in reply to EndlessMason • 20 hours ago

@EndlessMason That’s exactly the - working - solution I want to avoid this time

I want to take a formatted source with some annotations and use this to match the source and extract the data.
What really really irks me is that I’m feeling as if I had briefly seen something like this many years ago.

Charlie Stross

in reply to chris@strafpla.net • 1 day ago

if awk is an option how about perl, which is a strict superset of awk (via a2p), specifically for its format syntax? https://perldoc.perl.org/perlform

chris@strafpla.net

in reply to Charlie Stross • 1 day ago

@cstross awk just is an option because it’s omnipresent - as is (some version of) Perl, so that’s fine, too.
But both don’t seem to come with a way of doing the trivial magic I’m envisioning.
Maybe I’ll just have to implement it myself - I think rewriting a human readable template to simple regex that is then used further down would work.
(And then I’d find umpteen better implementations that already have been done by people who unlike me are proper programmers.)
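
A minimal sketch of that template-to-regex idea, in Python with only the standard library: copy the saved text, replace each interesting value with a `{name}` placeholder, and compile the result into a regex with named groups. The `{name}` syntax, the function names `template_to_regex` and `extract`, and the rule that a value is a single run of non-whitespace are all assumptions made for this sketch; it is not an existing tool.

```python
import re

# One token is: a whitespace run, a {name} placeholder, or literal text.
TOKEN = re.compile(r"(?P<ws>\s+)|\{(?P<name>[A-Za-z_]\w*)\}|(?P<lit>[^\s{]+|\{)")


def template_to_regex(template):
    """Compile an annotated copy of the text into a regex with named groups."""
    parts = []
    for tok in TOKEN.finditer(template.strip()):
        if tok.group("ws"):
            parts.append(r"\s+")  # any amount of whitespace, including newlines
        elif tok.group("name"):
            # Assumption: a value never contains whitespace.
            parts.append(f"(?P<{tok.group('name')}>\\S+)")
        else:
            parts.append(re.escape(tok.group("lit")))  # literal label text
    return re.compile("".join(parts))


def extract(template, text):
    """Return {name: value} for the first match, or None if nothing matches."""
    match = template_to_regex(template).search(text)
    return match.groupdict() if match else None


TEMPLATE = """
Running Mode : {running_mode} State : {state}
DS Actual Rate : {ds_rate} bps US Actual Rate : {us_rate} bps
"""

SAMPLE = """
Running Mode : 17A State : SHOWTIME
DS Actual Rate : 63678000 bps US Actual Rate : 23360000 bps
"""

print(extract(TEMPLATE, SAMPLE))
```

Values that contain spaces (the PSD lines, for example) would need a richer placeholder syntax with a type or pattern per variable, and each placeholder name may appear only once in a template.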

chris@strafpla.net

in reply to chris@strafpla.net • 1 day ago

Oh, I’m not sure if that’s obvious: I want to parse the pretty text and extract some variables, not the other way around.

This entry was edited (1 day ago)

Charlie Stross

in reply to chris@strafpla.net • 1 day ago

ah! That wasn't obvious, no. (Perl regexps with whitespace and inline comments for your future self? 🤪)

chris@strafpla.net

in reply to Charlie Stross • 1 day ago

That would be close but I can not shake off the feeling that some time in the past I did use a tool with this workflow: Save text once, substitute values for variable names and types, mark up some irregularities, get an output of variable names and values for each new text.
This is so simple that a solution must exist.

This entry was edited (1 day ago)

Claus Holm Christensen

in reply to chris@strafpla.net • 1 day ago

I have been thinking of using TextFSM for this purpose: https://github.com/google/textfsm

GitHub - google/textfsm: Python module for parsing semi-structured text into python tables.

chris@strafpla.net

in reply to Claus Holm Christensen • 1 day ago

@claushc Thanks, that’s looking very interesting and useful!

pheller

in reply to chris@strafpla.net • 23 hours ago

@claushc I’ve used textfsm for hundreds of formatted text parsing problems just like this. It’s definitely the way to go.

chris@strafpla.net

in reply to pheller • 20 hours ago

@pheller @claushc It’s not exactly what I was looking for but maybe it is what I needed to find. And it pointed me in the direction of a lot of additional tools I didn’t know. That will be really useful, thanks!

Michael Knudsen %n%n%n%n

in reply to chris@strafpla.net • 1 day ago

if the input is fairly well-formed and stable, I would just throw a bit of awk at it, unless you plan on doing a lot of or complex processing on the result.

chris@strafpla.net

in reply to Michael Knudsen %n%n%n%n • 1 day ago

@mk The input is very stable and I have mangled these things countless times with awk/sed/perl/… But every time I did this for more than one value it felt wrong, ineffective and not reusable.
The data basically is pre formatted with annotated values in positions relative to the annotations.
There must be a way that is more accessible, maybe template based. Someone must have done this already in a better way than I can. Maybe I’m just missing a trivial way of using the tools I have.

DrScriptt

in reply to chris@strafpla.net • 1 day ago (Received 21 hours ago)

.@chris I’ve parsed a lot of things like this with awk / nawk / gawk.

Put multiple lines in an awk script (.awk) that match patterns and set variables.

Then after you've matched the last line, do what you want with the variables. E.g.

# "Running Mode : 17A State : SHOWTIME"
($1 == "Running") {
    runningMode = $4
    state = $7
}

# "Power Management Mode : ..." is the last line of a set
($1 == "Power") {
    powerManagementMode = $5
    print state, powerManagementMode, runningMode

    # reset for the next set
    runningMode = ""
    state = ""
    powerManagementMode = ""
}

The idea is to match lines that should have known value(s) in specific field(s) and set variables to the proper values on the matched line.

Once you see the last line (in a set) then use the variables and set the variables to default / control values.
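
For comparison, the same match, collect, and emit-on-last-line pattern can be sketched in Python. The field positions mirror the awk example above (awk's $4 is index 3 here); the function name `parse_records` and the dictionary keys are made up for this sketch, and the sample is three lines taken from the posted form.

```python
def parse_records(lines):
    """Yield one dict per set of lines; a set ends at the 'Power' line."""
    record = {}
    for line in lines:
        fields = line.split()  # whitespace splitting, like awk's default
        if len(fields) >= 7 and fields[0] == "Running":
            record["running_mode"] = fields[3]  # awk $4
            record["state"] = fields[6]         # awk $7
        elif len(fields) >= 5 and fields[0] == "Power":
            record["power_management_mode"] = fields[4]  # awk $5
            yield record
            record = {}  # reset for the next set


SAMPLE = """\
Running Mode : 17A State : SHOWTIME
NE CRC Count : 0 FE CRC Count : 40
Power Management Mode : DSL_G997_PMS_L0
"""

for rec in parse_records(SAMPLE.splitlines()):
    print(rec)
```

Like the awk version, this still depends on counting fields, so it shares the weakness that a change in the label wording shifts the positions.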

I’ve had great success with this as a #sysadmin

This is just for an idea, all typose are my cats fault. Negative warranty provided. 😉

chris@strafpla.net

in reply to DrScriptt • 21 hours ago

@drscriptt Thanks, this is a good, well organized way that I have been using variations of, too. IMHO this still is more complicated than it should be:
Usually we look at the source, then (mentally/ on paper) mark up what to extract and name it.
(This frequently is what we get from stakeholders!)
Then we translate this to an algorithm by counting whitespace and looking for unique regexes. This is what I’d like to avoid. (1/2)

chris@strafpla.net

in reply to chris@strafpla.net • 21 hours ago

I’d like to directly make use of the source and create an initial, human readable template by marking up dynamic parts in an effective way.
There are nice approaches to #templates for formatting #output. I’d like to find something similar for #input without reinventing a crutch and calling it a wheel.
(Yes, in an ideal parallel universe every source has a switch to output structured data.)

I’ll have to have a thorough look into existing ways of templating output, first. (2/2)
