libxo question

libxo question

Mark Saad
All,
  I am playing around with procstat and libxo on a 12-STABLE build from
yesterday. I wanted to get a list of thread IDs for some processes, so
I wrote a quick Python script to grab the data, but the XML output is not
well-formed. Here is my sample script, which should work on Python
2.7:

----8<-----------------------
import subprocess as sp
import os, sys
import pprint as pp
import xml.etree.cElementTree as ET


FNULL = open(os.devnull, 'w')
cmd = "procstat --libxo xml -ta"
p = sp.Popen(cmd, shell=True, stdout=sp.PIPE, stderr=FNULL, executable="/bin/sh")
text, err = p.communicate()

root = ET.fromstring(text)

pp.pprint(root)

sys.exit(1)
------------>8-----------------------

I keep getting this odd error saying the XML is not well-formed:

Traceback (most recent call last):
  File "/tmp/test.py", line 12, in <module>
    root = ET.fromstring(text)
  File "<string>", line 124, in XML
cElementTree.ParseError: not well-formed (invalid token): line 1, column 32

Attached is a copy of the XML. Any guidance would be helpful.


--
mark saad | [hidden email]
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"

Re: libxo question

Kristof Provost
On 28 Dec 2018, at 20:31, Mark Saad wrote:

> Attached is a copy of the xml.   Any guidance would be helpful.
>
The attachment seems to have been eaten by a grue, but I can trivially
reproduce the problem.
Passing the output of `procstat --libxo xml -ta` to xmllint gives us:

        -:1: parser error : StartTag: invalid element name
        <procstat version="1"><threads><0><process_id>0</process_id><command>kernel</com

The libxo code doesn’t quite cope with some of the subtle differences
between JSON and XML. In this case, the problem is that XML element names
must start with a letter or an underscore. They may contain digits, but
may not start with one.
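
A quick way to see this rule in action is to hand the standard-library parser two tiny documents. The snippets below are hand-made stand-ins modelled on the xmllint output above, not real procstat output, and `is_well_formed` is just an illustrative helper. The first document, with an element named `0`, is rejected with the same kind of "not well-formed (invalid token)" error as in the original report; the second, where the name starts with a letter, parses fine.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc):
    """Return True if `doc` parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


# Element named "0", as procstat emitted for PID 0: not a legal XML name.
bad = ('<procstat version="1"><threads><0><process_id>0</process_id>'
       '</0></threads></procstat>')

# Same structure, but the name starts with a letter: legal.
good = ('<procstat version="1"><threads><pid_0><process_id>0</process_id>'
        '</pid_0></threads></procstat>')

print(is_well_formed(bad))   # False
print(is_well_formed(good))  # True
```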

I’ve used the following very quick-and-dirty patch to make xmllint happy:

        diff --git a/usr.bin/procstat/procstat.c b/usr.bin/procstat/procstat.c
        index 0269d3c5a5f..5c042322e83 100644
        --- a/usr.bin/procstat/procstat.c
        +++ b/usr.bin/procstat/procstat.c
        @@ -152,7 +152,7 @@ procstat(const struct procstat_cmd *cmd, struct procstat *prstat,
         {
                char *pidstr = NULL;

        -       asprintf(&pidstr, "%d", kipp->ki_pid);
        +       asprintf(&pidstr, "pid_%d", kipp->ki_pid);
                if (pidstr == NULL)
                        xo_errc(1, ENOMEM, "Failed to allocate memory in procstat()");
                xo_open_container(pidstr);
        diff --git a/usr.bin/procstat/procstat_rusage.c b/usr.bin/procstat/procstat_rusage.c
        index 3d8c76370c0..f9caef49a2f 100644
        --- a/usr.bin/procstat/procstat_rusage.c
        +++ b/usr.bin/procstat/procstat_rusage.c
        @@ -126,7 +126,7 @@ print_rusage(struct kinfo_proc *kipp)
                    format_time(&kipp->ki_rusage.ru_stime));

                if ((procstat_opts & PS_OPT_PERTHREAD) != 0) {
        -               asprintf(&threadid, "%d", kipp->ki_tid);
        +               asprintf(&threadid, "ID_%d", kipp->ki_tid);
                        if (threadid == NULL)
                                xo_errc(1, ENOMEM,
                                    "Failed to allocate memory in print_rusage()");
        diff --git a/usr.bin/procstat/procstat_sigs.c b/usr.bin/procstat/procstat_sigs.c
        index 984d5d57f95..ceb36ca0dcb 100644
        --- a/usr.bin/procstat/procstat_sigs.c
        +++ b/usr.bin/procstat/procstat_sigs.c
        @@ -155,7 +155,7 @@ procstat_threads_sigs(struct procstat *procstat, struct kinfo_proc *kipp)
                kinfo_proc_sort(kip, count);
                for (i = 0; i < count; i++) {
                        kipp = &kip[i];
        -               asprintf(&threadid, "%d", kipp->ki_tid);
        +               asprintf(&threadid, "ID_%d", kipp->ki_tid);
                        if (threadid == NULL)
                                xo_errc(1, ENOMEM, "Failed to allocate memory in "
                                    "procstat_threads_sigs()");
        diff --git a/usr.bin/procstat/procstat_threads.c b/usr.bin/procstat/procstat_threads.c
        index c62bb516175..17f11044021 100644
        --- a/usr.bin/procstat/procstat_threads.c
        +++ b/usr.bin/procstat/procstat_threads.c
        @@ -66,7 +66,7 @@ procstat_threads(struct procstat *procstat, struct kinfo_proc *kipp)
                kinfo_proc_sort(kip, count);
                for (i = 0; i < count; i++) {
                        kipp = &kip[i];
        -               asprintf(&threadid, "%d", kipp->ki_tid);
        +               asprintf(&threadid, "ID_%d", kipp->ki_tid);
                        if (threadid == NULL)
                                xo_errc(1, ENOMEM, "Failed to allocate memory in "
                                    "procstat_threads()");
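
Until a fixed procstat is available, a similar renaming could be done on the consumer side before parsing. The following is a sketch of that workaround, not part of the patch: `repair_numeric_tags` is a hypothetical helper, the `_` prefix is an arbitrary choice (the patch uses `pid_` and `ID_`), and it assumes every `<` in the document starts a tag, which holds for plain elements and text but not inside comments or CDATA sections.

```python
import re
import xml.etree.ElementTree as ET

# Matches "<" or "</" immediately followed by a digit, i.e. the start of an
# opening or closing tag whose name is illegal in XML.
_NUMERIC_TAG = re.compile(r'<(/?)(?=\d)')


def repair_numeric_tags(text, prefix="_"):
    """Prefix element names that begin with a digit (hypothetical helper).

    Assumes every "<" starts a tag, i.e. the input has no comments or
    CDATA sections where "<" followed by a digit could appear as content.
    """
    return _NUMERIC_TAG.sub(lambda m: '<' + m.group(1) + prefix, text)


# Hand-made sample shaped like the unpatched output, not real procstat data.
sample = ('<procstat version="1"><threads><0><process_id>0</process_id>'
          '<threads><100000><thread_id>100000</thread_id></100000></threads>'
          '</0></threads></procstat>')

root = ET.fromstring(repair_numeric_tags(sample))
print([child.tag for child in root.find("threads")])  # ['_0']
```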

It’s probably not the prettiest XML, and I’m not sure how useful the
tags are now, but arguably tags with dynamic names are a bad idea
anyway.
I think you wouldn’t see this problem with JSON, whose object keys may be
arbitrary strings, so perhaps that’s a workaround you can consider as well.
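
Taking up the JSON suggestion, here is a minimal sketch of the asker's original goal (a list of thread IDs) using `--libxo json`. It assumes the JSON output mirrors the XML structure shown in this thread, with each thread carrying a `thread_id` member; to avoid depending on the exact nesting, it walks the whole parsed document and collects every `thread_id` value. `collect_thread_ids`, `procstat_thread_ids` and the sample data are illustrative, not taken from procstat.

```python
import json
import subprocess


def collect_thread_ids(node):
    """Recursively gather every value stored under a "thread_id" key."""
    ids = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "thread_id":
                ids.append(value)
            else:
                ids.extend(collect_thread_ids(value))
    elif isinstance(node, list):
        for item in node:
            ids.extend(collect_thread_ids(item))
    return ids


def procstat_thread_ids():
    """Run procstat with JSON output and return all thread IDs it reports.

    Requires a FreeBSD system whose procstat(1) supports --libxo.
    """
    out = subprocess.check_output(["procstat", "--libxo", "json", "-ta"])
    return collect_thread_ids(json.loads(out.decode("utf-8")))


# Hand-made stand-in shaped like the XML in this thread (assumed structure).
sample = {"procstat": {"version": "1", "threads": {"0": {
    "process_id": 0, "command": "kernel",
    "threads": {"100000": {"thread_id": 100000, "thread_name": "swapper"}}}}}}
print(collect_thread_ids(sample))  # [100000]
```

On a FreeBSD host, calling `procstat_thread_ids()` would run procstat and return the IDs for all processes; the sample only exercises the tree walk.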

Regards,
Kristof

Re: libxo question

Mark Saad
On Fri, Dec 28, 2018 at 3:40 PM Chris Torek <[hidden email]> wrote:

>
> >Attached is a copy of the xml.   Any guidance would be helpful.
>
> Your attachment was stripped before it got here, but the problem
> is clear enough.  Procstat / libxo is generating invalid XML.
>
> Here's a bit of sample "procstat --libxo xml" output, which
> I generated locally by running
>
>     procstat --libxo xml -ta
>
> and hand massaging the result:
>
>     <procstat version="1">
>         <threads>
>             <0>
>                 <process_id>0</process_id>
>                 <command>kernel</command>
>                 <threads>
>                     <100000>
>                         <thread_id>100000</thread_id>
>                         <thread_name>swapper</thread_name>
>                         <cpu>-1</cpu>
>      [snip]
>
> Valid XML tags must begin with an alphabetic character or an
> underscore (see https://www.w3schools.com/xml/xml_elements.asp),
> and neither <0> nor <100000> do so.
>
> A quick workaround is to use json instead.  However, libxo
> probably should "work smarter" with tags.
>
> (XML is a terrible data-encoding language because of all of its
> special rules.  If you think you've found them all, watch out for
> CDATA!  JSON is better but still has some issues with encoding,
> requiring that arbitrary binary data be atob or base64 encoded or
> similar.)
>
> Chris

I updated Kristof's patch to work on 12-STABLE:
https://mirrors.nycbug.org/pub/patches/procstat-libxo-12-STABLE.patch

Here is the resulting XML output as well:
https://mirrors.nycbug.org/pub/patches/procstat.xml

This works better than before: Python's XML parser, Mozilla and Edge all
think it's valid XML.
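
With element names that start with a letter, the output parses, and the thread IDs can be pulled out without caring what the dynamically named containers (`pid_0`, `ID_100000` and so on) are called. A minimal sketch, assuming the patched output still has one `thread_id` element per thread as in the sample quoted earlier; `thread_ids_from_xml` and the sample document are illustrative:

```python
import xml.etree.ElementTree as ET


def thread_ids_from_xml(text):
    """Return every <thread_id> value in a procstat XML document, as ints."""
    root = ET.fromstring(text)
    # iter() walks the whole tree in document order, so the names of the
    # intermediate per-process and per-thread containers do not matter.
    return [int(elem.text) for elem in root.iter("thread_id")]


# Hand-made sample in the shape produced by the patched procstat.
sample = ('<procstat version="1"><threads><pid_0><process_id>0</process_id>'
          '<command>kernel</command><threads><ID_100000>'
          '<thread_id>100000</thread_id><thread_name>swapper</thread_name>'
          '</ID_100000></threads></pid_0></threads></procstat>')
print(thread_ids_from_xml(sample))  # [100000]
```

In the original script, `thread_ids_from_xml(text)` could stand in for the `ET.fromstring` and `pprint` lines.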

I think this should be fixed. What should we do to make it happen?





Re: libxo question

Kristof Provost


On 28 Dec 2018, at 23:12, Mark Saad wrote:

> I think this should be fixed. What should we do to make it happen?
>
I’ve posted https://reviews.freebsd.org/D18679 as a more generic way
of addressing this.
It’s quite possible that there are other users of libxo with the same
problem, and this will help all of them.

Regards,
Kristof