printf behaviour with illegal or malformed format string

classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

printf behaviour with illegal or malformed format string

Poul-Henning Kamp

Obligatory bikeshed avoidance notice:

>> Please read all the way to the bottom of this email before you reply <<


Given that illegal or malformed format strings to the printf family
of functions mostly result in harmless misformatting but occationally
in coredumps and maybe some times in security issues, what is the
desired behaviour of libc's printf implementation ?

A very good example is

        printf("%hf", 1.0);

The 'h' modifier is not legal for %f format, and it is therefore a good
bet that the programmer was confused and we know that the program
contains at least one error.


Our first line of defence against this kind of error is compile-time
checking by GCC, but we cannot rely on the string being sane in libc,
we still need to do error checking.

The context for the above is that I'm working on adding extensibility
to our printf, compatible with the GLIBC (see 12.13 in the glibc
manual).  Obviously, gcc cannot compile-time check such extensions
for us, and therefore the question gains a bit more relevance.

In an ideal world, the printf family of functions would have been
defined to return EINVAL in this case.  Almost nobody checks the
return values of printf-like functions however and those few that
do, all pressume that it is an I/O error so such an approach is
unlikely to gain us much if anything.

Another alternative is to spit out the format string unformatted,
possibly with an attached notice, but this doesn't really seem to
help anybody either, but at least indicates what the problem is.


I'm leaning towards doing what phkmalloc has migrated to over time:
Make a variable which can select between "normal/paranoia" and force
it to paranoia for (uid==0 || gid==0 || setuid || setgid).

If the variable is set, a bogus format string will result in abort(2).

If it is not set, the format string will be output unformatted in
the message "WARNING: Illegal printf() format string: \"...\".


If you cannot possibly live with that, please state why, and describe
what behaviour you would prefer instead.

Thankyou, (and may the best bikeshed win).

Poul-Henning

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
[hidden email]         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: printf behaviour with illegal or malformed format string

Max Laier
On Monday 12 December 2005 13:14, Poul-Henning Kamp wrote:

> Obligatory bikeshed avoidance notice:
> >> Please read all the way to the bottom of this email before you reply <<
>
> Given that illegal or malformed format strings to the printf family
> of functions mostly result in harmless misformatting but occationally
> in coredumps and maybe some times in security issues, what is the
> desired behaviour of libc's printf implementation ?
>
> A very good example is
>
> printf("%hf", 1.0);
>
> The 'h' modifier is not legal for %f format, and it is therefore a good
> bet that the programmer was confused and we know that the program
> contains at least one error.
>
>
> Our first line of defence against this kind of error is compile-time
> checking by GCC, but we cannot rely on the string being sane in libc,
> we still need to do error checking.
>
> The context for the above is that I'm working on adding extensibility
> to our printf, compatible with the GLIBC (see 12.13 in the glibc
> manual).  Obviously, gcc cannot compile-time check such extensions
> for us, and therefore the question gains a bit more relevance.
>
> In an ideal world, the printf family of functions would have been
> defined to return EINVAL in this case.  Almost nobody checks the
> return values of printf-like functions however and those few that
> do, all pressume that it is an I/O error so such an approach is
> unlikely to gain us much if anything.
>
> Another alternative is to spit out the format string unformatted,
> possibly with an attached notice, but this doesn't really seem to
> help anybody either, but at least indicates what the problem is.
>
>
> I'm leaning towards doing what phkmalloc has migrated to over time:
> Make a variable which can select between "normal/paranoia" and force
> it to paranoia for (uid==0 || gid==0 || setuid || setgid).
>
> If the variable is set, a bogus format string will result in abort(2).
>
> If it is not set, the format string will be output unformatted in
> the message "WARNING: Illegal printf() format string: \"...\".
I agree on principle but would like to ask if we need to revisit some of the
error cases.  Especially with regard to 64bit porting there are some
"artifacts" that might cause serious pain for ported applications if the
above is adopted.

Specifically, right now the following will warn "long long int format, int64_t
arg (arg 2)" on our 64bit architectures while it is required on - at least -
i386

        int64_t i = 1;
        printf("%lld", i);

Many other platforms allow it for 64bit architectures as well.  As for all our
64bit architectures sizeof(long) == sizeof(long long) (as far as I am aware),
I am not convinced this should be a (fatal) error.  There might be other
similar cases.

So the question is, how strict should this check be?  Are there cases where we
are better off with a "just do it"-sollution?


As a community service, there is a right way to do this (according to C99):

        int64_t i = 1;
        printf("%" PRIi64 "\n", i);

but it's obvious this is not going to be adopted.  The other often used
workaround is:

        int64_t i = 1;
        printf("%jd\n", (intmax_t)i);

or:

        printf("%lld\n", (long long)i);

which kind of reverts the idea behind useing C99-types.

Note that:

        printf("%jd\n, i);

seems to work as well, but I not sure this is correct.

--
/"\  Best regards,                      | [hidden email]
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier@EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News

attachment0 (194 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: printf behaviour with illegal or malformed format string

Poul-Henning Kamp
In message <[hidden email]>, Max Laier writes:

>I agree on principle but would like to ask if we need to revisit some of the
>error cases.  Especially with regard to 64bit porting there are some
>"artifacts" that might cause serious pain for ported applications if the
>above is adopted.
>
>Specifically, right now the following will warn "long long int format, int64_t
>arg (arg 2)" on our 64bit architectures while it is required on - at least i386
>
> int64_t i = 1;
> printf("%lld", i);

You misunderstood me.

"%lld" is a legal formatting string, my printf implementation would never
object to that.

I'm talking about illegal/non-sensical formatting strings like "%lhd" or
even "%!!!!d" and similar.

The issue you raise is valid and important, but it is not for this bikeshed ;-)

Poul-Henning

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
[hidden email]         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: printf behaviour with illegal or malformed format string

Joseph Koshy
In reply to this post by Poul-Henning Kamp
phk> I'm leaning towards doing what phkmalloc has migrated to
phk> over time:
phk> Make a variable which can select between "normal/paranoia"
phk> and force it to paranoia for (uid==0 || gid==0 ||
phk> setuid || setgid).

phk> If the variable is set, a bogus format string will result
phk> in abort(2).

phk> If it is not set, the format string will be output
phk> unformatted in the message "WARNING: Illegal printf()
phk> format string: \"...\".

Why not just print the warning for both cases, and
stop interpreting the format string any further.

What do we gain by having a uid 0 process dump core?

--
FreeBSD Volunteer,     http://people.freebsd.org/~jkoshy
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: printf behaviour with illegal or malformed format string

Poul-Henning Kamp
In message <[hidden email]>, Joseph Koshy writes:

>Why not just print the warning for both cases, and
>stop interpreting the format string any further.
>
>What do we gain by having a uid 0 process dump core?

I'm not very keen on producing wrong output from uid0 programs without
giving the programmer a way to handle the problem correctly.


--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
[hidden email]         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: printf behaviour with illegal or malformed format string

Peter Jeremy
In reply to this post by Poul-Henning Kamp
On Mon, 2005-Dec-12 13:14:23 +0100, Poul-Henning Kamp wrote:
>The context for the above is that I'm working on adding extensibility
>to our printf, compatible with the GLIBC (see 12.13 in the glibc
>manual).

http://www.gnu.org/software/libc/manual/html_node/Customizing-Printf.html
for anyone else wanting to see what it does.

>  Obviously, gcc cannot compile-time check such extensions
>for us, and therefore the question gains a bit more relevance.

There doesn't even appear to be a defined way of determining whether
extensions are supported.  Since not all libc's support printf
extensions, an application that wants to use them has to confirm that
they work and the only way to do this appears to be to try it and see
what happens (either at runtime, or during a autoconf-style configuration
process).

>Another alternative is to spit out the format string unformatted,
>possibly with an attached notice, but this doesn't really seem to
>help anybody either, but at least indicates what the problem is.

xterm does (or did) this if it is running as root and the format
string contains conversion specification that it thinks are suspicious.

>I'm leaning towards doing what phkmalloc has migrated to over time:
>Make a variable which can select between "normal/paranoia" and force
>it to paranoia for (uid==0 || gid==0 || setuid || setgid).
>
>If the variable is set, a bogus format string will result in abort(2).

set{u,g}id programs won't dump core so just abort(2)ing leaves no
trace of what went wrong.  This makes finding the problem more
difficult.  Even for {u,g}id == 0 programs, it would be nice to have
something reported (but see below).  Note that this behaviour has
implications for programs that are trying to determine if extensions
are supported or not.

>If it is not set, the format string will be output unformatted in
>the message "WARNING: Illegal printf() format string: \"...\".

Since this check presumably applies to the entire *printf() family,
where do you report the error for {s,f}printf()?

What do you define as an "illegal printf() format string"?  I can
think of four possible categories:
1) Using a nonsense value before '$', eg "%12345$d"
2) Having an invalid modifier on a builtin conversion specifier, eg "%hf"
3) Using an undefined conversion specified, eg '%W'
4) Having an invalid modifier on a user-specified conversion specifier

The last category is particularly problematic because the glibc
interface does not have any way to identify this error.  The glibc
documentation just states that the output handler conversion function
(printf_function) should return a negative number if an error occurs
so it's not possible to distinguish an invalid modifier from an I/O
error or invalid argument.

--
Peter Jeremy
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: printf behaviour with illegal or malformed format string

Poul-Henning Kamp
In message <[hidden email]>, Peter Jeremy writes:

>>I'm leaning towards doing what phkmalloc has migrated to over time:
>>Make a variable which can select between "normal/paranoia" and force
>>it to paranoia for (uid==0 || gid==0 || setuid || setgid).
>>
>>If the variable is set, a bogus format string will result in abort(2).
>
>set{u,g}id programs won't dump core so just abort(2)ing leaves no
>trace of what went wrong.

That's one of the reason there is an "abort2(2)" system call in the
works which allows the program to tell syslog why it comitted suicide.

I have a patch in my inbox and I should really get it committed now.

>>If it is not set, the format string will be output unformatted in
>>the message "WARNING: Illegal printf() format string: \"...\".
>
>Since this check presumably applies to the entire *printf() family,
>where do you report the error for {s,f}printf()?

Whereever the strings was meant to go, what else can I do ?

>What do you define as an "illegal printf() format string"?  I can
>think of four possible categories:
>1) Using a nonsense value before '$', eg "%12345$d"
>2) Having an invalid modifier on a builtin conversion specifier, eg "%hf"
>3) Using an undefined conversion specified, eg '%W'
>4) Having an invalid modifier on a user-specified conversion specifier

Those are probably the primary suspects.

>The last category is particularly problematic because the glibc
>interface does not have any way to identify this error.

My current plan is to provide a better API than GLIBC and make
a couple of degraded glibc-api wrappers.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
[hidden email]         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: printf behaviour with illegal or malformed format string

George Paplas


--- Poul-Henning Kamp <[hidden email]> wrote:

> >>If it is not set, the format string will be output unformatted in
> >>the message "WARNING: Illegal printf() format string: \"...\".
> >
> >Since this check presumably applies to the entire *printf() family,
> >where do you report the error for {s,f}printf()?
>
> Whereever the strings was meant to go, what else can I do ?

And what if you are doing an sprintf to a buffer smaller than your
warning message?

-geop

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com 
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: printf behaviour with illegal or malformed format string

Poul-Henning Kamp
In message <[hidden email]>, George Paplas writes:

>
>
>--- Poul-Henning Kamp <[hidden email]> wrote:
>
>> >>If it is not set, the format string will be output unformatted in
>> >>the message "WARNING: Illegal printf() format string: \"...\".
>> >
>> >Since this check presumably applies to the entire *printf() family,
>> >where do you report the error for {s,f}printf()?
>>
>> Whereever the strings was meant to go, what else can I do ?
>
>And what if you are doing an sprintf to a buffer smaller than your
>warning message?

"Too f**king bad"

It's the programmer who's made a mess in the first place, so he
can't really make many demands in this situation...

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
[hidden email]         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: printf behaviour with illegal or malformed format string

Bruce Evans
In reply to this post by Max Laier
On Mon, 12 Dec 2005, Max Laier wrote:

> On Monday 12 December 2005 13:14, Poul-Henning Kamp wrote:
>> Obligatory bikeshed avoidance notice:
>>>> Please read all the way to the bottom of this email before you reply <<
>>
>> Given that illegal or malformed format strings to the printf family
>> of functions mostly result in harmless misformatting but occationally
>> in coredumps and maybe some times in security issues, what is the
>> desired behaviour of libc's printf implementation ?

It is to emit nasel daemons.  Failing that, a core dump is best.

>> A very good example is
>>
>> printf("%hf", 1.0);
>>
>> The 'h' modifier is not legal for %f format, and it is therefore a good
>> bet that the programmer was confused and we know that the program
>> contains at least one error.
>>
>> Our first line of defence against this kind of error is compile-time
>> checking by GCC, but we cannot rely on the string being sane in libc,
>> we still need to do error checking.

There is also fmtcheck(3).

>> The context for the above is that I'm working on adding extensibility
>> to our printf, compatible with the GLIBC (see 12.13 in the glibc
>> manual).  Obviously, gcc cannot compile-time check such extensions
>> for us, and therefore the question gains a bit more relevance.
>>
>> In an ideal world, the printf family of functions would have been
>> defined to return EINVAL in this case.  Almost nobody checks the
>> return values of printf-like functions however and those few that
>> do, all pressume that it is an I/O error so such an approach is
>> unlikely to gain us much if anything.

A core dump is best for these reasons.  Also, no one uses fmtcheck(3),
and there there is no way for standard-conforming programs to check
for errors caused by undefined behaviour, since undefined behaviour
includes undefined errors.

I think most checking belongs in the compiler and in fmtcheck(3).
printf() itself cannot detect most types errors without compiler
support that would basically involve passing it the types of all
args so that it could call fmtcheck(3).  Extensions should rarely
be needed for printf(), and it is not unreasonable to expect that
applications to use the extensions to be more careful.  Extensions
might be more needed for printf-like interfaces that aren't really
printf.

>> Another alternative is to spit out the format string unformatted,
>> possibly with an attached notice, but this doesn't really seem to
>> help anybody either, but at least indicates what the problem is.
>>
>> I'm leaning towards doing what phkmalloc has migrated to over time:
>> Make a variable which can select between "normal/paranoia" and force
>> it to paranoia for (uid==0 || gid==0 || setuid || setgid).
>>
>> If the variable is set, a bogus format string will result in abort(2).

This sometimes breaks defined behaviour.

>> If it is not set, the format string will be output unformatted in
>> the message "WARNING: Illegal printf() format string: \"...\".

malloc()'s messages are better ("<progname>: error ...").

> I agree on principle but would like to ask if we need to revisit some of the
> error cases.  Especially with regard to 64bit porting there are some
> "artifacts" that might cause serious pain for ported applications if the
> above is adopted.
>
> Specifically, right now the following will warn "long long int format, int64_t
> arg (arg 2)" on our 64bit architectures while it is required on - at least -
> i386
>
> int64_t i = 1;
> printf("%lld", i);

Warning is the correct behaviour for this.  Code like this isn't required
on any arch, and is just wrong on arches where long long is longer than
int64_t.  Fortunately we have some arches where int64_t has a different
type than long long, so code like this causes a warning so it doesn't
get written so often.  rms added the warning to gcc 15-18 years ago after
I complained about the corresponding bad code for ints and long:

  int i = 1;
  printf("%ld", i);

This is just wrong on arches where long is longer than int.

> Many other platforms allow it for 64bit architectures as well.  As for all our
> 64bit architectures sizeof(long) == sizeof(long long) (as far as I am aware),
> I am not convinced this should be a (fatal) error.  There might be other
> similar cases.

int64_t = long != long long (although all the sizes are the same) is an
artifact.  We use the artificat mainly to detect errors so that many printf
formats don't need to be "fixed" yet again for 128-bit machines.

> So the question is, how strict should this check be?  Are there cases where we
> are better off with a "just do it"-sollution?

Just doing it is the correct method.  It requires the compiler to rewrite
the string (or replace the call to printf by calls to special formatting
functions).  E.g.,

  int64_t i64 = 1;
  long long ll = 1;
  int i = 1;
  printf("%I %I %I\n", i64, ll, i);

should produce the same code as:

  int64_t i64 = 1;
  long long ll = 1;
  int i = 1;
  printf("%" PRIi64 "%ld %d\n", i64, ll, i);

Message catalogs should probably use somthing like fmtcheck() to rewrite
the strings.

> As a community service, there is a right way to do this (according to C99):
>
> int64_t i = 1;
> printf("%" PRIi64 "\n", i);

This is a very wrong way to do this.  It makes the programmer hande details
that the compiler can handle better.

>
> but it's obvious this is not going to be adopted.  The other often used

Fortunately it is so ugly that no one uses it.

> workaround is:
>
> int64_t i = 1;
> printf("%jd\n", (intmax_t)i);

This is not a workaround, but the only reasonable way to do things (unless
%I exists).  It is easier to use than PRI*, and doesn't need changing if
the type of i is expanded.

> or:
>
> printf("%lld\n", (long long)i);

This is a wrong way to do this.  It uses the long long mistake, and only
works because i has type int64_t and long long is longer than int64_t
(the latter is standard).

> which kind of reverts the idea behind useing C99-types.
>
> Note that:
>
> printf("%jd\n, i);
>
> seems to work as well, but I not sure this is correct.

It is incorrect.  It assumes that int64_t has the same _size_ as intmax_t,
and would cause a compile time error if int64_t doesn't have the same
_type_ as intmax_t.

We use intmax_t == int64_t on all arches, but this is an even weirder
artifact than int64_t != long long, since it makes the "maximum" type
"smaller" (same in size, but logically shorter) than long long:

     char < short < int < long = int64_t = intmax_t ~< long long

Bruce
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: printf behaviour with illegal or malformed format string

Poul-Henning Kamp
In message <[hidden email]>, Bruce Evans writes:

>There is also fmtcheck(3).

I didn't even know about that one, but given that there is only two
uses in all of /src I do not feel ashamed.

>Extensions should rarely be needed for printf(),

Actually I disagree with you on that.

It was my list of "things I keep doing over and over" that convinced
me otherwise.

Here are some of the formats I miss, and which I will probably write
extensions for so people can trivially enable them:

        %T print a time_t
        %lT print a struct timeval
        %llT print a struct timespec
        %I print an IP#
        %lI print an IPv6#
        %H Hexdump
        %V stringvis a string
        %M Metric (like the "engineering" format on HP calculators)
        %H "Human" (Tera,Giga,Mega,Kilo{bits,bytes})

>>> I'm leaning towards doing what phkmalloc has migrated to over time:
>>> Make a variable which can select between "normal/paranoia" and force
>>> it to paranoia for (uid==0 || gid==0 || setuid || setgid).
>>>
>>> If the variable is set, a bogus format string will result in abort(2).
>
>This sometimes breaks defined behaviour.

It does ?  I didn't think there were defined behaviour for bogus
format strings ?

>>> If it is not set, the format string will be output unformatted in
>>> the message "WARNING: Illegal printf() format string: \"...\".
>
>malloc()'s messages are better ("<progname>: error ...").

Obviously.


--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
[hidden email]         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: printf behaviour with illegal or malformed format string

Peter Jeremy
In reply to this post by Bruce Evans
On Tue, 2005-Dec-13 18:53:15 +1100, Bruce Evans wrote:
>>>Our first line of defence against this kind of error is compile-time
>>>checking by GCC, but we cannot rely on the string being sane in libc,
>>>we still need to do error checking.
>
>There is also fmtcheck(3).

I'm probably not the only person who hadn't heard of this function
before reading your mail.  I'm not sure it is of much use other than
validating user-supplied format strings (and maybe internationalisation
translations via catgets(3)).

>I think most checking belongs in the compiler and in fmtcheck(3).

fmtcheck(3) can't check that the arguments passed to printf are valid.
All it can do is verify that two arbitrary format strings expect the
same arguments and the documentation states it doesn't even manage that:
     The fmtcheck() function does not understand all of the conversions
     that printf(3) does.
Note that fmtcheck(3) has no way of reporting that neither format
string makes sense - the documentation states that it will always
return one of its arguments.

If a programmer (mistakenly) believes that "%hf" is sensible then
fmtcheck(3) isn't going to help.

>printf() itself cannot detect most types errors without compiler
>support that would basically involve passing it the types of all
>args so that it could call fmtcheck(3).

This is an interesting idea but fmtcheck(3) isn't adequate.  In
particular, consider:
        int i, j;
        const char *fmt, *string;
        ...
        fmt = "%*.*s %p";
        ...
        printf(fmt, i, j, string, string);

Based on the printf() arguments, the compiler can't really do
better than:
        printf(fmtcheck(fmt, "%d %d %s %s"), i, j, string, string);
but fmtcheck(3) doesn't consider that "%*.*s %p" and "%d %d %s %s" are
equivalent.

What could be useful is a function that takes a format string and
a set of argument types and verifies that the argument types match
the format string.  The compiler could then expand the printf() to:
        if (fmtvalidate(fmt, "int;int;char*;char*"))
                printf(fmt, i, j, string, string);
        else
                abort();
(though it's not clear how the compiler should handle struct/union
pointers which might make sense to a user extension).

>>>If the variable is set, a bogus format string will result in abort(2).
>
>This sometimes breaks defined behaviour.

I thought printf(3) documented the behaviour for invalid conversion
specifiers and mofidiers but I can't find it in the man page right now.

--
Peter Jeremy
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: printf behaviour with illegal or malformed format string

Bruce Evans
In reply to this post by Poul-Henning Kamp
On Tue, 13 Dec 2005, Poul-Henning Kamp wrote:

> In message <[hidden email]>, Bruce Evans writes:
>
>> There is also fmtcheck(3).
>
> I didn't even know about that one, but given that there is only two
> uses in all of /src I do not feel ashamed.

I learned about it commit mail (or arch?) when Kris was sweeping for
security holes related to printf formats.

>> Extensions should rarely be needed for printf(),
>
> Actually I disagree with you on that.
>
> It was my list of "things I keep doing over and over" that convinced
> me otherwise.

Now I think they should be very rarely needed and more rarely used.
Using them mainly gives unportable code that breaks especially badly
on systems which don't support extensions.

> Here are some of the formats I miss, and which I will probably write
> extensions for so people can trivially enable them:
>
> %T print a time_t
> %lT print a struct timeval
> %llT print a struct timespec
> %I print an IP#
> %lI print an IPv6#
> %H Hexdump
> %V stringvis a string
> %M Metric (like the "engineering" format on HP calculators)
> %H "Human" (Tera,Giga,Mega,Kilo{bits,bytes})

I think these belong in specialized applications or libraries.  %T is
already handled better by strftime/gmtime/localtime.  It has lots of
subformats and delicate conversion issues.  A generic %T couldn't
reasonably support much more than "%[#0- +,]*.*T".  If a generic version
were implemented as a function in libc, then
printf("%T", asprintf_time_t(tt)) wouldn't be much harder to write than
printf("%T", tt), but storage management for it would be harder.  Maybe
you really want to write cout << tt :-).

>>>> I'm leaning towards doing what phkmalloc has migrated to over time:
>>>> Make a variable which can select between "normal/paranoia" and force
>>>> it to paranoia for (uid==0 || gid==0 || setuid || setgid).
>>>>
>>>> If the variable is set, a bogus format string will result in abort(2).
>>
>> This sometimes breaks defined behaviour.
>
> It does ?  I didn't think there were defined behaviour for bogus
> format strings ?

I mean aborting instead of returning NULL for failing malloc()s breaks
defined behaviour.

Bruce
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: printf behaviour with illegal or malformed format string

Poul-Henning Kamp
In message <[hidden email]>, Bruce Evans writes:

>Now I think they should be very rarely needed and more rarely used.
>Using them mainly gives unportable code that breaks especially badly
>on systems which don't support extensions.

Portability is good, but it shouldn't get in the way of improving
our programs.

>I think these belong in specialized applications or libraries.  %T is
>already handled better by strftime/gmtime/localtime.

There's no handling for fractional seconds there.

>I mean aborting instead of returning NULL for failing malloc()s breaks
>defined behaviour.

Right, that's deliberate.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
[hidden email]         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: printf behaviour with illegal or malformed format string

Bruce Evans
In reply to this post by Peter Jeremy
On Tue, 13 Dec 2005, Peter Jeremy wrote:

> On Tue, 2005-Dec-13 18:53:15 +1100, Bruce Evans wrote:
>>>> Our first line of defence against this kind of error is compile-time
>>>> checking by GCC, but we cannot rely on the string being sane in libc,
>>>> we still need to do error checking.
>>
>> There is also fmtcheck(3).
>
> I'm probably not the only person who hadn't heard of this function
> before reading your mail.  I'm not sure it is of much use other than
> validating user-supplied format strings (and maybe internationalisation
> translations via catgets(3)).

I think internationalisation is its main use now.

>> I think most checking belongs in the compiler and in fmtcheck(3).
>
> fmtcheck(3) can't check that the arguments passed to printf are valid.
> All it can do is verify that two arbitrary format strings expect the
> same arguments and the documentation states it doesn't even manage that:
>     The fmtcheck() function does not understand all of the conversions
>     that printf(3) does.

Its example also starts by saying that %p is compatible with %lu.
Punning "void *" as "unsigned long" might be safe enough if these types
have the same size, but fmtcheck()'s code doesn't seem to have any
size checks.

> If a programmer (mistakenly) believes that "%hf" is sensible then
> fmtcheck(3) isn't going to help.

This can be checked statically -- check the default at compile time;
then any format that actually consumes the same arg types as the
default will consume the ones passed.

fmtcheck() could also rewrite the suspect format -- you could put
all type info in the default and all other info in the suspect format
and merge at runtime.  E.g., afmtcheck("%-*I %#8I", "%*s%jx") =
strdup("%-*s %#8jx").

>> printf() itself cannot detect most types errors without compiler
>> support that would basically involve passing it the types of all
>> args so that it could call fmtcheck(3).
>
> This is an interesting idea but fmtcheck(3) isn't adequate.  In
> particular, consider:
> int i, j;
> const char *fmt, *string;
> ...
> fmt = "%*.*s %p";
> ...
> printf(fmt, i, j, string, string);
>
> Based on the printf() arguments, the compiler can't really do
> better than:
> printf(fmtcheck(fmt, "%d %d %s %s"), i, j, string, string);
> but fmtcheck(3) doesn't consider that "%*.*s %p" and "%d %d %s %s" are
> equivalent.

So more that ordinary C type info is needed.  The compile would have to
parse the format string (which is possible in the above since fmt is
const and visible but not in all cases) and pass "%*.*s %p" not
"%d %d %s %s".

> What could be useful is a function that takes a format string and
> a set of argument types and verifies that the argument types match
> the format string.  The compiler could then expand the printf() to:
> if (fmtvalidate(fmt, "int;int;char*;char*"))
> printf(fmt, i, j, string, string);
> else
> abort();
> (though it's not clear how the compiler should handle struct/union
> pointers which might make sense to a user extension).

Both this and my afmtcheck() example could be merged into printf()
itself.  printf("%*.*s %p", width, prec, s, s1) could be transformed
into printf("%C%*.*s %p", "%*.*s%p", width, prec, s, (void *)s1).  %C
indiciates a check arg and that arg is "%*.*s%p".  The check arg is
amost exactly the original string here but could be very different if
the compiler doesn't understand the format or the check arg is from a
message catalog.  Here the compiler understands the format and does
the following transformations:
- %*.*s -> itself.  No problem since the args match.
- " " -> "".  Not related to args.
- "%s" -> "%p".  The args don't match, but this is harmless for char *
   vs void * and the compiler has silently fixed the arg type to match
   the format.  Perhaps it should leave this to printf().

I don't see any special problem with struct/union pointers.  The
compiler just won't usually be able to understand the pointed-to
objects or parts of the format string related to them.  An escape
sequence might be needed to tell it not to warn about extensions,
but pass the escaped format and args directly to the format
checker.

>>>> If the variable is set, a bogus format string will result in abort(2).
>>
>> This sometimes breaks defined behaviour.

I meant in malloc().

> I thought printf(3) documented the behaviour for invalid conversion
> specifiers and mofidiers but I can't find it in the man page right now.

The C standard mostly says "undefined" for nonstandard things in fprintf().
>From n869.txt:

        [#9] If a conversion specification is invalid, the  behavior
        is  undefined.225)  ...

Footnote 225 points to future library extensions.  Conversions specs
have to be undefined so that everyone can define inconsistent extensions :).

Bruce
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: Re: printf behaviour with illegal or malformed format string

Sergey Babkin
In reply to this post by Poul-Henning Kamp
>From: Poul-Henning Kamp <[hidden email]>

>In message <[hidden email]>, George Paplas writes:
>>
>>
>>--- Poul-Henning Kamp <[hidden email]> wrote:
>>
>>> >>If it is not set, the format string will be output unformatted in
>>> >>the message "WARNING: Illegal printf() format string: \"...\".
>>> >
>>> >Since this check presumably applies to the entire *printf() family,
>>> >where do you report the error for {s,f}printf()?
>>>
>>> Whereever the strings was meant to go, what else can I do ?
>>
>>And what if you are doing an sprintf to a buffer smaller than your
>>warning message?
>
>"Too f**king bad"
>
>It's the programmer who's made a mess in the first place, so he
>can't really make many demands in this situation...

Probably the least intrusive thing is to print the original
message to the best ability of deciphering it, and then
print the error message on stderr (kind of like the message
printed on the first usage of gets() and such).

-SB
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"