[CFT] BSDL iconv in base system

Gabor Kovesdan-3
Hello Folks,

Last summer, Google generously funded my Summer of Code project, which
aimed to provide a BSD-licensed iconv implementation for FreeBSD. I'm
proud to announce that the work has been completed and a patch is
available to add it to the base system.

The results of this work are:
- The Citrus implementation has been ported from NetBSD.
- Some utilities have been added. There is a conversion table generator,
which can compare conversion tables to reference data generated by GNU
libiconv. This helps ensure conversion compatibility.
- UTF-16 surrogate support and some endianness issues have been fixed.
- The rather chaotic Makefiles that build the metadata have been
refactored and cleaned up; they are now easy to read, and adding
support for new encodings is easier.
- A bunch of new encodings and encoding aliases have been added.
- Support for 1->2, 1->3 and 1->4 mappings has been added; it is needed
for transliterating with flying accents as GNU does, like "u.
- Lots of warnings have been fixed; most of the code is now WARNS=6
clean.
- New section 1 and section 5 manual pages have been added.
- Some GNU-specific calls have been implemented: iconvlist(),
iconvctl(), iconv_canonicalize(), iconv_open_into()
- Support for GNU's //IGNORE suffix has been added.
- The "-" argument for stdin is now recognized in iconv(1) as per POSIX.
- The Big5 conversion module has been fixed.
- The iconv.h header file is supposed to be compatible with the GNU
version, i.e. sources should build with the base iconv.h and GNU
libiconv. I've just done a very quick test and it seems ports can
safely link to GNU libiconv; there's no conflict.
- Various cleanups and style(9) fixes.
- A bachelor's thesis, written in Hungarian:
http://www.kovesdan.org/files/bsc_iconv.pdf
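
As a rough illustration of the UTF-16 surrogate item above, here is a
minimal sketch of the usual iconv_open()/iconv()/iconv_close() sequence
converting UTF-8 to UTF-16LE. The helper name utf8_to_utf16le() is
hypothetical and not part of the patch, and the void * cast on the input
buffer argument is an assumption made so the sketch builds whether
iconv.h declares that argument as char ** or const char **.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Convert a NUL-terminated UTF-8 string to UTF-16LE.
 * Returns the number of bytes written to dst, or (size_t)-1 on failure.
 * Hypothetical helper for illustration only.
 */
size_t
utf8_to_utf16le(const char *src, unsigned char *dst, size_t dstsize)
{
    iconv_t cd;
    char *in = (char *)src;     /* iconv() does not modify the input bytes */
    char *out = (char *)dst;
    size_t inleft = strlen(src);
    size_t outleft = dstsize;
    size_t ret;

    cd = iconv_open("UTF-16LE", "UTF-8");   /* (tocode, fromcode) */
    if (cd == (iconv_t)-1)
        return ((size_t)-1);

    /* void * converts implicitly to either char ** or const char ** in C. */
    ret = iconv(cd, (void *)&in, &inleft, &out, &outleft);
    iconv_close(cd);
    if (ret == (size_t)-1)
        return ((size_t)-1);

    return (dstsize - outleft);
}
```

For the UTF-8 bytes F0 9F 98 80 (U+1F600), a converter with working
surrogate support should emit the pair D83D DE00, i.e. the four bytes
3D D8 00 DE.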

The rather big patch (42.5 MB) is available here:
http://www.kovesdan.org/patches/iconv_base_integrate.diff

Any comments, suggestions or bug reports are very welcome.

--
Gabor Kovesdan
FreeBSD Volunteer

EMAIL:[hidden email]  .:|:.[hidden email]
WEB:http://people.FreeBSD.org/~gabor  .:|:.http://kovesdan.org

_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-i18n
To unsubscribe, send any mail to "[hidden email]"

Re: [CFT] BSDL iconv in base system

Brandon Gooch
On Mon, Jun 14, 2010 at 7:13 PM, Gabor Kovesdan <[hidden email]> wrote:

> The rather big patch (42.5 MB) is available here:
> http://www.kovesdan.org/patches/iconv_base_integrate.diff

Over 40 Megabytes?! WOW. Thank you for this incredible amount of work,
I know the FreeBSD community will benefit greatly from it.

I think this effort deserves some hardcore testing, so now to the
FreeBSD community -- I know it will get the attention it deserves :)

-Brandon

Re: [CFT] BSDL iconv in base system

Gleb Kurtsou-3
In reply to this post by Gabor Kovesdan-3
On (15/06/2010 02:13), Gabor Kovesdan wrote:

> Any comments, suggestions or bug reports are very welcome.

Awesome! Thanks for working on it.

Are there any plans to resurrect or finish the multibyte collation
support project from GSoC 2008?
http://wiki.freebsd.org/KonradJankowski/Collation

And are you aware of any plans to add a UTF-8-aware regex? I think
NetBSD has already imported one:
http://blog.netbsd.org/tnf/entry/efficient_wide_character_regular_expressions

Thanks,
Gleb.

Re: [CFT] BSDL iconv in base system

Gabor Kovesdan-3

> Are there any plans to resurrect or finish the multibyte collation
> support project from GSoC 2008?
> http://wiki.freebsd.org/KonradJankowski/Collation

Yes, my queue is just so long that I haven't got there yet. I'm in SoC
2010 again with a different project, and there's still BSD grep from SoC
2008; I'm fixing the last nits of that as well. There are also personal
things, like a one-year internship in Portugal, which starts in
September. But I hope I will find time for this eventually.

> And are you aware of any plans to add a UTF-8-aware regex? I think
> NetBSD has already imported one:
> http://blog.netbsd.org/tnf/entry/efficient_wide_character_regular_expressions

Yes again, but the same issues apply. :) Besides, we should add more
relaxed regex support to TRE before we can adopt it. GNU regex allows
things like [a|], which is not standard, so we should support them to
maintain compatibility. This will be important for ports. It is also
the reason why BSD grep is linked against GNU regex instead of
libc-regex.

--
Gabor Kovesdan
FreeBSD Volunteer

EMAIL: [hidden email] .:|:. [hidden email]
WEB:   http://people.FreeBSD.org/~gabor .:|:. http://kovesdan.org

Re: [CFT] BSDL iconv in base system

Konrad Jankowski
In reply to this post by Gleb Kurtsou-3
On Tue, Jun 15, 2010 at 7:01 PM, Gleb Kurtsou <[hidden email]> wrote:

> Awesome! Thanks for working on it.
>
> Are there any plans to resurrect or finish the multibyte collation
> support project from GSoC 2008?
> http://wiki.freebsd.org/KonradJankowski/Collation

Hi. The project is not dead. I've resumed actively working on it.
Expect some patches/commits soon.


--
Konrad

Re: [CFT] BSDL iconv in base system

Jaakko Heinonen
In reply to this post by Gabor Kovesdan-3

Hi,

On 2010-06-15, Gabor Kovesdan wrote:
> - The iconv.h header file is supposed to be compatible with the GNU
> version, i.e. sources should build with the base iconv.h and GNU
> libiconv. I've just done a very quick test and it seems ports can
> safely link to GNU libiconv; there's no conflict.

> The rather big patch (42.5 MB) is available here:
> http://www.kovesdan.org/patches/iconv_base_integrate.diff

iconv(3) prototype doesn't conform to POSIX.1-2008. Is it a
well-considered decision?

--
Jaakko

Re: [CFT] BSDL iconv in base system

Gabor Kovesdan-3

> iconv(3) prototype doesn't conform to POSIX.1-2008. Is it a
> well-considered decision?
>    
No, it was just like that in the Citrus version and I didn't notice the
const qualifier. Fixed in my working copy, will be available soon with
some minor modifications. Thanks for reporting this.

--
Gabor Kovesdan
FreeBSD Volunteer

EMAIL: [hidden email] .:|:. [hidden email]
WEB:   http://people.FreeBSD.org/~gabor .:|:. http://kovesdan.org

Re: [CFT] BSDL iconv in base system

Dag-Erling Smørgrav
In reply to this post by Jaakko Heinonen
Jaakko Heinonen <[hidden email]> writes:
> iconv(3) prototype doesn't conform to POSIX.1-2008. Is it a
> well-considered decision?

Probably not, because it breaks the interface.

Imagine that inbuf were just a char *, not a char **.  It would be
perfectly safe to change it to const char *, because you can always
assign a char * to a const char *.

However, inbuf is a char **, which is a pointer to a pointer to char.
Gabor changed it to const char **, which is a pointer to a pointer to
const char.  Unfortunately, the two types are incompatible.  If foo is a
char *, you can't pass &foo as inbuf.

% cat >/tmp/const.c <<EOF
#include <stdio.h>
void fs(char *s) { puts(++s); }
void gs(const char *s) { puts(++s); }
void fsp(char **sp) { puts(++*sp); }
void gsp(const char **sp) { puts(++*sp); }
int main() { char *s = "xyzzy", **sp = &s; fs(s); gs(s); fsp(sp); gsp(sp); }
EOF
% cc -Wall -Wextra -Werror -std=c99 -o/dev/null /tmp/const.c
cc1: warnings being treated as errors
/tmp/const.c: In function ‘main’:
/tmp/const.c:6: error: passing argument 1 of ‘gsp’ from incompatible pointer type
/tmp/const.c:5: note: expected ‘const char **’ but argument is of type ‘char **’

This means you can't, say, read data from a file into a buffer and then
pass that buffer to iconv, because the buffer is not const (otherwise
you couldn't have read data into it).  That seems like a pretty
fundamental flaw.
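
C rejects the implicit char ** to const char ** conversion because it
would let a callee store the address of a genuinely const object through
the pointer, which the caller could then modify via its own char *. A
minimal sketch, under stated assumptions, of how a caller holding a
writable buffer can still satisfy a const char ** parameter by going
through an intermediate const char * variable; skip_spaces() and
trim_leading() are hypothetical helpers standing in for iconv() and its
caller:

```c
#include <stddef.h>

/*
 * Hypothetical consumer with a const-qualified prototype, in the same
 * shape as gsp() above: it advances *sp but never writes through it.
 */
size_t
skip_spaces(const char **sp)
{
    size_t n = 0;

    while (**sp == ' ') {
        ++*sp;
        n++;
    }
    return (n);
}

/* Hypothetical caller that owns a writable buffer. */
size_t
trim_leading(char *buf, char **rest)
{
    const char *p = buf;    /* char * to const char * is always allowed */
    size_t n;

    n = skip_spaces(&p);    /* &p is a const char **, so no diagnostic */
    *rest = buf + n;        /* recover a writable pointer by offset, not by cast */
    return (n);
}
```

The cost is an extra variable at every call site, plus the offset
arithmetic whenever the caller needs the advanced position back as a
writable pointer.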

DES
--
Dag-Erling Smørgrav - [hidden email]

Re: [CFT] BSDL iconv in base system

Jilles Tjoelker
In reply to this post by Jaakko Heinonen
On Wed, Jun 16, 2010 at 10:04:16PM +0300, Jaakko Heinonen wrote:
> On 2010-06-15, Gabor Kovesdan wrote:
> > - The iconv.h header file is supposed to be compatible with the GNU
> > version, i.e. sources should build with the base iconv.h and GNU
> > libiconv. I've just done a very quick test and it seems ports can
> > safely link to GNU libiconv; there's no conflict.

> > The rather big patch (42.5 MB) is available here:
> > http://www.kovesdan.org/patches/iconv_base_integrate.diff

> iconv(3) prototype doesn't conform to POSIX.1-2008. Is it a
> well-considered decision?

I think the difference from POSIX.1-2008 is pretty common and may
therefore cause fewer compilation problems. NetBSD's Citrus iconv and GNU
iconv have the extra 'const', and so does the default Solaris iconv.
(Solaris has a separate iconv for standards-conforming applications with
the POSIX prototype.)

--
Jilles Tjoelker

Re: [CFT] BSDL iconv in base system

Anonymous-86
Jilles Tjoelker <[hidden email]> writes:

> On Wed, Jun 16, 2010 at 10:04:16PM +0300, Jaakko Heinonen wrote:
>> On 2010-06-15, Gabor Kovesdan wrote:
>> > - The iconv.h header file is supposed to be compatible with the GNU
>> > version, i.e. sources should build with the base iconv.h and GNU
>> > libiconv. I've just done a very quick test and it seems ports can
>> > safely link to GNU libiconv; there's no conflict.
>
>> > The rather big patch (42.5 MB) is available here:
>> > http://www.kovesdan.org/patches/iconv_base_integrate.diff
>
>> iconv(3) prototype doesn't conform to POSIX.1-2008. Is it a
>> well-considered decision?
>
> I think the difference from POSIX.1-2008 is pretty common and may
> therefore cause fewer compilation problems. NetBSD's Citrus iconv and GNU
> iconv have the extra 'const', and so does the default Solaris iconv

GNU iconv doesn't use `const' by default. Our port adds it explicitly.

    CONFIGURE_ENV= gl_cv_cc_visibility="no" \
                   am_cv_func_iconv="yes" \
                   am_cv_proto_iconv_arg1="const"

For example, devel/git handles the `const' variant through its OLD_ICONV
macro.

BTW, iconv() on DragonFlyBSD doesn't seem to have `const' either.
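
A sketch of how portable code commonly copes with both prototypes at
build time. It assumes an ICONV_CONST macro of the kind that the
gettext AM_ICONV autoconf check defines from am_cv_proto_iconv_arg1, as
far as I know; the fallback definition and the helper latin1_to_utf8()
are illustrative assumptions, not code taken from the port or from git.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Normally supplied by the build system as either "const" or nothing.
 * The empty fallback assumes the installed iconv() takes a plain char **.
 */
#ifndef ICONV_CONST
#define ICONV_CONST
#endif

/*
 * Convert a NUL-terminated ISO-8859-1 string to UTF-8.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or (size_t)-1 on failure. Hypothetical helper for illustration only.
 */
size_t
latin1_to_utf8(const char *src, char *dst, size_t dstsize)
{
    iconv_t cd;
    ICONV_CONST char *in = (ICONV_CONST char *)src;
    char *out = dst;
    size_t inleft = strlen(src);
    size_t outleft;
    size_t ret;

    if (dstsize == 0)
        return ((size_t)-1);
    outleft = dstsize - 1;      /* keep one byte for the NUL */

    cd = iconv_open("UTF-8", "ISO-8859-1");     /* (tocode, fromcode) */
    if (cd == (iconv_t)-1)
        return ((size_t)-1);

    /* &in has whichever pointer type the local prototype expects. */
    ret = iconv(cd, &in, &inleft, &out, &outleft);
    iconv_close(cd);
    if (ret == (size_t)-1)
        return ((size_t)-1);

    *out = '\0';
    return ((size_t)(out - dst));
}
```

With this pattern the call site stays the same on every platform; only
the macro definition chosen at configure time changes.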

> (Solaris has a separate iconv for standards-conforming applications with
> the POSIX prototype.)