mtree "language" enhancements

mtree "language" enhancements

Warner Losh
Greetings,

As part of making NanoBSD buildable by non-root, I've found a need to have
a richer mtree language than we currently have.

mtree started out as a language to express hierarchies of files. It does a
decent job at that, even if some of the tools that we have in the tree
aren't so great about manipulating them. One could easily wish for better
tools, but that's not the topic of this thread.

So, I've started to move the language into one that can also journal
changes to a tree, and have been moving NanoBSD to using wrappers that make
the changes to the tree and record the journal events at the end of the
metalog produced by buildworld. I have a second tool that reads the
metalog, applies the actions to the earlier entries, and then produces a
final metalog that's used for makefs. These tools are still evolving, but
before I got too close to the point of committing, I thought I'd post a
proposed extension to mtree for comments so I don't have to change too much.

I'd like a new type called 'action' (so type=action in the records). This
type is defined loosely to manipulate an earlier entry (or maybe entries,
still unsure) in the file.

Each action entry would have an 'action' keyword. The keywords I've defined
so far are as follows:
1. "unlink" which throws away the previous entry. That entry has been
removed. It may apply to files or directories, but it is an error not to
remove all entries in a directory when removing the directory.
2. "move" which relocates a previous entry. An additional targetpath
keyword specifies the ultimate destination for this entry.
3. "copy" which duplicates a previous entry. It too takes targetpath.
4. "meta" which changes the metadata of the previous entry. All keywords
on this are merged with the previous entry.
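A minimal Python sketch of replaying such a journal, assuming one full-path record per line ("path key=value ..."), no quoted values, no /set or ".." handling, and a targetpath that is relative to the top of the tree. The helper names (parse_record, replay, emit) are hypothetical and this is not the awk tool described above. Note that move and copy here only relocate the named entry, not the children of a directory.

```python
def parse_record(line):
    """Split 'path key=value ...' into (path, {key: value}); no quoting support."""
    path, *fields = line.split()
    kw = {}
    for field in fields:
        key, _, value = field.partition("=")
        kw[key] = value
    return path, kw


def replay(lines):
    """Apply type=action records to earlier entries, returning {path: keywords}."""
    tree = {}  # dicts keep insertion order, so output follows input order
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        path, kw = parse_record(line)
        if kw.get("type") != "action":
            tree[path] = kw  # ordinary entry: create (or redefine) the node
            continue
        del kw["type"]
        action = kw.pop("action")
        if action == "unlink":
            # It is an error to remove a directory that still has entries.
            if any(p.startswith(path + "/") for p in tree):
                raise ValueError(f"unlink of non-empty directory {path}")
            del tree[path]
        elif action in ("move", "copy"):
            target = kw.pop("targetpath")
            entry = dict(tree[path])  # copy the keywords (children are not moved)
            if action == "move":
                del tree[path]
            tree[target] = entry
        elif action == "meta":
            tree[path].update(kw)  # merge the remaining keywords into the entry
        else:
            raise ValueError(f"unknown action {action!r}")
    return tree


def emit(tree):
    """Render the final entries as plain mtree records, free of actions."""
    return [" ".join([p] + [f"{k}={v}" for k, v in kw.items()])
            for p, kw in tree.items()]


journal = [
    "./etc/ttys type=file uname=root gname=wheel mode=0644",
    "./etc/motd type=file uname=root gname=wheel mode=0644",
    "./etc/motd type=action action=unlink",
    "./etc/ttys type=action action=copy targetpath=./etc/ttys.orig",
    "./etc/ttys type=action action=meta mode=0600",
]
for record in emit(replay(journal)):
    print(record)
```

Running it leaves ./etc/ttys with mode=0600 and a ./etc/ttys.orig copy that kept mode=0644, because the copy record comes before the meta record in the journal.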

The one other thing that my merging tool does is to remove all size
keywords. In the NanoBSD environment, size is irrelevant. Files are
replaced and appended to all the time in the build process, and it doesn't
make sense to track the size. makefs fails if the size is different, so
post-processing of the tree, say to add a new default to
/etc/defaults/rc.conf or to tweak /etc/ttys to turn on/off a tty (or append
a new entry), will cause it to fail. It would be nice if mtree could do
this, but it simply can't (but see above for whining about better tools
being beyond the scope of this).
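Dropping size amounts to a simple filter over the final records. A sketch in Python, assuming the same "path key=value ..." full-path record shape (no quoting) and leaving comment and /set-style lines untouched; strip_keywords is a hypothetical name.

```python
def strip_keywords(record, drop=("size",)):
    """Remove the named keywords from one full-path mtree record line."""
    if not record.strip() or record.lstrip().startswith(("#", "/")):
        return record  # leave blank, comment and /set-style lines alone
    path, *fields = record.split()
    kept = [f for f in fields if f.partition("=")[0] not in drop]
    return " ".join([path] + kept)


print(strip_keywords("./etc/ttys type=file mode=0644 size=1234"))
```

Passing a longer drop tuple (for example adding digest keywords) extends the same filter to other fields that go stale when files are edited in place.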

If things go well, we could eventually move these extensions into mtree so
that the post-processing stage is no longer necessary. I'm content to
maintain the hundred or two lines of awk I've written to implement it. I
chose awk because it does the job well enough, though python might do it
better. But I don't want to talk about that choice since right now it is
purely internal to NanoBSD (though I hope that other build orchestration
systems like src/release and crochet will look to adopt it).

Comments?

Warner
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"

Re: mtree "language" enhancements

Poul-Henning Kamp
--------
In message <[hidden email]>, Warner Losh writes:

>As part of making NanoBSD buildable by non-root, I've found a need to have
>a richer mtree language than we currently have.

>I'd like a new type called 'action' (so type=action in the records). This
>type is defined loosely to manipulate and earlier entry (or maybe entries,
>still unsure) in the file.

I suggest you define this so that all records have an action, and that
the default action is "create"

>2. "move" which relocates a previous entry. An additional targetpath
>keyword specifies the ultimate destination for this entry.
>3. "copy" which duplicates a previous entry. It too takes targetpath.

Is targetpath absolute or relative ?

Can it reach out of the mtree root ?

>4. "meta" which changes the meta data of the previous entry. All keywords
>on this are merged with the previous entry.

System-III called this "chmog" if I recall correctly :-)

>The one other thing that my merging tool does is to remove all size
>keywords.

That sounds wrong to me.  Shouldn't you just emit "meta" records updating
the size as appropriate ?

What about digest fields ?

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
[hidden email]         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

Re: mtree "language" enhancements

Warner Losh
On Sun, Nov 29, 2015 at 11:16 AM, Poul-Henning Kamp <[hidden email]> wrote:

> --------
> In message <[hidden email]>, Warner Losh writes:
>
> >As part of making NanoBSD buildable by non-root, I've found a need to have
> >a richer mtree language than we currently have.
>
> >I'd like a new type called 'action' (so type=action in the records). This
> >type is defined loosely to manipulate and earlier entry (or maybe entries,
> >still unsure) in the file.
>
> I suggest you define this so that all records have an action, and that
> the default action is "create"


I hadn't considered this from a practical point of view, but it is a
logical consequence of these extensions.


>
> >2. "move" which relocates a previous entry. An additional targetpath
> >keyword specifies the ultimate destination for this entry.
> >3. "copy" which duplicates a previous entry. It too takes targetpath.
>
> Is targetpath absolute or relative ?
>

Relative to the top of the tree.


> Can it reach out of the mtree root ?


Nope. Those cases need entirely new entries.
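Those two answers translate into a small validation step. A Python sketch, assuming targetpath is normalized lexically with posixpath.normpath (symlinks inside the tree are not resolved, so a link could still point outside the root); resolve_targetpath is a hypothetical helper.

```python
import posixpath


def resolve_targetpath(targetpath):
    """Normalize a targetpath relative to the top of the tree.

    Rejects absolute paths and paths whose '..' components climb above the
    root; such cases need entirely new entries instead.
    """
    if targetpath.startswith("/"):
        raise ValueError(f"targetpath must be relative: {targetpath}")
    norm = posixpath.normpath(targetpath)
    if norm == ".." or norm.startswith("../"):
        raise ValueError(f"targetpath escapes the tree root: {targetpath}")
    return "." if norm == "." else "./" + norm


print(resolve_targetpath("etc/../etc/rc.conf"))  # ./etc/rc.conf
```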


>
> >4. "meta" which changes the meta data of the previous entry. All keywords
> >on this are merged with the previous entry.
>
> System-III called this "chmog" if I recall correctly :-)


I love that term. I'll steal it :)


>
> >The one other thing that my merging tool does is to remove all size
> >keywords.
>
> That sounds wrong to me.  Shouldn't you just emit "meta" records updating
> the size as appropriate ?
>

Emitting records that change the size is possible, but it would add an
extra step. It's easy to catch mv, rm, etc., but hard to catch >>. I took
the easy way out of just ignoring size changes, though one could add a
"nano_resize <path>" command that you would need to call after changing the
size of a file in the post-processing phase.
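One way such a nano_resize command could work, sketched in Python: it reads the current size of the staged file and emits a meta record in the type=action syntax proposed at the start of the thread. The function and its arguments are hypothetical; nothing like it is claimed to exist in NanoBSD.

```python
import os
import tempfile


def nano_resize(tree_root, relpath):
    """Hypothetical helper: emit a meta record carrying a file's current size.

    tree_root is the staged image root; relpath is relative to it
    (a leading './' is accepted).
    """
    if relpath.startswith("./"):
        relpath = relpath[2:]
    size = os.path.getsize(os.path.join(tree_root, relpath))
    return f"./{relpath} type=action action=meta size={size}"


# Demo: write a staged file, then record its size.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "etc"))
    with open(os.path.join(root, "etc", "ttys"), "w") as f:
        f.write("console none unknown off secure\n")
    print(nano_resize(root, "./etc/ttys"))
```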


> What about digest fields ?
>

In my use case, they are irrelevant. They aren't generated by buildworld's
metalog, and aren't generally useful. They might add some protection
against tampering between when the tree is created and when it is put into
a partition, but that's racy: if an attacker can replace the file after it
is created but before the checksum is run, they win. So there's little
value here for me.

However, having said that, digest fields either should be discarded (for
the same reason as size), or they should be correct before the dedup tool /
enhanced mtree gets to them. This gets into the nuts and bolts of NanoBSD:
we copy files around all the time, but have no spec for them. The usual
answer is to have a bunch of chmod / chown calls that 'fix' them up and to
generate an mtree for the image so you can protect against corruption in
the field (or at least know what changed). In a nopriv build, you need to
somehow record these changes. Do I continue the traditional behavior, do I
require a new mtree spec for all the files you wish to copy and use that to
modify the metalog, or do I hack the permissions directly for the
priv-build case? The decision between discard and check is likely an input
to the dedup tool. For NanoBSD the default will likely be discard. But
other tools might want to check, and some NanoBSD users may wish to climb
the hill to being correct by adding calls to correct the size everywhere.
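The discard-versus-check choice could be a single policy parameter on the dedup step. A Python sketch under these assumptions: entries are already merged into {path: keywords}, paths are relative to the staged tree root, a missing type is treated as a file, and only the digest keywords listed in DIGESTS are checked (others, such as rmd160digest, are left alone). finalize and DIGESTS are hypothetical names.

```python
import hashlib
import os
import tempfile

# mtree digest keywords mapped to hashlib algorithm names (a subset).
DIGESTS = {"md5digest": "md5", "sha1digest": "sha1",
           "sha256digest": "sha256", "sha384digest": "sha384",
           "sha512digest": "sha512"}


def finalize(entries, root, policy="discard"):
    """Return a copy of entries with size/digests discarded or verified."""
    if policy not in ("discard", "check"):
        raise ValueError(f"unknown policy {policy!r}")
    out = {}
    for path, kw in entries.items():
        kw = dict(kw)
        if policy == "discard":
            for key in ("size", *DIGESTS):
                kw.pop(key, None)
        elif kw.get("type", "file") == "file":
            full = os.path.join(root, path)
            if "size" in kw and os.path.getsize(full) != int(kw["size"]):
                raise ValueError(f"{path}: size mismatch")
            for key, alg in DIGESTS.items():
                if key in kw:
                    with open(full, "rb") as f:
                        actual = hashlib.new(alg, f.read()).hexdigest()
                    if actual != kw[key]:
                        raise ValueError(f"{path}: {key} mismatch")
        out[path] = kw
    return out


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "motd"), "wb") as f:
        f.write(b"hello\n")
    entries = {"./motd": {"type": "file", "size": "6",
                          "sha256digest": hashlib.sha256(b"hello\n").hexdigest()}}
    print(finalize(entries, root, "check") == entries)  # everything matches
    print(finalize(entries, root))                      # size and digest dropped
```

Defaulting to "discard" mirrors the NanoBSD default suggested above, while a tool that guards precious files can pass "check".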

My first goal is to create a tool that produces correct images with
the right permissions. A secondary goal would be to safeguard the process
from unintended changes that would be caught by size and/or digest
changes. It isn't a current feature of NanoBSD, but that doesn't make
it undesirable. Especially if your NanoBSD build process puts precious
files onto the media that you want to make sure the rest of the build
process doesn't tamper with accidentally to guard against bugs...

Warner

Re: mtree "language" enhancements

Tim Kientzle-2
In reply to this post by Warner Losh
Sounds interesting.

Have you talked with Michal (CCed) who is working on a libmtree library?

The capabilities you're describing here really need to be bundled into a library, I think.  In particular, the ability to "unlink", "copy", etc, is much more useful if you can directly query the mtree file contents to perform conditional changes.  (For example, it may be important to remove an empty directory which requires you to be able to query whether a directory has files in it.)

I would also be interested in a description of the processing model.  It sounds like you're assuming the same model used by the current mtree program -- mtree files are processed sequentially line-by-line as they are read.

For instance, libarchive's mtree processor works differently; it reads the entire input, merging redundant lines for the same file, and then processes the list.  This is more explicitly declarative, and simplifies things like modifying the ownership or permissions of already-listed files.
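For comparison, the read-everything-then-merge model can be sketched in a few lines of Python. This is not libarchive's code; it only assumes the behavior described here: later lines for the same path override or add keywords, and processing happens once the whole input has been read. merge_entries is a hypothetical name, and quoting and /set are ignored.

```python
def merge_entries(lines):
    """Read all records, merging redundant lines for the same path."""
    entries = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        path, kw = fields[0], {}
        for field in fields[1:]:
            key, _, value = field.partition("=")
            kw[key] = value
        # The first line creates the entry; later lines update its keywords.
        entries.setdefault(path, {}).update(kw)
    return entries


spec = [
    "./bin/sh type=file uname=root gname=wheel mode=0555",
    "./etc/ttys type=file uname=root gname=wheel mode=0644",
    "./bin/sh mode=0755",  # a later line only changes the mode
]
print(merge_entries(spec)["./bin/sh"])
```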

> Each action entry would have an 'action' keyword.

In terms of the language per se, this seems unnecessary.    I've proposed alternate language below that omits the unnecessary "type=action" by just adding new keywords.

> The keywords I've defined
> so far are as follows:
> 1. "unlink" which throws away the previous entry. That entry has been
> removed. It may apply to files or directories, but it is an error not to
> remove all entries in a directory when removing the directory.

# When set on an entry, a matching file on disk will be removed.
# This would also be useful for things like ObsoleteFiles
unlink=true

> 2. "move" which relocates a previous entry. An additional targetpath
> keyword specifies the ultimate destination for this entry.

# When set on an entry, moves the existing file to the new name
rename=<targetpath>

# Example
foo/bar type=file owner=root mode=0755 rename=foo/baz

> 3. "copy" which duplicates a previous entry. It too takes target path.

# As with rename, except it copies the contents.
copy_from=<original>

# properties that are not specified will be copied as well
# Create foo/bar by copying foo/baz, preserving all attributes
foo/bar type=file copy_from=foo/baz
# Create foo/bar as above, but modify the owner
foo/bar owner=dialer type=file copy_from=foo/baz
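A sketch of how a processor might apply these keywords to already-merged entries, in Python. Assumptions: entries is a {path: keywords} mapping, unlink only drops the entry (it does not touch files on disk), and the owner keyword is spelled uname, as corrected later in the thread. apply_keywords is a hypothetical name.

```python
def apply_keywords(entries, path, kw):
    """Apply one record that may carry unlink=, copy_from= or rename=."""
    kw = dict(kw)
    if kw.pop("unlink", None) == "true":
        entries.pop(path, None)  # drop the entry entirely
        return
    source = kw.pop("copy_from", None)
    if source is not None:
        # Unspecified properties are copied from the original.
        entries[path] = {**entries[source], **kw}
        return
    target = kw.pop("rename", None)
    merged = {**entries.get(path, {}), **kw}
    if target is not None:
        entries.pop(path, None)
        entries[target] = merged  # same properties, new name
    else:
        entries[path] = merged


entries = {"foo/baz": {"type": "file", "uname": "root", "mode": "0644"}}
apply_keywords(entries, "foo/bar",
               {"uname": "dialer", "type": "file", "copy_from": "foo/baz"})
print(entries["foo/bar"])
```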

> 4. "meta" which changes the meta data of the previous entry. All keywords
> on this are merged with the previous entry.

As above, libarchive's mtree processor already does this by default; no language change is needed.

> The one other thing that my merging tool does is to remove all size
> keywords. ... [comments about modifying existing files]

One common case here is appending new contents to an existing file.  That could similarly be handled with the same pattern:

# Append from source
foo/bar append_from=<source path>

In particular, that removes the need to find the source file to modify it in-place.  I've run into various headaches with Crochet when the /usr/obj layout changes between releases and Crochet cannot find the new location of a file.  This would remove many (though not all) of the cases where a file has to be modified in place.
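One way to carry out append_from, assuming the staged tree is modified in place: append the source file to the staged copy and return the new size so the final record can stay accurate. A Python sketch; the function name and the choice to return the size are assumptions for illustration.

```python
import os
import shutil
import tempfile


def append_from(tree_root, relpath, source):
    """Append the contents of source to relpath inside the staged tree.

    Returns the file's new size so the caller can update its size keyword.
    """
    dest = os.path.join(tree_root, relpath)
    with open(source, "rb") as src, open(dest, "ab") as dst:
        shutil.copyfileobj(src, dst)
    return os.path.getsize(dest)


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "rc.conf")
    extra = os.path.join(root, "rc.conf.additions")
    with open(target, "w") as f:
        f.write('hostname="nano"\n')
    with open(extra, "w") as f:
        f.write('sshd_enable="YES"\n')
    print(append_from(root, "rc.conf", extra))
```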

Cheers,

Tim

Re: mtree "language" enhancements

Simon J. Gerraty
In reply to this post by Warner Losh
Warner Losh <[hidden email]> wrote:
> As part of making NanoBSD buildable by non-root, I've found a need to have
> a richer mtree language than we currently have.

No fundamental objection there.
Indeed I'd really like the ability to provide default uid/gid
for the case that a uname/gname cannot be looked up.
Or even just a flag to say if lookup fails use 0:0
This would avoid the need to post-process BSD.var.dist to replace all
uname/gname with uid=0/gid=0 during various bootstrap situations.

> I'd like a new type called 'action' (so type=action in the records). This
> type is defined loosely to manipulate and earlier entry (or maybe entries,
> still unsure) in the file.
>
> Each action entry would have an 'action' keyword. The keywords I've defined

would or could?

> so far are as follows:
> 1. "unlink" which throws away the previous entry. That entry has been
> removed. It may apply to files or directories, but it is an error not to
> remove all entries in a directory when removing the directory.
> 2. "move" which relocates a previous entry. An additional targetpath
> keyword specifies the ultimate destination for this entry.
> 3. "copy" which duplicates a previous entry. It too takes targetpath.
> 4. "meta" which changes the meta data of the previous entry. All keywords
> on this are merged with the previous entry.

Probably need to know a bit more about how NanoBSD is built/packaged to
comment more usefully.  Any useful references?

> The one other thing that my merging tool does is to remove all size
> keywords. In the NanoBSD environment, size is irrelevant. Files are
..
> replaced and appended to all the time in the build process, and it doesn't
> make sense to track the size. makefs fails if the size is different, so

Agreed.

Where do these size keywords come from?
We (Juniper) do not have them in any of our mtree-based manifests,
which we use directly with makefs.

On the off chance it is of interest...
I wonder if this style of manifest would simplify your problem?
I believe all the code needed (other than makefiles) is in head at least.

There are two styles supported, classic mtree:

#mtree
#
# Group IDs used:
#       0       wheel
#
# User IDs used:
#       0       root
#
/set uid=0 gid=0 mode=555 type=file

bin type=dir
  cat contents="${STAGE_OBJTOP}/bin/cat"
  cp contents="${STAGE_OBJTOP}/bin/cp"
     
  ..

which is good for manually maintained manifests,
and, for autogenerated manifests (e.g. via find), a full-path format:

usr/tests/bin/cat/d_align.in mode=0644 contents="/b/sjg/work/stable10/obj/stage/i386/usr/tests/bin/cat/d_align.in"
usr/tests/bin/cat/d_align.out mode=0644 contents="/b/sjg/work/stable10/obj/stage/i386/usr/tests/bin/cat/d_align.out"

the two can be combined - an mtree style header with autogenerated
info appended.
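Converting the classic style into the full-path style is mostly bookkeeping for /set defaults and the directory stack. A rough Python sketch, assuming only /set, type=dir nesting and ".." need handling (no /unset, no line continuations, no explicit "." root entry) and that values never contain spaces, since the output does not re-quote them. flatten is a hypothetical name.

```python
import shlex


def flatten(spec):
    """Convert a classic mtree spec into full-path records."""
    defaults, stack, out = {}, [], []
    for raw in spec.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "..":
            stack.pop()  # leave the current directory
            continue
        name, *fields = shlex.split(line)  # shlex drops the quotes
        kw = {}
        for field in fields:
            key, _, value = field.partition("=")
            kw[key] = value
        if name == "/set":
            defaults.update(kw)  # new defaults for the following entries
            continue
        entry = {**defaults, **kw}
        path = "/".join(["."] + stack + [name])
        out.append(" ".join([path] + [f"{k}={v}" for k, v in entry.items()]))
        if entry.get("type") == "dir":
            stack.append(name)  # following entries live inside it
    return out


classic = """#mtree
/set uid=0 gid=0 mode=555 type=file
bin type=dir
  cat contents="${STAGE_OBJTOP}/bin/cat"
  cp contents="${STAGE_OBJTOP}/bin/cp"
  ..
"""
for record in flatten(classic):
    print(record)
```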

> If things go well, we could eventually move these extensions into mtree so
> that the post-processing stage is no longer necessary. I'm content to
> maintain the hundred or two lines of awk I've written to implement it. I
> chose awk because it does the job well enough, though python might do it
> better. But I don't want to talk about that choice since right now it is
> purely internal to NanoBSD (though I hope that other build orchestration
> systems like src/release and crochet look to adopt).

FWIW we use python when awk/sed etc prove insufficient or cumbersome
but awk/sed are usually adequate.

Re: mtree "language" enhancements

Warner Losh
In reply to this post by Tim Kientzle-2
On Sun, Nov 29, 2015 at 11:59 AM, Tim Kientzle <[hidden email]> wrote:

> Sounds interesting.
>
> Have you talked with Michal (CCed) who is working on a libmtree library?
>

No, I haven't. I've mostly been thinking about the fastest way to get
NanoBSD working in a nopriv (-DNO_ROOT) environment that wouldn't be hard
to push into a library later.


> The capabilities you're describing here really need to be bundled into a
> library, I think.  In particular, the ability to "unlink", "copy", etc, is
> much more useful if you can directly query the mtree file contents to
> perform conditional changes.  (For example, it may be important to remove
> an empty directory which requires you to be able to query whether a
> directory has files in it.)
>

In the NanoBSD context, these entries would be automatically generated,
so the tree is at hand. There'd be no need for this conditional stuff,
though having it as an additional extension wouldn't be bad.


> I would also be interested in a description of the processing model.  It
> sounds like you're assuming the same model used by the current mtree
> program -- mtree files are processed sequentially line-by-line as they are
> read.
>

The processing model is that the resulting mtree file is read sequentially.
Each new entry either creates a new node in an internal representation, or
modifies a previous node. Once everything has been processed, the internal
representation would be used to do something. In my case, I'd output an
mtree file free of these extensions.


> For instance, libarchive's mtree processor works differently; it reads the
> entire input, merging redundant lines for the same file, and then processes
> the list.  This is more explicitly declarative, and simplifies things like
> modifying the ownership or permissions of already-listed files.


Yes. My awk script, the first implementation of these extensions, works
this way. That's why I described it as a journal, though I didn't explain
that in my nomenclature a journal is processed first to last to get the
current state.


>
> > Each action entry would have an 'action' keyword.
>
> In terms of the language per se, this seems unnecessary.    I've proposed
> alternate language below that omits the unnecessary "type=action" by just
> adding new keywords.


That would work too. I came up with the type=action thing as a way to avoid
a lot of new keywords, and to segregate the new actions from the old, but
what you propose would also work and might be more general.

> > The keywords I've defined
> > so far are as follows:
> > 1. "unlink" which throws away the previous entry. That entry has been
> > removed. It may apply to files or directories, but it is an error not to
> > remove all entries in a directory when removing the directory.
>
> # When set on an entry, a matching file on disk will be removed.
> # This would also be useful for things like ObsoleteFiles
> unlink=true


OK. That's a little different than what I had in mind. My notion was that
the tree would be modified in place to remove the file, and this entry
would announce that action so the mtree internal representation could
be modified to reflect that. Though I do see value in your approach.


>
> > 2. "move" which relocates a previous entry. An additional targetpath
> > keyword specifies the ultimate destination for this entry.
>
> # When set on an entry, moves the existing file to the new name
> rename=<targetpath>
>
> # Example
> foo/bar type=file owner=root mode=0755 rename=foo/baz


That would work.

>
> > 3. "copy" which duplicates a previous entry. It too takes target path.
>
> # As with rename, except it copies the contents.
> copy_from=<original>
>

Yes.


> # properties that are not specified will be copied as well
> # Create foo/bar by copying foo/baz, preserving all attributes
> foo/bar type=file copy_from=foo/baz
> # Create foo/bar as above, but modify the owner
> foo/bar owner=dialer type=file copy_from=foo/baz


s/owner=/uname=/, but I like that.


> > 4. "meta" which changes the meta data of the previous entry. All keywords
> > on this are merged with the previous entry.
>
> As above, libarchive's mtree processor already does this by default; no
> language change is needed.


OK. If it matches existing practice, I'm cool with the change.


> > The one other thing that my merging tool does is to remove all size
> > keywords. ... [comments about modifying existing files]
>
> One common case here is appending new contents to an existing file.  That
> could similarly be handled with the same pattern:
>
> # Append from source
> foo/bar append_from=<target path>
>

That's a novel idea. My post-processor might have a little trouble with it
if we were trying not to modify the actual target tree. But with
modify-in-place, we could make it work.


> In particular, that removes the need to find the source file to modify it
> in-place.  I've run into various headaches with Crochet when the /usr/obj
> layout changes between releases and Crochet cannot find the new location of
> a file.  This would remove the need to always modify the file in-place.
> (But not all.)
>

It is a useful pattern.

Most of the nanobsd scripts I've seen use >> to append individual files,
one line at a time.


Warner

Re: mtree "language" enhancements

Tim Kientzle-2

> On Nov 29, 2015, at 11:22 AM, Warner Losh <[hidden email]> wrote:
>
> I would also be interested in a description of the processing model.  It sounds like you're assuming the same model used by the current mtree program -- mtree files are processed sequentially line-by-line as they are read.
>
> The processing model is that the resulting mtree file is read sequentially. Each
> new entry either creates a new node in an internal representation, or modifies
> a previous node. Once everything has been processed, the internal representation
> would be used to do something. In my case, I'd output an mtree file free of these
> extensions.

Good.  I like that model.

> > 1. "unlink" which throws away the previous entry.
>
> # When set on an entry, a matching file on disk will be removed.
> # This would also be useful for things like ObsoleteFiles
> unlink=true
>
> OK. That's a little different than what I had in mind. My notion was that
> the tree would be modified in place to remove the file, and this entry
> would announce that action so the mtree internal representation could
> be modified to reflect that. Though I do see value in your approach.

I was thinking that the 'mtree' command-line tool could be useful for bulk-remove operations (or more generally for updating an existing tree including removal of obsolete files).  But bulk-remove is probably easier to do with 'xargs rm', so that might be overkill.


Simon J. Gerraty suggested:
> which is good for manually maintained manifests,
> and for autogenerated (eg via find) an full path format:
>
> usr/tests/bin/cat/d_align.in mode=0644 contents="/b/sjg/work/stable10/obj/stage/i386/usr/tests/bin/cat/d_align.in"
> usr/tests/bin/cat/d_align.out mode=0644 contents="/b/sjg/work/stable10/obj/stage/i386/usr/tests/bin/cat/d_align.out"
>
> the two can be combined - an mtree style header with autogenerated
> info appended.

libarchive also supports this mixture.  It's a little tricky to parse accurately, though.  I think libarchive considers any line a "full path" line if the name has a '/' in it.  So you occasionally need to use things like './foo' to force the right interpretation.  And of course, there are tricky details like merging properties accurately when some are specified in the old format and some in the new.
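The "contains a slash" rule, and why './foo' forces the full-path reading, can be shown with a tiny Python helper. It follows the heuristic as described in this message rather than libarchive's actual source; entry_path and the cwd list are hypothetical.

```python
def entry_path(name, cwd):
    """Resolve an entry name using the 'contains a slash' heuristic.

    cwd holds the directory components opened so far in classic format.
    """
    if "/" in name:
        # Full-path line: the current directory is ignored.
        return name if name.startswith("./") else "./" + name
    # Classic line: the name is relative to the current directory.
    return "/".join(["."] + cwd + [name])


print(entry_path("cat", ["bin"]))         # ./bin/cat
print(entry_path("usr/bin/cc", ["bin"]))  # ./usr/bin/cc
print(entry_path("./foo", ["bin"]))       # ./foo
print(entry_path("foo", ["bin"]))         # ./bin/foo
```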

Simon also asked:
> Indeed I'd really like the ability to provide default uid/gid
> for the case that a uname/gname cannot be looked up.

I think 'tar' got this right:  If uname and uid are both specified, then look up uname and if that fails, use the specified uid.  Ditto for gname/gid.  In particular, this lets a single specification be used to rebuild a tree on another system with different UIDs or on a system that does not (yet) have a full password file.  An option could be provided for the (rare) case that someone really wants to prefer UIDs to unames.
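A sketch of that rule in Python, with the opt-in preference for UIDs as a flag. The passwd mapping stands in for whichever user database is consulted; resolve_owner and prefer_uid are hypothetical names, the names and IDs in the demo are made up, and groups would follow the same pattern with gname/gid.

```python
def resolve_owner(kw, passwd, prefer_uid=False):
    """Return the numeric owner for an entry, tar-style.

    kw:     the entry's keywords (may hold 'uname' and/or 'uid')
    passwd: mapping of user name -> UID from the database being consulted
    """
    if prefer_uid and "uid" in kw:
        return int(kw["uid"])  # opt-in: trust the recorded UID first
    if kw.get("uname") in passwd:
        return passwd[kw["uname"]]  # normal case: the name wins
    if "uid" in kw:
        return int(kw["uid"])  # name unknown here: fall back to the UID
    raise LookupError(f"cannot resolve owner from {kw}")


host = {"root": 0, "www": 80}
print(resolve_owner({"uname": "www", "uid": "81"}, host))                   # 80
print(resolve_owner({"uname": "nanouser", "uid": "1001"}, host))            # 1001
print(resolve_owner({"uname": "www", "uid": "81"}, host, prefer_uid=True))  # 81
```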

Tim

Re: mtree "language" enhancements

Tim Kientzle-2

> On Nov 29, 2015, at 2:49 PM, Tim Kientzle <[hidden email]> wrote:
>
> Simon also asked:
>> Indeed I'd really like the ability to provide default uid/gid
>> for the case that a uname/gname cannot be looked up.
>
> I think 'tar' got this right:  If uname and uid are both specified, then look up uname and if that fails, use the specified uid.  Ditto for gname/gid.  In particular, this lets a single specification be used to rebuild a tree on another system with different UIDs or on a system that does not (yet) have a full password file.  An option could be provided for the (rare) case that someone really wants to prefer UIDs to unames.

On further reflection, preferring UIDs to unames would actually be pretty common here.

In particular, NanoBSD (and Crochet and other similar tools) should prefer the UID when building images instead of looking up unames against the build host's password file.

Tim


Re: mtree "language" enhancements

Warner Losh
On Sun, Nov 29, 2015 at 9:28 PM, Tim Kientzle <[hidden email]> wrote:

>
> > On Nov 29, 2015, at 2:49 PM, Tim Kientzle <[hidden email]> wrote:
> >
> > Simon also asked:
> >> Indeed I'd really like the ability to provide default uid/gid
> >> for the case that a uname/gname cannot be looked up.
> >
> > I think 'tar' got this right:  If uname and uid are both specified, then
> look up uname and if that fails, use the specified uid.  Ditto for
> gname/gid.  In particular, this lets a single specification be used to
> rebuild a tree on another system with different UIDs or on a system that
> does not (yet) have a full password file.  An option could be provided for
> the (rare) case that someone really wants to prefer UIDs to unames.
>
> On further reflection, preferring UIDs to unames would actually be pretty
> common here.
>
> In particular, NanoBSD (and Crochet and other similar tools) should prefer
> the UID when building images instead of looking up unames against the build
> host's password file.


I've implemented what we've talked about, except this. When doing the
makefs, we should use the /etc/master.passwd that's inside the image in
preference to either of these alternatives. That's the most correct thing
to do: use as much of the data as you can, as late as you can.

The thing I'm struggling with now is why would both be present? Would that
indicate an error? Or someone changing the defaults? And if they are
changing the defaults, why use a uid in preference to a uname? Is this to
avoid contamination? To set something not in the password file, or just
comfort level of the user? FreeBSD will write unames for install*.

So I'm left thinking that maybe the rule should be 'last one wins' at least
for the use case where we use the target's /etc/master.passwd. That's
what I've actually implemented.
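A minimal sketch of the 'last one wins' rule, assuming the target image's password data is available as text in FreeBSD master.passwd format (name:password:uid:gid:class:change:expire:gecos:home:shell). The function names are hypothetical. Returning None for a uname that is missing from the target file is an assumed policy, not something stated in this thread.

```python
from typing import Dict, List, Optional, Tuple


def parse_master_passwd(text: str) -> Dict[str, int]:
    """Map user names to UIDs from master.passwd-format text.

    Blank lines and '#' comments are skipped; the uid is the third field.
    """
    users: Dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) >= 3:
            users[fields[0]] = int(fields[2])
    return users


def owner_last_wins(keywords: List[Tuple[str, str]],
                    users: Dict[str, int]) -> Optional[int]:
    """Resolve one entry's owner: whichever of uname/uid appears last wins.

    keywords is the entry's key=value list in file order.  A uname is
    resolved against the target's password data, never the build host's;
    an unknown uname yields None (assumed policy, caller decides).
    """
    owner: Optional[int] = None
    for key, value in keywords:
        if key == "uid":
            owner = int(value)
        elif key == "uname":
            owner = users.get(value)
    return owner
```

Because the lookup table comes from the image rather than the host, the host's accounts never enter into the result. This matches the "as late as you can" principle above.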

Preliminary testing of http://people.freebsd.org/~imp/mtree-dedup.awk
is going well. I haven't tried all the cases yet, but it is looking
promising. I don't need append_from, so that's just a stub in this file.
Since this is in awk, I don't use the host's /etc/passwd at all. That's
one of the failures of mtree that I've seen when I tried to use it, and
perhaps the source of your concern. I'd love to see any libmtree be able to
manipulate mtree files without the tree they describe, and even without any
uname -> uid lookup at all, to avoid these issues. The silly awk thing I
wrote is purely a tool for manipulating a mapping from paths to sets of
key-value pairs.
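Warner's script is awk and is not reproduced here. As a rough illustration only, here is a hypothetical Python sketch of the same idea. It folds a metalog into a path -> keywords map, merges repeated entries (later keywords win), and applies the type=action records (unlink, move, copy, meta) proposed at the start of this thread. It deliberately omits some things. It does not carry a directory's children along on move or copy. It does not check that a removed directory is empty. It does not decode mtree's backslash escapes.

```python
from typing import Dict, List, Tuple


def parse_line(line: str) -> Tuple[str, Dict[str, str]]:
    """Split 'path key=value ...' into (path, {key: value}).

    Simplification: splits on whitespace only; bare keywords such as
    'optional' get an empty value.
    """
    path, *rest = line.split()
    kv: Dict[str, str] = {}
    for item in rest:
        key, _, value = item.partition("=")
        kv[key] = value
    return path, kv


def apply_metalog(lines: List[str]) -> Dict[str, Dict[str, str]]:
    """Fold a metalog into one merged keyword set per path."""
    tree: Dict[str, Dict[str, str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        path, kv = parse_line(line)
        if kv.get("type") != "action":
            # Repeated entries for a path merge; later keywords win.
            tree.setdefault(path, {}).update(kv)
            continue
        action = kv.pop("action")
        del kv["type"]
        if action == "unlink":
            tree.pop(path, None)
        elif action in ("move", "copy"):
            entry = dict(tree[path])
            if action == "move":
                del tree[path]
            tree[kv.pop("targetpath")] = entry
        elif action == "meta":
            # Remaining keywords are merged into the earlier entry.
            tree[path].update(kv)
        else:
            raise ValueError(f"unknown action {action!r} for {path}")
    return tree
```

Keeping the whole state as plain path -> keyword maps means no file system and no password database are consulted, in the spirit of the "purely key-value" tool described above.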

Once I'm more confident about this after some testing and integration into
NanoBSD, I'll post something to Phabricator. But I'd welcome any comments
on what I've implemented in the meantime.

Warner

Re: mtree "language" enhancements

Tim Kientzle-2

> On Nov 29, 2015, at 9:49 PM, Warner Losh <[hidden email]> wrote:
>
> On Sun, Nov 29, 2015 at 9:28 PM, Tim Kientzle <[hidden email]> wrote:
>
>>
>>> On Nov 29, 2015, at 2:49 PM, Tim Kientzle <[hidden email]> wrote:
>>>
>>> Simon also asked:
>>>> Indeed I'd really like the ability to provide default uid/gid
>>>> for the case that a uname/gname cannot be looked up.
>>>
>>> I think 'tar' got this right:  If uname and uid are both specified, then
>> look up uname and if that fails, use the specified uid.  Ditto for
>> gname/gid.  In particular, this lets a single specification be used to
>> rebuild a tree on another system with different UIDs or on a system that
>> does not (yet) have a full password file.  An option could be provided for
>> the (rare) case that someone really wants to prefer UIDs to unames.
>>
>> On further reflection, preferring UIDs to unames would actually be pretty
>> common here.
>>
>> In particular, NanoBSD (and Crochet and other similar tools) should prefer
>> the UID when building images instead of looking up unames against the build
>> host's password file.
>
>
> I've implemented what we've talked about, except this. When doing the
> makefs, we should use the /etc/master.passwd that's inside the image in
> preference to either of these alternatives. That's the most correct thing
> to do: use as much of the data as you can, as late as you can.
>
> The thing I'm struggling with now is why would both be present? Would that
> indicate an error? Or someone changing the defaults? And if they are
> changing the defaults, why use a uid in preference to a uname? Is this to
> avoid contamination? To set something not in the password file, or just
> comfort level of the user? FreeBSD will write unames for install*.
>
> So I'm left thinking that maybe the rule should be 'last one wins' at least
> for the use case where we use the target's /etc/master.passwd. That's
> what I've actually implemented.

There are two key cases that drove this design for tar:

1.  Handling user info that is not (yet) in the target password file.  In practice, images get built up in different orders:  I might add a bunch of new files owned by a new user before some other process gets a chance to add the user.

2.  Restoring info when the target has different user numbering than the host.  (Or when the user isn’t in the host password file at all.)

For #1, you need the UID since the uname can’t be looked up anywhere.  For #2, you must have the uname since the UID would be wrong.  An image that can work in either scenario needs to have both.

For NanoBSD, you may be able to enforce that users are always present in the target password file before any data owned by those users is added to the image.  So it may be reasonable to just rely on uname everywhere for now.

Tim


Re: mtree "language" enhancements

Simon J. Gerraty
Tim Kientzle <[hidden email]> wrote:

> > So I'm left thinking that maybe the rule should be 'last one wins' at least
> > for the use case where we use the target's /etc/master_password. That's
> > what I've actually implemented.
>
> There are two key cases that drove this design for tar:
>
> 1.  Handling user info that is not (yet) in the target password file.
> In practice, images get built up in different orders: I might add a
> bunch of new files owned by a new user before some other process gets
> a chance to add the user.

This is the issue we face.
We don't like magic numbers, so we prefer to use names (uid=0 gid=0
is fine).

We use mtree with BSD.var.dist at various times, and in at least some of
those cases we cannot assume that the passwd or group databases will
be complete (or even valid, e.g. during recovery from corrupted storage).

In such cases we could easily tolerate mtree simply using 0:0 (or the
current uid:gid) for any uname:gname it could not resolve, since we
aren't likely to care about those dirs until we are up and running
properly, by which time the ownership would have been fixed.

What we don't want is for mtree to toss its cookies or flood the console
with pointless noise (which it is wont to do).
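The behaviour Simon asks for could look something like this hypothetical resolver. It maps names it cannot find to a fallback id (0 here, by assumption) and remembers each unknown name once. A tool could then emit a single summary line instead of one complaint per entry. The lookup is passed in, so it could be backed by `pwd.getpwnam`, a target image's password file, or a plain dict.

```python
from typing import Callable, Optional, Set


class LenientResolver:
    """Resolve unames to UIDs, falling back to a default for unknown names."""

    def __init__(self, lookup: Callable[[str], Optional[int]], fallback: int = 0):
        self.lookup = lookup
        self.fallback = fallback
        self.unresolved: Set[str] = set()  # each unknown name recorded once

    def uid_for(self, uname: str) -> int:
        found = self.lookup(uname)
        if found is None:
            self.unresolved.add(uname)
            return self.fallback
        return found

    def summary(self) -> str:
        """One line describing all fallbacks, or '' if there were none."""
        if not self.unresolved:
            return ""
        names = ", ".join(sorted(self.unresolved))
        return f"{len(self.unresolved)} unresolved name(s), used id {self.fallback}: {names}"
```

Passing a different `fallback` (for example the current uid) covers the "or current uid:gid" variant mentioned above.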

What we currently have to do to avoid problems is run BSD.var.dist
through sed to replace all \([gu]\)name=[^ ]* with \1id=0, and it
would be nice to be able to skip that.
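For readers less fluent in sed, here is a Python equivalent of that workaround. It is only an illustration and uses the same regular expression. Every uname=... keyword becomes uid=0, and every gname=... keyword becomes gid=0. The function name is hypothetical.

```python
import re

# Same pattern as the sed expression: capture 'u' or 'g', then drop the name.
NAME_KEYWORD = re.compile(r"([gu])name=[^ ]*")


def force_root_ids(line: str) -> str:
    """Rewrite uname=.../gname=... keywords to uid=0/gid=0."""
    return NAME_KEYWORD.sub(r"\1id=0", line)
```

For instance, `force_root_ids("./tmp type=dir uname=root gname=wheel mode=01777")` returns `"./tmp type=dir uid=0 gid=0 mode=01777"`.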

Re: mtree "language" enhancements

Masao Uebayashi
In reply to this post by Tim Kientzle-2
On Tue, Dec 1, 2015 at 11:31 AM, Tim Kientzle <[hidden email]> wrote:

>
>> On Nov 29, 2015, at 9:49 PM, Warner Losh <[hidden email]> wrote:
>>
>> On Sun, Nov 29, 2015 at 9:28 PM, Tim Kientzle <[hidden email]> wrote:
>>
>>>
>>>> On Nov 29, 2015, at 2:49 PM, Tim Kientzle <[hidden email]> wrote:
>>>>
>>>> Simon also asked:
>>>>> Indeed I'd really like the ability to provide default uid/gid
>>>>> for the case that a uname/gname cannot be looked up.
>>>>
>>>> I think 'tar' got this right:  If uname and uid are both specified, then
>>> look up uname and if that fails, use the specified uid.  Ditto for
>>> gname/gid.  In particular, this lets a single specification be used to
>>> rebuild a tree on another system with different UIDs or on a system that
>>> does not (yet) have a full password file.  An option could be provided for
>>> the (rare) case that someone really wants to prefer UIDs to unames.
>>>
>>> On further reflection, preferring UIDs to unames would actually be pretty
>>> common here.
>>>
>>> In particular, NanoBSD (and Crochet and other similar tools) should prefer
>>> the UID when building images instead of looking up unames against the build
>>> host's password file.
>>
>>
>> I've implemented what we've talked about, except this. When doing the
>> makefs, we should use the /etc/master.passwd that's inside the image in
>> preference to either of these alternatives. That's the most correct thing
>> to do: use as much of the data as you can, as late as you can.
>>
>> The thing I'm struggling with now is why would both be present? Would that
>> indicate an error? Or someone changing the defaults? And if they are
>> changing the defaults, why use a uid in preference to a uname? Is this to
>> avoid contamination? To set something not in the password file, or just
>> comfort level of the user? FreeBSD will write unames for install*.
>>
>> So I'm left thinking that maybe the rule should be 'last one wins' at least
>> for the use case where we use the target's /etc/master.passwd. That's
>> what I've actually implemented.
>
> There are two key cases that drove this design for tar:
>
> 1.  Handling user info that is not (yet) in the target password file.  In practice, images get built up in different orders:  I might add a bunch of new files owned by a new user before some other process gets a chance to add the user.

When you say "image", you surely mean "file-system image".
A file-system image contains on-disk data (inodes), which hold numeric
UIDs/GIDs rather than symbolic names (uname/gname).

When you decide to create an image, you have a whole tree
(directories/files) that ends up in the generated file-system image.
This means that when you create an image, you must know all the files
and the UIDs/GIDs that go into it.  If you don't, what you are creating
should not be an image.  If you don't know the UIDs/GIDs yet, can't you
just create a tar archive and extract it later, when you actually create
the image?

I don't want mtree(1) to be so smart that it makes unnecessary
decisions.  I want it to be simple and deterministic.

> 2.  Restoring info when the target has different user numbering than the host.  (Or when the user isn’t in the host password file at all.)
>
> For #1, you need the UID since the uname can’t be looked up anywhere.  For #2, you must have the uname since the UID would be wrong.  An image that can work in either scenario needs to have both.
>
> For NanoBSD, you may be able to enforce that users are always present in the target password file before any data owned by those users is added to the image.  So it may be reasonable to just rely on uname everywhere for now.
>
> Tim

Re: mtree "language" enhancements

Mark Felder
In reply to this post by Tim Kientzle-2


On Sun, Nov 29, 2015, at 22:28, Tim Kientzle wrote:

>
> > On Nov 29, 2015, at 2:49 PM, Tim Kientzle <[hidden email]> wrote:
> >
> > Simon also asked:
> >> Indeed I'd really like the ability to provide default uid/gid
> >> for the case that a uname/gname cannot be looked up.
> >
> > I think 'tar' got this right:  If uname and uid are both specified, then look up uname and if that fails, use the specified uid.  Ditto for gname/gid.  In particular, this lets a single specification be used to rebuild a tree on another system with different UIDs or on a system that does not (yet) have a full password file.  An option could be provided for the (rare) case that someone really wants to prefer UIDs to unames.
>
> On further reflection, preferring UIDs to unames would actually be pretty
> common here.
>

Just don't lose the ability to use unames. It's really useful when
changing lots of UIDs: schedule maintenance, capture an mtree spec of
the filesystem, change the UIDs, then re-apply the mtree spec. It will
fix everything for you :-)

--
  Mark Felder
  ports-secteam member
  [hidden email]