Standards

Comments on DRAFT AES57-xxxx

last updated 2011-08-10


Comments received from Mr. I. Rudd, 2011-07-07 interleaved with
Reply from D. Ackerman, chair of SC-03-06, 2011-07-22

Dear Mr Ackerman,
Thank you for your response to my comments. Alas, there is a fair bit of work still to be done on the standard, as my response to your response, attached [reproduced below], shows.

Dear Mr. Rudd,
In this document are my responses to your second round of comments. Please reply within two weeks if this reply is not acceptable to you. You may also ask us to consider your comments again for the next revision of the document. You may also appeal our decision to the Standards Secretariat.
All comments and answers are being accumulated in a subject-named file on the Web site.
Very best, David

Item Comment

2 Numerous definitions include the term being defined (e.g. "width shall be used to describe the width") and therefore need to be reworded. I do not claim to have caught all of them in my specific comments and a general review with this problem in mind is required.
[The term in Courier monospaced font is a formal code - see clause 0.1.2; the word in regular body text is a common English word used to describe it. This is correct.]
Evidently, I wasn't clear enough - sorry. The point is that the definition of the word in question includes the word which is being defined. I used an example which happened to use a term in monospaced font, but my concern - in numerous places - is that a word is defined using the word itself and so the definitions which take this form are not adequate. Say you asked me what a Compact Disc is; were I to tell you that a Compact Disc is a compact disc, you would be none the wiser, but were I to tell you that it is a circular plane of polycarbonate material which stores recorded music in the form of digital audio which can be read using reflected laser light, you would have a much better idea of what the object is and what its purpose is. So here, telling me that the width is the width doesn't explain what is meant by the term "width" and similarly in other places throughout the document. Work is required to improve all the definitions which fall into this trap.
Your point was understood. This was a point of extensive discussion during the writing of the document. The writing group felt it was not necessary to redefine common terms like width but rather to simply assign them to the formal term to which they apply, e.g. width. I am not persuaded otherwise at this time.

6 Specific comments:

Page 6, Section 3.1 While I can understand the thinking around wave files stored on "transient" media, what does an archivist do with audio stored on, say, 8" floppies or disc packs - or LTO for that matter? Might not a deep, long-term archive hold wave files on these formats, particularly LTO?
[Out of scope for this document. May be considered in a later revision or an alternative document. ]
In that case, we should say that LTO etc. is out of scope, at least for now, in Sections 1 and 3.1 so that an archivist with these formats can stop reading, but at least be aware that we have thought about the matter (and that things may change).
Just to be clear, AES57 is of no use to describe the LTO tape. It can however be used perfectly well to describe the audio files that the LTO tape may contain. When the archivist migrates those files from that LTO tape to the next storage medium, the AES57 instance document(s) should equally well describe that migrated data. Therefore I disagree with your conclusion and the need to warn the archivist.

8 Page 6, Section 3.2 "A document that conforms to the minimally required set of elements and attributes defined by an XML schema." This implies that a document with more than the minimally-required set isn't an instance document - and anyway, the Introduction points out that the concept of a minimum data set is not (always) a realistic proposition, "but rather is the set of elements that is expected to be known or determinable at a minimum".

I think the intention would be better expressed as: "A document that conforms to the XML schema (or Document Object Model) to which it refers." Alternatively, we could say (NB hyphenation): "In general, a document that conforms to the minimally-required set of elements and attributes defined by the XML schema (or Document Object Model - "DOM") to which it refers. However, recognising that there will be certain information that cannot be known in some environments, such as an archive, an instance document is also defined here as a document that conforms to the XML schema or DOM to which it refers."
[Not accepted - this language has been well understood by implementers to date. The issue of what is not known or known is a separate matter from what comprises an instance document and I believe it best to not introduce this idea into the definition of the instance document.]
If you are sure that everyone knows what we mean (are you?) then all right. I was trying to add clarity so that we say what we mean. Not saying what we mean is a problem elsewhere in the document too.
An instance document is a well understood concept among people who work with XML. A document containing more than the minimum set of elements and attributes is still an instance document. The issue of unknown data is a separate issue, even though it may influence one's ability to construct an instance document.

9 Page 6, Section 4.1 "Each audio object is described by a single instance document in a strict one-to-one mapping." would be better expressed as: "For a given archive or domain, each audio object is described by a single instance document in a strict one-to-one mapping." This allows multiple archives to create individual, locally-relevant instance documents for a given audio object to which they have access. E.g. Archive B may wish to classify objects by mood, something which may not be relevant to (or perhaps agreed by) the owning archive, Archive A.
[Out of scope for the current document. Issues of separate data sets addressing the same objects may be considered at a future stage.]
Again, then, for the detail, we need to state that this is the case in Section 1. A reader who picks up this document needs to know - quickly - how useful it will be for the purpose required. The AES will not win friends from those who rummage around this document in hope and then walk away disappointed. HOWEVER, the comment was not about technicalities, but rather it was about the clarity of expression and it needs improvement.

10 Page 6, Section 4.1 "Other standards exist that address such high level structural metadata." Does the AES not cite other standards bodies in such cases, even as informative references? (Candidate examples might include EBU Tech 3306 for audio instances and Tech 3295 for the editorial connection between audio instances.)
[There are many other metadata schemes, however it is not the purpose of this standard to list them. They are not required in order to implement this standard.]
No informative reference is ever required to implement a standard and yet we have them. Annex C has a whole page with just a single informative reference on it ...

11 Page 6, Sections 4.2, 4.3, 4.4 I found these sections most confusing. E.g. we say in 4.2 "The top level of the document is the audioObject section". We then talk in Section 4.3 about something which wasn't mentioned in Section 4.2, an element, and then say in Section 4.4 that the audioObject isn't at the top level after all: "The audioObject element is a subclass of the objectType element." Er, except that the Section has the heading, "Document root" - ?!?
[In this schema, the document root is in fact the audioObject element. There is no issue with the root element itself inheriting properties from an abstract element as it does in this case. The abstract objectType element cannot be directly instantiated in any case, so I don't think this should prove confusing to developers who work with XML.

The w3c schema primer (see http://www.w3.org/TR/xmlschema-0/#abstract ) states, "XML Schema provides a mechanism to force substitution for a particular element or type. When an element or type is declared to be "abstract", it cannot be used in an instance document. When an element is declared to be abstract, a member of that element's substitution group must appear in the instance document. When an element's corresponding type definition is declared as abstract, all instances of that element must use xsi:type to indicate a derived type that is not abstract."]
I think most people who work with XML will be aware that an abstract element cannot be instantiated. We must be aware that not all audio archivists are XML experts though. Again, my concern is not with the technicalities, but with the clarity of expression, which is why I suggested - and still propose - the revision in my original comments.
This standard has been written to be implemented by the programmers and systems people who build the interfaces that these archivists use to generate the metadata. It is expected that they will be familiar with XML. It is highly doubtful that archivists will be hand coding XML instance documents that conform to this standard.
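
For readers less familiar with the abstract-type mechanism discussed in this item, the following is a loose Python analogy, not an extract from the AES57 schema. The class names mirror objectType and audioObject; the ID member, the element_name method and the sample value "obj-001" are assumptions made purely for illustration.

```python
from abc import ABC, abstractmethod


class ObjectType(ABC):
    """Stands in for the abstract objectType: it only supplies inherited members."""

    def __init__(self, object_id):
        # Assumed inherited attribute, analogous to an ID carried by the base type.
        self.ID = object_id

    @abstractmethod
    def element_name(self):
        """Concrete subclasses report the element name they appear under."""


class AudioObject(ObjectType):
    """Stands in for audioObject: concrete, so it can serve as the document root."""

    def element_name(self):
        return "audioObject"


def can_instantiate_abstract():
    """Return True only if the abstract base could be created directly."""
    try:
        ObjectType("x1")
    except TypeError:
        # Python refuses, much as a validator rejects an abstract element.
        return False
    return True


root = AudioObject("obj-001")
print(root.element_name(), root.ID, can_instantiate_abstract())
```

The analogy is only partial: in XML Schema the restriction is enforced at validation time against the instance document, whereas here it is enforced when the object is constructed.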

13 In addition, there is a chance that known data may be lost if element values are known but not used and yet this circumstance is allowed by "audioObject element may contain the following sub-elements and attributes". There is further opportunity for confusion by specifying mandatory elements/ attributes (OCCURS MIN = 1) within an optional framework ("may contain the following sub-elements and attributes"). ...
[Not all objects can provide all supported elements, but where they do the requirement is clear. Comment rejected.]
So, again, we need to be clear in what we say; rather than, "may" - which means "need not" - we ought to be using the structure "where exists - must".
See the "note on normative language" at the foot of page 3: "Sentences giving permission use the verb, 'may'." Actual usage can depend on the policy of the archive. Validation against the schema provides the necessary robustness in implementation.

15 Page 7; Table Would it be useful to have a NOTE to remind the reader that ID is an abstract attribute? I suspect that not everyone will associate the italics with the abstract super-class.
[This parameter is in italics in accordance with clause 0.1.2. to indicate that, "Inherited elements and attributes are printed in an italicized equally spaced font." This is correct as it is.]
All right. Again, I know what it means and yes, the explanation is tucked away in Section 0.1.2. My concern is merely to help the reader, especially someone who may be unfamiliar with the concepts of XML. Remember that an archivist is not necessarily a fully-informed information-engineering expert as well.
This document is for people who plan to implement the standard, not the users you are describing. Placing the explanation in Section 0.1.2 - in the document preamble - emphasises its general applicability in what follows.

17 Page 8 Section 4.4.2.1.1.2.2 Particularly because physicalProperties can be specified for a formatRegion, we need to allow for the structure of leader tape here. (Don't forget the colour!)
[This clause specifies a way to describe the physical structure of any tape. Leader tape is a case where a substrate will be described but no coating. Leader tape and recording tape will appear as separate "sections"]
That's not at all clear from the document! It needs to be made explicit, for the reader could not guess the intention. Oh, and given your answer, there is no roleType for leader either; there needs to be one.
The document does not tell you how to describe leader, nor does it tell you how to describe acetate tape or polyester tape or paper tape or… It does, however, give you a series of data elements that you can use to describe all of those things, if you have knowledge of their construction.

The parameter roleType does not need a value named leader. Rather a piece of leader tape should be described as consisting of a SUPPORT_LAYER and a LABEL_LAYER, but these details in the various combinations are beyond the scope of this document and better left to a separate user guide.

18 Page 11 Section 4.4.2.1.1.3 opticalStructure. Isn't a film optical sound track a valid medium? How do I handle it?
[Out of scope for this standard. May be considered for a future revision of this standard, or a complementary standard. (there will be other formats that are not described here that will be handled similarly).]
Again, then, we need to state that this is the case in Section 1. A reader who picks up this document needs to know - quickly - how useful it will be for the purpose required. The AES will not win friends from those who rummage around this document in hope and then walk away disappointed and frustrated.
I believe it to be impossible to provide a comprehensive list of all unsupported formats. Additional formats can be added to revisions of this standard directly as need arises, or by reference to other metadata sets.

19a Page 12, Section 4.4.2.1.1.4.1.2 I found the term "filler Layer" for the inner core extremely confusing when considering the analogue disc, initially thinking of it in terms of the other layers of tape and discs. I have no problem with the term "innerCore" (or "innerCoreLayer" if it is desired to make the type structure clearer, but I think it adds ambiguity) being of type layerType. I suggest a new second sentence be inserted, viz: "[...] where one is present. Where it exists, the inner core of a disc is the separate material between the inner edge of the disc and the hole to accommodate the spindle of the disc's player. innerCore [or innerCoreLayer] is of data type layerType[...]"
[In this context, "filler" is a layer of certain kinds of disk and not a radial component.]
Ahhhh! In that case, the comment I originally created (but deleted when, after quite some consideration, I thought I understood what the document meant) needs to be stated and an extra comment now comes into play:
1 Given that "the base substrate of the analog disc media" is recorded by the substrateMaterialLayer (see also Fig 1a - Order 0, "the primary substrate layer"), saying that the fillerLayer provides "information about the inner core of an analog disc" is most confusing; for most readers, the innerCore would surely be considered to be the same as the basesubstrateLayer. Again, the intention of the standard needs to be explicit. An improved definition is required here and that definition should make it clear which Order is meant when thinking of the diagram - Orders 1/ -1? Alternatively, there should be another diagram which clarifies the intent.
This layer is used to describe the material between the substrateLayer which declares itself to be the base layer of the object and the surfaceLayer. I don't agree that it is particularly confusing to anyone who has enough information to make use of it. I do agree that it would be helpful to provide some tutorial material on this topic in a separate document and I am willing to draft an additional diagram to illustrate its use.
19b 2 How do I accommodate the early discs which have a large central hole into which has to be fitted another piece of material with a hole to accommodate the spindle of the disc's player? (I don't know the correct term for this thing, sorry - a "former"? I remember them though.)
This document does not currently concern itself with the size of the spindle hole. This could be amended in a future revision.

20 Page 13, Section 4.4.2.1.1.5.1.2 Again, innerCore (or innerCoreLayer) is a better term. Here the new, inserted second sentence should read: "[...] where one is present. Where it exists, the inner core of a cylinder is the separate material between the inner edge of the cylinder and the cylinder's spindle or hole to accommodate the spindle of the cylinder's player. innerCore (or innerCoreLayer) is of data type layerType[...]"
[See above. In this context, "filler" is a layer of certain kinds of disk and not a radial component.]
Ahhhh! Again. So this term refers to the structure of the outer wall of the cylinder and not a radial component. Again, this needs to be clarified, for the wording is deeply ambiguous.
See above answer regarding this on disc recordings.

22 Page 13/ 14 Section 4.4.2.1.2 Table
TAPE width : the definition includes the word being defined. I suggest: width shall be used to describe the breadth of the tape, as seen between the two flanges of the tape reel.
[The term in Courier monospaced font is a formal code - see clause 0.1.2; the word in regular body text is a common English word used to describe it. This is correct.]
No - point missed. See my response to the first comment on Page 1. (This is a specific instance of the general point I made there.)
See my response under "specific comments", above.

23 TAPE length : the double-definition is ambiguous and it includes the word being defined.
[The term in Courier monospaced font is a formal code - see clause 0.1.2; the word in regular body text is a common English word used to describe it. This is correct.]
No - point missed. See my response to the first comment on Page 1. (This is a specific instance of the general point I made there.)
See my response under "specific comments", above.

24 TAPE length The words recognise that the first definition is not entirely adequate, but the second one doesn't help; the leading foot or two would not be played past the tape head because that length is necessary to secure the tape to the take-up spool and it is not clear if the leader tape should be included. I suggest a single definition: length shall be used to describe the distance measured from one end of the tape to the other, including any leader tape. (NB we have to include the leader tape or else we shall have to measure and subtract all the intermediate lengths of leader tape!)
[Leader tape and recording tape will appear as separate "sections" in a single object]
Ah. I think care is needed here, for as we state in Section 4.4.2, "physicalProperties element may appear as the second sub-element of the audioObject section (see 4.4) or may appear as the first sub element of the formatRegion element (see 4.4.17.2)". Where we are at the audioObject level, we are not considering "sections", but rather "the [entire] length of the tape unwound from the reel". Additionally, one of the definitions needs to be removed to reduce the chance of misinterpretation.
It is reasonable to expect that you would state the length of the tape in physicalProperties at the audioObject level and then define regions that describe tape-with-oxide vs leader-tape. This again is perfect tutorial material for a separate document.

25a TAPE thickness : older tapes may well not be of uniform thickness, with areas of oxide loss etc. and again the definition includes the word being defined. I suggest: thickness shall be used to describe the total depth of a single, straight piece of tape with all layers intact, from one face to the other.
[The term in Courier monospaced font is a formal code - see clause 0.1.2; the word in regular body text is a common English word used to describe it. This is correct.
No - point missed. See my response to the first comment on Page 1. (This is a specific instance of the general point I made there.)
See previous remarks on this. The issue of uniform thickness should be covered by the entry for TAPE in the table on page 13-14 where it states that "thickness shall be used to describe the total thickness of the tape."
25b The issue of uniform thickness should be covered by the entry for TAPE in the table on page 13-14 where it states that "thickness shall be used to describe the total thickness of the tape."]
No - point missed. I'm asking about tapes which are not of uniform thickness because of oxide loss etc.
The document only concerns itself with measuring the total thickness of the tape, not with measuring the variations in thickness that might occur due to oxide loss. Such oxide loss could be documented in a conditionNote however.

26 ANALOG DISC or OPTICAL DISC We need to improve the English here (especially "laying"): thickness shall be used to describe the distance from the bottom of the disc to the top of the disc when the disc is lying flat.
[The term in Courier monospaced font is a formal code - see clause 0.1.2; the word in regular body text is a common English word used to describe it. This is correct.]
No - point missed. See my response to the first comment on Page 1.
See my response under "specific comments", above.

27 WIRE diameter Again the double-definition can be simplified and improved by the removal of the word being defined: diameter shall be used to describe the distance across the wire, as seen looking down the wire from one end to the other.
[The term in Courier monospaced font is a formal code - see clause 0.1.2; the word in regular body text is a common English word used to describe it. This is correct.]
No - point missed. See my response to the first comment on Page 1. Additionally, one of the definitions needs to be removed to reduce the chance of misinterpretation.
See my response under "specific comments", above.

28 WIRE length Similar problems to TAPE, before. I suggest: length shall be used to describe the distance measured from one end of the wire to the other.
[The term in Courier monospaced font is a formal code - see clause 0.1.2; the word in regular body text is a common English word used to describe it. This is correct.]
No - point missed. See my response to the first comment on Page 1.
See my response under "specific comments", above.

29 CYLINDER length Similar problems to TAPE, before. I suggest: length shall be used to describe the distance measured from one end of the cylinder to the other.
[The term in Courier monospaced font is a formal code - see clause 0.1.2; the word in regular body text is a common English word used to describe it. This is correct.]
No - point missed. See my response to the first comment on Page 1.
See my response under "specific comments", above.

30a Page 15 Sections 4.4.2.1.2.1.6 and 4.4.2.1.2.1.7 Similar to earlier, where a shell exists ought not MIN OCCURS = 1?
[In an ideal world it would but one of the issues with xml is that it becomes cumbersome to enforce either/or paradigms. To keep this element simple, the schema defines all sub-elements as optional but the text of the standard specifies in 4.4.2.1.3 on page 18, "When shellDimensions is present in an instance document it shall either have a length, width and depth sub-element, or it shall have a diameter and depth sub-element. All other combinations from the dimensionsType are illegal in this context."

In 4.4.2.1.2.1.7 it would likewise be cumbersome to map out all of the possible combinations that could be required for various media objects]
Two things:
1 Point taken - BUT because we are in Section 4.4.2.1.2 and not 4.4.2.1.3 (some pages further on and which therefore might well not be read by someone using the document as a reference tool), we need to include those words again here.
Someone reading as a reference tool hopefully will take note of the reference to 4.4.2.1.3 where they will get a full description of how to use the data type described in 4.4.2.1.2.1.6
30b 2 Accepting your point about height in Section 4.4.2.2.1.7 (below) - with my proviso - we need to allow for the exceptional use of the height dimension, both in Section 4.4.2.1.2.1.6 and on page 18, for it is currently illegal.
It was the conclusion of the writing group that a shell can adequately be described using just length, width and depth or just depth and diameter. Please understand that 4.4.2.1.2.1.7 dimensionsType is a more general dataType used as a catchall where the other format specific types fail and therefore it contains all possible elements for describing the dimensions of an audio object, including height.
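
Because the either/or rule quoted from 4.4.2.1.3 lives in the text of the standard rather than in the schema, an implementer would probably need to check it in application code. The following is a minimal sketch in Python of one way that check might look. It assumes un-namespaced child elements named exactly length, width, depth and diameter; the function name and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# The two combinations the quoted text of 4.4.2.1.3 permits for shellDimensions.
ALLOWED_COMBINATIONS = (
    frozenset({"length", "width", "depth"}),
    frozenset({"diameter", "depth"}),
)


def shell_dimensions_ok(shell):
    """Return True if the child elements form one of the permitted combinations."""
    tags = [child.tag for child in shell]
    # Assumption: a repeated sub-element is treated as an error.
    if len(tags) != len(set(tags)):
        return False
    return frozenset(tags) in ALLOWED_COMBINATIONS


# Hypothetical instance fragments, for illustration only.
box = ET.fromstring(
    "<shellDimensions><length>10</length><width>6</width><depth>2</depth></shellDimensions>"
)
reel = ET.fromstring(
    "<shellDimensions><diameter>18</diameter><depth>1</depth></shellDimensions>"
)
tall = ET.fromstring(
    "<shellDimensions><height>4</height><depth>1</depth></shellDimensions>"
)

print(shell_dimensions_ok(box), shell_dimensions_ok(reel), shell_dimensions_ok(tall))
```

Under the wording quoted above, the third fragment is rejected because height is not part of either permitted combination in this context.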

31 Page 17 Sections 4.4.2.1.2.4, 4.4.2.1.2.5, 4.4.2.1.2.6, 4.4.2.1.2.7, 4.4.2.1.2.8, 4.4.2.1.2.9, 4.4.2.1.2.10 All these sections have definitions which include the term being defined and need to be reworded in a manner similar to length, diameter, thickness etc. previously.
[The term in Courier monospaced font is a formal code - see clause 0.1.2; the word in regular body text is a common English word used to describe it. This is correct.]
No - point missed. See my response to the first comment on Page 1.
See my response under "specific comments", above.

32 Page 17 Section 4.4.2.1.2.1.7/ 4.4.2.1.2.5 height appears to have been included "for completeness" but it adds ambiguity, particularly in the light of Figure 2 and the convention set out for the shell's dimensions in Section 4.4.2.1.3. It is not used elsewhere in the document and indeed it is implicitly declared illegal at the end of Section 4.4.2.1.3. It needs to be removed.
[Included because height may be important for specific objects. May be deprecated in favour of 'depth' in a future revision.]
In that case we need to say that depth is the preferred term, but that height is included to allow for exceptional circumstances; it needs to be used with great care to ensure that the data is kept as clean as possible. We must not refer to "specific objects" unless we then specify them.
To be considered as part of a future revision where we would expect to benefit from practical experience from early implementations

33 Page 18 Section 4.4.3 Don't we also want to know what the nature of the private data is? I think the definition should be: "When present, the value for the appSpecificData element shall be the nature of the data deposited in the audio object by the software application name and version defined by the appSpecificDataType." This makes the intent of the practical example much clearer - that what is wanted is the form of the data and not the data itself. (At least, I presume that is the intent - ?)
[The point about private data is that it's private. The administrator of the database associated with this metadata set will not know its content or purpose, simply that there's a block of data associated with this object. The point of recording this information is to first recognize and preserve its existence and that it may serve a usefulness to the file owner through the use of a software application despite its proprietary nature, and second to map the structure of the file for those who come to curate it in the future long after the useful life of that private data so that they may have the best chance possible of recovering the sound essence from the file.]
Yes, having the best chance possible of recovering the essence (and metadata!) was my intention too. Perhaps I should have asked the question - as I do now: Where does the archivist record the data which conforms to a known standard, such as the <bext>, <levl> or <axml> chunks of the Broadcast Wave Format?
Currently out of scope of this document, may be considered in a revision.

34 Page 18 Section 4.4.4 Particularly since the word "compression" does not appear in the document and also because not everyone will appreciate the difference between coding and compression, I think that the reader would find it most helpful to be reminded of the difference in a NOTE and to have a pointer to bitrateReduction (Section 4.4.17.4.13) in this Section.
[Not all coding schemes use data compression. PCM, for example. In this context, the use of compression or otherwise is not relevant.]
Er, yes, I know that not all coding schemes use compression. However, particularly since the word "compression" does not appear in the document and because not all archivists will appreciate the difference between coding and compression, I think that the reader would find it most helpful to be reminded of the difference in a NOTE and to have a pointer to bitrateReduction (Section 4.4.17.4.13) - where compression is relevant - in this Section.
As noted above, this standard is written for software implementers rather than archivists. I believe the content as it stands is clear.

35 Page 21, Section 4.4.11.2 Given that we are aiming this document at preservation and restoration, an archive would seem a popular place to use it. Therefore, ought we not to include ACCESSION_NUMBER as a primary identifierTypeType? Also, should not UUIDs be on the primary list? The reference is: http://www.itu.int/ITU-T/asn1/uuid.html

The list is oriented towards the material itself, which is fine, but ought not some specific provision be made for editorial identifiers such as ISRC, ISAN, V-ISAN, Programme Number etc. in the secondary Identifier? Maybe another Section is needed for this purpose - editorialIdentifier? (In this case, materialIdentifierType and secondaryMaterialIdentifier would have to be re-named materialIdentifierType and secondaryMaterialIdentifier.)
[Accession numbers, program numbers and the like often refer to groups of objects and as such may not be the best identifiers to use in the context of a primary identifier which is a 'primary key' or unique identifier. On the other hand ISRC and I presume ISAN numbers may refer to parts of an audio object's contents, more so than the object itself and likewise seem unfit for use in this field. However, that said, both the primaryIdentifier and the secondaryIdentifier may be set to type OTHER where the user may define their own type using the idOtherType attribute. Additional identifier types may be added in future revisions.]
All right. We won't be thanked though.
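
The OTHER mechanism described in the reply could be exercised along the following lines. This is only a sketch in Python under several assumptions: the attribute name identifierType, the placement of the identifier value as element text, the rule that idOtherType must accompany OTHER, and the sample accession number are all hypothetical. Only primaryIdentifier, OTHER and idOtherType come from the discussion above.

```python
import xml.etree.ElementTree as ET


def make_identifier(tag, value, identifier_type, id_other_type=None):
    """Build an identifier element, naming a local type when the type is OTHER."""
    # Assumption: OTHER is only meaningful when the local type is spelled out.
    if identifier_type == "OTHER" and not id_other_type:
        raise ValueError("idOtherType is needed when the identifier type is OTHER")
    elem = ET.Element(tag, {"identifierType": identifier_type})  # hypothetical attribute name
    if identifier_type == "OTHER":
        elem.set("idOtherType", id_other_type)
    elem.text = value
    return elem


# Hypothetical accession number recorded through the OTHER route.
accession = make_identifier(
    "primaryIdentifier", "ACC-1935-0042", "OTHER", "ACCESSION_NUMBER"
)
print(ET.tostring(accession, encoding="unicode"))
```

Whether an accession number is unique enough to act as a primary identifier remains the archive's decision, as the reply points out.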

42 Page 27 section 4.4.16.3.6.1 et seq. Given that channelAssignment is mandatory, that it represents an area of the audio sound stage and that it has mandatory left/ right and front/ rear positions, what do I do with a set of stems?
[You should describe each channel of the set with its intended playout position. If no position is known, use the default center-front setting.]
In that case we need to say so. Again, this is an example of where we need to make the implicit explicit.
I don't believe that multi-channel stems represent a special case - the general case will be satisfactory to define the pan position of each channel, or to use the default.

43 Page 27 Section 4.4.16.3.6.2.1: Typo (probably): as we have three required attributes, don't they all have to have MIN OCCURS = 1?
[Not in this case. In XML speak, when you have an attribute that carries a default value, it is necessary to set its use attribute in the schema document to optional. It seems that the w3c believes it makes no sense to have a default value for an attribute that you require the user to provide. However the net effect is that the optional attributes that carry default values will always be present in the document simply because the default value is provided when the user omits their own.]
In that case, we need to add a note to say that if the user does not specify a value, the system will create the attribute with the default value. Though not an expert, I know a bit about XML; given that I missed this nuance, many/ most archivists will do so also, I suggest. Adding the w3c's argument (as you just have) will make the position much clearer.
As noted above, this standard is written for software implementers rather than archivists. I believe the content as it stands is clear. The effect of an XML default setting could be explained to general users in an explanatory supplement in due course.
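
The net effect of schema defaults described in this item can be imitated in a few lines of Python. This is a sketch only: the standard library's ElementTree does not read XML Schema, so the function below mimics by hand what a schema-aware processor would do. The attribute names leftRightPosition and frontRearPosition and the default value "0" are placeholders, not values taken from the AES57 schema.

```python
import xml.etree.ElementTree as ET

# Placeholder attribute names and default values, assumed for illustration.
ASSUMED_DEFAULTS = {
    "leftRightPosition": "0",
    "frontRearPosition": "0",
}


def apply_defaults(element, defaults=ASSUMED_DEFAULTS):
    """Add any omitted attribute with its default; leave supplied values alone."""
    for name, value in defaults.items():
        if name not in element.attrib:
            element.set(name, value)
    return element


# The author supplied one position and omitted the other.
channel = ET.fromstring('<channelAssignment leftRightPosition="-100"/>')
apply_defaults(channel)
print(channel.attrib)
```

After processing, both attributes are present even though the author wrote only one, which is why the schema can mark them optional while the table treats them as always available.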

49 Page 32 Section 4.4.17.4.2.3 If you accepted the need for film optical storage earlier, do we need a valid speedMeasurementUnitsType here to accommodate that form of storage?
[Out of scope for this document.]
Fine - so long as we specified this form of storage to be out of scope in Section 1.

53 Page 34 Section 4.4.17.4.6 If you accepted the need for film optical storage earlier, then I think we need words to accommodate it here too (but I don't know enough about sep opt practice to provide sufficient direct guidance, sorry).
[Out of scope for this document.]
Fine - so long as we specified this form of storage to be out of scope in Section 1.

61 Page 37 Section 4.4.17.5 I suppose that recording the splice angles used on analogue audio tape is too arcane . . .
[Out of scope for this document. May be considered for a future revision.]
All right.

62 Page 39 Between Sections 4.4.17.8 and 4.4.17.9 If you accepted the need for film optical storage earlier, then I think we need a new opticalFilmFormatRegionType Section, similar to the other xxxFormatRegionTypes, to accommodate it here too (but I don't know enough
[Out of scope for this document.]
Fine - so long as we specified this form of storage to be out of scope in Section 1.

64 Page 40 section 4.4.19 We need to be clear about what is meant by "title"; do we mean ownership, an award, the name of the series, the name of the programme, the name of the episode, the temporary name of the piece ("working title") ... ? What about a work which is called something quite different (not just a translation of the original language) in different languages?

Only one title is currently allowed per audio object; I suggest that we need to do more work here. Imagine a disc from a series called "Horn Spectacular"; this release in the series won a "Disc of the Year" award in 1935 and it contains Joseph Haydn's Symphony No 31 in D Major, Hob.I:31 "Auf dem Anstand" - in English (not a literal translation): "Horn signal". My archive wishes to record that this disc is on permanent loan from the benefactor who actually owns it. What happens?
[We mean the name the owner of the object associates with the audio object, whatever that is. It is purposefully a bit fuzzy to allow for localized object naming practices, and is not intended for use in a comprehensive descriptive metadata context but rather to allow an application to display a locally meaningful name for the audioObject under description. Anything more extensive is out of scope and the subject of other metadata standards that deal with item description.]
Oh dear. This will lead to ambiguity and "duff data" - and diminish the AES in the eyes of the archivists, for the need to understand the term "title" is a fundamental requirement. Yes, we can allow for aliases and local practices, but we must be clear about what we mean. Archives lend things to each other and clarity of meaning is essential. I think you have provided the solution: change the term to "localName" and the explanatory text to: "The localName attribute may be used to give the audioObject a term of reference which has meaning for the organisation which holds it." Note that I made no reference to the "object owner", for the archive and the object's owner may be different, as I demonstrated above - and nor did I use the word "local" or "name" in the definition.
All metadata sets aim to be internally consistent in their use of labels, but can seldom be guaranteed to be consistent with other sets without special provision. AES-R9-2008: "AES standards project report - Considerations for standardising AES metadata sets" discussed this in some detail.

I think you may be confusing this with a descriptive metadata schema. It is not. It is a technical metadata schema which allows the user to assign a title of their choice to the object the instance document represents. It is assumed that other metadata documents will fulfill your aspirations for this element, including standards and schemas such as EBU T3293 and AES60.

66 Page 40 Section 4.4.21 I find this description ambiguous; is it meant to refer to, say, the 1945 version of Stravinsky's Firebird Suite, which was perhaps the third generation of the suite? I don't know what to suggest to improve this item. (The "generational version of the original recording" runs into problems with sub-mixes, mixes, finishing, sub-masters/ revised repeats etc.)
[The intention here is to record the number of destructive copy generations since the original recording, when known. It has nothing to do with alternate mixes et al…]
"Destructive copy generations"? I'm still confused. Are we including analogue tape as well as digital tape and memory-based (flash drive, hard disc etc.) recordings? If I have some direct-cut LPs, can they be included or is that deemed a non-destructive generation? I do think we need to give an example or two as well as an improved definition. From what you say, I understand that we are considering the linear progression of a recording from capture to final mix for distribution (in whatever medium is relevant to the operation). I note the stated intention to exclude alternate mixes, but given the frequency of re-mixes, I wonder is that wise.
Propose to change the wording to read: "The generation attribute may be used to indicate the number of copy generations since the original recording represented by the described audio."