Comments on DRAFT AES57-xxxx

last updated 2011-08-08

PAGE 3, SUPPLEMENTARY.
NOTE Individual comments have been numbered in this transcription to simplify cross-referencing.


Reply from D. Ackerman, chair of SC-03-06, 2011-07-05,
to comments received from Mr. I. Rudd, 2011-06-16

Dear Mr Rudd,

Attached is a document containing my response to the comments you submitted on AES-57. Note that some editorial issues have been left to Mark Yonge to address. These will be sent by him in a separate document.

Very best, David Ackerman

Item Comment

2 Numerous definitions include the term being defined (e.g. "width shall be used to describe the width") and therefore need to be reworded. I do not claim to have caught all of them in my specific comments and a general review with this problem in mind is required.
[The term in Courier monospaced font is a formal code - see clause 0.1.2; the word in regular body text is a common English word used to describe it. This is correct.]

3 Several of the preambles to the tables containing details of data types talk of a different number of mandatory and optional elements and attributes from those found in the tables. Rather than itemising every instance of this problem, and thereby increasing the length of this document, every preamble and table needs to be checked and corrected where necessary. I saw the problem first in Section 4.4.2.1.1.3.1, where there are actually four required elements of the layerType.
[See item 13, below].

4 I also found at least one logical impossibility between mandatory and optional elements (see comment against Section 4.4) and I think I saw more; I have not had time to examine the whole document in detail to reveal each one but a thorough review with this problem in mind is required.
[See item 13, below.]

6 Specific comments:

Page 6, Section 3.1 While I can understand the thinking around wave files stored on "transient" media, what does an archivist do with audio stored on, say, 8" floppies or disc packs - or LTO for that matter? Might not a deep, long-term archive hold wave files on these formats, particularly LTO?
[Out of scope for this document. May be considered in a later revision or an alternative document.]

8 Page 6, Section 3.2 "A document that conforms to the minimally required set of elements and attributes defined by an XML schema." This implies that a document with more than the minimally-required set isn't an instance document - and anyway, the Introduction points out that the concept of a minimum data set is not (always) a realistic proposition, "but rather is the set of elements that is expected to be known or determinable at a minimum".

I think the intention would be better expressed as: "A document that conforms to the XML schema (or Document Object Model) to which it refers." Alternatively, we could say (NB hyphenation): "In general, a document that conforms to the minimally-required set of elements and attributes defined by the XML schema (or Document Object Model - "DOM") to which it refers. However, recognising that there will be certain information that cannot be known in some environments, such as an archive, an instance document is also defined here as a document that conforms to the XML schema or DOM to which it refers."
[Not accepted - this language has been well understood by implementers to date. The issue of what is not known or known is a separate matter from what comprises an instance document and I believe it best to not introduce this idea into the definition of the instance document.]

9 Page 6, Section 4.1 "Each audio object is described by a single instance document in a strict one-to-one mapping." would be better expressed as: "For a given archive or domain, each audio object is described by a single instance document in a strict one-to-one mapping." This allows multiple archives to create individual, locally-relevant instance documents for a given audio object to which they have access. E.g. Archive B may wish to classify objects by mood, something which may not be relevant to (or perhaps agreed by) the owning archive, Archive A.
[Out of scope for the current document. Issues of separate data sets addressing the same objects may be considered at a future stage.]

10 Page 6, Section 4.1 "Other standards exist that address such high level structural metadata." Does the AES not cite other standards bodies in such cases, even as informative references? (Candidate examples might include EBU Tech 3306 for audio instances and Tech 3295 for the editorial connection between audio instances.)
[There are many other metadata schemes, however it is not the purpose of this standard to list them. They are not required in order to implement this standard.]

11 Page 6, Sections 4.2, 4.3, 4.4 I found these sections most confusing. E.g. we say in 4.2 "The top level of the document is the audioObject section". We then talk in Section 4.3 about something which wasn't mentioned in Section 4.2, an element, and then say in Section 4.4 that the audioObject isn't at the top level after all: "The audioObject element is a subclass of the objectType element." Er, except that the Section has the heading, "Document root" - ?!?
[In this schema, the document root is in fact the audioObject element. There is no issue with the root element itself inheriting properties from an abstract element as it does in this case. The abstract objectType element cannot be directly instantiated in any case, so I don’t think this should prove confusing to developers who work with XML.

The W3C schema primer (see http://www.w3.org/TR/xmlschema-0/#abstract ) states, “XML Schema provides a mechanism to force substitution for a particular element or type. When an element or type is declared to be "abstract", it cannot be used in an instance document. When an element is declared to be abstract, a member of that element's substitution group must appear in the instance document. When an element's corresponding type definition is declared as abstract, all instances of that element must use xsi:type to indicate a derived type that is not abstract.”]
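The relationship described in this reply can be sketched by analogy with an abstract base class. This is only an illustration in Python, not part of the AES57 schema: the class names mirror objectType and audioObject, while the element_name method is a hypothetical device used to make the base class abstract.

```python
from abc import ABC, abstractmethod


class ObjectType(ABC):
    """Analogue of the abstract objectType: it cannot be instantiated directly."""

    def __init__(self, ID):
        # ID stands in for the attribute inherited from the abstract type.
        self.ID = ID

    @abstractmethod
    def element_name(self):
        """Hypothetical method; its only role is to make this class abstract."""


class AudioObject(ObjectType):
    """Analogue of the concrete audioObject document root."""

    def element_name(self):
        return "audioObject"


root = AudioObject("obj-1")  # allowed: concrete subclass, inherits ID
print(root.element_name(), root.ID)

try:
    ObjectType("obj-2")  # not allowed: the abstract type itself
except TypeError as err:
    print("cannot instantiate abstract type:", err)
```

As with the schema, the concrete root carries the inherited ID even though the abstract parent never appears on its own.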

13 In addition, there is a chance that known data may be lost if element values are known but not used and yet this circumstance is allowed by "audioObject element may contain the following sub-elements and attributes". There is further opportunity for confusion by specifying mandatory elements/ attributes (OCCURS MIN = 1) within an optional framework ("may contain the following sub-elements and attributes"). ...
[Not all objects can provide all supported elements, but where they do the requirement is clear. Comment rejected.]

15 Page 7; Table Would it be useful to have a NOTE to remind the reader that ID is an abstract attribute? I suspect that not everyone will associate the italics with the abstract super-class.
[This parameter is in italics in accordance with clause 0.1.2. to indicate that, "Inherited elements and attributes are printed in an italicized equally spaced font." This is correct as it is.]

17 Page 8 Section 4.4.2.1.1.2.2 Particularly because physicalProperties can be specified for a formatRegion, we need to allow for the structure of leader tape here. (Don't forget the colour!)
[This clause specifies a way to describe the physical structure of any tape. Leader tape is a case where a substrate will be described but no coating. Leader tape and recording tape will appear as separate "sections"]

18 Page 11 Section 4.4.2.1.1.3 opticalStructure. Isn't a film optical sound track a valid medium? How do I handle it?
[Out of scope for this standard. May be considered for a future revision of this standard, or a complementary standard. (There will be other formats not described here that will be handled similarly.)]

19 Page 12, Section 4.4.2.1.1.4.1.2 I found the term "filler Layer" for the inner core extremely confusing when considering the analogue disc, initially thinking of it in terms of the other layers of tape and discs. I have no problem with the term "innerCore" (or "innerCoreLayer" if it is desired to make the type structure clearer, but I think it adds ambiguity) being of type layerType. I suggest a new second sentence be inserted, viz: "[...] where one is present. Where it exists, the inner core of a disc is the separate material between the inner edge of the disc and the hole to accommodate the spindle of the disc's player. innerCore [or innerCoreLayer] is of data type layerType[...]"
[In this context, "filler" is a layer of certain kinds of disk and not a radial component.]

20 Page 13, Section 4.4.2.1.1.5.1.2 Again, innerCore (or innerCoreLayer) is a better term. Here the new, inserted second sentence should read: "[...] where one is present. Where it exists, the inner core of a cylinder is the separate material between the inner edge of the cylinder and the cylinder's spindle or hole to accommodate the spindle of the cylinder's player. innerCore (or innerCoreLayer) is of data type layerType[...]"
[See above. In this context, "filler" is a layer of certain kinds of disk and not a radial component.]

22 Page 13/ 14 Section 4.4.2.1.2 Table
TAPE width : the definition includes the word being defined. I suggest: width shall be used to describe the breadth of the tape, as seen between the two flanges of the tape reel.
[The term in Courier monospaced font is a formal code - see clause 0.1.2; the word in regular body text is a common English word used to describe it. This is correct.]

23 TAPE length : the double-definition is ambiguous and it includes the word being defined.
[The term in Courier monospaced font is a formal code - see clause 0.1.2; the word in regular body text is a common English word used to describe it. This is correct.]

24 TAPE length The words recognise that the first definition is not entirely adequate, but the second one doesn't help; the leading foot or two would not be played past the tape head because that length is necessary to secure the tape to the take-up spool and it is not clear if the leader tape should be included. I suggest a single definition: length shall be used to describe the distance measured from one end of the tape to the other, including any leader tape. (NB we have to include the leader tape or else we shall have to measure and subtract all the intermediate lengths of leader tape!)
[Leader tape and recording tape will appear as separate "sections" in a single object]

25 TAPE thickness : older tapes may well not be of uniform thickness, with areas of oxide loss etc. and again the definition includes the word being defined. I suggest: thickness shall be used to describe the total depth of a single, straight piece of tape with all layers intact, from one face to the other.
[The term in Courier monospaced font is a formal code - see clause 0.1.2; the word in regular body text is a common English word used to describe it. This is correct.

The issue of uniform thickness should be covered by the entry for TAPE in the table on pages 13-14 where it states that “thickness shall be used to describe the total thickness of the tape.”]

26 ANALOG DISC or OPTICAL DISC We need to improve the English here (especially "laying"): thickness shall be used to describe the distance from the bottom of the disc to the top of the disc when the disc is lying flat.
[The term in Courier monospaced font is a formal code - see clause 0.1.2; the word in regular body text is a common English word used to describe it. This is correct.]

27 WIRE diameter Again the double-definition can be simplified and improved by the removal of the word being defined: diameter shall be used to describe the distance across the wire, as seen looking down the wire from one end to the other.
[The term in Courier monospaced font is a formal code - see clause 0.1.2; the word in regular body text is a common English word used to describe it. This is correct.]

28 WIRE length Similar problems to TAPE, before. I suggest: length shall be used to describe the distance measured from one end of the wire to the other.
[The term in Courier monospaced font is a formal code - see clause 0.1.2; the word in regular body text is a common English word used to describe it. This is correct.]

29 CYLINDER length Similar problems to TAPE, before. I suggest: length shall be used to describe the distance measured from one end of the cylinder to the other.
[The term in Courier monospaced font is a formal code - see clause 0.1.2; the word in regular body text is a common English word used to describe it. This is correct.]

30 Page 15 Sections 4.4.2.1.2.1.6 and 4.4.2.1.2.1.7 Similar to earlier, where a shell exists ought not MIN OCCURS = 1?
[In an ideal world it would, but one of the issues with XML is that it becomes cumbersome to enforce either/or paradigms. To keep this element simple, the schema defines all sub-elements as optional but the text of the standard specifies in 4.4.2.1.3 on page 18, “When shellDimensions is present in an instance document it shall either have a length, width and depth sub-element, or it shall have a diameter and depth sub-element. All other combinations from the dimensionsType are illegal in this context.”

In 4.4.2.1.2.1.7 it would likewise be cumbersome to map out all of the possible combinations that could be required for various media objects.]
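Because the either/or rule quoted above lives in the text of the standard rather than in the schema, an application would have to check it separately. A minimal sketch of such a check in Python follows. It assumes un-namespaced element names and that each sub-element appears at most once; the function name and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# The two combinations permitted by the text quoted from 4.4.2.1.3.
ALLOWED_COMBINATIONS = (
    frozenset({"length", "width", "depth"}),
    frozenset({"diameter", "depth"}),
)


def shell_dimensions_is_legal(shell):
    """Return True if the sub-elements of shellDimensions form a permitted set."""
    tags = [child.tag for child in shell]
    has_no_duplicates = len(tags) == len(set(tags))  # assumption: no repeats
    return has_no_duplicates and frozenset(tags) in ALLOWED_COMBINATIONS


# Hypothetical instance fragments; element content is omitted for brevity.
box = ET.fromstring("<shellDimensions><length/><width/><depth/></shellDimensions>")
reel = ET.fromstring("<shellDimensions><diameter/><depth/></shellDimensions>")
bad = ET.fromstring("<shellDimensions><height/><width/></shellDimensions>")

for name, shell in (("box", box), ("reel", reel), ("bad", bad)):
    print(name, shell_dimensions_is_legal(shell))
```

A check of this kind would run after ordinary schema validation, which on its own accepts any subset of the optional sub-elements.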

31 Page 17 Sections 4.4.2.1.2.4, 4.4.2.1.2.5, 4.4.2.1.2.6, 4.4.2.1.2.7, 4.4.2.1.2.8, 4.4.2.1.2.9, 4.4.2.1.2.10 All these sections have definitions which include the term being defined and need to be reworded in a manner similar to length, diameter, thickness etc. previously.
[The term in Courier monospaced font is a formal code - see clause 0.1.2; the word in regular body text is a common English word used to describe it. This is correct.]

32 Page 17 Section 4.4.2.1.2.1.7/ 4.4.2.1.2.5 height appears to have been included "for completeness" but it adds ambiguity, particularly in the light of Figure 2 and the convention set out for the shell's dimensions in Section 4.4.2.1.3. It is not used elsewhere in the document and indeed it is implicitly declared illegal at the end of Section 4.4.2.1.3. It needs to be removed.
[Included because height may be important for specific objects. May be deprecated in favour of 'depth' in a future revision.]

33 Page 18 Section 4.4.3 Don't we also want to know what the nature of the private data is? I think the definition should be: "When present, the value for the appSpecificData element shall be the nature of the data deposited in the audio object by the software application name and version defined by the appSpecificDataType." This makes the intent of the practical example much clearer - that what is wanted is the form of the data and not the data itself. (At least, I presume that is the intent - ?)
[The point about private data is that it's private. The administrator of the database associated with this metadata set will not know its content or purpose, simply that there's a block of data associated with this object. The point of recording this information is twofold: first, to recognize and preserve its existence, since it may be useful to the file owner through a software application despite its proprietary nature; and second, to map the structure of the file for those who come to curate it in the future, long after the useful life of that private data, so that they may have the best chance possible of recovering the sound essence from the file.]

34 Page 18 Section 4.4.4 Particularly since the word "compression" does not appear in the document and also because not everyone will appreciate the difference between coding and compression, I think that the reader would find it most helpful to be reminded of the difference in a NOTE and to have a pointer to bitrateReduction (Section 4.4.17.4.13) in this Section.
[Not all coding schemes use data compression. PCM, for example. In this context, the use of compression or otherwise is not relevant.]

35 Page 21, Section 4.4.11.2 Given that we are aiming this document at preservation and restoration, an archive would seem a popular place to use it. Therefore, ought we not to include ACCESSION_NUMBER as a primary identifierTypeType? Also, should not UUIDs be on the primary list? The reference is: http://www.itu.int/ITU-T/asn1/uuid.html

The list is oriented towards the material itself, which is fine, but ought not some specific provision be made for editorial identifiers such as ISRC, ISAN, V-ISAN, Programme Number etc. in the secondary Identifier? Maybe another Section is needed for this purpose - editorialIdentifier? (In this case, materialIdentifierType and secondaryMaterialIdentifier would have to be re-named materialIdentifierType and secondaryMaterialIdentifier.)
[Accession numbers, program numbers and the like often refer to groups of objects and as such may not be the best identifiers to use in the context of a primary identifier which is a ‘primary key’ or unique identifier. On the other hand ISRC and I presume ISAN numbers may refer to parts of an audio object’s contents, more so than the object itself and likewise seem unfit for use in this field. However, that said, both the primaryIdentifier and the secondaryIdentifier may be set to type OTHER where the user may define their own type using the idOtherType attribute. Additional identifier types may be added in future revisions.]
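The OTHER mechanism mentioned in this reply might be used as in the following Python sketch. Only primaryIdentifier, OTHER and idOtherType come from the text above; the attribute name identifierType, the helper function and the sample accession value are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def make_identifier(tag, value, identifier_type, other_type=None):
    """Build an identifier element; other_type is only set for user-defined types."""
    attrs = {"identifierType": identifier_type}  # hypothetical attribute name
    if other_type is not None:
        attrs["idOtherType"] = other_type
    elem = ET.Element(tag, attrs)
    elem.text = value
    return elem


# An archive recording a locally defined accession number (invented sample value).
accession = make_identifier(
    "primaryIdentifier", "1935.0042", "OTHER", "ACCESSION_NUMBER"
)
print(ET.tostring(accession, encoding="unicode"))
```

The user-defined type name travels with the identifier, so a receiving system can at least see which local scheme the value belongs to.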

42 Page 27 section 4.4.16.3.6.1 et seq. Given that channelAssignment is mandatory, that it represents an area of the audio sound stage and that it has mandatory left/ right and front/ rear positions, what do I do with a set of stems?
[You should describe each channel of the set with its intended playout position. If no position is known, use the default center-front setting.]

43 Page 27 Section 4.4.16.3.6.2.1: Typo (probably): as we have three required attributes, don't they all have to have MIN OCCURS = 1?
[Not in this case. In XML terms, when an attribute carries a default value, its use attribute in the schema document must be set to optional. It seems that the W3C believes it makes no sense to have a default value for an attribute that you require the user to provide. However, the net effect is that the optional attributes that carry default values will always be present in the document, simply because the default value is provided when the user omits their own.]
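The net effect described in this reply can be imitated in a few lines of Python. This is a sketch only: the standard xml.etree.ElementTree module does not read schema defaults, so the merge is done by hand, and the attribute names and default values shown are hypothetical stand-ins rather than those of the AES57 schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute names and default values, for illustration only.
SCHEMA_DEFAULTS = {"leftRightPosition": "0", "frontRearPosition": "0"}


def effective_attributes(elem, defaults=SCHEMA_DEFAULTS):
    """Return the attributes as a schema-aware reader would see them."""
    merged = dict(defaults)      # start from the schema defaults
    merged.update(elem.attrib)   # values written in the document take priority
    return merged


# The author supplied one position and omitted the other.
channel = ET.fromstring('<channelAssignment channelNum="0" leftRightPosition="-100"/>')
print(effective_attributes(channel))
```

Although both position attributes are optional in the schema sense, the reader always ends up with a value for each, which is the behaviour the reply describes.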

49 Page 32 Section 4.4.17.4.2.3 If you accepted the need for film optical storage earlier, do we need a valid speedMeasurementUnitsType here to accommodate that form of storage?
[Out of scope for this document.]

53 Page 34 Section 4.4.17.4.6 If you accepted the need for film optical storage earlier, then I think we need words to accommodate it here too (but I don't know enough about sep opt practice to provide sufficient direct guidance, sorry).
[Out of scope for this document.]

61 Page 37 Section 4.4.17.5 I suppose that recording the splice angles used on analogue audio tape is too arcane . . .
[Out of scope for this document. May be considered for a future revision.]

62 Page 39 Between Sections 4.4.17.8 and 4.4.17.9 If you accepted the need for film optical storage earlier, then I think we need a new opticalFilmFormatRegionType Section, similar to the other xxxFormatRegionTypes, to accommodate it here too (but I don't know enough
[Out of scope for this document.]

64 Page 40 section 4.4.19 We need to be clear about what is meant by "title"; do we mean ownership, an award, the name of the series, the name of the programme, the name of the episode, the temporary name of the piece ("working title") ... ? What about a work which is called something quite different (not just a translation of the original language) in different languages?

Only one title is currently allowed per audio object; I suggest that we need to do more work here. Imagine a disc from a series called "Horn Spectacular"; this release in the series won a "Disc of the Year" award in 1935 and it contains Joseph Haydn's Symphony No 31 in D Major, Hob.I:31 "Auf dem Anstand" - in English (not a literal translation): "Horn signal". My archive wishes to record that this disc is on permanent loan from the benefactor who actually owns it. What happens?
[We mean the name the owner of the object associates with the audio object, whatever that is. It is purposefully a bit fuzzy to allow for localized object naming practices, and is not intended for use in a comprehensive descriptive metadata context but rather to allow an application to display a locally meaningful name for the audioObject under description. Anything more extensive is out of scope and the subject of other metadata standards that deal with item description.]

66 Page 40 Section 4.4.21 I find this description ambiguous; is it meant to refer to, say, the 1945 version of Stravinsky's Firebird Suite, which was perhaps the third generation of the suite? I don't know what to suggest to improve this item. (The "generational version of the original recording" runs into problems with sub-mixes, mixes, finishing, sub-masters/ revised repeats etc.)
[The intention here is to record the number of destructive copy generations since the original recording, when known. It has nothing to do with alternate mixes et al…]