Comments on DRAFT AES57-xxxx

last updated 2011-07-27

PAGE 2, SUPPLEMENTARY to the main page of comments on this document.
NOTE Individual comments have been numbered in this transcription to simplify cross-referencing.


Comments received from Mr. I. Rudd, 2011-06-16

Item Comment

1 General comments:

Reading through the document, I do think that a node-tree diagram or, preferably, UML class diagram and a hierarchical diagram of the elements and attributes would be an extremely useful pair of Annexes. (Were they not created in writing the draft document anyway?) In the official printed version, they would fold out so as to be visible wherever one was in the body of the document. These diagrams would tell me at once where a particular element or attribute sits when I'm looking at an audio object by having a reference to the section(s) in which it/ they appear. E.g. I find myself looking at a cylinder and wish to record its dimensions or I am looking at the audio object's reference to C_PASS and wish to know more about what that means.

2 Numerous definitions include the term being defined (e.g. "width shall be used to describe the width") and therefore need to be reworded. I do not claim to have caught all of them in my specific comments and a general review with this problem in mind is required.
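A first pass of the general review asked for here could be done mechanically. The following is a minimal sketch in Python, assuming the definitions have already been extracted from the draft as term and definition pairs; the function name `self_referential` and the dictionary layout are hypothetical. The "width" wording is the one quoted in this comment, and the "length" wording is the replacement proposed in comment 24.

```python
import re

def self_referential(term: str, definition: str) -> bool:
    """Return True if the term occurs again after its first mention in the definition."""
    # Match the term as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return len(pattern.findall(definition)) > 1

# Hypothetical extraction of two definitions (term -> definition text).
definitions = {
    "width": "width shall be used to describe the width",
    "length": "length shall be used to describe the distance measured "
              "from one end of the tape to the other, including any leader tape",
}

flagged = [term for term, text in definitions.items() if self_referential(term, text)]
print(flagged)  # ['width']
```

A check of this kind only finds literal repetition; definitions that are circular through a synonym would still need to be found by reading.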

3 Several of the preambles to the tables containing details of data types talk of a different number of mandatory and optional elements and attributes from those found in the tables. Rather than itemising every instance of this problem, and thereby increasing the length of this document, every preamble and table needs to be checked and corrected where necessary. I saw the problem first in Section 4.4.2.1.1.3.1, where there are actually four required elements of the layerType.

4 I also found at least one logical impossibility between mandatory and optional elements (see comment against Section 4.4) and I think I saw more; I have not had time to examine the whole document in detail to reveal each one but a thorough review with this problem in mind is required.

5 On the matter of presentation, it would be good if the rendering of the images could be improved.

6 Specific comments:

Page 6, Section 3.1: While I can understand the thinking around wave files stored on "transient" media, what does an archivist do with audio stored on, say, 8" floppies or disc packs - or LTO for that matter? Might not a deep, long-term archive hold wave files on these formats, particularly LTO?

7 Typo: "later case" should be "latter case".

8 Page 6, Section 3.2 "A document that conforms to the minimally required set of elements and attributes defined by an XML schema." This implies that a document with more than the minimally-required set isn't an instance document - and anyway, the Introduction points out that the concept of a minimum data set is not (always) a realistic proposition, "but rather is the set of elements that is expected to be known or determinable at a minimum".

I think the intention would be better expressed as: "A document that conforms to the XML schema (or Document Object Model) to which it refers." Alternatively, we could say (NB hyphenation): "In general, a document that conforms to the minimally-required set of elements and attributes defined by the XML schema (or Document Object Model - "DOM") to which it refers. However, recognising that there will be certain information that cannot be known in some environments, such as an archive, an instance document is also defined here as a document that conforms to the XML schema or DOM to which it refers."

9 Page 6, Section 4.1 "Each audio object is described by a single instance document in a strict one-to-one mapping." would be better expressed as: "For a given archive or domain, each audio object is described by a single instance document in a strict one-to-one mapping." This allows multiple archives to create individual, locally-relevant instance documents for a given audio object to which they have access. E.g. Archive B may wish to classify objects by mood, something which may not be relevant to (or perhaps agreed by) the owning archive, Archive A.

10 Page 6, Section 4.1 "Other standards exist that address such high level structural metadata." Does the AES not cite other standards bodies in such cases, even as informative references? (Candidate examples might include EBU Tech 3306 for audio instances and Tech 3295 for the editorial connection between audio instances.)

11 Page 6, Sections 4.2, 4.3, 4.4 I found these sections most confusing. E.g. we say in 4.2 "The top level of the document is the audioObject section". We then talk in Section 4.3 about something which wasn't mentioned in Section 4.2, an element, and then say in Section 4.4 that the audioObject isn't at the top level after all: "The audioObject element is a subclass of the objectType element." Er, except that the Section has the heading, "Document root" - ?!?

12 Further, in section 4.3, "subclass" is not a verb and what is meant by "reasonably unique"? Further again, "should" implies "need not".

13 In addition, there is a chance that known data may be lost if element values are known but not used and yet this circumstance is allowed by "audioObject element may contain the following sub-elements and attributes". There is further opportunity for confusion by specifying mandatory elements/ attributes (OCCURS MIN = 1) within an optional framework ("may contain the following sub-elements and attributes"). Therefore, I strongly suggest a re-write; something such as:

4.2 objectType elements and Instance Documents
The objectType element is an abstract super-class of the audioObject element created in instance documents which conform to the audio object schema defined in this standard. It provides an XML ID attribute to all elements which are in a subclass of it. Care shall be taken to ensure that the ID attribute is a unique value within the associated instance document. Further, since the instance document may be included in other XML documents, problems can occur if two instance documents share the same values for ID attributes. Therefore ID attributes shall be made unique both within and across instance documents at least in the domains in which they will be used.

4.3 Instance Document hierarchy
Instance documents conforming to the audio object schema are hierarchical in design. There are four primary levels of hierarchy that together describe the structure of the audio object. The root, or top level, of the instance document is the audioObject element. Beneath this is the face section. Beneath the face section is the region section. Finally, beneath the region section is the stream section. The face, region and stream sections can each be repeated in accordance with proper xml schema syntax.

4.4 Document root
All elements in the instance document shall be contained between the <audioObject> and </audioObject> tags. Where the values of the following sub-elements and attributes are known, the audioObject element shall contain them; where the values are not known, the audioObject element need not contain the associated sub-elements and attributes: [...]
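The proposed wording of 4.2 to 4.4 can be illustrated with a small sketch in Python. It assumes the four levels nest exactly as described (audioObject, face, region, stream) and that each element carries an attribute named ID; the ID values, the absence of namespaces and the helper name `duplicate_ids` are all invented for illustration and are not taken from the draft schema. The helper reports ID values that are repeated within or across instance documents.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document: element names follow the hierarchy in the
# proposed 4.3; the ID values are made up, and "stream-1" is repeated on purpose.
doc = ET.fromstring(
    '<audioObject ID="obj-1">'
    '<face ID="face-1">'
    '<region ID="region-1"><stream ID="stream-1"/></region>'
    '<region ID="region-2"><stream ID="stream-1"/></region>'
    '</face>'
    '</audioObject>'
)

def duplicate_ids(*roots):
    """Return the sorted ID values that occur more than once across all documents."""
    seen, dupes = set(), set()
    for root in roots:
        for elem in root.iter():  # visits the root and every descendant
            value = elem.get("ID")
            if value is None:
                continue
            if value in seen:
                dupes.add(value)
            seen.add(value)
    return sorted(dupes)

print(duplicate_ids(doc))  # ['stream-1']
```

Passing several parsed documents to `duplicate_ids` covers the "across instance documents" case, for example when one instance document is to be included in another XML document.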

14 Page 7 and passim; Table
Column "KIND" should be first but must at least be second so that the reader realises what the element/ attribute is. (It is - surely - more logical to think, "I have an ELEMENT with the NAME 'format' which is of DATA TYPE 'formatType' etc. " than, "I have the NAME "format" which is of DATA TYPE "formatType" etc." and only discover just what it is I have with that name, format etc. after I have looked at the optionality and cardinality.)

A further improvement would be two tables with, respectively, "NAME" replaced by "ELEMENT NAME" and "ATTRIBUTE NAME". That would mean there would be no need for the "KIND" column and the added benefit would be no wrapping of the ELEMENT NAME/ ATTRIBUTE NAME or DATA TYPE fields.

15 Page 7; Table Would it be useful to have a NOTE to remind the reader that ID is an abstract attribute? I suspect that not everyone will associate the italics with the abstract super-class.

16 Page 7, Section 4.4.2 What is the force and meaning of the word "directly"? There is no other means (stated) of appearing in the element in question.

17 Page 8 Section 4.4.2.1.1.2.2 Particularly because physicalProperties can be specified for a formatRegion, we need to allow for the structure of leader tape here. (Don't forget the colour!)

18 Page 11 Section 4.4.2.1.1.3 opticalStructure. Isn't a film optical sound track a valid medium? How do I handle it?

19 Page 12, Section 4.4.2.1.1.4.1.2 I found the term "filler Layer" for the inner core extremely confusing when considering the analogue disc, initially thinking of it in terms of the other layers of tape and discs. I have no problem with the term "innerCore" (or "innerCoreLayer" if it is desired to make the type structure clearer, but I think it adds ambiguity) being of type layerType. I suggest a new second sentence be inserted, viz: "[...] where one is present. Where it exists, the inner core of a disc is the separate material between the inner edge of the disc and the hole to accommodate the spindle of the disc's player. innerCore [or innerCoreLayer] is of data type layerType[...]"

20 Page 13, Section 4.4.2.1.1.5.1.2 Again, innerCore (or innerCoreLayer) is a better term. Here the new, inserted second sentence should read: "[...] where one is present. Where it exists, the inner core of a cylinder is the separate material between the inner edge of the cylinder and the cylinder's spindle or hole to accommodate the spindle of the cylinder's player. innerCore (or innerCoreLayer) is of data type layerType[...]"

21 Page 13 Section 4.4.2.1.2 Typo: "There are a series" should be "There is a series".

22 Page 13/ 14 Section 4.4.2.1.2 Table TAPE width : the definition includes the word being defined. I suggest: width shall be used to describe the breadth of the tape, as seen between the two flanges of the tape reel.

23 TAPE length : the double-definition is ambiguous and it includes the word being defined.

24 TAPE length The words recognise that the first definition is not entirely adequate, but the second one doesn't help; the leading foot or two would not be played past the tape head because that length is necessary to secure the tape to the take-up spool and it is not clear if the leader tape should be included. I suggest a single definition: length shall be used to describe the distance measured from one end of the tape to the other, including any leader tape. (NB we have to include the leader tape or else we shall have to measure and subtract all the intermediate lengths of leader tape!)

25 TAPE thickness : older tapes may well not be of uniform thickness, with areas of oxide loss etc. and again the definition includes the word being defined. I suggest: thickness shall be used to describe the total depth of a single, straight piece of tape with all layers intact, from one face to the other.

26 ANALOG DISC or OPTICAL DISC We need to improve the English here (especially "laying"): thickness shall be used to describe the distance from the bottom of the disc to the top of the disc when the disc is lying flat.

27 WIRE diameter Again the double-definition can be simplified and improved by the removal of the word being defined: diameter shall be used to describe the distance across the wire, as seen looking down the wire from one end to the other.

28 WIRE length Similar problems to TAPE, before. I suggest: length shall be used to describe the distance measured from one end of the wire to the other.

29 CYLINDER length Similar problems to TAPE, before. I suggest: length shall be used to describe the distance measured from one end of the cylinder to the other.

30 Page 15 Sections 4.4.2.1.2.1.6 and 4.4.2.1.2.1.7 Similar to earlier, where a shell exists ought not MIN OCCURS = 1?

31 Page 17 Sections 4.4.2.1.2.4, 4.4.2.1.2.5, 4.4.2.1.2.6, 4.4.2.1.2.7, 4.4.2.1.2.8, 4.4.2.1.2.9, 4.4.2.1.2.10 All these sections have definitions which include the term being defined and need to be reworded in a manner similar to length, diameter, thickness etc. previously.

32 Page 17 Section 4.4.2.1.2.1.7/ 4.4.2.1.2.5 height appears to have been included "for completeness" but it adds ambiguity, particularly in the light of Figure 2 and the convention set out for the shell's dimensions in Section 4.4.2.1.3. It is not used elsewhere in the document and indeed it is implicitly declared illegal at the end of Section 4.4.2.1.3. It needs to be removed.

33 Page 18 Section 4.4.3 Don't we also want to know what the nature of the private data is? I think the definition should be: "When present, the value for the appSpecificData element shall be the nature of the data deposited in the audio object by the software application name and version defined by the appSpecificDataType." This makes the intent of the practical example much clearer - that what is wanted is the form of the data and not the data itself. (At least, I presume that is the intent - ?)

34 Page 18 Section 4.4.4 Particularly since the word "compression" does not appear in the document and also because not everyone will appreciate the difference between coding and compression, I think that the reader would find it most helpful to be reminded of the difference in a NOTE and to have a pointer to bitrateReduction (Section 4.4.17.4.13) in this Section.

35 Page 21, Section 4.4.11.2 Given that we are aiming this document at preservation and restoration, an archive would seem a popular place to use it. Therefore, ought we not to include ACCESSION_NUMBER as a primary identifierTypeType? Also, should not UUIDs be on the primary list? The reference is: http://www.itu.int/ITU-T/asn1/uuid.html

The list is oriented towards the material itself, which is fine, but ought not some specific provision be made for editorial identifiers such as ISRC, ISAN, V-ISAN, Programme Number etc. in the secondary Identifier? Maybe another Section is needed for this purpose - editorialIdentifier? (In this case, materialIdentifierType and secondaryMaterialIdentifier would have to be re-named materialIdentifierType and secondaryMaterialIdentifier.)
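As an illustration of the UUID suggestion, the following minimal Python sketch generates a random (version 4) UUID and holds it alongside an identifier type. It uses only the standard `uuid` module; treating "UUID" as a value of identifierTypeType is the proposal made in this comment, not something the draft is known to define, and the variable names are hypothetical.

```python
import uuid

# "UUID" as an identifier type is the proposal above, not a value from the draft.
identifier_type = "UUID"

# A random (version 4) UUID in its canonical 36-character text form.
identifier_value = str(uuid.uuid4())

# Round-trip the text form to confirm that it parses as a well-formed UUID.
parsed = uuid.UUID(identifier_value)
print(identifier_type, identifier_value, parsed.version)
```

Because such a value is generated without reference to any registry, it would suit the requirement in the proposed 4.2 that identifiers be unique across archives as well as within one.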

36 Page 23 Section 4.4.16.2 Typo : 4.4.16.2.1.11 should be 4.4.16.2.1.1

37 Page 24, Section 4.4.16.2.1.1 Typo: "edit UnitNumberType" should be "editUnitNumberType".

38 Page 24, Section 4.4.16.2.1.2.1 Typo: "data type is a long" should be "data type is a long".

39 Typo: the redundant spaces in the table before editRate, positiveInteger and factorNumerator can be removed.

40 Page 26 - 28, Sections 4.4.16.3.4, 4.4.16.3.4.1, 4.4.16.3.4.2, 4.4.16.3.6, 4.4.16.3.6.2, 4.4.16.3.6.3, 4.4.16.4.1 Typos: all these sections point to sections 4.4.15.x(.y.z) instead of 4.4.16.x(.y.z) and Section 4.4.16.4 points to 4.2.17.4.1 instead of 4.4.16.4.1

41 Page 26 Section 4.4.16.3.4.3 Typo: " ... described in 4.4.16.2.1 and 4.4.16.2.2.." (double stop -sic) should be " ... described in 4.4.16.2.1.1 and 4.4.16.2.1.2."

42 Page 27 section 4.4.16.3.6.1 et seq. Given that channelAssignment is mandatory, that it represents an area of the audio sound stage and that it has mandatory left/ right and front/ rear positions, what do I do with a set of stems?

43 Page 27 Section 4.4.16.3.6.2.1 Typo (probably): as we have three required attributes, don't they all have to have MIN OCCURS = 1?

44 Page 28 section 4.4.16.3.6.4 I didn't understand "... aid in the identification of the document section"; I can't provide guidance, except that if the reference is to the instance document, it is mixing up document attributes with the audio object's attributes - !

Isn't " ... should be displayed to the user through a software interface ..." redundant on the grounds that all the information will be so presented? I suggest removing the second sentence.

45 Page 28 section 4.4.16.3.6.5 The explanatory "(point to)" in Section 4.4.16.3.8 needs to be in this Section because this is the first occurrence of it. Whether it then appears in 4.4.16.3.7 and 4.4.16.3.8 is optional.

46 Pages 30 - 34 Section 4.4.17 - 4.4.17.4.7 Typos: all these sections point to sections 4.4.16.x(.y.z) instead of 4.4.17.x(.y.z)

47 Pages 31 - 39 Section 4.4.17.4 and most sub-sections Typos: the FormatRegion is described in Section 4.4.17.2 and not Section 4.4.2; the baseFormatRegionType is described in Section 4.4.17.3 and not Section 4.3.

48 Page 32 Section 4.4.17.4.2 "The speed element may be used to indicate the playback speed of the described audio object." As was pointed out in Section 4.4.17.2, different regions of the audio object may be played at different speeds. Therefore, I suggest that this sentence should be: "The speed element may be used to indicate the playback speed of the formatRegion of the described audio object."

49 Page 32 Section 4.4.17.4.2.3 If you accepted the need for film optical storage earlier, do we need a valid speedMeasurementUnitsType here to accommodate that form of storage?

50 Page 33 Section 4.4.17.4.3 "The bitDepth element shall declare the number of bits per sample for the audio content of the described audio object." As was pointed out in Section 4.4.17.2, different regions of the audio object may have different characteristics. Therefore, I suggest that this sentence should be: "The bitDepth element shall declare the number of bits per sample for the audio content of the formatRegion of the described audio object."

51 Page 33 Section 4.4.17.4.4 Similarly, we should talk of the sampleRate of the audio data of the formatRegion of the described audio object.

52 Typos: "sample-rate" should be "sample rate" (two occurrences) and "sampleRate" in the third line should be "sampleRate".

53 Page 34 Section 4.4.17.4.6 If you accepted the need for film optical storage earlier, then I think we need words to accommodate it here too (but I don't know enough about sep opt practice to provide sufficient direct guidance, sorry).

54 Pages 35 - 37 Sections 4.4.17.4.10 - 4.4.17.4.13.8 and 4.4.17.4.13.9.1.2.1 Again, we need to specify the formatRegion of the audio object and not the entire audio object in each case. (E.g. this is a mono passage between two stereo passages or an inadvertent operation at the mixing desk introduced a high-frequency cut in this passage.)

55 Page 35 Section 4.4.17.4.13 Typo: The reference to Section 4.4.16.4.11.1 should be to 4.4.17.4.13.1

56 Page 36 Section 4.4.17.4.13.8 The grammar and syntax of the first sentence can be tidied here to read: "The dataRateMode shall be used to indicate that the described formatRegion of the audio object's [NB apostrophe] audio data has been processed to achieve a FIXED (constant) or a VARIABLE bit rate."

57 Page 36 Section 4.4.17.4.13.9 Typo: The header of this Section, "packetList" and "bitrateReduction" need the correct formatting.

58 Page 36 Section 4.4.17.4.13.9.1 Typo: "packetListType" should be "packetListType".

59 Page 37 Section 4.4.17.4.13.9.1.1 Typo: "The packet element shall be used ..." should be "The packet element shall be used ...".

60 Page 37 Section 4.4.17.4.13.9.1.2.2 Typo: "packetLength" should be "packetLength".

61 Page 37 Section 4.4.17.5 I suppose that recording the splice angles used on analogue audio tape is too arcane . . .

62 Page 39 Between Sections 4.4.17.8 and 4.4.17.9 If you accepted the need for film optical storage earlier, then I think we need a new opticalFilmFormatRegionType Section, similar to the other xxxFormatRegionTypes, to accommodate it here too (but I don't know enough

63 Page 40 Section 4.4.18 Syntax: "metadata" is redundant and may be removed.

64 Page 40 section 4.4.19 We need to be clear about what is meant by "title"; do we mean ownership, an award, the name of the series, the name of the programme, the name of the episode, the temporary name of the piece ("working title") ... ? What about a work which is called something quite different (not just a translation of the original language) in different languages?

Only one title is currently allowed per audio object; I suggest that we need to do more work here. Imagine a disc from a series called "Horn Spectacular"; this release in the series won a "Disc of the Year" award in 1935 and it contains Joseph Haydn's Symphony No 31 in D Major, Hob.I:31 "Auf dem Anstand" - in English (not a literal translation): "Horn signal". My archive wishes to record that this disc is on permanent loan from the benefactor who actually owns it. What happens?

65 Page 40 Section 4.4.20.1 " ... designed to be independent of its physical housing" implies that the FILE_DIGITAL audio object has a physical housing and that is by no means certain. The words need to be: " ... designed to be independent of a physical housing".

66 Page 40 Section 4.4.21 I find this description ambiguous; is it meant to refer to, say, the 1945 version of Stravinsky's Firebird Suite, which was perhaps the third generation of the suite? I don't know what to suggest to improve this item. (The "generational version of the original recording" runs into problems with sub-mixes, mixes, finishing, sub-masters/ revised repeats etc.)

67 Page 40 Section 4.4.22 I think that "final disposition" should be "current disposition" because numerous archives have a retention policy which determines that certain items will be discarded after some event or period of time. Therefore, although I might have [this item] in my archive at the moment, I know that its final location will be the skip. I would wish to know whether or not I still actually have the item.
Ian Rudd, June, 2011.
AES - Audio Engineering Society