Operations

GET /attachments/{id}/content (Legal Transcription)

Summary: Get the raw data for the legal attachment with the given id
URL: /api/v1/attachments/{id}/content
Detailed Description

Use this method to download the finished output file contents of a legal transcription attachment. This endpoint will retrieve the latest version of the attachment, including edits, after all editor sessions for the attachment have been closed for at least two minutes.

For output file attachments, you may request to get the contents in a specific representation (file format), either via the Accept HTTP header or by appending an extension to the end of the URL.
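
The two ways of selecting a representation can be sketched as follows. This is a minimal illustration only: the host in `BASE_URL`, the function name `build_content_request`, and the way the `Authorization` value is passed through unchanged are assumptions, not part of the documented API. The request is built but not sent.

```python
from typing import Optional
from urllib.request import Request

# Hypothetical placeholder host; substitute the real API host.
BASE_URL = "https://api.example.com/api/v1"


def build_content_request(attachment_id: str,
                          authorization: str,
                          extension: Optional[str] = None,
                          accept: Optional[str] = None) -> Request:
    """Build (without sending) a GET request for an attachment's content.

    `authorization` is passed through as the Authorization header value;
    its exact format is not shown here and must come from the API's
    authentication documentation.
    """
    url = f"{BASE_URL}/attachments/{attachment_id}/content"
    if extension:
        # Option 1: append an extension such as ".txt" to the URL.
        url += extension if extension.startswith(".") else "." + extension
    headers = {"Authorization": authorization}
    if accept:
        # Option 2: name the desired media type in the Accept header.
        headers["Accept"] = accept
    return Request(url, headers=headers, method="GET")


# Same attachment, requested once by extension and once by media type.
by_extension = build_content_request("abc123", "<credentials>", extension=".txt")
by_header = build_content_request("abc123", "<credentials>", accept="application/pdf")
```

Either request could then be sent with `urllib.request.urlopen` or any other HTTP client.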

Per-word timestamps are included automatically with your transcript, but they are only available in the JSON format.

Legal monologue types also appear in the JSON output to indicate what type each monologue is. The following monologue types are supported:

  • colloquy
  • question
  • answer
  • by_line
  • exhibit
  • swear_in
  • examination_title
  • other_parenthetical
  • proceeding_end

The following representations are supported for legal transcripts:

Media Type | Extension | Accepted For
application/vnd.openxmlformats-officedocument.wordprocessingml.document | .docx | AI Rough Draft, Human Rough Draft, Ready To Certify (if set as a preferred legal file type)
application/pdf | .pdf | AI Rough Draft, Human Rough Draft
text/plain | .txt | AI Rough Draft, Human Rough Draft, Ready To Certify (if set as a preferred legal file type)
application/json+rev-transcript | .json | AI Rough Draft, Human Rough Draft, Ready To Certify

Example of the application/json+rev-transcript representation:

{
    "speakers": [
        {"id": 1, "name": "Interviewee"},
        {"id": 2, "name": "Interviewer"}
    ],
    /*
        Monologues (transcribed speech).
        Each monologue corresponds to a run of text from one speaker.
    */
    "monologues": [
        {
            "id": 1,
            "speaker": 1, /* Speaker id, same as declared above */
            "speaker_name": "Interviewee", /* Speaker name, same as declared above */
            "monologue_type": "colloquy", /* Monologue type */
            /*
                Elements of the monologue.
                Each element is either text (transcribed speech)
                or a tag (an annotation), such as inaudible.

                Text elements may or may not contain a timestamp.
                Tag elements require a timestamp.
            */
            "elements": [
                {
                    /* Text element, no timestamp */
                    "type": "text",
                    /* For text elements, value contains the text */
                    "value": "I've never been here before"
                },
                {
                    /* Tag element */
                    "type": "tag",
                    /* For tag elements, value contains the annotation type */
                    "value": "inaudible",
                    /*
                        Timestamp format is hh:mm:ss,fff,
                        where fff represents milliseconds
                    */
                    "timestamp": "00:00:07,000"
                },
                /*
                    Text elements, with per word timestamps
                    Whitespace and punctuation are not timestamped
                */
                {
                    "type": "text",
                    "value": " "
                },
                {
                    "type": "text",
                    "value": "so",
                    "timestamp": "00:00:20,000",
                    "end_timestamp": "00:00:20,138"
                },
                {
                    "type": "text",
                    "value": " "
                },
                {
                    "type": "text",
                    "value": "this",
                    "timestamp": "00:00:20,222",
                    "end_timestamp": "00:00:20,566"
                }
            ]
        }
    ]
}
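
A client might read this structure roughly as sketched below. The helper names (`timestamp_to_millis`, `monologue_text`, `timed_words`) are hypothetical, and the sample is the example above with its explanatory comments removed, on the assumption that a real response is plain JSON without comments.

```python
import json

# The example response from above, without the explanatory comments.
SAMPLE_RESPONSE = """
{
  "speakers": [{"id": 1, "name": "Interviewee"}, {"id": 2, "name": "Interviewer"}],
  "monologues": [{
    "id": 1, "speaker": 1, "speaker_name": "Interviewee",
    "monologue_type": "colloquy",
    "elements": [
      {"type": "text", "value": "I've never been here before"},
      {"type": "tag", "value": "inaudible", "timestamp": "00:00:07,000"},
      {"type": "text", "value": " "},
      {"type": "text", "value": "so",
       "timestamp": "00:00:20,000", "end_timestamp": "00:00:20,138"},
      {"type": "text", "value": " "},
      {"type": "text", "value": "this",
       "timestamp": "00:00:20,222", "end_timestamp": "00:00:20,566"}
    ]
  }]
}
"""


def timestamp_to_millis(timestamp: str) -> int:
    """Convert an "hh:mm:ss,fff" timestamp to whole milliseconds."""
    clock, millis = timestamp.split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return (hours * 3600 + minutes * 60 + seconds) * 1000 + int(millis)


def monologue_text(monologue: dict) -> str:
    """Join a monologue's elements, rendering tags as <tag> markers."""
    parts = []
    for element in monologue["elements"]:
        if element["type"] == "text":
            parts.append(element["value"])
        else:
            parts.append(f"<{element['value']}>")
    return "".join(parts)


def timed_words(monologue: dict) -> list:
    """Return (word, start_ms, end_ms) for text elements that carry both timestamps."""
    return [
        (e["value"],
         timestamp_to_millis(e["timestamp"]),
         timestamp_to_millis(e["end_timestamp"]))
        for e in monologue["elements"]
        if e["type"] == "text" and "timestamp" in e and "end_timestamp" in e
    ]


transcript = json.loads(SAMPLE_RESPONSE)
first = transcript["monologues"][0]
print(first["monologue_type"], "|", monologue_text(first))
print(timed_words(first))
```

Working in integer milliseconds avoids floating-point rounding when comparing or subtracting timestamps.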
                    

If the requested representation is not supported for the output file, then a 406 Not Acceptable status code will be returned. With the Accept header, you may specify multiple acceptable representations (as described in the Accept header HTTP spec), and the best possible match will be returned.
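
As an illustration of listing several acceptable representations, the sketch below composes an Accept header value with quality weights (`q` values) as defined by the HTTP specification. The helper name `build_accept_header` is hypothetical, and how the server weighs `q` values when choosing the best match is an assumption based on the reference to the HTTP spec above.

```python
def build_accept_header(preferences):
    """Compose an Accept header value from (media_type, q) pairs.

    q is a weight between 0 and 1; higher means more preferred.
    Entries are emitted from most to least preferred.
    """
    for media_type, q in preferences:
        if not 0 <= q <= 1:
            raise ValueError(f"q for {media_type} must be between 0 and 1")
    ordered = sorted(preferences, key=lambda pair: pair[1], reverse=True)
    parts = []
    for media_type, q in ordered:
        # q=1 is the default weight, so it can be left implicit.
        parts.append(media_type if q == 1 else f"{media_type};q={q:g}")
    return ", ".join(parts)


# Prefer PDF, but accept plain text if PDF is not available.
accept_value = build_accept_header([
    ("text/plain", 0.5),
    ("application/pdf", 1.0),
])
print(accept_value)
```

If none of the listed media types is available for the output file, the request would be expected to end in the 406 Not Acceptable response described above.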

If you do not specify a desired representation via the Accept header or a URL extension, then the output file contents will be returned in the default format. For both AI Rough Draft and Human Rough Draft transcripts, the default is JSON. For Ready To Certify transcripts, the default is the preferred legal file type that has been set.

For both AI Rough Draft and Human Rough Draft transcripts, if you request an attachment in the plain text representation (using "Accept: text/plain" header or ".txt" URL extension), you can also request a specific character encoding using the Accept-Charset HTTP header. The following encodings are supported:

Character Set | Encoding Description
utf-8 | UTF-8 encoding. The byte order mark is not included.
utf-16 | UTF-16 encoding. Note this results in a binary file - it is unlikely this is what you want.
iso-8859-1 | Commonly referred to as Western European (ISO), this consists of the ASCII characters, a set of control characters in the hex 80-9F range, plus a "Latin Extended" character block with common symbols and diacritized characters in the hex A0-FF range. Characters outside the 00-FF range will be mapped to their nearest equivalents or replaced by "?". Full details about this character set are here: https://en.wikipedia.org/wiki/ISO/IEC_8859-1
windows-1252 | Windows code page 1252, or Western European (Windows). Similar to iso-8859-1, but uses the hex 80-9F range for displayable characters such as smart quotes, rather than control characters. Full details about this character set are here: https://en.wikipedia.org/wiki/Windows-1252
us-ascii | US ASCII - this consists purely of ASCII characters in the hex 00-7F range, which includes numbers, the letters a-z and A-Z, punctuation, and some control characters. Characters outside the 00-7F range will generally be omitted, except that diacritized characters are replaced with their basic Latin character equivalents (e.g. À is replaced with A). Full details about this character set are here: https://en.wikipedia.org/wiki/ASCII

If you do not specify a desired character encoding via the Accept-Charset header, then the UTF-8 encoding will be used. If something other than the plain text representation is requested, then the Accept-Charset header is ignored.
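
A plain text request with an explicit character set might be prepared and decoded as in the sketch below. The helper names `plain_text_headers` and `decode_body` are hypothetical, and the `Authorization` value is passed through without assuming its format. The charset names are the ones listed above, all of which Python's codec registry also recognizes.

```python
# Character sets listed as supported for the plain text representation.
SUPPORTED_CHARSETS = ("utf-8", "utf-16", "iso-8859-1", "windows-1252", "us-ascii")


def plain_text_headers(authorization: str, charset: str = "utf-8") -> dict:
    """Build request headers asking for plain text in a given charset."""
    name = charset.lower()
    if name not in SUPPORTED_CHARSETS:
        raise ValueError(f"unsupported charset: {charset}")
    return {
        "Authorization": authorization,
        "Accept": "text/plain",
        # Only meaningful together with the plain text representation.
        "Accept-Charset": name,
    }


def decode_body(body: bytes, charset: str = "utf-8") -> str:
    """Decode a plain text response body using the charset that was requested."""
    return body.decode(charset)


headers = plain_text_headers("<credentials>", "windows-1252")

# Simulated response bytes: smart quotes occupy 0x93 and 0x94 in windows-1252.
simulated_body = b"\x93Objection\x94"
print(decode_body(simulated_body, "windows-1252"))
```

Decoding with the same charset that was requested matters most for windows-1252 and iso-8859-1, which assign different meanings to bytes in the hex 80-9F range.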

Request Parameters
  • id - id of the attachment to retrieve
Request Headers
  • Authorization - contains client/user API keys
  • Accept - optional, specifies a desired format for the attachment as a media type.
  • Accept-Charset - optional, specifies a desired character set encoding for the attachment, if a plain text representation is requested.
Response
  • On success, 200 OK
  • If the attachment with the given id is not found, 404 Not Found
  • If none of the requested representations are available for the attachment, 406 Not Acceptable
Response Headers
  • Content-Type - the content type of the attachment
Response Body: On success, the raw data for the attachment’s contents in the requested format
Error Codes: None
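
The documented status codes could be handled on the client side along the lines of this sketch. The exception classes and the `check_response` helper are hypothetical names introduced for illustration; the function takes the status, Content-Type, and body as plain values so that it works with any HTTP client.

```python
class AttachmentNotFound(Exception):
    """Raised for 404 Not Found: no attachment exists with the given id."""


class RepresentationNotAcceptable(Exception):
    """Raised for 406 Not Acceptable: none of the requested formats is available."""


def check_response(status: int, content_type: str, body: bytes):
    """Return (content_type, body) on 200 OK, otherwise raise a descriptive error."""
    if status == 200:
        return content_type, body
    if status == 404:
        raise AttachmentNotFound("attachment with the given id was not found")
    if status == 406:
        raise RepresentationNotAcceptable(
            "none of the requested representations is available for this attachment"
        )
    # Any other status is outside what this endpoint's reference describes.
    raise RuntimeError(f"unexpected status code: {status}")


# Simulated successful response carrying a PDF.
content_type, data = check_response(200, "application/pdf", b"%PDF-1.7")
print(content_type, len(data))
```

On a 406, a caller could retry with a different extension or a broader Accept header.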