Friday, May 17, 2013

The role of media types in RESTful web services

One of the never-ending discussions in the REST community is that of custom and domain specific media types: should we, or should we not, create new media types - and if we should, for what reasons?

In this blog post I will discuss the role of media types in web services and illustrate it with an example media type. I will go through the requirements for this media type and from these I will build up the features it needs to support. Together with this I will show some example scenarios and sketch out the processing algorithm for the client side. Finally, I compare this media type to other similar media types (HAL, Siren, JSON-API).

My goals for this blog post are:
  1. To improve my own understanding of the role of media types in RESTful web services - and share that with others.
  2. To define a new media type for what I call systems integration - and show how it facilitates loose coupling between the integration components.

By systems integration I mean the kind of background processing that takes place behind the scenes in almost any IT enabled business today; shuffling data from one system to another in a safe and durable way without any human interaction.

REST seems like a good fit for systems integration. It has a strong focus on loosely coupled systems where servers and clients can evolve independently of each other; if we can leverage that then the whole ecosystem of multiple servers and clients should be a lot easier to maintain, with much less downtime required for upgrading the various components.

There is an ongoing trend to include hyper media controls in newer web services; that is a good trend as it removes the client's dependency on specific URL structures. This in turn allows the server to evolve by adding new resources and linking to these - and it also makes it possible to use multiple servers without the clients ever noticing (since the client does not care about either URL path structures or host names).

But there is still a thing missing in the puzzle. In Roy Fielding's (in)famous rant "REST APIs must be hypertext-driven" he states:

... Any effort spent describing what methods to use on what URIs of interest should be entirely defined within the scope of the processing rules for a media type

... From that point on, all application state transitions must be driven by client selection of server-provided choices that are present in the received representations

The last statement is especially interesting: "all application state transitions must be driven by client selection of server-provided choices". This means the client should not make any requests without first being instructed to do so (and how to do it). The client should not POST a new Tweet, bug report or similar without being instructed, on the fly, by some mechanism embedded in the server responses. Today's use of links in responses is on the right track, but links do not inform the client about what HTTP method to use (GET is assumed), nor do they say anything about the possible payload.

With this blog post I will try to explain how a media type, with a sufficient number of hyper media controls, together with some intelligent client side code, can enable what Fielding is describing. The downside of this approach is that client implementations become more complex - the upside is that the whole client/server application becomes much more loosely coupled which, in the end, hopefully will help us reach a maintenance Nirvana of loosely coupled systems integration :-)

By the way, I am not comparing REST with SOAP/WSDL and EDA (event driven architectures) - that is not the purpose here even though these are often found in systems integration projects. I would rather just explore what benefits we can get from REST.

Media type requirements and constraints

The primary driver for this new media type is loose coupling where the clients only depend on the media type and some out-of-band business specific data structures and identifiers. This means:

  • The client must not make any assumptions about URL structures.
  • The client must not make any assumptions about what concrete service implementation it is interacting with.
  • The client must not initiate any HTTP request without following instructions embedded in server responses (besides the initial request).
  • The client should not be given more than:
    • A root URL from which all other resources must be discovered at runtime.
    • A set of business specific data structures.
    • A set of well known identifiers for locating hyper media controls and business data.
The media type itself must be generic with respect to the business domain; it must not contain references to concepts like medical records, e-commerce and so on.

The media type must be rich enough in terms of hyper media affordances to enable all the operations needed for systems integration.

The media type does not need to include much, if anything, in terms of UI elements since it is intended for operations without human interaction. Neither is the media type intended for mobile use where bandwidth and message size are a concern.

The media type will be based on JSON. It could just as well be based on XML but, in my experience, JSON is a lot simpler to work with, fits the data needs I have met, and has a simple and easy-to-work-with patch format (JSON Patch, application/json-patch+json) which will come in handy later on.

Armed with these constraints and requirements we are ready to build up our new media type.

Example business domain "BugMe"

Throughout this blog post I will use the imaginary open standard "BugMe" for interacting with bug tracking systems through the new media type. BugMe supports adding new bug reports, attaching documents to reports, adding comments to reports and similar features shown later on.

BugMe is not a part of the media type specification - it is only used to illustrate how the media type facilitates interaction with BugMe servers anywhere on the web.

Neither is BugMe a vendor specific "standard", it is strictly defined in terms of the generic media type and a set of bug reporting specific data structures and identifiers (more on that later on).

Compare this to APIs like Twitter and others; these are always defined in terms of vendor specific resources and explicit URL structures and were never designed to be implemented on servers anywhere else on the web.

To highlight the difference between a standard like BugMe and an actual implementation I will assume that some clever guy named Joe, who studies computer science 101 at Example.edu, has set up a BugMe server for some local study project. He is using an implementation that uses a vocabulary slightly different from BugMe - it talks about "issues" where BugMe talks about "bug reports". This fact is illustrated through the concrete URLs used in the examples. The root URL is http://example.edu/~joe/track/index.

Example 1 - Creating a bug report

The first thing we will try is to create a new bug report with BugMe. To do so we must supply our client with a few details about the operation:
  • The root URL: http://example.edu/~joe/track/index.
  • A "create bug report" identifier (as defined by BugMe): "http://bugme.org/names/create-bug-report".
  • Bug reporting data (as defined by BugMe)
    • Title: "Something bad happened",
    • Description: "I pressed ctrl-alt-del and all went black",
    • Severity: 5
We must also have an identifier for the media type. Let's call it "application/razor+json" for no specific reason.

Now we are ready to set our client loose and make it create the bug report. It will do so in the same manner as a human working with a web based UI: get a resource representation, look for well known identifiers that label data and hyper media controls, fill out data and activate hyper media controls.

This interaction pattern, getting a resource representation and following instructions on the fly, has a price: it requires more complex client side logic than "normal RPC" patterns with design time binding of methods and it results in higher bandwidth usage due to the embedded hyper media controls. The upside is a much looser coupling between clients and servers. But all of this is of course already discussed in Fielding's thesis on REST ;-)

GET initial resource

At the very beginning our client has nothing to do but GET the root URL in hope of finding something useful there:

Request
GET /~joe/track/index
Accept: application/razor+json

Response
Content-Type: application/razor+json

{
  curies:
  [
    { prefix: "bug", reference: "http://bugme.org/names/" }
  ],
  controls:
  [
    ...,
    {
      type: "link",
      name: "bug:create-bug-report",
      href: "http://example.edu/~joe/track/add-issue",
      title: "Add issue to issue tracker"
    },
    ...
  ]
}

The returned JSON data contains two top level properties defined by the media type: curies and controls. "curies" defines short names for URLs used as identifiers in the other elements (see http://www.w3.org/TR/curie/) and "controls" contains various hyper media controls. The use of curies should be optional - but it makes the responses easier to read in posts like this.

Now the client scans the "controls" element looking for the identifier "bug:create-bug-report". In this case it finds a "link" control which is equivalent to an ATOM link. Since our client understands all the features of the media type it will know that a link should be "followed" by issuing an HTTP GET on the "href" value.

This little "algorithm" is equivalent to what a human would do: open up a webpage, look for instructions on how to perform the task at hand and then follow them.

You may have noticed the dots "..." in the example. Those are there for a reason: they illustrate how the client only cares about stuff that is relevant to its current task. Anything else in the response is ignored. The consequence is that the server is free to evolve the content of the resource over time without breaking any clients - as long as it only adds new stuff. Neither does the client care if the content is supposed to be a "link page", a service index, a medical record or have any other specific "type" - as long as it contains elements that will help the client getting closer to its goal.
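The lookup described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the helper names `expand_curie` and `find_control` are made up for this sketch, and I assume that a control name is matched after expanding its curie prefix to the full identifier URL. Anything the client does not recognize is skipped.

```python
from typing import Optional


def expand_curie(name: str, curies: list[dict]) -> str:
    """Expand 'prefix:suffix' to a full URL using the document's curies."""
    prefix, sep, suffix = name.partition(":")
    if sep:
        for curie in curies:
            if curie["prefix"] == prefix:
                return curie["reference"] + suffix
    return name  # no known prefix: treat the name as already expanded


def find_control(document: dict, wanted: str) -> Optional[dict]:
    """Return the first control whose expanded name equals `wanted`."""
    curies = document.get("curies", [])
    for control in document.get("controls", []):
        if expand_curie(control.get("name", ""), curies) == wanted:
            return control
    return None  # everything else in the document is simply ignored


# The root resource from the example, plus one unrelated (hypothetical) control.
root = {
    "curies": [{"prefix": "bug", "reference": "http://bugme.org/names/"}],
    "controls": [
        {"type": "link", "name": "other:thing", "href": "http://example.edu/x"},
        {
            "type": "link",
            "name": "bug:create-bug-report",
            "href": "http://example.edu/~joe/track/add-issue",
            "title": "Add issue to issue tracker",
        },
    ],
}

control = find_control(root, "http://bugme.org/names/create-bug-report")
print(control["type"], control["href"])
```

Because the client only asks for the one name it needs, the server can add any number of other controls without the lookup changing.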

Follow link

Here we have the next operation:

Request
GET /~joe/track/add-issue
Accept: application/razor+json

Response
200 OK
Content-Type: application/razor+json

{
  curies: ...,
  controls:
  [
    {
      type: "poe-factory",
      name: "bug:create-bug-report",
      href: "http://example.edu/~joe/track/add-issue",
      title: "Create new idempotent POE resource"
    }
  ]
}

Bingo! This time the client finds a "poe-factory" control with the right name "bug:create-bug-report" and now it's time to create the bug report. The control type "poe-factory" means "Post Once Exactly factory" and is a special action element that enables idempotent POST operations. If you do not know what "idempotent" means then take a look at this page: http://www.infoq.com/news/2013/04/idempotent.

The good thing about idempotent operations is that they can safely be repeated if anything goes wrong on the network. If an operation times out the client can simply retry it without the risk of creating the same entry multiple times. And since this new media type is for safe and durable "behind the scenes" work I find it rather important to include a mechanism for idempotent POST operations.

The implementation chosen here requires the client to do an empty POST first. This will create a new POE resource (thus the name "poe-factory") and redirect the client to it. The client can then POST to the new resource as many times as it needs until the operation succeeds. The server returns "201 Created" the first time it completes the operation whereas it returns "303 See Other" on following requests. In either case the server includes a "Location" header pointing to the resource that was created through the POE resource (the new bug report in this example).

Subbu Allamaraju has a nice blog post on post once exactly techniques.

I chose this approach for the following reasons:
  • It has the simplest possible client side logic - at the cost of an extra round trip to the server. A similar solution could have required the client to create a GUID (message ID) and include it in the payload somehow, but that would make the protocol slightly more prone to client side errors.
  • It requires no special headers.
  • It adds no extra information to the payload.
  • URLs are opaque and the server gets to choose how the POE/message ID is encoded.
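To make the server side of this a bit more concrete, here is a minimal in-memory sketch in Python. It is not a real implementation: the class name `PoeFactory`, the use of a random hex string as POE id and the URL layout are all assumptions made for the illustration. The only point is that a repeated POST to the same POE resource never creates a second bug report.

```python
import uuid
from typing import Optional


class PoeFactory:
    """In-memory sketch of the server side of a poe-factory."""

    def __init__(self, base_url: str):
        self.base_url = base_url
        # POE id -> URL of the resource created through it (None until used)
        self.results: dict[str, Optional[str]] = {}
        self.next_issue_id = 1

    def create_poe(self) -> tuple[int, str]:
        """Handle the empty POST: mint a new POE resource."""
        poe_id = uuid.uuid4().hex
        self.results[poe_id] = None
        return 201, f"{self.base_url}/add-issue/{poe_id}"

    def post_to_poe(self, poe_id: str, payload: dict) -> tuple[int, str]:
        """Handle a POST to a POE resource; returns (status, Location)."""
        if poe_id not in self.results:
            return 404, ""
        existing = self.results[poe_id]
        if existing is not None:
            return 303, existing  # already completed: point at the result
        location = f"{self.base_url}/issues/{self.next_issue_id}"
        self.next_issue_id += 1
        self.results[poe_id] = location  # a real server would store payload too
        return 201, location


server = PoeFactory("http://example.edu/~joe/track")
status, poe_url = server.create_poe()
poe_id = poe_url.rsplit("/", 1)[-1]
report = {"Title": "Something bad happened", "Severity": 5}
first = server.post_to_poe(poe_id, report)
retry = server.post_to_poe(poe_id, report)  # e.g. after a client side timeout
print(first, retry)
```

The retry gets "303" and the same Location as the first attempt, so the client can repeat the request blindly after a network failure.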

Create POE resource

In order to complete its task the client first issues an empty POST operation to the URL of the "href" attribute:

Request
POST /~joe/track/add-issue
Content-length: 0

Response
201 Created
Location: http://example.edu/~joe/track/add-issue/bd925-ye174h

GET POE resource

It should be rather obvious now that the client has no choice but to follow the Location header of the response:

Request
GET /~joe/track/add-issue/bd925-ye174h
Accept: application/razor+json

Response
200 OK
Content-Type: application/razor+json

{
  curies: ...,
  controls:
  [
    {
      type: "poe-action",
      name: "bug:create-bug-report",
      documentation: ... some URL ...,
      method: "POST",
      href: "http://example.edu/~joe/track/add-issue/bd925-ye174h",
      target: "application/json",
      scaffold: ... any JSON object ...,
      title: "Add issue"
    }
  ]
}

Now the client gets a response with a "poe-action" control. This tells the client that it can safely POST as many times as it needs to the "href" URL. The actual payload is given by the BugMe specification (Title, Description, Severity).

Some comments on the above response:
  1. The payload is encoded in application/json as a trivial JSON object. Other formats may be included in the media type spec later on.
  2. This format is NOT intended for automatic creation of UIs and thus it contains no UI related list of field definitions or similar.
  3. It is NOT necessary to embed any kind of schema information - that sort of thing is given by the name of the control element.
  4. The optional "scaffold" value is the JSON payload equivalent of a URL template: it supplies default values to some properties and adds additional "hidden" properties the client can ignore (as long as they are sent back).
  5. POE-actions are not restricted to POST - a PATCH with JSON Patch would work as well (but then perhaps we need to change the action type name).
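The "scaffold" idea can be sketched like this in Python. Note that the merge rule is an assumption on my part: I use a simple top level overlay where the client's values win over the scaffold's defaults and unknown scaffold properties are passed back untouched. The helper name `build_payload` and the "Token" property are made up for the illustration.

```python
import copy
from typing import Optional


def build_payload(scaffold: Optional[dict], client_data: dict) -> dict:
    """Start from the server's scaffold, then overlay the client's own values."""
    payload = copy.deepcopy(scaffold) if scaffold else {}
    payload.update(client_data)  # client values win over scaffold defaults
    return payload


# Hypothetical poe-action control; "Token" stands in for a hidden property.
control = {
    "type": "poe-action",
    "scaffold": {"Severity": 3, "Token": "hypothetical-hidden-value"},
}
payload = build_payload(
    control.get("scaffold"),
    {"Title": "Something bad happened", "Severity": 5},
)
print(payload)
```

The client never needs to understand "Token"; it only has to send it back together with its own data.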

Create bug report

Then the client issues a new request:

Request
POST /~joe/track/add-issue/bd925-ye174h
Accept: application/razor+json
Content-Type: application/json

{
  Title: "Something bad happened",
  Description: "I pressed ctrl-alt-del and all went black",
  Severity: 5
}

Response
201 Created
Location: http://example.edu/~joe/track/issues/32

GET created bug report

Now we are done unless we want to see the actual created bug report by following the Location header:

Request
GET /~joe/track/issues/32
Accept: application/razor+json

Response
Content-Type: application/razor+json

{
  curies: ...,
  controls: ...,
  payloads:
  [
    ...,
    {
      name: "bug:bug-report",
      data:
      {
        Id: 32,
        Title: "Something bad happened",
        Description: "I pressed ctrl-alt-del and all went black",
        Severity: 5,
        Created: "2012-04-23T18:25:43Z"
      }
    },
    ...
  ]
}

Now that the client can see the actual bug report it wanted to create it knows that the task is completed. Everyone is smiling and puts on their happy face :-)

Other hyper media controls

There are of course more scenarios to cover than this single "Create stuff" scenario and these scenarios will call for other kinds of hyper media controls, for instance URL templates, PATCH actions, binary file upload and more (I should cover these in some future blog posts ...).

Error handling

If the client receives a 4xx or 5xx status code it can inspect the JSON payload and look for a property named "error" together with the other "payloads" and "controls" properties. The "error" property should contain data according to my previous blog post on error handling.

Here is an example:

Request
POST /~joe/track/add-issue/bd925-ye174h
Accept: application/razor+json
Content-Type: application/json

{
  Title: "Something bad happened",
  Description: "I pressed ctrl-alt-del and all went black",
  Severity: 5
}

Response
503 Service Unavailable
Content-Type: application/razor+json

{
  error:
  {
    message: "Could not create new bug report; server is down for maintenance",
    ...
  }
}

In addition to this the client can try to use content negotiation to receive error information in the format of application/api-problem+json.
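A client side helper for this could look roughly like the following Python sketch. The function name `extract_error` is made up, and I assume the client only trusts the "error" property when the response actually declares the razor media type.

```python
import json
from typing import Optional


def extract_error(status: int, content_type: str, body: str) -> Optional[dict]:
    """Return the "error" object of a failed razor response, if there is one."""
    if status < 400:
        return None  # not an error response
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/razor+json":
        return None  # unknown format: the caller must fall back to the status code
    try:
        document = json.loads(body)
    except ValueError:
        return None
    error = document.get("error") if isinstance(document, dict) else None
    return error if isinstance(error, dict) else None


body = json.dumps({
    "error": {
        "message": "Could not create new bug report; server is down for maintenance"
    }
})
error = extract_error(503, "application/razor+json", body)
print(error["message"])
```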

Client side processing algorithm

Here is a simplified view of how the client should process the content:
  1. GET initial root resource.
  2. [LOOP:] Look for hyper media controls with appropriate names.
  3. Check the type of the found control element:
    1. If it is a "link" then follow that link and restart from [LOOP].
    2. If it is a "poe-factory" then issue an empty POST to the href value and restart from [LOOP].
    3. If it is a "poe-action" then issue a request with the specified method and data encoded according to the "target" media type. Then restart from [LOOP].
  4. Look for a payload with the appropriate name: If it exists then the task is complete - otherwise it has failed (actually I don't like this last step, but that is the only kind of "acknowledge" I can see the server responding with).
A consequence of this approach is that the service specification (BugMe in my example) should state nothing about how to find and update data since that is up to the server's actual implementation. The service specification should only consider what kind of data to look for or modify. The "how"-part is contained entirely in the returned hyper media controls.

As the media type evolves and more types of hyper media controls are added the client(s) will grow more and more complex. This is one of the trade-offs that have to be accepted in order to keep clients and servers as loosely coupled as possible.

If the media type gets popular one could even expect to see the same scenario we see with today's web browsers: there will be multiple implementations of the client libraries and some will implement more of the final specification than others.
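The processing steps can be sketched as a small Python loop. To keep it runnable without a network, the HTTP layer is replaced by a fake in-memory transport that replays the BugMe example; `run_task`, `fake_transport` and the POE id "poe-1" are made up for this sketch. Two simplifications are assumptions of mine: names are compared in their short curie form, and a 201/303 response with a Location header is followed with a GET (the steps above do not spell that out, but the example does it).

```python
def run_task(transport, root_url, control_name, payload_name, data, max_steps=10):
    """Follow controls named `control_name` until `payload_name` shows up."""
    status, headers, doc = transport("GET", root_url, None)
    for _ in range(max_steps):
        if status in (201, 303) and "Location" in headers:
            status, headers, doc = transport("GET", headers["Location"], None)
            continue
        doc = doc or {}
        control = next(
            (c for c in doc.get("controls", []) if c.get("name") == control_name),
            None,
        )
        if control is None:
            # No more instructions: the task succeeded only if the payload is here.
            for item in doc.get("payloads", []):
                if item.get("name") == payload_name:
                    return item["data"]
            return None
        if control["type"] == "link":
            status, headers, doc = transport("GET", control["href"], None)
        elif control["type"] == "poe-factory":
            status, headers, doc = transport("POST", control["href"], None)
        elif control["type"] == "poe-action":
            method = control.get("method", "POST")
            status, headers, doc = transport(method, control["href"], data)
        else:
            return None  # unknown control type
    return None


BASE = "http://example.edu/~joe/track"
NAME = "bug:create-bug-report"
stored = {}  # what the fake server has saved so far


def fake_transport(method, url, body):
    """Replay the BugMe example; returns (status, headers, parsed JSON or None)."""
    def controls(kind, href):
        return {"controls": [{"type": kind, "name": NAME, "href": href, "method": "POST"}]}

    if (method, url) == ("GET", BASE + "/index"):
        return 200, {}, controls("link", BASE + "/add-issue")
    if (method, url) == ("GET", BASE + "/add-issue"):
        return 200, {}, controls("poe-factory", BASE + "/add-issue")
    if (method, url) == ("POST", BASE + "/add-issue"):
        return 201, {"Location": BASE + "/add-issue/poe-1"}, None
    if (method, url) == ("GET", BASE + "/add-issue/poe-1"):
        return 200, {}, controls("poe-action", BASE + "/add-issue/poe-1")
    if (method, url) == ("POST", BASE + "/add-issue/poe-1"):
        stored.update(body)
        return 201, {"Location": BASE + "/issues/32"}, None
    if (method, url) == ("GET", BASE + "/issues/32"):
        return 200, {}, {"payloads": [{"name": "bug:bug-report", "data": {"Id": 32, **stored}}]}
    return 404, {}, None


report = {"Title": "Something bad happened", "Severity": 5}
result = run_task(fake_transport, BASE + "/index", NAME, "bug:bug-report", report)
print(result)
```

Notice that `run_task` contains no URL besides the root and no knowledge of "issues"; swapping in another server with a different URL layout would not change the client code.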

No profile needed

It may be tempting to allow for a "profile" parameter with the media type ID. But typically that would be used to ask for a specific "type" of a resource like for instance "application/razor+json;profile=user". As can be seen in the client side processing algorithm above there is no need for such a thing, so let's not introduce it.

Related work

Quite a few other people are trying to create new media types to reach similar goals, but none of them includes features such as POE semantics. The related media types that I am aware of include HAL, Siren and JSON-API.
And then there is Jim Webber's fantastic "How to GET a cup of coffee" which has been a big inspiration for me over the years.

Reasons for creating a new media type

How many media types should we invent? Well, as many as needed, I would say. The media type described here includes some features not found in other media types (POE semantics for instance) and that should be sufficient argument for creating a new one.

I don't see anything wrong with creating many media types - eventually a few of them will be good enough and gain enough traction to become ubiquitous standards. That's called evolution.

Summary

In this blog post I have tried to explain one way of understanding the role of media types in RESTful web services and illustrated it by building up (parts of) a media type for systems integration. I have also touched upon the issue of "typed" resources and how to avoid it (by not assuming anything about the resource type and instead looking for certain identifiers in the response) ... there could be another blog post to come on this issue.

So what do you think? Was this useful, understandable, totally overkill, outright naive or simply a pile of, well, rubbish? Feel free to add a comment, Tweet me or send me an e-mail. I would love to get some feedback.

Happy hacking, Jørn

UPDATE 2014-02-24: I have actually put much of this into a media type called Mason. See http://soabits.blogspot.dk/2014/02/implementing-hypermedia-apis-and-rest.html.

Wednesday, May 15, 2013

Error handling considerations and best practices

A recurring topic in REST and Web API discussions is that of error handling (see for instance https://groups.google.com/d/topic/api-craft/GLz_nNbK-6U/discussion or http://stackoverflow.com/questions/942951/rest-api-error-return-good-practices): what information should be included in error responses, how should HTTP status codes be used and what media type should the response be encoded in? In this blog post I will try to address these issues and give some guidelines based on my own experience and existing solutions.

Existing solutions

Let us first take a look at some existing solutions to get started:
  • The Twitter API uses a list of descriptive error messages and error codes. Twitter has both JSON and XML representations with property names: "errors", "error", "code".
  • The Facebook Graph API has a single descriptive error message, an error code and even a sub-code. Facebook uses a JSON representation with property names: "error", "message", "type", "code" and "error_subcode".
  • The GitHub API has a top level descriptive error message and an optional list of additional error elements. The items in the error list refer to resources, fields and codes. GitHub uses a JSON representation with property names: "message", "errors", "resource", "field", "code".
  • The US White House has a set of guidelines for its APIs on GitHub. The error message used here contains the HTTP status code, a developer message, a user message, an error code and links to further information.
  • Ben Longden has proposed a media type for error reporting. This specification includes a "logref" identifier that somehow refers to a log entry on the server side - such a feature can help debugging server errors later on.
  • Mark Nottingham has introduced "Problem Details for HTTP APIs" as an IETF draft. This proposal makes use of URIs for identifying errors and is as such meant as a general and extensible format for "problem reporting".
All of these response formats share some similar content: one or more descriptive messages, status codes and links to further information. But as can be seen there is a wide variety in the actual implementation and wire format.

Considerations and guidelines

So, what should you do with your web API? Well, here are some considerations and guidelines you can base your error reporting format on ...

Target audience

Remember that your audience includes both the end user, the client developer, the client application and your frontline support (which may just happen to be you). Your error responses should include information that caters for all of these parties:
  • The end user needs a short descriptive message.
  • The client developer needs as much detailed information as possible to debug the application.
  • The client application needs error codes (HTTP status codes) for error recovery actions.
  • The frontline support people need detailed information and/or keywords to look for in their knowledge database.

Use the HTTP status codes correctly

The HTTP status codes are standardized all over the web and your clients will know immediately how to handle them. Make sure to use them correctly:
  • Do NOT just return HTTP status code 200 (OK) regardless of success or failure.
  • Use 2xx when a request succeeds.
  • Use 4xx when a request fails and the client should be able to fix it by modifying its own request.
  • Use 5xx when a request fails due to some internal server error.
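On the server side this usually boils down to mapping the kind of failure to a status code class. A minimal Python sketch, with made up exception names (`ClientFault`, `ValidationFailed`, `NotFound`) that are not taken from any framework:

```python
from typing import Optional


class ClientFault(Exception):
    """Base class for failures the client can fix by changing its request."""


class ValidationFailed(ClientFault):
    pass


class NotFound(ClientFault):
    pass


def status_for(error: Optional[Exception]) -> int:
    """Pick an HTTP status code for the outcome of a request."""
    if error is None:
        return 200  # success: 2xx
    if isinstance(error, NotFound):
        return 404  # the client asked for something that does not exist
    if isinstance(error, ClientFault):
        return 400  # any other client fault: 4xx
    return 500  # everything else is the server's own problem: 5xx


print(status_for(None), status_for(ValidationFailed("bad input")),
      status_for(NotFound("no such issue")), status_for(RuntimeError("boom")))
```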

Use descriptive error messages

Be descriptive in your error messages and include as much context as possible. Failure to do so will cost you dearly in support later on: if your client developers cannot figure out why their request went wrong, they will look for help - and eventually it will be you who spends time tracking down client errors instead of coding new and exciting features for your service.

If it is a validation error, be sure to include why it failed, where it failed and what part of it failed. A message like "Invalid input" is horrible and client developers will bug you for it over and over again, wasting your precious development time. Be descriptive and include context: "Could not place order: the field 'Quantity' should be an integer between 0 and 99 (got 127)".

You may want to include both a short version for end users and a more verbose version for the client developer.

Localization

Error messages for end users should be localized (translated into other languages) if your service is already a multi-language service. Personally I don't think developer messages should be localized: it is difficult to translate technical terms correctly and it will make it more difficult to search online for more information.

When localization is introduced it may also be necessary to include language codes and maybe even allow for a list of different translations to be returned in the error response.

Allow for more than one message

Make it possible to include more than one message in the error response. Then try to collect all possible errors on the server side and return the complete list in a single response. This is not always possible - and requires some more coding on the server side (compared to simply throwing an exception the first time some invalid input is detected).
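Here is a small Python sketch of the "collect everything first" style of validation. The function name `validate_order` and the two rules are only examples; the point is that each check appends a descriptive message instead of raising on the first problem.

```python
def validate_order(order: dict) -> list[str]:
    """Run every check and return all error messages (empty list means valid)."""
    errors = []

    quantity = order.get("Quantity")
    is_int = isinstance(quantity, int) and not isinstance(quantity, bool)
    if not is_int or not 0 <= quantity <= 99:
        errors.append(
            "The field 'Quantity' should be an integer between 0 and 99 "
            f"(got {quantity!r})."
        )

    title = order.get("Title")
    if not isinstance(title, str) or not title.strip():
        errors.append("The field 'Title' must have a value.")

    return errors


errors = validate_order({"Quantity": 127, "Title": ""})
for message in errors:
    print(message)
```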

Additional status codes

If your business domain calls for more detailed information than can be found in the normal HTTP status codes then include a business specific status code in the response. Make sure all of the codes are documented.

You may be tempted to include more technical error codes, but consider who your audience is for that: It won't help your end user. It may help your client application recover from errors - but probably not in any way that was not already covered by the HTTP status codes. Your client developer may have some need for it - but why make them look up error codes in online documentation when you can include descriptive error text and links that refer directly to the documentation? It may help your support - but if the client developers have enough information in the error response they won't need to call your support anyway - right?

Use letters for status codes

I often find myself searching for online resources that can help me when I get some error while interacting with third party APIs. Usually I search for a combination of the API name, error messages and codes. If you include additional error codes in your response then you might want to use letters instead of digits: it is simply more likely to get a relevant hit for something like "OAUTH_AUTHSERVER_UNAVAILABLE" than "1625".

Include links to online resources

Include links to online help and other resources that will either clarify what went wrong or in some other way help the client developer to solve the problem.

Support multiple media types

If you have a RESTful service that allows both client applications and developers to explore it then you might want to support a human readable media type for your error responses. HTML is perfect for this as it allows the client developers to view the error information right in their browsers without installing any additional plugins. A fallback to plain text could also be useful (but probably overkill).

Include a timestamp or log-reference

It can help support and bug hunting if the error report contains a timestamp (server timezone or UTC). This may help locating the right logfile entries later on.

Another possibility is to include some other kind of information that refers back to the logfiles such that server developers and support people can track what happened.

Field-by-field messages

In some cases it makes sense to be explicit about the fields in the input that caused the errors and include field names in separate elements of the error response. For instance something like this JSON response:

{
  message: "One or more inputs were not entered correctly",
  errors:
  [
    { field: "Weight", message: "The value of 'Weight' exceeds 100 - the value should be between 0 and 100" },
    { field: "Height", message: "A value must be entered for 'Height'" }
  ]
}


This would make it possible for the client to highlight those fields in the UI and draw the end user's attention to them. It is, however, difficult to keep clients and servers in sync, and it requires a lot of coding on both sides to get it to work. Usually field-by-field information is handled by client side validation logic anyway. So a clear error message like "The value of 'Weight' exceeded 100 - the value should be between 0 and 100" should be enough for most applications.

Include the HTTP status code

This may sound a bit odd, but according to people on api-craft there are some client side environments where the application code does not have access to the HTTP headers and status codes. To cater for these clients it may be necessary to include the HTTP status code in the error message payload.

Do not include stack traces

It may be tempting to include a stack trace for easier support when something goes wrong. Don't do it! This kind of information is too valuable for hackers and should be avoided.

Implementation

Now that we have our "requirements" ready we should be able to design a useful solution. Let's first try to define the response without considering an actual wire format:

  • message (string): the primary descriptive error message - either in the primary language of the server or translated into a language negotiated via the HTTP header "Accept-Language".
  • messages (List of string): an optional list of descriptive error messages (with the same language rules as above).
  • details (string): an optional descriptive text targeted at the client developer. This text should always be in the primary language of the expected developer community (that would be English in my case).
  • errorCode (string): an optional error code.
  • httpStatusCode (integer): an optional copy of the HTTP status code.
  • time (date-time): an optional timestamp of when the error occurred.
  • additional (any data): a placeholder for any kind of business specific data.
  • links (List of <string,string,string>): an optional list of links to other resources that can be helpful for debugging (but should probably not be shown to the end user). Each link consists of <href, rel, title> just like an ATOM link element.
I have ignored the possibility of having multiple translations of the messages. Neither does this implementation include any field-by-field validation since I expect that to be performed by the client. That doesn't mean the server shouldn't do the validation - it just doesn't have to return the detailed field information in a format readable by the client application.
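The property list above translates almost directly into a data structure. A Python sketch using dataclasses follows; the class names `ErrorInfo` and `Link` are made up, and leaving out unset optional properties when serializing is my own choice, not something the list prescribes.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class Link:
    href: str
    rel: str
    title: str


@dataclass
class ErrorInfo:
    message: str
    messages: list[str] = field(default_factory=list)
    details: Optional[str] = None
    errorCode: Optional[str] = None
    httpStatusCode: Optional[int] = None
    time: Optional[str] = None
    additional: Any = None
    links: list[Link] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize, leaving out optional properties that were never set."""
        data = {
            key: value
            for key, value in asdict(self).items()
            if value is not None and value != []
        }
        return json.dumps(data)


error = ErrorInfo(
    message="Could not authorize user due to an internal problem - please try again later.",
    details="The OAuth2 service is down for maintenance.",
    errorCode="O2SERUNAV",
    httpStatusCode=503,
    links=[Link("http://example.com/oauth2status.html", "help", "Service status information")],
)
print(error.to_json())
```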

JSON format example

Now it is time to select a wire format for the error information. I will choose JSON since that is a widespread and well known format that can be handled by just about any piece of infrastructure nowadays. The format is straightforward and is probably best illustrated with a few examples:

Example 1 - the simplest possible instantiation

{
  message: "The field 'StartDate' did not contain a valid date (the value provided was '2013-20-23'). Dates should be formatted as YYYY-MM-DD."
}


Example 2 - handling multiple validation errors

{
  message: "There was something wrong with the input (see below)",
  messages:
  [
    "The field 'StartDate' did not contain a valid date (the value provided was '2013-20-23'). Dates should be formatted as YYYY-MM-DD.",
    "The field 'Title' must have a value."
  ]
  ]
}


Example 3 - using most of the features

{
  message: "Could not authorize user due to an internal problem - please try again later.",
  details: "The OAuth2 service is down for maintenance.",
  errorCode: "O2SERUNAV",
  httpStatusCode: 503,
  time: "2013-04-30T10:27:12",
  links:
  [
    {
      href: "http://example.com/oauth2status.html",
      rel: "help",
      title: "Service status information"
    }
  ]
}

Client implementation and media types - a matter of perspective

The client implementation should, at a suitably high level, be straightforward:
  1. Client makes an HTTP request.
  2. Request fails for some reason, server returns HTTP status code 4xx or 5xx and includes error information in the HTTP body.
  3. Client checks HTTP status code, sees that it is 4xx or 5xx and decodes the error information.
  4. Client tries to recover from error - either showing the error message to the end user, writing the error to a log, giving up or maybe retrying the request - all depending on the error and the client's own capabilities.
But, hey, wait a minute ... how does the client know how to decode the payload? I mean, perhaps the client asked for a resource representation containing medical records, but then it got an HTTP status code 400 - how is it supposed to know the format of the error information?

If the client is working with a vendor specific service, like Twitter and GitHub, then chances are that the client is hard wired to extract the error information based on the vendor specific service documentation. My guess is that this is how most clients are implemented.

But what if the client is working with a more, shall we say, RESTful service? That is, the client doesn't know what actual implementation it is interacting with. This could for instance be the case of clients consuming an ATOM feed (application/atom+xml). How would the client know how to decode the error response payload? Actually this seems like an unanswered question for ATOM since the spec is rather vague about this point (see for instance http://stackoverflow.com/questions/9874319/how-to-represent-error-messages-in-atom-feeds).

A RESTful service specification may call for a media type dedicated to error reporting; let's call such a media type "application/error+json". When the client receives a 4xx or 5xx HTTP status it can then look at the content-type header: if it matches "application/error+json" then the client would know exactly what to look for in the HTTP body.

It could also be that the base media type includes a detailed specification of error payloads.
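In client code that check could look something like this Python sketch. The function name `describe_failure` is made up, "application/error+json" is the imaginary media type from the paragraph above, and I assume its payload carries the "message" property defined earlier in this post. For any other content type the client can only fall back to the HTTP status code.

```python
import json


def describe_failure(status: int, content_type: str, body: str) -> str:
    """Produce a message for a failed request based on the declared media type."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/error+json":
        try:
            info = json.loads(body)
        except ValueError:
            info = None
        if isinstance(info, dict) and isinstance(info.get("message"), str):
            return info["message"]
    # Unknown or broken error format: the status code is all we can rely on.
    return f"Request failed with HTTP status {status}"


known = describe_failure(
    400,
    "application/error+json; charset=utf-8",
    '{"message": "The field \'Title\' must have a value."}',
)
unknown = describe_failure(400, "text/html", "<h1>Bad Request</h1>")
print(known)
print(unknown)
```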

I would prefer one of the two last options: either specify error handling in the base media type of the service - or use an existing standard media type. The last option is actually what Mark Nottingham has done with https://tools.ietf.org/html/draft-nottingham-http-problem-03.

So it is a matter of perspective: vendor specific "one-of-a-kind" services tend to invent their own error formats whereas RESTful services (like ATOM) should standardize error reporting via media types for everyone to reuse all over the web.

Have fun, Jørn