OA Web Service

The PMC OA service allows users to discover downloadable resources from the PMC Open Access Subset. These articles are available for download from our FTP site in tgz (tarred, gzipped) format and, for those articles that have them, in PDF format as well.

This service provides an API for discovering resources related to articles. For example, it can be used to find the PDFs of all articles that have been updated since a specified date. This can simplify building tools that reuse the OA subset content, such as mirror sites and text-mining processes.

If you have questions or comments about this service, please write to the PMC help desk. To stay informed about new or updated tools or services provided by PMC, subscribe to the PMC-Utils-Announce mailing list.

The base URL for the service is https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi. Requests to the service use HTTP GET or POST, with a set of parameters that specify the desired data. There are two types of responses: an identification response, which returns information about the service and database as a whole, and a results set response, which returns a list of records. These are described in more detail below.

Identification response

Accessing the base URL of the service, without any other parameters, retrieves a response that provides information about the database. For example,

Get database information:
https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi

The response gives a list of the data formats supported (currently pdf and tgz), a count of the number of records in the OA subset (total and by format), and the dates/times of the earliest and latest updates. For example,

<OA>
  <responseDate>2019-01-28 12:13:23</responseDate>
  <request>https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi</request>
  <repositoryName>PubMed Central Open Access FTP Repository</repositoryName>
  <formats>
    <format>tgz</format>
    <format>pdf</format>
  </formats>
  <records>
    <count>2305124</count>
    <count format="tgz">2305124</count>
    <count format="pdf">554142</count>
    <earliest>1970-01-01 00:00:00</earliest>
    <latest>2019-01-28 11:16:33</latest>
  </records>
</OA>

All dates and times are given in local time in Bethesda, Maryland: either EST (-05:00) or EDT (-04:00), depending on the time of year. There is a space separating the date from the time.
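
As an illustration of reading the identification response, the following is a minimal Python sketch that uses only the standard library. The function name parse_identification and the "total" key are choices made for this example and are not part of the service. The timestamps are kept as naive datetime values, on the assumption that the caller treats them as Bethesda local time.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# Dates and times in responses use a space between the date and the time.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_identification(xml_text):
    """Summarize an identification response as a plain dict (illustrative helper)."""
    root = ET.fromstring(xml_text)
    records = root.find("records")
    counts = {}
    for count in records.findall("count"):
        # The <count> element without a format attribute holds the overall total.
        counts[count.get("format", "total")] = int(count.text)
    return {
        "repository": root.findtext("repositoryName"),
        "formats": [fmt.text for fmt in root.findall("formats/format")],
        "counts": counts,
        # Naive datetimes: the service reports Bethesda local time (EST or EDT).
        "earliest": datetime.strptime(records.findtext("earliest"), TIMESTAMP_FORMAT),
        "latest": datetime.strptime(records.findtext("latest"), TIMESTAMP_FORMAT),
    }
```

Passing the example response shown above to parse_identification would give a dict whose "formats" entry lists tgz and pdf and whose "counts" entry holds the total and the per-format counts.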

Results set response

Adding parameters to the request causes the service to return information about a set of records in the database, as the following examples illustrate.

Get a record by id:
https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?id=PMC5334499

In addition to the response date and time and an echo of the request, the response lists any downloadable resources for that article. For example:

<OA>
  <responseDate>2019-01-28 10:41:16</responseDate>
  <request id="PMC5334499">https://www.ncbi.nlm.nih.gov/utils/oa/oa.fcgi?id=PMC5334499</request>
  <records returned-count="2" total-count="2">
    <record id="PMC5334499" citation="World J Radiol. 2017 Feb 28; 9(2):27-33">
      <link format="tgz" updated="2017-03-17 13:10:45"
        href="ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/8e/71/PMC5334499.tar.gz"/>
      <link format="pdf" updated="2004-10-01 13:09:51"
        href="ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/8e/71/WJR-9-27.PMC5334499.pdf"/>
    </record>
  </records>
</OA>
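
A results set response like the one above could be turned into plain Python data with a sketch such as the following. The helper name parse_records and the shape of the returned dicts are assumptions made for this example. The element and attribute names (record, link, id, citation, format, updated, href) are taken from the sample response.

```python
import xml.etree.ElementTree as ET


def parse_records(xml_text):
    """Return one dict per <record>, with its links keyed by format (illustrative helper)."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iter("record"):
        # Each <link> describes one downloadable resource for the article.
        links = {
            link.get("format"): {
                "href": link.get("href"),
                "updated": link.get("updated"),
            }
            for link in record.findall("link")
        }
        results.append(
            {
                "id": record.get("id"),
                "citation": record.get("citation"),
                "links": links,
            }
        )
    return results
```

With the sample response above, the single returned entry would carry the id PMC5334499 and two links, one for tgz and one for pdf.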

Get all the records updated on or after a specified date:
https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?from=2019-01-01

The value of the from parameter is either a date, in YYYY-MM-DD format, or a date/time combination, in YYYY-MM-DD HH:MM:SS format. As with dates/times in responses, these are in local time in Bethesda, Maryland. Note that in a URL, the space separating the date and the time can be represented either as a "+" or as "%20". For example,

Get all the records updated since a date/time:
https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?from=2019-01-01+08:00:00
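
The two ways of encoding the space can be seen with Python's standard urllib.parse functions. This is only a sketch: the helper name from_url and its plus_for_space flag are made up for this example, and keeping the colons unescaped (safe=":") is a readability choice rather than a requirement stated by the service.

```python
from urllib.parse import quote, quote_plus

BASE_URL = "https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi"


def from_url(timestamp, plus_for_space=True):
    """Build a request URL with a `from` value (illustrative helper)."""
    # quote_plus turns the space into "+", quote turns it into "%20".
    encode = quote_plus if plus_for_space else quote
    # safe=":" leaves the colons of HH:MM:SS unescaped.
    return f"{BASE_URL}?from={encode(timestamp, safe=':')}"


print(from_url("2019-01-01 08:00:00"))
print(from_url("2019-01-01 08:00:00", plus_for_space=False))
```

The first call produces the URL shown above; the second produces the same URL with "%20" in place of "+".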

If there are more than 1000 records in a result set, then only the first 1000 will be returned, and the response will end with a <resumption> element, describing how to get the next 1000 records. For example,

<resumption>
  <link token="1102623!20130101000000!!!a1e8c64fd7952a09"
    href="https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?resumptionToken=1102623!20130101000000!!!a1e8c64fd7952a09"/>
</resumption>

Get the next 1000 records in a result set:
https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?resumptionToken=843921!20120101000000!!!6e8a2c112f595273
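
A client that wants the complete result set can follow the href of each resumption link until no <resumption> element is returned. The sketch below shows one way to do that in Python. The names fetch_text and iter_records, the injectable fetch argument, and the max_pages limit are all assumptions made for this example. It also assumes that the <resumption> element may appear anywhere inside the response, so it searches for it at any depth.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_text(url):
    """Download a URL and return its body as text (default fetcher for this sketch)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def iter_records(url, fetch=fetch_text, max_pages=None):
    """Yield every <record> element, following resumption links page by page."""
    pages = 0
    while url is not None and (max_pages is None or pages < max_pages):
        root = ET.fromstring(fetch(url))
        yield from root.iter("record")
        # The href of the resumption link is the URL of the next page, if any.
        link = root.find(".//resumption/link")
        url = link.get("href") if link is not None else None
        pages += 1
```

Passing a different fetch callable makes it possible to test the loop without network access, or to add retries and delays between requests.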

You can also filter results by format. For example,

Get the records that have PDFs, updated since a date:
https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?from=2019-01-01&format=pdf

Additionally, you can specify a range of dates/times, using both from and until. For example,

Get the records that have PDFs, updated between two dates/times:
https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?from=2019-01-02&until=2019-01-02+07:00:00&format=pdf
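
Request URLs that combine from, until, and format can be assembled with urllib.parse.urlencode, as in this minimal sketch. The function build_url and its argument names are hypothetical. The parameter names from, until, and format are the ones described above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi"


def build_url(from_=None, until=None, fmt=None):
    """Build a request URL from optional filters (illustrative helper)."""
    params = {}
    if from_:
        params["from"] = from_
    if until:
        params["until"] = until
    if fmt:
        params["format"] = fmt
    if not params:
        # No parameters: this is the identification request.
        return BASE_URL
    # urlencode writes spaces as "+"; safe=":" leaves the time colons unescaped.
    return BASE_URL + "?" + urlencode(params, safe=":")


print(build_url(from_="2019-01-02", until="2019-01-02 07:00:00", fmt="pdf"))
```

The printed URL matches the date-range example above.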

Error responses

If there is an error in the request parameters, the service returns a response that contains an <error> element with a description of the problem.
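
A client might check for this element before processing a response. The following Python sketch assumes only what is stated above: that an <error> element appears somewhere in the response and that its text describes the problem. The exception class OAServiceError and the function raise_for_error are names invented for this example.

```python
import xml.etree.ElementTree as ET


class OAServiceError(Exception):
    """Raised when a response contains an <error> element (illustrative class)."""


def raise_for_error(xml_text):
    """Parse a response and raise OAServiceError if it reports an error."""
    root = ET.fromstring(xml_text)
    # Search the whole tree, since the exact position of <error> is not assumed.
    error = next(root.iter("error"), None)
    if error is not None:
        message = (error.text or "").strip()
        raise OAServiceError(message or "unspecified error")
    return root
```

When no error is present, the parsed root element is returned so that the caller can continue reading records from it.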

Last updated: Mon, 28 Jan 2019