OA Web Service
The PMC OA service allows users to discover downloadable resources from the PMC Open Access Subset. These articles are available for download from our FTP site in tgz (tarred, gzipped) format, or, for those articles that have them, in PDF format as well.
This service provides an API to allow discovery of resources related to articles. For example, it can be used to find the PDFs of all articles that have been updated since a specified date. This could facilitate implementing tools that reuse the OA subset content, such as mirror sites, text mining processes, etc.
If you have questions or comments about this service, please write to the PMC help desk. To stay informed about new or updated tools or services provided by PMC, subscribe to the PMC-Utils-Announce mailing list.
The base URL for the service is https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi. Requests to the service use HTTP GET or POST, with a set of parameters that specify the desired data. There are two types of responses: an identification response, which returns information about the service and database as a whole, and a results set response, which returns a list of records. These are described in more detail below.
Identification response
Accessing the base URL of the service, without any other parameters, retrieves a response that provides information about the database. For example,
- Get database information:
- https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi
The response gives a list of the data formats supported (currently pdf and tgz), a count of the number of records in the OA subset (total and by format), and the dates/times of the earliest and latest updates. For example,
<OA>
  <responseDate>2019-01-28 12:13:23</responseDate>
  <request>https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi</request>
  <repositoryName>PubMed Central Open Access FTP Repository</repositoryName>
  <formats>
    <format>tgz</format>
    <format>pdf</format>
  </formats>
  <records>
    <count>2305124</count>
    <count format="tgz">2305124</count>
    <count format="pdf">554142</count>
    <earliest>1970-01-01 00:00:00</earliest>
    <latest>2019-01-28 11:16:33</latest>
  </records>
</OA>
All dates and times are given in local time in Bethesda, Maryland: either EST (-05:00) or EDT (-04:00), depending on the time of year. There is a space separating the date from the time.
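As a minimal sketch of how a client might read the identification response, the following Python parses the sample XML shown above using only the standard library. The function name parse_identification and the "total" key for the count without a format attribute are illustrative choices, not part of the service. Timestamps are returned as naive datetime values, on the assumption that the caller treats them as Bethesda local time.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Sample identification response, copied from the example above.
SAMPLE = """<OA>
  <responseDate>2019-01-28 12:13:23</responseDate>
  <request>https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi</request>
  <repositoryName>PubMed Central Open Access FTP Repository</repositoryName>
  <formats><format>tgz</format><format>pdf</format></formats>
  <records>
    <count>2305124</count>
    <count format="tgz">2305124</count>
    <count format="pdf">554142</count>
    <earliest>1970-01-01 00:00:00</earliest>
    <latest>2019-01-28 11:16:33</latest>
  </records>
</OA>"""

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # a space separates date and time


def parse_identification(xml_text):
    """Return a dict summarizing an identification response (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    records = root.find("records")
    counts = {}
    for count in records.findall("count"):
        # The count element without a format attribute is the overall total.
        counts[count.get("format", "total")] = int(count.text)
    return {
        "repository": root.findtext("repositoryName"),
        "formats": [fmt.text for fmt in root.findall("formats/format")],
        "counts": counts,
        # Naive datetimes: no time zone is attached here.
        "earliest": datetime.strptime(records.findtext("earliest"), TIMESTAMP_FORMAT),
        "latest": datetime.strptime(records.findtext("latest"), TIMESTAMP_FORMAT),
    }


info = parse_identification(SAMPLE)
print(info["formats"], info["counts"], info["latest"])
```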
Results set response
Adding parameters to the request causes the service to return information about a set of records in the database, as the following examples illustrate.
- Get a record by id:
- https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?id=PMC5334499
In addition to echoing the response date and time, this will provide information about any downloadable resources for that article, for example:
<OA>
  <responseDate>2019-01-28 10:41:16</responseDate>
  <request id="PMC5334499">https://www.ncbi.nlm.nih.gov/utils/oa/oa.fcgi?id=PMC5334499</request>
  <records returned-count="2" total-count="2">
    <record id="PMC5334499" citation="World J Radiol. 2017 Feb 28; 9(2):27-33">
      <link format="tgz" updated="2017-03-17 13:10:45" href="ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/8e/71/PMC5334499.tar.gz"/>
      <link format="pdf" updated="2004-10-01 13:09:51" href="ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/8e/71/WJR-9-27.PMC5334499.pdf"/>
    </record>
  </records>
</OA>
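The record structure above could be read along the following lines. This is a sketch that works on the sample response only. The helper name parse_records and the shape of the returned dictionaries are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Sample results set response, copied from the example above.
SAMPLE = """<OA>
  <responseDate>2019-01-28 10:41:16</responseDate>
  <request id="PMC5334499">https://www.ncbi.nlm.nih.gov/utils/oa/oa.fcgi?id=PMC5334499</request>
  <records returned-count="2" total-count="2">
    <record id="PMC5334499" citation="World J Radiol. 2017 Feb 28; 9(2):27-33">
      <link format="tgz" updated="2017-03-17 13:10:45" href="ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/8e/71/PMC5334499.tar.gz"/>
      <link format="pdf" updated="2004-10-01 13:09:51" href="ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/8e/71/WJR-9-27.PMC5334499.pdf"/>
    </record>
  </records>
</OA>"""


def parse_records(xml_text):
    """Return one dict per <record>, with its links keyed by format."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iter("record"):
        links = {
            link.get("format"): {
                "href": link.get("href"),
                "updated": link.get("updated"),
            }
            for link in record.findall("link")
        }
        results.append({
            "id": record.get("id"),
            "citation": record.get("citation"),
            "links": links,
        })
    return results


records = parse_records(SAMPLE)
for record in records:
    for fmt, link in record["links"].items():
        print(record["id"], fmt, link["href"])
```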
- Get all the records updated on or after a specified date:
- https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?from=2019-01-01
The value of the from parameter is either a date, in YYYY-MM-DD format, or a date/time combination, in YYYY-MM-DD HH:MM:SS format. As with dates/times in responses, these are in local time in Bethesda, Maryland. Note that in a URL, the space separating the date and the time can be represented either as a "+" or as "%20". For example,
- Get all the records updated since a date/time:
- https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?from=2019-01-01+08:00:00
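One way to produce these values from Python date objects is sketched below. The helper names format_timestamp and updated_since_url are hypothetical, and the sketch assumes the values passed in are already expressed in Bethesda local time. Passing safe=":" to urlencode keeps the colons readable, while the space is encoded as "+".

```python
from datetime import date, datetime
from urllib.parse import urlencode

BASE_URL = "https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi"


def format_timestamp(value):
    """Format a date or datetime in one of the two forms the service accepts."""
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        return value.strftime("%Y-%m-%d %H:%M:%S")
    return value.strftime("%Y-%m-%d")


def updated_since_url(value):
    """Build a request URL for records updated on or after the given value."""
    query = urlencode({"from": format_timestamp(value)}, safe=":")
    return BASE_URL + "?" + query


print(updated_since_url(date(2019, 1, 1)))
print(updated_since_url(datetime(2019, 1, 1, 8, 0, 0)))
```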
If there are more than 1000 records in a result set, then only the first 1000 will be returned, and the response will end with a <resumption> element, describing how to get the next 1000 records. For example,
<resumption>
  <link token="1102623!20130101000000!!!a1e8c64fd7952a09" href="https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?resumptionToken=1102623!20130101000000!!!a1e8c64fd7952a09"/>
</resumption>
- Get the next 1000 records in a result set:
- https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?resumptionToken=843921!20120101000000!!!6e8a2c112f595273
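A client that wants the whole result set can follow the href of each <resumption> link until no such element is returned. The sketch below shows one possible loop. The names fetch and iter_records are hypothetical, and the search for the resumption element anywhere in the response is a defensive assumption, since its exact position in the document is not specified here. The page fetcher is passed in as a parameter so the loop can be exercised without network access.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch(url):
    """Download one page of results over HTTP GET and return the raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def iter_records(start_url, fetch_page=fetch):
    """Yield every <record> element, following resumption links page by page."""
    url = start_url
    while url:
        root = ET.fromstring(fetch_page(url))
        for record in root.iter("record"):
            yield record
        # Look for <resumption><link href="..."/></resumption> anywhere in the page.
        link = root.find(".//resumption/link")
        url = link.get("href") if link is not None else None


# Offline demonstration with two made-up pages standing in for real responses.
FAKE_PAGES = {
    "page1": "<OA><records><record id='PMC1'/>"
             "<resumption><link token='t' href='page2'/></resumption>"
             "</records></OA>",
    "page2": "<OA><records><record id='PMC2'/></records></OA>",
}
ids = [record.get("id") for record in iter_records("page1", FAKE_PAGES.__getitem__)]
print(ids)
```

Against the live service, the call would be iter_records with a real request URL and the default fetch function.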
You can also filter results by format. For example,
- Get the records that have PDFs, updated since a date:
- https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?from=2019-01-01&format=pdf
Additionally, you can specify a range of dates/times, using both from and until. For example,
- Get the records that have PDFs, updated between two dates/times:
- https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?from=2019-01-02&until=2019-01-02+07:00:00&format=pdf
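The filters described above can be combined in a single request. As a rough sketch, the hypothetical helper build_query_url below assembles the query string from whichever of from, until, and format are supplied. It rejects formats other than pdf and tgz, the two the service currently lists, which is a choice made for this example rather than documented client behavior.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi"
KNOWN_FORMATS = ("pdf", "tgz")  # formats currently listed by the service


def build_query_url(from_=None, until=None, fmt=None):
    """Build a request URL, leaving out any filter that was not supplied."""
    if fmt is not None and fmt not in KNOWN_FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    params = {}
    if from_ is not None:
        params["from"] = from_
    if until is not None:
        params["until"] = until
    if fmt is not None:
        params["format"] = fmt
    if not params:
        return BASE_URL  # no parameters: the identification request
    # safe=":" keeps time separators readable; spaces are encoded as "+".
    return BASE_URL + "?" + urlencode(params, safe=":")


url = build_query_url(from_="2019-01-02", until="2019-01-02 07:00:00", fmt="pdf")
print(url)
```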
Error responses
If there is any error in the request parameters, the service returns a response containing an <error> element that describes the problem.
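Client code may want to check for this element before reading records. The sketch below does so in Python. The exception class OAServiceError, the helper raise_for_error, and the sample error document are all hypothetical. Only the presence of an <error> tag carrying a description is taken from the text above, so the check looks for that tag anywhere in the response.

```python
import xml.etree.ElementTree as ET


class OAServiceError(Exception):
    """Raised when a response contains an <error> element (hypothetical class)."""


def raise_for_error(xml_text):
    """Parse a response and raise OAServiceError if it reports an error."""
    root = ET.fromstring(xml_text)
    # iter() also visits the root itself, so the position of <error> does not matter.
    error = next(root.iter("error"), None)
    if error is not None:
        raise OAServiceError("".join(error.itertext()).strip())
    return root


# Made-up error document for demonstration; the real layout may differ.
HYPOTHETICAL_ERROR = "<OA><error>Example description of the problem</error></OA>"

try:
    raise_for_error(HYPOTHETICAL_ERROR)
except OAServiceError as exc:
    message = str(exc)
    print("Request failed:", message)
```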