SAS or R software
15 messages · Alexander C Cambon, Henric Nilsson, Jonathan Baron +6 more

I apologize for adding this so late to the "SAS or R software" thread. This is a question, not a reply, but it seems to me to fit well with the subject of this thread. I would like to know of anyone's experiences in the two areas below. I should add that I have no experience myself in these areas:

1) Migrating from SAS to R in the choice of statistical software used for FDA reporting. (For example, was there more effort involved in the areas of documentation, revision tracking, or validation of software code?)

2) Migrating from SAS to R in the choice of statistical software used for NIH reporting (or reporting to other US or non-US government agencies).

I find myself using R more and more and being continually amazed by its breadth of capabilities, though I have not tried ordering pizza yet. I use SAS, S-Plus, and, more recently, R for survival analysis and recurrent events in clinical trials.

Alex Cambon, Biostatistician, School of Public Health and Information Sciences, University of Louisville
Alexander C Cambon wrote:
I apologize for adding this so late to the "SAS or R software " thread. This is a question, not a reply, but it seems to me to fit in well with the subject of this thread. I would like to know anyone's experiences in the following two areas below. I should add I have no experience myself in these areas: 1) Migrating from SAS to R in the choice of statistical software used for FDA reporting. (For example, was there more effort involved in areas of documentation, revision tracking, or validation of software codes?)
The FDA has no such requirements. They accept Minitab and even accept Excel. The real requirement is to be a good statistician doing quality, reproducible work for its own sake.
2) Migrating from SAS to R in the choice of statistical software used for NIH reporting (or reporting to other US or non-US government agencies).
No issues. Frank
I find myself using R more and more and being continually amazed by its breadth of capabilities, though I have not tried ordering pizza yet. I use SAS, S-Plus, and, more recently, R for survival analysis and recurrent events in clinical trials. Alex Cambon Biostatistician School of Public Health and Information Sciences University of Louisville
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University
Alexander C Cambon wrote:
I apologize for adding this so late to the "SAS or R software " thread. This is a question, not a reply, but it seems to me to fit in well with the subject of this thread. I would like to know anyone's experiences in the following two areas below. I should add I have no experience myself in these areas: 1) Migrating from SAS to R in the choice of statistical software used for FDA reporting. (For example, was there more effort involved in areas of documentation, revision tracking, or validation of software codes?)
This brings up a question that I have often asked but have never had answered. If someone asks me whether R is "validated", I usually respond "by whom and for what?". There seems to be a belief that the FDA validates software as acceptable for use in the analysis of data for a submission to the FDA. However, I have never met anyone who can describe to me exactly what this entails, so I can't say whether R is "validated" because I don't know what that means.

As I understand it, the FDA does not certify or validate software as providing "correct" or acceptable answers. I have been told that what the FDA requires is that the software used to produce the results quoted in a submission be auditable. That is, the FDA must be able to check exactly how the numerical results were produced, should they wish to do so. This can be tricky for proprietary software, because typically the group making the submission does not have access to the source code, so there has to be a delicate three-way negotiation on the extent to which the software vendor will reveal its source code.

Revealing source code, however, is not a difficult issue in the open-source world. Representatives of the FDA (or anyone else, for that matter) can read the source code any time they want to; in fact, they are encouraged to do so. So if the standard is "auditable", I don't think you get much more auditable than R is.
On Fri, 2004-12-17 at 17:11 -0500, Alexander C Cambon wrote:
I apologize for adding this so late to the "SAS or R software " thread. This is a question, not a reply, but it seems to me to fit in well with the subject of this thread. I would like to know anyone's experiences in the following two areas below. I should add I have no experience myself in these areas: 1) Migrating from SAS to R in the choice of statistical software used for FDA reporting.
You will find that to be a non-issue from the FDA's perspective. This has been discussed here with some frequency; if you search the archives you will find comments from Frank Harrell and others.

The FDA does not and cannot endorse a particular software product, nor does it validate any statistical software for a specific purpose. They do need to be able to reproduce the results, which means they need to know what software product was used, which version, on what platform, and so on. The SAS XPORT Transport Format (which is openly defined and documented) has been used for the transfer of data sets and is available in many statistical products. There have been a variety of activities (CDISC, HL7, etc.) regarding the electronic submission of data to the FDA. Some additional information is here: http://www.fda.gov/cder/regulatory/ersr/default.htm and here: http://www.cdisc.org/news/index.html

Any other issues impacting the selection of a particular statistical application are more likely to be political within your working environment, and FUD. As you are likely aware, other statistically relevant issues are contained in various ICH guidance documents regarding GCP considerations and principles for clinical trials: http://www.ich.org/UrlGrpServer.jser?@_ID=475&@_TEMPLATE=272

Keep in mind also that one big advantage R has (in my mind) is the use of Sweave for the reproducible generation of reports, which to an extent are self-documenting.
(For example, was there more effort involved in the areas of documentation, revision tracking, or validation of software code?)
Since the FDA's role with computer software and validation has been raised before, the following documents cover many of these areas. The list is not meant to be exhaustive, but it should give a flavor of this domain.

There are specific guidance documents by the FDA pertaining to software that is contained in a medical device (i.e. the firmware in a pacemaker or medical monitoring equipment) or is used to develop a medical device. The current guidance in this case is here: http://www.fda.gov/cdrh/comp/guidance/938.html

Other guidance pertains to 21 CFR 11, which addresses data management systems used for clinical trials and covers issues such as electronic signatures, audit trails and the like. A guidance document for that is here: http://www.fda.gov/cder/guidance/5667fnl.htm Keep in mind, for perspective, that even MS Excel and Access can be made 21 CFR 11 compliant, and there are companies whose business is focused on just that task.

There is also a general guidance document for computer systems used in clinical trials here: http://www.fda.gov/ora/compliance_ref/bimo/ffinalcct.htm though it is to be superseded by a draft document here: http://www.fda.gov/cder/guidance/6032dft.htm
2) Migrating from SAS to R in the choice of statistical software used for NIH reporting (or reporting to other US or non-US government agencies).
Same here, to my knowledge. As I was typing this, I see Frank just responded. I also just noted Doug's post, so perhaps some of the above information will be helpful in clarifying some of his questions as well. I believe the above is factually correct, but if anyone knows otherwise, please correct me. HTH, Marc Schwartz
Marc Schwartz said the following on 2004-12-18 01:19:
As you are likely aware, other statistically relevant issues are contained in various ICH guidance documents regarding GCP considerations and principles for clinical trials: http://www.ich.org/UrlGrpServer.jser?@_ID=475&@_TEMPLATE=272
ICH E9 states (p. 27): "The computer software used for data management and statistical analysis should be reliable, and documentation of appropriate software testing procedures should be available."

Some commercial software vendors (SAS, Insightful, and StatSoft) offer white papers stating that their software can work within a 21 CFR Part 11 compliant system: http://www.sas.com/industry/pharma/develop/papers.html http://www.insightful.com/industry/pharm/21cfr_part11_Final.pdf http://www.statsoft.com/support/whitepapers/pdf/STATISTICA_CFR.pdf Some commercial vendors (SAS and Insightful) also offer tools for validation of the installation and operation of the software: SAS has http://support.sas.com/documentation/installcenter/common/91/ts1m3/qualification_tools_guide.pdf and S-PLUS has validate().

As a statistical consultant working within the pharmaceutical industry, I think our clients treat the white papers as some kind of quality seal. It signals that someone has actually thought about the issues involved, written a document about it, and even stated that it can be done. Of course, there's a lot of FUD going on here. But if our lives can be made simpler by producing similar white papers and QA tools, why not?

(But for some people, only SAS will do: last week we were audited on behalf of a client. One of the specific issues discussed was validation and the Part 11 compliance of S-PLUS. In this specific trial, data are to be transferred from Oracle Clinical -> SAS -> S-PLUS, and the auditors were really worried about the first and last links of that chain. Finally, they suggested using only SAS... And in this particular case, Part 11 is really a non-issue, since physical records exist (i.e. case report forms) and all final S-PLUS output and code will also be stored physically (i.e. as print-outs) -- no need for electronic signatures here!)
There is also a general guidance document for computer systems used in clinical trials here: http://www.fda.gov/ora/compliance_ref/bimo/ffinalcct.htm Though it is to be superseded by a draft document here: http://www.fda.gov/cder/guidance/6032dft.htm
From the introduction (p. 2): "This document provides guidance about computerized systems that are used to create, modify, maintain, archive, retrieve, or transmit clinical data required to be maintained and/or submitted to the Food and Drug Administration (FDA)" The `retrieve' part is certainly applicable. If we regard R as off-the-shelf software, the guidance says (p. 11): "For most off-the-shelf software, the design level validation will have already been done by the company that wrote the software. Given the importance of ensuring valid clinical trial data, FDA suggests that the sponsor or contract research organization (CRO) have documentation (either original validation documents or on-site vendor audit documents) of this design level validation by the vendor and would itself have performed functional testing (e.g., by use of test data sets) and researched known software limitations, problems, and defect corrections. Detailed documentation of any additional validation efforts performed by the sponsor or CRO will preserve the findings of these efforts. In the special case of database and spreadsheet software that is: (1) purchased off-the-shelf, (2) designed for and widely used for general purposes, (3) unmodified, and (4) not being used for direct entry of data, the sponsor or contract research organization may not have documentation of design level validation. FDA suggests that the sponsor or contract research organization perform functional testing (e.g., by use of test data sets) and research known software limitations, problems, and defect corrections. 
In the case of off-the-shelf software, we recommend that the following be available to the Agency on request: * A written design specification that describes what the software is intended to do and how it is intended to do it; * A written test plan based on the design specification, including both structural and functional analysis; and * Test results and an evaluation of how these results demonstrate that the predetermined design specification has been met."

I think the guidance is quite clear here: we must be able to show the FDA, on request, that the software used works properly. In order to do this, we seem to need documents describing the development process and the QA tools used by R Core. An idea of what we'll need may be found in the `Computer Systems Validation in Clinical Research - A Practical Guide (Edition 1)' at http://www.acdm.org.uk/public/publications/publications.htm Especially sections 2.4, 5 + subsections, 8 + subsections, and 9.7 + subsections seem relevant. (I've ordered the 2nd edition, but it hasn't arrived yet.) Henric
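The "functional testing (e.g., by use of test data sets)" the guidance keeps coming back to can be sketched in miniature: run a fixed test data set through the routine under qualification and compare the output against independently verified reference values. This is an illustrative Python sketch, not part of the original thread; the data set, routine, and tolerances are all hypothetical.

```python
# Functional-testing sketch: feed a known test data set through the
# statistical routine and compare against hand-checked reference values.
# The data and tolerances below are hypothetical, for illustration only.

def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Test data set with independently computed reference results.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.2, 6.1, 8.3, 10.2]
slope, intercept = linear_fit(xs, ys)

# Accept within a documented tolerance; the reference values here
# were worked out by hand (slope = 2.03, intercept = 0.09).
assert abs(slope - 2.03) < 1e-6
assert abs(intercept - 0.09) < 1e-6
print("functional test passed")
```

A real qualification suite would of course use certified reference data sets and record the results as part of the validation documentation.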
Marc Schwartz wrote:
... much discussion deleted ...
In addition to the excellent points made by Marc, Doug, and Matt, I want to expand on the revision-tracking point originally raised by Alexander. We use CVS for all pharmaceutical industry work. Besides allowing two statisticians working on each project to mirror each other's data and code (for backup when one is out and a pressing question is asked), the revision control and commented change tracking of CVS has proven to work incredibly well in this arena.

The one area where we use SAS for pharmaceutical industry work is running SAS PROC EXPORT to convert data to CSV format for importing with the Hmisc package's sasxport.get function (see http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/SASexportHowto). We found that reading binary SAS transport format datasets in R or with Stat/Transfer was not reliable enough. We have a freely available SAS macro that runs PROC EXPORT in a loop to get all datasets in a data library, with metadata. That way any SAS exporting errors can be blamed on SAS.

Ironically, there is a bug in PROC EXPORT: when a character field has an unmatched quote in it, the CSV file can end up with an odd number of quotes for the field. sasxport.get checks the number of records imported against the number reported by PROC CONTENTS, so this problem is easily detected and corrected with Emacs. Note that, with literally billions of dollars at their disposal, SAS didn't take the time to write PROC EXPORT as a real procedure; like the R sas.get function, it generates voluminous SAS DATA step code to do the work.

Regarding CDISC, the SAS transport format that is now accepted by FDA is deficient because there is no place for certain metadata (e.g., units of measurement, value labels are remote from the datasets, variable names are truncated to 8 characters). The preferred format for CDISC will become XML.
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University
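The record-count cross-check Frank describes can be sketched in miniature. The following is an illustrative Python example, not the actual sasxport.get code: it parses CSV text and compares the number of records obtained against an externally reported count. This catches the unmatched-quote failure mode, because a stray quote makes any compliant CSV parser treat the following newline as part of the field, swallowing a record separator.

```python
import csv
import io

def check_record_count(csv_text, expected):
    """Parse CSV text and compare the record count (header included)
    against an externally reported count, as sasxport.get does with the
    count from PROC CONTENTS. Returns (rows_parsed, counts_agree)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return len(rows), len(rows) == expected

# A well-formed export: header plus two data records.
good = 'id,comment\n1,"fine"\n2,"also fine"\n'
n, ok = check_record_count(good, 3)
assert ok

# The failure mode Frank describes: an unmatched quote in a field
# leaves the quoted state open, so the newline and the next record
# are absorbed into the field and only 2 rows come back instead of 3.
bad = 'id,comment\n1,"unmatched\n2,"also fine"\n'
n, ok = check_record_count(bad, 3)
assert n == 2 and not ok   # mismatch detected
```

The same idea works for any import path: an independent record count is cheap insurance against silent truncation or merging.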
Henric Nilsson wrote:
... much discussion deleted ...
There is also a general guidance document for computer systems used in clinical trials here: http://www.fda.gov/ora/compliance_ref/bimo/ffinalcct.htm Though it is to be superseded by a draft document here: http://www.fda.gov/cder/guidance/6032dft.htm
From the introduction (p. 2): "This document provides guidance about computerized systems that are used to create, modify, maintain, archive, retrieve, or transmit clinical data required to be maintained and/or submitted to the Food and Drug Administration (FDA)" The `retrieve' part is certainly applicable.
...
Henric
That is not clear. And since FDA allows submissions using Excel, with not even an audit trail, and with known major statistical computing errors in Excel, I am fairly certain that it is not applicable or at the least is not enforced in any meaningful way.
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University
Frank E Harrell Jr said the following on 2004-12-18 15:03:
That is not clear.
Perhaps. And I think this is the issue. From the clients' perspective, not a single FDA document states that you can use software other than SAS. They haven't really thought about the fact that there aren't any FDA documents encouraging the use of SAS for statistical analyses either.

I don't think the real problem is convincing regulatory authorities that R (or any other open-source software, for that matter) operates adequately. But clients and auditors seem to reason along the lines of "better safe than sorry" and "nobody's ever been criticized for using SAS". From their perspective, when we propose using `some other' software, they start thinking that it perhaps may jeopardize their trial results (and, all too often, "but doesn't FDA require SAS?").

How to fight this? I don't know. Right now I'm thinking "if you can't beat 'em, join 'em", and that the way to prove that `some other' software works is by having similar documents and tools as the commercial vendors.
And since FDA allows submissions using Excel, with not even an audit trail, and with known major statistical computing errors in Excel, I am fairly certain that it is not applicable or at the least is not enforced in any meaningful way.
The general preconception seems to be that neither SAS nor Excel needs validation. E.g. the British guideline referenced in my previous email states on p. 12 that "It is generally considered that there is no requirement for validation of commercial hardware and established operating systems or for packages such as the SAS system, Oracle and MS Excel, as entities in their own right. However, most are configurable systems and so need adequate control of installation and their configuration parameters." Luckily for Excel, not a single word about precision and adequacy... Henric
There were two earlier threads on this topic: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/17554.html http://finzi.psych.upenn.edu/R/Rhelp02a/archive/10706.html Jon
Jonathan Baron, Professor of Psychology, University of Pennsylvania Home page: http://www.sas.upenn.edu/~baron R search page: http://finzi.psych.upenn.edu/
Frank E Harrell Jr wrote:
... much discussion deleted ...
Regarding CDISC, the SAS transport format that is now accepted by FDA is deficient because there is no place for certain metadata (e.g., units of measurement, value labels are remote from the datasets, variable names are truncated to 8 characters). The preferred format for CDISC will become XML.
Since you brought up the SAS XPORT data format, I have to respond with my usual rant about it.

<rant> When it comes to the SAS XPORT data format, those are at best third- or fourth-order deficiencies in the metadata. The first-order deficiency is that the metadata does not contain the number of records in a data set. In this format a file can contain more than one data set, and a data set consists of an unknown number of fixed-length records. Because of the potential for more than one data set, you can't just read to the end of the file, or use the file size and the record size to calculate the number of records. You must read through the file examining each group of 80 characters (why 80 characters? those of us who remember punched cards can tell you why) and, for each such group, try to determine whether this is the beginning of another record in the current data set or the beginning of a new data set.

How is the beginning of a new data set indicated? By a magic string of characters. What if, either perversely or accidentally, this magic string of characters were included as a text field at the beginning of a record? You wouldn't be able to tell whether you have a new record or a new data set.

Even better than that, there are situations in which the number of records in a data set is not well-defined, due to the requirement of padding the last 80-character group with blanks. (After all, when you create a punch card deck from your data set you want to get an integer number of punched cards.) For example, if you are writing an odd number of 40-character records, then you must pad the last 80-character group with blanks. When reading this data set, how can you distinguish the odd number of records padded with blanks from an even number of records in which the last record happened to be all blanks? You can't. When I first encountered this, I thought that I must not understand the format properly.
I thought that SAS (and, through SAS, the FDA) couldn't really be using a format in which the number of records in a data set can be ambiguous. This would mean that the operations of writing the XPORT data set and reading it are not guaranteed to be inverses. I started reading material on the SAS web site and discovered that SAS indeed was aware of this problem and had a solution: users should not create data sets that exhibit this ambiguity. That's it. Their solution is "don't do that". </rant>

I think that replacing the SAS XPORT data format with XML will be a step forward.
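The padding ambiguity Doug describes can be demonstrated concretely. Here is a Python sketch (the record layout is simplified to the essentials and is hypothetical, not a literal XPORT implementation): three 40-byte records, blank-padded to an 80-byte boundary, produce exactly the same byte stream as four records whose last is all blanks.

```python
RECLEN = 40   # fixed record length in this illustration
CARD = 80     # XPORT files are written in 80-byte "card images"

def write_xport_style(records):
    """Concatenate fixed-length records and blank-pad to an 80-byte
    boundary, as the XPORT layout requires."""
    body = b"".join(records)
    pad = (-len(body)) % CARD
    return body + b" " * pad

# Three 40-byte records: 120 bytes of data, padded out to 160.
three = [b"A" * RECLEN, b"B" * RECLEN, b"C" * RECLEN]
# Four records where the last is all blanks: 160 bytes, no padding.
four = three + [b" " * RECLEN]

# The two data sets serialize to identical bytes, so no reader can
# recover whether there were three records or four.
assert write_xport_style(three) == write_xport_style(four)
```

Writing and reading are therefore not inverses for such data sets, which is exactly Doug's complaint.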
Henric Nilsson wrote:
Frank E Harrell Jr said the following on 2004-12-18 15:03:
That is not clear.
Perhaps. And I think this is the issue. From the clients' perspective, not a single FDA document states that you can use software other than SAS. They haven't really thought about the fact that there aren't any FDA documents encouraging the use of SAS for statistical analyses either.
Right. This reminds me of the worst movie of all time, Plan 9 From Outer Space, in which the psychic Criswell closes the movie by saying "Can you prove that this DIDN'T happen?".
I don't think the real problem is convincing regulatory authorities that R (or any other open-source software, for that matter) operates adequately. But clients and auditors seem to reason along the lines of "better safe than sorry" and "nobody's ever been criticized for using SAS". From their perspective, when we propose using `some other' software, they start thinking that it perhaps may jeopardize their trial results (and, all too often, "but doesn't FDA require SAS?").
Yes that is the hurdle.
How to fight this? I don't know. Right now I'm thinking "if you can't beat 'em, join 'em", and that the way to prove that `some other' software works is by having similar documents and tools as the commercial vendors.
With the job market for statisticians being excellent, I've often wondered why clinical statisticians in industry are so often timid. Statisticians need to show strength and stamina, along with good teaching skills, on this issue.
And since FDA allows submissions using Excel, with not even an audit trail, and with known major statistical computing errors in Excel, I am fairly certain that it is not applicable or at the least is not enforced in any meaningful way.
The general preconception seems to be that neither SAS nor Excel needs validation. E.g. the British guideline referenced in my previous email states on p. 12 that "It is generally considered that there is no requirement for validation of commercial hardware and established operating systems or for packages such as the SAS system, Oracle and MS Excel, as entities in their own right. However, most are configurable systems and so need adequate control of installation and their configuration parameters."
This makes me wonder about the British system. Have they not seen the serious calculation errors documented in Excel?
Luckily for Excel, not a single word about precision and adequacy...
Right. Thanks for your note Henric -Frank
Henric
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University
On 19-Dec-04 Frank E Harrell Jr wrote:
Henric Nilsson wrote:
How to fight this? I don't know. Right now I'm thinking "if you can't beat 'em, join 'em", and that the way to prove that `some other' software works is by having similar documents and tools as the commercial vendors.
With the job market for statisticians being excellent, I've often wondered why clinical statisticians in industry are so often timid. Statisticians need to show strength and stamina, along with good teaching skills, on this issue.
Because, I fear (and I don't have good documentation on it, but I do have quite a strong impression), such is not what their employers and managers see as their role and function. Others may wish to comment ... Best wishes to all, Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861 [NB: New number!] Date: 19-Dec-04 Time: 10:00:48 ------------------------------ XFMail ------------------------------
All good points; in my current organization there seem to be three hurdles that need to be crossed. Most are internal issues, but all relate to a conservative interpretation of Part 11.

1. Qualifications: installation, operational, and performance. R clearly satisfies the first and third; for the second, someone in R Core or similar (i.e. a consultant, etc.) perhaps needs to provide the OQ.

2. Whether statistical results count as derived variables (i.e. data). If so, then Part 11 can apply; if not, it might not.

3. Removing the "Open Source" moniker (which gets legal people really upset) and treating R as quality vendor-supplied code under a novel licensing scheme which has source available and for which a business case can be made. Back in the old days (i.e. when I was in high school in the 80s), our school minicomputers had source for the OSs available, and for most critical vendor- or contractor-supplied software we had source. In fact, it was standard!

Anyway, I'm slowly working on these issues internally. At some point there will be a breakthrough at one pharma, making it easier for the rest. Right now my issue is how to deal with Clinical QA; the equivalent group is, I'm sure, a nightmare of bureaucracy to work through at most large pharmas.

best, -toniy

On Sat, 18 Dec 2004 14:10:40 +0100, Henric Nilsson
<henric.nilsson at statisticon.se> wrote:
Marc Schwartz said the following on 2004-12-18 01:19:
As you are likely aware, other statistically relevant issues are contained in various ICH guidance documents regarding GCP considerations and principles for clinical trials: http://www.ich.org/UrlGrpServer.jser?@_ID=475&@_TEMPLATE=272
ICH E9 states that (p. 27): "The computer software used for data management and statistical analysis should be reliable, and documentation of appropriate software testing procedures should be available."

Some commercial software vendors (SAS, Insightful, and StatSoft) offer white papers stating that their software can work within a 21 CFR Part 11 compliant system:

http://www.sas.com/industry/pharma/develop/papers.html
http://www.insightful.com/industry/pharm/21cfr_part11_Final.pdf
http://www.statsoft.com/support/whitepapers/pdf/STATISTICA_CFR.pdf

Some commercial vendors (SAS and Insightful) also offer tools for validating the installation and operation of the software. SAS has http://support.sas.com/documentation/installcenter/common/91/ts1m3/qualification_tools_guide.pdf and S-PLUS has validate().

As a statistical consultant working within the pharmaceutical industry, I think that our clients regard the white papers as some kind of quality seal. It signals that someone has actually thought about the issues involved, written a document about it, and even stated that it can be done. Of course, there's a lot of FUD going on here. But if our lives can be made simpler by producing similar white papers and QA tools, why not?

(But for some people, only SAS will do: Last week we were audited on behalf of a client. One of the specific issues discussed was validation and the Part 11 compliance of S-PLUS. In this specific trial, data are to be transferred from Oracle Clinical -> SAS -> S-PLUS, and the auditors were really worried about the first and last links of that chain. Finally, they suggested using only SAS... And in this particular case, Part 11 is really a non-issue since physical records exist (i.e. case report forms) and all final S-PLUS output and code will also be stored physically (i.e. print-outs) -- no need for electronic signatures here!)
There is also a general guidance document for computer systems used in clinical trials here: http://www.fda.gov/ora/compliance_ref/bimo/ffinalcct.htm Though it is to be superseded by a draft document here: http://www.fda.gov/cder/guidance/6032dft.htm
From the introduction (p. 2): "This document provides guidance about computerized systems that are used to create, modify, maintain, archive, retrieve, or transmit clinical data required to be maintained and/or submitted to the Food and Drug Administration (FDA)"

The `retrieve' part is certainly applicable. If we regard R as off-the-shelf software, the guidance says (p. 11):

"For most off-the-shelf software, the design level validation will have already been done by the company that wrote the software. Given the importance of ensuring valid clinical trial data, FDA suggests that the sponsor or contract research organization (CRO) have documentation (either original validation documents or on-site vendor audit documents) of this design level validation by the vendor and would itself have performed functional testing (e.g., by use of test data sets) and researched known software limitations, problems, and defect corrections. Detailed documentation of any additional validation efforts performed by the sponsor or CRO will preserve the findings of these efforts.

In the special case of database and spreadsheet software that is: (1) purchased off-the-shelf, (2) designed for and widely used for general purposes, (3) unmodified, and (4) not being used for direct entry of data, the sponsor or contract research organization may not have documentation of design level validation. FDA suggests that the sponsor or contract research organization perform functional testing (e.g., by use of test data sets) and research known software limitations, problems, and defect corrections.
In the case of off-the-shelf software, we recommend that the following be available to the Agency on request:

* A written design specification that describes what the software is intended to do and how it is intended to do it;
* A written test plan based on the design specification, including both structural and functional analysis; and
* Test results and an evaluation of how these results demonstrate that the predetermined design specification has been met."

I think the guidance is quite clear here. We must prove to the FDA, at their wish, that the software used is working properly. In order to do this, we seem to need documents describing the development process and the QA tools used by R Core.

An idea of what we'll need may be found in `Computer Systems Validation in Clinical Research - A Practical Guide (Edition 1)' at http://www.acdm.org.uk/public/publications/publications.htm Especially sections 2.4, 5 + subsections, 8 + subsections, and 9.7 + subsections seem relevant. (I've ordered the 2nd edition, but it hasn't arrived yet.)

Henric
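The "functional testing (e.g., by use of test data sets)" that the guidance asks for can be sketched in a few lines of R. The script below is purely illustrative and not part of any official validation suite: the test data and the reference values are made up for the example; in practice the references would be computed once on a validated system (or taken from a published benchmark) and archived with the test plan.

```r
## Hypothetical functional test, in the spirit of the guidance quoted
## above: push a fixed test data set through the analysis routine and
## compare the results against archived reference values.
test.data <- c(4.17, 5.58, 5.18, 6.11, 4.50,
               4.61, 5.17, 4.53, 5.33, 5.14)

## Reference values (illustrative), as a validated system would report them.
ref <- list(mean = 5.032, sd = 0.5830914, t = 27.29003, df = 9)
tol <- 1e-3  # acceptance tolerance for numerical agreement

fit <- t.test(test.data)  # one-sample t-test against mu = 0

checks <- c(
  mean = abs(mean(test.data) - ref$mean) < tol,
  sd   = abs(sd(test.data) - ref$sd) < tol,
  t    = abs(unname(fit$statistic) - ref$t) < tol,
  df   = unname(fit$parameter) == ref$df
)

if (all(checks)) {
  cat("PASS:", length(checks), "functional checks within tolerance\n")
} else {
  stop("FAIL: ", paste(names(checks)[!checks], collapse = ", "))
}
```

The same pattern scales up: running `make check` on the R sources applies a much larger version of this idea to the interpreter and base packages, which is one concrete testing artifact to point auditors at.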
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
best, -tony --- A.J. Rossini blindglobe at gmail.com
R folks:

I appreciate and have learned from the recent "SAS vs R" and "Bad Excel Calculations" threads. Not only civil, but even at times erudite, discussion. So I apologize for the lateness of this remark and hope it isn't redundant or trivial.

To those who may wonder why SAS is so dominant in the clinical arena despite (better) alternatives: INERTIA. That is:

1) There is a huge infrastructure of SAS code already in place for regulatory submissions, and SAS programmers to maintain and enlarge it. As a practical matter, it is hard to imagine a large organization simply chucking this and starting afresh. Clearly, change -- if it were to occur at all -- would have to be slow and incremental.

2) From my experience at presentations by recent biostatistics PhDs, for most, their education continues to promulgate the use of SAS in clinical/regulatory settings, undoubtedly due to 1).

3) As has already been noted, most existing FDA regulators -- statisticians and clinicians alike -- are familiar with SAS, and therefore submissions with other software (like R) might delay or complicate the review process.

We statisticians are not the biggest dogs in this arena, after all. Reality bites! So R users must persevere.

-- Bert Gunter
On Mon, 2004-12-20 at 10:38 -0800, Berton Gunter wrote:
[...]
Since the notion of inertia was raised by Bert, for those interested in at least one theory on the adoption of technology and product life cycles (if one considers R as a software technology), the book "Crossing the Chasm" by Geoffrey Moore might be of interest. The Amazon.com link is:

http://www.amazon.com/exec/obidos/tg/detail/-/0066620023

and a very brief Wikipedia overview is here, with a diagram:

http://en.wikipedia.org/wiki/Crossing_the_Chasm

In many respects, the general and increasing adoption of open source applications fits the theory well. One might consider the growth of Linux, and more recent specific examples of applications such as Firefox and Thunderbird as replacements for Internet Explorer and Outlook Express (anybody see the two-page Firefox ad in the New York Times?).

The potential impact of this particular theory, with respect to change, was notably underscored when the National Academy of Sciences' Institute of Medicine published a book as part of their Health Care Quality Initiative, calling it "Crossing the Quality Chasm: A New Health System for the 21st Century":

http://www.iom.edu/focuson.asp?id=8089

Another book, which I think dovetails with Moore's, is "Only the Paranoid Survive" by Andy Grove, Chairman of the Board at Intel. In some cases, the catalyst for crossing the chasm might be a shift in marketplace dynamics, which sees a market leader falter when they fail to react effectively to the shift, enabling a new company, technology or product to take the leadership position. Grove calls these situations "strategic inflection points", with a meaning taken from the mathematical term. If the company reacts properly to the shift, it experiences new positive growth, possibly under a substantially altered business model. If it fails to react, it begins a slide downhill, possibly never to recover or regain its dominance.
The Amazon.com link for Grove's book is: http://www.amazon.com/exec/obidos/tg/detail/-/0385483821 HTH, Marc Schwartz