
9 F.Supp.2d 1168 (1998)
UNITED STATES of America, Plaintiff,
v.
Constance R. ORIANS and Ronald G. Orians, Defendants.
No. CR-96-534 PHX RCB.
United States District Court, D. Arizona.
April 15, 1998.
*1169 Thomas C. Bradley, Brian D. Bailey, U.S. Department of Justice, Washington, DC, for Plaintiff.
Thomas M. Hoidal, Phoenix, AZ, for Defendant Ronald G. Orians.
Sally S. Duncan, Assistant Federal Public Defender, Phoenix, AZ, for Constance R. Orians.

ORDER
BROOMFIELD, Chief Judge.
On January 23, 1998 Defendant Constance R. Orians filed a "Motion for Introduction of Polygraph Evidence, or in the Alternative, Motion to Continue, to Allow for Daubert Hearing." Co-Defendant Ronald Gregory Orians joined her motion. The court held a Daubert evidentiary hearing over a period of three days commencing on February 24, 1998, at which time it took this matter under advisement. Now, having carefully considered the issues before it, the court rules.

BACKGROUND
Mr. and Mrs. Orians are currently charged with filing false personal tax returns for the years 1988, 1989 and 1990, a specific intent crime. The government must prove that Defendants did not believe that the returns were true as to every material matter and that they willfully subscribed to the false return with the intent to violate the law. Both Defendants underwent polygraph examinations before separate, non-government examiners in an attempt to show that they did not act with the requisite intent to commit the crime with which they are charged. The issue before the court is whether it should admit the results of such tests.
During the evidentiary hearing, the court heard testimony from two experts about the scientific reliability of polygraph examinations in general, the reliability of the individual results in this case, and the acceptance of the technique in the scientific community.[1] *1170 The testimony centered around the reliability of the technique in general; however, the experts did testify about the specific test results of Constance and Ronald Gregory Orians.
Two experts testified about the results of Constance Orians' first examination, which she took November 7, 1997. The defense expert testified that Ms. Orians passed the examination (conclusive results), and the prosecution expert testified that the results of the examination were inconclusive. Ms. Orians also took a second examination that proved to have similar results.
On February 13, 1998 Mr. Orians also took a polygraph examination, and four days later Mr. Orians joined Ms. Orians' motion. Again, the defense expert testified that Mr. Orians passed the examination while the prosecution's expert indicated that the result was inconclusive. In addition, the prosecution's expert testified that the charts that depicted Mr. Orians' results showed possible counter-measures, which are techniques used to produce a truthful result when a person is actually being untruthful.
The court has now considered the testimony of the experts and it is prepared to render its decision.

DISCUSSION
The Supreme Court recently addressed the admissibility of polygraph evidence in United States v. Scheffer, ___ U.S. ___, 118 S.Ct. 1261, 140 L.Ed.2d 413 (1998). In upholding the military's per se ban on polygraph evidence, the principal opinion noted that "there is simply no consensus that polygraph evidence is reliable." Id. at 1264. The four member concurrence agreed: "The continuing, good-faith disagreement among experts and courts on the subject of polygraph reliability counsels against our invalidating a per se exclusion of polygraph results...." Id. at 1268. However, the Supreme Court did not go so far as to require the exclusion of polygraph testimony. Rather, the principal and concurring opinions expressed differing views as to the overall wisdom of a per se exclusion. In any event, what was clear from the court's opinion is that lower courts must follow the rule in their jurisdiction. Id. at 1264 ("Individual jurisdictions ... may reasonably reach differing conclusions as to whether polygraph evidence should be admitted.") (principal opinion, part II-A, which eight justices joined).
The Ninth Circuit recently rejected its per se ban on polygraph evidence, in favor of a more flexible rule allowing District Courts to determine the admissibility of such evidence on a case by case basis under the principles of Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579, 113 S.Ct. 2786, 125 L.Ed.2d 469 (1993). See United States v. Cordoba, 104 F.3d 225, 227 (9th Cir.1997). Thus, the court must consider the Daubert factors in determining the admissibility of the Defendants' polygraph evidence, as well as the Federal Rules of Evidence.

A. Daubert Analysis

Under Federal Rule of Evidence 702, expert testimony should be evaluated to determine whether it meets the requisite degree of specialized knowledge and whether it is testimony that will assist the trier of fact in determining an issue. Daubert, 509 U.S. at 589-93, 113 S.Ct. 2786. The Supreme Court in Daubert articulated several factors that a trial court should consider in making an admissibility determination under Rule 702. These factors include: 1) Whether the scientific method is capable of being tested; 2) Whether the scientific theory has been the subject of peer review and publication; 3) Whether the method has a known rate or potential rate of error; 4) The general acceptance of the method within the scientific community; and 5) Whether the method is controlled by *1171 established standards. Id. at 593-94, 113 S.Ct. 2786. For the reasons below, the court determines that the Daubert factors militate against the admission of this polygraph evidence.

1. Testing

Both experts talked about the methods used to test the reliability of polygraph examinations. Two methods are used: laboratory and field research. The most positive results produced from these two methods suggest that polygraph examinations have an accuracy rate somewhere between 85-95%. However, the results are not without criticism.
Polygraphy is especially susceptible to problems in both laboratory and field research. In laboratory research, researchers must invent a situation, usually by staging a mock crime or event, and then question various participants. Often, participants are offered a monetary incentive to beat the polygraph. However, critics argue that these mock situations do not necessarily duplicate real-life situations. That is, results may vary because of the difference in circumstances, including the level of stress, between mock events and those that have truly occurred.
On the other hand, in field research, researchers use participants who have, oftentimes, been charged with a crime. In order to assess which participants are lying and which are telling the truth, researchers often look for situations where individuals have confessed, where victims have recanted, or where others have confessed to the charged crime. Critics point out the fact that these scenarios do not guarantee the guilt or innocence of the participants. Victims may have varied motives in recanting their accusations, and individuals who are innocent of crimes have been known to confess. Thus, critics argue that the inability to judge "ground truth" affects the reliability of these tests.
Unfortunately, there is no test available to truly measure lies. Both experts concede that the polygraph measures only the body's physiological reactions to stress. Dr. Barland expressed some concern that it is impossible to truly know or measure the number of individuals attempting countermeasures, measures used to beat the polygraph, and their effect on the test results. Dr. Raskin argued that his research showed that countermeasures do not work without proper training. However, the same research showed the potential to beat the test with minimal training on such countermeasures. Neither expert suggested that it is possible to differentiate between the results of one trained in countermeasures and a truthful person. Moreover, Dr. Raskin's research tested only certain potential countermeasures. It is also possible that individuals may succeed in beating the exam through other measures. This potential may not be fully testable.
Unquestionably, the accuracy of any scientific study involving polygraph depends, in large part, on the reliability and willing participation of the subjects. Because of the sensitive nature of the subject matter, honesty or deceit, this concern is probably far greater in relation to polygraphs than in relation to other scientific fields. Because the polygraph exam does not truly measure lies, and therefore the potential to alter the results exists, the potential problems and potential for error in both laboratory and field testing is increased. That is, because the laboratory setting may differ considerably from real life, and because it may be impossible to assess "ground truth" in the field, the true rate of error and the potential ability of subjects to alter their physiological responses during true-life situations may never be known or testable. Thus, this factor probably favors the exclusion of the polygraph evidence.

2. Publications and Peer Review

Factor two, on the other hand, is more favorable to Defendants' position. Both experts testified that the science of polygraphy has been subjected to peer review, and is the topic of extensive publication in both scientific and non-scientific publications. Thus, although some peer review was less than complimentary, this factor tends to favor admission.

3. Known or Potential Rate of Error

The known or potential rate of error most likely weighs against the admissibility of polygraph evidence. Both experts presented *1172 the court with a rather high estimate of the reliability of polygraphs, but acknowledged that estimates vary. As discussed above, the rate of error is generally estimated by polygraph proponents to be somewhere between five and ten percent. However, the potential problems with the testing method and countermeasures call this figure into serious question. Accordingly, critics estimate that the accuracy rate of polygraphs is much lower. Studies have varied, and Dr. Barland testified that some experts suspect the accuracy rate of the polygraph, in certain situations, is not much better than chance. Moreover, the court is concerned with other factors that may or may not have an effect on the potential rate of error.
Dr. Barland and Dr. Raskin offered differing opinions on the proper method of scoring the polygraph. Dr. Barland noted that the Department of Defense considers all tests in which the result on one or more individual questions is inconclusive, or where there is significant fluctuation across the charts, to be inconclusive overall. On the other hand, Dr. Raskin uses the total test score, irrespective of inconclusive results to one or more questions, or serious fluctuation in scores across charts. This conflict suggests that there is some dispute over both the proper standard and the potential rate of error. Other problems also persist.
Dr. Raskin discounted several potentially problematic areas during his testimony. These areas included the potential effectiveness of countermeasures, especially in light of Dr. Barland's testimony that Mr. Orians' test showed potential countermeasures, the potential for error when a directed lie question is answered incorrectly, and the potential problems presented by a "friendly" polygraph test.
First, Dr. Raskin testified that countermeasures were not a concern when evaluating a polygraph test, because untrained individuals could not be successful. Dr. Raskin completely discounted the possibility that an individual might be trained. Moreover, Dr. Barland recounted his own first-hand experience where untrained people used countermeasures which had in fact worked. Moreover, Dr. Barland is particularly skilled in the subject of countermeasures and the court gives considerable weight to his testimony on this issue.
Dr. Raskin also testified that Ms. Orians' improper answer in response to a directed lie was of no consequence. He indicated that his own research showed no difference in the accuracy of the test when the directed lie question was answered yes (in violation of the examiner's instructions) as compared to when the question was properly answered no. This result, when considered in light of an exam meant to measure physiological responses indicating deceptiveness, seems counterintuitive. When the directed lie is being used as a base to measure a person's response when lying, it seems odd that there would be no difference when that person fails to lie.
Furthermore, Dr. Raskin and Dr. Barland differed in their opinion as to whether a "friendly" polygraph examination, or the subject matter of the examination, might affect the results of the test. Dr. Raskin testified emphatically that neither friendly test conditions nor the subject matter of the examination made any difference, yet Dr. Barland suggested that some circumstances, including the type of alleged crime (tax evasion), might result in a higher rate of false negatives.
Dr. Raskin also refused to acknowledge any potential for error caused by additional bad acts with which a subject might be concerned. That is, Raskin testified that even if a subject was concerned about a prior crime, encompassed by a control question, it would not skew the results because the person would not be charged with the crime at the time of the test. The court finds this emphatic, unqualified statement rather broad and somewhat discomforting. Moreover, the suggestion that virtually no circumstance will affect the results of a polygraph suggests to the court that the rate of error is probably unknown and underestimated, as it is difficult to believe that any method is infallible.
Other courts have also reached this conclusion. Particularly notable is the District Court in United States v. Cordoba, 991 F.Supp. 1199, 1207 (C.D.Cal.1998) ("Cordoba II"). The court again refused to allow in the polygraph testimony after the Ninth Circuit remanded the case for consideration in light *1173 of Daubert. The Cordoba II court was also disturbed by Dr. Raskin's[2] refusal to acknowledge any potential problems with polygraph evidence:
Dr. Raskin declined to criticize Defendant's test in any meaningful way, and found the test to be entirely acceptable. Confronted with each defect, Dr. Raskin staunchly stuck to his view that the test was reliable and acceptable. He acknowledged each item, but declared each to be no defect or of little significance and within acceptable limits .... The blanket and non-critical approval of Defendant's test by Dr. Raskin, who is probably the strongest and best informed advocate for polygraph admissibility, illustrates that the polygraph industry lacks sufficient controlling standards to satisfy Daubert.
Cordoba, 991 F.Supp. at 1207.
Like the Cordoba II court, this court is concerned with Dr. Raskin's complete unwillingness to concede any downfalls or problems with the polygraph test. While the Cordoba II court was concerned with the industry standard prong of Daubert, this court concludes that the same problems are applicable when evaluating the potential rate of error.
Moreover, Dr. Raskin's demeanor during the evidentiary hearing made it clear that he was not an impartial witness. Dr. Raskin became almost hostile, as if personally offended, in response to the government's cross-examination. This fact is not surprising, nor is it condemnable, given the fact that Dr. Raskin's livelihood depends on the acceptance of polygraphy and the fact that he feels very strongly about the subject. However, the court concludes that Dr. Barland, who was willing to concede flaws in both his own position and the position of Defendants, was a slightly more credible witness, and as such, it places greater weight on Barland's testimony.
Dr. Barland testified that several potential problems exist that may affect the known rate of error. While Dr. Barland agreed that polygraphy overall is a highly reliable field, he also testified that he would not endorse the admission of "inconclusive" test results. In addition, Dr. Barland testified that a "friendly" polygraph test might affect the results. Overall, the court concludes that the potential rate of error, under the circumstances with which the court is presented today, is unknown and potentially high. Accordingly, this factor favors the government's position.

4. Acceptance in the Scientific Community

The next step in the Daubert analysis requires the court to determine the level of acceptance in the community. Under Daubert, widespread acceptance is no longer required. However, the level of acceptance should be a consideration in determining admissibility. Daubert, 509 U.S. at 593-94, 113 S.Ct. 2786. The acceptance in the scientific community depends in large part on how the relevant scientific community is defined. Defendants urge the court to consider only the portion of the scientific community that is well-versed in polygraph methodology and science. However, the court is reluctant to embrace such a narrow interpretation.
This prong of the Daubert analysis does not require the court to limit its inquiry to those individuals that base their livelihood on the acceptance of the relevant scientific theory. These individuals are often too close to the science and have a stake in its acceptance; i.e., their livelihood depends in part on the acceptance of the method. Rather, the court will look to the larger group of psychophysiologists. This group includes individuals who are involved in the same larger science which would include polygraphy, but it is not necessarily limited to those individuals whose work is dedicated entirely, or in large part, to polygraph research. This larger group is the more relevant group and the court will consider the level of acceptance in regard to the psychophysiological community.
Both parties pointed to surveys conducted on the usefulness of polygraph examinations. Defendants note two surveys that show that over two thirds of psychophysiologists surveyed accept the polygraph technique. (Defense Exhibits 106 and 121 combined). On the other hand, the government points to a *1174 survey in which the scientific community was polled in regard to courtroom acceptance. W.G. Iacono and D.T. Lykken, The Validity of the Lie Detector: Two Surveys of Scientific Opinion, 82 J. Applied Psychol. 426 (1997). In this survey, the results were far less optimistic for polygraph acceptance. Several respondents (53%) answered that polygraph testing was of "questionable usefulness, entitled to little weight." While Defendants argue that the latter survey was improperly tailored to the courtroom setting, and that it did not target those individuals with specialized knowledge, the court is not persuaded. Rather, the court concludes, as the Supreme Court did in Scheffer, that there is widespread disagreement about the acceptance of polygraphy, and that this is especially true in a courtroom setting. See Scheffer, 118 S.Ct. at 1264. As such, this factor does not favor admission of the polygraph evidence.

5. Controlling Standards

The final factor also weighs against admissibility. As discussed above, under potential rate of error, there is some disagreement between the experts testifying in this case about the proper analysis and testing conditions for a polygraph examination. The court is concerned with this discrepancy. Although the court has already presented many of its concerns in this area through its discussion regarding the potential rate of error, it will take the time to elaborate slightly at this point.
Polygraph critics Iacono and Lykken articulated their criticism of current polygraph methods by noting "The validity of a test that was not objective would be undermined by individual differences in judgment that varied from one examiner to the next. We shall conclude that the [control question technique] is neither standardized nor objective." William G. Iacono and David T. Lykken, The Scientific Status of Research on Polygraph Techniques: The Case Against Polygraph Tests, in Modern Scientific Evidence: The Law and Science of Expert Testimony 582, 589, 592-93 (David L. Faigman et al. eds., 1997). These researchers focused on the need to formulate appropriate control questions and the different judgments of each examiner. The court concludes that this is a valid concern, and it is especially relevant when considering the controlling standards for polygraph.
The court is concerned with the varying scores between the government and defense experts, the original examiners, and those individuals who blindly scored the polygraph exams. The parties presented a wide range of potential scores. This range demonstrates the lack of standardization in polygraph scoring and methodology. It is only one example of what this court perceives to be a lack of controlling standards in the polygraph industry. For this reason and the reasons that the court has already expressed above, it concludes that the industry lacks sufficient controlling standards to ensure the reliability of individual polygraph examinations.

6. Conclusion on Daubert

The court finds that four of the five Daubert factors militate against the admission of the polygraph evidence. The remaining factor, peer review and publication, favors admission. Thus, the court concludes that the polygraph evidence should not be admitted in the present case.
However, even if the court were to determine that the polygraph method used in the present instance passed muster under the Daubert analysis, it would still conclude that the specific test results of Defendants should be excluded pursuant to Federal Rule of Evidence 403.

B. Rule 403 and Other Concerns

Dr. Barland testified that, in his opinion, the test results of Defendants were inconclusive. Moreover, in regard to Mr. Orians' result, Dr. Barland found evidence of possible countermeasures. The expert further testified that he did not believe that inconclusive results should ever be admitted into evidence.
Defendants argue that the evidence should be admitted and that the government may raise its concerns on cross-examination. However, the court concludes that the potential for prejudice is extremely high, and that the probative value of the examination results is unproven. As such, the court concludes, *1175 independent of Daubert, that it must exclude the evidence entirely.
The jury may easily put far too great a weight on the polygraph. This concern was, in part, what led the Ninth Circuit to formulate its per se exclusion of polygraph evidence. See Cordoba, 104 F.3d at 228. Although the Circuit has since relented, it by no means wholeheartedly endorses polygraphs. On the contrary, the Circuit expressed potential concern over polygraph evidence in Cordoba. Id. This particular case, where the results are potentially inconclusive, and an expert has identified potential problems with the test method and the results, presents a strong case for exclusion to avoid potential prejudice.
Moreover, this court is moved by the principal opinion in the recent Supreme Court case addressing an additional concern raised by polygraph evidence in relation to the military's per se exclusion (Rule 707). Justice Thomas noted:
It is equally clear that Rule 707 serves a second legitimate governmental interest: Preserving the jury's core function of making credibility determinations in criminal trials....
By its very nature, polygraph evidence may diminish the jury's role in making credibility determinations.... [A] polygraph expert can supply the jury only with another opinion, in addition to its own, about whether the witness was telling the truth. Jurisdictions, in promulgating rules of evidence, may legitimately be concerned about the risk that juries will give excessive weight to the opinions of a polygrapher, clothed as they are in scientific expertise and at times offering ... a conclusion about the ultimate issue in the trial.
Scheffer, 118 S.Ct. at 1264 (Part IIB). While an equal number of justices refused to embrace this portion of the opinion in the concurrence, the court is not convinced that it is not well-taken, as applied to this case.
The Defendants intend to admit this polygraph evidence as evidence that they were truthful when they claimed that they did not act with the requisite intent. It is difficult to form any conclusion other than the conclusion that this is expert testimony on the Defendants' truthfulness and state of mind. This testimony is testimony on the ultimate issue, and removes, at least partially, this determination from the hands of the jury. Testimony on the ultimate issue is specifically precluded by Rule 704: "No expert witness testifying with respect to the mental state ... of a defendant in a criminal case may state an opinion or inference as to whether the defendant did or did not have the mental state or condition constituting an element of the crime charged ...." Fed. R.Evid. 704(b). The rule precludes this testimony for the exact reason recognized in Thomas' opinion in Scheffer: Such determinations should be left to the jury. The court is reluctant to ignore the apparent conflict between polygraph results and Rule 704.

CONCLUSION
The court concludes that polygraph evidence, especially the evidence presented in the present case, is inadmissible under the Daubert factors. Moreover, the court concludes that the unstipulated, and possibly inconclusive, polygraph exams in the present case should be excluded pursuant to Fed. R.Evid. 403. The court is also concerned that the evidence has the potential to intrude unfairly into the province of the jury, and although this factor is not a conclusive reason to exclude the evidence, it militates against its admission under the present circumstances. Accordingly, the court will deny Defendants' Motions.
IT IS ORDERED denying Motion for Introduction of Polygraph Evidence (doc. 53).
NOTES
[1]  Both experts were well respected psychophysiologists specializing in polygraphy.

Dr. David C. Raskin testified for the defense. Dr. Raskin is a leading expert in the field of polygraphy. He has performed extensive research and published numerous articles on the subject, and has held positions in several related organizations. Dr. Raskin's most recent position was as a Professor in the Department of Psychology at the University of Utah, where he currently holds the title of Professor Emeritus.
Dr. Gordon H. Barland testified for the government. Dr. Barland is also well-versed in the field of polygraphy and, particularly, counter-measures. Dr. Barland is employed full time in the area of polygraphy, has performed extensive research, and has supervised students. Dr. Barland is employed as a research psychologist and polygrapher at the Department of Defense.
[2]  Raskin also testified for the Cordoba defendant.
