
677 F.3d 72 (2012)
UNITED STATES of America, Appellee,
v.
William SCOTT, aka William Boone, Defendant-Appellant.
Docket No. 10-3978-cr.
United States Court of Appeals, Second Circuit.
Argued: November 8, 2011.
Decided: April 6, 2012.
*73 Kristin M. Pauley, Michael S. Schachter, Willkie Farr & Gallagher LLP, New York, N.Y., for defendant-appellant.
*74 James J. Pastore, Jr., Telemachus P. Kasulis, Jesse M. Furman (Assistant United States Attorneys, of counsel) for Preet Bharara, United States Attorney for the Southern District of New York, for appellee.
Before: POOLER, B.D. PARKER, Circuit Judges.[1]
POOLER, Circuit Judge:
This appeal arises from appellant William Scott's judgment of conviction entered on September 28, 2010, in the Southern District of New York (Buchwald, J.). Scott was convicted after a jury trial of one count of distributing, and possessing with the intent to distribute, a controlled substance, in violation of 21 U.S.C. § 841(b)(1)(C). At his trial, the prosecution introduced, over defense objection, testimony from two police detectives that they were familiar with Scott and had spoken to him on numerous occasions prior to his arrest in the instant case. Scott argues that admission of this testimony under Federal Rules of Evidence 404(b) and 403 was an abuse of discretion and that the error in admitting it was not harmless. We agree. Scott conceded that he was present at the scene and did not argue that police witnessed someone else engage in a drug transaction. Instead, he argued that the detectives observed him engage in innocent conduct. The challenged recognition testimony served no purpose other than to invite jury speculation about Scott's propensity to attract police contact and monitoring. We therefore vacate Scott's conviction and remand for a new trial.[2]

BACKGROUND
Scott was arrested on July 28, 2009, in the Bronx after two detectives, Mark Moran and Robert Geary, observed him engage in what they believed to be a hand-to-hand drug sale. Testimony at trial indicated that the detectives stopped at an intersection in a "high narcotics prone location" after observing Scott, whom they recognized from prior interactions, standing with a group of other individuals on the corner. Moran testified that he witnessed Scott take something from an unidentified woman. Geary testified that this object appeared to be currency. Moran then saw Scott retrieve something out of "a hole in the tree," which he fumbled with and then put back. Both detectives testified that Scott then returned to the woman, who appeared to take something from him; Geary testified that Scott then exchanged currency with another individual. The detectives made all of these observations through the tinted windows of their vehicle, using the rear and side view mirrors. Both detectives testified that Scott looked as though he was going to run after he noticed them, but stopped and went towards them after Moran called to him. Scott was then arrested. At that point, Moran searched the tree, where he found eighteen plastic bags containing crack cocaine hidden inside a cigar wrapper. No drugs were recovered from Scott or from the woman, though detectives testified that when they approached her, she put her hand to her mouth as if to swallow *75 something.[3]
Prior to trial, the government informed defense counsel that it planned to ask the following questions during the direct examination of each of the two police witnesses: "About how many times had you seen [Scott] in the past? Had you spoken to him before? And when you had spoken to him before, what was about the longest conversation you had ever had with him?" The government indicated that one detective would say his longest conversation with Scott lasted five minutes, the other twenty. Defense counsel first objected to this line of questioning at a pretrial telephone conference. The district court overruled the objection because "the amount of time that the witnesses would state that they had spoken to Mr. Scott before would not lead someone to conclude that he had ever been arrested, and as we all know, that is something that, unless he takes the stand, we are not getting into his prior record." Defense counsel continued his objection before trial, arguing that the recognition testimony would encourage the jury to speculate about Scott's prior encounters with the police. He stated:
I don't think anyone in the jury box is going to think it was a friendly encounter. I don't think the jurors are going to believe that Mr. Scott was just saying hello and asking about the officers' family life. But, rather, it would be clear that the police engaging in these conversations were doing so in the course of investigative processes, and the jury is going to believe that this is an individual who is known to the police. . . . Again, it's not a matter of the jury just being led to believe that the officers were setting up in a location and watching to see what, if any, activity would be happening in this known drug location, but, rather, that they intentionally stopped because they saw an individual who has been known to them in the past, who they have had encounters with in the past, and therefore they chose to stop and observe him, which would only lead a jury to believe that there is a history of criminal activity on behalf of Mr. Scott, there is a propensity for him to commit crimes or a propensity for them to do something that would warrant police observation.
Defense counsel then suggested that the police testify that they stopped because they knew the area to be drug prone and that they "ha[d] seen [Scott] in the past, with[in] an unspecified period of time," without detailing the number and length of conversations they had had with him. The government argued that such testimony would be untruthful, because the police did in fact stop because they recognized Scott.
The court responded to these arguments by expressing its view that "there is just a limit to how much a defendant with a criminal record can insist that he ought to be portrayed as someone who -- I am trying to think of a good expression -- I think the expression is simon-pure." Defense counsel continued to object, and the court said again, "it is to me inappropriate to pretend that [Scott] has never had any contact with the police before." Defense counsel and the court then had the following exchange:
Mr. Farber: I guess the question is: How is that properly before this jury that he has had prior police contact? If I raise the issue of identification, or lack of familiarity with his face, clearly, I am opening up the door. But if it's a matter of simply the officer --

*76 The Court: Mr. Farber, we have been through this. The government has the burden of proof beyond a reasonable doubt. They are entitled to have their witnesses' identification supported by the fact that they have seen him before.
The district court indicated that it understood the detectives would testify that "they stopped because it is a known drug location," but that they would also be permitted to testify to the fact that they had recognized Scott.
The recognition testimony featured prominently throughout Scott's trial. During the opening statement, the government told the jury that it was "going to hear how two officers were on a routine patrol that day in the Bronx. They saw the defendant, a man they knew and had spoken to before" engage in a drug transaction. As expected, during direct examination, both detectives were questioned as to their prior knowledge of Scott. The relevant testimony of Detective Moran proceeded as follows:
Q: You said Mr. Scott. I want to take a step back. Had you seen Mr. Scott before?
A: Yes, I have.
Q: About how many times had you seen Mr. Scott before you saw him on July 28?
A: About ten times.
Q: Have you ever had a chance to talk with Mr. Scott?
A: Yes, I have.
Q: About how many times have you had a chance to talk with Mr. Scott?
A: About four or five times.
Q: What is the longest amount of time that you have spoken to Mr. Scott?
A: About 20 minutes.
Q: During that 20 minute conversation, how far away were you from Mr. Scott?
A: About 2 feet.
Moran's redirect testimony also included a confirmation of his recognition of Scott. The relevant testimony of Detective Geary proceeded similarly:
Q: What did you observe once you pulled over?
A: Observed a group of males. One of the males that I had recently, that I had known before, a male named William Scott. . . .
Q: Now, was July 28, 2009 the first time you had ever seen Mr. Scott?
A: No. I had seen him approximately five to ten times before in the past.
Q: Before that day, before July 28, had you had occasion to speak with him in the past?
A: Yes, approximately two or three times.
Q: What is the longest amount of time you had ever spoken to him before that day?
A: Approximately five minutes.
Q: How far away from him were you when you had that five minute conversation?
A: Approximately one to two feet away from him.
This was not the last time the jury was exposed to the recognition testimony. It also featured prominently in the prosecution's closing arguments, making up one out of six transcribed pages. The prosecutor told the jury, in part:
Now, why is [the detectives' experience with Scott] significant? Well, common sense tells you why it is significant. If you have ever seen a friend, a coworker, or a neighbor down the street, you watch what they are doing, you know right away that it's them. It's a lot easier to follow the action, to follow what they are doing, if it's someone you recognize, someone you know. And that's *77 important here because the detectives saw not just a drug dealer, but someone they recognized, someone they recognized engage in a drug transaction.
For its part, the defense told the jury in the opening statement that during the trial they would "come to see how an innocent encounter between Mr. Scott and another individual rose to become this case that you're asked to sit on the jury today." In his closing, defense counsel argued that if Scott had actually been selling drugs, he would have run when he was called by the detectives, because by the state's own admission, he knew they were police officers.
In addition to the two police witnesses, a lab technician, who tested the material taken from the tree, and a paralegal from the United States Attorney's Office, who transcribed the drug-related text messages, testified. The jury found Scott guilty and the district court sentenced him to 21 months' imprisonment. This appeal followed.

DISCUSSION
Scott makes two major contentions on appeal: first, that the district court abused its discretion in admitting the evidence under Rule 404(b), and second, that it abused its discretion by failing to engage in the required Rule 403 analysis. In response, the government argues primarily that the evidence was not Rule 404(b) evidence in the first instance, and in the alternative, even if it was Rule 404(b) evidence, that it was properly admitted.

I. The Nature of the Testimony
At the time of Scott's trial, Rule 404(b) of the Federal Rules of Evidence provided that
Evidence of other crimes, wrongs, or acts is not admissible to prove the character of a person in order to show action in conformity therewith. It may, however, be admissible for other purposes, such as proof of motive, opportunity, intent, preparation, plan, knowledge, identity, or absence of mistake or accident. . . .[4]
The government's primary contention is that the detectives' testimony is not other act evidence within the meaning of Rule 404(b). While we agree that not all other acts are subject to Rule 404(b), the cases the government cites in support of the argument that the recognition testimony is not so subject are simply inapposite, standing as they do only for the unadvanced proposition that where other acts are direct evidence of or are otherwise inextricably intertwined with the charged act, then they may be admissible without being subject to Rule 404(b). See United States v. Quinones, 511 F.3d 289, 309 (2d Cir.2007); United States v. Baez, 349 F.3d 90, 94 (2d Cir.2003).
The only colorable argument the government presents on this point arises from our holding in United States v. Lumpkin, 192 F.3d 280 (2d Cir.1999). In Lumpkin, we held that a police officer's testimony that he "often saw [the defendant] in the area where the relevant drug *78 transactions occurred" was not other acts evidence subject to Rule 404(b). Id. at 287. We determined that the nature of the repeated observations did not "qualify as evidence of a crime or bad act," because "nothing in [the officer's] observations indicate[d] that [the defendant] [wa]s of bad character." Id. The government argues that the recognition testimony at issue here is analogous to that in Lumpkin, because the detectives did not testify that Scott was "engaging in criminal or wrongful conduct." Id. The problem with such an argument is that Rule 404(b) is not limited to evidence of crimes or wrongs. By its very terms, Rule 404(b) addresses "other crimes, wrongs, or acts." (emphasis added). Nothing about these words implies that the "other . . . acts" to which Rule 404(b) refers must be "bad." Indeed, to read the Rule as such "violate[s] the cardinal principle of statutory interpretation that courts must `give effect, if possible, to every clause and word of a statute.'" Triestman v. United States, 124 F.3d 361, 375 (2d Cir.1997) (quoting United States v. Menasche, 348 U.S. 528, 538-39, 75 S.Ct. 513, 99 L.Ed. 615 (1955)). While crimes, wrongs, or bad acts may be more likely than other kinds of acts to demonstrate criminal propensity and thus be inadmissible for that reason under Rule 404(b), the Rule itself is in no sense limited to such acts. Each of our sister Circuits to consider the issue has concluded that Rule 404(b) extends to non-criminal acts or wrongs,[5] and we now join them.
The district court's determination that the testimony was admissible because the testimony would not lead jurors to conclude Scott had previously been arrested was thus in error. Even if the jury would not reach that conclusion based on the recognition testimony, unlike the testimony in Lumpkin, Geary and Moran's prior contacts with Scott would certainly "bear adversely on the jury's judgment of his character." United States v. Cooper, 577 F.2d 1079, 1088 (6th Cir.1978). The difference between a police officer's mere observations of a defendant in an area and testimony that two different detectives had had occasion to speak to him up to five times and for up to twenty minutes (and on at least some different occasions) is substantial. That a police officer has merely seen a person, even repeatedly and *79 even in a drug-prone location, may simply suggest that the person lives or works near the officer's daily patrol; in that sense, it is not evidence of any "act" at all. But that two detectives have not only seen but spoken on multiple, lengthy occasions to a defendant indicates to a jury that he is, at a minimum, the sort of person who warrants a level of police observation to which law-abiding citizens are unaccustomed. As defense counsel stated during his initial objection, a jury hearing this testimony would not believe that Scott "was just saying hello and asking about the officer's family life." That is especially true since no such innocent explanation for Scott's significant contacts with the police could be offered. A jury hearing this testimony would conclude that Scott was a person with a propensity to engage in wrongful, criminal or otherwise unusual behavior that would attract the attention of the police, and not, as in Lumpkin, merely a person who had been seen by a police officer at some point in his life. 
This testimony falls well within Rule 404(b), which prohibits the introduction of evidence of extrinsic acts that might adversely reflect on the actor's character. Huddleston v. United States, 485 U.S. 681, 685, 108 S.Ct. 1496, 99 L.Ed.2d 771 (1988). The testimony here invites speculation about the defendant's propensity to come into frequent contact with police. Because this testimony might adversely reflect on the actor's character, the evidence must be evaluated under 404(b). To the extent the district court's decision rested on its determination that the testimony must relate to criminal conduct leading to arrest to fall under the Rule, it was in error.

II. The Admissibility of the Evidence under Rule 404(b)
Having determined that the recognition testimony is properly analyzed under Rule 404(b), we next analyze whether it was properly admitted. This Circuit has adopted an "inclusionary" approach to other act evidence under Rule 404(b), which allows such evidence to be admitted for any purpose other than to demonstrate criminal propensity. United States v. LaFlam, 369 F.3d 153, 156 (2d Cir.2004) (per curiam). We have, however, emphasized that this inclusionary rule is not a carte blanche to admit prejudicial extrinsic act evidence when, as here, it is offered to prove propensity. See, e.g., United States v. McCallum, 584 F.3d 471, 477 (2d Cir. 2009) (holding that evidence of prior convictions was "propensity evidence in sheep's clothing"). We review the district court's determination of admissibility under Rule 404(b) only for abuse of discretion. United States v. Brand, 467 F.3d 179, 196 (2d Cir.2006).
In conducting this review, we follow the inquiry laid out by the Supreme Court in Huddleston, 485 U.S. at 691-92, 108 S.Ct. 1496. See also United States v. Gilan, 967 F.2d 776, 780 (2d Cir.1992) (discussing adoption of Huddleston test). Under Huddleston, "[t]o determine whether a district court properly admitted other act evidence, the reviewing court considers whether (1) it was offered for a proper purpose; (2) it was relevant to a material issue in dispute; (3) its probative value is substantially outweighed by its prejudicial effect; and (4) the trial court gave an appropriate limiting instruction to the jury if so requested by the defendant." LaFlam, 369 F.3d at 156. No limiting instruction was requested here, and so we limit our analysis to the first three factors.

1. Whether the Evidence was Offered for a Proper Purpose
The first Huddleston inquiry is whether the evidence was offered for a proper purpose. The government argues there are *80 two proper purposes justifying the admission of the evidence: first, that it was relevant to prove identity, and second, that it supported the "reliability and the credibility" of the detectives' testimony, in part by corroborating it. Identity is conceded by both parties as a proper purpose under Rule 404(b), but whether the testimony could properly be offered to support reliability and credibility is disputed.
The government contends that "[t]he fact that the Detectives previously knew Scott spoke loudly to the reliability and credibility of their observations, as the jury could infer that the Detectives, knowing who they were watching, were able to focus more closely on what they were watching." This theory was also advanced during the government's summation:
If you have ever seen a friend, a coworker, or a neighbor down the street, you watch what they are doing, you know right away that it's them. It's a lot easier to follow the action, to follow what they are doing, if it's someone you recognize, someone you know.
We find this argument to be entirely unpersuasive. First, Scott's identity was not in dispute, see infra, and second, there is no evidence in this record even remotely suggesting that an officer watching what he believes to be a hand-to-hand drug sale would generally have any reason to focus extensively on the identity of the perpetrator rather than the action itself. An officer on patrol who believes he is witnessing a crime and that an arrest is imminent has no immediate need to identify the suspect. Detective Moran admitted as much at trial when he testified that he and Detective Geary did not test the drugs recovered from the tree for fingerprints or for DNA "[b]ecause on an observation sale of narcotics, when you see the crime occur in your presence, there's no reason to get fingerprints, because I already know who committed the crime." (emphasis added). We reject the notion that freeing the detectives' brains from a task we have no reason to believe they were engaged in made their observations somehow more credible or reliable.
In addition to the lack of rationale arising out of police procedure, neither does the government proffer any credible explanation sounding in common sense or science about why a person is better able "to follow the action" or to physically see an event because he is familiar with a person involved. There is only one reason a person is better able to tell what someone he knows is doing, which is that he knows what someone he is familiar with is likely to be doing. The government's summation underscored this when the prosecutor stated, "[the recognition is] important here because the detectives saw not just a drug dealer, but someone they recognized, someone they recognized engage in a drug transaction." (emphasis added). It may be true that the detectives' knowledge of Scott made them more reliable because it helped them to identify as a drug transaction what might have otherwise been ambiguous, but that fact goes to criminal propensity alone, precisely what Rule 404(b) prohibits. Even our "`inclusionary approach,'" LaFlam, 369 F.3d at 156 (2d Cir.2004), to Rule 404(b) evidence cannot support the admission of such propensity evidence.
Nor can we say the recognition testimony bolstered the credibility of the detectives by explaining their actions, something our cases support as a proper purpose for other act evidence. In United States v. Bermudez, we held admissible an officer's testimony that he overheard the defendant make drug-related comments because it explained "why the officers' attention was focused on [the defendant] as opposed to any number of *81 other individuals in the high crime area." 529 F.3d 158, 163 (2d Cir.2008). We determined that without this explanation for "singling out" the defendant, the officers' entire testimony would be suspect. Id. There is no such rationale here. The detectives testified that they were on patrol in a drug-prone location when they witnessed a hand-to-hand drug sale. Detective Moran further testified that the area "was pretty desolate" except for the group surrounding Scott at the time of the stop. The jury needed to hear no further explanation for why the detectives were drawn to the scene.
In the same vein, the government also argues that the recognition testimony corroborated the detectives' other testimony. We have "consistently held [other act evidence] admissible to corroborate crucial prosecution testimony." United States v. Everett, 825 F.2d 658, 660 (2d Cir.1987). But this allowance for corroboration is not unlimited: "[T]o avoid potential prosecutorial abuse, we have required the proponent of the evidence to demonstrate a close relationship between the proffered evidence and the evidence to be corroborated." Id. The corroboration must also be "direct and the matter corroborated ... significant." Id. (internal quotation mark omitted). Here, the government has presented no explanation for what exactly was corroborated by the fact that the detectives had seen Scott before, much less demonstrated the required close relationship. We find no disputed fact that the recognition testimony could possibly have corroborated. It cannot corroborate the detectives' accounts of the crime itself, because it is not relevant to them. It cannot even serve, more generally, to corroborate either detective's testimony by suggesting overall truthfulness since the two accounts of police interactions do not match up (nor is there a reason for them to). Even if we were to accept the government's argument that under our case law, it is acceptable to introduce other act evidence to "corroborate... overall testimony" (which is not at all what Everett teaches), there is no such corroboration here, except to the impermissible extent it suggests that, since Scott had been up to no good before, the detectives were right to think that he was up to no good again. "The Government ... must do more than disclaim an intention of proving that the defendant is a bad man. For what prosecutor in his right mind will ever offer that improper justification? Instead, the prosecutor must show that the evidence is relevant, and there is no presumption that it is." United States v. O'Connor, 580 F.2d 38, 40 (2d Cir.1978). The government here has failed to show how the recognition testimony was relevant to corroborating the detectives' other testimony. We therefore reject this suggested purpose.

2. Whether the Testimony was Relevant to an Issue in Dispute
The only purpose proposed by the government that we can credit at all is that the recognition testimony was relevant to identity. But relevance is not the end of the inquiry: evidence admitted under 404(b) must be relevant to an issue in dispute. LaFlam, 369 F.3d at 156. While the government itself admits that "identity was not disputed per se," it contends that Scott's failure to concede identity meant that it was sufficiently in dispute for the recognition testimony to be relevant and admissible. The record belies such an argument, however. Identity was not only not in dispute during the trial; it was also clear to the government and to the court that it would not be beforehand. We have held that a formal stipulation removing an issue from a case, while preferable, is not necessary:

*82 Whether an issue remains sufficiently in dispute for similar acts evidence to be material and hence admissible, unless the prejudicial effect of the evidence substantially outweighs its probative value, depends not on the form of words used by counsel but on the consequences that the trial court may properly attach to those words. When the Government offers prior act evidence to prove an issue, counsel must express a decision not to dispute that issue with sufficient clarity that the trial court will be justified (a) in sustaining objection to any subsequent cross-examination or jury argument that seeks to raise the issue and (b) in charging the jury that if they find all the other elements established beyond a reasonable doubt, they can resolve the issue against the defendant because it is not disputed. While those consequences can be attached to a formal stipulation that the issue has been conceded, a formal stipulation is not required or necessarily appropriate.

United States v. Figueroa, 618 F.2d 934, 942 (2d Cir.1980) (emphasis added) (citation omitted). Here, Scott's counsel did not explicitly concede identity, but he was not required to in order to remove identity from the case. The record indicates that defense counsel attempted to inform the court that he would not be challenging identity at trial, as indeed he did not.
Mr. Farber: I guess the question is: How is that properly before this jury that he has had prior police contact? If I raise the issue of identification, or lack of familiarity with his face, clearly I am opening up the door. But if it's a matter of simply the officer --
The Court: Mr. Farber, we have been through this. The government has the burden of proof beyond a reasonable doubt. They are entitled to have their witnesses' identification supported by the fact that they have seen him before.
The defense might well have stipulated to identity if the district court had permitted further discussion. See United States v. Colon, 880 F.2d 650, 659 (2d Cir.1989) ("[U]pon consideration of the record and of counsel's argument before us, we believe that [counsel] was endeavoring to remove intent from the case, and that he might well have agreed to an explicit stipulation if the district court had permitted further examination of the issue after the change in defense theory."). That under these circumstances an express stipulation from counsel was not forthcoming is not surprising, especially in light of the court's statements that it did not believe Scott could expect to be "portrayed as someone who... is simon-pure." These statements would have led any reasonable defense counsel to believe that the court would not hear a stipulation as to identity even if offered.
Even more crucially, defense counsel conceded in his opening statement not only that his client was present, but also that it was, in fact, his client who was engaged in the behavior that led police to believe they had observed a drug sale. Scott's defense was not predicated on the theory that the detectives witnessed a drug sale committed by someone other than Scott. Instead, the defense was that detectives witnessed Scott himself engaged in innocent behavior. This is a crucial difference, and one that we have recognized in our case law in analogous circumstances. See Colon, 880 F.2d at 657 (2d Cir.1989) ("Our cases have thus recognized a distinction between defense theories that claim that the defendant did not do the charged act at all, and those that claim that the defendant did the act innocently or mistakenly, with only the latter truly raising a disputed issue of intent."); United States v. Ortiz, 857 F.2d 900, 904 (2d Cir.1988) ("Moreover, intent is *83 not placed in issue by a defense that the defendant did not do the charged act at all. When a defendant unequivocally relies on such a defense, evidence of other acts is not admissible for the purpose of proving intent." (internal citations omitted)). "In some circumstances the very nature of a defense put forward by the defendant may itself remove an issue from a case." United States v. Tarricone, 996 F.2d 1414, 1421 (2d Cir.1993). This was such a circumstance. By telling the jury that they would "come to see how an innocent encounter between Mr. Scott and another individual rose to become this case that you're asked to sit on the jury today," the issue of identity was removed from the case with sufficient clarity so as to meet Figueroa's demands.
Ultimately, nothing in the defense case, from opening to close, even remotely raised the issue of identity. The government suggests that appellant might have raised the issue of identity at summation, too late to be rebutted, but such an argument cannot be credited under the facts of this case. If identity was in dispute here, it is hard to imagine the case in which it would not be in dispute. Where the defense has attempted to concede identity pretrial, opened on a theory completely inconsistent with misidentification, and presented a case that went only to the fact of a crime itself and not the identity of the perpetrator, a prosecutor's fear that an inconsistent theory will be advanced at closing cannot justify the admission of highly prejudicial other act evidence.
Finally, even if we were to credit the possibility that Scott and his counsel might have, in the end, pulled a fast one, this would only justify the admission of testimony that detectives had seen Scott in the past, not that they had spoken to him repeatedly and at some length. See Lumpkin, 192 F.3d at 287. The government provides no explanation, and we can find none, for why testimony that the detectives recognized him simply from having seen him so many times before would not have sufficed to support their identification. Officers who could testify that they had seen a defendant up to twenty times before would not need to further confirm their ability to identify him by testifying about the number and length of conversations they had had with him, especially where a challenge to identity was so obviously disclaimed by the defense. We see no reason to have admitted the testimony at all, but even accepting that it may have been necessary, it was not necessary to have gone so far.
Identity was not in dispute in this case. There was, accordingly, no proper purpose for the recognition testimony.

3. Whether the Testimony's Probative Value was Outweighed by its Prejudicial Effect
Even assuming the relevance of the recognition testimony, evidence admitted under Rule 404(b) must not have its probative value substantially outweighed by its prejudicial effect. Huddleston, 485 U.S. at 691, 108 S.Ct. 1496. This prong of Huddleston is essentially an importation of Rule 403's balancing test, and so we address Scott's Rule 403 argument along with it. See United States v. Gilan, 967 F.2d 776, 780 (2d Cir.1992) (under Huddleston, "the evidence must satisfy the probative-prejudice balancing test of Rule 403"). Under Rule 403, "[a]lthough relevant, evidence may be excluded if its probative value is substantially outweighed by the danger of unfair prejudice." We accord "great deference" to a district court's assessment of this prejudice, Quinones, 511 F.3d at 310, "second-guess[ing] a district court only if there is a clear showing that the court abused its discretion or acted *84 arbitrarily or irrationally," United States v. Salameh, 152 F.3d 88, 110 (2d Cir.1998) (internal quotation marks omitted).
While we have held that "a mechanical recitation of the Rule 403 analysis is not required," United States v. Pitre, 960 F.2d 1112, 1120 (2d Cir.1992), "[t]o avoid acting arbitrarily, the district court must make a `conscientious assessment' of whether unfair prejudice substantially outweighs probative value," Salameh, 152 F.3d at 110. Scott argues that the district court did not engage in the analysis required by Rule 403. We are compelled to agree. The district court's only inquiry into prejudice was its suggestion that because the testimony would not lead the jury to conclude Scott had been previously arrested, it was admissible. As an initial matter, as we have discussed supra Part I, Rule 404(b) does not encompass only criminal or wrongful acts incident to arrest, but all other crimes, wrongs, or acts relating to a person's character. Even a juror who did not infer that Scott had been arrested before from the recognition testimony would certainly take from it the implication that he had had substantial contact with the police that was not benign. The district court's view of the evidence as not having a prejudicial effect was thus too narrow.
Moreover, outside of this single comment, there is nothing in the record to suggest that the district court fully engaged in the required analysis, a fact corroborated by its repeated erroneous statements about the limited right of a defendant not to have evidence of his prior bad acts admitted against him. Even the government tries to distance itself from these statements, because they suggest the evidence was admitted precisely for the improper purpose of making it known that Scott had a criminal propensity.
We simply find no probative value to this testimony, and what little probative value this testimony may have had was substantially outweighed by the risk of unfair prejudice. The government's main argument on this point appears to be that there were more prejudicial facts they might have tried to raise against Scott at trial. But that other evidence might have been more prejudicial does nothing to support the proposition that the recognition testimony was not. Jurors would have drawn from it the conclusion that Scott had previously had significant contact with the police. Moreover, they were told this contact helped the detectives identify what he was doing, thus implying that they knew what he was likely to be doing, in this case dealing drugs. It stretches credulity to think that a jury would assume that the defendant's lengthy and numerous contacts with the police were not, in some sense, related to his bad character and criminal propensity, even if evidence of these contacts did not lead that same jury to conclude that he had been previously arrested. This is the essence of prejudice. In other cases in which identity is actually in dispute, that prejudice may be outweighed by the evidence's probative value, and the evidence may be admissible. But here, given that the only proper purpose for the testimony was an issue not in dispute, we have no trouble concluding that the potential prejudice of this testimony outweighed any conceivable probative value.
Scott did not request a limiting instruction. Save for that, all of the Huddleston inquiries fall in his favor. The testimony was either not relevant to a proper purpose or not relevant to an issue sufficiently in dispute. Nor did its probative value remotely approach outweighing the possible prejudicial effect. The district court's admission of the testimony under Rule *85 404(b) was thus an abuse of discretion, as was its failure to conduct the required Rule 403 analysis.

III. Harmless Error
The government concedes in a footnote that if it was error to admit the testimony, the error would not be harmless. We agree. In determining whether an evidentiary error was harmless, we consider "(1) the overall strength of the prosecution's case; (2) the prosecutor's conduct with respect to the improperly admitted evidence; (3) the importance of the wrongly admitted testimony; and (4) whether such evidence was cumulative of other properly admitted evidence." United States v. Kaplan, 490 F.3d 110, 123 (2d Cir.2007) (internal quotation marks omitted). Here, the first and third prongs, the importance of the wrongly admitted testimony and the strength of the prosecution's case, which we have indicated "is probably the single most critical factor," United States v. Lombardozzi, 491 F.3d 61, 76 (2d Cir.2007) (internal quotation marks omitted), are sufficient to support a finding that the error in admitting the testimony was not harmless.
The prosecution's case here was not particularly strong. While we recognize that the evidence at trial was substantially composed of eyewitness testimony by police officers, it was, as Scott notes, testimony based on observation "through the patrol car's tinted windows, from approximately 55 feet away, and using only the side-view or rear mirrors." There was also testimony that might have strained a juror's credulity: that the transaction took place in broad daylight, that no one appeared to be acting as a lookout, and that, despite the fact that the police "usually get spotted pretty quickly," no one noticed the tinted-window car stopped at the intersection. Additionally, no drugs were found either on Scott or on the woman to whom he was alleged to have sold.
As to the importance of the testimony, we note, as does Scott, that approximately one out of six typed pages of the prosecution's closing was devoted to the fact that Scott was known to the detectives. The recognition testimony was also referenced in the opening statement. See Hynes v. Coughlin, 79 F.3d 285, 291 (2d Cir.1996) ("Another barometer [for the importance of wrongly admitted Rule 404(b) evidence] is whether or not the evidence was emphasized in arguments to the jury."). Moreover, by suggesting that the officers were better able to tell what Scott was doing because they knew him, the prosecutor was directly (albeit perhaps unintentionally) suggesting that this was so because they knew what he was likely to be doing, which the testimony very plainly indicated to the jury was something unlawful.
"An error in the admission of evidence may be deemed harmless only if it is highly probable that the error did not contribute to the verdict." United States v. Jean-Baptiste, 166 F.3d 102, 108 (2d Cir. 1999) (internal quotation marks omitted). Given the weaknesses in the prosecution's case and the importance of the testimony, we cannot say it is highly probable that the error did not contribute to the verdict. A jury could easily have believed Scott's version of the facts (that is, that the distance, the tinted windows, and the view through the mirrors meant that the detectives could not be certain about what had transpired based on their sight alone) and yet convicted because of the recognition testimony, finding that even if the detectives could not see exactly what was happening, they knew well enough what Scott's behavior meant because of their prior contacts with him. This error in admitting the recognition testimony was not harmless.


*86 CONCLUSION
Accordingly, for the foregoing reasons and consistent with our November 9, 2011 order, we VACATE Scott's conviction and REMAND for new trial.
NOTES
[1]  Judge Roger J. Miner was an original member of the panel that heard oral argument on November 8, 2011. He died on February 18, 2012, after having voted to vacate and remand Scott's conviction in accordance with this opinion.
[2]  After oral argument and upon due consideration of the record, we issued an order, filed November 9, 2011, vacating Scott's conviction and remanding for a new trial, and noting that an opinion would follow in due course.
[3]  A cell phone which contained a forwarded message advertising drugs was also recovered from Scott.
[4]  The language of Rule 404(b) has since been amended to read "[e]vidence of a crime, wrong, or other act is not admissible to prove a person's character in order to show that on a particular occasion the person acted in accordance with the character. . . . This evidence may be admissible for another purpose, such as proving motive, opportunity, intent, preparation, plan, knowledge, identity, absence of mistake, or lack of accident." The Advisory Committee Notes for this change indicate that the change was "intended to be stylistic only. There is no intent to change any result in any ruling on evidence admissibility." Fed.R.Evid. 404 advisory committee's note. Our analysis would thus be identical under either version of the Rule.
[5]  See United States v. Devin, 918 F.2d 280, 286 (1st Cir.1990) (holding the "disjunctive terminology [of Rule 404(b)] shows unmistakably that [it] reaches conduct which is neither criminal nor unlawful so long as the conduct is probative of, and revelatory as to, a permitted purpose"); United States v. Rawle, 845 F.2d 1244, 1247 (4th Cir.1988) ("To fall within the scope of Rule 404(b), an act need not be criminal, so long as it tends to impugn a defendant's character."); United States v. Kendall, 766 F.2d 1426, 1436 n. 5 (10th Cir. 1985) ("To fall within the scope of 404(b), an act need not be criminal, so long as it tends to impugn a defendant's character."); United States v. Terebecki, 692 F.2d 1345, 1348 n. 2 (11th Cir.1982) (Rule 404(b) "includes non-criminal activity that `impugns the defendant's character'"); United States v. Miller, 573 F.2d 388, 392 (7th Cir. 1978) (holding "the structure and language of . . . Rule [404(b)] indicate that it includes conduct that is neither criminal nor wrongful"); United States v. Cooper, 577 F.2d 1079, 1087-88 (6th Cir. 1978) (holding that other acts evidence is not "limited only to evidence of other crimes, for its own language also speaks of `wrongs or acts.' Conceivably within the broad language of the rule is any conduct of the defendant which may bear adversely on the jury's judgment of his character."); United States v. Beechum, 582 F.2d 898, 914 n. 17 (5th Cir. 1978) (Rule 404 "include[s] noncriminal activity that impugns the defendant's character," because while "[t]he danger of a jury's reprisal for unpunished extrinsic activity is likely to be less when that activity is not of a criminal nature but merely `bad[,]' [t]he trial judge should recognize . . . that the conscience of a jury does not always coincide with the perimeters of criminality.").
