Alex Wong Cian Yih commented on:
Let’s cut to the chase: the AI world is buzzing about DeepSeek, a Chinese startup that claims to break new ground with two models: DeepSeek-V3 (released on December 26, 2024) and DeepSeek-R1 (released on January 20, 2025). Both tout spectacular innovations—pure reinforcement learning, multi-head latent attention, mixture-of-experts, resource optimization—and remarkably low training costs. They’ve managed to s...
![Data Parasitism and the “Innovations” of DeepSeek: When Borrowing Becomes a Breakthrough](https://sgsnsimg.moomoo.com/sns_client_feed/102493980/20250131/1738337180891-random6920-102493980-android-compress.jpg/thumb?area=101&is_public=true)
The debate surrounding U.S. export controls on advanced AI chips to China is getting louder, with officials touting it as a critical measure to safeguard national security and “maintain America’s technological lead.” But is choking off hardware really the most effective way to rein in the competition—particularly when new data-parasitic methods, like knowledge distillation, pose an even bigger threat to proprietary AI models? Let’s break it...
It’s been a wild year in AI, and the latest jaw-dropper comes from a scrappy contender called DeepSeek. On the surface, they’re just another upstart touting big ambitions—“faster performance,” “open-source glory,” “revolutionizing machine learning,” all the usual slogans. But dig deeper, and you’ll find the real controversy fueling their rise: accusations of “data parasitism.” Some say they’re sly geniuses exploiting the system;...
The AI world loves a splashy headline, and boy, did DeepSeek deliver. With its recent open-source release—boasting a rumored cost of just a few million dollars—DeepSeek sent ripples (or tsunamis, depending on who you ask) through the industry. Supporters are hailing it as a “low-cost revolution” in AI model training. Critics, however, are calling it a blatant case of “data parasitism,” accusing DeepSeek of piggybacking on the fru...
Alex Wong Cian Yih commented on:
The tech world is buzzing, and quantum computing has taken center stage in the conversation. Much like the crypto frenzy of previous years, we’re now seeing a new wave of hype—this time around companies like IonQ, Rigetti, and D-Wave. Their stock prices have soared by hundreds of percentage points, fueled by speculative enthusiasm and sensational headlines.
---
The Quantum Computing Phenomenon: How Did We Get Here?
Quantum comp...
In the wild world of quantum computing, Rigetti Computing (NASDAQ: RGTI) has become the latest object of extreme investor fascination. The stock has skyrocketed by bewildering percentages—be it 725% in a single month or 1024.51% over the past year, depending on which window of time you cherry-pick. If that doesn’t scream “volatile,” I don’t know what does. But with all the buzz about qubits and supercooled circuits, does RGTI ge...
Back in 2019, Google stunned the tech world with a claim of “quantum supremacy.” Its then-new quantum processor reportedly handled a specialized calculation faster than any known supercomputer could. Yet, years later, quantum computers remain mostly confined to research labs. And now, with Google’s latest Willow chip, many are once again asking if we’re on the cusp of a quantum revolution—o...
Alex Wong Cian Yih commented on:
The cryptocurrency market has always been a breeding ground for dramatic narratives, and XRP’s recent surge has been no exception. Within just a few days, XRP has surpassed Tether (USDT) to claim the title of the third-largest cryptocurrency by market capitalization, trailing only behind Bitcoin and Ethereum. My WhatsApp conversations with a friend over the past two days have revolved almost entirely around this topic. He’s not only thrilled ab...
Over the past two weeks, Ripple’s XRP has been on a meteoric rise, sparking excitement and even euphoria among investors. A friend of mine, who recently invested $500 into XRP, has been sharing daily updates with me on WhatsApp. Every day, he sends screenshots of XRP’s gains, celebrating how much it has surged. While I’m happy for his success, I can’t help but feel a bit worried for him—and for others who might be caught up in the curr...
Alex Wong Cian Yih commented on:
In the ever-evolving landscape of technology investments, ASML Holding N.V. (ASML) and Qualcomm Incorporated (QCOM) stand out as pivotal players in the semiconductor industry. Both companies have demonstrated resilience and innovation, making them attractive considerations for long-term investors. This analysis delves into their current valuations, financial health, and growth prospects to assess whether now is an opportune...
Alex Wong Cian Yih OP Deltaman099 : I understand your perspective, and I think it’s an interesting point. But let me ask you: if you were running OpenAI and had invested billions of dollars in training models using massive compute resources and human feedback, only to see a competitor use your model’s outputs to build their own product at just 1/20th of your cost—how would you feel? Would you still be motivated to invest in groundbreaking innovation, knowing your efforts could be leveraged by others so easily? I’m genuinely curious to hear how you’d approach this situation if the roles were reversed.
Alex Wong Cian Yih OP lousyimpressario : I see your point, but there’s a fundamental difference between drawing inspiration from public knowledge and directly leveraging a competitor’s proprietary outputs to train a competing model.
OpenAI’s API policy explicitly states that its outputs cannot be used as training data to develop competing AI models. This is clearly outlined in their Terms and Conditions (T&C), and DeepSeek has blatantly ignored this rule. OpenAI’s API-generated responses are not just raw data scraped from the internet—they are already fully processed, cleaned, and structured high-quality outputs, carefully curated through human feedback and extensive model training.
What this means is that DeepSeek didn’t have to go through the same rigorous and expensive data processing steps as OpenAI, such as collecting raw internet data, performing human annotation, conducting reinforcement learning with human feedback (RLHF), and filtering out low-quality responses. Instead, they simply took OpenAI’s high-quality API outputs and fed them back into their own model, effectively short-cutting the most costly and labor-intensive parts of AI training.
This is why, when asked about its origins, DeepSeek’s model has been seen stating that it was trained by OpenAI—a clear indication that it directly absorbed OpenAI’s API responses without meaningful transformation.
Yes, knowledge distillation and model refinement are common in the industry, and many AI companies do use competitive insights to improve their models. However, other companies at least process, refine, and adjust the data before using it, ensuring that their outputs are not direct copies. What DeepSeek has done is take OpenAI’s processed knowledge as-is and repurpose it for its own use without adding real value or originality.
This isn’t just an ethical concern—it’s a direct violation of OpenAI’s T&C and an unfair exploitation of their proprietary research. If this practice were normalized, it would undermine the incentives for any company to invest in real AI innovation, as any competitor could simply take the end product and reuse it for free. OpenAI, and any other serious AI developer, would have little reason to continue advancing the field if their outputs could be so easily repurposed by rivals.
So, this is not just about ‘inspiration’ or ‘competition’—it’s a clear-cut case of unauthorized exploitation of another company’s intellectual property, and it sets a dangerous precedent for the future of AI development.
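For readers who want to see what the shortcut described in that comment actually looks like mechanically, here is a minimal, hypothetical sketch of sequence-level knowledge distillation: a small "student" model is trained directly on text a stronger "teacher" model has already generated (for example, responses collected from an API). The model, data, and hyperparameters below are toy stand-ins for illustration only, not DeepSeek's or OpenAI's actual setup.

```python
# Sketch only: sequence-level distillation on a toy dataset (requires PyTorch).
import torch
import torch.nn as nn

torch.manual_seed(0)

VOCAB = 100   # toy vocabulary size
SEQ_LEN = 16  # toy sequence length

# Stand-in for (prompt + teacher-response) token sequences collected from a
# teacher model's API. In real sequence-level distillation this corpus replaces
# the expensive raw-data collection, human annotation, and RLHF stages.
teacher_responses = torch.randint(0, VOCAB, (256, SEQ_LEN))

class TinyStudent(nn.Module):
    """A deliberately small language model trained to imitate the teacher."""
    def __init__(self, vocab: int, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)  # next-token logits

student = TinyStudent(VOCAB)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):
    for batch in teacher_responses.split(32):
        inputs, targets = batch[:, :-1], batch[:, 1:]  # teacher tokens are the labels
        logits = student(inputs)
        loss = loss_fn(logits.reshape(-1, VOCAB), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```

The point of the sketch is what is absent: none of the costly stages named above (raw-data collection, human annotation, RLHF, quality filtering) appear anywhere, because the student only ever sees the teacher's finished outputs.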
Alex Wong Cian Yih OP lousyimpressario : I see what you’re saying, but I think your argument overlooks some critical distinctions. Innovation is indeed about building on prior knowledge, but there’s a major difference between drawing from public data and directly leveraging a competitor’s proprietary outputs.
Your chef analogy doesn’t work because OpenAI’s API responses aren’t just ‘flavors to be tasted’—they’re more like a secret recipe that took billions to perfect. DeepSeek isn’t just drawing inspiration from OpenAI; they’re accused of directly training on its proprietary outputs, which OpenAI’s Terms & Conditions explicitly prohibit.
You also assume this won’t impact AI investment, but that’s an overly optimistic view. If major AI developers can’t protect their research, they will have little incentive to keep pushing boundaries. Why spend billions when a competitor can just extract the final results and skip the hardest steps?
Fair competition should be about genuine innovation, not bypassing a competitor’s most difficult and costly R&D efforts. That’s the real issue here.
Alex Wong Cian Yih OP Deltaman099 : I find it interesting that you immediately default to 'just take it to court' when we are simply discussing an industry-wide issue. I’m not OpenAI, nor am I an investor in OpenAI—so why would legal action be my concern? What we are debating here isn’t about filing lawsuits; it’s about ethics, fair competition, and the impact on AI innovation as a whole.
Legal action isn’t the only way to discuss ethical concerns. Even before lawsuits happen, unfair practices can impact the AI industry and competitive landscape. The real question isn’t whether DeepSeek optimized their model architecture, but whether they followed OpenAI’s terms of service when acquiring training data.
It doesn’t matter how well a model performs if its foundation is built on violating another company’s policies. Innovation should be fair, and genuine breakthroughs should come from original work—not from repurposing a competitor’s proprietary outputs without permission.
If we can’t even discuss these concerns rationally, then by your logic, we shouldn’t question any potential violations in any industry unless a court case is filed. That’s not how ethical discussions work.
Alex Wong Cian Yih OP Deltaman099 : Since DeepSeek has open-sourced its model under the MIT license, OpenAI can, in theory, apply knowledge distillation techniques to improve its own models using DeepSeek’s outputs. MIT licensing allows commercial and unrestricted use, meaning OpenAI could legally extract insights from DeepSeek’s model to enhance its Chinese-language capabilities.
Given that DeepSeek’s advancements in Chinese comprehension, grammar, and semantics are notably strong, leveraging its outputs could significantly enhance OpenAI’s own models in this area. This is a practical example of how open-source fosters cross-learning—just as DeepSeek might have built upon existing AI advancements, OpenAI now has the same opportunity to do so in return.
With DeepSeek’s model available to everyone, it becomes part of the broader AI ecosystem, allowing improvements to flow in multiple directions. This is how technological progress works when models are shared openly.
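As a companion to the earlier sketch, here is a minimal, hypothetical illustration of the reverse direction described in this comment: because an MIT-licensed release ships with open weights, a distiller can match the teacher's full output distribution (logit-level distillation) rather than only imitating sampled text. Both models here are tiny stand-ins; the sizes and temperature value are illustrative assumptions, not anyone's actual configuration.

```python
# Sketch only: logit-level distillation from an open-weights teacher (requires PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

VOCAB = 100   # toy vocabulary size
SEQ_LEN = 16  # toy sequence length

class TinyLM(nn.Module):
    """Toy language model used for both the open-weights teacher and the student."""
    def __init__(self, dim: int):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, VOCAB)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)  # full logits, accessible only because the weights are open

teacher = TinyLM(dim=128).eval()  # frozen stand-in for an open-weights model
student = TinyLM(dim=32)          # smaller student being distilled
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0                 # softens both distributions before comparison

prompts = torch.randint(0, VOCAB, (256, SEQ_LEN))  # stand-in prompt corpus

for batch in prompts.split(32):
    with torch.no_grad():
        teacher_logits = teacher(batch)
    student_logits = student(batch)
    # KL divergence between softened teacher and student distributions; the T^2
    # factor keeps the gradient scale comparable across temperature settings.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final distillation loss: {loss.item():.3f}")
```

Nothing like this is possible against a closed API, which returns only sampled text; that contrast with the first sketch is precisely the asymmetry between distilling from an open-weights release and distilling from a proprietary service.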