https://youtu.be/lcMZIG16l_k
What Trump and Elon Musk’s DOGE’s shutdown of Radio Free Europe means for free speech | Focus on Europe (DW - German News in English VIDEO) #Ukraine #Mastodon #BoycottTesla #BoycottMusk #BoycottX #Musk #ElonMusk #Tesla #RFE #RL #OSCE #PACE #Germany #France #NukesForUkraine #SouthKorea #Japan #Press #Taiwan #Media #NukesOrNATO #USA #US #UK #EU #NATO #News #UnitedStates #EuropeanUnion #UnitedKingdom #russiaUkraineWar #11yrInvasionofUkraine #RussiaIsATerroristState
Here is the #Trump administration following court orders regarding funding of #RadioFreeEurope #RFE #RL
Leyla Latypova - RL & RFE Closure Silences Indigenous Voices Struggling Against Russian Imperialism.
https://www.youtube.com/watch?v=KdMawwzGG08
Apart from destroying democracy, #trump and #musk behave like #ISIS destroying historical statues (the archives).
#LeylaLatypova #Latypova #Tatarstan #Bashkortostan #Chechnya
#RL #RadioLiberty #RFE #RadioFreeEurope
#ruSSia #racist #fascist #kleptocrat #maffia #terror
RingRing!!! Monday is Medienmagazin podcast day. Here is the fresh episode. This time very political and technical.
#RFE #RL #VoA #RadioFarda #DW #DAB+ #5GBroadcast
Ukraine Daily summary - Sunday, March 23 2025
Russia uses propaganda narratives to undermine peace talks, shift blame to Ukraine, ISW says -- 'They are Russian-speaking, and there have been referendums'; US's Witkoff parrots Russian propaganda -- Putin's new decree part of plan to forcibly Russify Ukrainians, UK intelligence says -- Italy suspends Starlink purchase negotiations with SpaceX amid Musk controversy -- and more
https://writeworks.uk/~/UkraineDaily/Ukraine%20Daily%20summary%20-%20%20Sunday,%20March%2023%202025/
Ukraine Daily summary - Thursday, March 19 2025
US scales down efforts in countering Russian sabotage -- Russian-occupied Zaporizhzhia Nuclear Plant in focus of Ukraine peace talks. What's at stake -- Trump Jr., Witkoff, Carlson involved in secret talks with Zelensky's rivals -- Zelensky confirms new arrival of F-16 jets to Ukraine -- US may relinquish leadership of NATO's European command -- and more
2+2=5
"America destroys one of its own symbols," by #StanislavAseyev
#TimothySnyder #Trump #RFE #RL
https://youtu.be/0HN3IfmOEg8
EU searches for help after Trump cuts hit Radio Free Europe (Reuters News VIDEO) #Ukraine #Mastodon #RFE #RFA #RL #NukesForUkraine #SouthKorea #Press #News #Taiwan #Media #Japan #NukesOrNATO #USA #US #UK #EU #NATO #UnitedStates #UnitedKingdom
#EuropeanUnion #russiaUkraineWar
#11yrInvasionOfUkraine
#RussiaIsATerroristState
Ukraine Daily summary - Sunday, March 16 2025
Russia readying to attack Sumy as Donbas front stabilizes -- 'Putin is lying to everyone' — Zelensky calls for 'strong pressure' on Russia after UK summit -- Duda denounces Russia for 'imperial greed,' reiterates calls to deploy US nuclear weapons in Poland -- Russia attacks Ukraine with 178 drones overnight, targets energy infrastructure -- and more
https://writeworks.uk/~/UkraineDaily/Ukraine%20Daily%20summary%20-%20%20Sunday,%20March%2016%202025/
#ACMPrize
#2024ACMPrize
#ACMTuringAward
» #ReinforcementLearning
An Introduction
1998
standard reference...cited over 75,000 times
...
prominent example of #RL
#AlphaGo victory
over best human #Go players
2016, 2017
....
recently has been the development of the chatbot #ChatGPT
...
large language model #LLM trained in two phases ...employs a technique called
reinforcement learning from human feedback #RLHF «
aka cheap labor unnamed in papers
https://awards.acm.org/about/2024-turing
2/2
Self-Improving Reasoners.
Both expert human problem solvers and successful language models employ four key cognitive behaviors:
1. verification (systematic error-checking),
2. backtracking (abandoning failing approaches),
3. subgoal setting (decomposing problems into manageable steps), and
4. backward chaining (reasoning from desired outcomes to initial inputs).
Some language models naturally exhibit these reasoning behaviors and show substantial gains, while others don't and quickly plateau.
The presence of reasoning behaviors, not the correctness of answers, is the critical factor. Models trained on incorrect solutions that contain proper reasoning patterns achieve performance comparable to those trained on correct solutions.
It seems that the presence of cognitive behaviors enables self-improvement through RL.
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
https://arxiv.org/abs/2503.01307
Richard Sutton and Andrew Barto Win 2024 Turing Award https://awards.acm.org/about/2024-turing
Andrew Barto and Richard Sutton Recognized as Pioneers of Reinforcement Learning
SFT vs. RFT: Choosing the Right Fine-Tuning Strategy for Your AI
Customizing foundation models has become essential for organizations seeking to create differentiated value.
#AI #LLM #RL
https://gradientflow.com/post-training-rft-sft-rlhf/
Hear me out: I think applying RL on #LLMs and LMMs is misguided, and we can do much better.
Those #RL algorithms are unsuitable for this: for example, they cannot learn how their decisions affect the eventual rewards, but are instead just optimized to make decisions via Bellman optimization.
Instead we can simply condition the LLMs on the rewards. The rewards become inputs to the model, not something external to it, so the model will learn the proper reward dynamics instead of only being externally pushed towards the rewards. The model can then do the credit assignment itself, optimally, without fancy mathematical heuristics!
This isn't a new idea; it comes from goal-conditioned RL and decision transformers.
We can simply run the reasoning trajectories, judge the outcomes, and then prepend the outcome tokens to these trajectories before training the model on them in a batch.
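A minimal sketch of what that data construction could look like, assuming a hypothetical judge that has already scored each trajectory and a made-up outcome-token format; nothing here is from a specific library or paper:

```python
# Minimal sketch of outcome-conditioned trajectory construction (decision-transformer
# style). The reward bucketing and token strings are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Trajectory:
    prompt: str
    completion: str   # the model's reasoning trajectory
    reward: float     # judged outcome, e.g. 1.0 for a correct final answer

def outcome_token(reward: float) -> str:
    # Discretize the scalar reward into a small vocabulary of outcome tokens
    # the model can condition on at inference time.
    return "<|good|>" if reward >= 0.5 else "<|bad|>"

def build_training_text(traj: Trajectory) -> str:
    # Prepend the outcome token so the reward is an *input* to the model;
    # training is then ordinary next-token prediction on this string.
    return f"{outcome_token(traj.reward)}\n{traj.prompt}\n{traj.completion}"

if __name__ == "__main__":
    batch = [
        Trajectory("Q: 17 + 25 = ?", "17 + 25 = 42. Answer: 42", reward=1.0),
        Trajectory("Q: 17 + 25 = ?", "17 + 25 = 32. Answer: 32", reward=0.0),
    ]
    for t in batch:
        print(build_training_text(t))
    # At inference time, prepending "<|good|>" would steer the model
    # toward high-reward completions.
```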
How can we formulate the exploration-exploitation trade-off better than all the hacks on top of the Bellman equation?
We can first of all simply estimate the advantage of exploration by Monte Carlo in a swarm setting: pitting fully exploitative agents against fully exploitative agents that have the benefit of recent exploration. This can easily be done with lagged policy models.
Of course, the advantage of exploration needs to be divided by the cost of exploration, which is linear in the number of agents used in the swarm to explore at a particular state.
Note that the advantage of exploration depends on the state of the agent, so we might want to define an explorative critic to estimate this.
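A toy Monte-Carlo sketch of that estimate, under the post's own framing (a lagged exploit-only policy vs. an exploit policy updated with recent exploration, normalized by swarm size); the bandit-style environment and numbers are purely illustrative assumptions:

```python
# Toy Monte-Carlo estimate of the exploration advantage described above.

import random

def rollout(policy_mean: float, noise: float = 1.0) -> float:
    # One episode return for an agent whose policy earns `policy_mean` on average.
    return random.gauss(policy_mean, noise)

def exploration_advantage(lagged_mean: float,
                          updated_mean: float,
                          n_explorers: int,
                          n_rollouts: int = 1000) -> float:
    # E[return | exploit policy updated with recent exploration]
    # minus E[return | lagged exploit-only policy],
    # divided by the exploration cost, assumed linear in the swarm size.
    lagged = sum(rollout(lagged_mean) for _ in range(n_rollouts)) / n_rollouts
    updated = sum(rollout(updated_mean) for _ in range(n_rollouts)) / n_rollouts
    return (updated - lagged) / n_explorers

if __name__ == "__main__":
    random.seed(0)
    adv = exploration_advantage(lagged_mean=0.40, updated_mean=0.55, n_explorers=8)
    print(f"estimated per-explorer advantage of exploration: {adv:.4f}")
```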
What's beautiful in this formulation is that we can incorporate autoregressive #WorldModels naturally: the exploitative agents only learn from rewards, while the explorative agents choose their actions in a way that maximizes the improvement of the autoregressive world model.
It brings these two concepts together as sides of the same coin.
Exploitation is reward-guided action, exploration is auto-regressive state transition model improvement guided action.
Balancing the two is a swarm dynamic that encourages branching where exploration has an expected value in reward terms. This can be estimated by computing the advantage of exploitative agents utilizing recent exploration versus agents that do not, and returning this advantage to the points of divergence between the two.
Instead of "model is itself the environment" in an #LLM setting, you can take note that in normal #RL, you'd typically have state-action-reward-state-action-reward-... sequences, where the action inflicts itself upon the environment which changes and its new form projects into the next state.
For LLMs, only the outcome comes from the environment. That's the final reward. Before that, it's just auto-regressive action-action-action-...
So per-step advantage computation is heavier than necessary; instead, the advantage can simply be returned backwards over the completion-tree alternatives.
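A small sketch of that backward pass over a completion tree; the tree layout, reward values, and sibling-mean baseline are illustrative assumptions, not a prescribed algorithm:

```python
# Return a single end-of-trajectory outcome backwards over a completion tree,
# assigning advantage at the points where alternatives diverge.

from dataclasses import dataclass, field

@dataclass
class Node:
    token: str
    children: list["Node"] = field(default_factory=list)
    reward: float = 0.0       # only set at leaves (the judged final outcome)
    value: float = 0.0        # filled in by backup
    advantage: float = 0.0    # value relative to the mean of its siblings

def backup(node: Node) -> float:
    if not node.children:
        # Leaf: the value is the judged final outcome.
        node.value = node.reward
        return node.value
    child_values = [backup(c) for c in node.children]
    baseline = sum(child_values) / len(child_values)
    # Advantage is assigned at the point of divergence between alternatives.
    for child, v in zip(node.children, child_values):
        child.advantage = v - baseline
    node.value = baseline
    return node.value

if __name__ == "__main__":
    root = Node("Q:", children=[
        Node("path A", children=[Node("answer 42", reward=1.0)]),
        Node("path B", children=[Node("answer 32", reward=0.0)]),
    ])
    backup(root)
    for child in root.children:
        print(child.token, "advantage:", child.advantage)
```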
Looking into Asset Liability Management amongst other things for work (and how #rl can be used), it seems like the #private #insurance industry is an unregulated #banking industry. They take the insurance premiums, invest them in the market, and then try to figure out the timing between people asking them for the money and cashing out their investments.
So why are people in the US insured? Is it a ritualistic cultural thing?
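A toy sketch of the timing problem described above, with made-up cash flows; it only illustrates the cumulative gap between claims coming due and assets maturing, which is the mismatch an ALM desk has to manage:

```python
# Toy asset-liability timing illustration. All amounts and months are invented.

# (month -> expected claim payouts) vs. (month -> assets maturing / liquidatable)
expected_claims = {1: 10.0, 3: 40.0, 6: 25.0, 12: 60.0}
asset_maturities = {2: 30.0, 6: 50.0, 12: 70.0}

def funding_gap_by_month(claims: dict[int, float],
                         maturities: dict[int, float]) -> dict[int, float]:
    # Cumulative liquidity gap: positive values mean claims arrive before
    # enough assets have matured, i.e. a timing mismatch the insurer must fund.
    months = sorted(set(claims) | set(maturities))
    cum_claims = cum_assets = 0.0
    gaps = {}
    for m in months:
        cum_claims += claims.get(m, 0.0)
        cum_assets += maturities.get(m, 0.0)
        gaps[m] = cum_claims - cum_assets
    return gaps

if __name__ == "__main__":
    for month, gap in funding_gap_by_month(expected_claims, asset_maturities).items():
        print(f"month {month:2d}: funding gap {gap:+.1f}")
```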