Nater5000

>I am trying to make algo which predicts an optimal option strategy (like IC, spreads, covered calls, strangle etc). I am trying with RL (a stupid choice I know) but it's a start...

It's a bad start. You're just going to burn yourself out dealing with a bunch of complexity that won't get you anywhere. I know you're not asking for this advice, but if you actually want to make any progress, you *need* to start with something simple and work your way up. What you're trying to accomplish definitely doesn't need RL, at least not to start with.

>I am unable to find any Options only dataset with minute tickers.

There are always threads on this sub (and others, like r/options) asking for this same stuff. Apparently you can get your hands on some of this data for free, but I've found that if you're at the point where you need *this* kind of data, you're at a point where you need to pay for it. Things are constantly changing, though, so my perspective may just be dated. Still, you should search for other threads asking the same thing (I know I've seen them on this sub over the last few weeks).

>Does such a dataset exists?

Yes, I buy such data from the [CBOE DataShop](https://datashop.cboe.com/), but like I said, you may be able to find it free elsewhere. At the least I can tell you it *does* exist.

>I know option pricing can be derived from stock using Black-Scholes model

No, it can't. You also need to know implied volatility (and the risk-free interest rate) to be able to derive option prices, something that isn't given from stock pricing data.
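To make the Black-Scholes point concrete, here is a minimal sketch of the textbook formula for a European call on a non-dividend-paying stock. The function names (`norm_cdf`, `bs_call_price`) and all the input numbers are illustrative assumptions, not anything from a real dataset. The thing to notice is that `sigma` and `rate` are inputs you have to supply yourself: the same stock price gives very different option prices depending on the volatility you plug in.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, written with math.erf to avoid extra dependencies."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_price(spot, strike, t_years, rate, sigma):
    """Black-Scholes price of a European call (no dividends assumed).

    spot    : current stock price (the only input stock data gives you)
    strike  : option strike price
    t_years : time to expiry in years
    rate    : risk-free interest rate, annualized, continuously compounded
    sigma   : annualized volatility (must come from somewhere else)
    """
    vol_sqrt_t = sigma * math.sqrt(t_years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * t_years) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * t_years) * norm_cdf(d2)


# Same stock price, strike, expiry and rate; only the assumed volatility differs.
low_vol_price = bs_call_price(spot=100.0, strike=100.0, t_years=0.25, rate=0.02, sigma=0.15)
high_vol_price = bs_call_price(spot=100.0, strike=100.0, t_years=0.25, rate=0.02, sigma=0.45)
print(low_vol_price, high_vol_price)
```

In practice, implied volatility is backed out of quoted option prices by inverting this formula, which is why you need actual option quotes rather than stock data alone.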


bitemenow999

>It's a bad start. You're just going to burn yourself out dealing with a bunch of complexity that won't get you anywhere.

I know what I am doing, I am very well versed in ML and RL (specifically) with some good papers out for stochastic systems and RL...

CBOE DataShop will break the bank...


plinifan999

So I agree that you shouldn't use RL. I've done legitimate research in ML too, have tried to do ML in finance, and have read many papers about it.

Basically, when using black-box DL models, you are hoping that there's enough predictive signal in the historical market data. But when the signal-to-noise ratio is too low, and/or future prices can only be profitably described by functions of variables that are outside of your training data (news and events), it is impossible for the model not to overfit. Predictive signals in pure market data are incredibly, incredibly sparse in an incredibly, incredibly high-dimensional input space, and they tend not to last long because participants are incentivized to disrupt alpha (by capitalizing on it).

SL is powerful because of its ability to coherently train on very large data, and RL is even more sample-inefficient than SL because there's variance in running episodes, sampling rewards, etc. So where will you get meaningful data when data from a few months ago comes from a completely different environment than data now?

If you must try any kind of machine learning, why RL? You can just assume that the environment is not affected by your actions (you have no market impact) without losing much alpha potential, and turn the problem into SL, which is a much easier problem with much more stable training.

In general, I think ML *could* work, but only in very specific slices of the market. (Narrowing the input space to certain asset combinations over specific time periods that have been pre-classified as points of interest, for example.) And even then, the ML that I would imagine working would be "small data": very, very simple models with as few parameters as possible to avoid overfitting.
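A rough sketch of what "turn it into SL with as few parameters as possible" could look like, under the no-market-impact assumption: pair each return with the next one, split chronologically (no shuffling, so the model never sees the future), fit a two-parameter line, and compare it out of sample against a naive baseline. Everything here is a hypothetical illustration: the helper names are made up and the returns are synthetic Gaussian noise, not market data, so the model is not expected to beat the baseline.

```python
import random


def make_supervised(returns):
    """Feature = return at time t, label = return at time t+1."""
    return returns[:-1], returns[1:]


def chronological_split(x, y, train_frac=0.7):
    """Earlier samples train, later samples test; never shuffle time series."""
    cut = int(len(x) * train_frac)
    return x[:cut], y[:cut], x[cut:], y[cut:]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (two parameters)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(pred, actual):
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


# Synthetic returns: pure noise, used only to make the sketch runnable.
rng = random.Random(0)
returns = [rng.gauss(0.0, 0.01) for _ in range(2000)]

x, y = make_supervised(returns)
x_tr, y_tr, x_te, y_te = chronological_split(x, y)
intercept, slope = fit_line(x_tr, y_tr)

model_mse = mse([intercept + slope * v for v in x_te], y_te)
# Baseline: always predict the mean return seen in training.
train_mean = sum(y_tr) / len(y_tr)
baseline_mse = mse([train_mean] * len(y_te), y_te)
print(model_mse, baseline_mse)
```

If a model this small cannot beat the baseline out of sample on real data, a larger one is more likely to be fitting noise than finding signal.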


wsb202009

>I know what I am doing, I am very well versed in ML and RL (specifically) with some good papers out for stochastic systems and RL...

It is still a bad start. You will burn your time and get frustrated with the complexity. Start with a simple algo and expand it.


tagfresca

Can you share the dataset that you already have?


bitemenow999

It's just for 2 tickers, Apple and Starbucks, for 2 weeks (with the Thanksgiving holiday)... I can share it, but it is practically useless given the extreme volatility from Friday and Monday...


tagfresca

oh haha all right no worries then


Hot_World_5639

Mh


knightkidd

[OptionsDX](https://www.optionsdx.com/) has some datasets for the more heavily traded symbols. End-of-day quotes are free, and there is even minute-level data for a few tickers at a price that doesn't 'break the bank'.


Longjumping-Guard132

Here you will need pattern recognition to identify each pattern, then test how price movements correlate with each pattern after it occurs.
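As a minimal sketch of that two-step idea: define a pattern detector, then collect the returns that follow each occurrence. The pattern used here (three consecutive lower closes), the function names, and the price list are all made-up assumptions for illustration; any real test would need far more data and a comparison against the unconditional returns.

```python
def three_down_closes(closes, i):
    """Hypothetical pattern: the close at i is the third lower close in a row."""
    return i >= 3 and closes[i] < closes[i - 1] < closes[i - 2] < closes[i - 3]


def forward_returns_after(closes, pattern, horizon=1):
    """Return the `horizon`-step forward return after every bar where the pattern fires."""
    results = []
    for i in range(len(closes) - horizon):
        if pattern(closes, i):
            results.append(closes[i + horizon] / closes[i] - 1.0)
    return results


# Hand-made closing prices, only to make the sketch runnable.
closes = [100.0, 99.0, 98.0, 97.0, 98.0, 97.0, 96.0, 95.0, 96.0]

after_pattern = forward_returns_after(closes, three_down_closes)
print(len(after_pattern), after_pattern)
```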