Automatic import of classifications

Hello @fizban,
Thank you for your work! Very useful script.

I have an issue: each time I run the script, I randomly get a “secid not found” error for a different secid:

secid  0P0000684V not found in PortfolioSAL  retrieving it from x-ray...

And at the end:

Traceback (most recent call last):
  File "/***/Documents/pp-portfolio-classifier/portfolio-classifier.py", line 684, in <module>
    pp_file.add_taxonomy(taxonomy)
  File "/***/Documents/pp-portfolio-classifier/portfolio-classifier.py", line 550, in add_taxonomy
    securities = self.get_securities()
  File "/***/Documents/pp-portfolio-classifier/portfolio-classifier.py", line 652, in get_securities
    security_h = security.load_holdings()
  File "/***/Documents/pp-portfolio-classifier/portfolio-classifier.py", line 348, in load_holdings
    self.holdings.load(isin = self.ISIN, secid = self.secid)
  File "/***/Documents/pp-portfolio-classifier/portfolio-classifier.py", line 502, in load
    percentages.append(float(tr.select("td")[taxonomy['column']].text.replace(",",".")))

Do you have any idea how I can fix this?

Thank you !

Thank you for trying the script. I have uploaded a new version that just skips securities that do not have an ISIN.
Regarding @Balmy6009’s error: the secid 0P0000684V is of type Stock and will not be classified, but it should not cause an error at the end. The first message (secid not found in PortfolioSAL) is just informative: it indicates that the script did not find the security in Morningstar’s standard web service for funds and ETFs and will try its x-ray service instead. The error at the end might be related to the unusual structure of the x-ray page when it is queried for a stock instead of a fund/ETF. I made some changes to make the script more robust in that respect. Try the new version and see what errors you get.

Hello @fizban,
Another try with the new version gives me this output:

❯ python portfolio-classifier.py pp_input.xml

secid 0P00000031 not found in PortfolioSAL retrieving it from x-ray...
secid 0P0000AR7O not found in PortfolioSAL retrieving it from x-ray...
secid 0P000067XW not found in PortfolioSAL retrieving it from x-ray...
isin CA92206A1066 not found in Morningstar, skipping it...
secid 0P0000053X not found in PortfolioSAL retrieving it from x-ray...
secid 0P0000Z316 not found in PortfolioSAL retrieving it from x-ray...
isin CA46431V1031 not found in Morningstar, skipping it...
isin CA92203B1076 not found in Morningstar, skipping it...
isin US37954Y4834 not found in Morningstar, skipping it...
secid 0P0000KU35 not found in PortfolioSAL retrieving it from x-ray...
secid 0P0000684V not found in PortfolioSAL retrieving it from x-ray...
isin CA05560U1049 not found in Morningstar, skipping it...
isin US97717W2089 not found in Morningstar, skipping it...
isin US00214Q1040 not found in Morningstar, skipping it...
isin US78462F1030 not found in Morningstar, skipping it...
secid 0P000004GV not found in PortfolioSAL retrieving it from x-ray...
isin CA46428D1087 not found in Morningstar, skipping it...
isin CA92205Y1051 not found in Morningstar, skipping it...
secid 0P000067YY not found in PortfolioSAL retrieving it from x-ray...
secid 0P00006899 not found in PortfolioSAL retrieving it from x-ray...
secid 0P000002OY not found in PortfolioSAL retrieving it from x-ray...
isin CA05591D1050 not found in Morningstar, skipping it...
secid 0P000004M1 not found in PortfolioSAL retrieving it from x-ray...
security 'META CDR (CAD HEDGED)' does not have isin, skipping it...
secid 0P0000POE1 not found in PortfolioSAL retrieving it from x-ray...
isin CA44055T1084 not found in Morningstar, skipping it...
secid 0P000000RD not found in PortfolioSAL retrieving it from x-ray...
secid 0P000080HU not found in PortfolioSAL retrieving it from x-ray...
secid 0P00015ZGZ not found in PortfolioSAL retrieving it from x-ray...
secid 0P0000NI99 not found in PortfolioSAL retrieving it from x-ray...
secid 0P00006800 not found in PortfolioSAL retrieving it from x-ray...
security 'CI Galaxy Ethereum ETF' does not have isin, skipping it...
isin CA46433H1029 not found in Morningstar, skipping it...
isin CA46431R1029 not found in Morningstar, skipping it...
secid 0P00008312 not found in PortfolioSAL retrieving it from x-ray...
secid 0P0000681O not found in PortfolioSAL retrieving it from x-ray...
isin CA05592F1099 not found in Morningstar, skipping it...
isin CA44050P1018 not found in Morningstar, skipping it...
secid 0P0000019Y not found in PortfolioSAL retrieving it from x-ray...
security 'APPLE CDR (CAD HEDGED)' does not have isin, skipping it...
secid 0P0000S493 not found in PortfolioSAL retrieving it from x-ray...
secid 0P000001BW not found in PortfolioSAL retrieving it from x-ray...
secid 0P0000032S not found in PortfolioSAL retrieving it from x-ray...
security 'Evolve Bitcoin ETF CAD - Unhedged' does not have isin, skipping it...
isin CA92205X1078 not found in Morningstar, skipping it...
isin CA92203Q1046 not found in Morningstar, skipping it...
security 'AMAZON.COM CDR (CAD HEDGED)' does not have isin, skipping it...
secid 0P0001CX12 not found in PortfolioSAL retrieving it from x-ray...
secid 0P00006829 not found in PortfolioSAL retrieving it from x-ray...
isin CA46433F1062 not found in Morningstar, skipping it...
security 'CI Galaxy Bitcoin ETF (CAD)' does not have isin, skipping it...
secid 0P00006893 not found in PortfolioSAL retrieving it from x-ray...
secid 0P0000E8LR not found in PortfolioSAL retrieving it from x-ray...
isin CA46436D1087 not found in Morningstar, skipping it...
security 'MICROSOFT CDR (CAD HEDGED)' does not have isin, skipping it...
isin US92206C6646 not found in Morningstar, skipping it...
isin CA46435V1094 not found in Morningstar, skipping it...
isin CA46431L1132 not found in Morningstar, skipping it...
secid 0P0001ILCL not found in PortfolioSAL retrieving it from x-ray...

My isin2secid.json:

{
  "CA0158571053": "0P0001ILCL",
  "CA05534B7604": "0P000067XW",
  "CA0641491075": "0P000067YY",
  "CA1363751027": "0P00006800",
  "CA1367178326": "0P0000S493",
  "CA14042M1023": "0P0000POE1",
  "CA17039A1066": "0P0000Z316",
  "CA29250N1050": "0P0000681O",
  "CA3495531079": "0P00006829",
  "CA56501R1064": "0P0000684V",
  "CA6674951059": "0P0000NI99",
  "CA7063271034": "0P000080HU",
  "CA72585V1031": "0P00008312",
  "CA7669101031": "0P0000E8LR",
  "CA82509L1076": "0P00015ZGZ",
  "CA83179X1087": "0P0000AR7O",
  "CA87971M1032": "0P00006893",
  "CA8911605092": "0P00006899",
  "IE00B8FHGS14": "0P0000XOID",
  "IE00BP3QZ601": "0P00014E82",
  "IE00BP3QZ825": "0P00014G97",
  "IE00BP3QZB59": "0P00014E87",
  "IE00BP3QZD73": "0P00014E88",
  "LU1778762911": "0P0001CX12",
  "US00206R1023": "0P00000031",
  "US0846707026": "0P000000RD",
  "US11135F1012": "0P0000KU35",
  "US17275R1023": "0P0000019Y",
  "US1912161007": "0P000001BW",
  "US4370761029": "0P000002OY",
  "US4781601046": "0P0000032S",
  "US7427181091": "0P000004GV",
  "US7561091049": "0P000004M1",
  "US8545021011": "0P0000053X"
}

In my pp_classified.xml I have the new taxonomies Asset-Type, Stock-style, Sector, Holding, Region and Country, but everything is in the Without Classification folder.

One particularity in Asset-Type: all stocks appear in every Asset-Type folder (Stocks, Bonds, Cash, Other and Not Classified), but the weight is only active for the Not Classified folder. All other securities (ETFs and securities without an ISIN) are in Without Classification, with the weight active and calculated together with the Asset-Type/Not Classified folder.

pre-existing categories, I guess one possibility would be to allow some kind of mapping between existing categories and the new ones.

Yes, this also reminds me of the missing “update” feature. If existing category/taxonomy content were reused and updated, this could also take care of updating all securities when running the script again, say, six months later. Thanks a lot!
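The mapping idea quoted above could look roughly like this minimal sketch, assuming both the existing categories and the Morningstar-derived ones are identified by plain names. `CATEGORY_MAP` and `map_weights` are hypothetical and not part of portfolio-classifier.py; names without a mapping are kept unchanged, and the weights of several source categories that map to the same target are added up.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical mapping from Morningstar-derived category names to categories
# that already exist in the user's Portfolio Performance taxonomy.
CATEGORY_MAP: Dict[str, str] = {
    "Stocks": "Equity",
    "Bonds": "Fixed Income",
    "Cash": "Cash",
}


def map_weights(weights: Dict[str, float], mapping: Dict[str, str]) -> Dict[str, float]:
    """Re-key category weights onto existing categories.

    Several source categories may map to the same target, so their weights
    are summed; names without a mapping are kept as they are.
    """
    result: Dict[str, float] = defaultdict(float)
    for name, weight in weights.items():
        result[mapping.get(name, name)] += weight
    return dict(result)
```

For example, `map_weights({"Stocks": 60.0, "Bonds": 30.0, "Other": 10.0}, CATEGORY_MAP)` returns `{"Equity": 60.0, "Fixed Income": 30.0, "Other": 10.0}`.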

Hi @fizban,
I tried the script, but it returns this error:

  File "C:\***\PYTON\pp-portfolio-classifier\portfolio-classifier.py", line 503, in load
    percentages.append(float('0' + tr.select("td")[taxonomy['column']].text.replace(",",".")))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: could not convert string to float: '0-'

Can you tell me where the problem might be?

Thank you!

Thank you for trying the script, @segi. It seems that you have a security for which Morningstar’s x-ray service returns a result I did not consider (a “-” instead of a blank). Hopefully the version I just uploaded fixes the issue.
Regarding @Balmy6009’s issue, there is a mixed set of messages. None of them are really errors.
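For reference, here is a minimal sketch of a more tolerant conversion for those table cells, assuming that an empty cell and a lone “-” both mean “no data” (treated as 0) and that “,” is the decimal separator. `parse_percentage` is a hypothetical helper, not the code in the uploaded version.

```python
# Cell texts treated as "no data" (assumption: blank or a lone "-").
_EMPTY_MARKERS = {"", "-"}


def parse_percentage(cell_text: str) -> float:
    """Convert a Morningstar table cell such as "12,5" to a float.

    Empty cells and a lone "-" are treated as 0.0; "," is read as the
    decimal separator.
    """
    text = cell_text.strip().replace(",", ".")
    if text in _EMPTY_MARKERS:
        return 0.0
    return float(text)
```

In the traceback above it would replace the inline conversion, e.g. `percentages.append(parse_percentage(tr.select("td")[taxonomy['column']].text))`. Unlike prefixing the text with '0', it also keeps a genuinely negative value such as “-1,5” intact, since `float('0-1.5')` would raise the same ValueError.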

secid xxxx not found in PortfolioSAL retrieving it from x-ray...

This message just indicates that the given security did not return any categorization from Morningstar’s first service (PortfolioSAL), so the script tries the secondary service (x-ray).
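The fallback described here can be sketched as trying a list of sources in order. This is a minimal sketch, not the script’s actual code: `fetch_with_fallback` and the fetcher callables are hypothetical stand-ins for the PortfolioSAL and x-ray requests, and each fetcher is assumed to return an HTTP status code plus the parsed data (or None).

```python
from typing import Callable, Optional, Sequence, Tuple

# A fetcher takes a secid and returns (HTTP status code, parsed data or None).
Fetcher = Callable[[str], Tuple[int, Optional[dict]]]


def fetch_with_fallback(secid: str, sources: Sequence[Tuple[str, Fetcher]]) -> Optional[dict]:
    """Try each (name, fetcher) pair in order; return the first usable result."""
    for i, (name, fetch) in enumerate(sources):
        status, data = fetch(secid)
        if status == 200 and data:
            return data
        if i + 1 < len(sources):
            # Informative only: the next source will be tried.
            print(f"secid {secid} not found in {name} ({status}) "
                  f"retrieving it from {sources[i + 1][0]}...")
    return None
```

With two sources named “PortfolioSAL” and “x-ray”, a failing first call prints the same kind of message seen in the logs; adding another source would only mean appending another pair to the list.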

isin xxxx not found in Morningstar, skipping it...

This just indicates that the given ISIN was not found in the default Morningstar domain. To get the categorization we need the Morningstar security ID (secid), so if we cannot retrieve the corresponding secid from the ISIN, we cannot categorize the security. There are two solutions: update isin2secid.json manually (if you can get the secid somehow) or try a different domain (the default is ‘es’, for morningstar.es). To make it easier to change the domain, I have added a new command-line option ‘-d’, so you can now run the script with -d de, for example, to change the domain to morningstar.de. If you do not specify a domain, ‘es’ is used.
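The lookup order described above (local isin2secid.json first, the Morningstar domain only when needed) can be sketched as follows. This is an illustration under assumptions: only the input file and the `-d` option (default `es`) come from this thread, the helper names are hypothetical, and `lookup` stands in for whatever request the script makes to morningstar.<domain>.

```python
import argparse
import json
from pathlib import Path
from typing import Callable, Dict, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    # Only the input file and -d (default "es") come from this thread;
    # any other options of the real script are left out.
    parser = argparse.ArgumentParser(description="Classify Portfolio Performance securities")
    parser.add_argument("input_file")
    parser.add_argument("-d", dest="domain", default="es",
                        help="Morningstar domain for the ISIN -> secid lookup, e.g. de")
    return parser.parse_args(argv)


def load_cache(path: Path) -> Dict[str, str]:
    """Read isin2secid.json, or start with an empty mapping if it is missing."""
    return json.loads(path.read_text()) if path.exists() else {}


def save_cache(path: Path, cache: Dict[str, str]) -> None:
    path.write_text(json.dumps(cache, indent=2, sort_keys=True))


def resolve_secid(isin: str, cache: Dict[str, str], domain: str,
                  lookup: Callable[[str, str], Optional[str]]) -> Optional[str]:
    """Return the secid for an ISIN, preferring the cache over the web lookup.

    `lookup(isin, domain)` is a hypothetical stand-in for the request to
    morningstar.<domain>; a successful result is added to the cache.
    """
    if isin in cache:
        return cache[isin]
    secid = lookup(isin, domain)
    if secid is None:
        print(f"isin {isin} not found in Morningstar, skipping it...")
        return None
    cache[isin] = secid
    return secid
```

Adding a missing entry to isin2secid.json by hand then skips the web lookup entirely, which is the manual workaround mentioned above.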

security 'xxxx' does not have isin, skipping it...

This indicates that the given security was added to the Portfolio Performance file without an ISIN. Without that code there is not much we can do, since it is the starting point, so the security is skipped.

I tried it today with the latest code and also got an error. While debugging, I found out that (at least for me), when I got a message like
secid xxxx not found in PortfolioSAL retrieving it from x-ray...
for a valid secid, the problem is that the response is “<Response [401]>” (HTTP code 401, which seems to be authorization related).

So I took a look at the response of “funds/snapshot/PortfolioSAL.aspx”, where the “Bearer” token is fetched. In my case, the line “script = soup.find('script', {'type':'text/javascript'})” does not work, because in my response tokenMaaS: is not inside the 'type':'text/javascript' script. I needed to change the return to

resultstringtoken = str(soup).split('tokenMaaS:')[-1].split('}')[0].replace('"','').strip()
return resultstringtoken
(Important change: I search for “tokenMaaS” directly in “soup” and skip the line that parses soup for “script”.)
I am not sure why. Perhaps it is country related? (I am in Germany and started the script with “-d de”.)
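The same idea (search the whole page for tokenMaaS rather than one particular script tag) can also be written as a regular expression on the raw HTML. A minimal sketch, assuming the value follows `tokenMaaS:` with optional quotes and that the token itself contains no quotes, whitespace, commas or braces; Morningstar does not guarantee this, and `extract_bearer_token` is a hypothetical name.

```python
import re
from typing import Optional

# Assumption: the page contains  tokenMaaS: "<token>"  (quotes optional)
# somewhere in its HTML, inside or outside any particular <script> tag.
_TOKEN_RE = re.compile(r'tokenMaaS\s*:\s*"?([^"\s,}]+)')


def extract_bearer_token(html: str) -> Optional[str]:
    """Search the whole page text for tokenMaaS instead of one specific <script> tag."""
    match = _TOKEN_RE.search(html)
    return match.group(1) if match else None
```

The returned value would then be sent as an `Authorization: Bearer <token>` header, matching the “Bearer” token mentioned above; returning None instead of raising makes a changed page layout easy to detect.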

But even with that fix I still get “random” errors, as other people described above. The interesting part is that I no longer touch the secids (after fetching the correct secids once), and nevertheless, when I run the script, I sometimes get an error and sometimes not. When I get an error, it is again HTTP code 401. So I guess it is still a bearer token problem (at least for me).
I see some replacements in the code after the bearer token is fetched. Perhaps more or less has to be replaced there.

My “random” error turned out to be secid related. I had missed that the script does not start with the same secid every time; it processes the secids in a random order. With 0P00014G99, 0P0000Y2A1 and 0P00014G96 I get code 200 and it works fine. With 0P00014G97 and 0P00014G98 I get response 401.
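One possible explanation for the changing order, assuming the script keeps the secids in a Python set (not confirmed in this thread): Python randomizes string hashing per process by default, so the iteration order of a set of strings can differ from run to run. A minimal sketch that probes secids in a fixed, sorted order and groups them by status code; `probe` is a hypothetical callable that performs the PortfolioSAL request and returns its HTTP status.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_status(secids: Iterable[str], probe: Callable[[str], int]) -> Dict[int, List[str]]:
    """Probe each secid once, in sorted order, and group the secids by HTTP status."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for secid in sorted(set(secids)):  # sorted: identical order on every run
        groups[probe(secid)].append(secid)
    return dict(groups)
```

Running it twice on the same list gives the same order, which makes it easier to tell secid-specific failures (such as the 401 responses above) from intermittent ones.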

But I don’t understand why. When I check the Morningstar website, the “API-failing” secids (at least) return a correct HTML page:
https://www.morningstar.de/de/etf/snapshot/snapshot.aspx?id=0P00014G98

@fizban: Do you know why the PortfolioSAL API seems to work only for some secids in each country? All these ETFs are listed in the EU and in Germany, so I don’t see why some of them fail on an API request.

Thanks @fizban, now it is working.
But I see many ETFs that have less than 1% in the unclassified category of the Region and Country taxonomies.
And all my stocks are in the Not classified category of Asset-Type.


Thank you, @Manni79, for finding the bearer token bug. It seems that Morningstar changes their page from time to time, and that breaks scripts like this one that rely on assumptions about the page structure. I fixed that in the current version. Hopefully the current code will be less fragile.
I also added the HTTP status code in brackets to the message indicating that a secid was not found in PortfolioSAL.
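The number in brackets is a standard HTTP status code. As a sketch (the helper names are hypothetical), Python’s standard `http.HTTPStatus` can turn it into its reason phrase: 401 is Unauthorized, 500 is Internal Server Error and 206 is Partial Content.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return e.g. "401 Unauthorized" for a numeric HTTP status code."""
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:  # not a standard status code
        return str(code)


def not_found_message(secid: str, code: int) -> str:
    # Same wording as the log lines in this thread, plus the reason phrase.
    return (f"secid {secid} not found in PortfolioSAL ({describe_status(code)}) "
            "retrieving it from x-ray...")
```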
I believe the Morningstar API has some free and some premium functionality. Some securities might somehow fall into the premium category, and then we get the 401 code even if they are available through the web page. Hopefully most of them will work fine with the x-ray fallback. Last year the web page was actually calling the API internally via AJAX, but now they seem to embed the information directly in the page and only use the API for the sustainability information. I wouldn’t be surprised if they suppressed API access altogether at some point.
The domain is only used to map the ISIN to the secid. It is not used to retrieve the categories.
@segi, you can try the new version and see whether the categories improve. Note, though, that stocks are not classified by the current version of the script. Only ETFs and funds are processed.


Hello @fizban,

Same issue; it seems to be an error 500 or 206 on a few assets.
I checked on the Morningstar website: the secid finds the ETF, and the information on sector, asset type, etc. exists on the page.

security 'MICROSOFT CDR (CAD HEDGED)' does not have isin, skipping it...
secid 0P0000S493 not found in PortfolioSAL (500) retrieving it from x-ray...
secid 0P0000KL46 not found in PortfolioSAL (206) retrieving it from x-ray...
secid 0P0000681O not found in PortfolioSAL (500) retrieving it from x-ray...
secid 0P0001O9N9 not found in PortfolioSAL (206) retrieving it from x-ray...
secid 0P0000XD8K not found in PortfolioSAL (206) retrieving it from x-ray...
secid 0P0000XD8J not found in PortfolioSAL (206) retrieving it from x-ray...
No information on Asset-Type for 0P00000031
secid 0P00000031 not found in PortfolioSAL (500) retrieving it from x-ray...
secid 0P0001ILCL not found in PortfolioSAL (500) retrieving it from x-ray...
No information on Asset-Type for 0P000000RD
secid 0P000000RD not found in PortfolioSAL (500) retrieving it from x-ray...
secid 0P0000Z316 not found in PortfolioSAL (500) retrieving it from x-ray...
security 'CI Galaxy Ethereum ETF' does not have isin, skipping it...
secid 0P0001I7EJ not found in PortfolioSAL (206) retrieving it from x-ray...
secid 0P0000P0O0 not found in PortfolioSAL (206) retrieving it from x-ray...
secid 0P0000E8LR not found in PortfolioSAL (500) retrieving it from x-ray...
No information on Asset-Type for 0P0000019Y
secid 0P0000019Y not found in PortfolioSAL (500) retrieving it from x-ray...
secid 0P0000N9P1 not found in PortfolioSAL (206) retrieving it from x-ray...
secid 0P0000SWUE not found in PortfolioSAL (206) retrieving it from x-ray...
secid 0P00006899 not found in PortfolioSAL (500) retrieving it from x-ray...
secid 0P00005ZQZ not found in PortfolioSAL (206) retrieving it from x-ray...
secid 0P000080SM not found in PortfolioSAL (206) retrieving it from x-ray...
secid 0P00006829 not found in PortfolioSAL (500) retrieving it from x-ray...
secid 0P0000SWUI not found in PortfolioSAL (206) retrieving it from x-ray...
security 'CI Galaxy Bitcoin ETF (CAD)' does not have isin, skipping it...
No information on Asset-Type for 0P0000053X
secid 0P0000053X not found in PortfolioSAL (500) retrieving it from x-ray...
No information on Asset-Type for 0P0000KU35
secid 0P0000KU35 not found in PortfolioSAL (500) retrieving it from x-ray...
secid 0P00006SO5 not found in PortfolioSAL (206) retrieving it from x-ray...
secid 0P00015ZGZ not found in PortfolioSAL (500) retrieving it from x-ray...
No information on Asset-Type for 0P000001BW
secid 0P000001BW not found in PortfolioSAL (500) retrieving it from x-ray...
secid 0P0000UT1D not found in PortfolioSAL (206) retrieving it from x-ray...
secid 0P0000ZGOB not found in PortfolioSAL (206) retrieving it from x-ray...
secid 0P00019S2Q not found in PortfolioSAL (206) retrieving it from x-ray...
secid 0P0000ZGAG not found in PortfolioSAL (206) retrieving it from x-ray...
secid 0P00002D7X not found in PortfolioSAL (206) retrieving it from x-ray...
secid 0P00008312 not found in PortfolioSAL (500) retrieving it from x-ray...
secid 0P00006800 not found in PortfolioSAL (500) retrieving it from x-ray...
secid 0P0000AR7O not found in PortfolioSAL (500) retrieving it from x-ray...
No information on Asset-Type for 0P0001CX12
secid 0P0001CX12 not found in PortfolioSAL (500) retrieving it from x-ray...
No information on Asset-Type for 0P000002OY
secid 0P000002OY not found in PortfolioSAL (500) retrieving it from x-ray...
secid 0P00006893 not found in PortfolioSAL (500) retrieving it from x-ray...
No information on Asset-Type for 0P000004GV
secid 0P000004GV not found in PortfolioSAL (500) retrieving it from x-ray...
security 'AMAZON.COM CDR (CAD HEDGED)' does not have isin, skipping it...
secid 0P0000684V not found in PortfolioSAL (500) retrieving it from x-ray...
secid 0P0000XD8M not found in PortfolioSAL (206) retrieving it from x-ray...
security 'Evolve Bitcoin ETF CAD - Unhedged' does not have isin, skipping it...
secid 0P0000NQMV not found in PortfolioSAL (206) retrieving it from x-ray...
No information on Asset-Type for 0P0000032S
secid 0P0000032S not found in PortfolioSAL (500) retrieving it from x-ray...
secid 0P000067XW not found in PortfolioSAL (500) retrieving it from x-ray...
secid 0P0000XD8I not found in PortfolioSAL (206) retrieving it from x-ray...
secid 0P0000SDEB not found in PortfolioSAL (206) retrieving it from x-ray...
secid 0P0000POE1 not found in PortfolioSAL (500) retrieving it from x-ray...
secid 0P0000NI99 not found in PortfolioSAL (500) retrieving it from x-ray...
secid 0P0000T32L not found in PortfolioSAL (206) retrieving it from x-ray...
secid 0P0001ANNU not found in PortfolioSAL (206) retrieving it from x-ray...
No information on Asset-Type for 0P000004M1
secid 0P000004M1 not found in PortfolioSAL (500) retrieving it from x-ray...
secid 0P000067YY not found in PortfolioSAL (500) retrieving it from x-ray...
security 'META CDR (CAD HEDGED)' does not have isin, skipping it...
security 'APPLE CDR (CAD HEDGED)' does not have isin, skipping it...
secid 0P000080HU not found in PortfolioSAL (500) retrieving it from x-ray...

Thank you, @Balmy6009. Even if the first call to PortfolioSAL fails for a lot of the securities, the second call to x-ray does not seem to fail, so you should see some information for those securities in Portfolio Performance. Are they all in the unassigned category? The securities 0P000004GV, 0P0000032S and so on (the ones with the message ‘No information on Asset-Type…’) do not seem to be available in the EMEA area, or at least not in Europe. That might be an additional limitation of the script.

Yesterday, with “multifaktortest.xml”, I also had assets in unassigned categories. I figured out that none of the assets in “unassigned” categories had any holdings (no purchases in their history). So I opened “multifaktortest.xml” and added a buy transaction for two securities that had no holdings (I think they were “MSCI World” and “MSCI World Multifaktor”).

Then, after running the script again with purchases recorded for both of them, the output file had all categories assigned. I stopped at that point, but from my testing I would say that only securities with purchases get an automatic classification.
Perhaps that’s also the problem here?
I will try again tomorrow with your latest original code.

That is correct. Only the securities with transactions are processed.
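Combined with the rule that securities without an ISIN are skipped, the selection can be sketched as a simple filter. The `Security` dataclass and its fields are a hypothetical, simplified model; the script itself reads this information from the Portfolio Performance XML file, and its exact checks may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Security:
    """Hypothetical, simplified view of a security in the PP file."""
    name: str
    isin: Optional[str] = None
    transactions: List[str] = field(default_factory=list)


def securities_to_classify(securities: List[Security]) -> List[Security]:
    """Keep only securities that have an ISIN and at least one transaction."""
    selected = []
    for sec in securities:
        if not sec.isin:
            print(f"security '{sec.name}' does not have isin, skipping it...")
        elif sec.transactions:
            selected.append(sec)
    return selected
```

A security that has an ISIN but no transactions is dropped silently here, which matches the observation above that such securities end up unassigned.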

Hello @fizban,
You are right; in fact, about 54% are not classified.
I tried the de, us and ca domains.
Do you know why the data for a lot of my securities can’t be fetched from the website?

Changing the domain parameter only affects the site used to retrieve the mapping between ISIN and secid. It does not affect the site from which the security information is retrieved. The security information is currently retrieved from Morningstar EMEA (Europe, Middle East and Africa) and a Spanish x-ray service. The only improvement I can see is to also try retrieving the security information from Morningstar US, but if that fails due to “not authorized”, the fallback will still be the Spanish service…

@Balmy6009 Can you post three example ISINs of ETFs in your portfolio (ideally with the related Morningstar secids) that are 100% uncategorized (unclassified)? I want to check manually whether I can find out why they are not working… or where the difference is.

@fizban: In your script code I found that the x-ray request goes to the URL “https://lt.morningstar.com/j2uwuwirpv/xray/default.aspx?LanguageId=en-EN&PortfolioType=2&SecurityTokenList=*”. I opened it in a browser and it worked well. But my question is: how do you know that this x-ray service is “Spanish”? It has a “.com” domain, and in the LanguageId part I see “LanguageId=en-EN”. Some advice would be very kind.
Thank you

@Manni79

LanguageId=es

@Manni79 The logo in the top left corner should give you a clue. It’s from a Spanish bank. Actually, the link is a well-known one within the Spanish non-professional investor community, openly shared and discussed, for example, at Rankia.


@fizban: thanks a lot. Perhaps in the next few months I will have some time to see whether I can change the URL from https://lt.morningstar.com/j2uwuwirpv/xray/default.aspx?LanguageId=en-EN&PortfolioType=2&SecurityTokenList="SECID"
… e.g. to …
https://www.morningstar.de/de/etf/snapshot/snapshot.aspx?id=SECID
https://www.morningstar.de/de/etf/snapshot/snapshot.aspx?id=SECID&tab=3&InvestmentType=FE
The response is not in the same format, but the resulting HTML tables seem to be very similar, so perhaps the change is not that big. As far as I can see, these URLs deliver the same data (asset types, holdings, etc.).
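Both kinds of URL appear earlier in this thread. The sketch below only builds them from a secid so the two sources can be compared side by side; it does not fetch or parse anything. The x-ray `SecurityTokenList` value is passed through as-is because the exact token format the script sends is not shown here, and `urlencode` will percent-encode any special characters in it. The function names are hypothetical.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlencode

# Both base URLs are the ones quoted in this thread.
XRAY_BASE = "https://lt.morningstar.com/j2uwuwirpv/xray/default.aspx"
SNAPSHOT_BASE = "https://www.morningstar.de/de/etf/snapshot/snapshot.aspx"


def xray_url(security_token: str, language_id: str = "en-EN") -> str:
    # security_token is passed through unchanged (apart from percent-encoding);
    # its exact format in the script is not shown in the thread.
    query = {"LanguageId": language_id, "PortfolioType": 2,
             "SecurityTokenList": security_token}
    return f"{XRAY_BASE}?{urlencode(query)}"


def snapshot_url(secid: str, tab: Optional[int] = None) -> str:
    query: Dict[str, Union[str, int]] = {"id": secid}
    if tab is not None:
        query.update({"tab": tab, "InvestmentType": "FE"})
    return f"{SNAPSHOT_BASE}?{urlencode(query)}"
```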

Or am I missing something, and does the Spanish x-ray URL deliver more (or better) information than the default website?