In various research articles we have shown the results of basic techniques for scraping or collecting public data from websites. This has usually meant using a scraping library such as Nokogiri, or examining HTTP requests in Chrome DevTools and making similar calls to those visible, public API endpoints.
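
To make the second approach concrete, here is a minimal Python sketch of replaying a JSON call spotted in the Network tab. The endpoint URL, parameters, and field names are hypothetical placeholders, not any real company's API:

# Replay an XHR call copied out of Chrome DevTools.
import requests

resp = requests.get(
    "https://example.com/api/v1/products",  # hypothetical endpoint from the Network tab
    params={"page": 1, "per_page": 100},    # hypothetical pagination parameters
    headers={"User-Agent": "research-script/0.1"},
    timeout=10,
)
resp.raise_for_status()
for product in resp.json():
    print(product.get("id"), product.get("name"), product.get("price"))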

While these strategies are very helpful and often generate very interesting data, we wanted to share some of our more advanced techniques for finding publicly accessible API endpoints of the companies we research.

First, we want to emphasize why finding these endpoints matters. Applications often use a single class to serialize a model to JSON, and that same output is used on both the client side and the admin side. Some fields of the JSON may never be rendered for the client, yet potentially interesting information is still exposed. Let's take the following theoretical JSON render of a model:

...
// Grails: register a custom JSON marshaller for the Product class.
JSON.registerObjectMarshaller(Product) { product ->
    [
        id:          product.id,
        name:        product.name,
        price:       product.price,
        margin:      product.margin,
        // count only stores that are still open
        storeCount:  product.store.findAll { !it.closed }.size(),
        complaints:  product.customers?.complaints,
        numPurchase: productService.getPurchased(product),
    ]
}
...

The above code outputs a JSON response for a Product class. Perhaps all the website renders is the product's name and price, but examining the actual JSON object reveals much more interesting information: the id (if ids are incremental rather than UUIDs, we can track purchases over time and potentially extrapolate sales), margin, storeCount, complaints, and the number of units purchased. This information is probably meant for an admin-only view, yet it may be sloppily exposed to every user, and it is potentially extremely valuable to an investor. What is more, old or unpublished endpoints sometimes return even more interesting information, which makes knowing how to search for them all the more important.
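
To illustrate why incremental ids matter, here is a rough Python sketch of sampling such an endpoint at two points in time and diffing numPurchase to estimate sales velocity. The URL and field names mirror the theoretical marshaller above and are assumptions, not any real API:

import requests

BASE = "https://example.com/api/products/{}"  # hypothetical endpoint

def snapshot(product_ids):
    """Return {id: numPurchase} for every product id that answers with JSON."""
    counts = {}
    for pid in product_ids:
        resp = requests.get(BASE.format(pid), timeout=10)
        if resp.ok:
            counts[pid] = resp.json().get("numPurchase")
    return counts

# Take one snapshot now and a second one later (e.g. the next day);
# the per-id difference approximates units sold in between.
day_one = snapshot(range(1, 101))
# ... some time later ...
day_two = snapshot(range(1, 101))
sold = {pid: day_two[pid] - day_one[pid]
        for pid in day_one if day_one.get(pid) is not None and pid in day_two}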

The first step is to comb various search engines for subdomains; for this we use a tool called Sublist3r.
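
Sublist3r is normally run from the command line (python sublist3r.py -d example.com), but it can also be imported as a module. A minimal sketch, with the argument order following the sublist3r.main() signature in the public repository (check your installed version, as the signature may differ):

import sublist3r

subdomains = sublist3r.main(
    "example.com",  # hypothetical target domain
    40,             # worker threads
    None,           # savefile
    None,           # ports to probe on discovered hosts
    True,           # silent
    False,          # verbose
    False,          # enable_bruteforce
    None,           # engines (None = all supported search engines)
)
for sub in subdomains:
    print(sub)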

Then, if the company has mobile apps, we use jd-gui to decompile the .class files inside .jar files back into readable Java source, which in turn lets us map the mobile endpoints. We also use MobSF, which works directly with APK and IPA files (Android and iOS) and lets us view the source code and, more importantly, analyze the API endpoints. Below is a screenshot of MobSF being used to analyze an Android app.
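
Once the decompiler has produced readable source, a crude but effective way to map endpoints is to scan the output tree for URL-like string literals. A minimal Python sketch, where the directory name and regex are our own illustrative assumptions:

import re
from pathlib import Path

URL_RE = re.compile(r'"(https?://[^"]+|/api/[^"]+)"')

def extract_endpoints(source_dir):
    """Yield unique URL-like string literals from decompiled .java files."""
    seen = set()
    for path in Path(source_dir).rglob("*.java"):
        for match in URL_RE.findall(path.read_text(errors="ignore")):
            if match not in seen:
                seen.add(match)
                yield match

for endpoint in extract_endpoints("decompiled_app/"):  # hypothetical jd-gui output dir
    print(endpoint)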

[Screenshot: MobSF analyzing an Android app]

This is just one strategy we use when looking at a specific company, in an attempt to turn the information it exposes publicly into input for our investment decisions. Another tool we use is mitmproxy (https://mitmproxy.org), which allows HTTP(S) traffic flows to be intercepted, inspected, modified, and replayed in the terminal, letting us view the API calls made by any app we are using. Very few investment professionals utilize these strategies to research companies, which we believe is a huge mistake given the amount of interesting information we are consistently able to discover.
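
mitmproxy is also scriptable. Below is a minimal addon sketch that logs every JSON response an app produces while proxied; the class and file names are ours, while the response() hook is part of mitmproxy's documented addon API. Save it as log_api.py and run mitmproxy -s log_api.py:

from mitmproxy import http

class APILogger:
    # Called by mitmproxy once a full response has been received.
    def response(self, flow: http.HTTPFlow) -> None:
        content_type = flow.response.headers.get("content-type", "")
        if "json" in content_type:
            print(flow.request.method, flow.request.pretty_url)

addons = [APILogger()]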