Can it be used for non-structured docs? #34

@namevinu

Greetings to the creators of PageIndex,

I recently discovered your library, and it seems to be an excellent solution.

From the description, it looks ideal for structured documents such as technical or financial reports. However, I would like to know if it can serve these use cases:

  1. Can it retrieve context relevant to a user query out of the box when the sources are previous chats with LLMs or uploaded documents, or is manual preprocessing needed? These sources are mostly unstructured and may contain spelling and grammatical errors, especially the chat logs.

  2. Does it rely on the model’s context limit? For example, if the model has a maximum input of 128k tokens but the context is over 2 million tokens, does the library handle this automatically, or would manual chunking and splitting of the context be required?

  3. Is it limited to processing PDF or text-based content, or can it also handle images and screenshots, whether they are provided separately or embedded within PDFs?

  4. Is it possible to save the tree structure for future use, to enable faster processing and retrieval?

I am considering using this library in my project because it appears to address context relevance challenges in long chats with multiple documents.

I look forward to your response!

Thanks and regards,
Vineet Mangal
